NPC Voice Model — soup-0.6 (Best)
The best-performing model in the NPC voice series: a weight-averaged merge of v5-SFT (60%) and v5-DPO (40%), with Qwen3-0.6B as the base model.
The model takes a plain factual sentence and rewrites it in a character's voice, conditioned on 5 persona parameters (TONE, STYLE, HUMOR, RELATION, ROLE).
Why soup-0.6?
On its own, the SFT model sometimes copies the input fact verbatim or wraps its output in quotes. The DPO model over-corrects and forgets rare relation types. Averaging the two sets of weights (a model soup) keeps DPO's structural fixes and SFT's coverage.
| Model | Pass | Hallucination fail | Fact preservation | EN | ES |
|---|---|---|---|---|---|
| v4 | 59.0% | 32.5% | 1.31 | 66% | 50% |
| v5-SFT | 60.0% | 29.6% | 1.39 | 68% | 44% |
| v5-DPO | 57.0% | 36.0% | 1.26 | 69% | 40% |
| soup-0.6 | 61.5% | 28.5% | 1.42 | 75% | 44% |
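The two SFT failure modes named above (verbatim copies and quote-wrapping) are easy to screen for at runtime. The following is a minimal sketch of such checks, not the evaluation code behind the table; the function names and the normalization rule are assumptions made for illustration.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and drop punctuation so trivial edits do not hide a copy."""
    return re.sub(r"[^\w\s]", "", text.lower()).strip()


def is_verbatim_copy(fact: str, output: str) -> bool:
    """True when the output is the input fact up to case and punctuation."""
    return _normalize(fact) == _normalize(output)


def is_quote_wrapped(output: str) -> bool:
    """True when the whole output is enclosed in one pair of quotation marks."""
    stripped = output.strip()
    # Opening mark -> expected closing mark (straight and curly double quotes).
    pairs = {'"': '"', "'": "'", "\u201c": "\u201d"}
    return len(stripped) >= 2 and pairs.get(stripped[0]) == stripped[-1]


fact = "Iron swords cost 15 gold."
good = "Fifteen gold. Don't haggle."
print(is_verbatim_copy(fact, good), is_quote_wrapped(good))
```

A game integration could regenerate or fall back to the plain fact when either check fires.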
Task
INPUT: TONE:grumpy STYLE:blunt HUMOR:none RELATION:stranger ROLE:blacksmith
FACT: Iron swords cost 15 gold.
OUTPUT: Fifteen gold. Don't haggle.
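The raw prompt that the inference snippets in this card send to the model is a persona header line, a FACT line, and a trailing OUT: cue. A small helper for assembling it might look like the sketch below; `build_prompt` is a hypothetical name, and the layout is copied from the prompt string in the Inference section.

```python
def build_prompt(tone: str, style: str, humor: str,
                 relation: str, role: str, fact: str) -> str:
    """Assemble the three-line raw prompt: persona header, FACT line, OUT: cue."""
    header = (f"TONE:{tone} STYLE:{style} HUMOR:{humor} "
              f"RELATION:{relation} ROLE:{role}")
    return f"{header}\nFACT: {fact.strip()}\nOUT:"


prompt = build_prompt("grumpy", "blunt", "none", "stranger", "blacksmith",
                      "Iron swords cost 15 gold.")
print(prompt)
```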
Parameters
| Param | Values |
|---|---|
| TONE | grumpy, cheerful, neutral, fearful, proud, bitter, nervous, wise, cunning, melancholic, playful |
| STYLE | short, verbose, blunt, rambling, poetic |
| HUMOR | none, dry, sarcastic, warm, dark |
| RELATION | stranger, friend, enemy, ally, rival, mentor, debtor, heretic, worshipper, + 20 more |
| ROLE | blacksmith, innkeeper, guard, merchant, peasant, scholar, noble, priest |
Only RELATION changes at runtime based on game state. The other 4 are fixed per NPC at config time.
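One way to mirror this split in game code is to freeze the four config-time parameters in an object and pass RELATION with each call. The sketch below assumes that design; `NpcPersona` and its method are hypothetical names, the allowed values are copied from the table above, and RELATION is left unvalidated because the table lists only part of its value set.

```python
from dataclasses import dataclass

# Allowed values copied from the Parameters table.
TONES = {"grumpy", "cheerful", "neutral", "fearful", "proud", "bitter",
         "nervous", "wise", "cunning", "melancholic", "playful"}
STYLES = {"short", "verbose", "blunt", "rambling", "poetic"}
HUMORS = {"none", "dry", "sarcastic", "warm", "dark"}
ROLES = {"blacksmith", "innkeeper", "guard", "merchant", "peasant",
         "scholar", "noble", "priest"}


@dataclass(frozen=True)
class NpcPersona:
    """Hypothetical container for the four parameters fixed per NPC."""
    tone: str
    style: str
    humor: str
    role: str

    def __post_init__(self) -> None:
        # Reject values the table does not list, so typos fail at config time.
        for name, value, allowed in (
            ("tone", self.tone, TONES),
            ("style", self.style, STYLES),
            ("humor", self.humor, HUMORS),
            ("role", self.role, ROLES),
        ):
            if value not in allowed:
                raise ValueError(f"unknown {name}: {value!r}")

    def prompt(self, relation: str, fact: str) -> str:
        """Build the raw prompt; relation comes from game state on each call."""
        return (f"TONE:{self.tone} STYLE:{self.style} HUMOR:{self.humor} "
                f"RELATION:{relation} ROLE:{self.role}\nFACT: {fact}\nOUT:")


smith = NpcPersona(tone="grumpy", style="blunt", humor="none", role="blacksmith")
as_stranger = smith.prompt("stranger", "Iron swords cost 15 gold.")
as_friend = smith.prompt("friend", "Iron swords cost 15 gold.")
```

The same `smith` object produces both prompts; only the RELATION token differs between them.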
Inference (Python)
from unsloth import FastLanguageModel

# Load the merged checkpoint with a 256-token context.
model, tokenizer = FastLanguageModel.from_pretrained(
    "walter-bd/npc-voice-soup06",
    max_seq_length=256,
)
FastLanguageModel.for_inference(model)  # switch to inference mode

# Raw prompt: persona header, FACT line, then the OUT: cue.
prompt = "TONE:grumpy STYLE:blunt HUMOR:none RELATION:stranger ROLE:blacksmith\nFACT: Iron swords cost 15 gold.\nOUT:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")  # requires a CUDA GPU
out = model.generate(**inputs, max_new_tokens=80, temperature=0.7, do_sample=True)

# The decoded text includes the prompt, so keep only what follows the last OUT:
text = tokenizer.decode(out[0], skip_special_tokens=True)
print(text.split("OUT:")[-1].strip())
GGUF / Ollama
Note: `ollama run hf.co/walter-bd/npc-voice-soup06:Q5_K_M` will trigger Qwen3 thinking mode. Use the custom Modelfile below to suppress it and to apply the correct raw prompt format.
# Download GGUF and Modelfile
wget https://huggingface.co/walter-bd/npc-voice-soup06/resolve/main/gguf/npc-voice-soup06.Q5_K_M.gguf
wget https://huggingface.co/walter-bd/npc-voice-soup06/resolve/main/Modelfile
# Create local model with thinking suppressed
ollama create npc-voice-soup06 -f Modelfile
# Run
ollama run npc-voice-soup06 "TONE:grumpy STYLE:blunt HUMOR:none RELATION:stranger ROLE:blacksmith
FACT: Iron swords cost 15 gold.
OUT:"
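Once the local model exists, a game process can also reach it over Ollama's HTTP API instead of the CLI. The following is a rough sketch using only the standard library. It assumes Ollama is listening on its default port 11434 and that the model was created under the name npc-voice-soup06 as above; the temperature and token limit mirror the Python inference settings and may be overridden by whatever the Modelfile sets.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "npc-voice-soup06") -> urllib.request.Request:
    """Prepare a non-streaming generate request for the locally created model."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"temperature": 0.7, "num_predict": 80},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=60) as resp:
        return json.loads(resp.read())["response"].strip()
```

Calling `generate(...)` with the three-line prompt shown above requires the Ollama server to be running; `build_request` alone has no network side effects.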
How it was built
- v5-SFT: LoRA fine-tune on ~35k bilingual rows with targeted weak-slot coverage
- v5-DPO: Direct Preference Optimization using 1,195 real model failure pairs as rejected examples
- soup-0.6: weight average, 0.6 × v5-SFT + 0.4 × v5-DPO
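The soup step is a per-parameter linear interpolation between two checkpoints that share the same architecture. The sketch below illustrates the arithmetic on plain Python lists rather than real tensors; it is not the merge script from the repository, and `soup` and the toy checkpoints are hypothetical.

```python
from typing import Dict, List

StateDict = Dict[str, List[float]]  # parameter name -> flattened weights


def soup(sft: StateDict, dpo: StateDict, sft_weight: float = 0.6) -> StateDict:
    """Return sft_weight * sft + (1 - sft_weight) * dpo for every parameter."""
    if sft.keys() != dpo.keys():
        raise ValueError("checkpoints must share the same parameter names")
    dpo_weight = 1.0 - sft_weight
    merged: StateDict = {}
    for name in sft:
        if len(sft[name]) != len(dpo[name]):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [sft_weight * a + dpo_weight * b
                        for a, b in zip(sft[name], dpo[name])]
    return merged


# Toy checkpoints standing in for v5-SFT and v5-DPO.
sft_ckpt = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
dpo_ckpt = {"layer.weight": [2.0, 4.0], "layer.bias": [1.0]}
merged = soup(sft_ckpt, dpo_ckpt)
print(merged)
```

With real checkpoints the same loop would run over tensors in each model's state dict, and both models must have identical parameter names and shapes for the average to be meaningful.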
Dataset: walter-bd/npc-voice-dataset
Code: github.com/walter-bd/small-persona-llm