Instructions for using chujiezheng/Mistral7B-PairRM-SPPO-ExPO with libraries, inference servers, and local apps.

Libraries

How to use chujiezheng/Mistral7B-PairRM-SPPO-ExPO with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="chujiezheng/Mistral7B-PairRM-SPPO-ExPO")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("chujiezheng/Mistral7B-PairRM-SPPO-ExPO")
model = AutoModelForCausalLM.from_pretrained("chujiezheng/Mistral7B-PairRM-SPPO-ExPO")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use chujiezheng/Mistral7B-PairRM-SPPO-ExPO with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "chujiezheng/Mistral7B-PairRM-SPPO-ExPO"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "chujiezheng/Mistral7B-PairRM-SPPO-ExPO",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
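
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are illustrative and not part of vLLM, and the response parsing assumes the server returns the usual OpenAI-style chat completion body.

```python
import json
import urllib.request

MODEL_ID = "chujiezheng/Mistral7B-PairRM-SPPO-ExPO"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build the same POST request as the curl call (hypothetical helper)."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    # Assumes an OpenAI-style response: choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Example call, to be run only while the server is up:
# print(ask("What is the capital of France?"))
```

Passing `base_url="http://localhost:30000"` should target a server listening on port 30000 instead.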

Use Docker

docker model run hf.co/chujiezheng/Mistral7B-PairRM-SPPO-ExPO

SGLang

How to use chujiezheng/Mistral7B-PairRM-SPPO-ExPO with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "chujiezheng/Mistral7B-PairRM-SPPO-ExPO" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "chujiezheng/Mistral7B-PairRM-SPPO-ExPO",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "chujiezheng/Mistral7B-PairRM-SPPO-ExPO" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "chujiezheng/Mistral7B-PairRM-SPPO-ExPO",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use chujiezheng/Mistral7B-PairRM-SPPO-ExPO with Docker Model Runner:
```
docker model run hf.co/chujiezheng/Mistral7B-PairRM-SPPO-ExPO
```


---
language:
- en
license: apache-2.0
model-index:
- name: Mistral7B-PairRM-SPPO-ExPO
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 36.73
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 13.68
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0.91
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 3.58
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.66
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 17.24
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
---

# Mistral7B-PairRM-SPPO-ExPO

This is the extrapolated (ExPO) model based on [`UCLA-AGI/Mistral7B-PairRM-SPPO`](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO) and [`mistralai/Mistral-7B-Instruct-v0.2`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), as described in the "[Weak-to-Strong Extrapolation Expedites Alignment](https://arxiv.org/abs/2404.16792)" paper.

Specifically, we obtain this model by extrapolating (alpha = 0.3) from the weights of the SFT and DPO/RLHF checkpoints, which yields better alignment with human preferences.
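
The extrapolation step can be illustrated with a toy sketch. It assumes the update rule is `aligned + alpha * (aligned - sft)` applied to every parameter, which is our reading of the paper rather than code taken from it. The function name `extrapolate` and the tiny float lists standing in for weight tensors are hypothetical.

```python
def extrapolate(sft_weights, aligned_weights, alpha=0.3):
    """Return aligned + alpha * (aligned - sft) for every parameter.

    Assumption: this is the ExPO update rule; plain lists of floats
    stand in for real weight tensors.
    """
    if sft_weights.keys() != aligned_weights.keys():
        raise ValueError("checkpoints must share the same parameter names")
    return {
        name: [
            a + alpha * (a - s)
            for s, a in zip(sft_weights[name], aligned_weights[name])
        ]
        for name in aligned_weights
    }


# Toy checkpoints with made-up parameter names and values.
sft = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
aligned = {"layer.weight": [2.0, 2.0], "layer.bias": [-1.0]}

expo = extrapolate(sft, aligned, alpha=0.3)
print(expo)
```

Parameters that did not change between the two checkpoints are left untouched, while changed ones move a further 30% along the same direction.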

This extrapolated model achieves a 35.4% win rate and a 31.8% length-controlled (LC) win rate on AlpacaEval 2.0, outperforming the original `Mistral7B-PairRM-SPPO`'s 32.2% and 30.5%, respectively.
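
As a quick check of the size of these gains, the absolute differences can be computed from the four figures quoted above. The dictionary keys and the helper `gains` are illustrative names only.

```python
# AlpacaEval 2.0 figures quoted above, in percent.
original = {"win_rate": 32.2, "lc_win_rate": 30.5}
with_expo = {"win_rate": 35.4, "lc_win_rate": 31.8}


def gains(before, after):
    """Absolute improvement in percentage points, rounded to one decimal."""
    return {key: round(after[key] - before[key], 1) for key in before}


print(gains(original, with_expo))
```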

## Evaluation Results

Evaluation results on the AlpacaEval 2.0 benchmark (you can find the evaluation outputs on the [official GitHub repo](https://github.com/chujiezheng/LLM-Extrapolation/tree/main/results_alpaca)):

| Model                                | Win Rate (Ori) | LC Win Rate (Ori) | Win Rate (+ ExPO) | LC Win Rate (+ ExPO) |
| ------------------------------------ | -------------- | ----------------- | ----------------- | -------------------- |
| `HuggingFaceH4/zephyr-7b-alpha`      | 6.7%           | 10.0%             | 10.6%             | 13.6%                |
| `HuggingFaceH4/zephyr-7b-beta`       | 10.2%          | 13.2%             | 11.1%             | 14.0%                |
| `berkeley-nest/Starling-LM-7B-alpha` | 15.0%          | 18.3%             | 18.2%             | 19.5%                |
| `Nexusflow/Starling-LM-7B-beta`      | 26.6%          | 25.8%             | 29.6%             | 26.4%                |
| `snorkelai/Snorkel-Mistral-PairRM`   | 24.7%          | 24.0%             | 28.8%             | 26.4%                |
| `RLHFlow/LLaMA3-iterative-DPO-final` | 29.2%          | 36.0%             | 32.7%             | 37.8%                |
| `internlm/internlm2-chat-1.8b`       | 3.8%           | 4.0%              | 5.2%              | 4.3%                 |
| `internlm/internlm2-chat-7b`         | 20.5%          | 18.3%             | 28.1%             | 22.7%                |
| `internlm/internlm2-chat-20b`        | 36.1%          | 24.9%             | 46.2%             | 27.2%                |
| `allenai/tulu-2-dpo-7b`              | 8.5%           | 10.2%             | 11.5%             | 11.7%                |
| `allenai/tulu-2-dpo-13b`             | 11.2%          | 15.5%             | 15.6%             | 17.6%                |
| `allenai/tulu-2-dpo-70b`             | 15.4%          | 21.2%             | 23.0%             | 25.7%                |

Evaluation results on the MT-Bench benchmark (you can find the evaluation outputs on the [official GitHub repo](https://github.com/chujiezheng/LLM-Extrapolation/tree/main/results_mtbench)):

| Model                                | Original | + ExPO   |
| ------------------------------------ | -------- | -------- |
| `HuggingFaceH4/zephyr-7b-alpha`      | 6.85     | 6.87     |
| `HuggingFaceH4/zephyr-7b-beta`       | 7.02     | 7.06     |
| `berkeley-nest/Starling-LM-7B-alpha` | 7.82     | 7.91     |
| `Nexusflow/Starling-LM-7B-beta`      | 8.10     | 8.18     |
| `snorkelai/Snorkel-Mistral-PairRM`   | 7.63     | 7.69     |
| `RLHFlow/LLaMA3-iterative-DPO-final` | 8.08     | 8.45     |
| `internlm/internlm2-chat-1.8b`       | 5.17     | 5.26     |
| `internlm/internlm2-chat-7b`         | 7.72     | 7.80     |
| `internlm/internlm2-chat-20b`        | 8.13     | 8.26     |
| `allenai/tulu-2-dpo-7b`              | 6.35     | 6.38     |
| `allenai/tulu-2-dpo-13b`             | 7.00     | 7.26     |
| `allenai/tulu-2-dpo-70b`             | 7.79     | 8.03     |


# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_chujiezheng__Mistral7B-PairRM-SPPO-ExPO).

| Metric              | Value |
|---------------------|------:|
| Avg.                | 13.47 |
| IFEval (0-Shot)     | 36.73 |
| BBH (3-Shot)        | 13.68 |
| MATH Lvl 5 (4-Shot) |  0.91 |
| GPQA (0-shot)       |  3.58 |
| MuSR (0-shot)       |  8.66 |
| MMLU-PRO (5-shot)   | 17.24 |
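
The "Avg." row is consistent with an unweighted mean of the six benchmark scores. The short check below assumes that is how the leaderboard computes it, which the table itself does not state.

```python
# Scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 36.73,
    "BBH (3-Shot)": 13.68,
    "MATH Lvl 5 (4-Shot)": 0.91,
    "GPQA (0-shot)": 3.58,
    "MuSR (0-shot)": 8.66,
    "MMLU-PRO (5-shot)": 17.24,
}

# Assumption: "Avg." is the plain arithmetic mean of the six scores.
average = sum(scores.values()) / len(scores)
print(round(average, 2))
```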