---
language:
- en
license: apache-2.0
model-index:
- name: Mistral7B-PairRM-SPPO-ExPO
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 36.73
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 13.68
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0.91
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 3.58
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.66
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 17.24
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
---

# Mistral7B-PairRM-SPPO-ExPO

This is the extrapolated (ExPO) model based on [`UCLA-AGI/Mistral7B-PairRM-SPPO`](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO) and [`mistralai/Mistral-7B-Instruct-v0.2`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), following the "[Weak-to-Strong Extrapolation Expedites Alignment](https://arxiv.org/abs/2404.16792)" paper.

Specifically, we obtain this model by extrapolating **(alpha = 0.3)** from the weights of the SFT and DPO/RLHF checkpoints (here, `Mistral-7B-Instruct-v0.2` and `Mistral7B-PairRM-SPPO`, respectively), which yields better alignment with human preference.
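
The extrapolation step can be illustrated with a minimal sketch. It assumes the ExPO update takes the form `theta_expo = theta_aligned + alpha * (theta_aligned - theta_sft)`, applied independently to every parameter. The function name `extrapolate`, the toy parameter name `layer.weight`, and the use of plain Python lists in place of real tensors are all illustrative assumptions, not the authors' released code.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def extrapolate(sft: Weights, aligned: Weights, alpha: float) -> Weights:
    """Move past the aligned checkpoint along the (aligned - sft) direction.

    Assumed update rule: theta_expo = theta_aligned + alpha * (theta_aligned - theta_sft).
    """
    if sft.keys() != aligned.keys():
        raise ValueError("checkpoints must share the same parameter names")

    result: Weights = {}
    for name, aligned_values in aligned.items():
        sft_values = sft[name]
        if len(sft_values) != len(aligned_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        result[name] = [
            a + alpha * (a - s) for a, s in zip(aligned_values, sft_values)
        ]
    return result


# Hypothetical toy checkpoints, not real model weights.
sft = {"layer.weight": [0.0, 1.0, -2.0]}
aligned = {"layer.weight": [1.0, 1.0, -1.0]}

expo = extrapolate(sft, aligned, alpha=0.3)
print(expo)
```

With `alpha = 0` the result equals the aligned checkpoint. Larger values of `alpha` push the weights further along the direction that alignment training moved them from the SFT checkpoint.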

This extrapolated model achieves a **35.4%** win rate and a **31.8%** length-controlled (LC) win rate on **AlpacaEval 2.0**, outperforming the original `Mistral7B-PairRM-SPPO`, which scores 32.2% and 30.5%, respectively.

## Evaluation Results

Evaluation results on the **AlpacaEval 2.0** benchmark (you can find the evaluation outputs on the [official GitHub repo](https://github.com/chujiezheng/LLM-Extrapolation/tree/main/results_alpaca)):

|                                      | Win Rate (Ori) | LC Win Rate (Ori) | Win Rate (+ ExPO) | LC Win Rate (+ ExPO) |
| ------------------------------------ | -------------- | ----------------- | ----------------- | -------------------- |
| `HuggingFaceH4/zephyr-7b-alpha`      | 6.7%           | 10.0%             | **10.6%**         | **13.6%**            |
| `HuggingFaceH4/zephyr-7b-beta`       | 10.2%          | 13.2%             | **11.1%**         | **14.0%**            |
| `berkeley-nest/Starling-LM-7B-alpha` | 15.0%          | 18.3%             | **18.2%**         | **19.5%**            |
| `Nexusflow/Starling-LM-7B-beta`      | 26.6%          | 25.8%             | **29.6%**         | **26.4%**            |
| `snorkelai/Snorkel-Mistral-PairRM`   | 24.7%          | 24.0%             | **28.8%**         | **26.4%**            |
| `RLHFlow/LLaMA3-iterative-DPO-final` | 29.2%          | 36.0%             | **32.7%**         | **37.8%**            |
| `internlm/internlm2-chat-1.8b`       | 3.8%           | 4.0%              | **5.2%**          | **4.3%**             |
| `internlm/internlm2-chat-7b`         | 20.5%          | 18.3%             | **28.1%**         | **22.7%**            |
| `internlm/internlm2-chat-20b`        | 36.1%          | 24.9%             | **46.2%**         | **27.2%**            |
| `allenai/tulu-2-dpo-7b`              | 8.5%           | 10.2%             | **11.5%**         | **11.7%**            |
| `allenai/tulu-2-dpo-13b`             | 11.2%          | 15.5%             | **15.6%**         | **17.6%**            |
| `allenai/tulu-2-dpo-70b`             | 15.4%          | 21.2%             | **23.0%**         | **25.7%**            |

Evaluation results on the **MT-Bench** benchmark (you can find the evaluation outputs on the [official GitHub repo](https://github.com/chujiezheng/LLM-Extrapolation/tree/main/results_mtbench)):

|                                      | Original | + ExPO   |
| ------------------------------------ | -------- | -------- |
| `HuggingFaceH4/zephyr-7b-alpha`      | 6.85     | **6.87** |
| `HuggingFaceH4/zephyr-7b-beta`       | 7.02     | **7.06** |
| `berkeley-nest/Starling-LM-7B-alpha` | 7.82     | **7.91** |
| `Nexusflow/Starling-LM-7B-beta`      | 8.10     | **8.18** |
| `snorkelai/Snorkel-Mistral-PairRM`   | 7.63     | **7.69** |
| `RLHFlow/LLaMA3-iterative-DPO-final` | 8.08     | **8.45** |
| `internlm/internlm2-chat-1.8b`       | 5.17     | **5.26** |
| `internlm/internlm2-chat-7b`         | 7.72     | **7.80** |
| `internlm/internlm2-chat-20b`        | 8.13     | **8.26** |
| `allenai/tulu-2-dpo-7b`              | 6.35     | **6.38** |
| `allenai/tulu-2-dpo-13b`             | 7.00     | **7.26** |
| `allenai/tulu-2-dpo-70b`             | 7.79     | **8.03** |


## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_chujiezheng__Mistral7B-PairRM-SPPO-ExPO).

|      Metric       |Value|
|-------------------|----:|
|Avg.               |13.47|
|IFEval (0-Shot)    |36.73|
|BBH (3-Shot)       |13.68|
|MATH Lvl 5 (4-Shot)| 0.91|
|GPQA (0-shot)      | 3.58|
|MuSR (0-shot)      | 8.66|
|MMLU-PRO (5-shot)  |17.24|
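
As a quick sanity check on the table above, the snippet below recomputes the `Avg.` row. It assumes the average is the unweighted mean of the six benchmark scores, which is consistent with the reported numbers. The dictionary name `scores` is just an illustrative choice.

```python
# Scores copied from the Open LLM Leaderboard table above.
scores = {
    "IFEval (0-Shot)": 36.73,
    "BBH (3-Shot)": 13.68,
    "MATH Lvl 5 (4-Shot)": 0.91,
    "GPQA (0-shot)": 3.58,
    "MuSR (0-shot)": 8.66,
    "MMLU-PRO (5-shot)": 17.24,
}

# Assumption: "Avg." is the plain (unweighted) mean of the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. = {average:.2f}")  # prints "Avg. = 13.47"
```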