---
language:
- en
license: apache-2.0
model-index:
- name: Mistral7B-PairRM-SPPO-ExPO
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 36.73
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 13.68
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0.91
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 3.58
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.66
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 17.24
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=chujiezheng/Mistral7B-PairRM-SPPO-ExPO
      name: Open LLM Leaderboard
---
# Mistral7B-PairRM-SPPO-ExPO

This is the extrapolated (ExPO) model based on [`UCLA-AGI/Mistral7B-PairRM-SPPO`](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO) and [`mistralai/Mistral-7B-Instruct-v0.2`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), as described in the "[Weak-to-Strong Extrapolation Expedites Alignment](https://arxiv.org/abs/2404.16792)" paper.

Specifically, we obtain this model by extrapolating **(alpha = 0.3)** from the weights of the SFT and DPO/RLHF checkpoints, which yields stronger alignment with human preferences.

This extrapolated model achieves a **35.4%** win rate and a **31.8%** LC win rate on **AlpacaEval 2.0**, outperforming the original `Mistral7B-PairRM-SPPO`, which scores 32.2% and 30.5%, respectively.
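
The weight extrapolation described above can be illustrated with a minimal sketch. It assumes the update rule `theta_expo = theta_aligned + alpha * (theta_aligned - theta_sft)`, applied per parameter, with the SFT and DPO/RLHF checkpoints sharing the same parameter names and shapes. The function name `extrapolate` and the use of plain Python lists in place of real tensors are illustrative assumptions, not the authors' released code.

```python
from typing import Dict, List

# Stand-in for a model state dict: parameter name -> flat list of weights.
# (Hypothetical simplification; real checkpoints hold tensors.)
Weights = Dict[str, List[float]]


def extrapolate(sft: Weights, aligned: Weights, alpha: float) -> Weights:
    """Move past the aligned checkpoint along the direction (aligned - sft).

    alpha = 0 returns the aligned weights unchanged; larger alpha pushes
    further away from the SFT weights.
    """
    if sft.keys() != aligned.keys():
        raise ValueError("checkpoints must share the same parameter names")

    result: Weights = {}
    for name, aligned_values in aligned.items():
        sft_values = sft[name]
        if len(sft_values) != len(aligned_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        result[name] = [
            a + alpha * (a - s) for a, s in zip(aligned_values, sft_values)
        ]
    return result


# Toy example with two scalar weights in one parameter.
toy_sft = {"layer.weight": [1.0, 2.0]}
toy_aligned = {"layer.weight": [2.0, 2.0]}
toy_expo = extrapolate(toy_sft, toy_aligned, alpha=0.5)
print(toy_expo)  # {'layer.weight': [2.5, 2.0]}
```

In the toy example, the first weight moved by +1.0 during alignment, so extrapolating with `alpha = 0.5` adds a further +0.5; the second weight did not change, so it stays put. This model card reports `alpha = 0.3` for the released checkpoint.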
## Evaluation Results

Evaluation results on the **AlpacaEval 2.0** benchmark (you can find the evaluation outputs on the [official GitHub repo](https://github.com/chujiezheng/LLM-Extrapolation/tree/main/results_alpaca)):

|                                      | Win Rate (Ori) | LC Win Rate (Ori) | Win Rate (+ ExPO) | LC Win Rate (+ ExPO) |
| ------------------------------------ | -------------- | ----------------- | ----------------- | -------------------- |
| `HuggingFaceH4/zephyr-7b-alpha`      | 6.7%           | 10.0%             | **10.6%**         | **13.6%**            |
| `HuggingFaceH4/zephyr-7b-beta`       | 10.2%          | 13.2%             | **11.1%**         | **14.0%**            |
| `berkeley-nest/Starling-LM-7B-alpha` | 15.0%          | 18.3%             | **18.2%**         | **19.5%**            |
| `Nexusflow/Starling-LM-7B-beta`      | 26.6%          | 25.8%             | **29.6%**         | **26.4%**            |
| `snorkelai/Snorkel-Mistral-PairRM`   | 24.7%          | 24.0%             | **28.8%**         | **26.4%**            |
| `RLHFlow/LLaMA3-iterative-DPO-final` | 29.2%          | 36.0%             | **32.7%**         | **37.8%**            |
| `internlm/internlm2-chat-1.8b`       | 3.8%           | 4.0%              | **5.2%**          | **4.3%**             |
| `internlm/internlm2-chat-7b`         | 20.5%          | 18.3%             | **28.1%**         | **22.7%**            |
| `internlm/internlm2-chat-20b`        | 36.1%          | 24.9%             | **46.2%**         | **27.2%**            |
| `allenai/tulu-2-dpo-7b`              | 8.5%           | 10.2%             | **11.5%**         | **11.7%**            |
| `allenai/tulu-2-dpo-13b`             | 11.2%          | 15.5%             | **15.6%**         | **17.6%**            |
| `allenai/tulu-2-dpo-70b`             | 15.4%          | 21.2%             | **23.0%**         | **25.7%**            |

Evaluation results on the **MT-Bench** benchmark (you can find the evaluation outputs on the [official GitHub repo](https://github.com/chujiezheng/LLM-Extrapolation/tree/main/results_mtbench)):

|                                      | Original | + ExPO   |
| ------------------------------------ | -------- | -------- |
| `HuggingFaceH4/zephyr-7b-alpha`      | 6.85     | **6.87** |
| `HuggingFaceH4/zephyr-7b-beta`       | 7.02     | **7.06** |
| `berkeley-nest/Starling-LM-7B-alpha` | 7.82     | **7.91** |
| `Nexusflow/Starling-LM-7B-beta`      | 8.10     | **8.18** |
| `snorkelai/Snorkel-Mistral-PairRM`   | 7.63     | **7.69** |
| `RLHFlow/LLaMA3-iterative-DPO-final` | 8.08     | **8.45** |
| `internlm/internlm2-chat-1.8b`       | 5.17     | **5.26** |
| `internlm/internlm2-chat-7b`         | 7.72     | **7.80** |
| `internlm/internlm2-chat-20b`        | 8.13     | **8.26** |
| `allenai/tulu-2-dpo-7b`              | 6.35     | **6.38** |
| `allenai/tulu-2-dpo-13b`             | 7.00     | **7.26** |
| `allenai/tulu-2-dpo-70b`             | 7.79     | **8.03** |

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_chujiezheng__Mistral7B-PairRM-SPPO-ExPO).

| Metric              | Value |
|---------------------|------:|
| Avg.                | 13.47 |
| IFEval (0-Shot)     | 36.73 |
| BBH (3-Shot)        | 13.68 |
| MATH Lvl 5 (4-Shot) |  0.91 |
| GPQA (0-shot)       |  3.58 |
| MuSR (0-shot)       |  8.66 |
| MMLU-PRO (5-shot)   | 17.24 |
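
As a quick sanity check on the table above, the short sketch below recomputes the "Avg." row, assuming it is the unweighted mean of the six benchmark scores rounded to two decimals (an assumption that is consistent with the reported numbers, not a statement of how the leaderboard computes it).

```python
# Scores copied from the Open LLM Leaderboard table above.
scores = {
    "IFEval (0-Shot)": 36.73,
    "BBH (3-Shot)": 13.68,
    "MATH Lvl 5 (4-Shot)": 0.91,
    "GPQA (0-shot)": 3.58,
    "MuSR (0-shot)": 8.66,
    "MMLU-PRO (5-shot)": 17.24,
}

# Unweighted mean across benchmarks, rounded like the table.
average = round(sum(scores.values()) / len(scores), 2)
print(f"Avg. {average:.2f}")  # Avg. 13.47
```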