Models used in CHARM: Calibrating Reward Models With Chatbot Arena Scores.
shawnxzhu
AI & ML interests
None yet
Recent Activity
upvoted a paper 3 days ago
Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation
authored a paper 18 days ago
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL
submitted a paper 19 days ago
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL