Space for RePro: Training Language Models to Faithfully Recycle the Web for Pretraining
Papers
EmbodiedMidtrain: Bridging the Gap between Vision-Language Models and Vision-Language-Action Models via Mid-training
RePro: Training Language Models to Faithfully Recycle the Web for Pretraining
Models 5
cx-cmu/AutoGEO_mini_Qwen1.7B_ResearchyGEO
Text Generation • 2B • Updated • 16
cx-cmu/AutoGEO_mini_Qwen1.7B_GEOBench
Text Generation • 2B • Updated • 4
cx-cmu/AutoGEO_mini_Qwen1.7B_Ecommerce
Text Generation • 2B • Updated • 6
cx-cmu/repro-rephraser-4B
Text Generation • 196k • Updated • 372 • 2
cx-cmu/repro-rephraser-1B
Text Generation • 1B • Updated • 4
Datasets 11
cx-cmu/AgentWebBench-corpus
Viewer • Updated • 1 • 495
cx-cmu/agent_trajectories
Updated • 123 • 1
cx-cmu/deepresearchgym-agentic-search-logs
Viewer • Updated • 14.3M • 2.67k • 14
cx-cmu/Researchy-GEO
Viewer • Updated • 47k • 392 • 1
cx-cmu/GEO-Bench
Viewer • Updated • 37.4k • 177 • 1
cx-cmu/E-commerce
Viewer • Updated • 7.97k • 296 • 2
cx-cmu/ClueWeb-Reco
Viewer • Updated • 87.2M • 59 • 1
cx-cmu/repro-organic-data-72B
Viewer • Updated • 58.3M • 1.51k
cx-cmu/repro-rl-data
Viewer • Updated • 41k • 31
cx-cmu/repro-rephrased-data-72B
Viewer • Updated • 39M • 428