CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents
XLang Lab · Qwen · UCSD · Tsinghua
Bowen Wang, Dunjie Lu, Junli Wang, Tianyi Bai, Shixuan Liu, Zhipeng Zhang, Haiquan Wang, Hao Hu, Tianbao Xie, Shuai Bai, Dayiheng Liu, Que Shen, Junyang Lin, Tao Yu
CUA-Gym is a scalable, lightweight pipeline for reinforcement learning with verifiable rewards (RLVR) that co-generates verifiable (task, environment, reward) tuples for computer-use reinforcement learning, grounded in real-world economic activity. An adversarial Generator–Discriminator loop, with the two sides separated by an information barrier, forces reward functions to verify task completion from semantics alone, which prevents the reverse-engineering failure mode common to prior CUA datasets.
The resulting open dataset contains 32,122 verified RLVR tuples spanning 110 environments (16 desktop applications + 94 mock web applications), the largest open CUA RLVR corpus to date. Trained with GSPO on this data, CUA-Gym-A3B and CUA-Gym-A17B reach 62.1% and 72.6% on OSWorld-Verified, respectively, with cross-platform transfer to WebArena (44.5% / 56.0%). The smaller A3B checkpoint matches the unmodified 397B-A17B base at roughly 10× fewer active parameters.
- 📊 32,122 RLVR tuples
- 🌐 110 environments
- 🧪 94 mock web apps
- 📈 72.6% on OSWorld-Verified (CUA-Gym-A17B)
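
To make the (task, environment, reward) tuple more concrete, here is a minimal Python sketch of one way such a tuple could be represented. It is an illustration only, not the CUA-Gym data format: `RLVRTuple`, `make_reward`, `score`, the `State` dictionary, and the spreadsheet example are all hypothetical names and data. The one idea it tries to capture is that the reward inspects the outcome state rather than the steps the agent took; it does not model the Generator–Discriminator loop or the information barrier.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical: a snapshot of the environment after the agent finishes.
State = Dict[str, Any]


@dataclass(frozen=True)
class RLVRTuple:
    """Illustrative (task, environment, reward) record; not the real schema."""
    task: str                          # natural-language instruction
    environment: str                   # name of the app the task runs in
    reward: Callable[[State], float]   # verifier over the final state


def make_reward(expected: State) -> Callable[[State], float]:
    """Build a checker that sees only the final state, never the action trace."""
    def reward(final_state: State) -> float:
        # Every expected key must be present with the expected value.
        ok = all(final_state.get(key) == value for key, value in expected.items())
        return 1.0 if ok else 0.0
    return reward


def score(item: RLVRTuple, final_state: State) -> float:
    """Apply the tuple's reward function to an observed final state."""
    return item.reward(final_state)


# Hypothetical example task in a hypothetical mock spreadsheet app.
example = RLVRTuple(
    task="Rename the sheet 'Sheet1' to 'Q3 Budget'",
    environment="mock-spreadsheet",
    reward=make_reward({"sheet_name": "Q3 Budget"}),
)
```

Calling `score(example, {"sheet_name": "Q3 Budget"})` passes the final state to the checker, which returns a binary reward. Because the checker never sees which clicks or keystrokes produced the state, any action sequence that reaches the same outcome is scored the same way.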