CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

XLang Lab · Qwen · UCSD · Tsinghua  ·  Bowen Wang, Dunjie Lu, Junli Wang, Tianyi Bai, Shixuan Liu, Zhipeng Zhang, Haiquan Wang, Hao Hu, Tianbao Xie, Shuai Bai, Dayiheng Liu, Que Shen, Junyang Lin, Tao Yu

CUA-Gym is a scalable, lightweight pipeline for reinforcement learning with verifiable rewards (RLVR) that co-generates verifiable (task, environment, reward) tuples for training computer-use agents (CUAs), grounded in real-world economic activity. An adversarial Generator–Discriminator loop, separated by an information barrier, forces reward functions to verify task completion from semantics alone, which prevents the reverse-engineering failure mode common to prior CUA datasets.
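
To make the (task, environment, reward) tuple concrete, here is a minimal Python sketch under stated assumptions. The names `RLVRTuple`, `State`, `make_rename_task`, and `trivially_satisfied` are hypothetical and are not taken from the CUA-Gym release; the environment is reduced to a dictionary of file names and contents. It only illustrates the idea of a reward that checks the final state for the outcome the task describes, rather than a particular sequence of actions.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stand-in for an environment state: file name -> contents.
# This is an illustration, not CUA-Gym's actual schema.
State = Dict[str, str]


@dataclass
class RLVRTuple:
    task: str                         # natural-language instruction
    initial_state: State              # how the environment is set up
    reward: Callable[[State], float]  # verifier that sees only the final state


def make_rename_task() -> RLVRTuple:
    def reward(final: State) -> float:
        # Check the outcome the task asks for, not how the agent got there.
        renamed = "report_final.txt" in final and "report.txt" not in final
        intact = final.get("report_final.txt") == "Q3 numbers"
        return 1.0 if renamed and intact else 0.0

    return RLVRTuple(
        task="Rename report.txt to report_final.txt",
        initial_state={"report.txt": "Q3 numbers"},
        reward=reward,
    )


def trivially_satisfied(t: RLVRTuple) -> bool:
    # Assumed sanity check (not described in the source): a reward that
    # already passes on the untouched initial state verifies nothing.
    return t.reward(t.initial_state) > 0.0


if __name__ == "__main__":
    t = make_rename_task()
    print(trivially_satisfied(t))
    print(t.reward({"report_final.txt": "Q3 numbers"}))
```

In this toy setting, an agent that creates an empty `report_final.txt` without moving the contents receives no reward, because the verifier checks the content as well as the file name.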

The resulting open dataset contains 32,122 verified RLVR tuples spanning 110 environments (16 desktop applications + 94 mock web applications), making it the largest open CUA RLVR corpus to date. Trained with GSPO on this data, CUA-Gym-A3B and CUA-Gym-A17B reach 62.1% and 72.6% on OSWorld-Verified, respectively, and transfer across platforms to WebArena (44.5% and 56.0%). The smaller A3B checkpoint matches the unmodified 397B-A17B base at roughly 10× fewer active parameters.
