CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

Name: CUA-Gym RLVR Dataset
Creator: XLang Lab
License: https://www.apache.org/licenses/LICENSE-2.0
Keywords: computer-use agents, reinforcement learning, RLVR, GSPO, OSWorld, WebArena, Qwen3.5, agent training data

Affiliations: XLang Lab · Qwen · UCSD · Tsinghua
Authors: Bowen Wang, Dunjie Lu, Junli Wang, Tianyi Bai, Shixuan Liu, Zhipeng Zhang, Haiquan Wang, Hao Hu, Tianbao Xie, Shuai Bai, Dayiheng Liu, Que Shen, Junyang Lin, Tao Yu

CUA-Gym is a scalable, lightweight RLVR pipeline that co-generates verifiable (task, environment, reward) tuples for computer-use reinforcement learning, grounded in real-world economic activity. An adversarial Generator–Discriminator loop, with the two sides separated by an information barrier, forces reward functions to verify task completion from semantics alone. This prevents the reverse-engineering failure mode common to prior CUA datasets.
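To make the shape of a (task, environment, reward) tuple concrete, here is a minimal toy sketch in Python. It is not the CUA-Gym implementation: the names `Task`, `RLVRTuple`, `generate`, `write_reward`, and `build_tuple` are hypothetical, the environment is reduced to a key/value dictionary, and the information barrier is modeled, as one possible reading, by never passing the generator's private solved state to the code that writes the reward.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

State = Dict[str, str]  # toy stand-in for an environment: a flat key/value store


@dataclass(frozen=True)
class Task:
    description: str      # natural-language instruction shown to the agent
    spec: Dict[str, str]  # semantic goal: keys that must hold given values


@dataclass(frozen=True)
class RLVRTuple:
    task: Task
    initial_state: State
    reward: Callable[[State], float]


def generate(initial_state: State) -> Tuple[Task, State]:
    """Hypothetical generator: proposes a task and privately solves it."""
    task = Task("Set the page title to 'Q3 Report'", {"title": "Q3 Report"})
    solved = {**initial_state, **task.spec, "scratch": "generator-private"}
    return task, solved


def write_reward(task: Task) -> Callable[[State], float]:
    """Hypothetical reward writer: sees only the task, never the solved state."""
    def reward(state: State) -> float:
        # Checks the semantic goal only, not incidental details of one solution.
        return float(all(state.get(k) == v for k, v in task.spec.items()))
    return reward


def build_tuple(initial_state: State) -> Optional[RLVRTuple]:
    task, solved = generate(initial_state)
    reward = write_reward(task)  # barrier (assumed): `solved` is not passed in
    # Keep the tuple only if the reward separates solved from untouched state.
    if reward(solved) == 1.0 and reward(initial_state) == 0.0:
        return RLVRTuple(task, initial_state, reward)
    return None
```

Because the reward in this sketch is written from the task's semantic goal alone, it cannot key on incidental artifacts of one particular solution (here, the `scratch` entry), which is the kind of reverse-engineering the barrier is meant to rule out.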

The resulting open dataset contains 32,122 verified RLVR tuples spanning 110 environments (16 desktop applications + 94 mock web applications), the largest open CUA RLVR corpus to date. Trained with GSPO on this data, CUA-Gym-A3B and CUA-Gym-A17B reach 62.1% and 72.6% on OSWorld-Verified, respectively, and transfer across platforms to WebArena (44.5% and 56.0%). The smaller A3B checkpoint matches the unmodified 397B-A17B base at roughly 10× fewer active parameters.

📊 32,122 RLVR tuples
🌐 110 environments
🧪 94 mock web apps
📈 72.6% on OSWorld-Verified
