arxiv:2604.18394

OpenGame: Open Agentic Coding for Games

Published on Apr 20 · Submitted by Jiaming Han on Apr 21
#3 Paper of the day

Abstract

OpenGame is an open-source agentic framework for end-to-end web game creation that uses specialized code models and evaluation benchmarks to overcome challenges in interactive application development.

AI-generated summary

Game development sits at the intersection of creative design and intricate software engineering, demanding the joint orchestration of game engines, real-time loops, and tightly coupled state across many files. While Large Language Models (LLMs) and code agents now solve isolated programming tasks with ease, they consistently stumble when asked to produce a fully playable game from a high-level design, collapsing under cross-file inconsistencies, broken scene wiring, and logical incoherence. We bridge this gap with OpenGame, the first open-source agentic framework explicitly designed for end-to-end web game creation. At its core lies Game Skill, a reusable, evolving capability composed of a Template Skill that grows a library of project skeletons from experience and a Debug Skill that maintains a living protocol of verified fixes, together enabling the agent to scaffold stable architectures and systematically repair integration errors rather than patch isolated syntax bugs. Powering this framework is GameCoder-27B, a code LLM specialized for game engine mastery through a three-stage pipeline of continual pre-training, supervised fine-tuning, and execution-grounded reinforcement learning. Since verifying interactive playability is fundamentally harder than checking static code, we further introduce OpenGame-Bench, an evaluation pipeline that scores agentic game generation along Build Health, Visual Usability, and Intent Alignment via headless browser execution and VLM judging. Across 150 diverse game prompts, OpenGame establishes a new state of the art. We hope OpenGame pushes code agents beyond discrete software engineering problems and toward building complex, interactive real-world applications. Our framework will be fully open-sourced.
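The three evaluation axes above suggest a simple per-game aggregation. The sketch below is purely illustrative: the `BenchResult` fields, the equal weights, the 0-1 score ranges, and the `overall_score` name are my assumptions, not the paper's actual scoring formula.

```python
from dataclasses import dataclass

@dataclass
class BenchResult:
    """Scores on OpenGame-Bench's three axes (0-1 ranges assumed)."""
    build_health: float      # did the game build and run without runtime errors?
    visual_usability: float  # VLM judgment of rendered screenshots
    intent_alignment: float  # VLM judgment of fit to the original game prompt

def overall_score(r: BenchResult, weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Hypothetical aggregation: weighted mean of the three axes."""
    axes = (r.build_health, r.visual_usability, r.intent_alignment)
    return sum(w * a for w, a in zip(weights, axes))

print(round(overall_score(BenchResult(1.0, 0.8, 0.6)), 3))  # 0.8
```

In practice the weighting would likely be tuned per use case; a benchmark that gates on Build Health first (a broken build zeroes the other axes) would be an equally plausible design.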

Community


the most interesting bit for me is how the debug skill turns a living protocol of verified fixes into real cross-file coherence for a multi-file game project. i'd love to see an ablation where you disable the debug skill and rely on the template skill alone, because i suspect the long-horizon integration stability mostly comes from that patch library. the three-stage training for GameCoder-27B is neat, but execution-grounded rl on engine APIs will wobble with API drift, so i'm curious how you keep the patch knowledge in sync across engine versions. btw the arxivlens breakdown helped me parse the method details; there's a solid walkthrough here https://arxivlens.com/PaperView/Details/opengame-open-agentic-coding-for-games-3108-2a06e2e2


Thank you for the incredibly insightful feedback! You are exactly right about the Debug Skill; our ablation study confirms that removing it causes a catastrophic drop in Build Health, proving that while the Template Skill establishes the initial scaffolding, it is the Debug Skill's iterative patching that truly enforces intricate cross-file coherence. Regarding your excellent point on API drift, we mitigate this by relying on the framework's dynamic execution loop rather than static syntax memorization. Because GameCoder acquired deep, specialized latent knowledge of Phaser during its continual pre-training phase, it treats API drift as a standard debugging task. When an outdated API throws a runtime exception, the Debug Skill parses the exact browser traceback, hypothesizes an alternative syntax based on its internal engine priors, and tests it in the live environment. This execution-grounded trial-and-error allows the system to empirically "rediscover" the correct API pattern and self-heal across engine versions without needing external retrieval or weight updates.
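The self-healing loop described in this reply can be sketched roughly as follows. Everything here is a guess at the mechanism: the `run_game` callable (standing in for headless-browser execution), the fix-protocol format, and the substring signature matching are hypothetical, not OpenGame's actual interfaces.

```python
# Hypothetical sketch of the Debug Skill's execution-grounded repair loop.
# run_game(code) returns the first runtime traceback as a string, or None
# on a clean run; fix_protocol maps error signatures to verified rewrites.

def debug_loop(code: str, run_game, fix_protocol: dict, max_rounds: int = 5):
    """Repeatedly run the game, match errors against verified fixes, patch."""
    for _ in range(max_rounds):
        error = run_game(code)
        if error is None:
            return code, True  # game runs cleanly
        # Match the traceback against the living protocol of verified fixes.
        for signature, (old_api, new_api) in fix_protocol.items():
            if signature in error:
                code = code.replace(old_api, new_api)
                break
        else:
            return code, False  # no known fix: would fall back to the LLM
    return code, run_game(code) is None

# Toy example: an outdated Phaser-style API call raising a runtime error.
protocol = {"setTint is not a function": ("sprite.setTint(", "sprite.tint = (")}

def fake_runner(code):
    return "TypeError: sprite.setTint is not a function" if "setTint" in code else None

patched, ok = debug_loop("sprite.setTint(0xff0000)", fake_runner, protocol)
print(ok, patched)
```

The point of the sketch is the shape of the loop, not the patching itself: each fix is validated by re-executing in the live environment, which is what lets the protocol of fixes stay trustworthy across engine versions.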

Hi! Congrats for the great work!

Did you perform CPT+SFT+RL on top of the instruction-tuned Qwen 3.5 27B? Or did you have access to the base model which has not been published? Did you merge models in any part of the pipeline to avoid catastrophic forgetting?

Thank you


Thanks for your questions! We performed the entire CPT+SFT+RL pipeline directly on the open-source version of Qwen-3.5-27B. Because our framework targets a highly specialized domain, we intentionally accepted a drop in performance on unrelated general tasks. As long as the model's core coding, reasoning, and agentic abilities remained robust for game development, trading off general knowledge for deep engine specialization was an acceptable and necessary compromise for us.

This model hasn't been released yet, right? I can't seem to find the model card.



