InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation
InternVLA-A1 integrates understanding, generation, and action experts via a Mixture-of-Transformers (MoT) framework, which synergizes MLLMs' semantic reasoning with world-model-style dynamics prediction to guide action execution.
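The sketch below illustrates the general MoT idea in PyTorch: each expert (understanding, generation, action) keeps its own attention/FFN parameters while all tokens interact through a shared joint self-attention. The expert names, dimensions, and routing details are illustrative assumptions, not the released InternVLA-A1 architecture.

```python
# Illustrative MoT sketch only; sizes, expert names, and routing are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

EXPERTS = ("understanding", "generation", "action")  # hypothetical expert set


class MoTBlock(nn.Module):
    """Modality-specific parameters, joint self-attention over all tokens."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.heads, self.dim = heads, dim
        # Each expert keeps its own QKV / FFN / norm parameters ...
        self.to_qkv = nn.ModuleDict({e: nn.Linear(dim, 3 * dim) for e in EXPERTS})
        self.ffn = nn.ModuleDict({
            e: nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for e in EXPERTS
        })
        self.norm1 = nn.ModuleDict({e: nn.LayerNorm(dim) for e in EXPERTS})
        self.norm2 = nn.ModuleDict({e: nn.LayerNorm(dim) for e in EXPERTS})

    def forward(self, x, modality_ids):
        # x: (B, T, D); modality_ids: (T,) index into EXPERTS for each token
        B, T, D = x.shape
        qkv = torch.empty(B, T, 3 * D, dtype=x.dtype, device=x.device)
        for i, e in enumerate(EXPERTS):
            m = modality_ids == i
            if m.any():
                qkv[:, m] = self.to_qkv[e](self.norm1[e](x[:, m]))
        q, k, v = qkv.chunk(3, dim=-1)

        def split(t):
            return t.reshape(B, T, self.heads, D // self.heads).transpose(1, 2)

        # ... but attention runs jointly over the full token sequence, so
        # understanding, generation, and action tokens exchange information.
        attn = F.scaled_dot_product_attention(split(q), split(k), split(v))
        x = x + attn.transpose(1, 2).reshape(B, T, D)

        out = torch.empty_like(x)
        for i, e in enumerate(EXPERTS):
            m = modality_ids == i
            if m.any():
                out[:, m] = x[:, m] + self.ffn[e](self.norm2[e](x[:, m]))
        return out


# 32 understanding, 16 generation (future-frame), 8 action tokens
ids = torch.tensor([0] * 32 + [1] * 16 + [2] * 8)
out = MoTBlock()(torch.randn(2, 56, 256), ids)
print(out.shape)  # torch.Size([2, 56, 256])
```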
Building upon InternVL3 and Qwen3-VL, we instantiate InternVLA-A1 at 2B and 3B parameter scales. The released InternVLA-A1 series covers different model scales and pre-training data configurations:
- InternVLA-A1-3B: pretrained on the large-scale, high-fidelity simulation dataset InternData-A1, together with open-source robot data (e.g., Agibot-World)
- InternVLA-A1-3B-RoboTwin: finetuned on the RoboTwin 2.0 benchmark
- InternVLA-A1-3B-Pretrain-InternData-A1: pretrained on InternData-A1 only
- InternVLA-A1-2B-Pretrain-InternData-A1: pretrained on InternData-A1 only
Key Features
- The Core: Synergizes the MLLM's semantic understanding with world-model-style dynamics prediction, enabling the model to "imagine" the future and guide adaptive actions.
- The Fuel: Enables joint training on heterogeneous data sources spanning real-world robot data, synthetic simulation data, and egocentric human videos (see the mixing sketch after this list).
- The Output: Tackles highly dynamic scenarios with robust, reactive behavior.
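As a rough illustration of source-weighted joint training over heterogeneous data, the sketch below draws each sample in a batch from one of several sources according to configurable mixing weights. The source names, sizes, and weights are placeholders, not the actual training recipe.

```python
# Sketch of source-weighted batch mixing; names, sizes, and weights are placeholders.
import random

SOURCES = {
    "real_robot":        {"weight": 0.5, "size": 100_000},
    "sim_interndata_a1": {"weight": 0.4, "size": 500_000},
    "human_ego_video":   {"weight": 0.1, "size": 200_000},
}


def sample_batch(batch_size=64, rng=None):
    """Pick each sample's source by mixing weight, then an index uniformly
    within that source. Returns a list of (source_name, index) pairs."""
    rng = rng or random.Random(0)
    names = list(SOURCES)
    weights = [SOURCES[n]["weight"] for n in names]
    batch = []
    for _ in range(batch_size):
        src = rng.choices(names, weights=weights, k=1)[0]
        idx = rng.randrange(SOURCES[src]["size"])
        batch.append((src, idx))
    return batch


print(sample_batch(8))
```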
Usage
Please refer to our official repo InternVLA-A1.
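As a minimal illustration, the snippet below fetches a checkpoint from the Hugging Face Hub with `huggingface_hub`; the repo id is an assumed placeholder, and actual inference should follow the code in the official repo.

```python
# Minimal sketch: download the checkpoint files from the Hugging Face Hub.
# The repo id below is a hypothetical placeholder; use the id on the model page.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="InternRobotics/InternVLA-A1-3B")
print("checkpoint downloaded to:", local_dir)
```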
Demonstrations
InternVLA-A1 exhibits consistent robustness across static manipulation, dynamic manipulation, and simulation benchmarks, especially demonstrating remarkable superiority in dynamic scenarios.
Dynamic Manipulation Tasks
InternVLA-A1 exhibits exceptional robustness in highly dynamic scenarios.
Static Manipulation Tasks
InternVLA-A1 demonstrates superior proficiency in dexterous and fine-grained manipulation.
Simulation Benchmark
| Metric | pi0 | pi0.5 | InternVLA-A1-3B |
|---|---|---|---|
| Avg. Success (Easy) | 79.98% | 86.76% | 89.40% |
| Avg. Success (Hard) | 79.50% | 86.96% | 89.64% |
InternVLA-A1 achieves state-of-the-art results on the RoboTwin 2.0 benchmark (averaged over 50 tasks).
License and Citation
All code within this repo is released under CC BY-NC-SA 4.0. Please consider citing our project if it helps your research.
@article{contributors2026internvla_a1,
  title={InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation},
  author={InternVLA-A1 contributors},
  journal={arXiv preprint arXiv:2601.02456},
  year={2026}
}
Acknowledgments