
InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation

Teaser Image

Paper Code Data Website

InternVLA-A1 integrates understanding, generation, and action experts via a Mixture-of-Transformers (MoT) framework, which synergizes MLLMs' semantic reasoning with world-model-style dynamics prediction to guide action execution.

Building upon InternVL3 and Qwen3-VL, we instantiate InternVLA-A1 at 2B and 3B parameter scales. The released InternVLA-A1 series covers these model scales and different pre-training data configurations.

🔑 Key Features

Teaser Image
  • 🔮 The Core: Synergizes the MLLM's semantic understanding with world-model-style dynamics prediction, enabling it to "imagine" the future and guide adaptive actions.
  • 🚀 The Fuel: Supports joint training on heterogeneous data sources: real-world robot data, synthetic simulation data, and egocentric human videos.
  • ⚡ The Output: Handles highly dynamic scenarios robustly.
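The Mixture-of-Transformers idea above can be sketched minimally: tokens from different modalities share one attention pass, then each token is routed to the feed-forward expert of its modality (understanding, generation, or action). Everything below — hidden size, token counts, numpy stand-ins for trained weights — is illustrative and not the released architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # shared hidden size (hypothetical)

def mlp(d):
    """One expert's feed-forward block: two untrained random projections."""
    w1 = rng.normal(size=(d, 4 * d)) / np.sqrt(d)
    w2 = rng.normal(size=(4 * d, d)) / np.sqrt(4 * d)
    return lambda x: np.maximum(x @ w1, 0.0) @ w2

# One private feed-forward expert per modality, as in a MoT-style layer.
experts = {"understanding": mlp(D), "generation": mlp(D), "action": mlp(D)}

def shared_attention(x):
    """Self-attention over *all* tokens jointly (shared across experts)."""
    scores = x @ x.T / np.sqrt(D)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ x

def mot_layer(tokens, tags):
    """Tokens attend jointly, then each token goes to its modality's expert."""
    h = tokens + shared_attention(tokens)
    out = np.empty_like(h)
    for name, expert in experts.items():
        mask = tags == name
        out[mask] = h[mask] + expert(h[mask])
    return out

# A toy sequence: 8 image/text tokens, 4 future-frame tokens, 4 action tokens.
tags = np.array(["understanding"] * 8 + ["generation"] * 4 + ["action"] * 4)
tokens = rng.normal(size=(16, D))
out = mot_layer(tokens, tags)
print(out.shape)  # (16, 64)
```

The point of the routing is that the three experts keep separate parameters while exchanging information only through the shared attention, which is how the generation expert's future prediction can condition the action expert.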

Usage

Please refer to our official repo InternVLA-A1.

Demonstrations

InternVLA-A1 exhibits consistent robustness across static manipulation, dynamic manipulation, and simulation benchmarks, especially demonstrating remarkable superiority in dynamic scenarios.

⚡ Dynamic Manipulation Tasks

InternVLA-A1 exhibits exceptional robustness in highly dynamic scenarios.

🤖 Static Manipulation Tasks

InternVLA-A1 demonstrates superior proficiency in dexterous and fine-grained manipulation.

📊 Simulation Benchmark

| Metric | pi0 | pi0.5 | InternVLA-A1-3B |
| --- | --- | --- | --- |
| Avg. Success (Easy) | 79.98% | 86.76% | 89.40% |
| Avg. Success (Hard) | 79.50% | 86.96% | 89.64% |

InternVLA-A1 achieves state-of-the-art results on the RoboTwin 2.0 benchmark (averaged over 50 tasks).

License and Citation

All code in this repo is released under CC BY-NC-SA 4.0. Please consider citing our project if it helps your research.

@article{contributors2026internvla_a1,
  title={InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation},
  author={InternVLA-A1 contributors},
  journal={arXiv preprint arXiv:2601.02456},
  year={2026}
}

Acknowledgments
