StoryMem: Multi-shot Long Video Storytelling with Memory Paper • 2512.19539 • Published 6 days ago • 17
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI Paper • 2512.16676 • Published 10 days ago • 190
VABench: A Comprehensive Benchmark for Audio-Video Generation Paper • 2512.09299 • Published 18 days ago • 7
StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published Oct 10 • 50
Trace Anything: Representing Any Video in 4D via Trajectory Fields Paper • 2510.13802 • Published Oct 15 • 30
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26 • 139
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents Paper • 2507.04009 • Published Jul 5 • 51
Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models Paper • 2506.12776 • Published Jun 15 • 2
ReLIFT Collection ReLIFT, a training method that interleaves RL with online FT, achieving superior performance and efficiency compared to using RL or SFT alone. • 8 items • Updated Jun 10 • 1
Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published Mar 7 • 122
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published Feb 25 • 74
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 159
VideoRoPE: What Makes for Good Video Rotary Position Embedding? Paper • 2502.05173 • Published Feb 7 • 65
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model Paper • 2501.12368 • Published Jan 21 • 45
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? Paper • 2501.05510 • Published Jan 9 • 44
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 98