Efficient Vision Encoding for Vision Language Models
- FastVLM: Efficient Vision Encoding for Vision Language Models
  Paper • 2412.13303 • Published • 72
- FastVLM WebGPU
  419 • Real-time video captioning powered by FastVLM
- apple/FastVLM-0.5B
  Text Generation • 0.8B • Updated • 9.98k • 355
- apple/FastVLM-1.5B
  Text Generation • 2B • Updated • 2.09k • 71