Chandra-OCR inference

#1
by gaspachoto - opened

Hello guys, I am currently testing chandra-ocr as described in this demo - https://huggingface.co/spaces/prithivMLmods/Multimodal-OCR3/blob/main/app.py - and I am really impressed by the quality of the model on ID documents in Bulgarian (a difficult, if not impossible, task for many OCR models - e.g. I ran the same tests with DeepSeek-OCR). The thing is, I'm testing it on an L4 VM, and the best time I achieved on a single image is 27 seconds, which will not work for my project.

I stumbled upon these GGUF versions, but I couldn't make them work with llama.cpp, and I think the main issue is that Qwen3-VL is not supported yet. So I wonder if anyone has been able to test them, and if so, how? And should I expect a lower quantization to run faster, or will it just lower VRAM? I really hope I can bring the inference time down, because this is the best OCR model I've tried so far, and I have tried many. Thanks :)

Hello @gaspachoto ,
Llama.cpp supports Qwen3-VL. Please check your version, upgrade, and try again.
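For example, with a current build, something like this is roughly what to try (a sketch only - the GGUF file names, image, and prompt below are placeholders for whatever the quantized repo you found actually ships):

```bash
# Build a recent llama.cpp with CUDA enabled (older releases predate Qwen3-VL support)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# llama-mtmd-cli is the multimodal CLI: it needs the language-model GGUF (-m)
# plus the matching vision-projector GGUF (--mmproj) from the same repo.
# -ngl 99 offloads all layers to the GPU.
./build/bin/llama-mtmd-cli \
  -m chandra-ocr-Q4_K_M.gguf \
  --mmproj mmproj-chandra-ocr-f16.gguf \
  --image id_document.png \
  -ngl 99 \
  -p "Extract all text from this document."
```

On the quantization question: in my understanding, lower-bit quants mainly reduce VRAM. Token generation may get somewhat faster because decoding is memory-bandwidth-bound, but the image encoding that dominates single-page OCR latency is compute-bound, so measure before committing to an aggressive quant.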

Thank you.

[Screenshot: ggml-org/llama.cpp - LLM inference in C/C++]

prithivMLmods changed discussion status to closed
