Chandra-ocr inference
Hello everyone, I am currently testing chandra-ocr as described in this demo: https://huggingface.co/spaces/prithivMLmods/Multimodal-OCR3/blob/main/app.py. I am really impressed by the model's quality on ID documents in Bulgarian, which is a difficult if not impossible task for many OCR models (I ran the same tests with DeepSeek-OCR, for example).

The problem is that I am testing on an L4 VM, and the best time I have achieved on a single image is 27 s, which will not work for my project. I stumbled upon the GGUF versions, but I couldn't get them working with llama.cpp; I think the main issue is that Qwen3-VL is not supported yet. Has anyone managed to run them, and if so, how? Also, should I expect a lower quantization to run faster, or will it only lower VRAM usage? I really hope I can reduce the inference time, because this is the best OCR model I have tried so far, and I have tried many. Thanks :)
Hello @gaspachoto,
Llama.cpp supports Qwen3-VL. Please check your version, upgrade, and try again.
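For reference, here is a minimal sketch of building a recent llama.cpp with CUDA and running a vision GGUF through the multimodal CLI. The model and mmproj filenames below are placeholders; substitute whichever chandra-ocr quantization you downloaded, along with its matching mmproj file:

```bash
# Build a recent llama.cpp with CUDA enabled; older binaries predate
# Qwen3-VL support and will fail to load these GGUFs.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# Confirm which build you are actually running.
./build/bin/llama-cli --version

# Run the model with the multimodal CLI. The .gguf filenames are
# placeholders for the chandra-ocr weights and their mmproj file.
./build/bin/llama-mtmd-cli \
  -m chandra-ocr-Q8_0.gguf \
  --mmproj mmproj-chandra-ocr-f16.gguf \
  --image id_card.png \
  -ngl 99 \
  -p "Transcribe all text in this image."
```

On the quantization question: a lower-bit quant mainly shrinks VRAM, though since token generation is usually memory-bandwidth bound it can also speed up decoding somewhat. It does little for the image-encoding stage, so don't expect quantization alone to close the gap from 27 s.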
Thank you.
