Papers
Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR
Mutarjim: Advancing Bidirectional Arabic-English Translation with a Small Language Model
مِسراج · Misraj AI
Built on Trust. Measured by Impact.
The next-generation Arabic AI lab, building the foundational infrastructure for Arabic language understanding, generation, and document intelligence.
About Us
Misraj AI is the AI research division of Misraj Technology, a Saudi-based technology group with over 10 years of experience delivering enterprise digital solutions across 15 sectors. Our AI lab is dedicated to a singular mission: making Arabic a first-class language in the modern AI era.
We develop open models, large-scale datasets, rigorous benchmarks, and production-ready AI systems, all purpose-built for Arabic, a morphologically rich language that has long been underserved by mainstream AI research.
From our research lab to operational products, we build a comprehensive system that enables governments and enterprises to adopt AI with confidence, depth, and speed.
15+ research papers · 35 billion open Arabic data tokens · Honored by AI Pioneers
Areas of Expertise
Our AI solutions span critical industry verticals, combining deep domain knowledge with state-of-the-art Arabic NLP:
- Healthcare Technology: Clinical documentation and Arabic medical NLP
- Financial Technology: Document intelligence for banking and finance
- Legal Technology: Contract analysis and legal document processing
- Educational Technology: Arabic learning and knowledge systems
- Administrative Technology: Government and enterprise document automation
Open Datasets
We are committed to releasing high-quality, openly available Arabic AI resources to empower the global research community.
| Dataset | Description | Scale |
|---|---|---|
| Misraj-DocOCR | Expert-verified Arabic document OCR benchmark | 400 images |
| KITAB PDF-to-Markdown | Corrected Arabic PDF-to-Markdown corpus | 62 documents |
| msdd | Misraj Structured Document Dataset | 26.4M rows |
| mudd | Misraj Unstructured Document Dataset | 4.76M rows |
| Tarjama-25 | Bidirectional Arabic-English translation benchmark | 5,000 expert-reviewed sentence pairs |
| Arabic-Image-Captioning 100M | First large-scale Arabic multimodal captioning dataset | 100M caption pairs |
| SadeedDiac-25 | Arabic diacritization benchmark | 1.2K samples |
| Sadeed Tashkeela | Large-scale Arabic diacritization corpus | 1.05M samples |
35+ billion open Arabic data tokens released and growing.
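For readers who want to inspect one of the datasets listed above, the following is a minimal sketch using the Hugging Face `datasets` library. The repository ID pattern (`Misraj/<name>`), the `train` split name, and the helper names `hub_repo_id` and `stream_rows` are assumptions made for illustration; check each dataset page on the Hub for the actual ID, splits, and access terms.

```python
from typing import Any

# Assumption: datasets live under the organization named in the
# huggingface.co/Misraj link. Verify the exact repository ID on the Hub.
HUB_ORG = "Misraj"


def hub_repo_id(dataset_name: str, org: str = HUB_ORG) -> str:
    """Build an '<org>/<name>' Hub repository ID (assumed naming scheme)."""
    name = dataset_name.strip()
    if not name or "/" in name:
        raise ValueError(f"expected a bare dataset name, got {dataset_name!r}")
    return f"{org}/{name}"


def stream_rows(dataset_name: str, split: str = "train", limit: int = 3) -> list[dict[str, Any]]:
    """Return the first `limit` rows without downloading the full dataset."""
    # Imported lazily so hub_repo_id works even without `datasets` installed.
    from datasets import load_dataset

    # streaming=True yields rows on demand, which matters for a corpus
    # with millions of rows. The split name is an assumption.
    ds = load_dataset(hub_repo_id(dataset_name), split=split, streaming=True)
    return list(ds.take(limit))
```

Calling `stream_rows("msdd")` would need network access and the `datasets` package. Streaming is used here because several of the listed corpora run to millions of rows, so pulling a few samples is cheaper than a full download.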
Connect With Us
| Platform | Link |
|---|---|
| Misraj AI | misraj.ai/en |
| Misraj Technology | misraj.sa/en |
| Baseer OCR | baseerocr.com |
| Hugging Face | huggingface.co/Misraj |
| LinkedIn | linkedin.com/company/aimisraj |
| X / Twitter | @aimisraj |
| GitHub | github.com/misraj-ai |
| Instagram | @misraj__ai |