Where Visual Document Retrieval Goes Arabic
Omartificial Intelligence Space
Omartificial-Intelligence-Space
AI & ML interests: NLP & LLM
Saudi Dialect Sentence Embedding Models Collection
A collection of Saudi dialect resources: sentence embedding models, classifiers, and a test dataset.
- Omartificial-Intelligence-Space/SA-STS-Embeddings-0.2B
  Feature Extraction • 0.2B • 23 • 1
- Omartificial-Intelligence-Space/SA-BERT-V1
  Fill-Mask • 0.2B • 15 • 4
- Omartificial-Intelligence-Space/SaudiDialect-Triplet-21
  2.96k • 8 • 3
- Omartificial-Intelligence-Space/saudi-dialect-test-samples
  1.28k • 23 • 4
DIRA – Diraya Arabic Reasoning AI
This is an Arabic Reasoning LLM Collection designed for advanced logical inference and instruction-based reasoning in Arabic via datasets and models.
- Omartificial-Intelligence-Space/gpt-oss-math-ar
  1 • 3
- Omartificial-Intelligence-Space/Fanar-Math-R1-GRPO
  Text Generation • 8 • 3
- Omartificial-Intelligence-Space/Diraya-3B-Instruct-Ar
  Text Generation • 5 • 3
- Omartificial-Intelligence-Space/Arabic-DeepSeek-R1-Distill-8B
  Text Generation • 47 • 4
Arabic NLI & Semantic Similarity Datasets
The Arabic Version of SNLI and MultiNLI datasets, originally used for Natural Language Inference (NLI), may be used for finetuning embedding models.
- Omartificial-Intelligence-Space/Arabic-NLi-Pair-Score
  981k • 12 • 3
- Omartificial-Intelligence-Space/Arabic-NLi-Pair
  328k • 24 • 4
- Omartificial-Intelligence-Space/Arabic-NLi-Pair-Class
  981k • 171 • 2
- Omartificial-Intelligence-Space/Arabic-Quora-Duplicates
  149k • 22 • 2
AraEuroBERT
Ara-EuroBERT is a collection of Arabic Semantic Embeddings built on EuroBERT, delivering adaptive embeddings with ultra-long context.
ArabianLLM Series
Native Arabic pretrained GPT-2 models in three sizes (0.1B, 0.3B, 0.8B), trained on 20B+ Arabic tokens.
- ArabianGPT GroundPlay
  Generate text based on input using ArabianGPT models
- ArabianGPT: Native Arabic GPT-based Large Language Model
  Paper • 2402.15313 • 3
- riotu-lab/ArabianGPT-01B
  Text Generation • 159 • 13
- riotu-lab/ArabianGPT-08B-V2
  Text Generation • 0.8B • 11
Huggingface FineWeb2 Arabic Dataset Portions
A collection of Arabic text datasets sourced from the FineWeb2 project, covering diverse content in Modern Standard Arabic (MSA) and Arabic dialects.
- HuggingFaceFW/fineweb-2
  4.48B • 112k • 784
- Omartificial-Intelligence-Space/FineWeb2-MSA
  907M • 117 • 2
- Omartificial-Intelligence-Space/FineWeb2-Egyptian-Arabic
  23.9M • 30 • 2
- Omartificial-Intelligence-Space/FineWeb2-Moroccan-Arabic
  69.6M • 57 • 3
Arabic Semantic Embeddings
Find details for all models here: https://www.omarai.me/embeddings
- Qwen Arabic Semantic Suite
  Process and analyze Arabic texts for similarity, classification, and more
- Omartificial-Intelligence-Space/mmbert-base-arabic-nli
  Sentence Similarity • 0.3B • 75 • 1
- Omartificial-Intelligence-Space/AraGemma-Embedding-300m
  Sentence Similarity • 0.3B • 72 • 14
- Omartificial-Intelligence-Space/GATE-AraBert-v1
  Feature Extraction • 0.1B • 1.31k • 17
SHAMIYAT: A Collection of Syrian Dialect Datasets & LLMs
A collection of datasets and language models focused on the Syrian dialect, supporting NLP research and applications for Syria.
- SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System
  Paper • 2508.02268 • 3
- SHAMI MT App
  Translate Arabic between MSA and Syrian dialect
- Omartificial-Intelligence-Space/Shami-MT
  Translation • 0.4B • 88 • 1
- Omartificial-Intelligence-Space/SHAMI-MT-2MSA
  Translation • 0.4B • 25 • 1
Arabic Matryoshka & GATE Embedding Models
A collection of advanced Arabic Matryoshka Embedding Models designed for efficient and high-performance Arabic NLP, available publicly on Hugging Face.
- Matroyshka Eval Retrieval Ar
- GATE: General Arabic Text Embedding for Enhanced Semantic Textual Similarity with Matryoshka Representation Learning and Hybrid Loss Training
  Paper • 2505.24581 • 2
- Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning
  Paper • 2407.21139 • 7
- Omartificial-Intelligence-Space/GATE-AraBert-v1
  Feature Extraction • 0.1B • 1.31k • 17
Arabic Re-Ranking Hub
A comprehensive collection of datasets, models, and benchmarks for advancing Arabic Re-ranking systems.
- Arabic Reranking Eval
  Evaluate Arabic reranking models with insights
- Omartificial-Intelligence-Space/ARA-Reranker-V1
  Text Ranking • 0.6B • 1.2k • 4
- NAMAA-Space/GATE-Reranker-V1
  Text Ranking • 0.1B • 356 • 10
- NAMAA-Space/Namaa-Reranker-v1
  Text Ranking • 0.1B • 2 • 1
Arabic ModernBERT
This collection highlights efforts to enhance Arabic NLP tasks using the latest ModernBERT models.
Arabic LLAMA3 & 3.1 FineTuned Models
- Omartificial-Intelligence-Space/Arabic-llama3.1-lora-FT
  Text Generation • 8 • 11
- Omartificial-Intelligence-Space/Arabic-llama3.1-16bit-FT
  Text Generation • 8B • 16 • 4
- Omartificial-Intelligence-Space/al-baka-16bit-llama3-8b
  Text Generation • 8B • 7 • 1
- Omartificial-Intelligence-Space/al-baka-16bit-llama3-8b-GGUF
  Text Generation • 8B • 2
Arab-culture-aligned Multimodal Embedding Models & Datasets
Where Visual Document Retrieval Goes Arabic