Article Transformers v5: Simple model definitions powering the AI ecosystem Dec 1, 2025 • 265
SHAMIYAT: A Collection of Syrian Dialect Datasets & LLMs Collection A collection of datasets and language models focused on the Syrian dialect, supporting NLP research and applications for Syria • 4 items • Updated Nov 28, 2025 • 2
Article How to train a new language model from scratch using Transformers and Tokenizers Feb 14, 2020 • 56
Yiddish Whisper Training Collection Yiddish based Whisper post-training - Crowd Sourced Open Data • 10 items • Updated Oct 24, 2025 • 4
Scaling Low-Res MT via Synthetic Data Generation with LLMs Collection Synthetic baselines trained for our paper "Scaling Low-Resource MT via Synthetic Data Generation with LLMs", accepted to the main conference of EMNLP 2025. • 8 items • Updated Sep 16, 2025 • 1
Scaling Low-Resource MT via Synthetic Data Generation with LLMs Paper • 2505.14423 • Published May 20, 2025 • 2
DictaBERT Collection Collection of state-of-the-art language models for Hebrew, fine-tuned for various tasks, as detailed in the article: https://arxiv.org/abs/2308.16687 • 17 items • Updated Apr 4, 2024 • 5
Arabic-Nougat: Fine-Tuning Vision Transformers for Arabic OCR and Markdown Extraction Paper • 2411.17835 • Published Nov 19, 2024 • 4
ZeroGPU Spaces Collection ZeroGPU Spaces made by the community • 17 items • Updated Jun 6, 2024 • 245