MixMinMatch
Collection of datasets from MixMinMatch work.

Mix, MinHash, and Match: Cross-Source Agreement for Multilingual Pretraining Datasets
Paper • 2512.18834 • Published Dec 21, 2025 • 3

AdaMLLab/AraMix • Updated 6 days ago • 394M • 2.2k • 5
AdaMLLab/TurMix • Updated 6 days ago • 681M • 667 • 2
AdaMLLab/HinMix • Updated 6 days ago • 179M • 288 • 1