bluelightai-dev/clt-eval-modernbert-tokenized
• 328k • 27
bluelightai-dev/clt-train-modernbert-tokenized
• 1.94M • 37
bluelightai-dev/clt-pretrain-data-v3-eval-tokenized-Qwen3-256
• 212k • 39
bluelightai-dev/clt-pretrain-data-v3-tokenized-Qwen3-max-1024
• 4.04M • 13
bluelightai-dev/clt-pretrain-data-v3-tokenized-qwen3
• 1.81M • 268
bluelightai-dev/clt-pretrain-data-v3
• 2.99M • 9
bluelightai-dev/dolma3_dolmino_mix-100B-1125-sample
• 6.32M • 7
bluelightai-dev/dolma3_mix-150B-1025-sample
• 4.97M • 37
bluelightai-dev/clt-mixed-eval-data-tokenized-Qwen3
• 115k • 30
bluelightai-dev/clt-mixed-eval-data
• 60k • 24
bluelightai-dev/clt-mixed-data-tokenized-Qwen3
• 2.6M • 43
bluelightai-dev/clt-pretrain-eval-data-tokenized-Qwen3-256
• 194k • 35
bluelightai-dev/clt-pretrain-data-dedup-tokenized-Qwen3-1024
• 2.52M • 51
bluelightai-dev/clt-pretrain-data-v2-dedup
• 19
bluelightai-dev/clt-pretrain-data-tokenized-Qwen3-1024
• 2.44M • 65
bluelightai-dev/clt-pretrain-data-v2
• 67
bluelightai-dev/MathPile_Commercial-formatted
• 389k • 57
bluelightai-dev/clt_posttrain_data_tokenized
• 1.34M • 67
bluelightai-dev/common-corpus-sample-open-web
• 4.8M • 52
bluelightai-dev/common-corpus-sample-open-source
• 2.02M • 36
bluelightai-dev/common-corpus-sample-open-science
• 284k • 38
bluelightai-dev/common-corpus-sample-open-government
• 373k • 39
• 1
bluelightai-dev/common-corpus-sample-open-culture
• 462k • 57
bluelightai-dev/clt_posttrain_data_tokenized_test_1000
• 1.22k • 11
bluelightai-dev/dclm-full-deduped-sample
• 4.92M • 70
bluelightai-dev/the-stack-dedup-sample
• 474k • 39
bluelightai-dev/pythia_clt_pretrain_data_tokenized
• 3.5M • 74
bluelightai-dev/clt_eval_data_qwen3_tokenized_256
• 245k • 85
bluelightai-dev/clt_pretrain_data_qwen_tokenized
• 16.7M • 158
bluelightai-dev/clt_posttrain_data_qwen_tokenized
• 1.34M • 90