Upload binary-tokenizer-001-4k tokenizer
README.md
CHANGED
@@ -37,7 +37,7 @@ A cross-platform BPE tokenizer for binary executables and machine code. Trained
 ## Training Configuration
 
 **Training Corpus**:
-- Source:
+- Source: [`mjbommar/binary-30k-tokenized`](https://huggingface.co/datasets/mjbommar/binary-30k-tokenized)
 - Size: ~13 GB
 - Files: 30,738 binary files
 - Platforms: Linux (ELF), Windows (PE), macOS (Mach-O), Android (APK)