Commit 0674cd8 by mjbommar (verified)
1 Parent(s): bbc3b0b

Upload binary-tokenizer-001-4k tokenizer

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -37,7 +37,7 @@ A cross-platform BPE tokenizer for binary executables and machine code. Trained
 ## Training Configuration
 
 **Training Corpus**:
-- Source: `/nas4/data/glaurung-data/binaries-small/`
+- Source: [`mjbommar/binary-30k-tokenized`](https://huggingface.co/datasets/mjbommar/binary-30k-tokenized)
 - Size: ~13 GB
 - Files: 30,738 binary files
 - Platforms: Linux (ELF), Windows (PE), macOS (Mach-O), Android (APK)