Cybersecurity NER Model
Named entity recognition (NER) model for the cybersecurity domain. Overall F1: 98.31%.
Model Details
- Version: v5
- Framework: spaCy 3.8+
- Training Date: 2025-12-29
- Examples: 1922 (stratified 80/10/10 split)
- Backbone: Domain-adapted RoBERTa
Entities (13)
| Entity | F1 | Sample entities |
|---|---|---|
| CERTIFICATION | 100% | CISSP, OSCP, CEH |
| SECURITY_ROLE | 100% | CISO, SOC Analyst |
| SECURITY_TOOL | 100% | Splunk, Metasploit |
| ATTACK_TECHNIQUE | 100% | SQL Injection, XSS |
| FRAMEWORK | 100% | NIST CSF, ISO 27001 |
| THREAT_TYPE | 100% | APT, ransomware |
| AUDIT_TERM | 100% | Compliance, Audit |
| CVE | 100% | CVE-2021-44228 |
| SECURITY_DOMAIN | 99.10% | Cloud Security |
| TECHNICAL_SKILL | 95.30% | Incident Response |
| REGULATION | 94.44% | GDPR, HIPAA |
| ACRONYM | 88.89% | SIEM, EDR |
| CONTROL_ID | 0% | See hybrid approach |
Performance
Metrics:
- F1: 98.31%
- Precision: 97.92%
- Recall: 98.69%
- Inference: ~60ms/doc
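As a quick consistency check, F1 is the harmonic mean of precision and recall, so the overall score can be recomputed from the two figures above. The sketch below uses the rounded values as listed; the result lands within a few hundredths of a percentage point of the reported 98.31%, which is about what rounding of the inputs would explain.

```python
# Recompute F1 from the reported (rounded) precision and recall.
precision = 0.9792
recall = 0.9869

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.2%}")
```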
v5 changes from v4:
- Tuned hyperparameters (dropout 0.25, L2 0.02)
- Improved REGULATION (+6.64pp), ACRONYM (+22.22pp)
- Overall +0.25pp F1
CONTROL_ID Handling
The model scores 0% F1 on CONTROL_ID because the training data contains only 25 examples of it.
Solution: a hybrid approach in which control IDs are extracted by regex in production.
Patterns cover ISO 27001, NIST CSF, CIS Controls, SOC 2, and PCI-DSS.
See service implementation for details.
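A minimal sketch of what the regex side of such a hybrid approach might look like. The actual service patterns are not shown in this card, so every expression below, the name `extract_control_ids`, and the example identifiers are assumptions chosen for illustration.

```python
import re

# Illustrative patterns only: the real service's expressions are not shown
# in this card, so each regex here is an assumption about common ID formats.
CONTROL_ID_PATTERNS = {
    # e.g. A.12.4.1 or A.5.1
    "ISO 27001": re.compile(r"\bA\.\d{1,2}(?:\.\d{1,2}){1,2}\b"),
    # e.g. PR.AC-1
    "NIST CSF": re.compile(r"\b(?:ID|PR|DE|RS|RC|GV)\.[A-Z]{2}-\d{1,2}\b"),
    # e.g. CIS Control 5.1 or CIS 5
    "CIS Controls": re.compile(r"\bCIS (?:Control )?\d{1,2}(?:\.\d{1,2})?\b"),
    # e.g. CC6.1
    "SOC 2": re.compile(r"\bCC\d\.\d{1,2}\b"),
    # e.g. PCI-DSS Requirement 8.2.3
    "PCI-DSS": re.compile(
        r"\bPCI[- ]DSS (?:Req(?:uirement)?\.? )?\d{1,2}(?:\.\d{1,2}){0,2}\b"
    ),
}


def extract_control_ids(text):
    """Return (start, end, matched_text, framework) tuples in document order."""
    hits = []
    for framework, pattern in CONTROL_ID_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), match.end(), match.group(), framework))
    return sorted(hits)


sample = "Logging is covered by A.12.4.1, PR.AC-1, CIS Control 5.1, CC6.1 and PCI-DSS Requirement 8.2.3."
for start, end, value, framework in extract_control_ids(sample):
    print(f"{value:28} | CONTROL_ID ({framework})")
```

The character offsets make it possible to attach these matches to a spaCy `Doc`, for example with `doc.char_span(start, end, label="CONTROL_ID")`, which returns `None` when the offsets do not align with token boundaries.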
Usage
pip install "spacy>=3.7.0" "spacy-transformers>=1.3.0"
import spacy
nlp = spacy.load("pki/ner-cybersecurity")
doc = nlp("CISO with CISSP, expert in Splunk and ISO 27001")
for ent in doc.ents:
    print(f"{ent.text:20} | {ent.label_}")
Output:
CISO                 | SECURITY_ROLE
CISSP                | CERTIFICATION
Splunk               | SECURITY_TOOL
ISO 27001            | FRAMEWORK
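For downstream use it is often handier to have the entities grouped by label than as a flat list. The helper below is a small sketch; `group_entities` is a hypothetical name, and it only assumes objects with `text` and `label_` attributes, as spaCy `Span` objects have.

```python
from collections import defaultdict


def group_entities(ents):
    """Map each label to its entity texts, keeping first-seen order and dropping repeats."""
    grouped = defaultdict(list)
    for ent in ents:
        if ent.text not in grouped[ent.label_]:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)

# With the model loaded as in the usage example: group_entities(doc.ents)
```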
Use Cases
- Job/CV matching
- Threat intelligence extraction
- Compliance documentation parsing
- Security policy analysis
Training Config
max_steps = 8000
dropout = 0.25
L2 = 0.02
learning_rate = 0.00003
hidden_width = 128
maxout_pieces = 3
batch_size = 128
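One way these values could be passed to spaCy is as command-line overrides to `spacy train`, which accepts dotted config paths as `--section.key value`. The sketch below only builds the command. The mapping of each value to a config path is an assumption based on the layout of spaCy's standard transformer NER config, not something this card confirms: the learning rate may sit under a schedule such as `training.optimizer.learn_rate.initial_rate`, and `batch_size` may refer to the training batcher instead of `nlp.batch_size`. The names `OVERRIDES` and `train_command` are hypothetical.

```python
# Values from the Training Config section. The dotted config paths are
# assumptions about where each value lives in the spaCy config file.
OVERRIDES = {
    "training.max_steps": 8000,
    "training.dropout": 0.25,
    "training.optimizer.L2": 0.02,
    "training.optimizer.learn_rate": 0.00003,
    "components.ner.model.hidden_width": 128,
    "components.ner.model.maxout_pieces": 3,
    "nlp.batch_size": 128,
}


def train_command(config_path, output_dir, overrides=OVERRIDES):
    """Build the argument list for `python -m spacy train` with config overrides."""
    args = ["python", "-m", "spacy", "train", config_path, "--output", output_dir]
    for key, value in overrides.items():
        args += [f"--{key}", str(value)]
    return args


print(" ".join(train_command("config.cfg", "./output")))
```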
Limitations
- ACRONYM: Lower F1 (88.89%) - limited examples (46)
- CONTROL_ID: Requires hybrid regex approach
- Domain-specific: Optimized for cybersecurity text
- Context-dependent ambiguity on some terms
License
MIT
Version History
| Version | Date | F1 | Examples | Notes |
|---|---|---|---|---|
| v5 | 2025-12-29 | 98.31% | 1922 | Hyperparameter tuning |
| v4 | 2025-12-29 | 98.06% | 1922 | Stratified split, domain RoBERTa |
| v3 | 2025-01 | 69.4% | 1000 | spaCy 3.x migration |
| v2 | 2024-12 | 99.5%* | 1805 | spaCy 2.x (*train accuracy) |
Contact
Issues: Model repository
Evaluation results (self-reported)
- F1: 0.983
- Precision: 0.979
- Recall: 0.987