# ECHO Tuned Embedding v2
A fine-tuned embedding model optimized for personal knowledge management and semantic search within the ATLES ecosystem.
## Model Description
This model is a fine-tuned version of spartan8806/atles-champion-embedding (which itself is based on all-mpnet-base-v2). It has been specifically trained on personal knowledge data from the ATLES-ECHO system to improve semantic search relevance for:
- Code and technical documentation
- Screen captures and OCR text
- Application usage patterns
- Clipboard content
- File system changes
## Training Details

### Base Model

- Parent Model: spartan8806/atles-champion-embedding
- Architecture: MPNet (110M parameters)
- Embedding Dimension: 768
### Training Data
The model was fine-tuned using semantic similarity pairs generated from 10K+ personal knowledge items:
| Dataset | Examples |
|---|---|
| Similarity Pairs | 14,512 |
| Triplets (Hard Negatives) | 10,000 |
| Total | 24,512 |
### Training Method

Unlike naive fine-tuning that relies on structural heuristics (e.g., treating items from the same file as similar), this model was trained on actual semantic similarity scores computed by the base model itself. This knowledge-distillation approach ensures the model learns meaningful semantic relationships.

- Method: CosineSimilarityLoss with real similarity labels
- Hard Negatives: Triplets with carefully selected negatives (0.3-0.5 similarity range)
- Epochs: 3
- Batch Size: 16
- Training Time: ~5 hours on an NVIDIA GPU
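For illustration, here is a minimal sketch of how such distillation pairs could be built and trained with the sentence-transformers fit API. The item texts and warmup setting are placeholders, not the actual ATLES-ECHO pipeline, and the hard-negative triplet objective (losses.TripletLoss) is omitted for brevity:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, util

# Start from the base model whose own similarity scores supply the labels.
base = SentenceTransformer("spartan8806/atles-champion-embedding")

# Placeholder knowledge items; the real pipeline draws from the ECHO store.
items = [
    "The file watcher monitors directory changes...",
    "Phoenix implements real-time event handling...",
    "FastAPI endpoints are defined with decorators...",
]

# Label each pair with the base model's actual cosine similarity
# (knowledge distillation) rather than a structural heuristic.
emb = base.encode(items, convert_to_tensor=True, normalize_embeddings=True)
pairs = [
    InputExample(texts=[items[i], items[j]],
                 label=util.cos_sim(emb[i], emb[j]).item())
    for i in range(len(items)) for j in range(i + 1, len(items))
]

loader = DataLoader(pairs, shuffle=True, batch_size=16)
loss = losses.CosineSimilarityLoss(base)

# Epochs and batch size match the card; warmup_steps is an assumed value.
base.fit(train_objectives=[(loader, loss)], epochs=3, warmup_steps=100)
```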
### Training Metrics
| Epoch | Loss | Pearson | Spearman |
|---|---|---|---|
| 0.61 | 0.0013 | 0.9964 | 0.9781 |
| 1.22 | 0.0007 | 0.9983 | 0.9860 |
| 1.84 | 0.0004 | 0.9989 | 0.9880 |
| Final | 0.0002 | 0.9990 | 0.9886 |
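The Pearson and Spearman columns measure how closely the model's cosine similarities track the distilled labels. They can be reproduced on a held-out split with the built-in evaluator; a minimal sketch, with hypothetical held-out pairs and placeholder labels:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("path/to/echo-tuned-embedding-v2")

# Hypothetical held-out pairs with their distilled similarity labels.
sentences1 = ["File watcher event loop", "FastAPI dependency injection"]
sentences2 = ["Phoenix directory monitoring", "Screen capture OCR pipeline"]
scores = [0.71, 0.18]  # placeholder labels in [0, 1]

evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, scores)
print(evaluator(model))  # Pearson/Spearman of cosine scores vs the labels
```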
## Performance Comparison

Tested on eight domain-specific queries against the base model (six shown below):
| Query | Base | v2 | Δ |
|---|---|---|---|
| Phoenix watcher implementation | 0.3985 | 0.4896 | +0.0910 |
| File watcher event handling | 0.5436 | 0.6208 | +0.0773 |
| ECHO knowledge base search | 0.3186 | 0.3632 | +0.0446 |
| FastAPI async endpoint | 0.5228 | 0.5262 | +0.0034 |
| ATLES embedding model training | 0.2915 | 0.2957 | +0.0041 |
| Screen capture OCR extraction | 0.1935 | 0.1947 | +0.0012 |
| Average (all 8 queries) | 0.3542 | 0.3785 | +0.0243 |
- Improvement Rate: 75% of queries (6/8) improved
- Encoding Speed: 29% faster than base model (1.63s vs 2.30s for 100 items)
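The encoding-speed figure can be sanity-checked with a simple wall-clock comparison. A rough sketch; the corpus here is synthetic and absolute timings depend on hardware:

```python
import time
from sentence_transformers import SentenceTransformer

docs = [f"sample knowledge item {i}" for i in range(100)]

for name in ["spartan8806/atles-champion-embedding",
             "path/to/echo-tuned-embedding-v2"]:
    model = SentenceTransformer(name)
    model.encode(docs)  # warm-up pass so one-time setup cost is excluded
    start = time.perf_counter()
    model.encode(docs)
    print(f"{name}: {time.perf_counter() - start:.2f}s for 100 items")
```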
## Usage
### With Sentence Transformers
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("path/to/echo-tuned-embedding-v2")

sentences = [
    "How does the file watcher handle events?",
    "Phoenix watcher implementation details",
    "Database query optimization",
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)
```
### For Semantic Search
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("path/to/echo-tuned-embedding-v2")

# Your knowledge base
documents = [
    "The file watcher monitors directory changes...",
    "Phoenix implements real-time event handling...",
    "FastAPI endpoints are defined with decorators...",
]

# Query
query = "How do I handle file system events?"

# Encode (normalized embeddings so cosine similarity is a plain dot product)
doc_embeddings = model.encode(documents, normalize_embeddings=True)
query_embedding = model.encode(query, normalize_embeddings=True)

# Search
similarities = util.cos_sim(query_embedding, doc_embeddings)
print(similarities)
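```

For larger document sets, `util.semantic_search` returns ranked top-k hits directly instead of a full similarity matrix:

```python
from sentence_transformers import util

# Reusing query_embedding, doc_embeddings, and documents from above.
hits = util.semantic_search(query_embedding, doc_embeddings, top_k=2)
for hit in hits[0]:  # results for the first (and only) query
    print(documents[hit["corpus_id"]], hit["score"])
```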
## Intended Use
This model is designed for:
- Personal knowledge management systems
- Semantic search over mixed content (code, docs, screen text)
- Similar document retrieval
- Context-aware information retrieval
## Limitations
- Trained on English content only
- Optimized for technical/developer-focused content
- May not generalize well to domains significantly different from training data
- Performance benefits are most pronounced for domain-specific queries
## Model Card Authors
- ATLES Development Team
- Fine-tuned using the ATLES-ECHO knowledge embedding pipeline
## Citation

```bibtex
@misc{echo-tuned-embedding-v2,
  author = {ATLES Team},
  title = {ECHO Tuned Embedding v2: Personal Knowledge Embedding Model},
  year = {2024},
  publisher = {HuggingFace},
  note = {Fine-tuned from spartan8806/atles-champion-embedding}
}
```