ECHO Tuned Embedding v2

A fine-tuned embedding model optimized for personal knowledge management and semantic search within the ATLES ecosystem.

Model Description

This model is a fine-tuned version of spartan8806/atles-champion-embedding (which itself is based on all-mpnet-base-v2). It has been specifically trained on personal knowledge data from the ATLES-ECHO system to improve semantic search relevance for:

  • Code and technical documentation
  • Screen captures and OCR text
  • Application usage patterns
  • Clipboard content
  • File system changes

Training Details

Base Model

  • Parent Model: spartan8806/atles-champion-embedding
  • Architecture: MPNet (110M parameters)
  • Embedding Dimension: 768
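
A quick sanity check of these specs with sentence-transformers (assuming the parent model is available locally or on the Hub):

from sentence_transformers import SentenceTransformer

# Load the parent model and confirm the 768-dimensional output
base = SentenceTransformer("spartan8806/atles-champion-embedding")
print(base.get_sentence_embedding_dimension())  # 768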

Training Data

The model was fine-tuned on semantic similarity pairs and hard-negative triplets generated from 10K+ personal knowledge items:

Dataset                     Examples
Similarity Pairs            14,512
Triplets (Hard Negatives)   10,000
Total                       24,512
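
In sentence-transformers terms, each row corresponds to an InputExample. A minimal sketch of the two record types follows; the texts and label value are illustrative placeholders, not items from the actual dataset:

from sentence_transformers import InputExample

# Similarity pair: two texts plus a real-valued label in [0, 1],
# here the teacher-computed similarity described under Training Method
pair = InputExample(
    texts=["def on_modified(self, event): ...",
           "File watcher callback for modification events"],
    label=0.82,  # illustrative value
)

# Triplet: (anchor, positive, hard negative), no explicit label
triplet = InputExample(
    texts=["Phoenix watcher implementation",
           "Phoenix implements real-time event handling",
           "FastAPI endpoints are defined with decorators"],
)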

Training Method

Unlike naive fine-tuning that relies on structural heuristics (e.g., same file = similar), this model was trained on actual semantic similarity scores computed by the base model itself. This knowledge-distillation approach ensures the model learns meaningful semantic relationships; a code sketch of the loop follows the list below.

  • Method: CosineSimilarityLoss with real similarity labels
  • Hard Negatives: Triplets with carefully selected negatives (0.3-0.5 similarity range)
  • Epochs: 3
  • Batch Size: 16
  • Training Time: ~5 hours on NVIDIA GPU
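
A minimal sketch of this distillation loop using the sentence-transformers training API; the corpus, warmup_steps, and variable names are illustrative assumptions, while the loss, epochs, and batch size come from the list above:

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, util

# Teacher: the base model that produces the distillation labels
teacher = SentenceTransformer("spartan8806/atles-champion-embedding")

# Hypothetical personal-knowledge corpus (placeholder texts)
corpus = [
    "The file watcher monitors directory changes",
    "Phoenix implements real-time event handling",
    "FastAPI endpoints are defined with decorators",
]

# 1) Distillation labels: score candidate pairs with the teacher
emb = teacher.encode(corpus, convert_to_tensor=True, normalize_embeddings=True)
scores = util.cos_sim(emb, emb)

train_examples = [
    InputExample(texts=[corpus[i], corpus[j]], label=float(scores[i][j]))
    for i in range(len(corpus))
    for j in range(i + 1, len(corpus))
]

# 2) Fine-tune a student copy on the teacher's similarity labels
student = SentenceTransformer("spartan8806/atles-champion-embedding")
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(student)

student.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=3,
    warmup_steps=100,  # assumption: not stated in this card
    output_path="echo-tuned-embedding-v2",
)

The triplet portion of the dataset could be handled analogously with losses.TripletLoss, drawing negatives from the 0.3-0.5 teacher-similarity band described above.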

Training Metrics

Epoch   Loss     Pearson   Spearman
0.61    0.0013   0.9964    0.9781
1.22    0.0007   0.9983    0.9860
1.84    0.0004   0.9989    0.9880
Final   0.0002   0.9990    0.9886
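
Pearson/Spearman correlations of this kind can be reproduced with the library's EmbeddingSimilarityEvaluator on a held-out labeled split; the sentences and scores below are placeholders:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("path/to/echo-tuned-embedding-v2")

# Held-out pairs with teacher-assigned similarity labels (placeholder values)
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["file watcher events", "screen capture OCR"],
    sentences2=["directory change monitoring", "database indexing"],
    scores=[0.78, 0.21],
)
print(evaluator(model))  # correlation of model cosine scores with the labels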

Performance Comparison

Tested on domain-specific queries against the base model:

Query                            Base     v2       Δ
Phoenix watcher implementation   0.3985   0.4896   +0.0910 ✅
File watcher event handling      0.5436   0.6208   +0.0773 ✅
ECHO knowledge base search       0.3186   0.3632   +0.0446 ✅
FastAPI async endpoint           0.5228   0.5262   +0.0034 ✅
ATLES embedding model training   0.2915   0.2957   +0.0041 ✅
Screen capture OCR extraction    0.1935   0.1947   +0.0012 ✅
Average                          0.3542   0.3785   +0.0243

  • Improvement Rate: 6 of 8 test queries improved (75%); the table above lists the six improved queries
  • Encoding Speed: 29% faster than base model (1.63s vs 2.30s for 100 items)
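
A sketch of how such a per-query comparison can be run (model paths and documents are placeholders):

from sentence_transformers import SentenceTransformer, util

base = SentenceTransformer("spartan8806/atles-champion-embedding")
tuned = SentenceTransformer("path/to/echo-tuned-embedding-v2")

query = "Phoenix watcher implementation"
docs = [
    "Phoenix implements real-time event handling for watched paths",
    "FastAPI endpoints are defined with decorators",
]

# Score the query against the corpus with each model; Δ is v2 minus base
for name, model in [("base", base), ("v2", tuned)]:
    q = model.encode(query, convert_to_tensor=True)
    d = model.encode(docs, convert_to_tensor=True)
    print(name, util.cos_sim(q, d).max().item())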

Usage

With Sentence Transformers

from sentence_transformers import SentenceTransformer

# Load the fine-tuned model from a local path (or a Hub model id)
model = SentenceTransformer("path/to/echo-tuned-embedding-v2")

sentences = [
    "How does the file watcher handle events?",
    "Phoenix watcher implementation details",
    "Database query optimization"
]

# One 768-dimensional vector per input sentence
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)

For Semantic Search

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("path/to/echo-tuned-embedding-v2")

# Your knowledge base
documents = [
    "The file watcher monitors directory changes...",
    "Phoenix implements real-time event handling...",
    "FastAPI endpoints are defined with decorators..."
]

# Query
query = "How do I handle file system events?"

# Encode
doc_embeddings = model.encode(documents, normalize_embeddings=True)
query_embedding = model.encode(query, normalize_embeddings=True)

# Search
similarities = util.cos_sim(query_embedding, doc_embeddings)
print(similarities)
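
Continuing the snippet above, for larger knowledge bases util.semantic_search retrieves the top-k hits directly instead of scoring every document by hand:

# Top-k retrieval over the same embeddings as above
hits = util.semantic_search(query_embedding, doc_embeddings, top_k=2)
for hit in hits[0]:  # hits[0] holds results for the single query
    print(documents[hit["corpus_id"]], round(hit["score"], 4))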

Intended Use

This model is designed for:

  • Personal knowledge management systems
  • Semantic search over mixed content (code, docs, screen text)
  • Similar document retrieval
  • Context-aware information retrieval

Limitations

  • Trained on English content only
  • Optimized for technical/developer-focused content
  • May not generalize well to domains significantly different from training data
  • Performance benefits are most pronounced for domain-specific queries

Model Card Authors

  • ATLES Development Team
  • Fine-tuned using the ATLES-ECHO knowledge embedding pipeline

Citation

@misc{echo-tuned-embedding-v2,
  author = {ATLES Team},
  title = {ECHO Tuned Embedding v2: Personal Knowledge Embedding Model},
  year = {2024},
  publisher = {HuggingFace},
  note = {Fine-tuned from spartan8806/atles-champion-embedding}
}