ECHO Tuned Embedding v2

A fine-tuned embedding model optimized for personal knowledge management and semantic search within the ATLES ecosystem.

Model Description

This model is a fine-tuned version of spartan8806/atles-champion-embedding (which itself is based on all-mpnet-base-v2). It has been specifically trained on personal knowledge data from the ATLES-ECHO system to improve semantic search relevance for:

  • Code and technical documentation
  • Screen captures and OCR text
  • Application usage patterns
  • Clipboard content
  • File system changes

Training Details

Base Model

  • Parent Model: spartan8806/atles-champion-embedding
  • Architecture: MPNet (110M parameters)
  • Embedding Dimension: 768
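
A quick sanity check of these specs with sentence-transformers (assuming the parent model is available locally or on the Hub):

from sentence_transformers import SentenceTransformer

# Load the parent model and confirm the 768-dimensional output
base = SentenceTransformer("spartan8806/atles-champion-embedding")
print(base.get_sentence_embedding_dimension())  # 768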

Training Data

The model was fine-tuned on semantic similarity pairs and hard-negative triplets generated from 10K+ personal knowledge items:

Dataset                     Examples
Similarity Pairs            14,512
Triplets (Hard Negatives)   10,000
Total                       24,512
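
In sentence-transformers terms, each row corresponds to an InputExample. A minimal sketch of the two record types follows; the texts and label value are illustrative placeholders, not items from the actual dataset:

from sentence_transformers import InputExample

# Similarity pair: two texts plus a real-valued label in [0, 1],
# here the teacher-computed similarity described under Training Method
pair = InputExample(
    texts=["def on_modified(self, event): ...",
           "File watcher callback for modification events"],
    label=0.82,  # illustrative value
)

# Triplet: (anchor, positive, hard negative), no explicit label
triplet = InputExample(
    texts=["Phoenix watcher implementation",
           "Phoenix implements real-time event handling",
           "FastAPI endpoints are defined with decorators"],
)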

Training Method

Unlike naive fine-tuning that relies on structural heuristics (e.g., same file = similar), this model was trained on actual semantic similarity scores computed by the base model itself. This knowledge-distillation approach ensures the model learns meaningful semantic relationships; a code sketch of the loop follows the list below.

  • Method: CosineSimilarityLoss with real similarity labels
  • Hard Negatives: Triplets with carefully selected negatives (0.3-0.5 similarity range)
  • Epochs: 3
  • Batch Size: 16
  • Training Time: ~5 hours on NVIDIA GPU
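
A minimal sketch of this distillation loop using the sentence-transformers training API; the corpus, warmup_steps, and variable names are illustrative assumptions, while the loss, epochs, and batch size come from the list above:

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, util

# Teacher: the base model that produces the distillation labels
teacher = SentenceTransformer("spartan8806/atles-champion-embedding")

# Hypothetical personal-knowledge corpus (placeholder texts)
corpus = [
    "The file watcher monitors directory changes",
    "Phoenix implements real-time event handling",
    "FastAPI endpoints are defined with decorators",
]

# 1) Distillation labels: score candidate pairs with the teacher
emb = teacher.encode(corpus, convert_to_tensor=True, normalize_embeddings=True)
scores = util.cos_sim(emb, emb)

train_examples = [
    InputExample(texts=[corpus[i], corpus[j]], label=float(scores[i][j]))
    for i in range(len(corpus))
    for j in range(i + 1, len(corpus))
]

# 2) Fine-tune a student copy on the teacher's similarity labels
student = SentenceTransformer("spartan8806/atles-champion-embedding")
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(student)

student.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=3,
    warmup_steps=100,  # assumption: not stated in this card
    output_path="echo-tuned-embedding-v2",
)

The triplet portion of the dataset could be handled analogously with losses.TripletLoss, drawing negatives from the 0.3-0.5 teacher-similarity band described above.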

Training Metrics

Epoch   Loss     Pearson   Spearman
0.61    0.0013   0.9964    0.9781
1.22    0.0007   0.9983    0.9860
1.84    0.0004   0.9989    0.9880
Final   0.0002   0.9990    0.9886
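
Pearson/Spearman correlations of this kind can be reproduced with the library's EmbeddingSimilarityEvaluator on a held-out labeled split; the sentences and scores below are placeholders:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("path/to/echo-tuned-embedding-v2")

# Held-out pairs with teacher-assigned similarity labels (placeholder values)
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["file watcher events", "screen capture OCR"],
    sentences2=["directory change monitoring", "database indexing"],
    scores=[0.78, 0.21],
)
print(evaluator(model))  # correlation of model cosine scores with the labels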

Performance Comparison

Tested on domain-specific queries against the base model:

Query                            Base     v2       Δ
Phoenix watcher implementation   0.3985   0.4896   +0.0910 ✅
File watcher event handling      0.5436   0.6208   +0.0773 ✅
ECHO knowledge base search       0.3186   0.3632   +0.0446 ✅
FastAPI async endpoint           0.5228   0.5262   +0.0034 ✅
ATLES embedding model training   0.2915   0.2957   +0.0041 ✅
Screen capture OCR extraction    0.1935   0.1947   +0.0012 ✅
Average                          0.3542   0.3785   +0.0243

  • Improvement Rate: 6 of 8 test queries improved (75%); the table above lists the six improved queries
  • Encoding Speed: 29% faster than base model (1.63s vs 2.30s for 100 items)
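
A sketch of how such a per-query comparison can be run (model paths and documents are placeholders):

from sentence_transformers import SentenceTransformer, util

base = SentenceTransformer("spartan8806/atles-champion-embedding")
tuned = SentenceTransformer("path/to/echo-tuned-embedding-v2")

query = "Phoenix watcher implementation"
docs = [
    "Phoenix implements real-time event handling for watched paths",
    "FastAPI endpoints are defined with decorators",
]

# Score the query against the corpus with each model; Δ is v2 minus base
for name, model in [("base", base), ("v2", tuned)]:
    q = model.encode(query, convert_to_tensor=True)
    d = model.encode(docs, convert_to_tensor=True)
    print(name, util.cos_sim(q, d).max().item())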

Usage

With Sentence Transformers

from sentence_transformers import SentenceTransformer

# Load the fine-tuned model from a local path (or a Hub model id)
model = SentenceTransformer("path/to/echo-tuned-embedding-v2")

sentences = [
    "How does the file watcher handle events?",
    "Phoenix watcher implementation details",
    "Database query optimization"
]

# One 768-dimensional vector per input sentence
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)

For Semantic Search

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("path/to/echo-tuned-embedding-v2")

# Your knowledge base
documents = [
    "The file watcher monitors directory changes...",
    "Phoenix implements real-time event handling...",
    "FastAPI endpoints are defined with decorators..."
]

# Query
query = "How do I handle file system events?"

# Encode
doc_embeddings = model.encode(documents, normalize_embeddings=True)
query_embedding = model.encode(query, normalize_embeddings=True)

# Search
similarities = util.cos_sim(query_embedding, doc_embeddings)
print(similarities)
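
Continuing the snippet above, for larger knowledge bases util.semantic_search retrieves the top-k hits directly instead of scoring every document by hand:

# Top-k retrieval over the same embeddings as above
hits = util.semantic_search(query_embedding, doc_embeddings, top_k=2)
for hit in hits[0]:  # hits[0] holds results for the single query
    print(documents[hit["corpus_id"]], round(hit["score"], 4))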

Intended Use

This model is designed for:

  • Personal knowledge management systems
  • Semantic search over mixed content (code, docs, screen text)
  • Similar document retrieval
  • Context-aware information retrieval

Limitations

  • Trained on English content only
  • Optimized for technical/developer-focused content
  • May not generalize well to domains significantly different from training data
  • Performance benefits are most pronounced for domain-specific queries

Model Card Authors

  • ATLES Development Team
  • Fine-tuned using the ATLES-ECHO knowledge embedding pipeline

Citation

@misc{echo-tuned-embedding-v2,
  author = {ATLES Team},
  title = {ECHO Tuned Embedding v2: Personal Knowledge Embedding Model},
  year = {2024},
  publisher = {HuggingFace},
  note = {Fine-tuned from spartan8806/atles-champion-embedding}
}