SentenceTransformer based on google-bert/bert-large-uncased

This is a sentence-transformers model fine-tuned from google-bert/bert-large-uncased on the all-nli dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar tasks.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-large-uncased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: all-nli
  • Language: en

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
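
The Pooling module above has only pooling_mode_mean_tokens enabled, so the sentence embedding is the average of the token embeddings. The following is a minimal sketch of that idea in plain Python, not the library's implementation: the function name mean_pool, the toy 4-dimensional vectors, and the explicit attention mask are assumptions made for illustration (the real model produces 1024-dimensional token vectors).

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [s / count for s in summed]


# Toy input: 3 tokens with 4-dimensional vectors (hypothetical values).
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]  # the last token is padding and must not affect the result
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 2.0, 2.0, 2.0]
```

Masking matters because sentences in a batch are padded to a common length; averaging over padding positions would shift the embedding of shorter sentences.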

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sobamchan/bert-large-uncased-no-mrl")
# Run inference
sentences = [
    'A construction worker peeking out of a manhole while his coworker sits on the sidewalk smiling.',
    'A worker is looking out of a manhole.',
    'The workers are both inside the manhole.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8028, 0.6435],
#         [0.8028, 1.0000, 0.7869],
#         [0.6435, 0.7869, 1.0000]])
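
The similarity function listed for this model is cosine similarity. As a rough sketch of what the scores above mean and how they can drive semantic search, the snippet below computes cosine similarity by hand and ranks candidates against a query. The helper names cosine_similarity and rank_by_similarity and the 2-dimensional toy vectors are assumptions for illustration; in practice the vectors would come from model.encode.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the two vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: Sequence[float], candidates: List[Sequence[float]]) -> List[int]:
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy 2-dimensional "embeddings" (hypothetical values).
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
order = rank_by_similarity(query, candidates)
print(order)  # [1, 2, 0]
```

Cosine similarity ignores vector length, which is why the candidate [2.0, 0.0] scores a perfect 1.0 against the query despite having twice its norm.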

Evaluation

Metrics

Semantic Similarity

Metric           Value
pearson_cosine   0.7988
spearman_cosine  0.8165
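
The metric names suggest that both values are correlations between the model's cosine similarity for each sentence pair and a human-annotated gold score. The sketch below shows how such Pearson and Spearman correlations can be computed in plain Python. The gold and predicted values are made-up toy data, and the helper names are assumptions; this is not the evaluator used to produce the numbers above.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical gold similarity labels and model cosine scores for 5 pairs.
gold = [0.0, 1.0, 2.5, 4.0, 5.0]
predicted = [0.10, 0.20, 0.90, 0.60, 0.95]
print(round(spearman(gold, predicted), 4))  # 0.9
```

Spearman only looks at the ordering of the scores, so it is unaffected by the fact that cosine similarities and gold labels live on different scales.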

Training Details

Training Dataset

all-nli

  • Dataset: all-nli at d482672
  • Size: 557,850 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
              anchor        positive      negative
    type      string        string        string
    min       7 tokens      6 tokens      5 tokens
    mean      10.46 tokens  12.81 tokens  13.4 tokens
    max       46 tokens     40 tokens     50 tokens
  • Samples:
    • anchor: A person on a horse jumps over a broken down airplane.
      positive: A person is outdoors, on a horse.
      negative: A person is at a diner, ordering an omelette.
    • anchor: Children smiling and waving at camera
      positive: There are children present
      negative: The kids are frowning
    • anchor: A boy is jumping on skateboard in the middle of a red bridge.
      positive: The boy does a skateboarding trick.
      negative: The boy skates down the sidewalk.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768
        ],
        "matryoshka_weights": [
            1
        ],
        "n_dims_per_step": -1
    }
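
To make the loss configuration above more concrete, here is a rough sketch of the two ideas it combines: a ranking loss that treats every other candidate in the batch as a negative, and a Matryoshka wrapper that applies that loss to embeddings truncated to each listed dimension. This is an illustration in plain Python, not the library code. The function names, the toy 4-dimensional vectors (truncated to 3 in place of 1024 truncated to 768), and the similarity scale of 20.0 are all assumptions.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def ranking_loss(anchors, positives, negatives, dim: int, scale: float = 20.0) -> float:
    """Cross-entropy over scaled cosine similarities, using only the first `dim` dimensions.

    For anchor i the correct candidate is positives[i]; all other positives
    and all negatives in the batch act as distractors.
    """
    candidates = [c[:dim] for c in positives + negatives]
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor[:dim], c) for c in candidates]
        peak = max(logits)  # subtract the maximum for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_loss(anchors, positives, negatives, dims, weights) -> float:
    """Weighted sum of the ranking loss evaluated at each truncation size."""
    return sum(w * ranking_loss(anchors, positives, negatives, d) for d, w in zip(dims, weights))


# Hypothetical batch of two (anchor, positive, negative) triplets.
anchors = [[1.0, 0.0, 0.2, 0.9], [0.0, 1.0, 0.1, -0.5]]
positives = [[0.9, 0.1, 0.3, -0.7], [0.1, 0.8, 0.0, 0.6]]
negatives = [[0.0, 1.0, 0.5, 0.1], [1.0, 0.1, 0.4, 0.2]]
loss = matryoshka_loss(anchors, positives, negatives, dims=[3], weights=[1])
print(loss)
```

With a single entry in matryoshka_dims and a weight of 1, as in the parameters listed above, the wrapper reduces to the ranking loss evaluated on the truncated embeddings.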
    

Evaluation Dataset

all-nli

  • Dataset: all-nli at d482672
  • Size: 6,584 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
              anchor        positive      negative
    type      string        string        string
    min       6 tokens      4 tokens      5 tokens
    mean      17.95 tokens  9.78 tokens   10.35 tokens
    max       63 tokens     29 tokens     29 tokens
  • Samples:
    • anchor: Two women are embracing while holding to go packages.
      positive: Two woman are holding packages.
      negative: The men are fighting outside a deli.
    • anchor: Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.
      positive: Two kids in numbered jerseys wash their hands.
      negative: Two kids in jackets walk to school.
    • anchor: A man selling donuts to a customer during a world exhibition event held in the city of Angeles
      positive: A man selling donuts to a customer.
      negative: A woman drinks her coffee in a small cafe.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768
        ],
        "matryoshka_weights": [
            1
        ],
        "n_dims_per_step": -1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • num_train_epochs: 15
  • warmup_ratio: 0.1

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 15
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss sts-dev_spearman_cosine
-1 -1 - - 0.5941
0.0287 500 1.9263 0.7269 0.8006
0.0574 1000 0.8808 0.4899 0.8306
0.0860 1500 0.6811 0.3757 0.8432
0.1147 2000 0.5842 0.3250 0.8448
0.1434 2500 0.5269 0.3007 0.8472
0.1721 3000 0.4937 0.2855 0.8541
0.2008 3500 0.4717 0.2636 0.8510
0.2294 4000 0.4398 0.2596 0.8509
0.2581 4500 0.43 0.2507 0.8575
0.2868 5000 0.4094 0.2419 0.8566
0.3155 5500 0.3927 0.2349 0.8595
0.3442 6000 0.3904 0.2356 0.8568
0.3729 6500 0.3844 0.2275 0.8510
0.4015 7000 0.377 0.2220 0.8560
0.4302 7500 0.363 0.2235 0.8412
0.4589 8000 0.3616 0.2305 0.8531
0.4876 8500 0.3733 0.2306 0.8457
0.5163 9000 0.3675 0.2290 0.8460
0.5449 9500 0.358 0.2291 0.8459
0.5736 10000 0.3322 0.2218 0.8479
0.6023 10500 0.3376 0.2254 0.8339
0.6310 11000 0.3308 0.2140 0.8428
0.6597 11500 0.3475 0.2382 0.8339
0.6883 12000 0.3498 0.2172 0.8325
0.7170 12500 0.3266 0.2290 0.8479
0.7457 13000 0.3214 0.2297 0.8355
0.7744 13500 0.3237 0.2363 0.8325
0.8031 14000 0.3108 0.2334 0.8307
0.8318 14500 0.3143 0.3627 0.7954
0.8604 15000 0.3156 0.2238 0.8378
0.8891 15500 0.3204 0.2271 0.8390
0.9178 16000 0.314 0.2332 0.8349
0.9465 16500 0.3074 0.2277 0.8324
0.9752 17000 0.2937 0.2326 0.8274
1.0038 17500 0.2919 0.2350 0.8288
1.0325 18000 0.2483 0.2381 0.8367
1.0612 18500 0.2534 0.2397 0.8227
1.0899 19000 0.2699 0.2495 0.8221
1.1186 19500 0.2691 0.2468 0.8193
1.1472 20000 0.2843 0.2462 0.8346
1.1759 20500 0.2736 0.2387 0.8321
1.2046 21000 0.2728 0.2415 0.8364
1.2333 21500 0.2769 0.2483 0.8301
1.2620 22000 0.2633 0.2582 0.8340
1.2907 22500 0.2719 0.2484 0.8295
1.3193 23000 0.2787 0.2606 0.8297
1.3480 23500 0.2812 0.2595 0.8290
1.3767 24000 0.2868 0.2659 0.8208
1.4054 24500 0.2776 0.2520 0.8369
1.4341 25000 0.2772 0.2759 0.8307
1.4627 25500 0.2887 0.2735 0.8198
1.4914 26000 0.2892 0.2787 0.8367
1.5201 26500 0.2779 0.2612 0.8173
1.5488 27000 0.2791 0.2593 0.8230
1.5775 27500 0.2939 0.2678 0.8256
1.6061 28000 0.2808 0.2729 0.8241
1.6348 28500 0.2913 0.2700 0.8163
1.6635 29000 0.2919 0.2855 0.8315
1.6922 29500 0.284 0.2684 0.8338
1.7209 30000 0.2867 0.2703 0.8254
1.7496 30500 0.2781 0.2738 0.8186
1.7782 31000 0.2806 0.2621 0.8170
1.8069 31500 0.2859 0.2727 0.8197
1.8356 32000 0.2732 0.2716 0.8238
1.8643 32500 0.2797 0.2728 0.8178
1.8930 33000 0.2701 0.2715 0.8219
1.9216 33500 0.265 0.2638 0.8250
1.9503 34000 0.275 0.2660 0.8188
1.9790 34500 0.2684 0.2765 0.8112
2.0077 35000 0.2607 0.2648 0.8151
2.0364 35500 0.197 0.2673 0.8123
2.0650 36000 0.2075 0.2706 0.8129
2.0937 36500 0.2111 0.2647 0.8263
2.1224 37000 0.2202 0.2736 0.8133
2.1511 37500 0.2135 0.2640 0.8118
2.1798 38000 0.2229 0.2667 0.8166
2.2085 38500 0.209 0.2622 0.8090
2.2371 39000 0.2039 0.2639 0.8104
2.2658 39500 0.2113 0.2827 0.8235
2.2945 40000 0.2065 0.2698 0.8151
2.3232 40500 0.21 0.2593 0.8155
2.3519 41000 0.2083 0.2733 0.7975
2.3805 41500 0.231 0.2822 0.8088
2.4092 42000 0.2109 0.2667 0.8180
2.4379 42500 0.2006 0.2791 0.8071
2.4666 43000 0.2131 0.2747 0.8230
2.4953 43500 0.2101 0.2674 0.8165
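
For readers who want to inspect these logs programmatically, the following sketch parses whitespace-separated rows in the format above and picks the row with the highest sts-dev_spearman_cosine. The LogRow type and parse_rows helper are hypothetical names, and only a handful of rows from the table are included as sample input.

```python
from typing import List, NamedTuple


class LogRow(NamedTuple):
    epoch: float
    step: int
    train_loss: float
    val_loss: float
    spearman: float


def parse_rows(text: str) -> List[LogRow]:
    """Parse 'epoch step train_loss val_loss spearman' rows, skipping rows without losses."""
    rows = []
    for line in text.strip().splitlines():
        epoch, step, train_loss, val_loss, spearman = line.split()
        if train_loss == "-":  # the pre-training baseline row has no loss values
            continue
        rows.append(LogRow(float(epoch), int(step), float(train_loss), float(val_loss), float(spearman)))
    return rows


# A few rows copied from the table above.
sample = """\
-1 -1 - - 0.5941
0.0287 500 1.9263 0.7269 0.8006
0.3155 5500 0.3927 0.2349 0.8595
0.8318 14500 0.3143 0.3627 0.7954
2.4953 43500 0.2101 0.2674 0.8165
"""
rows = parse_rows(sample)
best = max(rows, key=lambda r: r.spearman)
print(best.step, best.spearman)  # 5500 0.8595
```

The same approach can be applied to the validation loss column by changing the key passed to max (or using min).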

Framework Versions

  • Python: 3.13.0
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.1
  • PyTorch: 2.9.1+cu128
  • Accelerate: 1.11.0
  • Datasets: 4.4.1
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 0.3B params (Safetensors, tensor type F32)

Model repository: sobamchan/bert-large-uncased-no-mrl