
Releases: UKPLab/sentence-transformers

v4.1.0 - ONNX and OpenVINO backends offering 2-3x speedups; improved hard negatives mining

15 Apr 13:47

This release introduces 2 new efficient computing backends for CrossEncoder (reranker) models: ONNX and OpenVINO + optimization & quantization, allowing for speedups up to 2x-3x; improved hard negatives mining strategies, and minor improvements.

Install this version with

# Training + Inference
pip install sentence-transformers[train]==4.1.0

# Inference only, use one of:
pip install sentence-transformers==4.1.0
pip install sentence-transformers[onnx-gpu]==4.1.0
pip install sentence-transformers[onnx]==4.1.0
pip install sentence-transformers[openvino]==4.1.0

Faster ONNX and OpenVINO Backends for CrossEncoder (#3319)

Introducing a new backend keyword argument to the CrossEncoder initialization, allowing values of "torch" (default), "onnx", and "openvino".
These require installing sentence-transformers with specific extras:

pip install sentence-transformers[onnx-gpu]
# or ONNX for CPU only:
pip install sentence-transformers[onnx]
# or
pip install sentence-transformers[openvino]

It's as simple as:

from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", backend="onnx")

query = "Which planet is known as the Red Planet?"
passages = [
   "Venus is often called Earth's twin because of its similar size and proximity.",
   "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
   "Jupiter, the largest planet in our solar system, has a prominent red spot.",
   "Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)

If you specify a backend and your model repository or directory contains an ONNX/OpenVINO model file, it will automatically be used! And if your model repository or directory doesn't have one already, an ONNX/OpenVINO model will be automatically exported. Just remember to model.push_to_hub or model.save_pretrained into the same model repository or directory to avoid having to re-export the model every time.
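
For example, a minimal sketch of this save-and-reload pattern (the local directory name is illustrative):

from sentence_transformers import CrossEncoder

# The first load exports an ONNX model if the repository doesn't contain one yet
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", backend="onnx")
model.save_pretrained("local/ms-marco-MiniLM-L6-v2-onnx")

# Subsequent loads from the saved directory reuse the exported ONNX file instead of re-exporting
model = CrossEncoder("local/ms-marco-MiniLM-L6-v2-onnx", backend="onnx")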

All keyword arguments passed via model_kwargs will be passed on to ORTModelForSequenceClassification.from_pretrained or OVModelForSequenceClassification.from_pretrained. The most useful arguments are:

  • provider: (Only if backend="onnx") ONNX Runtime provider to use for loading the model, e.g. "CPUExecutionProvider". See https://onnxruntime.ai/docs/execution-providers/ for possible providers. If not specified, the strongest available provider (e.g. "CUDAExecutionProvider") will be used.
  • file_name: The name of the ONNX file to load. If not specified, defaults to "model.onnx" or otherwise "onnx/model.onnx" for ONNX, and to "openvino_model.xml" or otherwise "openvino/openvino_model.xml" for OpenVINO. This argument is useful for specifying optimized or quantized models.
  • export: A boolean flag specifying whether the model will be exported. If not provided, export will be set to True if the model repository or directory does not already contain an ONNX or OpenVINO model.

For example:

from sentence_transformers import CrossEncoder

model = CrossEncoder(
    "cross-encoder/ms-marco-MiniLM-L6-v2",
    backend="onnx",
    model_kwargs={
        "file_name": "model_O3.onnx",
        "provider": "CPUExecutionProvider",
    }
)

query = "Which planet is known as the Red Planet?"
passages = [
   "Venus is often called Earth's twin because of its similar size and proximity.",
   "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
   "Jupiter, the largest planet in our solar system, has a prominent red spot.",
   "Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)

Benchmarks

We ran benchmarks for CPU and GPU, averaging findings across 4 models of various sizes, 3 datasets, and numerous batch sizes. The detailed findings and the resulting backend recommendations are shown in the benchmark figures in the Speeding up Inference documentation.

For GPU, you can expect a 1.88x speedup with fp16 at no cost, and for CPU you can expect a ~3x speedup with no loss of accuracy in our evaluation. Your mileage with the accuracy hit from quantization may vary, but it seems to remain very small.

Read the Speeding up Inference documentation for more details.

ONNX & OpenVINO Optimization and Quantization

In addition to exporting default ONNX and OpenVINO models, you can also use one of the helper methods for optimizing and quantizing ONNX models:

ONNX Optimization

export_optimized_onnx_model: This function uses Optimum to implement several optimizations in the ONNX model, ranging from basic optimizations to approximations and mixed precision. Read about the 4 default options here. This function accepts:

  • model: A SentenceTransformer or CrossEncoder model loaded with backend="onnx".
  • optimization_config: "O1", "O2", "O3", or "O4" from 🤗 Optimum or a custom OptimizationConfig instance.
  • model_name_or_path: The directory or model repository where the optimized model will be saved.
  • push_to_hub: Whether to push the exported model to the hub with model_name_or_path as the repository name. If False, the model will be saved in the directory specified with model_name_or_path.
  • create_pr: If push_to_hub, then this denotes whether a pull request is created rather than pushing the model directly to the repository. Very useful for optimizing models of repositories that you don't have write access to.
  • file_suffix: The suffix to add to the optimized model file name. Will use the optimization_config string or "optimized" if not set.

The usage is like this:

from sentence_transformers import CrossEncoder, export_optimized_onnx_model

onnx_model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", backend="onnx")
export_optimized_onnx_model(
	model=onnx_model,
	optimization_config="O4",
	model_name_or_path="cross-encoder/ms-marco-MiniLM-L6-v2",
	push_to_hub=True,
	create_pr=True,
)

After which you can load the model with:

from sentence_transformers import CrossEncoder

pull_request_nr = 2 # TODO: Update this to the number of your pull request
model = CrossEncoder(
    "cross-encoder/ms-marco-MiniLM-L6-v2",
    backend="onnx",
    model_kwargs={"file_name": "onnx/model_O4.onnx"},
    revision=f"refs/pr/{pull_request_nr}"
)

or when it gets merged:

from sentence_transformers import CrossEncoder

model = CrossEncoder(
    "cross-encoder/ms-marco-MiniLM-L6-v2",
    backend="onnx",
    model_kwargs={"file_name": "onnx/model_O4.onnx"},
)

ONNX Quantization

export_dynamic_quantized_onnx_model: This function uses Optimum to quantize the ONNX model to int8, also allowing for hardware-specific optimizations. This results in impressive speedups for CPUs. In my findings, each of the default quantization configuration options gave approximately the same performance improvements. This function accepts:

  • model: A SentenceTransformer or CrossEncoder model loaded with backend="onnx".
  • quantization_config: "arm64", "avx2", "avx512", or "avx512_vnni" representing quantization configurations from AutoQuantizationConfig, or a QuantizationConfig instance.
  • model_name_or_path: The directory or model repository where the optimized model will be saved.
  • push_to_hub: Whether to push the exported model to the hub with model_name_or_path as the repository name. If False, the model will be saved in the directory specified with model_name_or_path.
  • create_pr: If push_to_hub, then this denotes whether a pull request is created rather than pushing the model directly to the repository. Very useful for quantizing models of repositories that you don't have write access to.
  • file_suffix: The suffix to add to the optimized model file name. Will use the quantization_config string or e.g. "int8_quantized" if not set.

The usage is like this:

from sentence_transformers import CrossEncoder, export_dynamic_quantized_onnx_model

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", backend="onnx")
export_dynamic_quantized_onnx_model(
...
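
For reference, a minimal sketch of a complete call based on the parameters listed above (the quantization config and target repository are illustrative):

from sentence_transformers import CrossEncoder, export_dynamic_quantized_onnx_model

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", backend="onnx")
export_dynamic_quantized_onnx_model(
    model=model,
    quantization_config="avx512_vnni",
    model_name_or_path="cross-encoder/ms-marco-MiniLM-L6-v2",
    push_to_hub=True,
    create_pr=True,
)
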
Read more

v4.0.2 - Safer reranker max sequence length logic, typing issues, FSDP & device placement

03 Apr 11:29

This patch release updates some logic for maximum sequence lengths, typing issues, FSDP training, and distributed training device placement.

Install this version with

# Training + Inference
pip install sentence-transformers[train]==4.0.2

# Inference only, use one of:
pip install sentence-transformers==4.0.2
pip install sentence-transformers[onnx-gpu]==4.0.2
pip install sentence-transformers[onnx]==4.0.2
pip install sentence-transformers[openvino]==4.0.2

Safer CrossEncoder (reranker) maximum sequence length

When loading CrossEncoder models, we now rely on the minimum of the tokenizer model_max_length and the config max_position_embeddings (if they exist), rather than only relying on the latter if it exists. This previously resulted in the maximum sequence length of BAAI/bge-reranker-base being 514, whereas it can only handle sequences up to 512 tokens.

from sentence_transformers import CrossEncoder

model = CrossEncoder("BAAI/bge-reranker-base")
print(model.max_length)
# => 512

# The texts for which to predict similarity scores
query = "How many people live in Berlin?"
passages = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "In 2013 around 600,000 Berliners were registered in one of the more than 2,300 sport and fitness clubs.",
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)
# => [0.99953485 0.01062613]

# Or test really long inputs to ensure that there's no crash:
score = model.predict([["one " * 1000, "two " * 1000]])
print(score)
# => [0.95482624]

Note that you can use the activation_fn option with torch.nn.Identity() to avoid the default Sigmoid that maps everything to [0, 1]:

from sentence_transformers import CrossEncoder
import torch

model = CrossEncoder("BAAI/bge-reranker-base", activation_fn=torch.nn.Identity())

# The texts for which to predict similarity scores
query = "How many people live in Berlin?"
passages = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "In 2013 around 600,000 Berliners were registered in one of the more than 2,300 sport and fitness clubs.",
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)
# => [ 7.672551  -4.5337563]

Default device placement (#3303)

By default, in a distributed training setup with multiple CUDA devices, the model is now placed on the CUDA device corresponding to the process's local rank. This should lower the VRAM usage on GPU 0 when performing distributed training.
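
A minimal sketch of what this looks like in practice, assuming a multi-GPU machine and a standard launcher such as torchrun --nproc_per_node=2 train.py:

import os

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
# Each process now places the model on the CUDA device matching its local rank,
# rather than every process defaulting to GPU 0.
print(f"Local rank {os.environ.get('LOCAL_RANK', '0')} -> {model.device}")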

Minor patches of note

  • Resolved typing issues for the SentenceTransformer class outside of the encode method. In v4.0.1, your IDE could no longer provide hints for e.g. model.similarity. (#3297)
  • Improve FSDP training compatibility by avoiding a faulty "only if model is wrapped" check. Now, the wrapped model should always be placed in the loss class instance when required for FSDP training. (#3295)

All Changes

  • [docs]: update examples by @emmanuel-ferdman in #3292
  • Update htaccess, in-line comments were problematic by @tomaarsen in #3293
  • [docs] Resolve more broken links throughout the docs by @tomaarsen in #3294
  • [docs] Fix some broken docs redirects by @tomaarsen in #3296
  • [typing] Move encode typings back to .py from .pyi by @tomaarsen in #3297
  • [fix] Avoid "Only if model is wrapped" check which is faulty for FSDP by @tomaarsen in #3295
  • [cross-encoder] Set the tokenizer model_max_length to the min. of model_max_length & max_pos_embeds by @tomaarsen in #3304
  • [ci] Attempt to fix CI by @tomaarsen in #3305
  • Fix device assignment in get_device_name for distributed training by @uminaty in #3303
  • [docs] Add missing docstring for push_to_hub by @tomaarsen in #3306
  • [docs] Specify that exported ONNX/OpenVINO models don't include pooling/normalization by @tomaarsen in #3307

Full Changelog: v4.0.1...v4.0.2

v4.0.1 - Reranker (Cross Encoder) Training Refactor; new losses, docs, examples, etc.

26 Mar 14:59

This release consists of a major refactor that overhauls the reranker a.k.a. Cross Encoder training approach (introducing multi-gpu training, bf16, loss logging, callbacks, and much more), including all new Training Overview, Loss Overview, API Reference docs, training examples and more!

Install this version with

# Training + Inference
pip install sentence-transformers[train]==4.0.1

# Inference only, use one of:
pip install sentence-transformers==4.0.1
pip install sentence-transformers[onnx-gpu]==4.0.1
pip install sentence-transformers[onnx]==4.0.1
pip install sentence-transformers[openvino]==4.0.1

Tip

My Training and Finetuning Reranker Models with Sentence Transformers v4 blogpost is an excellent place to learn 1) why finetuning rerankers makes sense and 2) how you can do it, too!

Reranker (Cross Encoder) training refactor (#3222)

The v4.0 release centers around this huge modernization of the training approach for CrossEncoder models, following v3.0 which introduced the same for SentenceTransformer models. Whereas training before v4.0 used to be all about InputExample, DataLoader and model.fit, the new training approach relies on 5 components. You can learn more about these components in our Training and Finetuning Reranker Models with Sentence Transformers v4 blogpost. Additionally, you can read the new Training Overview, check out the Training Examples, or read this summary:

  1. Dataset
    A training Dataset or DatasetDict. This class is much more suited for sharing & efficient modifications than lists/DataLoaders of InputExample instances. A Dataset can contain multiple text columns that will be fed in order to the corresponding loss function. So, if the loss expects (anchor, positive, negative) triplets, then your dataset should also have 3 columns. The names of these columns are irrelevant. If there is a "label" or "score" column, it is treated separately, and used as the labels during training.
    A DatasetDict can be used to train with multiple datasets at once, e.g.:
    DatasetDict({
        natural_questions: Dataset({
            features: ['anchor', 'positive'],
            num_rows: 392702
        })
        gooaq: Dataset({
            features: ['anchor', 'positive', 'negative'],
            num_rows: 549367
        })
        stsb: Dataset({
            features: ['sentence1', 'sentence2', 'label'],
            num_rows: 5749
        })
    })
    When a DatasetDict is used, the loss parameter to the CrossEncoderTrainer must also be a dictionary with these dataset keys, e.g.:
    {
        'natural_questions': CachedMultipleNegativesRankingLoss(...),
        'gooaq': CachedMultipleNegativesRankingLoss(...),
        'stsb': BinaryCrossEntropyLoss(...),
    }
  2. Loss Function
    A loss function, or a dictionary of loss functions like described above.
  3. Training Arguments
    A CrossEncoderTrainingArguments instance, subclass of a TrainingArguments instance. This powerful class controls the specific details of the training.
  4. Evaluator
    An optional SentenceEvaluator instance. Unlike before, models can now be evaluated both on an evaluation dataset with some loss function and/or a SentenceEvaluator instance.
  5. Trainer
    The new CrossEncoderTrainer instance based on the transformers Trainer. This instance can be initialized with a CrossEncoder model, a CrossEncoderTrainingArguments class, a SentenceEvaluator, a training and evaluation Dataset/DatasetDict and a loss function/dict of loss functions. Most of these parameters are optional. Once provided, all you have to do is call trainer.train().

Some of the major features that are now implemented include:

  • MultiGPU Training (Data Parallelism (DP) and Distributed Data Parallelism (DDP))
  • bf16 training support
  • Loss logging
  • Evaluation datasets + evaluation loss
  • Improved callback support (built-in via Weights and Biases, TensorBoard, CodeCarbon, etc., as well as custom callbacks)
  • Gradient checkpointing
  • Gradient accumulation
  • Improved model card generation
  • Warmup ratio
  • Pushing to the Hugging Face Hub on every model checkpoint
  • Resuming from a training checkpoint
  • Hyperparameter Optimization

This script is a minimal example (no evaluator, no training arguments) of training mpnet-base on a part of the sentence-transformers/hotpotqa dataset using BinaryCrossEntropyLoss:

from datasets import load_dataset
from sentence_transformers import CrossEncoder, CrossEncoderTrainer
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

# 1. Define the model. Either from scratch or by loading a pre-trained model
model = CrossEncoder("microsoft/mpnet-base")

# 2. Load a dataset to finetune on
dataset = load_dataset("sentence-transformers/hotpotqa", "triplet", split="train")

def triplet_to_labeled_pair(batch):
    anchors = batch["anchor"]
    positives = batch["positive"]
    negatives = batch["negative"]
    return {
        "sentence_A": anchors * 2,
        "sentence_B": positives + negatives,
        "labels": [1] * len(positives) + [0] * len(negatives),
    }

dataset = dataset.map(triplet_to_labeled_pair, batched=True, remove_columns=dataset.column_names)
train_dataset = dataset.select(range(10_000))
eval_dataset = dataset.select(range(10_000, 11_000))

# 3. Define a loss function
loss = BinaryCrossEntropyLoss(model)

# 4. Create a trainer & train
trainer = CrossEncoderTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()

# 5. Save the trained model
model.save_pretrained("models/mpnet-base-hotpotqa")
# model.push_to_hub("mpnet-base-hotpotqa")

Additionally, trained models now automatically produce extensive model cards. Each of the following models were trained using some script from the Training Examples, and the model cards were not edited manually whatsoever:

Prior to the Sentence Transformers v4 release, all reranker models would be trained using the CrossEncoder.fit method. Rather than deprecating this method, starting from v4.0, it will use the CrossEncoderTrainer behind the scenes. This means that your old training code should still work, and will even be upgraded with new features such as multi-GPU training, loss logging, etc. That said, the new training approach is much more powerful, so it is recommended to write new training scripts using the new approach.
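
For illustration, a minimal sketch of the old-style CrossEncoder.fit flow that keeps working on top of the new trainer (the texts and labels are illustrative):

from torch.utils.data import DataLoader
from sentence_transformers import CrossEncoder, InputExample

model = CrossEncoder("microsoft/mpnet-base", num_labels=1)
train_samples = [
    InputExample(texts=["query one", "a relevant passage"], label=1.0),
    InputExample(texts=["query one", "an unrelated passage"], label=0.0),
]
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=2)

# Still supported in v4.0+, but now runs via CrossEncoderTrainer behind the scenes
model.fit(train_dataloader=train_dataloader, epochs=1, warmup_steps=10)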

To help you out, all of the Cross Encoder (a.k.a. reranker) training scripts were updated to use the new Trainer-based approach.

Is finetuning worth it?

Finetuning reranker models on your data is very valuable. Consider for example these 2 models that I finetuned on 100k samples from the GooAQ dataset in 30 minutes and 1 hour, respectively. After finetuning, my models heavily outperformed general-purpose reranker models, even though GooAQ is a very generic dataset/domain!

Model size vs NDCG for Rerankers on GooAQ

Read my Training and Finetuning Reranker Models with Sentence Transformers v4 blogpost for many more details on these models and how they were trained.

Resources:

  • How to use Cross Encoder models? [Cross Encoder > Usage](ht...
Read more

v3.4.1 - Model2Vec compatibility & offline model fix

29 Jan 14:26

This release introduces a convenient compatibility with Model2Vec models, and fixes a bug that caused an outgoing request even when using a local model.

Install this version with

# Training + Inference
pip install sentence-transformers[train]==3.4.1

# Inference only, use one of:
pip install sentence-transformers==3.4.1
pip install sentence-transformers[onnx-gpu]==3.4.1
pip install sentence-transformers[onnx]==3.4.1
pip install sentence-transformers[openvino]==3.4.1

Full Model2Vec integration

This release introduces support to load an efficient Model2Vec embedding model directly in Sentence Transformers:

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer(
    "minishlab/potion-base-8M",
    device="cpu",
)

# Run inference
sentences = [
    'Gadofosveset-enhanced MR angiography of carotid arteries: does steady-state imaging improve accuracy of first-pass imaging?',
    'To evaluate the diagnostic accuracy of gadofosveset-enhanced magnetic resonance (MR) angiography in the assessment of carotid artery stenosis, with digital subtraction angiography (DSA) as the reference standard, and to determine the value of reading first-pass, steady-state, and "combined" (first-pass plus steady-state) MR angiograms.',
    'In a longitudinal study we investigated in vivo alterations of CVO during neuroinflammation, applying Gadofluorine M- (Gf) enhanced magnetic resonance imaging (MRI) in experimental autoimmune encephalomyelitis, an animal model of multiple sclerosis. SJL/J mice were monitored by Gadopentate dimeglumine- (Gd-DTPA) and Gf-enhanced MRI after adoptive transfer of proteolipid-protein-specific T cells. Mean Gf intensity ratios were calculated individually for different CVO and correlated to the clinical disease course. Subsequently, the tissue distribution of fluorescence-labeled Gf as well as the extent of cellular inflammation was assessed in corresponding histological slices.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings[0], embeddings[1:])
print(similarities)
# tensor([[0.8085, 0.4884]])

Previously, loading a Model2Vec model required you to load a `StaticEmbedding` module:

from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding

# Download from the 🤗 Hub
module = StaticEmbedding.from_model2vec("minishlab/potion-base-8M")
model = SentenceTransformer(modules=[module], device="cpu")

# Run inference
sentences = [
    'Gadofosveset-enhanced MR angiography of carotid arteries: does steady-state imaging improve accuracy of first-pass imaging?',
    'To evaluate the diagnostic accuracy of gadofosveset-enhanced magnetic resonance (MR) angiography in the assessment of carotid artery stenosis, with digital subtraction angiography (DSA) as the reference standard, and to determine the value of reading first-pass, steady-state, and "combined" (first-pass plus steady-state) MR angiograms.',
    'In a longitudinal study we investigated in vivo alterations of CVO during neuroinflammation, applying Gadofluorine M- (Gf) enhanced magnetic resonance imaging (MRI) in experimental autoimmune encephalomyelitis, an animal model of multiple sclerosis. SJL/J mice were monitored by Gadopentate dimeglumine- (Gd-DTPA) and Gf-enhanced MRI after adoptive transfer of proteolipid-protein-specific T cells. Mean Gf intensity ratios were calculated individually for different CVO and correlated to the clinical disease course. Subsequently, the tissue distribution of fluorescence-labeled Gf as well as the extent of cellular inflammation was assessed in corresponding histological slices.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings[0], embeddings[1:])
print(similarities)
# tensor([[0.8085, 0.4884]])

Model2Vec was the inspiration for the recent Static Embedding work; all of these models can be used to approach the performance of normal transformer-based embedding models at a fraction of the latency. For example, both Model2Vec and Static Embedding models are ~25x faster than tiny embedding models on a GPU and ~400x faster than those models on a CPU.

Bug Fix

  • Using local_files_only=True still triggered a request to Hugging Face for the model card metadata; this has been resolved in #3202.

Full Changelog: v3.4.0...v3.4.1

v3.4.0 - Resolved memory leak when deleting a model & trainer; add Matryoshka & Cached loss compatibility; small features & bug fixes

23 Jan 15:21

This release resolves a memory leak when deleting a model & trainer, adds compatibility between the Cached... losses and the Matryoshka loss modifier, resolves numerous bugs, and adds several small features.

Install this version with

# Training + Inference
pip install sentence-transformers[train]==3.4.0

# Inference only, use one of:
pip install sentence-transformers==3.4.0
pip install sentence-transformers[onnx-gpu]==3.4.0
pip install sentence-transformers[onnx]==3.4.0
pip install sentence-transformers[openvino]==3.4.0

Matryoshka & Cached loss compatibility (#3068, #3107)

It is now possible to combine the strong Cached losses (CachedMultipleNegativesRankingLoss, CachedGISTEmbedLoss, CachedMultipleNegativesSymmetricRankingLoss) with the Matryoshka loss modifier:

from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses
from datasets import Dataset

model = SentenceTransformer("microsoft/mpnet-base")
train_dataset = Dataset.from_dict({
    "anchor": ["It's nice weather outside today.", "He drove to work."],
    "positive": ["It's so sunny.", "He took the car to the office."],
})
loss = losses.CachedMultipleNegativesRankingLoss(model, mini_batch_size=16)
loss = losses.MatryoshkaLoss(model, loss, [768, 512, 256, 128, 64])

trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()

See for example tomaarsen/mpnet-base-gooaq-cmnrl-mrl which was trained with CachedMultipleNegativesRankingLoss (CMNRL) with the Matryoshka loss modifier (MRL).

Resolve memory leak when Model and Trainer are reinitialized (#3144)

Due to a circular dependency (SentenceTransformerTrainer -> SentenceTransformer -> SentenceTransformerModelCardData -> SentenceTransformerTrainer), deleting the trainer and model did not free them via garbage collection. I've moved a lot of components around, and now SentenceTransformerModelCardData no longer needs to store the SentenceTransformerTrainer, breaking the cycle.

We ran the seed optimization script (which frequently creates and deletes models and trainers):

  • Before: approximate highest recorded VRAM: 16332MiB / 24576MiB
  • After: approximate highest recorded VRAM: 8222MiB / 24576MiB


Small Features

  • Add Matthews Correlation Coefficient to the BinaryClassificationEvaluator in #3051.
  • Add a triplet margin parameter to the TripletEvaluator in #2862.
  • Put dataset information in the automatically generated model card in "expanding sections" blocks if there are many datasets in #3088.
  • Add multi-GPU (and CPU multi-process) support for mine_hard_negatives in #2967.

Notable Bug Fixes

  • Subsequent batches were identical when using the no_duplicates Batch Sampler (#3069). This has been resolved in #3073
  • The old-style model.fit() training with write_csv on an evaluator would crash (#3062). This has been resolved in #3066.
  • The output types of some evaluators were np.float instead of float (#3075). This has been resolved in #3076 and #3096.
  • It was not possible to specify a revision or cache_dir when loading a PEFT Adapter model (#3061). This has been resolved in #3079 and #3174.
  • The CrossEncoder was lazily placed on the incorrect device and did not respond to model.to (#3078). This has been resolved in #3104.
  • If a model used a custom module with custom kwargs, those kwargs keys were not saved in modules.json correctly, e.g. relevant for jina-embeddings-v3 (#3111). This has been resolved in #3112.
  • HfArgumentParser(SentenceTransformerTrainingArguments) would crash due to prompts typing (#3090). This has been resolved in #3178.

Example Updates

  • Update the quantization script in #3070.
  • Update the seed optimization script in #3092.
  • Update the TSDAE scripts in #3137.
  • Add PEFT Adapter script in #3180.

All Changes

  • [training] Pass steps/epoch/output_path to Evaluator during training by @tomaarsen in #3066
  • [examples] Update the quantization script by @tomaarsen in #3070
  • [fix] Fix different batches per epoch in NoDuplicatesBatchSampler by @tomaarsen in #3073
  • [docs] Add links to backend-export in Speeding up Inference by @tomaarsen in #3071
  • add MCC to BinaryClassificationEvaluator by @JINO-ROHIT in #3051
  • support cached losses in combination with matryoshka loss by @Marcel256 in #3068
  • align model_card_templates.py with code by @amitport in #3081
  • converting np float result to float in binary classification evaluator by @JINO-ROHIT in #3076
  • Add triplet margin for distance functions in TripletEvaluator by @zivicmilos in #2862
  • [model_card] Keep the model card readable even with many datasets by @tomaarsen in #3088
  • [docs] Add NanoBEIR to the Training Overview evaluators by @tomaarsen in #3089
  • [fix] revision of the adapter model can now be specified. by @pesuchin in #3079
  • [docs] Update from Sphinx==3.5.4 to 8.1.3, recommonmark -> myst-parser by @tomaarsen in #3099
  • normalize to float in NanoBEIREvaluator, InformationRetrievalEvaluator, MSEEvaluator by @JINO-ROHIT in #3096
  • [docs] List 'prompts' as a key training argument by @tomaarsen in #3101
  • revert float type cast manually in BinaryClassificationEvaluator by @JINO-ROHIT in #3102
  • update train_sts_seed_optimization with SentenceTransformerTrainer by @JINO-ROHIT in #3092
  • Fix cross encoder device issue by @susnato in #3104
  • [enhancement] Make MultipleNegativesRankingLoss easier to understand by @tomaarsen in #3100
  • [fix] Fix breaking change in PyLate when loading modules by @tomaarsen in #3110
  • multi-GPU support for mine_hard_negatives by @alperctnkaya in #2967
  • raises error when dataset is an empty list in NanoBEIREvaluator by @JINO-ROHIT in #3122
  • Added a note to the documentation stating that the similarity method does not support embeddings other than non-quantized ones. by @pesuchin in #3131
  • [typo] Add missing space between sentences in error message by @tomaarsen in #3125
  • raises ValueError when num_label !=1 when using Crossencoder.rank() by @JINO-ROHIT in #3126
  • fix backward pass for cached losses by @Marcel256 in #3114
  • Adding evaluation checks to prevent Transformer ValueError by @stsfaroz in #3105
  • [typo] Fix incorrect spelling for "corpus" by @ignasgr in #3154
  • [fix] Save custom module kwargs if specified by @tomaarsen in #3112
  • [memory] Avoid storing trainer in ModelCardCallback and SentenceTransformerModelCardData by @tomaarsen in #3144
  • Suport for embedded representation by @Radu1999 in #3156
  • [DRAFT] tests for nanobeir evaluator by @JINO-ROHIT in #3127
  • Update TSDAE examples with SentenceTransformerTrainer by @JINO-ROHIT in #3137
  • [docs] Update the Static Embedding example snippet by @tomaarsen in #3177
  • fix: propagate cache dir to find adapter by @lauralehoczki11 in #3174
  • [fix] Use HfArgumentParser-compatible typing for prompts by @tomaarsen in #3178
  • testcases for community detection by @JINO-ROHIT in #3163
    ...
Read more

v3.3.1 - Patch private model loading without environment variable

18 Nov 14:38

This patch release fixes a small issue with loading private models from Hugging Face using the token argument.

Install this version with

# Training + Inference
pip install sentence-transformers[train]==3.3.1

# Inference only, use one of:
pip install sentence-transformers==3.3.1
pip install sentence-transformers[onnx-gpu]==3.3.1
pip install sentence-transformers[onnx]==3.3.1
pip install sentence-transformers[openvino]==3.3.1

Details

If you're loading a model under this scenario:

  • Your model is hosted on Hugging Face.
  • Your model is private.
  • You haven't set the HF_TOKEN environment variable via huggingface-cli login or some other approach.
  • You're passing the token argument to SentenceTransformer to load the model.

Then you may have encountered a crash in v3.3.0. This should be resolved now.
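
For example, a minimal sketch of this scenario (the repository name and token are placeholders):

from sentence_transformers import SentenceTransformer

# Loading a private model by passing the token directly, without HF_TOKEN being set
model = SentenceTransformer("your-username/your-private-model", token="hf_...")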

All Changes

  • [docs] Fix the prompt link to the training script by @tomaarsen in #3060
  • [Fix] Resolve loading private Transformer model in version 3.3.0 by @pesuchin in #3058

Full Changelog: v3.3.0...v3.3.1

v3.3.0 - Massive CPU speedup with OpenVINO int8 quantization; Training with Prompts for stronger models; NanoBEIR IR evaluation; PEFT compatibility; Transformers v4.46.0 compatibility

11 Nov 12:02

4x speedup for CPU with OpenVINO int8 static quantization, training with prompts for a free performance boost, convenient evaluation on NanoBEIR (a subset of a strong Information Retrieval benchmark), PEFT compatibility by easily adding/loading adapters, Transformers v4.46.0 compatibility, and Python 3.8 deprecation.

Install this version with:

# Training + Inference
pip install sentence-transformers[train]==3.3.0

# Inference only, use one of:
pip install sentence-transformers==3.3.0
pip install sentence-transformers[onnx-gpu]==3.3.0
pip install sentence-transformers[onnx]==3.3.0
pip install sentence-transformers[openvino]==3.3.0

OpenVINO int8 static quantization (#3025)

We introduce int8 static quantization using OpenVINO, a highly performant solution that outperforms all other current backends by a mile, at a minimal loss in accuracy. See the Speeding up Inference documentation for the updated benchmarks.

Quantizing directly to the Hugging Face Hub

from sentence_transformers import SentenceTransformer, export_static_quantized_openvino_model

# 1. Load a model with the OpenVINO backend
model = SentenceTransformer("all-MiniLM-L6-v2", backend="openvino")

# 2. Quantize the model to int8, push the model to https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
# as a pull request:
export_static_quantized_openvino_model(
    model,
    quantization_config=None,
    model_name_or_path="sentence-transformers/all-MiniLM-L6-v2",
    push_to_hub=True,
    create_pr=True,
)

You can immediately use the model, even before it's merged, by using the revision argument:

from sentence_transformers import SentenceTransformer

pull_request_nr = 2 # TODO: Update this to the number of your pull request
model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="openvino",
    model_kwargs={"file_name": "openvino_model_qint8_quantized.xml"},
    revision=f"refs/pr/{pull_request_nr}"
)

And once it's merged:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="openvino",
    model_kwargs={"file_name": "openvino/openvino_model_qint8_quantized.xml"},
)

Quantizing locally

You can also quantize a model and save it locally:

from sentence_transformers import SentenceTransformer, export_static_quantized_openvino_model
from optimum.intel import OVQuantizationConfig

model = SentenceTransformer("all-mpnet-base-v2", backend="openvino")
model.save_pretrained("path/to/all-mpnet-base-v2-local")
quantization_config = OVQuantizationConfig() # <- You can update settings here
export_static_quantized_openvino_model(model, quantization_config, "path/to/all-mpnet-base-v2-local")

And after quantizing, you can load it like so:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "path/to/all-mpnet-base-v2-local",
    backend="openvino",
    model_kwargs={"file_name": "openvino_model_qint8_quantized.xml"},
)

All original Sentence Transformer models already have these new openvino_model_qint8_quantized.xml files, so you can load them directly without having to export them yourself! I would recommend making pull requests for other models on Hugging Face that you'd like to see quantized.

Learn more about how to Speed up Inference in the documentation: https://sbert.net/docs/sentence_transformer/usage/efficiency.html

Training with Prompts (#2964)

Many modern embedding models are trained with "instructions" or "prompts" following the INSTRUCTOR paper. These prompts are strings, prefixed to each text to be embedded, allowing the model to distinguish between different types of text.

For example, the mixedbread-ai/mxbai-embed-large-v1 model was trained with Represent this sentence for searching relevant passages: as the prompt for all queries. This prompt is stored in the model configuration under the prompt name "query", so users can specify that prompt_name in model.encode:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
query_embedding = model.encode("What are Pandas?", prompt_name="query")
# or
# query_embedding = model.encode("What are Pandas?", prompt="Represent this sentence for searching relevant passages: ")
document_embeddings = model.encode([
    "Pandas is a software library written for the Python programming language for data manipulation and analysis.",
    "Pandas are a species of bear native to South Central China. They are also known as the giant panda or simply panda.",
    "Koala bears are not actually bears, they are marsupials native to Australia.",
])
similarity = model.similarity(query_embedding, document_embeddings)
print(similarity)
# => tensor([[0.7594, 0.7560, 0.4674]])

Various papers (INSTRUCTOR, BGE) show that including prompts or instructions both during training and inference results in stronger performance. As of this release, it's now possible to easily train with prompts in Sentence Transformers with just one extra training argument: prompts. There are 4 accepted formats for it:

  1. str: A single prompt to use for all columns in all datasets. For example:
    args = SentenceTransformerTrainingArguments(
        ...,
        prompts="text: ",
        ...,
    )
  2. Dict[str, str]: A dictionary mapping column names to prompts, applied to all datasets. For example:
    args = SentenceTransformerTrainingArguments(
        ...,
        prompts={
            "query": "query: ",
            "answer": "document: ",
        },
        ...,
    )
  3. Dict[str, str]: A dictionary mapping dataset names to prompts. This should only be used if your training/evaluation/test datasets are a DatasetDict or a dictionary of Dataset. For example:
    args = SentenceTransformerTrainingArguments(
        ...,
        prompts={
            "stsb": "Represent this text for semantic similarity search: ",
            "nq": "Represent this text for retrieval: ",
        },
        ...,
    )
  4. Dict[str, Dict[str, str]]: A dictionary mapping dataset names to dictionaries mapping column names to prompts. This should only be used if your training/evaluation/test datasets are a DatasetDict or a dictionary of Dataset. For example:
    args = SentenceTransformerTrainingArguments(
        ...,
        prompts={
            "stsb": {
                "sentence1": "sts: ",
                "sentence2": "sts: ",
            },
            "nq": {
                "query": "query: ",
                "document": "document: ",
            },
        },
        ...,
    )
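
Putting this together, a minimal end-to-end sketch using per-column prompts (format 2 above); the model, dataset, columns, and output directory are illustrative:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("microsoft/mpnet-base")
train_dataset = Dataset.from_dict({
    "query": ["what is a panda?", "where is berlin?"],
    "answer": ["The giant panda is a bear native to China.", "Berlin is the capital of Germany."],
})
loss = losses.MultipleNegativesRankingLoss(model)

# One prompt per column name; the prompts are prepended to each text during training
args = SentenceTransformerTrainingArguments(
    output_dir="models/mpnet-base-prompts",
    prompts={"query": "query: ", "answer": "document: "},
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()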

I've trained models with and without prompts for 2 base models: mpnet-base and bert-base-uncased:

For both base models, the model with prompts consistently outperformed the baseline model. After training, the models with prompts resulted in a 0.66% and 0.90% relative improvement on NDCG@10 at no extra cost.

(Figures: NanoBEIR results for the mpnet-base and bert-base-uncased test models, trained with and without prompts.)

NanoBEIR Evaluator integration (#2966)

This update introduces a new, simple NanoBEIREvaluator, which evaluates your model against NanoBEIR: a collection of subsets of the 13 BEIR datasets. BEIR corresponds to the retrieval tab of MTEB, and is commonly seen as a valuable indicator of general-purpose information retrieval performance.

With the NanoBEIREvaluator, you can easily evaluate your models on a much faster benchmark that should give similar insights in performance...
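
A minimal sketch of running it (by default it evaluates on all NanoBEIR datasets; the model is illustrative):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

model = SentenceTransformer("all-MiniLM-L6-v2")
evaluator = NanoBEIREvaluator()
results = evaluator(model)
print(results)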

Read more

v3.2.1 - Patch CLIP loading, small ONNX fix, compatibility with other libraries

21 Oct 12:19

This patch release fixes some small bugs, such as related to loading CLIP models, automatic model card generation issues, and ensuring compatibility with third party libraries.

Install this version with

# Training + Inference
pip install sentence-transformers[train]==3.2.1

# Inference only, use one of:
pip install sentence-transformers==3.2.1
pip install sentence-transformers[onnx-gpu]==3.2.1
pip install sentence-transformers[onnx]==3.2.1
pip install sentence-transformers[openvino]==3.2.1

Fix for loading non-Transformer models

In v3.2.0, a non-Transformer based model (e.g. CLIP) would not load correctly if the model was saved in the root of the model repository/directory. This has been resolved in #3007.

Throw error if StaticEmbedding-based model is finetuned with incompatible losses

The following losses are not compatible with StaticEmbedding-based models:

  • CachedGISTEmbedLoss
  • CachedMultipleNegativesRankingLoss
  • CachedMultipleNegativesSymmetricRankingLoss
  • DenoisingAutoEncoderLoss
  • GISTEmbedLoss

An error is now thrown when one of these is used with a StaticEmbedding-based model. I recommend using MultipleNegativesRankingLoss to finetune these models, e.g. as in https://huggingface.co/tomaarsen/static-bert-uncased-gooaq.
Note: to get good performance, you must use much higher learning rates than otherwise. In my experiments, 2e-1 worked well.
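
A minimal sketch of such a finetuning run, assuming a fresh StaticEmbedding model (the tokenizer, dataset, and output directory are illustrative; the 2e-1 learning rate follows the note above):

from datasets import Dataset
from tokenizers import Tokenizer
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.models import StaticEmbedding

static_embedding = StaticEmbedding(Tokenizer.from_pretrained("bert-base-uncased"), embedding_dim=1024)
model = SentenceTransformer(modules=[static_embedding])

train_dataset = Dataset.from_dict({
    "anchor": ["what is a panda?", "where is berlin?"],
    "positive": ["The giant panda is a bear native to China.", "Berlin is the capital of Germany."],
})
loss = losses.MultipleNegativesRankingLoss(model)  # compatible with StaticEmbedding

args = SentenceTransformerTrainingArguments(
    output_dir="models/static-embedding-example",
    learning_rate=2e-1,  # much higher than usual, per the note above
)

trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
trainer.train()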

Patch ONNX model when the model uses output_hidden_states

For example, this script used to fail, but passes now:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "distiluse-base-multilingual-cased",
    backend="onnx",
    model_kwargs={"provider": "CPUExecutionProvider"},
)

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
print(embeddings.shape)

All changes

  • Bump optimum version by @echarlaix in #2984
  • [docs] Update the training snippets for some losses that should use the v3 Trainer by @tomaarsen in #2987
  • [enh] Throw error if StaticEmbedding-based model is trained with incompatible loss by @tomaarsen in #2990
  • [fix] Fix semantic_search_usearch with 'binary' by @tomaarsen in #2989
  • [enh] Add support for large_string in model card create by @yaohwang in #2999
  • [model cards] Prevent crash on generating widgets if dataset column is empty by @tomaarsen in #2997
  • [fix] Added model2vec import compatible with current and newer version by @Pringled in #2992
  • Fix cache_dir issue with loading CLIPModel by @BoPeng in #3007
  • [warn] Throw a warning if compute_metrics is set, as it's not used by @tomaarsen in #3002
  • [fix] Prevent IndexError if output_hidden_states & ONNX by @tomaarsen in #3008

Full Changelog: v3.2.0...v3.2.1

v3.2.0 - ONNX and OpenVINO backends offering 2-3x speedup; Static Embeddings offering 50x-500x speedups at ~10-20% performance cost

10 Oct 17:59

This release introduces 2 new efficient computing backends for SentenceTransformer models: ONNX and OpenVINO + optimization & quantization, allowing for speedups up to 2x-3x; static embeddings via Model2Vec allowing for lightning-fast models (i.e., 50x-500x speedups) at a ~10%-20% performance cost; and various small improvements and fixes.

Install this version with

# Training + Inference
pip install sentence-transformers[train]==3.2.0

# Inference only, use one of:
pip install sentence-transformers==3.2.0
pip install sentence-transformers[onnx-gpu]==3.2.0
pip install sentence-transformers[onnx]==3.2.0
pip install sentence-transformers[openvino]==3.2.0

Faster ONNX and OpenVINO Backends for SentenceTransformer (#2712)

Introducing a new backend keyword argument to the SentenceTransformer initialization, allowing values of "torch" (default), "onnx", and "openvino".
These require installing sentence-transformers with specific extras:

pip install sentence-transformers[onnx-gpu]
# or ONNX for CPU only:
pip install sentence-transformers[onnx]
# or
pip install sentence-transformers[openvino]

It's as simple as:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)

If you specify a backend and your model repository or directory contains an ONNX/OpenVINO model file, it will automatically be used! And if your model repository or directory doesn't have one already, an ONNX/OpenVINO model will be automatically exported. Just remember to model.push_to_hub or model.save_pretrained into the same model repository or directory to avoid having to re-export the model every time.

All keyword arguments passed via model_kwargs will be passed on to ORTModel.from_pretrained or OVBaseModel.from_pretrained. The most useful arguments are:

  • provider: (Only if backend="onnx") ONNX Runtime provider to use for loading the model, e.g. "CPUExecutionProvider". See https://onnxruntime.ai/docs/execution-providers/ for possible providers. If not specified, the strongest available provider (e.g. "CUDAExecutionProvider") will be used.
  • file_name: The name of the ONNX file to load. If not specified, defaults to "model.onnx" or otherwise "onnx/model.onnx" for ONNX, and to "openvino_model.xml" or otherwise "openvino/openvino_model.xml" for OpenVINO. This argument is useful for specifying optimized or quantized models.
  • export: A boolean flag specifying whether the model will be exported. If not provided, export will be set to True if the model repository or directory does not already contain an ONNX or OpenVINO model.

For example:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="onnx",
    model_kwargs={
        "file_name": "model_O3.onnx",
        "provider": "CPUExecutionProvider",
    }
)

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)

Benchmarks

We ran benchmarks for CPU and GPU, averaging findings across 4 models of various sizes, 3 datasets, and numerous batch sizes. The detailed findings and the resulting backend recommendations are shown in the benchmark figures in the Speeding up Inference documentation.

For GPU, you can expect 2x speedup with fp16 at no cost, and for CPU you can expect ~2.5x speedup at a cost of 0.4% accuracy.

ONNX Optimization and Quantization

In addition to exporting default ONNX and OpenVINO models, we also introduce 2 helper methods for optimizing and quantizing ONNX models:

Optimization

export_optimized_onnx_model: This function uses Optimum to implement several optimizations in the ONNX model, ranging from basic optimizations to approximations and mixed precision. Read about the 4 default options here. This function accepts:

  • model: A SentenceTransformer model loaded with backend="onnx".
  • optimization_config: "O1", "O2", "O3", or "O4" from 🤗 Optimum or a custom OptimizationConfig instance.
  • model_name_or_path: The directory or model repository where the optimized model will be saved.
  • push_to_hub: Whether to push the exported model to the hub with model_name_or_path as the repository name. If False, the model will be saved in the directory specified with model_name_or_path.
  • create_pr: If push_to_hub, then this denotes whether a pull request is created rather than pushing the model directly to the repository. Very useful for optimizing models of repositories that you don't have write access to.
  • file_suffix: The suffix to add to the optimized model file name. Will use the optimization_config string or "optimized" if not set.

The usage is like this:

from sentence_transformers import SentenceTransformer, export_optimized_onnx_model

onnx_model = SentenceTransformer("BAAI/bge-large-en-v1.5", backend="onnx")
export_optimized_onnx_model(
	model=onnx_model,
	optimization_config="O4",
	model_name_or_path="BAAI/bge-large-en-v1.5",
	push_to_hub=True,
	create_pr=True,
)

After which you can load the model with:

from sentence_transformers import SentenceTransformer

pull_request_nr = 2 # TODO: Update this to the number of your pull request
model = SentenceTransformer(
   "BAAI/bge-large-en-v1.5",
   backend="onnx",
   model_kwargs={"file_name": "onnx/model_O4.onnx"},
   revision=f"refs/pr/{pull_request_nr}"
)

or when it gets merged:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
   "BAAI/bge-large-en-v1.5",
   backend="onnx",
   model_kwargs={"file_name": "onnx/model_O4.onnx"},
)

Quantization

export_dynamic_quantized_onnx_model: This function uses Optimum to quantize the ONNX model to int8, also allowing for hardware-specific optimizations. This results in impressive speedups for CPUs. In my findings, each of the default quantization configuration options gave approximately the same performance improvements. This function accepts:

  • model: A SentenceTransformer model loaded with backend="onnx".
  • quantization_config: "arm64", "avx2", "avx512", or "avx512_vnni" representing quantization configurations from AutoQuantizationConfig, or a QuantizationConfig instance.
  • model_name_or_path: The directory or model repository where the optimized model will be saved.
  • push_to_hub: Whether to push the exported model to the hub with model_name_or_path as the repository name. If False, the model will be saved in the directory specified with model_name_or_path.
  • create_pr: If push_to_hub, then this denotes whether a pull request is created rather than pushing the model directly to the repository. Very useful for quantizing models of repositories that you don't have write access to.
  • file_suffix: The suffix to add to the optimized model file name. Will use the quantization_config string or e.g. "int8_quantized" if not set.

The usage is like this:

from sentence_transformers import SentenceTransformer, export_dynamic_quantized_onnx_model

onnx_model = SentenceTransformer("BAAI/bge-large-en-v1.5", backend="onnx")
export_dynamic_quantized_onnx_model(
    model=onnx_model,
    quantization_config="avx512",
    model_name_or_path="BAAI/bge-large-en-v1.5",
    push_to_hub=True,
    create_pr=True,
)

After which you can load the model with:

from sentence_transformers import SentenceTransformer

pull_request_nr = 2 # TODO: Update this to the number of your pull request
model = SentenceTransformer(
   "BAAI/bge-large-en-v1.5",
   backend="onnx",
   model_kwargs={"file_name": "onnx/model_qint8_avx512.onnx"},
   revision=f"refs/pr/{pull_request_nr}"
)

or when it gets merged:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
   "BAAI/bge-large-en-v1.5",
   backend="onnx",
   model_kwargs={"file_name": "onnx/model_qint8_avx512.onnx"},
)

Lightning-Fast Static Embeddings via Model2Vec (#2961)

If ONNX or OpenVINO isn't fast enough for you yet, then perhaps you'll enjoy Static Embeddings. These embeddings are a bit akin to [GLoVe](https://nlp.stanford.edu/proje...

Read more

v3.1.1 - Patch hard negative mining & remove `numpy<2` restriction

19 Sep 14:18

This patch release fixes hard negatives mining for models that don't automatically normalize their embeddings and it lifts the numpy<2 restriction that was previously required.

Install this version with

# Full installation:
pip install sentence-transformers[train]==3.1.1

# Inference only:
pip install sentence-transformers==3.1.1

Hard Negatives Mining Patch (#2944)

The mine_hard_negatives utility introduced in the previous release would fail if use_faiss=True & the model does not automatically normalize its embeddings. This release patches that, allowing the utility to work with all Sentence Transformer models:

from sentence_transformers.util import mine_hard_negatives
from sentence_transformers import SentenceTransformer
from datasets import load_dataset

# Load a Sentence Transformer model
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1").bfloat16()

# Load a dataset to mine hard negatives from
dataset = load_dataset("sentence-transformers/natural-questions", split="train[:10000]")
print(dataset)
"""
Dataset({
    features: ['query', 'answer'],
    num_rows: 10000
})
"""

# Mine hard negatives
dataset = mine_hard_negatives(
    dataset=dataset,
    model=model,
    range_min=10,
    range_max=50,
    max_score=0.8,
    margin=0.1,
    num_negatives=5,
    sampling_strategy="random",
    batch_size=128,
    use_faiss=True,
)
'''
Batches: 100%|██████████| 75/75 [00:21<00:00,  3.51it/s]
Batches: 100%|██████████| 79/79 [00:03<00:00, 25.77it/s]
Querying FAISS index: 100%|██████████| 1/1 [00:00<00:00,  3.98it/s]
Metric       Positive       Negative     Difference
Count          10,000         47,711
Mean           0.7600         0.5376         0.2299
Median         0.7673         0.5379         0.2274
Std            0.0658         0.0387         0.0629
Min            0.3858         0.3732         0.1044
25%            0.7219         0.5129         0.1833
50%            0.7673         0.5379         0.2274
75%            0.8058         0.5617         0.2724
Max            0.9341         0.7024         0.4780
Skipped 48770 potential negatives (9.56%) due to the margin of 0.1.
Could not find enough negatives for 2289 samples (4.58%). Consider adjusting the range_max, range_min, margin and max_score parameters if you'd like to find more valid negatives.
'''
print(dataset)
'''
Dataset({
    features: ['query', 'answer', 'negative'],
    num_rows: 47711
})
'''
print(dataset[0])
'''
{
    'query': 'where is the us navy base in japan located',
    'answer': 'United States Fleet Activities Yokosuka The United States Fleet Activities Yokosuka (横須賀海軍施設, Yokosuka kaigunshisetsu) or Commander Fleet Activities Yokosuka (司令官艦隊活動横須賀, Shirei-kan kantai katsudō Yokosuka) is a United States Navy base in Yokosuka, Japan. Its mission is to maintain and operate base facilities for the logistic, recreational, administrative support and service of the U.S. Naval Forces Japan, Seventh Fleet and other operating forces assigned in the Western Pacific. CFAY is the largest strategically important U.S. naval installation in the western Pacific.[1] As of August 2013[update], it was commanded by Captain David Glenister.',
    'negative': "2011 Tōhoku earthquake and tsunami The earthquake took place at 14:46 JST (UTC 05:46) around 67\xa0km (42\xa0mi) from the nearest point on Japan's coastline, and initial estimates indicated the tsunami would have taken 10 to 30\xa0minutes to reach the areas first affected, and then areas farther north and south based on the geography of the coastline.[127][128] Just over an hour after the earthquake at 15:55 JST, a tsunami was observed flooding Sendai Airport, which is located near the coast of Miyagi Prefecture,[129][130] with waves sweeping away cars and planes and flooding various buildings as they traveled inland.[131][132] The impact of the tsunami in and around Sendai Airport was filmed by an NHK News helicopter, showing a number of vehicles on local roads trying to escape the approaching wave and being engulfed by it.[133] A 4-metre-high (13\xa0ft) tsunami hit Iwate Prefecture.[134] Wakabayashi Ward in Sendai was also particularly hard hit.[135] At least 101 designated tsunami evacuation sites were hit by the wave.[136]"
}
'''
dataset.push_to_hub("natural-questions-hard-negatives", "triplet")

Thanks to @omarnj-lab for pointing out the bug to me.

Numpy restriction lifted (#2937)

The v3.1.0 Sentence Transformers release required numpy<2 to prevent crashes on Windows. However, various third-party packages (e.g. scipy) have now been recompiled & released, allowing the Windows tests to pass again.

If you encounter the following warning:

A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to downgrade to 'numpy<2' or try to upgrade the affected module. We expect that some modules will need time to support NumPy 2.

Then consider 1) upgrading the dependency from which the error occurred or 2) downgrading numpy to below v2:

pip install -U "numpy<2"

Thanks to @kozlek for pointing this out to me and helping get it resolved.

All changes

  • [deps] Attempt to remove numpy restrictions by @tomaarsen in #2937
  • [metadata] Extend pyproject.toml metadata by @tomaarsen in #2943
  • [fix] Ensure that the embeddings from hard negative mining are normalized by @tomaarsen in #2944

Full Changelog: v3.1.0...v3.1.1