Thoughts (and potential confusion) on embeddings & meilisearch #828
Replies: 1 comment
-
hello, not sure why the discussion is closed, but I wanted to make sure I would answer you 😊
What are your expectations in terms of documentation here? The exact value was chosen to leave some margin for float approximation while making reasonably sure that the same model is used. As a first step, and as a means to prevent user error, we decided to enforce that the same model with the same characteristics is used.
This would be nice, but it only works if the geometric transformation does not alter the relative positions of the vectors; otherwise the similarity check will give poor results.
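To illustrate the point above, here is a small sketch (my own example, not Meilisearch code) showing that an isometry such as a rotation preserves pairwise cosine similarity, while a non-uniform scaling does not, so a similarity check would only survive the first kind of transformation:

```python
import math

def cos_sim(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def rotate2d(v, theta):
    # Rotate a 2-D vector by theta radians (an isometry).
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

a, b = [1.0, 0.0], [1.0, 1.0]
base = cos_sim(a, b)
# Rotation preserves relative positions: similarity is unchanged.
rotated = cos_sim(rotate2d(a, 0.7), rotate2d(b, 0.7))
# Non-uniform scaling distorts relative positions: similarity changes.
scaled = cos_sim([a[0] * 5, a[1]], [b[0] * 5, b[1]])
```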
@meilisearch/docs-team
When declaring a composite embedder, Meilisearch attempts to embed the same texts with both embedders. If that fails, the settings task will fail.
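A minimal sketch of what such a consistency check might look like. The function names, the use of cosine distance, and the tolerance value are my assumptions; only the behaviour described above (embedding the same texts with both embedders and failing the settings task on disagreement) comes from the reply:

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def embedders_agree(embed_a, embed_b, texts, tolerance=0.01):
    # Embed the same texts with both embedders and require every pair
    # of embeddings to stay within the distance tolerance; a settings
    # task would fail when this returns False.
    return all(
        cosine_distance(embed_a(t), embed_b(t)) <= tolerance
        for t in texts
    )
```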
From my experience adding support for the BERT models, supporting a new model architecture with candle involves implementing tensor operations, pooling, architecture detection, etc. in code. Adding new architectures hasn't been done yet because it takes engineering time; we might add some in the future.
We definitely have something in the oven regarding this :-)
-
Less of a bug and more just some general feedback/ideas! (sorry in advance if this is annoying)
on the topic of composite embedders
I initially understood composite embedders in your product as a reranking tool: one model generates a query embedding to quickly fetch semantically similar vectors from the DB, after which a second model reorders the retrieved entries by relevance. I think the docs could do more to distinguish the "composite" embedding approach from the concept of reranking, especially since both techniques involve specifying two embedding models and reranking is quite well known.
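To make the contrast concrete, here is a sketch of the two search shapes. The `index.nearest` API and the scoring functions are hypothetical placeholders of my own, not Meilisearch's actual interfaces; the point is only that reranking is a two-stage scoring pipeline while a composite embedder is a single retrieval pass with different models at index time and query time:

```python
def search_with_rerank(query, index, fast_embed, rerank_score, k=50, top=10):
    # Reranking: a cheap model retrieves k candidates, then a second
    # model re-scores each (query, document) pair and reorders them.
    candidates = index.nearest(fast_embed(query), k)
    return sorted(candidates,
                  key=lambda doc: rerank_score(query, doc),
                  reverse=True)[:top]

def search_composite(query, index, search_embed, top=10):
    # Composite embedder: documents were indexed with one embedder and
    # the query uses another, but both must land in the same vector
    # space, so there is a single nearest-neighbour pass and no second
    # scoring step.
    return index.nearest(search_embed(query), top)
```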
In a Notion page I found on the feature it's also written that (though I haven't checked the code to be sure it's enforced)
How was this tolerance decided, and where is it documented? If the local model is meant to optimize for latency, then supporting quantized or half-precision models would be a nice-to-have. But due to the nature of quantization, we might not respect that maximum distance tolerance of 0.01, even though the local model is derived from the remote one.
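A quick experiment (my own, with toy vectors and toy quantizers) showing why this worry is plausible: a mild scheme like symmetric int8 barely moves cosine distance, but an aggressive one like binary sign quantization easily exceeds a 0.01 tolerance:

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) *
                        math.sqrt(sum(x * x for x in b)))

def quantize_int8(vec):
    # Symmetric int8 quantization: snap each value to one of 255 levels.
    scale = max(abs(x) for x in vec) / 127.0
    return [round(x / scale) * scale for x in vec]

def quantize_sign(vec):
    # Binary (sign) quantization: keep only the sign of each value.
    return [1.0 if x >= 0 else -1.0 for x in vec]

v = [0.3, -0.7, 0.2, 0.6, -0.1]
int8_drift = cosine_distance(v, quantize_int8(v))  # stays well under 0.01
sign_drift = cosine_distance(v, quantize_sign(v))  # blows past 0.01
```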
limited support for hugging face models
Okay, the title is a bit of a misnomer since all the top models on the MTEB are BertModels, but if someone passes the Hugging Face ID of a model which is not a BERT architecture (e.g. DebertaV2 or XLM-RoBERTa), then the embedding job may fail (depending on whether the weights can be coerced into a BertModel here in the code).
This is sort of addressed here in the settings doc, but I think the restriction to BertModel architectures for embeddings should be mentioned more explicitly, or on the known limitations page.
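Until the docs spell this out, one way a user could guard against it is to inspect the model's `config.json` (Hugging Face configs do carry an `architectures` field) before pointing Meilisearch at the model. The supported set below is my assumption based on this thread, not an official list:

```python
import json

# Assumption from this thread: only BertModel is supported by the
# local (candle-based) embedder today.
SUPPORTED_ARCHITECTURES = {"BertModel"}

def check_architecture(config_json: str) -> list:
    # Read the `architectures` field of a Hugging Face config.json and
    # raise early, instead of letting the settings task fail server-side.
    archs = json.loads(config_json).get("architectures", [])
    unsupported = [a for a in archs if a not in SUPPORTED_ARCHITECTURES]
    if unsupported:
        raise ValueError(f"unsupported architecture(s): {unsupported}")
    return archs
```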
an aside/thought experiment (or a Gedankenexperiment, as they call it in Germany)
This model:
deepvk/deberta-v1-base
results in a 500 when I try configuring my embedder settings for an index. Now imagine I'm using composite embedders with an XLM-RoBERTa remote endpoint that generates embeddings without issue, while the local model crashes on load: how is this failure mode handled?
looking forwards
Some features related to embeddings I think would be fun (but probably out of scope for right now)
Anyway, I covered a lot of ground here; thanks, and feel free to let me know if any of this is planned and/or relevant!