Describe the bug
I have a small database of 7700 entries with embeddings.
The JSON file on disk takes 50 MB.
The database persisted as JSON on disk takes 222 MB.
The database persisted as binary on disk takes 262 MB.
Restoring the database takes several minutes, whether from JSON or binary, and uses 3 GB of memory.
I don't understand why it takes so long when the index should already have been created.
To Reproduce
persist(db, 'binary') a database of 7700 entries with embeddings
restore('binary', config.payload)
It takes 10 minutes? The CPU swings between 100% and 0%, which is weird, and memory crashes.
If I create a database and then insertMultiple() the raw JSON + embeddings, it takes < 1 minute and 1 GB of memory.
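To compare the two paths above on equal footing, a small timing harness along these lines could help. This is only a sketch: `measure` is a hypothetical helper and not part of Orama, and the heap figure is approximate because garbage collection can run at any point during the step.

```javascript
// Hypothetical helper (not an Orama API): times an async step and
// reports how much the JS heap grew while it ran.
async function measure(label, step) {
  const heapBefore = process.memoryUsage().heapUsed;
  const start = performance.now();

  const result = await step();

  const elapsedMs = performance.now() - start;
  const heapDeltaMb =
    (process.memoryUsage().heapUsed - heapBefore) / (1024 * 1024);

  console.log(
    `${label}: ${elapsedMs.toFixed(0)} ms, heap delta ${heapDeltaMb.toFixed(1)} MB`
  );
  return { result, elapsedMs, heapDeltaMb };
}
```

In the setup described here, one call would wrap `restore('binary', config.payload)` and another would wrap the create-then-`insertMultiple()` path, so both numbers come from the same process and the same data.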
Expected behavior
Loading the binary data should be straightforward. Even if the memory allocation takes time, it shouldn't be that much.
Environment Info
- Linux
- Node-RED v4
- Orama 3.1.6
Affected areas
Initialization
Additional context
No response
Hello, the embedding model is OpenAI Large 3 with 640 dimensions.
My workaround is to simply rebuild the DB and fill it from a JSON file kept alongside. Very fast.
But I assume that if the DB evolves, I'd have to maintain that file.
OK, so 640 dimensions * 4 bytes per dimension (f32) * 7700 entries = 19,712,000 bytes, i.e. about 19.7 MB of embeddings. So the embeddings themselves are not likely the problem. I'm wondering if using SeqProto (https://github.com/oramasearch/seqproto) would speed this up... it certainly compresses the data far more than JSON does. Let me talk with @allevo and see if we can use it.
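The estimate above can be checked with a few lines of Node.js. The sketch below also compares one vector stored as raw f32 bytes with the same vector serialized as JSON text; the random vector is a stand-in for a real embedding, and the variable names are illustrative only.

```javascript
// Figures taken from this thread: 7700 entries, 640 dimensions, f32 values.
const entries = 7700;
const dimensions = 640;
const bytesPerFloat32 = 4;

const rawBytes = entries * dimensions * bytesPerFloat32;
console.log(rawBytes); // 19712000
console.log((rawBytes / 1e6).toFixed(1) + ' MB'); // 19.7 MB

// Stand-in embedding: random values, not real model output.
const vector = Float32Array.from({ length: dimensions }, () => Math.random());

// The same vector written out as JSON text, measured in bytes.
const jsonBytes = Buffer.byteLength(JSON.stringify(Array.from(vector)));

console.log(`binary: ${vector.byteLength} bytes, JSON: ${jsonBytes} bytes`);
```

Each number in the JSON form is written as decimal text, so the serialized vector is several times larger than its 2560 raw bytes. That would account for part of the on-disk size of a JSON dump, but not for the restore time.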