Persist / Restore of large file takes a lot of time #932


Open
JpEncausse opened this issue May 5, 2025 · 4 comments

@JpEncausse

Describe the bug

I have a small database of 7,700 entries with embeddings:

  • the raw JSON file on disk takes 50 MB
  • the database persisted as JSON on disk takes 222 MB
  • the database persisted as binary on disk takes 262 MB

Restoring the database takes several minutes, whether from JSON or binary, and uses about 3 GB of memory.
I don't understand why it takes so long, since the index should already have been built when the database was persisted.

To Reproduce

  1. persist(db, 'binary') a database of 7,000 entries with embeddings
  2. restore('binary', config.payload)
  3. It takes about 10 minutes, the CPU keeps switching between 100% and 0% (which is odd), and memory usage grows until the process crashes

If I instead create a fresh database and insertMultiple() the raw JSON + embeddings, it takes less than 1 minute and about 1 GB of memory.
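Roughly what I'm doing, as a simplified sketch (the schema, field names, and file paths here are illustrative, not my exact config):

```js
import { create, insertMultiple } from '@orama/orama'
import { persist, restore } from '@orama/plugin-data-persistence'
import { readFile, writeFile } from 'node:fs/promises'

// Build the database: ~7,000 documents, each with a 640-dimension embedding.
const db = create({
  schema: {
    text: 'string',
    embedding: 'vector[640]',
  },
})
const docs = JSON.parse(await readFile('entries.json', 'utf8'))
await insertMultiple(db, docs)

// Persisting works, but the binary file ends up at ~262 MB on disk.
const payload = await persist(db, 'binary')
await writeFile('orama.bin', payload)

// This is the slow part: restoring takes ~10 minutes and ~3 GB of memory.
const data = await readFile('orama.bin', 'utf8') // or a Buffer, depending on the format
const restored = await restore('binary', data)
```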

Expected behavior

Loading the binary data should be straightforward. Even if memory allocation takes some time, it shouldn't take this long.

Environment Info

- Linux
- Node-RED v4
- Orama 3.1.6

Affected areas

Initialization

Additional context

No response

@micheleriva
Member

Hi @JpEncausse, thanks for opening this. What embedding model are you using, and how many dimensions does it have?

@JpEncausse
Author

Hello, the embedding model is OpenAI Large 3 with 640 dimensions.
My workaround is to simply rebuild the DB and fill it from a separate JSON file kept alongside it. Very fast.
But I assume that if the DB evolves, I'd have to maintain that JSON file.
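The workaround looks roughly like this (file name and schema are just examples, not my exact setup):

```js
import { create, insertMultiple } from '@orama/orama'
import { readFile } from 'node:fs/promises'

// Skip restore() entirely and rebuild the index from the raw documents.
// entries.json holds the documents plus their precomputed embeddings (~50 MB).
const docs = JSON.parse(await readFile('entries.json', 'utf8'))

const db = create({
  schema: {
    text: 'string',
    embedding: 'vector[640]',
  },
})

// Rebuilding this way takes < 1 minute and ~1 GB of memory,
// versus ~10 minutes and ~3 GB with restore('binary', ...).
await insertMultiple(db, docs)
```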

@micheleriva
Member

OK, so 640 dimensions * 4 bytes per dimension (f32) * 7,700 entries ≈ 19.7 MB of embeddings, so the embeddings themselves are not likely the problem. I'm wondering if using SeqProto (https://github.com/oramasearch/seqproto) would speed this up... it certainly compresses far better than JSON does. Let me talk with @allevo and see if we can use it.

@JpEncausse
Author

Yes, it's not that big, so I was surprised it was so slow and heavier than simply loading the JSON :-)
