compact doc #2402

Merged
merged 26 commits into from
May 21, 2024

@PSeitz (Collaborator) commented May 17, 2024

Replaces TantivyDocument with a more compact version that stores all data in two vecs (similar to the tape in simd_json).

This new Document has a lower memory footprint, but has a limitation:

  • The number of Fields can't exceed u16::MAX
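The two-vec layout described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `CompactDocSketch`, `FieldValueAddr`, `ValueRef`, the three value types, and the length-prefixed string encoding are all assumptions made for the example. One vec holds fixed-size entries (field id, type tag, address), the other holds the raw payload bytes, and the `u16` field id is where a field-count limit like the one above would come from.

```rust
/// Hypothetical type tags; a real document supports more value types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ValueType {
    Str,
    U64,
    Bool,
}

/// Type tag plus the offset of the payload in the byte container.
#[derive(Clone, Copy, Debug)]
struct ValueAddr {
    type_id: ValueType,
    addr: u32,
}

/// One fixed-size entry per (field, value) pair.
#[derive(Clone, Copy, Debug)]
struct FieldValueAddr {
    field: u16, // a u16 id caps the number of distinct fields
    value: ValueAddr,
}

/// Borrowed view of a value decoded from the container.
#[derive(Debug, PartialEq)]
enum ValueRef<'a> {
    Str(&'a str),
    U64(u64),
    Bool(bool),
}

/// All values of a document live in two vecs: entries and payload bytes.
#[derive(Default)]
struct CompactDocSketch {
    entries: Vec<FieldValueAddr>,
    data: Vec<u8>,
}

impl CompactDocSketch {
    fn push(&mut self, field: u16, type_id: ValueType, addr: u32) {
        self.entries.push(FieldValueAddr { field, value: ValueAddr { type_id, addr } });
    }

    fn add_str(&mut self, field: u16, text: &str) {
        let addr = self.data.len() as u32;
        // Assumed encoding: 4-byte little-endian length, then the UTF-8 bytes.
        self.data.extend_from_slice(&(text.len() as u32).to_le_bytes());
        self.data.extend_from_slice(text.as_bytes());
        self.push(field, ValueType::Str, addr);
    }

    fn add_u64(&mut self, field: u16, val: u64) {
        let addr = self.data.len() as u32;
        self.data.extend_from_slice(&val.to_le_bytes());
        self.push(field, ValueType::U64, addr);
    }

    fn add_bool(&mut self, field: u16, val: bool) {
        // Small values need no payload: store them in the address slot.
        self.push(field, ValueType::Bool, val as u32);
    }

    fn get(&self, value: ValueAddr) -> ValueRef<'_> {
        let start = value.addr as usize;
        match value.type_id {
            ValueType::Str => {
                let mut len_buf = [0u8; 4];
                len_buf.copy_from_slice(&self.data[start..start + 4]);
                let len = u32::from_le_bytes(len_buf) as usize;
                let bytes = &self.data[start + 4..start + 4 + len];
                ValueRef::Str(std::str::from_utf8(bytes).expect("valid UTF-8"))
            }
            ValueType::U64 => {
                let mut buf = [0u8; 8];
                buf.copy_from_slice(&self.data[start..start + 8]);
                ValueRef::U64(u64::from_le_bytes(buf))
            }
            ValueType::Bool => ValueRef::Bool(value.addr != 0),
        }
    }

    /// Iterates over all (field id, value) pairs in insertion order.
    fn field_values(&self) -> impl Iterator<Item = (u16, ValueRef<'_>)> + '_ {
        self.entries.iter().map(move |e| (e.field, self.get(e.value)))
    }
}
```

Compared with one heap-allocated enum value per field, this keeps a document to two allocations regardless of how many values it holds, at the cost of decoding values on access.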

This PR also fixes an issue in the Value implementation for serde_json::Value, which did not handle date parsing on strings.

@PSeitz PSeitz requested a review from fulmicoton May 17, 2024 08:32
#[derive(Clone, Copy, Default)]
#[repr(packed)]
/// The value type and the address to its payload in the container.
pub struct ValueAddr {
Collaborator

I suspect it does not have to be public?

Collaborator Author

Yes, I'll probably add an API later for concat fields, so we can just take the ValueAddrs and add them to a Field, instead of operating on the JSON values
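For context on the `#[repr(packed)]` attribute in the excerpt under review, here is a small sketch of what packing does to a tag-plus-address struct. The field names and types (`type_id: u8`, `addr: u32`) are assumptions for illustration; the excerpt above does not show the actual fields of `ValueAddr`.

```rust
use std::mem::size_of;

/// Hypothetical layout: a one-byte type tag followed by a 32-bit address.
#[derive(Clone, Copy, Default)]
#[repr(packed)]
struct PackedAddr {
    type_id: u8,
    addr: u32,
}

/// Same fields without `repr(packed)`: the compiler pads to u32 alignment.
#[derive(Clone, Copy, Default)]
struct PaddedAddr {
    type_id: u8,
    addr: u32,
}

/// Returns (packed size, padded size) in bytes.
fn sizes() -> (usize, usize) {
    (size_of::<PackedAddr>(), size_of::<PaddedAddr>())
}

/// Fields of a packed struct may be unaligned, so copy them out by value
/// rather than taking a reference to them.
fn read_addr(v: PackedAddr) -> u32 {
    let addr = v.addr; // by-value copy; `&v.addr` would be rejected
    addr
}
```

With these assumed fields, `sizes()` returns `(5, 8)`: packing removes three bytes of padding per entry, which adds up when a document stores one such entry per value.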

@PSeitz PSeitz merged commit e1679f3 into main May 21, 2024
4 checks passed
@PSeitz PSeitz deleted the compact_doc branch May 21, 2024 08:16
philippemnoel pushed a commit to paradedb/tantivy that referenced this pull request Aug 31, 2024
* compact doc

* add any value type

* pass references when building CompactDoc

* remove OwnedValue from API

* clippy

* clippy

* fail on large documents

* fmt

* cleanup

* cleanup

* implement Value for different types

fix serde_json date Value implementation

* fmt

* cleanup

* fmt

* cleanup

* store positions instead of pos+len

* remove nodes array

* remove mediumvec

* cleanup

* infallible serialize into vec

* remove positions indirection

* remove 24MB limitation in document

use u32 for Addr
Remove the 3 byte addressing limitation and use VInt instead

* cleanup

* extend test

* cleanup, add comments

* rename, remove pub
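The commit note about replacing three-byte addressing with VInt refers to variable-length integer encoding. A fixed three-byte address can reference at most 2^24 positions, while a variable-length encoding keeps small values short and still covers the full `u32` range. The sketch below uses a generic LEB128-style scheme (seven payload bits per byte, high bit set when more bytes follow). It is an assumption for illustration; the byte layout of tantivy's own VInt may differ, and `write_vint` and `read_vint` are hypothetical names.

```rust
/// Appends `val` to `out` using 7 payload bits per byte.
/// The high bit of a byte is set when more bytes follow.
fn write_vint(mut val: u32, out: &mut Vec<u8>) {
    loop {
        let byte = (val & 0x7F) as u8;
        val >>= 7;
        if val == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Decodes one value from the start of `data`.
/// Returns the value and the number of bytes consumed, or `None`
/// if no terminating byte is found within 5 bytes.
fn read_vint(data: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    for (i, &byte) in data.iter().enumerate().take(5) {
        result |= u32::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None
}
```

Values below 128 take one byte and `u32::MAX` takes five, so short documents pay little for the larger address space.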