Fixes bug that causes out-of-order sstable keys #2445

Merged (1 commit) on Jul 1, 2024

fulmicoton (Collaborator)

The previous way to address the problem was to replace `\u{0000}` with `0` in several places.

This logic had several flaws:
When the replacement was done on the serializer side (as it was for the columnar), it caused a collision problem.

If a document in the segment contained a JSON field with a `\0`, and another document contained the same JSON field with `0` instead, we sent the same field path to the serializer twice.
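A minimal Rust sketch of that collision, assuming JSON paths are compared as plain strings and that the sstable writer requires strictly increasing keys. `normalize_path`, `strictly_increasing` and `serialize_after_sort` are hypothetical helpers for illustration, not tantivy APIs. Because `'\0'` (0x00) sorts before `'0'` (0x30), replacing it after sorting can both merge two distinct paths into one key and reorder keys that were already sorted.

```rust
/// Hypothetical serializer-side normalization: replace every NUL char with '0'.
fn normalize_path(path: &str) -> String {
    path.replace('\u{0}', "0")
}

/// Checks the invariant the sstable writer is assumed to enforce here:
/// each key must be strictly greater than the previous one.
fn strictly_increasing(keys: &[String]) -> bool {
    keys.windows(2).all(|pair| pair[0] < pair[1])
}

/// Sorts the raw paths first (as the writer would), then normalizes them
/// one by one on the serializer side, mirroring the old approach.
fn serialize_after_sort(raw: &[&str]) -> Vec<String> {
    let mut sorted: Vec<&str> = raw.to_vec();
    sorted.sort_unstable();
    sorted.iter().map(|p| normalize_path(p)).collect()
}

fn main() {
    // Case 1: "attr\0" and "attr0" are distinct paths, but both become "attr0".
    let collided = serialize_after_sort(&["attr0", "attr\u{0}"]);
    println!("{:?}", collided); // ["attr0", "attr0"]
    assert!(!strictly_increasing(&collided));

    // Case 2: "a\0z" sorts before "a0", but after normalization "a0z" > "a0".
    let reordered = serialize_after_sort(&["a0", "a\u{0}z"]);
    println!("{:?}", reordered); // ["a0z", "a0"]
    assert!(!strictly_increasing(&reordered));
}
```

Either case hands the serializer a key that is not strictly greater than the previous one.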

Another option would have been to normalize all values on the writer side.
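For comparison, a sketch of that writer-side alternative under the same assumptions (plain string paths, hypothetical helper names). Normalizing before paths are deduplicated and sorted means `attr\0` and `attr0` merge into a single entry, so the serializer only ever receives unique, ordered keys. One plausible cost, inferred here rather than stated in the PR, is that two distinct user values become indistinguishable in the index.

```rust
use std::collections::BTreeSet;

/// Hypothetical normalization applied when a document is indexed (writer side).
fn normalize_path(path: &str) -> String {
    path.replace('\u{0}', "0")
}

/// Collects the normalized paths of a segment. The BTreeSet both deduplicates
/// and keeps keys sorted, so the returned keys are strictly increasing.
fn writer_side_paths(raw: &[&str]) -> Vec<String> {
    let unique: BTreeSet<String> = raw.iter().map(|p| normalize_path(p)).collect();
    unique.into_iter().collect()
}

fn main() {
    let keys = writer_side_paths(&["attr0", "attr\u{0}", "a\u{0}z", "a0"]);
    // "attr\0" and "attr0" were merged; "a\0z" became "a0z".
    println!("{:?}", keys); // ["a0", "a0z", "attr0"]
    assert!(keys.windows(2).all(|pair| pair[0] < pair[1]));
}
```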

This PR simplifies the logic: JSON paths containing a `\0` are now ignored, both in the columnar and in the inverted index.
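A sketch of the idea behind this fix, again with hypothetical names and plain string paths rather than tantivy's actual code: any path containing `\0` is dropped before it reaches either the columnar or the inverted index, so no replacement happens and two distinct paths can never collapse into one key.

```rust
use std::collections::BTreeSet;

/// Hypothetical predicate shared by the columnar and inverted-index writers:
/// a JSON path is indexed only if it contains no NUL character.
fn is_indexable_path(path: &str) -> bool {
    !path.contains('\u{0}')
}

/// Keeps only indexable paths, deduplicated and sorted (BTreeSet order).
/// Paths are stored unchanged, so distinct inputs stay distinct keys.
fn indexable_paths(raw: &[&str]) -> Vec<String> {
    raw.iter()
        .copied()
        .filter(|p| is_indexable_path(p))
        .map(str::to_string)
        .collect::<BTreeSet<String>>()
        .into_iter()
        .collect()
}

fn main() {
    let keys = indexable_paths(&["attr0", "attr\u{0}", "a\u{0}z", "a0"]);
    // Paths with a NUL are skipped; the rest are kept as-is.
    println!("{:?}", keys); // ["a0", "attr0"]
    assert!(keys.windows(2).all(|pair| pair[0] < pair[1]));
}
```

Applying the same predicate in both places keeps the columnar and the inverted index consistent with each other; the trade-off is that paths containing a NUL are simply not searchable.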

Closes #2442

fulmicoton force-pushed the fulmicoton-null-byte-bug branch from f26e837 to 23cb99f on June 25, 2024 05:22
fulmicoton requested a review from PSeitz on June 25, 2024 05:22
fulmicoton mentioned this pull request on Jun 25, 2024
fulmicoton force-pushed the fulmicoton-null-byte-bug branch from 23cb99f to 53815c8 on June 25, 2024 05:24
fulmicoton force-pushed the fulmicoton-null-byte-bug branch from 53815c8 to 24954d4 on June 25, 2024 05:37
PSeitz merged commit 0f4c2e2 into main on Jul 1, 2024
4 checks passed
PSeitz deleted the fulmicoton-null-byte-bug branch on July 1, 2024 07:40
philippemnoel pushed a commit to paradedb/tantivy that referenced this pull request on Aug 31, 2024
Issue closed by this pull request: keys should be increasing panic
2 participants