db: prefer compacting into L6 tables with many tombstones #4575

Open
jbowens opened this issue Apr 17, 2025 · 0 comments

jbowens commented Apr 17, 2025

When picking a file to compact into L6, we consider the RangeDeletionsBytesEstimate of the overlapping L6 file, prioritizing compactions that rewrite L6 files with a high estimate:

pebble/compaction_picker.go

Lines 1096 to 1107 in 431a23b

    // For files in the bottommost level of the LSM, the
    // Stats.RangeDeletionsBytesEstimate field is set to the estimate
    // of bytes /within/ the file itself that may be dropped by
    // recompacting the file. These bytes from obsolete keys would not
    // need to be rewritten if we compacted `f` into `outputFile`, so
    // they don't contribute to write amplification. Subtracting them
    // out of the overlapping bytes helps prioritize these compactions
    // that are cheaper than their file sizes suggest.
    if outputLevel == numLevels-1 && outputFile.LargestSeqNum < earliestSnapshotSeqNum {
        overlappingBytes -= outputFile.Stats.RangeDeletionsBytesEstimate
    }

We do this because when we populate RangeDeletionsBytesEstimate for a file in the bottommost level, we estimate the size of the data within the file itself that its range deletions cover. That data presumably still exists only because LSM snapshots were open when the file was created:

pebble/table_stats.go

Lines 422 to 424 in 431a23b

    if level == numLevels-1 && meta.SmallestSeqNum < maxRangeDeleteSeqNum {
        size, err := r.EstimateDiskUsage(start, end)
        if err != nil {

The compaction picking logic does not take PointDeletionsBytesEstimate into account, and PointDeletionsBytesEstimate has no special logic for L6 sstables. My reading of the code is that the estimate already includes the size of the tombstones themselves, so it is already optimistic in its calculation.

We should update the compaction picking logic to also consider PointDeletionsBytesEstimate when computing the min-overlapping ratio of a compaction.
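
To make the proposal concrete, here is a minimal, self-contained sketch of the adjustment. It is not Pebble's implementation: `tableStats`, `tableMeta`, `reclaimableBytes`, `overlapRatio`, and the `includePoint` switch are hypothetical names for illustration, the ratio is simplified to overlapping output bytes divided by input file size, and the clamp to the file size is an added assumption to keep the unsigned subtraction safe. Whether PointDeletionsBytesEstimate is a good proxy for reclaimable bytes within an L6 table is exactly the open question here.

```go
package main

import "fmt"

// numLevels mirrors the number of LSM levels assumed in this sketch (L0..L6).
const numLevels = 7

// tableStats is a hypothetical stand-in for per-table statistics.
type tableStats struct {
	RangeDeletionsBytesEstimate uint64
	PointDeletionsBytesEstimate uint64
}

// tableMeta is a hypothetical stand-in for sstable metadata.
type tableMeta struct {
	Size          uint64
	LargestSeqNum uint64
	Stats         tableStats
}

// reclaimableBytes estimates how many bytes of the output file would be
// dropped rather than rewritten. It returns 0 unless the output level is the
// bottommost level and no open snapshot can still observe the file's keys.
// includePoint toggles the proposed behavior of also counting the point
// deletion estimate.
func reclaimableBytes(outputLevel int, out tableMeta, earliestSnapshotSeqNum uint64, includePoint bool) uint64 {
	if outputLevel != numLevels-1 || out.LargestSeqNum >= earliestSnapshotSeqNum {
		return 0
	}
	est := out.Stats.RangeDeletionsBytesEstimate
	if includePoint {
		est += out.Stats.PointDeletionsBytesEstimate
	}
	// Assumption: clamp so that subtracting from the file size cannot underflow.
	if est > out.Size {
		est = out.Size
	}
	return est
}

// overlapRatio is a simplified overlapping ratio: bytes that must be
// rewritten in the output level per byte of input. Lower is cheaper.
func overlapRatio(inputSize uint64, out tableMeta, reclaimable uint64) float64 {
	return float64(out.Size-reclaimable) / float64(inputSize)
}

func main() {
	out := tableMeta{
		Size:          400,
		LargestSeqNum: 10,
		Stats: tableStats{
			RangeDeletionsBytesEstimate: 100,
			PointDeletionsBytesEstimate: 100,
		},
	}
	const inputSize, earliestSnapshot = 100, 20

	current := overlapRatio(inputSize, out, reclaimableBytes(numLevels-1, out, earliestSnapshot, false))
	proposed := overlapRatio(inputSize, out, reclaimableBytes(numLevels-1, out, earliestSnapshot, true))
	fmt.Printf("current: %.2f, proposed: %.2f\n", current, proposed)
}
```

With these made-up numbers, counting the point deletion estimate lowers the ratio from 3.00 to 2.00, so a picker that prefers the minimum ratio would rank this compaction ahead of others it currently ties with or trails.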

Jira issue: PEBBLE-411