
add a compression analyzer facility #4715



Open
RaduBerinde opened this issue May 12, 2025 · 1 comment



Comments

@RaduBerinde
Member

RaduBerinde commented May 12, 2025

This issue tracks adding a facility for analyzing how data in real clusters compresses. The goal is to get a good comparison between various compression algorithms and levels, and to use it to inform our suggested defaults or to add new adaptive compression algorithms.

We have two ways of doing this:

  • online: we can sample blocks as they are written to or read from disk. For each sampled block, we run all experiments (every algorithm and level under comparison) and retain the resulting statistics. This approach has the advantage of letting us accurately estimate CPU usage differences between algorithms within a specific workload. The disadvantage is that we can only produce data on clusters running versions that include this facility.
  • "offline": we can add a CLI tool that looks at all relevant files in a store and samples blocks independently of any running process. This is easier to implement and provides a quicker way to obtain data, since a newer binary can be used just for this tool.

Jira issue: PEBBLE-442

Epic CRDB-49140

@RaduBerinde
Member Author

The plan is to start with the "offline" variant and re-evaluate whether we also want the "online" variant later.
