Critical Privacy Issue: Bookwyrm Book Covers do not scrub/drop metadata #3522

Open
Saijin-Naib opened this issue Mar 16, 2025 · 5 comments
Labels
safety: To do with privacy, user blocking, spoiler alerts, etc.

Comments

@Saijin-Naib
Describe the bug
Cover images are not scrubbed of metadata, which can include personally identifiable information (PII) such as geolocation and author/copyright fields.

To Reproduce
Steps to reproduce the behavior:

  1. Take a picture of a book cover with OpenCamera with geolocation enabled, and author/copyright filled.
  2. Add image to Bookwyrm book listing
  3. Download the image from the Bookwyrm book listing and use exiv2 or exiftool to verify that all metadata is preserved.
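
For anyone without exiv2 or exiftool to hand, step 3 can be roughly approximated in plain Python. This is only a minimal sketch under stated assumptions: it handles JPEG files only, it looks solely for an Exif APP1 segment (not XMP or IPTC), and the names `jpeg_segments` and `has_exif` are made up for illustration.

```python
import struct


def jpeg_segments(data: bytes):
    """Yield (marker, payload) for each header segment before the image scan."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("corrupt JPEG: expected a marker")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        if marker == 0xDA:  # SOS: compressed image data follows, stop here
            return
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length


def has_exif(data: bytes) -> bool:
    """True if any APP1 segment carries the Exif identifier."""
    return any(
        marker == 0xE1 and payload.startswith(b"Exif\x00\x00")
        for marker, payload in jpeg_segments(data)
    )
```

Calling `has_exif(open("cover.jpg", "rb").read())` on a downloaded cover (the filename is a placeholder) would then indicate whether an Exif block survived the upload.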

Expected behavior
All metadata is stripped upon image submission.

Screenshots
N/A

Instance
Bookwyrm.social

Additional context
There should be a server-side task to periodically check for metadata in cover images and scrub it for already-uploaded images.


Desktop (please complete the following information):
- OS: Alpine Linux
- Browser: Firefox, Epiphany
- Version: 135.x and 48.x

Smartphone (please complete the following information):
- Device: TeraCube 2e
- OS: /e/OS 2.8
- Browser: Fennec
- Version: 136.0

@hughrun added the safety label on Mar 16, 2025
@timothyjrogers
Contributor

I have a PR open to strip EXIF data off new cover images when they are uploaded. This change won't address already-existing images that were previously uploaded with EXIF data.
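
To illustrate what stripping on upload can look like, here is a minimal standard-library sketch. It is not the code from the PR mentioned above, which is not shown in this thread. It assumes JPEG input only and treats APP1 (Exif, XMP), APP13 (IPTC) and COM segments as metadata; the names `strip_jpeg_metadata` and `METADATA_MARKERS` are hypothetical. A real implementation would more likely go through an image library and would also need to cover other formats such as PNG and WebP.

```python
import struct

# Assumption for this sketch: APP1 (Exif/XMP), APP13 (IPTC), COM (comments).
METADATA_MARKERS = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG byte string without its metadata segments."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG stream")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("corrupt JPEG: expected a marker")
        marker = data[pos + 1]
        if marker == 0xFF:  # optional fill byte, safe to drop
            pos += 1
            continue
        if marker == 0xDA:  # SOS: the rest is image data, copy it untouched
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError("corrupt JPEG: bad segment length")
        end = pos + 2 + length
        if marker not in METADATA_MARKERS:
            out += data[pos:end]  # keep JFIF header, quantization tables, etc.
        pos = end
    out += data[pos:]  # scan data and EOI marker
    return bytes(out)
```

Because only header segments are dropped, the pixel data is never re-encoded, so image quality is unaffected. The trade-off is that metadata stored in a segment type outside `METADATA_MARKERS` would slip through.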

@Saijin-Naib
Author

That sounds amazing for going forward, but can a task be made to run at least once to scrub all extant media?

@timothyjrogers
Contributor

Yes, I agree that's a necessary second part to this. I haven't done enough homework yet to know the best way to implement it.

@hughrun
Contributor

hughrun commented Mar 24, 2025

If it's a one-off, a management command would probably be the simplest way to implement it. I imagine it could be pretty memory-intensive, though, so triggering a low-priority Celery task per image might be the safest option.
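
The memory concern can be reasoned about with a small sketch of the one-image-at-a-time pattern. It is only an outline under assumptions: covers are taken to live on the local filesystem (an instance using object storage would need a different loop), `scrub_tree` is a hypothetical name, and the `scrub` argument stands in for whatever metadata-stripping function ends up being used. In a management command or a per-image Celery task, the body of the loop is what would run for each image.

```python
from pathlib import Path
from typing import Callable, Iterator


def scrub_tree(
    root: Path,
    scrub: Callable[[bytes], bytes],
    pattern: str = "*.jpg",
) -> Iterator[Path]:
    """Scrub matching files one at a time, yielding each path that changed."""
    for path in sorted(root.rglob(pattern)):
        original = path.read_bytes()  # only one image is held in memory
        cleaned = scrub(original)
        if cleaned != original:  # leave already-clean files untouched
            path.write_bytes(cleaned)
            yield path
```

Since the function is a generator, nothing is processed until it is iterated, for example with `for changed in scrub_tree(Path("images/covers"), scrub): print(changed)`, where the directory name is a placeholder.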

@Saijin-Naib
Author

exiv2 is super light on resources, so it should not tax a system much.

3 participants