Performance: pre-filter document list in scheduled workflow checks #10031

Open
wants to merge 2 commits into
base: dev
from

shamoon
Member

@shamoon shamoon commented May 24, 2025

Proposed change

Currently, scheduled workflow checks run document_matches_workflow against every document that passes the date filter. This works, but it can be very slow (see the linked thread). Instead, we can pre-filter the documents to reduce the number that get the full check. The only downside is some redundancy, but I think that is also an advantage: the pre-filter is just a first pass, and the later document_matches_workflow check still runs.
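
To illustrate the two-pass idea described above, here is a minimal pure-Python sketch. It is not the code in this PR: `Doc`, `Trigger`, `prefilter`, `full_match` and `matching_docs` are hypothetical stand-ins, and in a real deployment the first pass would presumably be a database query rather than a list comprehension. The key property is that the cheap pass only drops documents the full check would reject anyway, so the final result is unchanged while fewer expensive checks run.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class Doc:
    # Hypothetical stand-in for a stored document
    filename: str
    doc_type: str


@dataclass
class Trigger:
    # Hypothetical stand-in for a workflow trigger's filters
    doc_type: str
    filename_glob: str


full_check_calls = 0  # counts how often the expensive check runs


def full_match(doc: Doc, trigger: Trigger) -> bool:
    """Stand-in for the expensive, authoritative per-document check."""
    global full_check_calls
    full_check_calls += 1
    return doc.doc_type == trigger.doc_type and fnmatch(
        doc.filename.lower(), trigger.filename_glob.lower()
    )


def prefilter(docs, trigger):
    """Cheap first pass: drop only documents the full check would reject."""
    return [d for d in docs if d.doc_type == trigger.doc_type]


def matching_docs(docs, trigger):
    """Run the full check only on documents that survive the first pass."""
    return [d for d in prefilter(docs, trigger) if full_match(d, trigger)]


docs = [
    Doc("invoice_01.pdf", "invoice"),
    Doc("invoice_02.PDF", "invoice"),
    Doc("letter.pdf", "letter"),
    Doc("notes.txt", "invoice"),
]
trigger = Trigger(doc_type="invoice", filename_glob="*.pdf")
result = matching_docs(docs, trigger)
```

Because the full check repeats the document-type comparison, the first pass is redundant by design: if it were removed or failed to narrow anything, `matching_docs` would return the same documents, only more slowly.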

In particular, the filename check, which later runs with fnmatch, is simplified to a regex. In the worst case there is no regex support in SQLite and the pre-filter would silently fail, but the later check would still run.
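
As a rough illustration of turning a glob into a regex for a first pass, the sketch below uses the standard library's `fnmatch.translate`. This is an assumption for demonstration only: the PR's actual regex construction is not shown here, and a pattern produced by `fnmatch.translate` targets Python's `re` module, so it may not be accepted by a database regex engine. `glob_to_regex` and `prefilter_filenames` are hypothetical names.

```python
import re
from fnmatch import fnmatch, translate


def glob_to_regex(glob: str) -> re.Pattern:
    """Compile a shell-style glob into a case-insensitive regex."""
    # fnmatch.translate returns a regex string intended for re.match
    return re.compile(translate(glob), re.IGNORECASE)


def prefilter_filenames(filenames, glob):
    """Hypothetical first pass: keep the names the regex accepts.

    If the regex cannot be built, keep everything so that the later
    full check still sees every candidate.
    """
    try:
        pattern = glob_to_regex(glob)
    except re.error:
        return list(filenames)
    return [name for name in filenames if pattern.match(name)]


names = ["Invoice_2025.pdf", "scan.PDF", "notes.txt", "report.pdf.bak"]
candidates = prefilter_filenames(names, "*.pdf")

# Stand-in for the later authoritative check, which still runs on every candidate
confirmed = [n for n in candidates if fnmatch(n.lower(), "*.pdf")]
```

The fallback branch mirrors the reasoning above: when the first pass cannot narrow the list, nothing is lost except speed, because the second pass makes the final decision.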

Hopefully I didn't miss anything here. Feedback is welcome, of course.

See #10012 (8k docs, initially > 30 min --> ~10 seconds)

Type of change

  • Bug fix: non-breaking change which fixes an issue.
  • New feature / Enhancement: non-breaking change which adds functionality. Please read the important note above.
  • Breaking change: fix or feature that would cause existing functionality to not work as expected.
  • Documentation only.
  • Other. Please explain:

Checklist:

  • I have read & agree with the contributing guidelines.
  • If applicable, I have included testing coverage for new code in this PR, for backend and / or front-end changes.
  • If applicable, I have tested my code for new features & regressions on both mobile & desktop devices, using the latest version of major browsers.
  • If applicable, I have checked that all tests pass, see documentation.
  • I have run all pre-commit hooks, see documentation.
  • I have made corresponding changes to the documentation as needed.
  • I have checked my modifications for any breaking changes.

@shamoon shamoon requested a review from a team as a code owner May 24, 2025 23:59
@github-actions github-actions bot added the enhancement (New feature or enhancement), backend, and non-trivial (Requires approval by several team members) labels May 24, 2025
@shamoon shamoon changed the title from "Enhancement: prefilter scheduled workflows" to "Enhancement: pre-filter document list in scheduled workflow checks" May 24, 2025
codecov bot commented May 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.80%. Comparing base (eb07876) to head (d67326f).
Report is 11 commits behind head on dev.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff            @@
##              dev   #10031    +/-   ##
========================================
  Coverage   97.80%   97.80%            
========================================
  Files         512      512            
  Lines       22005    22021    +16     
  Branches     1845     1720   -125     
========================================
+ Hits        21522    21538    +16     
- Misses        481      483     +2     
+ Partials        2        0     -2     
Components Coverage Δ
backend 96.66% <100.00%> (+<0.01%) ⬆️
frontend 99.07% <ø> (ø)
Flag Coverage Δ
backend-python-3.10 96.66% <100.00%> (+<0.01%) ⬆️
backend-python-3.11 96.66% <100.00%> (+<0.01%) ⬆️
backend-python-3.12 96.66% <100.00%> (+<0.01%) ⬆️
frontend-node-20.x 99.07% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/documents/matching.py 97.68% <100.00%> (+0.18%) ⬆️
src/documents/tasks.py 97.46% <100.00%> (+0.03%) ⬆️

... and 2 files with indirect coverage changes

@shamoon shamoon changed the title from "Enhancement: pre-filter document list in scheduled workflow checks" to "Performance: pre-filter document list in scheduled workflow checks" May 25, 2025