"Clear cache and re-download language data files" not working #12312

ericjacolin · 2025-05-18T01:20:34Z

Operating system

Linux

Joplin version

3.3.12

Desktop version info

Joplin 3.3.12 (prod, linux)

Device: linux, 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
Client ID: c9d562b412294cb3a9cfa76b0c20e0ed
Sync Version: 3
Profile Version: 47
Keychain Supported: No
Alternative instance ID: -

Revision: 4d790b6

Backup: 1.4.3
Csv Import: 1.0.1
Freehand Drawing: 3.0.1
Outline: 1.5.13
Search & Replace: 2.2.0

Current behaviour

I activated OCR for the first time. The scan worked (about 600 images), but only in English.
I have a lot of Chinese content, so I added the Tesseract chi_sim language file (as well as French) following the procedure described in https://joplinapp.org/help/apps/ocr/
I tried "Clear cache and re-download language data files", but this had no effect.
The cached language files (English only) are still there.

Expected behaviour

I would expect the cache to clear and rebuild with the new set of languages.

Logs

No response

personalizedrefrigerator · 2025-05-19T14:31:56Z

Thank you for reporting this!

For comparison, "Clear cache and re-download language data files" seems to work for me (on Fedora 42 Linux, with Joplin 3.3.12 (dev)):

Shows a confirmation dialog.
After pressing OK, restarts Joplin.

Logs the following:

OcrDriverTesseract: Clearing cached language data...
OcrDriverTesseract: Clearing language data with key ./eng.traineddata

At this point, inspecting the keyval-store IndexedDB table (using the "Application" tab in the development tools) shows that the cached ./eng.traineddata model is no longer present.

Follow-up questions:

After clicking "Clear cache and re-download language data files", is "OcrDriverTesseract: Clearing language data with key ./eng.traineddata" added to Joplin's log file?
- Logs can either be accessed from Joplin's development tools (Help > Toggle developer tools, includes recent logs) or from the log.txt file in the profile directory (Help > Open profile directory, includes older logs).
- If there is an error, it should log "OCR: Failed to clear language data cache." followed by the error message.
Are new images (not scanned previously) processed using the new OCR models?
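
To make the first follow-up question easier to answer, here is a minimal sketch of how the contents of log.txt could be checked for the two messages quoted above. The helper name `summarizeCacheClearLog` and its return shape are hypothetical and are not part of Joplin; only the two message strings come from this thread.

```typescript
// Hypothetical helper (not part of Joplin): summarizes what the OCR
// cache-clearing step wrote to the log. Only the two message strings
// below are taken from this thread; everything else is an assumption.
interface CacheClearSummary {
	// Keys reported as cleared, e.g. "./eng.traineddata"
	clearedKeys: string[];
	// True if the failure message appears anywhere in the log
	failed: boolean;
}

const clearedPrefix = 'OcrDriverTesseract: Clearing language data with key ';
const failureMessage = 'OCR: Failed to clear language data cache.';

function summarizeCacheClearLog(logText: string): CacheClearSummary {
	const clearedKeys: string[] = [];
	let failed = false;

	for (const line of logText.split(/\r?\n/)) {
		// Log lines may carry a timestamp or level prefix, so search
		// for the message anywhere in the line.
		const index = line.indexOf(clearedPrefix);
		if (index >= 0) {
			clearedKeys.push(line.slice(index + clearedPrefix.length).trim());
		}
		if (line.includes(failureMessage)) {
			failed = true;
		}
	}

	return { clearedKeys, failed };
}
```

An empty `clearedKeys` list with `failed` set to false would suggest the clearing step never ran, which is a different problem from the step running and failing.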

Note

"Clear cache and re-download language data files" deletes the local eng.traineddata (and other) models, but does not re-OCR existing attachments.

personalizedrefrigerator · 2025-05-19T16:16:23Z

Proposed changes:

(Maybe) Get rid of "Clear cache and re-download language data files". Instead, automatically remove the models for non-active languages (e.g. after a week or two).
- We could also base this on available disk space.
Add a new option, "Re-OCR all attachments", maybe with a prompt. This would be shown when the user changes the OCR URL.

ericjacolin · 2025-05-20T07:39:28Z

Thanks @personalizedrefrigerator this is very helpful.
Inspecting IndexedDB, I see that after clearing languages, the only language that Joplin reloads from the local traineddata folder is eng. It will not load chi_sim or fra.

personalizedrefrigerator · 2025-05-20T18:50:24Z

> Inspecting IndexedDB, I see that after clearing languages, the only language that Joplin reloads from the local traineddata folder is eng. It will not load chi_sim or fra.

Thank you for the additional information!

Is the UI language set to English? If so, changing Joplin's UI language in Settings > General might help. Currently, Joplin uses the global locale setting to select the OCR language:

joplin/packages/lib/services/ocr/OcrService.ts

Line 154 in 88c95cc

const language = toIso639Alpha3(Setting.value('locale'));

As a result, if the UI language is set to English, the OCR service will try to use an eng language model.
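
As a rough illustration of why only one model ends up being loaded, the sketch below maps a locale string to a single three-letter language code. It is not Joplin's `toIso639Alpha3` implementation; the function name `localeToOcrLanguage` and the two-entry lookup table are assumptions made for this example.

```typescript
// Illustrative only: this is NOT Joplin's toIso639Alpha3 implementation.
// The table is deliberately tiny and hypothetical.
const alpha2ToAlpha3: Record<string, string> = {
	en: 'eng',
	fr: 'fra',
};

// Maps a UI locale such as "en_GB" to one three-letter language code.
// Returns undefined when the language part is not in the table.
function localeToOcrLanguage(locale: string): string | undefined {
	// Keep only the language part before "_" or "-".
	const languagePart = locale.split(/[_-]/)[0].toLowerCase();
	return alpha2ToAlpha3[languagePart];
}

const selected = localeToOcrLanguage('en_GB');
console.log(selected);
```

Because the result is derived from one locale value, a profile whose UI language is English resolves to a single code, regardless of which other traineddata files are available locally.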

ericjacolin · 2025-05-21T09:37:08Z

Thanks. I may have misunderstood the purpose of the feature.
My locale is indeed English, and I was expecting to be able to scan documents in both English and Simplified Chinese, using tesseract -l eng+chi_sim behind the scenes.
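
For reference, the Tesseract command line accepts several languages joined with "+", as in the `-l eng+chi_sim` example above. The sketch below only shows how such a language string could be built from a list; the helper name `buildTesseractLanguageSpec` is hypothetical, and nothing in this thread indicates that Joplin exposes or uses such an option.

```typescript
// Hypothetical helper: builds the value passed to Tesseract's -l option
// from a list of language names. Not part of Joplin.
function buildTesseractLanguageSpec(languages: string[]): string {
	// Trim entries, drop empty ones, and remove duplicates while
	// preserving the original order.
	const cleaned = languages
		.map(language => language.trim())
		.filter(language => language.length > 0);
	const unique = Array.from(new Set(cleaned));
	return unique.join('+');
}

const spec = buildTesseractLanguageSpec(['eng', 'chi_sim']);
console.log(`tesseract input.png output -l ${spec}`);
```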

ericjacolin added the bug label May 18, 2025

laurent22 added the desktop and high labels May 19, 2025

laurent22 assigned personalizedrefrigerator May 19, 2025
