Non-English texts still detected after fine-tuning text detection model #1906
4 comments · 22 replies
-
Hi @Yuvaraj-off 👋,

In general I don't think this will work this way - our dataset for the detection pre-training also contains only Latin text, but the model learns really well to generalize to almost any kind of text. Keep in mind these are all CNN-based architectures, so there is no "textual understanding". It could maybe work by providing negative samples where non-English text appears in the image but only the English text is annotated - but no guarantee.

Have you fine-tuned our model or trained from scratch?

Best,
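To illustrate the negative-sample idea: keep images containing non-English text in the training set, but strip the non-English polygons from the labels so those regions act as background. A minimal sketch - the `is_english` heuristic and the annotation layout are assumptions, not docTR's label format:

```python
def is_english(word: str) -> bool:
    # Crude heuristic: every character is ASCII. Substitute your
    # annotation tool's language tags if you have them.
    return word.isascii()

def filter_labels(annotations):
    """Keep only English-annotated polygons per image.

    annotations: {image_name: [(word, polygon), ...]}
    returns:     {image_name: [polygon, ...]}
    """
    out = {}
    for name, words in annotations.items():
        out[name] = [poly for word, poly in words if is_english(word)]
    return out
```

Images whose list ends up shorter than the original still contribute: the unannotated text regions become negative evidence during training.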
-
Thanks for the quick response and insights, @felixdittrich92! We've trained our model from scratch and are now focusing on using our own dataset annotated exclusively for English text. We'll update you on the results once we test this approach.

Additionally, we're exploring the possibility of using a YOLO model for text detection. Would it be feasible to integrate a YOLO-based text detection model with our existing docTR pipeline? We'd love to hear your insights on this!
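An external detector can plausibly feed docTR because docTR exposes a standalone `recognition_predictor` that accepts a list of word crops. A rough sketch, assuming the `ultralytics` package for YOLO and absolute `xyxy` pixel boxes; the weight filename is a placeholder:

```python
import numpy as np

def crops_from_boxes(img, boxes):
    """Cut word crops out of an image given absolute (x1, y1, x2, y2) boxes,
    clamping to image bounds and dropping degenerate boxes."""
    h, w = img.shape[:2]
    crops = []
    for x1, y1, x2, y2 in boxes:
        x1, y1 = max(int(x1), 0), max(int(y1), 0)
        x2, y2 = min(int(x2), w), min(int(y2), h)
        if x2 > x1 and y2 > y1:
            crops.append(img[y1:y2, x1:x2])
    return crops

def run_pipeline(image_path):
    # Heavy imports kept inside so the helper above stays dependency-free.
    from ultralytics import YOLO              # assumption: YOLO as text detector
    from doctr.models import recognition_predictor

    detector = YOLO("yolo_text_det.pt")       # hypothetical fine-tuned weights
    reco = recognition_predictor(pretrained=True)

    result = detector(image_path)[0]
    img = result.orig_img                     # numpy array kept by ultralytics
    boxes = result.boxes.xyxy.cpu().numpy()
    crops = crops_from_boxes(img, boxes)
    return reco(crops)                        # list of (word, confidence) pairs
```

You lose docTR's built-in page reconstruction this way, so you'd need to sort the boxes into reading order yourself if that matters downstream.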
-
Hi @felixdittrich92,

We've been working in parallel on retraining the db_mobilenet_v3_large model using our updated dataset, where we re-annotated only the English texts. The initial results on straight (non-rotated) images have been promising. However, when we tested the model on rotated images, the performance dropped significantly - it began detecting all texts regardless of language, and even included a lot of noise/junk.

To address this, we'd like to retrain the model using rotated versions of our dataset as well. Could you let us know if there are recommended approaches or existing options for augmenting our dataset with rotated images (while keeping the annotations aligned correctly)?

Thanks in advance!
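On rotation augmentation: docTR's reference detection script ships a rotation option (if I remember correctly a `--rotation` flag; worth confirming with `--help` on your version). If you prefer to pre-generate rotated copies yourself, the key point is to apply the exact same transform to the polygon points as to the pixels. A minimal NumPy sketch for 90° clockwise rotations - the function name and conventions are mine, not docTR API:

```python
import numpy as np

def rotate90_cw(img, polys):
    """Rotate an image 90° clockwise and remap polygon points to match.

    img:   (H, W[, C]) array
    polys: list of (N, 2) arrays of absolute (x, y) points
    """
    h = img.shape[0]
    out_img = np.rot90(img, k=-1).copy()  # k=-1 -> clockwise
    out_polys = []
    for poly in polys:
        poly = np.asarray(poly, dtype=float)
        x, y = poly[:, 0], poly[:, 1]
        # A pixel at (x, y) lands at (H - 1 - y, x) after a clockwise turn.
        out_polys.append(np.stack([h - 1 - y, x], axis=1))
    return out_img, out_polys
```

Arbitrary angles work the same way (rotate the image, multiply each point by the same rotation matrix), but then you also have to handle the enlarged canvas and clip polygons that leave the frame, which is where most alignment bugs creep in.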
-
Hi @felixdittrich92,

Thank you for your detailed guidance on retraining the text detection model - it was incredibly helpful! I'm now planning to move forward with fine-tuning the recognition model (CRNN). Before I begin, I have a couple of quick questions:

• Training dataset: Could you share which datasets were originally used to train the CRNN model included in docTR?
• Image augmentations: For fine-tuning, I'd like to customize certain image transformations or augmentations. Is there built-in support in the training scripts or documentation that allows easy modification of these pre-processing steps?

Appreciate your help in advance!
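On the augmentation question: docTR's reference recognition script builds its pre-processing pipeline inside references/recognition/train_pytorch.py, so that file is the natural place to swap transforms in or out (the exact pipeline varies by version). As a stand-in illustration only, here is a minimal, dependency-free sketch of the kind of augmentation chain you might plug in - function names and parameters are assumptions, not docTR API:

```python
import numpy as np

rng = np.random.default_rng(0)

def jitter_brightness(img, max_delta=0.2):
    # img: float array in [0, 1]; shift all pixels by a random delta.
    delta = rng.uniform(-max_delta, max_delta)
    return np.clip(img + delta, 0.0, 1.0)

def add_gaussian_noise(img, std=0.05):
    # Per-pixel Gaussian noise, clipped back into [0, 1].
    return np.clip(img + rng.normal(0.0, std, img.shape), 0.0, 1.0)

def augment(img):
    # Chain the transforms, mirroring how a Compose-style pipeline applies
    # each step in sequence.
    for fn in (jitter_brightness, add_gaussian_noise):
        img = fn(img)
    return img
```

In practice you would express the same chain with the training script's own transform objects rather than raw NumPy, so that resizing and normalization stay consistent with what the model saw at pre-training time.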
-
We fine-tuned db_mobilenet_v3_large using the pdfa-eng-wds dataset to detect only English text, as suggested in issue #1564. Although we only trained for 10 epochs and our accuracy isn't the best yet, we expected the detections to be limited to English text; instead, the model still detects non-English text and other junk. Are we missing something here?
Steps Taken
• Used db_mobilenet_v3_large as the base model and trained it on pdfa-eng-wds.
• The training results are below:
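For reference, the fine-tuning run described above maps onto docTR's reference detection script. A rough command sketch - the dataset paths are placeholders and flag names can differ between docTR versions, so check the script's `--help`:

```shell
# Fine-tune the pre-trained detection model on an English-only dataset.
# Paths are placeholders; verify flag names against --help.
python references/detection/train_pytorch.py db_mobilenet_v3_large \
  --train_path /data/pdfa-eng-wds/train \
  --val_path /data/pdfa-eng-wds/val \
  --epochs 10 \
  --pretrained
```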