[prototype] object det replacement / init contrib modules #1534

Merged
merged 22 commits into from
Apr 25, 2024

Conversation

felixdittrich92
Contributor

@felixdittrich92 felixdittrich92 commented Mar 28, 2024

NOTE: Only a dirty first draft for discussions :)

The idea behind a contrib module:

  • easy to extend with further functionalities, for example: doctr++ | UVDoc / Table Transformer / super resolution / etc.
  • 3 main requirements:
    -- modules need to work with the DocumentBuilder output structure (List[np.ndarray])
    -- framework independent (no torch or tf allowed)
    -- if model loading is required --> onnxruntime

Further advantages on our side:

  • Less code to maintain
  • Full focus on OCR quality
  • Maybe also a cool new place for contributors without the need to step deep into doctr's core

A first service for object detection which replaces the current one (which is neither really maintained nor robust) 😅

  • trained with yolov8 (artefact detection dataset) --> training outsourced to ultralytics -> easy to replace with any yolov8 (no OBB atm) trained model (exported to onnx)
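
For anyone swapping in their own YOLOv8 export: such models usually return raw candidate boxes, so the detector still has to filter by confidence and suppress overlapping boxes. The sketch below shows plain non-maximum suppression in pure Python, as one plausible reading of what the conf_threshold and iou_threshold arguments in the example below control. It is not the code from this PR; the names iou and nms and the (xmin, ymin, xmax, ymax) box layout are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed box layout: (xmin, ymin, xmax, ymax) in pixels
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(
    boxes: List[Box],
    scores: List[float],
    conf_threshold: float,
    iou_threshold: float,
) -> List[int]:
    """Return the indices of the kept boxes, highest score first."""
    # Step 1: drop candidates below the confidence threshold, sort the rest by score
    order = sorted(
        (i for i, score in enumerate(scores) if score >= conf_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep: List[int] = []
    for i in order:
        # Step 2: keep a box only if it does not overlap too much with a kept one
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

A higher conf_threshold yields fewer but more certain detections, while a lower iou_threshold suppresses overlapping boxes more aggressively.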

example:

import os

from doctr.contrib import ArtefactDetector
from doctr.io import DocumentFile

root = "/home/felix/Desktop/doctr_test_data"
image_paths = [
    os.path.join(root, "6.jpg"),
    os.path.join(root, "7.jpg"),
    "/home/felix/Desktop/5a89cd6d989803e2.jpg",
]

# Load the pages as a list of numpy arrays
doc = DocumentFile.from_images(image_paths)

detector = ArtefactDetector(batch_size=2, conf_threshold=0.9, iou_threshold=0.5)

# One list of detections per page
res = detector(doc)
detector.show()  # visualize the detections
print(res)

out:

[[], [], [{'label': 'photo', 'confidence': 0.9760028, 'box': [665, 194, 767, 319]}, {'label': 'logo', 'confidence': 0.97400737, 'box': [158, 745, 275, 862]}, {'label': 'photo', 'confidence': 0.9690048, 'box': [13, 790, 131, 933]}, {'label': 'bar_code', 'confidence': 0.94711626, 'box': [647, 425, 795, 469]}, {'label': 'logo', 'confidence': 0.94670904, 'box': [314, 9, 372, 67]}, {'label': 'qr_code', 'confidence': 0.94380593, 'box': [368, 691, 417, 740]}, {'label': 'bar_code', 'confidence': 0.9333756, 'box': [291, 813, 339, 850]}, {'label': 'bar_code', 'confidence': 0.93306136, 'box': [391, 108, 683, 151]}]]
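
The result holds one list per page, and each detection is a dict with label, confidence and box keys. As a minimal sketch of consuming that structure, the snippet below groups the boxes of each page by label. group_by_label and min_confidence are hypothetical names for illustration and are not part of doctr.contrib; the sample data is a subset of the output above.

```python
from collections import defaultdict
from typing import Any, Dict, List

Detection = Dict[str, Any]


def group_by_label(
    pages: List[List[Detection]],
    min_confidence: float = 0.0,
) -> List[Dict[str, List[List[int]]]]:
    """Group the boxes of each page by label, skipping low-confidence hits."""
    grouped = []
    for detections in pages:
        per_label = defaultdict(list)
        for det in detections:
            if det["confidence"] >= min_confidence:
                per_label[det["label"]].append(det["box"])
        grouped.append(dict(per_label))
    return grouped


# Subset of the output shown above: two empty pages and one page with hits
res = [
    [],
    [],
    [
        {"label": "photo", "confidence": 0.9760028, "box": [665, 194, 767, 319]},
        {"label": "logo", "confidence": 0.97400737, "box": [158, 745, 275, 862]},
        {"label": "qr_code", "confidence": 0.94380593, "box": [368, 691, 417, 740]},
    ],
]

for page_idx, labels in enumerate(group_by_label(res, min_confidence=0.95)):
    print(page_idx, labels)
```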

visualized:
Screenshot from 2024-03-28 11-39-11

TODOS would be:

  • refactor code (each contrib module should inherit from BasePredictor in base.py, which provides __call__; every contrib module needs to implement _process and _preprocess)
  • docs
  • tests
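
To make the refactoring item above more concrete, here is a rough sketch of what such a base class could look like: the base owns the call flow, and each contrib module fills in the two steps. The method names follow the TODO (_preprocess, _process); the constructor, the batching logic and the toy PageSizeModule are assumptions for illustration, and the base.py that ends up in the repository may look different. Nested lists stand in for np.ndarray pages so that the sketch runs without extra dependencies.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class BasePredictor(ABC):
    """Hypothetical base class: owns the call flow, subclasses fill in the steps."""

    def __init__(self, batch_size: int = 2) -> None:
        self.batch_size = batch_size

    @abstractmethod
    def _preprocess(self, page: Any) -> Any:
        """Turn one page into the input expected by _process."""

    @abstractmethod
    def _process(self, batch: List[Any]) -> List[Any]:
        """Return one result per page in the batch."""

    def __call__(self, pages: List[Any]) -> List[Any]:
        prepared = [self._preprocess(page) for page in pages]
        results: List[Any] = []
        # Feed the prepared pages to _process in chunks of batch_size
        for start in range(0, len(prepared), self.batch_size):
            results.extend(self._process(prepared[start:start + self.batch_size]))
        return results


class PageSizeModule(BasePredictor):
    """Toy model-free module used only to exercise the base class."""

    def _preprocess(self, page: Any) -> Any:
        return page

    def _process(self, batch: List[Any]) -> List[Any]:
        # Report (rows, columns) for each page given as a nested list
        return [(len(page), len(page[0]) if page else 0) for page in batch]
```

A real module such as the artefact detector would load its ONNX model in the constructor and run inference inside _process.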

@felixdittrich92
Contributor Author

@odulcy-mindee wdyt about the idea ?

@felixdittrich92 felixdittrich92 changed the title [prototype] object det replacement / init contrib modules [DRAFT][prototype] object det replacement / init contrib modules Mar 28, 2024

codecov bot commented Apr 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.84%. Comparing base (f21ac32) to head (d746593).
Report is 2 commits behind head on main.

❗ Current head d746593 differs from pull request most recent head 4800e59. Consider uploading reports for the commit 4800e59 to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1534       +/-   ##
===========================================
+ Coverage   57.70%   95.84%   +38.14%     
===========================================
  Files         167      163        -4     
  Lines        7696     7705        +9     
===========================================
+ Hits         4441     7385     +2944     
+ Misses       3255      320     -2935     
Flag Coverage Δ
unittests 95.84% <100.00%> (+38.14%) ⬆️


@felixdittrich92
Contributor Author

About the Artefact element in the DocumentBuilder: I'm not sure whether we should keep it for possible later use (as a kind of "add to the OCR results") or remove it (it is currently always empty and unused).
In general it makes more sense to encapsulate the object detection (and other possible pre-proc modules as mentioned) from the OCR pipeline to interact directly with the DocumentFile output.
As an example: the object detection pre-step could now be used to mask detected parts of an image before it is passed to the OCR pipeline, so that irrelevant regions such as photos that contain text are ignored.
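
A minimal sketch of that masking idea, assuming each box is [xmin, ymin, xmax, ymax] in absolute pixels (this matches the look of the example output but is not confirmed in this thread). mask_artefacts, its labels default and fill_value are hypothetical and not part of doctr.

```python
from typing import Any, Dict, Iterable, List

import numpy as np


def mask_artefacts(
    page: np.ndarray,
    detections: List[Dict[str, Any]],
    labels: Iterable[str] = ("photo", "logo"),
    fill_value: int = 255,
) -> np.ndarray:
    """Return a copy of the page with the selected artefact boxes painted over."""
    masked = page.copy()
    height, width = masked.shape[:2]
    wanted = set(labels)
    for det in detections:
        if det["label"] not in wanted:
            continue
        # Assumed box order: xmin, ymin, xmax, ymax
        xmin, ymin, xmax, ymax = det["box"]
        # Clip to the page so that out-of-range boxes cannot wrap around
        xmin, xmax = max(0, xmin), min(width, xmax)
        ymin, ymax = max(0, ymin), min(height, ymax)
        masked[ymin:ymax, xmin:xmax] = fill_value
    return masked
```

With the objects from the example in the PR description, this could be applied per page as masked_doc = [mask_artefacts(page, dets) for page, dets in zip(doc, res)] before handing masked_doc to the OCR predictor.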

@felixdittrich92 felixdittrich92 added topic: documentation Improvements or additions to documentation topic: ci Related to CI ext: tests Related to tests folder topic: onnx ONNX-related topic: object detection Related to the task of object detection type: new feature New feature ext: docs Related to docs folder labels Apr 12, 2024
@felixdittrich92 felixdittrich92 requested a review from frgfm April 13, 2024 14:57
@felixdittrich92 felixdittrich92 marked this pull request as ready for review April 15, 2024 17:53
@felixdittrich92 felixdittrich92 changed the title [DRAFT][prototype] object det replacement / init contrib modules [prototype] object det replacement / init contrib modules Apr 15, 2024
@felixdittrich92 felixdittrich92 marked this pull request as draft April 15, 2024 18:15
@odulcy-mindee
Contributor

> In general it makes more sense to encapsulate the object detection (and other possible pre-proc modules as mentioned) from the OCR pipeline to interact directly with the DocumentFile output.

So, is the purpose of contrib to always wrap a model and run predictions? Or can a module be only a postprocessing function?

@felixdittrich92
Contributor Author

> > In general it makes more sense to encapsulate the object detection (and other possible pre-proc modules as mentioned) from the OCR pipeline to interact directly with the DocumentFile output.
>
> So, is the purpose of contrib to always wrap a model and run predictions? Or can a module be only a postprocessing function?

The idea was to make it compatible with the DocumentFile output, so in this case it's only preprocessing.
And yes, in the current state each contrib module would need an ONNX model. Do you have something in mind where it would make sense to have a module without a model? (It's a prototype, so I'm open to every idea :D)

@felixdittrich92 felixdittrich92 marked this pull request as ready for review April 23, 2024 05:59
@odulcy-mindee
Contributor

odulcy-mindee commented Apr 23, 2024

> The idea was to make it compatible with the DocumentFile output, so in this case it's only preprocessing.
> And yes, in the current state each contrib module would need an ONNX model. Do you have something in mind where it would make sense to have a module without a model? (It's a prototype, so I'm open to every idea :D)

Ok, for a first version, a model is required. The code can be improved later to make the ONNX model optional.

@@ -155,6 +158,7 @@ module = [
"anyascii.*",
"tensorflow.*",
"torchvision.*",
"onnxruntime.*",
Contributor

not aligned

Contributor Author

That's a GitHub view bug, see:
Screenshot from 2024-04-24 05-12-27

@felixdittrich92
Contributor Author

> > The idea was to make it compatible with the DocumentFile output, so in this case it's only preprocessing.
> > And yes, in the current state each contrib module would need an ONNX model. Do you have something in mind where it would make sense to have a module without a model? (It's a prototype, so I'm open to every idea :D)
>
> Ok, for a first version, a model is required. The code can be improved later to make the ONNX model optional.

Correct :)

Contributor

@odulcy-mindee odulcy-mindee left a comment

model uploaded

@odulcy-mindee odulcy-mindee merged commit 630d925 into mindee:main Apr 25, 2024
70 of 78 checks passed
@felixdittrich92 felixdittrich92 deleted the obj-prototype branch August 30, 2024 11:31
Labels
ext: docs Related to docs folder ext: tests Related to tests folder topic: ci Related to CI topic: documentation Improvements or additions to documentation topic: object detection Related to the task of object detection topic: onnx ONNX-related type: new feature New feature

2 participants