Memory Leak in docTR API Integration with FastAPI #1918
Replies: 15 comments
-
Hi @volkanncicek 👋, thanks for reporting, I will have a look 👍
Btw., keep in mind that our provided API code is only a reference; it's nothing you should use in a production system as is! Especially the …
-
I wasn't able to see any leak here.

To reproduce, start the official docTR API container locally (the script below expects it as `api_web` on `localhost:8080`), then run:
```python
import docker
import requests
import time
import matplotlib.pyplot as plt
from tqdm import tqdm

API_URL = "http://localhost:8080/ocr"
headers = {"accept": "application/json"}
params = {"det_arch": "db_resnet50", "reco_arch": "crnn_vgg16_bn"}

with open('/home/felix/Desktop/20250301_123152657.jpg', 'rb') as f:
    file_content = f.read()
files = [("files", ("20250301_123152657.jpg", file_content, "image/jpeg"))]


def get_docker_memory(container_name="api_web"):
    """Fetch live memory usage of a Docker container."""
    client = docker.DockerClient(base_url='unix://var/run/docker.sock')
    container = client.containers.get(container_name)
    stats = container.stats(stream=False)
    mem_usage = stats["memory_stats"]["usage"] / (1024 * 1024)  # Convert to MB
    return round(mem_usage, 2)


def send_requests(n_requests=50, container_name="api_web"):
    """Send multiple requests and monitor Docker container memory over time."""
    session = requests.Session()
    response_times = []
    memory_usage = []
    timestamps = []

    initial_mem_usage = get_docker_memory(container_name)
    memory_usage.append(initial_mem_usage)
    timestamps.append(time.time())

    print("🚀 Starting API Stress Test...\n")
    for _ in tqdm(range(n_requests), desc="Sending Requests"):
        mem_usage = get_docker_memory(container_name)
        memory_usage.append(mem_usage)
        timestamps.append(time.time())

        start_time = time.time()
        response = session.post(API_URL, headers=headers, params=params, files=files)
        response_times.append(time.time() - start_time)

        if response.status_code != 200:
            print(f"Error: {response.status_code}, Response: {response.text}")

    print("\n🕒 Waiting 10 seconds for memory stabilization...\n")
    time.sleep(10)

    final_mem_usage = get_docker_memory(container_name)
    memory_usage.append(final_mem_usage)
    timestamps.append(time.time())

    print("\n📊 [ Docker Container Memory Usage ]")
    print(f"🔹 Max Memory Used: {max(memory_usage):.2f} MB")
    print(f"🔹 Avg Memory Used: {sum(memory_usage) / len(memory_usage):.2f} MB")
    print(f"🔹 Initial Memory: {initial_mem_usage:.2f} MB")
    print(f"🔹 Final Memory After 10s: {final_mem_usage:.2f} MB")

    print(f"\n⏱️ Average Response Time: {sum(response_times) / len(response_times):.3f} sec")
    print(f"🚀 Fastest Response Time: {min(response_times):.3f} sec")
    print(f"🐢 Slowest Response Time: {max(response_times):.3f} sec")

    plt.figure(figsize=(10, 5))
    plt.plot(timestamps, memory_usage, marker='o', linestyle='-', color='b', label="Memory Usage (MB)")
    plt.scatter([timestamps[0], timestamps[-1]], [memory_usage[0], memory_usage[-1]],
                color='red', s=100, label="Start & End Points")
    plt.xlabel("Time (s)")
    plt.ylabel("Memory Usage (MB)")
    plt.title(f"Memory Usage Over Time - {container_name}")
    plt.legend()
    plt.grid(True)
    plt.show()


send_requests(n_requests=500, container_name="api_web")
```

Besides, I tracked the docker stats live:
-
Btw., for prod scenarios I would suggest using: https://github.com/felixdittrich92/OnnxTR
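For context, OnnxTR is meant as a drop-in replacement for the docTR inference interface. A rough sketch of the swap, assuming the API mirrors docTR's (check the linked repository for the exact usage):

```python
from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor

# Same predictor interface as docTR, but backed by ONNX Runtime instead of
# a full PyTorch/TensorFlow install
predictor = ocr_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn")

doc = DocumentFile.from_images(["sample.jpg"])  # placeholder image path
result = predictor(doc)
print(result.export())
```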
-
Thanks, @felixdittrich92! I really appreciate the quick response. I'll take a look at ONNX.

The initial memory usage starts at 381 MB, but it keeps increasing as requests are processed. Ideally, the memory should be released at some point, but it isn't. In a Kubernetes environment this leads to the container hitting its memory limit and restarting; I don't have infinite memory. Did you observe similar behavior on your side?

The service starts with low memory usage, as expected. After 500 requests, memory usage increases but does not return to its initial state, which suggests that allocated memory is not being released properly. After waiting, I sent another 500 requests, and memory usage kept increasing, showing that the application is accumulating memory instead of freeing it.

Additionally, in our service we load the model into RAM only once and use a context manager when opening images.

Test Environment:
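To make the setup concrete, here is a minimal sketch of the kind of endpoint described above (model loaded once at startup, image opened through a context manager). The route name, file handling, and predictor configuration are illustrative assumptions, not the actual service code:

```python
import io

import numpy as np
from fastapi import FastAPI, UploadFile
from PIL import Image

from doctr.models import ocr_predictor

app = FastAPI()

# Load the model into RAM exactly once, when the worker starts
predictor = ocr_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn", pretrained=True)


@app.post("/ocr")
async def run_ocr(file: UploadFile):
    content = await file.read()
    # Open the upload via a context manager so the PIL handle is closed
    # as soon as the pixel data has been copied out
    with Image.open(io.BytesIO(content)) as img:
        page = np.asarray(img.convert("RGB"))
    result = predictor([page])
    return result.export()
```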
-
Hey @volkanncicek :) Yeah, I see, and no, while profiling there was nothing similar (the max run was 1000 requests). And yes, I run on Linux (it wouldn't be the first time that something is strange on Windows 😅). Yeah, give OnnxTR a try, I would be happy to get feedback 👍
-
Hey @felixdittrich92, Thanks for your insights! Out of curiosity, which OS and version are you running your tests on? Since I’m using Docker on Windows, I wonder if the difference in behavior might be OS-related. Looking forward to your thoughts! 👍
-
:)
-
@volkanncicek Have you had the chance to test if the same happens with OnnxTR on your machine?
-
Yes @felixdittrich92, I noticed the same behavior with OnnxTR: memory usage increases as the number of requests grows. I suspect that if you’re using a GPU, inference is handled there, reducing the load on main memory. However, in our case, since we’re processing OCR on the CPU, memory usage keeps increasing indefinitely and never stops. It seems there is a memory leak somewhere, because the memory isn’t being released properly.
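As a quick sanity check of where inference actually runs (a side note, not part of the original report), onnxruntime can list the execution providers available in the environment:

```python
import onnxruntime as ort

# A CPU-only install typically reports ['CPUExecutionProvider'];
# a GPU build would additionally list e.g. 'CUDAExecutionProvider'
print(ort.get_available_providers())
```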
-
Mh, the profiling was also done using only the CPU (an i7-14770K in my case) ... any chance that you could profile the code with memray and send me the created … ?
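For reference, memray can be driven from the CLI or from Python. A minimal sketch using its tracker context manager, with the profiled function and output file name as placeholders:

```python
import memray

# Allocation samples are written to a .bin capture that can later be rendered,
# e.g. with `memray flamegraph ocr_profile.bin`
with memray.Tracker("ocr_profile.bin"):
    run_ocr_workload()  # placeholder: the request-handling code path to profile
```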
-
Hey @felixdittrich92, we found a potential fix from this discussion: #1422. After applying the following environment variables, the memory leak issue seems to be resolved for now:

```bash
export DOCTR_MULTIPROCESSING_DISABLE=TRUE
export ONEDNN_PRIMITIVE_CACHE_CAPACITY=1
```

We're still testing to make sure everything is stable. We'll keep you updated if anything changes. Thanks again!
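For completeness, a minimal sketch of pinning the same variables from Python before docTR is imported (equivalent to setting them via ENV in the Dockerfile); the variable names are the ones above, the rest is an assumption:

```python
import os

# Set before importing doctr so both the library and the underlying
# oneDNN runtime pick the values up
os.environ["DOCTR_MULTIPROCESSING_DISABLE"] = "TRUE"
os.environ["ONEDNN_PRIMITIVE_CACHE_CAPACITY"] = "1"

from doctr.models import ocr_predictor  # noqa: E402

predictor = ocr_predictor(pretrained=True)
```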
-
Thanks for the update, yeah now I remember this discussion 😅
-
@volkanncicek Solved your issue? :)
-
Hi @felixdittrich92, Yes, it's solved. For two weeks, I monitored the API, and the issue hasn't occurred again. Thanks!
-
Nice 👍 Thanks for the feedback. I will convert this to a discussion so that other people who face the same issue can benefit from it :)
-
Bug description
I have integrated docTR into an API using the FastAPI framework to test its performance under heavy load. The API is deployed locally and is designed to handle OCR requests. However, I observed a significant memory issue during testing. When sending 100 consecutive OCR requests, the memory usage spikes and remains high even after all requests have been processed. I expected the memory usage to decrease back to its initial state once the processing was complete, but it did not. This behavior suggests a potential memory leak, which could lead to the application crashing due to excessive memory consumption.
Furthermore, I noticed that the docTR library's own API, as provided in their official documentation, exhibits a similar memory leak issue. Running their API integration template under the same conditions also results in elevated memory usage that does not return to the baseline after completing the requests.
Below is a graph illustrating the high memory consumption using the official docTR Dockerfile. This graph demonstrates how the memory usage increases during the requests and fails to decrease afterward, indicating a possible memory leak.
Code snippet to reproduce the bug
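The original snippet did not survive the export of this thread. A minimal stand-in that exercises the same pattern described above (consecutive POST requests against a locally deployed OCR route), with the URL and file path as placeholders:

```python
import requests

API_URL = "http://localhost:8080/ocr"  # placeholder for the local deployment

with open("sample.jpg", "rb") as f:
    files = [("files", ("sample.jpg", f.read(), "image/jpeg"))]

session = requests.Session()
for _ in range(100):
    response = session.post(API_URL, files=files)
    response.raise_for_status()

# Memory usage of the serving process should fall back towards its baseline
# after this loop finishes; in the reported behavior it stays elevated.
```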
Error traceback
There is no specific error traceback as the issue is related to memory usage rather than an explicit error. However, the application eventually crashes when the system runs out of memory.
Environment
The environment was set up using the official docTR Dockerfile.
Deep Learning backend
The backend setup is based on the official docTR Dockerfile configuration.