Memory Leak in docTR API Integration with FastAPI #1918
Replies: 15 comments
-
Hi @volkanncicek 👋, thanks for reporting, I will have a look 👍
Btw., keep in mind that our provided API code is only a reference; it's nothing you should use in a production system as is! Especially the …
-
I wasn't able to see any leak here.

To reproduce, start the official docTR API container locally (the script below expects it as `api_web` on `localhost:8080`), then run:
```python
import docker
import requests
import time
import matplotlib.pyplot as plt
from tqdm import tqdm

API_URL = "http://localhost:8080/ocr"
headers = {"accept": "application/json"}
params = {"det_arch": "db_resnet50", "reco_arch": "crnn_vgg16_bn"}

with open('/home/felix/Desktop/20250301_123152657.jpg', 'rb') as f:
    file_content = f.read()
files = [("files", ("20250301_123152657.jpg", file_content, "image/jpeg"))]


def get_docker_memory(container_name="api_web"):
    """Fetch live memory usage of a Docker container."""
    client = docker.DockerClient(base_url='unix://var/run/docker.sock')
    container = client.containers.get(container_name)
    stats = container.stats(stream=False)
    mem_usage = stats["memory_stats"]["usage"] / (1024 * 1024)  # Convert to MB
    return round(mem_usage, 2)


def send_requests(n_requests=50, container_name="api_web"):
    """Send multiple requests and monitor Docker container memory over time."""
    session = requests.Session()
    response_times = []
    memory_usage = []
    timestamps = []

    initial_mem_usage = get_docker_memory(container_name)
    memory_usage.append(initial_mem_usage)
    timestamps.append(time.time())

    print("🚀 Starting API Stress Test...\n")
    for _ in tqdm(range(n_requests), desc="Sending Requests"):
        mem_usage = get_docker_memory(container_name)
        memory_usage.append(mem_usage)
        timestamps.append(time.time())

        start_time = time.time()
        response = session.post(API_URL, headers=headers, params=params, files=files)
        response_times.append(time.time() - start_time)

        if response.status_code != 200:
            print(f"Error: {response.status_code}, Response: {response.text}")

    print("\n🕒 Waiting 10 seconds for memory stabilization...\n")
    time.sleep(10)

    final_mem_usage = get_docker_memory(container_name)
    memory_usage.append(final_mem_usage)
    timestamps.append(time.time())

    print("\n📊 [ Docker Container Memory Usage ]")
    print(f"🔹 Max Memory Used: {max(memory_usage):.2f} MB")
    print(f"🔹 Avg Memory Used: {sum(memory_usage) / len(memory_usage):.2f} MB")
    print(f"🔹 Initial Memory: {initial_mem_usage:.2f} MB")
    print(f"🔹 Final Memory After 10s: {final_mem_usage:.2f} MB")

    print(f"\n⏱️ Average Response Time: {sum(response_times) / len(response_times):.3f} sec")
    print(f"🚀 Fastest Response Time: {min(response_times):.3f} sec")
    print(f"🐢 Slowest Response Time: {max(response_times):.3f} sec")

    plt.figure(figsize=(10, 5))
    plt.plot(timestamps, memory_usage, marker='o', linestyle='-', color='b', label="Memory Usage (MB)")
    plt.scatter([timestamps[0], timestamps[-1]], [memory_usage[0], memory_usage[-1]],
                color='red', s=100, label="Start & End Points")
    plt.xlabel("Time (s)")
    plt.ylabel("Memory Usage (MB)")
    plt.title(f"Memory Usage Over Time - {container_name}")
    plt.legend()
    plt.grid(True)
    plt.show()


send_requests(n_requests=500, container_name="api_web")
```

Besides, I tracked the docker stats live:
-
Btw., for prod scenarios I would suggest using: https://github.com/felixdittrich92/OnnxTR
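For context, OnnxTR is meant as a drop-in replacement for the docTR inference interface. A rough sketch of the swap, assuming the API mirrors docTR's (check the linked repository for the exact usage):

```python
from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor

# Same predictor interface as docTR, but backed by ONNX Runtime instead of
# a full PyTorch/TensorFlow install
predictor = ocr_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn")

doc = DocumentFile.from_images(["sample.jpg"])  # placeholder image path
result = predictor(doc)
print(result.export())
```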
-
Thanks, @felixdittrich92! I really appreciate the quick response. I'll take a look at ONNX.

The initial memory usage starts at 381 MB, but it keeps increasing as requests are processed. Ideally, the memory should be released at some point, but it isn't. In a Kubernetes environment this leads to the container hitting its memory limit and restarting; I don't have infinite memory. Did you observe similar behavior on your side?

The service starts with low memory usage, as expected. After 500 requests, memory usage increases but does not return to its initial state, which suggests that allocated memory is not being released properly. After waiting, I sent another 500 requests, and memory usage kept increasing, showing that the application is accumulating memory instead of freeing it.

Additionally, in our service we load the model into RAM only once and use a context manager when opening images.

Test Environment:
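To make the setup concrete, here is a minimal sketch of the kind of endpoint described above (model loaded once at startup, image opened through a context manager). The route name, file handling, and predictor configuration are illustrative assumptions, not the actual service code:

```python
import io

import numpy as np
from fastapi import FastAPI, UploadFile
from PIL import Image

from doctr.models import ocr_predictor

app = FastAPI()

# Load the model into RAM exactly once, when the worker starts
predictor = ocr_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn", pretrained=True)


@app.post("/ocr")
async def run_ocr(file: UploadFile):
    content = await file.read()
    # Open the upload via a context manager so the PIL handle is closed
    # as soon as the pixel data has been copied out
    with Image.open(io.BytesIO(content)) as img:
        page = np.asarray(img.convert("RGB"))
    result = predictor([page])
    return result.export()
```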
-
Hey @volkanncicek :) Yeah, I see, and no, while profiling there was nothing similar (the max run was 1000 requests). And yes, I run on Linux (it wouldn't be the first time that something is strange on Windows 😅). Yeah, give OnnxTR a try, I would be happy to get feedback 👍
-
Hey @felixdittrich92, Thanks for your insights! Out of curiosity, which OS and version are you running your tests on? Since I’m using Docker on Windows, I wonder if the difference in behavior might be OS-related. Looking forward to your thoughts! 👍
-
:)
-
@volkanncicek Have you had the chance to test if the same happens with OnnxTR on your machine?
-
Yes @felixdittrich92, I noticed the same behavior with OnnxTR: memory usage increases as the number of requests grows. I suspect that if you’re using a GPU, inference is handled there, reducing the load on main memory. However, in our case, since we’re processing OCR on the CPU, memory usage keeps increasing indefinitely and never stops. It seems there is a memory leak somewhere, because the memory isn’t being released properly.
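As a quick sanity check of where inference actually runs (a side note, not part of the original report), onnxruntime can list the execution providers available in the environment:

```python
import onnxruntime as ort

# A CPU-only install typically reports ['CPUExecutionProvider'];
# a GPU build would additionally list e.g. 'CUDAExecutionProvider'
print(ort.get_available_providers())
```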
-
Mh, the profiling was also done using only the CPU (an i7-14770K in my case) ... any chance that you could profile the code with memray and send me the created … ?
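For reference, memray can be driven from the CLI or from Python. A minimal sketch using its tracker context manager, with the profiled function and output file name as placeholders:

```python
import memray

# Allocation samples are written to a .bin capture that can later be rendered,
# e.g. with `memray flamegraph ocr_profile.bin`
with memray.Tracker("ocr_profile.bin"):
    run_ocr_workload()  # placeholder: the request-handling code path to profile
```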
-
Hey @felixdittrich92, we found a potential fix from this discussion: #1422. After applying the following environment variables, the memory leak issue seems to be resolved for now:

```bash
export DOCTR_MULTIPROCESSING_DISABLE=TRUE
export ONEDNN_PRIMITIVE_CACHE_CAPACITY=1
```

We're still testing to make sure everything is stable. We'll keep you updated if anything changes. Thanks again!
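For completeness, a minimal sketch of pinning the same variables from Python before docTR is imported (equivalent to setting them via ENV in the Dockerfile); the variable names are the ones above, the rest is an assumption:

```python
import os

# Set before importing doctr so both the library and the underlying
# oneDNN runtime pick the values up
os.environ["DOCTR_MULTIPROCESSING_DISABLE"] = "TRUE"
os.environ["ONEDNN_PRIMITIVE_CACHE_CAPACITY"] = "1"

from doctr.models import ocr_predictor  # noqa: E402

predictor = ocr_predictor(pretrained=True)
```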
-
Thanks for the update, yeah now I remember this discussion 😅
-
@volkanncicek Solved your issue? :)
-
Hi @felixdittrich92, Yes, it's solved. For two weeks, I monitored the API, and the issue hasn't occurred again. Thanks!
-
Nice 👍 Thanks for the feedback. I will convert this to a discussion so that other people who face the same issue can benefit from it :)
-
Bug description
I have integrated docTR into an API using the FastAPI framework to test its performance under heavy load. The API is deployed locally and is designed to handle OCR requests. However, I observed a significant memory issue during testing. When sending 100 consecutive OCR requests, the memory usage spikes and remains high even after all requests have been processed. I expected the memory usage to decrease back to its initial state once the processing was complete, but it did not. This behavior suggests a potential memory leak, which could lead to the application crashing due to excessive memory consumption.
Furthermore, I noticed that the docTR library's own API, as provided in their official documentation, exhibits a similar memory leak issue. Running their API integration template under the same conditions also results in elevated memory usage that does not return to the baseline after completing the requests.
Below is a graph illustrating the high memory consumption using the official docTR Dockerfile. This graph demonstrates how the memory usage increases during the requests and fails to decrease afterward, indicating a possible memory leak.
Code snippet to reproduce the bug
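The original snippet did not survive the export of this thread. A minimal stand-in that exercises the same pattern described above (consecutive POST requests against a locally deployed OCR route), with the URL and file path as placeholders:

```python
import requests

API_URL = "http://localhost:8080/ocr"  # placeholder for the local deployment

with open("sample.jpg", "rb") as f:
    files = [("files", ("sample.jpg", f.read(), "image/jpeg"))]

session = requests.Session()
for _ in range(100):
    response = session.post(API_URL, files=files)
    response.raise_for_status()

# Memory usage of the serving process should fall back towards its baseline
# after this loop finishes; in the reported behavior it stays elevated.
```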
Error traceback
There is no specific error traceback as the issue is related to memory usage rather than an explicit error. However, the application eventually crashes when the system runs out of memory.
Environment
The environment was set up using the official docTR Dockerfile.
Deep Learning backend
The backend setup is based on the official docTR Dockerfile configuration.