Opik

English | 简体中文 | 日本語 | 한국어

Open-source LLM evaluation platform

Opik helps you build, evaluate, and optimize LLM systems that run better, faster, and cheaper. From RAG chatbots to code assistants to complex agentic pipelines, Opik provides comprehensive tracing, evaluations, dashboards, and powerful features like Opik Agent Optimizer and Opik Guardrails to improve and secure your LLM powered applications in production.

Website • Slack Community • Twitter • Changelog • Documentation

🚀 What is Opik? • 🛠️ Opik Server Installation • 💻 Opik Client SDK • 📝 Logging Traces
🧑‍⚖️ LLM as a Judge • 🔍 Evaluating your Application • ⭐ Star Us • 🤝 Contributing

🚀 What is Opik?

Opik (built by Comet) is an open-source platform designed to streamline the entire lifecycle of LLM applications. It empowers developers to evaluate, test, monitor, and optimize their models and agentic systems. Key offerings include:

- Comprehensive Observability: Deep tracing of LLM calls, conversation logging, and agent activity.
- Advanced Evaluation: Robust prompt evaluation, LLM-as-a-judge, and experiment management.
- Production-Ready: Scalable monitoring dashboards and online evaluation rules for production.
- Opik Agent Optimizer: Dedicated SDK and set of optimizers to enhance prompts and agents.
- Opik Guardrails: Features to help you implement safe and responsible AI practices.

Key capabilities include:

Development & Tracing:
- Track all LLM calls and traces with detailed context during development and in production (Quickstart).
- Integrate with a growing list of third-party frameworks for easy observability, with native support for many of the most popular ones, including recent additions such as Google ADK, Autogen, and Flowise AI (Integrations).
- Annotate traces and spans with feedback scores via the Python SDK or the UI.
- Experiment with prompts and models in the Prompt Playground.
Evaluation & Testing:
- Automate your LLM application evaluation with Datasets and Experiments.
- Leverage powerful LLM-as-a-judge metrics for complex tasks like hallucination detection, moderation, and RAG assessment (Answer Relevance, Context Precision).
- Integrate evaluations into your CI/CD pipeline with our PyTest integration.
Production Monitoring & Optimization:
- Log high volumes of production traces: Opik is designed for scale (40M+ traces/day).
- Monitor feedback scores, trace counts, and token usage over time in the Opik Dashboard.
- Utilize Online Evaluation Rules with LLM-as-a-Judge metrics to identify production issues.
- Leverage Opik Agent Optimizer and Opik Guardrails to continuously improve and secure your LLM applications in production.

Tip

If you are looking for features that Opik doesn't have today, please raise a new Feature request 🚀

🛠️ Opik Server Installation

Get your Opik server running in minutes. Choose the option that best suits your needs:

Option 1: Comet.com Cloud (Easiest & Recommended)

Access Opik instantly without any setup. Ideal for quick starts and hassle-free maintenance.

👉 Create your free Comet account

Option 2: Self-Host Opik for Full Control

Deploy Opik in your own environment. Choose between Docker for local setups or Kubernetes for scalability.

Self-Hosting with Docker Compose (for Local Development & Testing)

This is the simplest way to get a local Opik instance running. Note the new ./opik.sh installation script:

On Linux or macOS:

# Clone the Opik repository
git clone https://github.com/comet-ml/opik.git

# Navigate to the repository
cd opik

# Start the Opik platform
./opik.sh

On Windows:

# Clone the Opik repository
git clone https://github.com/comet-ml/opik.git

# Navigate to the repository
cd opik

# Start the Opik platform
powershell -ExecutionPolicy ByPass -c ".\\opik.ps1"

Use the --help or --info options to troubleshoot issues. The Dockerfiles now ensure that containers run as non-root users for enhanced security. Once everything is up and running, open localhost:5173 in your browser. For detailed instructions, see the Local Deployment Guide.
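If you want to confirm from a script that the local UI is reachable, a minimal sketch using only the Python standard library might look like the following. The function name `opik_ui_is_up` and the timeout value are assumptions chosen for illustration; the default URL is the localhost:5173 address mentioned above.

```python
import urllib.error
import urllib.request


def opik_ui_is_up(url: str = "http://localhost:5173", timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET to `url` succeeds, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Opik UI reachable:", opik_ui_is_up())
```

A `False` result only means the HTTP request failed; the --help and --info options of the installation script remain the first place to look when troubleshooting.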

Self-Hosting with Kubernetes & Helm (for Scalable Deployments)

For production or larger-scale self-hosted deployments, Opik can be installed on a Kubernetes cluster using our Helm chart. See the Kubernetes Installation Guide using Helm for full instructions.

Important

Version 1.7.0 Changes: Please check the changelog for important updates and breaking changes.

💻 Opik Client SDK

Opik provides a suite of client libraries and a REST API to interact with the Opik server. This includes SDKs for Python, TypeScript, and Ruby (via OpenTelemetry), allowing for seamless integration into your workflows. For detailed API and SDK references, see the Opik Client Reference Documentation.

Python SDK Quick Start

To get started with the Python SDK:

Install the package:

# install using pip
pip install opik

# or install with uv
uv pip install opik

Configure the Python SDK by running the opik configure command, which will prompt you for your Opik server address (for self-hosted instances) or your API key and workspace (for Comet.com):

opik configure

Tip

You can also call opik.configure(use_local=True) from your Python code to configure the SDK to run on a local self-hosted installation, or provide API key and workspace details directly for Comet.com. Refer to the Python SDK documentation for more configuration options.

You are now ready to start logging traces using the Python SDK.

📝 Logging Traces with Integrations

The easiest way to log traces is to use one of our direct integrations. Opik supports a wide array of frameworks, including recent additions like Google ADK, Autogen, and Flowise AI:

Integration	Description	Documentation	Try in Colab
AG2	Log traces for AG2 LLM calls	Documentation	(Coming Soon)
aisuite	Log traces for aisuite LLM calls	Documentation
Anthropic	Log traces for Anthropic LLM calls	Documentation
Autogen	Log traces for Autogen agentic workflows	Documentation	(Coming Soon)
Bedrock	Log traces for Amazon Bedrock LLM calls	Documentation
CrewAI	Log traces for CrewAI calls	Documentation
DeepSeek	Log traces for DeepSeek LLM calls	Documentation	(Coming Soon)
Dify	Log traces for Dify agent runs	Documentation	(Coming Soon)
DSPy	Log traces for DSPy runs	Documentation
Flowise AI	Log traces for Flowise AI visual LLM builder	Documentation	(Native UI integration, see documentation)
Gemini	Log traces for Google Gemini LLM calls	Documentation
Google ADK	Log traces for Google Agent Development Kit (ADK)	Documentation	(Coming Soon)
Groq	Log traces for Groq LLM calls	Documentation
Guardrails	Log traces for Guardrails AI validations	Documentation
Haystack	Log traces for Haystack calls	Documentation
Instructor	Log traces for LLM calls made with Instructor	Documentation
LangChain	Log traces for LangChain LLM calls	Documentation
LangChain JS	Log traces for LangChain JS LLM calls	Documentation	(Coming Soon)
LangGraph	Log traces for LangGraph executions	Documentation
LiteLLM	Log traces for LiteLLM model calls	Documentation
LlamaIndex	Log traces for LlamaIndex LLM calls	Documentation
Ollama	Log traces for Ollama LLM calls	Documentation
OpenAI	Log traces for OpenAI LLM calls	Documentation
OpenAI Agents	Log traces for OpenAI Agents SDK calls	Documentation	(Coming Soon)
OpenRouter	Log traces for OpenRouter LLM calls	Documentation	(Coming Soon)
OpenTelemetry	Log traces for OpenTelemetry supported calls	Documentation	(Coming Soon)
Predibase	Log traces for Predibase LLM calls	Documentation
Pydantic AI	Log traces for PydanticAI agent calls	Documentation
Ragas	Log traces for Ragas evaluations	Documentation
Smolagents	Log traces for Smolagents agents	Documentation
Strands Agents	Log traces for Strands agents calls	Documentation	(Coming Soon)
Vercel AI	Log traces for Vercel AI SDK calls	Documentation	(Coming Soon)
watsonx	Log traces for IBM watsonx LLM calls	Documentation

Tip

If the framework you are using is not listed above, feel free to open an issue or submit a PR with the integration.

If you are not using any of the frameworks above, you can also use the track function decorator to log traces:

import opik

opik.configure(use_local=True) # Run locally

@opik.track
def my_llm_function(user_question: str) -> str:
    # Your LLM code here

    return "Hello"

Tip

The track decorator can be used in conjunction with any of our integrations and can also be used to track nested function calls.
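To make the idea of nested tracking concrete, here is a minimal, standard-library-only sketch of how a tracing decorator can record parent/child relationships between function calls. It is not Opik's implementation: `toy_track`, `SPANS`, and the span fields are hypothetical names invented for this illustration. In real code you would apply `@opik.track` to both functions instead.

```python
import contextvars
import functools
import time

# Hypothetical in-memory store; a real tracer would send spans to a server.
SPANS = []
_current_span = contextvars.ContextVar("current_span", default=None)


def toy_track(func):
    """Record one span per call and link it to the enclosing tracked call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        span = {
            "name": func.__name__,
            "parent": _current_span.get(),  # name of the caller's span, if any
            "start": time.perf_counter(),
        }
        token = _current_span.set(span["name"])
        try:
            return func(*args, **kwargs)
        finally:
            span["duration_s"] = time.perf_counter() - span["start"]
            _current_span.reset(token)
            SPANS.append(span)

    return wrapper


@toy_track
def retrieve_context(question: str) -> list:
    return ["France is a country in Europe."]


@toy_track
def answer_question(question: str) -> str:
    context = retrieve_context(question)  # nested call becomes a child span
    return f"Answered using {len(context)} context document(s)."


answer = answer_question("What is the capital of France?")
for span in SPANS:
    print(span["name"], "<-", span["parent"])
```

The inner call finishes first, so its span is stored before the outer one, and its `parent` field points at the outer function. A trace view with nested spans can be built from exactly this kind of linkage.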

🧑‍⚖️ LLM as a Judge metrics

The Opik Python SDK includes a number of LLM-as-a-judge metrics to help you evaluate your LLM application. Learn more in the metrics documentation.

To use them, simply import the relevant metric and use the score function:

from opik.evaluation.metrics import Hallucination

metric = Hallucination()
score = metric.score(
    input="What is the capital of France?",
    output="Paris",
    context=["France is a country in Europe."]
)
print(score)

Opik also includes a number of pre-built heuristic metrics as well as the ability to create your own. Learn more about it in the metrics documentation.
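As a rough illustration of what a custom heuristic metric can look like, the sketch below scores an output by whether it contains an expected substring. It is written in plain Python so it runs without a server; `ToyScore` and `contains_metric` are hypothetical names for this example and are not part of the Opik SDK. Consult the metrics documentation for the actual base class to use when writing a metric for Opik.

```python
from dataclasses import dataclass


@dataclass
class ToyScore:
    """Hypothetical result container: a name, a numeric value, and a reason."""

    name: str
    value: float
    reason: str


def contains_metric(output: str, expected: str) -> ToyScore:
    """Heuristic metric: 1.0 if `expected` appears in `output` (case-insensitive)."""
    found = expected.lower() in output.lower()
    return ToyScore(
        name="contains",
        value=1.0 if found else 0.0,
        reason=f"{expected!r} {'found' if found else 'not found'} in output",
    )


score = contains_metric(output="The capital of France is Paris.", expected="paris")
print(score)
```

Unlike an LLM-as-a-judge metric, a heuristic like this is deterministic and needs no model call, which makes it cheap to run on every trace.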

🔍 Evaluating your LLM Application

Opik allows you to evaluate your LLM application during development through Datasets and Experiments. The Opik Dashboard offers enhanced charts for experiments and better handling of large traces. You can also run evaluations as part of your CI/CD pipeline using our PyTest integration.
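Conceptually, an experiment runs your application over every item in a dataset and aggregates the metric scores. The sketch below shows that loop in plain Python so the idea is clear before you reach for the SDK; `DATASET`, `toy_app`, `exact_match`, and `run_experiment` are hypothetical stand-ins, not Opik APIs.

```python
from statistics import mean

# Hypothetical dataset: each item pairs an input with the expected output.
DATASET = [
    {"input": "What is the capital of France?", "expected": "Paris"},
    {"input": "What is the capital of Japan?", "expected": "Tokyo"},
    {"input": "What is the capital of Italy?", "expected": "Rome"},
]


def toy_app(question: str) -> str:
    """Stand-in for your LLM application; it only knows two answers."""
    answers = {"France": "Paris", "Japan": "Tokyo"}
    for country, capital in answers.items():
        if country in question:
            return capital
    return "I don't know"


def exact_match(output: str, expected: str) -> float:
    """Heuristic metric: 1.0 on an exact match, else 0.0."""
    return 1.0 if output.strip() == expected.strip() else 0.0


def run_experiment(dataset, task, metric) -> dict:
    """Run `task` on every item, score it, and return per-item and mean scores."""
    scores = [metric(task(item["input"]), item["expected"]) for item in dataset]
    return {"scores": scores, "mean_score": mean(scores)}


result = run_experiment(DATASET, toy_app, exact_match)
print(result)
```

Asserting that `result["mean_score"]` stays above a chosen threshold inside a test function is one way a loop like this could gate a CI/CD pipeline.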

⭐ Star Us on GitHub

If you find Opik useful, please consider giving us a star! Your support helps us grow our community and continue improving the product.

🤝 Contributing

There are many ways to contribute to Opik:

- Submit bug reports and feature requests
- Review the documentation and submit Pull Requests to improve it
- Speak or write about Opik and let us know
- Upvote popular feature requests to show your support

To learn more about how to contribute to Opik, please see our contributing guidelines.
