[Feature Request] Custom prompt and enhanced metadata for AgenticChunking #3402


Open

yangxg opened this issue May 29, 2025 · 1 comment

Labels
enhancement New feature or request

yangxg commented May 29, 2025

Problem Description

Hi authors, thanks for the amazing product!

I am exploring the AgenticChunking feature of agno, and I like the design very much. The following is my test code.

from pathlib import Path

from agno.document.chunking.agentic import AgenticChunking
from agno.knowledge.markdown import MarkdownKnowledgeBase
from agno.models.openai import OpenAIChat
from agno.vectordb.lancedb import LanceDb

# --- Configuration ---

MD_FILE_PATH = "./Auto_extract_topics.md" # Path to your Markdown file
LANCEDB_URI = "./agno_lancedb_agentic" # Path for LanceDB data
TABLE_NAME = "auto_md_agentic_chunks"
# Ensure OpenAI API key is set, or configure for your LLM

llm_instance = OpenAIChat(
    id='gpt-4.1-mini'
)

# --- Load the Knowledge Base ---

# Setup LanceDB Vector Store
print(f"Initializing LanceDB at: {LANCEDB_URI}")
vector_db = LanceDb(
    table_name=TABLE_NAME,
    uri=LANCEDB_URI
)

# Setup Knowledge Base with Agentic Chunking
print(f"Setting up KnowledgeBase with AgenticChunking for file: {MD_FILE_PATH}")

agentic_chunker = AgenticChunking(
    model=llm_instance   
)

knowledge_base = MarkdownKnowledgeBase(
    path=Path(MD_FILE_PATH),  # path to the local Markdown file
    vector_db=vector_db,
    chunking_strategy=agentic_chunker,
)

# Load the knowledge base
knowledge_base.load(recreate=True)

# --- View the populated KB---

import lancedb

DB_PATH = "./agno_lancedb_agentic"
TABLE_NAME = "auto_md_agentic_chunks"

# Connect to LanceDB
db = lancedb.connect(DB_PATH)

# Open the table
table = db.open_table(TABLE_NAME)

# Convert to a pandas DataFrame
data = table.to_pandas()

# Drop the vector column for readability
data = data.drop(columns='vector')

The code works well and gives a decent chunking result. For example, below is the first chunk (the beginning sections of a journal paper; I cut some of the content for readability).

print(data['payload'][0])

{"name": "Auto_extract_topics", 
 "meta_data": {"chunk": 1, "chunk_size": 1921}, 
 "content": "RESEARCH Open Access
           Automatic extraction of informal topics from online suicidal ideation   Reilly N. Grant1, David Kucher2, Ana M. Le\u00f3n3, Jonathan F. Gemmell4*, Daniela S. Raicu4 and Samah J. Fodeh5
           From The 11th International Workshop on Data and Text Mining in Biomedical Informatics Singapore, Singapore. 10 November 2017     
            Abstract\n\nBackground: Suicide is an alarming public health problem accounting for a considerable number of deaths each year worldwide. .......\n\nConclusions: These informal topics topics can be more... ... and precision of language.\n\nKeywords: Suicidal ideation, Word2Vec, Text mining", 
"usage": {"prompt_tokens": 363, "total_tokens": 363}}

I have two additional requirements that could further improve the knowledge base, and I hope they are feasible.

  1. Customize the prompt in AgenticChunking so that the chunking can meet users' personalized needs. For example, separating the author information from the abstract text in the example above.

  2. Incorporate more meta_data items based on the chunked pieces. For example, the LLM could identify the section type (guided by a tailored custom prompt), and it would be ideal to include that in meta_data. A hypothetical API sketch follows below.
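To make the request concrete, here is a rough sketch of what such an API might look like. The chunking_prompt and metadata_prompts parameters are purely hypothetical, do not exist in the current AgenticChunking, and only illustrate the desired behavior:

# Hypothetical API sketch -- none of these extra parameters exist today
agentic_chunker = AgenticChunking(
    model=llm_instance,
    # custom instruction steering how the document is split
    chunking_prompt=(
        "Split the paper into semantically coherent chunks. "
        "Keep the author/affiliation block separate from the abstract."
    ),
    # extra meta_data fields the LLM should fill in for each chunk
    metadata_prompts={
        "chunk_type": (
            "Label the chunk as one of: general info, Abstract, "
            "Methods, Results, Discussion, References."
        )
    },
)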

My ideal chunking would look like the following:

print(data['payload'][0])

{"name": "Auto_extract_topics", 
 "meta_data": {"chunk": 1, "chunk_size": 1921, "chunk_type": "general info"}, 
 "content": "RESEARCH Open Access Automatic extraction of informal topics from online suicidal ideation   Reilly N. Grant1, David Kucher2, Ana M. Le\u00f3n3, Jonathan F. Gemmell4*, Daniela S. Raicu4 and Samah J. Fodeh5\n\nFrom The 11th International Workshop on Data and Text Mining in Biomedical Informatics Singapore, Singapore. 10 November 2017 ", 
"usage": {"prompt_tokens": xx, "total_tokens": xx}}

print(data['payload'][1])

{"name": "Auto_extract_topics", 
 "meta_data": {"chunk": 1, "chunk_size": 1921, "chunk_type": "Abstract"}, 
 "content": "Abstract\n\nBackground: Suicide is an alarming public health problem accounting for a considerable number of deaths each year worldwide. .......\n\nConclusions: These informal topics topics can be more... ... and precision of language.\n\nKeywords: Suicidal ideation, Word2Vec, Text mining", 
"usage": {"prompt_tokens": xx, "total_tokens": xx}}

I think this could provide a more robust knowledge base for downstream applications (for example, filtering chunks by section type, as sketched below), and the current AgenticChunking is already quite a good starting point. But I don't know whether this can be achieved directly in AgenticChunking, or through some other tweak?
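To illustrate the downstream benefit, here is a minimal filtering sketch. It assumes the payload column holds JSON strings, as the printouts above suggest; if your LanceDB export already returns dicts, drop the json.loads:

import json

# Pull the (hypothetical) chunk_type label out of each payload
data["chunk_type"] = data["payload"].map(
    lambda p: json.loads(p)["meta_data"].get("chunk_type")
)

# Keep only the abstracts, e.g. for targeted retrieval or summarization
abstracts = data[data["chunk_type"] == "Abstract"]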

Many thanks!

By the way, I attached the md file from my example above. It is a journal article.
Auto_extract_topics.md

Proposed Solution

No ideal solution yet, but a rough workaround idea is sketched below.
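One untested possibility while a native feature is discussed: subclass AgenticChunking, reuse its splitting, and add a second LLM pass that labels each chunk. This assumes the base class exposes chunk(document) returning a list of Document objects, stores the model on self.model, and that Document carries a mutable meta_data dict; all of this should be verified against the current agno source.

from typing import List

from agno.agent import Agent
from agno.document.base import Document
from agno.document.chunking.agentic import AgenticChunking


class LabeledAgenticChunking(AgenticChunking):
    """Untested sketch: add a chunk_type label to each chunk's meta_data."""

    def chunk(self, document: Document) -> List[Document]:
        # Reuse the built-in agentic splitting
        chunks = super().chunk(document)
        # Second pass: classify each chunk with the same model
        labeler = Agent(model=self.model)
        for c in chunks:
            run = labeler.run(
                "Classify this journal-paper chunk as one of: "
                "general info, Abstract, Methods, Results, Discussion, "
                "References. Answer with the label only.\n\n" + c.content
            )
            c.meta_data["chunk_type"] = run.content.strip()
        return chunks

The extra LLM call per chunk roughly doubles token usage, but it would produce exactly the chunk_type field shown in the ideal payloads above.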

Alternatives Considered

No response

Additional Context

No response

Would you like to work on this?

  • Yes, I’d love to work on it!
  • I’m open to collaborating but need guidance.
  • No, I’m just sharing the idea.
yangxg added the enhancement label May 29, 2025

linear bot commented May 29, 2025