Commit 30486ac

devin-ai-integration[bot] and Joe Moura committed
Add documentation and implementation for custom pgvector knowledge storage (#2883)
Co-Authored-By: Joe Moura <[email protected]>
1 parent e59627a commit 30486ac

4 files changed

+497
-0
lines changed

docs/concepts/knowledge.mdx

Lines changed: 208 additions & 0 deletions
@@ -736,6 +736,214 @@ recent_news = SpaceNewsKnowledgeSource(
)
```

## Custom Knowledge Storage with pgvector

CrewAI lets you plug in a custom knowledge storage backend to store and retrieve knowledge. One option is PostgreSQL with the pgvector extension, which adds a `vector` column type and similarity search over stored embeddings.
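
To make "vector similarity search" concrete, here is a minimal pure-Python sketch of the two distance measures that pgvector exposes as the `<->` (Euclidean, L2) and `<=>` (cosine distance) operators. The function names and the toy `nearest` helper are illustrative only; this section does not say which operator `PGVectorKnowledgeStorage` uses internally.

```python
import math
from typing import Dict, List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, the quantity pgvector's `<->` operator computes."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's `<=>` operator computes."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query: Sequence[float],
            documents: Dict[str, Sequence[float]],
            k: int = 1) -> List[str]:
    """Return the names of the k stored vectors closest to the query (by L2)."""
    ranked = sorted(documents, key=lambda name: l2_distance(query, documents[name]))
    return ranked[:k]
```

A database with pgvector performs the same ranking in SQL, optionally with an index so that it does not have to compare the query against every row.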

### Prerequisites

Before using pgvector as your knowledge storage backend, you need to:

1. Set up a PostgreSQL database with the pgvector extension installed
2. Install the required Python packages

#### PostgreSQL Setup

```bash
# Install PostgreSQL (Ubuntu example)
sudo apt update
sudo apt install postgresql postgresql-contrib

# pgvector is not part of postgresql-contrib and must be installed separately,
# e.g. sudo apt install postgresql-16-pgvector (replace 16 with your server version)

# Connect to PostgreSQL
sudo -u postgres psql

# The remaining lines are SQL and psql meta-commands; run them at the psql prompt.

-- Create a database
CREATE DATABASE crewai_knowledge;

-- Connect to the database
\c crewai_knowledge

-- Enable the pgvector extension in this database
CREATE EXTENSION vector;

-- Create a user (optional)
CREATE USER crewai WITH PASSWORD 'your_password';
GRANT ALL PRIVILEGES ON DATABASE crewai_knowledge TO crewai;
-- On PostgreSQL 15+, the user may also need: GRANT ALL ON SCHEMA public TO crewai;
```

#### Python Dependencies

Add these dependencies to your project:

```bash
# Install required packages
uv add sqlalchemy pgvector psycopg2-binary
```
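
After installing, a quick check that the packages are importable can save a confusing traceback later. The sketch below uses only the standard library; note that `psycopg2-binary` is imported as `psycopg2`. The helper name `missing_modules` is illustrative and not part of CrewAI.

```python
from importlib.util import find_spec
from typing import Iterable, List

# Import names for the packages installed above
# (the psycopg2-binary distribution provides the psycopg2 module).
REQUIRED_MODULES = ("sqlalchemy", "pgvector", "psycopg2")


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```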

### Using pgvector Knowledge Storage

Here's how to use pgvector as your knowledge storage backend in CrewAI:

```python
from crewai import Agent, Task, Crew, Process
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
from crewai.knowledge.storage.pgvector_knowledge_storage import PGVectorKnowledgeStorage

# Create a connection string for PostgreSQL
connection_string = "postgresql://username:password@localhost:5432/crewai_knowledge"

# Create a custom knowledge storage
pgvector_storage = PGVectorKnowledgeStorage(
    connection_string=connection_string,
    embedding_dimension=1536,  # Dimension for OpenAI embeddings
)

# Create a knowledge source backed by the pgvector storage
content = "CrewAI is a framework for orchestrating role-playing autonomous agents."
string_source = StringKnowledgeSource(
    content=content,
    storage=pgvector_storage,  # Use pgvector storage
)

# Create an agent that draws on the knowledge source
agent = Agent(
    role="CrewAI Expert",
    goal="Explain CrewAI concepts accurately.",
    backstory="You are an expert in the CrewAI framework.",
    knowledge_sources=[string_source],
)

# Create a task
task = Task(
    description="Answer this question about CrewAI: {question}",
    expected_output="A detailed answer about CrewAI.",
    agent=agent,
)

# Create a crew; the knowledge source is attached to the agent above
crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
)

# Run the crew
result = crew.kickoff(inputs={"question": "What is CrewAI?"})
```

### Configuration Options

The `PGVectorKnowledgeStorage` class supports the following configuration options:

| Option | Description | Default |
|--------|-------------|---------|
| `connection_string` | PostgreSQL connection string | Required |
| `embedder` | Embedding configuration | OpenAI embeddings |
| `table_name` | Name of the table to store documents | `"documents"` |
| `embedding_dimension` | Dimension of the embedding vectors | `1536` |
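
One way to keep credentials out of source code is to read these options from the environment. The sketch below is an assumption-laden illustration: the environment variable names (`CREWAI_PG_CONNECTION_STRING` and so on) are hypothetical and are not read by CrewAI itself; only the returned keys match the option names in the table above.

```python
import os
from typing import Any, Dict, Mapping


def storage_options_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build keyword arguments for the storage from (hypothetical) environment variables."""
    # connection_string is required, so a missing variable raises KeyError.
    options: Dict[str, Any] = {
        "connection_string": env["CREWAI_PG_CONNECTION_STRING"],
    }
    # Optional settings are only passed when set, so the class defaults still apply.
    if "CREWAI_PG_TABLE_NAME" in env:
        options["table_name"] = env["CREWAI_PG_TABLE_NAME"]
    if "CREWAI_PG_EMBEDDING_DIMENSION" in env:
        options["embedding_dimension"] = int(env["CREWAI_PG_EMBEDDING_DIMENSION"])
    return options
```

The result could then be unpacked into the constructor, for example `PGVectorKnowledgeStorage(**storage_options_from_env())`.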

#### Connection String Format

The PostgreSQL connection string follows this format:
```
postgresql://username:password@hostname:port/database_name
```
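
Because the connection string is a URL, characters such as `@`, `/`, or `:` in the username or password need to be percent-encoded, or the string will be parsed incorrectly. A minimal sketch using the standard library follows; the helper name `build_connection_string` is illustrative and not part of CrewAI.

```python
from urllib.parse import quote


def build_connection_string(username: str, password: str, hostname: str,
                            port: int, database_name: str) -> str:
    """Assemble a PostgreSQL URL, percent-encoding the parts that may hold special characters."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    database = quote(database_name, safe="")
    return f"postgresql://{user}:{secret}@{hostname}:{port}/{database}"


if __name__ == "__main__":
    # The '@' and '/' in the password are encoded as %40 and %2F.
    print(build_connection_string("crewai", "p@ss/word", "localhost", 5432, "crewai_knowledge"))
```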

#### Custom Embedding Models

You can configure a custom embedding model in the same way as with the default knowledge storage. Set `embedding_dimension` to match the output size of the model you choose:

```python
pgvector_storage = PGVectorKnowledgeStorage(
    connection_string="postgresql://username:password@localhost:5432/crewai_knowledge",
    embedder={
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-large"
        }
    },
    embedding_dimension=3072,  # Dimension for text-embedding-3-large
)
```
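
Since the model name and `embedding_dimension` are set independently, it is easy for them to drift apart. The following sketch checks the pair before the storage is created. The lookup table lists the default output sizes of three OpenAI embedding models; the helper `check_embedding_dimension` is hypothetical and not part of CrewAI.

```python
# Default output sizes of some OpenAI embedding models.
KNOWN_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def check_embedding_dimension(model: str, embedding_dimension: int) -> int:
    """Return embedding_dimension if it matches the model's known size, else raise."""
    expected = KNOWN_DIMENSIONS.get(model)
    if expected is None:
        raise KeyError(f"no known dimension for model {model!r}")
    if expected != embedding_dimension:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, "
            f"but embedding_dimension={embedding_dimension}"
        )
    return embedding_dimension
```

For example, `embedding_dimension=check_embedding_dimension("text-embedding-3-large", 3072)` passes, while pairing the same model with `1536` raises a `ValueError` before anything is written to the database.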

### Advanced Usage

#### Custom Table Names

You can specify a custom table name to store your documents:

```python
pgvector_storage = PGVectorKnowledgeStorage(
    connection_string="postgresql://username:password@localhost:5432/crewai_knowledge",
    table_name="my_custom_documents_table"
)
```
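
Table names end up as SQL identifiers, so it is worth validating them, especially if they come from user input. The sketch below accepts only lowercase letters, digits, and underscores, and enforces PostgreSQL's default 63-byte identifier limit. The helper `is_safe_table_name` is illustrative; whether `PGVectorKnowledgeStorage` performs a similar check is not stated here, and the helper does not check for reserved words.

```python
import re

# Unquoted PostgreSQL identifiers are folded to lowercase, so restrict names
# to a conservative lowercase pattern.
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")
MAX_IDENTIFIER_BYTES = 63  # PostgreSQL truncates longer identifiers by default


def is_safe_table_name(name: str) -> bool:
    """Return True if name is a plain identifier that fits PostgreSQL's length limit."""
    if _IDENTIFIER.fullmatch(name) is None:
        return False
    return len(name.encode("utf-8")) <= MAX_IDENTIFIER_BYTES
```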

#### Multiple Knowledge Collections

To keep separate knowledge collections in the same database, give each storage instance its own table name:

```python
# Create a storage for product knowledge
product_storage = PGVectorKnowledgeStorage(
    connection_string="postgresql://username:password@localhost:5432/crewai_knowledge",
    table_name="product_knowledge"
)

# Create a storage for customer knowledge
customer_storage = PGVectorKnowledgeStorage(
    connection_string="postgresql://username:password@localhost:5432/crewai_knowledge",
    table_name="customer_knowledge"
)
```

### Troubleshooting

#### Common Issues

1. **pgvector Extension Not Found**

   Error: `ERROR: could not load library "/usr/local/lib/postgresql/pgvector.so"`

   Solution: Make sure pgvector is installed on the PostgreSQL server itself (the extension's files must be present before it can be enabled), then enable it in the target database:
   ```sql
   CREATE EXTENSION vector;
   ```

2. **Dimension Mismatch**

   Error: `ERROR: vector dimensions do not match`

   Solution: Ensure that the `embedding_dimension` parameter matches the output dimension of your embedding model, for example 1536 for the default OpenAI embeddings or 3072 for `text-embedding-3-large`.

3. **Connection Issues**

   Error: `Could not connect to PostgreSQL server`

   Solution: Check your connection string and make sure the PostgreSQL server is running and accessible.
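
For connection issues (item 3 above), it can help to separate "the server is unreachable" from "the credentials or database name are wrong". The sketch below parses the host and port out of the connection string and attempts a plain TCP connection using only the standard library. It does not speak the PostgreSQL protocol, so success only means that something is listening on that port. The helper names are illustrative.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def host_and_port(connection_string: str, default_port: int = 5432) -> Tuple[str, int]:
    """Extract the hostname and port from a PostgreSQL connection URL."""
    parts = urlsplit(connection_string)
    if parts.hostname is None:
        raise ValueError("connection string has no hostname")
    # 5432 is PostgreSQL's default port when none is given.
    return parts.hostname, parts.port or default_port


def can_reach(connection_string: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the server's host and port succeeds."""
    host, port = host_and_port(connection_string)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

If `can_reach(...)` returns `False`, check that the server is running and that firewalls and `listen_addresses` allow the connection; if it returns `True`, look at the username, password, and database name instead.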
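
#### Performance Tips

1. **Create an Index**

   For better performance with large datasets, create an index on the embedding column. The statement below assumes the default `documents` table and a column named `embedding`. HNSW indexes require pgvector 0.5.0 or later, the operator class (`vector_l2_ops` here) should match the distance operator used in queries, and pgvector can only index `vector` columns of up to 2,000 dimensions, so this index cannot be built on 3072-dimensional embeddings:

   ```sql
   CREATE INDEX ON documents USING hnsw (embedding vector_l2_ops);
   ```

2. **Batch Processing**

   When saving large numbers of documents, process them in batches to avoid memory issues:

   ```python
   # `documents` is assumed to be a list of documents prepared earlier
   batch_size = 100
   for i in range(0, len(documents), batch_size):
       batch = documents[i:i+batch_size]
       pgvector_storage.save(batch)
   ```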
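
The batching loop above can be wrapped in a small reusable helper. This is a sketch under the assumption that the storage exposes a `save` method accepting a list, as in the snippet above; the helper takes any callable, so it can be tried without a database. The names `iter_batches` and `save_in_batches` are illustrative.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def save_in_batches(items: Sequence[T],
                    save: Callable[[List[T]], None],
                    batch_size: int = 100) -> int:
    """Call save once per batch and return the number of batches written."""
    count = 0
    for batch in iter_batches(items, batch_size):
        save(batch)
        count += 1
    return count
```

With a real storage this would be called as `save_in_batches(documents, pgvector_storage.save)`.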

## Best Practices

<AccordionGroup>
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
from crewai.knowledge.storage.pgvector_knowledge_storage import PGVectorKnowledgeStorage
