---
title: LobeChat Knowledge Base / File Upload
description: >-
  Explore LobeChat's file upload and knowledge base management features and
  the core components behind them.
tags:
- LobeChat
- File Upload
- Knowledge Base
- PostgreSQL
- OpenAI Embedding
---
# Knowledge Base / File Upload
LobeChat supports file upload and knowledge base management. This feature relies on the following core technical components. Understanding these components will help you successfully deploy and maintain the knowledge base system.
## Core Components
### 1. PostgreSQL and PGVector
PostgreSQL is an open-source relational database system, and PGVector is a PostgreSQL extension that adds vector storage and similarity search.
- **Purpose**: Store structured data and vector indexes
- **Deployment Tip**: Use the official Docker image for quick deployment

Deployment script example:
```
docker run -p 5432:5432 -d --name pg -e POSTGRES_PASSWORD=mysecretpassword pgvector/pgvector:pg16
```
- **Note**: Vector indexing and similarity search are memory- and CPU-intensive, so allocate enough RAM, CPU, and disk to the database host
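
To put a rough number on the resource note, the sketch below estimates raw vector storage. It assumes pgvector's documented cost of 4 bytes per dimension plus 8 bytes per vector, and the 1536-dimension default output of `text-embedding-3-small`. The function name and chunk counts are illustrative only, and index and row overhead are not included, so treat the result as a lower bound.

```python
# Rough estimate of raw vector storage in pgvector.
# Assumption: single-precision `vector` columns, which cost
# 4 bytes per dimension plus 8 bytes per stored vector.
# Index structures and regular row overhead are NOT included.

BYTES_PER_DIMENSION = 4
PER_VECTOR_OVERHEAD = 8


def estimate_vector_bytes(num_chunks: int, dimensions: int = 1536) -> int:
    """Return the approximate bytes needed to store `num_chunks` embeddings."""
    if num_chunks < 0 or dimensions <= 0:
        raise ValueError("num_chunks must be >= 0 and dimensions must be > 0")
    return num_chunks * (BYTES_PER_DIMENSION * dimensions + PER_VECTOR_OVERHEAD)


if __name__ == "__main__":
    # Hypothetical knowledge base sizes, measured in text chunks.
    for chunks in (10_000, 1_000_000):
        gib = estimate_vector_bytes(chunks) / 2**30
        print(f"{chunks:>9,} chunks -> {gib:.2f} GiB of raw vectors")
```

Because the estimate scales linearly with both chunk count and dimensions, a smaller embedding dimension or coarser chunking reduces storage proportionally.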
### 2. S3-compatible Object Storage
S3 (or S3-compatible storage services) is used for storing uploaded files.
- **Purpose**: Store raw files
- **Options**: AWS S3, MinIO, or other S3-compatible services
- **Note**: Configure appropriate access permissions and security policies
### 3. OpenAI Embedding
OpenAI's Embedding service is used to convert text into vector representations.
<Callout type={'info'}>
LobeChat currently uses OpenAI's `text-embedding-3-small` model by default. Ensure your API Key has access to this model.
</Callout>
- **Purpose**: Generate vector representations for semantic search
- **Notes**:
  - Requires a valid OpenAI API key
  - Rate-limit API calls and handle request errors
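
One common way to handle rate limits and transient errors is to retry with exponential backoff. The following is a minimal sketch of that pattern, not LobeChat's implementation: `embed` is a hypothetical callable that stands in for whatever client function sends texts to the embedding API, and the attempt count and delays are arbitrary starting points.

```python
import random
import time
from typing import Callable, List, Sequence

Vector = List[float]


def embed_with_retry(
    embed: Callable[[Sequence[str]], List[Vector]],  # hypothetical API wrapper
    texts: Sequence[str],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Vector]:
    """Call `embed(texts)`, retrying failed attempts with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return embed(texts)
        except Exception:
            # Real code should catch only transient errors (rate limits,
            # timeouts) and let authentication or validation errors fail fast.
            if attempt == max_attempts:
                raise
            # The base delay doubles on each attempt; random jitter of up to
            # half the delay keeps many clients from retrying in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the waiting behavior can be replaced in tests or swapped for a scheduler-aware delay.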
### 4. Unstructured.io (Optional)
Unstructured.io is a document processing tool that extracts text and structure from complex file formats.
- **Purpose**: Process complex document formats and extract structured information
- **Use Case**: Handle non-plain-text formats such as PDF and Word
- **Note**: Evaluate whether you need it based on the complexity of your documents

By correctly configuring and integrating these core components, you can build an efficient knowledge base system for LobeChat. Each component plays a distinct role in the overall architecture, and together they support document management and intelligent retrieval.
### 5. Custom Embedding
- **Purpose**: Use a different embedding model to generate vector representations for semantic search
- **Options**: Supported model providers: zhipu, github, openai, bedrock, ollama
- **Deployment Tip**: Set the `DEFAULT_FILES_CONFIG` environment variable to configure the default embedding model
```
environment: DEFAULT_FILES_CONFIG=embedding_model=openai/text-embedding-3-small
```
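
The value follows an `embedding_model=<provider>/<model>` shape. As an illustration of how such a value breaks down, here is a small sketch that splits it into a provider and a model name and checks the provider against the list above. The function is hypothetical and is not LobeChat's own parser; it handles only the single-setting form shown in this section.

```python
from typing import Tuple

# Provider names copied from the options list in this section.
SUPPORTED_PROVIDERS = {"zhipu", "github", "openai", "bedrock", "ollama"}


def parse_embedding_model(config: str) -> Tuple[str, str]:
    """Split 'embedding_model=<provider>/<model>' into (provider, model)."""
    key, sep, value = config.strip().partition("=")
    if key != "embedding_model" or not sep:
        raise ValueError(f"expected 'embedding_model=...', got {config!r}")
    # Split on the first slash only, so model names may contain slashes.
    provider, slash, model = value.partition("/")
    if not slash or not provider or not model:
        raise ValueError(f"expected '<provider>/<model>', got {value!r}")
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    return provider, model


if __name__ == "__main__":
    provider, model = parse_embedding_model(
        "embedding_model=openai/text-embedding-3-small"
    )
    print(f"provider={provider} model={model}")
```

Validating the value at startup like this surfaces a mistyped provider or model string before any file is uploaded and embedded.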