---
title: LobeChat Knowledge Base / File Upload
description: >-
  Explore LobeChat's file upload and knowledge base management features with
  core components.
tags:
  - LobeChat
  - File Upload
  - Knowledge Base
  - PostgreSQL
  - OpenAI Embedding
---

# Knowledge Base / File Upload

LobeChat supports file upload and knowledge base management. This feature relies on the following core technical components. Understanding these components will help you successfully deploy and maintain the knowledge base system.

## Core Components

### 1. PostgreSQL and PGVector

PostgreSQL is a powerful open-source relational database system, and PGVector is an extension that adds vector storage and similarity search to it.

- **Purpose**: Store structured data and vector indexes
- **Deployment Tip**: Use the official Docker image for quick deployment

Example deployment command:

```
docker run -p 5432:5432 -d --name pg -e POSTGRES_PASSWORD=mysecretpassword pgvector/pgvector:pg16
```
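
LobeChat needs a connection string that points at this container. Its server-side database mode is typically configured through a `DATABASE_URL` environment variable (treat that name as an assumption and confirm it in the environment-variable reference for your version). The sketch below builds such a URL from the values used in the command above. It relies on the official `postgres` image defaults: the superuser is `postgres`, and the default database name equals the user name. The function name `build_database_url` is hypothetical.

```python
from urllib.parse import quote


def build_database_url(
    password: str,
    user: str = "postgres",      # default superuser created by the postgres image
    host: str = "localhost",
    port: int = 5432,            # matches -p 5432:5432 in the docker run command
    database: str = "postgres",  # POSTGRES_DB defaults to the user name
) -> str:
    """Return a postgres:// connection URL with percent-encoded credentials."""
    # Passwords may contain characters such as '@', ':' or '/' that would
    # otherwise break URL parsing, so encode every reserved character.
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


if __name__ == "__main__":
    # DATABASE_URL is an assumed variable name; check your LobeChat version.
    print("DATABASE_URL=" + build_database_url("mysecretpassword"))
```

Percent-encoding matters once you replace `mysecretpassword` with a real password: an unencoded `@` in the password would be read as the start of the host part.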

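- **Note**: Allocate sufficient memory and CPU for vector operations, because building vector indexes and running similarity searches over many vectors are resource-intensive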
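
The vector operations PGVector performs boil down to comparing a query vector against stored vectors and returning the closest ones. PGVector's `<=>` operator computes cosine distance, which equals 1 minus cosine similarity. The plain-Python sketch below computes the same ranking over a hypothetical in-memory set of chunk vectors, purely to illustrate the idea. It is not how LobeChat queries the database.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm


def top_k(
    query: Sequence[float],
    chunks: dict[str, Sequence[float]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Rank stored chunk vectors by cosine similarity to the query, highest first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real embeddings have far more dimensions.
    chunks = {
        "refund-policy": [0.9, 0.1, 0.0],
        "shipping-times": [0.1, 0.9, 0.1],
        "return-address": [0.7, 0.2, 0.1],
    }
    print(top_k([1.0, 0.0, 0.0], chunks, k=2))
```

A linear scan like this compares the query with every stored vector, which is why large knowledge bases rely on the database's vector indexes instead.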

### 2. S3-compatible Object Storage

S3 (or S3-compatible storage services) is used for storing uploaded files.

- **Purpose**: Store raw files
- **Options**: AWS S3, MinIO, or other S3-compatible services
- **Note**: Configure appropriate access permissions and security policies
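
One way to apply appropriate permissions is to give the credentials LobeChat uses access to a single bucket only. The sketch below generates an AWS-style IAM policy document (MinIO also accepts this policy format) scoped to one bucket. The bucket name `lobechat-files` and the exact set of actions are assumptions. Adjust them to the operations your deployment actually performs, and keep public-read settings separate from these credentials.

```python
import json


def least_privilege_policy(bucket: str) -> dict:
    """Build an IAM policy limited to one bucket (action set is an assumption)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permission: listing object keys.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object-level permissions: upload, download and delete files.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "lobechat-files" is a hypothetical bucket name.
    print(json.dumps(least_privilege_policy("lobechat-files"), indent=2))
```

Bucket-level and object-level actions use different resource ARNs (`bucket` versus `bucket/*`), which is why the policy has two statements.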

### 3. OpenAI Embedding

OpenAI's Embedding service is used to convert text into vector representations.

<Callout type={'info'}>
  LobeChat currently uses OpenAI's `text-embedding-3-small` model by default. Make sure your API key has access to this model.
</Callout>
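
LobeChat makes the embedding calls itself, but a quick standalone request can confirm that a key has access to the model. The sketch below uses only the Python standard library to call OpenAI's `POST /v1/embeddings` endpoint with a `model` and an `input` list. The helper names `build_embedding_request` and `embed` are hypothetical, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request

EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_embedding_request(
    texts: list[str], api_key: str, model: str = "text-embedding-3-small"
) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to OpenAI's embeddings endpoint."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed(texts: list[str], api_key: str) -> list[list[float]]:
    """Send the request and return one vector per input text, in input order."""
    request = build_embedding_request(texts, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # Each result carries an "index" matching its position in the input list.
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

For example, `embed(["hello"], os.environ["OPENAI_API_KEY"])` raises `urllib.error.HTTPError` if the key is invalid or lacks access to the model, and returns a list containing one vector otherwise.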

- **Purpose**: Generate vector representations for semantic search
- **Notes**:
  - Requires a valid OpenAI API key
  - Set appropriate limits on API calls and handle API errors, such as rate-limit responses
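
A common pattern for handling rate-limit and transient server errors is to retry with exponential backoff. The sketch below shows that pattern in general form; it is not LobeChat's internal retry logic. `RetryableError` is a hypothetical exception that your own code would raise for responses such as HTTP 429 or 5xx. The default attempt count and delays are arbitrary.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for transient failures such as HTTP 429 or 5xx."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying RetryableError with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RetryableError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The upper bound doubles each attempt (1s, 2s, 4s, ...) up to max_delay;
            # the actual sleep is drawn uniformly below it so clients don't retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

The `sleep` parameter exists so the delay can be replaced in tests. In normal use you would wrap the API call, for example `with_backoff(lambda: embed(texts, api_key))`, after mapping retryable HTTP status codes to `RetryableError`.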

### 4. Unstructured.io (Optional)

Unstructured.io is a powerful document processing tool.

- **Purpose**: Process complex document formats, extract structured information
- **Use Case**: Handle formats that are not plain text, such as PDF and Word documents
- **Note**: Decide whether you need it based on the complexity of your documents
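
To judge whether Unstructured.io is worth deploying, it can help to see how many of your files are not plain text. The hypothetical helper below groups filenames by extension. The extension list is an assumption for illustration, not LobeChat's actual routing logic.

```python
from pathlib import Path

# Assumed set of formats that can be read as text without a document parser.
PLAIN_TEXT_SUFFIXES = {".txt", ".md", ".csv", ".json"}


def split_by_processing(filenames: list[str]) -> dict[str, list[str]]:
    """Group files into those readable as plain text and those needing a parser."""
    plan: dict[str, list[str]] = {"plain_text": [], "needs_parser": []}
    for name in filenames:
        suffix = Path(name).suffix.lower()  # case-insensitive: ".PDF" -> ".pdf"
        # Unknown or missing extensions go to the parser group to be safe.
        key = "plain_text" if suffix in PLAIN_TEXT_SUFFIXES else "needs_parser"
        plan[key].append(name)
    return plan


if __name__ == "__main__":
    print(split_by_processing(["notes.md", "Report.PDF", "spec.docx"]))
```

If the `needs_parser` group is small or empty for your data, the optional parser may not be necessary.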

By correctly configuring and integrating these core components, you can build an efficient knowledge base system for LobeChat: uploaded files are kept in object storage, their text is converted into vectors by the embedding model, and those vectors are stored in PostgreSQL for semantic retrieval.

### 5. Custom Embedding

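- **Purpose**: Use a different embedding model to generate vector representations for semantic search
- **Options**: Supported model providers are `zhipu`, `github`, `openai`, `bedrock`, and `ollama`
- **Deployment Tip**: Set the default embedding model with the `DEFAULT_FILES_CONFIG` environment variable, in the form `embedding_model=<provider>/<model>`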

```
DEFAULT_FILES_CONFIG=embedding_model=openai/text-embedding-3-small
```
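
A malformed value for this variable is easy to miss, so you may want to check it before deploying. The sketch below splits an `embedding_model=<provider>/<model>` value into its provider and model. It also checks the provider against the list in this section. The function is hypothetical; LobeChat performs its own parsing and validation. It handles only the single `embedding_model` entry shown here.

```python
# Providers listed in this section; LobeChat performs its own validation.
SUPPORTED_PROVIDERS = {"zhipu", "github", "openai", "bedrock", "ollama"}


def parse_embedding_model(value: str) -> tuple[str, str]:
    """Split 'embedding_model=<provider>/<model>' into (provider, model)."""
    key, sep, spec = value.strip().partition("=")
    if key.strip() != "embedding_model" or not sep:
        raise ValueError(f"expected 'embedding_model=<provider>/<model>', got {value!r}")
    # Split on the first '/' only, so everything after it is kept as the model name.
    provider, sep, model = spec.strip().partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected '<provider>/<model>', got {spec!r}")
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider {provider!r}")
    return provider, model


if __name__ == "__main__":
    print(parse_embedding_model("embedding_model=openai/text-embedding-3-small"))
```

This only checks the structure and the provider name; it cannot tell whether the model itself exists or is available to your credentials.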