---
title: Using vLLM API Key in LobeChat
description: Learn how to configure and use the vLLM language model in LobeChat, obtain an API key, and start a conversation.
tags:
- LobeChat
- vLLM
- API Key
- Web UI
---
# Using vLLM in LobeChat
[vLLM](https://github.com/vllm-project/vllm) is an open-source local deployment tool for large language models (LLMs) that lets users run models efficiently on local devices and provides an OpenAI API-compatible service interface.
This document will guide you on how to use vLLM in LobeChat:
### Step 1: Preparation
vLLM has specific hardware and software requirements. Make sure your environment meets the following before proceeding:
| Hardware              | Supported Options                             |
| --------------------- | --------------------------------------------- |
| GPU                   | NVIDIA CUDA, AMD ROCm, Intel XPU               |
| CPU                   | Intel/AMD x86, ARM AArch64, Apple silicon      |
| Other AI Accelerators | Google TPU, Intel Gaudi, AWS Neuron, OpenVINO  |

| Software | Requirement |
| -------- | ----------- |
| OS       | Linux       |
| Python   | 3.9 – 3.12  |
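As a quick sanity check, you can confirm your Python version and, on NVIDIA machines, that the GPU driver is visible. The commands below are just one way to do this:
```shell
# Check the Python version (vLLM supports 3.9 – 3.12)
python3 --version

# On NVIDIA GPUs, confirm the driver and CUDA runtime are visible
nvidia-smi
```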
### Step 2: Install vLLM
If you are using an NVIDIA GPU, you can install vLLM directly with `pip`. However, we recommend using `uv`, a very fast Python environment manager, to create and manage the Python environment. Please follow the [documentation](https://docs.astral.sh/uv/#getting-started) to install uv. After installing uv, you can create a new Python environment and install vLLM with the following commands:
```shell
uv venv myenv --python 3.12 --seed
source myenv/bin/activate
uv pip install vllm
```
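After installation, you can optionally verify that vLLM imports correctly, for example:
```shell
# Print the installed vLLM version to confirm the installation works
python -c "import vllm; print(vllm.__version__)"
```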
Another method is to use `uv run` with the `--with [dependency]` option, which allows you to run commands such as `vllm serve` without creating a persistent environment:
```shell
uv run --with vllm vllm --help
```
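For example, you could start the server directly this way; the model name here is only an illustration, and any model available on the Hugging Face Hub works:
```shell
# Run `vllm serve` in an ephemeral environment managed by uv
uv run --with vllm vllm serve Qwen/Qwen2.5-1.5B-Instruct
```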
You can also use [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html) to create and manage your Python environment.
```shell
conda create -n myenv python=3.12 -y
conda activate myenv
pip install vllm
```
For non-CUDA platforms, please refer to the [official documentation](https://docs.vllm.ai/en/latest/getting_started/installation/index.html#installation-index) to learn how to install vLLM.
### Step 3: Start Local Service
vLLM can be deployed as an OpenAI API protocol-compatible server. By default, it will start the server at `http://localhost:8000`. You can specify the address using the `--host` and `--port` parameters. The server currently runs only one model at a time.
The following command will start a vLLM server and run the `Qwen2.5-1.5B-Instruct` model:
```shell
vllm serve Qwen/Qwen2.5-1.5B-Instruct
```
You can make the server check the API key in request headers by passing the `--api-key` parameter or setting the `VLLM_API_KEY` environment variable. If neither is set, no API key is required to access the server.
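For example, either of the following starts the server with API key checking enabled (the token value is just a placeholder):
```shell
# Pass the key as a command-line parameter
vllm serve Qwen/Qwen2.5-1.5B-Instruct --api-key token-abc123

# Or provide it via the environment variable
VLLM_API_KEY=token-abc123 vllm serve Qwen/Qwen2.5-1.5B-Instruct
```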
For more detailed vLLM server configuration, please refer to the [official documentation](https://docs.vllm.ai/en/latest/).
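Before configuring LobeChat, you can verify that the server responds through its OpenAI-compatible endpoint. A minimal check with `curl` might look like this (adjust the model name, and omit the `Authorization` header if you did not set an API key):
```shell
# Send a test chat completion request to the local vLLM server
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer token-abc123" \
  -d '{
    "model": "Qwen/Qwen2.5-1.5B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```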
### Step 4: Configure vLLM in LobeChat
- Access the `Application Settings` interface of LobeChat.
- Find the `vLLM` settings item under `Language Model`.
- Enable the vLLM provider, then fill in the API service address and API key.
* If your vLLM is not configured with an API Key, please leave the API Key blank.
* If your vLLM is running locally, please make sure to turn on `Client Request Mode`.
- Add the model you are running to the model list below.
- Select a vLLM model to run for your assistant and start the conversation.
Now you can use the models provided by vLLM in LobeChat to have conversations.