---
title: Enhancing Multimodal Interaction with Visual Recognition Models
description: >-
  Explore how LobeChat integrates visual recognition capabilities into large
  language models, enabling multimodal interactions for enhanced user
  experiences.
tags:
  - Visual Recognition
  - Multimodal Interaction
  - Large Language Models
  - LobeChat
  - Custom Model Configuration
---
# Visual Model User Guide
The ecosystem of large language models that support visual recognition is growing rapidly. Beginning with `gpt-4-vision`, LobeChat supports a range of vision-capable large language models, bringing multimodal interaction to your conversations.
## Image Input
If the model you are currently using supports visual recognition, you can provide image content by uploading a file or dragging an image directly into the input box. The model will automatically recognize the image and respond based on your prompts.
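
Under the hood, a vision-capable model receives the uploaded image together with your text prompt as a single multimodal message. The sketch below, assuming an OpenAI-compatible endpoint and the `openai` TypeScript SDK (the model name and image URL are placeholders, not LobeChat's internal code), illustrates the general shape of such a request:

```ts
import OpenAI from 'openai';

// Illustrative sketch: send an image alongside a text prompt to a
// vision-capable model using the OpenAI-style multimodal message format.
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function describeImage(imageUrl: string, prompt: string): Promise<string | null> {
  const response = await openai.chat.completions.create({
    // Placeholder model name; use any model marked as vision-capable.
    model: 'gpt-4-vision-preview',
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: prompt },
          // The image is sent as a content part next to the text prompt.
          { type: 'image_url', image_url: { url: imageUrl } },
        ],
      },
    ],
  });
  return response.choices[0]?.message.content ?? null;
}

// Example usage with a placeholder image URL.
describeImage('https://example.com/cat.png', 'What is in this picture?').then(console.log);
```

Depending on the provider, the image may also be passed as a base64-encoded data URI rather than a public URL.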
## Visual Models
In the model list, a `👁️` icon next to a model's name indicates that it supports visual recognition. Selecting such a model allows you to send image content.
## Custom Model Configuration
If you need to add a custom model that is not in the list but supports visual recognition, enable the `Visual Recognition` option in `Custom Model Configuration` so that the model can accept image input.
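
Conceptually, this toggle marks the custom model as vision-capable in its configuration, which is what allows image uploads to be sent to it. The following is a minimal sketch of such an entry; the field names (`id`, `displayName`, `vision`) are illustrative assumptions rather than LobeChat's exact schema:

```ts
// A minimal sketch of a vision-enabled custom model entry. The field names
// (id, displayName, vision) are illustrative assumptions, not LobeChat's
// exact configuration schema.
interface CustomModelCard {
  id: string; // model identifier sent to the provider
  displayName: string; // name shown in the model list
  vision: boolean; // marks the model as supporting image input
}

const myVisionModel: CustomModelCard = {
  id: 'my-provider/vision-model-v1', // placeholder identifier
  displayName: 'My Vision Model',
  vision: true, // corresponds to enabling `Visual Recognition` in the UI
};

console.log(myVisionModel);
```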