---
title: >-
  LobeChat Supports Multimodal Interaction: Visual Recognition Enhances
  Intelligent Dialogue
description: >-
  LobeChat supports various large language models with visual recognition
  capabilities, allowing users to upload or drag and drop images. The assistant
  will recognize the content and engage in intelligent dialogue, creating a more
  intelligent and diverse chat environment.
tags:
  - Visual Recognition
  - LobeChat
  - GPT-4 Vision
  - Google Gemini Pro
  - Multimodal Interaction
---
# Supported Models for Visual Recognition
LobeChat now supports several large language models with visual recognition capabilities, including OpenAI's [`gpt-4-vision`](https://platform.openai.com/docs/guides/vision), Google Gemini Pro Vision, and Zhipu GLM-4 Vision, giving LobeChat multimodal interaction capabilities. Users can effortlessly upload images or drag and drop them into the chat window; the assistant recognizes the image content and engages in intelligent dialogue, building a smarter and more diverse chat experience.
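Under the hood, vision-capable models such as `gpt-4-vision` accept chat messages that mix text and image parts. The sketch below builds such a request payload locally, without sending it anywhere; the field names follow OpenAI's Chat Completions multimodal format, and the model name and image URL are placeholder assumptions:

```python
import json


def build_vision_message(prompt: str, image_url: str) -> dict:
    """Build a single user message combining text and an image,
    in the OpenAI Chat Completions multimodal content format."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


# Example request body for a visual-recognition chat turn
# (model name and URL are illustrative placeholders).
payload = {
    "model": "gpt-4-vision-preview",
    "messages": [
        build_vision_message(
            "What is shown in this image?",
            "https://example.com/photo.png",
        )
    ],
    "max_tokens": 300,
}

print(json.dumps(payload, indent=2))
```

Dragging an image into the chat window effectively produces a message of this shape: the text prompt and the image travel together in one user turn, which is what lets the assistant ground its reply in the picture's content.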
This feature opens up new avenues for interaction, extending communication beyond text to rich visual content. Whether sharing images in everyday use or interpreting diagrams in specialized fields, the assistant delivers an excellent conversational experience. Additionally, we have carefully selected a range of high-quality voice options (OpenAI Audio, Microsoft Edge Speech) to cater to users from different regions and cultural backgrounds. Users can choose a suitable voice based on personal preference or specific context, receiving a more personalized communication experience.