---
title: >-
  LobeChat Supports Multimodal Interaction: Visual Recognition Enhances
  Intelligent Dialogue
description: >-
  LobeChat supports various large language models with visual recognition
  capabilities, allowing users to upload or drag and drop images. The assistant
  will recognize the content and engage in intelligent dialogue, creating a more
  intelligent and diverse chat environment.
tags:
  - Visual Recognition
  - LobeChat
  - GPT-4 Vision
  - Google Gemini Pro
  - Multimodal Interaction
---
# Supported Models for Visual Recognition
LobeChat now supports several large language models with visual recognition capabilities, including OpenAI's [`gpt-4-vision`](https://platform.openai.com/docs/guides/vision), Google Gemini Pro Vision, and Zhipu AI's GLM-4 Vision, giving LobeChat multimodal interaction capabilities. Users can upload images or drag and drop them into the chat window; the assistant recognizes the image content and engages in intelligent dialogue about it, building a smarter and more diverse chat experience.
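Under the hood, vision-capable models such as `gpt-4-vision` accept chat messages that mix text parts with image parts, where an uploaded image is typically embedded as a base64 data URL. A minimal sketch of building such a message in Python (the `build_vision_message` helper is hypothetical; the message shape follows the OpenAI vision guide linked above):

```python
import base64


def build_vision_message(image_bytes: bytes, prompt: str, mime: str = "image/png") -> dict:
    """Pair a text prompt with an image in the OpenAI-style multimodal message format.

    The image bytes (e.g. from a file the user dragged into the chat window)
    are embedded as a base64 data URL inside an `image_url` content part.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime};base64,{encoded}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


# In practice the bytes would come from the user's uploaded image file.
message = build_vision_message(b"\x89PNG\r\n...", "What is in this image?")
```

A message built this way is then sent alongside the rest of the conversation history, and the model's reply describes or reasons about the image content.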
This feature opens up new avenues for interaction, allowing communication that extends beyond text to include rich visual elements. Whether sharing images during everyday use or interpreting graphics in specific industries, the assistant delivers an exceptional conversational experience. Additionally, we have carefully selected a range of high-quality voice options (OpenAI Audio, Microsoft Edge Speech) to cater to users from different regions and cultural backgrounds. Users can choose a suitable voice based on personal preferences or specific contexts, thus receiving a more personalized communication experience.