You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

110 lines
8.2 KiB
Markdown

This file contains ambiguous Unicode characters!

This file contains ambiguous Unicode characters that may be confused with others in your current locale. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to highlight these characters.

---
title: Google Gemini 系列 Tool Calling 评测
description: >-
使用 LobeChat 测试 Google Gemini 系列模型Gemini 1.5 Pro / Gemini 1.5 Flash
的工具调用Function Calling能力并展现评测结果
tags:
- Tools Calling
- Benchmark
- Function Calling 评测
- 工具调用
- 插件
---
# Google Gemini Series Tool Calling
Overview of Google Gemini series model Tools Calling capabilities:
| Model | Tools Calling Support | Streaming | Parallel | Simple Instruction Score | Complex Instruction |
| ---------------- | --------------------- | --------- | -------- | ------------------------ | ------------------- |
| Gemini 1.5 Pro | ✅ | ❌ | ✅ | ⛔ | ⛔ |
| Gemini 1.5 Flash | ❌ | ❌ | ❌ | ⛔ | ⛔ |
<Callout type={'important'}>
Based on our actual tests, we strongly recommend not enabling plugins for Gemini because as of
July 7, 2024, its Tools Calling capability is extremely poor.
</Callout>
## Gemini 1.5 Pro
### Simple Instruction Call: Weather Query
Test Instruction: Instruction ①
<Video src="https://github.com/lobehub/lobe-chat/assets/28616219/a5a35431-2a15-4e79-97d5-502637f829bc" />
In the json output from Gemini, the name is incorrect, so LobeChat cannot recognize which plugin it called. (In the input, the name of the weather plugin is `realtime-weather____fetchCurrentWeather`, while Gemini returns `weather____fetchCurrentWeather`).
<Image alt="Tools Calling for Simple Instruction in Gemini 1.5 Pro" src="https://github.com/lobehub/lobe-chat/assets/28616219/1e077799-c25e-43c7-8492-c5c0bb9aed9b" />
<details>
<summary>Original Tools Calling Output:</summary>
```yml
[stream start] 2024-7-7 17:53:25.647
[chunk 0] 2024-7-7 17:53:25.654
{"candidates":[{"content":{"parts":[{"text":"好的"}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":95,"candidatesTokenCount":1,"totalTokenCount":96}}
[chunk 1] 2024-7-7 17:53:26.288
{"candidates":[{"content":{"parts":[{"text":"\n\n"}],"role":"model"},"finishReason":"STOP","index":0,"safetyRatings":[{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE"}]}],"usageMetadata":{"promptTokenCount":95,"candidatesTokenCount":1,"totalTokenCount":96}}
[chunk 2] 2024-7-7 17:53:26.336
{"candidates":[{"content":{"parts":[{"functionCall":{"name":"weather____fetchCurrentWeather","args":{"city":"Hangzhou"}}},{"functionCall":{"name":"weather____fetchCurrentWeather","args":{"city":"Beijing"}}}],"role":"model"},"finishReasoSTOP","index":0,"safetyRatings":[{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE"}]}],"usageMetadata":{"promptTokenCount":95,"candidatesTokenCount":79,"totalTokenCount":174}}
[stream finished] total chunks: 3
```
</details>
### Complex Instruction Call: Image Generation
Test Instruction: Instruction ②
<Image alt="Tools Calling for Complex Instruction in Gemini 1.5 Pro" src="https://github.com/lobehub/lobe-chat/assets/28616219/a2454a60-3271-4786-861f-d49ceac1316e" />
When testing a set of complex instructions, Google throws an error directly:
```json
{
"message": "[400 Bad Request] Invalid JSON payload received. Unknown name \"maxItems\" at 'tools[0].function_declarations[0].parameters.properties[0].value': Cannot find field.\nInvalid JSON payload received. Unknown name \"minItems\" at 'tools[0].function_declarations[0].parameters.properties[0].value': Cannot find field.\nInvalid JSON payload received. Unknown name \"default\" at 'tools[0].function_declarations[0].parameters.properties[1].value': Cannot find field.\nInvalid JSON payload received. Unknown name \"default\" at 'tools[0].function_declarations[0].parameters.properties[3].value': Cannot find field.\nInvalid JSON payload received. Unknown name \"default\" at 'tools[0].function_declarations[0].parameters.properties[4].value': Cannot find field. [{\"@type\":\"type.googleapis.com/google.rpc.BadRequest\",\"fieldViolations\":[{\"field\":\"tools[0].function_declarations[0].parameters.properties[0].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"maxItems\\\" at 'tools[0].function_declarations[0].parameters.properties[0].value': Cannot find field.\"},{\"field\":\"tools[0].function_declarations[0].parameters.properties[0].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"minItems\\\" at 'tools[0].function_declarations[0].parameters.properties[0].value': Cannot find field.\"},{\"field\":\"tools[0].function_declarations[0].parameters.properties[1].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"default\\\" at 'tools[0].function_declarations[0].parameters.properties[1].value': Cannot find field.\"},{\"field\":\"tools[0].function_declarations[0].parameters.properties[3].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"default\\\" at 'tools[0].function_declarations[0].parameters.properties[3].value': Cannot find field.\"},{\"field\":\"tools[0].function_declarations[0].parameters.properties[4].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"default\\\" at 'tools[0].function_declarations[0].parameters.properties[4].value': Cannot find field.\"}]}]"
}
```
The error above mentions that it does not support a schema containing `maxItems`, so Gemini 1.5 Pro is essentially unable to use the DallE plugin.
Related issues:
- [Support for minItems and maxItems for FunctionDeclarationSchemaType.ARRAY?](https://github.com/google-gemini/generative-ai-js/issues/200)
- [Gemini Models unusable when dalle plugin is enabled](https://github.com/lobehub/lobe-chat/issues/2537)
Based on the above two tests, Google's Tool Calling capability seems to be supported, but it is almost unusable in daily use. I personally think it is equivalent to false advertising.
## Gemini 1.5 Flash
### Simple Command: Weather Query
Test Command: Command ①
<Video src="https://github.com/lobehub/lobe-chat/assets/28616219/6cab77e8-d761-4a91-8325-a61748cebac1" />
Gemini 1.5 Flash is more abstract, and the call ends as soon as it is made. Combining the original output below, it can be seen that Gemini 1.5 Flash does not output Tool Calling data, so it can be considered completely unusable.
```yml
stream start] 2024-7-7 19:4:50.936
[chunk 0] 2024-7-7 19:4:50.943
{"candidates":[{"content":{"parts":[{"text":"Okay"}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":96,"candidatesTokenCount":1,"totalTokenCount":97}}
[chunk 1] 2024-7-7 19:4:52.209
{"candidates":[{"content":{"parts":[{"text":", please wait, I am checking the weather information for Hangzhou and Beijing."}],"role":"model"},"finishReason":"STOP","index":0,"safetyRatings":[{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE"}]}],"usageMetadata":{"promptTokenCount":96,"candidatesTokenCount":16,"totalTokenCount":112}}
[chunk 2] 2024-7-7 19:4:53.288
{"candidates":[{"content":{"parts":[{"text":"\n"}],"role":"model"},"finishReason":"STOP","index":0,"safetyRatings":[{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE"}]}],"usageMetadata":{"promptTokenCount":96,"candidatesTokenCount":16,"totalTokenCount":112}}
[stream finished] total chunks: 3
```
### Complex Command: Wenshengtu
Test Command: Command ②
This command, like the complex commands of Gemini 1.5 Pro, throws an error directly, so it will not be further elaborated.