|
|
---
|
|
|
title: Google Gemini 系列 Tools Calling 评测
|
|
|
description: >-
|
|
|
使用 LobeChat 测试 Google Gemini 系列模型(Gemini 1.5 Pro / Gemini 1.5
|
|
|
Flash)的工具调用(Function Calling)能力,并展现评测结果
|
|
|
tags:
|
|
|
- Tools Calling
|
|
|
- Benchmark
|
|
|
- Function Calling 评测
|
|
|
- 工具调用
|
|
|
- 插件
|
|
|
---
|
|
|
|
|
|
# Google Gemini 系列 Tools Calling
|
|
|
|
|
|
Google Gemini 系列模型 Tools Calling 能力一览:
|
|
|
|
|
|
| 模型 | 支持 Tools Calling | 流式 (Stream) | 并发(Parallel) | 简单指令得分 | 复杂指令 |
|
|
|
| ---------------- | ---------------- | ----------- | ------------ | ------ | ---- |
|
|
|
| Gemini 1.5 Pro | ✅ | ❌ | ✅ | ⛔ | ⛔ |
|
|
|
| Gemini 1.5 Flash | ❌ | ❌ | ❌ | ⛔ | ⛔ |
|
|
|
|
|
|
<Callout type={'important'}>
|
|
|
根据我们的的实际测试,强烈建议不要给 Gemini 开启插件,因为目前(截止 2024.07.07)它的 Tools Calling
|
|
|
能力实在太烂了。
|
|
|
</Callout>
|
|
|
|
|
|
## Gemini 1.5 Pro
|
|
|
|
|
|
### 简单调用指令:天气查询
|
|
|
|
|
|
测试指令:指令 ①
|
|
|
|
|
|
<Video src="https://github.com/lobehub/lobe-chat/assets/28616219/a5a35431-2a15-4e79-97d5-502637f829bc" />
|
|
|
|
|
|
Gemini 输出的 json 中,name 是错误的,因此 LobeChat 无法识别到它调用了什么插件。(入参中,天气插件的 name 为 `realtime-weather____fetchCurrentWeather`,而 Gemini 返回的是 `weather____fetchCurrentWeather`)。
|
|
|
|
|
|
<Image alt="Gemini 1.5 Pro 简单指令的 Tools Calling" src="https://github.com/lobehub/lobe-chat/assets/28616219/1e077799-c25e-43c7-8492-c5c0bb9aed9b" />
|
|
|
|
|
|
<details>
|
|
|
<summary>Tools Calling 原始输出:</summary>
|
|
|
|
|
|
```yml
|
|
|
[stream start] 2024-7-7 17:53:25.647
|
|
|
[chunk 0] 2024-7-7 17:53:25.654
|
|
|
{"candidates":[{"content":{"parts":[{"text":"好的"}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":95,"candidatesTokenCount":1,"totalTokenCount":96}}
|
|
|
|
|
|
[chunk 1] 2024-7-7 17:53:26.288
|
|
|
{"candidates":[{"content":{"parts":[{"text":"\n\n"}],"role":"model"},"finishReason":"STOP","index":0,"safetyRatings":[{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE"}]}],"usageMetadata":{"promptTokenCount":95,"candidatesTokenCount":1,"totalTokenCount":96}}
|
|
|
|
|
|
[chunk 2] 2024-7-7 17:53:26.336
|
|
|
{"candidates":[{"content":{"parts":[{"functionCall":{"name":"weather____fetchCurrentWeather","args":{"city":"杭州"}}},{"functionCall":{"name":"weather____fetchCurrentWeather","args":{"city":"北京"}}}],"role":"model"},"finishReasoSTOP","index":0,"safetyRatings":[{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE"}]}],"usageMetadata":{"promptTokenCount":95,"candidatesTokenCount":79,"totalTokenCount":174}}
|
|
|
|
|
|
[stream finished] total chunks: 3
|
|
|
```
|
|
|
</details>
|
|
|
|
|
|
### 复杂调用指令:文生图
|
|
|
|
|
|
测试指令:指令 ②
|
|
|
|
|
|
<Image alt="Gemini 1.5 Pro 复杂指令的 Tools Calling" src="https://github.com/lobehub/lobe-chat/assets/28616219/a2454a60-3271-4786-861f-d49ceac1316e" />
|
|
|
|
|
|
在测试复杂指令集时,Google 直接抛错:
|
|
|
|
|
|
```json
|
|
|
{
|
|
|
"message": "[400 Bad Request] Invalid JSON payload received. Unknown name \"maxItems\" at 'tools[0].function_declarations[0].parameters.properties[0].value': Cannot find field.\nInvalid JSON payload received. Unknown name \"minItems\" at 'tools[0].function_declarations[0].parameters.properties[0].value': Cannot find field.\nInvalid JSON payload received. Unknown name \"default\" at 'tools[0].function_declarations[0].parameters.properties[1].value': Cannot find field.\nInvalid JSON payload received. Unknown name \"default\" at 'tools[0].function_declarations[0].parameters.properties[3].value': Cannot find field.\nInvalid JSON payload received. Unknown name \"default\" at 'tools[0].function_declarations[0].parameters.properties[4].value': Cannot find field. [{\"@type\":\"type.googleapis.com/google.rpc.BadRequest\",\"fieldViolations\":[{\"field\":\"tools[0].function_declarations[0].parameters.properties[0].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"maxItems\\\" at 'tools[0].function_declarations[0].parameters.properties[0].value': Cannot find field.\"},{\"field\":\"tools[0].function_declarations[0].parameters.properties[0].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"minItems\\\" at 'tools[0].function_declarations[0].parameters.properties[0].value': Cannot find field.\"},{\"field\":\"tools[0].function_declarations[0].parameters.properties[1].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"default\\\" at 'tools[0].function_declarations[0].parameters.properties[1].value': Cannot find field.\"},{\"field\":\"tools[0].function_declarations[0].parameters.properties[3].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"default\\\" at 'tools[0].function_declarations[0].parameters.properties[3].value': Cannot find field.\"},{\"field\":\"tools[0].function_declarations[0].parameters.properties[4].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"default\\\" at 'tools[0].function_declarations[0].parameters.properties[4].value': Cannot find field.\"}]}]"
|
|
|
}
|
|
|
```
|
|
|
|
|
|
上述抛错中提到并不支持包含 `maxItems` 的 schema,因此 Gemini 1.5 Pro 相当于无法使用 DallE 插件。
|
|
|
|
|
|
相关 issue:
|
|
|
|
|
|
- [Support for minItems and maxItems for FunctionDeclarationSchemaType.ARRAY?](https://github.com/google-gemini/generative-ai-js/issues/200)
|
|
|
- [Gemini Models unusable when dalle plugin is enabled](https://github.com/lobehub/lobe-chat/issues/2537)
|
|
|
|
|
|
综合以上两个测试来看,Google 的 Tool Calling 能力似乎是支持了,但是几乎没法在日常中使用,我个人认为已经等于虚假宣传了。
|
|
|
|
|
|
## Gemini 1.5 Flash
|
|
|
|
|
|
### 简单调用指令:天气查询
|
|
|
|
|
|
测试指令:指令 ①
|
|
|
|
|
|
<Video src="https://github.com/lobehub/lobe-chat/assets/28616219/6cab77e8-d761-4a91-8325-a61748cebac1" />
|
|
|
|
|
|
而 Gemini 1.5 flash 更为抽象,说完调用就结束了。结合以下原始输出可以看到,Gemini 1.5 Flash 并没有输出 Tool Calling 的数据,因此可以说是完全不可用。
|
|
|
|
|
|
```yml
|
|
|
stream start] 2024-7-7 19:4:50.936
|
|
|
[chunk 0] 2024-7-7 19:4:50.943
|
|
|
{"candidates":[{"content":{"parts":[{"text":"好的"}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":96,"candidatesTokenCount":1,"totalTokenCount":97}}
|
|
|
|
|
|
[chunk 1] 2024-7-7 19:4:52.209
|
|
|
{"candidates":[{"content":{"parts":[{"text":",请稍等,我正在查询杭州和北京的天气信息。 "}],"role":"model"},"finishReason":"STOP","index":0,"safetyRatings":[{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE"ATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE"}]}],"usageMetadata":{"promptTokenCount":96,"candidatesTokenCount":16,"totalTokenCount":112}}
|
|
|
|
|
|
[chunk 2] 2024-7-7 19:4:53.288
|
|
|
{"candidates":[{"content":{"parts":[{"text":"\n"}],"role":"model"},"finishReason":"STOP","index":0,"safetyRatings":[{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE"}]}],"usageMetadata":{"promptTokenCount":96,"candidatesTokenCount":16,"totalTokenCount":112}}
|
|
|
|
|
|
[stream finished] total chunks: 3
|
|
|
```
|
|
|
|
|
|
### 复杂调用指令:文生图
|
|
|
|
|
|
测试指令:指令 ②
|
|
|
|
|
|
该指令和 Gemini 1.5 Pro 的复杂指令一样,直接抛错,因此不再详细展开。
|