---
title: OpenAI GPT Series Tools Calling Evaluation
description: >-
  Using LobeChat to test the tool calling (Function Calling) capabilities of
  OpenAI GPT series models (GPT-3.5 Turbo / GPT-4 / GPT-4o) and present the
  evaluation results
tags:
- Tools Calling
- Benchmark
- Function Calling
- Tool Calling
- Plugins
---
# OpenAI GPT Series Tool Calling
Overview of the Tool Calling capabilities of OpenAI GPT series models:
| Model | Tool Calling Support | Streaming | Parallel | Simple Instruction Score | Complex Instruction Score |
| ------------- | -------------------- | --------- | -------- | ------------------------ | ------------------------- |
| GPT-3.5-turbo | ✅ | ✅ | ✅ | 🌟🌟🌟 | 🌟 |
| GPT-4-turbo | ✅ | ✅ | ✅ | 🌟🌟 | 🌟🌟 |
| GPT-4o | ✅ | ✅ | ✅ | 🌟🌟🌟 | 🌟🌟 |
For the test instructions, see [Tools Calling - Evaluation Task Introduction](/docs/usage/tools-calling#evaluation-task-introduction).
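For context on the Streaming and Parallel columns: in a streamed Chat Completions response, each tool call arrives as fragments tagged with an `index`, and the client concatenates the `arguments` strings per index to recover the full call. The sketch below is a minimal illustration of that merge, assuming the fragments are plain dicts. The `merge_tool_call_deltas` helper and the sample fragments are hypothetical and are not output recorded from the tests on this page.

```python
import json


def merge_tool_call_deltas(deltas):
    """Merge streamed tool-call fragments into complete calls, keyed by index."""
    calls = {}
    for delta in deltas:
        call = calls.setdefault(
            delta["index"], {"id": None, "name": "", "arguments": ""}
        )
        if delta.get("id"):
            call["id"] = delta["id"]
        fn = delta.get("function", {})
        # Name and argument text may arrive in pieces, so append rather than assign.
        call["name"] += fn.get("name") or ""
        call["arguments"] += fn.get("arguments") or ""
    # Return calls ordered by index, with the argument string parsed as JSON.
    return [
        {**calls[i], "arguments": json.loads(calls[i]["arguments"])}
        for i in sorted(calls)
    ]


# Hypothetical fragments for two parallel calls (illustrative only).
sample_deltas = [
    {"index": 0, "id": "call_a", "function": {"name": "get_weather", "arguments": ""}},
    {"index": 0, "function": {"arguments": '{"city": "Hang'}},
    {"index": 0, "function": {"arguments": 'zhou"}'}},
    {"index": 1, "id": "call_b", "function": {"name": "get_weather", "arguments": ""}},
    {"index": 1, "function": {"arguments": '{"city": "Beijing"}'}},
]

merged = merge_tool_call_deltas(sample_deltas)
```

Keying on `index` rather than arrival order is what lets parallel calls be reassembled correctly even when their fragments interleave.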
## GPT-3.5 Turbo
### Simple Instruction Call: Weather Inquiry
Test Instruction: Instruction ①
Streaming Tool Calling Raw Output:
### Complex Instruction Call: Text-to-Image
Test Instruction: Instruction ②
Streaming Tool Calling Raw Output:
## GPT-4 Turbo
### Simple Instruction Call: Weather Inquiry
Test Instruction: Instruction ①
Unlike GPT-3.5 Turbo, GPT-4 Turbo did not reply with "okay" when making the tool call, and the result was the same across multiple tests. In following this compound instruction it therefore falls short of GPT-3.5 Turbo, although its other two capabilities are still good.
It is also possible that GPT-4 Turbo has more "autonomy" and judges that it does not need to output this "okay".
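The compound-instruction check described above can be sketched as a simple predicate: a reply passes only if it contains both an acknowledgement in text and at least one tool call. This is a rough sketch, assuming an assistant message shaped like a dict with `content` and `tool_calls` keys. The `follows_compound_instruction` helper and the sample messages are hypothetical and are not the harness used for these tests.

```python
def follows_compound_instruction(message):
    """Return True if the reply has both acknowledgement text and a tool call."""
    has_text = bool((message.get("content") or "").strip())
    has_tool_call = bool(message.get("tool_calls"))
    return has_text and has_tool_call


# Hypothetical messages (illustrative only, not recorded test output).
with_ack = {"content": "okay", "tool_calls": [{"id": "call_a"}]}
without_ack = {"content": None, "tool_calls": [{"id": "call_a"}]}

results = (
    follows_compound_instruction(with_ack),
    follows_compound_instruction(without_ack),
)
```

Under this check, a reply that makes the tool call but leaves `content` empty would count as not following the compound instruction.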
Streaming Tool Calling Raw Output:
### Complex Instruction Call: Text-to-Image
Test Instruction: Instruction ②
Streaming Tool Calling Raw Output:
## GPT-4o
### Simple Instruction Call: Weather Inquiry
Test Instruction: Instruction ①
Like GPT-3.5 Turbo, GPT-4o follows the compound instruction very well in the simple instruction call.
Streaming Tool Calling Raw Output:
### Complex Instruction Call: Text-to-Image
Test Instruction: Instruction ②
Streaming Tool Calling Raw Output:
```yml
```