[RFC] 097 - 服务商输出性能统计 | Provider Output Performance Statistics #6922

cy948 · 2025-03-12T11:56:50Z

cy948
Mar 12, 2025

背景

随着 DeepSeek, Qwen 等开源模型的爆火，越来越多的服务商开始接入。而服务商之间的服务质量亦有差距，又因为服务商之间的成本相差不大，用户可能经常苦于对比不同平台的服务。因此，在接近的成本下，性能统计可以让用户通过数字，直观地感知到服务商之间输出性能的差距，从而更好地做出适合自己当前地域的服务商选择。

当前计算 token 速度主要有两个性能指标：

Token per second: 指每秒生成的 token 数量，单位为： t/s ；
First token: 第一个 token 返回的延迟，单位为： s；

而当前的 token 速度计算主要有以下方式：

在服务商处进行的速度计算。
- 👍：从源头进行计算，更为准确；
- 👎：可能存在自卖自夸，可信度存疑；
自建中转进行速度计算。
- 👍：使用开源的、可信的算法和平台，可信度较高；
- 👎：需要自建中转，有运行成本，而且无法计算从服务商到用户的端到端速度；
在客户端进行速度计算。
- 👍：无需引入额外成本，能够得到大概结果，得到的数据是符合用户习惯的。
- 👎：精度可能不足。

功能说明

客户端计时

TTFT, Time to First Token: 从用户发出请求（含预检）到接收到服务商第一个 Token 返回（含Thought Token）的延迟。
TPS, Token per Second: 服务商在生成内容过程中输出的平均速度，如服务商不输出 Thoughs，则使用输出 Token 计算。

验证

在此使用模拟工具，在使用 OneHub 中转的前提下进行测量，结果如下。

Tps

tps	OneHub	本实现
40	39.69	39.66
80	81.64	81.67

首 Token:

tps	OneHub	本实现	浏览器
40	0.90	1.13	0.94
80	0.88	0.95	0.91

注：

tps 为模拟工具输出速度，不一定完全准确。
TTFT计算从发出 Option 到 Post 返回的间隔，反映用户的终端体验，详见 💄 style: show token generate performance #6959

目前已经实现 OpenAI Compatible Provider 和 Anthropic Provier 的接入，期待更多的接入！
Currently, access to OpenAI Compatible Provider and Anthropic Provider has been achieved, and we look forward to more integrations!

接入指南

接入示例：

Anthropic
- StreamsOptions: 先前往 Streams 中修改 StreamsOptions，使其支持 inputStartAt 参数传入
- AgentRuntime: 再前往 AgentRuntime 中修改 Stream 的传入参数，并在“合适位置”初始化 inputStartAt 参数。也就是在发出请求之前初始化 inputStartAt
Gemini: 💄 style: add reasoning tokens and token usage statistics for Google Gemini #7501
OpenAI兼容：Qwen 💄 style: Show Aliyun Bailian tokens usage tracking #7660

更多

遇到问题？如测试等可以参考 💄 style: show token generate performance #6959 中的处理示例
已知问题：如遇到类似 Google 这类不输出 reasoning 过程的，参考 💄 style: add reasoning tokens and token usage statistics for Google Gemini #7501 中的实现。

Integration Guide

Anthropic:
- StreamsOptions: First, go to Streams and modify StreamsOptions to support passing in the inputStartAt parameter.
- AgentRuntime: Next, go to AgentRuntime and modify the parameters passed to the Stream, initializing the inputStartAt parameter in the "appropriate location." This means initializing inputStartAt before making the request.
Gemini: 💄 style: add reasoning tokens and token usage statistics for Google Gemini #7501
OpenAI Compatible：Qwen 💄 style: Show Aliyun Bailian tokens usage tracking #7660

More

Encountering issues? For testing and other references, you can look at the handling examples in 💄 style: show token generate performance #6959.
Known issues: If you encounter cases like Google that do not output the reasoning process, see 💄 style: add reasoning tokens and token usage statistics for Google Gemini #7501

Sun-drenched · 2025-03-12T14:24:49Z

Sun-drenched
Mar 12, 2025

计算速度是否会对客户端性能（特别是性能一般的移动端）造成影响呢？

1 reply

cy948 Mar 12, 2025
Author

有一定影响，估计不大，每接收一个模型返回的消息，就要获取一次系统时钟。
如果你有兴趣，可以试一下这个构建。https://lobe-chat-liart-ten.vercel.app/ Logto测试账号

[email protected]
tY12nlv3

测试 demo 已经配置好模型和URL，不需要自带 apiky，如果使用了 apikey 请在使用 key 后及时注销。

arvinxx · 2025-03-12T15:10:34Z

arvinxx
Mar 12, 2025
Maintainer

浏览器端统计感觉不准的，这个实现逻辑会有网络请求延迟。比较好的方式我觉得是在 agent runtime 那一侧做统计，然后将统计结果作为一个 chunk type 发出来

14 replies

cy948 Mar 14, 2025
Author

直接在 AgentRuntime 算完 tps 还是说把“输出的时间”和“输出完成的时间”回传，留给客户端算呢？

arvinxx Mar 14, 2025
Maintainer

计算都在 agent runtime 里做。算完然后加一个字段回传

cy948 Mar 14, 2025
Author

目前实现了两个指标的计算：

tps: token 每秒。和中转误差 <0.1 tps
delay: 首 token 生成速度。和中转的首token速度误差 2~3 个 TTL。

arvinxx Mar 14, 2025
Maintainer

不要叫delay，叫 TTFT : time to fist token

cy948 Mar 14, 2025
Author

ok

KKKZOZ · 2025-04-18T00:48:43Z

KKKZOZ
Apr 18, 2025

多久能合并到主线呀，非常期待这个功能！

1 reply

cy948 Apr 18, 2025
Author

快了，补个国际化就好了 😋

liuchao0807 · 2025-05-18T04:59:27Z

liuchao0807
May 18, 2025

openrouter支持了吗

5 replies

cy948 May 19, 2025
Author

OpenRouter 的付费模型也不显示 tps 吗？

liuchao0807 May 19, 2025

OpenRouter 的付费模型也不显示 tps 吗？

是的，都不显示

cy948 May 19, 2025
Author

@danielglh 大佬有兴趣做一下 OpenRouter 的支持吗？

danielglh May 20, 2025

@cy948 不急的话回头我研究一下，最近事有点多

cy948 May 25, 2025
Author

不着急的，可能要给 openrouter provider 做下 debug ，因为预期像 openrouter 这种用 OpenAI factory 的 provider 是无缝就能接入的。

Uh oh!

[RFC] 097 - 服务商输出性能统计 | Provider Output Performance Statistics #6922

Uh oh!

Uh oh!

cy948 Mar 12, 2025

背景

功能说明

客户端计时

验证

接入指南

Integration Guide

Replies: 4 comments · 21 replies

Uh oh!

Sun-drenched Mar 12, 2025

Uh oh!

cy948 Mar 12, 2025 Author

Uh oh!

Uh oh!

arvinxx Mar 12, 2025 Maintainer

Uh oh!

cy948 Mar 14, 2025 Author

Uh oh!

arvinxx Mar 14, 2025 Maintainer

Uh oh!

cy948 Mar 14, 2025 Author

Uh oh!

arvinxx Mar 14, 2025 Maintainer

Uh oh!

cy948 Mar 14, 2025 Author

Uh oh!

KKKZOZ Apr 18, 2025

Uh oh!

cy948 Apr 18, 2025 Author

Uh oh!

liuchao0807 May 18, 2025

Uh oh!

cy948 May 19, 2025 Author

Uh oh!

liuchao0807 May 19, 2025

Uh oh!

cy948 May 19, 2025 Author

Uh oh!

danielglh May 20, 2025

Uh oh!

cy948 May 25, 2025 Author

cy948
Mar 12, 2025

Replies: 4 comments 21 replies

Sun-drenched
Mar 12, 2025

cy948 Mar 12, 2025
Author

arvinxx
Mar 12, 2025
Maintainer

cy948 Mar 14, 2025
Author

arvinxx Mar 14, 2025
Maintainer

cy948 Mar 14, 2025
Author

arvinxx Mar 14, 2025
Maintainer

cy948 Mar 14, 2025
Author

KKKZOZ
Apr 18, 2025

cy948 Apr 18, 2025
Author

liuchao0807
May 18, 2025

cy948 May 19, 2025
Author

cy948 May 19, 2025
Author

cy948 May 25, 2025
Author