[RFC] 097 - 服务商输出性能统计 | Provider Output Performance Statistics #6922
cy948
started this conversation in
RFC | Feature Development
Replies: 4 comments 21 replies
- Will computing the speed affect client performance, especially on lower-end mobile devices?
1 reply
- Browser-side statistics don't feel accurate; with this implementation logic, network request latency gets mixed into the numbers. A better approach, I think, would be to do the measurement on the agent runtime side and emit the result as a chunk type (a possible shape is sketched after this thread).
14 replies
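For illustration only, a runtime-emitted performance chunk could look something like the sketch below; the chunk type and field names are hypothetical, not an agreed format.

```ts
// Hypothetical shape of a performance chunk emitted by the agent runtime.
// Measuring next to the provider call avoids counting browser-side network delay.
interface OutputPerformanceChunk {
  type: 'performance';
  firstTokenLatencyMs: number; // time from request start to the first streamed token
  tokensPerSecond: number;     // output tokens divided by generation time
  outputTokens: number;        // total tokens produced in this response
}
```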
- How long until this is merged into the main branch? Really looking forward to this feature!
1 reply
- Is OpenRouter supported yet?
5 replies
Background
With the explosive popularity of open-source models such as DeepSeek and Qwen, more and more providers are offering them. Service quality differs between providers, yet their costs are roughly comparable, so users often struggle to compare the services of different platforms. At similar cost, performance statistics let users see the gap in output performance between providers as concrete numbers, helping them choose the provider that best fits their current region.
There are currently two main performance indicators for token speed:
- Token per second: the number of tokens generated per second, in t/s;
- First token: the latency until the first token is returned, in s.
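As a rough formulation (the symbols here are ours, not from the RFC: $t_{\text{start}}$ is when the request is sent, $t_{\text{first}}$ when the first token arrives, $t_{\text{end}}$ when the stream finishes, and $N$ is the number of output tokens; whether the TPS denominator includes the first-token wait varies between implementations):

$$
\mathrm{TTFT} = t_{\text{first}} - t_{\text{start}}
\qquad
\mathrm{TPS} = \frac{N}{t_{\text{end}} - t_{\text{first}}}
$$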
Token speed is currently calculated mainly in the following way:
Feature Description
Client-side Timing
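Below is a minimal sketch of how client-side timing could work when consuming a streaming response. The function, parameter, and type names are illustrative assumptions rather than the project's actual API; note that, as raised in the discussion, timings taken in the browser inevitably include network transfer latency.

```ts
// Sketch: record when the request is sent, when the first chunk arrives, and
// when the stream ends, then derive first-token latency and tokens-per-second.
interface StreamPerf {
  firstTokenLatencyMs: number; // time until the first chunk arrives
  tokensPerSecond: number;     // output tokens / generation duration
}

async function measureStream(
  response: Response,
  countTokens: (text: string) => number, // assumed to be provided, e.g. a tokenizer
): Promise<StreamPerf> {
  const start = performance.now();
  let firstChunkAt: number | undefined;
  let text = '';

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    if (firstChunkAt === undefined) firstChunkAt = performance.now();
    text += decoder.decode(value, { stream: true });
  }

  const end = performance.now();
  const firstTokenLatencyMs = (firstChunkAt ?? end) - start;
  // One common convention: exclude the first-token wait from the TPS denominator.
  const generationSeconds = Math.max((end - (firstChunkAt ?? start)) / 1000, 1e-6);

  return {
    firstTokenLatencyMs,
    tokensPerSecond: countTokens(text) / generationSeconds,
  };
}
```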
Verification
A simulation tool was used here; measurements were taken with requests relayed through OneHub, and the results are as follows.
Note: the `tps` values are the output speed reported by the simulation tool and are not necessarily fully accurate.
OpenAI Compatible Provider and Anthropic Provider integrations are already in place, and we look forward to more integrations!
Integration Guide
Integration examples:
- Anthropic: pass the `inputStartAt` parameter in the appropriate location, i.e. initialize `inputStartAt` before the request is sent (a sketch follows this list).
- Gemini: 💄 style: add reasoning tokens and token usage statistics for Google Gemini #7501
- OpenAI Compatible (Qwen): 💄 style: Show Aliyun Bailian tokens usage tracking #7660
- More
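A minimal sketch of the Anthropic-style integration described above, under the assumption that the provider call is represented by a hypothetical `callProvider` helper (the real runtime's function names and option shapes may differ):

```ts
// Illustrative only: `callProvider` stands in for the runtime's chat-completion
// call. The point from the guide is that inputStartAt is captured BEFORE the
// request goes out, then passed through so the stream handler can compute
// first-token latency against it.
type CompletionOptions = { inputStartAt: number };

async function callProvider(payload: unknown, options: CompletionOptions): Promise<Response> {
  // ...issue the streaming request here; options.inputStartAt is forwarded to
  // the stream transformer that derives TTFT and tokens-per-second.
  return fetch('https://example.invalid/v1/chat/completions', {
    body: JSON.stringify(payload),
    method: 'POST',
  });
}

const inputStartAt = Date.now(); // initialized before the request is sent
const response = await callProvider({ messages: [] }, { inputStartAt });
```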