Knowledge base retrieval returns a large amount of irrelevant information #8329


Open
sxx412 opened this issue May 23, 2025 · 1 comment
Labels
help wanted Extra attention is needed

Comments


sxx412 commented May 23, 2025

Version:

3.8.0

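Problem description:

Running a recall test against the knowledge base on the official site returns many segments that are unrelated to the keyword.
Steps to reproduce:

  1. Click [AI Knowledge Base - JimuReport Documentation] and select Hit Test;
  2. Enter the keyword "sql报错" ("SQL error"); the returned results are mixed with a large amount of irrelevant information (Figure 1);
  3. Importing the same document into dify and running the same test returns fairly precise results (Figure 2);

Error screenshots: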

Figure 1. Knowledge base hit-rate test in jeecgboot
Image

Figure 2. Knowledge base hit-rate test in dify
Image

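Reminders:

  • Posts that do not follow the required format, or whose description is too brief, will be deleted outright;
  • Please describe the problem with both text and screenshots so that we can understand and locate it quickly;
  • If you are not using master, please state which branch you are using;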
sxx412 added the "help wanted" label May 23, 2025

jeecgos commented May 23, 2025

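You can improve the precision of knowledge base hits in the following ways:

  1. Modify DEFAULT_SEGMENT_SIZE (segment length) and DEFAULT_OVERLAP_SIZE (overlap) in the org.jeecg.modules.airag.llm.handler.EmbeddingHandler class.
  2. Raise the hit threshold.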
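
To make the two suggestions above more concrete, here is a minimal Java sketch. It is not the code of EmbeddingHandler: it assumes segments are cut as fixed-size character windows and that the hit threshold is a minimum similarity score. The class name `RetrievalTuningSketch`, its methods, and every numeric value are hypothetical and chosen only for illustration; the real defaults and splitting logic in jeecgboot may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; NOT the implementation of
// org.jeecg.modules.airag.llm.handler.EmbeddingHandler.
final class RetrievalTuningSketch {

    // Illustrative values; the actual DEFAULT_SEGMENT_SIZE and
    // DEFAULT_OVERLAP_SIZE in jeecgboot may be different.
    static final int SEGMENT_SIZE = 500;
    static final int OVERLAP_SIZE = 50;

    /**
     * Splits text into character windows of at most segmentSize characters.
     * Each window starts overlapSize characters before the previous one ends.
     */
    static List<String> split(String text, int segmentSize, int overlapSize) {
        if (segmentSize <= 0 || overlapSize < 0 || overlapSize >= segmentSize) {
            throw new IllegalArgumentException("require 0 <= overlapSize < segmentSize");
        }
        List<String> segments = new ArrayList<>();
        int step = segmentSize - overlapSize;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + segmentSize, text.length());
            segments.add(text.substring(start, end));
            if (end == text.length()) {
                break; // last window reached the end of the text
            }
        }
        return segments;
    }

    /**
     * Returns the indices of the hits whose similarity score is at least
     * minScore. Raising minScore discards weaker matches.
     */
    static List<Integer> filterByScore(double[] scores, double minScore) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] >= minScore) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

For example, `RetrievalTuningSketch.split("abcdefghij", 4, 1)` yields the windows `abcd`, `defg`, `ghij`, where each window repeats the last character of the previous one. Shorter segments tend to keep each embedded piece focused on a single topic, and a higher minimum score drops weakly related segments from the results, which is the effect the two suggestions aim for.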

2 participants