[Question]: When I manually insert text segment markers into the document and modify them to my chosen words when creating the knowledge base, the segmentation effect still does not meet my expectations. #7585
Comments
The issue you're experiencing with the segmentation not meeting your expectations, despite manually inserting text segment markers, could be influenced by several factors in RAGFlow's configuration.
To achieve your desired segmentation of 76 segments, you might need to adjust the 'Chunk token number for text' or explore other slicing methods if available. Additionally, ensure that the embedding model and other settings are consistent with those used on other platforms where you achieved the expected segmentation.
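One way a token limit could produce fewer chunks than there are markers is sketched below. This is not RAGFlow's actual code: it assumes a chunker that first splits on the delimiter and then greedily merges adjacent pieces while they fit inside the token budget. The marker `@@SEG@@`, the function names, and the whitespace-based token count are all hypothetical and chosen only for illustration.

```python
def split_on_marker(text: str, marker: str) -> list[str]:
    """Split text on a literal marker and drop empty pieces."""
    return [p.strip() for p in text.split(marker) if p.strip()]


def merge_by_budget(pieces: list[str], max_tokens: int) -> list[str]:
    """Greedily merge adjacent pieces while the total stays within max_tokens."""
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for piece in pieces:
        # Crude token count by whitespace (assumption; real tokenizers differ,
        # and this does not work for unspaced scripts such as Chinese).
        n = len(piece.split())
        if current and used + n > max_tokens:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(piece)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks


MARKER = "@@SEG@@"  # hypothetical segment marker
doc = MARKER.join(f"short segment number {i}" for i in range(6))

pieces = split_on_marker(doc, MARKER)
print(len(pieces))                      # pieces produced by the marker alone
print(len(merge_by_budget(pieces, 8)))  # a larger budget merges neighbours
print(len(merge_by_budget(pieces, 4)))  # a budget equal to one piece keeps them apart
```

Under this assumption, short marked segments get merged whenever the token budget is larger than a single segment, which would explain seeing fewer chunks than markers; lowering the budget toward the size of the shortest segment keeps them separate.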
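I have also changed the chunk size and the content of the text segment marker, but the resulting segmentation still does not meet my expectations.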
If you want to slice text by the |
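When chunking text by a single delimiter, the length of each block in the document is unpredictable, so setting the chunk size too small can easily split one long segment (one with many characters) into two.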
Be careful! It's not a pair of quotation marks.
Self Checks
Describe your problem
When I manually insert text segment markers into the document and modify them to my chosen words when creating the knowledge base, the segmentation effect still does not meet my expectations.

In the uploaded document, my ideal segmentation is 76 segments, because the segment marker appears 76 times in my original document.
However, after uploading, there are only 40 segments. I have also tried other embedding models, but the results still do not meet expectations (on other platforms with the same embedding model, the document is split into 76 segments).