Word weight calculation in RAGFlow is based on several factors: term frequency, document frequency, named entity recognition (NER), part-of-speech tagging, and inverse document frequency (IDF). In the implementation, hyperparameters play a key role in this process. For example:
ner.json maps specific words to entity categories such as 'toxic' and 'stock'. In the weighting function, words that match certain NER categories are assigned different weights; for example, 'stock' entities might get a higher weight than generic words. This lets you emphasize or de-emphasize certain types of entities in your retrieval process [source].

If you want to use your own data and optimize results:
Experimenting with these settings, especially the NER category weights and the similarity weights, will have the most direct impact on retrieval quality for your specific data. You may need to iterate and validate the results on your own dataset to find the optimal configuration.
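To make the effect of an NER category weight concrete, here is a minimal sketch of a term-weighting function that multiplies term frequency, a smoothed IDF, and a per-category multiplier. It is not RAGFlow's actual code: `NER_MAP`, `term_weights`, the multiplier values, and the IDF formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical stand-in for ner.json: word -> entity category.
NER_MAP = {"tesla": "stock", "arsenic": "toxic"}


def idf(term, doc_freq, n_docs):
    """Smoothed inverse document frequency (one common variant, assumed here)."""
    return math.log((n_docs + 1) / (doc_freq.get(term, 0) + 1)) + 1.0


def term_weights(tokens, doc_freq, n_docs, ner_weight, default_weight=1.0):
    """Return a weight per distinct token, normalized to sum to 1.

    ner_weight maps an entity category to a multiplier; words with no
    category in NER_MAP fall back to default_weight.
    """
    tf = Counter(tokens)
    raw = {}
    for term, count in tf.items():
        category = NER_MAP.get(term)  # None for generic words
        multiplier = ner_weight.get(category, default_weight)
        raw[term] = count * idf(term, doc_freq, n_docs) * multiplier
    total = sum(raw.values())
    return {term: weight / total for term, weight in raw.items()}


# Toy corpus statistics (hypothetical): every term appears in 10 of 100 docs.
doc_freq = {"tesla": 10, "price": 10, "today": 10}
query = ["tesla", "price", "today"]

# Same query under two settings of the 'stock' multiplier.
base = term_weights(query, doc_freq, 100, {"stock": 1.0})
boosted = term_weights(query, doc_freq, 100, {"stock": 3.0})
print(base["tesla"], boosted["tesla"])
```

With equal IDF for all three words, raising the 'stock' multiplier from 1 to 3 moves the share of "tesla" from one third to three fifths of the total weight. When tuning on your own data, one option is to run the same comparison against a small set of labeled queries and keep the multipliers that rank the relevant chunks highest.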
I want to build a RAG system on vertical (domain-specific) data. I see that the RAGFlow code defines many hyperparameters for calculating word weights, for example the weights of certain words under ner.json. What is the reasoning behind these? If I want to use my own data, which part of the weight adjustment will bring better results?