Information Retrieval (4) -- Text Analysis and Automatic Indexing (Part 1)


License: CC 4.0 BY-SA

Tags:

#Information Retrieval

1.0 Ranked retrieval

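So far, we can only search with Boolean queries. However, not every user knows how to write a Boolean query, and Boolean queries tend to return either too many results (OR) or too few (AND).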
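
To make the "too many or too few" problem concrete, here is a minimal sketch of Boolean retrieval over a tiny inverted index. The documents, the query, and the helper names `boolean_and` and `boolean_or` are made up for illustration and are not part of the original notes.

```python
# Toy collection; documents and query are assumptions for illustration only.
docs = {
    1: "information retrieval with boolean queries",
    2: "ranked retrieval of text documents",
    3: "boolean logic and set theory",
    4: "text analysis and automatic indexing",
}

# Inverted index: term -> set of ids of the documents that contain it.
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)


def boolean_and(terms):
    """Documents that contain every query term."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(terms):
    """Documents that contain at least one query term."""
    return set().union(*(index.get(t, set()) for t in terms))


query = ["boolean", "text", "retrieval"]
and_hits = boolean_and(query)
or_hits = boolean_or(query)

print(sorted(and_hits))  # AND: no document contains all three terms
print(sorted(or_hits))   # OR: every document matches, in no useful order
```

With this query, AND returns nothing and OR returns the whole collection, and neither result says which documents are the better matches.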

1.1 Ranked retrieval models

We would like to query with free text and get the results back sorted by relevance.
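
As a rough sketch of what ranked free-text retrieval could look like, the example below scores each document by how often the query terms occur in it and sorts by that score. The scoring function is a deliberately crude assumption chosen for illustration, not the weighting scheme these notes develop; the documents and the names `score` and `ranked_search` are hypothetical.

```python
from collections import Counter

# Toy collection; contents are assumptions for illustration only.
docs = {
    1: "boolean logic and set theory",
    2: "ranked retrieval of text documents",
    3: "text retrieval with boolean queries over text",
    4: "cooking recipes",
}


def score(query, text):
    """Crude relevance: total occurrences of the distinct query terms in the text."""
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in set(query.lower().split()))


def ranked_search(query, docs):
    """Return ids of matching documents, highest score first (ties by id)."""
    scored = [(score(query, text), doc_id) for doc_id, text in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in scored if s > 0]


results = ranked_search("boolean text retrieval", docs)
print(results)
```

The user types plain words rather than Boolean operators, and documents that match more of the query come first instead of being lumped into a single unordered set.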

Automatic term extraction and term weighting
Zipf’s law: if the terms in a collection are ranked ($r$) by their frequency ($f_r$), they roughly fit the relation $r \cdot f_r = C$. Different collections have different constants $C$, but in English text $C$ tends to be about $N / 10$, where $N$ is the number of words in the collection. Dividing both sides by $N$ gives $p_r = f_r / N$, the probability that a randomly chosen word occurrence is the term of rank $r$, so the law can also be written as
$r \cdot p_r = A$
where $A = C / N$, which is about 0.1 for English text.
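
One way to check Zipf's law on a collection is to count term frequencies, rank them, and look at the product of rank and relative frequency. The sketch below does this for a synthetic token list constructed so that $f_r = 12 / r$ holds exactly; the function name `zipf_table` and the data are assumptions for illustration. On real text the product is only roughly constant.

```python
from collections import Counter


def zipf_table(tokens):
    """Return (rank, term, f_r, r * p_r) rows, most frequent term first."""
    n = len(tokens)  # N: total number of word occurrences
    counts = Counter(tokens)
    rows = []
    for rank, (term, freq) in enumerate(counts.most_common(), start=1):
        rows.append((rank, term, freq, rank * freq / n))
    return rows


# Synthetic tokens with frequencies 12, 6, 4, 3, so r * f_r = 12 at every rank.
tokens = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3

table = zipf_table(tokens)
for rank, term, freq, r_times_p in table:
    print(rank, term, freq, r_times_p)
```

Here the last column is the same at every rank because the data was built that way. Its value reflects this tiny synthetic collection, not the roughly 0.1 expected for English text.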