1.0 Ranked retrieval
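So far, we can only search with Boolean queries. However, not every user knows how to write a Boolean query, and Boolean queries tend to return either too many results (OR) or too few (AND).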
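To make the "too many or too few" problem concrete, here is a minimal sketch of Boolean retrieval over a toy inverted index. The collection `DOCS`, the query, and the helper names `build_index`, `boolean_and`, and `boolean_or` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy collection; IDs and texts are made up for illustration.
DOCS = {
    1: "standard user dlink 650",
    2: "standard user dlink 650 no card found",
    3: "user manual for dlink router",
    4: "no card found error on laptop",
    5: "standard poodle grooming guide",
}


def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def boolean_and(index, terms):
    """Documents that contain every query term."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, terms):
    """Documents that contain at least one query term."""
    return set().union(*(index.get(t, set()) for t in terms))


INDEX = build_index(DOCS)
QUERY = ["standard", "user", "dlink", "650", "no", "card", "found"]

AND_HITS = boolean_and(INDEX, QUERY)  # only documents matching all seven terms
OR_HITS = boolean_or(INDEX, QUERY)    # any document matching a single term

print("AND:", sorted(AND_HITS))
print("OR: ", sorted(OR_HITS))
```

With this toy data the AND query keeps a single document, while the OR query returns the whole collection, including the unrelated document 5, which matches only on "standard". Neither result set is ordered by how well a document matches.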
1.1 Ranked retrieval models
We would like to query with free text and have the results sorted by relevance.
Automatic term extraction and weighting
Zipf's law: if the terms in a collection are ranked ($r$) in decreasing order of their frequency ($f_r$), they roughly fit the relation $r \cdot f_r = C$. Different collections have different constants $C$, but in English text $C$ tends to be about $N / 10$, where $N$ is the number of words in the collection. $p_r = f_r / N$ is the probability that a randomly chosen word in the collection is the term of rank $r$ (the term with frequency $f_r$). Dividing the relation by $N$ gives
$$r \cdot p_r = A$$
where $A = C / N$, so $A \approx 0.1$ for English text.
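The relation can be checked numerically by counting terms, ranking them by frequency, and tabulating the product of rank and frequency. The sketch below is a minimal illustration: the function name `zipf_table` is hypothetical, and the token list is synthetic, built so that the term of rank $r$ occurs exactly 60 // r times. Real text only fits the relation roughly.

```python
from collections import Counter


def zipf_table(tokens):
    """Return rows (rank, term, f_r, r * f_r, r * p_r), most frequent term first."""
    counts = Counter(tokens)
    n = len(tokens)  # N: total number of word occurrences
    rows = []
    for rank, (term, freq) in enumerate(counts.most_common(), start=1):
        rows.append((rank, term, freq, rank * freq, rank * freq / n))
    return rows


# Synthetic data (an assumption for illustration): term "t<r>" occurs 60 // r times.
TOKENS = []
for r in range(1, 7):
    TOKENS.extend([f"t{r}"] * (60 // r))

ROWS = zipf_table(TOKENS)

for rank, term, freq, c, a in ROWS:
    print(f"r={rank}  term={term}  f_r={freq}  r*f_r={c}  r*p_r={a:.3f}")
```

Because the data is constructed to follow the law exactly, the `r*f_r` column is the same constant on every row, and `r*p_r` is that constant divided by the number of tokens. On a real corpus, replace `TOKENS` with the tokenized text and expect the two columns to be only approximately constant.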