Adapted from the English edition of wikipedia.org; this is mainly my translation of that article into Chinese.
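BM25 (Best Match 25) is an algorithm used in information retrieval systems to score documents against a given query. It is based on the probabilistic retrieval framework developed in the 1970s and 1980s by Stephen E. Robertson, Karen Spärck Jones, and others. The BM25 algorithm was first implemented in the Okapi system, so it is also known as Okapi BM25.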
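BM25 is a bag-of-words model. A bag-of-words model considers only the term frequencies within a document and ignores sentence structure, grammatical relations, and the like: the document is treated as a bag holding words, and the contents of the bag may be in any order. BM25 is not a single function but a whole family of scoring functions with slightly different components and parameters. One of the most prominent instantiations of the function is as follows.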
Given a query Q containing the keywords q1, ..., qn, the BM25 score of a document D is:
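As a rough illustration of how such a score can be computed, here is a minimal Python sketch of the commonly cited Okapi BM25 form. It is not taken from the text above: the function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the use of pre-tokenized word lists are all assumptions made for this example, and the corpus is assumed to be non-empty.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Assumed formula (common Okapi BM25 form), summed over the query terms q:
        IDF(q) * f(q, D) * (k1 + 1) / (f(q, D) + k1 * (1 - b + b * |D| / avgdl))
    `bm25_scores`, `k1`, and `b` are hypothetical names and defaults for this sketch.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length
    # Bag-of-words view: keep only term counts, discard word order.
    bags = [Counter(d) for d in docs]
    # Document frequency n(q): number of documents that contain the term q.
    df = {q: sum(1 for bag in bags if q in bag) for q in set(query)}

    scores = []
    for doc, bag in zip(docs, bags):
        score = 0.0
        for q in query:
            f = bag[q]  # term frequency f(q, D); Counter returns 0 if absent
            if f == 0:
                continue
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative
            # (one of several variants, chosen here as an assumption).
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1.0)
            length_norm = k1 * (1.0 - b + b * len(doc) / avgdl)
            score += idf * f * (k1 + 1.0) / (f + length_norm)
        scores.append(score)
    return scores
```

For example, calling `bm25_scores(["cat"], docs)` on a small list of token lists returns one score per document, with documents that never mention "cat" scoring zero. Because each document is reduced to a `Counter`, shuffling the words inside a document leaves its score unchanged, which is the bag-of-words property described earlier.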