Solr Series, Part 5: Solr Search in Detail (the Solr search flow, query syntax, and query parsers)

Reposted article, 2018-06-07 01:04, category: search engines

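I. The Solr search flow

1. We have already studied the Lucene search flow; let's review it once more.

Flow description:

First, take the query string entered by the user and parse it with the query parser (QueryParser) to produce a Query object; then execute the Query with the index searcher (IndexSearcher) to obtain TopDocs; finally, iterate over TopDocs to retrieve each Document.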
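The four stages can be mimicked with a toy Python model: parse a string into clauses, run them against an in-memory "index", collect the top hits, and iterate over them. Nothing here is Lucene's API; the TermQuery and ScoreDoc dataclasses only borrow Lucene's class names, and the corpus and the count-the-matching-clauses scoring are assumptions made for illustration.

```python
from dataclasses import dataclass

# Toy "index": doc id -> stored fields. Hypothetical data, not the techproducts core.
DOCS = {
    0: {"title": "solr search basics"},
    1: {"title": "lucene query parser"},
    2: {"title": "solr query parser guide"},
}


@dataclass
class TermQuery:
    """Stand-in for one parsed query clause (field + term)."""
    field: str
    term: str


@dataclass
class ScoreDoc:
    """Stand-in for one hit in TopDocs: a doc id and its score."""
    doc: int
    score: float


def parse(query_string, default_field="title"):
    """QueryParser stand-in: each whitespace token becomes a clause; 'f:t' sets the field."""
    clauses = []
    for token in query_string.split():
        field, _, term = token.rpartition(":")
        clauses.append(TermQuery(field or default_field, term.lower()))
    return clauses


def search(clauses, n=10):
    """IndexSearcher stand-in: OR semantics, score = number of clauses that match."""
    hits = []
    for doc_id, fields in DOCS.items():
        score = float(sum(1 for c in clauses if c.term in fields.get(c.field, "").split()))
        if score > 0:
            hits.append(ScoreDoc(doc_id, score))
    hits.sort(key=lambda h: (-h.score, h.doc))
    return hits[:n]  # the "TopDocs"


# 1) parse the user's string, 2) execute it, 3) walk TopDocs and load each document.
top_docs = search(parse("solr parser"))
for hit in top_docs:
    print(hit.doc, hit.score, DOCS[hit.doc])
```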

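2. How Solr processes a search:

Flow description:

The user enters a query string. The request type qt (/select for searching) selects the RequestHandler. The defType parameter selects the query parser that parses the query string (if defType is not given, the default query parser configured in the RequestHandler is used). After parsing, the search runs over the fields specified by the qf parameter (qf is used by the dismax/edismax parsers; without it, the default field df is searched). Once results are found, further processing is applied (fq, sort, start, rows, wt), and a ResponseWriter returns the response to the user.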
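To make those parameters concrete, the sketch below builds a /select request URL with Python's standard urllib.parse.urlencode. The host, the core name (techproducts, used throughout this article), the query and the field values are assumptions chosen for the example; the code only builds the URL and does not contact a Solr server.

```python
from urllib.parse import urlencode


def build_select_url(base, core, params):
    """Join the handler path and URL-encode the parameters (a list allows repeated keys)."""
    return f"{base}/solr/{core}/select?{urlencode(params)}"


params = [
    ("q", "ipod"),
    ("defType", "edismax"),           # which query parser parses q
    ("qf", "name^1.2 features^1.0"),  # fields searched by (e)dismax, with weights
    ("fq", "inStock:true"),           # filter query: restricts results, does not change scores
    ("sort", "price asc"),
    ("start", "0"),                   # paging offset
    ("rows", "10"),                   # page size
    ("wt", "json"),                   # which ResponseWriter formats the response
]
url = build_select_url("http://localhost:8983", "techproducts", params)
print(url)
```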

3. Look at the core's solrconfig.xml to see how the search request handlers are configured:

<requestHandler name="/select" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">10</int>
       <bool name="preferLocalShards">false</bool>
     </lst>
</requestHandler>

<requestHandler name="/query" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <str name="wt">json</str>
       <str name="indent">true</str>
       <str name="df">text</str>
     </lst>
</requestHandler>

<requestHandler name="/browse" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="echoParams">explicit</str>

       <!-- VelocityResponseWriter settings -->
       <str name="wt">velocity</str>
       <str name="v.template">browse</str>
       <str name="v.layout">layout</str>
       <str name="title">Solritas</str>

       <!-- Query settings -->
       <str name="defType">edismax</str>
       <str name="qf">
          text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
          title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
       </str>
       <str name="mm">100%</str>
       <str name="q.alt">*:*</str>
       <str name="rows">10</str>
       <str name="fl">*,score</str>

       <str name="mlt.qf">
         text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
         title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
       </str>
       <str name="mlt.fl">text,features,name,sku,id,manu,cat,title,description,keywords,author,resourcename</str>
       <int name="mlt.count">3</int>

       <!-- Faceting defaults -->
       <str name="facet">on</str>
       <str name="facet.missing">true</str>
       <str name="facet.field">cat</str>
       <str name="facet.field">manu_exact</str>
       <str name="facet.field">content_type</str>
       <str name="facet.field">author_s</str>
       <str name="facet.query">ipod</str>
       <str name="facet.query">GB</str>
       <str name="facet.mincount">1</str>
       <str name="facet.pivot">cat,inStock</str>
       <str name="facet.range.other">after</str>
       <str name="facet.range">price</str>
       <int name="f.price.facet.range.start">0</int>
       <int name="f.price.facet.range.end">600</int>
       <int name="f.price.facet.range.gap">50</int>
       <str name="facet.range">popularity</str>
       <int name="f.popularity.facet.range.start">0</int>
       <int name="f.popularity.facet.range.end">10</int>
       <int name="f.popularity.facet.range.gap">3</int>
       <str name="facet.range">manufacturedate_dt</str>
       <str name="f.manufacturedate_dt.facet.range.start">NOW/YEAR-10YEARS</str>
       <str name="f.manufacturedate_dt.facet.range.end">NOW</str>
       <str name="f.manufacturedate_dt.facet.range.gap">+1YEAR</str>
       <str name="f.manufacturedate_dt.facet.range.other">before</str>
       <str name="f.manufacturedate_dt.facet.range.other">after</str>

       <!-- Highlighting defaults -->
       <str name="hl">on</str>
       <str name="hl.fl">content features title name</str>
       <str name="hl.preserveMulti">true</str>
       <str name="hl.encoder">html</str>
       <str name="hl.simple.pre">&lt;b&gt;</str>
       <str name="hl.simple.post">&lt;/b&gt;</str>
       <str name="f.title.hl.fragsize">0</str>
       <str name="f.title.hl.alternateField">title</str>
       <str name="f.name.hl.fragsize">0</str>
       <str name="f.name.hl.alternateField">name</str>
       <str name="f.content.hl.snippets">3</str>
       <str name="f.content.hl.fragsize">200</str>
       <str name="f.content.hl.alternateField">content</str>
       <str name="f.content.hl.maxAlternateFieldLength">750</str>

       <!-- Spell checking defaults -->
       <str name="spellcheck">on</str>
       <str name="spellcheck.extendedResults">false</str>
       <str name="spellcheck.count">5</str>
       <str name="spellcheck.alternativeTermCount">2</str>
       <str name="spellcheck.maxResultsForSuggest">5</str>
       <str name="spellcheck.collate">true</str>
       <str name="spellcheck.collateExtendedResults">true</str>
       <str name="spellcheck.maxCollationTries">5</str>
       <str name="spellcheck.maxCollations">3</str>
     </lst>

     <!-- append spellchecking to our list of components -->
     <arr name="last-components">
       <str>spellcheck</str>
     </arr>
</requestHandler>

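The solrconfig.xml of the techproducts core shows that the /select, /query and /browse request handlers all use the solr.SearchHandler class to do their search work. The entries inside each handler's <lst name="defaults"> (echoParams, rows, wt, df, defType, qf and so on) are request parameters for which the handler supplies default values.

4. About SearchHandler

A query request is handled by the SearchHandler request handler, and each step of the work is done by the components that SearchHandler chains together (the chain can be customized inside that request handler's configuration element). Example of a custom component chain: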

<arr name="components">
 <str>query</str>
 <str>facet</str>
 <str>mlt</str>
 <str>highlight</str>
 <str>debug</str>
 <str>someothercomponent</str>
</arr>

Notes:

"query" (usually QueryComponent)
"facet" (usually FacetComponent)
"mlt" (usually MoreLikeThisComponent)
"highlight" (usually HighlightComponent)
"stats" (usually StatsComponent)
"debug" (usually DebugComponent)

Components can also be added before and after the main component chain:

<arr name="first-components">
     <str>mycomponent</str>
</arr>

<arr name="last-components">
     <str>myothercomponent</str>
</arr>
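How first-components, the main chain and last-components fit together can be modeled in a few lines of Python. This is a toy model, not SearchHandler's code: DEFAULT_COMPONENTS is simply the list given in this article (the real defaults vary by Solr version), and the rule that an explicit components list cannot be combined with first-/last-components reflects my understanding of SearchHandler's behavior and should be treated as an assumption.

```python
# Default chain as listed in this article; the actual defaults vary by Solr version.
DEFAULT_COMPONENTS = ["query", "facet", "mlt", "highlight", "stats", "debug"]


def assemble_components(components=None, first=None, last=None):
    """Toy model of how a SearchHandler builds its component chain."""
    if components is not None:
        # Declaring "components" replaces the default chain outright; assumed here
        # (as in Solr) that first-/last-components are rejected in that case.
        if first or last:
            raise ValueError("first/last-components only apply to the default chain")
        return list(components)
    # Otherwise: first-components + default chain + last-components.
    return list(first or []) + DEFAULT_COMPONENTS + list(last or [])


print(assemble_components(first=["mycomponent"], last=["myothercomponent"]))
```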

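For a detailed description of SearchHandler, see the official documentation: https://wiki.apache.org/solr/SearchHandler

Note: default query parameters like these can be configured inside <lst name="defaults"></lst>; a parameter sent with the request overrides the default of the same name.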
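The precedence of request parameters over defaults can be modeled as a dictionary merge. This is a simplified sketch: real Solr parameters can be multi-valued, and handler configuration also supports appends and invariants sections, which this toy ignores.

```python
def effective_params(defaults, request):
    """Toy merge: request values win; defaults only fill in keys the request lacks."""
    merged = dict(defaults)
    merged.update(request)
    return merged


# The <lst name="defaults"> of the /select handler shown earlier, as a dict.
select_defaults = {"echoParams": "explicit", "rows": "10", "preferLocalShards": "false"}
print(effective_params(select_defaults, {"q": "ipod", "rows": "5"}))
```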

II. Query syntax and query parsers

1. The common query parameters are described in the official documentation:

http://lucene.apache.org/solr/guide/7_3/common-query-parameters.html

2. Query parsers

Standard Query Parser
DisMax Query Parser
Extended DisMax Query Parser

The Standard Query Parser is used by default; another parser can be selected with the defType parameter.

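3. Standard Query Parser

Solr's standard query parser. Its key advantage is a robust and fairly intuitive syntax that lets us build queries of many different structures; we already covered it when learning Lucene. Its biggest drawback is that it does not tolerate syntax errors.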

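Standard Query Parser request parameters:

In addition to the common parameters, the standard query parser supports:

q: the query expression (query string, main query) written in the standard query syntax. Required.
q.op: the default operator for the query expression, "AND" or "OR"; overrides the configured default.
df: the default field to query.
sow: split on whitespace. If true, the query text is split on whitespace and each piece is analyzed separately. Default: false.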

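3.1 Standard Query Parser response format

Enter this address in the browser: http://localhost:8983/solr/techproducts/select?q=id:SP2514N&wt=xml

Standard Query Parser response format: exercises

1. Add the debug=all parameter and see what is returned

http://localhost:8983/solr/techproducts/select?q=cat:book&wt=xml&debug=all

Note: with the debug parameter, the response also shows how the query was processed, such as the parsed query and the explanation of each document's score.

2. Add the explainOther parameter and see what is returned

http://localhost:8983/solr/techproducts/select?q=cat:book&wt=xml&debug=all&explainOther=id:055357342X

explainOther adds score explanations, relative to the main query, for the documents matched by its own query (here id:055357342X).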

3.2 Enhancements of the Solr Standard Query Parser over classic Lucene syntax

* can be used at either end of a range query:

field:[* TO 100] finds all field values less than or equal to 100
field:[100 TO *] finds all field values greater than or equal to 100
field:[* TO *] matches all documents with the field

Pure negative queries are allowed (top-level clauses only):

-inStock:false   finds all field values where inStock is not false
-field:[* TO *]   finds all documents without a value for field
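The semantics of open-ended ranges and pure negative queries can be checked against a toy in-memory model. The four documents are hypothetical and the set operations only mimic Solr's matching. Note that a pure negative clause, which behaves like *:* minus the clause, also matches documents with no value for the field at all (document d here).

```python
import math

# Hypothetical documents; None means the document has no value for that field.
DOCS = {
    "a": {"price": 50.0, "inStock": True},
    "b": {"price": 100.0, "inStock": False},
    "c": {"price": 150.0, "inStock": True},
    "d": {"price": None, "inStock": None},
}


def in_range(value, lower=None, upper=None):
    """field:[lower TO upper] with inclusive ends; None stands for '*'."""
    if value is None:  # a document without the field never matches a range
        return False
    lo = -math.inf if lower is None else lower
    hi = math.inf if upper is None else upper
    return lo <= value <= hi


def match(predicate):
    """Ids of the documents whose fields satisfy the predicate."""
    return {doc_id for doc_id, fields in DOCS.items() if predicate(fields)}


def pure_negative(matched):
    """A top-level '-clause' behaves like '*:* -clause': all documents minus the matches."""
    return set(DOCS) - matched


print(sorted(match(lambda f: in_range(f["price"], upper=100))))       # price:[* TO 100]
print(sorted(match(lambda f: in_range(f["price"], lower=100))))       # price:[100 TO *]
print(sorted(pure_negative(match(lambda f: in_range(f["price"])))))   # -price:[* TO *]
print(sorted(pure_negative(match(lambda f: f["inStock"] is False))))  # -inStock:false
```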

Embedded Solr queries (subqueries) are supported, and the embedded query can use any Solr query parser:

inStock:true OR {!dismax qf='name manu' v='ipod'}

A special filter(…) syntax marks a clause whose results should be cached as a filter query:

q=features:songs OR filter(inStock:true)
q=+manu:Apple +filter(inStock:true)
q=+manu:Apple & fq=inStock:true

If a clause inside a filter query needs its own independent filter cache entry, you can also write:

q=features:songs & fq=+filter(inStock:true) +filter(price:[* TO 100])
q=manu:Apple & fq=-filter(inStock:true) -filter(price:[* TO 100])

Date and time syntax in queries:

createdate:1976-03-06T23\:59\:59.999Z
createdate:"1976-03-06T23:59:59.999Z"
createdate:[1976-03-06T23:59:59.999Z TO *]
createdate:[1995-12-31T23:59:59.999Z TO 2007-03-06T00:00:00Z]
timestamp:[* TO NOW]
pubdate:[NOW-1YEAR/DAY TO NOW/DAY+1DAY]
createdate:[1976-03-06T23:59:59.999Z TO 1976-03-06T23:59:59.999Z+1YEAR]
createdate:[1976-03-06T23:59:59.999Z/YEAR TO 1976-03-06T23:59:59.999Z]
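Solr's date math (NOW, +N/-N UNIT, /UNIT rounding) is easy to misread, so here is a small Python evaluator for a subset of it: only the YEAR, MONTH, DAY and HOUR units, UTC only, operations applied left to right. It illustrates the semantics and is not Solr's parser; the fixed now value is an assumption so the output is repeatable.

```python
import calendar
import re
from datetime import datetime, timedelta, timezone

BASE_RE = re.compile(r"NOW|\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d{3})?Z")
OP_RE = re.compile(r"(/|[+-]\d+)(YEAR|MONTH|DAY|HOUR)S?")


def _add_months(dt, months):
    """Calendar-aware month shift; the day is clamped (e.g. Jan 31 + 1 month -> Feb 28/29)."""
    year, month0 = divmod(dt.year * 12 + (dt.month - 1) + months, 12)
    month = month0 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def _shift(dt, amount, unit):
    if unit == "YEAR":
        return _add_months(dt, 12 * amount)
    if unit == "MONTH":
        return _add_months(dt, amount)
    if unit == "DAY":
        return dt + timedelta(days=amount)
    return dt + timedelta(hours=amount)


def _round_down(dt, unit):
    """'/UNIT' truncates to the start of that unit."""
    fields = {"YEAR": dict(month=1, day=1, hour=0), "MONTH": dict(day=1, hour=0),
              "DAY": dict(hour=0), "HOUR": {}}[unit]
    return dt.replace(minute=0, second=0, microsecond=0, **fields)


def eval_date_math(expr, now):
    """Evaluate NOW or an ISO-8601 UTC instant followed by +N/-N/'/' operations."""
    base = BASE_RE.match(expr)
    if base is None:
        raise ValueError(f"unsupported base in {expr!r}")
    text = base.group(0)
    dt = now if text == "NOW" else datetime.fromisoformat(text.replace("Z", "+00:00"))
    pos = base.end()
    while pos < len(expr):
        op = OP_RE.match(expr, pos)
        if op is None:
            raise ValueError(f"unsupported date math at {expr[pos:]!r}")
        sign_or_slash, unit = op.groups()
        dt = _round_down(dt, unit) if sign_or_slash == "/" else _shift(dt, int(sign_or_slash), unit)
        pos = op.end()
    return dt


now = datetime(2018, 6, 7, 1, 4, tzinfo=timezone.utc)  # fixed "NOW" for repeatable output
for expr in ["NOW-1YEAR/DAY", "NOW/DAY+1DAY", "NOW/YEAR-10YEARS",
             "1976-03-06T23:59:59.999Z+1YEAR", "1976-03-06T23:59:59.999Z/YEAR"]:
    print(expr, "->", eval_date_math(expr, now).isoformat())
```

The pubdate example above, [NOW-1YEAR/DAY TO NOW/DAY+1DAY], therefore spans from midnight one year ago up to midnight tomorrow, so all of today is included.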

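4. DisMax Query Parser

The maximum disjunction query parser. DisMax stands for Maximum Disjunction.

Explanation: the query can assign different score weights to different fields, and when it combines the documents matched by its clauses, each document's score is the maximum of that document's scores across the clauses.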
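The max-combination can be shown numerically. The per-field scores below are hypothetical values for one document; the dismax tie parameter (0.0 by default) mixes in a fraction of the non-maximum clause scores, and tie=1.0 degenerates into a plain sum like a boolean OR.

```python
def dismax_score(clause_scores, tie=0.0):
    """Disjunction max: the best clause score plus tie * the other clause scores."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores.values())
    return best + tie * (sum(clause_scores.values()) - best)


# Hypothetical per-field scores of one document for the query "ipod",
# assumed to already include the qf weights (e.g. name^1.2).
scores = {"name": 2.4, "features": 1.0, "cat": 0.5}
print(dismax_score(scores))           # pure max: the name field wins
print(dismax_score(scores, tie=0.1))  # max plus 10% of the other fields
print(dismax_score(scores, tie=1.0))  # same as summing all fields
```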

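The DisMax Query Parser is designed to process simple phrases entered by users. Its characteristics:

It supports only a very small subset of the query syntax: simple phrase queries, the + and - modifiers, and the AND and OR boolean operators;
Its syntax is simple, and syntax errors are never thrown back to the user as exceptions;
Phrase queries can be run across multiple fields;
The relevance weight of each query field can be set flexibly;
Additional relevance weight can be given flexibly to documents that match a particular query.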

4.1 Official DisMax Query Parser documentation

http://lucene.apache.org/solr/guide/7_3/the-dismax-query-parser.html#the-dismax-query-parser

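5. Extended DisMax Query Parser

The Extended DisMax Query Parser (edismax) adds support for the standard query syntax (it combines the Standard Query Parser and the DisMax Query Parser) and also adds a number of parameters that improve on DisMax.

Strongly recommended: use edismax for query parsing, because it offers:

rich syntax support;
good fault tolerance;
flexible weighted scoring.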
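A request that switches on edismax and uses its weighting parameters can be assembled as follows. The qf weights and the mm value are taken from the /browse defaults shown earlier; the query text, the pf and boost values, the host and the core name are assumptions for the example, and the code only builds and decodes the URL.

```python
from urllib.parse import parse_qs, urlencode, urlparse

params = {
    "defType": "edismax",
    "q": "ipod nano",
    "qf": "name^1.2 features^1.0 cat^1.4",      # per-field weights, as in /browse
    "pf": "name^2",                             # extra boost when the terms appear as a phrase in name
    "mm": "100%",                               # every query term must match, as in /browse
    "boost": "recip(rord(price),1,1000,1000)",  # edismax: function multiplied into the score
}
url = "http://localhost:8983/solr/techproducts/select?" + urlencode(params)
print(url)

# Decoding the query string gives back exactly the values edismax will receive.
decoded = {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}
print(decoded == params)
```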

5.1 Official Extended DisMax Query Parser documentation

http://lucene.apache.org/solr/guide/7_3/the-extended-dismax-query-parser.html#the-extended-dismax-query-parser

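6. Function queries

Solr queries can also use functions, which can filter documents, boost relevance, sort by a computed result, and return computed results. Functions can be used in the standard query parser, dismax, and edismax.

A function can be:

a constant: a numeric or string literal, e.g. 10, "lucene solr"
a field: name, title
another function: functionName(…)
a parameter substitution:
    q={!func}min($f1,$f2)&f1=sqrt(popularity)&f2=1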
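To see what the parameter substitution in the last item does, the toy below replaces each $name with the corresponding request parameter and then evaluates the resulting expression for one document. The evaluator, which supports only min, max, sqrt and div, is a hypothetical stand-in for Solr's function parser (which resolves $name references while parsing); the document values are made up.

```python
import math
import re

# Functions this toy evaluator understands (a tiny subset of Solr's).
FUNCS = {"min": min, "max": max, "sqrt": math.sqrt, "div": lambda a, b: a / b}
TOKEN_RE = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|(\w+)|(\S))")


def substitute(expr, params):
    """Replace each $name with the value of request parameter 'name'."""
    return re.sub(r"\$(\w+)", lambda m: params[m.group(1)], expr)


def evaluate(expr, doc):
    """Evaluate nested calls such as min(sqrt(popularity),1) against one document."""
    tokens = TOKEN_RE.findall(expr)  # (number, name, symbol) triples
    pos = 0

    def parse():
        nonlocal pos
        number, name, symbol = tokens[pos]
        pos += 1
        if number:
            return float(number)
        if name and pos < len(tokens) and tokens[pos][2] == "(":
            pos += 1  # consume "("
            args = [parse()]
            while tokens[pos][2] == ",":
                pos += 1
                args.append(parse())
            if tokens[pos][2] != ")":
                raise ValueError("expected ')'")
            pos += 1
            return float(FUNCS[name](*args))
        if name:
            return float(doc[name])  # a bare name reads a field value
        raise ValueError(f"unexpected {symbol!r}")

    value = parse()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return value


# q={!func}min($f1,$f2)&f1=sqrt(popularity)&f2=1
request = {"f1": "sqrt(popularity)", "f2": "1"}
expr = substitute("min($f1,$f2)", request)
print(expr)                                  # min(sqrt(popularity),1)
print(evaluate(expr, {"popularity": 10}))    # sqrt(10) > 1, so capped at 1
print(evaluate(expr, {"popularity": 0.25}))  # sqrt(0.25) = 0.5 is the smaller value
```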

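6.1 Official documentation of the functions Solr provides

http://lucene.apache.org/solr/guide/7_3/function-queries.html#function-queries

The main kinds are data transformation functions, math functions, relevance functions, boolean functions, and distance functions.

6.2 Ways to use functions

As a function query, where the query parameter value is a function expression used to compute the relevance score or to filter:

q={!func}div(popularity,price)&fq={!frange l=1000}customer_ratings

In sorting:

sort=div(popularity,price) desc, score desc

In the returned results (the fl field list):

&fl=sum(x, y),id,a,b,c,score&wt=xml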

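In the boost parameters bf (dismax and edismax) and boost (edismax), to compute weights; bf takes whitespace-separated functions, each with an optional ^weight:

defType=dismax&q=ipod&bf=ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3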
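The recip function in the bf example is documented in the Solr function reference as a/(m*x+b). The sketch below evaluates it, plus the weighted sum the two bf functions contribute to a document's score; the ord and rord inputs are hypothetical numbers, and the sum ignores any score normalization.

```python
def recip(x, m, a, b):
    """recip(x,m,a,b) = a / (m*x + b): equals a/b at x = 0 and shrinks as x grows."""
    return a / (m * x + b)


def bf_contribution(ord_popularity, rord_price):
    """Weighted sum of: ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3."""
    return 0.5 * ord_popularity + 0.3 * recip(rord_price, 1, 1000, 1000)


for rord_price in (0, 1000, 9000):
    print(rord_price, recip(rord_price, 1, 1000, 1000))
print(bf_contribution(ord_popularity=4, rord_price=0))
```

With a = b, recip starts at 1 and halves once m*x reaches b, which is why a/b and m control how quickly the boost fades.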

Inside the special keyword _val_, which feeds a function into score calculation:

q=_val_:mynumericfield    _val_:"recip(rord(myfield),1,2,3)"

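6.3 Function queries explained

A function query is a special kind of query in which a function used in the q or fq parameter changes the relevance score or filters the results. The function is computed separately for every matching document, and the resulting value is added to the document's relevance score as a bonus.

Changing the score:

Approach 1: the whole query is a function expression. It matches all documents, and each document's score is the function value.

q=*:*
q={!func}div(popularity,price)&debug=all

Note: {!func} tells Solr to parse the q parameter with the func query parser (func: Function Query Parser).

Approach 2: value hook. A scoring clause is added, so the document score = score of the other keywords + function value.

q=ipod AND _val_:"div(popularity,price)"&debug=all

Approach 3: query parser hook (explicit nested query)

q=ipod AND _query_:"{!func}div(popularity,price)"&debug=all

Approach 4: query parser hook (implicit nested query)

q=ipod AND {!func v="div(popularity,price)"}&debug=all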

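6.4 Filtering documents with functions

To filter the search results so that only documents for which a function produces values in a given range remain, use the Function Range query parser (frange). Applied in the q or fq parameter, frange executes the given function query and then filters out documents whose function value falls outside the range between the lower bound l and the upper bound u.

q={!frange l=0.01 u=0.1}div(popularity,price)&debug=all
q=ipod&fq={!frange l=0.05 u=0.1}div(popularity,price)&debug=all

The first query keeps the documents for which popularity divided by price lies between 0.01 and 0.1 (both bounds inclusive by default).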
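Filtering by a function range amounts to evaluating the function per document and comparing the value with l and u. In this sketch the documents are hypothetical; incl and incu mirror frange's local parameters for including the lower and upper bound, which default to true.

```python
def frange(docs, func, l=None, u=None, incl=True, incu=True):
    """Keep ids of docs whose func value is within [l, u]; None leaves a side open."""
    kept = []
    for doc in docs:
        value = func(doc)
        if l is not None and (value < l if incl else value <= l):
            continue  # below the lower bound
        if u is not None and (value > u if incu else value >= u):
            continue  # above the upper bound
        kept.append(doc["id"])
    return kept


def div_popularity_price(doc):
    """div(popularity,price)"""
    return doc["popularity"] / doc["price"]


# Hypothetical documents with their div(popularity,price) values.
docs = [
    {"id": "a", "popularity": 6, "price": 100.0},  # 0.06
    {"id": "b", "popularity": 1, "price": 200.0},  # 0.005
    {"id": "c", "popularity": 10, "price": 50.0},  # 0.2
    {"id": "d", "popularity": 5, "price": 50.0},   # 0.1
]
print(frange(docs, div_popularity_price, l=0.01, u=0.1))              # ['a', 'd']
print(frange(docs, div_popularity_price, l=0.01, u=0.1, incu=False))  # ['a']
```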

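7. Using local parameters in queries

7.1 What are local parameters?

Local parameters are a prefix on a query parameter's value that adds metadata to that parameter. Consider this query:

q=solr rocks

To state that this query combines its terms with AND and uses title as the default field:

q={!q.op=AND df=title}solr rocks

7.2 Local parameter syntax

Local parameters are a prefix on the query parameter value: one or more key=value pairs wrapped in {!key=value key=value}.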

7.3 Local parameter examples

Shorthand for the query type (type specifies the query parser):

q={!dismax qf=myfield}solr rocks
q={!type=dismax qf=myfield}solr rocks

Specifying the parameter value with the v key:

q={!dismax qf=myfield}solr rocks
q={!type=dismax qf=myfield v='solr rocks'}

Parameter references:

q={!dismax qf=myfield}solr rocks
q={!type=dismax qf=myfield v=$qq}&qq=solr rocks
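The local parameter rules in this section (type shorthand, v, and $ references) can be captured by a small parser. This is a hypothetical sketch, not Solr's own parsing code: it handles single- or double-quoted values without escape sequences, and it assumes that an explicit v takes precedence over any text after the closing brace.

```python
import re

KEY_RE = re.compile(r"[\w.]+")
BARE_VALUE_RE = re.compile(r"[^\s}]*")


def parse_local_params(text, request):
    """Split '{!k=v ...}rest' into (local params, query text).

    Sketch rules: a leading bare word is the parser type, quoted values use
    single or double quotes (no escapes), v=$name reads request parameter
    'name', and an explicit v replaces the text after the closing brace.
    """
    if not text.startswith("{!"):
        return {}, text
    params, pos, first = {}, 2, True
    while True:
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            raise ValueError("missing closing '}'")
        if text[pos] == "}":
            pos += 1
            break
        key_match = KEY_RE.match(text, pos)
        if key_match is None:
            raise ValueError(f"bad local params near {text[pos:]!r}")
        key, pos = key_match.group(0), key_match.end()
        if pos >= len(text) or text[pos] != "=":
            if not first:
                raise ValueError(f"bare word {key!r} must come first")
            params["type"] = key  # {!dismax ...} is short for {!type=dismax ...}
        else:
            pos += 1  # skip '='
            if pos < len(text) and text[pos] in "'\"":
                close = text.index(text[pos], pos + 1)
                value, pos = text[pos + 1:close], close + 1
            else:
                value = BARE_VALUE_RE.match(text, pos).group(0)
                pos += len(value)
            if value.startswith("$"):  # parameter reference, e.g. v=$qq
                value = request[value[1:]]
            params[key] = value
        first = False
    return params, params.get("v", text[pos:])


examples = [
    "{!q.op=AND df=title}solr rocks",
    "{!dismax qf=myfield}solr rocks",
    "{!type=dismax qf=myfield v='solr rocks'}",
    "{!type=dismax qf=myfield v=$qq}",
]
for q in examples:
    print(parse_local_params(q, {"qq": "solr rocks"}))
```

All three dismax forms resolve to the same parser, field and query text, which is the point of the shorthand, v and $ reference examples.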

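8. Other query parsers are described in the official documentation

https://lucene.apache.org/solr/guide/7_3/other-parsers.html

These other query parsers can be selected flexibly within a query, as needed, through local parameters.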

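III. Summary

How do you write a query? Master the syntax of q.
How do you specify the fields to query? field:value in the query; df (standard query parser); qf (dismax query parser)
How do you add filter conditions? fq, {!frange}
How do you specify the returned fields? fl
How do you specify sorting? sort
How do you boost a term or phrase? term^5, "phrase"^5
How do you boost a field? qf=title^10, pf, pf2, pf3
How do you boost by a field value such as popularity or sales? _val_, _query_, function queries
How do you view the debugging information for a query? debug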
