Solr 4.8.0 Source Code Analysis (5): Overview of the Query Flow

As noted earlier, Solr queries arrive as HTTP requests that the Solr servlet accepts and processes, so the query flow starts in SolrDispatchFilter.doFilter(), which handles every kind of HTTP request. Solr supports many query parameters (q, fq, and so on); this chapter only follows a /select request with a q parameter. A query issued from the admin page looks like this: http://localhost:8080/solr/test/select?q=code%3A%E8%BE%BD*+AND+last_modified%3A%5B0+TO+1408454600265%5D+AND+id%3Acheng&wt=json&indent=true (the q parameter decodes to code:辽* AND last_modified:[0 TO 1408454600265] AND id:cheng).

@Override
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
  doFilter(request, response, chain, false);
}

Since we only care about /select, the actual query starts from the code below; this.execute() is the entry point of the query. Note the writeResponse() call as well: execute() only collects the ids of the matching documents, and writeResponse() later uses those doc ids to fetch the stored fields and write them into the response.

// With a valid handler and a valid core...
if( handler != null ) {
  // if not a /select, create the request
  if( solrReq == null ) {
    solrReq = parser.parse( core, path, req );
  }

  if (usingAliases) {
    processAliases(solrReq, aliases, collectionsList);
  }

  final Method reqMethod = Method.getMethod(req.getMethod());
  HttpCacheHeaderUtil.setCacheControlHeader(config, resp, reqMethod);
  // unless we have been explicitly told not to, do cache validation
  // if we fail cache validation, execute the query
  if (config.getHttpCachingConfig().isNever304() ||
      !HttpCacheHeaderUtil.doCacheHeaderValidation(solrReq, req, reqMethod, resp)) {
    SolrQueryResponse solrRsp = new SolrQueryResponse();
    /* even for HEAD requests, we need to execute the handler to
     * ensure we don't get an error (and to make sure the correct
     * QueryResponseWriter is selected and we get the correct
     * Content-Type)
     */
    SolrRequestInfo.setRequestInfo(new SolrRequestInfo(solrReq, solrRsp));
    this.execute( req, handler, solrReq, solrRsp );
    HttpCacheHeaderUtil.checkHttpCachingVeto(solrRsp, resp, reqMethod);
    // add info to http headers
    //TODO: See SOLR-232 and SOLR-267.
    /*try {
      NamedList solrRspHeader = solrRsp.getResponseHeader();
      for (int i=0; i<solrRspHeader.size(); i++) {
        ((javax.servlet.http.HttpServletResponse) response).addHeader(("Solr-" + solrRspHeader.getName(i)), String.valueOf(solrRspHeader.getVal(i)));
      }
    } catch (ClassCastException cce) {
      log.log(Level.WARNING, "exception adding response header log information", cce);
    }*/
    QueryResponseWriter responseWriter = core.getQueryResponseWriter(solrReq);
    writeResponse(solrRsp, response, responseWriter, solrReq, reqMethod);
  }

Entering execute() leads to SolrCore.execute(). preDecorateResponse() pre-populates the response header information, postDecorateResponse() writes the elapsed time and the results into the response, and handler.handleRequest() carries the query forward.

public void execute(SolrRequestHandler handler, SolrQueryRequest req, SolrQueryResponse rsp) {
  if (handler==null) {
    String msg = "Null Request Handler '" +
      req.getParams().get(CommonParams.QT) + "'";

    if (log.isWarnEnabled()) log.warn(logid + msg + ":" + req);

    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, msg);
  }

  preDecorateResponse(req, rsp);

  // TODO: this doesn't seem to be working correctly and causes problems with the example server and distrib (for example /spell)
  // if (req.getParams().getBool(ShardParams.IS_SHARD,false) && !(handler instanceof SearchHandler))
  //   throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,"isShard is only acceptable with search handlers");

  handler.handleRequest(req,rsp);
  postDecorateResponse(handler, req, rsp);

  if (log.isInfoEnabled() && rsp.getToLog().size() > 0) {
    log.info(rsp.getToLogAsString(logid));
  }
}

RequestHandlerBase.handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) in turn calls SearchHandler.handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp); only at this point are the query components actually invoked.

The loop below runs every search-related component: QueryComponent, FacetComponent, MoreLikeThisComponent, HighlightComponent, StatsComponent, DebugComponent, and ExpandComponent. This article is only concerned with querying, so we follow QueryComponent.java.

for( SearchComponent c : components ) {
    c.process(rb);
}    
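The two-phase component pattern behind this loop can be sketched with minimal stand-ins. SearchComponent and ResponseBuilder are simplified here to a plain interface and a StringBuilder; these names and the stand-in components are illustrative, not Solr's real classes:

```java
import java.util.ArrayList;
import java.util.List;

interface SearchComponent {
    void prepare(StringBuilder rb); // set up params, defaults, flags
    void process(StringBuilder rb); // do the actual work
}

public class ComponentChainSketch {

    static String run() {
        List<SearchComponent> components = new ArrayList<>();
        components.add(new SearchComponent() { // stand-in for QueryComponent
            public void prepare(StringBuilder rb) { rb.append("[query:prepare]"); }
            public void process(StringBuilder rb) { rb.append("[query:process]"); }
        });
        components.add(new SearchComponent() { // stand-in for FacetComponent
            public void prepare(StringBuilder rb) { rb.append("[facet:prepare]"); }
            public void process(StringBuilder rb) { rb.append("[facet:process]"); }
        });

        StringBuilder rb = new StringBuilder(); // stand-in for ResponseBuilder
        // SearchHandler.handleRequestBody runs every component's prepare()
        // first, then every component's process(), in registration order.
        for (SearchComponent c : components) c.prepare(rb);
        for (SearchComponent c : components) c.process(rb);
        return rb.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The important point is the ordering: all prepare() calls complete before any process() call, so a component can rely on every other component having finished its setup.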

Setting aside the details of query handling inside QueryComponent.java (they are covered in later chapters; this one is an overview), QueryComponent.process(ResponseBuilder rb) calls SolrIndexSearcher.search(QueryResult qr, QueryCommand cmd) to run the query, then post-processes the returned results, chiefly via doFieldSortValues(rb, searcher) and doPrefetch(rb):

// normal search result
searcher.search(result,cmd);
rb.setResult( result );

ResultContext ctx = new ResultContext();
ctx.docs = rb.getResults().docList;
ctx.query = rb.getQuery();
rsp.add("response", ctx);
rsp.getToLog().add("hits", rb.getResults().docList.matches());

if ( ! rb.req.getParams().getBool(ShardParams.IS_SHARD,false) ) {
  if (null != rb.getNextCursorMark()) {
    rb.rsp.add(CursorMarkParams.CURSOR_MARK_NEXT,
               rb.getNextCursorMark().getSerializedTotem());
  }
}
doFieldSortValues(rb, searcher);
doPrefetch(rb);

SolrIndexSearcher.search() is simple: it just calls SolrIndexSearcher.getDocListC(), which, as the name suggests, returns the list of matching doc ids. This is where the real query begins. Before querying, Solr consults the queryResultCache, which stores key-value pairs mapping query conditions to result lists. On a cache hit, Solr returns the cached result directly; on a miss, it runs the query normally and writes the condition/result pair into the cache. The cache has a bounded capacity, configurable in the cache section of solrconfig.xml.
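The hit/miss behavior just described can be sketched as a small LRU map from a composite key (query + filters + sort, analogous to Solr's QueryResultKey) to a doc-id array. The class and method names below are illustrative, not Solr's actual cache implementation (which is LRUCache/FastLRUCache):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryResultCacheSketch {

    // Composite cache key, analogous to Solr's QueryResultKey.
    record Key(String query, List<String> filters, String sort) {}

    private final int maxSize;
    private final LinkedHashMap<Key, int[]> cache;
    int realQueries = 0; // counts how often the real search actually ran

    QueryResultCacheSketch(int maxSize) {
        this.maxSize = maxSize;
        // An access-ordered LinkedHashMap gives simple LRU eviction,
        // standing in for the bounded cache configured in solrconfig.xml.
        this.cache = new LinkedHashMap<Key, int[]>(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<Key, int[]> eldest) {
                return size() > QueryResultCacheSketch.this.maxSize;
            }
        };
    }

    int[] search(Key key) {
        int[] hit = cache.get(key);
        if (hit != null) return hit;      // cache hit: skip the real query
        int[] result = runRealQuery(key); // cache miss: run it and store it
        cache.put(key, result);
        return result;
    }

    private int[] runRealQuery(Key key) {
        realQueries++;
        return new int[] {1, 2, 3};       // placeholder for getDocListNC
    }

    public static void main(String[] args) {
        QueryResultCacheSketch c = new QueryResultCacheSketch(100);
        Key k = new Key("code:cheng", List.of("last_modified:[0 TO *]"), "score desc");
        c.search(k); // miss: runs the query
        c.search(k); // hit: served from cache
        System.out.println("real queries executed: " + c.realQueries);
    }
}
```

Because the sort and filters are part of the key, the same q with a different fq or sort is a distinct cache entry, which matches why QueryResultKey bundles all three.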

// we can try and look up the complete query in the cache.
// we can't do that if filter!=null though (we don't want to
// do hashCode() and equals() for a big DocSet).
if (queryResultCache != null && cmd.getFilter()==null
    && (flags & (NO_CHECK_QCACHE|NO_SET_QCACHE)) != ((NO_CHECK_QCACHE|NO_SET_QCACHE)))
{
  // all of the current flags can be reused during warming,
  // so set all of them on the cache key.
  key = new QueryResultKey(q, cmd.getFilterList(), cmd.getSort(), flags);
  if ((flags & NO_CHECK_QCACHE)==0) {
    superset = queryResultCache.get(key);

    if (superset != null) {
      // check that the cache entry has scores recorded if we need them
      if ((flags & GET_SCORES)==0 || superset.hasScores()) {
        // NOTE: subset() returns null if the DocList has fewer docs than
        // requested
        out.docList = superset.subset(cmd.getOffset(),cmd.getLen());
      }
    }
    if (out.docList != null) {
      // found the docList in the cache... now check if we need the docset too.
      // OPT: possible future optimization - if the doclist contains all the matches,
      // use it to make the docset instead of rerunning the query.
      if (out.docSet==null && ((flags & GET_DOCSET)!=0) ) {
        if (cmd.getFilterList()==null) {
          out.docSet = getDocSet(cmd.getQuery());
        } else {
          List<Query> newList = new ArrayList<>(cmd.getFilterList().size()+1);
          newList.add(cmd.getQuery());
          newList.addAll(cmd.getFilterList());
          out.docSet = getDocSet(newList);
        }
      }
      return;
    }
  }

  // If we are going to generate the result, bump up to the
  // next resultWindowSize for better caching.

  if ((flags & NO_SET_QCACHE) == 0) {
    // handle 0 special case as well as avoid idiv in the common case.
    if (maxDocRequested < queryResultWindowSize) {
      supersetMaxDoc=queryResultWindowSize;
    } else {
      supersetMaxDoc = ((maxDocRequested -1)/queryResultWindowSize + 1)*queryResultWindowSize;
      if (supersetMaxDoc < 0) supersetMaxDoc=maxDocRequested;
    }
  } else {
    key = null;  // we won't be caching the result
  }
}

If there is no matching cache entry, a normal query is run, taking either the sorted or the unsorted branch (the difference between the two is covered in a later article); both end up in getDocListNC(qr, cmd). superset.subset() then trims the result: if the request asks for start=20 and rows=40, Solr actually fetches at least start+rows=60 documents from position 0, rounded up to the result window size, and then slices out results 20 through 60.

if (useFilterCache) {
      // now actually use the filter cache.
      // for large filters that match few documents, this may be
      // slower than simply re-executing the query.
      if (out.docSet == null) {
        out.docSet = getDocSet(cmd.getQuery(),cmd.getFilter());
        DocSet bigFilt = getDocSet(cmd.getFilterList());
        if (bigFilt != null) out.docSet = out.docSet.intersection(bigFilt);
      }
      // todo: there could be a sortDocSet that could take a list of
      // the filters instead of anding them first...
      // perhaps there should be a multi-docset-iterator
      sortDocSet(qr, cmd);
    } else {
      // do it the normal way...
      if ((flags & GET_DOCSET)!=0) {
        // this currently conflates returning the docset for the base query vs
        // the base query and all filters.
        DocSet qDocSet = getDocListAndSetNC(qr,cmd);
        // cache the docSet matching the query w/o filtering
        if (qDocSet!=null && filterCache!=null && !qr.isPartialResults()) filterCache.put(cmd.getQuery(),qDocSet);
      } else {
        getDocListNC(qr,cmd);
      }
      assert null != out.docList : "docList is null";
    }

    if (null == cmd.getCursorMark()) {
      // Kludge...
      // we can't use DocSlice.subset, even though it should be an identity op
      // because it gets confused by situations where there are lots of matches, but
      // less docs in the slice then were requested, (due to the cursor)
      // so we have to short circuit the call.
      // None of which is really a problem since we can't use caching with
      // cursors anyway, but it still looks weird to have to special case this
      // behavior based on this condition - hence the long explanation.
      superset = out.docList;
      out.docList = superset.subset(cmd.getOffset(),cmd.getLen());
    } else {
      // sanity check our cursor assumptions
      assert null == superset : "cursor: superset isn't null";
      assert 0 == cmd.getOffset() : "cursor: command offset mismatch";
      assert 0 == out.docList.offset() : "cursor: docList offset mismatch";
      assert cmd.getLen() >= supersetMaxDoc : "cursor: superset len mismatch: " +
        cmd.getLen() + " vs " + supersetMaxDoc;
    }
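The rounding and slicing done by supersetMaxDoc and superset.subset() above can be reproduced in isolation. The method names below are mine, but the arithmetic mirrors the getDocListC excerpt:

```java
import java.util.Arrays;

public class ResultWindowSketch {

    // Mirrors the supersetMaxDoc rounding in getDocListC: fetch a multiple of
    // queryResultWindowSize so nearby pages hit the same cached superset.
    static int supersetMaxDoc(int maxDocRequested, int queryResultWindowSize) {
        if (maxDocRequested < queryResultWindowSize) return queryResultWindowSize;
        int n = ((maxDocRequested - 1) / queryResultWindowSize + 1) * queryResultWindowSize;
        return n < 0 ? maxDocRequested : n; // overflow guard, as in the original
    }

    // Analogous to DocSlice.subset(): cut [offset, offset+len) out of the
    // cached superset, or null if the superset holds fewer docs than asked for.
    static int[] subset(int[] superset, int offset, int len) {
        if (offset + len > superset.length) return null;
        return Arrays.copyOfRange(superset, offset, offset + len);
    }

    public static void main(String[] args) {
        // start=20, rows=40: at least 60 docs are fetched (rounded up to the
        // window size), then docs 20..59 are returned to the caller.
        System.out.println(supersetMaxDoc(20 + 40, 20)); // 60 with a window of 20
    }
}
```

The null return on a too-short superset is why the cache-lookup path above re-runs the query when `out.docList` stays null: the cached superset did not reach far enough for the requested page.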

SolrIndexSearcher.getDocListNC(qr, cmd) defines a number of Collector inner classes, but they are not relevant to this chapter, so we go straight to the code below. Solr first builds a TopDocsCollector, which accumulates every document matching the query. If timeAllowed is set on the request, the collector is wrapped in a TimeLimitingCollector, a Collector subclass: with timeAllowed=200, for example, collection is cut off after 200 ms and whatever has been gathered so far is returned, whether or not the result is complete, with the response flagged as partial. Note that the search ultimately calls Lucene's IndexSearcher.search(); from this layer on we are inside Lucene. Finally, Solr reads the total hit count and the priority queue of top documents out of the TopDocsCollector.

final TopDocsCollector topCollector = buildTopDocsCollector(len, cmd);
Collector collector = topCollector;
if (terminateEarly) {
  collector = new EarlyTerminatingCollector(collector, cmd.len);
}
if( timeAllowed > 0 ) {
  collector = new TimeLimitingCollector(collector, TimeLimitingCollector.getGlobalCounter(), timeAllowed);
}
if (pf.postFilter != null) {
  pf.postFilter.setLastDelegate(collector);
  collector = pf.postFilter;
}
try {
  super.search(query, luceneFilter, collector);
  if(collector instanceof DelegatingCollector) {
    ((DelegatingCollector)collector).finish();
  }
}
catch( TimeLimitingCollector.TimeExceededException x ) {
  log.warn( "Query: " + query + "; " + x.getMessage() );
  qr.setPartialResults(true);
}

totalHits = topCollector.getTotalHits();
TopDocs topDocs = topCollector.topDocs(0, len);
populateNextCursorMarkFromTopDocs(qr, cmd, topDocs);

maxScore = totalHits>0 ? topDocs.getMaxScore() : 0.0f;
nDocsReturned = topDocs.scoreDocs.length;
ids = new int[nDocsReturned];
scores = (cmd.getFlags()&GET_SCORES)!=0 ? new float[nDocsReturned] : null;
for (int i=0; i<nDocsReturned; i++) {
  ScoreDoc scoreDoc = topDocs.scoreDocs[i];
  ids[i] = scoreDoc.doc;
  if (scores != null) scores[i] = scoreDoc.score;
}
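The timeAllowed behavior in that try/catch can be sketched with simplified stand-ins for Lucene's TimeLimitingCollector and its TimeExceededException. These are not the real Lucene classes, just an illustration of the wrap-collect-catch pattern:

```java
import java.util.ArrayList;
import java.util.List;

public class TimeLimitSketch {

    interface Collector { void collect(int doc); }

    static class TimeExceededException extends RuntimeException {}

    // Wraps another collector and aborts once the deadline passes,
    // mimicking Lucene's TimeLimitingCollector.
    static class TimeLimitingCollector implements Collector {
        private final Collector delegate;
        private final long deadlineNanos;

        TimeLimitingCollector(Collector delegate, long timeAllowedMillis) {
            this.delegate = delegate;
            this.deadlineNanos = System.nanoTime() + timeAllowedMillis * 1_000_000L;
        }

        public void collect(int doc) {
            if (System.nanoTime() > deadlineNanos) throw new TimeExceededException();
            delegate.collect(doc);
        }
    }

    // Mimics the try/catch in getDocListNC: on timeout, keep whatever was
    // collected so far and flag the result as partial instead of failing.
    static boolean searchWithTimeout(Collector collector, int nDocs) {
        try {
            for (int doc = 0; doc < nDocs; doc++) collector.collect(doc);
            return false; // finished: full results
        } catch (TimeExceededException e) {
            return true;  // timed out: partialResults = true
        }
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        boolean partial = searchWithTimeout(
            new TimeLimitingCollector(hits::add, 1000), 10);
        System.out.println("partial=" + partial + " hits=" + hits.size());
    }
}
```

The key design point carried over from the real code is that the timeout surfaces as an exception thrown out of collect(), so the caller decides whether a partial result is an error or an acceptable answer.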

Inside Lucene's IndexSearcher.search(), the search iterates over all segments; each AtomicReaderContext holds a segment's metadata, including its docBase and document count.
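The docBase bookkeeping can be illustrated on its own: each segment's docBase is the running total of the sizes of the segments before it, and a global doc id is the docBase plus the segment-local id. The helper names here are mine, not Lucene's:

```java
public class DocBaseSketch {

    // Compute each segment's docBase from the segment sizes, the way a
    // composite reader assigns bases to its AtomicReaderContext leaves.
    static int[] docBases(int[] segmentSizes) {
        int[] bases = new int[segmentSizes.length];
        int base = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            bases[i] = base;          // docBase of segment i
            base += segmentSizes[i];  // next segment starts after this one
        }
        return bases;
    }

    // A segment-local doc id becomes index-wide by adding the docBase.
    static int globalDocId(int[] bases, int segment, int localDocId) {
        return bases[segment] + localDocId;
    }

    public static void main(String[] args) {
        int[] bases = docBases(new int[] {100, 50, 25});
        // local doc 5 of the third segment is global doc 150 + 5 = 155
        System.out.println(globalDocId(bases, 2, 5));
    }
}
```

This is why a collector must be told the current reader context before scoring each leaf: the doc ids it receives are segment-local and only become comparable across segments once the docBase is applied.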

During this iteration, weight.bulkScorer() is called for each segment; it reorganizes the query clauses, for example merging several OR clauses into one clause and gathering the AND clauses into a list, which it orders by term frequency. With the clauses prepared, scorer.score(collector) then pulls every matching doc id out of the segment (the details of how doc ids are retrieved are covered in a later article).

/**
 * Lower-level search API.
 *
 * <p>
 * {@link Collector#collect(int)} is called for every document. <br>
 *
 * <p>
 * NOTE: this method executes the searches on all given leaves exclusively.
 * To search across all the searchers leaves use {@link #leafContexts}.
 *
 * @param leaves
 *          the searchers leaves to execute the searches on
 * @param weight
 *          to match documents
 * @param collector
 *          to receive hits
 * @throws BooleanQuery.TooManyClauses If a query would exceed
 *         {@link BooleanQuery#getMaxClauseCount()} clauses.
 */
protected void search(List<AtomicReaderContext> leaves, Weight weight, Collector collector)
    throws IOException {

  // TODO: should we make this
  // threaded...?  the Collector could be sync'd?
  // always use single thread:
  for (AtomicReaderContext ctx : leaves) { // search each subreader
    try {
      collector.setNextReader(ctx);
    } catch (CollectionTerminatedException e) {
      // there is no doc of interest in this reader context
      // continue with the following leaf
      continue;
    }
    BulkScorer scorer = weight.bulkScorer(ctx, !collector.acceptsDocsOutOfOrder(), ctx.reader().getLiveDocs());
    if (scorer != null) {
      try {
        scorer.score(collector);
      } catch (CollectionTerminatedException e) {
        // collection was terminated prematurely
        // continue with the following leaf
      }
    }
  }
}

At this point all matching doc ids have been collected, but the query result must display all of the stored fields, so Solr will later go back to the segments and fetch the field values for each doc id; where exactly that happens is described in detail in a later article.

 

Summary: Solr's query path is fairly convoluted and leaves plenty of room for optimization. This article has outlined the overall query flow; the details of each step will be examined in subsequent articles.

