Full-Text Search with Elasticsearch

This article is a repost. 2020-09-11 10:53

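1 Introduction to Elasticsearch

1.1. Overview

1. Elasticsearch is a highly scalable, distributed search server built on Lucene, and it works out of the box.

2. Elasticsearch hides the complexity of Lucene and exposes a RESTful API for managing indices and running searches.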

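1.2. Principles and application

The logical structure is an inverted index table:

1. The content of the documents to be searched is tokenized, and all distinct terms form the term list.

2. The documents themselves are stored as Document objects.

3. Each term is linked to the documents that contain it.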
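The three steps above can be sketched in a few lines of Java. This is only a toy model for illustration, not how Lucene stores its index: the class name InvertedIndexSketch is hypothetical, and splitting on non-alphanumeric characters stands in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: maps each distinct term to the ids of the documents containing it.
class InvertedIndexSketch {
    // Step 2: documents are stored as-is; the list position serves as the document id.
    private final List<String> documents = new ArrayList<>();
    // Steps 1 and 3: the distinct terms, each linked to the documents it occurs in.
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Stores a document, indexes its terms, and returns the assigned id.
    int add(String text) {
        int id = documents.size();
        documents.add(text);
        // Assumption: lowercase and split on non-letter/non-digit characters
        // instead of running a real analyzer.
        for (String term : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Returns the ids of all documents that contain the term.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Set.of());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("Bootstrap development framework");
        index.add("Spring Cloud in practice");
        index.add("Spring Boot and Bootstrap");
        System.out.println(index.search("bootstrap")); // [0, 2]
        System.out.println(index.search("spring"));    // [1, 2]
    }
}
```

A search never scans the documents themselves; it only looks up the term in the term list and follows the links to the matching document ids.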

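2 Installing Elasticsearch

2.1. Installing on Windows

Download ES: https://www.elastic.co/downloads/past-releases

Unzip the archive. It contains the following directories:

bin: scripts directory, including the executable scripts for starting and stopping ES

config: configuration files

data: index directory, where the index files are stored (create it if it does not exist)

logs: log directory (create it if it does not exist)

modules: module directory, containing the functional modules of ES

plugins: plugin directory; ES supports a plugin mechanism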

2.2. Configuration file

Set the following parameters in the elasticsearch.yml file under the config directory:

cluster.name: el
node.name: el_node_1
network.host: 0.0.0.0
http.port: 9200
transport.tcp.port: 9300
node.master: true
node.data: true
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300", "127.0.0.1:9301"]
discovery.zen.minimum_master_nodes: 1
node.ingest: true
bootstrap.memory_lock: false
node.max_local_storage_nodes: 2

path.data: D:\IDEA\ElasticSearch\elasticsearch-1\data   # replace with your own path
path.logs: D:\IDEA\ElasticSearch\elasticsearch-1\logs   # replace with your own path

http.cors.enabled: true
http.cors.allow-origin: /.*/

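2.3. Starting ES

Go to the bin directory and run in cmd: elasticsearch.bat

Then open http://localhost:9200 in a browser; a JSON response with node and version information indicates that ES is running.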

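3 ES Quick Start

3.1. Creating an index

An index in ES is a logical concept that comprises a term list and a document list; documents of the same type are stored in the same index. It is comparable to a table in MySQL or a collection in MongoDB.

Index (noun): ES is a search service built on Lucene; it searches the index for the data that matches a query.

Index (verb): a newly created index is empty; the process of adding data to it is called indexing.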

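1) Use Postman to send: PUT http://localhost:9200/<index name>

The body sets the number of shards (number_of_shards) and the number of replicas (number_of_replicas):

{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  }
}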
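For readers who prefer code over Postman, the same request can be built with the HTTP client that ships with the JDK (Java 11 or later). The sketch below only constructs and prints the request; the class and method names are hypothetical, and xc_course is the index name used later in this article.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class CreateIndexRequestSketch {
    // Builds the settings body with the given number of shards and replicas.
    static String settingsJson(int shards, int replicas) {
        return "{\"settings\":{\"index\":{"
                + "\"number_of_shards\":" + shards + ","
                + "\"number_of_replicas\":" + replicas + "}}}";
    }

    // Builds (but does not send) PUT {baseUrl}/{indexName}.
    static HttpRequest createIndex(String baseUrl, String indexName, int shards, int replicas) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + indexName))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(settingsJson(shards, replicas)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = createIndex("http://localhost:9200", "xc_course", 1, 0);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(settingsJson(1, 0));
        // To send it against a running node:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```

Keeping the request construction separate from sending makes it easy to check the method, URL, and body before touching a live cluster.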

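3.2. Creating a mapping

3.2.1. Concepts

Comparison with a relational database:

Document ---------------- Row (record)

Field ------------------- Column

3.2.2. Creating the mapping

Send: POST http://localhost:9200/<index name>/<type name>/_mapping

Type name: user-defined. ES 6.0 de-emphasizes the concept of types, so give the type a name that carries no business meaning.

Creating a mapping is comparable to defining a table schema in a relational database.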

{
  "properties": {
    "name": {
      "type": "text"
    },
    "description": {
      "type": "text"
    },
    "studymodel": {
      "type": "keyword"
    }
  }
}

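3.3. Creating a document

A document in ES is comparable to a record (row) in a MySQL table.

Send: PUT or POST http://localhost:9200/xc_course/doc/<id>

id: a user-defined value; if it is omitted (POST only), ES generates an id automatically.

{
  "name": "Bootstrap development framework",
  "description": "Bootstrap",
  "studymodel": "201001"
}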
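The difference between the two request forms can be made explicit in code. The following is a minimal sketch using the JDK HTTP client (Java 11 or later); it builds the request without sending it, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class IndexDocumentRequestSketch {
    // With an id: PUT /{index}/{type}/{id}. Without an id: POST /{index}/{type},
    // in which case ES generates the id.
    static HttpRequest indexDocument(String baseUrl, String index, String type, String id, String json) {
        boolean hasId = id != null && !id.isEmpty();
        String url = baseUrl + "/" + index + "/" + type + (hasId ? "/" + id : "");
        HttpRequest.Builder builder = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Content-Type", "application/json");
        HttpRequest.BodyPublisher body = HttpRequest.BodyPublishers.ofString(json);
        return (hasId ? builder.PUT(body) : builder.POST(body)).build();
    }

    public static void main(String[] args) {
        String json = "{\"name\":\"Bootstrap\",\"description\":\"Bootstrap\",\"studymodel\":\"201001\"}";
        HttpRequest withId = indexDocument("http://localhost:9200", "xc_course", "doc", "1", json);
        HttpRequest withoutId = indexDocument("http://localhost:9200", "xc_course", "doc", null, json);
        System.out.println(withId.method() + " " + withId.uri());
        System.out.println(withoutId.method() + " " + withoutId.uri());
    }
}
```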

3.4. Searching documents

1. Get a document by course id

Send: GET http://localhost:9200/xc_course/doc/1

2. Query all records

Send: GET http://localhost:9200/xc_course/doc/_search

3. Query the records whose name contains the keyword bootstrap

Send: GET http://localhost:9200/xc_course/doc/_search?q=name:bootstrap

4. Query the records whose study mode is 201001

Send: GET http://localhost:9200/xc_course/doc/_search?q=studymodel:201001
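When these URLs are assembled in code rather than typed into Postman, the q parameter should be URL-encoded. Here is a small sketch, assuming the local node and the xc_course/doc names from above; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SearchUriSketch {
    private static final String BASE = "http://localhost:9200/xc_course/doc";

    // 1. Get a single document by id.
    static URI byId(String id) {
        return URI.create(BASE + "/" + id);
    }

    // 2. Query all records.
    static URI all() {
        return URI.create(BASE + "/_search");
    }

    // 3 and 4. Query by field using the q=field:value form; the whole
    // expression is URL-encoded, so ':' is sent as %3A.
    static URI byField(String field, String value) {
        String query = URLEncoder.encode(field + ":" + value, StandardCharsets.UTF_8);
        return URI.create(BASE + "/_search?q=" + query);
    }

    public static void main(String[] args) {
        System.out.println(byId("1"));
        System.out.println(all());
        System.out.println(byField("name", "bootstrap"));
        System.out.println(byField("studymodel", "201001"));
    }
}
```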

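4 IK Analyzer

4.1. Installing the IK analyzer

Download the IK analyzer (GitHub: https://github.com/medcl/elasticsearch-analysis-ik); use the release that matches your ES version.

Unzip it and copy the extracted files into an ik directory under plugins in the ES installation directory.

4.2. The two modes of the IK analyzer

1. ik_max_word: splits the text at the finest granularity.

2. ik_smart: splits the text at the coarsest granularity.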
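To get a feel for what fine and coarse granularity mean, consider the toy segmenter below. It is explicitly not IK's algorithm: it is a simplified, dictionary-based illustration, and the class name, the dictionary, and the ASCII sample text are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SegmentationGranularitySketch {
    // Fine-grained: emit every dictionary word found anywhere in the text,
    // including words that overlap each other.
    static List<String> fineGrained(String text, Set<String> dictionary) {
        List<String> tokens = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int end = start + 1; end <= text.length(); end++) {
                String candidate = text.substring(start, end);
                if (dictionary.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    // Coarse-grained: greedily take the longest dictionary word at each
    // position, without overlaps; unknown characters become single tokens.
    static List<String> coarseGrained(String text, Set<String> dictionary) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = text.length();
            while (end > start + 1 && !dictionary.contains(text.substring(start, end))) {
                end--;
            }
            tokens.add(text.substring(start, end));
            start = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("ice", "cream", "icecream", "cone");
        System.out.println(fineGrained("icecreamcone", dictionary));   // [ice, icecream, cream, cone]
        System.out.println(coarseGrained("icecreamcone", dictionary)); // [icecream, cone]
    }
}
```

The fine-grained output contains more, partly overlapping terms, so more queries can hit the document; the coarse-grained output contains fewer, longer terms.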

4.3. Custom dictionary

The IK analyzer ships with a file named main.dic, which is its dictionary file.

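5 Mappings

1. Query the mappings of all indices:

GET http://localhost:9200/_mapping

2. Update a mapping

Once a mapping has been created, new fields can be added, but existing fields cannot be changed.

3. Delete a mapping

A mapping is deleted by deleting its index.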
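The update rule in item 2 can be modelled as a simple merge check. The sketch below is a deliberately simplified model that treats a mapping as a map from field name to type; it does not reproduce the checks ES actually performs, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MappingUpdateSketch {
    // Merges an update into an existing mapping: new fields are added,
    // but changing the type of an existing field is rejected.
    static Map<String, String> merge(Map<String, String> existing, Map<String, String> update) {
        Map<String, String> merged = new LinkedHashMap<>(existing);
        for (Map.Entry<String, String> field : update.entrySet()) {
            String currentType = merged.get(field.getKey());
            if (currentType != null && !currentType.equals(field.getValue())) {
                throw new IllegalArgumentException("field [" + field.getKey()
                        + "] is already mapped as [" + currentType + "]");
            }
            merged.put(field.getKey(), field.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("name", "text");
        mapping.put("studymodel", "keyword");

        // Adding a new field is accepted.
        System.out.println(merge(mapping, Map.of("pic", "text")));

        // Changing an existing field is rejected.
        try {
            merge(mapping, Map.of("studymodel", "text"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```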

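5.1. Common mapping types

1. text

1) analyzer

The analyzer property specifies the analyzer to use.

The following sets the type of the name field to text and uses the ik_max_word mode of the IK analyzer.

"name": {
    "type": "text",
    "analyzer": "ik_max_word"
}

With only analyzer set, ik_max_word is used both at index time and at search time. To use a different analyzer at search time, set the search_analyzer property.

The recommended setup for the IK analyzer is ik_max_word at index time, so that the content is tokenized at fine granularity, and ik_smart at search time, to improve search precision:

"name": {
    "type": "text",
    "analyzer": "ik_max_word",
    "search_analyzer": "ik_smart"
}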

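2) index

The index property specifies whether the field is indexed.

The default is "index": true, meaning the field is indexed; only indexed fields can be searched. For fields that never need to be searched, set "index": false.

2. keyword

A keyword field is searched as a whole, so its value is not tokenized at index time and a query must match the entire value.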
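The contrast between text and keyword matching can be illustrated with two small predicates. This is a simplification under stated assumptions: whitespace splitting stands in for a real analyzer, a text query is treated as matching when it shares at least one term with the stored value, and all names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class FieldMatchingSketch {
    // keyword: the stored value is kept whole, so only an exact match counts.
    static boolean keywordMatches(String stored, String query) {
        return stored.equals(query);
    }

    // text: both sides are tokenized; here any shared term counts as a match.
    static boolean textMatches(String stored, String query) {
        Set<String> storedTerms = tokens(stored);
        for (String term : tokens(query)) {
            if (storedTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // Assumption: lowercase and split on whitespace instead of a real analyzer.
    static Set<String> tokens(String value) {
        Set<String> result = new HashSet<>();
        for (String term : value.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                result.add(term);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(textMatches("Bootstrap development framework", "bootstrap")); // true
        System.out.println(keywordMatches("201001", "201001"));                           // true
        System.out.println(keywordMatches("201001", "2010"));                             // false
    }
}
```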

6 Index Management

6.1. Setting up the project

Add the dependencies:

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>6.2.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>6.2.1</version>
</dependency>

Configuration file application.yml:

server:
  port: ${port:40100}
spring:
  application:
    name: el-search-service
el:
  elasticsearch:
    hostlist: ${eshostlist:127.0.0.1:9200} # separate multiple nodes with commas

Configuration class:

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticsearchConfig {

    // Must match the el.elasticsearch.hostlist key in application.yml
    @Value("${el.elasticsearch.hostlist}")
    private String hostlist;

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        // Create the RestHighLevelClient
        return new RestHighLevelClient(RestClient.builder(parseHosts()));
    }

    // The project mainly uses RestHighLevelClient; the low-level client is not needed for now
    @Bean
    public RestClient restClient() {
        return RestClient.builder(parseHosts()).build();
    }

    // Parse the hostlist setting into an HttpHost array holding the host and port of each ES node
    private HttpHost[] parseHosts() {
        String[] split = hostlist.split(",");
        HttpHost[] httpHostArray = new HttpHost[split.length];
        for (int i = 0; i < split.length; i++) {
            String[] hostAndPort = split[i].trim().split(":");
            httpHostArray[i] = new HttpHost(hostAndPort[0], Integer.parseInt(hostAndPort[1]), "http");
        }
        return httpHostArray;
    }

}
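The host list parsing is the part of this configuration most likely to fail on a malformed value, so it can be useful to try it in isolation. Below is a JDK-only sketch of the same "host:port,host:port" parsing; the class name is hypothetical, and falling back to port 9200 when no port is given is an assumption of this sketch, not something the configuration class above does.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

class HostListSketch {
    // Assumption: use the default ES HTTP port when an entry has no port.
    static final int DEFAULT_PORT = 9200;

    // Parses a comma-separated "host:port" list; blank entries are skipped.
    // A non-numeric port causes Integer.parseInt to throw NumberFormatException.
    static List<InetSocketAddress> parse(String hostlist) {
        List<InetSocketAddress> nodes = new ArrayList<>();
        for (String item : hostlist.split(",")) {
            String entry = item.trim();
            if (entry.isEmpty()) {
                continue;
            }
            int colon = entry.lastIndexOf(':');
            String host = colon < 0 ? entry : entry.substring(0, colon);
            int port = colon < 0 ? DEFAULT_PORT : Integer.parseInt(entry.substring(colon + 1));
            // createUnresolved avoids a DNS lookup; only host and port are recorded.
            nodes.add(InetSocketAddress.createUnresolved(host, port));
        }
        return nodes;
    }

    public static void main(String[] args) {
        for (InetSocketAddress node : parse("127.0.0.1:9200, 127.0.0.1:9201")) {
            System.out.println(node.getHostString() + ":" + node.getPort());
        }
    }
}
```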

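6.2. Using the Java client

6.2.1. Creating an index

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.elasticsearch.action.DocWriteResponse;
import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.create.CreateIndexResponse;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.IndicesClient;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.rest.RestStatus;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

// Assumes a Spring Boot test context (JUnit 4) so that the clients can be injected
@SpringBootTest
@RunWith(SpringRunner.class)
public class TestIndex {
    @Autowired
    RestHighLevelClient client;

    @Autowired
    RestClient restClient;

    /**
     * Create an index
     */
    @Test
    public void testCreateIndex() throws IOException {
        // Create the index request
        CreateIndexRequest createIndexRequest = new CreateIndexRequest("el_01");
        // Set the index settings
        createIndexRequest.settings(Settings.builder().put("number_of_shards","1").put("number_of_replicas","0"));
        // Specify the mapping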
        createIndexRequest.mapping("doc","{\n" +
                "\t\"properties\": {\n" +
                "\t   \"name\": {\n" +
                "\t   \t\"type\": \"text\",\n" +
                "\t    \"analyzer\": \"ik_max_word\",\n" +
                "\t    \"search_analyzer\": \"ik_smart\"\n" +
                "\t   },\n" +
                "\t   \"description\": {\n" +
                "\t   \t\"type\": \"text\",\n" +
                "\t    \"analyzer\": \"ik_max_word\",\n" +
                "\t    \"search_analyzer\": \"ik_smart\"\n" +
                "\t   },\n" +
                "\t   \"pic\": {\n" +
                "\t   \t\"type\": \"text\",\n" +
                "\t   \t\"index\": false\n" +
                "\t   },\n" +
                "\t   \"studymodel\": {\n" +
                "\t   \t\"type\": \"text\"\n" +
                "\t   }\n" +
                "\t}\n" +
                "}", XContentType.JSON);
        // Client for index operations
        IndicesClient indices = client.indices();
        // Create the index
        CreateIndexResponse createIndexResponse = indices.create(createIndexRequest);
        // Read the response
        boolean acknowledged = createIndexResponse.isAcknowledged();
        System.out.println(acknowledged);
    }

    /**
     * Add a document
     */
    @Test
    public void testAddDoc() throws IOException {
        // Prepare the document fields (sample Chinese text, to be tokenized by the IK analyzer)
        Map<String, Object> map = new HashMap<>();
        map.put("name","spring cloud實戰課程");
        map.put("description","BootStrap是放松放松放松，李智是黑馬程序員，此開發框架在實際中大量使用");
        map.put("studymodel","20200");
        map.put("pic","/img/");
        // Create the request that indexes the document
        IndexRequest indexRequest = new IndexRequest("xc_course", "doc");
        // Document content
        indexRequest.source(map);
        // Send the HTTP request through the client
        IndexResponse index = client.index(indexRequest);
        DocWriteResponse.Result result = index.getResult();
        System.out.println(result);
    }

    /**
     * Get a document
     */
    @Test
    public void testGetDoc() throws IOException {
        // Get request (replace the id with one that exists in your index)
        GetRequest getRequest = new GetRequest("xc_course", "doc", "iCzcUnQB0UogVLzdiAD9");
        GetResponse documentFields = client.get(getRequest);
        // Read the document content
        Map<String, Object> sourceAsMap = documentFields.getSourceAsMap();
        System.out.println(sourceAsMap);
    }

    /**
     * Update a document
     */
    @Test
    public void testUpdateDoc() throws IOException {
        // Update request
        UpdateRequest updateRequest = new UpdateRequest("xc_course", "doc", "iCzcUnQB0UogVLzdiAD9");
        Map<String, Object> map = new HashMap<>();
        map.put("name","spring");
        updateRequest.doc(map);
        UpdateResponse update = client.update(updateRequest);
        // Read the response status
        RestStatus status = update.status();
        System.out.println(status);
    }

    /**
     * Delete an index
     */
    @Test
    public void testDeleteIndex() throws IOException {
        // Delete-index request
        DeleteIndexRequest deleteIndexRequest = new DeleteIndexRequest("xc_course");
        // Client for index operations
        IndicesClient indices = client.indices();
        // Delete the index
        DeleteIndexResponse delete = indices.delete(deleteIndexRequest);
        // Read the response
        boolean acknowledged = delete.isAcknowledged();
        System.out.println(acknowledged);
    }
}
