Milvus installation and usage tutorial


Using Milvus from Python

Versions: Milvus == 0.10.2, pymilvus == 0.2.14

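Pulling the Milvus image

(Milvus is installed through Docker, and an Ubuntu 18.04 virtual machine is recommended. To install Docker itself, see the Runoob (菜鳥教程) tutorial listed in the references. The steps below assume Docker is already installed.)

Official Milvus tutorial

Pull the CPU version of the Milvus image: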

$ sudo docker pull milvusdb/milvus:0.10.2-cpu-d081520-8a2393
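  • If network restrictions prevent your host from downloading Docker images and configuration files, pull the image on another host that has access. Save it as a TAR file (docker save), copy the file to your machine, and load it back as a Docker image (docker load).
  • If pulling the image is very slow or keeps failing, see the solutions in the deployment and operations FAQ of the Milvus documentation.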

Download the configuration file

$ mkdir -p /home/$USER/milvus/conf
$ cd /home/$USER/milvus/conf
$ wget https://raw.githubusercontent.com/milvus-io/milvus/0.10.2/core/conf/demo/server_config.yaml

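If wget cannot download the configuration file, create server_config.yaml in /home/$USER/milvus/conf yourself. Then copy into it the contents of the server config file (core/conf/demo/server_config.yaml in the milvus-io/milvus repository, tag 0.10.2).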

Start the Milvus Docker container

Start the container, mapping local paths into it:

$ sudo docker run -d --name milvus_cpu_0.10.2 \
-p 19530:19530 \
-p 19121:19121 \
-v /home/$USER/milvus/db:/var/lib/milvus/db \
-v /home/$USER/milvus/conf:/var/lib/milvus/conf \
-v /home/$USER/milvus/logs:/var/lib/milvus/logs \
-v /home/$USER/milvus/wal:/var/lib/milvus/wal \
milvusdb/milvus:0.10.2-cpu-d081520-8a2393

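The parameters used in the command above:

  • -d: run the container in the background.
  • --name: give the container a name.
  • -p: map a host port to a container port (19530 is the port pymilvus connects to, 19121 is the HTTP port).
  • -v: mount a host directory into the container.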
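You may prefer to start the container from a Python script rather than typing the command. In that case, the flags above can be assembled into an argument list for subprocess.run. This is a minimal sketch and not part of the official tutorial. The helper names build_milvus_run_cmd and start_milvus are hypothetical, and the helper only reproduces the ports and mount points of the command above.

```python
import posixpath
import subprocess

IMAGE = "milvusdb/milvus:0.10.2-cpu-d081520-8a2393"


def build_milvus_run_cmd(home, name="milvus_cpu_0.10.2", image=IMAGE):
    """Hypothetical helper: rebuild the `docker run` command above as an argument list."""
    cmd = ["sudo", "docker", "run", "-d", "--name", name]
    # -p host_port:container_port (19530 for pymilvus, 19121 for HTTP)
    for port in (19530, 19121):
        cmd += ["-p", f"{port}:{port}"]
    # -v host_dir:container_dir for data, config, logs and write-ahead log
    for sub in ("db", "conf", "logs", "wal"):
        host_dir = posixpath.join(home, "milvus", sub)
        cmd += ["-v", f"{host_dir}:/var/lib/milvus/{sub}"]
    cmd.append(image)
    return cmd


def start_milvus(home):
    """Run the command; raises CalledProcessError if docker exits with an error."""
    subprocess.run(build_milvus_run_cmd(home), check=True)
```

Passing a list instead of one shell string avoids quoting problems. However, $USER is not expanded, so pass the real home directory, e.g. start_milvus("/home/alice").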

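Check that Milvus is running:

$ sudo docker ps

If the container is not listed, it failed to start. List all containers, including stopped ones, with:

$ sudo docker ps -a

After fixing the problem, restart the container with sudo docker restart <container name>.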

If the Milvus service did not start properly, check the error log with:

$ sudo docker logs milvus_cpu_0.10.2
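The same status check can be scripted. The sketch below assumes docker ps -a is called with a --format template that prints each container's name and status, separated by a tab. Docker reports a running container with a status beginning with "Up" and a stopped one with "Exited (...)". The function names are hypothetical.

```python
import subprocess


def fetch_ps_output():
    """Run `docker ps -a` and return one "name<TAB>status" line per container."""
    result = subprocess.run(
        ["sudo", "docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def container_state(ps_output, name):
    """Return 'running', 'stopped' or 'missing' for the container called `name`."""
    for line in ps_output.splitlines():
        cname, _, status = line.partition("\t")
        if cname == name:
            return "running" if status.startswith("Up") else "stopped"
    return "missing"
```

If container_state(fetch_ps_output(), "milvus_cpu_0.10.2") returns "stopped", read the logs as shown above, fix the problem, and restart the container.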

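Connecting from Windows 10 to Docker in the virtual machine

Set the VM's network adapter to Bridged Adapter (switch to NAT if the VM needs Internet access).

Find the VM's IP address: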

ifconfig

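If ifconfig is not found, install the net-tools package:

sudo apt install net-tools

If Docker has started the Milvus container, ifconfig should list four interfaces. The one with the random-looking name (a veth interface created by Docker) belongs to the Milvus container.

Copy the IP address of enp0s3.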
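To extract the address programmatically, you can scan the ifconfig output for the enp0s3 block and read its inet line. This is a rough sketch. It assumes the net-tools layout used in the test sample: the interface name and a colon start a non-indented line, and detail lines are indented. Other ifconfig versions format this differently, and ipv4_of is a hypothetical helper.

```python
import re


def ipv4_of(ifconfig_output, iface="enp0s3"):
    """Return the IPv4 address listed under `iface`, or None if it is not found."""
    in_block = False
    for line in ifconfig_output.splitlines():
        if line and not line[0].isspace():
            # A non-indented line starts a new interface block, e.g. "enp0s3: flags=..."
            in_block = line.split(":", 1)[0] == iface
        elif in_block:
            # "inet " (with a space) skips the "inet6" line
            match = re.search(r"\binet (\d+\.\d+\.\d+\.\d+)", line)
            if match:
                return match.group(1)
    return None
```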

# restart networking
sudo /etc/init.d/networking restart
# disable the firewall (or open only the Milvus port: sudo ufw allow 19530/tcp)
sudo ufw disable

Verifying the connection

Install pymilvus==0.2.14

pip install pymilvus==0.2.14

Test the connection using the IP found above and the port mapped when starting the container (19530):

Simple test

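from milvus import Milvus  # pymilvus 0.2.14 has no DataType, so import only Milvus
client = Milvus(host='localhost', port='19530')  # from Windows, use the VM's enp0s3 IP instead of localhost
print(client.server_version())  # expected to return (status, server version)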
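If the pymilvus call fails or hangs, first check with a plain TCP connection whether the port is reachable from Windows at all. Bridged networking and the firewall are the usual suspects. This is a minimal standard-library sketch, and port_open is a hypothetical helper. A successful connection only shows that something is listening on that port, not that Milvus is healthy.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False
```

For example, call port_open("192.168.1.23", 19530) with the placeholder address replaced by the enp0s3 IP you copied.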

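Official example code (note: its header comments state that it was written for Milvus 0.11.x and pymilvus 0.3.x. It uses DataType and scalar-field collections, which pymilvus 0.2.14 does not provide, so it will not run against the Milvus 0.10.2 setup above.)

# example.py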

import random
from pprint import pprint

from milvus import Milvus, DataType

# ------
# Setup:
#    First of all, you need a running Milvus(0.11.x). By default, Milvus runs on localhost in port 19530.
#    Then, you can use pymilvus(0.3.x) to connect to the server. You can change _HOST and _PORT accordingly.
# ------
_HOST = '127.0.0.1'
_PORT = '19530'
client = Milvus(_HOST, _PORT)

# ------
# Basic create collection:
#     You already have a Milvus instance running, and pymilvus connecting to Milvus.
#     The first thing we will do is create a collection `demo_films`. In case we already have a collection
#     named `demo_films`, we drop it before creating it.
# ------
collection_name = 'demo_films'
if collection_name in client.list_collections():
    client.drop_collection(collection_name)

# ------
# Basic create collection:
#     For a specific field, you can provide extra infos by a dictionary with `key = "params"`. If the field
#     has a type of `FLOAT_VECTOR` and `BINARY_VECTOR`, "dim" must be provided in extra infos. Otherwise
#     you can provide customized infos like `{"unit": "minutes"}` for your own needs.
#
#     In our case, the extra infos in "duration" field means the unit of "duration" field is "minutes".
#     And `auto_id` in the parameter is set to `False` so that we can provide our own unique ids.
#     For more information you can refer to the pymilvus
#     documentation (https://pymilvus.readthedocs.io/en/latest/).
# ------
collection_param = {
    "fields": [
        #  Milvus doesn't support string type now, but we are considering supporting it soon.
        #  {"name": "title", "type": DataType.STRING},
        {"name": "duration", "type": DataType.INT32, "params": {"unit": "minute"}},
        {"name": "release_year", "type": DataType.INT32},
        {"name": "embedding", "type": DataType.FLOAT_VECTOR, "params": {"dim": 8}},
    ],
    "segment_row_limit": 4096,
    "auto_id": False
}

# ------
# Basic create collection:
#     After creating collection `demo_films`, we create a partition tagged "American", meaning the films we
#     will insert are from America.
# ------
client.create_collection(collection_name, collection_param)
client.create_partition(collection_name, "American")

# ------
# Basic create collection:
#     You can check the collection info and partitions we've created by `get_collection_info` and
#     `list_partitions`
# ------
print("--------get collection info--------")
collection = client.get_collection_info(collection_name)
pprint(collection)
partitions = client.list_partitions(collection_name)
print("\n----------list partitions----------")
pprint(partitions)

# ------
# Basic insert entities:
#     We have three films of the The_Lord_of_the_Rings series here, with their id, duration, release_year
#     and fake embeddings to be inserted. They are listed below to give you an overview of the structure.
# ------
The_Lord_of_the_Rings = [
    {
        "title": "The_Fellowship_of_the_Ring",
        "id": 1,
        "duration": 208,
        "release_year": 2001,
        "embedding": [random.random() for _ in range(8)]
    },
    {
        "title": "The_Two_Towers",
        "id": 2,
        "duration": 226,
        "release_year": 2002,
        "embedding": [random.random() for _ in range(8)]
    },
    {
        "title": "The_Return_of_the_King",
        "id": 3,
        "duration": 252,
        "release_year": 2003,
        "embedding": [random.random() for _ in range(8)]
    }
]

# ------
# Basic insert entities:
#     To insert these films into Milvus, we have to group values from the same field together like below.
#     Then these grouped data are used to create `hybrid_entities`.
# ------
ids = [k.get("id") for k in The_Lord_of_the_Rings]
durations = [k.get("duration") for k in The_Lord_of_the_Rings]
release_years = [k.get("release_year") for k in The_Lord_of_the_Rings]
embeddings = [k.get("embedding") for k in The_Lord_of_the_Rings]

hybrid_entities = [
    # Milvus doesn't support string type yet, so we cannot insert "title".
    {"name": "duration", "values": durations, "type": DataType.INT32},
    {"name": "release_year", "values": release_years, "type": DataType.INT32},
    {"name": "embedding", "values": embeddings, "type": DataType.FLOAT_VECTOR},
]

# ------
# Basic insert entities:
#     We insert the `hybrid_entities` into our collection, into partition `American`, with ids we provide.
#     If succeed, ids we provide will be returned.
# ------
ids = client.insert(collection_name, hybrid_entities, ids, partition_tag="American")
print("\n----------insert----------")
print("Films are inserted and the ids are: {}".format(ids))


# ------
# Basic insert entities:
#     After inserting entities into the collection, we need to flush it to make sure it's on disk,
#     so that we are able to retrieve it.
# ------
before_flush_counts = client.count_entities(collection_name)
client.flush([collection_name])
after_flush_counts = client.count_entities(collection_name)
print("\n----------flush----------")
print("There are {} films in collection `{}` before flush".format(before_flush_counts, collection_name))
print("There are {} films in collection `{}` after flush".format(after_flush_counts, collection_name))

# ------
# Basic insert entities:
#     We can get the detail of collection statistics info by `get_collection_stats`
# ------
info = client.get_collection_stats(collection_name)
print("\n----------get collection stats----------")
pprint(info)

# ------
# Basic search entities:
#     Now that we have 3 films inserted into our collection, it's time to obtain them.
#     We can get films by ids, if milvus can't find entity for a given id, `None` will be returned.
#     In the case we provide below, we will only get 1 film with id=1 and the other is `None`
# ------
films = client.get_entity_by_id(collection_name, ids=[1, 200])
print("\n----------get entity by id = 1, id = 200----------")
for film in films:
    if film is not None:
        print(" > id: {},\n > duration: {}m,\n > release_years: {},\n > embedding: {}"
              .format(film.id, film.duration, film.release_year, film.embedding))

# ------
# Basic hybrid search entities:
#      Getting films by id is not enough, we are going to get films based on vector similarities.
#      Let's say we have a film with its `embedding` and we want to find `top3` films that are most similar
#      with it by L2 distance.
#      Other than vector similarities, we also want to obtain films that:
#        `released year` term in 2002 or 2003,
#        `duration` larger than 250 minutes.
#
#      Milvus provides Query DSL(Domain Specific Language) to support structured data filtering in queries.
#      For now milvus supports TermQuery and RangeQuery, they are structured as below.
#      For more information about the meaning and other options about "must" and "bool",
#      please refer to DSL chapter of our pymilvus documentation
#      (https://pymilvus.readthedocs.io/en/latest/).
# ------
query_embedding = [random.random() for _ in range(8)]
query_hybrid = {
    "bool": {
        "must": [
            {
                "term": {"release_year": [2002, 2003]}
            },
            {
                # "GT" for greater than
                "range": {"duration": {"GT": 250}}
            },
            {
                "vector": {
                    "embedding": {"topk": 3, "query": [query_embedding], "metric_type": "L2"}
                }
            }
        ]
    }
}

# ------
# Basic hybrid search entities:
#     And we want to get all the fields back in results, so fields = ["duration", "release_year", "embedding"].
#     If searching successfully, results will be returned.
#     `results` has `nq` (number of queries) separate results; since we only query for 1 film, the length of
#     `results` is 1.
#     We ask for top 3 in-return, but our condition is too strict while the database is too small, so we can
#     only get 1 film, which means length of `entities` in below is also 1.
#
#     Now we've gotten the results, and known it's a 1 x 1 structure, how can we get ids, distances and fields?
#     It's very simple, for every `topk_film`, it has three properties: `id, distance and entity`.
#     All fields are stored in `entity`, so you can finally obtain these data as below:
#     And the result should be film with id = 3.
# ------
results = client.search(collection_name, query_hybrid, fields=["duration", "release_year", "embedding"])
print("\n----------search----------")
for entities in results:
    for topk_film in entities:
        current_entity = topk_film.entity
        print("- id: {}".format(topk_film.id))
        print("- distance: {}".format(topk_film.distance))

        print("- release_year: {}".format(current_entity.release_year))
        print("- duration: {}".format(current_entity.duration))
        print("- embedding: {}".format(current_entity.embedding))

# ------
# Basic delete:
#     Now let's see how to delete things in Milvus.
#     You can simply delete entities by their ids.
# ------
client.delete_entity_by_id(collection_name, ids=[1, 2])
client.flush()  # flush is important
result = client.get_entity_by_id(collection_name, ids=[1, 2])

counts_delete = sum([1 for entity in result if entity is not None])
counts_in_collection = client.count_entities(collection_name)
print("\n----------delete id = 1, id = 2----------")
print("Get {} entities by id 1, 2".format(counts_delete))
print("There are {} entities after delete films with 1, 2".format(counts_in_collection))

# ------
# Basic delete:
#     You can drop partitions we create, and drop the collection we create.
# ------
client.drop_partition(collection_name, partition_tag='American')
if collection_name in client.list_collections():
    client.drop_collection(collection_name)

# ------
# Summary:
#     Now we've gone through all the basic operations pymilvus can perform with a Milvus server; hope it's helpful!
# ------
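The example's comments predict that the hybrid search returns only the film with id = 3. The sketch below reproduces the intended filtering logic in plain Python, with no Milvus server needed, to show why. Only The_Return_of_the_King (252 minutes, 2003) passes both the term and the range condition, so asking for the top 3 can yield at most one film. The sketch assumes "term" means set membership and "GT" means strictly greater, as the example's comments describe. It does not show how Milvus evaluates queries internally, and its embeddings are fixed, made-up values instead of random ones.

```python
import math

# Same metadata as the example's films; the 8-d embeddings are made-up, fixed values
# (the example uses random ones) so that the output is reproducible.
films = [
    {"id": 1, "duration": 208, "release_year": 2001, "embedding": [0.0] * 8},
    {"id": 2, "duration": 226, "release_year": 2002, "embedding": [1.0] * 8},
    {"id": 3, "duration": 252, "release_year": 2003, "embedding": [3.0] * 8},
]


def hybrid_search(entities, years, min_duration, query, topk):
    """Imitate the example's bool/must query locally: term filter, range filter, then L2 top-k."""
    candidates = [
        e for e in entities
        if e["release_year"] in years       # "term": release_year in [2002, 2003]
        and e["duration"] > min_duration    # "range": duration GT 250
    ]
    candidates.sort(key=lambda e: math.dist(e["embedding"], query))
    return [(e["id"], math.dist(e["embedding"], query)) for e in candidates[:topk]]


query = [1.0] * 8
print(hybrid_search(films, {2002, 2003}, 250, query, topk=3))
```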

References

https://www.runoob.com/docker/ubuntu-docker-install.html

https://milvus.io/cn/docs/v0.10.2/overview.md

https://blog.csdn.net/qq632683582/article/details/107446738

https://blog.csdn.net/weixin_40816738/article/details/90605327

A simple Milvus usage tutorial

Milvus Admin (web UI)

Installation

docker pull milvusdb/milvus-em:latest

docker run -d -p 3000:80 milvusdb/milvus-em:latest

Running

Open a browser and go to http://localhost:3000/ (the -p 3000:80 option maps host port 3000 to port 80 in the container).

pymilvus

Parameters:

topk: the number k of vectors most similar to the target vector to return, set at search time. top_k must be in the range (0, 2048].
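Conceptually, an exact (FLAT) index answers a top_k query by brute force. It computes the distance from the query to every stored vector and keeps the k smallest. The sketch below does this in plain Python with heapq. It assumes L2 distance and reuses the (0, 2048] bound quoted above. It illustrates the idea and is not Milvus code.

```python
import heapq
import math


def search_topk(vectors, query, top_k):
    """Exact (FLAT-style) search: return the top_k (id, L2 distance) pairs, closest first."""
    if not 0 < top_k <= 2048:
        raise ValueError("top_k must be in the range (0, 2048]")
    distances = ((vid, math.dist(vec, query)) for vid, vec in enumerate(vectors))
    return heapq.nsmallest(top_k, distances, key=lambda pair: pair[1])
```

heapq.nsmallest keeps only k items in its heap, so memory stays small even when many vectors are scanned.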

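nprobe: the number of vector clusters (buckets) searched at query time, used by IVF-type indexes. nprobe affects search accuracy: a larger value gives higher accuracy but a slower search.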
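The role of nprobe is easiest to see in a toy inverted-file (IVF) index. Vectors are grouped into nlist buckets around centroids, and a query scans only the nprobe buckets whose centroids are closest. This is a simplified sketch, not Milvus's implementation. Real IVF indexes train their centroids with k-means, whereas here the first nlist vectors simply serve as centroids. With nprobe equal to nlist, every bucket is scanned and the result matches an exact search. A smaller nprobe scans less data and may miss true neighbours.

```python
import math


def build_ivf(vectors, nlist):
    """Toy IVF build: put every vector in the bucket of its nearest centroid.

    Simplification: the first `nlist` vectors act as centroids (real IVF indexes use k-means).
    """
    centroids = [vectors[i] for i in range(nlist)]
    buckets = [[] for _ in range(nlist)]
    for vid, vec in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: math.dist(vec, centroids[c]))
        buckets[nearest].append(vid)
    return centroids, buckets


def ivf_search(vectors, centroids, buckets, query, top_k, nprobe):
    """Scan only the `nprobe` buckets whose centroids are closest to `query`; return top_k ids by L2."""
    probe_order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    candidates = [vid for c in probe_order[:nprobe] for vid in buckets[c]]
    candidates.sort(key=lambda vid: math.dist(query, vectors[vid]))
    return candidates[:top_k]
```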

metric_type: the vector similarity metric. MetricType.IP is the inner product and MetricType.L2 is Euclidean distance.
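The two metrics rank in opposite directions: a smaller L2 distance means more similar, while a larger inner product means more similar. They can also disagree on which vector is closest. For unit-length vectors they agree, because ||a - b||^2 = 2 - 2(a . b). This is why vectors are often normalized before an IP search. The sketch below uses made-up vectors, and the helper names are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.dist(a, b)


def inner_product(a, b):
    """Inner product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rank(vectors, query, metric):
    """Return vector indices ordered from most to least similar under `metric` ("L2" or "IP")."""
    if metric == "L2":
        return sorted(range(len(vectors)), key=lambda i: l2_distance(vectors[i], query))
    if metric == "IP":
        return sorted(range(len(vectors)), key=lambda i: -inner_product(vectors[i], query))
    raise ValueError(f"unknown metric: {metric}")


vectors = [[1.0, 0.0], [3.0, 3.0], [0.9, 0.1]]
query = [1.0, 0.0]
print(rank(vectors, query, "L2"), rank(vectors, query, "IP"))  # [0, 2, 1] [1, 0, 2]
```

After normalizing both the vectors and the query, the two rankings coincide.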

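Reference code found online (note: it uses an older, table-based pymilvus API, such as connect, create_table, add_vectors and preload_table. Later releases replaced this with collection-based calls such as create_collection and insert, so expect to adapt it before running it against Milvus 0.10.2 with pymilvus 0.2.14.)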

# -*- coding: utf-8 -*-
 
# Import the required packages
import numpy as np
from milvus import Milvus, IndexType, MetricType

# Create a Milvus client; all subsequent operations go through this object
milvus = Milvus()

# Connect to the server; the port must match the port mapping used when starting the Docker container
milvus.connect(host='localhost', port='19530')

# Number of vectors
num_vec = 5000
# Vector dimension
vec_dim = 768

# Create the table
# Parameters:
# table_name: name of the table
# dimension: vector dimension
# metric_type: similarity metric; MetricType.IP is inner product, MetricType.L2 is Euclidean distance
table_param = {'table_name': 'mytable', 'dimension':vec_dim, 'index_file_size':1024, 'metric_type':MetricType.IP}
milvus.create_table(table_param)
 
# Generate a batch of random vectors
vectors_array = np.random.rand(num_vec,vec_dim)
vectors_list = vectors_array.tolist()

# The official docs recommend calling milvus.create_index before inserting vectors, so that the system builds the index incrementally
# Index types: FLAT / IVFLAT / IVF_SQ8 / IVF_SQ8H. FLAT is an exact index: slow, but with 100% recall
index_param = {'index_type': IndexType.FLAT, 'nlist': 128}
milvus.create_index('mytable', index_param)

# Add the vectors to the table created above
# ids can be None, in which case ids are generated automatically
status, ids = milvus.add_vectors(table_name="mytable",records=vectors_list,ids=None) # returns the ids of this batch of vectors

# The official docs recommend creating the same index again manually after insertion finishes
milvus.create_index('mytable', index_param)

# Print some statistics
status, tables = milvus.show_tables()
print("All tables:", tables)
print("Row count of mytable: {}".format(milvus.count_table('mytable')[1]))
print("Does mytable exist:", milvus.has_table("mytable")[1])

# Load the table into memory
milvus.preload_table('mytable')

# Create a query vector
query_vec_array = np.random.rand(1,vec_dim)
query_vec_list = query_vec_array.tolist()
# Search. Depending on the index type, nprobe here and nlist (set when building the index) affect search speed and accuracy
# For a FLAT index, neither parameter affects the results or the speed
status, results = milvus.search(table_name='mytable', query_records=query_vec_list, top_k=4, nprobe=16)
print(status)
print(results)

# Drop the index and the table; if you skip this, they can still be used next time
milvus.drop_index(table_name="mytable")
milvus.delete_table(table_name="mytable")

# Disconnect
milvus.disconnect()

