Learning Faiss: Part 1

Running Faiss on Multiple GPUs, with Performance Tests

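1. Basic Usage of Faiss

1.1 Running on the CPU

Every algorithm in Faiss is built around an index. Whether you want to run a search or a clustering, the first step is always to build an index.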

import faiss                   # make faiss available
index = faiss.IndexFlatL2(d)   # build the index
# d is the dimension of data

Once the code above has run, you can add data to the index and run a search.

index.add(xb)
# xb is the base data
D, I = index.search(xq, k)
# xq is the query data
# k is the number of neighbors you want to search for
# D is the distance matrix between xq and its k neighbors
# I is the index matrix of the k neighbors
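To make the meaning of D and I concrete, here is a minimal pure-Python sketch of the exhaustive search that a flat L2 index performs. It does not use Faiss: the helper name brute_force_search and the toy data are made up for illustration. Like IndexFlatL2, it reports squared Euclidean distances, ordered from nearest to farthest.

```python
def brute_force_search(xb, xq, k):
    """Exhaustive k-nearest-neighbor search with squared L2 distance.

    xb: list of base vectors, xq: list of query vectors (hypothetical toy inputs).
    Returns (D, I): one row per query, k entries per row, nearest first.
    """
    D, I = [], []
    for q in xq:
        # squared L2 distance from this query to every base vector
        dists = [(sum((a - b) ** 2 for a, b in zip(q, v)), idx)
                 for idx, v in enumerate(xb)]
        dists.sort()                      # nearest first; ties broken by lower id
        top = dists[:k]
        D.append([dist for dist, _ in top])
        I.append([idx for _, idx in top])
    return D, I


xb = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]   # 4 base vectors, d = 2
xq = [[2.0, 0.0], [1.0, 3.0]]                           # 2 query vectors
D, I = brute_force_search(xb, xq, k=2)
print(I)   # [[1, 0], [2, 3]]
print(D)   # [[1.0, 4.0], [2.0, 4.0]]
```

Row i of I holds the positions in xb of the k nearest neighbors of query i, and row i of D holds the matching distances. Faiss returns the same information as two NumPy arrays of shape (number of queries, k).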

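1.2 Running on a Single GPU

The syntax for running on a single GPU is essentially the same as on the CPU, except that you need to declare a GPU resource object.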

res = faiss.StandardGpuResources()
# only one StandardGpuResources object is needed per GPU
flat_config = faiss.GpuIndexFlatConfig()
flat_config.device = 0
# device is the GPU ID. With 3 GPUs, the possible values are 0, 1, 2
index = faiss.GpuIndexFlatL2(res, d, flat_config)   # build the index
index.add(xb)
D, I = index.search(xq, k)

1.3 Running on Multiple GPUs

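Running on multiple GPUs is a little different: the work has to be distributed across the GPUs so that the search runs in parallel.

Faiss provides two ways to do this: IndexProxy, which keeps a full copy of the index on each GPU and splits the queries among them, and IndexShards, which splits the dataset itself across the GPUs.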

The rest of this section focuses on IndexProxy.

res = [faiss.StandardGpuResources() for i in range(ngpu)]
# first we get StandardGpuResources of each GPU
# ngpu is the num of GPUs
indexes = [faiss.GpuIndexFlatL2(res[i], i, d, useFloat16) for i in range(ngpu)]
# then we make an Index array
# useFloat16 is a boolean value
index = faiss.IndexProxy()
for sub_index in indexes:
    index.addIndex(sub_index)
# build the index by IndexProxy
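The difference between the two multi-GPU strategies can be sketched without any GPU. The following is a pure-Python illustration, not Faiss code, and it reflects my understanding of the two classes: search_replicated mimics the IndexProxy idea (every worker holds all vectors and answers a slice of the queries), while search_sharded mimics the IndexShards idea (every worker holds a slice of the vectors, and the partial top-k lists are merged). All function names are hypothetical, and the workers are simulated sequentially.

```python
import heapq


def knn(xb, q, k, offset=0):
    """Top-k (squared L2 distance, global id) pairs for one query over one block."""
    scored = ((sum((a - b) ** 2 for a, b in zip(q, v)), offset + i)
              for i, v in enumerate(xb))
    return heapq.nsmallest(k, scored)


def search_replicated(xb, xq, k, nworkers):
    """IndexProxy-style: every worker has all of xb; queries are split among workers."""
    results = [None] * len(xq)
    for w in range(nworkers):
        for qi in range(w, len(xq), nworkers):   # worker w answers every nworkers-th query
            results[qi] = knn(xb, xq[qi], k)
    return results


def search_sharded(xb, xq, k, nworkers):
    """IndexShards-style: xb is split into contiguous shards; partial results are merged."""
    size = -(-len(xb) // nworkers)               # ceil division: vectors per shard
    results = []
    for q in xq:
        partial = []
        for w in range(nworkers):
            start = w * size
            partial += knn(xb[start:start + size], q, k, offset=start)
        results.append(heapq.nsmallest(k, partial))   # merge the per-shard top-k lists
    return results


xb = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [5.0, 1.0]]
xq = [[2.0, 0.0], [1.0, 3.0], [4.0, 2.0]]
same = search_replicated(xb, xq, 2, nworkers=2) == search_sharded(xb, xq, 2, nworkers=2)
print(same)   # True
```

Both strategies return the same neighbors. Replication costs one full copy of the data per worker but needs no merge step, while sharding lets the dataset exceed the memory of a single worker at the price of merging the partial results.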