Background
We often need to bulk-load large data sets into Redis for lookup. This article uses the example of importing 200 million mobile phone numbers into a Redis server, stored as Sets.
Take the numbers starting with 177: in theory there are 100 million of them, from 17700000000 to 17799999999. We split these 100 million numbers into 5 equal parts of 20 million numbers each. The keys and their member ranges are assigned as follows:
key 177:1: members 17700000000 to 17719999999, 20 million numbers
key 177:2: members 17720000000 to 17739999999, 20 million numbers
key 177:3: members 17740000000 to 17759999999, 20 million numbers
key 177:4: members 17760000000 to 17779999999, 20 million numbers
key 177:5: members 17780000000 to 17799999999, 20 million numbers
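This 5-way split can be sketched as a small helper that maps any phone number to its shard key (shard_key is a hypothetical name, not from the article; the arithmetic needs a 64-bit shell to handle 11-digit numbers):

```shell
#!/bin/sh
# Map a phone number to its shard key under the split described above:
# 20 million consecutive numbers per shard, 5 shards per 3-digit prefix.
shard_key() {
  n=$1
  prefix=$(( n / 100000000 ))        # first 3 digits, e.g. 177
  offset=$(( n % 100000000 ))        # position within the prefix's 100M range
  part=$(( offset / 20000000 + 1 ))  # 1..5, 20 million numbers per shard
  echo "${prefix}:${part}"
}

shard_key 17712345678   # → 177:1
shard_key 17798765432   # → 177:5
```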
Prerequisites
Prepare a Linux host with enough free memory. Install the Redis server and make sure it is running; its IP is 192.168.7.214.
Steps
1. Prepare the command files
Generate 177_1.txt, 177_2.txt, 177_3.txt, 177_4.txt and 177_5.txt. Part of 177_1.txt is shown below; the other files are similar:
sadd 177:1 17700000000
sadd 177:1 17700000001
sadd 177:1 17700000002
sadd 177:1 17700000003
sadd 177:1 17700000004
sadd 177:1 17700000005
....
sadd 177:1 17719999999
Anyone familiar with Redis commands will recognize what these do: each line adds one member to the Set stored at key 177:1. Assume the files are saved under /home/c7user.
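One way to produce such a file is a short seq/awk sketch (gen_file is a hypothetical helper name; the demo call uses a tiny range so the output stays small, with the real shard shown in a comment):

```shell
#!/bin/sh
# Generate a "sadd" command file for one shard: key name, then one member per line.
gen_file() {
  key=$1; start=$2; end=$3; out=$4
  seq "$start" "$end" | awk -v k="$key" '{ print "sadd " k " " $0 }' > "$out"
}

# Tiny demo range so the format is visible; the real shard would be:
# gen_file 177:1 17700000000 17719999999 /home/c7user/177_1.txt
gen_file 177:1 17700000000 17700000004 demo_177_1.txt
cat demo_177_1.txt
```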
2. Write the shell script
To rule out network bandwidth as a factor, edit the script on the Redis host itself. To measure how long importing just 177_1.txt (20 million numbers) takes, create a file named import.sh with the following content:
echo $(date)
cat /home/c7user/177_1.txt | ./redis-cli -h 192.168.7.214 -a xxxxx --pipe
echo $(date)
The cat ... | redis-cli --pipe line is the core command.
Make import.sh executable with: chmod 755 import.sh
To import multiple files, simply append the corresponding lines to import.sh, for example:
cat /home/c7user/177_1.txt | ./redis-cli -h 192.168.7.214 -a xxxxx --pipe
cat /home/c7user/177_2.txt | ./redis-cli -h 192.168.7.214 -a xxxxx --pipe
cat /home/c7user/177_3.txt | ./redis-cli -h 192.168.7.214 -a xxxxx --pipe
cat /home/c7user/177_4.txt | ./redis-cli -h 192.168.7.214 -a xxxxx --pipe
cat /home/c7user/177_5.txt | ./redis-cli -h 192.168.7.214 -a xxxxx --pipe
cat /home/c7user/189_1.txt | ./redis-cli -h 192.168.7.214 -a xxxxx --pipe
cat /home/c7user/189_2.txt | ./redis-cli -h 192.168.7.214 -a xxxxx --pipe
cat /home/c7user/189_3.txt | ./redis-cli -h 192.168.7.214 -a xxxxx --pipe
cat /home/c7user/189_4.txt | ./redis-cli -h 192.168.7.214 -a xxxxx --pipe
cat /home/c7user/189_5.txt | ./redis-cli -h 192.168.7.214 -a xxxxx --pipe
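Rather than typing ten cat lines by hand, the same script could be generated in a loop (a sketch using the paths and redis-cli flags from the article; import_all.sh is a hypothetical output name):

```shell
#!/bin/sh
# Emit the timestamp lines and one import pipeline per command file,
# then make the generated script executable.
{
  echo 'echo $(date)'
  for prefix in 177 189; do
    for part in 1 2 3 4 5; do
      echo "cat /home/c7user/${prefix}_${part}.txt | ./redis-cli -h 192.168.7.214 -a xxxxx --pipe"
    done
  done
  echo 'echo $(date)'
} > import_all.sh
chmod 755 import_all.sh
```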
3. Run and observe
Finally, run import.sh. From the timestamps it prints, importing 20 million numbers into a single key took 92 seconds, which is very fast. Importing through the API of a higher-level language such as Java or C would most likely not be this fast, even with pipelining.
References
https://redis.io/topics/mass-insert#redis-mass-insertion
Redis Mass Insertion
Sometimes Redis instances need to be loaded with a big amount of preexisting or user generated data in a short amount of time, so that millions of keys will be created as fast as possible.
This is called a mass insertion, and the goal of this document is to provide information about how to feed Redis with data as fast as possible.
Use the protocol, Luke
Using a normal Redis client to perform mass insertion is not a good idea for a few reasons: the naive approach of sending one command after the other is slow because you have to pay for the round trip time for every command. It is possible to use pipelining, but for mass insertion of many records you need to write new commands while you read replies at the same time to make sure you are inserting as fast as possible.
Only a small percentage of clients support non-blocking I/O, and not all the clients are able to parse the replies in an efficient way in order to maximize throughput. For all these reasons the preferred way to mass import data into Redis is to generate a text file containing the Redis protocol, in raw format, in order to call the commands needed to insert the required data.
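The raw-protocol approach the docs describe can be sketched in shell as well (resp_cmd is a hypothetical helper, not part of redis-cli; RESP requires CRLF line endings and length-prefixed arguments):

```shell
#!/bin/sh
# Emit one command in raw RESP protocol:
# *<argc>, then $<len> and the bytes of each argument, all CRLF-terminated.
resp_cmd() {
  printf '*%d\r\n' "$#"
  for arg in "$@"; do
    printf '$%d\r\n%s\r\n' "${#arg}" "$arg"
  done
}

# One sadd command in raw protocol; a real mass insert would emit millions of
# these into a file and pipe it to ./redis-cli --pipe, as with the plain-text files.
resp_cmd sadd 177:1 17700000000 > resp_demo.bin
```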