ELK Learning Summary (4-2): Importing Data


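Batch inserts through the REST API's _bulk endpoint can reach 50,000 to 100,000 records per second.

Write the data to JSON files, then run a batch script that inserts them file by file:

1. First write the data into JSON files in the fixed format shown below. A file must not be too large, or the request fails with an error (for example, Elasticsearch rejects HTTP request bodies larger than http.max_content_length, which is 100 MB by default).

2. Then use curl to send each file to Elasticsearch's _bulk endpoint.

Files of about 10 MB each are recommended; simply send these small files one after another.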
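The splitting rule above can be sketched in a few lines. This is a minimal illustration in Python rather than the C# used by the tool further down; the function name `bulk_chunks` and its parameters are made up for this example. It cuts on a byte budget instead of a record count and never separates an action line from its document line.

```python
import json
from typing import Iterable, Iterator


def bulk_chunks(docs: Iterable[dict], index: str, doc_type: str,
                max_bytes: int = 10 * 1024 * 1024) -> Iterator[bytes]:
    """Yield newline-delimited bulk bodies of at most about max_bytes each."""
    # "_type" mirrors the sample in this post; recent Elasticsearch
    # versions no longer accept mapping types.
    action = json.dumps({"index": {"_index": index, "_type": doc_type}})
    parts, size = [], 0
    for doc in docs:
        pair = (action + "\n" + json.dumps(doc) + "\n").encode("utf-8")
        # Close the current chunk before it would exceed the budget,
        # keeping each action line together with its document line.
        if parts and size + len(pair) > max_bytes:
            yield b"".join(parts)
            parts, size = [], 0
        parts.append(pair)
        size += len(pair)
    if parts:
        yield b"".join(parts)  # final, partially filled chunk
```

Each yielded chunk can be written to its own file in binary mode and sent with one curl call.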

 

Format of the JSON data file

{ "index" :{ "_index" : "meterdata" , "_type" : "autoData" }}
{ "Mfid " :1, "TData" :172170, "TMoney" :209, "HTime" : "2016-05-17T08:03:00" }
{ "index" :{ "_index" : "meterdata" , "_type" : "autoData" }}
{ "Mfid " :1, "TData" :172170, "TMoney" :209, "HTime" : "2016-05-17T08:04:00" }
{ "index" :{ "_index" : "meterdata" , "_type" : "autoData" }}
{ "Mfid " :1, "TData" :172170, "TMoney" :209, "HTime" : "2016-05-17T08:05:00" }
{ "index" :{ "_index" : "meterdata" , "_type" : "autoData" }}
{ "Mfid " :1, "TData" :172170, "TMoney" :209, "HTime" : "2016-05-17T08:06:00" }
{ "index" :{ "_index" : "meterdata" , "_type" : "autoData" }}
{ "Mfid " :1, "TData" :172170, "TMoney" :209, "HTime" : "2016-05-17T08:07:00" }
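A file in this format can be checked before it is sent. The sketch below (Python, with a hypothetical helper name `count_bulk_docs`) only covers what the sample shows: alternating `index` action lines and document lines, with the body ending in a newline as the _bulk API requires. Ignoring blank lines at the very end is an assumption of this sketch.

```python
import json


def count_bulk_docs(text: str) -> int:
    """Return the number of documents in a bulk body, or raise ValueError."""
    if not text.endswith("\n"):
        raise ValueError("bulk body must end with a newline")
    lines = text.splitlines()
    # Assumption of this sketch: blank lines at the very end are ignored.
    while lines and not lines[-1].strip():
        lines.pop()
    if len(lines) % 2 != 0:
        raise ValueError("expected pairs of action line and document line")
    for number in range(0, len(lines), 2):
        action = json.loads(lines[number])  # invalid JSON raises ValueError too
        if not isinstance(action, dict) or list(action) != ["index"]:
            raise ValueError(f"line {number + 1}: expected an index action")
        if not isinstance(json.loads(lines[number + 1]), dict):
            raise ValueError(f"line {number + 2}: expected a JSON object")
    return len(lines) // 2
```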
 
Contents of the batch file
cd E:\curl-7.50.3-win64-mingw\bin
curl 172.17.1.15:9200/_bulk?pretty --data-binary @E:\Bin\Debug\testdata\437714060.json
curl 172.17.1.15:9200/_bulk?pretty --data-binary @E:\Bin\Debug\testdata\743719428.json
curl 172.17.1.15:9200/_bulk?pretty --data-binary @E:\Bin\Debug\testdata\281679894.json
curl 172.17.1.15:9200/_bulk?pretty --data-binary @E:\Bin\Debug\testdata\146257480.json
curl 172.17.1.15:9200/_bulk?pretty --data-binary @E:\Bin\Debug\testdata\892018760.json
pause
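With `?pretty`, curl prints the full response for every file, and a failed record is easy to miss among 100,000 results. As a hedged alternative, the sketch below sends one file with Python's standard library and returns only the failed items. The function names are invented for this example; the `errors` flag and the per-item `status` field are part of the _bulk response, and the Content-Type header is required by newer Elasticsearch versions.

```python
import json
import urllib.request


def build_bulk_request(base_url: str, body: bytes) -> urllib.request.Request:
    """Build a POST request for the _bulk endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


def failed_items(response_text: str) -> list:
    """Return the per-item results whose status is not 2xx."""
    result = json.loads(response_text)
    if not result.get("errors"):
        return []
    failures = []
    for item in result.get("items", []):
        for detail in item.values():
            if not 200 <= detail.get("status", 0) < 300:
                failures.append(detail)
    return failures


def send_bulk_file(base_url: str, path: str) -> list:
    """Send one bulk file and return its failed items (empty when all succeeded)."""
    with open(path, "rb") as handle:
        request = build_bulk_request(base_url, handle.read())
    with urllib.request.urlopen(request) as response:
        return failed_items(response.read().decode("utf-8"))
```

For example, `send_bulk_file("http://172.17.1.15:9200", r"E:\Bin\Debug\testdata\437714060.json")` would correspond to the first curl line above.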
 

Tool code

// Namespaces needed at the top of the form's source file:
// using System; using System.IO; using System.Text;
// using System.Threading.Tasks; using System.Windows.Forms;

private void button1_Click(object sender, EventArgs e)
{
    // Generate on a background thread so the UI stays responsive.
    Task.Run(() => { CreateDataToFile(); });
}

public void CreateDataToFile()
{
    StringBuilder sb = new StringBuilder();      // bulk body of the file being built
    StringBuilder sborder = new StringBuilder(); // contents of order.bat
    Random random = new Random();                // shared instance: new Random() per record can repeat values
    int count = 0;                               // records currently buffered in sb
    Directory.CreateDirectory(Application.StartupPath + "\\testdata");
    sborder.Append(@"cd E:\curl-7.50.3-win64-mingw\bin" + Environment.NewLine);
    DateTime endDate = DateTime.Parse("2016-10-22");
    for (int i = 1; i <= 10000; i++) // 10,000 points
    {
        DateTime startDate = endDate.AddYears(-1);
        this.Invoke(new Action(() => { label1.Text = "Generating point " + i; }));

        while (startDate <= endDate) // one year of data per point, one record per minute
        {
            sb.Append("{\"index\":{\"_index\":\"meterdata\",\"_type\":\"autoData\"}}" + Environment.NewLine);
            sb.Append("{\"Mfid \":" + i + ",\"TData\":" + random.Next(1067500) + ",\"TMoney\":" + random.Next(1300) + ",\"HTime\":\"" + startDate.ToString("yyyy-MM-ddTHH:mm:ss") + "\"}" + Environment.NewLine);
            count++;
            if (count >= 100000) // start a new file every 100,000 records
            {
                WriteBulkFile(sb, sborder);
                count = 0;
            }
            startDate = startDate.AddMinutes(1);
        }
    }
    if (count > 0)
    {
        WriteBulkFile(sb, sborder); // flush the last, partially filled file
    }
    sborder.Append("pause" + Environment.NewLine);
    // File.WriteAllText overwrites an existing file instead of leaving stale bytes behind.
    File.WriteAllText(Application.StartupPath + "\\testdata\\order.bat", sborder.ToString(), Encoding.GetEncoding("GBK"));
    MessageBox.Show("Generation complete");
}

private void WriteBulkFile(StringBuilder sb, StringBuilder sborder)
{
    string filename = new Random(GetRandomSeed()).Next(900000000) + ".json";
    // The bulk body is plain ASCII; write it as UTF-8 without a BOM.
    File.WriteAllText(Application.StartupPath + "\\testdata\\" + filename, sb.ToString(), new UTF8Encoding(false));
    sb.Clear();
    // The curl path assumes the tool runs from E:\Bin\Debug.
    sborder.Append(@"curl 172.17.1.15:9200/_bulk?pretty --data-binary @E:\Bin\Debug\testdata\" + filename + Environment.NewLine);
}

static int GetRandomSeed()
{
    // Seed from the cryptographic RNG so that file numbers are unlikely to repeat.
    byte[] bytes = new byte[4];
    using (var rng = new System.Security.Cryptography.RNGCryptoServiceProvider())
    {
        rng.GetBytes(bytes);
    }
    return BitConverter.ToInt32(bytes, 0);
}
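As a sanity check on the data volume, the number of timestamps that the nested loops step through can be computed directly. The short Python calculation below follows the loop bounds of the tool (10,000 points, one year ending 2016-10-22, one timestamp per minute, both ends inclusive, 100,000 records per file); the function name is made up for this example.

```python
from datetime import datetime, timedelta


def timestamps_per_point(end: datetime) -> int:
    """Count minute-spaced timestamps from one year before end through end, inclusive."""
    start = end.replace(year=end.year - 1)  # fine for this date; would fail on Feb 29
    return (end - start) // timedelta(minutes=1) + 1


per_point = timestamps_per_point(datetime(2016, 10, 22))  # 527041 (the year spans Feb 29, 2016)
total = per_point * 10_000                                # 5270410000, about 5.27 billion
files = -(-total // 100_000)                              # ceiling division: 52705 files
print(per_point, total, files)
```

The total of about 5.27 billion is in line with the 5.2 billion records mentioned in the summary.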

 

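Summary

In testing, Elasticsearch search turned out to be quite fast. While the data was still being generated, at about 1.7 billion records, a query filtering on Mfid and a time range of several months and returning ten records completed in a little over two seconds.

Repeating a query with the same conditions makes it faster each time, presumably because Elasticsearch caches the results.

The 5.2 billion records occupy roughly 500 GB of disk space, which is fairly large: about three times the size of the same data stored with Protocol Buffers. Even so, the search speed is quite satisfactory.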
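The post does not show the query that was timed. One plausible shape, given here only as an assumption, is a `bool` query whose `filter` clauses combine a `term` on the point ID with a `range` on the timestamp. The Python helper below (a hypothetical name, with the field names passed in as parameters) builds such a request body for the _search endpoint. Filter clauses do not compute relevance scores and are candidates for caching, which may be one reason repeated queries get faster.

```python
import json


def point_range_query(id_field: str, point_id: int, time_field: str,
                      start: str, end: str, size: int = 10) -> dict:
    """Build a _search body: one point, a half-open time range, `size` hits."""
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"term": {id_field: point_id}},
                    {"range": {time_field: {"gte": start, "lt": end}}},
                ]
            }
        },
    }


body = point_range_query("Mfid", 1, "HTime",
                         "2016-05-01T00:00:00", "2016-08-01T00:00:00")
print(json.dumps(body, indent=2))
```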

 

 

