Performance comparison of bulk data inserts into MySQL

Reposted article, 2017-05-22 19:56. Tags: MySQL, bulk insert performance

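Our company's IM system was redesigned, and users' chat messages now have to be stored in a database.

Java code typically inserts into MySQL in one of the following ways:
1. Insert with auto-commit
2. Insert with an explicit transaction commit
3. Batch submission
4. The LOAD DATA (load file) interface

The test table is defined as follows: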

create table chat_message(
id bigint primary key auto_increment,
src_userid bigint not null,
target_userid bigint not null,
message varchar(200),
ts timestamp not null default current_timestamp,
s1 int,
s2 int,
s3 int,
s4 int
);

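The code below inserts 20,000 records with each of the four approaches and records the execution time.

Dependencies
commons-lang3-3.3.2.jar
mysql-connector-java-5.1.31-bin.jar (older driver versions hurt performance)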

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import org.apache.commons.lang3.RandomStringUtils;

public class Main {
    private static final String URL = "jdbc:mysql://127.0.0.1:3306/mvbox";
    private static final String USERNAME = "xx";
    private static final String PWD = "xx";
    // Number of rows inserted by each test.
    private static final int MAX = 20000;
    private static final String SQL = "insert into chat_message(src_userid,target_userid,message,s1,s2,s3,s4) values(?,?,?,?,?,?,?)";

    public static void main(String[] args) throws ClassNotFoundException, SQLException, UnsupportedEncodingException {
        long start = System.currentTimeMillis();
        // Swap in testInsert(), testInsertAutoCommit() or testBatchInsert(n) to time the other approaches.
        testLoadFile(100);
        long elapsedMillis = System.currentTimeMillis() - start;
        System.out.println(elapsedMillis);
        // Rows per second; floating-point math avoids truncating the elapsed time to whole seconds.
        System.out.println(MAX * 1000.0 / elapsedMillis);
    }

    private static Connection getConnection() throws SQLException, ClassNotFoundException {
        // Load the Connector/J 5.1 driver class explicitly.
        Class.forName("com.mysql.jdbc.Driver");
        return DriverManager.getConnection(URL, USERNAME, PWD);
    }

    // "Transaction commit" variant: auto-commit is off and every row is committed explicitly.
    private static void testInsert() throws ClassNotFoundException, SQLException {
        try (Connection con = getConnection();
                PreparedStatement pt = con.prepareStatement(SQL)) {
            con.setAutoCommit(false);
            for (int i = 0; i < MAX; i++) {
                pt.setLong(1, 1 + (int) (Math.random() * 100000000));
                pt.setLong(2, 1 + (int) (Math.random() * 100000000));
                pt.setString(3, RandomStringUtils.randomAscii(200));
                pt.setInt(4, 1);
                pt.setInt(5, 1);
                pt.setInt(6, 1);
                pt.setInt(7, 1);
                pt.executeUpdate();
                con.commit();
            }
        }
    }

    // "Auto-commit" variant: the driver commits after every executeUpdate().
    private static void testInsertAutoCommit() throws ClassNotFoundException, SQLException {
        try (Connection con = getConnection();
                PreparedStatement pt = con.prepareStatement(SQL)) {
            con.setAutoCommit(true);
            for (int i = 0; i < MAX; i++) {
                pt.setLong(1, 1 + (int) (Math.random() * 100000000));
                pt.setLong(2, 1 + (int) (Math.random() * 100000000));
                pt.setString(3, RandomStringUtils.randomAscii(200));
                pt.setInt(4, 1);
                pt.setInt(5, 1);
                pt.setInt(6, 1);
                pt.setInt(7, 1);
                pt.executeUpdate();
            }
        }
    }
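
    // Batch variant: rows are queued with addBatch(), then sent and committed every batchSize rows.
    private static void testBatchInsert(int batchSize) throws ClassNotFoundException, SQLException {
        try (Connection con = getConnection();
                PreparedStatement pt = con.prepareStatement(SQL)) {
            con.setAutoCommit(false);
            for (int i = 0; i < MAX; i++) {
                pt.setLong(1, 1 + (int) (Math.random() * 100000000));
                pt.setLong(2, 1 + (int) (Math.random() * 100000000));
                pt.setString(3, RandomStringUtils.randomAscii(200));
                pt.setInt(4, 1);
                pt.setInt(5, 1);
                pt.setInt(6, 1);
                pt.setInt(7, 1);
                pt.addBatch();
                // Flush once batchSize rows have been queued (the original test on i % batchSize == 1
                // flushed the first batch after only two rows).
                if ((i + 1) % batchSize == 0) {
                    pt.executeBatch();
                    con.commit();
                }
            }
            // Send whatever is left when MAX is not a multiple of batchSize.
            pt.executeBatch();
            con.commit();
        }
    }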

    // LOAD DATA variant: rows are formatted as delimited text in memory and streamed to the server
    // through Connector/J's setLocalInfileInputStream(), so the file named in the statement is not read.
    private static void testLoadFile(int batchSize) throws ClassNotFoundException, SQLException {
        String fieldsterminated = "\t\t";
        String linesterminated = "\t\r\n";
        String loadDataSql = "LOAD DATA LOCAL INFILE 'sql.csv' INTO TABLE chat_message FIELDS TERMINATED BY '"
                + fieldsterminated + "' LINES TERMINATED BY '" + linesterminated
                + "' (src_userid,target_userid,message,s1,s2,s3,s4) ";
        try (Connection con = getConnection();
                PreparedStatement pt = con.prepareStatement(loadDataSql)) {
            con.setAutoCommit(false);
            // unwrap() throws SQLException if this is not a Connector/J statement,
            // instead of leaving a null reference that fails later.
            com.mysql.jdbc.PreparedStatement mysqlStatement = pt.unwrap(com.mysql.jdbc.PreparedStatement.class);
            StringBuilder sb = new StringBuilder(10000);
            for (int i = 0; i < MAX; i++) {
                sb.append(1 + (int) (Math.random() * 100000000));
                sb.append(fieldsterminated);
                sb.append(1 + (int) (Math.random() * 100000000));
                sb.append(fieldsterminated);
                // Backslash is LOAD DATA's default escape character, so strip it from the random text.
                sb.append(RandomStringUtils.randomAscii(200).replaceAll("\\\\", " "));
                sb.append(fieldsterminated);
                sb.append(1);
                sb.append(fieldsterminated);
                sb.append(1);
                sb.append(fieldsterminated);
                sb.append(1);
                sb.append(fieldsterminated);
                sb.append(1);
                sb.append(linesterminated);
                // Load and commit once batchSize rows have been buffered.
                if ((i + 1) % batchSize == 0) {
                    loadChunk(con, mysqlStatement, sb);
                }
            }
            // Load the remainder, if any.
            if (sb.length() > 0) {
                loadChunk(con, mysqlStatement, sb);
            }
        }
    }

    // Streams the buffered rows to the server, commits, and empties the buffer.
    private static void loadChunk(Connection con, com.mysql.jdbc.PreparedStatement mysqlStatement, StringBuilder sb)
            throws SQLException {
        byte[] bytes = sb.toString().getBytes(java.nio.charset.StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(bytes);
        mysqlStatement.setLocalInfileInputStream(in);
        mysqlStatement.executeUpdate();
        con.commit();
        sb.setLength(0);
    }
}
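
A side note on the batch variant, not part of the original test: Connector/J has a connection property named rewriteBatchedStatements, and when it is set to true the driver may rewrite a batch of single-row INSERTs into multi-row INSERT statements. The URL used above does not set it, and its effect was not measured here. The sketch below only shows how such a property could be appended to the URL; the class name JdbcUrlSketch and the helper withProperties are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: appends connection properties to a JDBC URL that has no query part yet.
final class JdbcUrlSketch {
    static String withProperties(String baseUrl, Map<String, String> props) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : props.entrySet()) {
            query.add(e.getKey() + "=" + e.getValue());
        }
        return props.isEmpty() ? baseUrl : baseUrl + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        // Assumption: enabling this property is worth testing; it is not used in the original benchmark.
        props.put("rewriteBatchedStatements", "true");
        System.out.println(withProperties("jdbc:mysql://127.0.0.1:3306/mvbox", props));
    }
}
```

The resulting string could replace the URL constant when re-running the batch test to see whether the numbers change.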

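Test results:

Method	Execution time (ms)	Inserts per second
Auto-commit	17437	1176
Transaction commit	22990	909
batchInsert, commit every 10 rows	12646	1666
batchInsert, commit every 50 rows	13758	1538
batchInsert, commit every 100 rows	15870	1333
loadfile, commit every 10 rows	6973	3333
loadfile, commit every 50 rows	5037	4000
loadfile, commit every 100 rows	4175	5000

Note: the inserts-per-second figures were computed by dividing 20000 by the elapsed time truncated to whole seconds (for example, 20000 / 17 = 1176), so they slightly overstate the true rate.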
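
To make the rate column easier to interpret, here is a small sketch that recomputes throughput from the reported times in two ways: with the elapsed time truncated to whole seconds, which matches the published figures, and with full millisecond precision. The class and method names are hypothetical.

```java
// Hypothetical helper for recomputing rows per second from the reported timings.
final class ThroughputSketch {
    // Integer arithmetic: the elapsed time is truncated to whole seconds first.
    // Throws ArithmeticException for runs shorter than one second.
    static long truncatedRate(long rows, long elapsedMillis) {
        return rows / (elapsedMillis / 1000);
    }

    // Floating-point arithmetic on the full millisecond value.
    static double exactRate(long rows, long elapsedMillis) {
        return rows * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        System.out.println(truncatedRate(20000, 17437)); // 1176, as in the auto-commit row
        System.out.println(exactRate(20000, 17437));     // somewhat lower than 1176
    }
}
```

The ranking of the methods is the same either way; only the absolute rates shift.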

http://blog.itpub.net/29254281/viewspace-1841299/

I. The problem we ran into

In standard SQL, we usually write an INSERT statement like the following.

 
    INSERT INTO TBL_TEST (id)
    VALUES (1);

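Obviously this also works in MySQL. But when we need to insert data in bulk, statements like this cause performance problems. For example, inserting 100,000 rows takes 100,000 INSERT statements, and each one has to be sent to the relational engine to be parsed and optimized before it reaches the storage engine, which does the actual insert.

Because of this bottleneck, the official MySQL documentation also mentions batched insertion, that is, inserting multiple values in a single INSERT statement: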

 
    INSERT INTO TBL_TEST (id)
    VALUES (1), (2), (3);

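This does speed up bulk inserts, and the reason is not hard to see. Fewer INSERT statements are sent to the server, so network load drops. Most importantly, although parsing and optimizing each statement appears to take longer, each statement now affects many more rows, so overall performance improves. According to some claims online, this approach can be tens of times faster.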
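
As an illustration of the multi-row form, the following sketch builds the text of such an INSERT with one placeholder group per row, suitable for a prepared statement. It only produces the SQL string; the class name and method signature are hypothetical, and table and column names are assumed to be trusted identifiers.

```java
import java.util.Collections;

// Hypothetical builder for a multi-row INSERT with JDBC-style placeholders.
final class MultiRowInsertSketch {
    static String buildInsert(String table, String[] columns, int rowCount) {
        if (columns.length == 0 || rowCount < 1) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        // One "(?, ?, ...)" group per row, one "?" per column.
        String oneRow = "(" + String.join(", ", Collections.nCopies(columns.length, "?")) + ")";
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES "
                + String.join(", ", Collections.nCopies(rowCount, oneRow));
    }

    public static void main(String[] args) {
        // prints: INSERT INTO TBL_TEST (id) VALUES (?), (?), (?)
        System.out.println(buildInsert("TBL_TEST", new String[] {"id"}, 3));
    }
}
```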

However, I have also seen several other approaches online, such as prepared SQL statements and batched commits. How do these perform? This article compares them.

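II. Test environment and method

My environment is rather poor: basically an outdated virtual machine with only 2 cores and 6 GB of memory. The operating system is SUSE Linux and the MySQL version is 5.6.15.

As you might expect, such a machine yields a very low TPS, so the absolute numbers below are not meaningful. The trends are a different matter: they show how insert performance changes.

Because of the nature of our business, the table we use is very wide, with 195 columns in total. A row with every column filled to capacity (including the varchar columns) is slightly under 4 KB, and a typical record is about 3 KB.

From practical experience we are confident that submitting a large number of INSERT statements in one transaction improves performance substantially. All the tests below therefore commit once every 5,000 inserted records.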
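
The commit policy can be sketched as follows. The original tests used the MySQL C API, so this Java version is only an illustration of the bookkeeping: it lists the row numbers after which a commit would be issued, with a final commit for any remainder. The class and method names are hypothetical, and no database is touched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "commit once every N inserted rows".
final class CommitIntervalSketch {
    // Returns the 1-based row numbers after which a commit would be issued.
    static List<Integer> commitPoints(int totalRows, int commitEvery) {
        if (commitEvery < 1) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        List<Integer> points = new ArrayList<>();
        for (int row = 1; row <= totalRows; row++) {
            // A real loop would execute the INSERT for this row here.
            if (row % commitEvery == 0 || row == totalRows) {
                points.add(row); // A real loop would commit the transaction here.
            }
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(commitPoints(12000, 5000)); // [5000, 10000, 12000]
    }
}
```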

Finally, all the tests below were run through the MySQL C API, using the InnoDB storage engine.

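III. Comparisons

Ideal-case test (1): comparing methods

Goal: find the most suitable insert mechanism under ideal conditions

Key points:

1. Each process/thread inserts in primary-key order

2. Compare different insert methods

3. Compare the effect of different numbers of processes/threads on inserts

* "Plain method" means one INSERT statement inserting a single VALUE.

* "Prepared SQL" means using the prepared-statement functions of the MySQL C API.

* "Multi-value SQL (10 rows)" means one INSERT statement inserting 10 records. Why 10? The later tests show that this gives the best performance.

Conclusion: judging by the trends of the three methods, multi-value SQL (10 rows) is clearly the most efficient.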

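Ideal-case test (2): comparing rows per multi-value statement

As the data volume grows, inserting 10 records per INSERT statement is clearly the most efficient.

Ideal-case test (3): comparing connection counts

Conclusion: performance is highest when the number of connections doing work is twice the number of CPU cores.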
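
One way to apply this finding in a Java client is to size the worker pool from the core count. This is a sketch under assumptions: the original measurement was made with the MySQL C API on a 2-core machine, the factor of two may not carry over to other hardware, and the class and method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sizing rule: two insert workers (each with its own connection) per CPU core.
final class WorkerCountSketch {
    static int workerCount(int cpuCores) {
        return Math.max(1, 2 * cpuCores);
    }

    public static void main(String[] args) {
        int workers = workerCount(Runtime.getRuntime().availableProcessors());
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        // Insert tasks would be submitted here, one database connection per worker.
        pool.shutdown();
        System.out.println(workers);
    }
}
```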

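General test: based on our business workload

Goal: does the best insert mechanism hold up under ordinary transaction conditions?

Key points:

1. Simulate production data (about 3 KB per record)

2. Each thread inserts with primary keys in random order

Clearly, inserting in random primary-key order makes performance drop sharply. This matches what InnoDB's internal implementation would lead us to expect, since rows are stored in a clustered index ordered by primary key. Even so, multi-value SQL (10 rows) is still the best option.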

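Stress test

Goal: does the best insert mechanism hold up under extreme transaction conditions?

Key points:

1. Fill every column of each row (about 4 KB per record)

2. Each thread inserts with primary keys in random order

The results follow the same pattern as before, with an extreme drop in performance. They also suggest that as records grow larger (possibly exceeding the size of one page, since the slot and page header information also take up space), page splits and similar effects occur and performance falls.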

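IV. Conclusions

From the tests above and what we know about InnoDB, we can draw the following conclusions.

• Use a sequential primary-key strategy (for example an auto-increment key, or change the business logic so that records are inserted in primary-key order as far as possible)

• Multi-value inserts (10 rows per statement) are the most suitable

• Keeping the number of processes/threads at about twice the number of CPUs is relatively appropriate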
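
The first two conclusions can be combined in client code by sorting a pending batch by primary key and then cutting it into groups of 10, one group per multi-row INSERT. The sketch below shows only that preparation step; the class and method names are hypothetical, and the keys are assumed to be known before the insert.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical preparation step: order keys, then group them for multi-row INSERT statements.
final class OrderedChunkSketch {
    static List<List<Long>> sortedChunks(List<Long> keys, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Long> sorted = new ArrayList<>(keys);
        Collections.sort(sorted); // ascending primary-key order
        List<List<Long>> chunks = new ArrayList<>();
        for (int from = 0; from < sorted.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, sorted.size());
            chunks.add(new ArrayList<>(sorted.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Long> keys = new ArrayList<>();
        for (long k = 25; k >= 1; k--) {
            keys.add(k); // keys arrive in descending order
        }
        // Each chunk of up to 10 keys would become one multi-row INSERT.
        System.out.println(sortedChunks(keys, 10).size()); // 3
    }
}
```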

http://www.cnblogs.com/aicro/p/3851434.html
