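The core of a batch operation is passing multiple records in a single call and operating on them together. Once you have mastered one of insert, delete, update, and select, the others follow by analogy. Batching is efficient because merging statements reduces the amount of log written (the MySQL binlog and the InnoDB transaction log), which lowers both the volume and the frequency of log flushes to disk. It also reduces the number of times SQL statements are parsed and the network I/O spent transmitting them. However, keep the following points in mind: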
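- SQL statements have a length limit. When merging data into a single statement, make sure it does not exceed that limit. The limit is controlled by max_allowed_packet; the server default was 1 MB in older MySQL versions and is larger in recent ones (for example, 64 MB in MySQL 8.0).
- Keep transactions at a reasonable size, because an oversized transaction can hurt performance. MySQL has the innodb_log_buffer_size setting; once a transaction's log records exceed this value, InnoDB has to write the log to disk before the transaction commits, and efficiency drops. A better practice is to commit the transaction before the data reaches this size.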
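Both limits above argue for splitting a large data set into bounded chunks before handing it to the DAO. Here is a minimal sketch of that idea. The helper class `BatchPartitioner` and the caller-supplied batch size are assumptions for illustration and do not come from the original article. The right batch size depends on row width, max_allowed_packet, and innodb_log_buffer_size, so it should be measured rather than copied from this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not from the original article): splits a list into fixed-size chunks. */
public class BatchPartitioner {

    /**
     * Returns consecutive chunks of at most batchSize elements, in the original order.
     * Each chunk is copied so it stays valid even if the source list changes later.
     */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }
}
```

Each chunk can then be passed to one mapper call and committed on its own, so no single statement or transaction grows with the total size of the input.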
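The article "Using a Thread Pool in Java to Insert or Update Data in Batches" explained how to split data into batches on the Java side; handing the split data to the corresponding DAO-layer methods through a thread pool completes the batch operation. The article "MyBatis Batch Insert: Returning Primary Key Values and the foreach Tag Explained" already covered batch inserts and described the foreach tag in detail, so this article only briefly outlines batch delete, update, and query.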
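First, if a mapper statement sends several semicolon-separated SQL statements in one call, the MySQL connection URL needs &allowMultiQueries=true (the single-statement foreach examples below work without it):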
jdbc:mysql://127.0.0.1:3306/mybank?useUnicode=true&characterEncoding=utf8&allowMultiQueries=true
Batch delete
<delete id="batchDeleteByIds" parameterType="list">
    delete from instance where instance_id in
    <foreach collection="list" item="item" index="index"
             open="(" close=")" separator=",">
        #{item}
    </foreach>
</delete>
Batch update
<update id="updateUpdateTimeByIds" parameterType="map">
    update instance
    set update_time = #{updateTime} where instance_id in
    <foreach collection="idlist" item="uid" index="index"
             open="(" close=")" separator=",">
        #{uid}
    </foreach>
</update>
The usage is essentially the same as before, except that the parameter passed in here is a map, and idlist and updateTime are keys of that map.
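Batch query
<!-- resultType names the element type of the returned List, not the List itself;
     "Instance" assumes a type alias is registered for the Instance class -->
<select id="selectByIds" resultType="Instance" parameterType="map">
    SELECT infos, create_time, update_time FROM instance WHERE instance_id in
    <foreach collection="ids" item="id" index="index" open="(" close=")" separator=",">
        #{id}
    </foreach>
</select>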
The corresponding DAO-layer methods are:
List<Instance> selectByIds(Map<String, Object> map);
void batchDeleteByIds(List<Long> list);
void updateUpdateTimeByIds(Map<String, Object> map);
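The map-based methods only work if the map keys match the names used in the mapper XML: "idlist" and "updateTime" for the update, "ids" for the query. The sketch below shows one way to build those maps. The class `InstanceParams` and its method names are hypothetical and not part of the original article.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not from the original article): builds the parameter maps the mappers expect. */
public class InstanceParams {

    /** Parameters for updateUpdateTimeByIds; the keys must be "idlist" and "updateTime". */
    public static Map<String, Object> forUpdate(List<Long> ids, Date updateTime) {
        Map<String, Object> params = new HashMap<>();
        params.put("idlist", ids);             // matches collection="idlist" in the <foreach>
        params.put("updateTime", updateTime);  // matches #{updateTime} in the set clause
        return params;
    }

    /** Parameters for selectByIds; the key must be "ids". */
    public static Map<String, Object> forSelect(List<Long> ids) {
        Map<String, Object> params = new HashMap<>();
        params.put("ids", ids);                // matches collection="ids" in the <foreach>
        return params;
    }
}
```

A caller would then write something like `instanceDao.updateUpdateTimeByIds(InstanceParams.forUpdate(ids, new Date()))`, where `instanceDao` is an assumed mapper instance. An empty id list leaves the `in` clause without a valid value list, so it is safer to skip the call when there is nothing to process.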
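At first glance this foreach approach looks fine, but in a real project we found that when the table has many columns (20+) and many rows are inserted at once (5000+), the whole insert becomes extremely slow, taking as long as 14 minutes, which is unacceptable. One source makes this remark: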
Of course don't combine ALL of them, if the amount is HUGE. Say you have 1000 rows you need to insert, then don't do it one at a time. You shouldn't equally try to have all 1000 rows in a single query. Instead break it into smaller sizes.
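It stresses that when there are many rows to insert, you should not put all your eggs in one basket, that is, pack everything into a single statement. But why not? Why does such a statement take so long? After looking into it, I found: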
Insert inside MyBatis foreach is not batch, this is a single (could become giant) SQL statement and that brings drawbacks:
Iteration over the collection must not be done in the mybatis XML. Just execute a simple Insert statement in a Java Foreach loop. The most important thing is the session Executor type.
SqlSession session = sessionFactory.openSession(ExecutorType.BATCH);
for (Model model : list) {
    session.insert("insertStatement", model);
}
Unlike default ExecutorType.SIMPLE, the statement will be prepared once and executed for each record to insert.
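To make "prepared once and executed for each record" concrete, the sketch below shows the same pattern in plain JDBC, which is the mechanism ExecutorType.BATCH builds on. It is an illustration only: the class name `JdbcBatchSketch`, the insert statement, and the flush interval are assumptions, and the column names are borrowed from the instance table used earlier.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Illustrative only: plain-JDBC batching; names and flush interval are assumptions. */
public class JdbcBatchSketch {

    // Assumed statement; column names follow the instance table used in this article.
    static final String SQL = "insert into instance (instance_id, infos) values (?, ?)";

    /**
     * Queues one insert per id and sends the queue every flushEvery rows,
     * committing after each send so no transaction grows without bound.
     * Returns the number of batches sent. Rollback handling is omitted for brevity.
     */
    public static int insertAll(Connection conn, List<Long> ids, String infos, int flushEvery)
            throws SQLException {
        if (flushEvery <= 0) {
            throw new IllegalArgumentException("flushEvery must be positive");
        }
        int flushes = 0;
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(SQL)) { // prepared once
            int pending = 0;
            for (Long id : ids) {
                ps.setLong(1, id);
                ps.setString(2, infos);
                ps.addBatch();            // queued on the statement, not yet executed
                if (++pending == flushEvery) {
                    ps.executeBatch();    // execute the queued rows together
                    conn.commit();
                    flushes++;
                    pending = 0;
                }
            }
            if (pending > 0) {            // send the final partial batch
                ps.executeBatch();
                conn.commit();
                flushes++;
            }
        }
        return flushes;
    }
}
```

The statement text stays small and constant regardless of how many rows are inserted, which is the main difference from a foreach-generated statement that grows with the input.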
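Although the MyBatis website recommends inserting with ExecutorType.BATCH because it performs better, the SQL in that approach is written in Java, which becomes hard to maintain when the SQL is complex. For that reason, this article covers only the common foreach-tag approach in detail. Below is a Batch Insert example borrowed from the MyBatis website.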
A batch insert is a collection of statements that can be used to execute a JDBC batch. A batch is the preferred method of doing bulk inserts with JDBC. The basic idea is that you configure the connection for a batch insert, then execute the same statement multiple times, with different values for each inserted record. MyBatis has a nice abstraction of JDBC batches that works well with statements generated from this library. A batch insert looks like this:
...
SqlSession session = sqlSessionFactory.openSession(ExecutorType.BATCH);
try {
    SimpleTableMapper mapper = session.getMapper(SimpleTableMapper.class);
    List<SimpleTableRecord> records = getRecordsToInsert(); // not shown

    BatchInsert<SimpleTableRecord> batchInsert = insert(records)
            .into(simpleTable)
            .map(id).toProperty("id")
            .map(firstName).toProperty("firstName")
            .map(lastName).toProperty("lastName")
            .map(birthDate).toProperty("birthDate")
            .map(employed).toProperty("employed")
            .map(occupation).toProperty("occupation")
            .build()
            .render(RenderingStrategy.MYBATIS3);

    batchInsert.insertStatements().stream().forEach(mapper::insert);

    session.commit();
} finally {
    session.close();
}
...
Reference