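Symptom
While using Debezium to extract a table on the order of ten million rows, I found that during the snapshot the sync rate was only about 10,000 rows every 2 s, and young GC messages were being logged at the same time.
Root cause analysis
Network
First rule out network latency: pinging the target host showed a latency of 0.1 ms.
Debezium itself
Looking at the SnapshotReader source: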
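To put the observed rate in perspective, here is a rough back-of-the-envelope estimate. It is only a sketch: the post does not give the exact table size, so the 10,000,000-row figure is an assumption, and the class and method names are made up for illustration.

```java
import java.time.Duration;

final class SnapshotEstimate {

    /** Estimates how long a scan of totalRows takes at batchRows per batchSeconds. */
    static Duration estimate(long totalRows, long batchRows, double batchSeconds) {
        double rowsPerSecond = batchRows / batchSeconds;
        long seconds = Math.round(totalRows / rowsPerSecond);
        return Duration.ofSeconds(seconds);
    }

    public static void main(String[] args) {
        // Assumption: exactly 10,000,000 rows; the post only says "ten-million scale".
        long assumedRows = 10_000_000L;
        // Observed rate from the log: 10,000 rows every 2 s.
        Duration d = estimate(assumedRows, 10_000L, 2.0);
        System.out.println(d.getSeconds() + " s (about " + d.toMinutes() + " min)");
    }
}
```

At 5,000 rows per second, a table of that assumed size would need roughly half an hour for the scan alone.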
// Scan the rows in the table ...
long start = clock.currentTimeInMillis();
logger.info("Step {}: - scanning table '{}' ({} of {} tables)", step, tableId, ++counter, capturedTableIds.size());

Map<TableId, String> selectOverrides = context.getConnectorConfig().getSnapshotSelectOverridesByTable();

String selectStatement = selectOverrides.getOrDefault(tableId, "SELECT * FROM " + quote(tableId));
logger.info("For table '{}' using select statement: '{}'", tableId, selectStatement);
sql.set(selectStatement);

try {
    int stepNum = step;
    mysql.query(sql.get(), statementFactory, rs -> {
        try {
            // The table is included in the connector's filters, so process all of the table records
            // ...
            final Table table = schema.tableFor(tableId);
            final int numColumns = table.columns().size();
            final Object[] row = new Object[numColumns];
            while (rs.next()) {
                for (int i = 0, j = 1; i != numColumns; ++i, ++j) {
                    Column actualColumn = table.columns().get(i);
                    row[i] = readField(rs, j, actualColumn, table);
                }
                recorder.recordRow(recordMaker, row, clock.currentTimeAsInstant()); // has no row number!
                rowNum.incrementAndGet();
                if (rowNum.get() % 100 == 0 && !isRunning()) {
                    // We've stopped running ...
                    break;
                }
                if (rowNum.get() % 10_000 == 0) {
                    if (logger.isInfoEnabled()) {
                        long stop = clock.currentTimeInMillis();
                        logger.info("Step {}: - {} of {} rows scanned from table '{}' after {}",
                                stepNum, rowNum, rowCountStr, tableId, Strings.duration(stop - start));
                    }
                    metrics.rowsScanned(tableId, rowNum.get());
                }
            }
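It turns out that by default it runs a select * against the table, then counts the entire table in memory, and only after that iterates over the rows and sends the data.
Solution
Searching the official documentation for snapshot settings turned up a parameter, min.row.count.to.stream.results: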
https://debezium.io/documentation/reference/1.4/connectors/mysql.html#mysql-property-min-row-count-to-stream-results
After setting this parameter, the log no longer showed the select *; rows were sent directly in batches of 10,000, and the speed improved 20-fold.
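For reference, a minimal sketch of how the property might be set when building the connector configuration in Java. The post does not say which value was used; the value 0 below is an assumption, based on my reading of the linked documentation that 0 skips the table-size check and always streams results. The class and method names are hypothetical, and connection settings (host, user, and so on) are omitted.

```java
import java.util.Properties;

final class SnapshotConfigSketch {

    /** Builds a partial connector configuration; connection settings are left out. */
    static Properties connectorProperties() {
        Properties props = new Properties();
        props.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
        // Assumption: "0" is illustrative, not the value from the post.
        // It is meant to make the snapshot always stream results.
        props.setProperty("min.row.count.to.stream.results", "0");
        return props;
    }

    public static void main(String[] args) {
        // Print the resulting key=value pairs, e.g. to paste into a properties file.
        connectorProperties().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The same key and value can go directly into the JSON or properties file used to register the connector; the Java form is just a convenient way to show it.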