A project required parsing a JSON file generated from a huge dataset and writing it into a database. I used Alibaba's fastjson, ran into a GC problem along the way, and recorded it here:
The dataset is about 700,000 records and the file is roughly 3–4 GB. Following the Stream API example 3 recommended in the official fastjson docs, memory usage began to climb sharply once about 300,000 records had been read, CPU quickly hit 100%, and at around 400,000 records GC set in.
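For reference, the pattern from the official wiki's Stream API example 3 is roughly the following (this is my paraphrase, not a verbatim copy; VO stands in for a bean mapping one array element):

import com.alibaba.fastjson.JSONReader;
import java.io.FileReader;

JSONReader reader = new JSONReader(new FileReader("huge.json"));
reader.startArray();
while (reader.hasNext()) {
    VO vo = reader.readObject(VO.class); // one array element per call
    // handle vo ...
}
reader.endArray();
reader.close();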
The code was as follows:
import com.alibaba.fastjson.JSONObject;
import com.alibaba.fastjson.JSONReader;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
import org.springframework.jdbc.core.namedparam.SqlParameterSourceUtils;
import org.springframework.stereotype.Component;

import java.io.*;
import java.util.*;

@Component
@Slf4j
public class EnterDatabaseUtils {

    @Autowired
    private NamedParameterJdbcTemplate namedParameterJdbcTemplate;

    private final int batchTotal = 50000;

    public boolean enterData(String databaseName, String tableName, File file, String[] fields) {
        try {
            JSONReader reader = new JSONReader(
                    new InputStreamReader(new FileInputStream(file.getAbsoluteFile()), "UTF-8"));
            String insertSql = "INSERT INTO `" + databaseName + "`.`" + tableName + "`"
                    + " (`" + StringUtils.join(fields, "`,`") + "`)"
                    + " VALUES(:" + StringUtils.join(fields, ",:") + ")";
            long count = 1;
            ArrayList<Map<String, Object>> recordList = new ArrayList<>();
            reader.startArray();
            while (reader.hasNext()) {
                // Read one whole array element as a JSONObject -- this is the
                // call that ends up exhausting memory.
                JSONObject record = reader.readObject(JSONObject.class);
                if (count <= batchTotal) {
                    recordList.add(record);
                    count++;
                }
                if (batchTotal + 1 == count) {
                    namedParameterJdbcTemplate.batchUpdate(insertSql,
                            SqlParameterSourceUtils.createBatch(recordList));
                    count = 1;
                    recordList.clear();
                }
            }
            if (recordList.size() > 0) {
                namedParameterJdbcTemplate.batchUpdate(insertSql,
                        SqlParameterSourceUtils.createBatch(recordList));
                recordList.clear();
            }
            reader.endArray();
            reader.close();
            return true;
        } catch (Exception e) {
            log.error(databaseName + "." + tableName + ": insert failed");
            log.error("", e);
            return false;
        }
    }
}
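As a quick standalone check of what that insertSql assembly produces, here is a small demo; databaseName "demo", tableName "user", and fields {"id", "name"} are made-up values for illustration only:

import org.apache.commons.lang3.StringUtils;

public class SqlBuildDemo {
    public static void main(String[] args) {
        String databaseName = "demo";   // hypothetical values
        String tableName = "user";
        String[] fields = {"id", "name"};
        String insertSql = "INSERT INTO `" + databaseName + "`.`" + tableName + "`"
                + " (`" + StringUtils.join(fields, "`,`") + "`)"
                + " VALUES(:" + StringUtils.join(fields, ",:") + ")";
        // prints: INSERT INTO `demo`.`user` (`id`,`name`) VALUES(:id,:name)
        System.out.println(insertSql);
    }
}

Each map in recordList then supplies the :id and :name named parameters for one row of the batch.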
Test code:
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

import java.io.File;

@RunWith(SpringRunner.class)
@SpringBootTest
public class EnterDatabaseUtilsTest {

    @Autowired
    private EnterDatabaseUtils enterDatabaseUtils;

    @Test
    public void testEnterDatabase() {
        File file = new File("/xxx/xxx/xxx.json");
        String[] fields = {........};
        boolean res = enterDatabaseUtils.enterData("xxxx", "xxxx", file, fields);
    }
}
At first I suspected that namedParameterJdbcTemplate was behind the runaway memory usage. But after deleting all database-related operations and keeping only the JSON-reading code, memory still ballooned and triggered GC.
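A minimal sketch of that stripped-down check (same placeholder file path as the test above; the progress counter is my own addition for illustration):

import com.alibaba.fastjson.JSONObject;
import com.alibaba.fastjson.JSONReader;
import java.io.*;

public class ReadOnlyCheck {
    public static void main(String[] args) throws Exception {
        File file = new File("/xxx/xxx/xxx.json");
        JSONReader reader = new JSONReader(
                new InputStreamReader(new FileInputStream(file), "UTF-8"));
        reader.startArray();
        long count = 0;
        while (reader.hasNext()) {
            // No database work at all -- just parse and discard.
            JSONObject record = reader.readObject(JSONObject.class);
            count++;
            if (count % 100000 == 0) {
                System.out.println("parsed " + count + " records"); // memory still climbs
            }
        }
        reader.endArray();
        reader.close();
    }
}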
So I suspected fastjson was being used incorrectly. After reading a pile of articles, I finally found the answer in the post "Json少量數據解析": calling readObject directly on each element keeps consuming memory!
I changed the code to use startObject and parse each element's keys and values individually; memory and CPU usage stayed flat with no growth, and the problem was solved.
The improved code is as follows:
import com.alibaba.fastjson.JSONReader;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
import org.springframework.jdbc.core.namedparam.SqlParameterSourceUtils;
import org.springframework.stereotype.Component;

import java.io.*;
import java.util.*;

@Component
@Slf4j
public class EnterDatabaseUtils {

    @Autowired
    private NamedParameterJdbcTemplate namedParameterJdbcTemplate;

    private final int batchTotal = 50000;

    public boolean enterData(String databaseName, String tableName, File file, String[] fields) {
        try {
            JSONReader reader = new JSONReader(
                    new InputStreamReader(new FileInputStream(file.getAbsoluteFile()), "UTF-8"));
            String insertSql = "INSERT INTO `" + databaseName + "`.`" + tableName + "`"
                    + " (`" + StringUtils.join(fields, "`,`") + "`)"
                    + " VALUES(:" + StringUtils.join(fields, ",:") + ")";
            long count = 1;
            ArrayList<Map<String, Object>> recordList = new ArrayList<>();
            reader.startArray();
            while (reader.hasNext()) {
                // Parse each element token by token instead of readObject(JSONObject.class).
                // The map must be allocated per element; reusing a single map would make
                // every entry in recordList point at the same object.
                Map<String, Object> record = new HashMap<>();
                reader.startObject();
                while (reader.hasNext()) {
                    record.put(reader.readString(), reader.readObject());
                }
                reader.endObject();
                if (count <= batchTotal) {
                    recordList.add(record);
                    count++;
                }
                if (batchTotal + 1 == count) {
                    namedParameterJdbcTemplate.batchUpdate(insertSql,
                            SqlParameterSourceUtils.createBatch(recordList));
                    count = 1;
                    recordList.clear();
                }
            }
            if (recordList.size() > 0) {
                namedParameterJdbcTemplate.batchUpdate(insertSql,
                        SqlParameterSourceUtils.createBatch(recordList));
                recordList.clear();
            }
            reader.endArray();
            reader.close();
            return true;
        } catch (Exception e) {
            log.error(databaseName + "." + tableName + ": insert failed");
            log.error("", e);
            return false;
        }
    }
}
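To see the key/value pattern in isolation, here is a tiny self-contained demo (the input string and class name are mine, for illustration only):

import com.alibaba.fastjson.JSONReader;
import java.io.StringReader;

public class StreamParseDemo {
    public static void main(String[] args) {
        String json = "[{\"id\":1,\"name\":\"a\"},{\"id\":2,\"name\":\"b\"}]";
        JSONReader reader = new JSONReader(new StringReader(json));
        reader.startArray();                        // consume the opening '['
        while (reader.hasNext()) {
            reader.startObject();                   // consume the opening '{'
            while (reader.hasNext()) {
                String key = reader.readString();   // field name
                Object value = reader.readObject(); // field value
                System.out.println(key + " = " + value);
            }
            reader.endObject();                     // consume the closing '}'
        }
        reader.endArray();                          // consume the closing ']'
        reader.close();
    }
}

With this element-by-element reading, only the current batch of records is ever held in memory, which matches the stable memory and CPU behavior described above.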