Source: https://blog.csdn.net/qq_18663253/article/details/102666830
Parsing a Very Large JSON File
1. The Problem
A recent project required importing a JSON file larger than 50 GB into Elasticsearch. Both plain line-by-line reading and streaming the file with JSONReader were tried, but the file is simply too large for either approach to work.
2. The Solution
The data to be parsed has the following structure:
{"nameList":[{"name":"zhangsan"},{"name":"lisi"}],"ageList":[{"age1":"18"},{"age2":"12"}],"list":[{"a":"xxx","b":"zzz"}]}
The structure is simple, but each JSON array contains so many objects that loading them into memory via streaming or line-by-line reading causes an OutOfMemoryError.
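For contrast, here is a minimal sketch of the whole-document approach that exhausts the heap on a 50 GB input: `readTree` materializes the entire document as one tree in memory. (This sketch assumes Jackson 2.x, `com.fasterxml.jackson`, is on the classpath; the class name `NaiveLoad` is illustrative.)

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;

public class NaiveLoad {
    // readTree builds the ENTIRE document as a JsonNode tree on the heap,
    // so every element of nameList/ageList is resident at once. Fine for
    // small inputs, fatal for a 50 GB file.
    static JsonNode loadAll(String json) throws IOException {
        return new ObjectMapper().readTree(json);
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"nameList\":[{\"name\":\"zhangsan\"},{\"name\":\"lisi\"}]}";
        JsonNode root = loadAll(json);
        System.out.println(root.get("nameList").size()); // prints 2
    }
}
```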
We finally settled on a JsonToken-based solution.
```java
import org.codehaus.jackson.*;
import org.codehaus.jackson.map.*;

import java.io.File;

public class ParseJsonSample {
    public static void main(String[] args) throws Exception {
        JsonFactory f = new MappingJsonFactory();
        JsonParser jp = f.createJsonParser(new File(args[0]));
        JsonToken current = jp.nextToken();
        if (current != JsonToken.START_OBJECT) {
            System.out.println("Error: root should be object: quitting.");
            return;
        }
        while (jp.nextToken() != JsonToken.END_OBJECT) {
            String fieldName = jp.getCurrentName();
            // move from field name to field value
            current = jp.nextToken();
            if (fieldName.equals("records")) {
                if (current == JsonToken.START_ARRAY) {
                    // For each of the records in the array
                    while (jp.nextToken() != JsonToken.END_ARRAY) {
                        // read the record into a tree model,
                        // this moves the parsing position to the end of it
                        JsonNode node = jp.readValueAsTree();
                        // And now we have random access to everything in the object
                        System.out.println("field1: " + node.get("field1").getValueAsText());
                        System.out.println("field2: " + node.get("field2").getValueAsText());
                    }
                } else {
                    System.out.println("Error: records should be an array: skipping.");
                    jp.skipChildren();
                }
            } else {
                System.out.println("Unprocessed property: " + fieldName);
                jp.skipChildren();
            }
        }
    }
}
```
The code combines streaming with tree-model parsing to read the file: each individual record is read as a tree, but the file as a whole is never loaded into memory, so the JVM heap does not blow up. This finally solved the problem of reading the very large file.
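Note that the example above looks for a field named `records`, which does not appear in the sample structure shown earlier. Below is a sketch of the same streaming-plus-tree technique adapted to the `nameList` field of that sample, ported to the Jackson 2.x API (`com.fasterxml.jackson`); the class `StreamNameList` and its helper method are our own illustrative names.

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.MappingJsonFactory;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class StreamNameList {

    /** Collect every "name" value from the top-level "nameList" array,
     *  reading one small element tree at a time instead of the whole file. */
    static List<String> readNames(JsonParser jp) throws IOException {
        List<String> names = new ArrayList<>();
        if (jp.nextToken() != JsonToken.START_OBJECT) {
            throw new IOException("root should be an object");
        }
        while (jp.nextToken() != JsonToken.END_OBJECT) {
            String field = jp.getCurrentName();
            JsonToken value = jp.nextToken(); // move from field name to value
            if ("nameList".equals(field) && value == JsonToken.START_ARRAY) {
                while (jp.nextToken() != JsonToken.END_ARRAY) {
                    // one tiny tree per array element; the parser then
                    // advances past it, so memory stays bounded
                    JsonNode node = jp.readValueAsTree();
                    names.add(node.get("name").asText());
                }
            } else {
                jp.skipChildren(); // ageList, list, anything else
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"nameList\":[{\"name\":\"zhangsan\"},{\"name\":\"lisi\"}],"
                + "\"ageList\":[{\"age1\":\"18\"},{\"age2\":\"12\"}],"
                + "\"list\":[{\"a\":\"xxx\",\"b\":\"zzz\"}]}";
        try (JsonParser jp = new MappingJsonFactory().createParser(json)) {
            System.out.println(readNames(jp)); // prints [zhangsan, lisi]
        }
    }
}
```

For the real 50 GB file, the parser would be created from a `File` (`createParser(new File(...))`) rather than a `String`; everything else stays the same.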
