Source: https://blog.csdn.net/qq_18663253/article/details/102666830
Parsing a Very Large JSON File
1. The Problem
A recent project required importing a JSON file larger than 50 GB into Elasticsearch. Both plain line-by-line reading and streaming the file with JSONReader were tried, but the file is simply too large for either approach to work.
2. The Solution
The data to be parsed has the following structure:
{"nameList":[{"name":"zhangsan"},{"name":"lisi"}],"ageList":[{"age1":"18"},{"age2":"12"}],"list":[{"a":"xxx","b":"zzz"}]}
The structure is simple, but each JSON array contains so many objects that loading them into memory via streaming or line-by-line reading causes an OutOfMemoryError.
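For contrast, here is a minimal sketch of the whole-document approach that exhausts the heap on a 50 GB input: `readTree` materializes the entire document as one tree in memory. (This sketch assumes Jackson 2.x, `com.fasterxml.jackson`, is on the classpath; the class name `NaiveLoad` is illustrative.)

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;

public class NaiveLoad {
    // readTree builds the ENTIRE document as a JsonNode tree on the heap,
    // so every element of nameList/ageList is resident at once. Fine for
    // small inputs, fatal for a 50 GB file.
    static JsonNode loadAll(String json) throws IOException {
        return new ObjectMapper().readTree(json);
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"nameList\":[{\"name\":\"zhangsan\"},{\"name\":\"lisi\"}]}";
        JsonNode root = loadAll(json);
        System.out.println(root.get("nameList").size()); // prints 2
    }
}
```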
We finally settled on a JsonToken-based solution.
```java
import org.codehaus.jackson.*;
import org.codehaus.jackson.map.*;

import java.io.File;

public class ParseJsonSample {
    public static void main(String[] args) throws Exception {
        JsonFactory f = new MappingJsonFactory();
        JsonParser jp = f.createJsonParser(new File(args[0]));
        JsonToken current = jp.nextToken();
        if (current != JsonToken.START_OBJECT) {
            System.out.println("Error: root should be object: quitting.");
            return;
        }
        while (jp.nextToken() != JsonToken.END_OBJECT) {
            String fieldName = jp.getCurrentName();
            // move from field name to field value
            current = jp.nextToken();
            if (fieldName.equals("records")) {
                if (current == JsonToken.START_ARRAY) {
                    // For each of the records in the array
                    while (jp.nextToken() != JsonToken.END_ARRAY) {
                        // read the record into a tree model,
                        // this moves the parsing position to the end of it
                        JsonNode node = jp.readValueAsTree();
                        // And now we have random access to everything in the object
                        System.out.println("field1: " + node.get("field1").getValueAsText());
                        System.out.println("field2: " + node.get("field2").getValueAsText());
                    }
                } else {
                    System.out.println("Error: records should be an array: skipping.");
                    jp.skipChildren();
                }
            } else {
                System.out.println("Unprocessed property: " + fieldName);
                jp.skipChildren();
            }
        }
    }
}
```
The code combines streaming with tree-model parsing to read the file: each individual record is read as a tree, but the file as a whole is never loaded into memory, so the JVM heap does not blow up. This finally solved the problem of reading the very large file.
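Note that the example above looks for a field named `records`, which does not appear in the sample structure shown earlier. Below is a sketch of the same streaming-plus-tree technique adapted to the `nameList` field of that sample, ported to the Jackson 2.x API (`com.fasterxml.jackson`); the class `StreamNameList` and its helper method are our own illustrative names.

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.MappingJsonFactory;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class StreamNameList {

    /** Collect every "name" value from the top-level "nameList" array,
     *  reading one small element tree at a time instead of the whole file. */
    static List<String> readNames(JsonParser jp) throws IOException {
        List<String> names = new ArrayList<>();
        if (jp.nextToken() != JsonToken.START_OBJECT) {
            throw new IOException("root should be an object");
        }
        while (jp.nextToken() != JsonToken.END_OBJECT) {
            String field = jp.getCurrentName();
            JsonToken value = jp.nextToken(); // move from field name to value
            if ("nameList".equals(field) && value == JsonToken.START_ARRAY) {
                while (jp.nextToken() != JsonToken.END_ARRAY) {
                    // one tiny tree per array element; the parser then
                    // advances past it, so memory stays bounded
                    JsonNode node = jp.readValueAsTree();
                    names.add(node.get("name").asText());
                }
            } else {
                jp.skipChildren(); // ageList, list, anything else
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"nameList\":[{\"name\":\"zhangsan\"},{\"name\":\"lisi\"}],"
                + "\"ageList\":[{\"age1\":\"18\"},{\"age2\":\"12\"}],"
                + "\"list\":[{\"a\":\"xxx\",\"b\":\"zzz\"}]}";
        try (JsonParser jp = new MappingJsonFactory().createParser(json)) {
            System.out.println(readNames(jp)); // prints [zhangsan, lisi]
        }
    }
}
```

For the real 50 GB file, the parser would be created from a `File` (`createParser(new File(...))`) rather than a `String`; everything else stays the same.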
