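1. Introduction
Avro's serialization API comes in two main forms: SpecificDatumWriter / SpecificDatumReader, and DataFileWriter / DataFileReader. The latter wraps the former: a DataFileWriter is built on top of a DatumWriter, and a DataFileReader on top of a DatumReader. The characteristics of each are described below.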
2. SpecificDatumWriter / SpecificDatumReader
2.1 Serializing with SpecificDatumWriter
Serializing one or more records with SpecificDatumWriter:
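```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificDatumWriter;

public static ByteArrayOutputStream serializePrimary(Schema schema, List<GenericRecord> records) throws IOException {
    // SpecificDatumWriter extends GenericDatumWriter, so it also accepts GenericRecord
    DatumWriter<GenericRecord> datumWriter = new SpecificDatumWriter<>(schema);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);

    // Call write() once per record; the records are appended back to back
    for (GenericRecord record : records) {
        datumWriter.write(record, encoder);
    }
    // The binary encoder buffers its output; flush so every byte reaches `out`
    encoder.flush();
    return out;
}
```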
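What does `datumWriter.write` actually put into `out`? The sketch below hand-encodes a hypothetical `User` record (`name: string`, then `age: int`) following the binary encoding rules in the Avro specification: ints and longs as zig-zag varints, strings as a length followed by UTF-8 bytes, and record fields simply concatenated in schema order. It uses only the JDK; the schema and the names `AvroBinarySketch`, `writeLong`, `writeString` and `writeUser` are assumptions for illustration, not Avro API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hand-written illustration of Avro binary encoding (NOT the Avro library) for a
// hypothetical schema: {"type":"record","name":"User","fields":[
//   {"name":"name","type":"string"},{"name":"age","type":"int"}]}
final class AvroBinarySketch {

    // int/long: zig-zag mapping, then base-128 varint (7 bits per byte, low bits first)
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // more bytes follow
            z >>>= 7;
        }
        out.write((int) z); // last byte, high bit clear
    }

    // string: byte length encoded as a long, then the UTF-8 bytes
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // record: fields in schema order; no field names, no schema, no separators
    static void writeUser(ByteArrayOutputStream out, String name, int age) {
        writeString(out, name);
        writeLong(out, age);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUser(out, "Alice", 30);
        writeUser(out, "Bob", -1);
        StringBuilder hex = new StringBuilder();
        for (byte b : out.toByteArray()) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim());
    }
}
```

Running `main` prints `0a 41 6c 69 63 65 3c 06 42 6f 62 01`: two records back to back, with no field names and no schema anywhere in the bytes.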
2.2 Deserializing with SpecificDatumReader
Deserializing one or more records with SpecificDatumReader:
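```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.specific.SpecificDatumReader;

public static List<GenericRecord> deserializeMulPrimary(Schema schema, ByteArrayOutputStream out) throws IOException {
    // The bytes carry no schema, so the schema used for writing must be supplied here
    DatumReader<GenericRecord> datumReader = new SpecificDatumReader<>(schema);
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);

    List<GenericRecord> records = new ArrayList<>();
    // Stop once the end of the byte array is reached, instead of waiting for an EOFException
    while (!decoder.isEnd()) {
        records.add(datumReader.read(null, decoder));
    }
    return records;
}
```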
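Reading works in reverse, and it only works because the reader already knows the field order and types. Here is a minimal JDK-only sketch, assuming the same hypothetical `User` schema (`name: string`, then `age: int`) laid out per the Avro specification's binary encoding; `AvroBinaryReadSketch` and `readUsers` are illustrative names, not Avro API. Like the loop in `deserializeMulPrimary`, it keeps decoding records until the input is exhausted.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration (NOT the Avro library) of decoding back-to-back records
// for a hypothetical schema User { name: string, age: int }.
final class AvroBinaryReadSketch {

    // Inverse of the zig-zag varint: gather 7 bits per byte until the high bit is clear
    static long readLong(ByteBuffer in) {
        long z = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1); // undo the zig-zag mapping
    }

    // string: length as a long, then that many UTF-8 bytes
    static String readString(ByteBuffer in) {
        byte[] utf8 = new byte[(int) readLong(in)];
        in.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // The field order and types come from the schema; the bytes themselves do not say
    static List<String> readUsers(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        List<String> users = new ArrayList<>();
        while (in.hasRemaining()) {
            String name = readString(in);
            int age = (int) readLong(in);
            users.add(name + ":" + age);
        }
        return users;
    }
}
```

For the 12 bytes `0a 41 6c 69 63 65 3c 06 42 6f 62 01`, `readUsers` returns `[Alice:30, Bob:-1]`. Decoding the same bytes under a different assumed schema would misinterpret them, which is why the schema passed to `SpecificDatumReader` must be the one used for writing.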
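2.3 Characteristics
a The serialized output contains no schema information
b Deserialization requires the schema, and it must be the schema the data was written with (the serialized records do not carry it)
c Mainly used with in-memory byte streams; the encoder can write to any OutputStream, but there is no file format around the records
d One or more records can be serialized and deserialized; the records are simply written back to back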
3. DataFileWriter / DataFileReader
3.1 Serializing with DataFileWriter
Serializing data to memory:
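```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;

public static ByteArrayOutputStream serializeToMemory(Schema schema, List<GenericRecord> records) throws IOException {
    DatumWriter<GenericRecord> datumWriter = new SpecificDatumWriter<>(schema);
    ByteArrayOutputStream out = new ByteArrayOutputStream();

    // close() flushes everything; closing a ByteArrayOutputStream has no effect, so `out` stays readable
    try (DataFileWriter<GenericRecord> fileWriter = new DataFileWriter<>(datumWriter)) {
        // First write the header, including the schema, into memory
        fileWriter.create(schema, out);
        // Then append the GenericRecord records
        for (GenericRecord record : records) {
            fileWriter.append(record);
        }
    }
    return out;
}
```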
Serializing data to an .avro file:
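```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.UUID;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;

public static File serializeToFile(Schema schema, List<GenericRecord> records, String fileDirectory) throws IOException {
    DatumWriter<GenericRecord> datumWriter = new SpecificDatumWriter<>(schema);
    // Unique file name: <timestamp>-<random UUID>.avro inside fileDirectory
    File file = new File(fileDirectory, System.currentTimeMillis() + "-" + UUID.randomUUID() + ".avro");

    try (DataFileWriter<GenericRecord> fileWriter = new DataFileWriter<>(datumWriter)) {
        // First write the header, including the schema, to the file
        fileWriter.create(schema, file);
        // Then append the GenericRecord records
        for (GenericRecord record : records) {
            fileWriter.append(record);
        }
    }
    return file;
}
```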
3.2 Deserializing with DataFileReader
Deserializing one or more records from memory:
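```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.SeekableByteArrayInput;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.specific.SpecificDatumReader;

public static List<GenericRecord> deserializeFromMemory(ByteArrayOutputStream out) throws IOException {
    // No schema argument: DataFileReader takes the writer's schema from the header
    DatumReader<GenericRecord> datumReader = new SpecificDatumReader<>();
    SeekableByteArrayInput sin = new SeekableByteArrayInput(out.toByteArray());

    List<GenericRecord> records = new ArrayList<>();
    try (DataFileReader<GenericRecord> fileReader = new DataFileReader<>(sin, datumReader)) {
        while (fileReader.hasNext()) {
            records.add(fileReader.next());
        }
    }
    return records;
}
```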
Deserializing one or more records from an .avro file:
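```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.specific.SpecificDatumReader;

public static List<GenericRecord> deserializeFromFile(File file) throws IOException {
    // No schema argument: the schema is read from the file header
    DatumReader<GenericRecord> datumReader = new SpecificDatumReader<>();

    List<GenericRecord> records = new ArrayList<>();
    try (DataFileReader<GenericRecord> fileReader = new DataFileReader<>(file, datumReader)) {
        while (fileReader.hasNext()) {
            records.add(fileReader.next());
        }
    }
    return records;
}
```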
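3.3 Characteristics
a The serialized output contains the schema
b Deserialization no longer needs a schema to be supplied, because the writer's schema is stored in the header of the serialized content
c Either memory or a file can serve as the storage medium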
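Point a can be checked against real output. The Avro specification defines the container header as the magic bytes `O`, `b`, `j`, `0x01`, then a metadata map (encoded like an Avro `map<bytes>`) whose `avro.schema` entry holds the schema JSON, then a 16-byte sync marker. The JDK-only sketch below reads that metadata map; `AvroHeaderSketch` and `readMetadata` are hypothetical names, not part of Avro, and decoding every metadata value as UTF-8 text is an assumption that suits the standard `avro.schema` and `avro.codec` keys.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hand-written illustration (NOT the Avro library): read the metadata map from the
// header of an Avro object container file. Layout per the Avro specification:
//   'O' 'b' 'j' 0x01 | metadata as map<bytes> | 16-byte sync marker
final class AvroHeaderSketch {

    // Zig-zag varint long, as used throughout Avro's binary encoding
    static long readLong(ByteBuffer in) {
        long z = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    // bytes/string: length as a long, then the raw bytes
    static byte[] readBytes(ByteBuffer in) {
        byte[] b = new byte[(int) readLong(in)];
        in.get(b);
        return b;
    }

    static Map<String, String> readMetadata(byte[] file) {
        ByteBuffer in = ByteBuffer.wrap(file);
        byte[] magic = new byte[4];
        in.get(magic);
        if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
            throw new IllegalArgumentException("not an Avro object container file");
        }
        Map<String, String> meta = new LinkedHashMap<>();
        // A map is a series of blocks, each starting with an item count; count 0 ends the map
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {   // negative count: absolute value, followed by the block's byte size
                count = -count;
                readLong(in);  // byte size is not needed here
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                meta.put(key, new String(readBytes(in), StandardCharsets.UTF_8)); // assumes text values
            }
        }
        return meta; // the 16-byte sync marker that follows is not read
    }
}
```

Passing it `out.toByteArray()` from `serializeToMemory`, or the bytes of a file written by `serializeToFile`, should yield the schema JSON under the `avro.schema` key, which is exactly what lets DataFileReader work without a schema argument.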