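Avro offers two ways to serialize and deserialize data: a specific approach that generates Java classes from a schema file, and a generic approach that needs no generated code.
The following simple example demonstrates both:
1. Configure the pom file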
<dependencies>
    <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro</artifactId>
        <version>1.9.1</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-maven-plugin</artifactId>
            <version>1.9.1</version>
            <executions>
                <execution>
                    <phase>generate-sources</phase>
                    <goals>
                        <goal>schema</goal>
                    </goals>
                    <configuration>
                        <sourceDirectory>${project.basedir}/src/main/resources/</sourceDirectory>
                        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                    </configuration>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
    </plugins>
</build>
2. Define a schema file, person.avsc, that describes the structure of the data to be serialized
{
  "namespace": "com.zpark",
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "name", "type": "string"},
    {"name": "age", "type": ["int", "null"]}
  ]
}
The schema file is written using the data types that Avro provides; they are described on the official site at http://avro.apache.org/docs/current/spec.html
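To make the role of these data types more concrete, here is a minimal sketch of how a single Person datum would be laid out in Avro's binary encoding, as I understand it from the specification linked above. It uses only the JDK and is not the Avro library's implementation; the class name AvroEncodingSketch and all method names are made up for illustration. It covers only the raw datum bytes, not the container file format (header, embedded schema, sync markers) that DataFileWriter produces.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Avro.
public class AvroEncodingSketch {

    // Zigzag maps signed values to unsigned ones so small magnitudes stay short:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    public static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Writes a zigzag varint: 7 payload bits per byte, high bit set when more bytes follow.
    public static void writeLong(ByteArrayOutputStream out, long n) {
        long z = zigZag(n);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // A string is its UTF-8 byte length (encoded as a long) followed by the bytes.
    public static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Encodes the fields in schema order: id, name, age.
    // age is the union ["int","null"]: the branch index comes first
    // (0 = int, 1 = null), followed by the value of that branch.
    public static byte[] encodePerson(String id, String name, Integer age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, id);
        writeString(out, name);
        if (age != null) {
            writeLong(out, 0);   // union branch 0: int
            writeLong(out, age); // int and long share the zigzag varint encoding
        } else {
            writeLong(out, 1);   // union branch 1: null, which carries no payload
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encodePerson("001", "zhangsan", 23)) {
            hex.append(String.format("%02x ", b));
        }
        // prints: 06 30 30 31 10 7a 68 61 6e 67 73 61 6e 00 2e
        System.out.println(hex.toString().trim());
    }
}
```

Note that no field names or type tags appear in the output: the bytes can only be interpreted with the schema at hand, which is why both the writer and the reader need it.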
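3. Use Avro's Maven plugin to generate the Person class from person.avsc. Because the plugin's schema goal is bound to the generate-sources phase in the pom above, running mvn generate-sources (or any later phase, such as mvn compile) writes com.zpark.Person into the configured outputDirectory.
4. Test serialization and deserialization with the generated code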
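import java.io.File;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;
import org.junit.Test;

import com.zpark.Person;

// The test class name is illustrative; the original only shows the two methods.
public class PersonSpecificTest {

    @Test
    public void testSerializing() throws Exception {
        Person person = new Person("001", "zhangsan", 23);
        DatumWriter<Person> dw = new SpecificDatumWriter<>(Person.class);
        // try-with-resources closes the writer even if append() throws
        try (DataFileWriter<Person> dfw = new DataFileWriter<>(dw)) {
            // create() writes the file header, including the schema
            dfw.create(person.getSchema(), new File("d://tmp/person.avro"));
            dfw.append(person);
        }
    }

    @Test
    public void testDeSerializing() throws Exception {
        DatumReader<Person> dr = new SpecificDatumReader<>(Person.class);
        try (DataFileReader<Person> dfr =
                 new DataFileReader<>(new File("d://tmp/person.avro"), dr)) {
            while (dfr.hasNext()) {
                Person person = dfr.next();
                System.out.println(person);
            }
        }
    }
}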
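The tests above serialize and deserialize through generated code. Next we do the same with the generic API, which is more flexible because the schema is loaded at runtime and no generated class is needed: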
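import java.io.File;
import java.io.InputStream;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.junit.Test;

// The test class name is illustrative; the original only shows the two methods.
public class PersonGenericTest {

    // Parses person.avsc from the classpath; both tests need the same schema.
    private static Schema loadSchema() throws Exception {
        try (InputStream in = Thread.currentThread().getContextClassLoader()
                .getResourceAsStream("person.avsc")) {
            return new Schema.Parser().parse(in);
        }
    }

    @Test
    public void testGenericSerializing() throws Exception {
        Schema schema = loadSchema();
        // Fields are set by name instead of through generated setters
        GenericRecord person = new GenericData.Record(schema);
        person.put("id", "001");
        person.put("name", "zhangsan");
        person.put("age", 44);
        DatumWriter<GenericRecord> dw = new GenericDatumWriter<>(schema);
        try (DataFileWriter<GenericRecord> df = new DataFileWriter<>(dw)) {
            df.create(schema, new File("d:\\tmp\\person1.avro"));
            df.append(person);
        }
    }

    @Test
    public void testGenericDeSerializing() throws Exception {
        Schema schema = loadSchema();
        DatumReader<GenericRecord> dr = new GenericDatumReader<>(schema);
        try (DataFileReader<GenericRecord> dfr =
                 new DataFileReader<>(new File("d://tmp/person1.avro"), dr)) {
            while (dfr.hasNext()) {
                GenericRecord person = dfr.next();
                System.out.println(person);
            }
        }
    }
}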
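As a further illustration of the generic approach, the sketch below round-trips a record in memory instead of going through a data file. It assumes the Avro 1.9.1 dependency from the pom above is on the classpath; the class GenericRoundTripSketch, its method names, and the inlined schema string are hypothetical and only mirror person.avsc. It also sets age to null to show the second branch of the ["int","null"] union.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

// Hypothetical illustration class; not from the original article.
public class GenericRoundTripSketch {

    // Same structure as person.avsc, inlined so the sketch needs no resource file.
    static final String PERSON_SCHEMA =
        "{\"namespace\":\"com.zpark\",\"type\":\"record\",\"name\":\"Person\","
      + "\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"},"
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"age\",\"type\":[\"int\",\"null\"]}]}";

    // Builds a generic record; age may be null because of the union type.
    public static GenericRecord newPerson(String id, String name, Integer age) {
        Schema schema = new Schema.Parser().parse(PERSON_SCHEMA);
        GenericRecord person = new GenericData.Record(schema);
        person.put("id", id);
        person.put("name", name);
        person.put("age", age);
        return person;
    }

    // Encodes the record to bytes and decodes it again with the same schema.
    public static GenericRecord roundTrip(GenericRecord record) throws IOException {
        Schema schema = record.getSchema();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush(); // the encoder buffers, so flush before reading the bytes

        BinaryDecoder decoder =
            DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        return new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        GenericRecord copy = roundTrip(newPerson("001", "zhangsan", null));
        System.out.println(copy);
    }
}
```

Unlike the data file written by DataFileWriter, these raw bytes do not carry the schema, so the reader has to obtain the same (or a compatible) schema by some other means. Decoded string fields come back as Avro's Utf8 type rather than java.lang.String, so call toString() before comparing them.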