https://my.oschina.net/skyim/blog/479159
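1. I won't go over Parquet's advantages here (columnar storage and good compression); for background on columnar storage, see the following link.
2. This post is mainly about Parquet as the storage format used in our project.
3. Step one: create a table in Hive with the following statement.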
create external table parquet_example (
basketid bigint,
productid bigint,
quantity int,
price float,
totalbasketvalue float
) stored as parquet location '/user/hive/warehouse/parquet_example';
The statement is run in Hive as follows.

4. Looking at the tables in the web UI, parquet_example now appears at the lower left of the Hive page.

5. To see the table from Impala as well, run the following statement in Impala:
INVALIDATE METADATA
6. Now for the main part: writing Parquet files into the table.
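import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;
// Assumption: pre-1.7.0 Parquet package names (parquet.*).
// From Parquet 1.7.0 on, the same classes live under org.apache.parquet.*.
import parquet.hadoop.ParquetWriter;
import parquet.hadoop.metadata.CompressionCodecName;
import parquet.schema.MessageType;
import parquet.schema.MessageTypeParser;

public class BasketWriter {
    public static void main(String[] args) throws IOException {
        // Lowercase yyyy is the calendar year; uppercase YYYY is the week-based year.
        DateFormat dateFormat = new SimpleDateFormat("yyyyMMddHHmmss");
        new BasketWriter().generateBasketData("part_" + dateFormat.format(new Date()));
    }

    private void generateBasketData(String outFilePath) throws IOException {
        // The Parquet schema must match the columns of the Hive table.
        final MessageType schema = MessageTypeParser.parseMessageType("message basket { required int64 basketid; required int64 productid; required int32 quantity; required float price; required float totalbasketvalue; }");
        Configuration config = new Configuration();
        DataWritableWriteSupport.setSchema(schema, config);
        // HDFS target file inside the table's location
        Path outDirPath = new Path("hdfs://192.168.0.80/user/hive/warehouse/parquet_example/" + outFilePath);
        ParquetWriter writer = new ParquetWriter(outDirPath, new DataWritableWriteSupport() {
            @Override
            public WriteContext init(Configuration configuration) {
                // Make sure the schema is set on the configuration the writer actually uses.
                if (configuration.get(DataWritableWriteSupport.PARQUET_HIVE_SCHEMA) == null) {
                    configuration.set(DataWritableWriteSupport.PARQUET_HIVE_SCHEMA, schema.toString());
                }
                return super.init(configuration);
            }
        }, CompressionCodecName.SNAPPY, 256 * 1024 * 1024, 100 * 1024); // 256 MB block size, 100 KB page size
        int numBaskets = 1000000;
        Random numProdsRandom = new Random();
        Random quantityRandom = new Random();
        Random priceRandom = new Random();
        Random prodRandom = new Random();
        try {
            for (int i = 0; i < numBaskets; i++) {
                // Each basket holds between 7 and 29 products.
                int numProdsInBasket = numProdsRandom.nextInt(30);
                numProdsInBasket = Math.max(7, numProdsInBasket);
                float totalPrice = priceRandom.nextFloat();
                totalPrice = (float) Math.max(0.1, totalPrice) * 100;
                for (int j = 0; j < numProdsInBasket; j++) {
                    // One row per product, in the same column order as the schema.
                    Writable[] values = new Writable[5];
                    values[0] = new LongWritable(i);
                    values[1] = new LongWritable(prodRandom.nextInt(200000));
                    values[2] = new IntWritable(quantityRandom.nextInt(10));
                    values[3] = new FloatWritable(priceRandom.nextFloat());
                    values[4] = new FloatWritable(totalPrice);
                    ArrayWritable value = new ArrayWritable(Writable.class, values);
                    writer.write(value);
                }
            }
        } finally {
            // Closing the writer flushes the last row group and writes the file footer.
            writer.close();
        }
    }
}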
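The schema string handed to MessageTypeParser.parseMessageType has to line up with the Hive column list, column by column. As a rough illustration (not part of the original project), the sketch below derives that string from column name and type pairs using only the JDK. The class name SchemaSketch, the helper toParquetMessage, and the three-entry type mapping are assumptions made for this example and cover only the types used by this table.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not from the original project: builds a Parquet
// message-type string from Hive column definitions.
final class SchemaSketch {

    // Assumed mapping, limited to the Hive types used by parquet_example.
    private static String parquetType(String hiveType) {
        switch (hiveType.toLowerCase(Locale.ROOT)) {
            case "bigint": return "int64";
            case "int":    return "int32";
            case "float":  return "float";
            default:
                throw new IllegalArgumentException("unmapped Hive type: " + hiveType);
        }
    }

    // Columns are emitted in insertion order, all marked "required",
    // mirroring the schema string used in the post.
    static String toParquetMessage(String messageName, Map<String, String> columns) {
        StringBuilder sb = new StringBuilder("message ").append(messageName).append(" { ");
        for (Map.Entry<String, String> column : columns.entrySet()) {
            sb.append("required ")
              .append(parquetType(column.getValue()))
              .append(' ')
              .append(column.getKey())
              .append("; ");
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("basketid", "bigint");
        columns.put("productid", "bigint");
        columns.put("quantity", "int");
        columns.put("price", "float");
        columns.put("totalbasketvalue", "float");
        System.out.println(toParquetMessage("basket", columns));
    }
}
```

Keeping the column list in one place like this makes it harder for the Hive DDL and the Parquet schema to drift apart, though whether that is worth it for a five-column table is a judgment call.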
7. The data we wrote is now visible.
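The file written by BasketWriter is named "part_" plus a timestamp. One detail worth knowing when reading those names: in SimpleDateFormat, uppercase YYYY is the week-based year and lowercase yyyy is the calendar year, and the two disagree for a few days around New Year. The sketch below only demonstrates that difference with a fixed date; the class name PartFileName, the helpers partName and fixedDate, and the choice of UTC and Locale.US are assumptions for the example.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class, not from the original project.
final class PartFileName {

    // Formats a part-file name with the given pattern, pinned to UTC and
    // US week rules so the result does not depend on the local machine.
    static String partName(String pattern, Date when) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return "part_" + format.format(when);
    }

    // 30 December 2015, 12:00:00 UTC: a date that already belongs to
    // week 1 of 2016 under US week rules.
    static Date fixedDate() {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.US);
        cal.clear();
        cal.set(2015, Calendar.DECEMBER, 30, 12, 0, 0);
        return cal.getTime();
    }

    public static void main(String[] args) {
        Date when = fixedDate();
        System.out.println(partName("yyyyMMddHHmmss", when)); // part_20151230120000
        System.out.println(partName("YYYYMMddHHmmss", when)); // part_20161230120000
    }
}
```

With the uppercase pattern, a file written at the end of December can carry next year's number, which makes the files sort in a misleading order.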

8. The written data can now be queried from Hive or Impala.
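A quick sanity check for a row count returned by such a query: the writer draws nextInt(30) products per basket and raises anything below 7 to 7, so every basket contributes between 7 and 29 rows. Assuming the draws are uniform, the expected total can be worked out exactly. The class RowCountEstimate and its method names are hypothetical and exist only for this calculation.

```java
// Hypothetical helper, not from the original project: estimates how many
// rows BasketWriter produces, assuming nextInt(bound) is uniform.
final class RowCountEstimate {

    // Sum of max(floor, draw) over every possible draw 0..bound-1.
    static long sumOverDraws(int bound, int floor) {
        long sum = 0;
        for (int draw = 0; draw < bound; draw++) {
            sum += Math.max(floor, draw);
        }
        return sum;
    }

    // Expected total rows = baskets * (sum over draws / number of draws),
    // computed in integer arithmetic and rounded down.
    static long expectedTotalRows(long baskets, int bound, int floor) {
        return baskets * sumOverDraws(bound, floor) / bound;
    }

    public static void main(String[] args) {
        // Same constants as the writer: 1,000,000 baskets, nextInt(30), minimum 7.
        System.out.println(sumOverDraws(30, 7));               // 463
        System.out.println(expectedTotalRows(1_000_000L, 30, 7)); // 15433333
    }
}
```

So a count somewhere near 15.4 million rows per generated file is plausible, and anything outside the range of 7 to 29 million would suggest that the file was not written or read completely.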


The source code can be found here:
https://github.com/wangxuehui/writeparquet/
