How to Import JSON Data into Hive


Background

1. When processes communicate with each other remotely, they can exchange data of many types, but whatever the type, the data travels over the network as a binary sequence. The sender must turn an object into a byte sequence before it can be transmitted; this is called object serialization. The receiver must restore the byte sequence back into an object; this is called deserialization. (A small code illustration follows this list.)

2. Hive's deserialization turns each key/value record into the column values of a row in a Hive table.

3. Hive can load data into tables without transforming it first, which saves a great deal of time when working with massive datasets.
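To make point 1 concrete, here is a minimal Java sketch (illustrative only, not part of the Hive workflow below) that serializes a hypothetical Message object to a byte sequence and restores it:

import java.io.*;

public class SerializationDemo {
  // A hypothetical payload; Serializable marks it as convertible to bytes.
  static class Message implements Serializable {
    private static final long serialVersionUID = 1L;
    String text;
    long createdAt;
    Message(String text, long createdAt) { this.text = text; this.createdAt = createdAt; }
  }

  public static void main(String[] args) throws Exception {
    Message original = new Message("hello", 1177292576L);

    // Serialization: object -> byte sequence (what the sender puts on the wire).
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    ObjectOutputStream out = new ObjectOutputStream(buffer);
    out.writeObject(original);
    out.flush();

    // Deserialization: byte sequence -> object (what the receiver reconstructs).
    ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()));
    Message restored = (Message) in.readObject();
    System.out.println(restored.text + " @ " + restored.createdAt);
  }
}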

Solution 1: Import the JSON data into MongoDB; MongoDB can then export the data as CSV, which in turn can be loaded into MySQL. (The example below actually runs the bridge in the other direction, from MySQL to MongoDB, but the idea is the same: CSV is the intermediate format.)

CSSer.com runs on WordPress with a MySQL database; migrating it to MongoDB therefore requires a data conversion.

There are several ways to migrate the data; essentially, the MySQL data just has to be converted into a format that MongoDB can import directly. MongoDB provides the mongoimport tool, which supports importing JSON, CSV, and TSV.

First, let's look at the options mongoimport supports:

$ mongoimport --help
options:
  --help                  produce help message
  -v [ --verbose ]        be more verbose (include multiple times for more
                          verbosity e.g. -vvvvv)
  -h [ --host ] arg       mongo host to connect to ( <set name>/s1,s2 for sets)
  --port arg              server port. Can also use --host hostname:port
  --ipv6                  enable IPv6 support (disabled by default)
  -u [ --username ] arg   username
  -p [ --password ] arg   password
  --dbpath arg            directly access mongod database files in the given
                          path, instead of connecting to a mongod  server -
                          needs to lock the data directory, so cannot be used
                          if a mongod is currently accessing the same path
  --directoryperdb        if dbpath specified, each db is in a separate
                          directory
  -d [ --db ] arg         database to use
  -c [ --collection ] arg collection to use (some commands)
  -f [ --fields ] arg     comma separated list of field names e.g. -f name,age
  --fieldFile arg         file with fields names - 1 per line
  --ignoreBlanks          if given, empty fields in csv and tsv will be ignored
  --type arg              type of file to import.  default: json (json,csv,tsv)
  --file arg              file to import from; if not specified stdin is used
  --drop                  drop collection first
  --headerline            CSV,TSV only - use first line as headers
  --upsert                insert or update objects that already exist
  --upsertFields arg      comma-separated fields for the query part of the
                          upsert. You should make sure this is indexed
  --stopOnError           stop importing at first error rather than continuing
  --jsonArray             load a json array, not one item per line. Currently
                          limited to 4MB.

As the help output above shows, CSV as the intermediate format is about the cheapest option for both the MySQL export and the MongoDB import, so I gave it a try:

First, export the wp-posts table from the MySQL database. Being lazy, I simply used phpMyAdmin's export feature: CSV format, with "remove line breaks within fields" and "put field names in the first row" checked, saved as csser.csv.

Next, on the MongoDB server, connect to the database from the shell and import the data:

$ mongoimport -d csser -c posts --type csv --file csser.csv --headerline
connected to: 127.0.0.1
imported 548 objects

$ mongo
MongoDB shell version: 1.8.1
connecting to: test
> use csser
switched to db csser
> db.posts.count()
547

> db.posts.find({}, {"post_title":1}).sort({"ID":-1}).limit(1)
{ "_id" : ObjectId("4df4641d31b0642fe609426d"), "post_title" : "CSS Sprites在線應用推薦-CSS-sprit" }
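The Solution 1 outline also mentions turning MongoDB data back into CSV. That step is not shown in the original walkthrough, but mongoimport's counterpart, mongoexport, can emit CSV when given an explicit field list; a hypothetical invocation for this collection (using the MongoDB 1.x/2.x-era --csv flag) might look like:

$ mongoexport -d csser -c posts --csv -f ID,post_title,post_date -o csser-posts.csv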

Solution 2: Use a SerDe in Hive to convert the JSON data into a format Hive understands. The reasons are:

1. When creating a Hive table that uses custom (de)serialization, you write your own class implementing Deserializer and select it with the ROW FORMAT clause of the CREATE command;

2. When processing massive amounts of data, if the data's format matches the table structure, Hive's deserialization can be applied directly without transforming the data, saving a great deal of time.

How-to: Use a SerDe in Apache Hive

Apache Hive is a fantastic tool for performing SQL-style queries across data that is often not appropriate for a relational database. For example, semistructured and unstructured data can be queried gracefully via Hive, due to two core features: The first is Hive’s support of complex data types, such as structs, arrays, and unions, in addition to many of the common data types found in most relational databases. The second feature is the SerDe.

What is a SerDe?

The SerDe interface allows you to instruct Hive as to how a record should be processed. A SerDe is a combination of a Serializer and a Deserializer (hence, Ser-De). The Deserializer interface takes a string or binary representation of a record, and translates it into a Java object that Hive can manipulate. The Serializer, however, will take a Java object that Hive has been working with, and turn it into something that Hive can write to HDFS or another supported system. Commonly, Deserializers are used at query time to execute SELECT statements, and Serializers are used when writing data, such as through an INSERT-SELECT statement.
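In Hive's code base this combination is quite literal: the SerDe interface in the org.apache.hadoop.hive.serde2 package (the one implemented by the JSONSerDe below) is essentially just the union of the two narrower interfaces:

// Paraphrased from the Hive 0.10 source (org.apache.hadoop.hive.serde2).
public interface SerDe extends Deserializer, Serializer {
  // No methods of its own; a SerDe is exactly a Serializer plus a Deserializer.
}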

In this article, we will examine a SerDe for processing JSON data, which can be used to transform a JSON record into something that Hive can process.

Developing a SerDe

The code of JSONSerDe.java is as follows:

/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.cloudera.hive.serde;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde.serdeConstants;
import org.apache.hadoop.hive.serde2.SerDe;
import org.apache.hadoop.hive.serde2.SerDeException;
import org.apache.hadoop.hive.serde2.SerDeStats;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.MapTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.codehaus.jackson.map.ObjectMapper;

/**
* This SerDe can be used for processing JSON data in Hive. It supports
* arbitrary JSON data, and can handle all Hive types except for UNION.
* However, the JSON data is expected to be a series of discrete records,
* rather than a JSON array of objects.
*
* The Hive table is expected to contain columns with names corresponding to
* fields in the JSON data, but it is not necessary for every JSON field to
* have a corresponding Hive column. Those JSON fields will be ignored during
* queries.
*
* Example:
*
* { "a": 1, "b": [ "str1", "str2" ], "c": { "field1": "val1" } }
*
* Could correspond to a table:
*
* CREATE TABLE foo (a INT, b ARRAY<STRING>, c STRUCT<field1:STRING>);
*
* JSON objects can also be interpreted as a Hive MAP type, so long as the keys
* and values in the JSON object are all of the appropriate types. For example,
* in the JSON above, another valid table declaration would be:
*
* CREATE TABLE foo (a INT, b ARRAY<STRING>, c MAP<STRING,STRING>);
*
* Only STRING keys are supported for Hive MAPs.
*/
public class JSONSerDe implements SerDe {
  
  private StructTypeInfo rowTypeInfo;
  private ObjectInspector rowOI;
  private List<String> colNames;
  private List<Object> row = new ArrayList<Object>();
  
  /**
* An initialization function used to gather information about the table.
* Typically, a SerDe implementation will be interested in the list of
* column names and their types. That information will be used to help perform
* actual serialization and deserialization of data.
*/
  @Override
  public void initialize(Configuration conf, Properties tbl)
      throws SerDeException {
    // Get a list of the table's column names.
    String colNamesStr = tbl.getProperty(serdeConstants.LIST_COLUMNS);
    colNames = Arrays.asList(colNamesStr.split(","));
    
    // Get a list of TypeInfos for the columns. This list lines up with
    // the list of column names.
    String colTypesStr = tbl.getProperty(serdeConstants.LIST_COLUMN_TYPES);
    List<TypeInfo> colTypes =
        TypeInfoUtils.getTypeInfosFromTypeString(colTypesStr);
    
    rowTypeInfo =
        (StructTypeInfo) TypeInfoFactory.getStructTypeInfo(colNames, colTypes);
    rowOI =
        TypeInfoUtils.getStandardJavaObjectInspectorFromTypeInfo(rowTypeInfo);
  }

  /**
* This method does the work of deserializing a record into Java objects that
* Hive can work with via the ObjectInspector interface. For this SerDe, the
* blob that is passed in is a JSON string, and the Jackson JSON parser is
* being used to translate the string into Java objects.
*
* The JSON deserialization works by taking the column names in the Hive
* table, and looking up those fields in the parsed JSON object. If the value
* of the field is not a primitive, the object is parsed further.
*/
  @Override
  public Object deserialize(Writable blob) throws SerDeException {
    Map<?,?> root = null;
    row.clear();
    try {
      ObjectMapper mapper = new ObjectMapper();
      // This is really a Map<String, Object>. For more information about how
      // Jackson parses JSON in this example, see
      // http://wiki.fasterxml.com/JacksonDataBinding
      root = mapper.readValue(blob.toString(), Map.class);
    } catch (Exception e) {
      throw new SerDeException(e);
    }

    // Lowercase the keys as expected by hive
    Map<String, Object> lowerRoot = new HashMap<String, Object>();
    for (Map.Entry<?, ?> entry : root.entrySet()) {
      lowerRoot.put(((String)entry.getKey()).toLowerCase(), entry.getValue());
    }
    root = lowerRoot;
    
    Object value= null;
    for (String fieldName : rowTypeInfo.getAllStructFieldNames()) {
      try {
        TypeInfo fieldTypeInfo = rowTypeInfo.getStructFieldTypeInfo(fieldName);
        value = parseField(root.get(fieldName), fieldTypeInfo);
      } catch (Exception e) {
        value = null;
      }
      row.add(value);
    }
    return row;
  }
  
  /**
* Parses a JSON object according to the Hive column's type.
*
* @param field - The JSON object to parse
* @param fieldTypeInfo - Metadata about the Hive column
* @return - The parsed value of the field
*/
  private Object parseField(Object field, TypeInfo fieldTypeInfo) {
    switch (fieldTypeInfo.getCategory()) {
    case PRIMITIVE:
      // Jackson will return the right thing in this case, so just return
      // the object
      if (field instanceof String) {
        field = field.toString().replaceAll("\n", "\\\\n");
      }
      return field;
    case LIST:
      return parseList(field, (ListTypeInfo) fieldTypeInfo);
    case MAP:
      return parseMap(field, (MapTypeInfo) fieldTypeInfo);
    case STRUCT:
      return parseStruct(field, (StructTypeInfo) fieldTypeInfo);
    case UNION:
      // Unsupported by JSON
    default:
      return null;
    }
  }
  
  /**
* Parses a JSON object and its fields. The Hive metadata is used to
* determine how to parse the object fields.
*
* @param field - The JSON object to parse
* @param fieldTypeInfo - Metadata about the Hive column
* @return - A map representing the object and its fields
*/
  private Object parseStruct(Object field, StructTypeInfo fieldTypeInfo) {
    Map<Object,Object> map = (Map<Object,Object>)field;
    ArrayList<TypeInfo> structTypes = fieldTypeInfo.getAllStructFieldTypeInfos();
    ArrayList<String> structNames = fieldTypeInfo.getAllStructFieldNames();
    
    List<Object> structRow = new ArrayList<Object>(structTypes.size());
    for (int i = 0; i < structNames.size(); i++) {
      structRow.add(parseField(map.get(structNames.get(i)), structTypes.get(i)));
    }
    return structRow;
  }

  /**
* Parse a JSON list and its elements. This uses the Hive metadata for the
* list elements to determine how to parse the elements.
*
* @param field - The JSON list to parse
* @param fieldTypeInfo - Metadata about the Hive column
* @return - A list of the parsed elements
*/
  private Object parseList(Object field, ListTypeInfo fieldTypeInfo) {
    ArrayList<Object> list = (ArrayList<Object>) field;
    TypeInfo elemTypeInfo = fieldTypeInfo.getListElementTypeInfo();
    
    for (int i = 0; i < list.size(); i++) {
      list.set(i, parseField(list.get(i), elemTypeInfo));
    }
    
    return list.toArray();
  }

  /**
* Parse a JSON object as a map. This uses the Hive metadata for the map
* values to determine how to parse the values. The map is assumed to have
* a string for a key.
*
* @param field - The JSON list to parse
* @param fieldTypeInfo - Metadata about the Hive column
* @return
*/
  private Object parseMap(Object field, MapTypeInfo fieldTypeInfo) {
    Map<Object,Object> map = (Map<Object,Object>) field;
    TypeInfo valueTypeInfo = fieldTypeInfo.getMapValueTypeInfo();
    
    for (Map.Entry<Object,Object> entry : map.entrySet()) {
      map.put(entry.getKey(), parseField(entry.getValue(), valueTypeInfo));
    }
    return map;
  }

  /**
* Return an ObjectInspector for the row of data
*/
  @Override
  public ObjectInspector getObjectInspector() throws SerDeException {
    return rowOI;
  }

  /**
* Unimplemented
*/
  @Override
  public SerDeStats getSerDeStats() {
    return null;
  }

  /**
* JSON is just a textual representation, so our serialized class
* is just Text.
*/
  @Override
  public Class<? extends Writable> getSerializedClass() {
    return Text.class;
  }

  /**
* This method takes an object representing a row of data from Hive, and uses
* the ObjectInspector to get the data for each column and serialize it. This
* implementation deparses the row into an object that Jackson can easily
* serialize into a JSON blob.
*/
  @Override
  public Writable serialize(Object obj, ObjectInspector oi)
      throws SerDeException {
    Object deparsedObj = deparseRow(obj, oi);
    ObjectMapper mapper = new ObjectMapper();
    try {
      // Let Jackson do the work of serializing the object
      return new Text(mapper.writeValueAsString(deparsedObj));
    } catch (Exception e) {
      throw new SerDeException(e);
    }
  }

  /**
* Deparse a Hive object into a Jackson-serializable object. This uses
* the ObjectInspector to extract the column data.
*
* @param obj - Hive object to deparse
* @param oi - ObjectInspector for the object
* @return - A deparsed object
*/
  private Object deparseObject(Object obj, ObjectInspector oi) {
    switch (oi.getCategory()) {
    case LIST:
      return deparseList(obj, (ListObjectInspector)oi);
    case MAP:
      return deparseMap(obj, (MapObjectInspector)oi);
    case PRIMITIVE:
      return deparsePrimitive(obj, (PrimitiveObjectInspector)oi);
    case STRUCT:
      return deparseStruct(obj, (StructObjectInspector)oi, false);
    case UNION:
      // Unsupported by JSON
    default:
      return null;
    }
  }
  
  /**
* Deparses a row of data. We have to treat this one differently from
* other structs, because the field names for the root object do not match
* the column names for the Hive table.
*
* @param obj - Object representing the top-level row
* @param structOI - ObjectInspector for the row
* @return - A deparsed row of data
*/
  private Object deparseRow(Object obj, ObjectInspector structOI) {
    return deparseStruct(obj, (StructObjectInspector)structOI, true);
  }

  /**
* Deparses struct data into a serializable JSON object.
*
* @param obj - Hive struct data
* @param structOI - ObjectInspector for the struct
* @param isRow - Whether or not this struct represents a top-level row
* @return - A deparsed struct
*/
  private Object deparseStruct(Object obj,
                               StructObjectInspector structOI,
                               boolean isRow) {
    Map<Object,Object> struct = new HashMap<Object,Object>();
    List<? extends StructField> fields = structOI.getAllStructFieldRefs();
    for (int i = 0; i < fields.size(); i++) {
      StructField field = fields.get(i);
      // The top-level row object is treated slightly differently from other
      // structs, because the field names for the row do not correctly reflect
      // the Hive column names. For lower-level structs, we can get the field
      // name from the associated StructField object.
      String fieldName = isRow ? colNames.get(i) : field.getFieldName();
      ObjectInspector fieldOI = field.getFieldObjectInspector();
      Object fieldObj = structOI.getStructFieldData(obj, field);
      struct.put(fieldName, deparseObject(fieldObj, fieldOI));
    }
    return struct;
  }

  /**
* Deparses a primitive type.
*
* @param obj - Hive object to deparse
* @param oi - ObjectInspector for the object
* @return - A deparsed object
*/
  private Object deparsePrimitive(Object obj, PrimitiveObjectInspector primOI) {
    return primOI.getPrimitiveJavaObject(obj);
  }

  private Object deparseMap(Object obj, MapObjectInspector mapOI) {
    Map<Object,Object> map = new HashMap<Object,Object>();
    ObjectInspector mapValOI = mapOI.getMapValueObjectInspector();
    Map<?,?> fields = mapOI.getMap(obj);
    for (Map.Entry<?,?> field : fields.entrySet()) {
      Object fieldName = field.getKey();
      Object fieldObj = field.getValue();
      map.put(fieldName, deparseObject(fieldObj, mapValOI));
    }
    return map;
  }

  /**
* Deparses a list and its elements.
*
* @param obj - Hive object to deparse
* @param oi - ObjectInspector for the object
* @return - A deparsed object
*/
  private Object deparseList(Object obj, ListObjectInspector listOI) {
    List<Object> list = new ArrayList<Object>();
    List<?> field = listOI.getList(obj);
    ObjectInspector elemOI = listOI.getListElementObjectInspector();
    for (Object elem : field) {
      list.add(deparseObject(elem, elemOI));
    }
    return list;
  }
}
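Hive normally drives the SerDe itself, but it can help to see the two key calls, initialize() with the table's column metadata and deserialize() on one line of input, exercised directly. The following standalone driver is a sketch of my own (not from the original article); it assumes the Hive 0.10 and Jackson jars are on the classpath and that the class above has been compiled:

package com.cloudera.hive.serde;

import java.util.List;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde.serdeConstants;
import org.apache.hadoop.io.Text;

public class JSONSerDeDemo {
  public static void main(String[] args) throws Exception {
    JSONSerDe serde = new JSONSerDe();

    // Column names and types as Hive would pass them from the table
    // definition (see the CREATE TABLE tweets statement later in this post).
    Properties tbl = new Properties();
    tbl.setProperty(serdeConstants.LIST_COLUMNS, "text,created_at,user_id");
    tbl.setProperty(serdeConstants.LIST_COLUMN_TYPES, "string:int:int");
    serde.initialize(new Configuration(), tbl);

    // One JSON record per line, which is what the SerDe expects.
    Text line = new Text(
        "{\"text\": \"hello hive\", \"created_at\": 1177248274, \"user_id\": 0}");

    // deserialize() returns the row as a List of column values.
    List<?> row = (List<?>) serde.deserialize(line);
    System.out.println(row);   // prints: [hello hive, 1177248274, 0]
  }
}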

Using the SerDe

Then package JSONSerDe.java into a *.jar file (for example with Eclipse) and add the corresponding property to hive-site.xml. Otherwise, when the Hive client runs the related MapReduce operations it will throw a Hive ClassNotFoundException complaining that com.cloudera.hive.serde.JSONSerDe cannot be found. The solution is as follows:

You need to tell Hive about the JAR. This is how I do it in hive-site.xml (add the following to $HIVE_INSTALL/conf/hive-site.xml; the *.jar path in the value should point to where the jars actually live on your machine, normally under $HIVE_INSTALL/lib):
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///home/landen/UntarFile/hive-0.10.0/lib/*.jar</value>
  <description>These JAR files are available to all users for all jobs</description>
</property>

Notice: merely running ADD JAR home/landen/UntarFile/hive-0.10.0/lib/*.jar is not enough; Hive must be told the *.jar path before it starts up, otherwise the exception mentioned above will still occur.

Tables can be configured to process data using a SerDe by specifying the SerDe to use at table creation time, or through the use of an ALTER TABLE statement. For example:

create table if not exists tweets(
       text string comment 'tweet content',
       created_at int comment 'the time the tweet was issued',
       user_id int comment 'user id')
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/home/landen/UntarFile/hive-0.10.0/StorageTable' ;       
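For this table, each line of the input file is expected to be one standalone JSON object whose keys match the column names. Reconstructed from the query results shown further below, a record would look roughly like this (field order is irrelevant):

{"text": "@shizhao,我此前一直用dh的,你問問誰用bluehost借用一下就可以了。一般的小站流量根本沒多大的..", "created_at": 1177292576, "user_id": 1}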

Related background:

 

1. SerDe is short for Serializer/Deserializer; it handles serialization and deserialization.

2. When creating a table, you can use a custom SerDe or one of Hive's built-in SerDes. The SerDe defines the table's columns and maps the data to those columns.

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type
    [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...)
  [SORTED BY (col_name [ASC|DESC], ...)]
  INTO num_buckets BUCKETS]
  [ROW FORMAT row_format]
  [STORED AS file_format]
  [LOCATION hdfs_path]

To create a table with a specific SerDe, use the ROW FORMAT row_format clause, for example:

a. Add the jar. At the Hive client prompt: hive> add jar /run/serde_test.jar;
Or start Hive from the Linux shell with: ${HIVE_HOME}/bin/hive -auxpath /run/serde_test.jar
b. Create the table: create table serde_table row format serde 'hive.connect.TestDeserializer';

3. Write the deserializer class TestDeserializer, implementing the three methods of the Deserializer interface (a minimal sketch follows the interface definition below):

a) Initialization: initialize(Configuration conf, Properties tbl).

b) Deserialize a Writable into a Java Object: deserialize(Writable blob).

c) Obtain the ObjectInspector for the Object returned by deserialize(Writable blob): getObjectInspector().

public interface Deserializer {

  /**
   * Initialize the HiveDeserializer.
   * @param conf System properties
   * @param tbl  table properties
   * @throws SerDeException
   */
  public void initialize(Configuration conf, Properties tbl) throws  SerDeException;
  
  /**
   * Deserialize an object out of a Writable blob.
   * In most cases, the return value of this function will be  constant since the function
   * will reuse the returned object.
   * If the client wants to keep a copy of the object, the client  needs to clone the
   * returned value by calling  ObjectInspectorUtils.getStandardObject().
   * @param blob The Writable object containing a serialized object
   * @return A Java object representing the contents in the blob.
   */
  public Object deserialize(Writable blob) throws SerDeException;

  /**
   * Get the object inspector that can be used to navigate through  the internal
   * structure of the Object returned from deserialize(...).
   */
  public ObjectInspector getObjectInspector() throws SerDeException;

}
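For completeness, here is a minimal sketch (my own, not from the referenced article) of what the hive.connect.TestDeserializer named in the CREATE TABLE example above could look like. It treats every record as a single string column, and it also provides getSerDeStats() because newer Deserializer versions (such as Hive 0.10's) declare that method as well:

package hive.connect;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.Deserializer;
import org.apache.hadoop.hive.serde2.SerDeException;
import org.apache.hadoop.hive.serde2.SerDeStats;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Writable;

public class TestDeserializer implements Deserializer {

  private ObjectInspector inspector;
  private final List<Object> row = new ArrayList<Object>(1);

  // a) Initialization: build an inspector describing a single STRING column.
  @Override
  public void initialize(Configuration conf, Properties tbl) throws SerDeException {
    inspector = ObjectInspectorFactory.getStandardStructObjectInspector(
        Arrays.asList("line"),
        Arrays.<ObjectInspector>asList(
            PrimitiveObjectInspectorFactory.javaStringObjectInspector));
  }

  // b) Deserialize a Writable record into a Java object (a one-column row here).
  @Override
  public Object deserialize(Writable blob) throws SerDeException {
    row.clear();
    row.add(blob.toString());
    return row;
  }

  // c) Expose the ObjectInspector for the rows returned by deserialize().
  @Override
  public ObjectInspector getObjectInspector() throws SerDeException {
    return inspector;
  }

  // Declared by the Deserializer interface in newer Hive versions (e.g. 0.10).
  public SerDeStats getSerDeStats() {
    return null;
  }
}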

 

 

 

The Hive client session proceeds as follows:

landen@landen-Lenovo:~/UntarFile/hive-0.10.0$ bin/hive
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in jar:file:/home/landen/UntarFile/hive-0.10.0/lib/hive-common-0.10.0.jar!/hive-log4j.properties
Hive history file=/tmp/landen/hive_job_log_landen_201305051116_1786137407.txt
hive (default)> show databases;
OK
database_name
dataprocess
default
economy
financials
human_resources
login
student
Time taken: 4.411 seconds
hive (default)> use dataprocess;
OK
Time taken: 0.032 seconds
hive (dataprocess)> load data local inpath '/home/landen/文檔/語料庫/NLPIR——tweets.txt' overwrite into table tweets;
Copying data..........
hive (dataprocess)> describe tweets;
OK
col_name     data_type    comment
text         string       from deserializer
created_at   int          from deserializer
user_id      int          from deserializer
Time taken: 0.427 seconds

As you can see, the imported JSON data has been deserialized by JSONSerDe into a format that Hive understands.

hive (dataprocess)> select * from tweets limit 20;   -- no MapReduce job is launched for this query
OK
text    created_at    user_id
@shizhao,我此前一直用dh的,你問問誰用bluehost借用一下就可以了。一般的小站流量根本沒多大的..    1177292576    1
可以看了    1177248274    0
你給的鏈接無法查看    1177248218    0
轉移備份,在看iyee關於blognetwork的文章...    1177174402    0
當幫主也不錯啊    1177172873    0
沒人告知    1177172446    0
twitter支持中文了? 原來頭字符不能是中文的....    1177172440    0
我也要    1177172414    0
@geegi 你在skype上嗎?    1177083182    0
... 可憐的AMD,但我相信它們比Intel更有錢途    1177082821    0
..... 並購ATi似乎不在這時候體現吧    1177082690    0
... 不過就是粘了點改革開放的春風,更多有錢的人不是踢足球的 :(    1177081404    0
@QeeGi 很有理    1177081154    0
... 不漲工資,還要存款,計划買房,壓力不小,生活如此辛苦    1177080852    0
........ 偶要去吃kfc    1176980497    0
@hung 雖然顯示面積大了,但感覺不太方便啊    1176961521    0
@hung 你不用書簽欄    1176961395    0
$40-45 million ebay買下StumbleUpon    1176954286    0
... 加班ing    1176890179    0
... wjs就是典型的小資,鄙視    1176884977    0
Time taken: 12.161 seconds
hive (dataprocess)> select count(*) from tweets;   -- this query does launch a MapReduce job
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201305041640_0008, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201305041640_0008
Kill Command = /home/landen/UntarFile/hadoop-1.0.4/libexec/../bin/hadoop job  -kill job_201305041640_0008
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 1
2013-05-05 11:20:50,690 Stage-1 map = 0%,  reduce = 0%
2013-05-05 11:21:36,395 Stage-1 map = 6%,  reduce = 0%
2013-05-05 11:22:02,540 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:03,545 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:04,549 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:05,552 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:06,556 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:07,559 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:08,564 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:09,569 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:10,572 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:11,593 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:13,348 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:14,351 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:15,355 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:16,358 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:17,361 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:18,365 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:19,369 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:20,373 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:21,376 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:22,380 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:23,384 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:24,389 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:26,460 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:27,464 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:28,468 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:29,471 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:30,793 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:32,357 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:33,706 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:34,709 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:36,622 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:37,626 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:38,631 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:39,635 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:40,639 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:41,643 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:42,648 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:43,651 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:44,655 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:45,659 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:46,662 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
2013-05-05 11:22:47,669 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 80.39 sec
2013-05-05 11:22:48,683 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 80.39 sec
2013-05-05 11:22:49,686 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 80.39 sec
2013-05-05 11:22:50,693 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 80.39 sec
2013-05-05 11:22:51,696 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 80.39 sec
2013-05-05 11:22:52,699 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 80.39 sec
2013-05-05 11:22:53,705 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 111.41 sec
2013-05-05 11:22:54,987 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 111.41 sec
2013-05-05 11:22:55,994 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 111.41 sec
2013-05-05 11:22:56,998 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 111.41 sec
2013-05-05 11:22:58,003 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 111.41 sec
2013-05-05 11:22:59,010 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 111.41 sec
2013-05-05 11:23:00,017 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 111.41 sec
2013-05-05 11:23:01,021 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 111.41 sec
2013-05-05 11:23:02,655 Stage-1 map = 38%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:04,766 Stage-1 map = 38%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:06,201 Stage-1 map = 38%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:07,945 Stage-1 map = 38%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:09,201 Stage-1 map = 38%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:10,624 Stage-1 map = 38%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:11,628 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:13,317 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:14,323 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:15,327 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:16,331 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:17,334 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:18,405 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:19,409 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:20,412 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:21,417 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:22,420 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:27,402 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:30,861 Stage-1 map = 50%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:31,865 Stage-1 map = 50%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:33,569 Stage-1 map = 50%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:34,573 Stage-1 map = 50%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:35,576 Stage-1 map = 50%,  reduce = 8%, Cumulative CPU 111.41 sec
2013-05-05 11:23:36,630 Stage-1 map = 56%,  reduce = 8%, Cumulative CPU 131.8 sec
2013-05-05 11:23:37,635 Stage-1 map = 56%,  reduce = 8%, Cumulative CPU 131.8 sec
2013-05-05 11:23:38,671 Stage-1 map = 56%,  reduce = 8%, Cumulative CPU 131.8 sec
2013-05-05 11:23:39,676 Stage-1 map = 56%,  reduce = 8%, Cumulative CPU 131.8 sec
2013-05-05 11:23:40,683 Stage-1 map = 56%,  reduce = 8%, Cumulative CPU 131.8 sec
2013-05-05 11:23:41,691 Stage-1 map = 56%,  reduce = 8%, Cumulative CPU 131.8 sec
2013-05-05 11:23:42,701 Stage-1 map = 56%,  reduce = 17%, Cumulative CPU 131.8 sec
2013-05-05 11:23:43,705 Stage-1 map = 56%,  reduce = 17%, Cumulative CPU 131.8 sec
2013-05-05 11:23:44,752 Stage-1 map = 56%,  reduce = 17%, Cumulative CPU 131.8 sec
2013-05-05 11:23:45,755 Stage-1 map = 56%,  reduce = 17%, Cumulative CPU 131.8 sec
2013-05-05 11:23:46,758 Stage-1 map = 56%,  reduce = 17%, Cumulative CPU 131.8 sec
2013-05-05 11:23:47,769 Stage-1 map = 56%,  reduce = 17%, Cumulative CPU 131.8 sec
2013-05-05 11:23:48,773 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 131.8 sec
2013-05-05 11:23:49,776 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 131.8 sec
2013-05-05 11:23:50,779 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 131.8 sec
2013-05-05 11:23:51,784 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 131.8 sec
2013-05-05 11:23:52,788 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 131.8 sec
2013-05-05 11:23:53,793 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 131.8 sec
2013-05-05 11:23:54,812 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:23:55,831 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:23:56,834 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:23:57,838 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:23:58,843 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:23:59,918 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:24:00,921 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:24:01,924 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:24:02,927 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:24:03,931 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:24:04,934 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:24:05,938 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:24:06,941 Stage-1 map = 69%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:24:07,944 Stage-1 map = 69%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:24:08,948 Stage-1 map = 69%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:24:09,952 Stage-1 map = 69%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:24:10,956 Stage-1 map = 69%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:24:11,960 Stage-1 map = 69%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:24:12,964 Stage-1 map = 69%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:24:13,968 Stage-1 map = 69%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:24:14,973 Stage-1 map = 69%,  reduce = 17%, Cumulative CPU 182.57 sec
2013-05-05 11:24:15,977 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
2013-05-05 11:24:16,981 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
2013-05-05 11:24:17,985 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
2013-05-05 11:24:18,988 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
2013-05-05 11:24:19,992 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
2013-05-05 11:24:20,995 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
2013-05-05 11:24:21,998 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
2013-05-05 11:24:23,001 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
2013-05-05 11:24:24,008 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
2013-05-05 11:24:25,012 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
2013-05-05 11:24:26,016 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
2013-05-05 11:24:27,024 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
2013-05-05 11:24:28,028 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 225.88 sec
2013-05-05 11:24:29,034 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 225.88 sec
2013-05-05 11:24:30,037 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 225.88 sec
2013-05-05 11:24:31,043 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 225.88 sec
2013-05-05 11:24:32,046 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 225.88 sec
2013-05-05 11:24:33,049 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 225.88 sec
2013-05-05 11:24:34,055 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 227.04 sec
2013-05-05 11:24:35,058 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 227.04 sec
2013-05-05 11:24:36,061 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 227.04 sec
2013-05-05 11:24:37,065 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 227.04 sec
2013-05-05 11:24:38,068 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 227.04 sec
2013-05-05 11:24:39,072 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 227.04 sec
2013-05-05 11:24:40,076 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 227.04 sec
MapReduce Total cumulative CPU time: 3 minutes 47 seconds 40 msec
Ended Job = job_201305041640_0008
MapReduce Jobs Launched: 
Job 0: Map: 4  Reduce: 1   Cumulative CPU: 227.04 sec   HDFS Read: 845494724 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 3 minutes 47 seconds 40 msec
OK
_c0
4999999    (a corpus of roughly five million filtered tweets)
Time taken: 266.063 seconds
hive (dataprocess)> select text,created_at from tweets where user_id = 1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201305041640_0009, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201305041640_0009
Kill Command = /home/landen/UntarFile/hadoop-1.0.4/libexec/../bin/hadoop job  -kill job_201305041640_0009
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 0
2013-05-05 20:45:19,007 Stage-1 map = 0%,  reduce = 0%
2013-05-05 20:45:48,825 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:45:49,836 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:45:50,838 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:45:51,841 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:45:52,844 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:45:56,152 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:45:57,158 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:45:58,161 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:45:59,163 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:00,166 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:01,169 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:02,200 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:03,203 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:04,206 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:05,208 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:06,212 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:07,215 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:08,219 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:09,225 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:10,227 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:11,231 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:12,234 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:13,237 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:14,239 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:15,242 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:16,244 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:17,247 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:18,250 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:19,256 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:20,260 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:21,263 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:22,266 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:23,277 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:24,279 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:25,282 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:26,286 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:27,290 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:28,292 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:29,296 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:30,298 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:31,301 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:32,303 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:33,306 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
2013-05-05 20:46:34,610 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:35,688 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:36,693 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:37,696 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:38,698 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:39,701 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:40,703 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:41,707 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:42,710 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:43,713 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:44,715 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:45,718 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:46,721 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:47,723 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:48,728 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:49,732 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:50,764 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:51,820 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:52,823 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:53,879 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:54,998 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:56,161 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:57,164 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:58,167 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:46:59,262 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:47:00,811 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
2013-05-05 20:47:02,161 Stage-1 map = 75%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:03,164 Stage-1 map = 75%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:04,166 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:05,169 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:08,703 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:09,710 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:10,713 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:11,715 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:12,718 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:13,723 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:14,726 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:15,729 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:16,732 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:17,737 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:18,739 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:19,745 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:20,749 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:21,754 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:22,757 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:23,760 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:24,763 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:25,766 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:26,770 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:27,778 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:28,781 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:29,784 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:30,788 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:31,791 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:32,795 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:33,798 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 111.27 sec
2013-05-05 20:47:34,964 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 155.37 sec
2013-05-05 20:47:35,967 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 155.37 sec
2013-05-05 20:47:37,161 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 155.37 sec
2013-05-05 20:47:38,173 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 155.37 sec
2013-05-05 20:47:39,176 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 155.37 sec
2013-05-05 20:47:40,244 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 158.69 sec
2013-05-05 20:47:41,247 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 158.69 sec
2013-05-05 20:47:42,249 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 158.69 sec
2013-05-05 20:47:43,319 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 158.69 sec
2013-05-05 20:47:44,322 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 158.69 sec
2013-05-05 20:47:45,325 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 158.69 sec
2013-05-05 20:47:46,327 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 158.69 sec
2013-05-05 20:47:47,330 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 158.69 sec
2013-05-05 20:47:48,333 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 158.69 sec
2013-05-05 20:47:49,644 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 158.69 sec
MapReduce Total cumulative CPU time: 2 minutes 38 seconds 690 msec
Ended Job = job_201305041640_0009
MapReduce Jobs Launched: 
Job 0: Map: 4   Cumulative CPU: 158.69 sec   HDFS Read: 845494724 HDFS Write: 138 SUCCESS
Total MapReduce CPU Time Spent: 2 minutes 38 seconds 690 msec
OK
text    created_at
@shizhao,我此前一直用dh的,你問問誰用bluehost借用一下就可以了。一般的小站流量根本沒多大的..    1177292576
Time taken: 172.857 seconds
hive (dataprocess)> 

Conclusion

The SerDe interface is extremely powerful for dealing with data with a complex schema. By utilizing SerDes, any dataset can be made queryable through Hive.

References:

(How-to: Use a SerDe in Apache Hive) http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/

(Overview of SerDe in Hive) http://blog.csdn.net/dajuezhao/article/details/5753791

(Big Data Solution Design) http://www.infoq.com/cn/articles/BigDataBlueprint

(Storing JSON-format data in MongoDB) http://www.myexception.cn/database/502613.html