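SQL queries submitted to Hive are ultimately converted into MapReduce jobs for execution.
How does that conversion happen?
It proceeds in the following steps:
1. The lexer and parser generated by ANTLR turn the SQL text into an Abstract Syntax Tree (AST).
2. The AST is analyzed into a number of query blocks (QB). QB and QBParseInfo are simply beans in the Hive source code, and they are the key classes in SQL analysis.
The key members of QB are: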
public class QB {
  private static final Log LOG = LogFactory.getLog("hive.ql.parse.QB");
  private final int numJoins = 0;
  private final int numGbys = 0;
  private int numSels = 0;
  private int numSelDi = 0;
  private HashMap<String, String> aliasToTabs;
  private HashMap<String, QBExpr> aliasToSubq;
  private HashMap<String, Map<String, String>> aliasToProps;
  private List<String> aliases;
  private QBParseInfo qbp;
  private QBMetaData qbm;
  private QBJoinTree qbjoin;
  private String id;
  private boolean isQuery;
  private boolean isAnalyzeRewrite;
  private CreateTableDesc tblDesc = null; // table descriptor of the final
  private CreateTableDesc directoryDesc = null ;
  private List<Path> encryptedTargetTablePaths;
  ......
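aliasToSubq (a field of class QB) stores the query blocks of subqueries, wrapped in QBExpr objects; its keys are the subquery aliases.
qbp, the QBParseInfo, stores the AST subtree of each operation within one basic SQL unit.
qbm stores the metadata of every input table, such as the table's path on HDFS and the file format of the table data.
The main members of QBParseInfo are: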
public class QBParseInfo {
  private final boolean isSubQ;
  private final String alias;
  private ASTNode joinExpr;
  private ASTNode hints;
  private final HashMap<String, ASTNode> aliasToSrc;
  /**
   * insclause-0 -> TOK_TAB ASTNode
   */
  private final HashMap<String, ASTNode> nameToDest;
  /**
   * For 'insert into FOO(x,y) select ...' this stores the
   * insclause-0 -> x,y mapping
   */
  private final Map<String, List<String>> nameToDestSchema;
  private final HashMap<String, TableSample> nameToSample;
  private final Map<ASTNode, String> exprToColumnAlias;
  private final Map<String, ASTNode> destToSelExpr;
  private final HashMap<String, ASTNode> destToWhereExpr;
  private final HashMap<String, ASTNode> destToGroupby;
  private final Set<String> destRollups;
  private final Set<String> destCubes;
  private final Set<String> destGroupingSets;
  private final Map<String, ASTNode> destToHaving;
  private final HashSet<String> insertIntoTables;
  ......
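QBParseInfo#joinExpr holds the TOK_JOIN node. QB#qbjoin, of type QBJoinTree, is a structured representation of the join syntax tree.
QBParseInfo#nameToDest is a HashMap that holds the outputs of a query unit. Its keys have the form insclause-i (Hive supports multi-insert statements, so one query unit can have several outputs), and each value is the corresponding ASTNode from the TOK_DESTINATION subtree (a TOK_TAB node for a table target, according to the comment in the class).
The remaining HashMap fields of QBParseInfo map each output to the ASTNode of the corresponding operation (select, where, group by, having, and so on).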
3. The SemanticAnalyzer class translates the QBs into a tree of Operators.
4. The operator tree is optimized.
5. The optimized operator tree is translated into MapReduce tasks.
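To make the bookkeeping described in step 2 more concrete, here is a minimal sketch of how query blocks nest and how outputs are keyed by insclause-i. MiniQB and MiniParseInfo are hypothetical stand-ins written for this illustration, not Hive's classes, and AST fragments are represented as plain strings rather than ASTNode objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for Hive's QB and QBParseInfo.
class QbSketch {

    /** Per-query-block parse info: one entry per output clause. */
    static class MiniParseInfo {
        // "insclause-i" -> destination of the i-th INSERT branch
        final Map<String, String> nameToDest = new LinkedHashMap<>();
        // "insclause-i" -> WHERE expression of that branch (as text)
        final Map<String, String> destToWhereExpr = new LinkedHashMap<>();

        /** Registers one INSERT branch and returns its generated clause name. */
        String addInsertClause(String dest, String whereExpr) {
            String name = "insclause-" + nameToDest.size();
            nameToDest.put(name, dest);
            if (whereExpr != null) {
                destToWhereExpr.put(name, whereExpr);
            }
            return name;
        }
    }

    /** One query block: table aliases, nested subqueries, and parse info. */
    static class MiniQB {
        final String id;
        final Map<String, String> aliasToTabs = new LinkedHashMap<>();
        final Map<String, MiniQB> aliasToSubq = new LinkedHashMap<>();
        final MiniParseInfo qbp = new MiniParseInfo();

        MiniQB(String id) {
            this.id = id;
        }
    }

    /** Collects the ids of a block and of all nested subquery blocks, depth first. */
    static List<String> collectIds(MiniQB qb) {
        List<String> ids = new ArrayList<>();
        ids.add(qb.id);
        for (MiniQB sub : qb.aliasToSubq.values()) {
            ids.addAll(collectIds(sub));
        }
        return ids;
    }

    /**
     * Models the multi-insert statement:
     *   FROM (SELECT * FROM src) s
     *   INSERT OVERWRITE TABLE t1 SELECT * WHERE s.key < 10
     *   INSERT OVERWRITE TABLE t2 SELECT * WHERE s.key >= 10
     */
    static MiniQB buildExample() {
        MiniQB inner = new MiniQB("s");
        inner.aliasToTabs.put("src", "src");
        // Placeholder for the implicit temporary destination of a plain SELECT.
        inner.qbp.addInsertClause("tmp-file", null);

        MiniQB outer = new MiniQB("outer");
        outer.aliasToSubq.put("s", inner);          // key is the subquery alias
        outer.qbp.addInsertClause("t1", "s.key < 10");
        outer.qbp.addInsertClause("t2", "s.key >= 10");
        return outer;
    }

    public static void main(String[] args) {
        MiniQB qb = buildExample();
        System.out.println(collectIds(qb));
        System.out.println(qb.qbp.nameToDest);
        System.out.println(qb.qbp.destToWhereExpr);
    }
}
```

The sketch keys every per-output map by the same generated clause name, which is what lets a single query block describe several INSERT branches side by side.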
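ANTLR, used in step 1, is an open-source parser generator: from a grammar file it generates both the lexer and the parser. The AST is not a special kind of tree either; it is just an ordinary tree structure.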
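As a rough illustration of how ordinary the tree is, the sketch below builds one by hand for SELECT a FROM t WHERE a > 1 and prints it in a LISP-like form. The Node class is a hypothetical stand-in, not Hive's ASTNode, and the token layout only approximates what Hive's parser would produce for this query.

```java
import java.util.ArrayList;
import java.util.List;

// A hand-built tree in the spirit of an AST; this is not Hive's ASTNode class.
class AstSketch {

    static class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text, Node... kids) {
            this.text = text;
            for (Node k : kids) {
                children.add(k);
            }
        }

        /** LISP-style rendering: a leaf prints its text, an inner node prints (text child ...). */
        String toStringTree() {
            if (children.isEmpty()) {
                return text;
            }
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) {
                sb.append(' ').append(c.toStringTree());
            }
            return sb.append(')').toString();
        }

        /** Depth-first search for the first node with the given text, or null if absent. */
        Node find(String wanted) {
            if (text.equals(wanted)) {
                return this;
            }
            for (Node c : children) {
                Node hit = c.find(wanted);
                if (hit != null) {
                    return hit;
                }
            }
            return null;
        }
    }

    /** Approximate tree for: SELECT a FROM t WHERE a > 1 */
    static Node buildExample() {
        return new Node("TOK_QUERY",
            new Node("TOK_FROM",
                new Node("TOK_TABREF", new Node("TOK_TABNAME", new Node("t")))),
            new Node("TOK_INSERT",
                new Node("TOK_DESTINATION", new Node("TOK_DIR", new Node("TOK_TMP_FILE"))),
                new Node("TOK_SELECT",
                    new Node("TOK_SELEXPR", new Node("TOK_TABLE_OR_COL", new Node("a")))),
                new Node("TOK_WHERE",
                    new Node(">", new Node("TOK_TABLE_OR_COL", new Node("a")), new Node("1")))));
    }

    public static void main(String[] args) {
        Node root = buildExample();
        System.out.println(root.toStringTree());
        // Picking out the WHERE subtree is a plain tree walk.
        System.out.println(root.find("TOK_WHERE").toStringTree());
    }
}
```

Pulling the TOK_WHERE subtree out of the tree and filing it under an output name is the kind of step that fills maps such as destToWhereExpr.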