Introduction
How the PostgreSQL query optimizer processes a query
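- Syntax analysis: generates the query tree
- Semantic check: checks the semantics expressed by the SQL statement
- Query optimization
  - View rewriting
  - Logical optimization: subquery optimization, condition simplification, equivalent predicate rewriting, and join elimination, producing the logical plan
  - Physical optimization: cost-based optimization, producing the physical plan. PostgreSQL mainly uses dynamic programming and a genetic algorithm
  - Non-SPJ optimization: mainly targets grouping, sorting, deduplication, and similar operations
- Query plan execution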
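In PostgreSQL, the query tree is not strictly tree-shaped: the relations are flattened into a list, because at this stage PostgreSQL does not yet know how the tables will be joined.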
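The flattening can be pictured with a small sketch. The names here (`ToyRte`, `toy_rtable`, `toy_rt_fetch`) are hypothetical and are not PostgreSQL's own; the sketch only assumes that the relations of a FROM clause sit in one flat list and are referred to by a 1-based index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a range table entry. */
typedef struct ToyRte
{
    const char *relname;    /* relation name as written in FROM */
} ToyRte;

/* Flat range table for: SELECT ... FROM a, b, c WHERE ...
 * No join order is recorded here; that is decided later. */
static const ToyRte toy_rtable[] = { {"a"}, {"b"}, {"c"} };
static const size_t toy_rtable_len =
    sizeof(toy_rtable) / sizeof(toy_rtable[0]);

/* Look up an entry by its 1-based range table index. */
const ToyRte *
toy_rt_fetch(size_t rtindex)
{
    assert(rtindex >= 1 && rtindex <= toy_rtable_len);
    return &toy_rtable[rtindex - 1];
}
```

Other parts of the query can then name a relation simply by its index, which is how the join tree refers back into the list.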
Important data structures
Query tree
typedef struct Query
{
// the node type and the flags telling whether the related clauses exist are omitted above
List *cteList; /* WITH clause list */
List *rtable; /* list of range table entries */
FromExpr *jointree; /* table join tree (FROM and WHERE clauses) */
List *targetList; /* target list (of TargetEntry) */
List *returningList; /* return-values list (of TargetEntry) */
List *groupClause; /* a list of SortGroupClause's */
Node *havingQual; /* qualifications applied to groups */
List *windowClause; /* list of window function clauses */
List *distinctClause; /* a list of SortGroupClause's */
List *sortClause; /* a list of SortGroupClause's */
Node *limitOffset; /* OFFSET part of the LIMIT clause */
Node *limitCount; /* row count of the LIMIT clause */
Node *setOperations; /* set-operation tree if this query combines several SELECTs with UNION/INTERSECT/EXCEPT */
} Query;
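Range table (before optimization)
A range table entry represents an object being queried: a table, a subquery in the FROM clause, or the result of a join.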
typedef struct RangeTblEntry
{
// plain table
Oid relid; /* OID of the relation */
char relkind; /* relation kind (see pg_class.relkind) */
struct TableSampleClause *tablesample; /* sampling info, or NULL */
// subquery
Query *subquery; /* the sub-query */
bool security_barrier; /* is from security_barrier view? For a subquery expanded from such a view, PostgreSQL holds back optimization */
// join
JoinType jointype; /* type of join */
List *joinaliasvars; /* list of alias-var expansions */
} RangeTblEntry;
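Relation optimization info (during optimization)
RelOptInfo corresponds to two members of the PlannerInfo struct (simple_rel_array and join_rel_list). It is the object the optimization phase operates on and carries the information needed for query optimization.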
typedef struct RelOptInfo
{
/* all relations included in this RelOptInfo */
Relids relids; /* set of base relids (rangetable indexes) */
/* estimated number of result rows */
double rows;
/* materialization information */
List *pathlist; /* all candidate paths for this relation */
List *ppilist; /* ParamPathInfos used in pathlist */
List *partial_pathlist; /* partial Paths */
/* a local optimum is not necessarily optimal later on, so three candidate best paths are kept for the level above */
struct Path *cheapest_startup_path;
struct Path *cheapest_total_path;
struct Path *cheapest_unique_path;
List *cheapest_parameterized_paths;
// this relation is either a single table or a join
/* used by various scans and joins: */
List *baserestrictinfo; /* RestrictInfo structures (if base
* rel) */
QualCost baserestrictcost; /* cost of evaluating the above */
List *joininfo; /* RestrictInfo structures for join clauses
* involving this rel */
bool has_eclass_joins; /* T means joininfo is incomplete */
} RelOptInfo;
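Why a relation remembers more than one "cheapest" path can be shown with a minimal sketch. `ToyPath`, `ToyRel`, and `toy_set_cheapest` are hypothetical simplifications, not the real planner code; the sketch assumes each path is described only by its startup and total cost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified path: only the two cost figures. */
typedef struct ToyPath
{
    double startup_cost;    /* cost before the first row is returned */
    double total_cost;      /* cost to return all rows */
} ToyPath;

/* Simplified relation: candidate paths plus the remembered winners. */
typedef struct ToyRel
{
    const ToyPath *paths;
    size_t npaths;
    const ToyPath *cheapest_startup_path;
    const ToyPath *cheapest_total_path;
} ToyRel;

/* Scan the candidates once and remember the cheapest by each metric. */
void
toy_set_cheapest(ToyRel *rel)
{
    assert(rel->npaths > 0);
    rel->cheapest_startup_path = &rel->paths[0];
    rel->cheapest_total_path = &rel->paths[0];
    for (size_t i = 1; i < rel->npaths; i++)
    {
        const ToyPath *p = &rel->paths[i];

        if (p->startup_cost < rel->cheapest_startup_path->startup_cost)
            rel->cheapest_startup_path = p;
        if (p->total_cost < rel->cheapest_total_path->total_cost)
            rel->cheapest_total_path = p;
    }
}
```

With one path costing (0, 100) and another costing (50, 60), the first wins on startup cost and the second on total cost. A query that fetches only a few rows may prefer the first, which is why both are kept for the level above.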
Planner information
Information that is global to planning one query is stored in the PlannerInfo struct.
typedef struct PlannerInfo
{
Query *parse; /* the query tree being planned */
PlannerGlobal *glob; /* global info for current planner run */
Index query_level; /* nesting level of this query */
struct PlannerInfo *parent_root; /* NULL at outermost Query */
struct RelOptInfo **simple_rel_array; /* RelOptInfos of all base relations */
int simple_rel_array_size; /* allocated size of array */
RangeTblEntry **simple_rte_array; /* rangetable as an array */
// new relations produced as joins are considered
List *join_rel_list; /* list of join-relation RelOptInfos */
struct HTAB *join_rel_hash; /* optional hashtable for join relations */
List **join_rel_level; /* lists of join-relation RelOptInfos, one per join level */
int join_cur_level; /* index of list being extended */
} PlannerInfo;
Plan node
Represents the physical plan (Plan) generated from the best path.
typedef struct Plan
{
/*
* estimated execution costs for plan (see costsize.c for more info)
*/
Cost startup_cost; /* cost expended before fetching any tuples */
Cost total_cost; /* total cost (assuming all tuples fetched) */
/*
* estimated number of tuples and tuple width
*/
double plan_rows;
int plan_width;
/*
* Common structural data for all Plan types.
*/
int plan_node_id; /* unique across entire final plan tree */
List *targetlist; /* target list to be computed at this node */
List *qual; /* implicitly-ANDed qual conditions */
struct Plan *lefttree; /* input plan tree(s) */
struct Plan *righttree;
List *initPlan; /* Init Plan nodes (un-correlated expr
* subselects) */
} Plan;
PlannedStmt
The optimizer's result: it holds the query execution plan and the range table information.
SelectStmt
The result of syntax analysis.
typedef struct SelectStmt
{
List *distinctClause; /* DISTINCT clause */
IntoClause *intoClause; /* target for SELECT INTO */
List *targetList; /* projection (target) list */
List *fromClause; /* FROM clause, including joins */
Node *whereClause; /* WHERE clause */
List *groupClause; /* GROUP BY clauses */
Node *havingClause; /* HAVING conditional-expression */
List *windowClause; /* WINDOW window_name AS (...), ... */
List *sortClause; /* sort clause (a list of SortBy's) */
Node *limitOffset; /* # of result tuples to skip */
Node *limitCount; /* # of result tuples to return */
List *lockingClause; /* FOR UPDATE (list of LockingClause's) */
} SelectStmt;
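Relationships between the structs
- PlannerInfo is the main product of logical optimization; it holds the query tree (Query), the relation optimization info, and the restriction conditions
- Paths are the main product of physical optimization; they carry sort keys and join nodes
- PlannerInfo and paths are intertwined
- The execution plan (Plan) is built from the cheapest of all the paths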
Code entry points
planner: the main entry function
PlannedStmt *
planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
PlannedStmt *result;
if (planner_hook)
result = (*planner_hook) (parse, cursorOptions, boundParams);
else
result = standard_planner(parse, cursorOptions, boundParams);
return result;
}
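The hook dispatch in planner is a plain function-pointer pattern. The following is a minimal sketch of that pattern only: every `toy_` name is hypothetical, and an `int` stands in for both the Query and the PlannedStmt.

```c
#include <stddef.h>

/* Hypothetical stand-in: the real planner takes a Query and
 * returns a PlannedStmt; here an int plays both roles. */
typedef int (*toy_planner_hook_type) (int query);

/* NULL means "no hook installed": use the standard planner. */
toy_planner_hook_type toy_planner_hook = NULL;

/* Counts how often the example hook ran. */
int toy_hook_calls = 0;

int
toy_standard_planner(int query)
{
    return query * 2;       /* placeholder for real planning work */
}

/* Main entry point: call the hook if one is installed. */
int
toy_planner(int query)
{
    if (toy_planner_hook)
        return (*toy_planner_hook) (query);
    return toy_standard_planner(query);
}

/* Example hook: do some extra work, then delegate. */
int
toy_counting_hook(int query)
{
    toy_hook_calls++;
    return toy_standard_planner(query);
}
```

Setting `toy_planner_hook = toy_counting_hook` routes every call through the hook. A hook that delegates to the standard planner can observe planning without replacing it.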
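Called by the main entry function
standard_planner: the standard query optimizer entry point
standard_planner is only the shell of the query optimizer: it calls subquery_planner to do the actual optimization and set_plan_references to do the final cleanup.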
PlannedStmt *
standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
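PlannedStmt *result; // the result
PlannerGlobal *glob; // global information shared by all subqueries during planning
double tuple_fraction; // fraction of the result tuples expected to be fetched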
PlannerInfo *root;
RelOptInfo *final_rel;
Path *best_path;
Plan *top_plan;
ListCell *lp,
*lr;
···
/* primary planning entry point (may recurse for subqueries) */
root = subquery_planner(glob, parse, NULL,
false, tuple_fraction);
/* Select best Path and turn it into a Plan */
final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);
top_plan = create_plan(root, best_path);
···
/* final cleanup of the plan */
Assert(glob->finalrtable == NIL);
Assert(glob->finalrowmarks == NIL);
Assert(glob->resultRelations == NIL);
top_plan = set_plan_references(root, top_plan);
/* ... and the subplans (both regular subplans and initplans) */
Assert(list_length(glob->subplans) == list_length(glob->subroots));
forboth(lp, glob->subplans, lr, glob->subroots)
{
Plan *subplan = (Plan *) lfirst(lp);
PlannerInfo *subroot = (PlannerInfo *) lfirst(lr);
lfirst(lp) = set_plan_references(subroot, subplan);
}
···
/* build the PlannedStmt result */
result = makeNode(PlannedStmt);
result->commandType = parse->commandType;
result->queryId = parse->queryId; // taken from the parse result
result->planTree = top_plan; // the query plan
result->rtable = glob->finalrtable; // the range table
result->resultRelations = glob->resultRelations;
return result;
}
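- subquery_planner returns root (PlannerInfo *), which holds the result of logical and physical optimization
- create_plan generates the physical execution plan (Plan) from the best path and the PlannerInfo
- set_plan_references adjusts parts of the execution plan and cleans it up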
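How tuple_fraction can influence the choice of the best path is sketched below. This is a simplified illustration, not the real get_cheapest_fractional_path: the `toy_` names are hypothetical, and the sketch assumes that the cost of fetching a fraction of the rows is interpolated linearly between startup and total cost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified path: only the two cost figures. */
typedef struct ToyPath
{
    double startup_cost;
    double total_cost;
} ToyPath;

/* Assumed cost of fetching only `fraction` of the rows. */
double
toy_fractional_cost(const ToyPath *p, double fraction)
{
    return p->startup_cost +
        fraction * (p->total_cost - p->startup_cost);
}

/* Pick the path that is cheapest for the given fraction.
 * A fraction outside (0, 1) is treated as "fetch everything". */
const ToyPath *
toy_cheapest_fractional_path(const ToyPath *paths, size_t npaths,
                             double fraction)
{
    const ToyPath *best = &paths[0];

    assert(npaths > 0);
    if (fraction <= 0.0 || fraction >= 1.0)
        fraction = 1.0;
    for (size_t i = 1; i < npaths; i++)
    {
        if (toy_fractional_cost(&paths[i], fraction) <
            toy_fractional_cost(best, fraction))
            best = &paths[i];
    }
    return best;
}
```

Given paths costing (0, 100) and (50, 60), a fraction of 0.1 selects the first (10 versus 51), while fetching everything selects the second (100 versus 60).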
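subquery_planner: the function that generates the execution plan for a (sub)query
subquery_planner works in two steps: logical optimization first, then physical optimization.
// called with glob, parse, and tuple_fraction; parent_root is NULL at the outermost level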
PlannerInfo *
subquery_planner(PlannerGlobal *glob, Query *parse,
PlannerInfo *parent_root,
bool hasRecursion, double tuple_fraction)
{
PlannerInfo *root;
List *newWithCheckOptions;
List *newHaving;
bool hasOuterJoins;
RelOptInfo *final_rel;
ListCell *l;
/* create a PlannerInfo for the current subquery */
root = makeNode(PlannerInfo);
root->parse = parse;
root->glob = glob;
root->query_level = parent_root ? parent_root->query_level + 1 : 1;
root->parent_root = parent_root;
root->hasRecursion = hasRecursion;
if (hasRecursion)
root->wt_param_id = SS_assign_special_param(root);
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
/*
* Look for ANY and EXISTS SubLinks in WHERE and JOIN/ON clauses, and try
* to transform them into joins. Note that this step does not descend
* into subqueries; if we pull up any subqueries below, their SubLinks are
* processed just before pulling them up.
* Sublinks: ANY and EXISTS subqueries that appear in WHERE and JOIN clauses
*/
if (parse->hasSubLinks)
pull_up_sublinks(root);
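// pull up subqueries
pull_up_subqueries(root);
// flatten simple UNION ALL subqueries
if (parse->setOperations)
flatten_simple_union_all(root);
// preprocess row marks, then expand inherited tables (done after subquery pull-up)
preprocess_rowmarks(root);
expand_inherited_tables(root);
// simplify conditions: preprocess_expression() is called on the target list, the quals, and the other expressions
// merge HAVING clauses into WHERE; a clause that contains aggregates, volatile functions, or subqueries cannot be merged
// eliminate outer joins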
if (hasOuterJoins)
reduce_outer_joins(root);
/*
* Do the main planning. If we have an inherited target relation, that
* needs special processing, else go straight to grouping_planner.
*/
if (parse->resultRelation &&
rt_fetch(parse->resultRelation, parse->rtable)->inh)
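// physical optimization for a result relation with inheritance
inheritance_planner(root);
else
// physical optimization
grouping_planner(root, false, tuple_fraction);
// make sure the cheapest path of the final relation has been identified
final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
set_cheapest(final_rel);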
return root;
}
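The idea behind eliminating an outer join can be reduced to one rule, sketched here with hypothetical names (`ToyJoinType`, `toy_reduce_outer_join`). The sketch assumes the caller has already determined whether a strict condition above the join, for example `b.x = 1`, rejects the NULL-extended rows of the nullable side.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of join types. */
typedef enum ToyJoinType
{
    TOY_JOIN_INNER,
    TOY_JOIN_LEFT
} ToyJoinType;

/*
 * A LEFT JOIN pads unmatched left rows with NULLs on the right side.
 * If a strict condition above the join references the right side,
 * those padded rows can never pass it, so the join behaves like an
 * INNER JOIN and can be planned as one.
 */
ToyJoinType
toy_reduce_outer_join(ToyJoinType jointype,
                      bool strict_qual_on_nullable_side)
{
    if (jointype == TOY_JOIN_LEFT && strict_qual_on_nullable_side)
        return TOY_JOIN_INNER;
    return jointype;
}
```

For `SELECT * FROM a LEFT JOIN b ON a.id = b.id WHERE b.x = 1`, the WHERE clause is strict on `b`, so the sketch returns `TOY_JOIN_INNER`. An inner join gives the planner more freedom in choosing the join order.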
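- Logical optimization
  - Process CTE expressions (SS_process_ctes)
  - Pull up sublinks
  - Pull up subqueries
  - UNION ALL handling: flatten_simple_union_all
  - FOR UPDATE (row lock) handling: preprocess_rowmarks
  - Inherited table handling (expand_inherited_tables)
  - Process the target list (preprocess_expression)
  - Process withCheckOptions: preprocess_expression
  - Process the RETURNING expressions, WINDOW clauses, and LIMIT/OFFSET clauses: preprocess_expression
  - Merge HAVING into the WHERE clause
  - Eliminate outer joins
- Physical optimization: generate the three cheapest paths for this query level's PlannerInfo and return them to the upper level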
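The rule for merging HAVING into WHERE can be sketched as a simple filter. The types and functions below (`ToyClause`, `toy_can_move_to_where`, `toy_split_having`) are hypothetical; the sketch assumes each clause has already been annotated with the three properties that block the move.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, pre-analysed HAVING clause. */
typedef struct ToyClause
{
    const char *text;
    bool has_aggregate;         /* e.g. sum(x) > 5 */
    bool has_volatile_func;     /* e.g. random() < 0.5 */
    bool has_subquery;
} ToyClause;

/* A clause may move to WHERE only if none of the three applies. */
bool
toy_can_move_to_where(const ToyClause *c)
{
    return !c->has_aggregate && !c->has_volatile_func &&
        !c->has_subquery;
}

/*
 * Split n HAVING clauses: movable ones go to where_out, the rest stay
 * in having_out. Both output arrays need room for n pointers.
 * Returns the number of clauses moved to WHERE.
 */
size_t
toy_split_having(const ToyClause *having, size_t n,
                 const ToyClause **where_out,
                 const ToyClause **having_out)
{
    size_t nwhere = 0;
    size_t nhaving = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (toy_can_move_to_where(&having[i]))
            where_out[nwhere++] = &having[i];
        else
            having_out[nhaving++] = &having[i];
    }
    return nwhere;
}
```

A condition such as `dept > 10` on a grouping column can be evaluated before grouping, which shrinks the input of the aggregation. A condition on `sum(x)` only makes sense after grouping and must stay in HAVING.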
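Overall flow
- subquery_planner
  - Process CTE expressions
  - Pull up sublinks (eliminating IN, SOME, EXISTS)
  - Pull up subqueries
  - Preprocess expressions (normalize AND/OR, evaluate subexpressions whose result is obvious, handle sublinks that could not be pulled up)
  - Eliminate outer joins
  - grouping_planner
    - Handle set operations (generating subqueries)
    - Handle non-set queries: adjust the order of the items in the ORDER BY, GROUP BY, and target list clauses
    - query_planner
      - Build the RelOptInfo of each base table
      - Selection pushdown
      - Projection pushdown
      - Derive implied expressions
      - Generate pathkeys
      - make_one_rel (selectivity is computed from data histograms)
        - Find the best way to scan each single table
        - Find the best way to join two tables (checking whether it is a special join)
        - Build multi-table joins with dynamic programming or the genetic algorithm
    - Take the cheapest path, then add the remaining clauses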
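The dynamic programming step for multi-table joins can be illustrated with a toy search over subsets of relations. This is a sketch under strong assumptions, not PostgreSQL's algorithm or cost model: relation sets are bitmasks, the cost of a join is taken to be its estimated output row count, and `sel[i][j]` is an assumed selectivity between relations i and j (1.0 when no join condition links them). All names are hypothetical.

```c
#include <assert.h>
#include <float.h>

#define TOY_MAX_RELS 8

/* Estimated rows of joining all relations in `set`: the product of
 * the base row counts and of the selectivity of each pair in the set. */
static double
toy_set_rows(unsigned set, int n, const double *rows,
             double sel[][TOY_MAX_RELS])
{
    double r = 1.0;

    for (int i = 0; i < n; i++)
    {
        if (!(set & (1u << i)))
            continue;
        r *= rows[i];
        for (int j = i + 1; j < n; j++)
            if (set & (1u << j))
                r *= sel[i][j];
    }
    return r;
}

/* Minimum total cost of joining all n relations in any order. */
double
toy_best_join_cost(int n, const double *rows,
                   double sel[][TOY_MAX_RELS])
{
    double cost[1u << TOY_MAX_RELS];
    unsigned full;

    assert(n >= 1 && n <= TOY_MAX_RELS);
    full = (1u << n) - 1;
    for (unsigned set = 1; set <= full; set++)
    {
        if ((set & (set - 1)) == 0)
        {
            cost[set] = 0.0;    /* one base relation: nothing to join */
            continue;
        }
        cost[set] = DBL_MAX;
        /* try every split of `set` into two non-empty parts; both
         * parts are smaller numbers, so their costs are already known */
        for (unsigned left = (set - 1) & set; left != 0;
             left = (left - 1) & set)
        {
            unsigned right = set & ~left;
            double c = cost[left] + cost[right] +
                toy_set_rows(set, n, rows, sel);

            if (c < cost[set])
                cost[set] = c;
        }
    }
    return cost[full];
}
```

Each larger set reuses the best cost already found for its parts, which is the core of the dynamic programming approach. The number of subsets doubles with every added relation, which is the usual motivation for falling back to a heuristic such as the genetic algorithm when many tables are joined.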
