An Introduction to PostgreSQL Query Optimization


Introduction

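How the PostgreSQL query optimizer runs

  1. Syntax analysis: build the query tree
  2. Semantic checking: check the semantics expressed by the SQL
  3. Query optimization
    1. View rewriting
    2. Logical optimization: subquery optimization, condition simplification, equivalent predicate rewriting, and join elimination, producing a logical plan
    3. Physical optimization: cost-based optimization, producing a physical plan. PostgreSQL mainly uses dynamic programming and a genetic algorithm
    4. Non-SPJ (select-project-join) optimization: mainly for grouping, sorting, duplicate elimination, and similar operations
  4. Query plan execution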

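In PostgreSQL, the parse tree is not truly tree-shaped: the relations are flattened into a list, because at this stage PostgreSQL does not yet know how the tables will be joined.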
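The flattening can be illustrated with a minimal sketch. This is not PostgreSQL code: ToyRangeTable, toy_add_rte, and toy_rt_fetch are hypothetical names invented here. The only point is that relations sit in a flat list and everything else refers to them by a 1-based index, without committing to any join order.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_RTE 8

/* Hypothetical stand-in for a range table: a flat array of relation names. */
typedef struct ToyRangeTable
{
    const char *names[TOY_MAX_RTE];
    int         length;
} ToyRangeTable;

/* Append a relation and return its 1-based range table index. */
int
toy_add_rte(ToyRangeTable *rt, const char *name)
{
    assert(rt->length < TOY_MAX_RTE);
    rt->names[rt->length] = name;
    rt->length++;
    return rt->length;
}

/* Look a relation up by its 1-based index, the way an index-based reference would. */
const char *
toy_rt_fetch(const ToyRangeTable *rt, int rtindex)
{
    assert(rtindex >= 1 && rtindex <= rt->length);
    return rt->names[rtindex - 1];
}
```

A FROM clause with two tables would simply call toy_add_rte twice; nothing in the structure says which table is joined first.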

Important data structures

The query tree

typedef struct Query
{
    /* omitted above: the node type and the flags saying which clauses are present */
    List	   *cteList;		/* WITH clause list */
    List	   *rtable;			/* list of range table entries */
    FromExpr   *jointree;       /* table join tree (FROM and WHERE clauses) */
    List	   *targetList;		/* target list (of TargetEntry) */
    List	   *returningList;	/* return-values list (of TargetEntry) */
    List	   *groupClause;	/* a list of SortGroupClause's */
    Node	   *havingQual;		/* qualifications applied to groups */
    List	   *windowClause;	/* list of window clauses */
    List	   *distinctClause; /* a list of SortGroupClause's */
    List	   *sortClause;		/* a list of SortGroupClause's */
    
    Node	   *limitOffset;	/* OFFSET part of LIMIT: # of result tuples to skip */
    Node	   *limitCount;		/* LIMIT count: # of result tuples to return */
    Node	   *setOperations;	/* set-operation tree if this is a UNION/INTERSECT/EXCEPT query */
} Query;

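Range table (before optimization)

A range table entry represents an object being queried: a plain table, a subquery in the FROM clause, or the result of a join.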

typedef struct RangeTblEntry
{
    /* plain table */
    Oid			relid;			/* OID of the relation */
    char		relkind;		/* relation kind (see pg_class.relkind) */
    struct TableSampleClause *tablesample;		/* sampling info, or NULL */
    
    /* subquery */
    Query	   *subquery;		/* the sub-query */
    bool		security_barrier;		/* is from security_barrier view? if so, the planner limits how far it optimizes the expanded subquery */
    
    /* join */
    JoinType	jointype;		/* type of join */
    List	   *joinaliasvars;	/* list of alias-var expansions */
} RangeTblEntry;

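Relation optimization info (during optimization)

RelOptInfo corresponds to two members of the PlannerInfo struct (simple_rel_array and join_rel_list). It is the object the optimization phase works on, and it carries the information needed for query optimization.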

typedef struct RelOptInfo
{
    /* all relations included in this RelOptInfo */
    Relids		relids;			/* set of base relids (rangetable indexes) */
    
    /* estimated number of result rows */
    double		rows;
    
    /* materialization information */
    List	   *pathlist;		/* candidate Paths kept for this relation */
    List	   *ppilist;		/* ParamPathInfos used in pathlist */
    List	   *partial_pathlist;		/* partial Paths */
    
    /* a locally cheapest path is not necessarily best later on, so several candidate "cheapest" paths are kept for the level above */
    struct Path *cheapest_startup_path;
    struct Path *cheapest_total_path;
    struct Path *cheapest_unique_path;
    List	   *cheapest_parameterized_paths;
    
    /* this relation is either a single table or a join */
    
    /* used by various scans and joins: */
    List	   *baserestrictinfo;		/* RestrictInfo structures (if base
    									 * rel) */
    QualCost	baserestrictcost;		/* cost of evaluating the above */
    List	   *joininfo;		/* RestrictInfo structures for join clauses
    							 * involving this rel */
    bool		has_eclass_joins;		/* T means joininfo is incomplete */
} RelOptInfo;
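Why a relation keeps more than one "cheapest" path can be seen in a small sketch. This is a simplification under stated assumptions, not the planner's own set_cheapest logic: ToyPath and both functions are hypothetical, and only the two cost fields are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified path: only the two cost figures matter here. */
typedef struct ToyPath
{
    const char *label;
    double      startup_cost;   /* cost before the first tuple is returned */
    double      total_cost;     /* cost to return all tuples */
} ToyPath;

/* Return the path with the lowest startup cost (ties: lower total cost). */
const ToyPath *
toy_cheapest_startup(const ToyPath *paths, size_t n)
{
    const ToyPath *best = NULL;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
    {
        const ToyPath *p = &paths[i];

        if (best == NULL ||
            p->startup_cost < best->startup_cost ||
            (p->startup_cost == best->startup_cost &&
             p->total_cost < best->total_cost))
            best = p;
    }
    return best;
}

/* Return the path with the lowest total cost (ties: lower startup cost). */
const ToyPath *
toy_cheapest_total(const ToyPath *paths, size_t n)
{
    const ToyPath *best = NULL;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
    {
        const ToyPath *p = &paths[i];

        if (best == NULL ||
            p->total_cost < best->total_cost ||
            (p->total_cost == best->total_cost &&
             p->startup_cost < best->startup_cost))
            best = p;
    }
    return best;
}
```

With an index scan that starts cheaply but costs more overall, and a sort that is expensive to start but cheaper in total, the two functions return different paths. A query with a small LIMIT would favour the first; one that reads every row would favour the second.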

Planner information

The information for the overall query optimization run is kept in the PlannerInfo struct.

typedef struct PlannerInfo
{
    Query	   *parse;			/* the query tree being planned */
    PlannerGlobal *glob;		/* global info for current planner run */
    Index		query_level;	/* nesting level of this query */
    
    struct PlannerInfo *parent_root;	/* NULL at outermost Query */
    
    struct RelOptInfo **simple_rel_array;		/* info for all base tables */
    int			simple_rel_array_size;	/* allocated size of array */
    RangeTblEntry **simple_rte_array;	/* rangetable as an array */
    
    /* new relations produced by the joins considered so far */
    List	   *join_rel_list;	/* list of join-relation RelOptInfos */
    struct HTAB *join_rel_hash; /* optional hashtable for join relations */
    
    List	  **join_rel_level; /* lists of join relations, one list per join level */
    int			join_cur_level; /* index of list being extended */
    
} PlannerInfo;

Plan nodes

A Plan represents the physical plan generated from the best path.

typedef struct Plan
{
    /*
     * estimated execution costs for plan (see costsize.c for more info)
     */
    Cost		startup_cost;	/* cost expended before fetching any tuples */
    Cost		total_cost;		/* total cost (assuming all tuples fetched) */
    
    /*
     * estimated number of rows and row width
     */
    double		plan_rows;
    int			plan_width;	
    
    /*
     * Common structural data for all Plan types.
     */
    int			plan_node_id;	/* unique across entire final plan tree */
    List	   *targetlist;		/* target list to be computed at this node */
    List	   *qual;			/* implicitly-ANDed qual conditions */
    struct Plan *lefttree;		/* input plan tree(s) */
    struct Plan *righttree;
    List	   *initPlan;		/* Init Plan nodes (un-correlated expr
    							 * subselects) */
} Plan;
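Because each node points to its inputs through lefttree and righttree, a plan is a binary tree that can be walked recursively. The sketch below is a toy under stated assumptions: ToyPlan is a hypothetical cut-down node holding only a name and a cost, and the output format is invented here, not PostgreSQL's EXPLAIN format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical cut-down plan node: a name, a cost, and two inputs. */
typedef struct ToyPlan
{
    const char     *node_name;
    double          total_cost;
    struct ToyPlan *lefttree;   /* outer input, or NULL */
    struct ToyPlan *righttree;  /* inner input, or NULL */
} ToyPlan;

/* Count the nodes in the plan tree rooted at plan. */
int
toy_count_plan_nodes(const ToyPlan *plan)
{
    if (plan == NULL)
        return 0;
    return 1 + toy_count_plan_nodes(plan->lefttree)
             + toy_count_plan_nodes(plan->righttree);
}

/* Print the tree top-down, indenting each level by two spaces. */
void
toy_explain(const ToyPlan *plan, int indent)
{
    assert(indent >= 0);
    if (plan == NULL)
        return;
    printf("%*s-> %s (cost=%.2f)\n", indent, "",
           plan->node_name, plan->total_cost);
    toy_explain(plan->lefttree, indent + 2);
    toy_explain(plan->righttree, indent + 2);
}
```

A join node with two scan children and a sort on top gives a four-node tree; toy_explain prints the sort first and the scans last, the same top-down reading order used for real plan output.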

PlannedStmt

The optimizer's result. It holds the query execution plan and the range table information.

SelectStmt

The result of syntax analysis.

typedef struct SelectStmt
{
    List	   *distinctClause; /* DISTINCT clause */
    IntoClause *intoClause;		/* target for SELECT INTO */
    List	   *targetList;		/* the projected columns (target list) */
    List	   *fromClause;		/* the FROM clause, including joins */
    Node	   *whereClause;	/* the WHERE clause */
    List	   *groupClause;	/* GROUP BY clauses */
    Node	   *havingClause;	/* HAVING conditional-expression */
    List	   *windowClause;	/* WINDOW window_name AS (...), ... */
    
    List	   *sortClause;		/* sort clause (a list of SortBy's) */
    Node	   *limitOffset;	/* # of result tuples to skip */
    Node	   *limitCount;		/* # of result tuples to return */
    List	   *lockingClause;	/* FOR UPDATE (list of LockingClause's) */
} SelectStmt;

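How the structures relate

  1. PlannerInfo is the main product of logical optimization. It holds the query tree (Query), the relation optimization info, and the restriction conditions
  2. Paths are the main product of the physical optimization phase. They carry sort keys and join nodes
  3. PlannerInfo and paths are intertwined with each other
  4. The query execution plan (Plan) is generated from the lowest-cost path among all paths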

Code entry points

planner: the main entry function

PlannedStmt *
planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
    PlannedStmt *result;
    
    if (planner_hook)
    	result = (*planner_hook) (parse, cursorOptions, boundParams);
    else
    	result = standard_planner(parse, cursorOptions, boundParams);
    return result;
}
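The planner_hook check is an ordinary function-pointer hook: if an extension has installed a function, it is called in place of the standard planner. A minimal sketch of the same pattern, assuming toy types (every toy_ name is hypothetical, and an int stands in for the Query and PlannedStmt):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook type: an int stands in for both the query and the plan. */
typedef int (*toy_planner_hook_type)(int query_id);

/* NULL means no extension has installed a hook. */
toy_planner_hook_type toy_planner_hook = NULL;

/* Number of times the example hook below has been invoked. */
int toy_hook_calls = 0;

/* The built-in behaviour, used when no hook is installed. */
int
toy_standard_planner(int query_id)
{
    assert(query_id >= 0);      /* toy precondition */
    return query_id * 10;
}

/* Entry point: dispatch to the hook if present, else to the default. */
int
toy_planner(int query_id)
{
    if (toy_planner_hook)
        return (*toy_planner_hook)(query_id);
    return toy_standard_planner(query_id);
}

/* An example hook: count the call, then delegate to the default. */
int
toy_counting_hook(int query_id)
{
    toy_hook_calls++;
    return toy_standard_planner(query_id);
}
```

Setting toy_planner_hook = toy_counting_hook changes what toy_planner does without touching its callers. The example hook delegates to the default planner, so it adds behaviour around planning without replacing it.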

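Called by the main function.

standard_planner: the standard query optimizer entry point

standard_planner is only the outer shell of the query optimizer. It calls subquery_planner to do the query optimization and set_plan_references to do the auxiliary cleanup.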

PlannedStmt *
standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
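    PlannedStmt *result;		/* the result */
    PlannerGlobal *glob;		/* common information needed by all subqueries during optimization */
    double		tuple_fraction;	/* fraction of the result tuples expected to be fetched */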
    PlannerInfo *root;
    RelOptInfo *final_rel;
    Path	   *best_path;
    Plan	   *top_plan;
    ListCell   *lp,
    		   *lr;

···    
    
    /* primary planning entry point (may recurse for subqueries) */
    root = subquery_planner(glob, parse, NULL,
    						false, tuple_fraction);
    
    /* Select best Path and turn it into a Plan */
    final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
    best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);
    
    top_plan = create_plan(root, best_path);

···        
   
    /* final cleanup of the plan */
    Assert(glob->finalrtable == NIL);
    Assert(glob->finalrowmarks == NIL);
    Assert(glob->resultRelations == NIL);
    top_plan = set_plan_references(root, top_plan);
    /* ... and the subplans (both regular subplans and initplans) */
    Assert(list_length(glob->subplans) == list_length(glob->subroots));
    forboth(lp, glob->subplans, lr, glob->subroots)
    {
    	Plan	   *subplan = (Plan *) lfirst(lp);
    	PlannerInfo *subroot = (PlannerInfo *) lfirst(lr);
    
    	lfirst(lp) = set_plan_references(subroot, subplan);
    }

···        
    /* build the PlannedStmt result */
    result = makeNode(PlannedStmt);
    
    result->commandType = parse->commandType;
    result->queryId = parse->queryId;   /* from the parse result */
    result->planTree = top_plan;    /* the query plan */
    result->rtable = glob->finalrtable; /* the range table */
    result->resultRelations = glob->resultRelations;
    
    return result;
}
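  1. subquery_planner returns root (a PlannerInfo *), which carries the results of logical and physical optimization
  2. create_plan builds the physical execution plan (Plan) from the best path and the PlannerInfo
  3. set_plan_references makes partial adjustments to the execution plan and cleans it up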
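The call to get_cheapest_fractional_path in the excerpt picks a path according to how much of the result is expected to be fetched. A rough sketch of the idea, assuming linear interpolation between startup and total cost; ToyCostedPath and both functions are hypothetical, and the real function handles further cases (such as a tuple_fraction given as an absolute row count) that are ignored here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical path carrying only its two cost figures. */
typedef struct ToyCostedPath
{
    double startup_cost;
    double total_cost;
} ToyCostedPath;

/*
 * Cost of fetching the given fraction of a path's output, assuming cost
 * grows linearly from startup_cost to total_cost.  A fraction outside
 * (0, 1) is treated as "fetch everything" in this sketch.
 */
double
toy_fractional_cost(const ToyCostedPath *p, double fraction)
{
    if (fraction <= 0.0 || fraction >= 1.0)
        return p->total_cost;
    return p->startup_cost + fraction * (p->total_cost - p->startup_cost);
}

/* Return the index of the cheapest path for the given fraction. */
size_t
toy_cheapest_fractional(const ToyCostedPath *paths, size_t n, double fraction)
{
    size_t best = 0;

    assert(n > 0);
    for (size_t i = 1; i < n; i++)
    {
        if (toy_fractional_cost(&paths[i], fraction) <
            toy_fractional_cost(&paths[best], fraction))
            best = i;
    }
    return best;
}
```

For a path costing 100 to start and 200 in total against one costing 0 to start and 1000 in total, fetching 1% of the rows favours the second path (10 versus 101), while fetching everything favours the first.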

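subquery_planner: the function that generates the execution plan for a (sub)query

subquery_planner works in two steps. The first is logical optimization and the second is physical optimization.

/* takes glob, parse and tuple_fraction; parent_root is NULL at the outermost level */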
PlannerInfo *
subquery_planner(PlannerGlobal *glob, Query *parse,
				 PlannerInfo *parent_root,
				 bool hasRecursion, double tuple_fraction)
{
    PlannerInfo *root;
    List	   *newWithCheckOptions;
    List	   *newHaving;
    bool		hasOuterJoins;
    RelOptInfo *final_rel;
    ListCell   *l;
    
    /* create a PlannerInfo for the current subquery */
    root = makeNode(PlannerInfo);
    root->parse = parse;
    root->glob = glob;
    root->query_level = parent_root ? parent_root->query_level + 1 : 1;
    root->parent_root = parent_root;
    root->hasRecursion = hasRecursion;
    if (hasRecursion)
    	root->wt_param_id = SS_assign_special_param(root);
    else
    	root->wt_param_id = -1;
    root->non_recursive_path = NULL;
    
    /*
     * Look for ANY and EXISTS SubLinks in WHERE and JOIN/ON clauses, and try
     * to transform them into joins.  Note that this step does not descend
     * into subqueries; if we pull up any subqueries below, their SubLinks are
     * processed just before pulling them up.
     * (Sublinks: WHERE and JOIN clauses that contain ANY or EXISTS.)
     */
    if (parse->hasSubLinks)
    	pull_up_sublinks(root);
    
    /* pull up subqueries */
    pull_up_subqueries(root);
    
    /* merge subqueries: flatten a simple UNION ALL */
    if (parse->setOperations)
    	flatten_simple_union_all(root);
    
    /* after pulling up subqueries, handle row marks and inheritance */
    preprocess_rowmarks(root);
    expand_inherited_tables(root);

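    /* simplify conditions with preprocess_expression (calls omitted in this excerpt) */

    /* merge HAVING clauses into WHERE; a HAVING clause that contains aggregate functions, volatile functions, or subqueries cannot be merged (code omitted in this excerpt) */

    /* eliminate outer joins */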
    if (hasOuterJoins)
    	reduce_outer_joins(root);
    
    /*
     * Do the main planning.  If we have an inherited target relation, that
     * needs special processing, else go straight to grouping_planner.
     */
    if (parse->resultRelation &&
    	rt_fetch(parse->resultRelation, parse->rtable)->inh)
    	/* physical optimization when the target relation has inheritance */
    	inheritance_planner(root);
    else
        /* physical optimization */
    	grouping_planner(root, false, tuple_fraction);
    

    final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
    set_cheapest(final_rel);
    
    return root;
}
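  1. Logical optimization
    1. Process CTE expressions (SS_process_ctes)
    2. Pull up sublinks
    3. Pull up subqueries
    4. Handle UNION ALL: flatten_simple_union_all
    5. Handle FOR UPDATE (row locks): preprocess_rowmarks
    6. Handle inherited tables (expand_inherited_tables)
    7. Process the target list (preprocess_expression)
    8. Process withCheckOptions: preprocess_expression
    9. Process RETURNING expressions, window clauses, and LIMIT/OFFSET clauses: preprocess_expression
    10. Merge HAVING into the WHERE clause
    11. Eliminate outer joins
  2. Physical optimization: generate the three best paths for this query level's PlannerInfo and return them to the level above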
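The HAVING-to-WHERE step in the list above can be sketched as a filter over clauses. This is an illustration under stated assumptions: each clause is reduced to three precomputed flags, and ToyHavingClause and both functions are hypothetical names. It encodes only the rule stated earlier, that a clause containing an aggregate, a volatile function, or a subquery stays in HAVING.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical clause: its text plus the three properties that matter. */
typedef struct ToyHavingClause
{
    const char *text;
    bool        has_aggregate;
    bool        has_volatile_func;
    bool        has_subquery;
} ToyHavingClause;

/* A clause may move to WHERE only if none of the three flags is set. */
bool
toy_can_move_to_where(const ToyHavingClause *clause)
{
    return !(clause->has_aggregate ||
             clause->has_volatile_func ||
             clause->has_subquery);
}

/*
 * Split n HAVING clauses: movable ones go to where_out, the rest to
 * having_out.  Both output arrays must have room for n pointers.
 * Returns the number of clauses moved to WHERE.
 */
size_t
toy_split_having(const ToyHavingClause *clauses, size_t n,
                 const ToyHavingClause **where_out,
                 const ToyHavingClause **having_out)
{
    size_t n_where = 0;
    size_t n_having = 0;

    assert(clauses != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (toy_can_move_to_where(&clauses[i]))
            where_out[n_where++] = &clauses[i];
        else
            having_out[n_having++] = &clauses[i];
    }
    return n_where;
}
```

Moving a clause such as dept = 'sales' into WHERE lets rows be filtered before grouping, whereas count(*) > 5 can only be evaluated once the groups exist.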

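Overall flow

  • subquery_planner
    • Process CTE expressions
    • Pull up sublinks (removing IN, SOME, and EXISTS)
    • Pull up subqueries
    • Preprocess expressions (AND/OR, evaluating obviously computable results, handling sublinks that cannot be pulled up)
    • Eliminate outer joins
    • grouping_planner
      • Handle set operations (generating subqueries)
      • Handle non-set queries, adjusting the order within the ORDER BY, GROUP BY, and target list clauses
      • query_planner
        • Build the RelOptInfo for each base table
        • Push down selections
        • Push down projections
        • Derive implied expressions
        • Generate pathkeys
        • make_one_rel (selectivity is computed from data histograms)
          • Find the best way to query each single table
          • Find the best way to join two tables (checking whether it is a special join)
          • Build multi-table joins by dynamic programming or the genetic algorithm
      • Take the cheapest path, then add the remaining clauses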
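The dynamic-programming step for multi-table joins can be sketched with a bitmask over relation sets. This is a toy under loudly stated assumptions, not PostgreSQL's join search: the cost model is invented (a join costs as many units as the rows it produces, base scans are free, and one uniform selectivity applies to every join), and toy_best_join_cost is a hypothetical name.

```c
#include <assert.h>
#include <float.h>

#define TOY_MAX_RELS 10

/*
 * Return the cheapest total cost of joining all nrels relations.
 * Assumed cost model: joining two sets yields
 * rows(left) * rows(right) * selectivity rows, and the join costs
 * exactly that many units.  Base relations cost nothing.
 */
double
toy_best_join_cost(const double *base_rows, int nrels, double selectivity)
{
    double   rows[1 << TOY_MAX_RELS];   /* estimated rows per relation set */
    double   cost[1 << TOY_MAX_RELS];   /* best cost found per relation set */
    unsigned full;

    assert(nrels >= 1 && nrels <= TOY_MAX_RELS);
    full = (1u << nrels) - 1;

    for (unsigned set = 1; set <= full; set++)
    {
        if ((set & (set - 1)) == 0)
        {
            /* A single relation: find which one and cost it at zero. */
            int i = 0;

            while (((set >> i) & 1u) == 0)
                i++;
            rows[set] = base_rows[i];
            cost[set] = 0.0;
            continue;
        }

        /* Try every split of the set into two non-empty halves. */
        cost[set] = DBL_MAX;
        for (unsigned left = (set - 1) & set; left > 0; left = (left - 1) & set)
        {
            unsigned right = set & ~left;
            double   out_rows = rows[left] * rows[right] * selectivity;
            double   c = cost[left] + cost[right] + out_rows;

            rows[set] = out_rows;   /* the same for every split in this model */
            if (c < cost[set])
                cost[set] = c;
        }
    }
    return cost[full];
}
```

Each set is solved only after all of its subsets, because a proper subset always has a smaller bitmask value. For three relations of 10, 100, and 1000 rows with selectivity 1.0, the cheapest plan joins the two smallest first (1000 rows) and then the third (1,000,000 rows). The number of sets grows exponentially with the number of relations, which is why a heuristic such as a genetic algorithm becomes attractive for large joins.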

