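Vue.js in Depth (深入淺出 Vue.js), Chapter 9: The Parser (Study Notes)

Reposted; original dated 2019-06-23 21:37. Tags: vue, parser

These notes are written alongside a reading of the Vue source code.

According to the package.json of the Vue project on GitHub at the time of study, the version is 2.6.10.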

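The Parser

1. What the parser does

The parser turns a template into an AST (abstract syntax tree).

In Vue, the AST produced by parsing the DOM elements inside a template is a plain JavaScript object.

Each node is described by one JavaScript object.

The object's properties hold the data that node needs.

The parent property holds the descriptor object of the parent node; children is an array holding the descriptor objects of the child nodes.

Once separate nodes are linked through parent and children they form a tree, and a node tree described by objects in this way is called an AST (abstract syntax tree).

Example:
HTML element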

<div>
    <p>{{ name }}</p>
</div>

After parsing it takes the following form, i.e. it has been converted into an AST:

{
    tag: 'div',
    type: 1,
    staticRoot: false,
    static: false,
    plain: true,
    parent: undefined,
    attrsList: [],
    attrsMap: {},
    children: [
        {
            tag: 'p',
            type: 1,
            staticRoot: false,
            static: false,
            plain: true,
            parent: {
                tag: 'div',
                ...
            },
            attrsList: [],
            attrsMap: {},
            children: [
                {
                    type: 2,
                    text: '{{ name }}',
                    static: false,
                    expression: '_s(name)',
                }
            ]
        }
    ]
}

Figure: vue-html_parser, example of AST generation by the parser
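To make the parent/children linkage concrete, here is a minimal sketch that builds the same three nodes by hand and then walks the tree. The helper names `createNode`, `appendChild`, and `depthOf` are invented for this illustration and are not Vue APIs.

```javascript
// Hypothetical helpers for illustration only; not part of Vue.
function createNode(tag) {
  return { type: 1, tag, parent: undefined, children: [] }
}

// Link a child to its parent in both directions.
function appendChild(parent, child) {
  child.parent = parent
  parent.children.push(child)
}

// Count how many ancestors a node has by following `parent` links.
function depthOf(node) {
  let depth = 0
  while (node.parent) {
    depth++
    node = node.parent
  }
  return depth
}

const div = createNode('div')
const p = createNode('p')
const text = { type: 2, text: '{{ name }}', expression: '_s(name)' }

appendChild(div, p)
appendChild(p, text)

console.log(depthOf(text)) // 2
console.log(div.children[0].children[0].text) // {{ name }}
```

Because every node points up through parent and down through children, the tree can be traversed in either direction from any node.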

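2. How the parser works internally

Vue contains several parsers: the filter parser, the HTML parser, and the text parser.

Figure: Vue parsers

This section covers the HTML parser.

While the HTML parser works through the HTML, it keeps firing hook functions.

The hooks are:

start-tag hook

end-tag hook

text hook

comment hook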

parseHTML(html, {
    /**
     * @param {string}  tagName Name of the start tag, e.g. div for the <div> in <div></div>
     * @param {Array}   attrs   Attributes on the start tag, e.g. [{name: 'class', value: 'className'}]
     * @param {Boolean} unary   Whether the tag is self-closing: true or false
     * @param {Number}  start   Start position of the start tag within the html template being parsed
     * @param {Number}  end     End position of the start tag within the html template being parsed
     */
    start(tagName, attrs, unary, start, end) {
        // Fired each time a start tag is parsed
    },
    /**
     * @param {string} tagName  Name of the end tag, e.g. div for the </div> in <div></div>
     * @param {Number} start    Start position of the end tag within the html template being parsed
     * @param {Number} end      End position of the end tag within the html template being parsed
     */
    end(tagName, start, end) {
        // Fired each time an end tag is parsed
    },
    /**
     * @param {string} text    The plain text that was parsed, e.g. the text inside <p>plain text</p>
     * @param {Number} [start] Start position of the text within the html template. Optional: may not be passed
     * @param {Number} [end]   End position of the text within the html template. Optional: may not be passed
     */
    chars(text, start, end) {
        // Fired each time text is parsed
    },
    /**
     * @param {string} text  The comment that was parsed, e.g. <!-- a comment -->, with the <!-- and --> delimiters stripped
     * @param {Number} start Start position of the comment within the html template being parsed
     * @param {Number} end   End position of the comment within the html template being parsed
     */
    comment(text, start, end) {
        // Fired each time a comment is parsed
    }
})

Example:

<div>
    <p>我是文本</p>
</div>

Parsing the template above from front to back fires the hooks start, start, chars, end, end in that order.


Parsed	<div>	fires start
Parsed	<p>	fires start
Parsed	我是文本	fires chars
Parsed	</p>	fires end
Parsed	</div>	fires end

How does each hook build an AST node?

start hook

// src/compiler/parser/index.js
export function createASTElement (
  tag: string,
  attrs: Array<ASTAttr>,
  parent: ASTElement | void
): ASTElement {
  return {
    type: 1,
    tag,
    attrsList: attrs,
    attrsMap: makeAttrsMap(attrs),
    rawAttrsMap: {},
    parent,
    children: []
  }
}
parseHTML(template, {
    start(tag, attrs, unary, start, end) {
        let element: ASTElement = createASTElement(tag, attrs, currentParent)
    }
})

end hook

// src/compiler/parser/index.js
function closeElement (element) {
    // ...

    currentParent.children.push(element)
    element.parent = currentParent

    // ...
}
parseHTML(template, {
    end(tag, start, end) {
        const element = stack[stack.length - 1]
        // pop stack
        stack.length -= 1
        currentParent = stack[stack.length - 1]
        closeElement(element)
    }
})

chars hook

// src/compiler/parser/index.js
parseHTML(template, {
    chars(text, start, end) {
        let child: ASTNode = {
            type: 3,
            text
        }
    }
})

comment hook

// src/compiler/parser/index.js
parseHTML(template, {
    comment(text, start, end) {
        const child: ASTText = {
          type: 3,
          text,
          isComment: true
        }
    }
})

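The nodes built above are independent of one another.

We need some logic to link them together into a real AST.

Here is how the AST hierarchy is built.

While parsing the HTML we maintain a stack that records the nesting, which can also be read as the depth in the DOM.

Every start tag fires the start hook, and every end tag fires the end hook.

So when the start hook fires we push the node being built onto the stack, and when the end hook fires we pop one node off the stack.

This guarantees that whenever the start hook fires, the node on top of the stack at that moment (before the new node is pushed) is the parent of the node being built.

Example: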

<div>
    <h1>我是大標題</h1>
    <p>我是文本</p>
</div>

Step-by-step details of the parse

Remaining HTML template	Token parsed	Stack afterwards	AST afterwards	Effect on the template
`<div>&nbsp<h1>我是大標題</h1>&nbsp<p>我是文本</p></div>`	start tag `<div>`	div	{ tag: 'div' }	`<div>` is cut off the template
`&nbsp<h1>我是大標題</h1>&nbsp<p>我是文本</p></div>`	whitespace	div	{ tag: 'div' }	the whitespace is cut off the template
`<h1>我是大標題</h1>&nbsp<p>我是文本</p></div>`	start tag `<h1>`	div h1	{ tag: 'div', children:[ { tag: 'h1' } ] }	`<h1>` is cut off the template
`我是大標題</h1>&nbsp<p>我是文本</p></div>`	text 我是大標題	div h1	{ tag: 'div', children:[ { tag: 'h1', children: [ { text: '我是大標題' } ] } ] }	我是大標題 is cut off the template
`</h1>&nbsp<p>我是文本</p></div>`	end tag `</h1>`	div	{ tag: 'div', children:[ { tag: 'h1', children: [ { text: '我是大標題' } ] } ] }	`</h1>` is cut off the template
`&nbsp<p>我是文本</p></div>`	whitespace	div	{ tag: 'div', children:[ { tag: 'h1', children: [ { text: '我是大標題' } ] } ] }	the whitespace is cut off the template
`<p>我是文本</p></div>`	start tag `<p>`	div p	{ tag: 'div', children:[ { tag: 'h1', children: [ { text: '我是大標題' } ] }, { tag: 'p' } ] }	`<p>` is cut off the template
`我是文本</p></div>`	text 我是文本	div p	{ tag: 'div', children:[ { tag: 'h1', children: [ { text: '我是大標題' } ] }, { tag: 'p', children: [ { text: '我是文本' } ] } ] }	我是文本 is cut off the template
`</p></div>`	end tag `</p>`	div	{ tag: 'div', children:[ { tag: 'h1', children: [ { text: '我是大標題' } ] }, { tag: 'p', children: [ { text: '我是文本' } ] } ] }	`</p>` is cut off the template
`</div>`	end tag `</div>`	-	{ tag: 'div', children:[ { tag: 'h1', children: [ { text: '我是大標題' } ] }, { tag: 'p', children: [ { text: '我是文本' } ] } ] }	`</div>` is cut off the template
-	template is empty, parsing complete	-	{ tag: 'div', children:[ { tag: 'h1', children: [ { text: '我是大標題' } ] }, { tag: 'p', children: [ { text: '我是文本' } ] } ] }	-
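The stack bookkeeping in the table can be replayed in a few lines. The sketch below is an assumption-laden simplification, not Vue's code: the hooks are called by hand in the order the table lists them, and a child is attached to its parent as soon as start fires (as the table shows), whereas the closeElement excerpt earlier attaches it when the end hook runs. The names `hooks`, `root`, and `currentParent` are local to this example.

```javascript
const stack = []
let root
let currentParent

const hooks = {
  start(tag) {
    const element = { type: 1, tag, parent: currentParent, children: [] }
    if (!root) root = element
    // The node on top of the stack is the parent of the new node
    if (currentParent) currentParent.children.push(element)
    stack.push(element)
    currentParent = element
  },
  end() {
    stack.pop()
    currentParent = stack[stack.length - 1]
  },
  chars(text) {
    // Skip whitespace-only text, as the table does
    if (!currentParent || !text.trim()) return
    currentParent.children.push({ type: 3, text })
  }
}

// Replay the hook calls for <div> <h1>我是大標題</h1> <p>我是文本</p></div>
hooks.start('div')
hooks.chars(' ')
hooks.start('h1')
hooks.chars('我是大標題')
hooks.end()
hooks.chars(' ')
hooks.start('p')
hooks.chars('我是文本')
hooks.end()
hooks.end()

console.log(root.children.map(child => child.tag)) // [ 'h1', 'p' ]
console.log(stack.length) // 0
```

An empty stack at the end means every start tag was matched by an end tag.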

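3. The HTML parser

How it runs

Parsing an HTML template is a loop over the template string.

Each iteration cuts a small piece off the front of the template, handles it, and then repeats.

The loop ends, and parsing is complete, once the template string has been consumed entirely.

The iterations proceed as in the step-by-step details given above for building the AST hierarchy.
Pseudocode for the loop over the HTML template: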

function parseHTML(html, options) {
    while (html) {
        // Cut a piece off the html template string and fire the hook that matches its type
    }
}

Each piece that is cut off may be one of:

start tag / end tag / text / comment

The hook that fires depends on the type of the piece.
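As a rough, runnable version of that loop, the sketch below cuts one piece per iteration and fires the matching hook. It is a toy under stated assumptions: `toyParseHTML` and its two patterns are invented here, tag names are ASCII only, attributes are skipped rather than parsed, and comments are not handled. Vue's real patterns are more permissive.

```javascript
// Simplified patterns for illustration; not Vue's actual regular expressions.
const toyStartTag = /^<([a-zA-Z][\w-]*)[^>]*?(\/?)>/
const toyEndTag = /^<\/([a-zA-Z][\w-]*)[^>]*>/

function toyParseHTML(html, hooks) {
  while (html) {
    let match
    if ((match = html.match(toyEndTag))) {
      hooks.end(match[1])
      html = html.slice(match[0].length)
    } else if ((match = html.match(toyStartTag))) {
      hooks.start(match[1], match[2] === '/')
      html = html.slice(match[0].length)
    } else {
      // Everything up to the next "<" is treated as text
      const next = html.indexOf('<', 1)
      const text = next < 0 ? html : html.slice(0, next)
      hooks.chars(text)
      html = html.slice(text.length)
    }
  }
}

const events = []
toyParseHTML('<div><p>我是文本</p></div>', {
  start: () => events.push('start'),
  end: () => events.push('end'),
  chars: () => events.push('chars')
})
console.log(events.join(', ')) // start, start, chars, end, end
```

Each branch shortens html by at least one character, so the loop always terminates.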

Vue uses regular expressions to recognize these types:

// src/core/util/lang.js
const unicodeRegExp = /a-zA-Z\u00B7\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u037D\u037F-\u1FFF\u200C-\u200D\u203F-\u2040\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD/

// src/compiler/parser/html-parser.js
// Regular Expressions for parsing tags and attributes
const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/
const dynamicArgAttribute = /^\s*((?:v-[\w-]+:|@|:|#)\[[^=]+\][^\s"'<>\/=]*)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/
const ncname = `[a-zA-Z_][\\-\\.0-9_a-zA-Z${unicodeRegExp.source}]*`
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`) // Opening part of a start tag: "<" plus the tag name only. For <div class="className" ></div> it matches '<div'
const startTagClose = /^\s*(\/?)>/ // Closing part of a start tag. For <div class="className" ></div> it matches ' >'
const endTag = new RegExp(`^<\\/${qnameCapture}[^>]*>`) // For '</div><p></p>' the match is </div>
const doctype = /^<!DOCTYPE [^>]+>/i // Matches DOCTYPE
const comment = /^<!\--/ // Matches a comment
const conditionalComment = /^<!\[/ // Matches a conditional comment

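The cases for each type of piece are analyzed below.

Cutting off a start tag

First check whether the HTML template begins with <.
A template that begins with < can be one of the following (anything else is treated as text):
a comment
a conditional comment
a DOCTYPE
a start tag
an end tag

Use the regular expression that matches a start tag: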

// src/core/util/lang.js
const unicodeRegExp = /a-zA-Z\u00B7\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u037D\u037F-\u1FFF\u200C-\u200D\u203F-\u2040\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD/

// src/compiler/parser/html-parser.js
// Regular Expressions for parsing tags and attributes
const ncname = `[a-zA-Z_][\\-\\.0-9_a-zA-Z${unicodeRegExp.source}]*`
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)

// Template that begins with a start tag
console.log('<div></div>'.match(startTagOpen))
// ["<div", "div", index: 0, input: "<div></div>", groups: undefined]
console.log('<p class="className" ></p>'.match(startTagOpen))
// ["<p", "p", index: 0, input: "<p class="className" ></p>", groups: undefined]

// Template that begins with an end tag
console.log('</div><p>文本</p>'.match(startTagOpen))
// null

// Template that begins with text
console.log('你好</div>'.match(startTagOpen))
// null

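Two things stand out:
it matches only start tags
the match does not cover the whole start tag, only <div or <p

Vue splits a start tag into three parts.

For example:

<div class="className" >

Note that whitespace counts too:
1. <div : identifies the start tag
2.  class="className" : the attributes
3.  > : the end of the start tag

Once the tag name has been parsed, the attributes come next.
Attributes are optional, so the parser checks for them and parses them only if present.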

const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/
const dynamicArgAttribute = /^\s*((?:v-[\w-]+:|@|:|#)\[[^=]+\][^\s"'<>\/=]*)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/
const startTagClose = /^\s*(\/?)>/

// Collect attributes in a loop
let end, attr
// Loop condition: (1) not yet at the end of the start tag, and (2) an attribute is present
while (!(end = html.match(startTagClose)) && (attr = html.match(dynamicArgAttribute) || html.match(attribute))) {
    attr.start = index
    advance(attr[0].length)
    attr.end = index
    match.attrs.push(attr)
}

console.log(' class="className"></div>'.match(attribute))
// [" class="className"", "class", "=", "className", undefined, undefined, index: 0, input: " class="className"></div>", groups: undefined]

// If the end of the start tag was reached, record whether the tag is self-closing
if (end) {
    match.unarySlash = end[1]
    advance(end[0].length)
    match.end = index
    return match
}

console.log('></div>'.match(startTagClose)) // [">", "", index: 0, input: "></div>", groups: undefined]
console.log('/>'.match(startTagClose)) // ["/>", "/", index: 0, input: "/>", groups: undefined]

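As shown above, for a self-closing tag the second element of the match result is /.


Vue calls parseStartTag to parse the start tag. If it returns a match, handleStartTag is called, which mainly extracts tagName, attrs, unary and similar data and passes them as arguments to the hook:
// Start tag:
const startTagMatch = parseStartTag()
if (startTagMatch) {
    handleStartTag(startTagMatch)
    continue
}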
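The three-part split can be tried out in isolation. The following is a minimal sketch, not Vue's parseStartTag: the function name `parseStartTagSketch` and its return shape are assumptions for this example, tag names are restricted to ASCII, dynamic-argument attributes are ignored, and the function works on a local string instead of the shared html/index/advance state.

```javascript
// Simplified tag-name pattern (ASCII only); the attribute pattern is the one quoted above.
const tagOpen = /^<([a-zA-Z_][\w\-.]*)/
const tagClose = /^\s*(\/?)>/
const attrPattern = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/

function parseStartTagSketch(html) {
  // Part 1: "<" plus the tag name
  const open = html.match(tagOpen)
  if (!open) return null
  const result = { tagName: open[1], attrs: [], unary: false, rest: '' }
  html = html.slice(open[0].length)

  // Part 2: zero or more attributes, until the closing part is found
  let end, attr
  while (!(end = html.match(tagClose)) && (attr = html.match(attrPattern))) {
    result.attrs.push({ name: attr[1], value: attr[3] || attr[4] || attr[5] || '' })
    html = html.slice(attr[0].length)
  }

  // Part 3: ">" or "/>"
  if (!end) return null
  result.unary = end[1] === '/'
  result.rest = html.slice(end[0].length)
  return result
}

const parsed = parseStartTagSketch('<div class="className" ></div>')
console.log(parsed.tagName) // div
console.log(parsed.attrs[0].name, parsed.attrs[0].value) // class className
console.log(parsed.rest) // </div>
console.log(parseStartTagSketch('<input disabled/>').unary) // true
```

Returning the remaining string in rest plays the role that advance plays in the excerpts above.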

Cutting off an end tag

// src/core/util/lang.js
const unicodeRegExp = /a-zA-Z\u00B7\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u037D\u037F-\u1FFF\u200C-\u200D\u203F-\u2040\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD/

const ncname = `[a-zA-Z_][\\-\\.0-9_a-zA-Z${unicodeRegExp.source}]*`
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const endTag = new RegExp(`^<\\/${qnameCapture}[^>]*>`)

 // End tag:
const endTagMatch = html.match(endTag)
if (endTagMatch) {
    const curIndex = index
    advance(endTagMatch[0].length)
    parseEndTag(endTagMatch[1], curIndex, index)
    continue
}

console.log('</div>'.match(endTag)) // ["</div>", "div", index: 0, input: "</div>", groups: undefined]
console.log('<div>'.match(endTag)) // null

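Once an end tag has been recognized, two things need to happen: cut it off the template, and fire the hook.

In addition, the matching tag must be popped off the stack.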
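A minimal sketch of that pop step, loosely modeled on the idea of searching the stack for the nearest open tag with the same name. The function `closeTag`, its parameters, and the use of plain tag-name strings as stack entries are assumptions for this illustration, not Vue's parseEndTag signature.

```javascript
// stack: array of open tag names, innermost last
// onEnd: called once for every tag that gets closed
function closeTag(stack, tagName, onEnd) {
  const lower = tagName.toLowerCase()
  // Search from the top of the stack for the nearest open tag with this name
  let pos = stack.length - 1
  while (pos >= 0 && stack[pos].toLowerCase() !== lower) pos--
  if (pos < 0) return false // stray end tag: nothing to close

  // Close the match and any unclosed tags nested inside it
  for (let i = stack.length - 1; i >= pos; i--) onEnd(stack[i])
  stack.length = pos
  return true
}

const openTags = ['div', 'p', 'span']
const closed = []
closeTag(openTags, 'p', tag => closed.push(tag))
console.log(closed) // [ 'span', 'p' ]
console.log(openTags) // [ 'div' ]
```

Searching instead of blindly popping keeps the stack consistent when the template omits an inner end tag.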

Cutting off a comment

const comment = /^<!\--/


if (comment.test(html)) {
    const commentEnd = html.indexOf('-->')

    if (commentEnd >= 0) {
        if (options.shouldKeepComment) {
            options.comment(html.substring(4, commentEnd), index, index + commentEnd + 3)
        }
        advance(commentEnd + 3)
        continue
    }
}

Cutting off a conditional comment

const conditionalComment = /^<!\[/

if (conditionalComment.test(html)) {
    const conditionalEnd = html.indexOf(']>')

    if (conditionalEnd >= 0) {
        advance(conditionalEnd + 2)
        continue
    }
}

Cutting off the DOCTYPE

const doctypeMatch = html.match(doctype)
if (doctypeMatch) {
    advance(doctypeMatch[0].length)
    continue
}
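The three branches above (comment, conditional comment, DOCTYPE) can be exercised together on plain strings. This is a sketch under assumptions: `consumeSpecial` and its return shape are made up for the example, and it returns the remaining string instead of calling advance.

```javascript
const commentRE = /^<!--/
const conditionalCommentRE = /^<!\[/
const doctypeRE = /^<!DOCTYPE [^>]+>/i

// Returns what was recognized plus the remaining template, or null if nothing matched.
function consumeSpecial(html) {
  if (commentRE.test(html)) {
    const end = html.indexOf('-->')
    if (end >= 0) {
      // substring(4, end) drops the leading "<!--"; the trailing "-->" is 3 characters long
      return { kind: 'comment', text: html.substring(4, end), rest: html.slice(end + 3) }
    }
  }
  if (conditionalCommentRE.test(html)) {
    const end = html.indexOf(']>')
    if (end >= 0) return { kind: 'conditional', rest: html.slice(end + 2) }
  }
  const doctypeMatch = html.match(doctypeRE)
  if (doctypeMatch) return { kind: 'doctype', rest: html.slice(doctypeMatch[0].length) }
  return null
}

console.log(consumeSpecial('<!-- hi --><p>').text) // " hi "
console.log(consumeSpecial('<![if !IE]><p>').rest) // <p>
console.log(consumeSpecial('<!DOCTYPE html><html>').rest) // <html>
```

Only the comment branch produces text for a hook; the other two are simply skipped, as in the excerpts above.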

Cutting off text

// src/core/util/lang.js
const unicodeRegExp = /a-zA-Z\u00B7\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u037D\u037F-\u1FFF\u200C-\u200D\u203F-\u2040\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD/

// src/compiler/parser/html-parser.js
// Regular Expressions for parsing tags and attributes
const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/
const dynamicArgAttribute = /^\s*((?:v-[\w-]+:|@|:|#)\[[^=]+\][^\s"'<>\/=]*)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/
const ncname = `[a-zA-Z_][\\-\\.0-9_a-zA-Z${unicodeRegExp.source}]*`
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)
const endTag = new RegExp(`^<\\/${qnameCapture}[^>]*>`)
const comment = /^<!\--/
const conditionalComment = /^<!\[/

// textEnd = html.indexOf('<') has been computed earlier in the loop
let text, rest, next
if (textEnd >= 0) {
    rest = html.slice(textEnd)
    while (
        !endTag.test(rest) &&
        !startTagOpen.test(rest) &&
        !comment.test(rest) &&
        !conditionalComment.test(rest)
    ) {
        // < in plain text, be forgiving and treat it as text
        next = rest.indexOf('<', 1)
        if (next < 0) break
        textEnd += next
        rest = html.slice(textEnd)
    }
    text = html.substring(0, textEnd)
}
// No "<" at all: the whole remainder is text
if (textEnd < 0) {
    text = html
}
// Cut the text off the template
if (text) {
    advance(text.length)
}
// Call the chars hook
if (options.chars && text) {
    options.chars(text, index - text.length, index)
}

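// Example: handling text that contains "<"
// html:
'hello < world < i am wenben</div>'
// rest each time the while condition is checked:
'< world < i am wenben</div>'
'< i am wenben</div>'
'</div>' // matches endTag, so the loop stops and text is 'hello < world < i am wenben'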
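The same loop can be run on its own to confirm that a stray "<" stays inside the text. In this sketch the function name `takeText` and the ASCII-only tag patterns are assumptions made for the example; the loop body follows the excerpt above.

```javascript
// Simplified, ASCII-only stand-ins for startTagOpen and endTag.
const simpleStartTagOpen = /^<[a-zA-Z_][\w\-.]*/
const simpleEndTag = /^<\/[a-zA-Z_][\w\-.]*[^>]*>/
const simpleComment = /^<!--/
const simpleConditionalComment = /^<!\[/

function takeText(html) {
  let textEnd = html.indexOf('<')
  if (textEnd < 0) return { text: html, rest: '' }

  let rest = html.slice(textEnd)
  while (
    !simpleEndTag.test(rest) &&
    !simpleStartTagOpen.test(rest) &&
    !simpleComment.test(rest) &&
    !simpleConditionalComment.test(rest)
  ) {
    // This "<" does not start a tag or comment: keep it as text and look for the next one
    const next = rest.indexOf('<', 1)
    if (next < 0) break
    textEnd += next
    rest = html.slice(textEnd)
  }
  return { text: html.substring(0, textEnd), rest: html.slice(textEnd) }
}

const piece = takeText('hello < world < i am wenben</div>')
console.log(piece.text) // hello < world < i am wenben
console.log(piece.rest) // </div>
```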

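Handling plain-text content elements


// Plain-text content elements
export const isPlainTextElement = makeMap('script,style,textarea', true)
When parsing these three tags, everything they contain must be treated as text.

The two kinds of element follow different logic: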
while (html) {
    if (!lastTag || !isPlainTextElement(lastTag)) {
    // Logic when the parent is a normal element
    } else {
    // Logic when the parent is script, style, or textarea
    let endTagLength = 0
      const stackedTag = lastTag.toLowerCase()
      const reStackedTag = reCache[stackedTag] || (reCache[stackedTag] = new RegExp('([\\s\\S]*?)(</' + stackedTag + '[^>]*>)', 'i'))
      const rest = html.replace(reStackedTag, function (all, text, endTag) {
        // The text argument is everything before the end tag; it is passed to the chars hook
        endTagLength = endTag.length
        if (!isPlainTextElement(stackedTag) && stackedTag !== 'noscript') {
          text = text
            .replace(/<!\--([\s\S]*?)-->/g, '$1') // #7298
            .replace(/<!\[CDATA\[([\s\S]*?)]]>/g, '$1')
        }
        if (shouldIgnoreFirstNewline(stackedTag, text)) {
          text = text.slice(1)
        }
        if (options.chars) {
          options.chars(text)
        }
        // Finally, return an empty string, which cuts off everything that was matched.
        // Note that the content and the end tag are cut off together
        return ''
      })
      index += html.length - rest.length
      html = rest
      parseEndTag(stackedTag, index - endTagLength, index)
    }
}

Parsing flow

Initial template

<div id="el">
    <script>console.log(1)</script>
</div>

After the script start tag has been parsed and cut off:

console.log(1)</script>
</div>

After the content and its end tag have been handled:

</div>
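The cut performed for script, style, and textarea can be reproduced with the same regular expression shape used in the excerpt. The wrapper `takePlainTextContent` is a hypothetical helper for this example; it returns the content and the remainder instead of calling hooks.

```javascript
function takePlainTextContent(html, tagName) {
  // Lazily capture everything up to the first matching end tag
  const re = new RegExp('([\\s\\S]*?)(</' + tagName + '[^>]*>)', 'i')
  let text = ''
  const rest = html.replace(re, (all, content) => {
    text = content
    return '' // remove the content and the end tag together
  })
  return { text, rest }
}

const script = takePlainTextContent('console.log(1)</script>\n</div>', 'script')
console.log(script.text) // console.log(1)
console.log(JSON.stringify(script.rest)) // "\n</div>"

// A "<" inside the script body is kept as text
console.log(takePlainTextContent('if (a < b) {}</script>', 'script').text) // if (a < b) {}
```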

Text parser

parseText('你好{{name}}')
// '"你好"+_s(name)'

parseText('你好')
// undefined

parseText('你好{{name}}, 你今年已經{{age}}歲啦')
// '"你好"+_s(name)+", 你今年已經"+_s(age)+"歲啦"'
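A minimal sketch of how such a text parser could produce these expression strings, assuming the default {{ }} delimiters. The function name `parseTextSketch` is invented here, and the sketch returns only the expression string shown in the examples above.

```javascript
// Matches {{ expression }}, including expressions that span lines
const mustacheRE = /\{\{((?:.|\r?\n)+?)\}\}/g

function parseTextSketch(text) {
  mustacheRE.lastIndex = 0
  if (!mustacheRE.test(text)) return undefined // plain text: nothing to do

  const tokens = []
  let lastIndex = (mustacheRE.lastIndex = 0)
  let match
  while ((match = mustacheRE.exec(text))) {
    const index = match.index
    // Static text before the {{ }} becomes a quoted string literal
    if (index > lastIndex) tokens.push(JSON.stringify(text.slice(lastIndex, index)))
    // The expression itself is wrapped in _s(...)
    tokens.push(`_s(${match[1].trim()})`)
    lastIndex = index + match[0].length
  }
  // Static text after the last {{ }}
  if (lastIndex < text.length) tokens.push(JSON.stringify(text.slice(lastIndex)))
  return tokens.join('+')
}

console.log(parseTextSketch('你好{{name}}')) // "你好"+_s(name)
console.log(parseTextSketch('你好')) // undefined
```

Resetting lastIndex matters because a regular expression with the g flag remembers where its previous test or exec call stopped.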

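Summary

The parser's job is to produce an AST (abstract syntax tree) from a template.

Building the AST relies on the HTML parser: as it fires its different hooks, we build the corresponding kinds of nodes.

A stack then gives us the parent of the node currently being built, and each new node is added under that parent.

When the HTML parser finishes, we have a complete AST that carries the DOM hierarchy.

Internally the HTML parser cuts the template string off piece by piece; for each piece it fires a hook according to the piece's type, and it stops once the template string is empty.

Text comes in two kinds: plain text without variables and text with variables. The latter needs a second pass through the text parser.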
