A First Look at the Vue Compiler

Published on 2023-09-21 21:41:58

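What is a compiler? Put simply, it is a tool that converts source code into target code. More precisely, it translates a program written in a high-level language, one that is easy for people to write, read, and maintain, into a program in a lower-level language that a computer can interpret and execute.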

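Vue's compiler works in roughly three stages: lexical analysis -> syntactic analysis -> code generation.

- Lexical analysis breaks the template string into individual tokens.
- Syntactic analysis builds an AST from those tokens.
- Code generation turns the AST into the final code, which for Vue is JavaScript render-function code.

This post walks through the lexical analysis stage.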

Lexical analysis

The source file src/compiler/index.js contains the line below. It mentions both parse and ast, so we can infer that the parse function parses the template string and produces the AST.

const ast = parse(template.trim(), options)

Inside parse, we find a call to parseHTML. parseHTML is the function that actually performs lexical analysis, and parse builds the AST on top of its output.

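/**
 * Convert HTML string to AST.
 */
export function parse (
  template: string,
  options: CompilerOptions
): ASTElement | void {
  // parse does the syntactic analysis on top of the lexical analysis
  // performed by parseHTML, and produces an AST
  // ...
  parseHTML(template, {
    // ... options plus the start / end / chars / comment hooks (omitted)
  })
  // ...
}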

Now let's look at how parseHTML reads the character stream and parses the template string step by step.

export function parseHTML (html, options) {
  // Constants and variables
  const stack = [] // Starts empty; each time the while loop meets a non-unary start tag, that tag is pushed here

  const expectHTML = options.expectHTML

  const isUnaryTag = options.isUnaryTag || no // Checks whether a tag is unary (e.g. <br>)

  const canBeLeftOpenTag = options.canBeLeftOpenTag || no // Checks whether a non-unary tag may omit its end tag (e.g. <p>, <li>)

  let index = 0 // Current read position in the character stream

  let last, lastTag
  // last holds the part of html that has not been parsed yet
  // lastTag holds the tag at the top of the stack

  // Loop until html is empty, i.e. until the whole template has been parsed
  while (html) {
    last = html
    // At the start of each iteration, remember the current html in last

    if (!lastTag || !isPlainTextElement(lastTag)) {
      // isPlainTextElement ensures the content about to be parsed is not inside a plain-text element (script, style, textarea)

      let textEnd = html.indexOf('<') // Position of the first '<' in html; the branches below examine textEnd

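      if (textEnd === 0) {
        // textEnd === 0 means the first character of html is '<'. It could be:
        /**
         1. a comment: <!-- -->
         2. a conditional comment: <![ ]>
         3. a doctype: <!DOCTYPE >
         4. an end tag: </xxx>
         5. a start tag: <xxx>
         6. or just plain text that happens to start with '<': <abcdefg
        */
      }

      let text, rest, next
      if (textEnd >= 0) {
        // textEnd >= 0: handles strings whose first character is '<' but matched no tag, or whose first character is not '<'
      }

      if (textEnd < 0) {
        // textEnd < 0: there is no '<' at all,
        // so the whole html string is treated as text
      }

      if (options.chars && text) {
        // Pass the text to the chars hook in the options object given by parse
        options.chars(text)
      }
    } else {
      // The content about to be parsed is inside a plain-text element (script, style, textarea)
    }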
    
    
    // The branches above call advance, which shortens html.
    // If html still equals last, nothing in the loop body consumed any input,
    // so the remaining string is treated as plain text
    if (html === last) {
      options.chars && options.chars(html)
      if (process.env.NODE_ENV !== 'production' && !stack.length && options.warn) {
        options.warn(`Mal-formatted tag at end of template: "${html}"`)
      }
      break
    }
  }

  // Close any tags still left open on the stack
  parseEndTag()

  // advance removes the n characters that have already been parsed
  function advance (n) {
    index += n
    html = html.substring(n)
  }

  // parseStartTag parses a start tag
  function parseStartTag () {
    // ...
  }
  // handleStartTag processes the result of parseStartTag
  function handleStartTag (match) {
    // ...
  }
  // parseEndTag parses an end tag
  function parseEndTag (tagName, start, end) {
    // ...
  }
}
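The sketch below isolates the loop-plus-advance pattern in plain JavaScript. It is not Vue's code:

- miniParseHTML and its hook signatures are hypothetical. Vue's start hook also receives attrs.
- The regexes are simplified.
- Attributes, comments, doctypes, plain-text elements and the stack are all left out.

What it keeps are the three textEnd cases and the rule that a '<' which starts no tag is treated as text.

```javascript
// Simplified stand-ins for Vue's regexes (assumption: ASCII tag names, no attributes)
const startTagOpen = /^<([a-zA-Z][\w\-]*)/
const startTagClose = /^\s*(\/?)>/
const endTag = /^<\/([a-zA-Z][\w\-]*)[^>]*>/

function miniParseHTML (html, hooks) {
  let index = 0 // read position in the original template

  // Drop the first n characters, like advance() in parseHTML
  function advance (n) {
    index += n
    html = html.substring(n)
  }

  while (html) {
    let textEnd = html.indexOf('<')

    if (textEnd === 0) {
      const endMatch = html.match(endTag)
      if (endMatch) {
        const start = index
        advance(endMatch[0].length)
        hooks.end(endMatch[1], start, index)
        continue
      }
      const open = html.match(startTagOpen)
      const close = open && html.slice(open[0].length).match(startTagClose)
      if (close) {
        const start = index
        advance(open[0].length + close[0].length)
        hooks.start(open[1], close[1] === '/', start, index)
        continue
      }
      // A '<' that starts no tag is plain text: look for the next '<'
      textEnd = html.indexOf('<', 1)
    }

    // Text runs up to the next '<', or to the end when there is none
    const text = textEnd < 0 ? html : html.substring(0, textEnd)
    advance(text.length)
    hooks.chars(text)
  }
}

const events = []
miniParseHTML('<div>hi<br/></div>', {
  start: (tag, unary, s, e) => events.push(['start', tag, unary, s, e]),
  end: (tag, s, e) => events.push(['end', tag, s, e]),
  chars: (text) => events.push(['chars', text])
})
// events: start div, chars "hi", start br (unary), end div
console.log(events)
```

Every branch here consumes at least one character, so the loop always terminates. parseHTML adds its html === last check as a safety net for input that none of its branches consumes.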

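The code above shows what happens when lastTag is empty (the stack is empty) or lastTag is not a plain-text element (style, script, textarea). The parser looks for the first '<' in the string, and its position textEnd leads to three cases.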

1. textEnd === 0 ('<' is the first character). Comment nodes and start tags serve as examples of how this case is handled internally.

Comment nodes

if (comment.test(html)) {
  // comment is a regex constant: /^<!\--/
  const commentEnd = html.indexOf('-->')
  // A complete comment must not only start with <!--
  // but also end with -->

  if (commentEnd >= 0) { // This really is a comment node
    if (options.shouldKeepComment) {
      // options.shouldKeepComment holds the value of Vue's documented comments option
      options.comment(html.substring(4, commentEnd)) // Call the comment hook in the options passed in by parse
    }
    advance(commentEnd + 3)
    // Pass advance the position just past the parsed comment:
    // it strips the processed part so html holds only the unparsed remainder,
    // and moves index (the read position in the template) forward by commentEnd + 3

    continue
    // Skip the rest of this iteration and start parsing again from the top of the loop
  }
}
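The comment branch can also be tried on its own. The sketch below assumes a standalone helper, consumeComment, which is a hypothetical name and not part of Vue. It returns what the branch would pass to the comment hook and how far it would advance. It uses the same /^<!\--/ test and the same 4 and +3 offsets as the code above.

```javascript
// Same regex as the comment constant in the code above
const comment = /^<!\--/

// Returns { text, consumed } if html starts with a complete "<!-- ... -->",
// otherwise null (parseHTML would then move on to its other checks)
function consumeComment (html) {
  if (!comment.test(html)) return null
  const commentEnd = html.indexOf('-->')
  if (commentEnd < 0) return null // "<!--" without a closing "-->"
  return {
    text: html.substring(4, commentEnd), // what options.comment would receive
    consumed: commentEnd + 3 // the argument advance() would receive
  }
}

const result = consumeComment('<!-- note --><p>hi</p>')
console.log(result) // { text: ' note ', consumed: 13 }
```

The offset 4 is the length of "<!--", and 3 is the length of "-->". An unterminated "<!--" does not count as a comment.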

Start tags

const startTagMatch = parseStartTag()
// Call parseStartTag and keep its return value; a return value means a start tag was parsed successfully
if (startTagMatch) {
  handleStartTag(startTagMatch)
  if (shouldIgnoreFirstNewline(lastTag, html)) {
    // Drop the newline right after <pre> / <textarea>
    advance(1)
  }
  continue
}

function parseStartTag () {
  const start = html.match(startTagOpen)
  // startTagOpen is the regex for the opening part of a start tag:
  // '<' followed by the tag name, with one capture group for the tag name.
  // If start is null the match failed and parseStartTag returns undefined.
  if (start) {

    const match = {
      tagName: start[1],
      attrs: [],
      start: index
    }
    advance(start[0].length) // Skip '<' plus the tag name

    let end, attr
    // Loop while the closing part of the start tag has not been reached and an attribute matches
    while (!(end = html.match(startTagClose)) && (attr = html.match(attribute))) {
      // attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/
      advance(attr[0].length)
      // Skip the matched attribute
      match.attrs.push(attr)
      // Push this iteration's match result onto match.attrs
    }
    if (end) {
      // Only when end matched (the closing part of the start tag) is this a complete start tag
      match.unarySlash = end[1]
      // end[1] is '/' for a self-closing tag such as <br/>, and '' otherwise
      advance(end[0].length)
      match.end = index // advance has updated index, so match.end is the latest read position

      return match
      // match is returned only for a complete start tag; otherwise the function returns undefined
    }
  }
}
// Process the result of parsing a start tag
function handleStartTag (match) {
  const tagName = match.tagName
  const unarySlash = match.unarySlash
  // unarySlash is '/' for a self-closing tag and '' otherwise

  if (expectHTML) {
    // An open <p> is closed automatically when a non-phrasing tag (e.g. <div>) starts
    if (lastTag === 'p' && isNonPhrasingTag(tagName)) {
      parseEndTag(lastTag)
    }
    if (canBeLeftOpenTag(tagName) && lastTag === tagName) {
      // The tag may omit its end tag and repeats the previous open tag
      // (e.g. <li> right after <li>), so the previous one is closed first
      parseEndTag(tagName)
    }
  }

  const unary = isUnaryTag(tagName) || !!unarySlash
  // Decide whether this start tag is unary

  const l = match.attrs.length
  const attrs = new Array(l)
  // The for loop normalizes match.attrs and stores the result in attrs
  for (let i = 0; i < l; i++) {
    const args = match.attrs[i]
    // hackish work around FF bug https://bugzilla.mozilla.org/show_bug.cgi?id=369778
    if (IS_REGEX_CAPTURING_BROKEN && args[0].indexOf('""') === -1) {
      if (args[3] === '') { delete args[3] }
      if (args[4] === '') { delete args[4] }
      if (args[5] === '') { delete args[5] }
    }
    const value = args[3] || args[4] || args[5] || ''
    attrs[i] = {
      name: args[1],
      value: decodeAttr(
        value,
        options.shouldDecodeNewlines
      )
    }
    // Each entry of attrs is a { name, value } object
  }

  if (!unary) { // Non-unary tags are pushed onto the stack
    stack.push({ tag: tagName, lowerCasedTag: tagName.toLowerCase(), attrs: attrs })
    lastTag = tagName // lastTag tracks the top of the stack
  }

  if (options.start) {
    // Call the start hook in the options passed in by parse
    options.start(tagName, attrs, unary, match.start, match.end)
  }
}
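The two functions above can be exercised together outside the parser. Below is a sketch under a few assumptions:

- The regexes are modeled on Vue 2's html-parser.
- isUnaryTag is a short hypothetical subset of Vue's list.
- This parseStartTag takes the string and a local cursor instead of mutating a shared html, and returns null rather than undefined.
- formatStartTag is a hypothetical name for the attrs/unary step of handleStartTag. It omits decodeAttr and the stack handling.

```javascript
// Regexes modeled on Vue 2's html-parser (assumption: simplified to ASCII names)
const ncname = '[a-zA-Z_][\\w\\-\\.]*'
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)
const startTagClose = /^\s*(\/?)>/
const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/

// Hypothetical subset of options.isUnaryTag
const isUnaryTag = tag => ['br', 'hr', 'img', 'input', 'meta', 'link'].includes(tag)

// Returns a match object, or null unless html begins with a complete start tag
function parseStartTag (html, index = 0) {
  const start = html.match(startTagOpen)
  if (!start) return null
  const match = { tagName: start[1], attrs: [], start: index }
  let pos = start[0].length // local cursor instead of advance()
  let end, attr
  while (!(end = html.slice(pos).match(startTagClose)) &&
         (attr = html.slice(pos).match(attribute))) {
    pos += attr[0].length
    match.attrs.push(attr) // raw match array, as in Vue
  }
  if (!end) return null
  match.unarySlash = end[1] // '/' for "<br/>", '' for "<br>"
  match.end = index + pos + end[0].length
  return match
}

// The attrs/unary part of handleStartTag (decodeAttr omitted)
function formatStartTag (match) {
  return {
    tag: match.tagName,
    unary: isUnaryTag(match.tagName) || !!match.unarySlash,
    attrs: match.attrs.map(args => ({
      name: args[1],
      // group 3: "double", group 4: 'single', group 5: unquoted; '' if no value
      value: args[3] || args[4] || args[5] || ''
    }))
  }
}

const src = `<input id="name" class='field' maxlength=10 disabled>tail`
const parsed = formatStartTag(parseStartTag(src))
console.log(parsed.unary, parsed.attrs)
```

For example, `<input ...>` is unary because of isUnaryTag even though unarySlash is '', while `<my-comp/>` is unary because of the slash.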

2. textEnd >= 0

// This code handles strings whose first character is '<' but matched no tag, or whose first character is not '<'
let text, rest, next
if (textEnd >= 0) {
  rest = html.slice(textEnd)
  // rest now starts with '<'
  // The loop continues only while rest matches no end tag, start tag, comment or
  // conditional comment, i.e. while this '<' is part of ordinary text
  while (
    !endTag.test(rest) &&
    !startTagOpen.test(rest) &&
    !comment.test(rest) &&
    !conditionalComment.test(rest)
  ) {
    // < in plain text, be forgiving and treat it as text
    next = rest.indexOf('<', 1) // Offset of the next '<' within rest (starting at 1 skips the current one)
    if (next < 0) break
    // No further '<': leave the loop with textEnd unchanged
    textEnd += next // textEnd now points at that next '<' in html
    rest = html.slice(textEnd) // Re-slice html from the new textEnd and test again
  }
  text = html.substring(0, textEnd) // text now contains only plain text
  advance(textEnd)
}

if (options.chars && text) {
  options.chars(text) // Call the chars hook in the options passed in by parse
}
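The forgiving loop can be checked on its own. The helper leadingText below is hypothetical, and its regexes are simplified stand-ins for Vue's. It returns the text that this branch would hand to chars in one step, and it returns the whole string when there is no '<' at all (case 3).

When the last '<' in the input starts nothing, as in '1 < 2', the loop breaks with textEnd still pointing at that '<'. The trailing part is then left for a later iteration of the main loop.

```javascript
// Simplified stand-ins for Vue's regexes (assumption: ASCII tag names only)
const startTagOpen = /^<[a-zA-Z]/
const endTag = /^<\/[a-zA-Z][\w\-]*[^>]*>/
const comment = /^<!\--/
const conditionalComment = /^<!\[/

function leadingText (html) {
  let textEnd = html.indexOf('<')
  if (textEnd < 0) return html // no '<' at all: everything is text
  let rest = html.slice(textEnd)
  // Skip every '<' that does not begin a tag or a comment
  while (
    !endTag.test(rest) &&
    !startTagOpen.test(rest) &&
    !comment.test(rest) &&
    !conditionalComment.test(rest)
  ) {
    const next = rest.indexOf('<', 1) // next '<' after the current one
    if (next < 0) break
    textEnd += next
    rest = html.slice(textEnd)
  }
  return html.substring(0, textEnd)
}

console.log(leadingText('a < b <div>x</div>')) // returns 'a < b '
```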

3. textEnd < 0: there is no '<' left, so the entire html string is treated as text.

if (textEnd < 0) {
    text = html
    html = ''
}

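So far we have covered the case where the most recent open tag (lastTag, the top of the stack) is not a plain-text element. How are plain-text elements handled? There are three of them: script, style and textarea.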

let endTagLength = 0
// Length of the plain-text element's end tag
const stackedTag = lastTag.toLowerCase()
const reStackedTag = reCache[stackedTag] || (reCache[stackedTag] = new RegExp('([\\s\\S]*?)(</' + stackedTag + '[^>]*>)', 'i'))
// reStackedTag matches the element's text content followed by its end tag
// html.replace removes that match; rest holds the remaining characters
const rest = html.replace(reStackedTag, function (all, text, endTag) {
  // all: the full match
  // text: the plain-text content
  // endTag: the end tag, e.g. </script>
  endTagLength = endTag.length
  // This branch is entered only when lastTag is a plain-text element,
  // so this condition is always false here
  if (!isPlainTextElement(stackedTag) && stackedTag !== 'noscript') {
    text = text
      .replace(/<!--([\s\S]*?)-->/g, '$1')
      .replace(/<!\[CDATA\[([\s\S]*?)]]>/g, '$1')
  }
  if (shouldIgnoreFirstNewline(stackedTag, text)) {
    // Drop the first newline in pre / textarea
    text = text.slice(1)
  }
  if (options.chars) {
    // Call the chars hook in the options passed in by parse
    options.chars(text)
  }
  return ''
  // Replace the matched content with ''
})
index += html.length - rest.length
// Advance index by the characters consumed (content plus end tag); index now sits just past the end tag

html = rest // Update html and continue the while loop
parseEndTag(stackedTag, index - endTagLength, index)
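The sketch below shows why a '<' inside a script body is not mistaken for a tag: the regex simply looks for the element's own end tag. consumeRawText is a hypothetical standalone version of the branch above.

- It builds and caches reStackedTag the same way.
- It folds in the shouldIgnoreFirstNewline step, using the same pre/textarea tags.
- It leaves out the comment/CDATA stripping.
- Instead of calling hooks, it returns the text, the end tag length, how much index would advance, and the remaining html.

```javascript
const reCache = {}
const ignoreNewlineTags = new Set(['pre', 'textarea']) // tags whose first newline is dropped

function consumeRawText (html, lastTag) {
  const stackedTag = lastTag.toLowerCase()
  // Lazy ([\s\S]*?) stops at the FIRST matching end tag; 'i' also accepts </SCRIPT>
  const reStackedTag = reCache[stackedTag] ||
    (reCache[stackedTag] = new RegExp('([\\s\\S]*?)(</' + stackedTag + '[^>]*>)', 'i'))
  let text = null // stays null when no end tag is found
  let endTagLength = 0
  const rest = html.replace(reStackedTag, (all, body, endTag) => {
    endTagLength = endTag.length
    // Same effect as shouldIgnoreFirstNewline: drop one leading newline
    text = ignoreNewlineTags.has(stackedTag) && body[0] === '\n' ? body.slice(1) : body
    return '' // remove content + end tag from the string
  })
  return { text, endTagLength, consumed: html.length - rest.length, rest }
}

const r = consumeRawText('if (a < b) run()</script><p>x</p>', 'script')
console.log(r.text) // if (a < b) run()
console.log(r.rest) // <p>x</p>
```

Without any end tag, replace leaves html untouched (consumed is 0). In parseHTML, that situation is caught by the html === last check.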

That leaves parseEndTag, which we have not looked at yet. As its name suggests, it handles end tags.

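// parseEndTag is called in three ways:
// parseEndTag(): closes every tag still open on the stack
// parseEndTag(tagName): closes an open tag early, e.g. when handleStartTag auto-closes a <p> or a repeated <li>
// parseEndTag(tagName, start, end): handles a regular end tag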
function parseEndTag (tagName, start, end) {
    let pos, lowerCasedTagName
    if (start == null) start = index
    if (end == null) end = index
    
    if (tagName) {
      lowerCasedTagName = tagName.toLowerCase()
    }
    
    // Find the closest opened tag of the same type
    // Walk the stack from the top down to find the start tag matching this end tag
    if (tagName) {
      for (pos = stack.length - 1; pos >= 0; pos--) {
        if (stack[pos].lowerCasedTag === lowerCasedTagName) {
          break
        }
      }
    } else {
      // If no tag name is provided, clean shop
      pos = 0
    }
    
    if (pos >= 0) {
      // Close all the open elements, up the stack
      for (let i = stack.length - 1; i >= pos; i--) {
        if (process.env.NODE_ENV !== 'production' &&
          (i > pos || !tagName) &&
          options.warn
        ) {
        // In non-production builds, warn when no tagName was passed or when i > pos: the stack only holds non-unary start tags, so these entries never got an end tag
          options.warn(
            `tag <${stack[i].tag}> has no matching end tag.`
          )
        }
        if (options.end) {
          // Call the end hook in the options passed in by parse;
          // parse uses it to close the current AST element and pop its own stack
          options.end(stack[i].tag, start, end)
        }
      }
      // Remove the open elements from the stack
      stack.length = pos
      // With a tagName, drop everything from pos upward;
      // without one, pos is 0 and the stack is emptied
      lastTag = pos && stack[pos - 1].tag
      // Update lastTag to the new top of the stack
    } else if (lowerCasedTagName === 'br') {
      // pos < 0: an end tag with no matching start tag
      // A stray </br> is treated as <br>
      if (options.start) {
        options.start(tagName, [], true, start, end)
      }
    } else if (lowerCasedTagName === 'p') {
      // pos < 0: a </p> with no matching <p>
      // It is completed into an empty <p></p>
      if (options.start) {
        // Call the start hook in the options passed in by parse
        options.start(tagName, [], false, start, end)
      }
      if (options.end) {
        // Call the end hook in the options passed in by parse
        options.end(tagName, start, end)
      }
    }
    // When pos < 0, any other end tag without a matching start tag is ignored
}
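The stack handling in parseEndTag is easiest to follow as a pure function. closeTags below is a hypothetical sketch, not Vue's code. It takes a stack of { tag, lowerCasedTag } entries and returns results instead of calling hooks and warn:

- the tags that would reach options.end,
- the warnings,
- any element synthesized for a stray </br> or </p>,
- the new lastTag.

```javascript
function closeTags (stack, tagName) {
  const closed = []      // tags for which parseEndTag would call options.end
  const warnings = []    // messages parseEndTag would pass to options.warn
  let synthesized = null // element created for a stray </br> or </p>
  let pos = 0            // no tagName: close everything that is still open

  if (tagName) {
    const lower = tagName.toLowerCase()
    // Walk down from the top of the stack to the nearest matching start tag
    for (pos = stack.length - 1; pos >= 0; pos--) {
      if (stack[pos].lowerCasedTag === lower) break
    }
    if (pos < 0) {
      if (lower === 'br') synthesized = '<br>'        // </br> behaves like <br>
      else if (lower === 'p') synthesized = '<p></p>' // stray </p> becomes an empty <p>
      // any other unmatched end tag is ignored
    }
  }

  if (pos >= 0) {
    for (let i = stack.length - 1; i >= pos; i--) {
      // Entries above pos (or all entries, without tagName) never got an end tag
      if (i > pos || !tagName) {
        warnings.push(`tag <${stack[i].tag}> has no matching end tag.`)
      }
      closed.push(stack[i].tag)
    }
    stack.length = pos // pop the closed entries
  }

  const top = stack[stack.length - 1]
  return { closed, warnings, synthesized, lastTag: top ? top.tag : undefined }
}

const openTags = tags => tags.map(t => ({ tag: t, lowerCasedTag: t.toLowerCase() }))
const demoStack = openTags(['div', 'ul', 'li'])
console.log(closeTags(demoStack, 'ul')) // closes li (with a warning) and ul; lastTag becomes 'div'
```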

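In short, lexical analysis reads the character stream and uses regular expressions to consume the template piece by piece until the whole string has been parsed. Each time it recognizes a particular kind of token, it calls the corresponding hook and passes along the useful arguments. The parse function then builds the AST from those arguments.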
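Here is how hook calls can become a tree. The sketch below is a hypothetical, heavily simplified builder: Vue's real parse also processes directives, v-if/v-for and more. It only borrows the numbering of Vue's AST node types: type 1 for elements, type 3 for plain text. It is driven by hand with the calls parseHTML would make for `<ul><li class="a">one</li><br></ul>`.

```javascript
function createTreeBuilder () {
  const root = { type: 'root', children: [] }
  const stack = [root] // parse keeps its own stack of AST elements
  const current = () => stack[stack.length - 1]

  return {
    root,
    start (tag, attrs, unary) {
      const el = { type: 1, tag, attrs, children: [] } // type 1: element
      current().children.push(el)
      if (!unary) stack.push(el) // unary elements never receive children
    },
    end () {
      // Assumes parseHTML has already balanced the tags (see parseEndTag)
      if (stack.length > 1) stack.pop()
    },
    chars (text) {
      // type 3: plain text; whitespace-only text is dropped to keep the sketch short
      if (text.trim()) current().children.push({ type: 3, text })
    }
  }
}

// Hook calls parseHTML would make for <ul><li class="a">one</li><br></ul>
const b = createTreeBuilder()
b.start('ul', [], false)
b.start('li', [{ name: 'class', value: 'a' }], false)
b.chars('one')
b.end('li')
b.start('br', [], true) // <br> is unary, so no end call follows
b.end('ul')
console.log(JSON.stringify(b.root))
```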
