HTML escaping

Reposted article. 2019-01-24 11:32. Category: front end, JS

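Suppose you want to output the snippet <span>test</span> on the page verbatim, as text. Written into the page the normal way, it is not shown as code: the browser parses it as markup, and all you see is test. So how do you get the intended result?

When learning HTML tags, you learn that code can be output verbatim with pre + code. But that approach on its own does not handle HTML tags: markup inside pre and code is still parsed.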

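1. With the Vue framework, you can do it like this, using text interpolation (v-html would render the string as real markup instead):

<template>
    <!-- Text interpolation escapes the value, so the tags are shown literally. -->
    <p>{{ html }}</p>
</template>

<script>
    export default {
        data () {
            return {
                html: '<span>test</span>'
            }
        }
    }
</script>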

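2. With the React framework, you can do it like this, passing the string as a text child (dangerouslySetInnerHTML would render it as real markup instead):

import React, { PureComponent } from 'react'

class Test extends PureComponent {
  constructor (props) {
    super(props)

    this.state = {
      html: '<span>test</span>'
    }
  }

  render () {
    // The state key is `html`, so that is the name to destructure.
    const { html } = this.state

    return (
      <div>
        {/* React escapes text children, so the tags are shown literally. */}
        <p>{html}</p>
      </div>
    )
  }
}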

3. Is it possible with just a single HTML tag? Yes: the xmp tag, which outputs its content as a plain string.

<xmp><span>test</span></xmp>

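However, this tag has been deprecated by the W3C, although the major browsers still support it. The HTML specification lists it as obsolete and recommends pre and code instead, with < and & escaped. If you must use it, note:

- If the content itself contains an xmp closing tag, the parser ends the element early and the rest of the content is garbled, so a template stored this way must not contain that closing tag.
- The xmp element must be a descendant of body.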
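As an alternative to xmp, pre + code can show markup verbatim once the content has been escaped first. The sketch below illustrates this; the function name toPreCode is hypothetical and not part of the original article.

```javascript
// Hypothetical helper: wrap a source snippet in <pre><code> after escaping it.
// "&" is replaced first so that the entities produced for "<" and ">" are not escaped again.
function toPreCode(source) {
  const escaped = String(source)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  return '<pre><code>' + escaped + '</code></pre>';
}

console.log(toPreCode('<span>test</span>'));
// <pre><code>&lt;span&gt;test&lt;/span&gt;</code></pre>
```

Assigning the returned string to an element's innerHTML would then display the snippet as text rather than as a span element.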

4. So how can you implement this yourself, without xmp or a framework?

// HTML escaping
function htmlEncode (text) {
    // Characters that have to be replaced with an entity.
    var escapeRe = /[\x00`><"'&]/g;
    var charEntities = {
        "&": "&amp;",
        "<": "&lt;",
        ">": "&gt;",
        "\x00": "&#0;",
        "'": "&#39;",
        '"': "&#34;",
        "`": "&#96;"
    };

    // null and undefined become an empty string.
    if (text == null) {
        return "";
    }

    // Coerce to a string first so that non-string values (numbers, etc.) are handled,
    // then swap each special character for its entity in a single pass.
    return String(text).replace(escapeRe, function (ch) {
        return charEntities[ch];
    });
}

let htmlCon = '<span>test</span>';
document.querySelector('#html_con').innerHTML = htmlEncode(htmlCon);

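5. I went and read the Vue and React source code to see how each of them implements this. I only found the React implementation; Vue has too many directories and is hard to trace, so I did not find it there.

In node_modules/react-dom/cjs/react-dom-server.browser.development.js, at line 631: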

// code copied and modified from escape-html
/**
 * Module variables.
 * @private
 */

var matchHtmlRegExp = /["'&<>]/;

/**
 * Escapes special characters and HTML entities in a given html string.
 *
 * @param  {string} string HTML string to escape for later insertion
 * @return {string}
 * @public
 */

function escapeHtml(string) {  
  var str = '' + string;
  var match = matchHtmlRegExp.exec(str);

  if (!match) {
    return str;
  }

  var escape = void 0;
  var html = '';
  var index = 0;
  var lastIndex = 0;

  for (index = match.index; index < str.length; index++) {
    switch (str.charCodeAt(index)) {
      case 34:
        // "
        escape = '&quot;';
        break;
      case 38:
        // &
        escape = '&amp;';
        break;
      case 39:
        // '
        escape = '&#x27;'; // modified from escape-html; used to be '&#39'
        break;
      case 60:
        // <
        escape = '&lt;';
        break;
      case 62:
        // >
        escape = '&gt;';
        break;
      default:
        continue;
    }

    if (lastIndex !== index) {
      html += str.substring(lastIndex, index);
    }

    lastIndex = index + 1;
    html += escape;
  }

  return lastIndex !== index ? html + str.substring(lastIndex, index) : html;
}
// end code copied and modified from escape-html

/**
 * Escapes text to prevent scripting attacks.
 *
 * @param {*} text Text value to escape.
 * @return {string} An escaped string.
 */
function escapeTextForBrowser(text) {
  if (typeof text === 'boolean' || typeof text === 'number') {
    // this shortcircuit helps perf for types that we know will never have
    // special characters, especially given that this function is used often
    // for numeric dom ids.
    return '' + text;
  }
  return escapeHtml(text);
}

/**
 * Escapes attribute value to prevent scripting attacks.
 *
 * @param {*} value Value to escape.
 * @return {string} An escaped string.
 */
function quoteAttributeValueForBrowser(value) {
  return '"' + escapeTextForBrowser(value) + '"';
}

This part of the source is fairly easy to follow.

Summary: HTML escaping mainly comes down to converting the special characters " ' & < > into HTML entities.
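To make the summary concrete, here is a minimal sketch that escapes only those five characters and prints the result for two inputs. The names ENTITIES and escapeFive are hypothetical, and the choice of &quot; and &#39; as entities is an assumption (other equivalent entities work too).

```javascript
// Lookup table for the five characters named in the summary.
const ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;'
};

// Hypothetical helper: replace each special character in a single pass,
// so entities that were just produced are never escaped a second time.
function escapeFive(value) {
  return String(value).replace(/[&<>"']/g, (ch) => ENTITIES[ch]);
}

console.log(escapeFive('<span>test</span>'));
// &lt;span&gt;test&lt;/span&gt;
console.log(escapeFive('a & "b"'));
// a &amp; &quot;b&quot;
```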

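Further reading: preventing XSS attacks:

Escape user input.
After fetching the content, unescape it, parse it into a DOM (for example with DOMParser), and filter out unsafe tags and attributes to block XSS.
- Unsafe tags: style, link, script, iframe, frame, img
- Unsafe attributes: onerror, onclick, and other event handlers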
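The filtering steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not a complete XSS defence: all function names are hypothetical, the tag list is copied from the bullets above, every attribute starting with "on" is treated as an event handler, and cases such as javascript: URLs are not covered. sanitizeHtml relies on the browser's DOMParser and therefore only runs in a browser.

```javascript
// Deny list taken from the bullets above.
const UNSAFE_TAGS = new Set(['style', 'link', 'script', 'iframe', 'frame', 'img']);

function isUnsafeTag(tagName) {
  return UNSAFE_TAGS.has(String(tagName).toLowerCase());
}

// Assumption: any attribute whose name starts with "on" (onerror, onclick, ...) is unsafe.
function isUnsafeAttribute(attrName) {
  return /^on/i.test(String(attrName));
}

// Reverse the escaping of the five basic entities in a single pass.
const CHARS = { '&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#39;': "'" };

function unescapeFive(value) {
  return String(value).replace(/&(?:amp|lt|gt|quot|#39);/g, (entity) => CHARS[entity]);
}

// Browser-only: unescape, parse into a detached document, then strip unsafe nodes and attributes.
function sanitizeHtml(escapedHtml) {
  const doc = new DOMParser().parseFromString(unescapeFive(escapedHtml), 'text/html');
  doc.body.querySelectorAll('*').forEach((el) => {
    if (isUnsafeTag(el.tagName)) {
      el.remove();
      return;
    }
    Array.from(el.attributes).forEach((attr) => {
      if (isUnsafeAttribute(attr.name)) {
        el.removeAttribute(attr.name);
      }
    });
  });
  return doc.body.innerHTML;
}
```

A deny list like this is easy to bypass, so an allow list of known-safe tags and attributes is usually the more robust design.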
