Ace -- Syntax Highlighting


Creating a Syntax Highlighter for Ace

Creating a new syntax highlighter for Ace is extremely simple. You'll need to define two pieces of code: a new mode, and a new set of highlighting rules.


Where to Start

We recommend using the Ace Mode Creator when defining your highlighter. It lets you inspect your code's tokens and provides a live preview of the syntax highlighter in action.

Ace Mode Creator :  https://ace.c9.io/tool/mode_creator.html

Defining a Mode

Every language needs a mode. A mode contains the paths to a language's syntax highlighting rules, indentation rules, and code folding rules. Without defining a mode, Ace won't know anything about the finer aspects of your language.

Here is the starter template we'll use to create a new mode:


 

define(function(require, exports, module) {
"use strict";

var oop = require("../lib/oop");
// defines the parent mode
var TextMode = require("./text").Mode;
var Tokenizer = require("../tokenizer").Tokenizer;
var MatchingBraceOutdent = require("./matching_brace_outdent").MatchingBraceOutdent;
// needed by createWorker() below
var WorkerClient = require("../worker/worker_client").WorkerClient;

// defines the language specific highlighters and folding rules
var MyNewHighlightRules = require("./mynew_highlight_rules").MyNewHighlightRules;
var MyNewFoldMode = require("./folding/mynew").MyNewFoldMode;

var Mode = function() {
    // set everything up
    this.HighlightRules = MyNewHighlightRules;
    this.$outdent = new MatchingBraceOutdent();
    this.foldingRules = new MyNewFoldMode();
};
oop.inherits(Mode, TextMode);

(function() {
    // configure comment start/end characters
    this.lineCommentStart = "//";
    this.blockComment = {start: "/*", end: "*/"};

    // special logic for indent/outdent.
    // By default ace keeps indentation of previous line
    this.getNextLineIndent = function(state, line, tab) {
        var indent = this.$getIndent(line);
        return indent;
    };

    this.checkOutdent = function(state, line, input) {
        return this.$outdent.checkOutdent(line, input);
    };

    this.autoOutdent = function(state, doc, row) {
        this.$outdent.autoOutdent(doc, row);
    };

    // create worker for live syntax checking
    this.createWorker = function(session) {
        var worker = new WorkerClient(["ace"], "ace/mode/mynew_worker", "NewWorker");
        worker.attachToDocument(session.getDocument());
        worker.on("errors", function(e) {
            session.setAnnotations(e.data);
        });
        return worker;
    };

}).call(Mode.prototype);

exports.Mode = Mode;
});

What's going on here? First, you're defining the path to TextMode (more on this later). Then you're pointing the mode to your definitions for the highlighting rules, as well as your rules for code folding. Finally, you're setting everything up to find those rules, and exporting the Mode so that it can be consumed. That's it!

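The template's getNextLineIndent simply keeps the indentation of the previous line. As a rough sketch of the kind of language-specific logic that could go there, the standalone function below adds one indentation unit after a line that ends in an opening bracket. The getIndent helper is a hypothetical stand-in for the mode's $getIndent, and nothing here depends on Ace itself.

```javascript
// Hypothetical stand-in for the mode's $getIndent:
// returns the leading whitespace of a line.
function getIndent(line) {
    return line.match(/^\s*/)[0];
}

// Sketch: keep the previous indentation, and add one level
// when the line ends with an opening bracket.
function getNextLineIndent(state, line, tab) {
    var indent = getIndent(line);
    if (state === "start" && /[\{\(\[]\s*$/.test(line)) {
        indent += tab;
    }
    return indent;
}

// JSON.stringify makes the whitespace visible when printed.
console.log(JSON.stringify(getNextLineIndent("start", "  foo();", "  ")));
console.log(JSON.stringify(getNextLineIndent("start", "  if (x) {", "  ")));
```

Inside a real mode this logic would live in this.getNextLineIndent and call this.$getIndent(line) instead of the stand-in.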

 

Regarding TextMode, you'll notice that it's only used once: oop.inherits(Mode, TextMode);. If your new language depends on the rules of another language, you can inherit that language's rules and extend them with your own language's requirements. For example, PHP inherits from HTML, since it can be embedded directly inside .html pages. You can inherit from TextMode, or from any other existing mode that already relates to your language.
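To make the inheritance step concrete, here is a minimal sketch of the prototype wiring that a helper like oop.inherits performs. The inherits function below is a simplified stand-in written for this example (an assumption, not Ace's source), and ParentMode and ChildMode are hypothetical names.

```javascript
// Simplified stand-in for oop.inherits: wires up the prototype chain
// so instances of the child see the parent's methods.
function inherits(ctor, superCtor) {
    ctor.super_ = superCtor;
    ctor.prototype = Object.create(superCtor.prototype, {
        constructor: { value: ctor, enumerable: false, writable: true, configurable: true }
    });
}

// Hypothetical parent mode.
function ParentMode() {}
ParentMode.prototype.lineCommentStart = "//";
ParentMode.prototype.describe = function () {
    return "comments start with " + this.lineCommentStart;
};

// Hypothetical child mode: inherit first, then override what differs.
function ChildMode() {}
inherits(ChildMode, ParentMode);
ChildMode.prototype.lineCommentStart = "#";

console.log(new ChildMode().describe()); // comments start with #
```

Note the order: inherits replaces the child's prototype, so overrides must be added afterwards, which is why the mode template calls oop.inherits before populating Mode.prototype.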

 

All Ace modes can be found in the lib/ace/mode folder.


Defining Syntax Highlighting Rules

The Ace highlighter can be considered to be a state machine. Regular expressions define the tokens for the current state, as well as the transitions into another state. Let's define mynew_highlight_rules.js, which our mode above uses.

All syntax highlighters start off looking something like this:


define(function(require, exports, module) {
"use strict";

var oop = require("../lib/oop");
var TextHighlightRules = require("./text_highlight_rules").TextHighlightRules;

var MyNewHighlightRules = function() {

    // regexp must not have capturing parentheses. Use (?:) instead.
    // regexps are ordered -> the first match is used
    this.$rules = {
        "start" : [
            {
                token: token, // String, Array, or Function: the CSS token to apply
                regex: regex, // String or RegExp: the regexp to match
                next: next // [Optional] String: next state to enter
            }
        ]
    };
};

oop.inherits(MyNewHighlightRules, TextHighlightRules);

exports.MyNewHighlightRules = MyNewHighlightRules;

});

The token state machine operates on whatever is defined in this.$rules. The highlighter always begins at the start state and progresses down the list, looking for a matching regex. When one is found, the matched text is wrapped within a <span class="ace_<token>"> tag, where <token> is the value of the rule's token property. Note that all tokens are preceded by the ace_ prefix when they're rendered on the page.

 

Once again, we're inheriting from TextHighlightRules here. If our new language builds on previously defined syntaxes, we could inherit from any other language's rule set instead. For more information on extending languages, see "Extending Highlighters" below.

 

Defining Tokens

The Ace highlighting system is heavily inspired by the TextMate language grammar. Most tokens follow TextMate's naming conventions. A thorough (albeit incomplete) list of tokens can be found on the Ace Wiki.

Token list: https://github.com/ajaxorg/ace/wiki/Creating-or-Extending-an-Edit-Mode#commonTokens

 

For the complete list of tokens, see tool/tmtheme.js (https://github.com/ajaxorg/ace/blob/master/tool/tmtheme.js). It is possible to add new token names, but that is outside the scope of this document.

 

Multiple tokens can be applied to the same text by adding dots in the token, e.g. token: "support.function" wraps the text in a <span class="ace_support ace_function"> tag.
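The mapping from a dotted token name to CSS classes can be sketched in a few lines. The function names tokenToClassName and renderToken are made up for this illustration and are not part of Ace's API; HTML escaping is left out to keep the example short.

```javascript
// Sketch: turn a dotted token name into the class list described above.
function tokenToClassName(token) {
    return token.split(".").map(function (part) {
        return "ace_" + part;
    }).join(" ");
}

// Wrap matched text in a span (no HTML escaping, for brevity).
function renderToken(token, text) {
    return '<span class="' + tokenToClassName(token) + '">' + text + "</span>";
}

console.log(renderToken("support.function", "print"));
// <span class="ace_support ace_function">print</span>
```

Because each part becomes its own class, a theme can style ace_support broadly and still single out ace_function where needed.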

 

Defining Regular Expressions

Regular expressions can be defined either as a RegExp or as a String.

If you're using a RegExp literal, remember to start and end the expression with the / character, like this:

{
    token : "constant.language.escape",
    regex : /\$[\w\d]+/
}
 

A caveat of using string-based regular expressions is that every \ character must be escaped. That means that even an innocuous regular expression like this:

regex: "function\s*\(\w+\)"

must actually be written like this:

regex: "function\\s*\\(\\w+\\)"
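The effect of the missing backslashes is easy to check in plain JavaScript, independent of Ace. The short sketch below compares the two string forms and confirms that the doubled-backslash version builds the same pattern as the equivalent RegExp literal.

```javascript
// Without doubled backslashes, the string literal drops them during parsing.
var unescaped = "function\s*\(\w+\)";
// With doubled backslashes, the regexp engine receives the intended pattern.
var escaped = "function\\s*\\(\\w+\\)";

console.log(unescaped); // functions*(w+)
console.log(escaped);   // function\s*\(\w+\)

var fromString = new RegExp(escaped);
var literal = /function\s*\(\w+\)/;

console.log(fromString.source === literal.source); // true
console.log(fromString.test("function (arg)"));    // true
```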
 

Groupings

You can also include flat regexps--(var)--or have matching groups--((a+)(b+)). There is a strict requirement that matching groups cover the entire matched string; thus, (hel)lo is invalid. If you want a group that does not capture, start it with the ?: prefix; thus, (hel)(?:lo) is okay. You can, of course, create longer non-capturing groups. For example:

{
    token : "constant.language.boolean",
    regex : /(?:true|false)\b/
},
 

For flat regular expression matches, token can be a String, or a Function that takes a single argument (the match) and returns a string token. For example, using a function might look like this:

// lang provides the arrayToMap helper used below
var lang = require("../lib/lang");

var colors = lang.arrayToMap(
    ("aqua|black|blue|fuchsia|gray|green|lime|maroon|navy|olive|orange|" +
    "purple|red|silver|teal|white|yellow").split("|")
);

var fonts = lang.arrayToMap(
    ("arial|century|comic|courier|garamond|georgia|helvetica|impact|lucida|" +
    "symbol|system|tahoma|times|trebuchet|utopia|verdana|webdings|sans-serif|" +
    "serif|monospace").split("|")
);

...

{
    token: function(value) {
        if (colors.hasOwnProperty(value.toLowerCase())) {
            return "support.constant.color";
        }
        else if (fonts.hasOwnProperty(value.toLowerCase())) {
            return "support.constant.fonts";
        }
        else {
            return "text";
        }
    },
    regex: "\\-?[a-zA-Z_][a-zA-Z0-9_\\-]*"
}
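To try the function-token idea outside Ace, here is a reduced, self-contained sketch. The arrayToMap function is a local stand-in for lang.arrayToMap (its behaviour is assumed: a list of names becomes a lookup object), the word lists are shortened, and classify is a hypothetical helper that applies the rule by hand.

```javascript
// Local stand-in for lang.arrayToMap (assumed behaviour).
function arrayToMap(names) {
    var map = {};
    names.forEach(function (name) { map[name] = 1; });
    return map;
}

var colors = arrayToMap("aqua|black|red|white".split("|"));
var fonts = arrayToMap("arial|georgia|monospace".split("|"));

var rule = {
    token: function (value) {
        var key = value.toLowerCase();
        if (colors.hasOwnProperty(key)) return "support.constant.color";
        if (fonts.hasOwnProperty(key)) return "support.constant.fonts";
        return "text";
    },
    regex: "\\-?[a-zA-Z_][a-zA-Z0-9_\\-]*"
};

// Apply the rule by hand: match the regex, then ask the function for a token.
function classify(text) {
    var match = new RegExp(rule.regex).exec(text);
    return match ? rule.token(match[0]) : null;
}

console.log(classify("Red"), classify("georgia"), classify("banana"));
// support.constant.color support.constant.fonts text
```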

 

For a grouped regular expression, if token is a function, it should take the same number of arguments as there are groups, and return an array of tokens.

 

For grouped regular expressions, token can be a String, in which case all matched groups are given that same token, like this:

{
    token: "identifier",
    regex: "(\\w+\\s*:)(\\w*)"
}
 

More commonly, though, token is an Array (of the same length as the number of groups), in which each group is given the token at the same position in the array. For a complicated regular expression, like defining a function, that might look something like this:

{
    token : ["storage.type", "text", "entity.name.function"],
    regex : "(function)(\\s+)([a-zA-Z_][a-zA-Z0-9_]*\\b)"
}
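The pairing of groups with array entries can be sketched as follows. applyGroupedRule is a hypothetical helper written for this illustration; it is not how Ace's tokenizer is implemented, only a way to see which group receives which token.

```javascript
var rule = {
    token: ["storage.type", "text", "entity.name.function"],
    regex: "(function)(\\s+)([a-zA-Z_][a-zA-Z0-9_]*\\b)"
};

// Sketch: pair each captured group with the token at the same index.
function applyGroupedRule(rule, text) {
    var match = new RegExp(rule.regex).exec(text);
    if (!match) return [];
    return match.slice(1).map(function (value, i) {
        return { type: rule.token[i], value: value };
    });
}

console.log(applyGroupedRule(rule, "function main"));
```

Since the three groups together cover the whole match, every character of "function main" ends up in exactly one token.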

 

Defining States

The syntax highlighting state machine stays in the start state, until you define a next state for it to advance to. At that point, the tokenizer stays in that new state, until it advances to another state. Afterwards, you should return to the original start state.


Here's an example:

this.$rules = {
    "start" : [ {
        token : "text",
        regex : "<\\!\\[CDATA\\[",
        next : "cdata"
    } ],

    "cdata" : [ {
        token : "text",
        regex : "\\]\\]>",
        next : "start"
    }, {
        defaultToken : "text"
    } ]
};

In this extremely short sample, we're defining some highlighting rules for when Ace detects a <![CDATA[ tag. When one is encountered, the tokenizer moves from start into the cdata state. It remains there, applying the text token to any string it encounters. Finally, when it hits the closing ]]> symbol, it returns to the start state and continues to tokenize anything else.
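To watch the state transitions happen, the following self-contained sketch runs the CDATA rules through a deliberately simplified tokenizer. tokenizeLine is written for this example and is not Ace's Tokenizer, which works differently internally; the "keyword" and "string" token names replace "text" purely so the states are visible in the output.

```javascript
// Simplified, illustrative tokenizer: the earliest match in the current
// state wins, and on a tie the rule listed first is kept.
function tokenizeLine(rules, line, state) {
    var tokens = [];
    var pos = 0;
    while (pos < line.length) {
        var best = null;
        var fallback = "text";
        rules[state].forEach(function (rule) {
            if (rule.defaultToken) { fallback = rule.defaultToken; return; }
            var re = new RegExp(rule.regex, "g");
            re.lastIndex = pos;
            var m = re.exec(line);
            if (m && m[0].length > 0 && (!best || m.index < best.match.index)) {
                best = { rule: rule, match: m };
            }
        });
        var end = best ? best.match.index : line.length;
        // unmatched text gets the state's default token
        if (end > pos) tokens.push({ type: fallback, value: line.slice(pos, end) });
        if (!best) break;
        tokens.push({ type: best.rule.token, value: best.match[0] });
        pos = end + best.match[0].length;
        if (best.rule.next) state = best.rule.next;
    }
    return { tokens: tokens, state: state };
}

var rules = {
    "start": [{ token: "keyword", regex: "<\\!\\[CDATA\\[", next: "cdata" }],
    "cdata": [{ token: "keyword", regex: "\\]\\]>", next: "start" },
              { defaultToken: "string" }]
};

// The state returned for one line is the start state for the next.
var first = tokenizeLine(rules, "a <![CDATA[ b", "start");
var second = tokenizeLine(rules, "c ]]> d", first.state);
console.log(first.state, second.state); // cdata start
console.log(second.tokens);
```

Carrying the end state of one line into the next is what lets a construct such as a CDATA section span several lines.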

 

Using the TMLanguage Tool

There is a tool that takes an existing tmlanguage file and does its best to convert it into JavaScript for Ace to consume. Here's what you need to get started:

  1. In the Ace repository, navigate to the tool folder.
  2. Run npm install to install required dependencies.
  3. Run node tmlanguage.js <path_to_tmlanguage_file>; for example, node tmlanguage.js /Users/Elrond/elven.tmLanguage

Two files are created and placed in lib/ace/mode: one for the language mode, and one for the set of highlight rules. You will still need to register the mode in ace/ext/modelist.js, and add a sample file for testing.

 

A Note on Accuracy

Your .tmlanguage file is converted to the best of the converter's ability. It is an understatement to say that the tool is imperfect, and language mode creation will probably never be fully automatable. Several things cannot be determined automatically; for example:

  • The use of regular expression lookbehinds
    This is a concept that JavaScript did not have before ES2018, so it needs to be faked
  • Deciding which state to transition to
    While the tool does create new states correctly, it labels them with generic terms like state_2, state_10, etc.
  • Extending modes
    Many modes say something like include source.c, to mean, "add all the rules in C highlighting." That syntax does not make sense to Ace or this tool (though of course you can extend existing highlighters).
  • Rule preference order
  • Gathering keywords
    Most likely, you'll need to take keywords from your language file and run them through createKeywordMapper()

However, the tool is an excellent way to get a quick start if you already possess a tmlanguage file for your language.
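The keyword-gathering step from the list above can be sketched as follows. The createKeywordMapper defined here is a simplified local stand-in whose behaviour is an assumption (pipe-separated word lists in, a token-returning function out); it is not Ace's implementation, and the keyword lists belong to an imaginary language.

```javascript
// Simplified stand-in: builds a lookup from "a|b|c" lists and returns
// a function that can be used as a rule's token.
function createKeywordMapper(map, defaultToken) {
    var keywords = Object.create(null);
    Object.keys(map).forEach(function (tokenName) {
        map[tokenName].split("|").forEach(function (word) {
            keywords[word] = tokenName;
        });
    });
    return function (value) {
        return keywords[value] || defaultToken;
    };
}

// Hypothetical keyword lists for an imaginary language.
var keywordMapper = createKeywordMapper({
    "keyword": "if|else|while|return",
    "constant.language": "true|false|null"
}, "identifier");

var rule = { token: keywordMapper, regex: "[a-zA-Z_][a-zA-Z0-9_]*\\b" };

console.log(rule.token("while"), rule.token("null"), rule.token("total"));
// keyword constant.language identifier
```

One broad identifier regex plus a lookup keeps the rule list short, compared with writing one regex alternation per keyword class.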

 

Extending Highlighters

Suppose you're working on a LuaPage, PHP embedded in HTML, or a Django template. You'll need to create a syntax highlighter that takes all the rules from the original language (Lua, PHP, or Python) and extends them with some additional identifiers (<?lua, <?php, {%, for example). Ace allows you to easily extend a highlighter using a few helper functions.

 

Getting Existing Rules

To get the existing syntax highlighting rules for a particular language, use the getRules() function. For example:


var HtmlHighlightRules = require("./html_highlight_rules").HtmlHighlightRules;
 
this.$rules = new HtmlHighlightRules().getRules();
 
/*
this.$rules == Same this.$rules as HTML highlighting
*/
 

Extending a Highlighter

The addRules method does one thing, and it does one thing well: it adds new rules to an existing rule set, and prefixes any state with a given tag. For example, let's say you've got two sets of rules, defined like this:


this.$rules = {
    "start": [ /* ... */ ]
};

var newRules = {
    "start": [ /* ... */ ]
};

If you want to incorporate newRules into this.$rules, you'd do something like this:


this.addRules(newRules, "new-");

/*
this.$rules = {
    "start": [ ... ],
    "new-start": [ ... ]
};
*/
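The prefixing behaviour can be tried out with a small standalone sketch. The addRules function below is a simplified approximation written for this example (a free function instead of a method, and not Ace's source); it assumes that next transitions are prefixed along with the state names so the added states keep pointing at each other.

```javascript
// Simplified sketch of the described behaviour.
function addRules(target, rules, prefix) {
    Object.keys(rules).forEach(function (name) {
        target[prefix + name] = rules[name].map(function (rule) {
            var copy = Object.assign({}, rule);
            // keep transitions inside the prefixed copy of the state machine
            if (typeof copy.next === "string") copy.next = prefix + copy.next;
            return copy;
        });
    });
    return target;
}

var rules = { "start": [{ token: "text", regex: "\\w+" }] };
var newRules = {
    "start": [{ token: "string", regex: '"', next: "qstring" }],
    "qstring": [{ token: "string", regex: '"', next: "start" }]
};

addRules(rules, newRules, "new-");
console.log(Object.keys(rules));         // [ 'start', 'new-start', 'new-qstring' ]
console.log(rules["new-start"][0].next); // new-qstring
```

The rules are copied rather than modified in place, so newRules can be reused under a different prefix.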

Extending Two Highlighters

The last function available to you combines both of these concepts, and it's called embedRules. It takes three parameters:


  1. An existing rule set to embed with
  2. A prefix to apply for each state in the existing rule set
  3. A set of new states to add

Like addRules, embedRules adds on to the existing this.$rules object.

To explain this visually, let's take a look at the syntax highlighter for Lua pages, which combines all of these concepts:


var HtmlHighlightRules = require("./html_highlight_rules").HtmlHighlightRules;
var LuaHighlightRules = require("./lua_highlight_rules").LuaHighlightRules;

var LuaPageHighlightRules = function() {
    this.$rules = new HtmlHighlightRules().getRules();

    for (var i in this.$rules) {
        this.$rules[i].unshift({
            token: "keyword",
            regex: "<\\%\\=?",
            next: "lua-start"
        }, {
            token: "keyword",
            regex: "<\\?lua\\=?",
            next: "lua-start"
        });
    }
    this.embedRules(LuaHighlightRules, "lua-", [
        {
            token: "keyword",
            regex: "\\%>",
            next: "start"
        },
        {
            token: "keyword",
            regex: "\\?>",
            next: "start"
        }
    ]);
};

Here, this.$rules starts off as a set of HTML highlighting rules. To this set, we add two new checks, for <% (or <%=) and <?lua (or <?lua=). We also specify that if one of these rules is matched, we should move on to the lua-start state. Next, embedRules takes the already existing set of LuaHighlightRules and applies the lua- prefix to each state there. Finally, it adds two new checks for %> and ?>, allowing the state machine to return to start.
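The embedding step can be approximated outside Ace with the sketch below. This embedRules is a simplified stand-in (a free function operating on plain rule objects rather than a method taking a HighlightRules constructor), and the rule sets are hypothetical, heavily reduced stand-ins for the HTML and Lua rules.

```javascript
// Simplified sketch: copy the embedded states under a prefix and put the
// escape rules first so they can leave the embedded language.
function embedRules(target, embedded, prefix, escapeRules) {
    Object.keys(embedded).forEach(function (name) {
        var copied = embedded[name].map(function (rule) {
            var copy = Object.assign({}, rule);
            if (typeof copy.next === "string") copy.next = prefix + copy.next;
            return copy;
        });
        target[prefix + name] = escapeRules.concat(copied);
    });
    return target;
}

// Reduced stand-ins for the host (HTML) and embedded (Lua) rule sets.
var rules = { "start": [{ token: "keyword", regex: "<\\%\\=?", next: "lua-start" }] };
var luaRules = { "start": [{ token: "keyword", regex: "\\b(?:local|end)\\b" }] };

embedRules(rules, luaRules, "lua-", [
    { token: "keyword", regex: "\\%>", next: "start" }
]);

console.log(Object.keys(rules)); // [ 'start', 'lua-start' ]
console.log(rules["lua-start"].map(function (r) { return r.regex; }));
```

The escape rule's next stays as the unprefixed "start", which is what lets the state machine return to the host language.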

 

Code Folding

Adding new folding rules to your mode can be a little tricky. First, insert the following lines of code into your mode definition:


var MyFoldMode = require("./folding/newrules").FoldMode;

...
var MyMode = function() {

    ...

    this.foldingRules = new MyFoldMode();
};

 

You'll be defining your code folding rules in the lib/ace/mode/folding folder. Here's a template that you can use to get started:

define(function(require, exports, module) {
"use strict";

var oop = require("../../lib/oop");
var Range = require("../../range").Range;
var BaseFoldMode = require("./fold_mode").FoldMode;

var FoldMode = exports.FoldMode = function() {};
oop.inherits(FoldMode, BaseFoldMode);

(function() {

    // regular expressions that identify starting and stopping points
    this.foldingStartMarker;
    this.foldingStopMarker;

    this.getFoldWidgetRange = function(session, foldStyle, row) {
        var line = session.getLine(row);

        // test each line, and return a range of segments to collapse
    };

}).call(FoldMode.prototype);

});

 

Just like with TextMode for syntax highlighting, BaseFoldMode contains the starting point for code folding logic. foldingStartMarker defines your opening folding point, while foldingStopMarker defines the stopping point. For example, for a C-style folding system, these values might look like this:

this.foldingStartMarker = /(\{|\[)[^\}\]]*$|^\s*(\/\*)/;
this.foldingStopMarker = /^[^\[\{]*(\}|\])|^[\s\*]*(\*\/)/;
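A quick way to get a feel for these two expressions is to run them against a few sample lines. The snippet below uses the same regular expressions as plain variables; the sample lines are made up for illustration.

```javascript
var foldingStartMarker = /(\{|\[)[^\}\]]*$|^\s*(\/\*)/;
var foldingStopMarker = /^[^\[\{]*(\}|\])|^[\s\*]*(\*\/)/;

var lines = [
    "int main() {",
    "    /* a block comment",
    "     */",
    "    return 0;",
    "}"
];

// Label each line as a fold start, a fold end, or neither.
lines.forEach(function (line, row) {
    var kind = foldingStartMarker.test(line) ? "start"
        : foldingStopMarker.test(line) ? "end" : "";
    console.log(row, JSON.stringify(line), kind);
});
```

Rows 0 and 1 are reported as fold starts, rows 2 and 4 as fold ends, and row 3 as neither.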

 

These regular expressions identify the symbols to pay attention to: {, [, and /* to open a fold, and }, ], and */ to close one. getFoldWidgetRange matches on these regular expressions, and when a match is found, returns the range of relevant folding points. For more information on the Range object, see the Ace API documentation.

Again, for a C-style folding mechanism, a range to return for the starting fold might look like this:


var line = session.getLine(row);
var match = line.match(this.foldingStartMarker);
if (match) {
    var i = match.index;

    if (match[1])
        return this.openingBracketBlock(session, match[1], row, i);

    var range = session.getCommentFoldRange(row, i + match[0].length);
    range.end.column -= 2;
    return range;
}

Let's say we stumble across the code block hello_world() {. Our range object here becomes:

{
    startRow: 0,
    endRow: 0,
    startColumn: 0,
    endColumn: 13
}
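As a rough illustration of how a fold range for a bracket block could be computed, the sketch below scans forward from an opening brace to its matching closing brace. findBraceRange is a hypothetical helper that only approximates the idea behind openingBracketBlock: it works on a plain array of lines, ignores strings and comments, and returns a plain object with start and end positions (each a row and column) rather than an Ace Range instance.

```javascript
// Sketch: starting at an opening "{" at (row, column), find the matching "}".
function findBraceRange(lines, row, column) {
    var depth = 0;
    for (var r = row; r < lines.length; r++) {
        var line = lines[r];
        for (var c = (r === row ? column : 0); c < line.length; c++) {
            if (line[c] === "{") {
                depth++;
            } else if (line[c] === "}" && --depth === 0) {
                // fold everything between the two braces
                return {
                    start: { row: row, column: column + 1 },
                    end: { row: r, column: c }
                };
            }
        }
    }
    return null; // no matching bracket: nothing to fold
}

var lines = ["hello_world() {", "    greet();", "}"];
console.log(findBraceRange(lines, 0, 14));
// { start: { row: 0, column: 15 }, end: { row: 2, column: 0 } }
```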

Testing Your Highlighter

The best way to test your tokenizer is to see it live, right? To do that, you'll want to modify the live Ace demo to preview your changes. You can find this file in the root Ace directory with the name kitchen-sink.html.

  1. Add an entry to supportedModes in ace/ext/modelist.js
  2. Add a sample file to demo/kitchen-sink/docs/ with the same name as the mode file

Once you set this up, you should be able to witness a live demonstration of your new highlighter.

Adding Automated Tests

Automated tests for a highlighter are optional, but they are simple to add and can help during development.

In lib/ace/mode/_test, create a file named text_<modeName>.txt with some example code. (You can skip this if the document you have added in demo/kitchen-sink/docs both looks good and covers various edge cases in your language syntax.)

 

Run node highlight_rules_test.js -gen to preserve the current output of your tokenizer in tokens_<modeName>.json.

After this, running highlight_rules_test.js optionalLanguageName will compare the output of your tokenizer with the correct output you've created.

Any files ending with the _test.js suffix are automatically run by Ace's Travis CI server.

