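Starting with this post, I will walk through this project in dependency order, from the core outward. It is a long series with a lot of material. If you read patiently, you may find things you can reuse, and not only here!
Turning a string into a Lambda expression tree starts with extracting the useful pieces from it. After parsing the string "() => new int[6]", we get the following information:
- Open parenthesis, index: 0, text: (
- Close parenthesis, index: 1, text: )
- Lambda expression prefix, index: 3, text: =>
- Identifier, index: 6, text: new
- Identifier, index: 10, text: int
- Open bracket, index: 13, text: [
- Integer literal, index: 14, text: 6
- Close bracket, index: 15, text: ]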
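To make the breakdown above concrete, here is a minimal sketch of a scanner that produces the same (type, index, text) triples for the sample string. MiniScanner and Scan are hypothetical names used only for illustration; this is not the article's SymbolParser, and it recognizes only the handful of token kinds that appear in the example.

```csharp
using System;
using System.Collections.Generic;

public static class MiniScanner
{
    // Hypothetical helper: splits a source string into (kind, index, text) triples.
    public static List<Tuple<string, int, string>> Scan(string source)
    {
        var tokens = new List<Tuple<string, int, string>>();
        int pos = 0;
        while (pos < source.Length)
        {
            char c = source[pos];
            if (char.IsWhiteSpace(c)) { pos++; continue; }

            int start = pos;   // index of the first character of this token
            string kind;
            if (char.IsLetter(c) || c == '_')
            {
                while (pos < source.Length && (char.IsLetterOrDigit(source[pos]) || source[pos] == '_')) pos++;
                kind = "Identifier";
            }
            else if (char.IsDigit(c))
            {
                while (pos < source.Length && char.IsDigit(source[pos])) pos++;
                kind = "IntegerLiteral";
            }
            else if (c == '=' && pos + 1 < source.Length && source[pos + 1] == '>')
            {
                pos += 2;      // "=>" is a single two-character token
                kind = "LambdaPrefix";
            }
            else
            {
                switch (c)
                {
                    case '(': kind = "OpenParen"; break;
                    case ')': kind = "CloseParen"; break;
                    case '[': kind = "OpenBracket"; break;
                    case ']': kind = "CloseBracket"; break;
                    default: throw new FormatException("Unexpected character '" + c + "' at index " + pos);
                }
                pos++;
            }
            tokens.Add(Tuple.Create(kind, start, source.Substring(start, pos - start)));
        }
        return tokens;
    }
}
```

Calling MiniScanner.Scan("() => new int[6]") yields eight triples whose indexes match the list above. The whitespace at indexes 2, 5 and 9 is skipped, which is why the indexes are not contiguous.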
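Once the string has been parsed, the results can be stored in a list so they are easy to read later. Each item can be treated as an object, and together they can be abstracted into a single type. It does not need much: the three properties listed above (token type, index, and text) are enough. After some tidying up, the code looks like this: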
/// <summary>
/// A token (lexical unit).
/// </summary>
[DebuggerStepThrough]
[DebuggerDisplay("Text = {Text}, ID = {ID}, Index = {Index}")]
public struct Token
{
#region Fields
/// <summary>
/// An empty token.
/// </summary>
public static readonly Token Empty = new Token();
private TokenId id;
private string text;
private int index;
private int? hash;
#endregion
#region Properties
/// <summary>
/// Gets or sets the token type.
/// </summary>
public TokenId ID
{
get { return id; }
set
{
id = value;
hash = null;
}
}
/// <summary>
/// Gets or sets the text of this token.
/// </summary>
public string Text
{
get { return text; }
set
{
text = value;
hash = null;
}
}
/// <summary>
/// Gets or sets the index of this token within the overall result (its start position in the source string).
/// </summary>
public int Index
{
get { return index; }
set
{
index = value;
hash = null;
}
}
#endregion
#region Override Methods
/// <summary>
/// Determines whether the specified <see cref="System.Object"/> is equal to this instance.
/// </summary>
/// <param name="obj">The <see cref="System.Object"/> to compare with this instance.</param>
/// <returns>
/// <c>true</c> if the specified <see cref="System.Object"/> is equal to this instance; otherwise, <c>false</c>.
/// </returns>
public override bool Equals(object obj)
{
if (ReferenceEquals(obj, null)) return false;
if (obj is Token)
return Equals((Token)obj);
else
return false;
}
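/// <summary>
/// Determines whether the specified token is equal to this instance.
/// </summary>
/// <param name="token">The token to compare with this instance.</param>
/// <returns><c>true</c> if both tokens have the same ID and text; otherwise, <c>false</c>.</returns>
public bool Equals(Token token)
{
// Token is a struct and can never be null, so no null check is needed.
return ID == token.ID && Text == token.Text;
}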
/// <summary>
/// Returns a hash code for this instance.
/// </summary>
/// <returns>
/// A hash code for this instance, suitable for use in hashing algorithms and data structures like a hash table.
/// </returns>
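public override int GetHashCode()
{
unchecked
{
if (!hash.HasValue)
{
// Hash only the fields that Equals compares (ID and Text), so equal tokens hash equally.
// Text is null for Token.Empty, so guard against a NullReferenceException.
hash = ID.GetHashCode() ^ (Text == null ? 0 : Text.GetHashCode());
}
return hash.Value;
}
}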
/// <summary>
/// Returns a <see cref="System.String"/> that represents this instance.
/// </summary>
/// <returns>
/// A <see cref="System.String"/> that represents this instance.
/// </returns>
public override string ToString()
{
return text;
}
/// <summary>
/// Performs an implicit conversion from <see cref="Lenic.DI.Core.Token"/> to <see cref="System.String"/>.
/// </summary>
/// <param name="value">The value.</param>
/// <returns>The result of the conversion.</returns>
public static implicit operator string(Token value)
{
return value.text;
}
#endregion
#region Exception Throw
/// <summary>
/// Throws an exception if the token type of this instance does not match the specified token type.
/// </summary>
/// <param name="id">The expected token type.</param>
/// <returns>The current instance.</returns>
public Token Throw(TokenId id)
{
if (ID != id)
throw new ParserSyntaxErrorException();
return this;
}
/// <summary>
/// Throws an exception if the text of this instance does not match the specified string.
/// </summary>
/// <param name="text">The expected text.</param>
/// <returns>The current instance.</returns>
public Token Throw(string text)
{
if (Text != text)
throw new ParserSyntaxErrorException();
return this;
}
#endregion
}
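Because both Throw overloads return the token itself, checks can be chained in a single expression. The following is a minimal sketch of that pattern under stated assumptions: MiniToken, MiniTokenId, Expect and ReadKeyword are hypothetical stand-ins for illustration, and FormatException is used here in place of the article's ParserSyntaxErrorException so the block compiles on its own.

```csharp
using System;

public enum MiniTokenId { End, Identifier, OpenParen, CloseParen }

// Hypothetical, trimmed-down token used only to show the chaining pattern.
public struct MiniToken
{
    public MiniTokenId Id;
    public string Text;

    // Returns the token itself when the type matches, so calls can be chained.
    public MiniToken Expect(MiniTokenId id)
    {
        if (Id != id)
            throw new FormatException("Expected " + id + " but found " + Id);
        return this;
    }

    // Returns the token itself when the text matches.
    public MiniToken Expect(string text)
    {
        if (Text != text)
            throw new FormatException("Expected '" + text + "' but found '" + Text + "'");
        return this;
    }
}

public static class ExpectDemo
{
    // Validates both the type and the text, then hands back the text.
    public static string ReadKeyword(MiniToken token, string keyword)
    {
        return token.Expect(MiniTokenId.Identifier).Expect(keyword).Text;
    }
}
```

A caller can then write ExpectDemo.ReadKeyword(token, "new") and rely on an exception if either check fails, instead of nesting if statements.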
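Notice that the code above uses TokenId, which is not part of the .NET class library. It is simply an enum that indicates whether a token is a parenthesis, a Lambda expression prefix, an integer literal, and so on. Here is the code; it is easy to follow: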
/// <summary>
/// Token type.
/// </summary>
public enum TokenId
{
/// <summary>
/// End
/// </summary>
End,
/// <summary>
/// Identifier
/// </summary>
Identifier,
/// <summary>
/// String
/// </summary>
StringLiteral,
/// <summary>
/// Integer Literal
/// </summary>
IntegerLiteral,
/// <summary>
/// Long Integer Literal
/// </summary>
LongIntegerLiteral,
/// <summary>
/// Single Real Literal
/// </summary>
SingleRealLiteral,
/// <summary>
/// Decimal Real Literal
/// </summary>
DecimalRealLiteral,
/// <summary>
/// Real Literal
/// </summary>
RealLiteral,
/// <summary>
/// !
/// </summary>
Exclamation,
/// <summary>
/// %
/// </summary>
Percent,
/// <summary>
/// &amp;
/// </summary>
Amphersand,
/// <summary>
/// (
/// </summary>
OpenParen,
/// <summary>
/// )
/// </summary>
CloseParen,
/// <summary>
/// *
/// </summary>
Asterisk,
/// <summary>
/// +
/// </summary>
Plus,
/// <summary>
/// ,
/// </summary>
Comma,
/// <summary>
/// -
/// </summary>
Minus,
/// <summary>
/// .
/// </summary>
Dot,
/// <summary>
/// /
/// </summary>
Slash,
/// <summary>
/// :
/// </summary>
Colon,
/// <summary>
/// &lt;
/// </summary>
LessThan,
/// <summary>
/// =
/// </summary>
Equal,
/// <summary>
/// >
/// </summary>
GreaterThan,
/// <summary>
/// ?
/// </summary>
Question,
/// <summary>
/// ??
/// </summary>
DoubleQuestion,
/// <summary>
/// [
/// </summary>
OpenBracket,
/// <summary>
/// ]
/// </summary>
CloseBracket,
/// <summary>
/// |
/// </summary>
Bar,
/// <summary>
/// !=
/// </summary>
ExclamationEqual,
/// <summary>
/// &amp;&amp;
/// </summary>
DoubleAmphersand,
/// <summary>
/// &lt;=
/// </summary>
LessThanEqual,
/// <summary>
/// &lt;&gt;
/// </summary>
LessGreater,
/// <summary>
/// ==
/// </summary>
DoubleEqual,
/// <summary>
/// >=
/// </summary>
GreaterThanEqual,
/// <summary>
/// ||
/// </summary>
DoubleBar,
/// <summary>
/// =>
/// </summary>
LambdaPrefix,
/// <summary>
/// {
/// </summary>
OpenBrace,
/// <summary>
/// }
/// </summary>
CloseBrace,
}
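Next, putting the parsed tokens into a List<> is a good idea, but I think it is better to enhance it a little, for example by adding the following methods:
- Read and return the next token (Next)
- Peek at the next token without advancing (PeekNext)
- Check what the next token is (NextIs)
- Skip the next few tokens (Skip)
- Reset the read position (ReturnToIndex)
- ...
These methods make the later steps much more convenient. Fortunately, the code is already below: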
/// <summary>
/// Symbol Parse Result
/// </summary>
[Serializable]
[DebuggerStepThrough]
[DebuggerDisplay("{ToString()}")]
public class SymbolParseResult : ReadOnlyCollection<Token>
{
#region Private Fields
[DebuggerBrowsable(DebuggerBrowsableState.Never)]
private int _maxIndex = 0;
[DebuggerBrowsable(DebuggerBrowsableState.Never)]
private int _lastIndex = 0;
[DebuggerBrowsable(DebuggerBrowsableState.Never)]
private int _index = -1;
#endregion
#region Construction
/// <summary>
/// Initializes a new instance of the <see cref="SymbolParseResult"/> class.
/// </summary>
internal SymbolParseResult()
: base(new List<Token>())
{
// The list is empty, so there is no valid index yet.
_maxIndex = -1;
}
/// <summary>
/// Initializes a new instance of the <see cref="SymbolParseResult"/> class.
/// </summary>
/// <param name="list">The list.</param>
internal SymbolParseResult(IList<Token> list)
: base(list)
{
_maxIndex = list.Count - 1;
}
#endregion
#region Business Properties
/// <summary>
/// Gets the current read index.
/// </summary>
public int Index
{
get { return _index; }
private set
{
_lastIndex = _index;
_index = value;
}
}
/// <summary>
/// Gets the token at the current read index.
/// </summary>
public Token Current
{
get
{
if (Index < 0 || Index > _maxIndex)
return Token.Empty;
return this[Index];
}
}
/// <summary>
/// Gets the complete string expression.
/// </summary>
private string StringExpression
{
get { return string.Join(" ", this); }
}
#endregion
#region Business Methods
/// <summary>
/// Reads the next token and advances the read index.
/// </summary>
/// <returns>The token that was read, or <see cref="Token.Empty"/> if there is none.</returns>
public Token Next()
{
Token token;
if (TryGetElement(out token, Index + 1))
return token;
else
return Token.Empty;
}
/// <summary>
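/// Checks whether the next token is of the specified type, and advances the read index.
/// </summary>
/// <param name="tokenId">The expected token type.</param>
/// <param name="throwIfNot"><c>true</c> to throw an exception on a mismatch; the default, <c>false</c>, does not throw.</param>
/// <returns><c>true</c> if the token read has the expected type; otherwise, <c>false</c>.</returns>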
public bool NextIs(TokenId tokenId, bool throwIfNot = false)
{
var result = Next().ID == tokenId;
if (!result && throwIfNot)
throw new ApplicationException(string.Format("next is not {0}", tokenId));
return result;
}
/// <summary>
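/// Peeks at a following token without advancing the read index.
/// </summary>
/// <param name="count">How many tokens past the current one to look; defaults to the next token.</param>
/// <returns>The token that was read, or <see cref="Token.Empty"/> if there is none.</returns>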
public Token PeekNext(int count = 1)
{
Token token;
if (PeekGetElement(out token, Index + count))
return token;
else
return Token.Empty;
}
/// <summary>
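/// Checks whether a following token is of the specified type, without advancing the read index.
/// </summary>
/// <param name="tokenId">The expected token type.</param>
/// <param name="count">How many tokens past the current one to check; the default is 1.</param>
/// <param name="throwIfNot"><c>true</c> to throw an exception on a mismatch; the default, <c>false</c>, does not throw.</param>
/// <returns>
/// <c>true</c> if the token read has the expected type; otherwise, <c>false</c>.
/// </returns>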
public bool PeekNextIs(TokenId tokenId, int count = 1, bool throwIfNot = false)
{
var result = PeekNext(count).ID == tokenId;
if (!result && throwIfNot)
throw new ApplicationException(string.Format("next is not {0}", tokenId));
return result;
}
/// <summary>
/// Advances the read index past the specified number of tokens.
/// </summary>
/// <param name="count">The number of tokens to skip.</param>
public void Skip(int count = 1)
{
count = Index + count;
CheckIndexOut(count);
Index = count;
}
/// <summary>
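/// Reads forward until the current token satisfies the predicate or the end of the list is reached.
/// </summary>
/// <param name="predicate">Tests whether the current token satisfies the stop condition.</param>
/// <returns>The list of tokens read up to the point where reading stopped.</returns>
public IList<Token> SkipUntil(Func<Token, bool> predicate)
{
List<Token> data = new List<Token>();
Token token;
// Stop when the predicate matches or when no further token can be read,
// so the loop always terminates.
while (!predicate(Current) && TryGetElement(out token, Index + 1))
data.Add(token);
return data;
}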
/// <summary>
/// Returns to the specified read index.
/// </summary>
/// <param name="index">The target read index.</param>
public void ReturnToIndex(int index)
{
if (index < -1 || index > _maxIndex)
throw new IndexOutOfRangeException();
Index = index;
}
#endregion
#region Private Methods
private bool TryGetElement(out Token token, int index)
{
bool result = PeekGetElement(out token, index);
if (result)
Index = index;
return result;
}
private bool PeekGetElement(out Token token, int index)
{
if (index < 0 || index > _maxIndex)
{
token = Token.Empty;
return false;
}
else
{
token = this[index];
return true;
}
}
private void CheckIndexOut(int index)
{
if (index < 0 || index > _maxIndex)
throw new IndexOutOfRangeException();
}
#endregion
#region Override Methods
/// <summary>
/// Returns a <see cref="System.String"/> that represents this instance.
/// </summary>
/// <returns>
/// A <see cref="System.String"/> that represents this instance.
/// </returns>
public override string ToString()
{
return string.Join(" ", this.TakeWhile(p => p.Index < Current.Index));
}
#endregion
}
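The value of Next, PeekNext and ReturnToIndex is easiest to see when a parse attempt has to be abandoned and rewound. Here is a minimal sketch of that idea, assuming a simplified cursor over plain strings rather than Token values; Cursor<T>, CursorDemo and TryReadNewType are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified cursor over a list; strings stand in for tokens.
public sealed class Cursor<T>
{
    private readonly IList<T> _items;
    public int Index { get; private set; }

    public Cursor(IList<T> items)
    {
        _items = items;
        Index = -1; // nothing has been read yet
    }

    // Reads the next item and advances; returns default(T) at the end.
    public T Next()
    {
        if (Index + 1 >= _items.Count) return default(T);
        Index++;
        return _items[Index];
    }

    // Looks ahead without moving the read position.
    public T PeekNext(int count = 1)
    {
        int target = Index + count;
        return target >= 0 && target < _items.Count ? _items[target] : default(T);
    }

    // Moves to a previously saved position (-1 means "before the first item").
    public void ReturnToIndex(int index)
    {
        if (index < -1 || index >= _items.Count)
            throw new ArgumentOutOfRangeException("index");
        Index = index;
    }
}

public static class CursorDemo
{
    // Tries to read "new" followed by a type name; rewinds if that fails.
    public static string TryReadNewType(Cursor<string> cursor)
    {
        int saved = cursor.Index;
        if (cursor.Next() == "new" && cursor.PeekNext() != null)
            return cursor.Next();
        cursor.ReturnToIndex(saved);
        return null;
    }
}
```

Saving the index before a speculative read and restoring it on failure lets several alternative rules be tried from the same starting point.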
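Next comes the core string analysis class! I extracted it from DynamicLINQ. It reads one character at a time, compares each character with the one before it, and assembles the results into individual Tokens. The original author was rigorous and covered every case he could think of: input is read strictly according to C# syntax, and anything else throws an exception. You can also add custom Tokens by putting your own logic into the large NextToken method; the LambdaPrefix branch I added is an example to follow: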
/// <summary>
/// Symbol Parser
/// </summary>
[DebuggerStepThrough]
[DebuggerDisplay("CurrentPosition = {CurrentPosition}, Source = {Source}")]
public sealed class SymbolParser
{
#region Fields And Properties
/// <summary>
/// Gets the source.
/// </summary>
public string Source { get; private set; }
/// <summary>
/// Gets the current position.
/// </summary>
public int CurrentPosition { get; private set; }
/// <summary>
/// Gets the length.
/// </summary>
public int Length { get; private set; }
/// <summary>
/// Gets the current char.
/// </summary>
public char CurrentChar { get; private set; }
private Token currentToken;
/// <summary>
/// Gets the current token.
/// </summary>
public Token CurrentToken { get { return currentToken; } }
#endregion
#region Constructor
/// <summary>
/// Initializes a new instance of the <see cref="SymbolParser"/> class.
/// </summary>
/// <param name="source">The source.</param>
public SymbolParser(string source)
{
if (ReferenceEquals(null, source))
throw new ArgumentNullException("source");
Source = source;
Length = source.Length;
SetPosition(0);
}
#endregion
#region Business Methods
/// <summary>
/// Sets the position.
/// </summary>
/// <param name="index">The index.</param>
public void SetPosition(int index)
{
CurrentPosition = index;
CurrentChar = CurrentPosition < Length ? Source[CurrentPosition] : '\0';
}
/// <summary>
/// Advances to the next character.
/// </summary>
public void NextChar()
{
if (CurrentPosition < Length) CurrentPosition++;
CurrentChar = CurrentPosition < Length ? Source[CurrentPosition] : '\0';
}
/// <summary>
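/// Reads the next token from the source.
/// </summary>
/// <returns>The token that was read; its ID is <see cref="TokenId.End"/> at the end of the source.</returns>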
public Token NextToken()
{
while (Char.IsWhiteSpace(CurrentChar)) NextChar();
TokenId t;
int tokenPos = CurrentPosition;
switch (CurrentChar)
{
case '!':
NextChar();
if (CurrentChar == '=')
{
NextChar();
t = TokenId.ExclamationEqual;
}
else
{
t = TokenId.Exclamation;
}
break;
case '%':
NextChar();
t = TokenId.Percent;
break;
case '&':
NextChar();
if (CurrentChar == '&')
{
NextChar();
t = TokenId.DoubleAmphersand;
}
else
{
t = TokenId.Amphersand;
}
break;
case '(':
NextChar();
t = TokenId.OpenParen;
break;
case ')':
NextChar();
t = TokenId.CloseParen;
break;
case '*':
NextChar();
t = TokenId.Asterisk;
break;
case '+':
NextChar();
t = TokenId.Plus;
break;
case ',':
NextChar();
t = TokenId.Comma;
break;
case '-':
NextChar();
t = TokenId.Minus;
break;
case '.':
NextChar();
t = TokenId.Dot;
break;
case '/':
NextChar();
t = TokenId.Slash;
break;
case ':':
NextChar();
t = TokenId.Colon;
break;
case '<':
NextChar();
if (CurrentChar == '=')
{
NextChar();
t = TokenId.LessThanEqual;
}
else if (CurrentChar == '>')
{
NextChar();
t = TokenId.LessGreater;
}
else
{
t = TokenId.LessThan;
}
break;
case '=':
NextChar();
if (CurrentChar == '=')
{
NextChar();
t = TokenId.DoubleEqual;
}
else if (CurrentChar == '>')
{
NextChar();
t = TokenId.LambdaPrefix;
}
else
{
t = TokenId.Equal;
}
break;
case '>':
NextChar();
if (CurrentChar == '=')
{
NextChar();
t = TokenId.GreaterThanEqual;
}
else
{
t = TokenId.GreaterThan;
}
break;
case '?':
NextChar();
if (CurrentChar == '?')
{
NextChar();
t = TokenId.DoubleQuestion;
}
else
{
t = TokenId.Question;
}
break;
case '[':
NextChar();
t = TokenId.OpenBracket;
break;
case ']':
NextChar();
t = TokenId.CloseBracket;
break;
case '{':
NextChar();
t = TokenId.OpenBrace;
break;
case '}':
NextChar();
t = TokenId.CloseBrace;
break;
case '|':
NextChar();
if (CurrentChar == '|')
{
NextChar();
t = TokenId.DoubleBar;
}
else
{
t = TokenId.Bar;
}
break;
case '"':
case '\'':
char quote = CurrentChar;
do
{
NextChar();
while (CurrentPosition < Length && CurrentChar != quote) NextChar();
if (CurrentPosition == Length)
throw ParseError(CurrentPosition, "Unterminated string literal");
NextChar();
} while (CurrentChar == quote);
t = TokenId.StringLiteral;
break;
default:
if (Char.IsLetter(CurrentChar) || CurrentChar == '@' || CurrentChar == '_')
{
do
{
NextChar();
} while (Char.IsLetterOrDigit(CurrentChar) || CurrentChar == '_' || CurrentChar == '?');
t = TokenId.Identifier;
break;
}
if (Char.IsDigit(CurrentChar))
{
t = TokenId.IntegerLiteral;
do
{
NextChar();
} while (Char.IsDigit(CurrentChar));
if (CurrentChar == 'l' || CurrentChar == 'L')
{
t = TokenId.LongIntegerLiteral;
NextChar();
break;
}
else if (CurrentChar == 'f' || CurrentChar == 'F')
{
t = TokenId.SingleRealLiteral;
NextChar();
break;
}
else if (CurrentChar == 'm' || CurrentChar == 'M')
{
t = TokenId.DecimalRealLiteral;
NextChar();
break;
}
else if (CurrentChar == 'd' || CurrentChar == 'D')
{
t = TokenId.RealLiteral;
NextChar();
break;
}
if (CurrentChar == '.')
{
t = TokenId.RealLiteral;
NextChar();
ValidateDigit();
do
{
NextChar();
} while (Char.IsDigit(CurrentChar));
}
if (CurrentChar == 'E' || CurrentChar == 'e')
{
t = TokenId.RealLiteral;
NextChar();
if (CurrentChar == '+' || CurrentChar == '-') NextChar();
ValidateDigit();
do
{
NextChar();
} while (Char.IsDigit(CurrentChar));
}
if (CurrentChar == 'F' || CurrentChar == 'f')
{
t = TokenId.SingleRealLiteral;
NextChar();
break;
}
else if (CurrentChar == 'm' || CurrentChar == 'M')
{
t = TokenId.DecimalRealLiteral;
NextChar();
break;
}
else if (CurrentChar == 'd' || CurrentChar == 'D')
{
t = TokenId.RealLiteral;
NextChar();
break;
}
break;
}
if (CurrentPosition == Length)
{
t = TokenId.End;
break;
}
throw ParseError(CurrentPosition, "Syntax error '{0}'", CurrentChar);
}
currentToken.ID = t;
currentToken.Text = Source.Substring(tokenPos, CurrentPosition - tokenPos);
currentToken.Index = tokenPos;
return new Token { ID = t, Text = currentToken.Text, Index = tokenPos, };
}
/// <summary>
/// Builds the specified source.
/// </summary>
/// <param name="source">The source.</param>
/// <returns>The Build result.</returns>
public static SymbolParseResult Build(string source)
{
var item = new SymbolParser(source);
List<Token> data = new List<Token>();
while (true)
{
var token = item.NextToken();
data.Add(token);
if (token.ID == TokenId.End)
break;
}
return new SymbolParseResult(data);
}
#endregion
#region Private Methods
private void ValidateDigit()
{
if (!Char.IsDigit(CurrentChar)) throw ParseError(CurrentPosition, "Digit expected");
}
private Exception ParseError(string format, params object[] args)
{
return ParseError(currentToken.Index, format, args);
}
private Exception ParseError(int pos, string format, params object[] args)
{
return new ParseException(string.Format(CultureInfo.CurrentCulture, format, args), pos);
}
#endregion
}
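The LambdaPrefix branch in the '=' case shows the general recipe for a custom two-character token: consume the first character, look at the next one, and consume it too if it completes the longer token. A minimal sketch of the same recipe applied to a hypothetical "->" token follows. LookaheadDemo, ScanMinus and the "Arrow" kind are assumptions made for illustration and are not part of the article's TokenId enum.

```csharp
using System;

public static class LookaheadDemo
{
    // Classifies the token starting at 'pos' (which must point at '-') and advances 'pos' past it.
    // "Arrow" is a hypothetical token kind; "Minus" mirrors the existing single-character case.
    public static string ScanMinus(string source, ref int pos)
    {
        if (pos >= source.Length || source[pos] != '-')
            throw new ArgumentException("pos must point at '-'");

        pos++;                                   // consume '-'
        if (pos < source.Length && source[pos] == '>')
        {
            pos++;                               // consume '>' as part of the same token
            return "Arrow";
        }
        return "Minus";                          // the next character belongs to a later token
    }
}
```

In the real class the equivalent change would be a new TokenId member plus an extra branch inside the case for '-', written in the same shape as the existing branch for "=>".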
Finally, there are two exception classes:
/// <summary>
/// Exception thrown for a parser syntax error.
/// </summary>
[DebuggerStepThrough]
public class ParserSyntaxErrorException : Exception
{
/// <summary>
/// Initializes a new instance of the <see cref="ParserSyntaxErrorException"/> class.
/// </summary>
public ParserSyntaxErrorException()
: base("syntax error!") { }
}
/// <summary>
/// Parse Exception
/// </summary>
[DebuggerStepThrough]
public sealed class ParseException : Exception
{
private int position;
/// <summary>
/// Initializes a new instance of the <see cref="ParseException"/> class.
/// </summary>
/// <param name="message">The message.</param>
/// <param name="position">The position.</param>
internal ParseException(string message, int position)
: base(message)
{
this.position = position;
}
/// <summary>
/// Gets the position.
/// </summary>
public int Position
{
get { return position; }
}
/// <summary>
/// Returns a <see cref="System.String"/> that represents this instance.
/// </summary>
/// <returns>
/// A <see cref="System.String"/> that represents this instance.
/// </returns>
public override string ToString()
{
return string.Format("{0} (at index {1})", Message, position);
}
}
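Since ParseException records the position at which parsing failed, a caller can point at the offending character in the source. This is a minimal sketch of one way to do that. PositionedException is a hypothetical stand-in for ParseException (whose constructor is internal), and ErrorReport.Describe is an assumed helper, not something from the article.

```csharp
using System;

// Hypothetical stand-in for the article's ParseException: a message plus a character position.
public sealed class PositionedException : Exception
{
    public int Position { get; private set; }

    public PositionedException(string message, int position)
        : base(message)
    {
        Position = position;
    }
}

public static class ErrorReport
{
    // Returns the source line followed by a caret line that marks the error position.
    public static string Describe(string source, PositionedException error)
    {
        // Clamp the position so a value past the end still yields a valid caret line.
        int column = Math.Max(0, Math.Min(error.Position, source.Length));
        return source + Environment.NewLine + new string(' ', column) + "^ " + error.Message;
    }
}
```

For the source "a # b" and an error at position 2, the second line of the result places the caret directly under the '#'.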
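Finally, here is a screenshot of the result:
[Image: screenshot of the result]
Don't call me evil: reading it ten times is no substitute for doing it once yourself, heh. That's all for now. If you are looking forward to the next post, please click Recommend; I would be very grateful. Thanks!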