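LINQ to Objects - How to Work with Strings
Foreword:
My previous post, "LINQ: Advanced - Overview of LINQ Standard Query Operators" (90+ likes), was well received, but I kept feeling that something was missing. As the saying goes, "What you learn on paper always feels shallow; to truly understand something you must practice it yourself." That missing piece is hands-on practice. This time, let's look at some techniques for working with strings. They may prompt us to think about problems from a different angle and help us out of a mental rut.
Introduction
LINQ can be used to query and transform strings and collections of strings. It is especially useful for semi-structured data in text files. LINQ queries can be combined with traditional string functions and regular expressions.
Querying a block of text
Querying semi-structured data in text format
1. How to count occurrences of a word in a string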
    // Requires: using System; using System.Linq;
    const string text = @"Historically, the world of data and the world of objects" +
                        @" have not been well integrated. Programmers work in C# or Visual Basic" +
                        @" and also in SQL or XQuery. On the one side are concepts such as classes," +
                        @" objects, fields, inheritance, and .NET Framework APIs. On the other side" +
                        @" are tables, columns, rows, nodes, and separate languages for dealing with" +
                        @" them. Data types often require translation between the two worlds; there are" +
                        @" different standard functions. Because the object world has no notion of query, a" +
                        @" query can only be represented as a string without compile-time type checking or" +
                        @" IntelliSense support in the IDE. Transferring data from SQL tables or XML trees to" +
                        @" objects in memory is often tedious and error-prone.";

    const string searchTerm = "data";

    // Split the string into an array of words
    var source = text.Split(new[] { '.', '?', '!', ' ', ';', ':', ',' }, StringSplitOptions.RemoveEmptyEntries);

    // Create the query, comparing case-insensitively
    var matchQuery = from word in source
                     where string.Equals(word, searchTerm, StringComparison.InvariantCultureIgnoreCase)
                     select word;

    // Count the matches (this is where the query executes)
    var wordCount = matchQuery.Count();
    Console.WriteLine($"{wordCount} occurrence(s) of the search term \"{searchTerm}\" were found.");
2. How to query for sentences that contain a specified set of words
    // Requires: using System; using System.Linq;
    const string text = @"Historically, the world of data and the world of objects " +
                        @"have not been well integrated. Programmers work in C# or Visual Basic " +
                        @"and also in SQL or XQuery. On the one side are concepts such as classes, " +
                        @"objects, fields, inheritance, and .NET Framework APIs. On the other side " +
                        @"are tables, columns, rows, nodes, and separate languages for dealing with " +
                        @"them. Data types often require translation between the two worlds; there are " +
                        @"different standard functions. Because the object world has no notion of query, a " +
                        @"query can only be represented as a string without compile-time type checking or " +
                        @"IntelliSense support in the IDE. Transferring data from SQL tables or XML trees to " +
                        @"objects in memory is often tedious and error-prone.";

    // Split the block of text into an array of sentences
    var sentences = text.Split('.', '?', '!');

    // Define the search terms; this list could be populated dynamically at run time
    string[] wordsToMatch = { "Historically", "data", "integrated" };

    var match = from sentence in sentences
                let t =
                    sentence.Split(new char[] { '.', '?', '!', ' ', ';', ':', ',' }, StringSplitOptions.RemoveEmptyEntries)
                // Remove duplicates, intersect with the search terms, then compare the counts
                where t.Distinct().Intersect(wordsToMatch).Count() == wordsToMatch.Length
                select sentence;

    foreach (var s in match)
    {
        Console.WriteLine(s);
    }

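When the query runs, it first splits the text into sentences, and then splits each sentence into an array of strings that holds each word. For each of these arrays, the Distinct<TSource> method removes all duplicate words, and then the query performs an Intersect<TSource> operation on the word array and the wordsToMatch array. If the count of the intersection equals the count of the wordsToMatch array, all the search words were found in the sentence, and the original sentence is returned.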
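The query above compares words case-sensitively, so "Data" at the start of a sentence would not match "data". The following is a minimal sketch of one possible variation, not part of the original example: it passes a StringComparer to Intersect so that case is ignored. The class name SentenceSearchSketch and the method FindSentences are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SentenceSearchSketch
{
    // Hypothetical helper: returns the sentences that contain every word
    // in wordsToMatch, ignoring case.
    public static List<string> FindSentences(string text, IEnumerable<string> wordsToMatch)
    {
        char[] sentenceSeparators = { '.', '?', '!' };
        char[] wordSeparators = { '.', '?', '!', ' ', ';', ':', ',' };
        var comparer = StringComparer.OrdinalIgnoreCase;

        // De-duplicate the search terms once so the count comparison is reliable.
        var terms = wordsToMatch.Distinct(comparer).ToList();

        return text.Split(sentenceSeparators, StringSplitOptions.RemoveEmptyEntries)
                   .Where(sentence => sentence
                       .Split(wordSeparators, StringSplitOptions.RemoveEmptyEntries)
                       .Intersect(terms, comparer)   // set intersection, already free of duplicates
                       .Count() == terms.Count)
                   .Select(sentence => sentence.Trim())
                   .ToList();
    }
}
```

Because Intersect already returns distinct elements, the separate Distinct call on each sentence's words is not needed in this variation.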
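3. How to query for characters in a string
Because the String class implements the generic IEnumerable<T> interface, any string can be queried as a sequence of characters. However, this is not a common use of LINQ. For complex pattern matching operations, use the Regex class.
The following example queries a string to pick out the digits it contains, and then takes every character that precedes the first hyphen.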
    // Requires: using System; using System.Linq;
    const string aString = "ABCDE99F-J74-12-89A";

    // Select only the characters that are digits
    var digits = from ch in aString
                 where char.IsDigit(ch)
                 select ch;

    Console.Write("digit: ");

    foreach (var n in digits)
    {
        Console.Write($"{n} ");
    }

    Console.WriteLine();

    // Select all the characters before the first "-"
    var query = aString.TakeWhile(x => x != '-');

    foreach (var ch in query)
    {
        Console.Write(ch);
    }
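If the goal is the number of digits rather than the digits themselves, the Count operator with a predicate is enough. The sketch below is an illustrative addition; DigitSketch, CountDigits, and BeforeFirst are hypothetical names that do not appear in the original example.

```csharp
using System.Linq;

public static class DigitSketch
{
    // Counts the characters in s that are decimal digits.
    public static int CountDigits(string s)
    {
        return s.Count(c => char.IsDigit(c));
    }

    // Returns the part of s that precedes the first occurrence of delimiter
    // (the whole string if the delimiter is absent).
    public static string BeforeFirst(string s, char delimiter)
    {
        return new string(s.TakeWhile(c => c != delimiter).ToArray());
    }
}
```

For the sample string "ABCDE99F-J74-12-89A", CountDigits returns 8 and BeforeFirst with '-' returns "ABCDE99F".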
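4. How to combine LINQ queries with regular expressions
This example shows how to use the Regex class to create a regular expression for more complex matching in text strings. A LINQ query makes it easy to filter for exactly the files you want to search with the regular expression, and to shape the results.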
    // Requires: using System; using System.IO; using System.Linq; using System.Text.RegularExpressions;
    // Adjust the path to match your version of Visual Studio
    const string folder = @"C:\Program Files (x86)\Microsoft Visual Studio 14.0\";
    var fileInfos = GetFiles(folder);
    // Create a regular expression that finds every "Visual ..." product name
    var searchTerm = new Regex(@"Visual (Basic|C#|C\+\+|J#|SourceSafe|Studio)");

    // Search every ".html" file
    // and use where to keep only the files that contain matches
    // [Note] The range variable in the select clause must be explicitly typed,
    // because MatchCollection is not a generic IEnumerable collection
    var query = from fileInfo in fileInfos
                where fileInfo.Extension == ".html"
                let text = File.ReadAllText(fileInfo.FullName)
                let matches = searchTerm.Matches(text)
                where matches.Count > 0
                select new
                {
                    name = fileInfo.FullName,
                    matchValue = from Match match in matches select match.Value
                };

    Console.WriteLine($"The term \"{searchTerm}\" was found in:");

    foreach (var q in query)
    {
        // Trim the folder path from the file name
        Console.WriteLine($"{q.name.Substring(folder.Length - 1)}");

        // Output the matched values
        foreach (var v in q.matchValue)
        {
            Console.WriteLine(v);
        }
    }
    // Requires: using System.Collections.Generic; using System.IO; using System.Linq;
    private static IList<FileInfo> GetFiles(string path)
    {
        var files = Directory.GetFiles(path, "*.*", SearchOption.AllDirectories);

        return files.Select(file => new FileInfo(file)).ToList();
    }
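You can also query the MatchCollection object that is returned by a Regex search. In this example, only the value of each match is produced in the results. However, you could also use LINQ to perform all kinds of filtering, sorting, and grouping on that collection.
[Note] Because MatchCollection is a non-generic IEnumerable collection, you must explicitly state the type of the range variable in the query.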
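As an illustration of grouping on a MatchCollection, here is a minimal sketch that counts how often each matched value occurs. It uses Cast<Match>() as the method-syntax counterpart of an explicitly typed range variable. The class MatchGroupingSketch and the method CountMatches are hypothetical names, and the input is an in-memory string rather than the files used above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchGroupingSketch
{
    // Returns each distinct matched value together with its number of occurrences.
    public static Dictionary<string, int> CountMatches(string text)
    {
        var searchTerm = new Regex(@"Visual (Basic|C#|C\+\+|J#|SourceSafe|Studio)");

        return searchTerm.Matches(text)
                         .Cast<Match>()             // MatchCollection is non-generic, so state the element type
                         .GroupBy(m => m.Value)     // one group per distinct matched string
                         .ToDictionary(g => g.Key, g => g.Count());
    }
}
```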
5. How to find the set difference between two lists

names1.txt:
Bankov, Peter
Holm, Michael
Garcia, Hugo
Potra, Cristina
Noriega, Fabricio
Aw, Kam Foo
Beebe, Ann
Toyoshima, Tim
Guy, Wey Yuan
Garcia, Debra

names2.txt:
Liu, Jinghao
Bankov, Peter
Holm, Michael
Garcia, Hugo
Beebe, Ann
Gilchrist, Beth
Myrcha, Jacek
Giakoumakis, Leo
McLin, Nkenge
El Yassir, Mehdi

    // Requires: using System; using System.IO; using System.Linq;
    // Create the data sources
    var names1Text = File.ReadAllLines(@"names1.txt");
    var names2Text = File.ReadAllLines(@"names2.txt");

    // Create the query; method syntax must be used here,
    // because Except has no query-expression keyword
    var query = names1Text.Except(names2Text);

    // Execute the query
    Console.WriteLine("The following lines are in names1.txt but not names2.txt");
    foreach (var name in query)
    {
        Console.WriteLine(name);
    }
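By default, Except compares strings with the default equality comparer, which is case-sensitive. The sketch below is an illustrative variation and not part of the original example: it passes StringComparer.OrdinalIgnoreCase so that lines differing only in case are treated as equal. DifferenceSketch and OnlyInFirst are hypothetical names, and the inputs are in-memory sequences instead of files.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DifferenceSketch
{
    // Returns the lines that appear in first but not in second, ignoring case.
    public static List<string> OnlyInFirst(IEnumerable<string> first, IEnumerable<string> second)
    {
        return first.Except(second, StringComparer.OrdinalIgnoreCase).ToList();
    }
}
```

Whether case should be ignored depends on the data; for the two sample files the default comparer already gives the expected difference.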

6. How to sort or filter text data by any word or field

scores.csv:
111, 97, 92, 81, 60
112, 75, 84, 91, 39
113, 88, 94, 65, 91
114, 97, 89, 85, 82
115, 35, 72, 91, 70
116, 99, 86, 90, 94
117, 93, 92, 80, 87
118, 92, 90, 83, 78
119, 68, 79, 88, 92
120, 99, 82, 81, 79
121, 96, 85, 91, 60
122, 94, 92, 91, 91
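    // Requires: using System; using System.Collections.Generic; using System.IO; using System.Linq;
    // Create the data source
    var scores = File.ReadAllLines(@"scores.csv");
    // Can be changed to any value from 0 to 4
    const int sortField = 1;

    // Demonstrates returning a query from a method.
    // The method returns the query variable, not the query results.
    // The query executes here, in the foreach loop
    foreach (var score in RunQuery(scores, sortField))
    {
        Console.WriteLine(score);
    }

    private static IEnumerable<string> RunQuery(IEnumerable<string> score, int num)
    {
        // Split each line so it can be sorted by the chosen field.
        // Note: the fields are compared as strings, not as numbers
        var query = from line in score
                    let fields = line.Split(',')
                    orderby fields[num] descending
                    select line;

        return query;
    }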

This example also shows how to return a query variable from a method.
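RunQuery orders the lines by comparing the chosen field as text. That happens to work for the sample file because every score has two digits, but it would misplace values such as 9 or 100. A minimal sketch of a numeric ordering follows, assuming every field in the chosen column is a valid integer; NumericSortSketch and SortByField are hypothetical names introduced for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class NumericSortSketch
{
    // Returns a deferred query that orders the lines by the given field, parsed as an integer,
    // from highest to lowest. Assumes the field always contains a valid integer.
    public static IEnumerable<string> SortByField(IEnumerable<string> lines, int fieldIndex)
    {
        return from line in lines
               let fields = line.Split(',')
               orderby int.Parse(fields[fieldIndex].Trim()) descending
               select line;
    }
}
```

Like RunQuery, this method returns the query itself; nothing is parsed or sorted until the caller enumerates the result.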
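7. How to reorder the fields of a delimited file
A comma-separated value (CSV) file is a text file that is often used to store spreadsheet data or other tabular data represented by rows and columns. By using the Split method to separate the fields, it is very easy to query and manipulate CSV files with LINQ. In fact, the same technique can be used to reorder the parts of any structured line of text; it is not limited to CSV files.

spreadsheet1.csv:
Adams,Terry,120
Fakhouri,Fadi,116
Feng,Hanying,117
Garcia,Cesar,114
Garcia,Debra,115
Garcia,Hugo,118
Mortensen,Sven,113
O'Donnell,Claire,112
Omelchenko,Svetlana,111
Tucker,Lance,119
Tucker,Michael,122
Zabokritski,Eugene,121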
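    // Requires: using System; using System.IO; using System.Linq;
    // Data source
    var lines = File.ReadAllLines(@"spreadsheet1.csv");
    // Move the field in column 2 (the ID) to the front,
    // followed by columns 1 and 0 in reverse order (first name, then last name)
    var query = from line in lines
                let t = line.Split(',')
                orderby t[2]
                select $"{t[2]}, {t[1]} {t[0]}";

    foreach (var q in query)
    {
        Console.WriteLine(q);
    }

    // Write the reordered lines to a new file
    File.WriteAllLines("spreadsheet2.csv", query);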
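8. How to combine and compare string collections
This example shows how to merge files that contain lines of text and then sort the results. Specifically, it shows how to perform a simple concatenation, a union, and an intersection on the two sets of text lines.

names1.txt:
Bankov, Peter
Holm, Michael
Garcia, Hugo
Potra, Cristina
Noriega, Fabricio
Aw, Kam Foo
Beebe, Ann
Toyoshima, Tim
Guy, Wey Yuan
Garcia, Debra

names2.txt:
Liu, Jinghao
Bankov, Peter
Holm, Michael
Garcia, Hugo
Beebe, Ann
Gilchrist, Beth
Myrcha, Jacek
Giakoumakis, Leo
McLin, Nkenge
El Yassir, Mehdi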
    // Requires: using System; using System.IO; using System.Linq;
    var names1Text = File.ReadAllLines(@"names1.txt");
    var names2Text = File.ReadAllLines(@"names2.txt");

    // Simple concatenation and sort. Duplicates are preserved.
    var concatQuery = names1Text.Concat(names2Text).OrderBy(x => x);
    OutputQueryResult(concatQuery, "Simple concatenate and sort. Duplicates are preserved:");

    // Union based on the default string comparer; duplicate names are removed.
    var unionQuery = names1Text.Union(names2Text).OrderBy(x => x);
    OutputQueryResult(unionQuery, "Union removes duplicate names:");

    // Find the names that occur in both files
    var intersectQuery = names1Text.Intersect(names2Text).OrderBy(x => x);
    OutputQueryResult(intersectQuery, "Merge based on intersect:");

    // Find the matching field in each list. Merge the two results with Concat,
    // then sort with the default string comparer
    const string nameMatch = "Garcia";
    var matchQuery1 = from name in names1Text
                      let t = name.Split(',')
                      where t[0] == nameMatch
                      select name;
    var matchQuery2 = from name in names2Text
                      let t = name.Split(',')
                      where t[0] == nameMatch
                      select name;

    var temp = matchQuery1.Concat(matchQuery2).OrderBy(x => x);
    OutputQueryResult(temp, $"Concat based on partial name match \"{nameMatch}\":");
    // Requires: using System; using System.Collections.Generic; using System.Linq;
    private static void OutputQueryResult(IEnumerable<string> querys, string title)
    {
        Console.WriteLine(Environment.NewLine + title);
        foreach (var query in querys)
        {
            Console.WriteLine(query);
        }

        Console.WriteLine($"{querys.Count()} total names in list");
    }
9. How to populate object collections from multiple sources
    // Requires: using System; using System.IO; using System.Linq;
    // Each line of names.csv contains a last name, a first name, and a student ID,
    // separated by commas. For example: Omelchenko,Svetlana,111
    var names = File.ReadAllLines(@"names.csv");
    // Each line of scores.csv contains a student ID and four test scores,
    // separated by commas. For example: 111,97,92,81,60
    var scores = File.ReadAllLines(@"scores.csv");

    // Merge the data sources by using an anonymous type.
    // [Note] The list of int exam scores is created dynamically.
    // Skip the first item in the split string, because it is the student ID, not an exam score
    var students = from name in names
                   let t = name.Split(',')
                   from score in scores
                   let t2 = score.Split(',')
                   where t[2] == t2[0]
                   select new
                   {
                       LastName = t[0],   // the last name comes first in names.csv
                       FirstName = t[1],
                       ID = Convert.ToInt32(t[2]),
                       ExamScores = (from scoreAsText in t2.Skip(1)
                                     select Convert.ToInt32(scoreAsText)).ToList()
                   };

    foreach (var student in students)
    {
        Console.WriteLine(
            $"The average score of {student.FirstName} {student.LastName} is {student.ExamScores.Average()}.");
    }
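The query above pairs every name with every score line and then filters with where. The same match on student ID can also be written with a join clause. The following is a rough sketch of that alternative, not the original author's code: JoinSketch and AverageByStudent are hypothetical names, the inputs are in-memory lines in the same format as names.csv and scores.csv, and the result uses value tuples, which assumes C# 7 or later.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class JoinSketch
{
    // Matches each name line (Last,First,ID) with its score line (ID,score,score,...)
    // and returns the student's display name with the average of the scores.
    public static List<(string Name, double Average)> AverageByStudent(
        IEnumerable<string> names, IEnumerable<string> scores)
    {
        var query = from name in names
                    let n = name.Split(',')
                    join score in scores
                        on n[2].Trim() equals score.Split(',')[0].Trim()   // student ID on both sides
                    let s = score.Split(',')
                    select (Name: $"{n[1].Trim()} {n[0].Trim()}",
                            // Skip the ID column, then average the remaining fields as integers
                            Average: s.Skip(1).Average(x => int.Parse(x.Trim())));

        return query.ToList();
    }
}
```

For the sample line "Omelchenko,Svetlana,111" and the score line "111, 97, 92, 81, 60", the result is the name "Svetlana Omelchenko" with an average of 82.5.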

10. How to split a file into many files by using groups

names1.txt:
Bankov, Peter
Holm, Michael
Garcia, Hugo
Potra, Cristina
Noriega, Fabricio
Aw, Kam Foo
Beebe, Ann
Toyoshima, Tim
Guy, Wey Yuan
Garcia, Debra

names2.txt:
Liu, Jinghao
Bankov, Peter
Holm, Michael
Garcia, Hugo
Beebe, Ann
Gilchrist, Beth
Myrcha, Jacek
Giakoumakis, Leo
McLin, Nkenge
El Yassir, Mehdi
    // Requires: using System; using System.IO; using System.Linq;
    var fileA = File.ReadAllLines(@"names1.txt");
    var fileB = File.ReadAllLines(@"names2.txt");

    // Union: concatenate the two files and remove duplicate names
    var mergeQuery = fileA.Union(fileB);
    // Group the names by the first letter of the last name
    var query = from name in mergeQuery
                let t = name.Split(',')
                group name by t[0][0] into g
                orderby g.Key
                select g;

    // Note the nested foreach loops
    foreach (var g in query)
    {
        var fileName = @"testFile_" + g.Key + ".txt";
        Console.WriteLine(g.Key + ":");

        // Write one file per group
        using (var sw = new StreamWriter(fileName))
        {
            foreach (var name in g)
            {
                sw.WriteLine(name);
                Console.WriteLine(" " + name);
            }
        }
    }
11. How to join content from dissimilar files

scores.csv: This file represents spreadsheet data. Column 1 is the student ID, and columns 2 through 5 are test scores.
111, 97, 92, 81, 60
112, 75, 84, 91, 39
113, 88, 94, 65, 91
114, 97, 89, 85, 82
115, 35, 72, 91, 70
116, 99, 86, 90, 94
117, 93, 92, 80, 87
118, 92, 90, 83, 78
119, 68, 79, 88, 92
120, 99, 82, 81, 79
121, 96, 85, 91, 60
122, 94, 92, 91, 91

names.csv: This file represents a spreadsheet that contains each student's last name, first name, and student ID.
Omelchenko,Svetlana,111
O'Donnell,Claire,112
Mortensen,Sven,113
Garcia,Cesar,114
Garcia,Debra,115
Fakhouri,Fadi,116
Feng,Hanying,117
Garcia,Hugo,118
Tucker,Lance,119
Adams,Terry,120
Zabokritski,Eugene,121
Tucker,Michael,122
    // Requires: using System; using System.IO; using System.Linq;
    var names = File.ReadAllLines(@"names.csv");
    var scores = File.ReadAllLines(@"scores.csv");

    // Name:  Last[0], First[1], ID[2]
    //        Omelchenko, Svetlana, 111
    // Score: StudentID[0], Exam1[1], Exam2[2], Exam3[3], Exam4[4]
    //        111, 97, 92, 81, 60

    // This query joins the two dissimilar spreadsheets on the student ID
    var query = from name in names
                let t1 = name.Split(',')
                from score in scores
                let t2 = score.Split(',')
                where t1[2] == t2[0]
                orderby t1[0]
                select $"{t1[0]},{t2[1]},{t2[2]},{t2[3]},{t2[4]}";

    // Output the result
    OutputQueryResult(query, "Merge two spreadsheets:");
    // Requires: using System; using System.Collections.Generic; using System.Linq;
    private static void OutputQueryResult(IEnumerable<string> querys, string title)
    {
        Console.WriteLine(Environment.NewLine + title);
        foreach (var query in querys)
        {
            Console.WriteLine(query);
        }

        Console.WriteLine($"{querys.Count()} total names in list");
    }
12. How to compute column values in a CSV text file

scores.csv: The first column is assumed to be the student ID, and the following columns are the scores from four exams.
111, 97, 92, 81, 60
112, 75, 84, 91, 39
113, 88, 94, 65, 91
114, 97, 89, 85, 82
115, 35, 72, 91, 70
116, 99, 86, 90, 94
117, 93, 92, 80, 87
118, 92, 90, 83, 78
119, 68, 79, 88, 92
120, 99, 82, 81, 79
121, 96, 85, 91, 60
122, 94, 92, 91, 91
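    // Requires: using System; using System.IO; using System.Linq;
    var scores = File.ReadAllLines(@"scores.csv");

    // Specify the column to compute
    const int examNum = 3;

    // Format of scores.csv:
    // Student ID  Exam#1  Exam#2  Exam#3  Exam#4
    // 111,        97,     92,     81,     60

    // +1 skips the first (ID) column; with examNum = 3 this passes column index 4
    // Compute a single column
    SingleColumn(scores, examNum + 1);

    Console.WriteLine();

    // Compute multiple columns
    MultiColumns(scores);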
    // Requires: using System; using System.Collections.Generic; using System.Linq;
    private static void SingleColumn(IEnumerable<string> strs, int examNum)
    {
        Console.WriteLine("Single Column Query:");

        // The query works in two steps:
        //  1. Split the string
        //  2. Convert the value in the column to compute to an int
        var query = from str in strs
                    let t = str.Split(',')
                    select Convert.ToInt32(t[examNum]);

        // Compute statistics for the specified column
        var average = query.Average();
        var max = query.Max();
        var min = query.Min();

        Console.WriteLine($"Exam #{examNum}: Average:{average:##.##} High Score:{max} Low Score:{min}");
    }

    private static void MultiColumns(IEnumerable<string> strs)
    {
        Console.WriteLine("Multi Column Query:");

        // Query steps:
        //  1. Split the string
        //  2. Skip the ID column (the first column)
        //  3. Convert every score in the current line to an int,
        //     and select the whole sequence as one row of results
        var query = from str in strs
                    let t1 = str.Split(',')
                    let t2 = t1.Skip(1)
                    select (from t in t2
                            select Convert.ToInt32(t));

        // Execute the query and cache the results to improve performance
        var results = query.ToList();
        // Find out how many columns the results have
        var count = results[0].Count();

        // Compute the statistics,
        // looping once for each score column
        for (var i = 0; i < count; i++)
        {
            var query2 = from result in results
                         select result.ElementAt(i);

            var average = query2.Average();
            var max = query2.Max();
            var min = query2.Min();

            // +1 because #1 means the first exam
            Console.WriteLine($"Exam #{i + 1} Average: {average:##.##} High Score: {max} Low Score: {min}");
        }
    }

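The query works by using the Split method to convert each line of text into an array. Each array element represents a column. Finally, the text in each column is converted to its numeric representation. If the file is a tab-delimited file, just change the argument of the Split method to \t.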
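To make the delimiter remark concrete, here is a minimal sketch in which the delimiter is a parameter, so the same query serves comma-separated and tab-separated lines. DelimitedColumnSketch and ColumnAverage are hypothetical names, and the sketch assumes that every line has a valid integer in the requested column.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DelimitedColumnSketch
{
    // Splits each line on the given delimiter, converts the chosen column to int,
    // and returns the average of that column.
    public static double ColumnAverage(IEnumerable<string> lines, int columnIndex, char delimiter)
    {
        return lines.Select(line => line.Split(delimiter))
                    .Select(fields => int.Parse(fields[columnIndex].Trim()))
                    .Average();
    }
}
```

Calling ColumnAverage(lines, 1, '\t') reads the first exam column of a tab-delimited file, and passing ',' instead handles the scores.csv format shown above.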
[Original post] http://www.cnblogs.com/liqingwen/p/5814204.html
[Reference] https://msdn.microsoft.com/zh-cn/library/bb397915(v=vs.100).aspx