This post shows how to read very large XML files in C#. Consider the following XML file:
<?xml version="1.0" encoding="utf-8"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>C# developer</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>An in-depth look at creating applications with XML.</description>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-12-16</publish_date>
    <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
  </book>
</catalog>
LINQ to XML makes this file easy to work with. For example, to count the books:
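// Loads the entire file into memory as a tree of XElement objects.
XElement doc = XElement.Load("Book.xml");

// The (string) cast returns null instead of throwing when a <book> has no id attribute.
var books = from book in doc.Descendants("book")
            where (string)book.Attribute("id") != "bk109"
            select book;

Console.WriteLine("Books count: {0}", books.Count());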
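This gets the result quickly and with very little code. However, when the XML file is large (for example, 50 MB), reading it this way becomes slow. The reason is that XElement loads the whole document into memory at once and has to maintain an in-memory object tree (a DOM-style model) for it, which consumes a lot of memory. Using XmlDocument on a large XML file has the same problem.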
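To get a rough feel for that memory cost, the managed heap can be compared before and after the load. This is only a sketch under assumptions: the class name `LoadCost` and the command-line argument are hypothetical, and `GC.GetTotalMemory` gives an approximate figure rather than a precise measurement.

```csharp
using System;
using System.Xml.Linq;

public static class LoadCost
{
    public static void Main(string[] args)
    {
        // Assumption: the path to a large XML file is passed as the first argument;
        // otherwise the Book.xml file from this post is used.
        string path = args.Length > 0 ? args[0] : "Book.xml";

        // Force a full collection so both readings start from a settled heap.
        long before = GC.GetTotalMemory(forceFullCollection: true);
        XElement doc = XElement.Load(path);
        long after = GC.GetTotalMemory(forceFullCollection: true);

        Console.WriteLine("Managed heap growth: {0:N0} bytes", after - before);

        // Keep the tree reachable until after the second measurement.
        GC.KeepAlive(doc);
    }
}
```

Because every element, attribute and text node becomes an object, the growth reported here is typically larger than the file size on disk.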
For large XML files, use XmlReader instead. Consider the following code:
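using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public class Book
{
    public string Id { get; set; }
    public string Author { get; set; }
    public string Title { get; set; }
    public string Description { get; set; }
}

public static class XmlReaderExtensions
{
    // Streams <book> elements one at a time instead of loading the whole file.
    public static IEnumerable<Book> Books(this XmlReader source)
    {
        while (!source.EOF)
        {
            if (source.NodeType == XmlNodeType.Element && source.Name == "book")
            {
                // ReadFrom builds a tree for this one <book> only and leaves the reader
                // on the node after </book>. Calling Read() here as well would skip
                // the next <book> when the file has no whitespace between elements.
                XElement element = (XElement)XNode.ReadFrom(source);
                yield return new Book
                {
                    Id = (string)element.Attribute("id"),
                    Author = (string)element.Element("author"),
                    Title = (string)element.Element("title"),
                    Description = (string)element.Element("description")
                };
            }
            else
            {
                source.Read();
            }
        }
    }
}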
using (XmlReader reader = XmlReader.Create("Book.xml"))
{
    Console.WriteLine("Books count: {0}", reader.Books().Count());
}
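XmlReader is a forward-only reader, so it does not load the whole XML file into memory at once. Only the current node (and, in the code above, one book at a time) is held in memory, which makes it much more efficient than XmlDocument or LINQ to XML for large XML files.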
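The same streaming idea can be combined with the id filter from the LINQ to XML query earlier in this post. The following is a minimal, self-contained sketch rather than a definitive implementation: `StreamingDemo`, `StreamElements` and the temporary file name `books-demo.xml` are hypothetical names introduced for illustration, and the catalog is synthetic so the example does not depend on Book.xml.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class StreamingDemo
{
    // Hypothetical helper: yields each element called `name` as its own small XElement.
    public static IEnumerable<XElement> StreamElements(string path, string name)
    {
        using (XmlReader reader = XmlReader.Create(path))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                {
                    // ReadFrom consumes the element and advances past its end tag,
                    // so no extra Read() call is made on this branch.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    public static void Main()
    {
        // Write a synthetic catalog with ids bk101..bk110 (an assumption for the demo).
        string path = Path.Combine(Path.GetTempPath(), "books-demo.xml");
        using (XmlWriter writer = XmlWriter.Create(path))
        {
            writer.WriteStartElement("catalog");
            for (int i = 101; i <= 110; i++)
            {
                writer.WriteStartElement("book");
                writer.WriteAttributeString("id", "bk" + i);
                writer.WriteElementString("title", "Title " + i);
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
        }

        // Same filter as the LINQ to XML query, applied lazily while streaming.
        int count = StreamElements(path, "book")
            .Count(book => (string)book.Attribute("id") != "bk109");

        Console.WriteLine("Books count: {0}", count);
        File.Delete(path);
    }
}
```

Each `XElement` becomes eligible for garbage collection as soon as the filter has looked at it, so memory use stays roughly constant regardless of how many books the file contains.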
Thanks for reading.