[UWP] 漲姿勢UWP Source Code: Fetching and Parsing the RSS Feed

Reposted from the original article · 2016-11-27 20:25 · Tags: 漲姿勢UWP, UWP

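　　This post begins a detailed walkthrough of the code behind the 漲姿勢UWP app. We start at the source of the data, namely fetching and parsing the RSS feed. The relevant class is RssReader, which holds every data-related operation.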

　　The 漲姿勢 website publishes its RSS feed at http://www.zhangzishi.cc/feed. In UWP, the simplest way to send an HTTP request to a URI and receive the HTTP response is HttpClient:

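        // Downloads the raw RSS document and returns it as a single XML string.
        public async Task<string> DownloadRssString()
        {
            var httpClient = new HttpClient();
            // GetStringAsync throws if the request fails, so callers should be ready to handle that.
            var result = await httpClient.GetStringAsync(new Uri("http://www.zhangzishi.cc/feed"));
            return result;
        }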

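　　The method above returns the latest 漲姿勢 data, organized as XML. The document opens with a set of namespace declarations; the channel node that follows carries metadata such as title and description. The important field here is lastBuildDate, because later we use it to decide whether there is new data to save locally and whether the news list needs refreshing.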

<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/">
  <channel>
    <title>漲姿勢</title>
    <atom:link href="http://www.zhangzishi.cc/feed" rel="self" type="application/rss+xml" />
    <link>http://www.zhangzishi.cc</link>
    <description>騷年,來這里漲點姿勢吧！</description>
    <lastBuildDate>Sun, 17 Jul 2016 04:37:46 +0800</lastBuildDate>
</channel>
</rss>
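
　　The lastBuildDate check described above can be sketched roughly as follows. This is only an illustration, not the code from RssReader: the class name FeedFreshness, both method names, and the idea of passing in the previously saved timestamp are assumptions made for the example.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

// Hypothetical helper, not part of the original RssReader class.
public static class FeedFreshness
{
    // RFC 822 style dates as they appear in the feed,
    // e.g. "Sun, 17 Jul 2016 04:37:46 +0800".
    private static readonly string[] Formats =
    {
        "ddd, dd MMM yyyy HH:mm:ss zzz",
        "ddd, d MMM yyyy HH:mm:ss zzz"
    };

    // Returns the channel's lastBuildDate, or null if it is missing or unparsable.
    public static DateTimeOffset? ReadLastBuildDate(string rssXml)
    {
        XElement channel = XDocument.Parse(rssXml).Root?.Element("channel");
        string raw = channel?.Element("lastBuildDate")?.Value;
        if (string.IsNullOrWhiteSpace(raw))
        {
            return null;
        }

        DateTimeOffset parsed;
        raw = raw.Trim();
        // Try the exact feed formats first, then fall back to the general parser.
        if (DateTimeOffset.TryParseExact(raw, Formats, CultureInfo.InvariantCulture,
                DateTimeStyles.None, out parsed) ||
            DateTimeOffset.TryParse(raw, CultureInfo.InvariantCulture,
                DateTimeStyles.None, out parsed))
        {
            return parsed;
        }
        return null;
    }

    // True when the feed is newer than the timestamp saved from the last download.
    public static bool HasNewData(string rssXml, DateTimeOffset? lastSaved)
    {
        DateTimeOffset? current = ReadLastBuildDate(rssXml);
        if (current == null)
        {
            return false; // nothing to compare against
        }
        return lastSaved == null || current.Value > lastSaved.Value;
    }
}
```

　　Using DateTimeOffset rather than DateTime keeps the +0800 offset from the feed, so the comparison does not depend on the device's local time zone.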

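　　The core news content of the app corresponds to the item nodes in the feed: each item is one 漲姿勢 news entry, and the whole XML file contains several dozen of them. After parsing the item nodes we build a collection of Item objects and bind it to a ListView in the UI. We also analyze and keep the details of every item node, so that when the user taps a ListViewItem we can open the detail page and fill in its content. For example, the detail content shown on the right of the screenshot is quite simple: just a single image.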

　　Here is a sample item node:

    <item>
      <title>日本某高校一男生在女生生日派對上公開表白，青春真好</title>
      <link>http://www.zhangzishi.cc/20160717zh.html</link>
      <comments>http://www.zhangzishi.cc/20160717zh.html#comments</comments>
      <pubDate>Sun, 17 Jul 2016 04:37:46 +0800</pubDate>
      <dc:creator><![CDATA[丁丁]]></dc:creator>
      <category><![CDATA[世界觀]]></category>
      <guid isPermaLink="false">http://www.zhangzishi.cc/?p=178981</guid>
      <description><![CDATA[日本某高校一男生在女生生日派對上公開表白。“在這個世界上我最喜歡的人是你，我會好好珍惜你的。”看得本公舉全程一 [&#8230;]]]></description>
      <content:encoded>
        <![CDATA[<p style="color: #444444;">日本某高校一男生在女生生日派對上公開表白。“在這個世界上我最喜歡的人是你，我會好好珍惜你的。”看得本公舉全程一直傻笑，青春真好啊~</p>
<p><embed width="480" height="480" type="application/x-shockwave-flash" src="http://video.weibo.com/player/1034:7e3df996c2f5e9a1973974f0bb9e5e39/v.swf" allowscriptaccess="always" allowfullscreen="allowfullscreen" wmode="transparent" quality="high"></embed></p>
<p>視頻鏈接：<a style="color: #428bca;" href="http://weibo.com/p/2304447e3df996c2f5e9a1973974f0bb9e5e39" target="_blank">http://weibo.com/p/2304447e3df996c2f5e9a1973974f0bb9e5e39</a><img src="http://cdnjp.zhangzishi.cc/wp-content/uploads/2016/05/024045ftw.jpg" alt="" class="alignnone size-medium wp-image-171793" /></p>
<p>微信訂閱號 zhangzishi_weixin 合作請直接聯系 tintin@zhangzishi.cc</p>
]]>
      </content:encoded>
      <wfw:commentRss>http://www.zhangzishi.cc/20160717zh.html/feed</wfw:commentRss>
      <slash:comments>12</slash:comments>
    </item>

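　　It is easy to pick out title, pubDate, description, category and similar fields, and we create a matching Model object to store them. We can also see that the detailed content sits in the <content:encoded> node, wrapped in a <![CDATA[ ... ]]> section. The XML parser does not interpret anything inside a CDATA section as markup; it treats it as plain text. That is why the content node holds a large amount of HTML: this HTML is stored as one string in the ContentEncoded property of the Item object.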

    public class Item
    {
        public string Title { get; set; }
        public Uri Link { get; set; }
        public DateTime PublishedDate { get; set; }
        public string Creator { get; set; }
        public string Category { get; set; }
        public string Description { get; set; }
        public string ContentEncoded { get; set; }
        public string CoverImageUri { get; set; } 
    }
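
　　To make the CDATA behavior concrete, here is a small sketch using LINQ to XML. The class name CDataDemo and its method are made up for this example; only the content namespace URI comes from the feed header shown earlier.

```csharp
using System.Xml.Linq;

// Hypothetical demo class, not part of the original project.
public static class CDataDemo
{
    // Namespace URI declared as xmlns:content in the feed header.
    private static readonly XNamespace Content = "http://purl.org/rss/1.0/modules/content/";

    // Returns the text of <content:encoded>, or null if the node is absent.
    public static string ReadContentEncoded(XElement itemNode)
    {
        // Value concatenates the element's text nodes; a CDATA section
        // is a text node, so the HTML comes back as an ordinary string.
        return itemNode.Element(Content + "encoded")?.Value;
    }

    // True if the parser produced child elements for the embedded HTML.
    public static bool HtmlWasParsedAsXml(XElement itemNode)
    {
        XElement encoded = itemNode.Element(Content + "encoded");
        return encoded != null && encoded.HasElements;
    }
}
```

　　For an item whose content:encoded holds `<![CDATA[<p>hello</p>]]>`, ReadContentEncoded returns the string `<p>hello</p>` and HtmlWasParsedAsXml returns false, because the `<p>` tag never becomes an XML element.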

　　For XML handling I chose the XDocument class in the System.Xml.Linq namespace. After obtaining the root rss XElement, we locate the collection of item nodes under the channel node and parse each one:

        // Maps one <item> XElement onto an Item model object.
        private Item ParseItemNode(XElement itemNode)
        {
            var item = new Item();
            item.Title = itemNode.Element("title").Value;
            string uriString = itemNode.Element("link").Value;
            if (!string.IsNullOrEmpty(uriString))
            {
                item.Link = new Uri(uriString);
            }
            item.PublishedDate = DateTime.Parse(itemNode.Element("pubDate").Value);

            // dc:creator and content:encoded are namespace-qualified, so the
            // lookup name must be built from the namespace URI.
            XNamespace dc = XmlNameSpaceDic["dc"];
            item.Creator = itemNode.Element(dc + "creator").Value;
            item.Category = itemNode.Element("category").Value;
            item.Description = itemNode.Element("description").Value;
            XNamespace content = XmlNameSpaceDic["content"];
            var contentEncoded = itemNode.Element(content + "encoded").Value;

            // Collect the image URIs (this also rewrites the <img> tags inside
            // contentEncoded), then use the first image as the cover.
            var allImageUri = GetAllImageUri(ref contentEncoded);
            item.CoverImageUri = allImageUri.FirstOrDefault();
            item.ContentEncoded = RemoveEmbedFlash(contentEncoded);
            return item;
        }
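
　　The step that comes before ParseItemNode, walking from the rss root to the item nodes, might look roughly like this. It is a sketch under assumptions: the class ItemEnumeration and the method GetItemNodes are hypothetical names, and the real RssReader may organize this differently.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper illustrating the rss -> channel -> item traversal.
public static class ItemEnumeration
{
    // Returns every <item> under <channel>, or an empty list if the
    // document does not have the expected shape.
    public static List<XElement> GetItemNodes(string rssXml)
    {
        XElement rss = XDocument.Parse(rssXml).Root;
        XElement channel = rss?.Element("channel");
        if (channel == null)
        {
            return new List<XElement>();
        }

        // item has no namespace prefix, so a plain local name is enough here.
        return channel.Elements("item").ToList();
    }
}
```

　　Each XElement in the returned list can then be handed to a method like ParseItemNode above.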

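　　One detail worth noting is that some nodes are namespace-qualified. When reading them through the Element method, the name must include the corresponding namespace or the lookup returns nothing. Here is a method that collects the namespaces declared at the top of the XML file: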

        private Dictionary<string, string> GetXmlNameSpaceDic(XElement rssNode)
        {
            var dic = new Dictionary<string, string>();
            foreach (var attribute in rssNode.Attributes().Where(_ => _.IsNamespaceDeclaration))
            {
                dic.Add(attribute.Name.LocalName,attribute.Value);
            }

            return dic;
        }
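
　　The effect of the namespace is easy to see in a small experiment. The sketch below rebuilds the prefix dictionary in the same way and compares a lookup with and without the namespace; NamespaceLookupDemo and its members are illustrative names, not code from the project.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class showing why the namespace is required.
public static class NamespaceLookupDemo
{
    // Maps each declared prefix (for example "dc") to its namespace URI.
    public static Dictionary<string, string> GetPrefixes(XElement root)
    {
        return root.Attributes()
                   .Where(attr => attr.IsNamespaceDeclaration)
                   .ToDictionary(attr => attr.Name.LocalName, attr => attr.Value);
    }

    // Reads dc:creator from the first item, using the URI found in the header.
    public static string ReadCreator(XElement rss)
    {
        // A string converts implicitly to XNamespace.
        XNamespace dc = GetPrefixes(rss)["dc"];
        XElement item = rss.Element("channel")?.Element("item");
        return item?.Element(dc + "creator")?.Value;
    }

    // The same lookup without the namespace; it finds nothing for dc:creator.
    public static XElement ReadCreatorWithoutNamespace(XElement rss)
    {
        return rss.Element("channel")?.Element("item")?.Element("creator");
    }
}
```

　　With the sample item shown earlier, ReadCreator returns the author name, while ReadCreatorWithoutNamespace returns null because no unqualified creator element exists.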

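　　ParseItemNode also does one special thing: it extracts the image addresses from the article body. The RSS feed does not supply a cover image for each news entry, so I use a regular expression to pick the image addresses out of the body and take the first image as the cover. Regex matches have the concept of a Group, which makes it easy to select the src attribute of an img node. The EditImageUri method adds width and height to each image so that it adapts better to screens of different sizes: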

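        // Returns the src of every <img> tag that EditImageUri rewrote, and
        // replaces those tags inside content with the rewritten versions.
        private List<string> GetAllImageUri(ref string content)
        {
            var matchList = new List<string>();
            // Group 1 captures the value of the src attribute.
            string pattern = "<img.+?src=[\"'](.+?)[\"'].*?>";

            var regex = new Regex(pattern, RegexOptions.IgnoreCase);
            foreach (Match match in regex.Matches(content))
            {
                // match.Value is the whole <img ...> tag, not just the URI.
                var editedTag = EditImageUri(match.Value);
                if (editedTag != match.Value)
                {
                    matchList.Add(match.Groups[1].Value);
                    content = content.Replace(match.Value, editedTag);
                }
            }

            return matchList;
        }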
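
　　To show the Group mechanism in isolation, here is a stripped-down sketch that reuses the same pattern but only collects the src values, without the tag rewriting. ImageSrcDemo and ExtractSrc are hypothetical names used for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class: extracts src values only, no rewriting.
public static class ImageSrcDemo
{
    // Same pattern as GetAllImageUri; the parentheses define group 1.
    private static readonly Regex ImgRegex =
        new Regex("<img.+?src=[\"'](.+?)[\"'].*?>", RegexOptions.IgnoreCase);

    public static List<string> ExtractSrc(string html)
    {
        var result = new List<string>();
        foreach (Match match in ImgRegex.Matches(html))
        {
            // Groups[0] is the whole <img> tag; Groups[1] is just the src value.
            result.Add(match.Groups[1].Value);
        }
        return result;
    }
}
```

　　A regular expression is adequate for the fairly uniform HTML this feed produces, but it is not a general HTML parser; unusual markup, such as a `>` inside an attribute value, could defeat the pattern.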

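　　That covers most of the RssReader class. If you are interested in the full code, please have a look on GitHub. If you find a bug, I would be very grateful for your feedback or a pull request.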

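　　To be honest, the 漲姿勢UWP app is a bit of a hobby project. The WP version of NetEase Cloud Reader (網易雲閱讀) is too bare-bones and feels too limited, so after some thought I decided to build my own. I will keep adding features, since I am not satisfied with the current version myself.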

　　GitHub:

https://github.com/manupstairs/ZhangZiShiRSSRead

　　Windows Store:

https://www.microsoft.com/zh-cn/store/p/%e6%b6%a8%e5%a7%bf%e5%8a%bfuwp/9nblggh3zqd1
