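Scraping data almost always involves logging in. Typing a CAPTCHA on every experimental run is a chore, so I wanted to save the cookies to a file the way a browser does and reuse them directly on the next run.
Googling "C# CookieContainer 存文件" (save to file) turned up the following as the best code I could find: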
http://www.huxu.net.cn/2011_03/154.html
- public static List<Cookie> GetAllCookies(CookieContainer cc) {
- List<Cookie> lstCookies = new List<Cookie>();
- Hashtable table = (Hashtable)cc.GetType().InvokeMember("m_domainTable",
- System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.GetField |
- System.Reflection.BindingFlags.Instance, null, cc, new object[] { });
- foreach (object pathList in table.Values) {
- SortedList lstCookieCol = (SortedList)pathList.GetType().InvokeMember("m_list",
- System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.GetField
- | System.Reflection.BindingFlags.Instance, null, pathList, new object[] { });
- foreach (CookieCollection colCookies in lstCookieCol.Values)
- foreach (Cookie c in colCookies) lstCookies.Add(c);
- }
- return lstCookies;
- }
- // Save
- StringBuilder sbc = new StringBuilder();
- List<Cookie> cooklist = Code.ProgTool.GetAllCookies(CookieContainer);
- foreach (Cookie cookie in cooklist) {
- sbc.AppendFormat("{0};{1};{2};{3};{4};{5}\r\n",
- cookie.Domain, cookie.Name, cookie.Path, cookie.Port,
- cookie.Secure.ToString(), cookie.Value);
- }
- // File.WriteAllText creates or overwrites the file itself, so no separate File.Create is needed.
- File.WriteAllText("d:\\chinarencookies.txt", sbc.ToString(), System.Text.Encoding.Default);
- // Load
- string[] cookies = File.ReadAllText("d:\\chinarencookies.txt", System.Text.Encoding.Default)
- .Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
- foreach (string c in cookies) {
- string[] cc = c.Split(";".ToCharArray());
- Cookie ck = new Cookie();
- ck.Discard = false;
- ck.Domain = cc[0];
- ck.Expired = true;
- ck.HttpOnly = true;
- ck.Name = cc[1];
- ck.Path = cc[2];
- ck.Port = cc[3];
- ck.Secure = bool.Parse(cc[4]);
- ck.Value = cc[5];
- CookieContainer.Add(ck);
- }
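This approach requires a detailed understanding of how cookies are put together, though it is acceptable as a template. IE stores its cookies in a similar way.
But code written like this is very bug-prone. The line ck.Expired = true; is one such hazard: it marks every restored cookie as already expired, so the container is likely to drop the cookie instead of sending it.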
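To make the hazard concrete, here is a minimal sketch that adds one cookie to a fresh CookieContainer, with and without the Expired flag, and reports how many cookies the container ends up holding. ExpiredDemo, CountAfterAdd and the cookie values are hypothetical names chosen for illustration. As far as I understand the container's behavior, an expired cookie is treated as a removal request, so the flagged cookie should not be stored at all.

```csharp
using System;
using System.Net;

public static class ExpiredDemo
{
    // Adds a single cookie to a new container and returns the container's
    // cookie count. When markExpired is true the cookie is flagged the same
    // way the loading code above flags every restored cookie.
    public static int CountAfterAdd(bool markExpired)
    {
        var jar = new CookieContainer();
        // Hypothetical cookie; the domain must be non-empty for Add(Cookie).
        var ck = new Cookie("sid", "abc123", "/", "example.com");
        if (markExpired)
        {
            ck.Expired = true; // sets the expiry time to "now"
        }
        jar.Add(ck);
        return jar.Count;
    }
}
```

Comparing CountAfterAdd(false) with CountAfterAdd(true) shows whether the restored cookies survive being loaded. Dropping the ck.Expired = true; line (or restoring a saved expiry time instead) avoids the problem.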
I then tried searching for "C# save CookieContainer to file" and, sure enough, found a more elegant piece of code.
http://stackoverflow.com/questions/1777203/c-writing-a-cookiecontainer-to-disk-and-loading-back-in-for-use
- public static void WriteCookiesToDisk(string file, CookieContainer cookieJar)
- {
- using(Stream stream = File.Create(file))
- {
- try {
- Console.Out.Write("Writing cookies to disk... ");
- BinaryFormatter formatter = new BinaryFormatter();
- formatter.Serialize(stream, cookieJar);
- Console.Out.WriteLine("Done.");
- } catch(Exception e) {
- Console.Out.WriteLine("Problem writing cookies to disk: " + e.GetType());
- }
- }
- }
- public static CookieContainer ReadCookiesFromDisk(string file)
- {
- try {
- using(Stream stream = File.Open(file, FileMode.Open))
- {
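- Console.Out.Write("Reading cookies from disk... ");
- BinaryFormatter formatter = new BinaryFormatter();
- // Deserialize first, so that "Done." is only printed after a successful read.
- CookieContainer cookieJar = (CookieContainer)formatter.Deserialize(stream);
- Console.Out.WriteLine("Done.");
- return cookieJar;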
- }
- } catch(Exception e) {
- Console.Out.WriteLine("Problem reading cookies from disk: " + e.GetType());
- return new CookieContainer();
- }
- }
This uses C#'s built-in serialization, which gets the job done in a few lines and leaves all the tedious work to the library.
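The same "let the library do it" idea can also be sketched with XmlSerializer instead of BinaryFormatter, which newer .NET releases mark as obsolete. This is an illustrative sketch, not the Stack Overflow answer's code: CookieRecord, CookieXmlStore and the siteUri parameter are hypothetical names, and it assumes you only need the cookies that GetCookies returns for one known site.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Xml.Serialization;

// Hypothetical DTO: a plain snapshot of the cookie fields worth keeping.
public class CookieRecord
{
    public string Name;
    public string Value;
    public string Path;
    public string Domain;
    public bool Secure;
    public bool HttpOnly;
    public DateTime Expires; // DateTime.MinValue means a session cookie
}

public static class CookieXmlStore
{
    // Saves the cookies the container would send to siteUri (assumed known).
    public static void Save(string file, CookieContainer jar, Uri siteUri)
    {
        var records = new List<CookieRecord>();
        foreach (Cookie c in jar.GetCookies(siteUri))
        {
            records.Add(new CookieRecord
            {
                Name = c.Name, Value = c.Value, Path = c.Path, Domain = c.Domain,
                Secure = c.Secure, HttpOnly = c.HttpOnly, Expires = c.Expires
            });
        }
        var serializer = new XmlSerializer(typeof(List<CookieRecord>));
        using (Stream stream = File.Create(file))
        {
            serializer.Serialize(stream, records);
        }
    }

    // Rebuilds a container from the file; returns an empty one if it is missing.
    public static CookieContainer Load(string file)
    {
        var jar = new CookieContainer();
        if (!File.Exists(file)) return jar;

        var serializer = new XmlSerializer(typeof(List<CookieRecord>));
        List<CookieRecord> records;
        using (Stream stream = File.OpenRead(file))
        {
            records = (List<CookieRecord>)serializer.Deserialize(stream);
        }
        foreach (CookieRecord r in records)
        {
            // The saved expiry time is restored; Expired is never forced to true.
            var c = new Cookie(r.Name, r.Value, r.Path, r.Domain)
            {
                Secure = r.Secure, HttpOnly = r.HttpOnly, Expires = r.Expires
            };
            jar.Add(c);
        }
        return jar;
    }
}
```

The trade-off against the BinaryFormatter version is a little more code in exchange for a human-readable file and no dependence on the container's private layout. Cookies for other sites are not saved unless Save is called once per site.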
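C# has already climbed to third place in the most recent (February 2012) programming language rankings. I was drawn to it by its powerful libraries, and I am now gradually coming to appreciate the strengths of the language itself. I can't keep using it as if it were C... time to try abstracting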