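Note: the method here is very simple. Export the tables from MySQL to XML, then email each post to your Wiz or Evernote mail-in address. The one problem I haven't solved is that the SMTP server often stops responding at random. Gmail's SMTP also has a sending limit (roughly a thousand messages a day, I believe), so if you exceed it you may need to switch to another account and try again; you could also improve the program to keep an array of SMTP accounts and move on to the next one on a quota error. The advantage of going through email is that the images in the blog posts are imported into the notes as well (embedded directly in the note, not just linked to the original image URL).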
(Screenshot: the result of importing the posts into Wiz.)
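I. Purpose
Quite simple: I worry that my blog might disappear from the web one day, so I have long wanted a local copy. There are far too many posts to do it by hand.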
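- Over the years I have written more than a thousand posts on WordPress, about 80% of them private. The existing feed-based export methods cannot read private or password-protected posts. WordPress also has export plugins; I didn't try many, but one streaming XML export tool never produced a complete export in more than ten attempts (sometimes about 100 KB, sometimes a few MB), so in the end I gave up on WP plugins.
- Data source: export the MySQL tables to XML.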
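The main reason is that my hosting provider doesn't allow remote MySQL connections. (If you can export SQL and rebuild the database in a local MySQL, that is outside the scope of this post.) I originally wanted to export SQL and convert it to MSSQL, but every converter I found online needs to log in to the remote MySQL server, so that was a dead end. The CSV export came out garbled; I tried both UTF-8 and GB2312 with no luck. Finally I found that only the XML export was free of garbled text, which made things much simpler.
- When reading the XML in C#, these are the kinds of information I consider essential:
- Posts: table wp_posts
- Categories (category, tag, etc.): tables wp_term_taxonomy, wp_terms, wp_term_relationships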
SQL:
```sql
-- List all posts with their categories/tags --
-- One row per (post, term) pair; posts without any term are left out by the inner joins
create view v_post as
SELECT p.*, tax.term_taxonomy_id, tax.term_id, category.name, tax.count
FROM
  wp_term_relationships relation,
  wp_terms category,
  wp_term_taxonomy tax,
  wp_posts p
WHERE
  category.term_id = tax.term_id and
  tax.term_taxonomy_id = relation.term_taxonomy_id and
  relation.object_id = p.id;
```
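The v_post view only runs inside MySQL, while the data here arrives as an XML file, so the same three-table join can be sketched in C# with LINQ to XML. This is a minimal sketch, assuming phpMyAdmin's XML export writes each row as a `<table name="...">` element containing `<column name="...">` children; the `TermJoinSketch` class and its methods are hypothetical names. Like the view, it silently drops posts that have no category or tag, and a post with several terms gets several names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class TermJoinSketch
{
    // Value of <column name="..."> in one exported row ("" if missing), newlines trimmed.
    static string Col(XElement row, string name) =>
        ((string)row.Elements("column").FirstOrDefault(c => (string)c.Attribute("name") == name) ?? "")
            .Trim('\n');

    // All exported rows of one table, e.g. Rows(doc, "wp_terms").
    static IEnumerable<XElement> Rows(XDocument doc, string table) =>
        doc.Descendants("table").Where(t => (string)t.Attribute("name") == table);

    // Mirrors v_post: post ID -> category/tag names, following
    // wp_term_relationships -> wp_term_taxonomy -> wp_terms.
    public static ILookup<string, string> TermsByPost(XDocument doc) =>
        (from rel in Rows(doc, "wp_term_relationships")
         join tax in Rows(doc, "wp_term_taxonomy")
             on Col(rel, "term_taxonomy_id") equals Col(tax, "term_taxonomy_id")
         join term in Rows(doc, "wp_terms")
             on Col(tax, "term_id") equals Col(term, "term_id")
         select new { PostId = Col(rel, "object_id"), Name = Col(term, "name") })
        .ToLookup(x => x.PostId, x => x.Name);
}
```

With the export loaded via `XDocument.Load(path)`, indexing the lookup with a post ID such as `"42"` yields that post's category and tag names (an empty sequence for an unknown ID).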
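II. Steps
- Log in to your host's phpMyAdmin, select the blog database, and choose Export. Watch these settings: don't include the TABLE/VIEW creation SQL statements in the export, and be sure to choose UTF-8, otherwise the text will be garbled.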
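The exported XML is our data source. Make sure the downloaded XML is valid (I just tried again: this download can also come back incomplete, and I have no fix for that /shrug).
- (Screenshot: console output of a run over the posts.)
- Code: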
```csharp
using System;
using System.Collections.Generic;
using System.Configuration;   // add a reference to System.Configuration.dll
using System.Linq;
using System.Net;
using System.Net.Mail;
using System.Text;
using System.Threading.Tasks;
using System.Xml.Linq;
using System.Xml.XPath;

namespace WordpressExport
{
    class Program
    {
        static SmtpClient smtpClient;
        static MailMessage mailMessage;
        static List<post> list;

        // ConfigurationManager replaces the obsolete ConfigurationSettings.AppSettings
        static readonly string CONFIG_smtp_addr = ConfigurationManager.AppSettings["smtp_addr"];
        static readonly string CONFIG_smtp_acct_name = ConfigurationManager.AppSettings["smtp_acct_name"];
        static readonly string CONFIG_smtp_acct_pwd = ConfigurationManager.AppSettings["smtp_acct_pwd"];
        static readonly string CONFIG_xml_path = ConfigurationManager.AppSettings["xml_path"];
        static readonly string CONFIG_evernote_folder = ConfigurationManager.AppSettings["evernote_folder"];
        static readonly string CONFIG_post_scope_start_date = ConfigurationManager.AppSettings["post_scope_start_date"];
        static readonly string CONFIG_post_scope_end_date = ConfigurationManager.AppSettings["post_scope_end_date"];
        static readonly string CONFIG_notebook_email = ConfigurationManager.AppSettings["notebook_email"];
        static readonly string CONFIG_blog_addr = ConfigurationManager.AppSettings["blog_addr"];

        static void Main(string[] args)
        {
            Console.WriteLine(DateTime.Now + " start...");
            // Parallel.Invoke returns only after both actions have finished,
            // so no wait loop is needed afterwards
            Parallel.Invoke(() => ConnectSmtp(), () => HandleWPExport());
            if (list.Count == 0)
                throw new InvalidOperationException("No posts found in the XML for the configured date range.");

            foreach (var p in list)
                SendMail(p);
        }

        // Configures the client; SmtpClient opens the actual connection inside Send()
        private static void ConnectSmtp()
        {
            smtpClient = new SmtpClient();
            smtpClient.DeliveryMethod = SmtpDeliveryMethod.Network;   // send over the network
            smtpClient.Host = CONFIG_smtp_addr;                         // SMTP server
            smtpClient.EnableSsl = true;
            smtpClient.Credentials = new NetworkCredential(CONFIG_smtp_acct_name, CONFIG_smtp_acct_pwd);
            smtpClient.Timeout = 100000;                                // milliseconds
            mailMessage = new MailMessage(CONFIG_smtp_acct_name, CONFIG_notebook_email);
        }

        // Emails one post to the notebook address; returns false if sending failed
        static bool SendMail(post post)
        {
            if (post == null) return false;

            DateTime post_date;
            string str_post_date = DateTime.TryParse(post.post_date, out post_date)
                ? post_date.ToString("yyyy-MM-dd") : "";

            // Subject "[date] title <evernote_folder>"; the configured folder is appended as-is
            mailMessage.SubjectEncoding = Encoding.UTF8;
            mailMessage.Subject = string.Format("[{0}] {1} {2}", str_post_date, post.post_title, CONFIG_evernote_folder);

            // Header with the post details, followed by the post itself
            string body = "<b>Created:</b> " + post.post_date + "<br/>"
                + "<b>Original categories/tags:</b> " + post.post_tagcat + "<br/>"
                + string.Format("<b>Original post:</b> <a href=\"{0}?p={1}\">{0}?p={1}</a><br/><br/><br/>", CONFIG_blog_addr, post.ID)
                + post.post_content;

            mailMessage.BodyEncoding = Encoding.UTF8;
            mailMessage.Priority = MailPriority.High;
            mailMessage.IsBodyHtml = true;
            // WordPress stores line breaks as plain newlines; convert them for the HTML body
            mailMessage.Body = body.Replace("\n", "<br/>");

            bool isSuccess = true;
            try
            {
                Console.WriteLine(DateTime.Now + " sending mail... id = " + post.ID + " " + post.post_title);
                // Sent synchronously on purpose: async sends arrive out of order, and we want the
                // notes' natural order (ID) to follow the posts' chronological order
                smtpClient.Send(mailMessage);
            }
            catch (SmtpException ex)
            {
                // Typical random failure: System.IO.IOException "Unable to write data to the transport
                // connection: An established connection was aborted by the software in your host machine."
                isSuccess = false;
                Console.WriteLine("SMTP failure: " + ex.Message);
            }
            catch (Exception ex2)
            {
                isSuccess = false;
                Console.WriteLine(ex2.Message);
            }
            // On failure, the logged post date is where the next run should start (post_scope_start_date)
            Console.WriteLine(DateTime.Now + (isSuccess ? " completed" : " completed with error(s): post date " + post.post_date));
            return isSuccess;
        }

        // Value of <column name="..."> inside one exported row, with surrounding newlines trimmed
        static string Col(XElement row, string name)
        {
            return row.XPathSelectElement("column[@name='" + name + "']").Value.Trim('\n');
        }

        private static void HandleWPExport()
        {
            Console.WriteLine(DateTime.Now + " starting to read wp_posts.xml...");
            XDocument xmlDoc = XDocument.Load(CONFIG_xml_path);

            var taxonomies = new[] { TaxonomyEnum.category, TaxonomyEnum.link_category, TaxonomyEnum.post_tag }
                .Select(t => t.ToString()).ToList();
            var queryTax = (from tax in xmlDoc.XPathSelectElements(".//table[@name='wp_term_taxonomy']")
                            where taxonomies.Contains(Col(tax, "taxonomy"))
                            select new { term_id = Col(tax, "term_id"), tax_id = Col(tax, "term_taxonomy_id") }).ToList();
            var queryCat = (from cat in xmlDoc.XPathSelectElements(".//table[@name='wp_terms']")
                            select new { term_id = Col(cat, "term_id"), name = Col(cat, "name") }).ToList();
            var queryRel = (from rel in xmlDoc.XPathSelectElements(".//table[@name='wp_term_relationships']")
                            select new { object_id = Col(rel, "object_id"), tax_id = Col(rel, "term_taxonomy_id") }).ToList();
            Console.WriteLine(DateTime.Now + " continuing ... ");

            // Joined on the keys rather than filtering a three-way cross product;
            // result: post ID -> category/tag names
            var tagCatByPost = (from tax in queryTax
                                join cat in queryCat on tax.term_id equals cat.term_id
                                join rel in queryRel on tax.tax_id equals rel.tax_id
                                select new { name = cat.name, id = rel.object_id })
                               .ToLookup(t => t.id, t => t.name);

            var query = from p in xmlDoc.XPathSelectElements(".//table[@name='wp_posts']")
                        // wp_posts also holds pages, attachments, revisions, etc.; keep only posts
                        where Col(p, "post_type") == "post"
                        select new post
                        {
                            ID = Col(p, "ID"),
                            post_author = Col(p, "post_author"),
                            post_date = Col(p, "post_date"),
                            post_date_gmt = Col(p, "post_date_gmt"),
                            post_content = Col(p, "post_content"),
                            post_title = Col(p, "post_title"),
                            post_excerpt = Col(p, "post_excerpt"),
                            post_status = Col(p, "post_status"),
                            comment_status = Col(p, "comment_status"),
                            ping_status = Col(p, "ping_status"),
                            post_password = Col(p, "post_password"),
                            post_name = Col(p, "post_name"),
                            to_ping = Col(p, "to_ping"),
                            pinged = Col(p, "pinged"),
                            post_modified = Col(p, "post_modified"),
                            post_modified_gmt = Col(p, "post_modified_gmt"),
                            post_content_filtered = Col(p, "post_content_filtered"),
                            post_parent = Col(p, "post_parent"),
                            guid = Col(p, "guid"),
                            menu_order = Col(p, "menu_order"),
                            post_type = Col(p, "post_type"),
                            post_mime_type = Col(p, "post_mime_type"),
                            comment_count = Col(p, "comment_count"),
                            post_tagcat = string.Join(" ", tagCatByPost[Col(p, "ID")])
                        };

            // Optional date range, e.g. to resume after a failed run
            if (!string.IsNullOrEmpty(CONFIG_post_scope_start_date))
            {
                DateTime start = DateTime.Parse(CONFIG_post_scope_start_date);
                query = query.Where(o => Convert.ToDateTime(o.post_date) >= start);
            }
            if (!string.IsNullOrEmpty(CONFIG_post_scope_end_date))
            {
                DateTime end = DateTime.Parse(CONFIG_post_scope_end_date);
                query = query.Where(o => Convert.ToDateTime(o.post_date) <= end);
            }
            // post_date is "yyyy-MM-dd HH:mm:ss", so sorting the strings sorts chronologically
            list = query.OrderBy(o => o.post_date).ToList();
            Console.WriteLine(DateTime.Now + " done with reading the xml... ");
        }
    }
}
```
DTO classes:
```csharp
using System;
using System.Collections.Generic;
using System.Xml.Serialization;

namespace WordpressExport
{
    // column, database and table are not used by Program; only post and TaxonomyEnum are
    [Serializable]
    public class column
    {
        public string name { get; set; }
        public string text { get; set; }
    }

    [Serializable]
    public class database
    {
        [XmlElement(ElementName = "database")]
        public List<table> tables { get; set; }
    }

    [Serializable]
    public class table
    {
        public string name { get; set; }
        public column column { get; set; }
    }

    // One row of wp_posts, plus post_tagcat (space-separated category/tag names)
    [Serializable]
    public class post
    {
        public string ID { get; set; }
        public string post_author { get; set; }
        public string post_date { get; set; }
        public string post_date_gmt { get; set; }
        public string post_content { get; set; }
        public string post_title { get; set; }
        public string post_excerpt { get; set; }
        public string post_status { get; set; }
        public string comment_status { get; set; }
        public string ping_status { get; set; }
        public string post_password { get; set; }
        public string post_name { get; set; }
        public string to_ping { get; set; }
        public string pinged { get; set; }
        public string post_modified { get; set; }
        public string post_modified_gmt { get; set; }
        public string post_content_filtered { get; set; }
        public string post_parent { get; set; }
        public string guid { get; set; }
        public string menu_order { get; set; }
        public string post_type { get; set; }
        public string post_mime_type { get; set; }
        public string comment_count { get; set; }
        public string post_tagcat { get; set; }
    }

    // Values of wp_term_taxonomy.taxonomy
    public enum TaxonomyEnum { category, link_category, post_format, post_tag, series }
}
```
app.config:
```xml
<?xml version="1.0"?>
<configuration>
  <startup>
    <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.0"/>
  </startup>
  <appSettings>
    <add key="smtp_addr" value="smtp.gmail.com"/>
    <add key="smtp_acct_name" value="@gmail.com"/>
    <add key="smtp_acct_pwd" value=""/>
    <add key="evernote_folder" value="@folder..."/>
    <add key="xml_path" value="xxxxx\wp_posts.xml"/>
    <add key="notebook_email" value="xxxx@mywiz.cn"/>
    <add key="post_scope_start_date" value="2013-04-11 22:25:50"/>
    <add key="post_scope_end_date" value=""/>
    <add key="blog_addr" value="http://www.xxxx.com/"/>
  </appSettings>
</configuration>
```
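Since the phpMyAdmin download can arrive incomplete, it may be worth checking the file before starting a long mailing run. This is a minimal sketch, assuming that a truncated download ends mid-element (so `XDocument.Load` throws an `XmlException`) and that each post row is a `<table name="wp_posts">` element; `ExportCheck.CountPostRows` is a hypothetical helper. Comparing the returned count with `SELECT COUNT(*) FROM wp_posts` in phpMyAdmin gives a rough completeness check.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class ExportCheck
{
    // Hypothetical helper: number of <table name="wp_posts"> rows in a phpMyAdmin XML export,
    // or -1 if the file is not well-formed XML (e.g. the download stopped mid-element).
    public static int CountPostRows(TextReader xml)
    {
        try
        {
            XDocument doc = XDocument.Load(xml);
            return doc.Descendants("table").Count(t => (string)t.Attribute("name") == "wp_posts");
        }
        catch (XmlException ex)
        {
            Console.WriteLine("Invalid XML at line " + ex.LineNumber + ": " + ex.Message);
            return -1;
        }
    }
}
```

For the real file, something like `using (var reader = File.OpenText(path)) Console.WriteLine(ExportCheck.CountPostRows(reader));` with `path` set to the configured xml_path.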
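III. Possible improvements
- At the top of the mail body I added the post's details (creation date, original categories/tags, and a link to the original post).
- I didn't turn the categories into note tags. You could change the code to append the tags to the subject (for Evernote, add #tag1 #tag2 and so on; look up the exact rules yourself).
- The one problem I haven't solved is that SMTP often stops responding at random. If you find a way around it (for example, catching the IOException and reconnecting), please let me know.
- Gmail's SMTP has a sending limit (roughly a thousand messages a day, I believe), so if you exceed it you may need to switch to another account and try again. You could also improve the program to keep an array of SMTP accounts and pick the next one when a quota error occurs.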
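Two of the ideas above can be sketched together: a subject line that carries tags, and a sender that reconnects once and then moves on to the next SMTP account. This is a minimal sketch under stated assumptions: the `title @notebook #tag` subject syntax follows Evernote's mail-in convention as I understand it (check Evernote's current rules; Wiz may differ); a fresh `SmtpClient` per attempt stands in for "reconnecting"; and any `SmtpException` that survives one retry is treated as a reason to switch accounts, because the code does not try to recognize Gmail's specific quota error. `SmtpAccount`, `FailoverMailer` and `EvernoteSubject` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Mail;

// Hypothetical: one set of SMTP credentials.
class SmtpAccount
{
    public string Host;
    public string User;      // also used as the From address
    public string Password;
}

static class EvernoteSubject
{
    // "title @Notebook #tag1 #tag2" (assumed Evernote mail-in syntax; tags containing spaces
    // are not handled). notebook is passed as configured, e.g. "@Blog"; empty parts are skipped.
    public static string Build(string title, string notebook, IEnumerable<string> tags)
    {
        var parts = new[] { title, notebook }.Concat(tags.Select(t => "#" + t));
        return string.Join(" ", parts.Where(s => !string.IsNullOrEmpty(s)));
    }
}

// Hypothetical failover sender: each account gets two attempts (the second one with a new
// SmtpClient, i.e. a fresh connection) before the mailer switches to the next account.
class FailoverMailer
{
    readonly IList<SmtpAccount> accounts;
    readonly Action<SmtpAccount, MailMessage> send;
    int current;

    public FailoverMailer(IList<SmtpAccount> accounts, Action<SmtpAccount, MailMessage> send = null)
    {
        this.accounts = accounts;
        this.send = send ?? new Action<SmtpAccount, MailMessage>(SendViaSmtp);
    }

    public int CurrentIndex { get { return current; } }

    // Real transport: a new client per attempt, so a dropped connection is never reused.
    static void SendViaSmtp(SmtpAccount account, MailMessage message)
    {
        using (var client = new SmtpClient(account.Host))
        {
            client.EnableSsl = true;
            client.Credentials = new NetworkCredential(account.User, account.Password);
            client.Send(message);
        }
    }

    // Returns false once every remaining account has failed.
    public bool Send(MailMessage message)
    {
        while (current < accounts.Count)
        {
            SmtpAccount account = accounts[current];
            message.From = new MailAddress(account.User);
            for (int attempt = 1; attempt <= 2; attempt++)
            {
                try
                {
                    send(account, message);
                    return true;
                }
                catch (SmtpException ex)
                {
                    Console.WriteLine(account.User + ", attempt " + attempt + ": " + ex.Message);
                }
            }
            current++; // e.g. the daily quota is used up; later messages start at the next account
        }
        return false;
    }
}
```

In the export program, `SendMail` could build its subject with `EvernoteSubject.Build` from the post's tag names and call `FailoverMailer.Send` instead of `smtpClient.Send`; a post that still fails should keep being logged with its date so a later run can resume from it.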
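IV. About Wiz (為知)
- I find Wiz a Chinese note-taking app with plenty of room for improvement and a fairly high degree of customizability. If you use Wiz, be sure to encrypt any entries you don't want others to see: Wiz stores notes on disk as HTML files in a folder structure, so there is no secrecy at all, and anyone who knows the folder can open every post without logging in.
- If you don't have a Wiz account, here is an invite code: 6d485186 (http://www.wiz.cn/i/6d485186). Reportedly, registering with it gives you a one-month VIP trial; of course, you don't have to use it.
- Compared with Evernote, Wiz's advantages are more storage space for notes and a plugin system that makes it more fun. For example, it can export to CHM, turning all your old posts into one nice CHM file. I also find its search better than Evernote's: the full-text index makes searching very fast. I won't go into more detail; if you're interested, you can look into it yourself.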