Custom serialization in MongoDB (Customizing serialization)

Reposted from the original post, 2012-03-01 00:11, c# / MongoDB

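I have been digging into MongoDB lately and picked up a few small lessons. I also just discovered that cnblogs supports Live Writer.

I am thrilled to finally be back here after all these years. I used to write on MySpace and WordPress with Live Writer,

but the former is gone and the latter is a hassle to reach from behind the firewall.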

====================================================

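First, let me recommend a query analyzer for MongoDB:

MongoVUE

The tool is very handy, and it keeps working after the trial period ends.

The only restriction is that you can open at most three query windows.

I used db4o and protobuf.net for a long time, so MongoDB felt familiar right away.

They have a lot in common, especially in how objects are persisted; only the details differ slightly.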

=============================================

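1. Requirement:

A new algorithm of mine has to read an entire collection, and that read took tens of seconds.

At first I relied on attribute-driven automatic serialization and deserialization, and no amount of tuning improved the performance of InsertBatch or FindAll().

2. Analysis:

At first I assumed the painfully slow reads and writes were MongoDB's own fault. After thinking it over carefully today, I realized the real problem is that far too much data is written to disk.

MongoDB persists data as BSON, and that format carries a lot of redundancy, especially with the default serialization and deserialization.

"_id" : ObjectId("4f4e2a02c992571e54c30465"),
"value" : "xxxxx",
"chars" : [{
    "words" : [{
        "index" : 0,
        "length" : 2,
        "wordTypes" : 0
    }]
}, {
    "words" : [{
        "index" : 0,
        "length" : 2,
        "wordTypes" : 0
    }, {
        "index" : 1,
        "length" : 2,
        "wordTypes" : 0
    }]
},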

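Inspecting the stored data in MongoVUE showed that most of the space goes to property names that carry little meaning. By my rough count, the names take almost 5 to 10 times as much space as the values.

protobuf, by contrast, identifies fields by number instead of by name, which saves a great deal of space.

But MongoDB can query on fields and protobuf cannot, so MongoDB does not take protobuf's approach.

One of my collections holds 50,000 documents averaging 4,000 bytes each, which is a startlingly inefficient way to persist data. No wonder reading it takes tens of seconds. The whole data set occupies 200 MB.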
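To make the name overhead concrete, here is a rough C# sketch that adds up the BSON bytes for a single word entry stored the default way, compared with a single 32-bit integer in an array. It assumes all three properties are stored as 32-bit integers and follows the element layout in the BSON specification (one type byte, a NUL-terminated name, then the value). BsonSizeSketch and its methods are hypothetical helper names, not part of the driver.

```csharp
using System;
using System.Text;

static class BsonSizeSketch
{
    // One BSON element = 1 type byte + name as NUL-terminated UTF-8 + value bytes.
    static int ElementSize(string name, int valueBytes) =>
        1 + Encoding.UTF8.GetByteCount(name) + 1 + valueBytes;

    // One embedded document = 4-byte length prefix + elements + 1 terminating NUL.
    static int DocumentSize(params int[] elementSizes)
    {
        int total = 4 + 1;
        foreach (int size in elementSizes) total += size;
        return total;
    }

    // A word stored as { index, length, wordTypes } inside an array.
    public static int NamedWordBytes()
    {
        int doc = DocumentSize(
            ElementSize("index", 4),
            ElementSize("length", 4),
            ElementSize("wordTypes", 4));
        // Array members are elements too; their name is the position, e.g. "0".
        return ElementSize("0", doc);
    }

    // The same word stored as one int32 array member.
    public static int PackedWordBytes() => ElementSize("0", 4);

    static void Main()
    {
        Console.WriteLine(NamedWordBytes());   // 46
        Console.WriteLine(PackedWordBytes());  // 7
    }
}
```

Under those assumptions a named word costs 46 bytes and a bare integer costs 7 (array positions of 10 and above need an extra digit in the name). This counts only the innermost level and ignores the ObjectId and the wrapper documents around each character.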

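I had read MongoDB's official documentation

http://www.mongodb.org/display/DOCS/CSharp+Language+Center

so I vaguely remembered the section on Customizing serialization.

The official documentation is very terse. It only says the class should implement the IBsonSerializable interface and its four methods (Deserialize, GetDocumentId, Serialize and SetDocumentId). There is no example, so it is far from obvious how to proceed in practice.

    public class MyClass : IBsonSerializable
    {
        // implement Deserialize method
        // implement Serialize method
    }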

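Fortunately, the almighty Google is there to help.

Stack Overflow is a great site:

http://stackoverflow.com/questions/7105274/storing-composite-nested-object-graph

3. Solution:

Part 1: pack the object into a single number, saving the space taken by names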

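        // Bit layout: [31..24] WordTypes, [23..16] Index, [15..8] Length, [7..0] reserved
        public UInt32 IntValue
        {
            get
            {
                var v1 = ((UInt32)WordTypes) << 24;
                var v2 = ((UInt32)Index) << 16;
                var v3 = ((UInt32)Length) << 8;
                var v4 = (UInt32)0; // reserved

                return v1 | v2 | v3 | v4;
            }
        }

        // Inverse of IntValue: shift left to drop the higher fields, then right to isolate 8 bits
        public void FromInt32(UInt32 value)
        {
            this.WordTypes = (WordTypes)(value >> 24);
            this.Index = (Int32)(value << 8 >> 24);
            this.Length = (Int32)(value << 16 >> 24);
        }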

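There is not much to explain here: it is nothing more than left and right shifts. Each field gets only 8 bits, so a value above 255 overflows into the neighboring field. If that can happen in your data, switch to Int64 or adjust the layout accordingly.

To be clear, I do not intend to query the fields of this third-level object in MongoDB. It is there for storage only, and lookups go through a separate keyword string instead.

Since no querying is needed, the properties need no names at all, so several properties can be OR-ed together into one number and stored in an array. Even the wrapping object goes away.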
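As a hedged sketch of the same idea outside the class, the packing can be written as a pair of static helpers with a range check, so an out-of-range value fails loudly instead of silently spilling into the neighboring field. The WordTypes members and the WordPacking class are made up for illustration. Only the 8-bits-per-field layout comes from the code above.

```csharp
using System;

enum WordTypes { None = 0, Noun = 1, Verb = 2 }   // hypothetical members

static class WordPacking
{
    // Layout: [31..24] type, [23..16] index, [15..8] length, [7..0] reserved.
    public static uint Pack(WordTypes type, int index, int length)
    {
        // Casting a negative int to uint yields a huge value, so negatives are rejected too.
        if ((uint)type > 0xFF || (uint)index > 0xFF || (uint)length > 0xFF)
            throw new ArgumentException("each field must fit in 8 bits");
        return ((uint)type << 24) | ((uint)index << 16) | ((uint)length << 8);
    }

    public static void Unpack(uint value, out WordTypes type, out int index, out int length)
    {
        type = (WordTypes)(value >> 24);
        index = (int)((value >> 16) & 0xFF);
        length = (int)((value >> 8) & 0xFF);
    }

    static void Main()
    {
        uint packed = Pack(WordTypes.Verb, 1, 2);
        Console.WriteLine(packed.ToString("X8"));   // 02010200

        WordTypes type; int index; int length;
        Unpack(packed, out type, out index, out length);
        Console.WriteLine(type + " " + index + " " + length);   // Verb 1 2
    }
}
```

Masking with 0xFF after a single right shift is equivalent to the shift-left-then-right form used above. It is just a matter of taste.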

Part 2

    // Namespaces this class needs (assuming the legacy 1.x C# driver)
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using MongoDB.Bson;
    using MongoDB.Bson.IO;
    using MongoDB.Bson.Serialization;

    public partial class Sentence : IBsonSerializable
    {
        public static int idSum;

        public bool GetDocumentId(out object id, out Type idNominalType, out IIdGenerator idGenerator)
        {
            id = this.Id = idSum++;
            idNominalType = typeof(int);
            idGenerator = null;
            return true;
        }

        public void Serialize(MongoDB.Bson.IO.BsonWriter bsonWriter, Type nominalType, IBsonSerializationOptions options)
        {
            bsonWriter.WriteStartDocument();
            bsonWriter.WriteInt32("_id", this.Id);        // an ObjectId here would cost 12 bytes
            bsonWriter.WriteString("value", this.Value);  // shortening every name to a letter or two would save another dozen or so bytes
            bsonWriter.WriteString("words", this.WordStr);
            bsonWriter.WriteBoolean("isConf", this.IsConflict);
            bsonWriter.WriteStartArray("c");

            // One inner array of packed integers per character
            foreach (var item in Chars)
            {
                BsonSerializer.Serialize(bsonWriter, item.Words.Select(v => v.IntValue).ToList());
            }

            bsonWriter.WriteEndArray();
            bsonWriter.WriteEndDocument();
        }

        public void SetDocumentId(object id)
        {
            throw new NotImplementedException();
        }

        public object Deserialize(MongoDB.Bson.IO.BsonReader bsonReader, Type nominalType, IBsonSerializationOptions options)
        {
            // Earlier attempt at reading field by field (it threw, so it stays commented out):
            //bsonReader.ReadStartDocument();
            //this.Id = bsonReader.ReadInt32();
            //var value=bsonReader.ReadString("v");
            //var wordStr=bsonReader.ReadString("w");
            //bsonReader.ReadStartArray();
            //var list = new List<List<Int32>>();
            //while (bsonReader.ReadBsonType() != BsonType.EndOfDocument)
            //{
            //    var element = BsonSerializer.Deserialize<List<Int32>>(bsonReader);
            //    list.Add(element);
            //}
            //bsonReader.ReadEndArray();
            //var isConflict=bsonReader.ReadBoolean("i");
            //bsonReader.ReadEndDocument();

            if (nominalType != typeof(Sentence))
                throw new ArgumentException("Cannot deserialize: the nominal type does not match");

            // Read the whole document at once, then pick values out by name
            var doc = BsonDocument.ReadFrom(bsonReader);

            this.Id = (Int32)doc["_id"];
            this.Value = (string)doc["value"];
            this.WordStr = (string)doc["words"];
            this.IsConflict = (bool)doc["isConf"];
            var list = (BsonArray)doc["c"];

            // Rebuild the object graph and re-bind the back-references to this sentence
            this.Chars = new List<CharObj>();
            for (int i = 0; i < list.Count; i++)
            {
                var ch = new CharObj { Index = i, Sen = this, Words = new List<WordObj>() };
                this.Chars.Add(ch);

                var words = (BsonArray)list[i];

                foreach (Int32 item in words)
                {
                    var wordObj = new WordObj((UInt32)item);
                    wordObj.Sen = this;
                    ch.Words.Add(wordObj);
                }
            }

            return this;
            //return new Sentence { Id=1,  IsConflict= true, Value="1", WordStr= "1"};
        }
    }

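A few points deserve attention:

First, Id generation. I do not quite understand why the id method needs such a complicated parameter list, but it lets me bypass ObjectId's GUID-style ids, and an int id saves some space.

Of course, if the objects are fairly large overall, just use ObjectId. There is no real need for int, which has plenty of problems of its own: the current maximum has to be stored in another collection, and an int id is not unique across collections the way an ObjectId is. So MongoDB's choice of ObjectId over int for ids makes a lot of sense. For large objects, saving those 8 bytes (a 12-byte ObjectId versus a 4-byte int) is not worth the trouble.

Second, the Serialize implementation must begin with bsonWriter.WriteStartDocument() and end with bsonWriter.WriteEndDocument(). Do not forget this, or the writer throws an error saying it cannot write.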

Third, how to write the second-level collections. My first attempt looked like this:

            foreach (var item in Chars)
            {
                bsonWriter.WriteStartArray("words");
                foreach (var w in item.Words)
                    bsonWriter.WriteInt32((Int32)w.IntValue);
                bsonWriter.WriteEndArray();
            }

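But writing the nested arrays this way fails. BSON itself does allow arrays inside arrays, so the likely cause is that elements inside an array are written without names, and WriteStartArray("words") tries to write a name while the writer is inside the "c" array.

It has to be changed to:

            foreach (var item in Chars)
            {
                BsonSerializer.Serialize(bsonWriter, item.Words.Select(v => v.IntValue).ToList());
            }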

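Note that although the value handed to BsonSerializer.Serialize is an IEnumerable<T>, you must call ToList on it first. Otherwise the data is not saved correctly.

Fourth, deserialization cannot simply use the Start/End read calls, which always threw for me. The only approach that worked was to read the whole document in one go and then pull the values out by key.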
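On the ToList remark in the third point: I have not confirmed the cause in the driver source, but one plausible explanation is that Select returns a lazy iterator rather than a List<T>, so both its runtime type and its contents can differ from what ToList produces. The sketch below shows only that difference in plain LINQ, with no MongoDB calls. ToListSketch and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ToListSketch
{
    // Returns { items seen by the lazy query, items in the ToList snapshot }.
    public static int[] CountsAfterSourceGrows()
    {
        var source = new List<uint> { 1, 2, 3 };
        IEnumerable<uint> lazy = source.Select(v => v * 2);  // nothing is computed yet
        List<uint> snapshot = lazy.ToList();                 // computed and copied now
        source.Add(4);                                       // only the lazy query sees this
        return new[] { lazy.Count(), snapshot.Count };
    }

    // The object returned by Select is an iterator, not a List<T>.
    public static bool SelectReturnsList()
    {
        IEnumerable<uint> lazy = new List<uint> { 1 }.Select(v => v * 2);
        return lazy is List<uint>;
    }

    static void Main()
    {
        int[] counts = CountsAfterSourceGrows();
        Console.WriteLine(counts[0]);             // 4
        Console.WriteLine(counts[1]);             // 3
        Console.WriteLine(SelectReturnsList());   // False
    }
}
```

A serializer that is looked up by runtime type would therefore see something other than a list unless ToList is called first.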

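4. Comparison

The new BSON layout is far more compact.

Compared with the original, the difference is striking.

MongoVUE now reports an average document size of only 364 bytes, down from a frightening 4,000.

The total size dropped from 200 MB to 17 MB.

On my laptop the read now takes about 9 seconds, down from more than 40. On a desktop with a faster disk it is several times quicker still and finishes within a few seconds.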
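As a quick sanity check on those numbers, the sketch below multiplies the document count by the two average sizes and converts the result to mebibytes. SizeCheck and Mebibytes are hypothetical names, and the inputs are the figures quoted above.

```csharp
using System;
using System.Globalization;

static class SizeCheck
{
    // Total size in MiB for a collection of equally sized documents.
    public static double Mebibytes(long documents, long avgBytes) =>
        documents * avgBytes / (1024.0 * 1024.0);

    static void Main()
    {
        var inv = CultureInfo.InvariantCulture;
        Console.WriteLine(Mebibytes(50000, 4000).ToString("F1", inv));  // 190.7
        Console.WriteLine(Mebibytes(50000, 364).ToString("F1", inv));   // 17.4
        Console.WriteLine((4000.0 / 364.0).ToString("F1", inv));        // 11.0
    }
}
```

Both totals line up with what MongoVUE showed (roughly 200 MB before and 17 MB after), which works out to an 11-fold reduction in stored size against a speed-up of a bit more than 4 times on the laptop.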

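5. Other notes

Why implement custom persistence at all? The first reason is, of course, the worrying performance. The second is re-binding the pointers that link objects to one another.

Previously, data read from the database needed its cross-references restored by hand. Now that work can be done directly inside the deserialization method.

In other words, objects come back from a query already identical to their in-memory counterparts.

The benefit is a large reduction in program complexity.

MongoDB data objects can be navigated by reference just like in-memory objects, and then persisted automatically.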
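A minimal sketch of that re-binding, without any driver calls: given the stored shape (one list of packed integers per character), rebuild the objects and set every back-reference in a single pass. The classes ending in Sketch are simplified, hypothetical stand-ins for the Sentence, CharObj and WordObj types used above.

```csharp
using System;
using System.Collections.Generic;

class WordSketch { public uint Packed; public SentenceSketch Sen; }

class CharSketch
{
    public int Index;
    public SentenceSketch Sen;
    public List<WordSketch> Words = new List<WordSketch>();
}

class SentenceSketch
{
    public List<CharSketch> Chars = new List<CharSketch>();

    // Rebuilds the graph and points every child back at its owning sentence.
    public static SentenceSketch FromStored(IList<IList<uint>> stored)
    {
        var sen = new SentenceSketch();
        for (int i = 0; i < stored.Count; i++)
        {
            var ch = new CharSketch { Index = i, Sen = sen };
            foreach (uint packed in stored[i])
                ch.Words.Add(new WordSketch { Packed = packed, Sen = sen });
            sen.Chars.Add(ch);
        }
        return sen;
    }
}

static class Program
{
    static void Main()
    {
        // Same shape as the sample document: one word, then two words.
        var stored = new List<IList<uint>>
        {
            new List<uint> { 0x00000200 },
            new List<uint> { 0x00000200, 0x00010200 },
        };
        var sen = SentenceSketch.FromStored(stored);
        Console.WriteLine(sen.Chars[1].Words.Count);                         // 2
        Console.WriteLine(ReferenceEquals(sen.Chars[1].Words[0].Sen, sen));  // True
    }
}
```

Because the links are restored where the data is read, calling code never sees a half-connected object.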

Well. I think I have fallen in love with MongoDB, even though it still has quite a few shortcomings.
