Data Models in MongoDB

Reposted article, dated 2014-12-16 23:02, filed under Learning MongoDB

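A document data model describes how data is organized, and a well-chosen model makes the application easier to support. In MongoDB, documents can be modeled in two ways: embedding (embed) and references.

Embedding

MongoDB documents are schemaless, so they can hold many kinds of data structures. The embedded model is also called the denormalized model. In MongoDB, a group of related data can be a document in its own right or one part of a larger document. See the figure in the MongoDB documentation, which shows related data embedded as sub-documents of a single document.

The embedded model stores a group of related data in a single document. The benefit is that the application can complete common reads and writes with fewer query and update operations.

According to the MongoDB documentation, the embedded model is worth considering in the following cases:

When the data has a one-to-one "contains" relationship. In the document below, for example, each person has a contact field that describes how to reach that person.

For a one-to-one relationship like this, embedding makes the data easy to query and update.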

{
    "_id": <ObjectId0>,
    "name": "Wilber",
    "contact": {
        "phone": "12345678",
        "email": "wilber@shanghai.com"
    }
}

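When the relationship is one-to-many, the embedded model is also worth considering. In the document below, for example, the posts field records all the blog posts the user has published.

In this case, if the application frequently looks up a user's blog posts by the user's name, embedding posts as a field is a good choice because it saves many queries.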

{
    "_id": <ObjectId1>,
    "name": "Wilber",
    "contact": {
        "phone": "12345678",
        "email": "wilber@shanghai.com"
    },
    "posts": [
        {
            "title": "Indexes in MongoDB",
            "created": "12/01/2014",
            "link": "www.blog.com"
        },
        {
            "title": "Replication in MongoDB",
            "created": "12/02/2014",
            "link": "www.blog.com"
        },
        {
            "title": "Sharding in MongoDB",
            "created": "12/03/2014",
            "link": "www.blog.com"
        }
    ]
}

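As the description above shows, the embedded model can give the application good read performance, because all the related data can be retrieved in a single database operation. Embedding also allows related data to be updated in one atomic write operation, since everything lives in a single document.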
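To make the single-lookup point concrete, here is a minimal sketch in plain JavaScript. It does not talk to MongoDB: the users array is an in-memory stand-in for a collection, and findUserByName is a hypothetical helper introduced only for this illustration.

```javascript
// In-memory stand-in for a user collection (assumption: no real database is used).
const users = [
  {
    name: "Wilber",
    contact: { phone: "12345678", email: "wilber@shanghai.com" },
    posts: [
      { title: "Indexes in MongoDB", created: "12/01/2014", link: "www.blog.com" },
      { title: "Replication in MongoDB", created: "12/02/2014", link: "www.blog.com" },
      { title: "Sharding in MongoDB", created: "12/03/2014", link: "www.blog.com" }
    ]
  }
];

// Hypothetical helper: one lookup by name returns the user
// together with every embedded post.
function findUserByName(collection, name) {
  return collection.find((doc) => doc.name === name) || null;
}

const wilber = findUserByName(users, "Wilber");

// No second lookup is needed to reach the posts or the contact details.
const postTitles = wilber.posts.map((post) => post.title);
console.log(postTitles, wilber.contact.email);
```

Because the posts and the contact details travel with the user document, one lookup is enough to answer "what has this user published?".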

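However, the embedded model can also introduce problems. Documents may keep growing, which can hurt write performance and lead to data fragmentation. In other words, embedding requires thinking about Document Growth; the MongoDB documentation's description of Document Growth is quoted below. In addition, MongoDB imposes a maximum document size, which also has to be taken into account when embedding.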

Document Growth

Some updates to documents can increase the size of documents. These updates include pushing elements to an array (i.e. $push) and adding new fields to a document. If the document size exceeds the allocated space for that document, MongoDB will relocate the document on disk. Relocating documents takes longer than in place updates and can lead to fragmented storage. Although MongoDB automatically adds padding to document allocations to minimize the likelihood of relocation, data models should avoid document growth when possible.

For instance, if your applications require updates that will cause document growth, you may want to refactor your data model to use references between data in distinct documents rather than a denormalized data model.
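As a rough illustration of document growth, the sketch below pushes posts into an embedded array and watches the approximate size of the document. It rests on two assumptions: the JSON byte length is used only as a crude proxy for the BSON size MongoDB actually stores, and the helper names are hypothetical. The 16 MB constant is MongoDB's documented maximum BSON document size.

```javascript
// MongoDB's documented maximum BSON document size: 16 megabytes.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Assumption: JSON length is only a rough proxy for the stored BSON size.
function approximateSizeInBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Hypothetical helper that builds a small post sub-document.
function makePost(n) {
  return { title: "Post " + n, created: "12/01/2014", link: "www.blog.com" };
}

const user = { name: "Wilber", posts: [] };
const sizeBefore = approximateSizeInBytes(user);

// Each push makes the single user document larger.
for (let i = 0; i < 1000; i++) {
  user.posts.push(makePost(i));
}

const sizeAfter = approximateSizeInBytes(user);
const fractionOfLimit = sizeAfter / MAX_DOCUMENT_BYTES;
console.log(sizeBefore, sizeAfter, fractionOfLimit);
```

If an array like posts can grow without bound, the document keeps getting bigger with every write, which is the situation where moving the posts into their own documents becomes attractive.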

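References

In contrast to the embedded model, the reference model, also called the normalized data model (Normalized data models), expresses relationships between data through references.

The MongoDB documentation illustrates this model with a figure as well. In that model, contact and access are moved out of the user document into separate documents, each of which carries a user_id field that refers back to the user.

Consider the reference model in the following cases:

Embedding often introduces redundant data in exchange for more efficient reads. When the application rarely queries through the embedded structure, or when the gain in read efficiency does not outweigh the problems caused by the redundancy, the reference model is the better choice.
When complex many-to-many relationships need to be modeled. In the familiar student-course-teacher example, references can express the three-way relationship more clearly than embedding, and with far less redundant data.
When complex tree structures need to be modeled.

Here is an interesting example from the MongoDB documentation: a tree of book categories in which Books is the root, Programming is its child, Programming has the children Languages and Databases, and Databases has the children MongoDB and dbm.

Intuitively, a tree like this is represented with parent-child relationships.

With parent references, the tree can be represented as follows: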

db.categories.insert( { _id: "MongoDB", parent: "Databases" } )
db.categories.insert( { _id: "dbm", parent: "Databases" } )
db.categories.insert( { _id: "Databases", parent: "Programming" } )
db.categories.insert( { _id: "Languages", parent: "Programming" } )
db.categories.insert( { _id: "Programming", parent: "Books" } )
db.categories.insert( { _id: "Books", parent: null } )
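With parent references, walking up the tree is a matter of following the parent field repeatedly, while finding children means searching for documents that name a given parent. A minimal sketch over an in-memory copy of the same documents (the array, the Map, and the helper names ancestorsOf and childrenOf are assumptions for illustration, not MongoDB APIs):

```javascript
// In-memory copy of the parent-reference documents.
const categoriesByParent = [
  { _id: "MongoDB", parent: "Databases" },
  { _id: "dbm", parent: "Databases" },
  { _id: "Databases", parent: "Programming" },
  { _id: "Languages", parent: "Programming" },
  { _id: "Programming", parent: "Books" },
  { _id: "Books", parent: null }
];

// Index the documents by _id so each step is a direct lookup.
const parentIndex = new Map(categoriesByParent.map((doc) => [doc._id, doc]));

// Hypothetical helper: follow parent links up to the root.
function ancestorsOf(id) {
  const path = [];
  let current = parentIndex.get(id);
  while (current && current.parent !== null) {
    path.push(current.parent);
    current = parentIndex.get(current.parent);
  }
  return path;
}

// Hypothetical helper: children are the documents that point at this node.
function childrenOf(id) {
  return categoriesByParent
    .filter((doc) => doc.parent === id)
    .map((doc) => doc._id);
}

console.log(ancestorsOf("MongoDB")); // [ 'Databases', 'Programming', 'Books' ]
console.log(childrenOf("Databases")); // [ 'MongoDB', 'dbm' ]
```

In this layout each node stores exactly one reference, so finding a parent is cheap, whereas finding children requires a search over the parent field.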

The same tree can also be represented with child references:

db.categories.insert( { _id: "MongoDB", children: [] } )
db.categories.insert( { _id: "dbm", children: [] } )
db.categories.insert( { _id: "Databases", children: [ "MongoDB", "dbm" ] } )
db.categories.insert( { _id: "Languages", children: [] } )
db.categories.insert( { _id: "Programming", children: [ "Databases", "Languages" ] } )
db.categories.insert( { _id: "Books", children: [ "Programming" ] } )
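With child references the trade-off is reversed: listing a node's descendants is a direct walk over the children arrays, while finding a parent means searching for the document whose array contains the node. A minimal sketch over an in-memory copy of the documents (descendantsOf and parentOf are hypothetical helper names, not MongoDB APIs):

```javascript
// In-memory copy of the child-reference documents.
const categoriesByChildren = [
  { _id: "MongoDB", children: [] },
  { _id: "dbm", children: [] },
  { _id: "Databases", children: ["MongoDB", "dbm"] },
  { _id: "Languages", children: [] },
  { _id: "Programming", children: ["Databases", "Languages"] },
  { _id: "Books", children: ["Programming"] }
];

const childIndex = new Map(categoriesByChildren.map((doc) => [doc._id, doc]));

// Hypothetical helper: breadth-first walk over the children arrays.
function descendantsOf(id) {
  const result = [];
  const queue = [...childIndex.get(id).children];
  while (queue.length > 0) {
    const childId = queue.shift();
    result.push(childId);
    queue.push(...childIndex.get(childId).children);
  }
  return result;
}

// Hypothetical helper: the parent is the document whose array lists this node.
function parentOf(id) {
  const parent = categoriesByChildren.find((doc) => doc.children.includes(id));
  return parent ? parent._id : null;
}

console.log(descendantsOf("Books"));
// [ 'Programming', 'Databases', 'Languages', 'MongoDB', 'dbm' ]
console.log(parentOf("dbm")); // Databases
```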

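MongoDB offers two ways to implement references: manual references (Manual references) and DBRefs.

Manual references

As in the earlier one-to-many example, we can store the name field of the user document in each post document to link the two. The data we want can then be retrieved with more than one query. This kind of reference is simple and covers most needs.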

user document

{
    "name": "Wilber",
    "gender": "Male",
    "birthday": "1987-09",
    "contact": {
        "phone": "12345678",
        "email": "wilber@shanghai.com"
    }
}

post documents

{
    "title": "Indexes in MongoDB",
    "created": "12/01/2014",
    "link": "www.blog.com",
    "author": "Wilber"
}

{
    "title": "Replication in MongoDB",
    "created": "12/02/2014",
    "link": "www.blog.com",
    "author": "Wilber"
}

{
    "title": "Sharding in MongoDB",
    "created": "12/03/2014",
    "link": "www.blog.com",
    "author": "Wilber"
}

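Note that the one drawback of a manual reference is that it does not say which database or which collection the referenced document lives in. When documents in one collection refer to documents in several other collections, DBRefs may be worth considering.

For example, suppose a user can publish posts on several blog platforms, and the data for each platform is stored in its own collection. DBRefs are convenient in this situation.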

user document

{
    "name": "Wilber",
    "gender": "Male",
    "birthday": "1987-09",
    "contact": {
        "phone": "12345678",
        "email": "wilber@shanghai.com"
    }
}

post documents, spread across the Post4CNblog, Post4CSDN, and Post4ITeye collections

{
    "title": "Indexes in MongoDB",
    "created": "12/01/2014",
    "link": "www.blog.com",
    "author": "Wilber"
}

{
    "title": "Replication in MongoDB",
    "created": "12/02/2014",
    "link": "www.blog.com",
    "author": "Wilber"
}

{
    "title": "Sharding in MongoDB",
    "created": "12/03/2014",
    "link": "www.blog.com",
    "author": "Wilber"
}

{
    "title": "Notepad++ configuration",
    "created": "12/05/2014",
    "link": "www.blog.com",
    "author": "Wilber"
}

To look up the details of the user who published "Replication in MongoDB" on CNblog, we can use the statements below, which take two queries:

> db.Post4CNblog.find({"title": "Replication in MongoDB"})
{ "_id" : ObjectId("548fe8100c3e84a00806a48f"), "title" : "Replication in MongoDB", "created" : "12/02/2014", "link" : "www.blog.com", "author" : "Wilber" }
> db.user.find({"name":"Wilber"}).toArray()
[
        {
                "_id" : ObjectId("548fe8100c3e84a00806a48d"),
                "name" : "Wilber",
                "gender" : "Male",
                "birthday" : "1987-09",
                "contact" : {
                        "phone" : "12345678",
                        "email" : "wilber@shanghai.com"
                }
        }
]
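The two-step lookup can be wrapped in a small helper. The sketch below mirrors the two shell queries over in-memory arrays; the arrays stand in for the user and Post4CNblog collections, and findAuthorOfPost is a hypothetical name introduced for this illustration.

```javascript
// In-memory stand-ins for the user and Post4CNblog collections.
const userCollection = [
  {
    name: "Wilber",
    gender: "Male",
    birthday: "1987-09",
    contact: { phone: "12345678", email: "wilber@shanghai.com" }
  }
];

const post4CNblog = [
  {
    title: "Replication in MongoDB",
    created: "12/02/2014",
    link: "www.blog.com",
    author: "Wilber"
  }
];

// Hypothetical helper: resolve a manual reference in two steps.
// The caller has to know which collection the author name points into.
function findAuthorOfPost(posts, users, title) {
  const post = posts.find((doc) => doc.title === title); // first lookup
  if (!post) return null;
  return users.find((doc) => doc.name === post.author) || null; // second lookup
}

const authorDetails = findAuthorOfPost(post4CNblog, userCollection, "Replication in MongoDB");
console.log(authorDetails.contact.email);
```

Nothing in the post document itself says that author refers to the user collection; that knowledge lives in the calling code.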

DBRefs

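A DBRef links documents by _id, collection name, and (optionally) database name. This way, documents can be linked easily even when they are spread across several collections.

DBRefs have a specific format and contain the following fields:

$ref: the name of the collection that holds the referenced document
$id: the value of the _id field of the referenced document
$db (optional): the name of the database that holds the referenced document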
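
These three fields are enough to locate the referenced document. As a minimal sketch of how such a reference can be resolved, the snippet below models databases and collections as plain in-memory objects. The database name blog, the databases object, and the resolveRef helper are assumptions made for illustration and are not part of MongoDB.

```javascript
// In-memory stand-in for a server: database name -> collection name -> documents.
// The database name "blog" is hypothetical.
const databases = {
  blog: {
    user: [
      {
        _id: "Wilber",
        gender: "Male",
        birthday: "1987-09",
        contact: { phone: "12345678", email: "wilber@shanghai.com" }
      }
    ]
  }
};

// Hypothetical helper: resolve a DBRef-shaped object { $ref, $id, $db }.
// $db is optional, so the caller's current database is the fallback.
function resolveRef(ref, currentDbName) {
  const database = databases[ref.$db || currentDbName];
  if (!database) return null;
  const collection = database[ref.$ref];
  if (!collection) return null;
  return collection.find((doc) => doc._id === ref.$id) || null;
}

// Without $db, the reference is resolved in the current database.
const author = resolveRef({ $ref: "user", $id: "Wilber" }, "blog");

// With $db, the reference carries its own database name.
const missing = resolveRef({ $ref: "user", $id: "Nobody", $db: "blog" }, "other");

console.log(author.contact.email, missing);
```

Unlike a manual reference, the reference object itself names the collection (and optionally the database), so the resolving code needs no outside knowledge of where the target lives.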

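As an example, here is the scenario above implemented with DBRefs. Note that the user name now has to be stored as the _id field of the user document.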

user document

{
    "_id": "Wilber",
    "gender": "Male",
    "birthday": "1987-09",
    "contact": {
        "phone": "12345678",
        "email": "wilber@shanghai.com"
    }
}

post documents, spread across the Post4CNblog, Post4CSDN, and Post4ITeye collections

{
    "title": "Indexes in MongoDB",
    "created": "12/01/2014",
    "link": "www.blog.com",
    "author": {"$ref": "user", "$id": "Wilber"}
}

{
    "title": "Replication in MongoDB",
    "created": "12/02/2014",
    "link": "www.blog.com",
    "author": {"$ref": "user", "$id": "Wilber"}
}

{
    "title": "Sharding in MongoDB",
    "created": "12/03/2014",
    "link": "www.blog.com",
    "author": {"$ref": "user", "$id": "Wilber"}
}

{
    "title": "Notepad++ configuration",
    "created": "12/05/2014",
    "link": "www.blog.com",
    "author": {"$ref": "user", "$id": "Wilber"}
}

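The same lookup, the details of the user who published "Replication in MongoDB" on CNblog, can now be written as a single statement. Note that the shell still issues a second query behind the scenes when fetch() resolves the reference.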

> db.Post4CNblog.findOne({"title":"Replication in MongoDB"}).author.fetch()
{
        "_id" : "Wilber",
        "gender" : "Male",
        "birthday" : "1987-09",
        "contact" : {
                "phone" : "12345678",
                "email" : "wilber@shanghai.com"
        }
}
>

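Summary

This article gave an overview of data models in MongoDB. Neither the embedded model nor the reference model is better in general; the right choice depends on the use case.

Also, when using the embedded model, always keep Document Growth and the maximum document size in mind.

P.S. All the commands used in the examples can be found at the following link: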

http://files.cnblogs.com/wilber2013/data_modeling.js
