MongoDB Database: Query Methods

Reposted article, 2019-06-17 15:40. Tags: MongoDB aggregation / MongoDB / MongoDB pipeline

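Continuing from the previous post, this one collects MongoDB's query methods.

4 Basic Queries

You can run ad hoc queries against the database with the find or findOne functions.
You can query for ranges, set inclusion, and inequalities, and use $-conditionals for more.
A query returns a database cursor, which returns documents as you need them.
You can perform many metaoperations on a cursor, including skipping a given number of results, limiting the number of results returned, and sorting the results.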

4.1 find

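# The first argument to find is the query criteria
db.users.find()  # matches every document in the collection
db.users.find({"age": 27})

# Adding more key/value pairs combines the conditions as *condition1* AND *condition2* AND ... AND *conditionN*
db.users.find({"username": "joe", "age": 27})

Specifying which keys to return

The second argument to find (or findOne) specifies the keys to return. The "_id" key is returned by default even when it is not listed. You can also name key/value pairs to exclude.

# "_id" is returned by default
> db.users.find({}, {"username": 1, "email": 1})

# Result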
{
    "_id" : ObjectId("4ba0f0dfd22aa494fd523620"),
    "username" : "joe",
    "email" : "joe@example.com"
}

# Suppress the "_id" key
> db.users.find({}, {"username": 1, "email": 1, "_id": 0})

# Result
{
    "username" : "joe",
    "email" : "joe@example.com"
}

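Limitations

As far as the database is concerned, the value in a query document must be a constant; it cannot refer to the value of another key in the same document. For example, if you track inventory with the keys "in_stock" and "num_sold", you cannot compare the two with a query like this: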

> db.stock.find({"in_stock" : "this.num_sold"})  // doesn't work

4.2 Query Criteria

Query conditionals

Comparison operators: "$lt", "$lte", "$gt", "$gte", "$ne"

> db.users.find({"age" : {"$gte" : 18, "$lte" : 30}})

> db.users.find({"username" : {"$ne" : "joe"}})

OR queries

MongoDB has two ways to do an OR query. "$in" checks one key against several values; "$or" combines conditions across multiple keys.

# One key, values of a single type
> db.raffle.find({"ticket_no" : {"$in" : [725, 542, 390]}})

# One key, values of different types
> db.users.find({"user_id" : {"$in" : [12345, "joe"]}})

# "$nin" is the opposite of "$in"
> db.raffle.find({"ticket_no" : {"$nin" : [725, 542, 390]}})

# Multiple keys
> db.raffle.find({"$or" : [{"ticket_no" : 725}, {"winner" : true}]})

# Multiple keys, combined with other conditionals
> db.raffle.find({"$or" : [{"ticket_no" : {"$in" : [725, 542, 390]}}, {"winner" : true}]})

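$not

"$not" is a metaconditional: it can be applied on top of any other condition. Take the modulus operator "$mod", which divides the queried value by the first value given and matches when the remainder equals the second value:

> db.users.find({"id_num" : {"$mod" : [5, 1]}})

The query above returns users whose "id_num" is 1, 6, 11, 16, and so on. To return users whose "id_num" is 2, 3, 4, 5, 7, 8, 9, 10, 12, and so on instead, use "$not":

> db.users.find({"id_num" : {"$not" : {"$mod" : [5, 1]}}})

"$not" is especially useful with regular expressions, to find every document that does not match a given pattern.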
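The "$mod" and "$not" matching rule can be mimicked in plain JavaScript. This is only a sketch of the rule, not MongoDB's implementation; the helper name matchesMod and the sample idNums array are made up for illustration.

```javascript
// Hypothetical helper: mirrors {"$mod" : [divisor, remainder]} for one number.
function matchesMod(value, divisor, remainder) {
  return value % divisor === remainder;
}

// Sample id_num values (illustrative only).
const idNums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];

// Equivalent of {"id_num" : {"$mod" : [5, 1]}}
const modMatches = idNums.filter((n) => matchesMod(n, 5, 1));

// Equivalent of {"id_num" : {"$not" : {"$mod" : [5, 1]}}}
const notMatches = idNums.filter((n) => !matchesMod(n, 5, 1));

console.log(modMatches); // [ 1, 6, 11 ]
console.log(notMatches); // [ 2, 3, 4, 5, 7, 8, 9, 10, 12 ]
```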

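Rules for conditionals

Compare query documents with update modifiers and you will see that the $-prefixed keys sit in different positions. A conditional is a key of the inner document, while a modifier is a key of the outer document. Several conditions can be applied to one key, but one key cannot be changed by several update modifiers.

# Valid query: two conditions on "age"
> db.users.find({"age" : {"$lt" : 30, "$gt" : 20}})

# Invalid update: it modifies "age" twice
> db.users.update({}, {"$inc" : {"age" : 1}, "$set" : {"age" : 40}})

A few metaoperators do go in the outer document, however: "$and", "$or", and "$nor":

> db.users.find({"$and" : [{"x" : {"$lt" : 1}}, {"x" : 4}]})

The two conditions look contradictory, but both can hold when x is an array: {"x" : [0, 4]} matches.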
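Why an array can satisfy both conditions is easier to see in a small JavaScript sketch. It assumes a simplified rule (a condition on an array field holds if any element satisfies it) and ignores other cases MongoDB handles; fieldMatches and matchesQuery are hypothetical names.

```javascript
// Hypothetical helper: a condition on a field holds if the value satisfies it,
// or, when the value is an array, if ANY element satisfies it.
function fieldMatches(value, predicate) {
  return Array.isArray(value) ? value.some(predicate) : predicate(value);
}

// Mirrors {"$and" : [{"x" : {"$lt" : 1}}, {"x" : 4}]} for one document.
function matchesQuery(doc) {
  return fieldMatches(doc.x, (v) => v < 1) && fieldMatches(doc.x, (v) => v === 4);
}

console.log(matchesQuery({ x: [0, 4] })); // true: 0 satisfies $lt, 4 satisfies equality
console.log(matchesQuery({ x: 4 }));      // false: a single value cannot do both
console.log(matchesQuery({ x: 0 }));      // false
```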

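4.3 Type-Specific Queries

null

null behaves a little strangely. It matches itself, but it also matches "does not exist", so querying for a key with the value null also returns every document that lacks that key.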

> db.c.find()

{ "_id" : ObjectId("4ba0f0dfd22aa494fd523621"), "y" : null }
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523622"), "y" : 1 }
{ "_id" : ObjectId("4ba0f148d22aa494fd523623"), "y" : 2 }

> db.c.find({"y" : null})

{ "_id" : ObjectId("4ba0f0dfd22aa494fd523621"), "y" : null }

> db.c.find({"z" : null})

{ "_id" : ObjectId("4ba0f0dfd22aa494fd523621"), "y" : null } 
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523622"), "y" : 1 }
{ "_id" : ObjectId("4ba0f148d22aa494fd523623"), "y" : 2 }

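To find only keys whose value is null, check that the key is null and also use "$exists" to check that it exists, as shown below. The source material notes that there is no "$eq" operator (one was added later, in MongoDB 3.0), but "$in" with a single element is equivalent.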

> db.c.find({"z" : {"$in" : [null], "$exists" : true}})
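The difference between the two null queries can be sketched in plain JavaScript over the three sample documents. The helper names matchesNull and matchesNullAndExists are assumptions for illustration, and the sketch only covers null versus a missing key.

```javascript
// Sample documents from the example above (the _id values are omitted).
const docs = [{ y: null }, { y: 1 }, { y: 2 }];

// Mirrors {key : null}: matches an explicit null AND a missing key.
function matchesNull(doc, key) {
  return doc[key] === null || !(key in doc);
}

// Mirrors {key : {"$in" : [null], "$exists" : true}}: the key must be present.
function matchesNullAndExists(doc, key) {
  return key in doc && doc[key] === null;
}

console.log(docs.filter((d) => matchesNull(d, "y")).length);          // 1
console.log(docs.filter((d) => matchesNull(d, "z")).length);          // 3
console.log(docs.filter((d) => matchesNullAndExists(d, "z")).length); // 0
console.log(docs.filter((d) => matchesNullAndExists(d, "y")).length); // 1
```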

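Regular expressions

MongoDB uses the $regex operator to match strings against a regular expression.
MongoDB uses the Perl Compatible Regular Expression (PCRE) library for matching, so any regular expression syntax that PCRE allows can be used in MongoDB. It is a good idea to check a pattern in the JavaScript shell first to make sure it matches what you expect.

# Document structure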
{
   "post_text": "enjoy the mongodb articles on runoob",
   "tags": [
      "mongodb",
      "runoob"
   ]
}

# Use a regular expression to find posts containing the string runoob
> db.posts.find({post_text:{$regex:"runoob"}})

# The same query can be written as
> db.posts.find({post_text:/runoob/})

# Case-insensitive regular expression ($options: "i")
# With $options: "s", the dot character (.) matches every character, including newlines
> db.posts.find({post_text:{"$regex":"runoob", "$options":"i"}})
# or
> db.posts.find({post_text: /runoob/i})
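As a quick way to check a pattern before using it in a query, the flags can be tried on a JavaScript RegExp first. JavaScript's engine is not PCRE, so this sketch only suits simple patterns; the sample strings are made up.

```javascript
const text = "Enjoy the MongoDB articles on Runoob";

// Case-sensitive pattern: "runoob" does not match "Runoob".
console.log(/runoob/.test(text));  // false

// The i flag (the "i" option in $options) ignores case.
console.log(/runoob/i.test(text)); // true

// The s flag lets "." match newline characters as well.
const multiLine = "mongodb\nrunoob";
console.log(/mongodb.runoob/.test(multiLine));  // false
console.log(/mongodb.runoob/s.test(multiLine)); // true
```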

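Querying embedded documents

There are two ways to query an embedded document: query for the whole document, or query for individual key/value pairs inside it.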

{
    "name" : {
            "first" : "Joe",
            "last" : "Schmoe"
            },
     "age" : 45
}

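It can be queried like this:

> db.people.find({"name" : {"first" : "Joe", "last" : "Schmoe"}})

A query for a whole subdocument, however, must match the subdocument exactly. If Joe adds a middle name, this query stops matching. It is also order-sensitive: {"last" : "Schmoe", "first" : "Joe"} would not match.

It is usually better to query on specific keys of the embedded document. That way a change to the data schema will not suddenly break every query for no longer being an exact match. Use dot notation to query embedded keys:

> db.people.find({"name.first" : "Joe", "name.last" : "Schmoe"})

Matching embedded documents gets trickier as documents become more complex. Suppose we have a number of blog posts and want to find comments by Joe that scored at least 5. A post is structured as follows: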

> db.blog.find()
{
        "content" : "...",
        "comments" : [
            {
                "author" : "joe",
                "score" : 3,
                "comment" : "nice post"
            },
            {
                "author" : "mary",
                "score" : 6,
                "comment" : "terrible post"
            }
    ]
}

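The candidate queries are:

# Wrong: an embedded document must match in full, and this one has no "comment" key
> db.blog.find({"comments" : {"author" : "joe", "score" : {"$gte" : 5}}})

# Wrong: the comment matching the author condition and the one matching the score condition may be different comments
> db.blog.find({"comments.author" : "joe", "comments.score" : {"$gte" : 5}})

# Correct: "$elemMatch" groups the conditions; it is only needed when matching on more than one key of an embedded document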
> db.blog.find({"comments" : {"$elemMatch" : {"author" : "joe", "score" : {"$gte" : 5}}}})
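The difference between the dotted query and "$elemMatch" can be sketched in plain JavaScript against the sample post. The function names dottedMatch and elemMatch are hypothetical, and the sketch models only the matching rule, not how MongoDB evaluates it.

```javascript
// The sample post from above, reduced to the fields the query touches.
const post = {
  comments: [
    { author: "joe", score: 3 },
    { author: "mary", score: 6 },
  ],
};

// Mirrors {"comments.author" : "joe", "comments.score" : {"$gte" : 5}}:
// each condition may be satisfied by a DIFFERENT array element.
function dottedMatch(doc) {
  return (
    doc.comments.some((c) => c.author === "joe") &&
    doc.comments.some((c) => c.score >= 5)
  );
}

// Mirrors {"comments" : {"$elemMatch" : {...}}}:
// one single element must satisfy every condition.
function elemMatch(doc) {
  return doc.comments.some((c) => c.author === "joe" && c.score >= 5);
}

console.log(dottedMatch(post)); // true  (joe wrote one comment, mary scored 6)
console.log(elemMatch(post));   // false (joe's own comment only scored 3)
```

The dotted form therefore returns this post even though Joe never left a comment scoring 5 or more, which is why the "$elemMatch" form is the right one here.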

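5 Aggregation

Once data is stored in MongoDB we can retrieve it, but we may also want to do more analysis on it.

5.1 The Aggregation Framework

The aggregation framework transforms and combines the documents in a collection. In essence, you build a pipeline that processes a stream of documents out of several building blocks: filtering, projecting, grouping, sorting, limiting, and skipping.

For example, given a collection of magazine articles, you might want to find out who the most prolific authors are. Assuming each article is stored as a document in MongoDB, you could build a pipeline in the following steps:

# 1. Project the author out of each article document
{"$project" : {"author" : 1}}

# 2. Group the authors by name and count the documents
{"$group" : {"_id" : "$author", "count" : {"$sum" : 1}}}

# 3. Sort the authors by article count, descending
{"$sort" : {"count" : -1}}

# 4. Limit the results to the first 5
{"$limit" : 5}

# In MongoDB, pass each operation to the aggregate() function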
> db.articles.aggregate( {"$project" : {"author" : 1}},
        {"$group" : {"_id" : "$author", "count" : {"$sum" : 1}}},
        {"$sort" : {"count" : -1}},
        {"$limit" : 5}
        )

# Output: an array of result documents
    {
        "result" : [ 
            {
                "_id" : "R. L. Stine",
                "count" : 430
            },
            {
                "_id" : "Edgar Wallace",
                "count" : 175
            },
            {
                "_id" : "Nora Roberts",
                "count" : 145
            },
            {
                "_id" : "Erle Stanley Gardner",
                "count" : 140
            },
            {
                "_id" : "Agatha Christie",
                "count" : 85 
            }
       ],
        "ok" : 1 
    }

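Note: the aggregation framework described here cannot write to collections, so all results must be returned to the client. Aggregation results are therefore limited to 16 MB.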
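The four pipeline stages can be sketched over an in-memory array in plain JavaScript. This only illustrates what each stage does to the stream of documents; the topAuthors helper and the sample articles are made up, and none of this is how MongoDB executes a pipeline.

```javascript
// Illustrative input; in MongoDB these would be documents in db.articles.
const articles = [
  { title: "a", author: "Ann" },
  { title: "b", author: "Bob" },
  { title: "c", author: "Ann" },
  { title: "d", author: "Cy" },
  { title: "e", author: "Ann" },
  { title: "f", author: "Bob" },
];

// Hypothetical helper that chains the four stages over an in-memory array.
function topAuthors(docs, n) {
  // $project: keep only the author field.
  const projected = docs.map((d) => ({ author: d.author }));

  // $group: one bucket per author, counting documents ({"$sum" : 1}).
  const counts = new Map();
  for (const d of projected) {
    counts.set(d.author, (counts.get(d.author) || 0) + 1);
  }
  const grouped = [...counts].map(([author, count]) => ({ _id: author, count }));

  // $sort: descending by count, then $limit: first n results.
  return grouped.sort((a, b) => b.count - a.count).slice(0, n);
}

console.log(topAuthors(articles, 2));
// [ { _id: 'Ann', count: 3 }, { _id: 'Bob', count: 2 } ]
```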

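5.2 Pipeline Operations

$match

$match filters documents so that you can run an aggregation on a subset of them. As a rule, put "$match" as early in the pipeline as possible. This has two benefits: 1. unneeded documents are filtered out quickly, leaving less work for the rest of the pipeline, and 2. the query can use indexes if it runs before any projections or groupings.

$project

Projection is much more powerful in the pipeline than in the "normal" query language (the second argument to find).

# Projection; "_id" is always returned by default, so it is excluded explicitly here
> db.articles.aggregate({"$project" : {"author" : 1, "_id" : 0}})

# Rename the projected field "_id"
> db.users.aggregate({"$project" : {"userId" : "$_id", "_id" : 0}})

# If originalFieldname is indexed, the index can no longer be used once the field is renamed
> db.articles.aggregate({"$project" : {"newFieldname" : "$originalFieldname"}},
       {"$sort" : {"newFieldname" : 1}})

The "$fieldname" syntax refers to the value of the fieldname field in the aggregation framework. In the example above, "$_id" is replaced by the contents of the _id field. When you rename a field, exclude the original so it is not returned twice: in the example, "_id" : 0 keeps "_id" out of the output after it has been renamed to "userId".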

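Pipeline expressions

The simplest "$project" expressions are inclusion, exclusion, and field renaming. Other expressions are available as well.

Mathematical expressions

"$add", "$subtract", "$multiply", "$divide", "$mod"

# Add the "salary" and "bonus" fields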
> db.employees.aggregate(
      {
            "$project" : {
                    "totalPay" : {
                            "$add" : ["$salary", "$bonus"]
                      }
               }
       })

# "$subtract" expression: subtract the 401k contribution
 > db.employees.aggregate(
        {
            "$project" : {
                    "totalPay" : {
                            "$subtract" : [{"$add" : ["$salary", "$bonus"]}, "$401k"]
                      }
                }
        })

Date expressions

The aggregation framework has a set of expressions for extracting date information: "$year", "$month", "$week", "$dayOfMonth", "$dayOfWeek", "$dayOfYear", "$hour", "$minute", and "$second".

# Return the month in which each employee was hired
> db.employees.aggregate(
       {
            "$project" : {
                    "hiredIn" : {"$month" : "$hireDate"}
            }
        })

# Compute how many years each employee has been at the company
> db.employees.aggregate(
        {
            "$project" : {
                    "tenure" : {
                            "$subtract" : [{"$year" : new Date()}, {"$year" : "$hireDate"}]
                    }
            }
        })

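String expressions

"$substr" : [expr, startOffset, numToReturn] returns a substring of the first argument, starting at byte startOffset and containing numToReturn bytes. Offsets are measured in bytes, not characters, so take care with multibyte encodings.
"$concat" : [expr1[, expr2, ..., exprN]] concatenates the given strings.
"$toLower" : expr returns the string in lowercase.
"$toUpper" : expr returns the string in uppercase.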

> db.employees.aggregate(
        {
                "$project" : {
                        "email" : {
                                "$concat" : [
                                        {"$substr" : ["$firstName", 0, 1]},
                                        ".",
                                        "$lastName",
                                        "@example.com"
                                  ] 
                            }
                    }
        })

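Logical expressions

Comparison expressions:

"$cmp" : [expr1, expr2] compares expr1 and expr2. It returns 0 if they are equal, a negative number if expr1 is less than expr2, and a positive number if expr2 is less than expr1.
"$strcasecmp" : [string1, string2] compares string1 and string2 without regard to case. It only works for strings in the Roman alphabet.
"$eq", "$ne", "$gt", "$gte", "$lt", "$lte" : [expr1, expr2] compare expr1 and expr2 and return true or false.

Boolean expressions:

"$and" : [expr1[, expr2, ..., exprN]] returns true if all expressions are true.
"$or" : [expr1[, expr2, ..., exprN]] returns true if at least one expression is true.
"$not" : expr returns the boolean opposite of expr.

Control statements:

"$cond" : [booleanExpr, trueExpr, falseExpr] returns trueExpr if booleanExpr is true and falseExpr otherwise.
"$ifNull" : [expr, replacementExpr] returns replacementExpr if expr is null and expr otherwise.

An example: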

> db.students.aggregate(
        {
                "$project" : {
                        "grade" : {
                                "$cond" : [
                                        "$teachersPet",
                                        100, // if
                                        {     // else
                                                "$add" : [
                                                        {"$multiply" : [.1, "$attendanceAvg"]},
                                                        {"$multiply" : [.3, "$quizzAvg"]},
                                                        {"$multiply" : [.6, "$testAvg"]}
                                                    ]
                                         }
                                  ]
                           }
                    }
            })
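The same grading rule can be written as ordinary JavaScript to make the "$cond" branches explicit. This is a sketch for a single student document; grade and ifNull are hypothetical helper names, and the field names follow the pipeline above.

```javascript
// Hypothetical helper mirroring the "$cond" expression above for one student.
function grade(student) {
  if (student.teachersPet) {
    return 100; // "if" branch of $cond
  }
  // "else" branch: weighted sum, like $add over three $multiply results.
  return (
    0.1 * student.attendanceAvg +
    0.3 * student.quizzAvg +
    0.6 * student.testAvg
  );
}

// Mirrors {"$ifNull" : [expr, replacement]}: fall back when expr is null or missing.
function ifNull(value, replacement) {
  return value === null || value === undefined ? replacement : value;
}

console.log(grade({ teachersPet: true, attendanceAvg: 50, quizzAvg: 50, testAvg: 50 }));  // 100
console.log(grade({ teachersPet: false, attendanceAvg: 90, quizzAvg: 80, testAvg: 70 })); // 75
console.log(ifNull(null, "n/a")); // n/a
```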

$group

Arithmetic operators

# For a collection of sales data from several countries, compute each country's total revenue
> db.sales.aggregate(
        {
                "$group" : {
                        "_id" : "$country",
                        "totalRevenue" : {"$sum" : "$revenue"}
                }
        })

# Return each country's average revenue and number of sales
> db.sales.aggregate(
        {
                "$group" : {
                        "_id" : "$country",
                        "averageRevenue" : {"$avg" : "$revenue"},
                        "numSales" : {"$sum" : 1}
                }
        })

Extreme operators

If your data is already sorted, $first and $last are more efficient than $min and $max. If the data is not sorted beforehand, $min and $max are more efficient than sorting first and then using $first and $last.

# For a collection of student test scores, find the outliers in each grade
> db.scores.aggregate(
        {
                "$group" : {
                        "_id" : "$grade",
                        "lowestScore" : {"$min" : "$score"},
                        "highestScore" : {"$max" : "$score"}
                }
        })

# Alternatively, sort first and take the first and last score
> db.scores.aggregate(
        {
                "$sort" : {"score" : 1}
        },
        {
                "$group" : {
                        "_id" : "$grade",
                        "lowestScore" : {"$first" : "$score"},
                        "highestScore" : {"$last" : "$score"}
                }
        })
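That the two pipelines give the same answer can be checked with a plain JavaScript sketch. The sample scores and the helper names minMaxByGrade and firstLastByGrade are made up for illustration; the first helper makes one pass with no sort, the second sorts first.

```javascript
// Illustrative scores; "grade" and "score" follow the field names above.
const scores = [
  { grade: 1, score: 72 },
  { grade: 2, score: 95 },
  { grade: 1, score: 88 },
  { grade: 2, score: 61 },
  { grade: 1, score: 40 },
];

// Approach 1: like $min / $max, a single pass with no sorting.
function minMaxByGrade(docs) {
  const out = {};
  for (const { grade, score } of docs) {
    const g = out[grade] || (out[grade] = { lowest: score, highest: score });
    g.lowest = Math.min(g.lowest, score);
    g.highest = Math.max(g.highest, score);
  }
  return out;
}

// Approach 2: like $sort followed by $first / $last.
function firstLastByGrade(docs) {
  const sorted = [...docs].sort((a, b) => a.score - b.score);
  const out = {};
  for (const { grade, score } of sorted) {
    if (!out[grade]) out[grade] = { lowest: score, highest: score }; // $first
    out[grade].highest = score; // the final assignment per grade is $last
  }
  return out;
}

console.log(minMaxByGrade(scores));
// { '1': { lowest: 40, highest: 88 }, '2': { lowest: 61, highest: 95 } }
console.log(firstLastByGrade(scores)); // same result
```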

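Array operators

"$addToSet" : expr keeps an array and adds expr to it if it is not already there. Each value appears at most once in the resulting array, and the order is not guaranteed.
"$push" : expr adds every value it sees to the array without checking for duplicates, and returns an array of all the values.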
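The contrast between the two accumulators resembles the difference between an array and a Set in JavaScript. The sketch below uses made-up values for a single group and is only an analogy for the behavior described above.

```javascript
// Illustrative values seen for one group, in arrival order.
const seen = ["red", "blue", "red", "green", "blue"];

// Like $push: keep every value, duplicates included.
const pushed = [...seen];

// Like $addToSet: keep each distinct value once.
// (A JavaScript Set keeps insertion order; MongoDB makes no ordering promise.)
const addedToSet = [...new Set(seen)];

console.log(pushed.length);     // 5
console.log(addedToSet.length); // 3
```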

$unwind

$unwind turns each element of an array into a separate document. For example, given a blog post with several comments, $unwind turns each comment into its own document.

> db.blog.findOne()
        {
                "_id" : ObjectId("50eeffc4c82a5271290530be"),
                "author" : "k",
                "post" : "Hello, world!",
                "comments" : [
                        {
                                "author" : "mark",
                                "date" : ISODate("2013-01-10T17:52:04.148Z"),
                                "text" : "Nice post"
                        },
                        {
                                "author" : "bill",
                                "date" : ISODate("2013-01-10T17:52:04.148Z"),
                                "text" : "I agree"
                        }
                ]
} 

# unwind
> db.blog.aggregate({"$unwind" : "$comments"})
        {
                "result" : [
                        {
                                "_id" : ObjectId("50eeffc4c82a5271290530be"),
                                "author" : "k",
                                "post" : "Hello, world!",
                                "comments" : {
                                        "author" : "mark",
                                        "date" : ISODate("2013-01-10T17:52:04.148Z"),
                                        "text" : "Nice post"
                                }
                        },
                        {
                                "_id" : ObjectId("50eeffc4c82a5271290530be"),
                                "author" : "k",
                                "post" : "Hello, world!",
                                "comments" : {
                                        "author" : "bill",
                                        "date" : ISODate("2013-01-10T17:52:04.148Z"),
                                        "text" : "I agree"
                                }
                        }
                ],
                "ok" : 1
        }
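What "$unwind" does to one document can be sketched with Array.prototype.map in plain JavaScript. The unwind helper below is hypothetical and handles only the simple case of an existing, non-empty array field.

```javascript
// Hypothetical helper: emits one copy of the document per array element,
// with the array field replaced by that single element.
function unwind(doc, field) {
  return doc[field].map((element) => ({ ...doc, [field]: element }));
}

// Reduced version of the blog post above.
const blogPost = {
  author: "k",
  post: "Hello, world!",
  comments: [
    { author: "mark", text: "Nice post" },
    { author: "bill", text: "I agree" },
  ],
};

const unwound = unwind(blogPost, "comments");
console.log(unwound.length);             // 2
console.log(unwound[0].comments.author); // mark
console.log(unwound[1].comments.author); // bill
console.log(unwound[1].post);            // Hello, world!
```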

$sort

# 1 is ascending, -1 is descending
> db.employees.aggregate(
        {
                "$project" : {
                        "compensation" : {
                                "$add" : ["$salary", "$bonus"]
                        },
                        "name" : 1
                }
        },
        {
                "$sort" : {"compensation" : -1, "name" : 1}
        }
    )

$limit

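$limit takes a number n and returns the first n resulting documents.

$skip

$skip takes a number n and discards the first n documents from the result set. As with normal queries, it is inefficient for large skips, because it must find all of the matches to be skipped and then discard them.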
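Skipping and limiting behave like array slicing, which a short JavaScript sketch makes concrete. The results array and the skip and limit helpers are made up for illustration.

```javascript
// Ten illustrative result documents, already sorted.
const results = Array.from({ length: 10 }, (_, i) => ({ rank: i + 1 }));

// Hypothetical helpers mirroring the two stages.
const skip = (docs, n) => docs.slice(n);     // drop the first n
const limit = (docs, n) => docs.slice(0, n); // keep the first n

// "Page 2" with 3 results per page: skip 3, then limit 3.
const page2 = limit(skip(results, 3), 3);
console.log(page2.map((d) => d.rank)); // [ 4, 5, 6 ]
```

Note that the skipped documents still have to be produced before they are dropped, which is the cost the paragraph above warns about for large skips.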

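Using pipelines

Filter out as many documents (and as many fields) as possible at the start of the pipeline, before any "$project", "$group", or "$unwind" operation. Once the pipeline is no longer working on data straight from the collection, indexes can no longer help with filtering and sorting. Where possible, the aggregation pipeline tries to reorder these operations for you so that it can use indexes.

In the MongoDB release the source describes, a single aggregation may not use more than a fixed fraction of the system's memory: if MongoDB calculates that an aggregation is using more than 20% of the memory, the aggregation fails. Allowing output to be piped to a collection, which would minimize the memory required, is planned for the future.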

References

MongoDB: The Definitive Guide, Second Edition
MongoDB regular expressions
dateToString
