Marshmallow in Detail

Reprinted article, originally posted 2019-08-05 20:31. Tags: Python / Marshmallow

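Note: this article uses marshmallow 3.x, a pre-release at the time of writing, not the stable 2.x release. Version 3 differs from version 2 in several ways, so keep that in mind.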

Documentation: https://marshmallow.readthedocs.io

marshmallow is a library for converting complex objects, such as ORM objects, to and from native Python data types. In short, it implements object -> dict, objects -> list, string -> dict, and string -> list.

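Serialization: converting a data object into a data type that can be stored or transmitted.
Deserialization: converting a storable or transmittable data type back into a data object.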
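
To make the two definitions concrete, here is a minimal hand-written round trip that uses only the standard library. The Point class and both helper functions are hypothetical names chosen for this illustration; marshmallow automates this kind of conversion and adds validation on top.

```python
import json


class Point:
    """Hypothetical data object used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def serialize(point):
    # object -> dict -> JSON string (storable / transmittable)
    return json.dumps({"x": point.x, "y": point.y})


def deserialize(text):
    # JSON string -> dict -> object
    data = json.loads(text)
    return Point(data["x"], data["y"])


text = serialize(Point(1, 2))
print(text)
# {"x": 1, "y": 2}
restored = deserialize(text)
print(restored.x, restored.y)
# 1 2
```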

To serialize or deserialize anything, we first need an object to work with. Here we define a class:

import datetime as dt


class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.created_time = dt.datetime.now()

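1. Schema

To convert between a class instance and JSON data (that is, to serialize and deserialize), you need an intermediary. That intermediary is the Schema, which can also be used to validate data.

# A simple Schema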
from marshmallow import Schema, fields


class UserSchema(Schema):
    name = fields.String()
    email = fields.Email()
    created_time = fields.DateTime()

2. Serializing

A schema's dump() method serializes an object and returns a dict.

A schema's dumps() method serializes an object and returns a JSON-encoded string.


user = User(name="TTY", email="tty@python.org")
schema = UserSchema()
res = schema.dump(user)
print(res)
# {'email': 'tty@python.org', 'name': 'TTY', 'created_time': '2019-08-05T14:43:51.168241+00:00'}

res2 = schema.dumps(user)
print(res2)
# '{"name": "TTY", "created_time": "2019-08-05T14:46:07.111755+00:00", "email": "tty@python.org"}'

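3. Filtering Output

When you do not need every field in the output, pass the only parameter when instantiating the Schema to select the fields to include: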

summary_schema = UserSchema(only=("name", "email"))
res = summary_schema.dump(user)
print(res)
# {'name': 'TTY', 'email': 'tty@python.org'}

only selects the fields to output; the exclude parameter does the opposite and lists the fields to leave out.

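4. Deserializing

A schema's load() method is the reverse of dump(): it deserializes a dict, converting the input into application-level data structures and validating it along the way.
Likewise, loads() decodes and deserializes a JSON string.
By default, load() returns a dict. When an input value does not match its field type, it raises a ValidationError.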

user_data = {"name": "tty2", "email": "tty2@python.org", "created_time": "2019-08-05T14:46:07"}

schema = UserSchema()
res = schema.load(user_data)
print(res)
# {'email': 'tty2@python.org', 'created_time': datetime.datetime(2019, 8, 5, 14, 46, 7), 'name': 'tty2'}

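When deserializing, it is usually more useful to turn the incoming dict into an object. marshmallow leaves the dict -> object step to you: write a method that builds the object and decorate it with post_load.

from marshmallow import Schema, fields, post_load


class UserSchema(Schema):
    name = fields.String()
    email = fields.Email()
    created_time = fields.DateTime()

    @post_load
    def make_user(self, data, **kwargs):
        # marshmallow 3 passes extra keyword arguments (such as many and partial) to hooks
        return User(**data)

With this in place, every call to load() runs make_user and returns a User object.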

user_data = {
    "name": "tty2",
    "email": "tty2@python.org"
}

schema = UserSchema()
res = schema.load(user_data)
print(res)
# <__main__.User object at 0x0000027BE9678128>
user = res
print("name: {}    email: {}".format(user.name, user.email))
# name: tty2    email: tty2@python.org

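5. Handling Collections of Objects

An iterable collection of objects can be serialized or deserialized directly. Set many=True when instantiating the Schema.

Alternatively, leave it out of the constructor and pass many=True when calling dump().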

user1 = User(name="tty1", email="tty1@python.org")
user2 = User(name="tty2", email="tty2@python.org")
users = [user1, user2]

# Option 1: set many=True on the schema
schema = UserSchema(many=True)
res = schema.dump(users)

# Option 2: pass many=True to dump()
# schema = UserSchema()
# res = schema.dump(users, many=True)

print(res)
# [{'created_time': '2019-08-05T15:09:19.781325+00:00', 'email': 'tty1@python.org', 'name': 'tty1'},
#  {'created_time': '2019-08-05T15:09:19.781325+00:00', 'email': 'tty2@python.org', 'name': 'tty2'}]

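6. Validation

When invalid data is passed to Schema.load() or Schema.loads(), a ValidationError is raised. ValidationError.messages holds the validation errors, and ValidationError.valid_data holds the data that passed validation.
Catch the exception and handle it. ValidationError has to be imported first: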

from marshmallow import Schema, fields, ValidationError


class UserSchema(Schema):
    name = fields.String()
    email = fields.Email()
    created_time = fields.DateTime()


try:
    res = UserSchema().load({"name": "ttty", "email": "ttty"})
except ValidationError as e:
    print("errors: {}   valid data: {}".format(e.messages, e.valid_data))
    # errors: {'email': ['Not a valid email address.']}   valid data: {'name': 'ttty'}

When validating a collection, the errors are stored as key-value pairs of item index -> error messages:

user_data = [
    {'email': 'mick@stones.com', 'name': 'Mick'},
    {'email': 'invalid', 'name': 'Invalid'},
    {'name': 'Keith'},
    {'email': 'charlie@stones.com'},
]
try:
    schema = UserSchema(many=True)
    res = schema.load(user_data)
except ValidationError as e:
    print("errors: {}   valid data: {}".format(e.messages, e.valid_data))

    # errors: {1: {'email': ['Not a valid email address.']}}
    # valid data: [{'email': 'mick@stones.com', 'name': 'Mick'},
    #              {'name': 'Invalid'},
    #              {'name': 'Keith'},
    #              {'email': 'charlie@stones.com'}]

As shown above, the invalid email is reported, but missing attributes are not checked, because nothing says an attribute must be present.

To make a field mandatory in the Schema, set required=True:

class UserSchema(Schema):
    name = fields.String(required=True)
    email = fields.Email()
    created_time = fields.DateTime()

Validate again:

try:
    schema = UserSchema(many=True)
    res = schema.load(user_data)
except ValidationError as e:
    print("errors: {}   valid data: {}".format(e.messages, e.valid_data))

    # errors: {1: {'email': ['Not a valid email address.']},
    #          3: {'name': ['Missing data for required field.']}}
    # valid data: [{'email': 'mick@stones.com', 'name': 'Mick'},
    #              {'name': 'Invalid'},
    #              {'name': 'Keith'},
    #              {'email': 'charlie@stones.com'}]

6.1 Custom Validation

When writing a Schema class, pass a validate argument to a built-in field to customize the validation logic. The value can be a function, a lambda, or an object that defines __call__.

class UserSchema(Schema):
    name = fields.String(required=True, validate=lambda s: len(s)<6)
    email = fields.Email()
    created_time = fields.DateTime()

user_data = {'name': 'InvalidName', 'email': 'tty@python.org'}
try:
    res = UserSchema().load(user_data)
except ValidationError as e:
    print(e.messages)
    # {'name': ['Invalid value.']}

To supply a custom error message, raise ValidationError inside the validation function:

from marshmallow import Schema, fields, ValidationError


def validate_name(name):
    if len(name) <= 2:
        raise ValidationError("name must be longer than 2 characters")
    if len(name) >= 6:
        raise ValidationError("name must be shorter than 6 characters")


class UserSchema(Schema):
    name = fields.String(required=True, validate=validate_name)
    email = fields.Email()
    created_time = fields.DateTime()


user_data = {'name': 'InvalidName', 'email': 'tty@python.org'}
try:
    res = UserSchema().load(user_data)
except ValidationError as e:
    print(e.messages)
    # {'name': ['name must be shorter than 6 characters']}

Note: validation only happens during deserialization. Serialization does not validate.

6.2 Turning the Validation Function into a Schema Method

Inside a Schema, the validates decorator registers a method as the validator for a field.

from marshmallow import Schema, fields, ValidationError, validates

class UserSchema(Schema):
    name = fields.String(required=True)
    email = fields.Email()
    created_time = fields.DateTime()

    @validates("name")
    def validate_name(self, value):
        if len(value) <= 2:
            raise ValidationError("name must be longer than 2 characters")
        if len(value) >= 6:
            raise ValidationError("name must be shorter than 6 characters")


user_data = {'name': 'InvalidName', 'email': 'tty@python.org'}
try:
    res = UserSchema().load(user_data)
except ValidationError as e:
    print(e.messages)
    # {'name': ['name must be shorter than 6 characters']}

6.3 Required Fields

The required parameter was used briefly above. Here is a little more detail.

Custom required error message:

The message raised when a required=True field is missing can be customized through the error_messages parameter:

class UserSchema(Schema):
    name = fields.String(required=True, error_messages={"required": "name is required"})
    email = fields.Email()
    created_time = fields.DateTime()


user = {"email": "tty@python.org"}
schema = UserSchema()
try:
    res = schema.load(user)
except ValidationError as e:
    print(e.messages)
    # {'name': ['name is required']}

Skipping required fields:

Even with required set, the required check can be skipped for selected fields when loading data.

class UserSchema(Schema):
    name = fields.String(required=True)
    age = fields.Integer(required=True)

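# Option 1: pass a tuple to load()'s partial parameter listing the fields to skip.
schema = UserSchema()
res = schema.load({"age": 42}, partial=("name",))
print(res)
# {'age': 42}

# Option 2: pass partial=True
schema = UserSchema()
res = schema.load({"age": 42}, partial=True)
print(res)
# {'age': 42}

The two options give the same result here, but they differ: option 1 skips the required check only for the fields named in partial, while option 2 skips it for every field that is missing from the input.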

6.4 Handling Unknown Fields

By default, load() raises a ValidationError when the input contains unknown fields (fields not declared in the Schema). This behavior can be changed with the unknown option.

unknown accepts three values:

EXCLUDE: exclude unknown fields (drop them silently)
INCLUDE: accept and include the unknown fields
RAISE: raise a ValidationError if there are any unknown fields

The default is RAISE. There are two ways to change it:

Option 1: set it in class Meta when writing the Schema class

# First import EXCLUDE
from marshmallow import EXCLUDE

class UserSchema(Schema):
    name = fields.String(required=True, error_messages={"required": "name is required"})
    email = fields.Email()
    created_time = fields.DateTime()

    class Meta:
        unknown = EXCLUDE

Option 2: pass unknown when instantiating the Schema class

class UserSchema(Schema):
    name = fields.Str(required=True, error_messages={"required": "name is required"})
    email = fields.Email()
    created_time = fields.DateTime()

schema = UserSchema(unknown=EXCLUDE)

7. Schema.validate

To validate data with a Schema without deserializing it into an object, use Schema.validate().
schema.validate() checks the data and returns a dict of error messages, or an empty dict when there are no errors, so the return value tells you whether validation passed.

class UserSchema(Schema):
    name = fields.Str(required=True, error_messages={"required": "name is required"})
    email = fields.Email()
    created_time = fields.DateTime()


user = {"name": "tty", "email": "tty@python"}
schema = UserSchema()
res = schema.validate(user)
print(res)
# {'email': ['Not a valid email address.']}

user1 = {"name": "tty", "email": "tty@python.org"}
schema = UserSchema()
res1 = schema.validate(user1)
print(res1)
# {}

8. Specifying Serialization/Deserialization Keys

8.1 Specifying Attribute Names

By default, a Schema serializes the attributes of the input object whose names match its own field names. When a field needs a different name from the attribute it reads, state explicitly which attribute the field takes its value from:

import datetime as dt
from marshmallow import Schema, fields


class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.created_time = dt.datetime.now()


class UserSchema(Schema):
    full_name = fields.String(attribute="name")
    email_address = fields.Email(attribute="email")
    created_at = fields.DateTime(attribute="created_time")


user = User("ttty", email="ttty@python.org")
schema = UserSchema()
res = schema.dump(user)
print(res)

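As shown above, full_name, email_address, and created_at in UserSchema take their values from the name, email, and created_time attributes of the User object.

8.2 Specifying Input Keys for Deserialization

This is the reverse case. By default, a Schema deserializes the keys of the input dict that match its field names. In marshmallow 2.x, the load_from parameter named an additional input key to load from (the original field name still loaded, and took priority). marshmallow 3 removed load_from in favor of data_key (section 8.3); the 3.x pre-releases accept the argument but do not treat it as an input key, which is why the example below passes the schema's own field names: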

class UserSchema(Schema):
    full_name = fields.String(load_from="name")
    email_address = fields.Email(load_from="email")
    created_at = fields.DateTime(load_from="created_time")


user = {"full_name": "ttty", "email_address": "ttty@python.org"}
schema = UserSchema()
res = schema.load(user)
print(res)
# {'email_address': 'ttty@python.org', 'full_name': 'ttty'}

8.3 Using the Same Key for Serialization and Deserialization

class UserSchema(Schema):
    full_name = fields.String(data_key="name")
    email_address = fields.Email(data_key="email")
    created_at = fields.DateTime(data_key="created_time")

# Serialization
user = {"full_name": "ttty", "email_address": "ttty@python.org"}
schema = UserSchema()
res = schema.dump(user)
print(res)
# {'name': 'ttty', 'email': 'ttty@python.org'}

# Deserialization
user1 = {"name": "ttty", "email": "ttty@python.org"}
schema = UserSchema()
res = schema.load(user1)
print(res)
# {'email_address': 'ttty@python.org', 'full_name': 'ttty'}

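9. Refactoring: Implicit Field Creation

When a Schema has many attributes, declaring a field type for each one gets repetitive, especially when many of them are already native Python data types. class Meta lets you list the attributes to serialize, and marshmallow picks a suitable field type based on each attribute's type.

# Refactored Schema
class UserSchema(Schema):
    uppername = fields.Function(lambda obj: obj.name.upper())

    class Meta:
        fields = ("name", "email", "created_time", "uppername")

In the code above, name is automatically formatted as a String and created_time as a DateTime.

To specify which fields to include in addition to the explicitly declared ones, use the additional option:

class UserSchema(Schema):
    uppername = fields.Function(lambda obj: obj.name.upper())

    class Meta:
        # No need to include 'uppername'
        additional = ("name", "email", "created_time")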

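10. Ordering Output

For some use cases it is useful to preserve the field order of the serialized output. To enable ordering, set the ordered option to True. This tells marshmallow to serialize the data to a collections.OrderedDict.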

from collections import OrderedDict


class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.created_time = dt.datetime.now()

class UserSchema(Schema):
    uppername = fields.Function(lambda obj: obj.name.upper())

    class Meta:
        fields = ("name", "email", "created_time", "uppername")
        ordered = True


u = User("Charlie", "charlie@stones.com")
schema = UserSchema()
res = schema.dump(u)
print(isinstance(res, OrderedDict))
# True
print(res)
# OrderedDict([('name', 'Charlie'), ('email', 'charlie@stones.com'), ('created_time', '2019-08-05T20:22:05.788540+00:00'), ('uppername', 'CHARLIE')])

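11. "Read-only" and "Write-only" Fields

In the context of a web API, the dump_only and load_only parameters are conceptually equivalent to read-only and write-only fields, respectively.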

class UserSchema(Schema):
    name = fields.Str()
    # password is "write-only"
    password = fields.Str(load_only=True)
    # created_at is "read-only"
    created_at = fields.DateTime(dump_only=True)

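On load, dump_only fields are treated as unknown fields. If the unknown option is set to INCLUDE, values for keys corresponding to those fields are therefore loaded without validation.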

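12. Specifying Default Values for Serialization/Deserialization

Use default to supply a value when the input value is missing during serialization, and missing to supply one during deserialization. (From marshmallow 3.13 onward these parameters are named dump_default and load_default.)

import datetime as dt
import uuid

from marshmallow import Schema, fields


class UserSchema(Schema):
    id = fields.UUID(missing=uuid.uuid1)
    birthdate = fields.DateTime(default=dt.datetime(2020, 9, 9))


# Serialization
res1 = UserSchema().dump({})
print(res1)
# {'birthdate': '2020-09-09T00:00:00+00:00'}

# Deserialization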
res = UserSchema().load({})
print(res)
# {'id': UUID('18f1eb3a-b7ec-11e9-82fb-8cec4b76ee65')}

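13. Next Steps

Need to represent relationships between objects? See the Nesting Schemas page.
Want to create your own field type? See the Custom Fields page.
Need to add schema-level validation, post-processing, or error handling behavior? See the Extending Schemas page.
For example applications using marshmallow, check out the Examples page.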

A small custom field example:

from marshmallow import Schema, fields


class String128(fields.String):
    """
    String field that rejects non-string input and values shorter than 6 characters
    """

    default_error_messages = {
        "type": "this field must be a string",
        "invalid": "this string must be at least 6 characters long",
    }

    def _deserialize(self, value, attr, data, **kwargs):
        # make_error replaces fail(), which is deprecated in released marshmallow 3
        if not isinstance(value, str):
            raise self.make_error("type")
        if len(value) < 6:
            raise self.make_error("invalid")
        # Return the value so that load() receives it
        return value


class AppSchema(Schema):
    name = String128(required=True)
    priority = fields.Integer()
    obj_type = String128()
    link = String128()
    deploy = fields.Dict()
    description = fields.String()
    projects = fields.List(cls_or_instance=fields.Dict)


app = {
    "name": "app11",
    "priority": 2,
    "obj_type": "web",
    "link": "123.123.00.2",
    "deploy": {"deploy1": "deploy1", "deploy2": "deploy2"},
    "description": "app111 test111",
    "projects": [{"id": 2}]
}

schema = AppSchema()
res = schema.validate(app)
print(res)
# {'obj_type': ['this string must be at least 6 characters long'], 'name': ['this string must be at least 6 characters long']}
