An Introduction to YAML and Its Use in Python


2009-05-13 javaeye http://angeloce.iteye.com/admin/blogs/385976

==================================================

 

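YAML is an intuitive, machine-readable data serialization format that is easy for humans to read and easy to use from scripting languages. It serves a similar purpose to XML, but its syntax is much simpler, and it maps simply and efficiently onto arrays and hashes.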

 

YAML syntax rules:

  http://www.ibm.com/developerworks/cn/xml/x-cn-yamlintro/

  http://www.yaml.org/

 

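Many people consider YAML a file format that can surpass XML and JSON. Compared with XML, it keeps many of XML's advantages while being simple enough to be easy to use. Compared with JSON, YAML can be written as a well-structured configuration file (in my view a big advantage over JSON; writing configuration files in JSON drives people mad).

  YAML uses the data types of the host language, which may cause compatibility problems when a document is passed between several languages.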

 

How do you write YAML? (a borrowed example)

name: Tom Smith
age: 37
spouse:
    name: Jane Smith
    age: 25
children:
 - name: Jimmy Smith
   age: 15
 - name1: Jenny Smith
   age1: 12

 

For the detailed syntax, refer to the YAML syntax rules linked above.
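To see how such a document maps onto Python data, here is a minimal sketch that parses the same text from a string. It assumes PyYAML is installed; the names `doc` and `person` are only illustrative, and yaml.safe_load is used because the data is plain.

```python
import yaml

# The sample document from above, held in a string instead of a file.
doc = """
name: Tom Smith
age: 37
spouse:
    name: Jane Smith
    age: 25
children:
 - name: Jimmy Smith
   age: 15
 - name1: Jenny Smith
   age1: 12
"""

person = yaml.safe_load(doc)

# Mappings become dicts, sequences become lists, scalars become str or int.
print(person["spouse"]["name"])
print(len(person["children"]))
print(type(person["age"]).__name__)
```

Nested mappings are reached with ordinary dict and list indexing, which is what makes YAML convenient for configuration files.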

 

--------------------------------------------------------------------------------------------

 

The Python implementation of YAML: PyYAML

 

Save the YAML above as a configuration file named test.yaml. The following shows how to read and write YAML configuration.

 

Use PyYAML, the YAML library for Python. http://pyyaml.org/

 

Once it is installed into Python's lib directory, it is ready to use.

 

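# Import the yaml module
import yaml

# Open the file and parse it
# (recent PyYAML releases require an explicit Loader argument)
with open('test.yaml') as f:
    x = yaml.load(f, Loader=yaml.SafeLoader)

print(x)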

 

You should get output similar to the following (the key order may differ):

{'age': 37, 'spouse': {'age': 25, 'name': 'Jane Smith'}, 'name': 'Tom Smith', 'children': [{'age': 15, 'name': 'Jimmy Smith'}, {'age1': 12, 'name1': 'Jenny Smith'}]}

 

Using the yaml library from Python is simple. You basically need just two functions:

 

yaml.load

 

yaml.dump

 

For anyone who has used pickle, there is no need to spell out what this implies, is there?

 

Warning: It is not safe to call yaml.load with any data received from an untrusted source! yaml.load is as powerful as pickle.load and so may call any Python function.

 

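When reading YAML, the hardest part is writing correctly formatted YAML data. A careless mistake makes load raise an exception; sometimes, though, no exception is reported and simply no data is read.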
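Both failure modes can be told apart in code. The sketch below is only an illustration: `check` is a hypothetical helper, not part of PyYAML. It relies on yaml.YAMLError being the base class of PyYAML's parsing errors and on an empty document loading as None.

```python
import yaml

def check(text):
    """Hypothetical helper: classify a YAML string as 'error', 'empty' or 'ok'."""
    try:
        data = yaml.safe_load(text)
    except yaml.YAMLError:
        return "error"
    return "empty" if data is None else "ok"

print(check("children: [Jimmy, Jenny"))  # unclosed flow sequence
print(check("# only a comment\n"))       # parses fine, but holds no data
print(check("name: Tom Smith"))
```

Checking for None after loading catches the silent case, where the file was syntactically valid but contained nothing.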

 

PyYAML is implemented entirely in Python and is claimed to be even more capable than pickle. (Who knows?)

 

yaml.load accepts a byte string, a Unicode string, an open binary file object, or an open text file object. A byte string or a file must be encoded with utf-8, utf-16-be or utf-16-le encoding. yaml.load detects the encoding by checking the BOM (byte order mark) sequence at the beginning of the string/file. If no BOM is present, the utf-8 encoding is assumed.

 

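About the BOM (from Baidu):
    UCS defines a character called "ZERO WIDTH NO-BREAK SPACE", whose code is FEFF. FFFE, on the other hand, is not a character in UCS, so it should never appear in actual transmission. The UCS specification recommends transmitting the character "ZERO WIDTH NO-BREAK SPACE" before the byte stream itself. If the receiver then sees FEFF, the stream is big-endian; if it sees FFFE, the stream is little-endian. For this reason "ZERO WIDTH NO-BREAK SPACE" is also called the BOM.
    UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to mark the encoding. The UTF-8 encoding of "ZERO WIDTH NO-BREAK SPACE" is EF BB BF, so a receiver that sees a byte stream starting with EF BB BF knows it is UTF-8. Windows uses the BOM in this way to mark the encoding of text files.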
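The detection rule described above can be sketched with the standard library alone. `guess_encoding` is a hypothetical helper written for illustration; it mirrors the rule as stated here and is not PyYAML's actual code.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: pick an encoding from the leading BOM, if any."""
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "utf-16-le"
    # EF BB BF marks UTF-8; with no BOM at all, UTF-8 is assumed as well.
    return "utf-8"

big = codecs.BOM_UTF16_BE + "name: Tom".encode("utf-16-be")
little = codecs.BOM_UTF16_LE + "name: Tom".encode("utf-16-le")

print(guess_encoding(big))
print(guess_encoding(little))
print(guess_encoding(codecs.BOM_UTF8 + b"name: Tom"))
print(guess_encoding(b"name: Tom"))
```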

 

yaml.load returns a Python object. What kind of object it is depends on your data.
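A quick way to see this is to load a few one-line documents and look at the resulting types. This is a small sketch assuming PyYAML is installed; the `samples` list is made up for illustration.

```python
import yaml

# Each string is a complete one-line YAML document.
samples = ["42", "3.14", "true", "hello", "[1, 2, 3]", "a: 1", "2009-05-13"]

for text in samples:
    value = yaml.safe_load(text)
    print(repr(text), "->", type(value).__name__)
```

The same loader call yields an int, a float, a bool, a str, a list, a dict or a date, depending only on what the document contains.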

 

 

If a string or a file contains several documents, you may load them all with the yaml.load_all function.

 

yaml.load(stream, Loader=<class 'yaml.loader.Loader'>)
    Parse the first YAML document in a stream  # only the first one is parsed
    and produce the corresponding Python object.

yaml.load_all(stream, Loader=<class 'yaml.loader.Loader'>)
    Parse all YAML documents in a stream
    and produce corresponding Python objects.

 

yaml.load_all returns an iterator; all you have to do is read the documents out with a for loop.

 

documents = """
name: The Set of Gauntlets 'Pauraegen'
description: >
  A set of handgear with sparks that crackle
  across its knuckleguards.
---
name: The Set of Gauntlets 'Paurnen'
description: >
   A set of gauntlets that gives off a foul,
   acrid odour yet remains untarnished.
---
name: The Set of Gauntlets 'Paurnimmen'
description: >
   A set of handgear, freezing with unnatural cold.
"""


# The '---' separators must start in the first column.
# Recent PyYAML releases require an explicit Loader argument.
for data in yaml.load_all(documents, Loader=yaml.SafeLoader):
    print(data)

#{'description': 'A set of handgear with sparks that crackle across its knuckleguards.\n',
#'name': "The Set of Gauntlets 'Pauraegen'"}
#{'description': 'A set of gauntlets that gives off a foul, acrid odour yet remains untarnished.\n',
#'name': "The Set of Gauntlets 'Paurnen'"}
#{'description': 'A set of handgear, freezing with unnatural cold.\n',
#'name': "The Set of Gauntlets 'Paurnimmen'"}

 

PyYAML allows you to construct a Python object of any type.

Even instances of Python classes can be constructed using the !!python/object tag.

More on this later; it is a very useful feature.

 

Note that the ability to construct an arbitrary Python object may be dangerous if you receive a YAML document from an untrusted source such as the Internet. The function yaml.safe_load limits this ability to simple Python objects like integers or lists, so stick to simple objects where you can.
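The difference is easy to demonstrate. In the sketch below the `untrusted` string is a made-up document that asks the loader to call a Python function; os.getcwd is harmless and merely stands in for something more dangerous.

```python
import yaml

# A document that tries to call a Python function while being loaded.
untrusted = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(untrusted)
    rejected = False
except yaml.YAMLError as exc:
    rejected = True
    print("refused:", type(exc).__name__)

# Plain data still loads as usual.
print(yaml.safe_load("[1, 2, 3]"))
```

safe_load refuses the Python-specific tag with an error instead of executing anything, while ordinary lists, mappings and scalars are unaffected.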

 

 ---------------------------------------

Dumping YAML

 

The yaml.dump function accepts a Python object and produces a YAML document. It is the counterpart of yaml.load.

dump(data, stream=None, Dumper=<class 'yaml.dumper.Dumper'>, **kwds)

    Serialize a Python object into a YAML stream.
    If stream is None, return the produced string instead.
    # Nice: if stream is left as None, the YAML document is returned to you as a string

 

 

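aproject = {'name': 'Silenthand Olleander',
            'race': 'Human',
            'traits': ['ONE_HAND', 'ONE_EYE']
            }


# default_flow_style=None keeps short lists inline, as older PyYAML did by default
print(yaml.dump(aproject, default_flow_style=None))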

# Output:
#name: Silenthand Olleander
#race: Human
#traits: [ONE_HAND, ONE_EYE]

 

 

 

 

yaml.dump accepts the second optional argument, which must be an open text or binary file. In this case, yaml.dump will write the produced YAML document into the file. Otherwise, yaml.dump returns the produced document.
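Both behaviours can be shown side by side. This is a small sketch assuming PyYAML is installed; an io.StringIO buffer stands in for an open text file, and the name `buffer` is only illustrative.

```python
import io
import yaml

data = {"name": "Silenthand Olleander", "race": "Human"}

# With no stream, the document comes back as a string.
text = yaml.safe_dump(data)

# With a stream, the document is written into it and None is returned.
buffer = io.StringIO()   # stands in for an open text file
result = yaml.safe_dump(data, buffer)

print(result)
print(buffer.getvalue() == text)
```

With a real file, the same call would sit inside a `with open(..., 'w') as f:` block.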

 

If you need to dump several YAML documents to a single stream, use the function yaml.dump_all. yaml.dump_all accepts a list or a generator producing Python objects to be serialized into a YAML document. The second optional argument is an open file. It is the exact counterpart of yaml.load_all.
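A round trip through both functions might look like the sketch below. It uses the safe variants, yaml.safe_dump_all and yaml.safe_load_all, and the `gauntlets` data is invented for the example.

```python
import yaml

gauntlets = [
    {"name": "Pauraegen", "trait": "sparks"},
    {"name": "Paurnen", "trait": "acrid odour"},
]

# dump_all also accepts a generator; a list is used here for brevity.
text = yaml.safe_dump_all(gauntlets)
print(text)

# load_all reads the same documents back, one per iteration.
restored = list(yaml.safe_load_all(text))
print(restored == gauntlets)
```

The documents are separated by a `---` line in the produced text, which is how load_all knows where one ends and the next begins.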

 

You may even dump instances of Python classes.

 

yaml.dump supports a number of keyword arguments that specify formatting details for the emitter. For instance, you may set the preferred indentation and width, use the canonical YAML format or force preferred style for scalars and collections.

 

dump_all(documents, stream=None, Dumper=<class 'yaml.dumper.Dumper'>, default_style=None, default_flow_style=None, canonical=None, indent=None, width=None, allow_unicode=None, line_break=None, encoding='utf-8', explicit_start=None, explicit_end=None, version=None, tags=None)


# The parameters described above can be seen in the function signature:
# canonical
# indent
# width
# and so on

 

Examples:

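>>> # PyYAML 5.1+ defaults to block style; default_flow_style=None gives the compact output shown here
>>> print(yaml.dump(list(range(50)), default_flow_style=None))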
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
  23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
  43, 44, 45, 46, 47, 48, 49]

>>> print(yaml.dump(list(range(50)), width=50, indent=4, default_flow_style=None))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
    16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
    28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
    40, 41, 42, 43, 44, 45, 46, 47, 48, 49]

>>> print(yaml.dump(list(range(5)), canonical=True))
---
!!seq [
  !!int "0",
  !!int "1",
  !!int "2",
  !!int "3",
  !!int "4",
]

>>> print(yaml.dump(list(range(5)), default_flow_style=False))
- 0
- 1
- 2
- 3
- 4

>>> print(yaml.dump(list(range(5)), default_flow_style=True, default_style='"'))
[!!int "0", !!int "1", !!int "2", !!int "3", !!int "4"]

 

The key is all in those trailing keyword arguments.
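Two more of the keyword arguments from the signature, allow_unicode and explicit_start, can be tried the same way. The snippet is a sketch assuming PyYAML is installed; the sample data is made up.

```python
import yaml

data = {"city": "Zürich"}

# Non-ASCII characters are escaped unless allow_unicode is set.
escaped = yaml.safe_dump(data)
readable = yaml.safe_dump(data, allow_unicode=True)

# explicit_start=True puts a '---' marker at the top of the document.
marked = yaml.safe_dump(data, allow_unicode=True, explicit_start=True)

print(escaped)
print(readable)
print(marked)
```

All three strings load back to the same dictionary; only the presentation differs.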

 

------------------------------------------------------

 

Constructors, representers, resolvers

 

You may define your own application-specific tags. The easiest way to do it is to define a subclass of yaml.YAMLObject:

 

 

class Monster(yaml.YAMLObject):
    yaml_tag = u'!Monster'
    def __init__(self, name, hp, ac, attacks):
        self.name = name
        self.hp = hp
        self.ac = ac
        self.attacks = attacks
    def __repr__(self):
        return "%s(name=%r, hp=%r, ac=%r, attacks=%r)" % (
            self.__class__.__name__, self.name, self.hp, self.ac,self.attacks)

 

 

The above definition is enough to automatically load and dump Monster objects:

 

>>> yaml.load("""
... --- !Monster
... name: Cave spider
... hp: [2,6]    # 2d6
... ac: 16
... attacks: [BITE, HURT]
... """, Loader=yaml.Loader)

Monster(name='Cave spider', hp=[2, 6], ac=16, attacks=['BITE', 'HURT'])

>>> print(yaml.dump(Monster(
...     name='Cave lizard', hp=[3,6], ac=16, attacks=['BITE','HURT']),
...     default_flow_style=None))

!Monster
ac: 16
attacks: [BITE, HURT]
hp: [3, 6]
name: Cave lizard

 

 

 

yaml.YAMLObject uses metaclass magic to register a constructor, which transforms a YAML node to a class instance, and a representer, which serializes a class instance to a YAML node.

 

If you don't want to use metaclasses, you may register your constructors and representers using the functions yaml.add_constructor and yaml.add_representer. For instance, you may want to add a constructor and a representer for the following Dice class:

 

>>> class Dice(tuple):
...     def __new__(cls, a, b):
...         return tuple.__new__(cls, [a, b])
...     def __repr__(self):
...         return "Dice(%s,%s)" % self

>>> print(Dice(3,6))
Dice(3,6)

 

 

The default representation for Dice objects is not nice:

 

>>> print(yaml.dump(Dice(3,6), default_flow_style=None))

!!python/object/new:__main__.Dice
- !!python/tuple [3, 6]

 

 

Suppose you want a Dice object to be represented as AdB in YAML (A and B being the two values):

 

First we define a representer that converts a Dice object to a scalar node with the tag !dice, and register it.

 

>>> def dice_representer(dumper, data):
...     return dumper.represent_scalar(u'!dice', u'%sd%s' % data)

>>> yaml.add_representer(Dice, dice_representer)

 

 

Now you may dump an instance of the Dice object:

 

>>> print(yaml.dump({'gold': Dice(10,6)}, default_flow_style=None))
{gold: !dice '10d6'}

 

Let us add the code to construct a Dice object:

 

>>> def dice_constructor(loader, node):
...     value = loader.construct_scalar(node)
...     a, b = map(int, value.split('d'))
...     return Dice(a, b)

>>> yaml.add_constructor(u'!dice', dice_constructor)

 

 

Then you may load a Dice object as well:

 

>>> print(yaml.load("""
... initial hit points: !dice 8d4
... """, Loader=yaml.Loader))

{'initial hit points': Dice(8,4)}

 

 

As you can see, a constructor and a representer are counterparts: one serves load, the other serves dump.
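The pairing can be checked with a full round trip. The sketch below is a variation on the Dice example rather than the documentation's own code: it registers both functions on hypothetical subclasses, `DiceDumper` and `DiceLoader`, so that the global loaders and dumpers stay untouched and only safe constructs plus !dice are accepted.

```python
import yaml

class Dice(tuple):
    def __new__(cls, a, b):
        return tuple.__new__(cls, [a, b])

def dice_representer(dumper, data):
    # Used when dumping: Dice(10, 6) becomes the scalar '10d6' tagged !dice.
    return dumper.represent_scalar("!dice", "%sd%s" % data)

def dice_constructor(loader, node):
    # Used when loading: the scalar '10d6' becomes Dice(10, 6) again.
    a, b = map(int, loader.construct_scalar(node).split("d"))
    return Dice(a, b)

# Hypothetical subclasses introduced for this example only.
class DiceDumper(yaml.SafeDumper):
    pass

class DiceLoader(yaml.SafeLoader):
    pass

DiceDumper.add_representer(Dice, dice_representer)
DiceLoader.add_constructor("!dice", dice_constructor)

text = yaml.dump({"gold": Dice(10, 6)}, Dumper=DiceDumper)
restored = yaml.load(text, Loader=DiceLoader)
print(text)
print(restored)
```

Dumping goes through the representer and loading goes through the constructor, so the object that comes back is a Dice again and not a plain string.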

 

 

-------------------------------------------------------

 

Most of the above comes from http://pyyaml.org/wiki/PyYAMLDocumentation

 

