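Source: http://lxml.de/tutorial.html
lxml is a powerful Python library for processing XML that makes it easy to parse and generate XML documents. The notes below are a translation of part of the tutorial linked above.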
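1. Creating an empty XML element
from lxml import etree

root = etree.Element("root")
# tostring() returns bytes by default; encoding="unicode" returns a str for printing
print(etree.tostring(root, pretty_print=True, encoding="unicode"))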
<root/>
2. Creating child elements
from lxml import etree

root = etree.Element("root")
root.append(etree.Element("child1"))       # option 1: append an existing element
child2 = etree.SubElement(root, "child2")  # option 2: create and attach in one call
child3 = etree.SubElement(root, "child3")
print(etree.tostring(root, pretty_print=True, encoding="unicode"))
<root>
  <child1/>
  <child2/>
  <child3/>
</root>
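The lxml tutorial also points out that an element behaves much like a list of its children. The following is a minimal sketch of that idea, assuming the same three-child tree; the variable names are illustrative only.

```python
from lxml import etree

# Build the same tree as in the section above.
root = etree.Element("root")
root.append(etree.Element("child1"))
etree.SubElement(root, "child2")
etree.SubElement(root, "child3")

# Iterating over an element yields its direct children in document order.
child_tags = [child.tag for child in root]

# Children can be indexed like list items, and len() counts them.
first_child = root[0]
child_count = len(root)

# getparent() is specific to lxml (the standard-library ElementTree lacks it).
parent_of_first = first_child.getparent()

print(child_tags, child_count, parent_of_first.tag)
```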
3. Creating an element with text content
from lxml import etree

root = etree.Element("root")
root.text = "Hello World"
print(etree.tostring(root, pretty_print=True, encoding="unicode"))
<root>Hello World</root>
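Since `etree.tostring()` appears in every section, it may help to see how its return type depends on the `encoding` argument. This is a small sketch rather than a full survey of the options; the variable names are made up for illustration.

```python
from lxml import etree

root = etree.Element("root")
root.text = "Hello World"

# Default: a bytes object (ASCII-encoded, no XML declaration).
as_bytes = etree.tostring(root)

# encoding="unicode" returns a str instead, which is convenient for print().
as_text = etree.tostring(root, encoding="unicode")

# Requesting a declaration with an explicit byte encoding prepends <?xml ...?>.
with_declaration = etree.tostring(root, xml_declaration=True, encoding="UTF-8")

print(as_bytes)
print(as_text)
print(with_declaration)
```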
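4. Attributes
lxml exposes an element's attributes through a dictionary-like interface.
Setting attributes
from lxml import etree

root = etree.Element("root", interesting="totally")  # option 1: keyword argument
root.set("hello", "huhu")                            # option 2: set()
root.text = "Hello World"
print(etree.tostring(root, encoding="unicode"))
<root interesting="totally" hello="huhu">Hello World</root>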
Getting attributes
Option 1: get()
print(root.get("interesting"))
print(root.get("hello"))
totally
huhu
Option 2: the attrib mapping
attributes = root.attrib
print(attributes["interesting"])
Iterating over attributes
for name, value in sorted(root.items()):
    print('%s = %r' % (name, value))
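To make the dictionary-like behaviour concrete, here is a short sketch covering lookups with a default, updates through `attrib`, and taking an independent copy. The attribute name `no-such-attribute` and the value `Guten Tag` are placeholders chosen for this example.

```python
from lxml import etree

root = etree.Element("root", interesting="totally")
root.set("hello", "huhu")

# get() returns None for a missing attribute, or a default if one is given.
missing = root.get("no-such-attribute")
fallback = root.get("no-such-attribute", "n/a")

# attrib is a live view: writing to it changes the element itself.
root.attrib["hello"] = "Guten Tag"

# dict() takes an independent snapshot that later changes do not affect.
snapshot = dict(root.attrib)

# Deleting through attrib removes the attribute from the element.
del root.attrib["hello"]
remaining = sorted(root.keys())

print(missing, fallback, snapshot, remaining)
```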
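5. Creating mixed content (text around child elements)
In XML such as the following, the text is split by <br/>. Text that follows a child element is stored in that element's .tail:
<html><body>Hello<br/>World</body></html>
html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "Hello"   # text before <br/>
br = etree.SubElement(body, "br")
br.tail = "World"     # text after <br/>, stored on the br element
etree.tostring(html)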
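Because the tail belongs to the element it follows, serializing that element on its own includes the tail unless told otherwise. The sketch below follows the corresponding passage of the lxml tutorial, using the Hello/World document from this section; the variable names are illustrative.

```python
from lxml import etree

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "Hello"
br = etree.SubElement(body, "br")
br.tail = "World"

# The whole document: text, then <br/>, then the tail of <br/>.
whole = etree.tostring(html)

# Serializing <br/> alone carries its tail along by default...
br_with_tail = etree.tostring(br)

# ...unless with_tail=False is passed.
br_only = etree.tostring(br, with_tail=False)

# method="text" drops the markup and keeps only the text content.
text_only = etree.tostring(html, method="text")

print(whole, br_with_tail, br_only, text_only)
```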
6. Iteration
Iterating over all elements in the tree
for element in root.iter():
    print("%s - %s" % (element.tag, element.text))
To visit only elements with a particular tag, pass the tag name to iter()
for element in root.iter("child"):
    print("%s - %s" % (element.tag, element.text))
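The two loops above assume a `root` that actually contains `child` elements. As a self-contained sketch, the small document below (invented for this example) shows that `iter()` walks the whole subtree in document order, starting with the element itself, and that a tag filter also matches nested elements.

```python
from lxml import etree

# Hypothetical sample document, not taken from the tutorial.
root = etree.XML(
    "<root>"
    "<child>one</child>"
    "<child>two<grandchild>deep</grandchild></child>"
    "<other>three</other>"
    "</root>"
)

# iter() yields the element itself, then every descendant in document order.
all_tags = [element.tag for element in root.iter()]

# With a tag name, only elements with that tag are yielded.
child_texts = [element.text for element in root.iter("child")]

print(all_tags)
print(child_texts)
```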
7. Finding node content with XPath
build_text_list = etree.XPath("//text()")  # lxml.etree only!
print(build_text_list(html))
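The snippet above relies on the `html` element from section 5. The following sketch rebuilds that element so it runs on its own, and shows, as the lxml tutorial describes, that each text result remembers which element it came from and whether it was that element's text or tail.

```python
from lxml import etree

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "Hello"
br = etree.SubElement(body, "br")
br.tail = "World"

# A compiled XPath expression is callable on any element (lxml.etree only).
build_text_list = etree.XPath("//text()")
texts = build_text_list(html)

# Each result is a string subclass that knows its origin.
origins = [(str(t), t.getparent().tag, t.is_text, t.is_tail) for t in texts]

# The XPath string() function concatenates all text in one step.
joined = html.xpath("string()")

print(origins)
print(joined)
```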
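8. Finding elements
iterfind(): iterates over all elements that match the path expression
findall(): returns a list of all matching elements
find(): returns the first matching element
findtext(): returns the .text content of the first matching element
Assume the following XML: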
root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")
Finding a direct child
>>> print(root.find("b"))
None
>>> print(root.find("a").tag)
a
Finding an element anywhere in the tree
>>> print(root.find(".//b").tag)
b
>>> [ b.tag for b in root.iterfind(".//b") ]
['b', 'b']
Finding elements that have a given attribute
>>> print(root.findall(".//a[@x]")[0].tag)
a
>>> print(root.findall(".//a[@y]"))
[]
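The examples above do not exercise findtext(). A minimal sketch on the same document follows; the tag name `zzz` is a placeholder for an element that does not exist, and the variable names are illustrative.

```python
from lxml import etree

root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")

# findtext() returns the .text of the first match.
a_text = root.findtext("a")

# With no match it returns None, or the supplied default.
no_match = root.findtext("zzz")
no_match_default = root.findtext("zzz", default="(none)")

# find() plus get() reads an attribute of the first match.
x_value = root.find("a").get("x")

# findall() returns a list, so len() counts the matches.
b_count = len(root.findall(".//b"))

print(a_text, no_match, no_match_default, x_value, b_count)
```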
9. Parsing a string into XML
>>> some_xml_data = "<root>data</root>"
>>> root = etree.fromstring(some_xml_data)
>>> print(root.tag)
root
>>> etree.tostring(root)
b'<root>data</root>'
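Input is not always well-formed, so it can be useful to catch the parser's error. The helper `parse_or_none` below is a hypothetical name introduced for this sketch, not part of lxml; it wraps `etree.fromstring()` and returns `None` when parsing fails.

```python
from lxml import etree


def parse_or_none(data):
    """Return the root element of `data`, or None if it is not well-formed XML.

    Hypothetical helper for illustration only.
    """
    try:
        return etree.fromstring(data)
    except etree.XMLSyntaxError:
        return None


good = parse_or_none("<root>data</root>")
broken = parse_or_none("<root>data")  # missing closing tag

# Bytes input may carry an XML declaration with an encoding.
from_bytes = parse_or_none(b'<?xml version="1.0" encoding="UTF-8"?><root>data</root>')

print(good.tag, broken, from_bytes.text)
```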
10. Building XML and HTML quickly with the E-factory
>>> from lxml import etree
>>> from lxml.builder import E
>>> def CLASS(*args):  # class is a reserved word in Python
...     return {"class": ' '.join(args)}
>>> html = page = (
...   E.html(       # create an Element called "html"
...     E.head(
...       E.title("This is a sample document")
...     ),
...     E.body(
...       E.h1("Hello!", CLASS("title")),
...       E.p("This is a paragraph with ", E.b("bold"), " text in it!"),
...       E.p("This is another paragraph, with a", "\n ",
...           E.a("link", href="http://www.python.org"), "."),
...       E.p("Here are some reserved characters: <spam&egg>."),
...       etree.XML("<p>And finally an embedded XHTML fragment.</p>"),
...     )
...   )
... )
>>> print(etree.tostring(page, pretty_print=True, encoding="unicode"))
<html>
  <head>
    <title>This is a sample document</title>
  </head>
  <body>
    <h1 class="title">Hello!</h1>
    <p>This is a paragraph with <b>bold</b> text in it!</p>
    <p>This is another paragraph, with a
 <a href="http://www.python.org">link</a>.</p>
    <p>Here are some reserved characters: &lt;spam&amp;egg&gt;.</p>
    <p>And finally an embedded XHTML fragment.</p>
  </body>
</html>
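The rules the E-factory applies to its arguments are easier to see on a smaller tree. In this sketch (the tag names, the `id` value, and the variable names are invented for the example), strings become text, dictionaries and keyword arguments become attributes, and elements become children.

```python
from lxml import etree
from lxml.builder import E

# Keyword arguments and dict arguments both set attributes;
# plain strings become text; nested E calls become child elements.
item_list = E.ul(
    E.li("first", {"class": "odd"}),
    E.li("second"),
    id="items",
)
serialized = etree.tostring(item_list, encoding="unicode")

# Reserved characters in text are escaped on serialization.
escaped = etree.tostring(E.p("<spam&egg>"), encoding="unicode")

print(serialized)
print(escaped)
```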