【python】lxml


Source: http://lxml.de/tutorial.html

 

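lxml is a very powerful Python library for working with XML. It makes parsing and generating XML documents straightforward. What follows is a translation of part of the tutorial linked above.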

 

1. Creating an empty XML element

from lxml import etree

root = etree.Element("root")
print(etree.tostring(root, pretty_print=True, encoding="unicode"))
<root/>
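In Python 3, etree.tostring() returns bytes by default. Passing encoding="unicode" returns a str instead, and a byte encoding such as "UTF-8" combined with xml_declaration=True adds an XML declaration in front. Here is a minimal sketch that compares the three forms:

```python
from lxml import etree

root = etree.Element("root")

# Default: a bytes object with no XML declaration
as_bytes = etree.tostring(root)
print(repr(as_bytes))  # b'<root/>'

# encoding="unicode" returns a Python str instead of bytes
as_text = etree.tostring(root, encoding="unicode")
print(repr(as_text))  # '<root/>'

# A byte encoding plus xml_declaration=True prepends an <?xml ...?> header
with_decl = etree.tostring(root, xml_declaration=True, encoding="UTF-8")
print(with_decl.decode("utf-8"))
```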

2. Creating child elements

from lxml import etree

root = etree.Element("root")
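root.append(etree.Element("child1"))       # method 1: create an element, then append it
child2 = etree.SubElement(root, "child2")  # method 2: SubElement creates and appends in one step
child3 = etree.SubElement(root, "child3")
print(etree.tostring(root, pretty_print=True, encoding="unicode"))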
<root>
  <child1/>
  <child2/>
  <child3/>
</root>
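An element also behaves much like a Python list of its children: it supports len(), indexing, iteration and insert(). The following short sketch shows this, with tag names that are only illustrative:

```python
from lxml import etree

root = etree.Element("root")
for name in ("child1", "child2", "child3"):
    etree.SubElement(root, name)

print(len(root))                      # number of direct children: 3
print(root[0].tag)                    # first child: child1
print([child.tag for child in root])  # iterate over the direct children

# insert() places a new element at a given position
root.insert(0, etree.Element("child0"))
print([child.tag for child in root])  # ['child0', 'child1', 'child2', 'child3']
```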

3. Creating an element with text content

from lxml import etree

root = etree.Element("root")
root.text = "Hello World"
print(etree.tostring(root, pretty_print=True, encoding="unicode"))
<root>Hello World</root>
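Because .text holds plain text, lxml escapes special characters such as < and & when it serializes the element, so you never write entities like &lt; yourself. A small sketch:

```python
from lxml import etree

root = etree.Element("root")
root.text = "a < b & c"

# Special characters are escaped on serialization...
xml = etree.tostring(root, encoding="unicode")
print(xml)  # <root>a &lt; b &amp; c</root>

# ...and parsing the result back gives the original, unescaped text
print(etree.fromstring(xml).text)  # a < b & c
```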

4. Attributes

lxml stores an element's attributes like a dictionary that maps attribute names to values.

Setting attributes

from lxml import etree

root = etree.Element("root", interesting="totally")  # method 1: keyword arguments
root.set("hello", "huhu")  # method 2: set()
root.text = "Hello World"
print(etree.tostring(root, encoding="unicode"))
<root interesting="totally" hello="huhu">Hello World</root>

Getting attributes

Method 1:

print(root.get("interesting"))
print(root.get("hello"))
totally
huhu

Method 2:

attributes = root.attrib
print(attributes["interesting"])

Iterating over attributes

for name, value in sorted(root.items()):
    print('%s = %r' % (name, value))
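Since attributes behave like a dictionary, the usual mapping operations work on .attrib, and get() accepts a default for missing attributes. The sketch below reuses the attribute names from above. The name "missing" is only a hypothetical absent attribute:

```python
from lxml import etree

root = etree.Element("root", interesting="totally")
root.set("hello", "huhu")

print(root.keys())                 # attribute names in document order
print(root.get("missing", "n/a"))  # default returned when the attribute is absent
print("hello" in root.attrib)      # membership test, as with a dict

root.attrib["hello"] = "world"     # item assignment works like a dict
del root.attrib["interesting"]     # and so does deletion
print(dict(root.attrib))           # copy into a plain dict: {'hello': 'world'}
```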

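5. Mixed content: .text and .tail

In XML like the following, the text inside <body> is split by a <br/> element. "Hello" is the .text of <body>, while "World" is the .tail of <br/>, that is, the text that follows the <br/> element:

<html><body>Hello<br/>World</body></html>

from lxml import etree

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "Hello"
br = etree.SubElement(body, "br")
br.tail = "World"
print(etree.tostring(html, encoding="unicode"))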
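An element's tail is serialized together with the element by default. The with_tail and method parameters of tostring() change this behavior, as the following sketch (rebuilding the same tree) shows:

```python
from lxml import etree

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "Hello"
br = etree.SubElement(body, "br")
br.tail = "World"

# Serializing an element includes its tail by default
br_with_tail = etree.tostring(br, encoding="unicode")
print(br_with_tail)  # <br/>World

# with_tail=False serializes only the element itself
br_only = etree.tostring(br, with_tail=False, encoding="unicode")
print(br_only)  # <br/>

# method="text" drops all tags and keeps only text and tail content
text_only = etree.tostring(html, method="text", encoding="unicode")
print(text_only)  # HelloWorld
```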

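6. Iteration

Iterating over all elements (iter() yields the element itself followed by all of its descendants, in document order)

for element in root.iter():
    print("%s - %s" % (element.tag, element.text))

Iterating only over elements with a given tag: pass the tag name to iter()

for element in root.iter("child"):
    print("%s - %s" % (element.tag, element.text))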
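Note that iter() walks the whole subtree, so a tag filter also finds nested matches, while plain iteration over an element visits only its direct children. A small sketch on a made-up document:

```python
from lxml import etree

root = etree.XML("<root><child>1</child><group><child>2</child></group></root>")

# iter() starts with the element itself, then all descendants in document order
all_tags = [el.tag for el in root.iter()]

# iter("child") also finds the <child> nested inside <group>
child_texts = [el.text for el in root.iter("child")]

# Iterating over root directly visits only its direct children
direct = [el.tag for el in root]

print(all_tags)     # ['root', 'child', 'group', 'child']
print(child_texts)  # ['1', '2']
print(direct)       # ['child', 'group']
```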

7. Finding text content with XPath

Using the html tree built in section 5:

build_text_list = etree.XPath("//text()") # lxml.etree only!
print(build_text_list(html))
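An element's xpath() method evaluates an expression directly, without compiling an etree.XPath object first. The text strings it returns also record where they came from: getparent() returns the element they belong to, and is_text / is_tail tell whether they are that element's .text or .tail. A sketch on the section 5 tree:

```python
from lxml import etree

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "Hello"
br = etree.SubElement(body, "br")
br.tail = "World"

texts = html.xpath("//text()")
print(texts)  # ['Hello', 'World']

# "Hello" is the .text of <body>; "World" is the .tail of <br>
print(texts[0].getparent().tag, texts[0].is_text)  # body True
print(texts[1].getparent().tag, texts[1].is_tail)  # br True

# string() concatenates all text in the subtree
joined = html.xpath("string()")
print(joined)  # HelloWorld
```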

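8. Finding elements

iterfind(): iterates over all elements that match a path expression

findall(): returns a list of all matching elements

find(): returns the first matching element, or None if nothing matches

findtext(): returns the .text content of the first matching element

Given the following XML: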

root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")

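Finding a direct child (a bare tag name only searches direct children, so the grandchild <b> is not found)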

>>> print(root.find("b"))
None
>>> print(root.find("a").tag)
a

Finding an element anywhere in the tree (the .// prefix searches all descendants)

>>> print(root.find(".//b").tag)
b
>>> [ b.tag for b in root.iterfind(".//b") ]
['b', 'b']

Finding elements that have a given attribute

>>> print(root.findall(".//a[@x]")[0].tag)
a
>>> print(root.findall(".//a[@y]"))
[]
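findtext() was not demonstrated above. It returns the .text of the first match, an empty string if the match has no text, and the default (None unless given) if nothing matches. Path expressions can also filter on an attribute's value. Here is a sketch on the same document, where "missing" is just a hypothetical tag that does not occur:

```python
from lxml import etree

root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")

print(root.findtext("a"))                       # .text of the first <a>: aText
print(repr(root.findtext(".//b")))              # match without text: ''
print(root.findtext("missing", default="n/a"))  # no match: the default

matches = root.findall(".//a[@x='123']")        # filter on an attribute value
print(len(matches))                             # 1
print(root.find("a").get("x"))                  # read the attribute: 123
```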

9. Parsing a string into XML

>>> some_xml_data = "<root>data</root>"

>>> root = etree.fromstring(some_xml_data)
>>> print(root.tag)
root
>>> etree.tostring(root)
b'<root>data</root>'
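To read a whole document from a file name or a file-like object, use etree.parse(), which returns an ElementTree rather than an element. Malformed input raises etree.XMLSyntaxError. In this sketch, BytesIO stands in for a real file, and the document content is only an example:

```python
from io import BytesIO
from lxml import etree

# etree.parse() accepts a file name or a file-like object
tree = etree.parse(BytesIO(b"<root><item>data</item></root>"))
root = tree.getroot()
print(root.tag, root.findtext("item"))  # root data

# Malformed input (mismatched tags) raises etree.XMLSyntaxError
try:
    etree.fromstring("<root><unclosed></root>")
    parsed_ok = True
except etree.XMLSyntaxError as err:
    parsed_ok = False
    print("parse error:", err)
```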

10. Generating XML and HTML quickly with the E-factory

>>> from lxml import etree
>>> from lxml.builder import E

>>> def CLASS(*args): # class is a reserved word in Python
...     return {"class":' '.join(args)}

>>> html = page = (
...   E.html(       # create an Element called "html"
...     E.head(
...       E.title("This is a sample document")
...     ),
...     E.body(
...       E.h1("Hello!", CLASS("title")),
...       E.p("This is a paragraph with ", E.b("bold"), " text in it!"),
...       E.p("This is another paragraph, with a", "\n      ",
...         E.a("link", href="http://www.python.org"), "."),
...       E.p("Here are some reserved characters: <spam&egg>."),
...       etree.XML("<p>And finally an embedded XHTML fragment.</p>"),
...     )
...   )
... )

>>> print(etree.tostring(page, pretty_print=True, encoding="unicode"))
<html>
  <head>
    <title>This is a sample document</title>
  </head>
  <body>
    <h1 class="title">Hello!</h1>
    <p>This is a paragraph with <b>bold</b> text in it!</p>
    <p>This is another paragraph, with a
      <a href="http://www.python.org">link</a>.</p>
    <p>Here are some reserved characters: &lt;spam&amp;egg&gt;.</p>
    <p>And finally an embedded XHTML fragment.</p>
  </body>
</html>
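Because E-factory calls are ordinary function calls, children can also be generated from data with a comprehension and unpacked with *. Positional string arguments become text and keyword arguments become attributes. In this sketch, the inventory/item tags and the data list are hypothetical:

```python
from lxml import etree
from lxml.builder import E

# Hypothetical example data: (id, name) pairs
items = [("1", "apple"), ("2", "banana")]

# Each tuple becomes an <item> element; keyword arguments become attributes
doc = E.inventory(
    *[E.item(name, id=item_id) for item_id, name in items]
)

xml = etree.tostring(doc, encoding="unicode")
print(xml)
```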
