Parsing XML Files with Python

This article is a repost. 2021-05-24 20:24. Tag: python

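Parsing XML works on the same principle as parsing JSON: both spare you from treating the file as one raw string. Either way, Python hands you objects that can be used directly; the exact approach depends on the format of the data file. JSON is arguably the more convenient of the two, because it maps directly onto Python lists and dicts, whereas XML requires working with element objects and their attributes. As for what XML offers over JSON, there seems to be little: XML simply predates JSON, and habits formed over that longer period. The angle-bracket tags may make the data somewhat clearer to read, but that is a matter of taste.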
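
The contrast above can be seen in a few lines. This is a minimal sketch using only the standard library; the two sample strings are invented for the comparison and are not taken from the data file used later.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data, made up for this comparison.
json_text = '{"name": "Singapore", "rank": 5}'
xml_text = '<country name="Singapore"><rank>5</rank></country>'

# JSON parses straight into built-in types (dict, list, int, str, ...).
record = json.loads(json_text)
print(type(record).__name__, record["rank"])

# XML parses into Element objects; values are reached through attribute
# lookups (.get) and child lookups (.find), and they are always strings.
country = ET.fromstring(xml_text)
print(type(country).__name__, country.get("name"), country.find("rank").text)
```

Note that the JSON rank arrives as an int, while the XML rank is the string "5" and has to be converted by hand.
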
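Parsing XML with Python's modules means walking the root node and its child nodes. Each node has four properties: tag, attrib, text and tail. In Python, a for loop is enough to traverse them.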
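This article mainly uses the xml.etree.ElementTree module, which is the clearest and most direct. Python has two other ways to parse XML files: SAX (Simple API for XML, in xml.sax) and DOM (Document Object Model, in xml.dom). Both feel more cumbersome to use and are left for a separate discussion.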
First, set up an XML file (xml_lesson.xml):

<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year updated="no">2011</year>tail-尾部
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year updated="no">2014</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year updated="no">2014</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year updated="yes">2013</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
    <country name="Panama">
        <rank updated="yes">89</rank>
        <year updated="yes">2013</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>

Start by getting to know the structure of the XML. It is a tree, and the root is the entry point to everything else:

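import xml.etree.ElementTree as ET

xml_file = r"xml_lesson.xml"
# xml_file = r"movies.xml"
tree = ET.parse(xml_file)  # parse the file into an ElementTree
root = tree.getroot()      # the root Element, <data> in this file
print("root.tag: ", root.tag)  # >>>root.tag:  data
# root.iter is printed without being called, so this shows the bound method itself
print("type: %s root.iter %s text: %s :" % (type(root), root.iter, root.text))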

#>>>

root.tag: data
type: <class 'xml.etree.ElementTree.Element'> root.iter <built-in method iter of xml.etree.ElementTree.Element object at 0x0000025EDA942D10> text:

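As the output shows, the tag is the name that immediately follows the opening angle bracket. The root of this file has no attributes and no meaningful text: root.text holds only the whitespace that precedes the first child. The child nodes further down show more of these properties.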
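
To make the four properties concrete, here is a minimal sketch on a small in-memory document. The snippet is modelled on the sample file, but the string itself (including the text "some tail") is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet modelled on the sample file; "some tail" is invented.
snippet = (
    '<country name="Liechtenstein">'
    '<year updated="no">2011</year>some tail'
    '<gdppc>141100</gdppc>'
    '</country>'
)

country = ET.fromstring(snippet)
year = country.find("year")

print(year.tag)      # the element name
print(year.attrib)   # dict of the element's attributes
print(year.text)     # text between <year ...> and </year>
print(year.tail)     # text after </year>, up to the next tag
print(country.text)  # None: nothing sits between <country ...> and <year>
```

The tail belongs to the element it follows, not to the parent, which is why "some tail" is found on year rather than on country.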

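# To traverse every level of nodes with for loops, you need to know the structure of the XML and then loop level by level. The code is as follows: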

def loop_root():
    for i in root:  # iterate over the first level under the root (one pass per country)
        print("-->i.tag: ", i.tag, end=' ')  # >>>i.tag:  country
        # tag is the name of the element under data, e.g. country
        print("i.attrib: ", i.attrib)  # >>>i.attrib:  {'name': 'Liechtenstein'}
        # attrib is a dict of the country element's key/value attribute pairs
        # print("\n")
        for j in i:  # iterate over the children of each country
            print("\t-->", j.tag, end=' ')
            print(j.attrib, end=' ')
            print(j.text)

Running loop_root() on the file above prints the following:

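#>>>

-->i.tag:  country i.attrib:  {'name': 'Liechtenstein'}
    --> rank {'updated': 'yes'} 2
    --> year {'updated': 'no'} 2011
    --> gdppc {} 141100
    --> neighbor {'direction': 'E', 'name': 'Austria'} None
    --> neighbor {'direction': 'W', 'name': 'Switzerland'} None
-->i.tag:  country i.attrib:  {'name': 'Singapore'}
    --> rank {'updated': 'yes'} 5
    --> year {'updated': 'no'} 2014
    --> gdppc {} 59900
    --> neighbor {'direction': 'N', 'name': 'Malaysia'} None
-->i.tag:  country i.attrib:  {'name': 'Panama'}
    --> rank {'updated': 'yes'} 69
    --> year {'updated': 'no'} 2014
    --> gdppc {} 13600
    --> neighbor {'direction': 'W', 'name': 'Costa Rica'} None
    --> neighbor {'direction': 'E', 'name': 'Colombia'} None
-->i.tag:  country i.attrib:  {'name': 'Panama'}
    --> rank {'updated': 'yes'} 69
    --> year {'updated': 'yes'} 2013
    --> gdppc {} 13600
    --> neighbor {'direction': 'W', 'name': 'Costa Rica'} None
    --> neighbor {'direction': 'E', 'name': 'Colombia'} None
-->i.tag:  country i.attrib:  {'name': 'Panama'}
    --> rank {'updated': 'yes'} 89
    --> year {'updated': 'yes'} 2013
    --> gdppc {} 13600
    --> neighbor {'direction': 'W', 'name': 'Costa Rica'} None
    --> neighbor {'direction': 'E', 'name': 'Colombia'} None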

An improvement: to look at only certain nodes, pass their tag names as arguments. The code is as follows:

# Walk the named child nodes under root; accepts one tag name or several
def traverse_child(*child_node_list):
    for child_node in child_node_list:
        for node in root.iter(child_node):  # iter() searches the whole subtree
            print(child_node, ":", node.tag, node.attrib, node.text, node.tail)

Call the function as follows:
traverse_child('year')
traverse_child(*['rank','gdppc','neighbor'])

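The output for the file above is as follows. Every node.tail here ends with whitespace (a newline plus indentation), so each printed line is followed by a blank-looking line; those are omitted below.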

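#>>>

year : year {'updated': 'no'} 2011 tail-尾部
year : year {'updated': 'no'} 2014
year : year {'updated': 'no'} 2014
year : year {'updated': 'yes'} 2013
year : year {'updated': 'yes'} 2013
rank : rank {'updated': 'yes'} 2
rank : rank {'updated': 'yes'} 5
rank : rank {'updated': 'yes'} 69
rank : rank {'updated': 'yes'} 69
rank : rank {'updated': 'yes'} 89
gdppc : gdppc {} 141100
gdppc : gdppc {} 59900
gdppc : gdppc {} 13600
gdppc : gdppc {} 13600
gdppc : gdppc {} 13600
neighbor : neighbor {'direction': 'E', 'name': 'Austria'} None
neighbor : neighbor {'direction': 'W', 'name': 'Switzerland'} None
neighbor : neighbor {'direction': 'N', 'name': 'Malaysia'} None
neighbor : neighbor {'direction': 'W', 'name': 'Costa Rica'} None
neighbor : neighbor {'direction': 'E', 'name': 'Colombia'} None
neighbor : neighbor {'direction': 'W', 'name': 'Costa Rica'} None
neighbor : neighbor {'direction': 'E', 'name': 'Colombia'} None
neighbor : neighbor {'direction': 'W', 'name': 'Costa Rica'} None
neighbor : neighbor {'direction': 'E', 'name': 'Colombia'} None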
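
traverse_child relies on root.iter(), which searches the whole subtree. The sketch below contrasts it with findall(), which only looks at direct children unless given a path. The nested document is invented for the purpose and is not the sample file.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested document, invented to show the difference.
doc = ET.fromstring(
    "<data>"
    "<country name='A'><year>2011</year></country>"
    "<country name='B'><year>2014</year></country>"
    "</data>"
)

# iter() walks the whole subtree, so it reaches the nested <year> nodes.
years_via_iter = [node.text for node in doc.iter("year")]

# findall() with a bare tag name looks only at direct children of doc;
# <year> is a grandchild, so nothing matches.
years_via_findall = [node.text for node in doc.findall("year")]

# A path expression lets findall() descend explicitly.
years_via_path = [node.text for node in doc.findall("country/year")]

print(years_via_iter, years_via_findall, years_via_path)
```

This is why root.iter('year') finds the year nodes from the root, while a root-level findall needs 'country' or a path such as 'country/year'.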

# Modify the XML file: change the text and attributes of certain nodes.

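import xml.etree.ElementTree as ET

def modify_xml(child_node):
    tree = ET.parse("xml_lesson.xml")
    root = tree.getroot()
    for node in root.iter(child_node):
        # the node's text must be an integer (e.g. a year) for this to work
        new_year = int(node.text) + 1
        node.text = str(new_year)   # element text is always stored as a string
        node.set("updated", "no")   # add or overwrite the updated attribute
    tree.write("xml_lesson.xml")    # overwrite the original file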

Running modify_xml('year') adds 1 to the text of every year node, sets its updated attribute to "no", and then saves the XML file.
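
The same edit can be tried without touching any file. This is a minimal sketch on an in-memory document; the one-country string is invented for illustration, and ET.tostring is used only to inspect the result.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-country document, invented for this sketch.
root = ET.fromstring(
    '<data><country name="Panama"><year updated="yes">2013</year></country></data>'
)

for node in root.iter("year"):
    node.text = str(int(node.text) + 1)  # convert, increment, store as a string again
    node.set("updated", "no")            # add or overwrite one attribute

# Serialise back to a str to inspect the change without writing a file.
result = ET.tostring(root, encoding="unicode")
print(result)
```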

# Delete nodes: set a filter and remove the nodes that meet the condition.

import xml.etree.ElementTree as ET

def del_xml_node(child_node, max_rank):
    tree = ET.parse("xml_lesson.xml")
    root = tree.getroot()
    # findall() returns a list, so removing from root inside the loop is safe
    for node in root.findall(child_node):
        rank = int(node.find('rank').text)
        if rank > max_rank:
            root.remove(node)
    tree.write('output.xml')  # the original file is left unchanged

Running del_xml_node('country', 50) removes every country whose rank is greater than 50 and writes the result to output.xml.
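
A minimal in-memory sketch of the same filter follows. The three-country string is a cut-down, invented version of the sample data, and max_rank is just an illustrative name for the threshold. The loop runs over the list returned by findall() rather than over root itself, because removing children from an element while iterating over that element directly can skip items.

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down data, invented for this sketch.
root = ET.fromstring(
    "<data>"
    '<country name="Liechtenstein"><rank>2</rank></country>'
    '<country name="Singapore"><rank>5</rank></country>'
    '<country name="Panama"><rank>69</rank></country>'
    "</data>"
)

max_rank = 50  # illustrative threshold
for country in root.findall("country"):  # findall() gives a separate list
    if int(country.find("rank").text) > max_rank:
        root.remove(country)

remaining = [c.get("name") for c in root]
print(remaining)
```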

## Create a new XML file from the ElementTree objects, building each level of nodes (tag, attrib, text and tail). The code is as follows:

import xml.etree.ElementTree as ET

def create_xml():
    new_xml = ET.Element("namelist")  # the root element
    # name = ET.SubElement(new_xml, "name", attrib={"enrolled": "yes"})
    name = ET.SubElement(new_xml, "")  # SubElement needs at least 2 arguments: parent and tag
    name.tag = '名字'  # the tag can be set afterwards ("name"); non-ASCII names are allowed
    name.attrib = {"注冊": "yes"}  # so can the attribute dict ("registered")
    name.text = '張三'
    age = ET.SubElement(name, "age", attrib={"checked": "no"})
    age.text = '33'
    sex = ET.SubElement(name, "sex")
    sex.text = '男'  # "male"
    name2 = ET.SubElement(new_xml, "name", attrib={"enrolled": "no"})
    name2.text = '李四'
    age = ET.SubElement(name2, "age")
    age.attrib = {"青年": "yes"}  # "youth"
    age.text = '19'
    sex = ET.SubElement(name2, "sex")
    sex.text = '女'  # "female"

    et = ET.ElementTree(new_xml)  # wrap the root element in an ElementTree (the document object)
    pretty_xml(new_xml, '\t', '\n')  # apply the indentation helper (pretty_xml)

    ET.dump(new_xml)  # print the generated XML tree to the terminal
    et.write("test.xml", encoding="utf-8", xml_declaration=True)

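Building XML requires understanding the structure of an XML file, especially the properties of each node. Note that without any formatting the output is one long line, which is hard to read. The following function can be added to indent the XML output: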

## The function below prettifies the XML. Without it, the generated document is a single line and hard to read.
## With it, each node is indented according to its depth.
def pretty_xml(element, indent, newline, level=0):
    # element is the Element passed in; indent is the indentation unit, newline the line break
    if len(element):  # element has children (len() avoids the deprecated truth test on Element)
        if (element.text is None) or element.text.isspace():  # element has no real text
            element.text = newline + indent * (level + 1)
        else:
            element.text = newline + indent * (level + 1) + \
                           element.text.strip() + newline + indent * (level + 1)
        # else:  # uncomment these two lines to put a leaf Element's text on its own line as well
        # element.text = newline + indent * (level + 1) + element.text.strip() + newline + indent * level
    children = list(element)  # copy the child elements into a list
    for position, subelement in enumerate(children):
        if position < len(children) - 1:
            # not the last child: the next line starts a sibling, so keep the same indentation
            subelement.tail = newline + indent * (level + 1)
        else:  # last child: the next line closes the parent, so indent one level less
            subelement.tail = newline + indent * level
        pretty_xml(subelement, indent, newline, level=level + 1)  # recurse into the child

Calling create_xml() does produce an XML document indented by nesting level.
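
On Python 3.9 and later, the standard library's ET.indent() does a similar job to a hand-written pretty_xml helper. The sketch below assumes such a Python version; the small namelist tree is invented for illustration, and the result may differ from pretty_xml in details such as how existing text inside parent elements is handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical small tree, invented for this sketch.
root = ET.Element("namelist")
name = ET.SubElement(root, "name", attrib={"enrolled": "yes"})
age = ET.SubElement(name, "age")
age.text = "33"

# ET.indent() (Python 3.9+) rewrites text/tail whitespace in place.
ET.indent(root, space="\t")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```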
