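The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data.
In Python 3.3 the module changed as follows:
The xml.etree.cElementTree module is deprecated.
Warning: xml.etree.ElementTree is not secure against maliciously constructed data, so be careful when using it on untrusted input.
Let's look at how this module parses and creates XML documents.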
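First, we should understand what XML trees and elements are. XML is an inherently hierarchical data format, and the most natural way to represent it is as a tree.
In xml.etree.ElementTree (ET for short), an ElementTree represents the whole XML document as a tree of elements, and this tree has exactly one
root node. Below the root there can be many child nodes, and each child node can in turn have its own attributes and child nodes.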
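To make the tree structure concrete, here is a minimal sketch that walks a small document from the root downward. The XML string and the helper name walk are made up for illustration; only ET.fromstring and iteration over an Element come from the module itself.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up document used only for illustration.
XML_TEXT = "<root><branch id='a'><leaf>1</leaf></branch><branch id='b'/></root>"

def walk(element, depth=0):
    """Return one (depth, tag, attrib) tuple per node, parents before children."""
    rows = [(depth, element.tag, dict(element.attrib))]
    for child in element:  # iterating an Element yields its direct children
        rows.extend(walk(child, depth + 1))
    return rows

root = ET.fromstring(XML_TEXT)
for depth, tag, attrib in walk(root):
    # Indent each tag by its depth so the hierarchy is visible.
    print("  " * depth + tag, attrib)
```

Each node is visited exactly once, and the depth value shows how far below the single root node it sits.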
The XML file we are going to parse today is shown below.
I saved it as c:\test\hongten.xml:
<?xml version="1.0" encoding="UTF-8"?>
<students>
    <student no="2009081097">
        <name>Hongten</name>
        <gender>M</gender>
        <age>20</age>
        <score subject="math">97</score>
        <score subject="chinese">90</score>
    </student>
    <student no="2009081098">
        <name>DuDu</name>
        <gender>W</gender>
        <age>21</age>
        <score subject="math">87</score>
        <score subject="chinese">96</score>
    </student>
    <student no="2009081099">
        <name>Sum</name>
        <gender>M</gender>
        <age>19</age>
        <score subject="math">64</score>
        <score subject="chinese">98</score>
    </student>
</students>
In the XML above, we can see that the root node of the file is: students
We can get the root node like this:
import xml.etree.ElementTree as ET
tree = ET.parse('c:\\test\\hongten.xml')
root = tree.getroot()
tag = root.tag  # students
In the same way, we can get the attributes of the root node:
attrib = root.attrib  # {}
The root node students has no attributes, so the dictionary is empty.
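If you want to try this without creating a file on disk, a sketch along the following lines should work. It assumes the XML is available as a string (shortened here to a single student), and uses ET.fromstring, which returns the root Element directly rather than an ElementTree.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the students document, kept in a string for illustration.
XML_TEXT = """<students>
    <student no="2009081097"><name>Hongten</name></student>
</students>"""

root = ET.fromstring(XML_TEXT)  # returns the root Element, no getroot() needed
tag = root.tag                  # name of the root node
attrib = root.attrib            # dict of the root node's attributes
first = root[0]                 # children can also be reached by index

print(tag, attrib)
print(first.tag, first.attrib)
```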
To get the tag and attributes of each child of the root node students:
for child in root:
    print(child.tag, child.attrib)
Output:
student {'no': '2009081097'}
student {'no': '2009081098'}
student {'no': '2009081099'}
We can also get the value of an attribute, together with the text of a child element:
for student in root.findall('student'):
    no = student.get('no')
    name = student.find('name').text
    print(no, name)
Output:
2009081097 Hongten
2009081098 DuDu
2009081099 Sum
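Each student has two score elements that differ only in their subject attribute, so find('score') alone would return just the first one. One possible way to pick a specific score is the limited XPath predicate syntax that find and findall accept. The sketch below assumes a shortened copy of the document held in a string; the variable name math_scores is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the students document, kept in a string for illustration.
XML_TEXT = """<students>
  <student no="2009081097">
    <name>Hongten</name>
    <score subject="math">97</score>
    <score subject="chinese">90</score>
  </student>
  <student no="2009081098">
    <name>DuDu</name>
    <score subject="math">87</score>
    <score subject="chinese">96</score>
  </student>
</students>"""

root = ET.fromstring(XML_TEXT)
math_scores = {}
for student in root.findall('student'):
    name = student.findtext('name')
    # [@subject='math'] keeps only score elements whose subject attribute is 'math'.
    score = student.find("score[@subject='math']")
    if score is not None:  # find() returns None when nothing matches
        math_scores[name] = int(score.text)

print(math_scores)
```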
Of course, we can also modify the contents of the XML file:
for age in root.iter('age'):
    new_age = int(age.text) + 1
    age.text = str(new_age)
    age.set('updated', 'yes')
tree.write('c:\\test\\hongten_update.xml')
The modified XML file looks like this:
<students>
    <student no="2009081097">
        <name>Hongten</name>
        <gender>M</gender>
        <age updated="yes">21</age>
        <score subject="math">97</score>
        <score subject="chinese">90</score>
    </student>
    <student no="2009081098">
        <name>DuDu</name>
        <gender>W</gender>
        <age updated="yes">22</age>
        <score subject="math">87</score>
        <score subject="chinese">96</score>
    </student>
    <student no="2009081099">
        <name>Sum</name>
        <gender>M</gender>
        <age updated="yes">20</age>
        <score subject="math">64</score>
        <score subject="chinese">98</score>
    </student>
</students>
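Besides changing text and attributes, elements can also be added and removed. The following is a rough sketch under a few assumptions: the document is a compact two-student string, the email tag and the example.com addresses are invented for illustration, and the result is serialized with ET.tostring instead of being written to a file.

```python
import xml.etree.ElementTree as ET

# Compact two-student document, kept in a string for illustration.
XML_TEXT = (
    '<students>'
    '<student no="2009081097"><name>Hongten</name><age>20</age></student>'
    '<student no="2009081099"><name>Sum</name><age>19</age></student>'
    '</students>'
)

root = ET.fromstring(XML_TEXT)

# Add a new child element to every student ('email' is a made-up tag).
for student in root.findall('student'):
    email = ET.SubElement(student, 'email')
    email.text = student.findtext('name').lower() + '@example.com'

# Remove students younger than 20. findall() returns a list,
# so removing from root while looping over it is safe.
for student in root.findall('student'):
    if int(student.findtext('age')) < 20:
        root.remove(student)

# encoding='unicode' makes tostring() return a str instead of bytes.
result = ET.tostring(root, encoding='unicode')
print(result)
```

Looping over root directly while calling root.remove() can skip elements, which is why the sketch iterates over the list returned by findall().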
==================================================================
Below are some wrapper functions I wrote around the xml.etree.ElementTree module
==================================================================
# -*- coding: utf-8 -*-
# python xml.etree.ElementTree

# Author  : Hongten
# Mailto  : hongtenzone@foxmail.com
# Blog    : http://www.cnblogs.com/hongten
# QQ      : 648719819
# Version : 1.0
# Create  : 2013-09-03

import os
import xml.etree.ElementTree as ET

'''
There are many ways to parse XML files in Python.
The one used in this article is xml.etree.ElementTree.
'''
# global variables
# show log
SHOW_LOG = True
# XML file
XML_PATH = None

def get_root(path):
    '''Parse the XML file and return the root element of its tree.
    If the XML file does not exist, print a message and return None.'''
    if os.path.exists(path):
        if SHOW_LOG:
            print('start to parse the file : [{}]'.format(path))
        tree = ET.parse(path)
        return tree.getroot()
    else:
        print('the path [{}] does not exist!'.format(path))

def get_element_tag(element):
    '''Return the tag of the element if the element is not None.'''
    if element is not None:
        if SHOW_LOG:
            print('begin to handle the element : [{}]'.format(element))
        return element.tag
    else:
        print('the element is None!')

def get_element_attrib(element):
    '''Return the attribute dict of the element if the element is not None.'''
    if element is not None:
        if SHOW_LOG:
            print('begin to handle the element : [{}]'.format(element))
        return element.attrib
    else:
        print('the element is None!')

def get_element_text(element):
    '''Return the text of the element.'''
    if element is not None:
        return element.text
    else:
        print('the element is None!')

def get_element_children(element):
    '''Return the list of direct children if the element is not None.'''
    if element is not None:
        if SHOW_LOG:
            print('begin to handle the element : [{}]'.format(element))
        return list(element)
    else:
        print('the element is None!')

def get_elements_tag(elements):
    '''Return the list of tags, one per element.'''
    if elements is not None:
        return [e.tag for e in elements]
    else:
        print('the elements is None!')

def get_elements_attrib(elements):
    '''Return the list of attribute dicts, one per element.'''
    if elements is not None:
        return [e.attrib for e in elements]
    else:
        print('the elements is None!')

def get_elements_text(elements):
    '''Return a dict mapping each element's tag to its text.
    If a tag occurs more than once, only the last text is kept.'''
    if elements is not None:
        text = [e.text for e in elements]
        return dict(zip(get_elements_tag(elements), text))
    else:
        print('the elements is None!')

def init():
    global SHOW_LOG
    SHOW_LOG = True
    global XML_PATH
    XML_PATH = 'c:\\test\\hongten.xml'

def main():
    init()
    # root
    root = get_root(XML_PATH)
    root_tag = get_element_tag(root)
    print(root_tag)
    root_attrib = get_element_attrib(root)
    print(root_attrib)
    # children
    children = get_element_children(root)
    print(children)
    children_tags = get_elements_tag(children)
    print(children_tags)
    children_attribs = get_elements_attrib(children)
    print(children_attribs)

    print('#' * 50)
    # get the tag and text of every child of each second-level element
    for c in children:
        c_children = get_element_children(c)
        dict_text = get_elements_text(c_children)
        print(dict_text)

if __name__ == '__main__':
    main()
Result of running the script:
Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>>
start to parse the file : [c:\test\hongten.xml]
begin to handle the element : [<Element 'students' at 0x0215C5A0>]
students
begin to handle the element : [<Element 'students' at 0x0215C5A0>]
{}
begin to handle the element : [<Element 'students' at 0x0215C5A0>]
[<Element 'student' at 0x0215C600>, <Element 'student' at 0x0215C750>, <Element 'student' at 0x0215C870>]
['student', 'student', 'student']
[{'no': '2009081097'}, {'no': '2009081098'}, {'no': '2009081099'}]
##################################################
begin to handle the element : [<Element 'student' at 0x0215C600>]
{'score': '90', 'gender': 'M', 'name': 'Hongten', 'age': '20'}
begin to handle the element : [<Element 'student' at 0x0215C750>]
{'score': '96', 'gender': 'W', 'name': 'DuDu', 'age': '21'}
begin to handle the element : [<Element 'student' at 0x0215C870>]
{'score': '98', 'gender': 'M', 'name': 'Sum', 'age': '19'}
>>>
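Note that each dictionary in the output holds only one score (the chinese one), even though every student has two score elements. Both share the tag score, so the later entry overwrites the earlier one when the dictionary is built. One possible variation, sketched below, groups the texts of same-tag children into lists. The function name children_text_lists is hypothetical and not part of the wrapper code above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# One student from the document, kept in a string for illustration.
XML_TEXT = """<student no="2009081097">
  <name>Hongten</name>
  <gender>M</gender>
  <age>20</age>
  <score subject="math">97</score>
  <score subject="chinese">90</score>
</student>"""

def children_text_lists(element):
    """Map each child tag to the list of texts of all children with that tag."""
    grouped = defaultdict(list)
    for child in element:
        grouped[child.tag].append(child.text)
    return dict(grouped)

student = ET.fromstring(XML_TEXT)
texts = children_text_lists(student)
print(texts)
```

With this shape, both scores survive in document order, at the cost of every value being a list even for tags that occur once.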
========================================================
More reading, and English is important.
I'm Hongten
E | hongtenzone@foxmail.com B | http://www.cnblogs.com/hongten
========================================================