Python development: XML file operations with xml.etree.ElementTree (the module has security risks when handling untrusted XML data; use with caution)


The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data.

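In Python 3.3 the module changed: it now uses a fast C implementation automatically whenever one is available, and the separate xml.etree.cElementTree module is deprecated (it was removed entirely in Python 3.9).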

Warning: the xml.etree.ElementTree module is not secure against maliciously constructed data, so be careful when using it to parse untrusted input.
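For untrusted input, the usual recommendation is the third-party defusedxml package. Its defusedxml.ElementTree module offers drop-in parse/fromstring functions. If adding a dependency is not an option, one crude mitigation is to refuse any document that declares a DTD, because the classic entity-expansion and external-entity attacks rely on <!DOCTYPE>/<!ENTITY> declarations. This is only a sketch. It assumes the input is an already-decoded str, parse_untrusted is a hypothetical helper name, and the check is not a complete defense.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper: a crude guard, NOT a replacement for defusedxml.
# Rejecting any DTD blocks the usual entity-expansion ("billion laughs")
# and external-entity payloads, at the cost of refusing all DTDs.
_DTD_PATTERN = re.compile(r'<!(DOCTYPE|ENTITY)', re.IGNORECASE)

def parse_untrusted(text):
    """Parse an XML string after refusing documents that declare a DTD."""
    if _DTD_PATTERN.search(text):
        raise ValueError('DTD/entity declarations are not allowed')
    return ET.fromstring(text)

root = parse_untrusted('<students><student no="1"/></students>')
print(root.tag)  # students
```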

Let's look at how this module parses and creates XML documents.

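First, we need to understand what XML trees and elements are. XML is an inherently hierarchical data format, and the most natural way to represent it is as a tree.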

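In xml.etree.ElementTree (abbreviated ET), an ElementTree represents the whole XML document as a tree of elements, and an Element represents a single node in that tree. The tree has exactly one root element. Under the root there can be many child elements, and each child can have its own attributes and children, and so on.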
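The module can also create XML, not only parse it. The sketch below builds the root/child relationship by hand with ET.Element and ET.SubElement. The tag and attribute values are borrowed from the sample file used in this article, and the structure is trimmed to one student for illustration.

```python
import xml.etree.ElementTree as ET

# Build a small tree by hand: one root element with nested children
root = ET.Element('students')                               # the unique root
student = ET.SubElement(root, 'student', no='2009081097')  # child with an attribute
name = ET.SubElement(student, 'name')                       # grandchild
name.text = 'Hongten'

tree = ET.ElementTree(root)        # wraps the whole document
print(tree.getroot() is root)      # True
print(ET.tostring(root, encoding='unicode'))
# <students><student no="2009081097"><name>Hongten</name></student></students>
```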

The XML file we are going to parse has the following content.

I saved it as c:\test\hongten.xml:

<?xml version="1.0" encoding="UTF-8"?>
<students>
    <student no="2009081097">
        <name>Hongten</name>
        <gender>M</gender>
        <age>20</age>
        <score subject="math">97</score>
        <score subject="chinese">90</score>
    </student>
    <student no="2009081098">
        <name>DuDu</name>
        <gender>W</gender>
        <age>21</age>
        <score subject="math">87</score>
        <score subject="chinese">96</score>
    </student>
    <student no="2009081099">
        <name>Sum</name>
        <gender>M</gender>
        <age>19</age>
        <score subject="math">64</score>
        <score subject="chinese">98</score>
    </student>
</students>

In the XML above, the root element is students.
We can get the root element like this:

import xml.etree.ElementTree as ET
tree = ET.parse('c:\\test\\hongten.xml')   # parse the file into an ElementTree
root = tree.getroot()                       # the root Element
tag = root.tag                              # 'students'
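If you do not have the file on disk, ET.fromstring parses XML from a string instead. Unlike ET.parse, it returns the root Element directly rather than an ElementTree, so no getroot() call is needed. The XML_TEXT constant below is a shortened copy of the sample file, assumed here only for illustration.

```python
import xml.etree.ElementTree as ET

# A shortened copy of hongten.xml (one student) for illustration
XML_TEXT = '''<students>
    <student no="2009081097">
        <name>Hongten</name>
        <age>20</age>
    </student>
</students>'''

# fromstring() returns the root Element itself, not an ElementTree
root = ET.fromstring(XML_TEXT)
print(root.tag)     # students
print(root.attrib)  # {}
```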

Likewise, we can get the root element's attributes:

attrib = root.attrib     # {}

Because the root element students has no attributes, the dictionary is empty.
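Since attrib is an ordinary dictionary, it is often more convenient to use Element.get(), which returns None or a supplied default for missing attributes, and Element.set() to add or change one. The one-student document below is assumed for illustration, and the 'class' attribute is a made-up example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<students><student no="2009081097"/></students>')
student = root[0]

print(student.attrib)              # {'no': '2009081097'}
print(student.get('no'))           # 2009081097
print(root.get('no', 'missing'))   # missing (root has no attributes)

# set() adds or overwrites an attribute; keys() lists attribute names
student.set('class', 'A')          # 'class' is a hypothetical attribute
print(sorted(student.keys()))      # ['class', 'no']
```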

To get the tag and attributes of each child of the root element students:

for child in root:
    print(child.tag, child.attrib)

Output:

student {'no': '2009081097'}
student {'no': '2009081098'}
student {'no': '2009081099'}
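Iterating works because an Element behaves like a list of its direct children. That means len() and integer indexing also work, and you can reach nested nodes directly. The two-student document below is a trimmed version of the sample file.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<students>'
    '<student no="2009081097"><name>Hongten</name><age>20</age></student>'
    '<student no="2009081098"><name>DuDu</name><age>21</age></student>'
    '</students>'
)

print(len(root))          # 2 direct children
print(root[1].get('no'))  # 2009081098
print(root[1][0].text)    # DuDu (second student, first child <name>)
```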

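We can also read attribute values, together with the text of child elements:

for student in root.findall('student'):
    no = student.get('no')               # value of the no attribute
    name = student.find('name').text     # text of the <name> child
    print(no, name)

Output: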

2009081097 Hongten
2009081098 DuDu
2009081099 Sum
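The difference between these calls matters. find() returns the first matching element (or None), findall() returns a list, and findtext() returns the matching element's text directly. All three accept the module's limited XPath syntax, including attribute predicates and the './/' descendant search. The sketch below trims the sample file to two students.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<students>'
    '<student no="2009081097"><name>Hongten</name>'
    '<score subject="math">97</score><score subject="chinese">90</score></student>'
    '<student no="2009081098"><name>DuDu</name>'
    '<score subject="math">87</score><score subject="chinese">96</score></student>'
    '</students>'
)

# Select a student by attribute value
dudu = root.find("student[@no='2009081098']")
print(dudu.findtext('name'))                     # DuDu
print(dudu.findtext("score[@subject='math']"))   # 87

# './/' searches all descendants, not only direct children
math_scores = [int(s.text) for s in root.findall(".//score[@subject='math']")]
print(math_scores)                               # [97, 87]
```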

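Of course, we can also modify the XML content and write the result to a new file:

for age in root.iter('age'):            # iter() walks the whole subtree
    new_age = int(age.text) + 1
    age.text = str(new_age)             # element text must be a string
    age.set('updated', 'yes')           # add an attribute
tree.write('c:\\test\\hongten_update.xml')

The modified XML file looks like this: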

<students>
    <student no="2009081097">
        <name>Hongten</name>
        <gender>M</gender>
        <age updated="yes">21</age>
        <score subject="math">97</score>
        <score subject="chinese">90</score>
    </student>
    <student no="2009081098">
        <name>DuDu</name>
        <gender>W</gender>
        <age updated="yes">22</age>
        <score subject="math">87</score>
        <score subject="chinese">96</score>
    </student>
    <student no="2009081099">
        <name>Sum</name>
        <gender>M</gender>
        <age updated="yes">20</age>
        <score subject="math">64</score>
        <score subject="chinese">98</score>
    </student>
</students>
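Notice that the rewritten file has no <?xml ...?> line. By default tree.write() uses the us-ascii encoding and omits the declaration. The sketch below also shows removing elements with Element.remove(). It assumes an in-memory buffer (io.BytesIO) in place of c:\test\hongten_update.xml so that it runs anywhere, and the 70-point threshold is an arbitrary example.

```python
import io
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<students>'
    '<student no="2009081097"><score subject="math">97</score></student>'
    '<student no="2009081099"><score subject="math">64</score></student>'
    '</students>'
)
tree = ET.ElementTree(root)

# findall() returns a list snapshot, so removing children inside the loop is safe
for student in root.findall('student'):
    if int(student.findtext("score[@subject='math']")) < 70:  # example threshold
        root.remove(student)

# Write UTF-8 with an explicit declaration; a file path works the same way
buf = io.BytesIO()
tree.write(buf, encoding='utf-8', xml_declaration=True)
print(buf.getvalue().decode('utf-8'))
```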

==================================================================

Below is a small wrapper I wrote around the xml.etree.ElementTree module

==================================================================

# -*- coding: utf-8 -*-
#python xml.etree.ElementTree

#Author   :   Hongten
#Mailto   :   hongtenzone@foxmail.com
#Blog     :   http://www.cnblogs.com/hongten
#QQ       :   648719819
#Version  :   1.0
#Create   :   2013-09-03

import os
import xml.etree.ElementTree as ET

'''
    There are many ways to parse XML files in Python;
    this article uses xml.etree.ElementTree.
'''
#global var
#show log
SHOW_LOG = True
#XML file
XML_PATH = None

def get_root(path):
    '''Parse the XML file at path and return the root element of its tree.
    If the file does not exist, print a message and return None.'''
    if os.path.exists(path):
        if SHOW_LOG:
            print('start to parse the file : [{}]'.format(path))
        tree = ET.parse(path)
        return tree.getroot()
    else:
        print('the path [{}] does not exist!'.format(path))

def get_element_tag(element):
    '''Return the element's tag if the element is not None.'''
    if element is not None:
        if SHOW_LOG:
            print('begin to handle the element : [{}]'.format(element))
        return element.tag
    else:
        print('the element is None!')

def get_element_attrib(element):
    '''Return the element's attribute dict if the element is not None.'''
    if element is not None:
        if SHOW_LOG:
            print('begin to handle the element : [{}]'.format(element))
        return element.attrib
    else:
        print('the element is None!')

def get_element_text(element):
    '''Return the text of the element if the element is not None.'''
    if element is not None:
        return element.text
    else:
        print('the element is None!')

def get_element_children(element):
    '''Return a list of the element's direct children if it is not None.'''
    if element is not None:
        if SHOW_LOG:
            print('begin to handle the element : [{}]'.format(element))
        return list(element)
    else:
        print('the element is None!')

def get_elements_tag(elements):
    '''Return a list with the tag of each element.'''
    if elements is not None:
        tags = []
        for e in elements:
            tags.append(e.tag)
        return tags
    else:
        print('the elements is None!')

def get_elements_attrib(elements):
    '''Return a list with the attribute dict of each element.'''
    if elements is not None:
        attribs = []
        for a in elements:
            attribs.append(a.attrib)
        return attribs
    else:
        print('the elements is None!')

def get_elements_text(elements):
    '''Return a dict mapping each element's tag to its text.
    Note: repeated tags (e.g. the two <score> elements) overwrite
    each other, so only the last one is kept.'''
    if elements is not None:
        text = []
        for t in elements:
            text.append(t.text)
        return dict(zip(get_elements_tag(elements), text))
    else:
        print('the elements is None!')

def init():
    global SHOW_LOG
    SHOW_LOG = True
    global XML_PATH
    XML_PATH = 'c:\\test\\hongten.xml'

def main():
    init()
    #root
    root = get_root(XML_PATH)
    root_tag = get_element_tag(root)
    print(root_tag)
    root_attrib = get_element_attrib(root)
    print(root_attrib)
    #children
    children = get_element_children(root)
    print(children)
    children_tags = get_elements_tag(children)
    print(children_tags)
    children_attribs = get_elements_attrib(children)
    print(children_attribs)

    print('#' * 50)
    #for each second-level element (student), get the tag and text of each child
    for c in children:
        c_children = get_element_children(c)
        dict_text = get_elements_text(c_children)
        print(dict_text)

if __name__ == '__main__':
    main()

Output:

Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>> 
start to parse the file : [c:\test\hongten.xml]
begin to handle the element : [<Element 'students' at 0x0215C5A0>]
students
begin to handle the element : [<Element 'students' at 0x0215C5A0>]
{}
begin to handle the element : [<Element 'students' at 0x0215C5A0>]
[<Element 'student' at 0x0215C600>, <Element 'student' at 0x0215C750>, <Element 'student' at 0x0215C870>]
['student', 'student', 'student']
[{'no': '2009081097'}, {'no': '2009081098'}, {'no': '2009081099'}]
##################################################
begin to handle the element : [<Element 'student' at 0x0215C600>]
{'score': '90', 'gender': 'M', 'name': 'Hongten', 'age': '20'}
begin to handle the element : [<Element 'student' at 0x0215C750>]
{'score': '96', 'gender': 'W', 'name': 'DuDu', 'age': '21'}
begin to handle the element : [<Element 'student' at 0x0215C870>]
{'score': '98', 'gender': 'M', 'name': 'Sum', 'age': '19'}
>>> 
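In the output above, each student's dict has only one score entry ('90', '96', '98'). Because get_elements_text builds its dict with zip(), the second <score> element overwrites the first, and the math scores are lost. The sketch below keeps repeated tags as lists instead; elements_text_multi is a hypothetical name, not part of the wrapper above. Keying scores by their subject attribute would be another reasonable design, but it only fits this particular file layout.

```python
import xml.etree.ElementTree as ET

def elements_text_multi(elements):
    """Map each tag to its text; tags that repeat map to a list of texts."""
    result = {}
    for e in elements:
        if e.tag in result:
            previous = result[e.tag]
            # Promote a single value to a list the first time a tag repeats
            if not isinstance(previous, list):
                result[e.tag] = [previous]
            result[e.tag].append(e.text)
        else:
            result[e.tag] = e.text
    return result

student = ET.fromstring(
    '<student no="2009081097"><name>Hongten</name><gender>M</gender><age>20</age>'
    '<score subject="math">97</score><score subject="chinese">90</score></student>'
)
print(elements_text_multi(student))
# {'name': 'Hongten', 'gender': 'M', 'age': '20', 'score': ['97', '90']}
```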

 

========================================================

More reading, and English is important.

I'm Hongten

 

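If you found this useful, please consider leaving a tip! Any amount is fine; even one cent is support and encouragement for me. Thank you.
Hongten's blog ranks in the top 100 and has over a thousand followers.
Made by Hongten, so it is bound to be good.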

E | hongtenzone@foxmail.com  B | http://www.cnblogs.com/hongten

========================================================

