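In Python Development [Part 9], in the section on the requests module, we covered how to fetch the content of a web page from a URL.
If the returned content maps onto Python's basic data types (that is, it is JSON), the json module can convert the returned string into those types. Often, though, what comes back from an HTTP request to a URL is XML. Python provides a module for this common message format, as follows:
I. The xml module
XML is a markup format used to exchange data between different languages or programs. An XML file looks like this: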

<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2026</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2026</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>
Let us first request an address and look at what it returns:
# Check whether a QQ account is online
import requests

r = requests.get("http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=1223995142")
result = r.text
print(result)
"""
Output:
<?xml version="1.0" encoding="utf-8"?>
<string xmlns="http://WebXml.com.cn/">Y</string>
"""
As the output shows, the response is XML. To make XML easy to process, Python ships with the built-in xml package, located in Python's Lib directory.
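1. Parsing XML
Method one:
from xml.etree import ElementTree as ET  # Python's Lib directory contains xml/etree; this imports ElementTree.py from it under the alias ET

# Open the file and read the XML content
with open('first.xml', 'r', encoding='utf-8') as f:
    str_xml = f.read()

# Parse the string into an XML object; root refers to the root node of the XML file
root = ET.XML(str_xml)  # calls the XML() function defined in ElementTree.py
print(root, type(root))  # Output: <Element 'data' at 0x00702BA0> <class 'xml.etree.ElementTree.Element'>
# So root is an instance of the Element class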
Method two:
from xml.etree import ElementTree as ET

# ElementTree.parse turns the file directly into an XML object; there is no need to open the file first
tree = ET.parse("first.xml")
print(tree, type(tree))  # Output: <xml.etree.ElementTree.ElementTree object at 0x0067FDB0> <class 'xml.etree.ElementTree.ElementTree'>

# Get the root node of the XML file
root = tree.getroot()  # tree itself is an ElementTree instance
print(root, type(root))  # Output: <Element 'data' at 0x00502BA0> <class 'xml.etree.ElementTree.Element'>
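# Notes:
1. When parsing with ET.parse(), tree is an ElementTree object, whereas root (the root node) is an Element object.
2. Whichever way the XML is parsed, every node in the result (root or child) is an Element object.
3. ET.XML() parses a string. To parse an XML file with it, first read the file to get its content, then parse that content.
4. ET.parse() takes a file name or a file object; it cannot be given a string of XML directly.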
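The two entry points can be compared side by side. This is a minimal sketch that assumes a short inline document (SAMPLE is made up so the example runs without first.xml). io.StringIO wraps the string in a file-like object, which is one way to hand in-memory text to ET.parse().

```python
import io
from xml.etree import ElementTree as ET

# Hypothetical sample document, used here instead of first.xml
SAMPLE = '<data><country name="Panama"><rank>69</rank></country></data>'

# ET.XML() (ET.fromstring() is equivalent) takes a string and returns the root Element
root_from_string = ET.XML(SAMPLE)

# ET.parse() takes a file name or file object and returns an ElementTree
tree = ET.parse(io.StringIO(SAMPLE))
root_from_file = tree.getroot()

print(type(root_from_string).__name__)  # Element
print(type(tree).__name__)              # ElementTree
print(root_from_file.tag)               # data
```

Either way, the object you end up working with is the root Element; the ElementTree wrapper mainly matters when reading from or writing to files.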

class Element:
    """An XML element.

    This class is the reference implementation of the Element interface.

    An element's length is its number of subelements.  That means if you
    want to check if an element is truly empty, you should check BOTH
    its length AND its text attribute.

    The element tag, attribute names, and attribute values can be either
    bytes or strings.

    *tag* is the element name.  *attrib* is an optional dictionary containing
    element attributes. *extra* are additional element attributes given as
    keyword arguments.

    Example form:
        <tag attrib>text<child/>...</tag>tail

    """

    # Tag name of the current node
    tag = None
    """The element's name."""

    # Attributes of the current node
    attrib = None
    """Dictionary of the element's attributes."""

    # Text content of the current node
    text = None
    """
    Text before first subelement. This is either a string or the value None.
    Note that if there is no text, this attribute may be either
    None or the empty string, depending on the parser.

    """

    tail = None
    """
    Text after this element's end tag, but before the next sibling element's
    start tag.  This is either a string or the value None.  Note that if there
    was no text, this attribute may be either None or an empty string,
    depending on the parser.

    """

    def __init__(self, tag, attrib={}, **extra):
        if not isinstance(attrib, dict):
            raise TypeError("attrib must be dict, not %s" % (
                attrib.__class__.__name__,))
        attrib = attrib.copy()
        attrib.update(extra)
        self.tag = tag
        self.attrib = attrib
        self._children = []

    def __repr__(self):
        return "<%s %r at %#x>" % (self.__class__.__name__, self.tag, id(self))

    # Create a new node
    def makeelement(self, tag, attrib):
        """Create a new element with the same type.

        *tag* is a string containing the element name.
        *attrib* is a dictionary containing the element attributes.

        Do not call this method, use the SubElement factory function instead.

        """
        return self.__class__(tag, attrib)

    def copy(self):
        """Return copy of current element.

        This creates a shallow copy. Subelements will be shared with the
        original tree.

        """
        elem = self.makeelement(self.tag, self.attrib)
        elem.text = self.text
        elem.tail = self.tail
        elem[:] = self
        return elem

    def __len__(self):
        return len(self._children)

    def __bool__(self):
        warnings.warn(
            "The behavior of this method will change in future versions.  "
            "Use specific 'len(elem)' or 'elem is not None' test instead.",
            FutureWarning, stacklevel=2
            )
        return len(self._children) != 0 # emulate old behaviour, for now

    def __getitem__(self, index):
        return self._children[index]

    def __setitem__(self, index, element):
        # if isinstance(index, slice):
        #     for elt in element:
        #         assert iselement(elt)
        # else:
        #     assert iselement(element)
        self._children[index] = element

    def __delitem__(self, index):
        del self._children[index]

    # Append one child node to the current node
    def append(self, subelement):
        """Add *subelement* to the end of this element.

        The new element will appear in document order after the last existing
        subelement (or directly after the text, if it's the first subelement),
        but before the end tag for this element.

        """
        self._assert_is_element(subelement)
        self._children.append(subelement)

    # Extend the current node with n child nodes
    def extend(self, elements):
        """Append subelements from a sequence.

        *elements* is a sequence with zero or more elements.

        """
        for element in elements:
            self._assert_is_element(element)
        self._children.extend(elements)

    # Insert a node among the children of the current node at the given position
    def insert(self, index, subelement):
        """Insert *subelement* at position *index*."""
        self._assert_is_element(subelement)
        self._children.insert(index, subelement)

    def _assert_is_element(self, e):
        # Need to refer to the actual Python implementation, not the
        # shadowing C implementation.
        if not isinstance(e, _Element_Py):
            raise TypeError('expected an Element, not %s' % type(e).__name__)

    # Remove a node from the children of the current node
    def remove(self, subelement):
        """Remove matching subelement.

        Unlike the find methods, this method compares elements based on
        identity, NOT ON tag value or contents.  To remove subelements by
        other means, the easiest way is to use a list comprehension to
        select what elements to keep, and then use slice assignment to update
        the parent element.

        ValueError is raised if a matching element could not be found.

        """
        # assert iselement(element)
        self._children.remove(subelement)

    # Get all child nodes (deprecated)
    def getchildren(self):
        """(Deprecated) Return all subelements.

        Elements are returned in document order.

        """
        warnings.warn(
            "This method will be removed in future versions.  "
            "Use 'list(elem)' or iteration over elem instead.",
            DeprecationWarning, stacklevel=2
            )
        return self._children

    # Get the first matching child node
    def find(self, path, namespaces=None):
        """Find first matching element by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return the first matching element, or None if no element was found.

        """
        return ElementPath.find(self, path, namespaces)

    # Get the text content of the first matching child node
    def findtext(self, path, default=None, namespaces=None):
        """Find text for first matching element by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *default* is the value to return if the element was not found,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return text content of first matching element, or default value if
        none was found.  Note that if an element is found having no text
        content, the empty string is returned.

        """
        return ElementPath.findtext(self, path, default, namespaces)

    # Get all matching child nodes
    def findall(self, path, namespaces=None):
        """Find all matching subelements by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Returns list containing all matching elements in document order.

        """
        return ElementPath.findall(self, path, namespaces)

    # Get the matching child nodes of the current node (with a bare tag name,
    # direct children only, not grandchildren) as an iterator usable in a for loop
    def iterfind(self, path, namespaces=None):
        """Find all matching subelements by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return an iterable yielding all matching elements in document order.

        """
        return ElementPath.iterfind(self, path, namespaces)

    # Clear the node
    def clear(self):
        """Reset element.

        This function removes all subelements, clears all attributes, and sets
        the text and tail attributes to None.

        """
        self.attrib.clear()
        self._children = []
        self.text = self.tail = None

    # Get the value of an attribute of the current node
    def get(self, key, default=None):
        """Get element attribute.

        Equivalent to attrib.get, but some implementations may handle this a
        bit more efficiently.  *key* is what attribute to look for, and
        *default* is what to return if the attribute was not found.

        Returns a string containing the attribute value, or the default if
        attribute was not found.

        """
        return self.attrib.get(key, default)

    # Set the value of an attribute of the current node
    def set(self, key, value):
        """Set element attribute.

        Equivalent to attrib[key] = value, but some implementations may handle
        this a bit more efficiently.  *key* is what attribute to set, and
        *value* is the attribute value to set it to.

        """
        self.attrib[key] = value

    # Get the keys of all attributes of the current node
    def keys(self):
        """Get list of attribute names.

        Names are returned in an arbitrary order, just like an ordinary
        Python dict.  Equivalent to attrib.keys()

        """
        return self.attrib.keys()

    # Get all attributes of the current node, each as a (key, value) pair
    def items(self):
        """Get element attributes as a sequence.

        The attributes are returned in arbitrary order.  Equivalent to
        attrib.items().

        Return a list of (name, value) tuples.

        """
        return self.attrib.items()

    # Search the current node and its descendants for all nodes with the given
    # tag name and return an iterator usable in a for loop
    def iter(self, tag=None):
        """Create tree iterator.

        The iterator loops over the element and all subelements in document
        order, returning all elements with a matching tag.

        If the tree structure is modified during iteration, new or removed
        elements may or may not be included.  To get a stable set, use the
        list() function on the iterator, and loop over the resulting list.

        *tag* is what tags to look for (default is to return all elements)

        Return an iterator containing all the matching elements.

        """
        if tag == "*":
            tag = None
        if tag is None or self.tag == tag:
            yield self
        for e in self._children:
            yield from e.iter(tag)

    # compatibility
    def getiterator(self, tag=None):
        # Change for a DeprecationWarning in 1.4
        warnings.warn(
            "This method will be removed in future versions.  "
            "Use 'elem.iter()' or 'list(elem.iter())' instead.",
            PendingDeprecationWarning, stacklevel=2
        )
        return list(self.iter(tag))

    # Iterate over the text content of the current node and all its descendants;
    # returns an iterator usable in a for loop
    def itertext(self):
        """Create text iterator.

        The iterator loops over the element and all subelements in document
        order, returning all inner text.

        """
        tag = self.tag
        if not isinstance(tag, str) and tag is not None:
            return
        if self.text:
            yield self.text
        for e in self:
            yield from e.itertext()
            if e.tail:
                yield e.tail
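The "Example form" in the Element docstring, <tag attrib>text<child/>...</tag>tail, is easier to follow with concrete values. The sketch below parses a small made-up fragment (the <p>/<b> markup is an assumption chosen only for illustration) and prints each of the four data attributes.

```python
from xml.etree import ElementTree as ET

# Hypothetical fragment: text before the child, a child, and text after the child
root = ET.XML('<p class="note">Hello <b>bold</b> world</p>')
child = root.find("b")

print(root.tag)      # p
print(root.attrib)   # {'class': 'note'}
print(root.text)     # 'Hello '  (text before the first subelement)
print(child.text)    # 'bold'
print(child.tail)    # ' world'  (text after </b>, stored on the child)
print(len(root))     # 1         (length is the number of subelements)

# itertext() stitches text and tail values back together in document order
full_text = "".join(root.itertext())
print(full_text)     # Hello bold world
```

Note that the text following </b> belongs to the child's tail, not to the parent's text, which is why the docstring advises checking both length and text before treating an element as empty.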

class ElementTree:
    """An XML element hierarchy.

    This class also provides support for serialization to and from
    standard XML.

    *element* is an optional root element node,
    *file* is an optional file handle or file name of an XML file whose
    contents will be used to initialize the tree with.

    """
    def __init__(self, element=None, file=None):
        # assert element is None or iselement(element)
        self._root = element # first node
        if file:
            self.parse(file)

    def getroot(self):
        """Return root element of this tree."""
        return self._root

    def _setroot(self, element):
        """Replace root element of this tree.

        This will discard the current contents of the tree and replace it
        with the given element.  Use with care!

        """
        # assert iselement(element)
        self._root = element

    def parse(self, source, parser=None):
        """Load external XML document into element tree.

        *source* is a file name or file object, *parser* is an optional parser
        instance that defaults to XMLParser.

        ParseError is raised if the parser fails to parse the document.

        Returns the root element of the given source document.

        """
        close_source = False
        if not hasattr(source, "read"):
            source = open(source, "rb")
            close_source = True
        try:
            if parser is None:
                # If no parser was specified, create a default XMLParser
                parser = XMLParser()
                if hasattr(parser, '_parse_whole'):
                    # The default XMLParser, when it comes from an accelerator,
                    # can define an internal _parse_whole API for efficiency.
                    # It can be used to parse the whole source without feeding
                    # it with chunks.
                    self._root = parser._parse_whole(source)
                    return self._root
            while True:
                data = source.read(65536)
                if not data:
                    break
                parser.feed(data)
            self._root = parser.close()
            return self._root
        finally:
            if close_source:
                source.close()

    def iter(self, tag=None):
        """Create and return tree iterator for the root element.

        The iterator loops over all elements in this tree, in document order.

        *tag* is a string with the tag name to iterate over
        (default is to return all elements).

        """
        # assert self._root is not None
        return self._root.iter(tag)

    # compatibility
    def getiterator(self, tag=None):
        # Change for a DeprecationWarning in 1.4
        warnings.warn(
            "This method will be removed in future versions.  "
            "Use 'tree.iter()' or 'list(tree.iter())' instead.",
            PendingDeprecationWarning, stacklevel=2
        )
        return list(self.iter(tag))

    def find(self, path, namespaces=None):
        """Find first matching element by tag name or path.

        Same as getroot().find(path), which is Element.find()

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return the first matching element, or None if no element was found.

        """
        # assert self._root is not None
        if path[:1] == "/":
            path = "." + path
            warnings.warn(
                "This search is broken in 1.3 and earlier, and will be "
                "fixed in a future version.  If you rely on the current "
                "behaviour, change it to %r" % path,
                FutureWarning, stacklevel=2
                )
        return self._root.find(path, namespaces)

    def findtext(self, path, default=None, namespaces=None):
        """Find first matching element by tag name or path.

        Same as getroot().findtext(path),  which is Element.findtext()

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return the first matching element, or None if no element was found.

        """
        # assert self._root is not None
        if path[:1] == "/":
            path = "." + path
            warnings.warn(
                "This search is broken in 1.3 and earlier, and will be "
                "fixed in a future version.  If you rely on the current "
                "behaviour, change it to %r" % path,
                FutureWarning, stacklevel=2
                )
        return self._root.findtext(path, default, namespaces)

    def findall(self, path, namespaces=None):
        """Find all matching subelements by tag name or path.

        Same as getroot().findall(path), which is Element.findall().

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return list containing all matching elements in document order.

        """
        # assert self._root is not None
        if path[:1] == "/":
            path = "." + path
            warnings.warn(
                "This search is broken in 1.3 and earlier, and will be "
                "fixed in a future version.  If you rely on the current "
                "behaviour, change it to %r" % path,
                FutureWarning, stacklevel=2
                )
        return self._root.findall(path, namespaces)

    def iterfind(self, path, namespaces=None):
        """Find all matching subelements by tag name or path.

        Same as getroot().iterfind(path), which is element.iterfind()

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return an iterable yielding all matching elements in document order.

        """
        # assert self._root is not None
        if path[:1] == "/":
            path = "." + path
            warnings.warn(
                "This search is broken in 1.3 and earlier, and will be "
                "fixed in a future version.  If you rely on the current "
                "behaviour, change it to %r" % path,
                FutureWarning, stacklevel=2
                )
        return self._root.iterfind(path, namespaces)

    def write(self, file_or_filename,
              encoding=None,
              xml_declaration=None,
              default_namespace=None,
              method=None, *,
              short_empty_elements=True):
        """Write element tree to a file as XML.

        Arguments:
          *file_or_filename* -- file name or a file object opened for writing

          *encoding* -- the output encoding (default: US-ASCII)

          *xml_declaration* -- bool indicating if an XML declaration should be
                               added to the output. If None, an XML declaration
                               is added if encoding IS NOT either of:
                               US-ASCII, UTF-8, or Unicode

          *default_namespace* -- sets the default XML namespace (for "xmlns")

          *method* -- either "xml" (default), "html", "text", or "c14n"

          *short_empty_elements* -- controls the formatting of elements
                                    that contain no content. If True (default)
                                    they are emitted as a single self-closed
                                    tag, otherwise they are emitted as a pair
                                    of start/end tags

        """
        if not method:
            method = "xml"
        elif method not in _serialize:
            raise ValueError("unknown method %r" % method)
        if not encoding:
            if method == "c14n":
                encoding = "utf-8"
            else:
                encoding = "us-ascii"
        enc_lower = encoding.lower()
        with _get_writer(file_or_filename, enc_lower) as write:
            if method == "xml" and (xml_declaration or
                    (xml_declaration is None and
                     enc_lower not in ("utf-8", "us-ascii", "unicode"))):
                declared_encoding = encoding
                if enc_lower == "unicode":
                    # Retrieve the default encoding for the xml declaration
                    import locale
                    declared_encoding = locale.getpreferredencoding()
                write("<?xml version='1.0' encoding='%s'?>\n" % (
                    declared_encoding,))
            if method == "text":
                _serialize_text(write, self._root)
            else:
                qnames, namespaces = _namespaces(self._root, default_namespace)
                serialize = _serialize[method]
                serialize(write, self._root, qnames, namespaces,
                          short_empty_elements=short_empty_elements)

    def write_c14n(self, file):
        # lxml.etree compatibility.  use output method instead
        return self.write(file, method="c14n")
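2. Iterating over all nodes at the same level
Every node has the same methods as the root node, and both parsing methods in the previous step yield root (the root node of the XML file), so the methods listed above can be used to work with the XML file.
from xml.etree import ElementTree as ET

############ Parsing method one ############
"""
# Open the file and read the XML content
str_xml = open('first.xml', 'r').read()

# Parse the string into an XML object; root refers to the root node of the XML file
root = ET.XML(str_xml)
"""
############ Parsing method two ############
# Parse the XML file directly
tree = ET.parse("first.xml")

# Get the root node of the XML file
root = tree.getroot()

### Operations
# Top-level tag
print(root.tag)

# Iterate over the second level of the XML document
for child in root:
    # Tag name and attributes of the second-level node
    print(child.tag, child.attrib)
    # Iterate over the third level of the XML document
    for i in child:
        # Tag name and text of the third-level node
        print(i.tag, i.text)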
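3. Iterating only over specified nodes in the XML
from xml.etree import ElementTree as ET

############ Parsing method one ############
"""
# Open the file and read the XML content
str_xml = open('first.xml', 'r').read()

# Parse the string into an XML object; root refers to the root node of the XML file
root = ET.XML(str_xml)
"""
############ Parsing method two ############
# Parse the XML file directly
tree = ET.parse("first.xml")

# Get the root node of the XML file
root = tree.getroot()

### Operations
# Top-level tag
print(root.tag)

# The tag to iterate over is given as 'year', so only nodes tagged year are visited
for node in root.iter('year'):
    # Tag name and text of the node
    print(node.tag, node.text)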
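How deep each lookup method searches is easy to mix up, so here is a minimal sketch contrasting iter(), findall() and find()/findtext(). It assumes a trimmed inline copy of the sample document (SAMPLE) rather than first.xml, so that it runs on its own.

```python
from xml.etree import ElementTree as ET

# Trimmed inline copy of the sample document (an assumption; first.xml is not needed)
SAMPLE = """
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2023</year>
        <neighbor direction="E" name="Austria" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2026</year>
        <neighbor direction="W" name="Costa Rica" />
    </country>
</data>
"""
root = ET.XML(SAMPLE)

# iter() walks the whole subtree, so it reaches <year> two levels down
years = [node.text for node in root.iter("year")]

# findall() with a bare tag name matches direct children only
direct_years = root.findall("year")      # empty: <year> is not a child of <data>
countries = root.findall("country")

# A path expression lets findall() look deeper
neighbor_names = [n.get("name") for n in root.findall("./country/neighbor")]

# find() and findtext() return only the first match
first_rank = root.find("country").findtext("rank")

print(years, len(direct_years), len(countries), neighbor_names, first_rank)
```

In short: use iter() when the depth of the target is unknown or irrelevant, and findall() with a path when the position in the tree matters.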
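4. Modifying node content:
The modified XML must be saved explicitly: the changes are made only in memory, and the XML file is not rewritten until you write it out.
Using parsing method one:
from xml.etree import ElementTree as ET

############ Parsing method one ############

# Open the file and read the XML content
with open('first.xml', 'r', encoding='utf-8') as f:
    str_xml = f.read()

# Parse the string into an XML object; root refers to the root node of the XML file
root = ET.XML(str_xml)

############ Operations ############

# Top-level tag
print(root.tag)

# Loop over all year nodes
for node in root.iter('year'):
    # Increment the content of the year node by one
    new_year = int(node.text) + 1
    node.text = str(new_year)

    # Set attributes
    node.set('name', 'alex')
    node.set('age', '18')
    # Delete an attribute
    del node.attrib['name']

# Iterate over all country nodes under data
for country in root.findall('country'):
    # Get the content of the rank node under each country node
    rank = int(country.find('rank').text)
    if rank > 50:
        # Remove the given country node
        root.remove(country)

############ Save the file ############
tree = ET.ElementTree(root)  # Element has no method for saving to a file, so wrap the root in an ElementTree to save it
tree.write("newnew.xml", encoding='utf-8')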
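Using parsing method two:
from xml.etree import ElementTree as ET

############ Parsing method two ############

# Parse the XML file directly
tree = ET.parse("first.xml")

# Get the root node of the XML file
root = tree.getroot()

############ Operations ############

# Top-level tag
print(root.tag)

# Loop over all year nodes
for node in root.iter('year'):
    # Increment the content of the year node by one
    new_year = int(node.text) + 1
    node.text = str(new_year)

    # Set attributes
    node.set('name', 'alex')
    node.set('age', '18')
    # Delete an attribute
    del node.attrib['name']

# Iterate over all country nodes under data
for country in root.findall('country'):
    # Get the content of the rank node under each country node
    rank = int(country.find('rank').text)
    if rank > 50:
        # Remove the given country node
        root.remove(country)

############ Save the file ############
tree.write("newnew.xml", encoding='utf-8')
Note: the only difference between the two is how the file is saved. With method one the root Element must first be wrapped in an ElementTree; with method two the tree returned by ET.parse() can be written directly.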
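To see that the edits live only in memory until they are serialized, the following sketch applies the same two operations to a small inline document and serializes the result to a string with ET.tostring() instead of writing a file. The two-country SAMPLE string is an assumption made to keep the example short.

```python
from xml.etree import ElementTree as ET

# Hypothetical two-country document, standing in for first.xml
SAMPLE = (
    '<data>'
    '<country name="Singapore"><rank>5</rank><year>2026</year></country>'
    '<country name="Panama"><rank>69</rank><year>2026</year></country>'
    '</data>'
)
root = ET.XML(SAMPLE)

# Increment every year by one
for node in root.iter("year"):
    node.text = str(int(node.text) + 1)

# findall() returns a list, so removing from root while looping over it is safe
for country in root.findall("country"):
    if int(country.findtext("rank")) > 50:
        root.remove(country)

# Serialize the modified tree; SAMPLE itself is untouched
result = ET.tostring(root, encoding="unicode")
print(result)
```

The original string is unchanged after the loop; only the serialized result reflects the new year and the removed country, which mirrors why the file on disk stays the same until tree.write() is called.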
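5. Creating nodes
Creation method one (recommended)
In Python everything is an object. For example, "i = 3" is in essence i = int(3), so i is an instance of the int class. Likewise, in the xml module, as pointed out in <1. Parsing XML> of this chapter, every node is an object of the Element class. So we can create a node directly with son1 = ET.Element('son', {'name': '兒1'}), as follows:
from xml.etree import ElementTree as ET

# Create the root node
root = ET.Element("famliy")

# Create the elder son
son1 = ET.Element('son', {'name': '兒1'})  # the node's tag is son; its attribute is 'name': '兒1'
# Create the younger son
son2 = ET.Element('son', {"name": '兒2'})

# Create two grandsons for the elder son
grandson1 = ET.Element('grandson', {'name': '兒11'})
grandson2 = ET.Element('grandson', {'name': '兒12'})
# Add grandson1 to son1 as a child node.
# Creating a node only builds it in memory; it is not yet connected to any
# other node, so each child must be added to its parent explicitly.
son1.append(grandson1)
son1.append(grandson2)

# Add both sons to the root node
root.append(son1)
root.append(son2)

tree = ET.ElementTree(root)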
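# Note: the outputs shown below come from a tree whose first son has a single
# <age name="兒11"> child with the text 孫子 (the tree built in creation method two),
# not the two <grandson> nodes created above.
############ Writing the file, way 1 (not recommended) ##############
tree.write('oooo.xml')  # No options at all. By default: 1. the output encoding is US-ASCII, so Chinese (and any other non-ASCII) characters are written as numeric character references instead of readable text; 2. nodes without content are written in the short self-closing form, e.g. <neighbor direction="E" />
"""Output:
<famliy><son name="&#20818;1"><age name="&#20818;11">&#23403;&#23376;</age></son><son name="&#20818;2" /></famliy>
"""
############ Writing the file, way 2 (recommended) ##############
tree.write('oooo.xml', encoding='utf-8')  # 1. Chinese characters are written as-is; 2. nodes without content use the short self-closing form
"""Output:
<famliy><son name="兒1"><age name="兒11">孫子</age></son><son name="兒2" /></famliy>
"""
############ Writing the file, way 3 (not recommended) ##############
tree.write('oooo.xml', encoding='utf-8', short_empty_elements=False)  # 1. Chinese characters are written as-is; 2. nodes without content are closed with a separate end tag, e.g. <rank updated="yes"></rank>
"""Output:
<famliy><son name="兒1"><age name="兒11">孫子</age></son><son name="兒2"></son></famliy>
"""
# Note: this closing style has no advantage and only makes the output longer
############ Writing the file, way 4 (recommended) ##############
tree.write("test.xml", encoding="utf-8", xml_declaration=True)  # Chinese characters are written as-is, and the declaration <?xml version='1.0' encoding='utf-8'?> is added before the root node
"""Output:
<?xml version='1.0' encoding='utf-8'?>
<famliy><son name="兒1"><age name="兒11">孫子</age></son><son name="兒2" /></famliy>
"""
# Note: the declaration before the root node states which XML version and which encoding the document uses; parsers read it to decide how to decode the file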
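The effect of each write() option can be checked without touching the disk. This is a sketch under a few assumptions: serialize is a hypothetical helper (not part of ElementTree) that writes into an in-memory io.BytesIO buffer, and the tree is reduced to a single empty son node.

```python
import io
from xml.etree import ElementTree as ET

root = ET.Element("famliy")
ET.SubElement(root, "son", {"name": "兒1"})
tree = ET.ElementTree(root)


def serialize(tree, **write_kwargs):
    """Hypothetical helper: run tree.write() into memory and return the bytes."""
    buffer = io.BytesIO()
    tree.write(buffer, **write_kwargs)
    return buffer.getvalue()


# Default encoding is US-ASCII: non-ASCII characters become character references
default_bytes = serialize(tree)

# UTF-8 keeps the characters themselves
utf8_text = serialize(tree, encoding="utf-8").decode("utf-8")

# xml_declaration=True puts the declaration line in front of the root node
declared_text = serialize(tree, encoding="utf-8", xml_declaration=True).decode("utf-8")

# short_empty_elements=False writes <son ...></son> instead of <son ... />
long_form_text = serialize(tree, encoding="utf-8", short_empty_elements=False).decode("utf-8")

print(default_bytes)
print(utf8_text)
print(declared_text)
print(long_form_text)
```

The character-reference form is still valid XML and parses back to the same text, but it is hard to read, which is the practical reason to pass encoding='utf-8'.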
Creation method two (strongly recommended):
Call the SubElement factory function directly, for example: son1 = ET.SubElement(root, "son", attrib={'name': '兒1'}). The call itself states the parent, so the child node is created under the root node and attached to it in one step.

from xml.etree import ElementTree as ET

# Create the root node
root = ET.Element("famliy")

# Create the elder son: under the root node, create a child node with tag son and attribute 'name': '兒1'
son1 = ET.SubElement(root, "son", attrib={'name': '兒1'})
# Create the younger son
son2 = ET.SubElement(root, "son", attrib={"name": "兒2"})

# Create one grandson under the elder son
grandson1 = ET.SubElement(son1, "age", attrib={'name': '兒11'})
grandson1.text = '孫子'

et = ET.ElementTree(root)  # build the document object
et.write("test.xml", encoding="utf-8", xml_declaration=True, short_empty_elements=False)
Creation method three:
Use the makeelement method of the Element class to create nodes. Note that the method's own docstring (see the Element listing above) advises against calling it and recommends the SubElement factory function instead:

from xml.etree import ElementTree as ET

# Create the root node
root = ET.Element("famliy")

# Create the elder son
# son1 = ET.Element('son', {'name': '兒1'})
son1 = root.makeelement('son', {'name': '兒1'})  # calls the Element method makeelement() to create the node
# Create the younger son
# son2 = ET.Element('son', {"name": '兒2'})
son2 = root.makeelement('son', {"name": '兒2'})

# Create two grandsons for the elder son
# grandson1 = ET.Element('grandson', {'name': '兒11'})
grandson1 = son1.makeelement('grandson', {'name': '兒11'})
# grandson2 = ET.Element('grandson', {'name': '兒12'})
grandson2 = son1.makeelement('grandson', {'name': '兒12'})

son1.append(grandson1)
son1.append(grandson2)

# Add both sons to the root node
root.append(son1)
root.append(son2)

tree = ET.ElementTree(root)
tree.write('oooo.xml', encoding='utf-8', short_empty_elements=False)
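6. Formatting (pretty-printing) XML
Note: formatting is purely cosmetic and does not change the data; the author generally does not recommend it.
XML saved with the native write() has no indentation by default. To get indentation, change how the file is saved:
from xml.etree import ElementTree as ET
from xml.dom import minidom  # used to format the XML


def prettify(elem, path):
    """Convert the node to a string, add indentation, and write it to path."""
    rough_string = ET.tostring(elem, 'utf-8')       # serialize the tree to a byte string
    reparsed = minidom.parseString(rough_string)    # parse that string with minidom
    raw_str = reparsed.toprettyxml(indent="\t")     # produce the indented XML text
    with open(path, 'w', encoding='utf-8') as f:
        f.write(raw_str)


# Create the root node
root = ET.Element("famliy")

# Create the elder son
son1 = root.makeelement('son', {'name': '兒1'})
# Create the younger son
son2 = root.makeelement('son', {"name": '兒2'})

# Create two grandsons for the elder son
grandson1 = son1.makeelement('grandson', {'name': '兒11'})
grandson2 = son1.makeelement('grandson', {'name': '兒12'})

son1.append(grandson1)
son1.append(grandson2)

# Add both sons to the root node
root.append(son1)
root.append(son2)

prettify(root, "test.xml")  # pass the root node to identify the XML that was just built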
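As an alternative to the minidom round trip, newer Python versions offer ET.indent(), which adds the whitespace to the tree in place. This sketch assumes Python 3.9 or later (where ET.indent() was introduced) and builds a small tree of its own so that it runs standalone.

```python
from xml.etree import ElementTree as ET

# Build a small tree (same shape as the family example, one grandson only)
root = ET.Element("famliy")
son1 = ET.SubElement(root, "son", {"name": "兒1"})
ET.SubElement(son1, "grandson", {"name": "兒11"})
ET.SubElement(root, "son", {"name": "兒2"})

# ET.indent() (Python 3.9+) inserts newlines and indentation into the
# text/tail attributes of the nodes, modifying the tree in place
ET.indent(root, space="\t")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Because the tree itself is modified, a later tree.write() call produces indented output directly, with no second parse of the document.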
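7. Namespaces
Namespaces exist to keep identically named nodes from different XML documents apart.
Naming conflicts
In XML, element names are defined by the developer, so a naming conflict occurs when two different documents use the same element name.

One XML document carries this information:
<table>
    <tr>
        <td>Apples</td>
        <td>Bananas</td>
    </tr>
</table>

Another XML document carries this information:
<table>
    <name>African Coffee Table</name>
    <width>80</width>
    <length>120</length>
</table>

If these two XML documents are used together, a naming conflict arises because both define a <table> element. An XML parser has no way to tell how to handle such a conflict.

Using prefixes to avoid naming conflicts
The first XML document is changed to:
<h:table>
    <h:tr>
        <h:td>Apples</h:td>
        <h:td>Bananas</h:td>
    </h:tr>
</h:table>

The other XML document is changed to:
<f:table>
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
</f:table>

The naming conflict is now gone, because the two documents use different names for their <table> elements (<h:table> and <f:table>).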
So how is this done in a program?
from xml.etree import ElementTree as ET

ET.register_namespace('com', "http://www.company.com")  # register a prefix: "com" stands for "http://www.company.com"

# build a tree structure
root = ET.Element("{http://www.company.com}STUFF")
# When creating a node, the {http://www.company.com} part marks the namespace to use;
# when the tree is written to a file it is automatically rendered as the com prefix
body = ET.SubElement(root, "{http://www.company.com}MORE_STUFF", attrib={"{http://www.company.com}hhh": "123"})
body.text = "STUFF EVERYWHERE!"

# wrap it in an ElementTree instance, and save as XML
tree = ET.ElementTree(root)
tree.write("page.xml", xml_declaration=True, encoding='utf-8', method="xml")
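Reading a namespaced document back works with the same {uri}tag form, or with a prefix mapping passed to find(). The sketch below serializes to a string rather than to page.xml, and the prefix "c" used for searching is an arbitrary choice made here to show that only the URI has to match, not the prefix.

```python
from xml.etree import ElementTree as ET

NS = "http://www.company.com"
ET.register_namespace("com", NS)

# Build the same structure as above
root = ET.Element(f"{{{NS}}}STUFF")
body = ET.SubElement(root, f"{{{NS}}}MORE_STUFF", attrib={f"{{{NS}}}hhh": "123"})
body.text = "STUFF EVERYWHERE!"

# Serialize to a string instead of a file; the registered prefix appears in the output
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Parse it back and search with a prefix mapping (the prefix "c" is arbitrary)
parsed = ET.fromstring(xml_text)
found = parsed.find("c:MORE_STUFF", namespaces={"c": NS})

print(parsed.tag)                    # {http://www.company.com}STUFF
print(found.text)                    # STUFF EVERYWHERE!
print(found.get(f"{{{NS}}}hhh"))     # 123
```

After parsing, tags and attribute names always carry the full {uri} form, regardless of which prefix the file used.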
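II. Combining requests and xml:
Checking whether a QQ account is online:
import requests

r = requests.get("http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=1223995142")
result = r.text

# XML module
from xml.etree import ElementTree as ET

# Parse the XML content
# ET.XML() takes one argument, a string, and turns it into an Element object
node = ET.XML(result)
# Note: use ET.XML() here rather than ET.parse(). result is a string obtained
# through requests.get(), and ET.XML() is the function that parses a string;
# ET.parse() expects a file name or file object.

# Read the content
if node.text == "Y":  # text is the value held by the XML node
    print("online")
elif node.text == "N":
    print("offline")
elif node.text == "V":
    print("invisible")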
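The parsing half of this example can be tried without a network connection. The sketch below feeds in the sample response shown earlier in this article; qq_status and STATUS are hypothetical names introduced here, and only the three codes used above (Y, N, V) are mapped, with anything else reported as unknown.

```python
from xml.etree import ElementTree as ET

# Status codes as used in the example above; any other code is reported as unknown
STATUS = {"Y": "online", "N": "offline", "V": "invisible"}


def qq_status(xml_text):
    """Hypothetical helper: map the text of the response's root node to a status."""
    node = ET.XML(xml_text)
    return STATUS.get(node.text, "unknown")


# Sample response copied from the earlier request example
SAMPLE_RESPONSE = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<string xmlns="http://WebXml.com.cn/">Y</string>'
)

status = qq_status(SAMPLE_RESPONSE)
print(status)

# The response declares a default namespace, so the parsed tag carries it
root_tag = ET.XML(SAMPLE_RESPONSE).tag
print(root_tag)  # {http://WebXml.com.cn/}string
```

Keeping the parsing in its own function means the requests call can be swapped for any other source of the XML text, such as a saved response.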