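Some time ago, while working on XML parsing, I went through a lot of tutorials online, but none of them got me to a working solution for what I needed.
So I went and read the source myself, and found this snippet in `xml/sax/__init__.py`: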
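```python
def parse(source, handler, errorHandler=ErrorHandler()):
    parser = make_parser()
    parser.setContentHandler(handler)
    parser.setErrorHandler(errorHandler)
    parser.parse(source)
```

As you can see, running an XML parse this way needs at least two arguments: `source`, the document to parse (a file path or an open file object), and `handler`, an instance of a handler class. `errorHandler` is optional and defaults to a plain `ErrorHandler()`.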
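To make those steps concrete, here is a minimal Python 3 sketch that performs the same calls by hand: `make_parser()`, `setContentHandler()`, `setErrorHandler()` and `parse()`. The names `TagCounter` and `parse_manually` are hypothetical, chosen only for illustration. It also shows that `source` does not have to be a path (an in-memory `io.BytesIO` works), and that with the default `ErrorHandler` a malformed document raises `SAXParseException`.

```python
import io
from xml import sax


class TagCounter(sax.ContentHandler):
    """Hypothetical handler: counts every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def parse_manually(source, handler, error_handler=None):
    """Hypothetical helper that mirrors the body of xml.sax.parse."""
    parser = sax.make_parser()          # pick an available SAX driver (expat by default)
    parser.setContentHandler(handler)   # startElement/endElement/characters go here
    parser.setErrorHandler(error_handler or sax.ErrorHandler())  # default raises on errors
    parser.parse(source)                # a file path or a file-like object


counter = TagCounter()
parse_manually(io.BytesIO(b"<a><b/><b/></a>"), counter)
print(counter.count)  # 3: <a> plus two empty <b/> elements

# Mismatched tags: the default ErrorHandler.fatalError re-raises the exception.
try:
    parse_manually(io.BytesIO(b"<a><b></a>"), TagCounter())
except sax.SAXParseException as exc:
    print("parse error:", exc.getMessage())
```

Building the parser yourself like this is mainly useful when you want to pass a custom error handler or set parser features before parsing; otherwise `sax.parse` does the same thing in one call.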
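Let's walk through an example (to be upfront: this example comes from the web, I did not write it myself). The XML file used, saved as `Test2.xml`, is: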
```xml
<bookstore>
  <book category="CHILDREN">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
```
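Before the handler code, it may help to see the order in which SAX fires its callbacks for a document shaped like this one. The sketch below uses a hypothetical `EventRecorder` handler and a trimmed copy of the sample (an assumption for brevity: one `<book>` with only a `<title>`), parsed from a byte string with `xml.sax.parseString` instead of from a file. Each opening tag triggers `startElement(name, attrs)` and each closing tag triggers `endElement(name)`; the parser supplies these arguments itself.

```python
from xml import sax

# Trimmed-down copy of the sample document (assumption: one book, title only).
SAMPLE = b"""<bookstore>
  <book category="CHILDREN">
    <title>Harry Potter</title>
  </book>
</bookstore>"""


class EventRecorder(sax.ContentHandler):
    """Hypothetical handler that logs start/end events instead of printing them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # attrs behaves like a read-only mapping of attribute name -> value
        self.events.append(("start", name, dict(attrs)))

    def endElement(self, name):
        self.events.append(("end", name))


recorder = EventRecorder()
sax.parseString(SAMPLE, recorder)
for event in recorder.events:
    print(event)
```

The events arrive strictly in document order, nested like the tags themselves, which is why a SAX handler has to keep its own state (such as the current tag) between calls.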
Here is the handler code; the comments explain what each step does:
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time   : 2018/5/30 22:43
# @Author : Adong_Chen
from xml import sax


class TestHandler(sax.ContentHandler):  # our own handler class, inheriting from sax.ContentHandler
    def __init__(self):
        sax.ContentHandler.__init__(self)  # initialize the parent class too, not just the subclass
        self._content = ""
        self._tag = ""

    def startElement(self, name, attrs):  # called at each opening <tag>; the parser supplies name and attrs (this overrides the parent method)
        self._tag = name
        self._content = ""  # start a fresh text buffer for this element
        if name == "bookstore":
            print("=========BOOKSTORE=========")
        if self._tag == "book":
            print("BOOK: " + attrs["category"])
            print("--------------------------")

    def endElement(self, name):  # called at each closing </tag>; the parser supplies name (override)
        # print("endElement")
        if name == "bookstore":
            print("=========BOOKSTORE=========")
        elif name == "title":
            print("Title: " + self._content)
        elif name == "author":
            print("Author: " + self._content)
        elif name == "year":
            print("Year: " + self._content)
        elif name == "price":
            print("Price: " + self._content)
        else:
            pass

    def characters(self, content):  # receives the text inside a tag; may be called several times per element, so append
        self._content += content


if __name__ == "__main__":
    handler = TestHandler()  # instantiate the custom handler class
    sax.parse("Test2.xml", handler)  # parse the XML file
```
Running it produces the following output:
```
=========BOOKSTORE=========
BOOK: CHILDREN
--------------------------
Title: Harry Potter
Author: J K. Rowling
Year: 2005
Price: 29.99
BOOK: WEB
--------------------------
Title: Learning XML
Author: Erik T. Ray
Year: 2003
Price: 39.95
=========BOOKSTORE=========
```
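One caveat with `characters()`: SAX parsers are allowed to deliver the text of a single element in several chunks (for example around an entity reference such as `&amp;`), so overwriting a variable on every call can lose text. It is safer to reset a buffer in `startElement` and append to it in `characters`. The sketch below applies that pattern and, instead of printing, collects each `<book>` into a dict, which is often handier for further processing. `BookCollector`, `books` and `FIELDS` are hypothetical names; this is one possible design, not the only way to do it.

```python
from xml import sax

# Same data as the sample Test2.xml, inlined so the example is self-contained.
SAMPLE = b"""<bookstore>
  <book category="CHILDREN">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""


class BookCollector(sax.ContentHandler):
    """Hypothetical handler: turns each <book> element into a dict."""

    FIELDS = ("title", "author", "year", "price")

    def __init__(self):
        super().__init__()
        self.books = []       # finished records
        self._current = None  # record being built for the currently open <book>
        self._text = []       # text chunks for the current element

    def startElement(self, name, attrs):
        self._text = []  # fresh buffer for every element
        if name == "book":
            self._current = {"category": attrs.get("category")}

    def characters(self, content):
        # May be called several times per element, so append, never overwrite.
        self._text.append(content)

    def endElement(self, name):
        if name in self.FIELDS and self._current is not None:
            self._current[name] = "".join(self._text).strip()
        elif name == "book":
            self.books.append(self._current)
            self._current = None


collector = BookCollector()
sax.parseString(SAMPLE, collector)
for book in collector.books:
    print(book)
```

Because every value arrives as a string, converting `year` to `int` or `price` to `float` (or `decimal.Decimal`) is left to the caller.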
