python處理xml的常用包(lib.xml、ElementTree、lxml)


python處理xml的三種常見機制

  • dom(隨機訪問機制)
  • sax(Simple APIs for XML,事件驅動機制)
  • etree

python處理xml的三種包

  • 標准庫中的xml
  • Fredrik Lundh 的 ElementTree
  • Stefan Behnel 的 lxml

對以上三種包的介紹和對比

摘錄自:http://infohost.nmt.edu/tcc/help/pubs/pylxml/web/index.html

With the continued growth of both Python and XML, there is a plethora(過剩/過多) of packages out there that help you read, generate, and modify XML files from Python scripts. Compared to most of them, the lxml package has two big advantages:

  • Performance. Reading and writing even fairly large XML files takes an almost imperceptible(小得無法察覺的) amount of time.
  • Ease of programming. The lxml package is based on ElementTree, which Fredrik Lundh invented to simplify and streamline XML processing.

lxml is similar in many ways to two other, earlier packages:

  • Fredrik Lundh continues to maintain his original version of ElementTree.
  • xml.etree.ElementTree is now an official part of the Python library. There is a C-language version called cElementTree which may be even faster than lxml for some applications.

However, the author prefers lxml for providing a number of additional features that make life easier. In particular, support for XPath makes it considerably easier to manage more complex XML structures.

標准庫中的xml包

摘錄自:http://docs.python.org/library/xml.html

The XML handling submodules are:

  • xml.etree.ElementTree: the ElementTree API, a simple and lightweight XML processor
  • xml.dom: the DOM API definition
  • xml.dom.minidom: a minimal DOM implementation
  • xml.dom.pulldom: support for building partial DOM trees
  • xml.sax: SAX2 base classes and convenience functions
  • xml.parsers.expat: the Expat parser binding

ElementTree包

PYPI的介紹:https://pypi.python.org/pypi/elementtree/

The Element type is a flexible container object, designed to store hierarchical data structures in memory. Element structures can be converted to and from XML.

其作者對lxml的推介:http://effbot.org/zone/element-index.htm
There’s also an independent implementation, lxml.etree, based on the well-known libxml2/libxslt libraries. This adds full support for XSLT, XPath, and more.

IBM文檔庫的介紹文章:XML 問題: 使用 ElementTree,以 Python 語言處理 XML

lxml介紹

摘錄自:http://lxml.de/

lxml - XML and HTML with Python

lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language.

The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible but superior to the well-known ElementTree API. The latest release works with all CPython versions from 2.6 to 3.6.

總結

一般情況下使用lxml可獲得高效率和易用性。

擴展閱讀

Python的XML處理方案:
Python XML解析
JAVA的xml方案:
java解析xml的幾種方式
lxml教程:
Python XML processing with lxml
命名空間相關:
Parsing XML with lxml and elementtree
XML 命名空間


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM