Contents
- Preface
- The SAX module
- Reading XML files with SAX
- Common functions
- The SAX parser
- SAX event handlers
- Complete example: parsing an XML file with SAX
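Preface
SAX and DOM are both techniques for processing XML files, but they work differently. SAX is event-driven: the parser reads the XML file sequentially and fires an event for each construct it meets (a start tag, character data, an end tag), and the application parses the file by handling those events. DOM instead loads the entire XML file into memory as a tree and parses it by traversing that tree. Each approach has trade-offs: SAX needs little memory but only offers a single forward pass, while DOM allows random access at the cost of holding the whole document in memory. Which one to use depends on the task.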
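To make the contrast concrete, here is a minimal sketch that counts the same elements both ways. The sample document, the `ItemCounter` class and the variable names are made up for illustration; only `xml.sax` and `xml.dom.minidom` come from the standard library.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document used only for this comparison.
XML_TEXT = "<items><item>a</item><item>b</item></items>"


class ItemCounter(xml.sax.ContentHandler):
    """SAX side: react to each start tag as it streams past."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1


# SAX: events are delivered to the handler while the input is being read.
counter = ItemCounter()
xml.sax.parseString(XML_TEXT.encode("utf-8"), counter)
sax_count = counter.count

# DOM: the whole document becomes a tree that can be queried afterwards.
dom = minidom.parseString(XML_TEXT)
dom_count = len(dom.getElementsByTagName("item"))

print(sax_count, dom_count)
```

Both counts agree, but the SAX version never holds more than the current event, while the DOM version keeps the full tree available for further queries.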
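The SAX module
The SAX module parses XML documents with an event-driven model: it works through the elements and attributes of the document one by one and fires the corresponding event for each. Compared with the DOM model, SAX is more lightweight, which makes it a good fit for large XML documents.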
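Reading XML files with SAX
xml.sax is a package in the Python standard library for parsing XML documents. It provides an event-based API: as the document is parsed, events are fired, and the application parses and processes the document by responding to them.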
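A small sketch of what "event-based" means in practice: the handler below simply records every event it receives. The `EventLogger` class and the sample XML are illustrative assumptions, not part of the original article.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each event the parser sends, in arrival order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        if content.strip():  # skip whitespace-only chunks
            self.events.append(("text", content))


logger = EventLogger()
xml.sax.parseString(b"<note><to>Ann</to></note>", logger)

for event in logger.events:
    print(event)
```

The recorded list follows the document from top to bottom: each start tag, then its text, then the matching end tag.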
Common functions
make_parser
Creates and returns a SAX parser as an XMLReader object.
def make_parser(parser_list=()):
    """Creates and returns a SAX parser.

    Creates the first parser it is able to instantiate of the ones
    given in the iterable created by chaining parser_list and
    default_parser_list.  The iterables must contain the names of Python
    modules containing both a SAX parser and a create_parser function."""
In other words, the module names in parser_list are tried first, followed by those in default_parser_list, and the first parser that can be instantiated is returned.
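As a quick illustration, the sketch below calls `make_parser` with no arguments, so only the default parser list is used, and then drives the returned reader by hand. The `RootName` handler and the in-memory sample document are assumptions made for this example.

```python
import io
import xml.sax
from xml.sax.xmlreader import XMLReader

# No arguments: the parser comes from the module's default parser list.
parser = xml.sax.make_parser()
is_reader = isinstance(parser, XMLReader)


class RootName(xml.sax.ContentHandler):
    """Remembers the name of the first element seen (the root)."""

    def __init__(self):
        super().__init__()
        self.root = None

    def startElement(self, name, attrs):
        if self.root is None:
            self.root = name


handler = RootName()
parser.setContentHandler(handler)
# A file-like object stands in for a real file here.
parser.parse(io.BytesIO(b"<storehouse><goods/></storehouse>"))

print(is_reader, handler.root)
```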
parse
Creates a SAX parser and uses it to parse an XML document.
def parse(source, handler, errorHandler=ErrorHandler()):
    parser = make_parser()
    parser.setContentHandler(handler)
    parser.setErrorHandler(errorHandler)
    parser.parse(source)
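Since `parse` is only a shortcut for the steps in its body, calling it should give the same result as creating and configuring the parser by hand. The sketch below checks that on a tiny in-memory document; `TextCollector` and `DATA` are illustrative names, and a `BytesIO` object is used in place of a file name.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Gathers all character data in the document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)

    def text(self):
        return "".join(self.chunks)


DATA = b"<price>8</price>"

# One call: parse() builds the parser and attaches the handler for us.
short = TextCollector()
xml.sax.parse(io.BytesIO(DATA), short)

# The same thing spelled out step by step (default error handler kept).
manual = TextCollector()
parser = xml.sax.make_parser()
parser.setContentHandler(manual)
parser.parse(io.BytesIO(DATA))

print(short.text(), manual.text())
```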
parseString
Similar to parse, but parses XML from the string supplied in the string argument.
def parseString(string, handler, errorHandler=ErrorHandler()):
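A minimal sketch of `parseString` on in-memory data. The text is encoded to UTF-8 bytes first, which every Python 3 version accepts; the `NameCollector` handler and the sample data are assumptions for this example.

```python
import xml.sax


class NameCollector(xml.sax.ContentHandler):
    """Collects the text of every <name> element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "name":
            self._in_name = True
            self._buf = []

    def characters(self, content):
        if self._in_name:
            self._buf.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == "name":
            self.names.append("".join(self._buf))
            self._in_name = False


xml_bytes = "<goods><name>鲫鱼</name><name>猕猴桃</name></goods>".encode("utf-8")
handler = NameCollector()
xml.sax.parseString(xml_bytes, handler)

print(handler.names)
```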
SAXException
Encapsulates an error or warning raised during XML processing.
class SAXException(Exception):
    """Encapsulate an XML error or warning. This class can contain
    basic error or warning information from either the XML parser or
    the application: you can subclass it to provide additional
    functionality, or to add localization. Note that although you will
    receive a SAXException as the argument to the handlers in the
    ErrorHandler interface, you are not actually required to raise
    the exception; instead, you can simply read the information in
    it."""
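In practice the exception most often seen is `SAXParseException`, a subclass of `SAXException` that also carries position information. The sketch below assumes the default error handler, which raises on fatal errors; `try_parse` is a hypothetical helper written for this example.

```python
import xml.sax
from xml.sax import SAXException, SAXParseException


def try_parse(data):
    """Return "ok", or a short description of the parse error."""
    try:
        xml.sax.parseString(data, xml.sax.ContentHandler())
    except SAXParseException as exc:
        # The exception carries the position and the parser's message.
        return f"line {exc.getLineNumber()}: {exc.getMessage()}"
    return "ok"


print(try_parse(b"<a></a>"))     # well-formed input
print(try_parse(b"<a><b></a>"))  # mismatched tags
```

The exact wording of the message comes from the underlying parser, so it is better to rely on the line and column numbers than on the text.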
The SAX parser
Its main job is to send events to the event handler.
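One way to see the parser in this role is to feed it the document in pieces. This sketch assumes the parser returned by `make_parser` supports the incremental `feed`/`close` interface, as the expat-based reader shipped with CPython does; the `Recorder` handler and the chunk boundaries are made up for illustration.

```python
import xml.sax


class Recorder(xml.sax.ContentHandler):
    """Keeps a list of the start/end events sent by the parser."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append("start " + name)

    def endElement(self, name):
        self.events.append("end " + name)


handler = Recorder()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# The document arrives in three pieces; the parser turns them into events.
for chunk in (b"<a><b>", b"</b>", b"</a>"):
    parser.feed(chunk)
parser.close()  # signals the end of the input

print(handler.events)
```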
SAX event handlers
Event handlers are implemented by subclassing the ContentHandler class, the main callback interface in SAX. The order of events in this interface mirrors the order of the information in the document.
class ContentHandler:
    """Interface for receiving logical document content events.

    This is the main callback interface in SAX, and the one most
    important to applications. The order of events in this interface
    mirrors the order of the information in the document."""

    def __init__(self):
        self._locator = None  # the document locator, set by the parser

    def setDocumentLocator(self, locator):
        """Called by the parser to give the application a locator for
        locating the origin of document events.

        SAX parsers are strongly encouraged (though not absolutely
        required) to supply a locator: if it does so, it must supply
        the locator to the application by invoking this method before
        invoking any of the other methods in the DocumentHandler
        interface.

        The locator allows the application to determine the end
        position of any document-related event, even if the parser is
        not reporting an error. Typically, the application will use
        this information for reporting its own errors (such as
        character content that does not match an application's
        business rules). The information returned by the locator is
        probably not sufficient for use with a search engine.

        Note that the locator will return correct information only
        during the invocation of the events in this interface. The
        application should not attempt to use it at any other time."""
        self._locator = locator

    def startDocument(self):
        """Receive notification of the beginning of a document.

        The SAX parser will invoke this method only once, before any
        other methods in this interface or in DTDHandler (except for
        setDocumentLocator)."""

    def endDocument(self):
        """Receive notification of the end of a document.

        The SAX parser will invoke this method only once, and it will
        be the last method invoked during the parse. The parser shall
        not invoke this method until it has either abandoned parsing
        (because of an unrecoverable error) or reached the end of
        input."""

    def startPrefixMapping(self, prefix, uri):
        """Begin the scope of a prefix-URI Namespace mapping.

        The information from this event is not necessary for normal
        Namespace processing: the SAX XML reader will automatically
        replace prefixes for element and attribute names when the
        http://xml.org/sax/features/namespaces feature is true (the
        default).

        There are cases, however, when applications need to use
        prefixes in character data or in attribute values, where they
        cannot safely be expanded automatically; the
        start/endPrefixMapping event supplies the information to the
        application to expand prefixes in those contexts itself, if
        necessary.

        Note that start/endPrefixMapping events are not guaranteed to
        be properly nested relative to each-other: all
        startPrefixMapping events will occur before the corresponding
        startElement event, and all endPrefixMapping events will occur
        after the corresponding endElement event, but their order is
        not guaranteed."""

    def endPrefixMapping(self, prefix):
        """End the scope of a prefix-URI mapping.

        See startPrefixMapping for details. This event will always
        occur after the corresponding endElement event, but the order
        of endPrefixMapping events is not otherwise guaranteed."""

    def startElement(self, name, attrs):
        """Signals the start of an element in non-namespace mode.

        The name parameter contains the raw XML 1.0 name of the
        element type as a string and the attrs parameter holds an
        instance of the Attributes class containing the attributes of
        the element."""

    def endElement(self, name):
        """Signals the end of an element in non-namespace mode.

        The name parameter contains the name of the element type, just
        as with the startElement event."""

    def startElementNS(self, name, qname, attrs):
        """Signals the start of an element in namespace mode.

        The name parameter contains the name of the element type as a
        (uri, localname) tuple, the qname parameter the raw XML 1.0
        name used in the source document, and the attrs parameter
        holds an instance of the Attributes class containing the
        attributes of the element.

        The uri part of the name tuple is None for elements which have
        no namespace."""

    def endElementNS(self, name, qname):
        """Signals the end of an element in namespace mode.

        The name parameter contains the name of the element type, just
        as with the startElementNS event."""

    def characters(self, content):
        """Receive notification of character data.

        The Parser will call this method to report each chunk of
        character data. SAX parsers may return all contiguous
        character data in a single chunk, or they may split it into
        several chunks; however, all of the characters in any single
        event must come from the same external entity so that the
        Locator provides useful information."""

    def ignorableWhitespace(self, whitespace):
        """Receive notification of ignorable whitespace in element content.

        Validating Parsers must use this method to report each chunk
        of ignorable whitespace (see the W3C XML 1.0 recommendation,
        section 2.10): non-validating parsers may also use this method
        if they are capable of parsing and using content models.

        SAX parsers may return all contiguous whitespace in a single
        chunk, or they may split it into several chunks; however, all
        of the characters in any single event must come from the same
        external entity, so that the Locator provides useful
        information."""

    def processingInstruction(self, target, data):
        """Receive notification of a processing instruction.

        The Parser will invoke this method once for each processing
        instruction found: note that processing instructions may occur
        before or after the main document element.

        A SAX parser should never report an XML declaration (XML 1.0,
        section 2.8) or a text declaration (XML 1.0, section 4.3.1)
        using this method."""

    def skippedEntity(self, name):
        """Receive notification of a skipped entity.

        The Parser will invoke this method once for each entity
        skipped. Non-validating processors may skip entities if they
        have not seen the declarations (because, for example, the
        entity was declared in an external DTD subset). All processors
        may skip external entities, depending on the values of the
        http://xml.org/sax/features/external-general-entities and the
        http://xml.org/sax/features/external-parameter-entities
        properties."""
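The locator described in setDocumentLocator can be put to use from a subclass. The sketch below assumes the parser supplies a locator (the default expat-based reader does) and records the line on which each element starts; `LineTracker` and the sample document are illustrative only.

```python
import xml.sax


class LineTracker(xml.sax.ContentHandler):
    """Maps each element name to the line number reported for its start tag."""

    def __init__(self):
        super().__init__()
        self.lines = {}

    def startElement(self, name, attrs):
        # self._locator is filled in by the inherited setDocumentLocator.
        # It is only meaningful while an event is being delivered.
        if self._locator is not None:
            self.lines[name] = self._locator.getLineNumber()


doc = (
    b"<storehouse>\n"
    b"  <goods>\n"
    b"    <name>x</name>\n"
    b"  </goods>\n"
    b"</storehouse>"
)
tracker = LineTracker()
xml.sax.parseString(doc, tracker)

print(tracker.lines)
```

An application could include these line numbers in its own error messages, for example when an element's text violates a business rule.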
Complete example: parsing an XML file with SAX
SAX_parse_XML.py
import xml.sax

get_record = []  # collects the text values read from the XML document


class GetStorehouse(xml.sax.ContentHandler):  # event handler
    def __init__(self):
        super().__init__()
        self.CurrentData = ""  # name of the element currently being read
        self.category = ""     # category attribute of the current <goods> element
        self.title = ""        # second-level product category
        self.name = ""
        self.amount = ""
        self.price = ""

    def startElement(self, label, attributes):  # called for every start tag
        self.CurrentData = label  # label is the tag name passed in by the parser
        if label == "goods":
            self.category = attributes["category"]
            # a new record starts: clear the text left over from the previous one
            self.title = self.name = self.amount = self.price = ""

    def endElement(self, label):  # called for every end tag
        if label == "title":
            get_record.append(self.title)
        elif label == "name":
            get_record.append(self.name)
        elif label == "amount":
            get_record.append(self.amount)
        elif label == "price":
            get_record.append(self.price)
        self.CurrentData = ""  # whitespace between elements is now ignored

    def characters(self, content):  # may be called more than once per element
        if self.CurrentData == "title":
            self.title += content
        elif self.CurrentData == "name":
            self.name += content
        elif self.CurrentData == "amount":
            self.amount += content
        elif self.CurrentData == "price":
            self.price += content


parser = xml.sax.make_parser()  # create the parser (an XMLReader object)
parser.setFeature(xml.sax.handler.feature_namespaces, 0)  # turn off namespace processing
Handler = GetStorehouse()
parser.setContentHandler(Handler)
parser.parse("storehouse.xml")
print(get_record)
Output:
['淡水鱼', '鲫鱼', '18', '8', '温带水果', '猕猴桃', '10', '10']
storehouse.xml
<storehouse>
  <goods category="fish">
    <title>淡水鱼</title>
    <name>鲫鱼</name>
    <amount>18</amount>
    <price>8</price>
  </goods>
  <goods category="fruit">
    <title>温带水果</title>
    <name>猕猴桃</name>
    <amount>10</amount>
    <price>10</price>
  </goods>
</storehouse>
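A flat list loses track of which values belong to which product. As one possible variation (a sketch, not the article's original code), the handler below builds one dictionary per goods element and keeps the category attribute. The `GoodsHandler` class and its field names are assumptions, and the XML is embedded as a string so the example runs without a file.

```python
import xml.sax

# Same content as storehouse.xml, embedded so the sketch is self-contained.
STOREHOUSE_XML = """<storehouse>
  <goods category="fish">
    <title>淡水鱼</title><name>鲫鱼</name><amount>18</amount><price>8</price>
  </goods>
  <goods category="fruit">
    <title>温带水果</title><name>猕猴桃</name><amount>10</amount><price>10</price>
  </goods>
</storehouse>"""


class GoodsHandler(xml.sax.ContentHandler):
    """Builds one dict per <goods> element."""

    FIELDS = ("title", "name", "amount", "price")

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # dict for the <goods> being read
        self._field = None    # name of the field being read, if any
        self._buf = []        # text chunks of the current field

    def startElement(self, name, attrs):
        if name == "goods":
            self._current = {"category": attrs.get("category")}
        elif name in self.FIELDS and self._current is not None:
            self._field = name
            self._buf = []

    def characters(self, content):
        if self._field is not None:
            self._buf.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == self._field:
            self._current[name] = "".join(self._buf)
            self._field = None
        elif name == "goods":
            self.records.append(self._current)
            self._current = None


handler = GoodsHandler()
xml.sax.parseString(STOREHOUSE_XML.encode("utf-8"), handler)

for record in handler.records:
    print(record)
```

Keeping the per-record state inside the handler also removes the need for a module-level list.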