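SAX (Simple API for XML) parses XML in an event-driven manner. As the parser reads the document, each time it encounters a particular XML construct it invokes the corresponding callback method on the event handler and passes the content it has just parsed as the method's arguments.

The main events exposed by the event handler are:

  • startDocument: parsing of the XML document begins
  • startElement: the start tag of an element was encountered, for example <Server>
  • characters: character data was encountered
  • endElement: the end tag of an element was encountered, for example </Server>
  • endDocument: parsing of the XML document has finished
  • error: a recoverable error occurred during parsing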
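
To make the callback order concrete, here is a minimal sketch that records the name of each callback as it fires. The class `SaxEventOrderDemo` and its method `recordEvents` are hypothetical names introduced only for this illustration; the XML is passed in as a string rather than read from a file.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original article.
class SaxEventOrderDemo {

    // Parses the given XML string and returns the callback names in firing order.
    static List<String> recordEvents(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("startElement:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver contiguous text in several chunks, so record it only once.
                if (events.isEmpty() || !"characters".equals(events.get(events.size() - 1))) {
                    events.add("characters");
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement:" + qName);
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        recordEvents("<Server>hi<Service/></Server>").forEach(System.out::println);
    }
}
```

Note that an empty element such as `<Service/>` still produces both a startElement and an endElement callback.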

Take the following XML as an example:

<?xml version="1.0" encoding="UTF-8"?>
<Server port="8005" shutdown="SHUTDOWN">
this is Server start
<Service name="Catalina">
this is Service start
<Connector port="8080"
protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443" />
<Engine name="Catalina" defaultHost="localHost">
this is Engine start
<Host name="localhost"
appBase="webapps"
unpackWARs="true"
autoDeploy="true">
this is Host
</Host>
this is Engine end
</Engine>
this is Service end
</Service>
this is Server end
</Server>

The code that parses it with SAX is as follows:

package com.sunchaser.sparrow.javase.base.xml;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.io.InputStream;

/**
 * @author sunchaser admin@lilu.org.cn
 * @since JDK8 2021/8/17
 */
public class SAXParseXmlTest {
    private static final Logger LOGGER = LoggerFactory.getLogger(SAXParseXmlTest.class);

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser saxParser = spf.newSAXParser();
        // Load the XML from the classpath; try-with-resources closes the stream when parsing ends
        try (InputStream is = SAXParseXmlTest.class.getResourceAsStream("/xml/server.xml")) {
            saxParser.parse(is, new DefaultHandler() {
                @Override
                public void startDocument() throws SAXException {
                    super.startDocument();
                    LOGGER.info("startDocument");
                }

                @Override
                public void endDocument() throws SAXException {
                    super.endDocument();
                    LOGGER.info("endDocument");
                }

                @Override
                public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
                    super.startElement(uri, localName, qName, attributes);
                    LOGGER.info("startElement: uri={}, localName={}, qName={}, attributes={}", uri, localName, qName, attributes);
                    // Log every attribute of the current element
                    for (int i = 0; i < attributes.getLength(); i++) {
                        LOGGER.info("attributes: localName={}, qName={}, type={}, uri={}, value={}",
                                attributes.getLocalName(i),
                                attributes.getQName(i),
                                attributes.getType(i),
                                attributes.getURI(i),
                                attributes.getValue(i)
                        );
                    }
                }

                @Override
                public void endElement(String uri, String localName, String qName) throws SAXException {
                    super.endElement(uri, localName, qName);
                    LOGGER.info("endElement: uri={}, localName={}, qName={}", uri, localName, qName);
                }

                @Override
                public void characters(char[] ch, int start, int length) throws SAXException {
                    super.characters(ch, start, length);
                    // Character data occupies ch[start .. start + length)
                    LOGGER.info("characters: ch={}", new String(ch, start, length));
                }

                @Override
                public void error(SAXParseException e) throws SAXException {
                    super.error(e);
                    LOGGER.error("error", e);
                }
            });
        }
    }
}

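As shown, the SAXParser#parse method takes two arguments: an InputStream for the XML file and a DefaultHandler that acts as the event handler. Here we create the handler as an anonymous subclass and override each event method to do the corresponding processing. The output is as follows: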

22:48:41.529 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - startDocument
22:48:41.538 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - startElement: uri=, localName=, qName=Server, attributes=com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser$AttributesProxy@66cd51c3
22:48:41.539 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - attributes: localName=port, qName=port, type=CDATA, uri=, value=8005
22:48:41.540 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - attributes: localName=shutdown, qName=shutdown, type=CDATA, uri=, value=SHUTDOWN
22:48:41.540 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - characters: ch=
this is Server start

22:48:41.540 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - startElement: uri=, localName=, qName=Service, attributes=com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser$AttributesProxy@66cd51c3
22:48:41.540 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - attributes: localName=name, qName=name, type=CDATA, uri=, value=Catalina
22:48:41.540 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - characters: ch=
this is Service start

22:48:41.540 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - startElement: uri=, localName=, qName=Connector, attributes=com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser$AttributesProxy@66cd51c3
22:48:41.540 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - attributes: localName=port, qName=port, type=CDATA, uri=, value=8080
22:48:41.540 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - attributes: localName=protocol, qName=protocol, type=CDATA, uri=, value=HTTP/1.1
22:48:41.540 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - attributes: localName=connectionTimeout, qName=connectionTimeout, type=CDATA, uri=, value=20000
22:48:41.540 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - attributes: localName=redirectPort, qName=redirectPort, type=CDATA, uri=, value=8443
22:48:41.540 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - endElement: uri=, localName=, qName=Connector
22:48:41.540 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - characters: ch=

22:48:41.540 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - startElement: uri=, localName=, qName=Engine, attributes=com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser$AttributesProxy@66cd51c3
22:48:41.540 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - attributes: localName=name, qName=name, type=CDATA, uri=, value=Catalina
22:48:41.540 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - attributes: localName=defaultHost, qName=defaultHost, type=CDATA, uri=, value=localHost
22:48:41.540 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - characters: ch=
this is Engine start

22:48:41.541 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - startElement: uri=, localName=, qName=Host, attributes=com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser$AttributesProxy@66cd51c3
22:48:41.541 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - attributes: localName=name, qName=name, type=CDATA, uri=, value=localhost
22:48:41.541 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - attributes: localName=appBase, qName=appBase, type=CDATA, uri=, value=webapps
22:48:41.541 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - attributes: localName=unpackWARs, qName=unpackWARs, type=CDATA, uri=, value=true
22:48:41.541 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - attributes: localName=autoDeploy, qName=autoDeploy, type=CDATA, uri=, value=true
22:48:41.541 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - characters: ch=
this is Host

22:48:41.541 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - endElement: uri=, localName=, qName=Host
22:48:41.541 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - characters: ch=
this is Engine end

22:48:41.541 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - endElement: uri=, localName=, qName=Engine
22:48:41.541 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - characters: ch=
this is Service end

22:48:41.541 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - endElement: uri=, localName=, qName=Service
22:48:41.541 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - characters: ch=
this is Server end

22:48:41.541 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - endElement: uri=, localName=, qName=Server
22:48:41.541 [main] INFO com.sunchaser.sparrow.javase.base.xml.SAXParseXmlTest - endDocument
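
In the output above, every characters callback includes the surrounding line breaks, and one of them (after the Connector element) contains only whitespace. The SAX API also allows a parser to split one run of text into several characters calls. A minimal sketch of one way to handle both, assuming the handler only needs trimmed, non-blank text, is to buffer the characters and flush the buffer at each element boundary. `SaxTextCollector` and `collect` are hypothetical names used only for this illustration.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical handler, not part of the original article.
class SaxTextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> texts = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only accumulate here; the text may arrive in more than one chunk.
        buffer.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    // Stores the buffered text if it is not blank, then resets the buffer.
    private void flush() {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            texts.add(text);
        }
        buffer.setLength(0);
    }

    // Parses the XML string and returns the trimmed, non-blank text runs in document order.
    static List<String> collect(String xml) throws Exception {
        SaxTextCollector handler = new SaxTextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Server>\nthis is Server start\n<Service>\nthis is Service start\n</Service>\n</Server>";
        collect(xml).forEach(System.out::println);
    }
}
```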

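Because SAX reads and parses the document incrementally and is event-driven, it never needs to hold the whole document in memory, so memory usage during parsing stays small no matter how large the XML file is. SAX has a drawback of its own, though: by the time the current element is being parsed, the information about the previous element has already been discarded. In other words, the parser does not retain the relationships between elements.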
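
If the handler needs to know where an element sits in the document, it has to track that context itself. A minimal sketch of one such approach, assuming that knowing the chain of ancestor elements is enough, is to push each element name onto a stack in startElement and pop it in endElement. `SaxPathTracker` and `elementPaths` are hypothetical names used only for this illustration.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical handler, not part of the original article.
class SaxPathTracker extends DefaultHandler {
    // Names of the currently open elements, root first.
    private final Deque<String> openElements = new ArrayDeque<>();
    private final List<String> paths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        openElements.addLast(qName);
        // The stack now describes the full path from the root to this element.
        paths.add("/" + String.join("/", openElements));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.removeLast();
    }

    // Parses the XML string and returns the path of every element in document order.
    static List<String> elementPaths(String xml) throws Exception {
        SaxPathTracker handler = new SaxPathTracker();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.paths;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Server><Service><Connector/><Engine><Host/></Engine></Service></Server>";
        elementPaths(xml).forEach(System.out::println);
    }
}
```

The stack holds only the currently open elements, so its size is bounded by the nesting depth of the document rather than by the file size, which keeps the memory advantage of SAX.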