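XML is an extensible markup language widely used for data exchange, and Java provides the DOM (Document Object Model) API for parsing it. The DOM model treats an XML document as a tree: starting from the root node, every node can contain any number of child nodes.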

Take the following XML as an example:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Server port="8005" shutdown="SHUTDOWN">
    this is Server start
    <Service name="Catalina">
        this is Service start
        <Connector port="8080"
                   protocol="HTTP/1.1"
                   connectionTimeout="20000"
                   redirectPort="8443" />
        <Engine name="Catalina" defaultHost="localHost">
            this is Engine start
            <Host name="localhost"
                  appBase="webapps"
                  unpackWARs="true"
                  autoDeploy="true">
                this is Host
            </Host>
            this is Engine end
        </Engine>
        this is Service end
    </Service>
    this is Server end
</Server>
```

Parsed into a DOM structure, it looks roughly like this (only element nodes are shown; the text nodes are omitted):

```mermaid
graph TB
    Document(Document)-->Server(Server)
    Server(Server)-->Service(Service)
    Service(Service)-->Connector(Connector)
    Service(Service)-->Engine(Engine)
    Engine(Engine)-->Host(Host)
```

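Note that the Document node at the top represents the XML document as a whole and is the real "root". `<Server>`, although it is the root element, is a child node of Document.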
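To see this distinction in code, here is a minimal sketch that parses a trimmed-down copy of the XML above from a string and compares the Document node with the element returned by `getDocumentElement()`. The class name `DocumentRootDemo` and its `parse` helper are hypothetical, introduced only for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DocumentRootDemo {

    // Parses an XML string into a Document (helper for this sketch only)
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // A trimmed-down version of the server.xml above
        Document document = parse("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<Server port=\"8005\"><Service name=\"Catalina\"/></Server>");

        // The Document node is the real root of the tree
        System.out.println(document.getNodeType() == Node.DOCUMENT_NODE); // true
        System.out.println(document.getNodeName());                        // #document

        // <Server> is the root element, and its parent is the Document node
        Element server = document.getDocumentElement();
        System.out.println(server.getNodeName());                          // Server
        System.out.println(server.getParentNode() == document);            // true
    }
}
```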

When parsing, the Java DOM API reads the entire XML content into memory at once and builds a tree structure from it. The main code is as follows:

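```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
InputStream is = DomApiParseXmlTest.class.getResourceAsStream("/xml/server.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(is);
```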

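Calling `DocumentBuilder.parse()` returns a Document object, which represents the tree structure of the entire XML document. We can traverse it to read every child node, or the value of a specific node.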
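The snippet above never closes the InputStream, and if `/xml/server.xml` is missing, `getResourceAsStream()` returns null and `parse()` fails with an `IllegalArgumentException` instead of a clear message. A minimal sketch of a more defensive variant, which also reads the text of a specific node via `getTextContent()`, might look like this. `ParseStreamDemo`, `parse` and `firstText` are hypothetical names for this example, and the XML is an inline stand-in for the resource file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

public class ParseStreamDemo {

    // Reads the whole stream into an in-memory Document, then closes the stream
    static Document parse(InputStream in) {
        // getResourceAsStream() returns null for a missing resource; fail with a clear message
        Objects.requireNonNull(in, "XML resource not found");
        try (InputStream is = in) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(is);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the trimmed text of the first element with this tag name, or null if there is none
    static String firstText(Document document, String tagName) {
        Element e = (Element) document.getElementsByTagName(tagName).item(0);
        return e == null ? null : e.getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<Server><Host name=\"localhost\">\n    this is Host\n</Host></Server>";
        Document document = parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(firstText(document, "Host"));                 // this is Host
        System.out.println(document.getDocumentElement().getNodeName()); // Server
    }
}
```

With the original test class, the call would be `parse(DomApiParseXmlTest.class.getResourceAsStream("/xml/server.xml"))`. `getTextContent()` keeps the surrounding line breaks and indentation, which is why the sketch trims it.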

There are two ways to do this. The first is to recursively traverse the whole tree. The main code is as follows:

```java
private static void recursionPrintNode(Node n) {
    short nodeType = n.getNodeType();
    if (nodeType == Node.DOCUMENT_NODE) { // Document node
        LOGGER.info("Document: {}", n.getNodeName());
    } else if (nodeType == Node.ELEMENT_NODE) { // element node
        LOGGER.info("Element: {}", n.getNodeName());
        // Print every attribute of this element
        for (int i = 0; i < n.getAttributes().getLength(); i++) {
            LOGGER.info("Attr: {} = {}",
                    n.getAttributes().item(i).getNodeName(),
                    n.getAttributes().item(i).getNodeValue()
            );
        }
    } else if (nodeType == Node.TEXT_NODE) { // text node
        LOGGER.info("Text: {} = {}", n.getNodeName(), n.getNodeValue());
    } else if (nodeType == Node.ATTRIBUTE_NODE) { // attribute node
        // Not reached in this traversal: Attr nodes are not children of their element,
        // they are only reachable through getAttributes() (see the branch above)
        LOGGER.info("Attr: {} = {}", n.getNodeName(), n.getNodeValue());
    } else {
        LOGGER.info("NodeType: {}, NodeName: {}", n.getNodeType(), n.getNodeName());
    }
    // Recurse into every child node
    for (Node child = n.getFirstChild(); child != null; child = child.getNextSibling()) {
        recursionPrintNode(child);
    }
}
```

Running it produces the following output:

```text
00:32:23.997 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Document: #document
00:32:24.002 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Element: Server
00:32:24.002 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Attr: port = 8005
00:32:24.002 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Attr: shutdown = SHUTDOWN
00:32:24.003 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Text: #text =
this is Server start

00:32:24.003 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Element: Service
00:32:24.003 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Attr: name = Catalina
00:32:24.003 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Text: #text =
this is Service start

00:32:24.003 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Element: Connector
00:32:24.003 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Attr: connectionTimeout = 20000
00:32:24.003 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Attr: port = 8080
00:32:24.003 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Attr: protocol = HTTP/1.1
00:32:24.003 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Attr: redirectPort = 8443
00:32:24.003 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Text: #text =

00:32:24.003 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Element: Engine
00:32:24.003 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Attr: defaultHost = localHost
00:32:24.003 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Attr: name = Catalina
00:32:24.003 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Text: #text =
this is Engine start

00:32:24.004 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Element: Host
00:32:24.004 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Attr: appBase = webapps
00:32:24.004 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Attr: autoDeploy = true
00:32:24.004 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Attr: name = localhost
00:32:24.004 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Attr: unpackWARs = true
00:32:24.004 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Text: #text =
this is Host

00:32:24.004 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Text: #text =
this is Engine end

00:32:24.004 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Text: #text =
this is Service end

00:32:24.004 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Text: #text =
this is Server end
```

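As the output shows, the DOM API models every kind of XML construct through a single Node interface, and the return value of `getNodeType()` tells you whether a given Node is an element, an attribute, a text node, and so on. Two more details are visible here. First, attributes are printed in a different order from the source file (alphabetically, in this run); the XML specification treats attribute order as insignificant, so code should not depend on it. Second, the line breaks and indentation between tags are kept as Text nodes, which is why a Text node containing only whitespace appears right after Connector.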
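The following sketch dispatches on `getNodeType()` in the same way as the recursive method but prints only elements and non-blank text, producing an indented outline. It also exercises two properties of this model: whitespace between tags shows up as whitespace-only Text nodes (skipped here), and Attr nodes never appear among an element's children, so they must be read via `getAttributes()`. `NodeOutlineDemo`, `outline` and the other helpers are hypothetical names for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NodeOutlineDemo {

    // Parses an XML string into a Document (helper for this sketch only)
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Builds an indented outline of elements and non-blank text, one line per node
    static String outline(Node root) {
        StringBuilder out = new StringBuilder();
        walk(root, 0, out);
        return out.toString();
    }

    private static void walk(Node n, int depth, StringBuilder out) {
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE:
                indent(out, depth).append("Element: ").append(n.getNodeName()).append('\n');
                break;
            case Node.TEXT_NODE: {
                // Line breaks and indentation between tags yield whitespace-only Text nodes; skip them
                String text = n.getNodeValue().trim();
                if (!text.isEmpty()) {
                    indent(out, depth).append("Text: ").append(text).append('\n');
                }
                break;
            }
            default:
                // Other node types print nothing themselves; their children are still visited
                break;
        }
        // Attributes are not children, so this loop never visits Attr nodes
        for (Node child = n.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    private static StringBuilder indent(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        return out;
    }

    public static void main(String[] args) {
        Document document = parse("<Server port=\"8005\">\n  this is Server start\n"
                + "  <Service name=\"Catalina\">\n    <Connector port=\"8080\"/>\n  </Service>\n</Server>");
        System.out.print(outline(document.getDocumentElement()));
    }
}
```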

The second approach walks the Document with loops to read the values of specific nodes. The main code is as follows:

```java
private static void loopPrintByContext(Document document) {
    LOGGER.info("Root Element: {}", document.getNodeName());
    // getDocumentElement() returns the root element <Server> directly, whereas getFirstChild()
    // could return a comment, processing instruction or DOCTYPE placed before the root element
    Element server = document.getDocumentElement();
    // getNamedItem() returns the Attr node itself, which is logged in the name="value" form
    LOGGER.info("Element: {}, Attribute: {}, {}",
            server.getNodeName(),
            server.getAttributes().getNamedItem("port"),
            server.getAttributes().getNamedItem("shutdown")
    );

    // getElementsByTagName() matches descendants at any depth, not only direct children
    NodeList serviceList = server.getElementsByTagName("Service");
    for (int i = 0; i < serviceList.getLength(); i++) {
        Element service = (Element) serviceList.item(i);
        LOGGER.info("Element: {}, Attribute: {}", service.getNodeName(), service.getAttributes().getNamedItem("name"));

        NodeList connectorList = service.getElementsByTagName("Connector");
        for (int j = 0; j < connectorList.getLength(); j++) {
            Node connector = connectorList.item(j);
            LOGGER.info("Element: {}, Attribute: {}, {}, {}, {}",
                    connector.getNodeName(),
                    connector.getAttributes().getNamedItem("port"),
                    connector.getAttributes().getNamedItem("protocol"),
                    connector.getAttributes().getNamedItem("connectionTimeout"),
                    connector.getAttributes().getNamedItem("redirectPort")
            );
        }

        NodeList engineList = service.getElementsByTagName("Engine");
        for (int k = 0; k < engineList.getLength(); k++) {
            Element engine = (Element) engineList.item(k);
            LOGGER.info("Element: {}, Attribute: {}, {}",
                    engine.getNodeName(),
                    engine.getAttributes().getNamedItem("name"),
                    engine.getAttributes().getNamedItem("defaultHost")
            );

            NodeList hostList = engine.getElementsByTagName("Host");
            for (int m = 0; m < hostList.getLength(); m++) {
                Node host = hostList.item(m);
                LOGGER.info("Element: {}, Attribute: {}, {}, {}, {}",
                        host.getNodeName(),
                        host.getAttributes().getNamedItem("name"),
                        host.getAttributes().getNamedItem("appBase"),
                        host.getAttributes().getNamedItem("unpackWARs"),
                        host.getAttributes().getNamedItem("autoDeploy")
                );
            }
        }
    }
}
```

Running it produces the following output:

```text
00:32:24.004 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Root Element: #document
00:32:24.004 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Element: Server, Attribute: port="8005", shutdown="SHUTDOWN"
00:32:24.004 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Element: Service, Attribute: name="Catalina"
00:32:24.005 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Element: Connector, Attribute: port="8080", protocol="HTTP/1.1", connectionTimeout="20000", redirectPort="8443"
00:32:24.005 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Element: Engine, Attribute: name="Catalina", defaultHost="localHost"
00:32:24.005 [main] INFO com.sunchaser.sparrow.javase.base.xml.DomApiParseXmlTest - Element: Host, Attribute: name="localhost", appBase="webapps", unpackWARs="true", autoDeploy="true"
```
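Two details of the loop version are easy to miss. `getElementsByTagName()` returns matching descendants at any depth, not only direct children, and `getNamedItem()` returns an Attr node (hence the `name="value"` form in the log) whereas `getAttribute()` returns just the value. The sketch below checks both on a small document; `TagNameSearchDemo` and `directChildren` are hypothetical names introduced for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class TagNameSearchDemo {

    // Parses an XML string into a Document (helper for this sketch only)
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns only the direct child elements with the given tag name
    static List<Element> directChildren(Element parent, String tagName) {
        List<Element> result = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && tagName.equals(c.getNodeName())) {
                result.add((Element) c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Element server = parse("<Server port=\"8005\"><Service name=\"Catalina\">"
                + "<Engine name=\"Catalina\"><Host name=\"localhost\"/></Engine>"
                + "</Service></Server>").getDocumentElement();

        // getElementsByTagName() searches all descendants, so <Host> is found from <Server>
        System.out.println(server.getElementsByTagName("Host").getLength()); // 1
        // ...while filtering direct children finds nothing at that level
        System.out.println(directChildren(server, "Host").size());          // 0

        // getAttribute() returns the plain value ("" if the attribute is absent)
        System.out.println(server.getAttribute("port"));                     // 8005
    }
}
```

For this server.xml the nested loops give the same result either way, but when the same tag occurs at several levels, filtering direct children avoids picking up deeper matches.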

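That covers how the DOM API is used. Its drawback is that it reads the whole XML document into memory at once, and the resulting tree is usually considerably larger than the file itself. Large XML files therefore demand a lot of memory and can otherwise lead to an `OutOfMemoryError`.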
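The DOM API itself offers no way around this. When memory is a concern, a streaming parser is the usual alternative. The sketch below swaps in StAX (`javax.xml.stream`, also shipped with the JDK), which is a different API from DOM: it reads the document one event at a time without building a tree. The task shown (collecting the `name` attribute of every `<Host>`) and the class name `StaxHostNames` are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxHostNames {

    // Streams through the document and collects the name attribute of every <Host>;
    // no DOM tree is built, the reader only exposes the current parsing event
    static List<String> hostNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "Host".equals(reader.getLocalName())) {
                        // null namespace URI: match the attribute by local name only
                        names.add(reader.getAttributeValue(null, "name"));
                    }
                }
            } finally {
                reader.close(); // XMLStreamReader is not AutoCloseable
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<Server><Service><Engine>"
                + "<Host name=\"localhost\"/><Host name=\"example\"/>"
                + "</Engine></Service></Server>";
        System.out.println(hostNames(xml)); // [localhost, example]
    }
}
```

The trade-off is that a streaming reader only moves forward, so navigating back to parent or sibling nodes, which DOM makes easy, has to be tracked by your own code.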

For the complete code, follow this link.