Parsing a large XML file with a SAX parser: the class is becoming bloated and unreadable. How can I fix this?
This is purely a code-readability question; the performance of the class is not an issue.
Here is how I am building this XMLHandler:
For each element that is relevant to the application, I have a boolean named inElementName, which I set to true or false depending on my location during parsing. The problem: I now have 10+ boolean declarations at the beginning of my class, and the list keeps getting bigger.
In my startElement and endElement methods, I have hundreds of lines of
if ("elementName".equals(qName)) {
...
} else if ("anotherElementName".equals(qName)) {
...
}
with different parsing rules in them (if I am at this position in the XML file, do this; otherwise, do that; etc.).
Coding new parsing rules and debugging is becoming increasingly painful.
What are the best practices for coding a SAX parser, and what can I do to make my code more readable?
What do you use the boolean variables for? To keep track of nesting?
I recently implemented this by using an enum for every element.
The code is at work, but this is a rough approximation of it off the top of my head:
Edit:
The position within the XML structure is tracked by a stack of elements. When startElement is called, the appropriate Element enum can be determined from two things: 1) the parent element on top of the tracking stack and 2) the element tag passed as the sName parameter. Together they form the key to a Map generated from the parent information defined as part of the Element enum. The Pair class is simply a holder for this two-part key.
This approach allows the same element tag, when it appears in different parts of the XML structure with different semantics, to be represented by different Element enums. For example:
Using this technique, we don't need flags to track the context in order to know which <child> element is being processed. The context is declared as part of the Element enum definition, which reduces confusion by eliminating assorted state variables.
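A minimal Java sketch of the enum-plus-stack idea, written from the description rather than from the answerer's actual code. It assumes a hypothetical document in which <child> appears under two different parents, <first> and <second>; the enum constants, element names, and the shape of the Pair class are illustrative assumptions. The handler reads the tag from qName, because a default (non-namespace-aware) SAXParserFactory reports an empty localName; qName plays the role of the answer's sName here.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EnumSaxSketch {

    // Hypothetical schema: each constant records its parent and its tag, so
    // the same tag ("child") becomes a different constant in each context.
    enum Element {
        ROOT(null, "root"),
        FIRST(ROOT, "first"),
        SECOND(ROOT, "second"),
        FIRST_CHILD(FIRST, "child"),
        SECOND_CHILD(SECOND, "child");

        final Element parent;
        final String tag;

        Element(Element parent, String tag) { this.parent = parent; this.tag = tag; }
    }

    // Holder for the two-part lookup key: (parent constant, tag name).
    static final class Pair {
        final Element parent;
        final String tag;

        Pair(Element parent, String tag) { this.parent = parent; this.tag = tag; }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Pair)) return false;
            Pair other = (Pair) o;
            return parent == other.parent && tag.equals(other.tag);
        }

        @Override
        public int hashCode() { return Objects.hash(parent, tag); }
    }

    // Map generated once from the parent information declared in the enum.
    static final Map<Pair, Element> LOOKUP = new HashMap<>();
    static {
        for (Element e : Element.values()) {
            LOOKUP.put(new Pair(e.parent, e.tag), e);
        }
    }

    static class Handler extends DefaultHandler {
        final Deque<Element> stack = new ArrayDeque<>(); // replaces the boolean flags
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            Element parent = stack.peek(); // null at the document root
            Element current = LOOKUP.get(new Pair(parent, qName));
            if (current == null) {
                throw new SAXException("Unexpected <" + qName + "> inside " + parent);
            }
            stack.push(current);
            // Per-element logic would go here, e.g. a switch on current.
            events.add("start " + current);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end " + stack.pop());
        }
    }

    static List<String> parse(String xml) throws Exception {
        Handler handler = new Handler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        parse("<root><first><child/></first><second><child/></second></root>")
                .forEach(System.out::println);
    }
}
```

Running main, the two <child> elements are reported as FIRST_CHILD and SECOND_CHILD respectively, without any inFirst/inSecond flags. Supporting a new context-specific rule then means adding one enum constant with its parent, rather than another boolean and another branch in the if/else chain.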
It depends on the XML structure. If the actions for different cases are easy or (more or less) "independent", you could try to use a map:
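A minimal Java sketch of the map idea, assuming a hypothetical <library>/<book>/<title> document. Each tag is mapped to an optional start action (a Consumer of the element's Attributes) and an optional end action (a Consumer of the text collected since the most recent start tag), so startElement and endElement each shrink to a single lookup. The element names and the choice of Consumer as the action type are illustrative assumptions, not taken from the answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MapDispatchSketch {

    static class Handler extends DefaultHandler {
        // One independent action per (hypothetical) tag name.
        final Map<String, Consumer<Attributes>> onStart = new HashMap<>();
        final Map<String, Consumer<String>> onEnd = new HashMap<>();
        final StringBuilder text = new StringBuilder();
        final List<String> titles = new ArrayList<>();
        int bookCount = 0;

        Handler() {
            onStart.put("book", atts -> bookCount++);
            onEnd.put("title", titles::add);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // collect text afresh from this start tag onward
            onStart.getOrDefault(qName, a -> { }).accept(atts);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            Consumer<String> action = onEnd.get(qName);
            if (action != null) {
                action.accept(text.toString().trim());
            }
        }
    }

    static Handler parse(String xml) throws Exception {
        Handler handler = new Handler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        Handler h = parse("<library><book><title>Dune</title></book>"
                + "<book><title>Emma</title></book></library>");
        System.out.println(h.bookCount + " " + h.titles); // prints "2 [Dune, Emma]"
    }
}
```

This stays readable as long as each action depends only on the tag name. If the same tag needs different handling depending on where it appears, a tag-only key is not enough, and the surrounding context (for example, the parent element) has to become part of the key.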
I would fall back to JAXB or something equivalent and let the framework do the work.