python sax解析器跳过异常
有没有办法使用 SAX XML 解析器“跳过”一行?
我有一个非确认 XML 文档,它是有效 XML 文档的串联,因此每个文档都会出现 。另请注意,我需要使用 SAX 解析器,因为输入文档很大。
我尝试制作一个“自定义流”类作为解析器的供给器,但很快意识到 SAX 使用 read
方法,因此读取“字节数组”中的内容,从而爆炸了该项目的复杂性。
谢谢!
更新:我知道可以使用csplit
解决这个问题,但如果在合理的范围内可能的话,我正在寻求基于Python的解决方案。
更新2:也许我应该说“跳到下一个文档”,这样会更有意义。无论如何,这就是我所需要的:一种从单个输入流解析多个文档的方法。
Is there a way to "skip" a line using a SAX XML parser?
I've got a non-confirming XML document which is a concatenation of valid XML documents and thus the <?xml ...?>
appears for each document. Also note I need to use a SAX parser since the input documents are huge.
I tried crafting a "custom stream" class as feeder for the parser but quickly realized that SAX uses the read
method and thus reads stuff in "byte arrays" thereby exploding the complexity of this project.
thanks!
UPDATE: I know there is way around this using csplit
but I am after a Python based solution if at all possible within reasonable limits.
Update2: Maybe I should have said "skipping to next document", that would have made more sense. Anyhow, that's what I need: a way of parsing multiple documents from a single input stream.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
当您将文档连接在一起时,只需替换开头的 ,这将注释掉 xml 声明。
When you are concatenating the documents together, just replace the beginning <? and ?> with <!-- and -->, this will comment out the xml declarations.