How to detect "An invalid character was found in text content"

Posted on 2024-10-17 15:05:59 · 1388 characters · 5 views · 0 comments

I'm validating XML in Java with SAX, and I'd like to detect the following kind of error:
"An invalid character was found in text content".

At the moment my SAX validation lets some documents through even though they contain corrupted characters. When I open the resulting XML file in Internet Explorer, for example, I get the error message "An invalid character was found in text content".

This is an example of XML data:

<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<!DOCTYPE blabla SYSTEM 'blabla.dtd'>
<blabla type='type' num='num'>
<...>... corrupted character </...>
</blabla>

And this is how the parser is instantiated:

// JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA, JAXP_SCHEMA_SOURCE and XSD_EXTENSION
// are constants defined elsewhere (not shown).
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

// Validate against the XSD file located in the configuration directory.
SAXParser parser = factory.newSAXParser();
parser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
String xsdPath = theConfig.getRoot() + File.separator
        + theConfig.getXsdFileName() + "-v" + theConfig.getXsdFileVersion()
        + XSD_EXTENSION;
parser.setProperty(JAXP_SCHEMA_SOURCE, new File(xsdPath));

XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(getHandler());
reader.setEntityResolver(new MyEntityResolver(theConfig.getRoot(), theConfig));

// The document is already a String, so the parser receives decoded characters, not raw bytes.
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(theDataToParse));
reader.parse(is);

The error handler implements the 'warning', 'error' and 'fatalError' methods, but none of them is called for these documents.
The entity resolver loads a custom entity file stored in a configuration directory.
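
For reference, the handler is roughly of the following shape. This is only a minimal sketch: the class name CollectingErrorHandler and the getProblems() accessor are made up for illustration and are not the real code behind getHandler().

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical handler: records every problem the parser reports.
class CollectingErrorHandler implements ErrorHandler {
    private final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        record("WARNING", e);
    }

    @Override
    public void error(SAXParseException e) {
        // Validation errors (DTD or XSD violations) arrive here.
        record("ERROR", e);
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        // Well-formedness problems arrive here; rethrow so parsing stops.
        record("FATAL", e);
        throw e;
    }

    private void record(String level, SAXParseException e) {
        problems.add(level + " line " + e.getLineNumber() + ": " + e.getMessage());
    }

    List<String> getProblems() {
        return problems;
    }
}
```

With a handler like this, an empty getProblems() list after parse() means the parser reported nothing at all, which is the situation I am in.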

Does anyone know why this kind of malformed character is not detected? Is it because my input comes from a String rather than a file?
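
To make the String-versus-file question concrete, here is a small self-contained sketch that feeds the same bytes to the parser twice, once decoded into a String and once as a raw byte stream. It assumes the corruption is a byte sequence that is not valid UTF-8 (here a Latin-1 encoded "é" inside a document declared as UTF-8); the class name, method names and sample document are invented for the experiment and are not my real data.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class StringVersusBytesDemo {

    // Returns true if the parser accepts the input without throwing.
    static boolean parses(InputSource source) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setErrorHandler(new DefaultHandler()); // DefaultHandler rethrows fatal errors
            reader.parse(source);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // A document declared as UTF-8 whose bytes are really Latin-1:
    // the single byte 0xE9 ("é") is not a valid UTF-8 sequence.
    static byte[] corruptedBytes() {
        String xml = "<?xml version='1.0' encoding='UTF-8'?><doc>caf\u00e9 au lait</doc>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = corruptedBytes();

        // Decoding to a String first replaces the bad byte with U+FFFD,
        // which is a legal XML character, so the parser has nothing to complain about.
        String decoded = new String(raw, StandardCharsets.UTF_8);
        System.out.println("from String: " + parses(new InputSource(new StringReader(decoded))));

        // Handing over the raw bytes lets the parser do the UTF-8 decoding itself.
        System.out.println("from bytes:  " + parses(new InputSource(new ByteArrayInputStream(raw))));
    }
}
```

If the two lines print different results, the corruption is probably being masked when the bytes are decoded into theDataToParse, before the parser ever sees them. In that case passing the original bytes through InputSource.setByteStream might be worth trying instead of setCharacterStream.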

Thanks in advance for your help.

Regards.
