Reading XML with proper error handling (line numbers, original text, etc.)
I want to read a fairly large XML file. It's small enough to fit in memory, but still very big. When the XML is read, it is validated against an XSD. This, however, does not prevent business errors from occurring when the data is later used for further processing in the system. When such a business error occurs (after XSD validation), I want to be able to report the line number and column number of the start and end position of the offending element in my XML. In this context it would also be user-friendly to show the input XML exactly as it was read from the file.
Using xsd.exe I generated all the data classes, and I read the XML with the following code:
using (var reader = new StringReader(content))
{
    var errors = new List<string>();
    var settings = new XmlReaderSettings();
    settings.Schemas.Add("urn:import-schema", "Import.xsd");
    // Collect schema validation messages instead of stopping at the first one
    settings.ValidationEventHandler += (o, args) => errors.Add(args.Message);
    settings.ValidationType = ValidationType.Schema;
    using (XmlReader xr = XmlReader.Create(reader, settings))
    {
        var xs = new XmlSerializer(typeof(ImportRoot));
        var result = (ImportRoot) xs.Deserialize(xr);
        // Validation events fire during deserialization, so check afterwards
        if (errors.Any())
            throw new Exception(string.Join("\n\n", errors));
        return result;
    }
}
However, I can't seem to find the meta-information I'm looking for. I've checked the XDocument class as well. There, elements seem to have a Value property that is a string, but that is still not all the information I want to display.
Comments (2)
Line number information is not read from a StringReader. If you use a StreamReader on a FileStream, you'll be able to get the line number. The additional metadata you're looking for is called the "Post-Schema-Validation Infoset".
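Following the suggestion above, here is a minimal sketch of reading element positions while the file is opened through a StreamReader on a FileStream. It relies on the IXmlLineInfo interface, which the reader may or may not expose, so the cast is checked. The class name LineInfoSketch and the method ElementPositions are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class LineInfoSketch
{
    // Hypothetical helper: returns "name@line:column" for every start tag in the file.
    public static List<string> ElementPositions(string path)
    {
        var positions = new List<string>();
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var text = new StreamReader(stream))
        using (XmlReader xr = XmlReader.Create(text))
        {
            // Position data is only available if the reader implements IXmlLineInfo
            var lineInfo = xr as IXmlLineInfo;
            while (xr.Read())
            {
                if (xr.NodeType != XmlNodeType.Element)
                    continue;
                if (lineInfo != null && lineInfo.HasLineInfo())
                    positions.Add($"{xr.Name}@{lineInfo.LineNumber}:{lineInfo.LinePosition}");
            }
        }
        return positions;
    }
}
```

Calling LineInfoSketch.ElementPositions with the path of the import file yields one entry per element. Note that this only reports where each start tag begins; an end position would have to be captured separately when the reader reaches the matching XmlNodeType.EndElement node.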
In your ValidationEventHandler, look at the args.Exception property. It is of type XmlSchemaException, which carries the line number, line position, and so on.
You could keep all the errors and then parse them afterwards.
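As a rough sketch of that idea, the handler below stores the position from args.Exception alongside the message instead of only the message text. The class ValidationSketch, the PositionedError type, and the Validate method are hypothetical names; the schema is passed in as a TextReader purely to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ValidationSketch
{
    // Hypothetical record of one validation problem and where it was reported.
    public sealed class PositionedError
    {
        public int Line { get; set; }
        public int Column { get; set; }
        public string Message { get; set; }
    }

    public static List<PositionedError> Validate(TextReader xml, TextReader xsd)
    {
        var errors = new List<PositionedError>();
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        using (var schemaReader = XmlReader.Create(xsd))
        {
            // null means: use the target namespace declared in the schema itself
            settings.Schemas.Add(null, schemaReader);
        }
        // Keep every error together with the position reported by the exception
        settings.ValidationEventHandler += (sender, args) =>
            errors.Add(new PositionedError
            {
                Line = args.Exception.LineNumber,
                Column = args.Exception.LinePosition,
                Message = args.Message
            });
        using (var xr = XmlReader.Create(xml, settings))
        {
            while (xr.Read()) { } // reading the document drives validation
        }
        return errors;
    }
}
```

The collected list can then be inspected or formatted after the whole document has been read, rather than joining plain message strings into one exception.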
Business validation errors can be handled by implementing them as custom XSLT functions. See this article. Once you have a function that implements IXsltContextFunction, you can examine the XPathNavigator passed to its Invoke method for a hint about where in the document you are.
Once you have the hint, you can compare it with each line in the original document.
I did something like that a couple of years ago (apart from the line numbers) and it worked very nicely, even for large XML documents.
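The last step, matching a position back to the original text, can be sketched without the XSLT machinery. The example below swaps in a different technique from the one described above: it loads the text with XDocument and LoadOptions.SetLineInfo, then looks up the source line of an element that fails a business rule. The rule itself (a negative amount element) and the names SourceLineSketch, Describe, and FirstNegativeAmount are assumptions made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class SourceLineSketch
{
    // Formats an element as "line N, column M: <original source line>".
    public static string Describe(XElement element, string[] sourceLines)
    {
        IXmlLineInfo info = element; // XObject implements IXmlLineInfo explicitly
        if (!info.HasLineInfo())
            return "(no position information)";
        string source = info.LineNumber <= sourceLines.Length
            ? sourceLines[info.LineNumber - 1] // line numbers are 1-based
            : string.Empty;
        return $"line {info.LineNumber}, column {info.LinePosition}: {source.Trim()}";
    }

    // Hypothetical business rule: an <amount> element must not be negative.
    // Returns null when no element violates the rule.
    public static string FirstNegativeAmount(string xml)
    {
        var doc = XDocument.Parse(xml, LoadOptions.SetLineInfo);
        string[] lines = xml.Split('\n');
        XElement bad = doc.Descendants("amount").FirstOrDefault(e => (int)e < 0);
        return bad == null ? null : Describe(bad, lines);
    }
}
```

This only gives the start position of the element. Showing the full original text of a multi-line element, or its end position, would need extra work such as tracking the end tag with an XmlReader.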