Correcting XmlReader problems using ReadToDescendant and/or ReadElementContentAsObject

Posted on 2024-08-21 19:31:47

I'm working on a mysterious bug in the usually very good open source project Excel Data Reader. It skips values when reading from my particular OpenXML .xlsx spreadsheet.

The problem occurs in the ReadSheetRow method (demonstration code below). The source XML is saved by Excel and contains no whitespace, and that is when the strange behaviour occurs. However, XML that has been reformatted with whitespace (e.g. in Visual Studio go to Edit, Advanced, Format Document) works completely fine!

Test data with whitespace:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
    <sheetData>
        <row r="5" spans="1:73" s="7" customFormat="1">
            <c r="B5" s="12">
                <v>39844</v>
            </c>
            <c r="C5" s="8"/>
            <c r="D5" s="8"/>
            <c r="E5" s="8"/>
            <c r="F5" s="8"/>
            <c r="G5" s="8"/>
            <c r="H5" s="12">
                <v>39872</v>
            </c>
            <c r="I5" s="8"/>
            <c r="J5" s="8"/>
            <c r="K5" s="8"/>
            <c r="L5" s="8"/>
            <c r="M5" s="8"/>
            <c r="N5" s="12">
                <v>39903</v>
            </c>
        </row>
    </sheetData>
</worksheet>

Test data without whitespace:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"><sheetData><row r="5" spans="1:73" s="7" customFormat="1"><c r="B5" s="12"><v>39844</v></c><c r="C5" s="8"/><c r="D5" s="8"/><c r="E5" s="8"/><c r="F5" s="8"/><c r="G5" s="8"/><c r="H5" s="12"><v>39872</v></c><c r="I5" s="8"/><c r="J5" s="8"/><c r="K5" s="8"/><c r="L5" s="8"/><c r="M5" s="8"/><c r="N5" s="12"><v>39903</v></c></row></sheetData></worksheet>

Example code that demonstrates the problem:

Note that A is output after reader.Read(), B after ReadToDescendant, and C after ReadElementContentAsObject.

while (reader.Read())
{
    if (reader.NodeType != XmlNodeType.Whitespace) outStream.WriteLine(String.Format("*A* NodeType: {0}, Name: '{1}', Empty: {2}, Value: '{3}'", reader.NodeType, reader.Name, reader.IsEmptyElement, reader.Value));

    if (reader.NodeType == XmlNodeType.Element && reader.Name == "c")
    {
        string a_s = reader.GetAttribute("s");
        string a_t = reader.GetAttribute("t");
        string a_r = reader.GetAttribute("r");

        bool matchingDescendantFound = reader.ReadToDescendant("v");
        if (reader.NodeType != XmlNodeType.Whitespace) outStream.WriteLine(String.Format("*B* NodeType: {0}, Name: '{1}', Empty: {2}, Value: '{3}'", reader.NodeType, reader.Name, reader.IsEmptyElement, reader.Value));
        object o = reader.ReadElementContentAsObject();
        if (reader.NodeType != XmlNodeType.Whitespace) outStream.WriteLine(String.Format("*C* NodeType: {0}, Name: '{1}', Empty: {2}, Value: '{3}'", reader.NodeType, reader.Name, reader.IsEmptyElement, reader.Value));
    }
}

Test results for XML with whitespace:

*A* NodeType: XmlDeclaration, Name: 'xml', Empty: False, Value: 'version="1.0" encoding="UTF-8" standalone="yes"'
*A* NodeType: Element, Name: 'worksheet', Empty: False, Value: ''
*A* NodeType: Element, Name: 'sheetData', Empty: False, Value: ''
*A* NodeType: Element, Name: 'row', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: False, Value: ''
*B* NodeType: Element, Name: 'v', Empty: False, Value: ''
*A* NodeType: EndElement, Name: 'c', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: True, Value: ''
*B* NodeType: Element, Name: 'c', Empty: True, Value: ''
...

Test results for XML without whitespace:

*A* NodeType: XmlDeclaration, Name: 'xml', Empty: False, Value: 'version="1.0" encoding="UTF-8" standalone="yes"'
*A* NodeType: Element, Name: 'worksheet', Empty: False, Value: ''
*A* NodeType: Element, Name: 'sheetData', Empty: False, Value: ''
*A* NodeType: Element, Name: 'row', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: False, Value: ''
*B* NodeType: Element, Name: 'v', Empty: False, Value: ''
*C* NodeType: EndElement, Name: 'c', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: True, Value: ''
*B* NodeType: Element, Name: 'c', Empty: True, Value: ''
...

The change in the output pattern suggests an issue in ReadElementContentAsObject, or possibly in where ReadToDescendant leaves the XmlReader positioned.

Does anyone know what might be happening here?

Comments (1)

又怨 2024-08-28 19:31:47

It's fairly simple. As you can see from the output, the first time you reach the "B" line, you're positioned on the first 'v' element. Then you call ReadElementContentAsObject. That returns the text content of v and "moves the reader past the end element tag" (of v). You are now pointing at a whitespace node if there is whitespace, or at an EndElement node (of c) if there is not. Of course, your output prints nothing if the node is whitespace, which is why the "C" line only appears for the compact XML. Either way, you then do a Read() and move on to the next node. In the no-whitespace case, that Read() steps over the EndElement, so you have lost it.

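The problem is much worse in other situations. When c is an empty element, ReadToDescendant finds no v and leaves the reader on that c (your output shows this on the "B" line with Empty: True). When you then do a ReadElementContentAsObject on that c (call it c1), you move on to the next c (c2). Then you do a Read(), moving to c3, and lose c2 for good.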
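The lost-cell case can be reproduced in isolation. The following is a minimal sketch, not code from Excel Data Reader: the class name SkippedCellDemo, the method VisitedCells, and the tiny three-cell XML strings are all made up for illustration. It runs the same loop shape as the question and records which c elements the outer loop actually sees.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical demo class; not part of Excel Data Reader.
public static class SkippedCellDemo
{
    // Runs the same loop shape as the question and records the "r"
    // attribute of every <c> element the outer loop lands on.
    public static List<string> VisitedCells(string xml)
    {
        var visited = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read()) // advance #1
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "c")
                {
                    visited.Add(reader.GetAttribute("r"));
                    // On an empty <c/> this returns false and the reader stays on <c/>.
                    reader.ReadToDescendant("v");
                    // advance #2: moves the reader past the current element.
                    reader.ReadElementContentAsObject();
                }
            }
        }
        return visited;
    }

    public static void Main()
    {
        string compact = "<row><c r=\"A1\"/><c r=\"B1\"/><c r=\"C1\"/></row>";
        string indented = "<row>\n  <c r=\"A1\"/>\n  <c r=\"B1\"/>\n  <c r=\"C1\"/>\n</row>";

        Console.WriteLine(string.Join(",", VisitedCells(compact)));  // A1,C1
        Console.WriteLine(string.Join(",", VisitedCells(indented))); // A1,B1,C1
    }
}
```

In the indented input, the whitespace node between cells absorbs the extra advance, which is why the reformatted spreadsheet appears to work.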

I'm not going to try to fix the real code. But it's clear what you need to watch out for: moving the stream forward in more than one place. This is a common source of looping errors in general.
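One way to act on that advice is to let a single Read() call drive the loop and to pick up the cell value from the Text node as it passes, rather than calling a second method that also advances the reader. This is only a sketch under that assumption, not the fix applied in Excel Data Reader; SingleAdvanceDemo and ReadCells are hypothetical names, and empty cells are reported with a null value purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical sketch; not the actual Excel Data Reader implementation.
public static class SingleAdvanceDemo
{
    // Returns (cell reference, value) pairs; value is null when a cell has no <v>.
    public static List<KeyValuePair<string, string>> ReadCells(string xml)
    {
        var cells = new List<KeyValuePair<string, string>>();
        string currentRef = null; // "r" attribute of the <c> being read
        string value = null;      // text of its <v>, if any
        bool inValue = false;     // true while inside a <v> element

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read()) // the only place the reader is advanced
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        if (reader.Name == "c")
                        {
                            currentRef = reader.GetAttribute("r");
                            if (reader.IsEmptyElement)
                            {
                                // <c .../> has no EndElement, so record it now.
                                cells.Add(new KeyValuePair<string, string>(currentRef, null));
                                currentRef = null;
                            }
                        }
                        else if (reader.Name == "v")
                        {
                            inValue = !reader.IsEmptyElement;
                        }
                        break;

                    case XmlNodeType.Text:
                        if (inValue) value = reader.Value;
                        break;

                    case XmlNodeType.EndElement:
                        if (reader.Name == "v")
                        {
                            inValue = false;
                        }
                        else if (reader.Name == "c")
                        {
                            cells.Add(new KeyValuePair<string, string>(currentRef, value));
                            currentRef = null;
                            value = null;
                        }
                        break;
                }
            }
        }
        return cells;
    }

    public static void Main()
    {
        string compact = "<row><c r=\"B5\" s=\"12\"><v>39844</v></c><c r=\"C5\" s=\"8\"/>"
                       + "<c r=\"D5\" s=\"8\"/><c r=\"H5\" s=\"12\"><v>39872</v></c></row>";
        foreach (var cell in ReadCells(compact))
        {
            Console.WriteLine(cell.Key + " = " + (cell.Value ?? "(empty)"));
        }
    }
}
```

Because whitespace nodes are simply ignored by the switch, the same loop should behave identically on compact and indented input.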
