Single-threaded application shows race-condition-like behavior

Posted on 2024-09-20 00:39:51

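I have a large (~40 MB) collection of XML data, split across many files that are not well formed, so I merge them, add a root node, and load all the XML into an XmlDocument. It is basically a list of three different element types that can be nested in a few different ways. This example should show most of the cases: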

<Root>
  <A>
    <A>
      <A></A>
      <A></A>
    </A>
  </A>
  <A />
  <B>
    <A>
      <A>
        <A></A>
        <A></A>
      </A>
    </A>
  </B>
  <C />
</Root>

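I separate all A, B, and C nodes by running XPath expressions against the XmlDocument (//A, //B, //C), convert the resulting node sets to DataTables, and show a list of all nodes of each node type separately in a DataGridView. This works fine.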
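
To make the selection step concrete, here is a minimal sketch that runs the three XPath expressions against the sample document shown above and counts the matches. The class and method names (XPathCountSketch, CountMatches) are hypothetical and not part of the original code; only XmlDocument.LoadXml and SelectNodes come from the question.

```csharp
using System;
using System.Xml;

public static class XPathCountSketch
{
    // The sample document from the question, written on fewer lines.
    public const string SampleXml =
        "<Root>" +
        "<A><A><A></A><A></A></A></A>" +
        "<A />" +
        "<B><A><A><A></A><A></A></A></A></B>" +
        "<C />" +
        "</Root>";

    // Loads the XML and counts the nodes matched by the XPath expression.
    public static int CountMatches(string xml, string xPath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes(xPath).Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountMatches(SampleXml, "//A")); // 9
        Console.WriteLine(CountMatches(SampleXml, "//B")); // 1
        Console.WriteLine(CountMatches(SampleXml, "//C")); // 1
    }
}
```

Note that //A matches every A element at any depth, including the ones nested under B, which is why the sample yields 9 rather than 5.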

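But now I am facing an even bigger file, and as soon as I load it, it shows me only 4 rows. I then added a breakpoint at the line where the actual XmlDocument.SelectNodes call happens and checked the resulting node set. It showed about 25,000 entries. After I continued, the program finished loading and, whoops, all my 25k rows were shown. I tried it again and can reproduce it: if I step over XmlDocument.SelectNodes by hand, it works; if I don't break there, it does not. I am not spawning a single thread in my application.

How can I debug this any further? What should I look for? I have experienced such behaviour with multithreaded libraries such as jsch (ssh), but I don't see why it should happen in my case.

Thank you very much!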

// class XmlToDataTable:
private DataTable CreateTable(NamedXPath logType,
                              List<XmlColumn> columns,
                              ITableCreator tableCreator)
{
    // I have to break here -->
    XmlNodeList xmlNodeList = logFile.GetEntries(logType);
    // <-- I have to break here

    DataTable dataTable = tableCreator.CreateTableLayout(columns);
    foreach (XmlNode xmlNode in xmlNodeList)
    {
        DataRow row = dataTable.NewRow();
        tableCreator.PopulateRow(xmlNode, row, columns);
        dataTable.Rows.Add(row);
    }
    return dataTable;
}

// class Logfile:
public XmlNodeList GetEntries(NamedXPath e)
{
    return (_xmlDocument != null && _xmlDocument.HasChildNodes)
                         ? _xmlDocument.SelectNodes(e.XPath)
                         : new XmlNullObjectNodeList();
}
// _xmlDocument gets loaded here after reading all XML fragments into a string
// (ugly, I know; the "// ugly!" comment reminds me about that ;))
private void CreateXmlDoc()
{
    _xmlDocument = new XmlDocument();
    _xmlDocument.LoadXml(OPEN_ROOT_ELEMENT + _xmlString +
                             CLOSE_ROOT_ELEMENT);
    if (DataChanged != null)
        DataChanged(this, new EventArgs());
}

// class NamedXPath:
public abstract class NamedXPath
{
    private readonly String _name;
    private readonly String _xPath;
    protected NamedXPath(string name, string xPath)
    {
        _name = name;
        _xPath = xPath;
    }

    public string Name
    {
        get { return _name; }
    }

    public string XPath
    {
        get { return _xPath; }
    }
}

Comments (2)

水水月牙 2024-09-27 00:39:51

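Instead of using XPath directly in the code first, I would use a tool such as sketchPath to get the XPath right. You can either load your original XML or use a subset of it.

Play with the XPath and your XML to check that the expected nodes are selected before using the XPath in your code.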

浅笑依然 2024-09-27 00:39:51

Okay, solved it. tableCreator is part of my strategy pattern, which influences the way the table is built. In one implementation I do something like this:

XmlNode xn = xmlDocument.SelectSingleNode(fancyXPath);
// If a node has a nested <a> child, it is a linked list:
// <a><a><a></a></a></a>
XmlNode nested = xn.SelectSingleNode("a");
// Setting InnerText replaces all child nodes of the nested element.
if (nested != null)
    nested.InnerText = "<IDs of linked list items CSV like here>";

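This means I replace part of an XML linked list with some text and lose the items nested there.
Finding this bug would not have been a problem if the change did not affect the original XmlDocument. Even so, debugging it should not have been too hard. What makes my program behave differently depending on whether I break or not seems to be the following: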

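Return Value:
The first XmlNode that matches the XPath query, or null if no matching node is found. The XmlNode should not be expected to be connected "live" to the XML document. That is, changes that appear in the XML document may not appear in the XmlNode, and vice versa. (API description of XmlNode.SelectSingleNode())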

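If I break there, the changes are written back to the original XmlDocument; if I don't break, they are not. I can't really explain that to myself, but without the change to the XmlNode everything works.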
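
The destructive edit itself can be reproduced in isolation. The following is a minimal sketch, not the original implementation: it shows that assigning InnerText on a node selected from the document removes the nested element from that document, and that editing a deep copy made with CloneNode(true) leaves the document untouched. The class and method names are hypothetical, and the "1,2,3" string merely stands in for the CSV of IDs.

```csharp
using System;
using System.Xml;

public static class InnerTextSketch
{
    const string Xml = "<a><a><a></a></a></a>";

    // Edits a node that belongs to the document: the innermost <a> is lost.
    public static int CountAfterInPlaceEdit()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);
        XmlNode child = doc.DocumentElement.SelectSingleNode("a");
        child.InnerText = "1,2,3"; // replaces every child of this node
        return doc.SelectNodes("//a").Count;
    }

    // Edits a detached deep copy: the document keeps all three elements.
    public static int CountAfterEditingClone()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);
        XmlNode copy = doc.DocumentElement.SelectSingleNode("a").CloneNode(true);
        copy.InnerText = "1,2,3"; // only the copy changes
        return doc.SelectNodes("//a").Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountAfterInPlaceEdit());  // 2
        Console.WriteLine(CountAfterEditingClone()); // 3
    }
}
```

If the CSV text is only needed for display in the table, working on a copy (or building the string without writing it back into the node) avoids changing the document that later queries still read.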

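Edit:
Now I am quite sure: I had XmlNodeList.Count in my watches. This means that every time I debugged, VS evaluated the Count property, which not only returns a number but also calls ReadUntil(int), which fills the internal list from the node iterator: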

internal int ReadUntil(int index)
{
    int count = this.list.Count;
    while (!this.done && (count <= index))
    {
        if (this.nodeIterator.MoveNext())
        {
            XmlNode item = this.GetNode(this.nodeIterator.Current);
            if (item != null)
            {
                this.list.Add(item);
                count++;
            }
        }
        else
        {
            this.done = true;
            return count;
        }
    }
    return count;
}

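This may have caused the weird behavior: with the watch active, the whole node list was presumably read before my code changed the document, whereas without it the list was filled lazily inside the foreach loop while the nested items were already being replaced.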
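
One way to make the loop independent of when the lazily filled list is read is to copy all matches into a List<XmlNode> before anything modifies the document. This is a sketch under that assumption, not the fix the author actually applied; SnapshotSketch, Snapshot, and CountVisited are hypothetical names, and the small document is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class SnapshotSketch
{
    // Copies every match into a List up front, so later edits to the
    // document cannot change which nodes the loop visits.
    public static List<XmlNode> Snapshot(XmlDocument doc, string xPath)
    {
        return doc.SelectNodes(xPath).Cast<XmlNode>().ToList();
    }

    public static int CountVisited()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<Root><a><a><a></a></a></a><a /></Root>");

        int visited = 0;
        foreach (XmlNode node in Snapshot(doc, "//a"))
        {
            visited++;
            // Same kind of destructive edit as in the answer above.
            XmlNode nested = node.SelectSingleNode("a");
            if (nested != null)
                nested.InnerText = "ids";
        }
        return visited;
    }

    public static void Main()
    {
        Console.WriteLine(CountVisited()); // 4
    }
}
```

The snapshot still contains nodes that the loop has since detached from the document, so code that relies on ParentNode or on the nested items would need to handle that; not mutating the source document at all remains the simpler option.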
