Single-threaded application shows race-condition-like behaviour
I have a big (~40 MB) collection of XML data, split across many files that are not well-formed, so I merge them, add a root node and load all the XML into an XmlDocument. It is basically a list of 3 different types which can be nested in a few different ways. This example should show most of the cases:
<Root>
  <A>
    <A>
      <A></A>
      <A></A>
    </A>
  </A>
  <A />
  <B>
    <A>
      <A>
        <A></A>
        <A></A>
      </A>
    </A>
  </B>
  <C />
</Root>
I'm separating all A, B and C nodes by using XPath expressions on the XmlDocument (//A, //B, //C), converting the resulting node sets to a DataTable, and showing a list of all nodes of each node type separately in a DataGridView. This works fine.
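The loading and selection steps described above can be sketched roughly as follows. This is a minimal illustration, not the actual code: the `FragmentLoader` class, its `Fragments` array and the `<Root>` wrapper string are assumptions standing in for the real files and the real `OPEN_ROOT_ELEMENT` / `CLOSE_ROOT_ELEMENT` constants.

```csharp
using System;
using System.Xml;

public static class FragmentLoader
{
    // Hypothetical fragments standing in for the contents of the separate files.
    public static readonly string[] Fragments =
    {
        "<A><A><A></A><A></A></A></A>",
        "<A />",
        "<B><A><A><A></A><A></A></A></A></B>",
        "<C />"
    };

    // Concatenates the fragments, wraps them in one root element and parses the result.
    public static XmlDocument Load(string[] fragments)
    {
        var doc = new XmlDocument();
        doc.LoadXml("<Root>" + string.Concat(fragments) + "</Root>");
        return doc;
    }

    // Counts the nodes matched by an XPath expression such as "//A".
    public static int Count(XmlDocument doc, string xPath)
    {
        return doc.SelectNodes(xPath).Count;
    }
}
```

For these fragments, `//A` matches 9 elements, while `//B` and `//C` match 1 each.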
But now I'm facing an even bigger file, and as soon as I load it, it shows me only 4 rows. Then I added a breakpoint at the line where the actual XmlDocument.SelectNodes call happens and checked the resulting node set. It showed me about 25,000 entries. After I continued, the program loaded and, whoops, all my 25k rows were shown. I tried it again and I can reproduce it. If I step over XmlDocument.SelectNodes by hand, it works. If I don't break there, it does not. I'm not spawning a single thread in my application.
How can I debug this any further? What should I look for? I have experienced such behaviour with multithreaded libraries such as jsch (ssh), but I don't see why this should happen in my case.
Thank you very much!
// class XmlToDataTable:
private DataTable CreateTable(NamedXPath logType,
                              List<XmlColumn> columns,
                              ITableCreator tableCreator)
{
    // I have to break here -->
    XmlNodeList xmlNodeList = logFile.GetEntries(logType);
    // <-- I have to break here
    DataTable dataTable = tableCreator.CreateTableLayout(columns);
    foreach (XmlNode xmlNode in xmlNodeList)
    {
        DataRow row = dataTable.NewRow();
        tableCreator.PopulateRow(xmlNode, row, columns);
        dataTable.Rows.Add(row);
    }
    return dataTable;
}
// class Logfile:
public XmlNodeList GetEntries(NamedXPath e)
{
    return (_xmlDocument != null && _xmlDocument.HasChildNodes)
        ? _xmlDocument.SelectNodes(e.XPath)
        : new XmlNullObjectNodeList();
}

// _xmlDocument gets loaded here after reading all xml fragments into a string
// (ugly, i know. the // ugly! comment reminds me about that ;))
private void CreateXmlDoc()
{
    _xmlDocument = new XmlDocument();
    _xmlDocument.LoadXml(OPEN_ROOT_ELEMENT + _xmlString +
                         CLOSE_ROOT_ELEMENT);
    if (DataChanged != null)
        DataChanged(this, new EventArgs());
}
// class NamedXPath:
public abstract class NamedXPath
{
    private readonly String _name;
    private readonly String _xPath;

    protected NamedXPath(string name, string xPath)
    {
        _name = name;
        _xPath = xPath;
    }

    public string Name
    {
        get { return _name; }
    }

    public string XPath
    {
        get { return _xPath; }
    }
}
Comments (2)
Instead of using XPath directly in the code first, I would use a tool such as sketchPath to get my XPath right. You can either load your original XML or use a subset of it.
Play with the XPath and your XML to see whether the expected nodes are being selected before using the XPath in your code.
Okay, solved it.
tableCreator is part of my strategy pattern, which influences the way the table is built. In a certain implementation I do something that replaces parts of an XML linked list with some text and loses the nested items there.
It wouldn't be a problem to find this bug if this change didn't affect the original XmlDocument. Even then, debugging it should not be too hard. What makes my program behave differently depending on whether I break or not seems to be the following: if I break there, the changes are written back to the original XmlDocument; if I don't break, they are not written back. I can't really explain that to myself, but without the change to the XmlNode everything works.
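One way to keep such a destructive step from touching the original XmlDocument is to apply it to a deep copy of each node. The sketch below is only an illustration under assumptions: `CloneDemo`, `SampleXml` and the `InnerText` assignment are hypothetical stand-ins for the real row-population code, which is not shown here.

```csharp
using System;
using System.Xml;

public static class CloneDemo
{
    // The sample document from the question, written on one line.
    public const string SampleXml =
        "<Root><A><A><A></A><A></A></A></A><A />" +
        "<B><A><A><A></A><A></A></A></A></B><C /></Root>";

    // Flattens a detached deep copy of every A node, then reports how many
    // A nodes the original document still contains.
    public static int CountAfterCloneEdits(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        foreach (XmlNode node in doc.SelectNodes("//A"))
        {
            // CloneNode(true) copies the whole subtree; the copy has no parent.
            XmlNode copy = node.CloneNode(true);
            // Hypothetical destructive edit: replaces all children with one text node.
            copy.InnerText = "flattened";
        }

        return doc.SelectNodes("//A").Count;
    }
}
```

Because only the copies are modified, `CloneDemo.CountAfterCloneEdits(CloneDemo.SampleXml)` still finds all 9 A elements in the original document.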
Edit:
Now I'm quite sure: I had XmlNodeList.Count in my watches. This means that every time I debugged, VS called the Count property, which not only returns a number but also calls ReadUntil(int), which refreshes the internal list. This may have caused that weird behavior.
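Assuming the explanation above is right, and the node list is only read completely once something like Count asks for it, the loop can be made independent of the debugger by copying the nodes into a list before any of them is modified. This is a hedged sketch, not the actual fix: `SnapshotDemo`, `SampleXml` and the `InnerText` assignment are hypothetical stand-ins for the real `PopulateRow` implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class SnapshotDemo
{
    // The sample document from the question, written on one line.
    public const string SampleXml =
        "<Root><A><A><A></A><A></A></A></A><A />" +
        "<B><A><A><A></A><A></A></A></A></B><C /></Root>";

    // Returns the number of rows produced when the node list is fully
    // materialized before a destructive per-node step runs.
    public static int CountRowsWithSnapshot(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // ToList() enumerates the whole XmlNodeList up front, so later
        // changes to the document cannot shorten the loop below.
        List<XmlNode> snapshot = doc.SelectNodes("//A").Cast<XmlNode>().ToList();

        int rows = 0;
        foreach (XmlNode node in snapshot)
        {
            // Hypothetical destructive step: drops nested elements.
            if (node.HasChildNodes)
                node.InnerText = "collapsed";
            rows++;
        }
        return rows;
    }
}
```

For the sample document this produces 9 rows, one per A element, even though the nested elements are detached from the document while the loop runs.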