How do I search a large XML dataset?
I have a DataModule with XML and I need to search it...
Unfortunately, there are more than 300,000 records, and I can't loop through them to check one by one.
Is it possible to make a query without using a database?
Is there another solution?
Comments (6)
XML's fine for small amounts of information, but for a dataset that big, a relational database is really the only sane choice, especially if you need to be able to query it.
You can search using something like XPath. However, that just means the XPath implementation does the searching on your behalf, which doesn't necessarily improve performance.
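For illustration, here is a rough sketch of an XPath search from Delphi. It assumes MSXML 6 is installed and a Delphi version that ships the Winapi.msxml unit; the element names (records, record, name) and the id attribute are made up for the example. Note that the DOM still loads the whole document into memory, so the caveat about performance applies.

```delphi
program XPathSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Win.ComObj, Winapi.ActiveX, Winapi.msxml;

var
  Doc: IXMLDOMDocument2;
  Nodes: IXMLDOMNodeList;
  I: Integer;
begin
  CoInitialize(nil);
  try
    // Assumption: MSXML 6 is registered on this machine.
    Doc := CreateOleObject('Msxml2.DOMDocument.6.0') as IXMLDOMDocument2;
    Doc.async := False;

    // Hypothetical sample data; a real application would call Doc.load(FileName).
    if not Doc.loadXML(
      '<records>' +
      '<record id="1"><name>Smith</name></record>' +
      '<record id="2"><name>Jones</name></record>' +
      '<record id="3"><name>Smith</name></record>' +
      '</records>') then
      raise Exception.Create(Doc.parseError.reason);

    // The XPath engine walks the tree for us; no hand-written loop over all records.
    Nodes := Doc.selectNodes('/records/record[name="Smith"]');

    for I := 0 to Nodes.length - 1 do
      Writeln(Nodes.item[I].attributes.getNamedItem('id').text);
  finally
    // Release the COM interfaces before shutting COM down.
    Nodes := nil;
    Doc := nil;
    CoUninitialize;
  end;
end.
```

The loop at the end only visits the matching nodes, but the XPath evaluation itself still has to examine every record.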
I think it is probably important to ask: why are you using XML to store 300k records? XML is not the most efficient format for manipulating data.
If you're stuck with XML, then you might be best off reading the XML file into some sort of database (you might get away with an in-memory table, but then again you might run out of memory). I think if you use a TXMLDocument object to load the XML file, you'll either have a serious performance issue or run out of memory (I had trouble when I was playing with a 250k-record XML file a while back).
You might be able to use the MSXML DOM directly (you can probably import the type library) or use SAX, which will allow you to parse it sequentially. I don't have much experience with either.
There are a number of in-memory databases that may be useful. At least you could then index and query the data as required. One I know of is from components4developers.com.
David
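As a rough sketch of the "index and query" idea: the example below uses Delphi's own TClientDataSet as a stand-in in-memory table, not the components4developers.com product, whose API is not shown here. The field names (Id, LastName) and the sample rows are hypothetical; in practice the records would be loaded from the XML once.

```delphi
program IndexSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, Data.DB, Datasnap.DBClient,
  MidasLib; // links the MIDAS code statically, so midas.dll is not needed

var
  CDS: TClientDataSet;
begin
  CDS := TClientDataSet.Create(nil);
  try
    // Hypothetical schema for the example.
    CDS.FieldDefs.Add('Id', ftInteger);
    CDS.FieldDefs.Add('LastName', ftString, 40);
    CDS.CreateDataSet;

    CDS.AppendRecord([1, 'Smith']);
    CDS.AppendRecord([2, 'Jones']);
    CDS.AppendRecord([3, 'Brown']);

    // Order the dataset by Id so that keyed lookups can use the index.
    CDS.IndexFieldNames := 'Id';

    // FindKey positions the cursor on the matching record, if there is one.
    if CDS.FindKey([2]) then
      Writeln(CDS.FieldByName('LastName').AsString)
    else
      Writeln('not found');
  finally
    CDS.Free;
  end;
end.
```

Whether 300,000 records fit comfortably in memory this way depends on the record size, so it is worth measuring with real data first.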
You don't say how you are implementing the datasource. I have used a TClientDataSet connected via TXMLTransformProvider (OK, not for 300K records, but for a few thousand), and simply setting the Filter and Filtered properties seems to "query" it just fine...
Or have I missed something?
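A minimal sketch of that Filter/Filtered approach, for reference. The dataset is built in code here so the example stands alone; with TXMLTransformProvider the records would come from the XML file instead. The field names (Id, LastName) and the values are hypothetical.

```delphi
program FilterSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, Data.DB, Datasnap.DBClient,
  MidasLib; // links the MIDAS code statically, so midas.dll is not needed

var
  CDS: TClientDataSet;
begin
  CDS := TClientDataSet.Create(nil);
  try
    // Hypothetical schema for the example.
    CDS.FieldDefs.Add('Id', ftInteger);
    CDS.FieldDefs.Add('LastName', ftString, 40);
    CDS.CreateDataSet;

    CDS.AppendRecord([1, 'Smith']);
    CDS.AppendRecord([2, 'Jones']);
    CDS.AppendRecord([3, 'Smith']);

    // "Query" by filter: only records matching the expression stay visible.
    // QuotedStr escapes any quote characters inside the search value.
    CDS.Filter := 'LastName = ' + QuotedStr('Smith');
    CDS.Filtered := True;

    CDS.First;
    while not CDS.Eof do
    begin
      Writeln(CDS.FieldByName('Id').AsInteger);
      CDS.Next;
    end;
  finally
    CDS.Free;
  end;
end.
```

A filter still evaluates the expression against each record, so for repeated lookups on a single field it may be worth setting IndexFieldNames and using FindKey or SetRange instead.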
For a SAX parser for Delphi, check this Stack Overflow question:
Is there a SAX Parser for Delphi and Free Pascal?