Receiving daily XML files - 12 types that need to be searched every day

Posted on 2024-10-25 19:44:02

Asp.NET - C#.NET

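I need advice on the following design problem:

I receive XML files every day. The number varies: for example, yesterday I received 10 XML files, today 56, and tomorrow I might receive 161.

There are 12 types (12 XSDs). Each file has an attribute at the top named FormType, e.g. FormType="1", FormType="2", FormType="12", and so on, up to 12 form types.

They all share common fields such as name, address, and phone. But, for example, FormType=1 is for construction, FormType=2 is for IT, FormType=3 is for hospitals, FormType=4 is for advertising, etc.

As I said, they all have common attributes.

Requirement: I need a search screen so that users can search the contents of these XML files, but I have no idea how to approach this. For example, a user might search for text within certain attributes of the XML files received between Date_From and Date_To.

Question: I have heard about putting the XML into a binary field and running XPath queries against it, or something along those lines, but I don't know what terms to search for on Google.

I was thinking of creating one big database table, reading all the XML files, and loading them into that table. The problem is that some XML attributes are very large, around 2-3 pages, while the same attributes are empty in other XML files. So if I create an NVARCHAR(MAX) column for every XML attribute and load the values into it, after a while my database will become a huge monster.

Can someone suggest the best way to handle this?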

行至春深 2024-11-01 19:44:03

I'm not 100% sure I understand your problem. I'm guessing that the query's supposed to return individual XML documents that meet some kind of user-specified criteria.

In that event, my starting point would probably be to implement a method for querying a single XML document, i.e. one that returns true if the document's a hit and false otherwise. In all likelihood, I'd make the query parameter an XPath query, but who knows? Here's a simple example:

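// Requires: using System.Linq; using System.Xml.Linq; using System.Xml.XPath;
// Returns true if the XPath query selects at least one element in the document.
public bool TestXml(XDocument d, string query)
{
   return d.XPathSelectElements(query).Any();
}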
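
To make the idea concrete, here is a small self-contained sketch of how TestXml might be called. The root element name Form, the child elements, and the sample values are assumptions made up for illustration; only the FormType attribute comes from the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class TestXmlDemo
{
    // Same check as above: true if the XPath query selects at least one element.
    public static bool TestXml(XDocument d, string query)
    {
        return d.XPathSelectElements(query).Any();
    }

    public static void Main()
    {
        // Hypothetical document shape; only FormType is taken from the question.
        XDocument doc = XDocument.Parse(
            "<Form FormType=\"2\"><Name>Acme IT</Name><Phone>555-0100</Phone></Form>");

        Console.WriteLine(TestXml(doc, "/Form[@FormType='2']"));          // True
        Console.WriteLine(TestXml(doc, "/Form[contains(Name, 'Acme')]")); // True
        Console.WriteLine(TestXml(doc, "/Form[@FormType='1']"));          // False
    }
}
```

Because the query is just a string, a search screen could build it from whatever fields the user fills in, provided user input is escaped or validated before being spliced into the XPath expression.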

Next, I need a store of XML documents to query. Where does that store live, and what form does it take? At a certain level, those are implementation details that my application doesn't care about. They could live in a database, or the file system. They could be cached in memory. I'd start by keeping it simple, something like:

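// Requires: using System.Collections.Generic; using System.IO; using System.Xml.Linq;
// Lazily loads and parses every file in the XML directory.
public IEnumerable<XDocument> XmlDocuments()
{
   DirectoryInfo di = new DirectoryInfo(XmlDirectoryPath);
   foreach (FileInfo fi in di.GetFiles())
   {
      // FileInfo has no Filename property; FullName gives the full path.
      yield return XDocument.Load(fi.FullName);
   }
}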

Now I can get all of the documents that fulfill a request like this:

// Returns every stored document that the XPath query matches.
public IEnumerable<XDocument> GetDocuments(string query)
{
   return XmlDocuments().Where(x => TestXml(x, query));
}
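
For range criteria such as the Date_From / Date_To search mentioned in the question, a plain XPath string is awkward: XPathSelectElements evaluates XPath 1.0, whose < and > operators compare numbers rather than dates. One possible variation, sketched below, is to let GetDocuments accept a predicate instead of a query string. The ReceivedDate element name and the yyyy-MM-dd format are assumptions, since the question does not say where the received date is stored.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class DateRangeSearch
{
    // True if the document's (hypothetical) ReceivedDate element
    // falls within [dateFrom, dateTo], inclusive.
    public static bool ReceivedBetween(XDocument d, DateTime dateFrom, DateTime dateTo)
    {
        XElement e = d.Descendants("ReceivedDate").FirstOrDefault();
        DateTime received;
        if (e == null || !DateTime.TryParseExact(e.Value.Trim(), "yyyy-MM-dd",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out received))
        {
            return false; // missing or unparseable date: treat as no match
        }
        return received >= dateFrom && received <= dateTo;
    }

    // Predicate-based counterpart of GetDocuments.
    public static IEnumerable<XDocument> GetDocuments(
        IEnumerable<XDocument> source, Func<XDocument, bool> predicate)
    {
        return source.Where(predicate);
    }
}
```

A call might look like DateRangeSearch.GetDocuments(docs, d => DateRangeSearch.ReceivedBetween(d, dateFrom, dateTo)), and the predicate can be combined with the XPath check from TestXml using && when both a date range and a text criterion are given.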

The thing that jumps out at me when I look at this problem: I have to parse my documents into XDocument objects to query them. That's going to happen whether they live in a database or the file system. (If I stick them in a database and write a stored procedure that does XPath queries, as someone suggested, I'm still parsing all of the XML every time I execute a query; I've just moved all that work to the database server.)

That's a lot of I/O and CPU time spent doing the exact same thing over and over again. If the volume of queries is anything other than tiny, I'd consider building a List<XDocument> the first time GetDocuments() is called, and come up with a scheme for keeping that list in memory until new XML documents are received (or possibly updating it when they are).
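
A minimal sketch of that caching idea follows. The XmlDocumentCache class, its Invalidate method, and the *.xml file pattern are hypothetical names and choices for illustration; how the application learns that new files have arrived is left open.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public class XmlDocumentCache
{
    private readonly string _directory;
    private readonly object _sync = new object();
    private List<XDocument> _documents; // null until first use or after Invalidate()

    public XmlDocumentCache(string directory)
    {
        _directory = directory;
    }

    // Call this when new XML files arrive so the next query reloads from disk.
    public void Invalidate()
    {
        lock (_sync) { _documents = null; }
    }

    public List<XDocument> GetDocuments(string query)
    {
        List<XDocument> snapshot;
        lock (_sync)
        {
            if (_documents == null)
            {
                // Parse every file once; later queries reuse the parsed documents.
                _documents = Directory.EnumerateFiles(_directory, "*.xml")
                                      .Select(path => XDocument.Load(path))
                                      .ToList();
            }
            snapshot = _documents; // the list is never mutated after it is built
        }
        return snapshot.Where(d => d.XPathSelectElements(query).Any()).ToList();
    }
}
```

This trades memory for speed: every parsed document stays resident, so with very large files or a long retention period it may be worth caching only recent documents or only the fields that are actually searched.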
