Receiving XML files daily - 12 types that need to be searched
ASP.NET - C#.NET
I need advice on the following design problem:
I receive XML files every day, and the quantity varies: e.g. yesterday 10 XML files were received, today 56, and tomorrow maybe 161.
There are 12 types (12 XSDs), and at the top of each file there is an attribute called FormType, e.g. FormType="1", FormType="2", FormType="12", and so on, up to 12 form types.
All of them have common fields like Name, Address, Phone.
But, e.g., FormType=1 is for Construction, FormType=2 for IT, FormType=3 for Hospital, FormType=4 for Advertisement, etc.
As I said, all of them share these common attributes.
Requirements:
I need a search screen so that users can search the contents of these XML files, but I have no idea how to approach this. For example: search for text in certain attributes of the XML files received between Date_From and Date_To.
Problem:
I've heard about putting the XML files in a binary field and running XPath queries against them, but I don't know the right terms to search for on Google.
I was thinking of creating one big database table, reading all the XML files, and putting them in it. But the issue is that some XML attributes are very large (2-3 pages), while the same attributes in other XML files are empty.
So if I create an NVARCHAR(MAX) column for every XML attribute and put the values in those fields, after some time my database will become a big, big monster...
Can someone advise what the best approach to this problem is?
I'm not 100% sure I understand your problem. I'm guessing that the query's supposed to return individual XML documents that meet some kind of user-specified criteria.
In that event, my starting point would probably be to implement a method for querying a single XML document, i.e. one that returns true if the document's a hit and false otherwise. In all likelihood, I'd make the query parameter an XPath query, but who knows? Here's a simple example:
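A minimal C# sketch of such a method, assuming System.Xml.Linq and a hypothetical root element named `Form` that carries the `FormType` attribute (the real element names come from the 12 XSDs). The class name `XmlQuery` is also an assumption:

```csharp
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper class; the name XmlQuery is an assumption.
public static class XmlQuery
{
    // Returns true if the XPath expression is "true" for this document.
    // Wrapping it in XPath's boolean() function means a path such as
    // "/Form[@FormType='2']" counts as a hit when it selects at least one node,
    // while a test such as "contains(/Form/Name, 'Acme')" is used as-is.
    public static bool IsMatch(XDocument document, string xpath)
    {
        object result = document.XPathEvaluate("boolean(" + xpath + ")");
        return (bool)result;
    }
}
```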
Next, I need a store of XML documents to query. Where does that store live, and what form does it take? At a certain level, those are implementation details that my application doesn't care about. They could live in a database, or the file system. They could be cached in memory. I'd start by keeping it simple, something like:
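One way to keep it simple, sketched under the assumption that every received file is saved as `*.xml` in a single folder. The class name `XmlDocumentStore` and the folder layout are hypothetical:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Linq;

// Hypothetical store: every received XML file is saved in one folder.
public class XmlDocumentStore
{
    private readonly string _folder;

    public XmlDocumentStore(string folder) => _folder = folder;

    // Loads and parses every *.xml file in the folder each time it is called.
    // Nothing is cached, so every search re-reads and re-parses all files.
    public IEnumerable<XDocument> GetDocuments()
    {
        foreach (string path in Directory.EnumerateFiles(_folder, "*.xml"))
        {
            yield return XDocument.Load(path);
        }
    }
}
```

Because the method uses `yield return`, files are parsed lazily as the caller enumerates them. Moving the files into a database later would only change the body of `GetDocuments()`.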
Now I can get all of the documents that fulfill a request like this:
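A sketch of that filtering step, assuming the documents are already available as `XDocument` objects and that the request is expressed as an XPath expression. `DocumentSearch` and `FindMatches` are hypothetical names. Here the expression is compiled once and reused for every document:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class DocumentSearch
{
    // Hypothetical helper: returns every document for which the XPath
    // expression is true. The expression is wrapped in boolean() and compiled
    // once, then evaluated against each document's navigator.
    public static List<XDocument> FindMatches(IEnumerable<XDocument> documents, string xpath)
    {
        XPathExpression query = XPathExpression.Compile("boolean(" + xpath + ")");
        return documents
            .Where(doc => (bool)doc.CreateNavigator().Evaluate(query))
            .ToList();
    }
}
```

For example, `FindMatches(store.GetDocuments(), "/Form[@FormType='2' and contains(Name, 'Acme')]")` would return the IT-type forms whose name contains "Acme", given the hypothetical `Form`/`Name` element names. If the search text comes from the search screen, avoid pasting it straight into the XPath string: a quote character in the input breaks the expression, and arbitrary input can change what the query selects.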
The thing that jumps out at me when I look at this problem: I have to parse my documents into `XDocument` objects to query them. That's going to happen whether they live in a database or the file system. (If I stick them in a database and write a stored procedure that does XPath queries, as someone suggested, I'm still parsing all of the XML every time I execute a query; I've just moved all that work to the database server.) That's a lot of I/O and CPU time that gets spent doing the exact same thing over and over again. If the volume of queries is anything other than tiny, I'd consider building a `List<XDocument>` the first time `GetDocuments()` is called and come up with a scheme of keeping that list in memory until new XML documents are received (or possibly updating it when new XML documents are received).
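A minimal sketch of the first variant (keep the list until new documents arrive), assuming all received files sit in one folder and that the import process can call a hypothetical `Invalidate()` method after it saves new files. The class name and folder layout are assumptions; the lock is there because ASP.NET serves requests on multiple threads:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Hypothetical cached store: parses the folder once and keeps the result
// in memory until Invalidate() is called.
public class CachedXmlDocumentStore
{
    private readonly string _folder;
    private readonly object _sync = new object();
    private List<XDocument> _cache; // null until the first GetDocuments() call

    public CachedXmlDocumentStore(string folder) => _folder = folder;

    // First call parses every *.xml file; later calls return the same list.
    public IReadOnlyList<XDocument> GetDocuments()
    {
        lock (_sync)
        {
            if (_cache == null)
            {
                _cache = Directory.EnumerateFiles(_folder, "*.xml")
                                  .Select(path => XDocument.Load(path))
                                  .ToList();
            }
            return _cache;
        }
    }

    // Called (by assumption) after new XML files are saved: drops the cached
    // list so the next GetDocuments() call re-reads the folder. Callers still
    // holding the old list keep a consistent, unmodified snapshot.
    public void Invalidate()
    {
        lock (_sync)
        {
            _cache = null;
        }
    }
}
```

Reloading everything on invalidation is the simplest scheme. The incremental variant would parse only the newly received files and append them, at the cost of tracking which files are already loaded.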