Guidance on designing a solution: XML files vs. a database
I am thinking of storing a bunch of data in XML files. Each file will hold information about a distinct kind of element, let's say contacts. Now I am trying to retrieve contacts based on some criterion, e.g. find all the contacts who live in CA. How do I search for this information? Can I use something like LINQ? I see XElement, but does it work across multiple XML files?
Does converting to DataSets help? I am thinking of having a constructor in my application that loads all the XML files into a DataSet and then runs queries against the DataSet. If this is a good approach, can someone point me to examples or resources?
Most importantly, is this a good solution, or should I use a database? The reason I am using XML files is that I need to extend this solution to use XQuery in the back-end tiers (business logic, database) in the future, and I thought having the data in XML files would help.
Update: I already have the schema here: http://ideone.com/ZRPco
Comments (6)
If you put the data in a database, it is easy to output it as XML. Don't start off in XML just because you will need to end up there. If you need to run queries on the data, a database is almost certainly the best option.
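To make the "query in the database, emit XML at the end" idea concrete, here is a minimal sketch in Python using only the standard library. The `contacts` table, its columns, and the sample rows are hypothetical and are not taken from the asker's schema; the same shape applies to any SQL database and XML writer.

```python
import sqlite3
import xml.etree.ElementTree as ET


def contacts_in_state_as_xml(conn: sqlite3.Connection, state: str) -> str:
    """Query contacts by state and serialize the matching rows as XML."""
    rows = conn.execute(
        "SELECT name, city, state FROM contacts WHERE state = ? ORDER BY name",
        (state,),
    ).fetchall()

    # Build the XML only for the rows the database already filtered.
    root = ET.Element("contacts")
    for name, city, row_state in rows:
        contact = ET.SubElement(root, "contact")
        ET.SubElement(contact, "name").text = name
        ET.SubElement(contact, "city").text = city
        ET.SubElement(contact, "state").text = row_state
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, city TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        ("Ann Lee", "Sacramento", "CA"),
        ("Bob Roy", "Austin", "TX"),
        ("Cy Diaz", "Fresno", "CA"),
    ],
)

xml_text = contacts_in_state_as_xml(conn, "CA")
print(xml_text)
```

The filtering happens in SQL, and XML is only the output format, so the storage choice and the exchange format stay independent.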
You can use XML for this. Let me restate your example to make sure I understand it.
You may have 1000 employees in your company.
Each employee can have zero or more contacts (primary, secondary, etc.).
So every employee can have a contacts.xml document (stored and identified in an XML database such as eXist, MarkLogic, or Berkeley DB XML).
e.g. contacts.xml
Once the data is inside an XML database, the database can fetch all sorts of details based on whatever facet you want,
for example contacts by zip code, by city, or by name.
All you need to do is write a specific XQuery to mine the data for your request (in the case of the MarkLogic XML database server). The term used for this is faceted browsing.
XML databases are designed to handle such information. View the contacts as a mass of data rather than as rows and columns.
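As a rough illustration of querying by facet across many per-employee documents, here is a small Python sketch. It is not XQuery and not any XML database's API; it uses the limited XPath subset in Python's `xml.etree.ElementTree`, and the document names, element names, and sample data are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical documents, one per employee. An XML database would store and
# index these; here they are plain strings keyed by a made-up document name.
DOCS = {
    "emp-001/contacts.xml": (
        "<contacts>"
        "<contact><name>Ann Lee</name><city>Sacramento</city><zip>95814</zip></contact>"
        "<contact><name>Bob Roy</name><city>Austin</city><zip>73301</zip></contact>"
        "</contacts>"
    ),
    "emp-002/contacts.xml": (
        "<contacts>"
        "<contact><name>Cy Diaz</name><city>Sacramento</city><zip>95816</zip></contact>"
        "</contacts>"
    ),
}


def find_contacts(docs, facet, value):
    """Return names of contacts whose <facet> child equals value, across all docs."""
    matches = []
    for text in docs.values():
        root = ET.fromstring(text)
        # ElementTree supports [child='text'] predicates; value must not contain quotes.
        for contact in root.findall(f".//contact[{facet}='{value}']"):
            matches.append(contact.findtext("name"))
    return matches


def facet_counts(docs, facet):
    """Count how many contacts fall under each value of the given facet."""
    counts = Counter()
    for text in docs.values():
        for contact in ET.fromstring(text).iter("contact"):
            counts[contact.findtext(facet)] += 1
    return counts


in_sacramento = find_contacts(DOCS, "city", "Sacramento")
by_city = facet_counts(DOCS, "city")
```

This version re-parses every document on each call. An XML database avoids that by keeping indexes on the facet values, which is the point the answer is making.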
Here are two reasons not to use XML:
If the dataset is large, I would not use XML. You either have to use a DOM parser (slow on big data) or a SAX parser (faster, but you cannot validate the document until the whole file has been read).
If the data is going to change. You have to rewrite the whole XML file in order to change a portion of it.
Here is when I would use XML:
If the dataset is small, naturally hierarchical, and needs to be viewable and editable in a text editor.
If you need to output XML, producing it from a database is not a problem.
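For the large-file case, a streaming parse is the usual middle ground between DOM and SAX. The sketch below uses Python's `xml.etree.ElementTree.iterparse` to filter contacts without building the full tree first; the element names and the sample document are hypothetical, and the function name is only for illustration.

```python
import io
import xml.etree.ElementTree as ET


def stream_contacts_in_state(source, state):
    """Yield names of contacts in `state`, handling one <contact> at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "contact":
            if elem.findtext("state") == state:
                yield elem.findtext("name")
            # Drop the finished contact's children so memory stays bounded.
            elem.clear()


# Hypothetical sample; in practice `source` would be a path to a large file.
SAMPLE = (
    b"<contacts>"
    b"<contact><name>Ann Lee</name><state>CA</state></contact>"
    b"<contact><name>Bob Roy</name><state>TX</state></contact>"
    b"<contact><name>Cy Diaz</name><state>CA</state></contact>"
    b"</contacts>"
)

names = list(stream_contacts_in_state(io.BytesIO(SAMPLE), "CA"))
print(names)
```

Every query still reads the file from start to end, and an update still means rewriting the file, so this eases the memory problem but not the two objections above.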
There are lots of comments here, but few show much familiarity with MarkLogic Server as an XML database, or with how powerful XML can be as a storage format when multiple types of indexes are applied (element, value, attribute, XML structure, XML node order, word, and phrase indexes).
MarkLogic can store and index billions of XML documents and allows sub-second searches across all of them, as well as complex SUM, COUNT, MIN, and MAX operations.
I've used relational XML files with C#/.NET LINQ to XML to achieve what the original poster wants. (No MarkLogic in that case, just plain XML files and C# LINQ code that joins them to perform whatever type of search I need.) You might have one XML file for contacts and another XML file for companies, and use LINQ to XML to do a LEFT OUTER JOIN between the two files.
I've used this with 90 MB XML files joined against smaller 4-5 MB XML files, and complex searches with multiple WHERE conditions run in the 2-3 second range.
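The sample files and LINQ code from this answer are not available here, so the following is only a stand-in: a Python sketch of a left outer join between a contacts document and a companies document. The element names, the `companyId` link, and the data are all hypothetical and are not the answerer's files or code.

```python
import xml.etree.ElementTree as ET

# Hypothetical contacts file; each contact refers to a company by id.
CONTACTS_XML = (
    "<contacts>"
    "<contact><name>Ann Lee</name><state>CA</state><companyId>c1</companyId></contact>"
    "<contact><name>Bob Roy</name><state>TX</state><companyId>c2</companyId></contact>"
    "<contact><name>Cy Diaz</name><state>CA</state><companyId>c9</companyId></contact>"
    "</contacts>"
)

# Hypothetical companies file; note that there is no company with id c9.
COMPANIES_XML = (
    "<companies>"
    '<company id="c1"><name>Acme</name></company>'
    '<company id="c2"><name>Globex</name></company>'
    "</companies>"
)


def left_outer_join(contacts_xml, companies_xml):
    """Join contacts to companies; contacts without a match keep company=None."""
    # Index the smaller file once so each contact lookup is a dictionary hit.
    companies = {
        c.get("id"): c.findtext("name")
        for c in ET.fromstring(companies_xml).iter("company")
    }
    rows = []
    for contact in ET.fromstring(contacts_xml).iter("contact"):
        rows.append(
            {
                "name": contact.findtext("name"),
                "state": contact.findtext("state"),
                "company": companies.get(contact.findtext("companyId")),
            }
        )
    return rows


rows = left_outer_join(CONTACTS_XML, COMPANIES_XML)
ca_rows = [r for r in rows if r["state"] == "CA"]  # the WHERE condition
```

In LINQ to XML the usual way to express the same join is a group join followed by `DefaultIfEmpty()`, which likewise keeps contacts that have no matching company.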
A database definitely sounds like the right solution. I see two requirements here: you need to run certain kinds of queries against the dataset, and you need the data in XML at some point. A SQL database will handle complex queries much better than XML files, and you can always convert the data to XML when you need it.
In my experience, using XML as a master data source is not a good idea; it will become a pain at some point. Try SQLite instead; it is a powerful and portable relational database.
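If the data already lives in XML files, moving it into SQLite can be a short one-off step. The sketch below is one possible way to do it in Python, assuming a hypothetical `<contact>` layout with `name`, `city`, and `state` children; it is not based on the asker's actual schema.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_contacts(conn, xml_documents):
    """Copy every <contact> from the given XML strings into a contacts table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contacts (name TEXT, city TEXT, state TEXT)"
    )
    for text in xml_documents:
        rows = [
            (c.findtext("name"), c.findtext("city"), c.findtext("state"))
            for c in ET.fromstring(text).iter("contact")
        ]
        conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", rows)
    conn.commit()


# Hypothetical input: the contents of two separate XML files.
FILES = [
    "<contacts><contact><name>Ann Lee</name><city>Sacramento</city>"
    "<state>CA</state></contact></contacts>",
    "<contacts><contact><name>Bob Roy</name><city>Austin</city>"
    "<state>TX</state></contact><contact><name>Cy Diaz</name>"
    "<city>Fresno</city><state>CA</state></contact></contacts>",
]

db = sqlite3.connect(":memory:")
load_contacts(db, FILES)

# Once loaded, "all contacts who live in CA" is a single indexed-capable query.
ca_names = db.execute(
    "SELECT name FROM contacts WHERE state = ? ORDER BY name", ("CA",)
).fetchall()
```

After the import, the XML files can be kept as an exchange format while the database serves the queries.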