Format for storing personal contacts in a database
I'm trying to work out the best way to store personal contacts in a database for a business application. The traditional and straightforward approach would be to create a table with a column for each element, i.e. Name, Telephone Number, Job Title, Address, etc. However, there are known industry standards for this kind of data, for example vCard, hCard, vCard-RDF/XML, or even the Windows Contacts XML Schema. Using a standard format would offer some benefits, such as interoperability with other systems. But how can I decide which method to use?
The requirements are mainly to store the data. Search and ordering queries are highly unlikely but possible. The volume of the data is 100,000 records at maximum.
My database engine supports native XML columns. I have been thinking of using some XML-based format to store the personal contacts. It would then be possible to use XML indexes on this data if searching and ordering are ever needed. Is this a good approach? Which contact format and schema would you recommend for this?
Edited after first answers
Here is why I think the straightforward approach is bad. This is due to the nature of this kind of data - it is not that simple.
- Personal contact data is not well structured; it may be called semi-structured. Each contact may have different data fields, possibly even fields I cannot anticipate. In my opinion, each piece of this data should be treated as important information, i.e. no piece of data should be discarded just because there is no relevant column in the database.
- If we take this further and assume that no data may be lost, then we could create a big text column named Comment or Description or Other and put there everything that does not fit well into the table columns. But then again, the data would lose its structure, and that might be bad.
- If we want structured data then, according to database design principles, the data should be decomposed into entities, and relations should be established between the entities. But this adds complexity: there are just too many entities, and lots of design decisions have to be made, like "How do we store Address? Personal Name? Phone number? How do we encode home phone numbers and mobile phone numbers? How about other contact info?" The relations between entities are complex and numerous, and each relation is a table in the database. Each relation needs to be documented in the design papers. That is a lot of work. But it is possible to avoid the complexity entirely: just document that the data is stored according to such-and-such standard schema, period. Then anybody reading that document should easily understand what it is all about.
- Finally, this is all about using an industry standard. The standard was, hopefully, designed by some clever people who anticipated and described the structure of personal contact information much better than I ever could. Why should we all reinvent the wheel? It's much easier to use a standard schema. The problem is that there are just too many standards, and it's not easy to decide which one to use!
Comments (4)
The formats you mention are great ways to exchange data between systems but are not ideal for storage in a database. Don't let data interchange standards dictate the database design. Whatever database design you use, you can always create a service or program that exposes the data in an XML format for external use.
It doesn't look like you have any real performance or space issues. So use whatever takes the least time to code and maintain!
You may wish to allow exporting the data to vCard/hCard etc. formats, but don't use them as your application's storage backend unless you think that would lead to reduced coding/maintenance overall.
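As a rough illustration of the "export, don't store" idea, here is a minimal Python sketch that turns one contact record into a vCard 3.0 string. The dictionary keys (`given`, `family`, `phone`, `title`) and the function names are assumptions made up for this example, not part of any existing schema, and the sketch does not fold long lines the way a complete exporter would.

```python
def escape_vcard(value):
    """Escape backslash, semicolon, comma and newline for a vCard 3.0 text value."""
    return (
        value.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


def to_vcard(contact):
    """Build a vCard 3.0 string from a dict with hypothetical keys."""
    given = escape_vcard(contact.get("given", ""))
    family = escape_vcard(contact.get("family", ""))
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        # N is structured: family;given;additional;prefixes;suffixes
        "N:" + ";".join([family, given, "", "", ""]),
        "FN:" + " ".join(part for part in (given, family) if part),
    ]
    if contact.get("phone"):
        lines.append("TEL;TYPE=WORK:" + escape_vcard(contact["phone"]))
    if contact.get("title"):
        lines.append("TITLE:" + escape_vcard(contact["title"]))
    lines.append("END:VCARD")
    # vCard lines are terminated with CRLF
    return "\r\n".join(lines) + "\r\n"


card = to_vcard(
    {"given": "Jane", "family": "Doe", "phone": "555-0100", "title": "Engineer; R&D"}
)
print(card)
```

With something like this in place, the tables can stay in whatever shape is easiest to maintain, and other formats can be added later as separate export functions.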
I'd probably set up a "normal" table structure for the "normal" bits of data (name, address, phone, etc.) and then have a one-to-many relation to a separate table "custom_fields" that contains three columns:
user_id (foreign key), fieldtype (string), data (string/blob)
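A minimal sketch of that layout, using Python's built-in `sqlite3` module purely for illustration. The `contacts` table, its columns, and the helper function names are assumptions for this example; only the three `custom_fields` columns come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- "Normal" columns for the fields every contact is expected to have.
    CREATE TABLE contacts (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        phone TEXT
    );
    -- One row per extra field, so unanticipated data is never dropped.
    CREATE TABLE custom_fields (
        user_id   INTEGER NOT NULL REFERENCES contacts(id),
        fieldtype TEXT NOT NULL,
        data      TEXT
    );
    """
)


def add_contact(conn, name, phone, extras):
    """Insert a contact plus any number of custom field rows."""
    cur = conn.execute(
        "INSERT INTO contacts (name, phone) VALUES (?, ?)", (name, phone)
    )
    contact_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO custom_fields (user_id, fieldtype, data) VALUES (?, ?, ?)",
        [(contact_id, key, value) for key, value in extras.items()],
    )
    return contact_id


def load_custom_fields(conn, contact_id):
    """Return the custom fields of one contact as a dict."""
    rows = conn.execute(
        "SELECT fieldtype, data FROM custom_fields WHERE user_id = ?",
        (contact_id,),
    )
    return dict(rows.fetchall())


cid = add_contact(
    conn, "Jane Doe", "555-0100", {"skype": "jane.doe", "nickname": "JD"}
)
fields = load_custom_fields(conn, cid)
print(fields)
```

Because each custom field is its own row, it can still be filtered or sorted with ordinary SQL if that need ever comes up.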
As an alternative, you could just add a blob or text field in the main contacts table that contains a formatted list of custom field/value mappings (you can use BSON, JSON, or YAML to make life easy). Then just unpack the data when the user opens the contact.
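The text-column variant might look roughly like this, again sketched with `sqlite3` and JSON. The `extra` column name and the helper functions are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE contacts (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        extra TEXT NOT NULL DEFAULT '{}'  -- JSON-encoded custom fields
    )
    """
)


def save_contact(conn, name, extra):
    """Store the custom field/value mapping as one JSON string."""
    cur = conn.execute(
        "INSERT INTO contacts (name, extra) VALUES (?, ?)",
        (name, json.dumps(extra)),
    )
    return cur.lastrowid


def open_contact(conn, contact_id):
    """Load a contact and unpack its custom fields, or return None."""
    row = conn.execute(
        "SELECT name, extra FROM contacts WHERE id = ?", (contact_id,)
    ).fetchone()
    if row is None:
        return None
    name, extra = row
    return {"name": name, "extra": json.loads(extra)}


cid = save_contact(
    conn, "Jane Doe", {"pager": "555-0199", "languages": ["en", "fr"]}
)
contact = open_contact(conn, cid)
print(contact)
```

The trade-off is that the database sees `extra` as an opaque string, so searching or sorting on a custom field means unpacking it in application code unless the engine offers its own JSON functions.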
If you need faster performance and the ability to sort your contacts by custom field easily, you might want to look into document-centric database backends like MongoDB, or even a proper search engine (Solr or similar). That might be overkill, but it might also be an interesting project!
There are many, many ways to associate custom fields and values with entries in a "normal" database. Just pick one that you understand and can write quickly, and go for it. I've never seen a company/employer care about "standards compliance" of the backend data storage system. As long as you write some sort of export script, or (as mentioned) write plugins to support seamless vCard/XML import/export, you can claim that your app is "standards compliant."
What's wrong with the normal database approach? As you mentioned yourself, there are several different formats out there, and if you implement one, you break compatibility with other systems.
With the database approach you can later write plugins for every format needed to link with external applications, whether vCard or something else.