Database structure for non-linear data
I need advice about a database structure. I need to capture data from the web about one specific subject on a few specific websites and insert that data into a database.
The problem with this task is that the information is not linear: if I try to design tables with fields for all possible data, I will end up with many rows whose fields hold NULL values. Is there any problem with this (ending up with many NULL fields)? Or should I use another kind of structure? For example, store the data in one field, with that field containing an associative array of the data.
What I mean by non-linear data is the following:
array(
'name' => 'Don',
'age' => '31'
);
array(
'name' => 'Peter',
'age' => '28',
'car' => 'ford',
'km' => '2000'
);
In one specific website search I will store only "name" and "age", and in another website I will store "name", "age", "car" and "km".
I don't know if I explained my problem well. My English is not very good.
Best Regards.
Comments (3)
This kind of problem is exactly the area where NoSQL solutions excel. With a traditional database you have to specify all columns in advance, while NoSQL solutions give you the option of adding any kind of data you like.
So it depends on whether you will have a fixed set of fields or not. If you already know all the columns that you'll use, then you can declare the optional columns as NULL (nullable). If you don't know all the columns yet and expect that there will be more columns in the future, then a NoSQL solution would be better.
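As a rough illustration of the fixed-columns case, here is a minimal sketch using Python's built-in sqlite3 module. The table name person and the choice of SQLite are assumptions made only for this example; the column names come from the arrays in the question.

```python
import sqlite3

# In-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")

# Hypothetical table: "name" and "age" are always present,
# "car" and "km" are nullable because some sites do not provide them.
conn.execute(
    """
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT    NOT NULL,
        age  INTEGER NOT NULL,
        car  TEXT,
        km   INTEGER
    )
    """
)

# The two records from the question, as Python dicts.
records = [
    {"name": "Don", "age": 31},
    {"name": "Peter", "age": 28, "car": "ford", "km": 2000},
]

for rec in records:
    # dict.get() returns None for a missing key, which sqlite3 stores as NULL.
    conn.execute(
        "INSERT INTO person (name, age, car, km) VALUES (?, ?, ?, ?)",
        (rec["name"], rec["age"], rec.get("car"), rec.get("km")),
    )

rows = conn.execute(
    "SELECT name, age, car, km FROM person ORDER BY id"
).fetchall()
print(rows)  # [('Don', 31, None, None), ('Peter', 28, 'ford', 2000)]
```

The missing fields simply come back as None (NULL), so queries only need to account for that when they filter or aggregate on the optional columns.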
You have many options:
1) using a database with support for objects
2) using a database with support for XML
3) your solution, keeping base info in one table and attributes in another.
Personally, I'd use 3): easy and fast, and it doesn't tie you down to a specific db or software.
regards,
/t
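To make the "base info in one table, attributes in another" option concrete, here is a minimal sketch with Python's built-in sqlite3 module. The table names person and person_attribute and the helpers save_person and load_person are hypothetical, invented for this illustration; attribute values are stored as text, which is a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Base info that every record has.
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT    NOT NULL,
        age  INTEGER NOT NULL
    );
    -- One row per optional attribute, keyed by person and attribute name.
    CREATE TABLE person_attribute (
        person_id  INTEGER NOT NULL REFERENCES person(id),
        attr_name  TEXT    NOT NULL,
        attr_value TEXT    NOT NULL,
        PRIMARY KEY (person_id, attr_name)
    );
    """
)

BASE_FIELDS = ("name", "age")


def save_person(conn, record):
    """Insert the base fields, then one attribute row per extra key."""
    cur = conn.execute(
        "INSERT INTO person (name, age) VALUES (?, ?)",
        (record["name"], int(record["age"])),
    )
    person_id = cur.lastrowid
    extras = [
        (person_id, key, str(value))
        for key, value in record.items()
        if key not in BASE_FIELDS
    ]
    conn.executemany(
        "INSERT INTO person_attribute (person_id, attr_name, attr_value) "
        "VALUES (?, ?, ?)",
        extras,
    )
    return person_id


def load_person(conn, person_id):
    """Rebuild the original dict from both tables."""
    name, age = conn.execute(
        "SELECT name, age FROM person WHERE id = ?", (person_id,)
    ).fetchone()
    record = {"name": name, "age": age}
    for attr_name, attr_value in conn.execute(
        "SELECT attr_name, attr_value FROM person_attribute WHERE person_id = ?",
        (person_id,),
    ):
        record[attr_name] = attr_value
    return record


don_id = save_person(conn, {"name": "Don", "age": "31"})
peter_id = save_person(
    conn, {"name": "Peter", "age": "28", "car": "ford", "km": "2000"}
)
print(load_person(conn, don_id))
print(load_person(conn, peter_id))
```

New attributes from a new website need no schema change, only new rows in person_attribute. The trade-off is that everything in attr_value is text, so numeric comparisons on fields like km need a cast in the query.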
OK, let's backtrack and assume you are most comfortable with relational databases. You can always break a non-linear structure down into a linear one; only query performance will take a hit.
There is no problem with rows that have a lot of NULL values. It depends on the db implementation, but I have seen such designs before and they are pretty flexible.
Let me give an example.
Let's say we have to store hours worked per week, but in your case a week can have any number of days.
So you define a table with columns like
StartDate, Id, MondayHour, Tuesdayhour, etc., up to SundayHour
If you want to add another hour column like MondayHour1, just add the column and modify your queries.
To store the same structure in a linear (normalized) way (not sure if linear is the right word here), just define a table as follows: DayID, DayName
And then your hours table will have StartDate, ID, DayID, Hours.
Only now you need a join on the two tables.
Hope I have understood and answered your question correctly.
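The normalized variant of the hours example could look roughly like this, again as a sketch with Python's built-in sqlite3 module. The snake_case table and column names mirror DayID, DayName, StartDate and Hours from the answer; the dates and hour values are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Lookup table: DayID, DayName
    CREATE TABLE day (
        day_id   INTEGER PRIMARY KEY,
        day_name TEXT NOT NULL
    );
    -- Hours table: StartDate, ID, DayID, Hours
    CREATE TABLE hours (
        id         INTEGER PRIMARY KEY,
        start_date TEXT    NOT NULL,
        day_id     INTEGER NOT NULL REFERENCES day(day_id),
        hours      REAL    NOT NULL
    );
    """
)

conn.executemany(
    "INSERT INTO day (day_id, day_name) VALUES (?, ?)",
    [(1, "Monday"), (2, "Tuesday"), (3, "Sunday")],
)

# A second Monday entry is just another row; no MondayHour1 column is needed.
conn.executemany(
    "INSERT INTO hours (start_date, day_id, hours) VALUES (?, ?, ?)",
    [
        ("2024-01-01", 1, 8.0),
        ("2024-01-01", 1, 2.5),
        ("2024-01-01", 2, 7.0),
    ],
)

# The join mentioned in the answer: total hours per named day for one week.
rows = conn.execute(
    """
    SELECT d.day_name, SUM(h.hours)
    FROM hours AS h
    JOIN day AS d ON d.day_id = h.day_id
    WHERE h.start_date = ?
    GROUP BY d.day_id, d.day_name
    ORDER BY d.day_id
    """,
    ("2024-01-01",),
).fetchall()
print(rows)  # [('Monday', 10.5), ('Tuesday', 7.0)]
```

Days with no entries (Sunday here) produce no rows at all instead of NULL columns, which is the main difference from the wide-table layout.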