对数据库结构的意见和建议
我正在构建这个用于对数据进行分类的工具。基本上,我将定期接收平面文件中的数据行,如下所示:
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
我有一个类别列表来将这些行分解为,例如:
Original Cat1 Cat2 Cat3 Cat4 Cat5
---------------------------------------
a:b:c:d:e a b c d e
从此刻开始,类别名称也是已知的作为分解数据的类别数。但这可能会随着时间的推移而改变(例如,添加/删除的类别...类别总数发生变化)。
好吧,所以我并不是真的在寻求有关如何解析行或将数据放入数据库或其他任何内容的帮助...我知道如何做到这一切,并且已经编写了大部分核心脚本来处理解析值行并分为不同数量的类别。
我主要是在寻找有关如何构建数据库来存储这些内容的建议。所以我一直在思考这个问题,这就是我想到的:
Table: Generated
generated_id int - unique id for each row generated
generated_timestamp datetime - timestamp of when row was generated
last_updated datetime - timestamp of when row last updated
generated_method varchar(6) - method in which row was generated (manual or auto)
original_string varchar (255) - the original string
Table: Categories
category_id int - unique id for category
category_name varchar(20) - name of category
Table: Category_Values
category_map_id int - unique id for each value (not sure if I actually need this)
category_id int - id value to link to table Categories
generated_id int - id value to link to table Generated
category_value varchar (255) - value for the category
基本上这个想法是当我解析一行时,我将在表 Generate
中插入一个新条目,以及 X 个条目在表 Category_Values
中,其中 X 是当前类别的数量。类别名称存储在另一个表Categories
中。
我的脚本将立即执行的操作是处理原始值行并将生成的类别值输出到要发送到某处的新文件。但是然后我有这个数据库来存储生成的数据,以便我可以制作另一个脚本,在其中我可以搜索并列出以前生成的值,或者用新值或其他内容更新以前生成的条目。
这看起来是一个不错的数据库结构吗?有什么明显的我遗漏或可能欺骗自己的事情吗?例如,使用这种结构...嗯...我不是 sql 专家,但我认为我应该能够这样做
select * from Generated where original_string = '$string'
// id is put into $id
,然后
select * from Category_Values where generated_id = '$id'
...然后我将使用我的数据来获取搜索结果或表单来更改数据...好吧,我相当确定我什至可以将其合并到一个带有连接或其他内容的查询中,但我对 sql 不太擅长,所以我不知道如何实际做到这一点..但是重点是,我知道我可以从这个数据库做我需要的事情结构..但是我是否让这变得比需要的更困难?犯一些明显的菜鸟错误?
I'm building this tool for classifying data. Basically I will be regularly receiving rows of data in a flat-file that look like this:
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
And I have a list of categories to break these rows up into, for example:
Original Cat1 Cat2 Cat3 Cat4 Cat5
---------------------------------------
a:b:c:d:e a b c d e
As of right this second, there category names are known, as well as number of categories to break the data down by. But this might change over time (for instance, categories added/removed...total number of categories changed).
Okay so I'm not really looking for help on how to parse the rows or get data into a db or anything...I know how to do all that, and have the core script mostly written already, to handle parsing rows of values and separating into variable amount of categories.
Mostly I'm looking for advice on how to structure my database to store this stuff. So I've been thinking about it, and this is what I came up with:
Table: Generated
generated_id int - unique id for each row generated
generated_timestamp datetime - timestamp of when row was generated
last_updated datetime - timestamp of when row last updated
generated_method varchar(6) - method in which row was generated (manual or auto)
original_string varchar (255) - the original string
Table: Categories
category_id int - unique id for category
category_name varchar(20) - name of category
Table: Category_Values
category_map_id int - unique id for each value (not sure if I actually need this)
category_id int - id value to link to table Categories
generated_id int - id value to link to table Generated
category_value varchar (255) - value for the category
Basically the idea is when I parse a row, I will insert a new entry into table Generated
, as well as X entries in table Category_Values
, where X is however many categories there currently are. And the category names are stored in another table Categories
.
What my script will immediately do is process rows of raw values and output the generated category values to a new file to be sent somewhere. But then I have this db I'm making to store the data generated so that I can make another script, where I can search for and list previously generated values, or update previously generated entries with new values or whatever.
Does this look like an okay database structure? Anything obvious I'm missing or potentially gimping myself on? For example, with this structure...well...I'm not a sql expert, but I think I should be able to do like
select * from Generated where original_string = '$string'
// id is put into $id
and then
select * from Category_Values where generated_id = '$id'
...and then I'll have my data to work with for search results or form to alter data...well I'm fairly certain I can even combine this into one query with a join or something but I'm not that great with sql so I don't know how to actually do that..but point is, I know I can do what I need from this db structure..but am I making this harder than it needs to be? Making some obvious noob mistake?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我的建议:
链接
时间戳: http://dev.mysql.com/doc/refman/5.1 /en/timestamp.html
创建表语法: http://dev.mysql.com/doc /refman/5.1/en/create-table.html
枚举: http://dev.mysql.com/doc/refman/5.1 /en/enum.html
My suggestion:
Links
Timestamps: http://dev.mysql.com/doc/refman/5.1/en/timestamp.html
Create table syntax: http://dev.mysql.com/doc/refman/5.1/en/create-table.html
Enums: http://dev.mysql.com/doc/refman/5.1/en/enum.html
我认为这个解决方案非常适合您想做的事情。类别列表现在很灵活,因此您可以添加新类别或淘汰旧类别(我建议您在同意删除类别之前仔细考虑一下 - 您是否会孤立记录或删除它们,等等)
基本上,我'我说你的目标是正确的。结构很简单,但它会很适合您。干得好(干得好,在问题中提供了正确数量的信息)。
I think this solution is perfect for what you want to do. The Categories list is now flexible so that you can add new categories or retire old ones (I would recommend thinking long and hard about it before agreeing to delete a category - would you orphan record or remove them too, etc.)
Basically, I'm saying you are right on target. The structure is simple but it will work well for you. Great job (and great job giving exactly the right amount of information in the question).