对数据库结构的意见和建议

发布于 2024-11-07 10:02:17 字数 2092 浏览 0 评论 0原文

我正在构建这个用于对数据进行分类的工具。基本上,我将定期接收平面文件中的数据行,如下所示:

a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e

我有一个类别列表来将这些行分解为,例如:

Original   Cat1  Cat2  Cat3  Cat4  Cat5
---------------------------------------
a:b:c:d:e  a     b     c     d     e

从此刻开始,类别名称也是已知的作为分解数据的类别数。但这可能会随着时间的推移而改变(例如,添加/删除的类别...类别总数发生变化)。

好吧,所以我并不是真的在寻求有关如何解析行或将数据放入数据库或其他任何内容的帮助...我知道如何做到这一切,并且已经编写了大部分核心脚本来处理解析值行并分为不同数量的类别。

我主要是在寻找有关如何构建数据库来存储这些内容的建议。所以我一直在思考这个问题,这就是我想到的:

Table: Generated
generated_id        int           - unique id for each row generated
generated_timestamp datetime      - timestamp of when row was generated
last_updated        datetime      - timestamp of when row last updated
generated_method    varchar(6)    - method in which row was generated (manual or auto)
original_string     varchar (255) - the original string

Table: Categories
category_id         int           - unique id for category
category_name       varchar(20)   - name of category

Table: Category_Values
category_map_id     int           - unique id for each value (not sure if I actually need this)
category_id         int           - id value to link to table Categories
generated_id        int           - id value to link to table Generated
category_value      varchar (255) - value for the category

基本上这个想法是当我解析一行时,我将在表 Generate 中插入一个新条目,以及 X 个条目在表 Category_Values 中,其中 X 是当前类别的数量。类别名称存储在另一个表Categories中。

我的脚本将立即执行的操作是处理原始值行并将生成的类别值输出到要发送到某处的新文件。但是然后我有这个数据库来存储生成的数据,以便我可以制作另一个脚本,在其中我可以搜索并列出以前生成的值,或者用新值或其他内容更新以前生成的条目。

这看起来是一个不错的数据库结构吗?有什么明显的我遗漏或可能欺骗自己的事情吗?例如,使用这种结构...嗯...我不是 sql 专家,但我认为我应该能够这样做

select * from Generated where original_string = '$string'
// id is put into $id

,然后

select * from Category_Values where generated_id = '$id'

...然后我将使用我的数据来获取搜索结果或表单来更改数据...好吧,我相当确定我什至可以将其合并到一个带有连接或其他内容的查询中,但我对 sql 不太擅长,所以我不知道如何实际做到这一点..但是重点是,我知道我可以从这个数据库做我需要的事情结构..但是我是否让这变得比需要的更困难?犯一些明显的菜鸟错误?

I'm building this tool for classifying data. Basically I will be regularly receiving rows of data in a flat-file that look like this:

a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e

And I have a list of categories to break these rows up into, for example:

Original   Cat1  Cat2  Cat3  Cat4  Cat5
---------------------------------------
a:b:c:d:e  a     b     c     d     e

As of right this second, there category names are known, as well as number of categories to break the data down by. But this might change over time (for instance, categories added/removed...total number of categories changed).

Okay so I'm not really looking for help on how to parse the rows or get data into a db or anything...I know how to do all that, and have the core script mostly written already, to handle parsing rows of values and separating into variable amount of categories.

Mostly I'm looking for advice on how to structure my database to store this stuff. So I've been thinking about it, and this is what I came up with:

Table: Generated
generated_id        int           - unique id for each row generated
generated_timestamp datetime      - timestamp of when row was generated
last_updated        datetime      - timestamp of when row last updated
generated_method    varchar(6)    - method in which row was generated (manual or auto)
original_string     varchar (255) - the original string

Table: Categories
category_id         int           - unique id for category
category_name       varchar(20)   - name of category

Table: Category_Values
category_map_id     int           - unique id for each value (not sure if I actually need this)
category_id         int           - id value to link to table Categories
generated_id        int           - id value to link to table Generated
category_value      varchar (255) - value for the category

Basically the idea is when I parse a row, I will insert a new entry into table Generated, as well as X entries in table Category_Values, where X is however many categories there currently are. And the category names are stored in another table Categories.

What my script will immediately do is process rows of raw values and output the generated category values to a new file to be sent somewhere. But then I have this db I'm making to store the data generated so that I can make another script, where I can search for and list previously generated values, or update previously generated entries with new values or whatever.

Does this look like an okay database structure? Anything obvious I'm missing or potentially gimping myself on? For example, with this structure...well...I'm not a sql expert, but I think I should be able to do like

select * from Generated where original_string = '$string'
// id is put into $id

and then

select * from Category_Values where generated_id = '$id'

...and then I'll have my data to work with for search results or form to alter data...well I'm fairly certain I can even combine this into one query with a join or something but I'm not that great with sql so I don't know how to actually do that..but point is, I know I can do what I need from this db structure..but am I making this harder than it needs to be? Making some obvious noob mistake?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

迷爱 2024-11-14 10:02:17

我的建议:

Table: Generated
id                  unsigned int autoincrement primary key
generated_timestamp timestamp
last_updated        timestamp default '0000-00-00' ON UPDATE CURRENT_TIMESTAMP
generated_method    ENUM('manual','auto')
original_string     varchar (255)

Table: Categories
id                  unsigned int autoincrement primary key
category_name       varchar(20)   

Table: Category_Values
id                  unsigned int autoincrement primary key
category_id         int           
generated_id        int           
category_value      varchar (255) - value for the category
  FOREIGN KEY `fk_cat`(category_id) REFERENCES category.id
  FOREIGN KEY `fk_gen`(generated_id) REFERENCES generated.id

链接
时间戳: http://dev.mysql.com/doc/refman/5.1 /en/timestamp.html
创建表语法: http://dev.mysql.com/doc /refman/5.1/en/create-table.html
枚举: http://dev.mysql.com/doc/refman/5.1 /en/enum.html

My suggestion:

Table: Generated
id                  unsigned int autoincrement primary key
generated_timestamp timestamp
last_updated        timestamp default '0000-00-00' ON UPDATE CURRENT_TIMESTAMP
generated_method    ENUM('manual','auto')
original_string     varchar (255)

Table: Categories
id                  unsigned int autoincrement primary key
category_name       varchar(20)   

Table: Category_Values
id                  unsigned int autoincrement primary key
category_id         int           
generated_id        int           
category_value      varchar (255) - value for the category
  FOREIGN KEY `fk_cat`(category_id) REFERENCES category.id
  FOREIGN KEY `fk_gen`(generated_id) REFERENCES generated.id

Links
Timestamps: http://dev.mysql.com/doc/refman/5.1/en/timestamp.html
Create table syntax: http://dev.mysql.com/doc/refman/5.1/en/create-table.html
Enums: http://dev.mysql.com/doc/refman/5.1/en/enum.html

初见你 2024-11-14 10:02:17

我认为这个解决方案非常适合您想做的事情。类别列表现在很灵活,因此您可以添加新类别或淘汰旧类别(我建议您在同意删除类别之前仔细考虑一下 - 您是否会孤立记录或删除它们,等等)

基本上,我'我说你的目标是正确的。结构很简单,但它会很适合您。干得好(干得好,在问题中提供了正确数量的信息)。

I think this solution is perfect for what you want to do. The Categories list is now flexible so that you can add new categories or retire old ones (I would recommend thinking long and hard about it before agreeing to delete a category - would you orphan record or remove them too, etc.)

Basically, I'm saying you are right on target. The structure is simple but it will work well for you. Great job (and great job giving exactly the right amount of information in the question).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文