对数据库结构的意见和建议

发布于 2024-11-07 10:02:17 字数 2092 浏览 5 评论 0原文

我正在构建这个用于对数据进行分类的工具。基本上，我将定期接收平面文件中的数据行，如下所示：

a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e

我有一个类别列表来将这些行分解为，例如：

Original   Cat1  Cat2  Cat3  Cat4  Cat5
---------------------------------------
a:b:c:d:e  a     b     c     d     e

从此刻开始，类别名称也是已知的作为分解数据的类别数。但这可能会随着时间的推移而改变（例如，添加/删除的类别...类别总数发生变化）。

好吧，所以我并不是真的在寻求有关如何解析行或将数据放入数据库或其他任何内容的帮助...我知道如何做到这一切，并且已经编写了大部分核心脚本来处理解析值行并分为不同数量的类别。

我主要是在寻找有关如何构建数据库来存储这些内容的建议。所以我一直在思考这个问题，这就是我想到的：

Table: Generated
generated_id        int           - unique id for each row generated
generated_timestamp datetime      - timestamp of when row was generated
last_updated        datetime      - timestamp of when row last updated
generated_method    varchar(6)    - method in which row was generated (manual or auto)
original_string     varchar (255) - the original string

Table: Categories
category_id         int           - unique id for category
category_name       varchar(20)   - name of category

Table: Category_Values
category_map_id     int           - unique id for each value (not sure if I actually need this)
category_id         int           - id value to link to table Categories
generated_id        int           - id value to link to table Generated
category_value      varchar (255) - value for the category

基本上这个想法是当我解析一行时，我将在表 Generate 中插入一个新条目，以及 X 个条目在表 Category_Values 中，其中 X 是当前类别的数量。类别名称存储在另一个表Categories中。

我的脚本将立即执行的操作是处理原始值行并将生成的类别值输出到要发送到某处的新文件。但是然后我有这个数据库来存储生成的数据，以便我可以制作另一个脚本，在其中我可以搜索并列出以前生成的值，或者用新值或其他内容更新以前生成的条目。

这看起来是一个不错的数据库结构吗？有什么明显的我遗漏或可能欺骗自己的事情吗？例如，使用这种结构...嗯...我不是 sql 专家，但我认为我应该能够这样做

select * from Generated where original_string = '$string'
// id is put into $id

，然后

select * from Category_Values where generated_id = '$id'

...然后我将使用我的数据来获取搜索结果或表单来更改数据...好吧，我相当确定我什至可以将其合并到一个带有连接或其他内容的查询中，但我对 sql 不太擅长，所以我不知道如何实际做到这一点..但是重点是，我知道我可以从这个数据库做我需要的事情结构..但是我是否让这变得比需要的更困难？犯一些明显的菜鸟错误？

原文

I'm building this tool for classifying data. Basically I will be regularly receiving rows of data in a flat-file that look like this:

a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e

And I have a list of categories to break these rows up into, for example:

Original   Cat1  Cat2  Cat3  Cat4  Cat5
---------------------------------------
a:b:c:d:e  a     b     c     d     e

As of right this second, there category names are known, as well as number of categories to break the data down by. But this might change over time (for instance, categories added/removed...total number of categories changed).

Okay so I'm not really looking for help on how to parse the rows or get data into a db or anything...I know how to do all that, and have the core script mostly written already, to handle parsing rows of values and separating into variable amount of categories.

Mostly I'm looking for advice on how to structure my database to store this stuff. So I've been thinking about it, and this is what I came up with:

Table: Generated
generated_id        int           - unique id for each row generated
generated_timestamp datetime      - timestamp of when row was generated
last_updated        datetime      - timestamp of when row last updated
generated_method    varchar(6)    - method in which row was generated (manual or auto)
original_string     varchar (255) - the original string

Table: Categories
category_id         int           - unique id for category
category_name       varchar(20)   - name of category

Table: Category_Values
category_map_id     int           - unique id for each value (not sure if I actually need this)
category_id         int           - id value to link to table Categories
generated_id        int           - id value to link to table Generated
category_value      varchar (255) - value for the category

Basically the idea is when I parse a row, I will insert a new entry into table Generated, as well as X entries in table Category_Values, where X is however many categories there currently are. And the category names are stored in another table Categories.

What my script will immediately do is process rows of raw values and output the generated category values to a new file to be sent somewhere. But then I have this db I'm making to store the data generated so that I can make another script, where I can search for and list previously generated values, or update previously generated entries with new values or whatever.

Does this look like an okay database structure? Anything obvious I'm missing or potentially gimping myself on? For example, with this structure...well...I'm not a sql expert, but I think I should be able to do like

select * from Generated where original_string = '$string'
// id is put into $id

and then

select * from Category_Values where generated_id = '$id'

...and then I'll have my data to work with for search results or form to alter data...well I'm fairly certain I can even combine this into one query with a join or something but I'm not that great with sql so I don't know how to actually do that..but point is, I know I can do what I need from this db structure..but am I making this harder than it needs to be? Making some obvious noob mistake?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

迷爱 2024-11-14 10:02:17

我的建议：

Table: Generated
id                  unsigned int autoincrement primary key
generated_timestamp timestamp
last_updated        timestamp default '0000-00-00' ON UPDATE CURRENT_TIMESTAMP
generated_method    ENUM('manual','auto')
original_string     varchar (255)

Table: Categories
id                  unsigned int autoincrement primary key
category_name       varchar(20)   

Table: Category_Values
id                  unsigned int autoincrement primary key
category_id         int           
generated_id        int           
category_value      varchar (255) - value for the category
  FOREIGN KEY `fk_cat`(category_id) REFERENCES category.id
  FOREIGN KEY `fk_gen`(generated_id) REFERENCES generated.id

链接
时间戳： http://dev.mysql.com/doc/refman/5.1 /en/timestamp.html
创建表语法： http://dev.mysql.com/doc /refman/5.1/en/create-table.html
枚举： http://dev.mysql.com/doc/refman/5.1 /en/enum.html

My suggestion:

Table: Generated
id                  unsigned int autoincrement primary key
generated_timestamp timestamp
last_updated        timestamp default '0000-00-00' ON UPDATE CURRENT_TIMESTAMP
generated_method    ENUM('manual','auto')
original_string     varchar (255)

Table: Categories
id                  unsigned int autoincrement primary key
category_name       varchar(20)   

Table: Category_Values
id                  unsigned int autoincrement primary key
category_id         int           
generated_id        int           
category_value      varchar (255) - value for the category
  FOREIGN KEY `fk_cat`(category_id) REFERENCES category.id
  FOREIGN KEY `fk_gen`(generated_id) REFERENCES generated.id

Links
Timestamps: http://dev.mysql.com/doc/refman/5.1/en/timestamp.html
Create table syntax: http://dev.mysql.com/doc/refman/5.1/en/create-table.html
Enums: http://dev.mysql.com/doc/refman/5.1/en/enum.html

回复收藏 0 原文