寻找数据库设计相关的方式
我正在开发一个日志分析系统,它读取tomcat的日志并通过网页中的图表/表格显示它们。 (我知道有一些现有的日志分析系统,我正在重新造轮子。但这是我的工作,我的老板想要它。)
我们的tomcat日志是每天保存的。例如:
2011-01-01.txt
2011-01-02.txt
......
以下是我将日志导出到db并读取的方式:
1 数据库结构
我有三个表: 1)log_current:保存今天生成的日志。
2)log_past:保存今天之前生成的日志。
上面两个表拥有相同的架构。
+-------+-----------+----------+----------+--------+-----+----------+----------+--------+---------------------+---------+----------+-------+
| Id | hostip | username | datasend | method | uri | queryStr | protocol | status | time | browser | platform | refer |
+-------+-----------+----------+----------+--------+-----+----------+----------+--------+---------------------+---------+----------+-------+
| 44359 | 127.0.0.1 | - | 0 | GET | / | | HTTP/1.1 | 404 | 2011-02-17 08:08:25 | Unknown | Unknown | - |
+-------+-----------+----------+----------+--------+-----+----------+----------+--------+---------------------+---------+----------+-------+
3)log_record:保存log_past的信息,记录日志导出到log_past表的天数。
+-----+------------+
| Id | savedDate |
+-----+------------+
| 127 | 2011-02-15 |
| 128 | 2011-02-14 |
..................
+-----+------------+
该表显示已导出2011-02-15 的日志。
2 导出(到数据库)
我有两个计划工作。
1)白天工作。
00:05:00,查看tomcat日志目录(/tomcat/logs),找到最近30天的所有日志文件(当然也包括昨天的日志。
查看log_record表查看是否导出了一天的日志,例如log_record中没有找到2011-02-16,所以我就读取2011-02- 16.txt,并将它们导出到log_past。
导出昨天的日志后,我启动对今天的日志(2011-02-17.txt)的文件监控
2)文件监控
一旦监视器启动,它将每小时读取该文件。它读取的每个日志都会保存在log_current表中。
3 tomcat服务器重启。
有时候我们必须重启tomcat,所以一旦tomcat启动,我会删除log_current的所有日志,然后进行日常工作。
4 我的问题
1)两个表(log_current和log_past)。
因为如果我将今天的日志保存到log_past,我无法确保所有日志文件(xxxx-xx-xx.txt)都导出到数据库。因为我会每天 00:05:00 进行检查,以确保必须导出今天之前的日志。
但这使得查询过去和现在的日志变得困难。
例如,查询从2011-02-14 00:00:00到2011-02-15 00:00:00,这些日志必须位于log_past。
但是从 2011-02-14 00:00:00 到 2011-02-17 08:00:00 怎么样?(假设是 2011-02-17 09:00:00 现在)。
跨表查询比较复杂。
另外,我一直认为我对表格和工作方式(导出/读取的安排工作)的设计并不完美,所以有人可以给出一个好的建议吗?
我只需要导出和读取日志,并且可以进行几乎实时的分析,其中实时意味着我必须通过图表/表格等使当天的日志可见。
I am working for a log analyzer system,which read the log of tomcat and display them by a chart/table in web page.
(I know there are some existed log analyzer system,I am recreating the wheel. But this is my job,my boss want it.)
Our tomcat log are saved by day. For example:
2011-01-01.txt
2011-01-02.txt
......
The following is my manner for export logs to db and read them:
1 The DB structure
I have three tables:
1)log_current:save the logs generated today.
2)log_past:save the logs generated before today.
The above two tables own the SAME schema.
+-------+-----------+----------+----------+--------+-----+----------+----------+--------+---------------------+---------+----------+-------+
| Id | hostip | username | datasend | method | uri | queryStr | protocol | status | time | browser | platform | refer |
+-------+-----------+----------+----------+--------+-----+----------+----------+--------+---------------------+---------+----------+-------+
| 44359 | 127.0.0.1 | - | 0 | GET | / | | HTTP/1.1 | 404 | 2011-02-17 08:08:25 | Unknown | Unknown | - |
+-------+-----------+----------+----------+--------+-----+----------+----------+--------+---------------------+---------+----------+-------+
3)log_record:save the information of log_past,it record the days whose logs have been exported to the log_past table.
+-----+------------+
| Id | savedDate |
+-----+------------+
| 127 | 2011-02-15 |
| 128 | 2011-02-14 |
..................
+-----+------------+
The table shows log of 2011-02-15 have been exported.
2 Export(to db)
I have two schedule work.
1) day work.
at 00:05:00,check the tomcat log directory(/tomcat/logs) to find all the latest 30 days log files(of course it include logs of yesterday.
check the log_record table to see if logs of one day is exported,for example,2011-02-16 is not find in the log_record,so I will read the 2011-02-16.txt,and export them to log_past.
After export log of yesterday,I start the file monitor for today's log(2011-02-17.txt) not matter it exist or not.
2)the file monitor
Once the monitor is started,it will read the file hour by hour. Each log it read will be saved in the log_current table.
3 tomcat server restart.
Sometimes we have to restart the tomcat,so once the tomcat is started,I will delete all logs of log_current,then do the day work.
4 My problem
1) two table (log_current and log_past).
Because if I save the today's log to log_past,I can not make sure all the log file(xxxx-xx-xx.txt) are exported to db. Since I will do a check in 00:05:00 every day which make sure that logs before today must be exported.
But this make it difficult to query logs accros yestersay and today.
For example,query from 2011-02-14 00:00:00 to 2011-02-15 00:00:00,these log must be at log_past.
But how about from 2011-02-14 00:00:00 to 2011-02-17 08:00:00 ?(suppose it is 2011-02-17 09:00:00 now).
It is complex to query across tables.
Also,I always think my desing for the table and work manner(schedule work of export/read) are not perfect,so anyone can give a good suggestion?
I just need to export and read log and can do a almost real-time analysis where real-time means I have to make logs of current day visiable by chart/table and etc.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
首先,在我看来,您不需要 2 个不同的表
log_current
和log_past
。您可以在同一个表中插入所有行,例如logs
并使用从日志中选择*,其中id =(从log_record中选择id,其中savedDate ='YOUR_DATE')
这将为您提供特定日期的所有日志。
现在,一旦您能够使用上述方式消除表之间当前和过去的区别,我认为您在这里提出的问题就会得到解决。 :)
First of all, IMO you don't need 2 different tables
log_current
andlog_past
. You can insert all the rows in the same table, saylogs
and retrieve usingselect * from logs where id = (select id from log_record where savedDate = 'YOUR_DATE')
This will give you all the logs of the particular day.
Now, once you are able to remove the current and past distinction between tables using above way, I think the problem you are asking here would be solved. :)