我应该使用什么数据库表布局来快速检索日期范围内的聚合/不同数据?

发布于 2024-12-08 14:54:29 字数 550 浏览 2 评论 0原文

我正在编写一个网络应用程序来分析我的网络服务器日志。

我计划有一个每天运行的 SQL 作业,以非规范化 SQL 数据库中的 Web 服务器日志,以便 Web 应用程序不会读取原始 Web 服务器日志。

我希望 Web 应用程序用户输入一个日期范围,然后让 Web 应用程序返回:

  • 一个表,其中一列包含每个浏览器,下一列包含该日期范围的唯一客户端 IP 数量
  • 一个表包含每个操作系统一列和下一列中该日期范围的唯一客户端 IP 数量
  • 一个表,其中一列中包含每个浏览器 + 操作系统,下一列中该日期范围内的唯一客户端 IP 数量

(您可以在 Google 中看到这个想法分析。)

我们有大约 100,000每月唯一的客户端 IP,并且我希望将非规范化数据保留一年(尽管其中许多客户端 IP 每月都是相同的)。

  1. 放置非规范化信息的表格布局是什么?
  2. Web 应用程序要有效检索所需信息的 SQL 查询是什么?

(我不是问如何让 SQL 作业写入这些表;我可以弄清楚这一点。)

I am writing a web app to analyze my web server log.

I plan to have an SQL job, run daily, to denormalize my web server log in a SQL database, so that the web app does not read the raw web server log.

I would like for the web app user to input a date range, then have the web app return:

  • a table containing each browser in one column and in the next column the number of unique client IPs for that date range
  • a table containing each OS in one column and in the next column the number of unique client IPs for that date range
  • a table containing each browser + OS in one column and in the next column the number of unique client IPs for that date range

(You can see this idea in Google Analytics.)

We have about 100,000 unique client IPs per month, and I wish to hold denormalized data for a year (though many of these client IPs will be the same month-to-month).

  1. What would be a table layout in which to put the denormalized information?
  2. What would be SQL queries for web app to efficiently retrieve the desired information?

(I am not asking how to have the SQL job write to these tables; this I can figure out.)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

泪冰清 2024-12-15 14:54:29

我会总结 SQL 作业的访问次数,并将每日结果放入表中,如 [logsum]:

Table [logsum]:
sum_id (int / auto_increment)
sum_day (date)
sum_name (string)
sum_count (number)

并将非规范化日志数据放入 [logaccess]:

Table [logaccess]:
access_id (int / auto_increment)
access_day (date)
access_ip (string)
access_browser (string)
access_os (string)
access_click_count (int)

SQL-job:

1)将所有日志条目添加到 [Loginfo] 并汇总每个 IP 和每天的点击次数

for each line in log
{
  info = parse(line)
  execute_sql('REPLACE logaccess
               SET 
                 access_day=date(),
                 access_ip='& info[IP] &',
                 access_browser'& info[browser] &',
                 access_os='& info[OS] &',
                 access_click_count=IF(ISNULL(access_click_count),0,access_click_count) + 1) 
               WHERE access_day=date() AND access_ip='& info[IP] &';')
}

2) 汇总并保存到 [logsum]:

//- get OS count per day
res = execute_sql('SELECT access_day, access_os, count(access_id) AS C FROM logaccess GROUP BY access_day, access_os;');

//- write to [logsum]
for each record in res
{
   REPLACE logsum SET 
     sum_day=record['access_day'],
     sum_name=record['access_os'],
     sum_count=record['c']
   WHERE sum_day=record['access_day'] AND sum_name=record['access_os'];
}


//- get browser count per Day
res = execute_sql('SELECT access_day, access_browser, count(access_id) AS C FROM logaccess GROUP BY access_day, access_browser;');

//- write to [logsum]
for each record in res
{
   REPLACE logsum SET 
     sum_day=record['access_day'],
     sum_name=record['access_browser'],
     sum_count=record['c']
   WHERE sum_day=record['access_day'] AND sum_name=record['access_browser'];
}

//- get IP count per Day
res = execute_sql('SELECT access_day, count(access_id) AS C FROM logaccess GROUP BY access_day;')

//- write to [logsum]
for each record in res
{
   REPLACE logsum SET 
     sum_day=record['access_day'],
     sum_name='ip',
     sum_count=record['c']
   WHERE sum_day=record['access_day'] AND sum_name='ip';
}

//- get click count per Day
res = execute_sql('SELECT access_day, sum(access_click_count) AS C FROM logaccess GROUP BY access_day;')

//- write to [logsum]
for each record in res
{
   REPLACE logsum SET 
     sum_day=record['access_day'],
     sum_name='clicks',
     sum_count=record['c']
   WHERE sum_day=record['access_day'] AND sum_name='clicks';
}

3) 清理/删除超过 1 年的条目:

DELETE FROM logaccess WHERE access_day<DATE_ADD(date(),INTERVAL 1 year)

也许这对您有帮助
问候托马斯

I would sum up the number of access by the SQL-job and put the daily result into table like [logsum]:

Table [logsum]:
sum_id (int / auto_increment)
sum_day (date)
sum_name (string)
sum_count (number)

and the denormalized-log-data to [logaccess]:

Table [logaccess]:
access_id (int / auto_increment)
access_day (date)
access_ip (string)
access_browser (string)
access_os (string)
access_click_count (int)

SQL-job:

1) Add all log-entries to [Loginfo] and sum-up clicks per IP and day

for each line in log
{
  info = parse(line)
  execute_sql('REPLACE logaccess
               SET 
                 access_day=date(),
                 access_ip='& info[IP] &',
                 access_browser'& info[browser] &',
                 access_os='& info[OS] &',
                 access_click_count=IF(ISNULL(access_click_count),0,access_click_count) + 1) 
               WHERE access_day=date() AND access_ip='& info[IP] &';')
}

2) Sum up and save to [logsum]:

//- get OS count per day
res = execute_sql('SELECT access_day, access_os, count(access_id) AS C FROM logaccess GROUP BY access_day, access_os;');

//- write to [logsum]
for each record in res
{
   REPLACE logsum SET 
     sum_day=record['access_day'],
     sum_name=record['access_os'],
     sum_count=record['c']
   WHERE sum_day=record['access_day'] AND sum_name=record['access_os'];
}


//- get browser count per Day
res = execute_sql('SELECT access_day, access_browser, count(access_id) AS C FROM logaccess GROUP BY access_day, access_browser;');

//- write to [logsum]
for each record in res
{
   REPLACE logsum SET 
     sum_day=record['access_day'],
     sum_name=record['access_browser'],
     sum_count=record['c']
   WHERE sum_day=record['access_day'] AND sum_name=record['access_browser'];
}

//- get IP count per Day
res = execute_sql('SELECT access_day, count(access_id) AS C FROM logaccess GROUP BY access_day;')

//- write to [logsum]
for each record in res
{
   REPLACE logsum SET 
     sum_day=record['access_day'],
     sum_name='ip',
     sum_count=record['c']
   WHERE sum_day=record['access_day'] AND sum_name='ip';
}

//- get click count per Day
res = execute_sql('SELECT access_day, sum(access_click_count) AS C FROM logaccess GROUP BY access_day;')

//- write to [logsum]
for each record in res
{
   REPLACE logsum SET 
     sum_day=record['access_day'],
     sum_name='clicks',
     sum_count=record['c']
   WHERE sum_day=record['access_day'] AND sum_name='clicks';
}

3) clean up / delete entrys older then 1 year:

DELETE FROM logaccess WHERE access_day<DATE_ADD(date(),INTERVAL 1 year)

maybe this helps you
Regards Thomas

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文