Sorting and counting by date in a join query

Posted on 2024-11-09 05:45:16

I am working on a project that tracks users on a website by logging every hit they make across the site. Whenever a user hits a URL, I create a record for it in the database and tag it with some tags.

Every URL is called a 'resource' in my database, and a resource can be tagged with multiple tags. A visitor is connected to a resource when they visit its URL, and when a user hits a resource, I also store the date on that connection.

What I want to do is find the resources with the correct tags that have been viewed this month or today, for example.

The query I am currently building is:

SELECT r.resource_id, r.resource_url
FROM resource r
JOIN visitor_resource vt ON vt.resource_id = r.resource_id
JOIN resource_tags rt ON rt.resource_id = vt.resource_id
JOIN tags t ON t.tag_id = rt.tag_id AND t.tag_name =  '42'
GROUP BY r.resource_id

To give you an idea of the structure, see the diagram here:
tracking database structure http://kaspergrubbe.dk/db-overview.png

So basically I have to count how many visitor_resources rows there are in a given month, by looking at visitor_resources.last_visited for the last month, and get the 5 most visited resources.

How should I approach this?

The above query also seems very slow without query caching. I suspect this is because t.tag_name is a varchar and is not indexed. Is there any way to speed it up other than adding this index?

Thanks.

Comments (2)

水溶 2024-11-16 05:45:16

You've left out any criteria based on the date, so you should add that and see how the performance changes. Also, if you're looking for a count, you should add that as well. I think MySQL supports the LIMIT clause (as opposed to TOP), so add that to limit the result to the 5 most visited resources. Put together, it will probably look something like this:

SELECT
    R.resource_id,
    R.resource_url,
    COUNT(*) AS visit_count  -- matching Visitor_Resources rows per resource
FROM
    Visitor_Resources VR
INNER JOIN Resources R ON R.resource_id = VR.resource_id
INNER JOIN Resource_Tags RT ON RT.resource_id = R.resource_id
INNER JOIN Tags T ON
    T.tag_id = RT.tag_id AND
    T.tag_name = '42'
WHERE
    VR.last_visited BETWEEN <start of month> AND <end of month>
GROUP BY
    R.resource_id,
    R.resource_url
ORDER BY
    visit_count DESC
LIMIT 5

Sorry, I don't use MySQL much these days, so I don't know what the start and end date parameters in the WHERE clause would look like.
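One way the month boundaries might be written in MySQL is sketched below. It assumes last_visited is a DATE, DATETIME or TIMESTAMP column and reuses the table names from the query above, which may need adjusting to the real schema; the visit_count alias is made up for illustration. A half-open range (>= first day of the month, < first day of the next month) is used instead of BETWEEN, because BETWEEN is inclusive on both ends and an upper bound such as '2024-11-30' would drop visits made later on that last day.

```sql
-- Top 5 resources tagged '42', ranked by the number of matching
-- Visitor_Resources rows in the current calendar month.
SELECT
    R.resource_id,
    R.resource_url,
    COUNT(*) AS visit_count
FROM
    Visitor_Resources VR
INNER JOIN Resources R ON R.resource_id = VR.resource_id
INNER JOIN Resource_Tags RT ON RT.resource_id = R.resource_id
INNER JOIN Tags T ON
    T.tag_id = RT.tag_id AND
    T.tag_name = '42'
WHERE
    -- First day of the current month (inclusive)
    VR.last_visited >= DATE_FORMAT(CURDATE(), '%Y-%m-01') AND
    -- First day of the next month (exclusive)
    VR.last_visited < DATE_ADD(DATE_FORMAT(CURDATE(), '%Y-%m-01'), INTERVAL 1 MONTH)
GROUP BY
    R.resource_id,
    R.resource_url
ORDER BY
    visit_count DESC
LIMIT 5;
```

For the previous month, the same idea should work by shifting both bounds back with DATE_SUB(..., INTERVAL 1 MONTH); for today, the bounds would be CURDATE() and CURDATE() + INTERVAL 1 DAY.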

Unless your Tags table is very large, an index on it probably won't matter much. An index on Visitor_Resources.last_visited might be a good idea, though.
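A minimal sketch of adding that index and checking whether MySQL picks it up follows. The index name is invented for illustration, the table and column names follow the query above, and the date literals are placeholder values; whether the optimizer actually uses the index depends on the data.

```sql
-- Index the column used by the date-range filter.
CREATE INDEX idx_vr_last_visited ON Visitor_Resources (last_visited);

-- EXPLAIN prints the execution plan; its "key" column shows
-- which index, if any, is used for each table.
EXPLAIN
SELECT
    R.resource_id,
    R.resource_url,
    COUNT(*) AS visit_count
FROM
    Visitor_Resources VR
INNER JOIN Resources R ON R.resource_id = VR.resource_id
INNER JOIN Resource_Tags RT ON RT.resource_id = R.resource_id
INNER JOIN Tags T ON
    T.tag_id = RT.tag_id AND
    T.tag_name = '42'
WHERE
    VR.last_visited >= '2024-11-01' AND
    VR.last_visited < '2024-12-01'
GROUP BY
    R.resource_id,
    R.resource_url
ORDER BY
    visit_count DESC
LIMIT 5;
```

Comparing the EXPLAIN output before and after creating the index is a cheap way to see whether it changes the plan at all.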

Also, I changed the table names in the query to be more consistent. Personally I like plural names, but singular is OK too. Whichever you choose, pick one and stick to it.

无风消散 2024-11-16 05:45:16

Kasper,

First off, you need to adjust your JOINs. There is no need to write JOIN ... ON on every line:

e.g.
        LEFT JOIN(
        groups, sign
        )ON(
        user.user_id = groups.userID AND
        groups.group_id = sign.groupID
        )

Try something like that for the joins.
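As a rough illustration, the joins from the question could be written in that parenthesized form as sketched below. It uses the table and column names from the question's query and assumes nothing else about the schema. With inner joins this should return the same rows as the original chain of JOIN ... ON clauses, so it is a difference in notation and is not expected to change performance by itself.

```sql
-- The question's query with all joined tables listed in one
-- parenthesized group and all join conditions in a single ON.
SELECT
    r.resource_id,
    r.resource_url
FROM resource r
JOIN (visitor_resource vt, resource_tags rt, tags t)
    ON (vt.resource_id = r.resource_id
        AND rt.resource_id = r.resource_id
        AND t.tag_id = rt.tag_id
        AND t.tag_name = '42')
GROUP BY
    r.resource_id,
    r.resource_url;
```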

Then add an ORDER BY clause to the query.

ORDER BY last_visited DESC LIMIT 5;

This sorts the rows by last_visited and returns the 5 most recent entries.
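Applied to the query from the question, that ORDER BY could look roughly like the sketch below. Because the query groups by resource, the sort uses MAX(vt.last_visited), the latest visit per resource. The last_seen alias is made up, and last_visited is assumed to be a column of visitor_resource, as the question describes. This ranks resources by recency, not by number of visits.

```sql
-- The 5 resources tagged '42' that were visited most recently.
SELECT
    r.resource_id,
    r.resource_url,
    MAX(vt.last_visited) AS last_seen
FROM resource r
JOIN visitor_resource vt ON vt.resource_id = r.resource_id
JOIN resource_tags rt ON rt.resource_id = r.resource_id
JOIN tags t ON t.tag_id = rt.tag_id
WHERE t.tag_name = '42'
GROUP BY
    r.resource_id,
    r.resource_url
ORDER BY
    last_seen DESC
LIMIT 5;
```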

Hope this helps.
