SQL code for Twitter-style trending topics

Posted on 2024-07-26 15:20:48

I want to count the matching words across the rows of two tables, each with millions of rows. For example:

Sample of the posts table:

+----+---------+-----------------------------+
| ID | ID_user | text                        | 
+----+---------+-----------------------------+
| 1  | bruno   | michael jackson is dead     |
| 2  | thomasi | michael j. moonwalk is dead |
| 3  | userts  | michael jackson lives       |
+----+---------+-----------------------------+

I want to query the most repeated words in the table, limited to the top 10. The result might look like this:

+-------+------------+
| count | word       |
+-------+------------+
| 3     | michael    |
| 2     | dead       |
| 2     | jackson    |
| 1     | j.         |
| 1     | lives      |
| 1     | moonwalk   |
+-------+------------+

However, I only want words that repeat more than 10 times; in this sample no word would qualify. If the threshold were 2 instead, only 'michael' and 'dead' should be displayed. 'is' should be ignored because I don't want words as short as 2 characters. Words that form a phrase should be shown together, so the result I need is:

+-------+-----------------+
| count | word            |
+-------+-----------------+
| 2     | michael jackson |
| 2     | dead            |
+-------+-----------------+

I need MySQL code that replicates Twitter's "trending topics" for the posts on my site.

Comments (2)

赏烟花じ飞满天 2024-08-02 15:20:48

What you're looking for is term extraction, which isn't provided natively within MySQL.

Some other platforms provide that function, but it's considered an enterprise feature, so you'll have to pay through the nose for it.

Alternatively, you can use something like Yahoo!'s Term Extraction API.

Here is a blog post that talks about using Yahoo!'s service from PHP5.

心的憧憬 2024-08-02 15:20:48

Break the sentence up on insert, filter the words against a blacklist, and store the distinct words with a count (or probably with references). Count using count() :)

This would generate a lot of data though, and I don't know what the speed and storage implications are.
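
To make the idea concrete, here is a minimal sketch in Python rather than MySQL. It is only an illustration under assumptions: the blacklist contents, the minimum word length of 3, and the function names are all made up for the example, and an in-memory Counter stands in for a table of distinct words with counts. It does not try to merge adjacent words into phrases such as "michael jackson".

```python
import re
from collections import Counter

# Hypothetical blacklist; the real list of ignored words is up to the site.
BLACKLIST = {"the", "and", "for"}
# Assumption: words shorter than this are skipped (this drops "is" and "j").
MIN_LENGTH = 3


def extract_words(text):
    """Split a post into lowercase words, dropping short and blacklisted ones."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if len(w) >= MIN_LENGTH and w not in BLACKLIST]


def add_post(counts, text):
    """Update the running counts at insert time; a word counts once per post."""
    counts.update(set(extract_words(text)))


def trending(counts, min_count=2, limit=10):
    """Return up to `limit` (word, count) pairs seen in at least `min_count` posts."""
    frequent = [(w, c) for w, c in counts.items() if c >= min_count]
    # Sort by count descending, then alphabetically for a stable order.
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return frequent[:limit]


# The Counter stands in for a table of distinct words with their counts.
counts = Counter()
for post in [
    "michael jackson is dead",
    "michael j. moonwalk is dead",
    "michael jackson lives",
]:
    add_post(counts, post)

print(trending(counts))
# [('michael', 3), ('dead', 2), ('jackson', 2)]
```

With the threshold raised to 10, `trending(counts, min_count=10)` returns an empty list for this sample, which matches the behaviour described in the question. In a database, the same bookkeeping would presumably be a words table updated on each insert, which is where the storage concern above comes from.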
