SQL code to build "trending topics" like Twitter's
I want to count the matching words in the rows of two tables with millions of rows. For example:
Table posts, sample:
+----+---------+-----------------------------+
| ID | ID_user | text                        |
+----+---------+-----------------------------+
| 1  | bruno   | michael jackson is dead     |
| 2  | thomasi | michael j. moonwalk is dead |
| 3  | userts  | michael jackson lives       |
+----+---------+-----------------------------+
I want to query the most repeated words in the table, limited to the top 10. The result might look like this:
+-------+------------+
| count | word       |
+-------+------------+
| 3     | michael    |
| 2     | dead       |
| 2     | jackson    |
| 1     | j.         |
| 1     | lives      |
| 1     | moonwalk   |
+-------+------------+
But I only want words that repeat more than 10 times; in this case no word would appear. If the threshold for repeated words is 2, it should display only 'michael' and 'dead', ignoring 'is' because I don't want words of 2 characters or less. Words that form a phrase should be grouped together, so I need this to appear:
+-------+-----------------+
| count | word            |
+-------+-----------------+
| 2     | michael jackson |
| 2     | dead            |
+-------+-----------------+
I need MySQL code that replicates Twitter's "trending topics" for the posts on my site.
Comments (2)
What you're looking for is term extraction, which isn't provided natively within MySQL.
Some other platforms provide that function, but it's considered an enterprise feature, so you'll have to pay through the nose for it.
Alternatively, you can use something like Yahoo!'s Term Extraction API.
Here is a blog post that talks about using Yahoo!'s service from PHP5.
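Short of a real term-extraction service, a crude approximation is to count adjacent word pairs that recur across posts. The sketch below is not term extraction proper, just a naive bigram count in Python; the `STOPWORDS` set, the function names, and the threshold are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"is", "the", "a", "of"}


def bigrams(text):
    """Return the distinct adjacent word pairs in a post, skipping stopwords."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    pairs = set()
    for first, second in zip(words, words[1:]):
        if first not in STOPWORDS and second not in STOPWORDS:
            pairs.add(f"{first} {second}")
    return pairs


def trending_phrases(posts, min_count=2):
    """Count each phrase once per post and keep those seen in >= min_count posts."""
    counts = Counter()
    for text in posts:
        counts.update(bigrams(text))
    kept = [(count, phrase) for phrase, count in counts.items() if count >= min_count]
    # Highest count first, ties broken alphabetically.
    return sorted(kept, key=lambda row: (-row[0], row[1]))


posts = [
    "michael jackson is dead",
    "michael j. moonwalk is dead",
    "michael jackson lives",
]
print(trending_phrases(posts))
```

On the three sample posts only "michael jackson" reaches the threshold of 2. Merging phrase counts with single-word counts, as in the desired output, would need extra logic that is not shown here.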
Break the sentence up on insert, filter the words against a blacklist, and store the distinct words with a count (or probably with references to the posts). Then count using count() :)
This would generate a lot of data though, and I don't know what the speed and storage implications are.
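The approach described above could be sketched roughly as follows. This uses Python with the standard-library `sqlite3` module as a stand-in for MySQL (the `GROUP BY` / `HAVING` query is plain SQL and should carry over with little change). The `post_words` table, the `BLACKLIST` set, the `MIN_LENGTH` cutoff, and all function names are assumptions, not something from the original posts. It stores one row per post and word (the "references" variant) rather than a running counter.

```python
import re
import sqlite3

# Hypothetical blacklist of words to ignore; extend as needed.
BLACKLIST = {"is", "the", "a", "of"}
MIN_LENGTH = 3  # assumption: skip words of 2 characters or fewer


def tokenize(text):
    """Split a post into lowercase words, dropping blacklisted and short ones."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if len(w) >= MIN_LENGTH and w not in BLACKLIST]


def add_post(conn, user, text):
    """Insert a post plus one post_words row per distinct word it contains."""
    cur = conn.execute(
        "INSERT INTO posts (ID_user, text) VALUES (?, ?)", (user, text)
    )
    post_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO post_words (post_id, word) VALUES (?, ?)",
        [(post_id, w) for w in sorted(set(tokenize(text)))],
    )


def trending(conn, min_count=2, limit=10):
    """Return (count, word) rows for words found in at least min_count posts."""
    return conn.execute(
        """
        SELECT COUNT(*) AS cnt, word
        FROM post_words
        GROUP BY word
        HAVING COUNT(*) >= ?
        ORDER BY cnt DESC, word ASC
        LIMIT ?
        """,
        (min_count, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (ID INTEGER PRIMARY KEY, ID_user TEXT, text TEXT)")
conn.execute("CREATE TABLE post_words (post_id INTEGER, word TEXT)")
conn.execute("CREATE INDEX idx_post_words_word ON post_words (word)")

add_post(conn, "bruno", "michael jackson is dead")
add_post(conn, "thomasi", "michael j. moonwalk is dead")
add_post(conn, "userts", "michael jackson lives")

print(trending(conn))
```

With the three sample posts and a threshold of 2, this returns michael (3), dead (2) and jackson (2); with a threshold of 10 it returns nothing. It does not group words into phrases such as "michael jackson", and the index on `word` is only a guess at what would help with millions of rows.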