Is there an API to determine the most common links in a large number of tweets?
Is there an API (the Twitter API does not provide this) that I can use to determine the most common links in, for example, 200 tweets? What I want to do is get the latest 200 tweets and then determine what people are talking about. I am sure the tweets will contain links (because I will ask the Twitter API to return only tweets that contain links), but I also want my code to understand that two URLs are the same even if they have different bit.ly links.
What I am trying to do (this might make it easier for you to help) is determine the most important subject people are talking about in these 200 tweets. I understand that people might be talking about the same story but providing different links; however, I am not sure whether there is an easy way to detect that.
Links to examples, APIs, sample code, and any other ideas will be helpful :)
If you need more information, please tell me and I will edit the question to include it.
Answers (3)
Not that I know of, but you can accomplish this as follows:
1. Find all of the links in the list of tweets using a regex pattern.
2. Use the Twitter search API to search for each link. The response includes the number of results.
3. Manually sort the links by the number of results returned.
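The extraction and sorting steps can be sketched in Python. This is only a rough illustration: it counts how often each link appears inside the tweets you already have, rather than asking the Twitter search API for a result count, and the regex, `extract_links`, `rank_links`, and the sample tweets are all made up for the example.

```python
import re
from collections import Counter

# Assumption: a deliberately simple pattern; real tweets may need a stricter one.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_links(tweets):
    """Return every http(s) link found in a list of tweet texts."""
    links = []
    for text in tweets:
        for url in URL_PATTERN.findall(text):
            # Drop punctuation that commonly trails a link in a sentence.
            links.append(url.rstrip(".,;:!?)"))
    return links


def rank_links(tweets):
    """Return (link, count) pairs, most frequent first."""
    return Counter(extract_links(tweets)).most_common()


# Hypothetical sample data standing in for the 200 fetched tweets.
sample_tweets = [
    "Great read: http://bit.ly/abc123",
    "Everyone should see this http://bit.ly/abc123.",
    "Another story http://example.com/story",
]

for link, count in rank_links(sample_tweets):
    print(count, link)
```

To follow the answer more literally, you would replace the local count with the result count returned by a search for each link.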
Fundamentally, you can get this from the API. First get the latest public timeline (this will be 100 tweets; if you need 200, you need to request a cursor and create a loop that checks whether the next_cursor value is greater than 0), and then build a spider that determines relevancy.
http://api.twitter.com/1/statuses/public_timeline.???
where ??? is json, xml, rss or atom
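The cursor loop might look roughly like the sketch below. The page fetcher is injected as a function so the loop can be shown without a network call. The names `collect_tweets` and `fetch_page` and the page layout (`"tweets"`, `"next_cursor"`) are assumptions for illustration, not the actual response format of the endpoint above.

```python
def collect_tweets(fetch_page, wanted=200):
    """Keep requesting pages until enough tweets are collected.

    fetch_page(cursor) is a hypothetical callable returning a dict with a
    "tweets" list and a "next_cursor" number; None requests the first page.
    """
    tweets = []
    cursor = None
    while len(tweets) < wanted:
        page = fetch_page(cursor)
        tweets.extend(page["tweets"])
        cursor = page.get("next_cursor", 0)
        # Stop when there is no further page (next_cursor not greater than 0).
        if not cursor or cursor <= 0:
            break
    return tweets[:wanted]


def fake_fetch_page(cursor):
    """Stand-in for a real HTTP request: serves 300 fake tweets, 100 per page."""
    start = cursor or 0
    batch = [f"tweet {i}" for i in range(start, start + 100)]
    next_cursor = start + 100 if start + 100 < 300 else 0
    return {"tweets": batch, "next_cursor": next_cursor}


latest = collect_tweets(fake_fetch_page, wanted=200)
print(len(latest))
```

In real use, `fetch_page` would perform the HTTP request and parse the json response.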
If you want to determine the popularity of words, dump all the text into a string, split it on spaces, punctuation, etc., discard non-nouns, sort it, and create a dictionary variable with the words and the count of each word.
If you want to determine the popularity of links, it is the same process, but with an extra step: do a web request on each link to determine the ultimate link destination.
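A minimal sketch of the word count, with one substitution: discarding non-nouns properly needs a part-of-speech tagger, so this example filters with a small hand-written stop-word list and drops very short words. `STOP_WORDS`, `word_counts`, and the sample tweets are hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny stop-word list used in place of real noun detection.
STOP_WORDS = {"the", "and", "are", "for", "this", "that", "with", "you", "soon"}


def word_counts(tweets):
    """Count words across all tweets, ignoring links and stop words."""
    text = " ".join(tweets).lower()
    text = re.sub(r"https?://\S+", " ", text)  # links are counted separately
    words = re.findall(r"[a-z0-9#@']+", text)  # split on spaces and punctuation
    return Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)


sample_tweets = [
    "The election results are in",
    "Election night: results soon!",
    "Watching the election http://bit.ly/x",
]

for word, count in word_counts(sample_tweets).most_common(2):
    print(word, count)
```

`Counter` already is the dictionary of words and counts, and `most_common` does the sorting.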
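The extra step for links could be sketched like this. `resolve_url` relies on `urllib.request.urlopen` following redirects and reports the final address through `geturl()`. `count_destinations` and the fake redirect table are illustrative assumptions; the demo uses a fake resolver so it runs without network access.

```python
from collections import Counter
from urllib.request import urlopen


def resolve_url(url, timeout=10):
    """Follow redirects and return the final URL (performs a real request)."""
    with urlopen(url, timeout=timeout) as response:
        return response.geturl()


def count_destinations(links, resolve=resolve_url):
    """Count links by final destination so different short links merge."""
    counts = Counter()
    cache = {}  # avoid requesting the same short link twice
    for link in links:
        if link not in cache:
            try:
                cache[link] = resolve(link)
            except OSError:
                # Network failure: fall back to the link as written.
                cache[link] = link
        counts[cache[link]] += 1
    return counts


# Hypothetical redirect table standing in for real bit.ly lookups.
fake_redirects = {
    "http://bit.ly/a": "http://example.com/story",
    "http://bit.ly/b": "http://example.com/story",
}
links = ["http://bit.ly/a", "http://bit.ly/b", "http://example.com/other"]
counts = count_destinations(links, resolve=lambda u: fake_redirects.get(u, u))
print(counts.most_common())
```

Calling `count_destinations(links)` without the `resolve` argument would issue one real request per distinct link, so some rate limiting may be needed for larger batches.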
Building on what others have said, you can use Twitter search to get the tweets without a problem, so I won't go into that part in this answer.
A possible route for the short links:
You could, for example, go to bit.ly and create a custom short link for the URL you want to keep track of. If you add a + to the end of that link, you will get link stats. Example: http://bit.ly/tweelay+ Additionally, bit.ly keeps track of other short links that point to the same URL, which you could then use in your searches.
Using the bit.ly /stats API, you can get a list of the shortened URLs.
Depending on the URLs you are trying to keep track of, you may have access to referral logs (e.g. for your own website). Using your referral log, you may also be able to find additional short URLs that you can use to search.