Retrieve old searches from Google Web History
I want to retrieve old Google searches that I made months or years ago and that are still present in Google Web History. How can I programmatically retrieve all of them?
https://www.google.com/history/?output=rss only provides recent Google searches, but not all of them.
Also, the question "How can I retrieve my Google search history?" doesn't provide any answer to my question!
Comments (5)
You can pass month, day and year as parameters to obtain history of a specific day.
E.g. https://www.google.com/history/lookup?month=12&day=1&yr=2010&output=rss for December 1, 2010.
There is no way to obtain the history for a full month or year, let alone the entire history. But knowing these parameters should at least let you collect the entire history in a loop that steps one day further back on each iteration. Be careful not to leech too much in too short a time.
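The day-by-day loop described above could be sketched roughly as follows. This is only an illustration: the endpoint and parameter names are taken from the example URL, while `daily_history_urls`, `crawl`, the `fetch` callable and the two-second delay are hypothetical names and values chosen here. Logging in is not shown; the sketch assumes you pass in a `fetch` function that already carries your authenticated session.

```python
import time
from datetime import date, timedelta
from urllib.parse import urlencode

# Endpoint taken from the example URL in this answer.
BASE_URL = "https://www.google.com/history/lookup"


def daily_history_urls(newest, oldest):
    """Yield (day, url) pairs, walking backwards one day at a time."""
    day = newest
    while day >= oldest:
        params = {"month": day.month, "day": day.day, "yr": day.year, "output": "rss"}
        yield day, BASE_URL + "?" + urlencode(params)
        day -= timedelta(days=1)


def crawl(newest, oldest, fetch, delay_seconds=2.0):
    """Fetch every day in the range, pausing between requests.

    `fetch` is a hypothetical callable (url -> str) supplied by the caller;
    it is assumed to handle authentication. The delay is an arbitrary
    value meant to avoid hammering the server.
    """
    results = {}
    for day, url in daily_history_urls(newest, oldest):
        results[day] = fetch(url)
        time.sleep(delay_seconds)
    return results
```

For example, `list(daily_history_urls(date(2010, 12, 1), date(2010, 11, 29)))` produces three URLs, starting with the one for December 1, 2010 shown above.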
You really need to parse the HTML page by page and then extract your data, because I don't think there is any alternative!
I think this will be very difficult.
I know this doesn't answer your question completely, but at least the web pages themselves may be preserved. There are organizations and tools that let you recreate web pages as they were on past dates; see for example http://www.mementoweb.org/.
UPDATE: I have just learnt that Memento has won a digital preservation award (http://www.dpconline.org/newsroom)
I know you're not looking to go back through every page, but you don't really need to parse the whole page; just look for the HTML that always precedes an entry. I just started up Google Web History and ran some simple searches: if you look through a history page, each string you've searched for is preceded by:
<td style="padding:3px 0"><table id=bkmk_view_ class=noborder ><tr><td><table class="elem noborder"><tr><td class="grey" nowrap>Searched for </td><td nowrap><a title="http://www.google.com/search?q=
and is followed by
&
(ampersand). This preceding HTML sequence is unique on the page and occurs only where historical search terms are listed. If you search for two terms, they are joined by a +. Different search modes follow other conventions; I didn't go through them all.
It looks like if you use BalusC's method to pass the date parameters, you could retrieve the HTML, search the document for the string I mentioned (be sure to escape \" and other special characters), and then copy everything that follows until you reach a & character. That way you only need to parse out the search terms, not the whole page. Scan the source until you reach the end, then move on to the next iteration of the loop.
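A minimal sketch of that marker-based scan, assuming the page markup still matches the snippet quoted above. `MARKER` is the tail end of that snippet, and `extract_search_terms` is a hypothetical helper name. Decoding `+` into spaces with `unquote_plus` is an addition of mine and is not part of the description above.

```python
from urllib.parse import unquote_plus

# Tail of the HTML that precedes every search entry, per the snippet above.
MARKER = 'Searched for </td><td nowrap><a title="http://www.google.com/search?q='


def extract_search_terms(html):
    """Return the search terms found after each MARKER, up to the next '&'."""
    terms = []
    pos = 0
    while True:
        start = html.find(MARKER, pos)
        if start == -1:
            break  # no more entries on this page
        start += len(MARKER)
        end = html.find("&", start)
        if end == -1:
            break  # malformed or truncated entry; stop scanning
        # Turn "web+history" or "web%20history" into "web history".
        terms.append(unquote_plus(html[start:end]))
        pos = end
    return terms
```

Run this on the HTML of each day's page and collect the returned lists; a page without any entries simply yields an empty list.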