How to skip known entries when syncing with Google Reader?
For writing an offline client to the Google Reader service, I would like to know how best to sync with the service.
There doesn't seem to be official documentation yet and the best source I found so far is this: http://code.google.com/p/pyrfeed/wiki/GoogleReaderAPI
Now consider this: With the information from above I can download all unread items, I can specify how many items to download and using the atom-id I can detect duplicate entries that I already downloaded.
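The duplicate detection described above (using the atom-id of each entry) can be sketched like this; the entry dicts and their field names are assumptions about how a client might store downloaded items, not part of any Google Reader API:

```python
def new_entries(downloaded, seen_ids):
    """Filter out entries whose atom-id is already in the local store.

    `downloaded` is a list of dicts with an "id" key holding the atom-id;
    `seen_ids` is the set of ids already saved locally. The set is updated
    in place with the ids of the entries we keep.
    """
    fresh = [e for e in downloaded if e["id"] not in seen_ids]
    seen_ids.update(e["id"] for e in fresh)
    return fresh

# Hypothetical entries as they might come back from the reading-list feed.
batch = [
    {"id": "tag:google.com,2005:reader/item/0001", "title": "A"},
    {"id": "tag:google.com,2005:reader/item/0002", "title": "B"},
]
seen = {"tag:google.com,2005:reader/item/0001"}
print([e["title"] for e in new_entries(batch, seen)])  # ['B']
```

The drawback, as noted below, is that you still have to download every item before you can reject it.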
What's missing for me is a way to specify that I just want the updates since my last sync.
I can say: give me the 10 latest entries (parameters n=10 and r=d). If I specify r=o (date ascending) instead, I can also specify ot=[time of last sync], but only in that case, and ascending order makes no sense when I only want to read some items rather than all of them.
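The parameter combinations just described can be sketched as a small URL builder. The base endpoint here is the unofficial reading-list URL documented on the pyrfeed wiki linked above; treat both it and the exact parameter semantics as assumptions:

```python
from urllib.parse import urlencode

# Unofficial endpoint from the pyrfeed wiki; not a published, stable API.
BASE = "http://www.google.com/reader/atom/user/-/state/com.google/reading-list"

def feed_url(count=10, order="d", older_than=None):
    """Build a reading-list request URL.

    order "d" = date descending (newest first), "o" = date ascending;
    older_than (the ot parameter) is only meaningful with r=o.
    """
    params = {"n": count, "r": order}
    if older_than is not None:
        params["ot"] = older_than
    return BASE + "?" + urlencode(params)

print(feed_url(10, "d"))
# e.g. ...reading-list?n=10&r=d
```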
Any idea how to solve that without downloading all items again and rejecting duplicates? That's not a very economical way of polling.
Someone proposed that I could specify that I only want the unread entries. But for that solution to work in such a way that Google Reader does not offer these entries again, I would need to mark them as read. In turn, that would mean I need to keep my own read/unread state on the client, and entries would already be marked as read when the user logs on to the online version of Google Reader. That doesn't work for me.
Cheers,
Mariano
Comments (2)
To get the latest entries, use the standard from-newest-date-descending download, which will start from the latest entries. You will receive a "continuation" token in the XML result.
Scan through the results, pulling out anything new to you. You should find that either all results are new, or everything up to a point is new, and all after that are already known to you.
In the latter case, you're done, but in the former you need to find the new stuff that is older than what you've already retrieved. Do this by using the continuation to get the results starting from just after the last result in the set you just retrieved, passing it in the GET request as the "c" parameter. Continue this way until you have everything.
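The loop just described can be sketched as follows. The fetch function here is a stand-in for an actual HTTP request that returns the parsed entries and the continuation token (None when there are no more pages); how you issue and parse the real request is up to your client:

```python
def fetch_all(fetch_page):
    """Page through results using the continuation token.

    `fetch_page(c)` is assumed to return (entries, continuation), where
    `c` is the continuation token to pass in the GET request (None for
    the first page) and `continuation` is None on the last page.
    """
    entries, c = fetch_page(None)
    all_entries = list(entries)
    while c is not None:
        entries, c = fetch_page(c)
        all_entries.extend(entries)
    return all_entries

# Fake three-page feed to show the shape of the loop.
pages = {
    None:   ([1, 2], "tok1"),
    "tok1": ([3, 4], "tok2"),
    "tok2": ([5], None),
}
print(fetch_all(lambda c: pages[c]))  # [1, 2, 3, 4, 5]
```

In practice you would stop paging as soon as a page contains only items you already know, rather than always fetching everything.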
The "n" parameter, which is a count of the number of items to retrieve, works well with this, and you can change it as you go. If the frequency of checking is user-set, and thus could be very frequent or very rare, you can use an adaptive algorithm to reduce network traffic and your processing load. Initially request a small number of the latest entries, say five (add n=5 to the URL of your GET request). If all are new, in the next request, where you use the continuation, ask for a larger number, say 20. If those are still all new, either the feed has a lot of updates or it's been a while, so continue on in groups of 100 or whatever.
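A minimal sketch of that adaptive batch sizing; the exact growth factor and cap are free choices, not anything prescribed by the service:

```python
def next_batch_size(current, all_new, cap=100):
    """Grow the request size while the previous batch was entirely new;
    drop back to a small probe otherwise (5 -> 20 -> 80 -> 100 here).
    """
    if not all_new:
        return 5              # last batch had known items: small probe next time
    return min(cap, current * 4)

size = 5
for all_new in (True, True, True, False):
    size = next_batch_size(size, all_new)
    print(size)  # 20, 80, 100, 5
```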
However, and correct me if I'm wrong here, you also want to know, after you've downloaded an item, whether its state changes from "unread" to "read" due to the person reading it using the Google Reader interface.
One approach to this would be:
If the user subscribes to a lot of different blogs, it's also likely he labels them extensively, so you can do this whole thing on a per-label basis rather than for the entire feed, which should help keep the amount of data down, since you won't need to do any transfers for labels where the user didn't read anything new on google reader.
This whole scheme can be applied to other statuses, such as starred or unstarred, as well.
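One way to decide which labels actually need a transfer is to compare per-label unread counts between syncs and skip the unchanged ones. How you obtain those counts is an assumption here (Google Reader exposed unread counts, but this sketch just takes them as dicts):

```python
def labels_to_sync(prev_counts, curr_counts):
    """Return the labels whose unread count changed since the last sync.

    Labels the user didn't touch keep the same count and need no
    transfer at all; new labels (absent from prev_counts) are included.
    """
    return [label for label, n in curr_counts.items()
            if prev_counts.get(label) != n]

print(labels_to_sync({"news": 3, "tech": 0}, {"news": 3, "tech": 5}))  # ['tech']
```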
Now, as you say, this doesn't work for you. True enough. Neither keeping a local read/unread state (since you're keeping a database of all of the items anyway) nor marking items read in Google (which the API supports) seems very difficult, so why doesn't this work for you?
There is one further hitch, however: the user may mark something already read as unread on Google. This throws a bit of a wrench into the system. My suggestion there, if you really want to try to take care of this, is to assume that the user in general will be touching only more recent stuff, and download the latest couple hundred or so items every time, checking the status on all of them. (This isn't all that bad; downloading 100 items took me anywhere from 0.3s for 300KB to 2.5s for 2.5MB, albeit on a very fast broadband connection.)
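That status check amounts to diffing a freshly downloaded window of recent items against the locally stored flags. A sketch, with item structure and field names assumed:

```python
def state_changes(local_state, latest_items):
    """Compare stored read/unread flags against a fresh download window.

    `local_state` maps atom-id -> bool (True = read); `latest_items` is
    the newest couple hundred entries with their current "read" flag.
    Returns {id: new_flag} for every item whose state flipped either way,
    so read->unread changes are caught too.
    """
    changed = {}
    for item in latest_items:
        item_id, read = item["id"], item["read"]
        if item_id in local_state and local_state[item_id] != read:
            changed[item_id] = read
    return changed

local = {"item1": True, "item2": False}
window = [{"id": "item1", "read": False}, {"id": "item2", "read": False}]
print(state_changes(local, window))  # {'item1': False}
```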
Again, if the user has a large number of subscriptions, he's also probably got a reasonably large number of labels, so doing this on a per-label basis will speed things up. I'd suggest, actually, that not only do you check on a per-label basis, but you also spread out the checks, checking a single label each minute rather than everything once every twenty minutes. You can also do this "big check" for status changes on older items less often than you do a "new stuff" check, perhaps once every few hours, if you want to keep bandwidth down.
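Spreading the checks out, as suggested, is just a round-robin over the labels, one per tick (e.g. per minute):

```python
from itertools import cycle

# Hypothetical label list; in a real client this would come from the
# user's subscriptions.
labels = ["news", "tech", "blogs"]
ticker = cycle(labels)

# Each tick of your polling timer checks exactly one label.
print([next(ticker) for _ in range(5)])  # ['news', 'tech', 'blogs', 'news', 'tech']
```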
This is a bit of a bandwidth hog, mainly because you need to download the full article from Google merely to check its status. Unfortunately, I can't see any way around that in the API docs we have available to us. My only real advice is to minimize the checking of status on non-new items.
The Google Reader API hasn't officially been released yet; when it is, this answer may change.
Currently, you would have to call the API and disregard items already downloaded, which, as you said, isn't terribly efficient, since you will be re-downloading items every time even if you already have them.