How big is too big for an RSS feed XML file?
I'm implementing an RSS feed for a website and I don't understand certain things about the format/size/content of the XML file for the feed.
I'm initializing the site with past data going back to 1999 (there was no feed at any point before now), and only a couple hundred items will be added per year.
Is there some protocol for archiving, or can I just keep the one file and continue appending to it? I'd think that would be inefficient, as the aggregators have to download the whole thing (I assume).
So, what's the usual custom for this? Limit it to the last month? The current file with over 900 items is 1.5MB, and I'd expect 1 year's worth to be about 1/10th that in size or less.
Any pointers on what principles to use and how to implement them? I'm using PHP, but my data is complicated enough that I rolled my own script to write the file (and it validates just fine), so I can't use a canned solution -- I need to understand what to implement in my own script.
2 Answers
Most consumers of syndication feeds have the expectation that the feed will contain relatively recent content, with previously published content 'falling off' the feed. How much content you maintain in the feed is usually based on the type of content you are publishing, but as the size of your feed grows it can impact a feed client's ability to retrieve and parse your information.
If you truly want to publish a historical feed that is continually added to but never has content items removed, you may want to consider the following options (based on the needs of your consumers):

Option 1: paginate the feed, so clients retrieve the history a page at a time.
Option 2: publish multiple feeds -- a small recent-items feed plus a full historical feed -- advertised via auto-discovery.
Option 3: expose the feed through a parameterized service endpoint that supports multiple formats and content filtering.
Option 1 is a reasonable approach only if you know the type of feed clients that will be consuming your feed, as not all feed clients support pagination.
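For reference, one published convention for feed paging is RFC 5005 (Feed Paging and Archiving). A rough PHP sketch of emitting its paging links follows; the /feed.php?page=N URL scheme and the hasOlderItems() helper are assumptions, not anything from the question:

```php
<?php
// Hypothetical sketch of RFC 5005-style paging links for an RSS channel.
// Assumes pages are served as /feed.php?page=N with the newest items on
// page 1, and that xmlns:atom="http://www.w3.org/2005/Atom" is declared
// on the <rss> element.
$page = max(1, (int)($_GET['page'] ?? 1));

$links = sprintf('<atom:link rel="self" href="/feed.php?page=%d"/>', $page);
if ($page > 1) {
    $links .= sprintf('<atom:link rel="previous" href="/feed.php?page=%d"/>', $page - 1);
}
if (hasOlderItems($page)) { // hypothetical helper: is there more history after this page?
    $links .= sprintf('<atom:link rel="next" href="/feed.php?page=%d"/>', $page + 1);
}
// $links would then be written inside the <channel> element.
```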
Option 2 is the most common one seen on public-facing websites, as most browsers and clients support auto-discovery, and you can provide both a full historical feed and a smaller recent-content feed (or segment the content in whatever ways make sense for it).
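The auto-discovery part is just a pair of <link> elements in the site's HTML head. A minimal sketch, with hypothetical feed URLs:

```php
<?php
// Hypothetical sketch: advertise both feeds via auto-discovery in the
// page <head>. Browsers and feed readers pick these up automatically.
echo '<link rel="alternate" type="application/rss+xml" title="Recent items" href="/feed/recent.xml">', "\n";
echo '<link rel="alternate" type="application/rss+xml" title="Full archive (since 1999)" href="/feed/archive.xml">', "\n";
```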
Option 3 potentially allows you to provide the benefits of both of the first two options, plus you can provide multiple feed formats and rich filtering of your content. It is a very powerful way to expose feed content, but usually is only worth the effort if your consumers indicate a desire for tailoring the feed content they wish to consume.
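As a sketch of what option 3 might look like in PHP -- the parameter names and the loadAllItems()/renderRss()/renderAtom() helpers are all hypothetical:

```php
<?php
// Hypothetical sketch of a parameterized feed endpoint, e.g.
// /feed.php?format=rss&since=2005-01-01&tag=news
$format = $_GET['format'] ?? 'rss';                           // 'rss' or 'atom'
$since  = isset($_GET['since']) ? strtotime($_GET['since']) : null;
$tag    = $_GET['tag'] ?? null;

$items = loadAllItems(); // hypothetical data-access helper

// Keep only items matching the requested filters.
$items = array_filter($items, function ($item) use ($since, $tag) {
    if ($since !== null && $item['date'] < $since) return false;        // 'date' = Unix timestamp
    if ($tag !== null && !in_array($tag, $item['tags'], true)) return false;
    return true;
});

echo ($format === 'atom') ? renderAtom($items) : renderRss($items); // hypothetical renderers
```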
While most rich feed clients will retrieve feed content asynchronously, clients that make synchronous (and potentially frequent) requests for your feed may experience timeout issues as the size of your feed increases.
Regardless of what direction you take, consider implementing Conditional GET on your feeds, and understand the potential consumers of your syndicated content so you can choose the strategy that fits best. See this answer when deciding which syndication feed format(s) you want to provide.
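A minimal sketch of Conditional GET for a feed served from a static file (the feed.xml path is an assumption); the 304 response lets aggregators skip re-downloading an unchanged feed:

```php
<?php
// Minimal Conditional GET sketch for a static feed file.
$file  = __DIR__ . '/feed.xml';
$mtime = filemtime($file);
$etag  = '"' . md5($mtime . '-' . filesize($file)) . '"';

header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');
header('ETag: ' . $etag);
header('Content-Type: application/rss+xml; charset=utf-8');

$ifNoneMatch = $_SERVER['HTTP_IF_NONE_MATCH'] ?? '';
$ifModSince  = $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? '';

if ($ifNoneMatch === $etag ||
    ($ifModSince !== '' && strtotime($ifModSince) >= $mtime)) {
    http_response_code(304); // client copy is current; send no body
    exit;
}

readfile($file);
```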
Aggregators will download the file repeatedly, so limiting the size is important. I would have the feed contain either the 10 newest items, or every item up to a week old, whichever gives more entries, unless overridden with a GET parameter. Of course this will vary with the actual usage you see from your clients as well as the activity in the feed itself.
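A rough PHP sketch of that rule, assuming the items are sorted newest first with a 'date' Unix timestamp; the ?all=1 override parameter is a hypothetical name:

```php
<?php
// Keep at least the 10 newest items, plus anything from the past week,
// unless the caller explicitly asks for the full history.
function selectFeedItems(array $items, bool $all = false): array
{
    if ($all) {
        return $items;                    // full history on request
    }
    $cutoff   = time() - 7 * 24 * 3600;   // one week ago
    $selected = [];
    foreach ($items as $i => $item) {
        if ($i < 10 || $item['date'] >= $cutoff) {
            $selected[] = $item;
        } else {
            break;                        // newest-first: the rest are older
        }
    }
    return $selected;
}

// $allItems is assumed to be loaded elsewhere, sorted newest first.
$feedItems = selectFeedItems($allItems, isset($_GET['all']));
```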