How big is too big for an RSS feed XML file?

Posted 2024-10-22 00:57:10

I'm implementing an RSS feed for a website and I don't understand certain things about the format/size/content of the XML file for the feed.

I'm initializing the site with the past data, which runs back to 1999 (there was no feed at any point before now), and only a couple hundred items will be added per year.

Is there some protocol for archiving, or can I just keep the one file and continue appending to it? I'd think that would be inefficient, as the aggregators have to download the whole thing (I assume).

So, what's the usual custom for this? Limit it to the last month? The current file, with over 900 items, is 1.5 MB, and I'd expect one year's worth to be about a tenth of that size or less.

Any pointers on what principles to use and how to implement them? I'm using PHP, but my data is complicated enough that I rolled my own script to write the file (and it validates just fine), so I can't use a canned solution -- I need to understand what to implement in my own script.

Comments (2)

别忘他 2024-10-29 00:57:10

Most consumers of syndication feeds expect the feed to contain relatively recent content, with previously published content 'falling off' the feed. How much content you keep in the feed usually depends on the type of content you are publishing, but as the feed grows it can affect a feed client's ability to retrieve and parse your information.

If you truly want to publish a historical feed that is continually added to but never has content items removed, you may want to consider the following options (based on the needs of your consumers):

  1. Implement Feed Paging and Archiving, per RFC 5005 Section 3, as paged feeds can be useful when the number of entries is very large, infinite, or indeterminate. Clients can "page" through the feed, only accessing a subset of the feed's entries as necessary.
  2. Logically segment your content into multiple feeds, and provide auto-discovery to the feeds on your website.
  3. Implement a REST-based service interface that allows consumers to retrieve and filter your content as an Atom or RSS formatted feed, with the default representation using some reasonable defaults.

Option 1 is a reasonable approach only if you know the type of feed clients that will be consuming your feed, as not all feed clients support pagination.
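As a rough illustration of option 1, here is a minimal sketch in Python (the same logic ports directly to a hand-rolled PHP script) that works out the paging link relations RFC 5005 defines for paged feeds: `first`, `last`, `previous` and `next`. The function names, the `?page=` query parameter and the page size of 50 are assumptions made for the example, not anything the RFC or the question prescribes. In an RSS 2.0 file the links would be written as `atom:link` elements, which requires declaring the Atom namespace on the `rss` element.

```python
from math import ceil
from xml.sax.saxutils import quoteattr


def page_links(base_url: str, total_items: int, page: int, per_page: int = 50) -> dict:
    """Return the RFC 5005 paging link relations for one page of a feed.

    base_url, the ?page= parameter and per_page are hypothetical choices.
    Page 1 is assumed to hold the newest items.
    """
    last_page = max(1, ceil(total_items / per_page))
    if not 1 <= page <= last_page:
        raise ValueError(f"page must be between 1 and {last_page}")

    def url(n: int) -> str:
        return f"{base_url}?page={n}"

    links = {"first": url(1), "last": url(last_page)}
    if page > 1:
        links["previous"] = url(page - 1)
    if page < last_page:
        links["next"] = url(page + 1)
    return links


def render_links(links: dict) -> str:
    """Render the links as atom:link elements for inclusion in <channel>."""
    return "\n".join(
        f'<atom:link rel="{rel}" href={quoteattr(href)}/>'
        for rel, href in links.items()
    )


if __name__ == "__main__":
    # 900 items at 50 per page gives 18 pages.
    print(render_links(page_links("https://example.org/feed.xml", 900, 2)))
```

Clients that do not understand these relations will simply see the first page as an ordinary short feed, which is why the first page should hold the most recent items.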

Option 2 is the most common one seen on public-facing websites, as most browsers and clients support auto-discovery, and you can provide both a full historical feed and a smaller feed of more recent content (or segment in ways that make sense for your content).
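Auto-discovery for option 2 comes down to one `<link rel="alternate">` element per feed in the page's `<head>`. A small sketch of generating those tags follows; the helper name, feed titles and paths are made up for illustration.

```python
from html import escape


def discovery_tag(title: str, href: str, mime: str = "application/rss+xml") -> str:
    """Build one feed auto-discovery <link> element for an HTML <head>."""
    return (
        f'<link rel="alternate" type="{escape(mime)}" '
        f'title="{escape(title)}" href="{escape(href)}">'
    )


if __name__ == "__main__":
    # Hypothetical split: a short recent feed plus the full archive.
    feeds = [
        ("Recent items", "/feeds/recent.xml"),
        ("Complete archive (1999 onward)", "/feeds/archive.xml"),
    ]
    for title, href in feeds:
        print(discovery_tag(title, href))
```

Listing the small, recent feed first is a reasonable choice, since clients that pick only one discovered feed tend to take the first.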

Option 3 potentially allows you to provide the benefits of both of the first two options, plus you can provide multiple feed formats and rich filtering of your content. It is a very powerful way to expose feed content, but usually is only worth the effort if your consumers indicate a desire for tailoring the feed content they wish to consume.

While most rich feed clients will retrieve feed content asynchronously, clients that make synchronous (and potentially frequent) requests for your feed may experience timeout issues as the size of your feed increases.

Regardless of which direction you take, consider implementing Conditional GET on your feeds, and get to know the potential consumers of your syndicated content so you can choose the strategy that fits best. See this answer when you consider which syndication feed format(s) you want to provide.
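Conditional GET means sending `ETag` and `Last-Modified` headers with the feed and answering `304 Not Modified` with an empty body when the client's `If-None-Match` or `If-Modified-Since` header shows it already has the current version. The sketch below shows only the decision logic, in Python, under some assumptions: the function name is hypothetical, the ETag is derived from a hash of the feed body, and `last_modified` is a timezone-aware UTC datetime. In a PHP script the same checks would read the request headers and set the status code accordingly.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(body: bytes, last_modified: datetime, request_headers: dict):
    """Return (status, headers, body) for a feed request.

    last_modified must be an aware datetime in UTC. If-None-Match takes
    precedence over If-Modified-Since when both are present.
    """
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }

    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")
    not_modified = False

    if if_none_match is not None:
        # Weak comparison: ignore a W/ prefix on the client's tags.
        tags = [t.strip().removeprefix("W/") for t in if_none_match.split(",")]
        not_modified = "*" in tags or etag in tags
    elif if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
            # HTTP dates have one-second resolution.
            not_modified = last_modified.replace(microsecond=0) <= since
        except (TypeError, ValueError):
            not_modified = False  # unparseable date: send the full feed

    if not_modified:
        return 304, headers, b""
    return 200, headers, body


if __name__ == "__main__":
    modified = datetime(2024, 10, 22, 0, 57, 10, tzinfo=timezone.utc)
    status, headers, _ = conditional_response(b"<rss/>", modified, {})
    print(status, headers)
    print(conditional_response(b"<rss/>", modified, {"If-None-Match": headers["ETag"]})[0])
```

With this in place, an aggregator that polls every hour only downloads the full file when the feed has actually changed, which matters more as the file grows.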

眼藏柔 2024-10-29 00:57:10

Aggregators will download the file repeatedly, so limiting the size is important. I would have the feed contain either the 10 most recent items or everything from the past week, whichever gives more entries, unless overridden with a GET parameter. Of course, this will vary with the actual usage you see from your clients as well as the activity in the feed itself.
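The rule above can be sketched as follows, in Python for brevity; it translates line for line to PHP. The function name, the `(published, title)` tuple shape and the `limit` argument (standing in for the GET parameter override) are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def select_items(items, now, min_items=10, max_age=timedelta(days=7), limit=None):
    """Pick the items to include in the feed, newest first.

    items: iterable of (published_datetime, title) tuples.
    limit: explicit item count, e.g. taken from a GET parameter; when given
           it overrides the default rule.
    """
    ordered = sorted(items, key=lambda item: item[0], reverse=True)
    if limit is not None:
        return ordered[:limit]

    cutoff = now - max_age
    recent = [item for item in ordered if item[0] >= cutoff]
    newest = ordered[:min_items]
    # Whichever rule yields more entries wins.
    return recent if len(recent) > len(newest) else newest


if __name__ == "__main__":
    now = datetime(2024, 10, 29, tzinfo=timezone.utc)
    daily = [(now - timedelta(days=i), f"item {i}") for i in range(30)]
    print(len(select_items(daily, now)))           # quiet feed: falls back to the 10 newest
    print(len(select_items(daily, now, limit=3)))  # explicit override
```

For a site adding a couple hundred items a year, the "10 newest" branch will apply most of the time; the one-week branch only matters during bursts of activity.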
