How to fetch a continuous list from PostgreSQL on the web

Posted on 2024-12-09 06:57:49

I am making an HTTP API that fetches many rows from PostgreSQL with pagination. Ordinarily I would implement such pagination with a naive OFFSET/LIMIT clause. However, this case has some special requirements:

  • There are so many rows that I believe users can never reach the end (imagine a Twitter timeline).
  • Pages do not have to be randomly accessible, only sequentially accessible.
  • The API returns a URL containing a cursor token that points to the next page of the continuous sequence.
  • Cursor tokens do not have to exist permanently, only for some period of time.
  • The ordering fluctuates frequently (like Reddit rankings), but pages behind a given cursor should keep a consistent ordering.

How can I achieve this? I am ready to change my whole database schema for it!

Comments (2)

北斗星光 2024-12-16 06:57:49

Assuming it's only the ordering of the results that fluctuates and not the data in the rows, Fredrik's answer makes sense. However, I'd suggest the following additions:

  • Store the id list in a PostgreSQL table using the array type rather than in memory. Doing it in memory, unless you carefully use something like Redis with auto-expiry and memory limits, is setting yourself up for a memory-consumption DoS attack. I imagine it would look something like this:

    create table foo_paging_cursor (
      cursor_token text primary key, -- probably a uuid or a timestamp-derived value is best (see below)
      result_ids integer[],          -- or text[] if you have non-integer ids
      expiry_time timestamp
    );
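
    Serving a page from that table could then look like the sketch below: slice the stored array, unnest it, and join back to the source table. WITH ORDINALITY (PostgreSQL 9.4+) preserves the cached ordering; the table foo and its integer id column are assumed names:

    -- page 2 at 50 rows per page = array positions 51-100 (1-based)
    select f.*
    from foo_paging_cursor c
    cross join lateral
      unnest(c.result_ids[51:100]) with ordinality as t(id, ord)
    join foo f on f.id = t.id
    where c.cursor_token = $1
      and c.expiry_time > now()
    order by t.ord;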
    
  • You need to decide whether the cursor_token and result_ids can be shared between users, to reduce your storage needs and the time needed to run the initial query per user. If they can be shared, choose a cache window, say one or five minutes, and upon a new request create the cursor_token for that time period, then check whether the result ids have already been calculated for that token. If not, add a new row for that token. You should probably add a lock around the check/insert code to handle concurrent requests for a new token.
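
    A sketch of that check/insert flow under the schema above, assuming a one-minute cache window and a token derived from the truncated timestamp; ON CONFLICT (PostgreSQL 9.5+) covers the race between concurrent writers in place of an explicit lock, and foo/ranking are hypothetical names:

    -- run this only after a SELECT found no row for the current token;
    -- ON CONFLICT guards the remaining race between two writers
    insert into foo_paging_cursor (cursor_token, result_ids, expiry_time)
    select to_char(date_trunc('minute', now()), 'YYYYMMDDHH24MI'),
           array(select id from foo order by ranking desc limit 500),
           now() + interval '10 minutes'
    on conflict (cursor_token) do nothing;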

  • Have a scheduled background job that purges old tokens/results and make sure your client code can handle any errors related to expired/invalid tokens.
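
    The purge itself can be a single statement against the schema above, scheduled with cron, pg_cron, or whatever job runner you already have:

    delete from foo_paging_cursor where expiry_time < now();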

Don't even consider using real db cursors for this.

Keeping the result ids in Redis lists is another way to handle this (see the LRANGE command), but be careful with expiry and memory usage if you go down that path. Your Redis key would be the cursor_token and the ids would be the members of the list.

眼眸里的快感 2024-12-16 06:57:49

I know absolutely nothing about PostgreSQL, but I'm a pretty decent SQL Server developer, so I'd like to take a shot at this anyway :)

How many rows/pages do you expect a user to browse through per session, at most? For instance, if you expect a user to page through at most 10 pages per session [each page containing 50 rows], you could take that maximum and set up the web service so that when the user requests the first page, you cache 10*50 rows (or just the ids of the rows, depending on how much memory and how many simultaneous users you have).

This would certainly help speed up your web service, in more ways than one, and it's quite easy to implement too. So:

  • When a user requests data for page #1: run the query (complete with ORDER BY, joins, etc.), store all the ids into an array (up to a maximum of 500 ids), and return the data rows that correspond to the ids at array positions 0-49.
  • When the user requests pages #2-10: return the data rows that correspond to the ids at array positions (page-1)*50 through page*50 - 1.
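
In PostgreSQL terms (borrowing the integer[] column from the other answer), that position arithmetic becomes a 1-based array slice; :page and :token are placeholders and the 50-row page size is an assumption:

    -- page p covers 0-based positions (p-1)*50 .. p*50-1,
    -- i.e. 1-based array positions (p-1)*50+1 .. p*50
    select result_ids[((:page - 1) * 50 + 1):(:page * 50)]
    from foo_paging_cursor
    where cursor_token = :token;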

You could also bump up the numbers; an array of 500 ints only occupies about 2 KB of memory, but it also depends on how fast you want the initial query/response to be.

I've used a similar technique on a live website, and when the user continued past page 10, I just switched over to plain queries. I guess another solution would be to keep expanding/filling the array (running the query again, but excluding the already included ids).
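
A sketch of that expand/fill step, again borrowing the hypothetical foo_paging_cursor table and the foo/ranking names from the other answer:

    -- append the next batch of ids, skipping ones already cached
    update foo_paging_cursor
    set result_ids = result_ids || array(
          select id from foo
          where id <> all (result_ids)
          order by ranking desc
          limit 500)
    where cursor_token = :token;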

Anyway, hope this helps!
