How to fetch a continuous list from PostgreSQL on the web

Posted on 2024-12-09 06:57:49

I am making an HTTP API that fetches many rows from PostgreSQL with pagination. Ordinarily I would implement such pagination with a naive OFFSET/LIMIT clause. However, this case has some special requirements:

  • There are so many rows that I believe users can never reach the end (imagine a Twitter timeline).
  • Pages do not have to be randomly accessible, only sequentially accessible.
  • The API returns a URL containing a cursor token that points to the next page of the continuous sequence.
  • Cursor tokens do not have to exist permanently, only for some period of time.
  • The ordering fluctuates frequently (like Reddit rankings), but pages behind a given cursor should keep a consistent ordering.

How can I achieve this? I am ready to change my whole database schema for it!

Comments (2)

北斗星光 2024-12-16 06:57:49

Assuming it's only the ordering of the results that fluctuates and not the data in the rows, Fredrik's answer makes sense. However, I'd suggest the following additions:

  • Store the id list in a PostgreSQL table using the array type rather than in memory. Doing it in memory, unless you carefully use something like Redis with auto-expiry and memory limits, is setting yourself up for a memory-consumption DoS attack. I imagine it would look something like this:

    create table foo_paging_cursor (
      cursor_token text primary key, -- probably a uuid or a timestamp-derived value is best (see below)
      result_ids integer[],          -- or text[] if you have non-integer ids
      expiry_time timestamp
    );
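
    Serving a page from that table could then look like the sketch below: slice the stored array, unnest it, and join back to the source table. WITH ORDINALITY (PostgreSQL 9.4+) preserves the cached ordering; the table foo and its integer id column are assumed names:

    -- page 2 at 50 rows per page = array positions 51-100 (1-based)
    select f.*
    from foo_paging_cursor c
    cross join lateral
      unnest(c.result_ids[51:100]) with ordinality as t(id, ord)
    join foo f on f.id = t.id
    where c.cursor_token = $1
      and c.expiry_time > now()
    order by t.ord;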
    
  • You need to decide whether the cursor_token and result_ids can be shared between users, to reduce your storage needs and the time needed to run the initial query per user. If they can be shared, choose a cache window, say one or five minutes, and upon a new request create the cursor_token for that time period, then check whether the result ids have already been calculated for that token. If not, add a new row for that token. You should probably add a lock around the check/insert code to handle concurrent requests for a new token.
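
    A sketch of that check/insert flow under the schema above, assuming a one-minute cache window and a token derived from the truncated timestamp; ON CONFLICT (PostgreSQL 9.5+) covers the race between concurrent writers in place of an explicit lock, and foo/ranking are hypothetical names:

    -- run this only after a SELECT found no row for the current token;
    -- ON CONFLICT guards the remaining race between two writers
    insert into foo_paging_cursor (cursor_token, result_ids, expiry_time)
    select to_char(date_trunc('minute', now()), 'YYYYMMDDHH24MI'),
           array(select id from foo order by ranking desc limit 500),
           now() + interval '10 minutes'
    on conflict (cursor_token) do nothing;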

  • Have a scheduled background job that purges old tokens/results and make sure your client code can handle any errors related to expired/invalid tokens.
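
    The purge itself can be a single statement against the schema above, scheduled with cron, pg_cron, or whatever job runner you already have:

    delete from foo_paging_cursor where expiry_time < now();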

Don't even consider using real db cursors for this.

Keeping the result ids in Redis lists is another way to handle this (see the LRANGE command), but be careful with expiry and memory usage if you go down that path. Your Redis key would be the cursor_token and the ids would be the members of the list.

眼眸里的快感 2024-12-16 06:57:49

I know absolutely nothing about PostgreSQL, but I'm a pretty decent SQL Server developer, so I'd like to take a shot at this anyway :)

How many rows/pages do you expect a user to browse through per session, at most? For instance, if you expect a user to page through at most 10 pages per session [each page containing 50 rows], you could take that maximum and set up the web service so that when the user requests the first page, you cache 10*50 rows (or just the ids of the rows, depending on how much memory and how many simultaneous users you have).

This would certainly help speed up your web service, in more ways than one, and it's quite easy to implement too. So:

  • When a user requests data for page #1: run the query (complete with ORDER BY, joins, etc.), store all the ids into an array (up to a maximum of 500 ids), and return the data rows that correspond to the ids at array positions 0-49.
  • When the user requests pages #2-10: return the data rows that correspond to the ids at array positions (page-1)*50 through page*50 - 1.
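
In PostgreSQL terms (borrowing the integer[] column from the other answer), that position arithmetic becomes a 1-based array slice; :page and :token are placeholders and the 50-row page size is an assumption:

    -- page p covers 0-based positions (p-1)*50 .. p*50-1,
    -- i.e. 1-based array positions (p-1)*50+1 .. p*50
    select result_ids[((:page - 1) * 50 + 1):(:page * 50)]
    from foo_paging_cursor
    where cursor_token = :token;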

You could also bump up the numbers; an array of 500 ints only occupies about 2 KB of memory, but it also depends on how fast you want the initial query/response to be.

I've used a similar technique on a live website, and when the user continued past page 10, I just switched over to plain queries. I guess another solution would be to keep expanding/filling the array (running the query again, but excluding the already included ids).
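
A sketch of that expand/fill step, again borrowing the hypothetical foo_paging_cursor table and the foo/ranking names from the other answer:

    -- append the next batch of ids, skipping ones already cached
    update foo_paging_cursor
    set result_ids = result_ids || array(
          select id from foo
          where id <> all (result_ids)
          order by ranking desc
          limit 500)
    where cursor_token = :token;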

Anyway, hope this helps!
