How do I query DynamoDB?

Posted 2025-01-02 09:26:28 · 823 words · 2 views · 0 comments

I'm looking at Amazon's DynamoDB as it looks like it takes away all of the hassle of maintaining and scaling your database server. I'm currently using MySQL, and maintaining and scaling the database is a complete headache.

I've gone through the documentation and I'm having a hard time trying to wrap my head around how you would structure your data so it could be easily retrieved.

I'm totally new to NoSQL and non-relational databases.

From the DynamoDB documentation, it sounds like you can only query a table on the primary hash key, plus the primary range key using a limited set of comparison operators.

Or you can run a full table scan and apply a filter to it. The catch is that a single scan request reads at most 1 MB of data, so you'd likely have to repeat your scan to find X results.

I realize these limitations allow them to provide predictable performance, but it seems like it makes it really difficult to get your data out. And performing full table scans seems like it would be really inefficient, and would only become less efficient over time as your table grows.

For instance, say I have a Flickr clone. My Images table might look something like this:

  • Image ID (Number, Primary Hash Key)
  • Date Added (Number, Primary Range Key)
  • User ID (String)
  • Tags (String Set)
  • etc.

So using query I would be able to list all images from the last 7 days and limit it to X number of results pretty easily.

But if I wanted to list all images from a particular user, I would need to do a full table scan and filter by user ID. The same would go for tags.

And because a single scan request reads at most 1 MB, you may need to do multiple scans to find X images. I also don't see a way to easily stop at X images. If you're trying to grab 30 images, your first scan might find 5, and your second might find 40.
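A minimal sketch of the "keep scanning until I have X" loop described above. The page fetcher is a hypothetical in-memory stand-in, not the DynamoDB API: it cuts pages by item count, whereas a real Scan stops a page at 1 MB and returns LastEvaluatedKey, which you pass back as ExclusiveStartKey on the next request. The toy model also assumes image_id alone identifies an item. Note that DynamoDB's Limit parameter caps the items *evaluated* per request, before any filter is applied, which is why one page can yield 5 matches and the next 40.

```python
from typing import Any, Callable, Optional

Item = dict[str, Any]
# A page fetcher takes an exclusive start key (None for the first page) and
# returns one page of items plus the key to resume from (None at the end).
PageFetcher = Callable[[Optional[int]], tuple[list[Item], Optional[int]]]


def make_fake_scan(items: list[Item], page_size: int) -> PageFetcher:
    """Hypothetical in-memory stand-in for a paginated Scan.

    Pages are cut by item count here; DynamoDB cuts them at 1 MB of data.
    The returned key plays the role of LastEvaluatedKey.
    """
    keys = [item["image_id"] for item in items]

    def scan_page(exclusive_start_key: Optional[int]) -> tuple[list[Item], Optional[int]]:
        begin = 0 if exclusive_start_key is None else keys.index(exclusive_start_key) + 1
        page = items[begin:begin + page_size]
        more = begin + page_size < len(items)
        return page, (page[-1]["image_id"] if more else None)

    return scan_page


def scan_until(scan_page: PageFetcher,
               predicate: Callable[[Item], bool],
               wanted: int,
               start_key: Optional[int] = None) -> tuple[list[Item], Optional[int]]:
    """Request pages until `wanted` matching items are collected.

    Returns the matches and a resume key. When we stop mid-page, the resume
    key is the last item we actually returned (not the page's last key), so
    the next call re-reads the rest of that page instead of skipping matches.
    """
    if wanted <= 0:
        return [], start_key
    found: list[Item] = []
    key = start_key
    while True:
        page, last_key = scan_page(key)
        for item in page:
            if predicate(item):
                found.append(item)
                if len(found) == wanted:
                    return found, item["image_id"]
        if last_key is None:  # whole table read: fewer than `wanted` matches exist
            return found, None
        key = last_key
```

The price of stopping exactly at X is visible in the design: the unused part of the last page gets read again on the next call, and in DynamoDB re-reading consumes read capacity again.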

Do I have this right? Is it basically a trade-off? You get really fast predictable database performance that is virtually maintenance free. But the trade-off is that you need to build way more logic to deal with the results?

Or am I totally off base here?

Comments (3)

携君以终年 2025-01-09 09:26:28

Yes, you are correct about the trade-off between performance and query flexibility.

But there are a few tricks to reduce the pain - secondary indexes/denormalising probably being the most important.

You would have another table keyed on user ID, listing all their images, for example. When you add an image, you update this table as well as adding a row to the table keyed on image ID.
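A minimal sketch of that denormalised write path, using plain dictionaries as hypothetical stand-ins for the tables (all names are illustrative, not a DynamoDB API). Each new image is written once under its image ID, once more under its owner and, following the question's point that tags have the same problem, once per tag. Listing by user or by tag then becomes a keyed lookup instead of a scan.

```python
from dataclasses import dataclass, field


@dataclass
class ImageStore:
    """Toy in-memory model of the denormalised multi-table layout.

    `images` plays the table keyed on image ID; `images_by_user` and
    `images_by_tag` play the extra tables keyed on user ID and tag.
    """
    images: dict[int, dict] = field(default_factory=dict)
    images_by_user: dict[str, list[int]] = field(default_factory=dict)
    images_by_tag: dict[str, list[int]] = field(default_factory=dict)

    def add_image(self, image_id: int, user_id: str, date_added: int, tags: set[str]) -> None:
        # Write 1: the full item, keyed on image ID.
        self.images[image_id] = {"image_id": image_id, "user_id": user_id,
                                 "date_added": date_added, "tags": set(tags)}
        # Write 2: the image ID recorded under its owner.
        self.images_by_user.setdefault(user_id, []).append(image_id)
        # Writes 3..n: the image ID recorded under each of its tags.
        for tag in tags:
            self.images_by_tag.setdefault(tag, []).append(image_id)

    def images_for_user(self, user_id: str) -> list[dict]:
        # One keyed lookup in the user table, then one keyed lookup per image.
        return [self.images[i] for i in self.images_by_user.get(user_id, [])]

    def images_with_tag(self, tag: str) -> list[dict]:
        return [self.images[i] for i in self.images_by_tag.get(tag, [])]
```

In DynamoDB these would be separate writes, so real code has to cope with one succeeding and another failing, for example by retrying or by grouping them in a TransactWriteItems call.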

You have to decide what queries you need, then design the data model around them.

倾城°AllureLove 2025-01-09 09:26:28

I think you need to create your own secondary index, using another table.

This table "schema" could be:

    User ID (String, Primary Hash Key)
    Date Added (Number, Primary Range Key)
    Image ID (Number)

That way you can query by User ID and filter by Date Added as well.
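A minimal sketch of that index table as a hypothetical in-memory class (not the DynamoDB API; `put` and `query_between` are illustrative names). `query_between` plays the role of a Query with an equality condition on User ID and a BETWEEN condition on Date Added, so only one user's rows are examined. It also models one property of the real service worth keeping in mind: (User ID, Date Added) is the whole primary key, so two images a user adds with the same timestamp collide, and the second write replaces the first.

```python
import bisect


class UserImageIndex:
    """Toy model of the index table: hash key user_id, range key date_added."""

    def __init__(self) -> None:
        # user_id -> sorted list of range-key values for that user
        self._dates: dict[str, list[int]] = {}
        # (user_id, date_added) -> image_id attribute
        self._rows: dict[tuple[str, int], int] = {}

    def put(self, user_id: str, date_added: int, image_id: int) -> None:
        dates = self._dates.setdefault(user_id, [])
        if (user_id, date_added) not in self._rows:
            bisect.insort(dates, date_added)
        # Same (hash, range) pair: the earlier row is replaced.
        self._rows[(user_id, date_added)] = image_id

    def query_between(self, user_id: str, start: int, end: int) -> list[int]:
        """Image IDs for one user with start <= date_added <= end, oldest first."""
        dates = self._dates.get(user_id, [])
        lo = bisect.bisect_left(dates, start)
        hi = bisect.bisect_right(dates, end)
        return [self._rows[(user_id, d)] for d in dates[lo:hi]]
```

The index only holds image IDs, so showing the full images still takes a second, keyed lookup per ID in the main table.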

难忘№最初的完美 2025-01-09 09:26:28

You can use a composite hash-range key as the primary key.

From the DynamoDB page:

A primary key can either be a single-attribute hash key or a composite hash-range key. A single attribute hash primary key could be, for example, “UserID”. This would allow you to quickly read and write data for an item associated with a given user ID.

A composite hash-range key is indexed as a hash key element and a range key element. This multi-part key maintains a hierarchy between the first and second element values. For example, a composite hash-range key could be a combination of “UserID” (hash) and “Timestamp” (range). Holding the hash key element constant, you can search across the range key element to retrieve items. This would allow you to use the Query API to, for example, retrieve all items for a single UserID across a range of timestamps.
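Tying this back to the question's "grab 30 images" problem: with UserID as hash key and Timestamp as range key, "the N newest images for one user" is one bounded request rather than repeated scans. The sketch below is a hypothetical in-memory model, not the DynamoDB API. `query_newest` loosely imitates a Query with descending range-key order (ScanIndexForward=False in the real API) plus a Limit, and returns a resume key in the spirit of LastEvaluatedKey.

```python
import bisect
from typing import Any, Optional


class CompositeKeyTable:
    """Hypothetical in-memory model of a table keyed on (UserID, Timestamp)."""

    def __init__(self) -> None:
        self._ranges: dict[str, list[int]] = {}  # hash key -> sorted range keys
        self._items: dict[tuple[str, int], dict[str, Any]] = {}

    def put(self, user_id: str, timestamp: int, **attrs: Any) -> None:
        # Same (hash, range) pair means same item: a second put replaces it.
        if (user_id, timestamp) not in self._items:
            bisect.insort(self._ranges.setdefault(user_id, []), timestamp)
        self._items[(user_id, timestamp)] = {"UserID": user_id, "Timestamp": timestamp, **attrs}

    def query_newest(self, user_id: str, limit: int,
                     exclusive_start: Optional[int] = None
                     ) -> tuple[list[dict[str, Any]], Optional[int]]:
        """Return up to `limit` items for one user, newest first.

        Only timestamps strictly older than `exclusive_start` are eligible.
        The returned key is the oldest timestamp handed back, or None once
        this user has no older items left.
        """
        ts = self._ranges.get(user_id, [])
        # ts[:end] holds every timestamp we are still allowed to return.
        end = len(ts) if exclusive_start is None else bisect.bisect_left(ts, exclusive_start)
        start = max(0, end - limit)
        page_ts = ts[start:end][::-1]  # newest first
        items = [self._items[(user_id, t)] for t in page_ts]
        return items, (page_ts[-1] if start > 0 else None)
```

Unlike the filtered scan, the page size here is exactly what was asked for (or fewer at the end), because the key condition selects the items directly instead of filtering after the read.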
