How do I query DynamoDB?
I'm looking at Amazon's DynamoDB as it looks like it takes away all of the hassle of maintaining and scaling your database server. I'm currently using MySQL, and maintaining and scaling the database is a complete headache.
I've gone through the documentation and I'm having a hard time trying to wrap my head around how you would structure your data so it could be easily retrieved.
I'm totally new to NoSQL and non-relational databases.
From the DynamoDB documentation it sounds like you can only query a table on the primary hash key, plus the primary range key with a limited number of comparison operators.
Or you can run a full table scan and apply a filter to it. The catch is that it will only scan 1 MB at a time, so you'd likely have to repeat your scan to find X number of results.
I realize these limitations allow them to provide predictable performance, but it seems like it makes it really difficult to get your data out. And performing full table scans seems like it would be really inefficient, and would only become less efficient over time as your table grows.
For instance, say I have a Flickr clone. My Images table might look something like:
- Image ID (Number, Primary Hash Key)
- Date Added (Number, Primary Range Key)
- User ID (String)
- Tags (String Set)
- etc
So using query I would be able to list all images from the last 7 days and limit it to X number of results pretty easily.
But if I wanted to list all images from a particular user I would need to do a full table scan and filter by username. Same would go for tags.
And because you can only scan 1 MB at a time, you may need to do multiple scans to find X number of images. I also don't see a way to easily stop at X number of images. If you're trying to grab 30 images, your first scan might find 5, and your second might find 40.
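To make the "repeat the scan until you have X results" problem concrete, here is a minimal sketch of the client-side loop. It runs against an in-memory stand-in rather than DynamoDB itself: `make_fake_scan`, `collect_matches`, and the item fields are hypothetical names invented for illustration, and fixed-size pages stand in for the 1 MB limit. In the real API the continuation token is returned as `LastEvaluatedKey` and passed back as `ExclusiveStartKey`.

```python
def make_fake_scan(table, page_size):
    """Return a scan function that hands out `table` one page at a time.

    Hypothetical stand-in for a paged scan: each call returns
    (items, last_key), where last_key is None once the table is exhausted.
    """
    def scan_page(start_key):
        start = 0 if start_key is None else start_key
        end = start + page_size
        last_key = end if end < len(table) else None
        return table[start:end], last_key
    return scan_page


def collect_matches(scan_page, matches, wanted):
    """Request pages until `wanted` matching items are found or the table ends."""
    found = []
    start_key = None
    while len(found) < wanted:
        items, last_key = scan_page(start_key)
        for item in items:
            if matches(item):
                found.append(item)
                if len(found) == wanted:
                    break  # discard the rest of this page
        if last_key is None:
            break  # no more pages to read
        start_key = last_key
    return found


# 100 fake images; every third one belongs to "alice".
images = [
    {"image_id": i, "user_id": "alice" if i % 3 == 0 else "bob"}
    for i in range(100)
]
scan_page = make_fake_scan(images, page_size=10)
first_30 = collect_matches(
    scan_page, lambda img: img["user_id"] == "alice", wanted=30
)
print(len(first_30))
```

The loop stops at exactly the requested count by cutting off the page that overshoots, but it still reads every page up to that point, which is the cost the question is worried about.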
Do I have this right? Is it basically a trade-off? You get really fast predictable database performance that is virtually maintenance free. But the trade-off is that you need to build way more logic to deal with the results?
Or am I totally off base here?
Comments (3)
Yes, you are correct about the trade-off between performance and query flexibility.
But there are a few tricks to reduce the pain - secondary indexes/denormalising probably being the most important.
You would have another table keyed on user ID, listing all their images, for example. When you add an image, you update this table as well as adding a row to the table keyed on image ID.
You have to decide what queries you need, then design the data model around them.
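A rough sketch of that dual-write idea, using plain Python dictionaries in place of the two tables. The names `images_by_id`, `images_by_user`, `add_image`, and `list_user_images` are made up for illustration and are not part of any DynamoDB API.

```python
from collections import defaultdict

# Stand-ins for two tables: one keyed on image ID, one keyed on user ID.
images_by_id = {}
images_by_user = defaultdict(list)


def add_image(image_id, user_id, date_added, tags):
    """Write the image row, then record it under its owner as well."""
    images_by_id[image_id] = {
        "image_id": image_id,
        "user_id": user_id,
        "date_added": date_added,
        "tags": set(tags),
    }
    images_by_user[user_id].append((date_added, image_id))


def list_user_images(user_id):
    """Look up a user's images by key, oldest first, without any scan."""
    refs = sorted(images_by_user.get(user_id, []))
    return [images_by_id[image_id] for _, image_id in refs]


add_image(1, "alice", 1000, ["cat"])
add_image(2, "bob", 1500, ["dog"])
add_image(3, "alice", 2000, ["cat", "beach"])
print([img["image_id"] for img in list_user_images("alice")])
```

Against a real database the two writes would be separate requests unless they are grouped into a transaction, so the application has to decide what happens when one succeeds and the other fails.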
I think you need to create your own secondary index, using another table.
This table "schema" could be:
--
That way you can query by User ID and filter by Date as well
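The schema itself did not survive in this copy of the post, so the following is only a guess at the shape it describes. It assumes User ID as the hash key and Date Added as the range key, which is what the sentence above implies. `UserImagesIndex` and its methods are hypothetical, and the whole thing is an in-memory model rather than DynamoDB code.

```python
import bisect
from collections import defaultdict


class UserImagesIndex:
    """In-memory model of a table with an assumed (user_id, date_added) key."""

    def __init__(self):
        # user_id -> list of (date_added, image_id), kept sorted by date
        self._rows = defaultdict(list)

    def put(self, user_id, date_added, image_id):
        bisect.insort(self._rows[user_id], (date_added, image_id))

    def query(self, user_id, since=None, limit=None):
        """Return one user's rows with date_added >= since, oldest first."""
        rows = self._rows.get(user_id, [])
        # (since,) sorts before every (since, image_id), so bisect_left
        # lands on the first row whose date is at or after `since`.
        start = 0 if since is None else bisect.bisect_left(rows, (since,))
        selected = rows[start:]
        return selected if limit is None else selected[:limit]


index = UserImagesIndex()
index.put("alice", 100, 1)
index.put("alice", 300, 3)
index.put("alice", 200, 2)
index.put("bob", 250, 4)
print(index.query("alice", since=200))
```

Because the rows for one user are kept in date order, both the date filter and a result limit become cheap slices instead of a scan over every image.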
You can use a composite hash-range key as the primary index.
From the DynamoDB Page: