AWS Athena tables automatically appear in the AWS Glue console
I recently found out that there's a limit on the number of partitions an AWS Athena table may have (20,000 at the moment, mentioned here: https://docs.aws.amazon.com/athena/latest/ug/partitions.html).
The same page mentions that AWS Glue tables may have 10 million partitions, so I opened the AWS Glue console to recreate the tables I had been using in Athena so far, and was surprised to see that all the tables I had created in the Athena console were already listed in the AWS Glue console.
Hence my question: does that mean every table created in the Athena console is an AWS Glue table and supports 10 million partitions?
I am currently using the Athena SDK for Java (https://docs.aws.amazon.com/athena/latest/ug/code-samples.html) to select and load data from table t1 into table t2 using INSERT INTO queries, which dynamically generate partitions in Hive format (i.e. col1=<...>/col2=<...>/...). Can I still use it? Is there another SDK specifically for Glue tables?
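For reference, the Hive-style partition layout described above can be sketched in plain Java. This is a minimal illustration, not part of the Athena SDK: the class and method names (HivePartitionPath, partitionPath) and the sample values are hypothetical, and real Hive/Athena paths also percent-encode special characters in values, which this sketch does not do.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class HivePartitionPath {

    // Builds a Hive-style partition path such as "col1=a/col2=b".
    // LinkedHashMap preserves insertion order, which should match the
    // order of the table's partition keys. Values are NOT escaped here.
    public static String partitionPath(LinkedHashMap<String, String> partitionValues) {
        StringJoiner path = new StringJoiner("/");
        for (Map.Entry<String, String> e : partitionValues.entrySet()) {
            path.add(e.getKey() + "=" + e.getValue());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // Hypothetical partition values for table t2.
        LinkedHashMap<String, String> values = new LinkedHashMap<>();
        values.put("col1", "2023-01-01");
        values.put("col2", "eu");
        System.out.println(partitionPath(values)); // col1=2023-01-01/col2=eu
    }
}
```

Each distinct combination of partition values becomes one such directory, and one partition entry in the table's catalog.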
My current concern is table t2: it is going to reach the 20,000-partition limit quite soon, so I'm wondering whether I still need to worry about that.
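To estimate how soon t2 will hit a limit, the worst-case partition count is the product of the number of distinct values of each partition column. A minimal Java sketch, assuming hypothetical cardinalities (col1 holding one value per day over three years, col2 holding 30 values) and the limits quoted in this question; the class PartitionBudget and its constants are illustrative, and the current quotas should be checked in the Athena and Glue documentation.

```java
public class PartitionBudget {

    // Limits as quoted in the question; verify against current AWS docs.
    public static final long ATHENA_TABLE_LIMIT = 20_000L;
    public static final long GLUE_TABLE_LIMIT = 10_000_000L;

    // Worst-case partition count: product of the distinct values of each
    // partition column. Math.multiplyExact throws on long overflow.
    public static long maxPartitions(long... distinctValuesPerColumn) {
        long total = 1;
        for (long n : distinctValuesPerColumn) {
            total = Math.multiplyExact(total, n);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical: col1 = 3 years of daily values, col2 = 30 values.
        long max = maxPartitions(3 * 365, 30);
        System.out.println(max);                        // 32850
        System.out.println(max > ATHENA_TABLE_LIMIT);   // true
        System.out.println(max > GLUE_TABLE_LIMIT);     // false
    }
}
```

Under these assumed numbers the table would pass 20,000 partitions but stay far below 10 million.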
And if being listed in the AWS Glue console does not imply support for 10M partitions, how can I make an existing Athena table support 10M partitions? Should the table be created in the AWS Glue console using "Add table" in order to get 10M partition support?
Yes and no. If you are using the Glue Data Catalog with Athena (which is the default), then Athena supports querying tables with up to 10 million partitions. However, a single query can only use 1 million of those partitions at a time. (source)
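Since a single query is limited to 1 million partitions, filtering on partition columns lets Athena prune partitions so only the matching ones are read. The sketch below only builds such a WHERE clause as a string; the class PartitionFilter, the helper inClause, and the sample values are hypothetical, column names are assumed to be plain identifiers, and the SQL would still be submitted through the client already in use (for example the Athena SDK for Java).

```java
import java.util.List;
import java.util.StringJoiner;

public class PartitionFilter {

    // Quotes a value as a SQL string literal, doubling embedded single quotes.
    private static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Builds "column IN ('v1', 'v2', ...)" for one partition column.
    // The column name is inserted as-is, so it must be a trusted identifier.
    public static String inClause(String column, List<String> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("no values for " + column);
        }
        StringJoiner in = new StringJoiner(", ", column + " IN (", ")");
        for (String v : values) {
            in.add(quote(v));
        }
        return in.toString();
    }

    public static void main(String[] args) {
        // Restricting both partition columns bounds the partitions scanned.
        String where = inClause("col1", List.of("2023-01-01", "2023-01-02"))
                + " AND " + inClause("col2", List.of("eu"));
        System.out.println("SELECT * FROM t2 WHERE " + where);
        // SELECT * FROM t2 WHERE col1 IN ('2023-01-01', '2023-01-02') AND col2 IN ('eu')
    }
}
```

Here at most 2 × 1 = 2 partitions match, regardless of how many partitions t2 holds in total.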