Add a column composed of two existing columns in Athena
I have a table with 100 columns, two of which, dt and tm, are partition columns. It looks something like this:
-------------------
| dt tm |
-------------------
| 2022/01/01 10 |
| 2022/01/02 15 |
| 2022/01/03 03 |
| 2022/01/04 20 |
-------------------
I want to add another column that serves as a timestamp built from the concatenation of dt and tm. That is:
--------------------------------------------
| dt tm timestamp |
--------------------------------------------
| 2022/01/01 10 2022/01/01 10:00:00 |
| 2022/01/02 15 2022/01/02 15:00:00 |
| 2022/01/03 03 2022/01/03 03:00:00 |
| 2022/01/04 20 2022/01/04 20:00:00 |
--------------------------------------------
The table is created with Glue and updated every couple of hours using a crawler.
So far I have tried the ALTER TABLE ADD COLUMNS statement, but couldn't make it work, since it only lets me specify the new column's name and type. What I need is a column that is computed for each new row added.
Is that possible?
Appreciate any help!
Comments (1)
Frankly, rather than adding another column, I would recommend that you create a view on the table that has an additional column.
This way, whenever data is added to the original table, the extra column will have the data you expect. There is no need to 'populate' the additional column.
If your fields are TEXT, you could use:
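A minimal sketch of such a view for the text case, assuming the table is called `my_table` and the view `my_table_with_ts` (both hypothetical names), that `dt` is a string like `2022/01/01`, and that `tm` is a two-digit hour string like `10`. The new column is named `ts` here only to avoid quoting the word `timestamp`:

```sql
-- Hypothetical names: my_table (the crawled table), my_table_with_ts (the view).
-- Assumes dt is a string such as '2022/01/01' and tm a string such as '10'.
CREATE OR REPLACE VIEW my_table_with_ts AS
SELECT
    *,
    -- Plain string concatenation: '2022/01/01' + ' ' + '10' + ':00:00'
    dt || ' ' || tm || ':00:00' AS ts
FROM my_table;
```

This yields a text value such as `2022/01/01 10:00:00`. If a real TIMESTAMP is preferred, something like `date_parse(dt || ' ' || tm, '%Y/%m/%d %H')` could be used in place of the concatenation, under the same assumptions about the string formats.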
If your fields are DATE and INT and you'd like a TIMESTAMP output, use:
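One possible way to write this, again assuming hypothetical names `my_table` and `my_table_with_ts`, with `dt` stored as a DATE and `tm` as an integer hour between 0 and 23:

```sql
-- Hypothetical names: my_table (the crawled table), my_table_with_ts (the view).
-- Assumes dt is a DATE and tm is an integer hour of the day.
CREATE OR REPLACE VIEW my_table_with_ts AS
SELECT
    *,
    -- Promote the date to midnight of that day, then add tm hours to it
    date_add('hour', tm, CAST(dt AS timestamp)) AS ts
FROM my_table;
```

Because the view is only a stored query, rows added by later crawler runs show up with `ts` computed on the fly.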
You can then access the table using the name of the View, such as:
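For example, assuming the view was created under the hypothetical name `my_table_with_ts` with a computed column `ts`, and that `dt` is a string partition column, a query might look like:

```sql
-- Query the view exactly like a table; my_table_with_ts and ts are assumed names.
SELECT dt, tm, ts
FROM my_table_with_ts
WHERE dt = '2022/01/01'   -- filters on the partition column still apply through the view
LIMIT 10;
```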