Athena and forward-slash interaction

Posted on 2025-02-08 20:49:56

I've set up a flow involving Glue, S3 and Athena.

The files are CSV, with quotation marks surrounding each value. Sample:

"1","London/,United Kingdom","","TRUE"
"1","United Kingdom","mid","FALSE"

Col A is a bigint; the rest are strings.

When querying Athena, the results appear as:

Col A	Col B	Col C	Col D
1	London	United Kingdom
1	United Kingdom	Mid	FALSE

This apparently comes down to either the forward slash causing Glue or Athena to ignore the quotation marks, or the comma inside the quoted value, which shifts all the results over by one column. My question is: how do I get around this? I really don't want to change my ETL (I've been considering moving the data to DynamoDB), as Athena fits every other requirement. I'm hoping there is a setting that controls how Athena interprets slashes.

My SerDe serialization lib is: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

SerDe parameters:
field.delim ,
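
To illustrate the shift, here is a minimal Python sketch that compares splitting on every comma with quote-aware CSV parsing. It is only an approximation of delimiter-only behaviour, not the SerDe's actual code; the helper names `split_on_delimiter_only` and `split_quote_aware` and the `NUM_COLUMNS` constant are made up for this example.

```python
import csv
import io

# The two sample lines from above.
SAMPLE = (
    '"1","London/,United Kingdom","","TRUE"\n'
    '"1","United Kingdom","mid","FALSE"\n'
)
NUM_COLUMNS = 4  # Col A..Col D (assumed table width)


def split_on_delimiter_only(line, delim=","):
    """Hypothetical stand-in for delimiter-only parsing: split on every
    delimiter, ignore quotes, and keep only the first NUM_COLUMNS fields."""
    return line.split(delim)[:NUM_COLUMNS]


def split_quote_aware(text):
    """Parse the same text with Python's csv module, which honours quotes."""
    return list(csv.reader(io.StringIO(text)))


naive_rows = [split_on_delimiter_only(line) for line in SAMPLE.splitlines()]
quoted_rows = split_quote_aware(SAMPLE)

for label, rows in (("delimiter only", naive_rows), ("quote aware", quoted_rows)):
    print(label)
    for row in rows:
        print("   ", row)
```

In the delimiter-only case the comma inside "London/,United Kingdom" produces an extra field, so every later value moves one column to the right and the last one falls off the end. The quote-aware parse keeps four fields per row, which suggests the comma inside the quotes, rather than the slash, is what causes the shift.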
