Athena and forward slash interaction
I've set up a flow involving Glue, S3 and Athena.
The files are CSV with quotation marks surrounding each value, for example:
"1","London/,United Kingdom","","TRUE"
"1","United Kingdom","mid","FALSE"
Col A is a bigint; the rest are strings.
When querying Athena, the results appear as:
Col A | Col B | Col C | Col D |
---|---|---|---|
1 | London | United Kingdom | |
1 | United Kingdom | Mid | FALSE |
This presumably comes down to either the forward slash causing Glue or Athena to ignore the quotation marks, or the comma inside the quoted value, which shifts all the results over by one column. How do I get around this? I really don't want to change my ETL (I've been considering moving the data to DynamoDB), as Athena fits every other requirement. I'm hoping there is a setting that controls how Athena interprets slashes.
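To make the shift concrete, here is a minimal Python sketch that compares a plain split on the delimiter with a quote-aware parse of the two sample rows. It is only an illustration of the symptom: the helper names `naive_split` and `quote_aware_split` are made up for this example, and it is an assumption that a delimiter-only SerDe configuration behaves like the plain split.

```python
import csv

# The two sample rows from the question, exactly as they sit in S3.
rows = [
    '"1","London/,United Kingdom","","TRUE"',
    '"1","United Kingdom","mid","FALSE"',
]


def naive_split(line, delim=","):
    """Split on every delimiter, ignoring quoting (hypothetical helper)."""
    return [field.strip('"') for field in line.split(delim)]


def quote_aware_split(line, delim=","):
    """Split on delimiters, but keep quoted values whole (hypothetical helper)."""
    return next(csv.reader([line], delimiter=delim, quotechar='"'))


naive = [naive_split(r) for r in rows]
aware = [quote_aware_split(r) for r in rows]

for n, a in zip(naive, aware):
    print(len(n), n)
    print(len(a), a)
```

With the plain split, the first row yields five fields instead of four, so a four-column table would show "United Kingdom" in Col C and lose the last value. The forward slash plays no part in this sketch; only the comma inside the quoted value matters.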
My SerDe serialization lib is: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
SerDe parameters:
field.delim ,
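For comparison, a quote-aware alternative to the configuration above is the OpenCSVSerde, which takes `separatorChar` and `quoteChar` properties. The Python sketch below only builds a DDL string for such a table. The table name, column names, and S3 location are hypothetical placeholders, and declaring every column as `string` is a conservative assumption rather than a requirement; the statement has not been run against the data in the question.

```python
def build_ddl(table, location):
    """Return a CREATE EXTERNAL TABLE statement using OpenCSVSerde.

    `table` and `location` are hypothetical placeholders; the column
    names and string types are assumptions made for this sketch.
    """
    return f"""CREATE EXTERNAL TABLE {table} (
  col_a string,
  col_b string,
  col_c string,
  col_d string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar' = '"'
)
LOCATION '{location}'"""


# Hypothetical names, used only to show the generated statement.
ddl = build_ddl("my_db.my_table", "s3://my-bucket/my-prefix/")
print(ddl)
```

If the table is managed by a Glue crawler, the same SerDe library and parameters would need to be set on the table definition there instead of through DDL.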