How do I define an AWS Glue schema for JSON sent from the Python SDK to Firehose?
I have this setup in mind:
Python SDK sends predefined JSON -> AWS Kinesis Firehose -> Firehose converts the data to Parquet using an AWS Glue schema -> the data is saved to S3 (whether the conversion succeeds or not).
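For reference, the sending side of that pipeline might look roughly like the sketch below. It is only an illustration: the stream name `my-delivery-stream`, the `details` and `tags` fields, and the sample values are hypothetical, and boto3 is imported lazily so the encoding helper can be tried without AWS credentials.

```python
import json


def encode_record(event: dict) -> dict:
    """Serialize one event as a newline-terminated JSON record for Firehose."""
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}


def send_event(event: dict, stream_name: str = "my-delivery-stream"):
    """Send one event to a Firehose delivery stream (stream name is a placeholder)."""
    import boto3  # imported here so encode_record works without boto3 installed

    client = boto3.client("firehose")
    return client.put_record(
        DeliveryStreamName=stream_name,
        Record=encode_record(event),
    )


# Hypothetical event with a nested struct and an array, mirroring the question.
sample_event = {
    "details": {"name": "example", "id": 1, "is_bla": True},
    "tags": ["a", "b"],
}
```

Calling `send_event(sample_event)` would then push one record, assuming the delivery stream exists and credentials are configured.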
While sending primitive types like strings, ints and booleans is easy, sending an array or struct isn't trivial at all.
I keep getting weird error messages like:
The schema is invalid. Error parsing the schema: Error: type expected
at the position 0 of 'STRUCT<name:STRING,id:BIGINT,is_bla:BOOLEAN>'
but 'STRUCT' is found.
OR
The schema is invalid. Error parsing the schema: Error: type expected
at the position 0 of 'ARRAY' but 'ARRAY' is found.
- Why am I getting those error messages?
- Is there proper documentation, or are there examples, for the schema data types?
I could only find this, which says ColumnType
should match the "Single-line string pattern".
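For illustration, here is a sketch of how such columns could be declared for a Glue table, assuming the Data Catalog expects Hive-style type strings written in lowercase with no spaces. The column names `details` and `tags` and the `array<string>` element type are hypothetical; only the struct fields come from the error message above.

```python
def build_columns() -> list:
    """Return a column list shaped like Glue's StorageDescriptor["Columns"]."""
    return [
        # Nested record: field names and types separated by ':' and ','.
        {"Name": "details", "Type": "struct<name:string,id:bigint,is_bla:boolean>"},
        # List of strings (the element type is an assumption).
        {"Name": "tags", "Type": "array<string>"},
        # Primitive types use the same lowercase spelling.
        {"Name": "is_bla", "Type": "boolean"},
    ]


columns = build_columns()
```

A list like this would go under `TableInput["StorageDescriptor"]["Columns"]` when the table is created or updated through the Glue API.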
I'll answer my own question:
There is some delay between saving the Glue schema and sending data to Firehose. The updated JSON records I sent were still handled with the old schema, hence the errors.
Also, going by this and that, we have to validate some naming conventions ourselves; it's quite unfortunate that AWS doesn't do it upon creation.
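Since that validation is left to us, a small pre-flight check before saving the schema can catch the obvious cases. The rules below are my own assumptions (lowercase snake_case column names, lowercase type strings without whitespace), not an official AWS rule set, and `find_problems` is a hypothetical helper.

```python
import re

# Assumed convention: lowercase letters, digits and underscores only.
NAME_PATTERN = re.compile(r"^[a-z_][a-z0-9_]*$")


def find_problems(columns: list) -> list:
    """Return human-readable problems found in a list of Glue-style columns."""
    problems = []
    for col in columns:
        name, col_type = col["Name"], col["Type"]
        if not NAME_PATTERN.match(name):
            problems.append(f"column name {name!r} is not lowercase snake_case")
        if col_type != col_type.lower() or any(ch.isspace() for ch in col_type):
            problems.append(
                f"type {col_type!r} of column {name!r} has uppercase or whitespace"
            )
    return problems


good = [{"Name": "is_bla", "Type": "boolean"}]
bad = [{"Name": "Is Bla", "Type": "STRUCT<id:BIGINT>"}]
```

Here `find_problems(good)` returns an empty list, while `find_problems(bad)` reports both the column name and the type. After saving a corrected schema, it is also worth waiting a little before sending new records, given the delay described above.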