如何将包含时间段_ltz的表转换为pyflink 1.15.0中的dataStream?
我使用pyflink 1.15.0的Kinesis连接器读取事件使用源表。 An example of the sorts of data that are in this stream is
请注意,数据流包含许多不同类型的事件,其中“详细信息”字段在不同的事件类型之间完全不同。使用PYFLINK DATASTREAM API对此连接器没有支持,因此我使用表API构建源表。该表看起来像这样:
CREATE TABLE events (
`id` VARCHAR,
`source` VARCHAR,
`account` VARCHAR,
`region` VARCHAR,
`detail-type` VARCHAR,
`detail` VARCHAR,
`source` VARCHAR,
`resources` VARCHAR,
`time` TIMESTAMP(0) WITH LOCAL TIME ZONE,
WATERMARK FOR `time` as `time` - INTERVAL '30' SECOND,
PRIMARY KEY (`id`) NOT ENFORCED
) WITH (
...
)
使用:
table_env.execute_sql(CREATE_STRING_ABOVE)
我想将此表转换为数据流,以便我可以执行一些在DataStream API中更容易执行的处理:
events_stream_table = table_env.from_path('events')
events_stream = table_env.to_data_stream(events_stream_table)
# now do some processing - let's filter by the type of event we get
codebuild_stream = events_stream.filter(
lambda event: event['source'] == 'aws.codebuild'
)
# now do other stuff on a stream containing only events that are identical in shape
...
# maybe convert back into a Table and perform SQL on the data
当我运行此功能时,我会得到一个例外:
org.apache.flink.table.api.tableException:未支撑的转换 数据类型“时间戳(6)与时区”(转换类别: Java.Time.OffsetDateTime)要输入信息。只有数据类型 起源于类型信息完全支持反向转换。
有人报告了类似的错误在这里。当我在那里尝试建议并用本地时区域 Timestamp(3)
timestamp(0)
我得到不同的例外:
TypeError:Java类型信息:不支持LocalDateTime pyflink当前。
是否有一种方法可以将此表转换为数据串联(然后再次返回)?我需要在“时间”字段中使用数据作为事件的水印来源。
I have a source table using a Kinesis connector reading events from AWS EventBridge using PyFlink 1.15.0. An example of the sorts of data that are in this stream is here.
Note that the stream of data contains many different types of events, where the 'detail' field is completely different between different event types. There is no support for this connector using PyFlink DataStream API, so I use the Table API to construct the source table. The table looks like this:
CREATE TABLE events (
`id` VARCHAR,
`source` VARCHAR,
`account` VARCHAR,
`region` VARCHAR,
`detail-type` VARCHAR,
`detail` VARCHAR,
`source` VARCHAR,
`resources` VARCHAR,
`time` TIMESTAMP(0) WITH LOCAL TIME ZONE,
WATERMARK FOR `time` as `time` - INTERVAL '30' SECOND,
PRIMARY KEY (`id`) NOT ENFORCED
) WITH (
...
)
The table was created using:
table_env.execute_sql(CREATE_STRING_ABOVE)
I'd like to turn this table into a data stream so I can perform some processing that is easier to do in the DataStream API:
events_stream_table = table_env.from_path('events')
events_stream = table_env.to_data_stream(events_stream_table)
# now do some processing - let's filter by the type of event we get
codebuild_stream = events_stream.filter(
lambda event: event['source'] == 'aws.codebuild'
)
# now do other stuff on a stream containing only events that are identical in shape
...
# maybe convert back into a Table and perform SQL on the data
When I run this, I get an exception:
org.apache.flink.table.api.TableException: Unsupported conversion from
data type 'TIMESTAMP(6) WITH TIME ZONE' (conversion class:
java.time.OffsetDateTime) to type information. Only data types that
originated from type information fully support a reverse conversion.
Somebody reported a similar error here. When I try the suggestion there and replace the TIMESTAMP(0) WITH LOCAL TIME ZONE
with a TIMESTAMP(3)
I get a different exception:
TypeError: The java type info: LocalDateTime is not supported in
PyFlink currently.
Is there a way of converting this Table into a DataStream (and then back again)? I need to use the data in the "time" field as the source of watermarks for my events.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论