UTC adjustment in ParquetSharp
I am trying to use the ParquetSharp library (https://github.com/G-Research/ParquetSharp) to write some Parquet files from a SQL Server database.
As the original data is not adjusted to UTC, I want the timestamps in the output Parquet file to be "datetime64[ns]" rather than "datetime64[ns, UTC]".
I tried declaring the column with the argument LogicalType.Timestamp(isAdjustedToUtc: false), but this causes an exception because the original data is of type DateTime.
columns[i] = new Column(typeof(DateTime?), cdt.Schema[i].Name, LogicalType.Timestamp(isAdjustedToUtc: false, timeUnit: TimeUnit.Nanos));
I then tried LogicalWriterOverride, as suggested in the documentation:
using (var dtWriter = groupWriter.NextColumn().LogicalWriterOverride<DateTime?>())
{
    dtWriter.WriteBatch(dtArray);
}
But a NotSupportedException gets thrown:
System.NotSupportedException: 'unsupported logical system type System.Nullable`1[System.DateTime] with logical type Timestamp(isAdjustedToUTC=false, timeUnit=nanoseconds, is_from_converted_type=false, force_set_converted_type=false)'
Edit: I found a workaround that does not conflict with the original data. In LogicalType.Timestamp(isAdjustedToUtc: false, timeUnit: TimeUnit.Nanos) I changed the time unit to TimeUnit.Micros, because the DateTime type does not support nanosecond precision, and then wrote the column through NextColumn().LogicalWriterOverride<DateTime?>(). The generated Parquet file no longer has the UTC adjustment, and the precision is in microseconds.
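For anyone who wants the pieces in one place, here is a minimal sketch of the workaround described in the edit. It is not taken from my actual project: the sample values, the column name "created_at" and the output path "example.parquet" are made up for illustration, and I am assuming the usual ParquetFileWriter / AppendRowGroup pattern from the ParquetSharp README. Only the Column declaration and the LogicalWriterOverride call come from the code above.

```csharp
using System;
using ParquetSharp;

class Program
{
    static void Main()
    {
        // Hypothetical sample data standing in for the rows read from SQL Server.
        DateTime?[] dtArray =
        {
            new DateTime(2023, 1, 1, 12, 0, 0),
            null,
            new DateTime(2023, 1, 2, 8, 30, 0),
        };

        // Declare the column as a timestamp that is NOT adjusted to UTC.
        // Micros is used instead of Nanos because Nanos with DateTime? threw
        // the NotSupportedException shown in the question.
        var columns = new Column[]
        {
            new Column(
                typeof(DateTime?),
                "created_at", // hypothetical column name
                LogicalType.Timestamp(isAdjustedToUtc: false, timeUnit: TimeUnit.Micros)),
        };

        // "example.parquet" is a placeholder output path.
        using var fileWriter = new ParquetFileWriter("example.parquet", columns);
        using var groupWriter = fileWriter.AppendRowGroup();

        // Write the nullable DateTime values through the overridden logical writer.
        using (var dtWriter = groupWriter.NextColumn().LogicalWriterOverride<DateTime?>())
        {
            dtWriter.WriteBatch(dtArray);
        }

        fileWriter.Close();
    }
}
```

With this layout the only changes from my original attempt are the time unit (Micros instead of Nanos) and keeping the LogicalWriterOverride<DateTime?>() call.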