UTC adjustment in ParquetSharp

Posted on 2025-01-26 14:01:41

I am trying to use the ParquetSharp library (https://github.com/G-Research/ParquetSharp) to write some Parquet files from a SQL Server database.
As the original data is not adjusted to UTC, I want the timestamp column in the output Parquet file to be "datetime64[ns]" rather than "datetime64[ns, UTC]".

I tried declaring the DateTime column with the argument LogicalType.Timestamp(isAdjustedToUtc: false), but this causes an exception because the original data is of type DateTime.


columns[i] = new Column(typeof(DateTime?), cdt.Schema[i].Name, LogicalType.Timestamp(isAdjustedToUtc: false, timeUnit: TimeUnit.Nanos));

I then tried using LogicalWriterOverride, as suggested in the documentation:

using (var dtWriter = groupWriter.NextColumn().LogicalWriterOverride<DateTime?>())
{
   dtWriter.WriteBatch(dtArray);
}

But a NotSupportedException gets thrown:

System.NotSupportedException: 'unsupported logical system type System.Nullable`1[System.DateTime] with logical type Timestamp(isAdjustedToUTC=false, timeUnit=nanoseconds, is_from_converted_type=false, force_set_converted_type=false)'

Edit: I found a partial solution that does not conflict with the original data. In LogicalType.Timestamp(isAdjustedToUtc: false, timeUnit: TimeUnit.Nanos) I changed TimeUnit.Nanos to TimeUnit.Micros, because the DateTime type does not support Nanos precision. I then write the column through NextColumn().LogicalWriterOverride<DateTime?>(). The generated Parquet file no longer has the UTC adjustment, and the precision is microseconds.
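
For reference, the workaround described in the edit might look roughly like the sketch below when put together. It is not a definitive implementation: the column name "created_at", the output path "example.parquet" and the sample values in dtArray are made up for illustration, and it assumes a ParquetSharp version that accepts DateTime? with a non-UTC microsecond timestamp, as reported above.

using System;
using ParquetSharp;

// Hypothetical sample data; in the question this comes from SQL Server.
DateTime?[] dtArray =
{
    new DateTime(2025, 1, 26, 14, 1, 41),
    null,
};

var columns = new Column[]
{
    // TimeUnit.Micros instead of TimeUnit.Nanos; isAdjustedToUtc: false
    // marks the values as not adjusted to UTC.
    new Column(typeof(DateTime?), "created_at",
        LogicalType.Timestamp(isAdjustedToUtc: false, timeUnit: TimeUnit.Micros)),
};

using var file = new ParquetFileWriter("example.parquet", columns);
using (var groupWriter = file.AppendRowGroup())
{
    // The override tells ParquetSharp which .NET element type is being written.
    using var dtWriter = groupWriter.NextColumn().LogicalWriterOverride<DateTime?>();
    dtWriter.WriteBatch(dtArray);
}
file.Close();

The only changes from the failing version are the time unit and the use of the override writer; everything else is ordinary ParquetSharp file-writing boilerplate.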
