How do I handle NaN values when writing Parquet in Go?

Posted 2025-02-09 13:21:04

I am trying to write to a Parquet file in Go. While writing to this file, I can get NaN values. Since NaN is defined neither in the primitive types nor in the logical types, how do I handle this value in Go? Does any existing schema work for it?

I am using the Parquet Go library from here. You can find an example of code that uses a JSON schema to write Parquet with this library here.

Answer by 安稳善良, 2025-02-16 13:21:04

The issue was discussed at length in xitongsys/parquet-go issue 281, with the recommendation being to

use the OPTIONAL type.
Even if you don't assign a value (as in your code), a non-pointer field is assigned its default value.
So parquet-go cannot tell whether it is null or the default value.

However:

What it comes down to is that I cannot use the OPTIONAL type; in other words, I cannot convert my structure to use pointers.
I have tried to use repetitiontype=OPTIONAL as a tag, but this leads to some weird behavior.
I would expect that tag to behave the same way as the omitempty tag in the Go standard library's encoding/json, i.e. if the value is empty, it is not written to the JSON.
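If the domain struct must stay pointer-free, one possible workaround (not something the library or the issue provides) is to keep pointers only in a separate write-side row type and convert at the boundary, applying omitempty-like rules there. A minimal sketch with hypothetical names salaryRow, omitEmpty, and toRow, mirroring the Salary type from the example: a zero field becomes nil (null), anything else is kept. Like json's omitempty, this treats an explicit 0 as absent, so by itself it does not resolve the 0-versus-unset ambiguity. NaN is not equal to 0, so it is kept; a math.IsNaN check would be needed too if NaN should become null.

```go
package main

import "fmt"

// Salary is the caller's pointer-free struct, left unchanged.
type Salary struct {
	Basic, HRA, TA float64
}

// salaryRow is a hypothetical write-side mirror that uses pointers so each
// field can be an OPTIONAL column; a nil field would be written as null.
type salaryRow struct {
	Basic, HRA, TA *float64
}

// omitEmpty returns nil for the zero value, mimicking json's omitempty.
func omitEmpty(v float64) *float64 {
	if v == 0 {
		return nil
	}
	return &v
}

// toRow converts at the write boundary, so only salaryRow needs pointers.
func toRow(s Salary) salaryRow {
	return salaryRow{
		Basic: omitEmpty(s.Basic),
		HRA:   omitEmpty(s.HRA),
		TA:    omitEmpty(s.TA),
	}
}

func main() {
	r := toRow(Salary{Basic: 15000})
	fmt.Println(*r.Basic, r.HRA == nil, r.TA == nil) // 15000 true true
}
```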

The reason this is important is that if a field is missing or not set, then once it is encoded to Parquet there is no way of telling whether an int64 value was 0 or simply not set.

This illustrates the issue:

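package main

import (
    "encoding/json"
    "log"
    "os"
)

// Salary: with omitempty, zero-valued fields are left out of the JSON.
type Salary struct {
    Basic, HRA, TA float64 `json:",omitempty"`
}

// Employee: Age has no omitempty tag, so it is always emitted, even as 0.
type Employee struct {
    FirstName, LastName, Email string `json:",omitempty"`
    Age                        int
    MonthlySalary              []Salary `json:",omitempty"`
}

func main() {
    data := Employee{
        Email: "[email protected]",
        MonthlySalary: []Salary{
            {
                Basic: 15000.00,
            },
        },
    }

    file, err := json.MarshalIndent(data, "", " ")
    if err != nil {
        log.Fatal(err)
    }

    // os.WriteFile replaces the deprecated ioutil.WriteFile (Go 1.16+).
    if err := os.WriteFile("test.json", file, 0o644); err != nil {
        log.Fatal(err)
    }
}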

This produces the following JSON:

{
 "Email": "[email protected]",
 "Age": 0,
 "MonthlySalary": [
  {
   "Basic": 15000
  }
 ]
}

As you can see, the struct fields that have the omitempty tag and are not assigned (HRA and TA, as well as FirstName and LastName) do not appear in the JSON.
Age, on the other hand, does not have this tag, so it is still included in the JSON (as 0).

This is problematic because this Go library allocates memory for every field in the struct when writing to Parquet, so a large struct that is only sparsely populated still takes the full amount of memory.
It is an even bigger problem when the file is read back, because there is no way of knowing whether a value in the Parquet file was the empty value or was simply never assigned.

I am happy to help implement an omitempty tag for this library if I can convince you of the value of having it.

That echoes issue 403 "No option to omitempty when not using pointers".
