Serializing a complex AST in Haskell
I'm using a library in Haskell which has this very, very complex recursive data structure that represents an AST. It contains dozens of different constructors, some with simply recursive definitions, some with mutually recursive definitions, and it's all around nasty.
I want to be able to serialize this giant recursive monster into a JSON string, and then be able to deserialize it. It's a data class, so I feel I should be able to just have some sort of generic function that turns it into a giant human-readable string in JSON format. I really, really want to avoid writing custom serialization logic for its 80+ constructors.
Is this even possible?
To clarify, I'm trying to serialize this data structure, which is part of the official GHC API. I'm aware pretty-printing gives me a string but I'd really like this as a JSON structure.
EDIT: The class is too complex for Generic to create a suitable ToJSON and FromJSON, unless I'm missing something.
Comments (2)
The only reasonable approach will be to use standalone deriving clauses to derive `Generic` instances for (most of) the types involved, and generate as many `FromJSON`/`ToJSON` instances as possible using the `Generic`-based defaults.

I started fiddling with it, and I saw no insurmountable technical barriers, but the amount of boilerplate required is non-trivial. You'll need a boatload of `Generic` instances. You may also need to work with a modified copy of the `ghc-lib` source, because some types (e.g., `TyCon`) are not exported with their constructors, preventing derivation of the instances.

Overall, the `Generic` instances aren't so bad because most can be derived polymorphically in the phase.

The `FromJSON` and `ToJSON` instances are a little more difficult. The phase parameter is used, via type families, to change the types in parts of the tree, so a polymorphic instance will start demanding a lot of type family instances, like `instance FromJSON (XWrap p)` and a few dozen others. You can't supply these polymorphically, because they're type families, and that's not supported by GHC. I think the best approach is to define instances for each needed phase, and since there are some inter-phase dependencies, you'll need to define instances for multiple phases, even if you're only trying to serialize for one phase.
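To make the shape of this concrete, here is a minimal sketch on a toy phase-indexed tree rather than the real GHC AST. Everything in it (`Expr`, `XLit`, `Parsed`, `Renamed`) is a hypothetical stand-in invented for illustration; it assumes the `aeson` package is available. `Generic` is derived once for every phase, while the JSON instances are written per phase so the type family can reduce to a concrete type.

```haskell
{-# LANGUAGE DeriveGeneric      #-}
{-# LANGUAGE FlexibleInstances  #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE TypeFamilies       #-}

module Main where

import Data.Aeson   (FromJSON, ToJSON, decode, encode)
import GHC.Generics (Generic)

-- Hypothetical phase markers, standing in for GHC's compiler phases.
data Parsed
data Renamed

-- Hypothetical extension-field family, playing the role of XWrap and friends.
type family XLit p
type instance XLit Parsed  = ()
type instance XLit Renamed = Int

-- A toy AST whose field types depend on the phase.
data Expr p
  = Lit (XLit p) Integer
  | Add (Expr p) (Expr p)

-- Generic needs no constraints on the fields, so one instance covers all phases.
deriving instance Generic (Expr p)

-- Eq and Show need instances for XLit p, so they are derived per phase.
deriving instance Eq   (Expr Parsed)
deriving instance Show (Expr Parsed)

-- JSON instances are also per phase: XLit Parsed reduces to (), XLit Renamed
-- to Int, and both already have aeson instances. The empty bodies pick up
-- aeson's Generic-based default methods.
instance ToJSON   (Expr Parsed)
instance FromJSON (Expr Parsed)
instance ToJSON   (Expr Renamed)
instance FromJSON (Expr Renamed)

-- True when a sample value survives an encode/decode round trip.
roundTrips :: Bool
roundTrips = decode (encode sample) == Just sample
  where
    sample :: Expr Parsed
    sample = Add (Lit () 1) (Lit () 2)

main :: IO ()
main = do
  print (encode (Add (Lit () 1) (Lit () 2) :: Expr Parsed))
  print roundTrips
```

In the real AST the same pattern repeats for every type and every extension family, which is where the volume of boilerplate comes from.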
From there, it's a matter of following the trail of compiler error messages about missing instances and filling them all in. A few keyboard macros in your editor of choice should ease the pain.
You'll eventually get down to some leaf types that probably shouldn't be serialized generically. For example, `FastString` is a string stored in a common hash table for fast comparison, and you'll want/need to serialize and deserialize it manually (or deal with reconstructing the hash table on the deserialized end).

Anyway, I stopped after around 35 `Generic` instances and 50 `FromJSON` instances, and I figure I was only about a quarter done at that point. On the other hand, that took me less than an hour, so I think it's doable with a day or two of tedious work.

Here's what I had before I lost interest. About half of the `FromJSON` instances typecheck; the rest are still demanding instances. I was using GHC 8.10.7, though, so the module names and types probably won't match yours.
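For a leaf type like `FastString`, hand-written instances could look roughly like the sketch below. It uses a hypothetical `Interned` newtype instead of the real type so that it compiles without the GHC API; with the real `FastString`, the conversion functions would be `unpackFS` and `mkFastString`. It assumes the `aeson` package.

```haskell
module Main where

import Data.Aeson (FromJSON (..), ToJSON (..), decode, encode)

-- Hypothetical stand-in for an interned string type such as FastString.
newtype Interned = Interned String
  deriving (Eq, Show)

-- Hypothetical constructor; a real interning table would be consulted here.
mkInterned :: String -> Interned
mkInterned = Interned

-- Hypothetical accessor returning the plain string contents.
unpackInterned :: Interned -> String
unpackInterned (Interned s) = s

-- Serialize as a plain JSON string instead of exposing internal structure.
instance ToJSON Interned where
  toJSON = toJSON . unpackInterned

-- Deserialize by parsing a JSON string and re-interning it.
instance FromJSON Interned where
  parseJSON v = mkInterned <$> parseJSON v

-- True when a sample value survives an encode/decode round trip.
roundTrips :: Bool
roundTrips = decode (encode sample) == Just sample
  where
    sample = mkInterned "foo"

main :: IO ()
main = print roundTrips
```

Going through the constructor function on the way back in means the hash table is rebuilt as a side effect of deserialization rather than being serialized itself.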
The "Scrap-your-boilerplate" (syb) library has `gshow` and `gread` functions that can write out and read back values of most types with a `Data` instance in Haskell, with the exception of types with private fields or constructors.
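A small sketch of that round trip on a toy type, assuming the `syb` package is installed; the `Expr` type is made up for illustration. Note that the text `gshow` produces is a parenthesized Haskell-like term, not JSON.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}

module Main where

import Data.Data          (Data)
import Data.Generics.Text (gread, gshow)

-- A toy recursive type; any type with a Data instance would do.
data Expr
  = Lit Int
  | Add Expr Expr
  | Neg Expr
  deriving (Data, Eq, Show)

-- A sample value to round-trip.
sample :: Expr
sample = Add (Lit 1) (Neg (Lit 2))

-- True when gread recovers exactly the value that gshow printed.
roundTrips :: Bool
roundTrips =
  case gread (gshow sample) of
    [(e, _)] -> e == sample
    _        -> False

main :: IO ()
main = do
  putStrLn (gshow sample)
  print roundTrips
```

For the GHC AST this only helps where `Data` instances exist and the constructors involved are exported.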