Serializing a complex AST in Haskell

Posted 2025-01-12 14:31:01

I'm using a library in Haskell which has this very, very complex recursive data structure that represents an AST. It contains dozens of different constructors, some with simply recursive definitions, some with mutually recursive definitions, and it's all around nasty.

I want to be able to serialize this giant recursive monster into a JSON string, and then be able to deserialize it. It's a data class, so I feel I should be able to just have some sort of generic function that turns it into a giant human-readable string in JSON format. I really, really want to avoid writing custom serialization logic for its 80+ constructors.

Is this even possible?

To clarify, I'm trying to serialize this data structure, which is part of the official GHC API. I'm aware pretty-printing gives me a string but I'd really like this as a JSON structure.

EDIT: The class is too complex for Generic to create a suitable ToJSON and FromJSON, unless I'm missing something.

Comments (2)

万劫不复 2025-01-19 14:31:01

The only reasonable approach will be to use standalone deriving clauses to derive Generic instances for (most of) the types involved, and to generate as many FromJSON/ToJSON instances as possible from aeson's Generic-based default implementations.

I started fiddling with it, and I saw no insurmountable technical barriers, but the amount of boilerplate required is non-trivial. You'll need a boatload of Generic instances. You may also need to work with a modified copy of the ghc-lib source, because some types (e.g., TyCon) are not exported with their constructors, preventing derivation of the instances.

Overall, the Generic instances aren't so bad because most can be derived polymorphically in the phase parameter:

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE UndecidableInstances #-}

import BasicTypes
import CoAxiom
-- etc. --

import GHC.Generics

deriving instance Generic (AnnDecl p)
deriving instance Generic (AnnProvenance p)
deriving instance Generic (Branches br)
deriving instance Generic (CoAxiom br)
deriving instance Generic (ForeignDecl p)
deriving instance Generic (GenLocated l e)
deriving instance Generic (HsBracket p)
deriving instance Generic (HsExpr p)
-- etc. --

The FromJSON and ToJSON instances are a little more difficult. The phase parameter is used, via type families, to change the types in parts of the tree, so a polymorphic instance:

import Data.Aeson

instance FromJSON (HsExpr p)

will start demanding a lot of type family instances, like instance FromJSON (XWrap p) and a few dozen others. You can't supply these polymorphically:

instance FromJSON (XWrap p)  -- Illegal type synonym family application

because XWrap and its siblings are type families, and GHC does not allow a type family application in an instance head. I think the best approach is to define instances for each needed phase, and since there are some inter-phase dependencies, you'll need to define instances for multiple phases, even if you're only trying to serialize for one phase. So:

instance FromJSON (HsExpr GhcTc)
instance FromJSON (HsExpr GhcRn)
-- etc. --

From there, it's a matter of following the trail of compiler error messages about missing instances and filling them all in. A few keyboard macros in your editor of choice should ease the pain.
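The per-phase pattern can be seen in miniature without the GHC API. The following is only a sketch on a toy AST: `Renamed`, `Typechecked`, `XLit` and `Expr` are hypothetical stand-ins for `GhcRn`, `GhcTc`, an extension family such as `XWrap`, and `HsExpr`. It assumes the aeson package is available.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE TypeFamilies #-}

module Main where

import Data.Aeson (FromJSON, ToJSON, decode, encode)
import GHC.Generics (Generic)

-- Hypothetical phase tags, standing in for GhcRn and GhcTc.
data Renamed
data Typechecked

-- Hypothetical extension field, standing in for families such as XWrap.
type family XLit p
type instance XLit Renamed     = ()
type instance XLit Typechecked = String

-- A toy AST whose Lit constructor carries a phase-dependent field.
data Expr p
  = Lit (XLit p) Int
  | Add (Expr p) (Expr p)

-- Generic can be derived once, for every phase.
deriving instance Generic (Expr p)

-- The JSON instances are given per phase, so that XLit p reduces to a
-- concrete type that already has ToJSON/FromJSON instances.
instance ToJSON   (Expr Renamed)
instance FromJSON (Expr Renamed)
instance ToJSON   (Expr Typechecked)
instance FromJSON (Expr Typechecked)

sample :: Expr Typechecked
sample = Add (Lit "Int" 1) (Lit "Int" 2)

-- True when decoding the encoded sample and re-encoding it gives the same bytes.
roundTrips :: Bool
roundTrips = fmap encode decoded == Just bytes
  where
    bytes   = encode sample
    decoded = decode bytes :: Maybe (Expr Typechecked)

main :: IO ()
main = print roundTrips
```

The empty instance bodies rely on aeson's Generic defaults, exactly as in the GHC instances above. The difference is only scale: here one family instance per phase is enough, whereas the real AST has dozens.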

You'll eventually get down to some leaf types that probably shouldn't be serialized generically. For example, FastString is a string stored in a common hash table for fast comparison, and you'll want/need to serialize and deserialize it manually (or deal with reconstructing the hash table on the deserialized end).
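For FastString specifically, one plausible manual instance goes through plain String. This is a sketch, not a tested part of the module below: it assumes the GHC 8.10 module name `FastString` (newer compilers moved it to `GHC.Data.FastString`) and would take the place of the `parseJSON = undefined` stub in the full listing. `roundTrip` is a hypothetical helper added only for checking.

```haskell
{-# OPTIONS_GHC -Wno-orphans #-}

module FastStringJSON where

import Data.Aeson (FromJSON (..), ToJSON (..), decode, encode)
import FastString (FastString, mkFastString, unpackFS)

-- Serialize the contents of the interned string as an ordinary JSON string.
instance ToJSON FastString where
  toJSON = toJSON . unpackFS

-- Parse a JSON string and re-intern it with mkFastString, so the result
-- lives in the current process's FastString table.
instance FromJSON FastString where
  parseJSON v = mkFastString <$> parseJSON v

-- Hypothetical helper: does a string survive encode followed by decode?
roundTrip :: String -> Bool
roundTrip s = decode (encode fs) == Just fs
  where
    fs = mkFastString s
```

Going through `mkFastString` on the way in is what sidesteps the hash-table problem: nothing about the table is stored in the JSON, and each string is simply interned again when it is read.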

Anyway, I stopped after around 35 Generic instances and 50 FromJSON instances, and I figure I was only about a quarter done at that point. On the other hand, that took me less than an hour, so I think it's doable with a day or two of tedious work.

Here's what I had before I lost interest. About half of the FromJSON instances typecheck; the rest are still demanding instances. I was using GHC 8.10.7, though, so the module names and types probably won't match yours.

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE TemplateHaskell #-}

module MyModule where

import BasicTypes
import CoAxiom
import FastString
import GHC.Hs
import GHC.Hs.Extension
import Name
import SrcLoc
import TyCoRep
import TyCon
import Unique
import UniqSet
import Var
import qualified Data.Array as Array

import GHC.Generics
import Data.Aeson

deriving instance Generic (AnnDecl p)
deriving instance Generic (AnnProvenance p)
deriving instance Generic (Branches br)
deriving instance Generic (CoAxiom br)
deriving instance Generic (ForeignDecl p)
deriving instance Generic (GenLocated l e)
deriving instance Generic (HsBracket p)
deriving instance Generic (HsExpr p)
deriving instance Generic (HsGroup p)
deriving instance Generic (HsImplicitBndrs p (LHsType p))
deriving instance Generic (HsRecField' id arg)
deriving instance Generic (HsSplice p)
deriving instance Generic (HsType p)
deriving instance Generic (HsWildCardBndrs p (LHsType p))
deriving instance Generic (Match p (LHsExpr p))
deriving instance Generic (MatchGroup p (LHsExpr p))
deriving instance Generic (RuleDecls p)
deriving instance Generic (StmtLR p p (LHsExpr p))
deriving instance Generic (VarBndr var argf)
deriving instance Generic (WarnDecl p)
deriving instance Generic (WarnDecls p)
deriving instance Generic AnonArgFlag
deriving instance Generic ArgFlag
deriving instance Generic CoAxBranch
deriving instance Generic Coercion
deriving instance Generic ForeignImport
deriving instance Generic NoExtCon
deriving instance Generic NoExtField
deriving instance Generic Role
deriving instance Generic SourceText
deriving instance Generic SrcSpan
deriving instance Generic StringLiteral
deriving instance Generic TyLit
deriving instance Generic Type
deriving instance Generic WarningTxt

instance (FromJSON l, FromJSON e) => FromJSON (GenLocated l e)
instance FromJSON (AnnDecl GhcTc)
instance FromJSON (AnnProvenance Var)
instance FromJSON (Branches br)
instance FromJSON (CoAxiom Branched)
instance FromJSON (ConDeclField GhcRn)
instance FromJSON (ConDeclField GhcTc)
instance FromJSON (ForeignDecl GhcTc)
instance FromJSON (GRHS GhcTc (LHsExpr GhcTc))
instance FromJSON (HsBracket GhcRn)
instance FromJSON (HsBracket GhcTc)
instance FromJSON (HsExpr GhcRn)
instance FromJSON (HsExpr GhcTc)
instance FromJSON (HsGroup GhcRn)
instance FromJSON (HsGroup GhcTc)
instance FromJSON (HsImplicitBndrs GhcTc (LHsExpr GhcTc))
instance FromJSON (HsImplicitBndrs GhcTc (LHsType GhcTc))
instance FromJSON (HsLocalBindsLR GhcTc GhcTc)
instance FromJSON (HsRecField' (AmbiguousFieldOcc GhcTc) (LHsExpr GhcTc))
instance FromJSON (HsRecFields GhcTc (LHsExpr GhcTc))
instance FromJSON (HsSplice GhcTc)
instance FromJSON (HsTyVarBndr GhcRn)
instance FromJSON (HsTyVarBndr GhcTc)
instance FromJSON (HsType GhcRn)
instance FromJSON (HsType GhcTc)
instance FromJSON (HsValBindsLR GhcTc GhcTc)
instance FromJSON (HsWildCardBndrs GhcRn (LHsSigType GhcRn))
instance FromJSON (HsWildCardBndrs GhcRn (LHsType GhcRn))
instance FromJSON (Match GhcTc (LHsExpr GhcTc))
instance FromJSON (MatchGroup GhcTc (LHsExpr GhcTc))
instance FromJSON (RuleDecls GhcRn)
instance FromJSON (RuleDecls GhcTc)
instance FromJSON (StmtLR GhcRn GhcRn (LHsExpr GhcRn))
instance FromJSON (StmtLR GhcTc GhcTc (LHsExpr GhcTc))
instance FromJSON (VarBndr TyCoVar ArgFlag)
instance FromJSON (WarnDecl GhcTc)
instance FromJSON (WarnDecls GhcTc)
instance FromJSON AnonArgFlag
instance FromJSON ArgFlag
instance FromJSON CoAxBranch
instance FromJSON Coercion
instance FromJSON ForeignImport
instance FromJSON NoExtField
instance FromJSON Role
instance FromJSON SourceText
instance FromJSON SrcSpan
instance FromJSON StringLiteral
instance FromJSON TyLit
instance FromJSON Type
instance FromJSON WarningTxt

-- Non-generic instances, a mixture of:
-- 1. Those that shouldn't be derived generically (e.g., FastString)
-- 2. Those that will need access to the constructors (e.g., TyCon)
instance FromJSON RealSrcSpan where parseJSON = undefined
instance FromJSON FastString where parseJSON = undefined
instance FromJSON a => FromJSON (UniqSet a) where parseJSON = undefined
instance FromJSON Var where parseJSON = undefined
instance FromJSON NoExtCon where parseJSON = undefined
instance (FromJSON i, FromJSON e) => FromJSON (Array.Array i e) where parseJSON = undefined
instance FromJSON TyCon where parseJSON = undefined
instance FromJSON Unique where parseJSON = undefined
instance FromJSON Name where parseJSON = undefined

初懵 2025-01-19 14:31:01

The "Scrap-your-boilerplate" (syb) library has "gshow" and "gread" functions that can show and read back most types with a Data instance in Haskell, with the exception of types with private fields or constructors.
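A minimal sketch of that round trip, assuming the syb package is installed and using a hypothetical toy `Expr` type rather than the GHC AST. Note that the text `gshow` produces is a parenthesised constructor syntax, not JSON, so this covers the serialize/deserialize part of the question but not the JSON part.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}

module Main where

import Data.Data (Data)
import Data.Generics.Text (gread, gshow)

-- Hypothetical toy AST; any type with a Data instance is handled the same way.
data Expr
  = Lit Int
  | Add Expr Expr
  deriving (Eq, Show, Data)

sample :: Expr
sample = Add (Lit 1) (Lit 2)

-- Textual form produced by gshow (constructor applications in parentheses).
rendered :: String
rendered = gshow sample

-- Parse the text back; gread has the ReadS shape, so it returns a list of parses.
reparsed :: Maybe Expr
reparsed = case gread rendered of
  [(e, _)] -> Just e
  _        -> Nothing

main :: IO ()
main = do
  putStrLn rendered
  print (reparsed == Just sample)
```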
