使用 Haskell 状态 monad 有代码味道吗？

发布于 2024-07-14 02:07:11 字数 609 浏览 14 评论 0原文

天哪，我讨厌“代码味道”这个词，但我想不出更准确的词了。

我正在设计一种高级语言& 闲暇时将编译器转换为 Whitespace 来学习编译器构造、语言设计和函数式编程（编译器是用 Haskell 编写的）。

在编译器的代码生成阶段，我必须在遍历语法树时维护“状态”数据。例如，在编译流程控制语句时，我需要为要跳转到的标签生成唯一名称（从传入、更新和返回的计数器生成的标签，并且计数器的旧值绝不能再次使用）。另一个例子是，当我在语法树中遇到内联字符串文字时，它们需要永久转换为堆变量（在空白中，字符串最好存储在堆上）。我目前正在将整个代码生成模块包装在状态单子中来处理这个问题。

有人告诉我，编写编译器是一个非常适合函数范式的问题，但我发现我的设计方式与用 C 语言设计它的方式大致相同（你真的可以用任何语言编写 C 语言 - 甚至带状态单子的 Haskell）。

我想学习如何在 Haskell 中思考（而不是在函数范式中），而不是使用 Haskell 语法的 C 语言。我真的应该尝试消除/最小化状态单子的使用，还是它是一个合法的功能“设计模式”？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

鹤舞 2024-07-21 02:07:11

我已经用 Haskell 编写了多个编译器，状态 monad 是许多编译器问题的合理解决方案。但你想让它保持抽象——不要让你明显地看出你正在使用一个 monad。

这是来自 Glasgow Haskell 编译器的示例（我不是编写的；我只是围绕一些边缘进行工作），我们在其中构建控制流图。以下是制作图表的基本方法：

empyGraph    :: Graph
mkLabel      :: Label -> Graph
mkAssignment :: Assignment -> Graph  -- modify a register or memory
mkTransfer   :: ControlTransfer -> Graph   -- any control transfer
(<*>)        :: Graph -> Graph -> Graph

但是正如您所发现的，维护唯一标签的供应充其量是乏味的，因此我们也提供了这些函数：

withFreshLabel :: (Label -> Graph) -> Graph
mkIfThenElse :: (Label -> Label -> Graph) -- branch condition
             -> Graph   -- code in the 'then' branch
             -> Graph   -- code in the 'else' branch 
             -> Graph   -- resulting if-then-else construct

整个 Graph 事物是一个抽象类型，而翻译者只是以纯函数的方式愉快地构建图表，而没有意识到任何单子正在发生。然后，当最终构建图时，为了将其转换为我们可以生成代码的代数数据类型，我们为其提供唯一标签，运行状态 monad，并提取数据结构。

状态单子隐藏在下面；虽然它没有暴露给客户端，但是 Graph 的定义是这样的：

type Graph = RealGraph -> [Label] -> (RealGraph, [Label])

或者更准确一点

type Graph = RealGraph -> State [Label] RealGraph
  -- a Graph is a monadic function from a successor RealGraph to a new RealGraph

状态 monad 隐藏在抽象层后面，它一点也不臭！

I've written multiple compilers in Haskell, and a state monad is a reasonable solution to many compiler problems. But you want to keep it abstract---don't make it obvious you're using a monad.

Here's an example from the Glasgow Haskell Compiler (which I did not write; I just work around a few edges), where we build control-flow graphs. Here are the basic ways to make graphs:

empyGraph    :: Graph
mkLabel      :: Label -> Graph
mkAssignment :: Assignment -> Graph  -- modify a register or memory
mkTransfer   :: ControlTransfer -> Graph   -- any control transfer
(<*>)        :: Graph -> Graph -> Graph

But as you've discovered, maintaining a supply of unique labels is tedious at best, so we provide these functions as well:

withFreshLabel :: (Label -> Graph) -> Graph
mkIfThenElse :: (Label -> Label -> Graph) -- branch condition
             -> Graph   -- code in the 'then' branch
             -> Graph   -- code in the 'else' branch 
             -> Graph   -- resulting if-then-else construct

The whole Graph thing is an abstract type, and the translator just merrily constructs graphs in purely functional fashion, without being aware that anything monadic is going on. Then, when the graph is finally constructed, in order to turn it into an algebraic datatype we can generate code from, we give it a supply of unique labels, run the state monad, and pull out the data structure.

The state monad is hidden underneath; although it's not exposed to the client, the definition of Graph is something like this:

type Graph = RealGraph -> [Label] -> (RealGraph, [Label])

or a bit more accurately

type Graph = RealGraph -> State [Label] RealGraph
  -- a Graph is a monadic function from a successor RealGraph to a new RealGraph

With the state monad hidden behind a layer of abstraction, it's not smelly at all!

回复收藏 0 原文

空城旧梦 2024-07-21 02:07:11

我想说，状态通常不是代码味道，只要它保持较小且控制良好即可。

这意味着使用诸如 State、ST 或定制的 monad，或者只是拥有一个包含传递到几个地方的状态数据的数据结构，并不是一件坏事。（实际上，单子只是帮助做到这一点！）然而，拥有到处都是的状态（是的，这意味着你，IO 单子！）是一种不好的气味。

一个相当明显的例子是，当我的团队正在为 ICFP 编程竞赛 2009 制作参赛作品时（代码可在在 git://git.cynic.net/haskell/icfp-contest-2009）。我们最终得到了几个不同的模块化部分：

VM：运行模拟程序的虚拟机
控制器：几组不同的例程，用于读取模拟器的输出并生成新的控制输入
解决方案：基于以下内容生成解决方案文件控制器的输出
可视化器：几组不同的例程，它们读取输入和输出端口，并生成某种可视化或模拟过程中所发生情况的日志。

其中每一个都有自己的状态，并且它们都以不同的方式交互通过VM的输入和输出值的方式。我们有几个不同的控制器和可视化器，每个都有自己不同的状态。

这里的关键点是，任何特定状态的内部都仅限于它们自己的特定模块，并且每个模块甚至不知道其他模块的状态是否存在。任何特定的有状态代码和数据集通常只有几十行长，并且状态中有少量数据项。

所有这些都被粘合在一个由大约十几行代码组成的小函数中，该函数无法访问任何状态的内部结构，并且在循环模拟时仅以正确的顺序调用正确的事物，并通过一个非常有限的函数。每个模块的外部信息量（当然还有模块之前的状态）。

当状态以如此有限的方式使用时，并且类型系统阻止您无意中修改它，那么处理起来就很容易了。 Haskell 的优点之一就是它可以让你做到这一点。

一个答案是：“不要使用 monad”。从我的角度来看，这完全是本末倒置。 Monad 是一种控制结构，除其他外，它可以帮助您最大限度地减少涉及状态的代码量。如果您以一元解析器为例，则解析器的状态（即正在解析的文本、已解析的程度、累积的任何警告等）必须贯穿解析器中使用的每个组合器。然而，只有少数组合器能够真正直接操纵状态；其他任何东西都使用这几个函数之一。这使您可以在一个地方清楚地看到所有可以更改状态的少量代码，并且更容易地推断如何更改它，再次使其更容易处理。

I'd say that state in general is not a code smell, so long as it's kept small and well controlled.

This means that using monads such as State, ST or custom-built ones, or just having a data structure containing state data that you pass around to a few places, is not a bad thing. (Actually, monads are just assistance in doing exactly this!) However, having state that goes all over the place (yes, this means you, IO monad!) is a bad smell.

An fairly clear example of this was when my team was working on our entry for the ICFP Programming Contest 2009 (the code is available at git://git.cynic.net/haskell/icfp-contest-2009). We ended up with several different modular parts to this:

VM: the virtual machine that ran the simulation program
Controllers: several different sets of routines that read the output of the simulator and generated new control inputs
Solution: generation of the solution file based on the output of the controllers
Visualizers: several different sets of routines that read both the input and output ports and generated some sort of visualization or log of what was going on as the simulation progressed

Each of these has its own state, and they all interact in various ways through the input and output values of the VM. We had several different controllers and visualizers, each of which had its own different kind of state.

The key point here was that the the internals of any particular state were limited to their own particular modules, and each module knew nothing about even the existence of state for other modules. Any particular set of stateful code and data was generally only a few dozen lines long, with a handful of data items in the state.

All this was glued together in one small function of about a dozen lines which had no access to the internals of any of the states, and which merely called the right things in the proper order as it looped through the simulation, and passed a very limited amount of outside information to each module (along with the module's previous state, of course).

When state is used in such a limited way, and the type system is preventing you from inadvertently modifying it, it's quite easy to handle. It's one of the beauties of Haskell that it lets you do this.

One answer says, "Don't use monads." From my point of view, this is exactly backwards. Monads are a control structure that, among other things, can help you minimize the amount of code that touches state. If you look at monadic parsers as an example, the state of the parse (i.e., the text being parsed, how far one has gotten in to it, any warnings that have accumulated, etc.) must run through every combinator used in the parser. Yet there will only be a few combinators that actually manipulate the state directly; anything else uses one of these few functions. This allows you to see clearly and in one place all of a small amount of code that can change the state, and more easily reason about how it can be changed, again making it easier to deal with.

回复收藏 0 原文

撞了怀 2024-07-21 02:07:11

您看过属性语法（AG）吗？（有关维基百科和Monad 阅读器中的文章）？

使用 AG，您可以将属性添加到语法树中。这些属性分为合成和继承属性。

综合属性是您从语法树生成（或综合）的东西，这可能是生成的代码，或所有注释，或您感兴趣的任何其他内容。

继承的属性是语法树的输入，这可能是环境，或代码生成期间使用的标签列表。

在乌得勒支大学，我们使用属性语法系统 (UUAGC) 编写编译器。这是一个预处理器，它从提供的 .ag 文件生成 haskell 代码（.hs 文件）。

不过，如果您仍在学习 Haskell，那么也许现在还不是开始学习另一层抽象的时候。

在这种情况下，您可以手动编写属性语法为您生成的代码，例如：

data AbstractSyntax = Literal Int | Block AbstractSyntax
                    | Comment String AbstractSyntax

compile :: AbstractSyntax -> [Label] -> (Code, Comments)
compile (Literal x) _      = (generateCode x, [])
compile (Block ast) (l:ls) = let (code', comments) = compile ast ls
                             in (labelCode l code', comments)
compile (Comment s ast) ls = let (code, comments') = compile ast ls
                             in (code, s : comments')

generateCode :: Int -> Code
labelCode :: Label -> Code -> Code

Have you looked at Attribute grammars (AG)? (More info on wikipedia and an article in the Monad Reader)?

With AG you can add attributes to a syntax tree. These attributes are separated in synthesized and inherited attributes.

Synthesized attributes are things you generate (or synthesize) from your syntax tree, this could be the generated code, or all comments, or whatever else your interested in.

Inherited attributes are input to your syntax tree, this could be the environment, or a list of labels to use during code generation.

At Utrecht University we use the Attribute Grammar System (UUAGC) to write compilers. This is a pre-processor which generates haskell code (.hs files) from the provided .ag files.

Although, if you're still learning Haskell, then maybe this is not the time to start learning yet another layer of abstraction over that.

In that case, you could manually write the sort of code that attributes grammars generate for you, for example:

data AbstractSyntax = Literal Int | Block AbstractSyntax
                    | Comment String AbstractSyntax

compile :: AbstractSyntax -> [Label] -> (Code, Comments)
compile (Literal x) _      = (generateCode x, [])
compile (Block ast) (l:ls) = let (code', comments) = compile ast ls
                             in (labelCode l code', comments)
compile (Comment s ast) ls = let (code, comments') = compile ast ls
                             in (code, s : comments')

generateCode :: Int -> Code
labelCode :: Label -> Code -> Code

回复收藏 0 原文

仅此而已 2024-07-21 02:07:11

您可能需要一个应用函子而不是
monad：

http://www.haskell.org/haskellwiki/Applicative_functor

我认为原始论文然而，它比 wiki 解释得更好：

http://www. soi.city.ac.uk/~ross/papers/Applicative.html

回复收藏 0 原文

哭泣的笑容 2024-07-21 02:07:11

我不认为使用 State Monad 来建模状态时会产生代码味道。

如果您需要通过函数对状态进行线程化，
您可以明确地执行此操作，将状态作为参数并在每个函数中返回它。
State Monad 提供了一个很好的抽象：它为你传递状态，并且
提供了许多有用的函数来组合需要状态的函数。
在这种情况下，使用 State Monad（或 Applicatives）并不是代码异味。

但是，如果您使用 State Monad 来模拟命令式编程风格
虽然一个功能性的解决方案就足够了，但你只是让事情变得复杂。

回复收藏 0 原文