保持类型通用而不使用 η-扩展

发布于 2024-09-29 10:51:30 字数 1933 浏览 3 评论 0原文

我正在做的事情：我正在编写一个小型解释器系统，它可以解析文件，将其转换为一系列操作，然后将数千个数据集输入到该序列中以提取一些最终值来自每一个。已编译的解释器由一系列纯函数组成，这些函数带有两个参数：数据集和执行上下文。每个函数返回修改后的执行上下文：

type('data,'context)解释器=('data -> 'context -> 'context)列表

编译器本质上是一个分词器，具有最终的分词到指令映射步骤，该步骤使用地图描述定义如下：

type('data,'context)map = (string *('data -> 'context -> 'context)) list

典型的解释器用法如下：

let pocket_calc = 
  let map = [ "add", (fun d c -> c # add d) ;
              "sub", (fun d c -> c # sub d) ;
              "mul", (fun d c -> c # mul d) ]
  in 
  Interpreter.parse map "path/to/file.txt"

let new_context = Interpreter.run pocket_calc data old_context

问题：< /strong> 我希望我的 pocket_calc 解释器能够处理任何支持 add、sub 和 mul 的类方法，以及相应的数据类型（一个上下文类可以是整数，另一个上下文类可以是浮点数）。

但是，pocket_calc 被定义为值而不是函数，因此类型系统不会使其类型通用：第一次使用时，'data 和 'context 类型绑定到我首先提供的任何数据和上下文的类型，并且解释器将永远与任何其他数据和上下文类型不兼容。

一个可行的解决方案是对解释器的定义进行 eta 扩展，以允许其类型参数是通用的：

let pocket_calc data context = 
  let map = [ "add", (fun d c -> c # add d) ;
              "sub", (fun d c -> c # sub d) ;
              "mul", (fun d c -> c # mul d) ]
  in 
  let interpreter = Interpreter.parse map "path/to/file.txt" in
  Interpreter.run interpreter data context

但是，由于以下几个原因，该解决方案是不可接受的：

它每次调用时都会重新编译解释器，这会显着降低性能。即使是映射步骤（使用映射列表将标记列表转换为解释器）也会导致明显的速度减慢。
我的设计依赖于在初始化时加载的所有解释器，因为只要加载文件中的标记与映射列表中的行不匹配，编译器就会发出警告，并且我希望在软件启动时看到所有这些警告（不是当单个解释器最终运行时）。
我有时想在多个解释器中重用给定的地图列表，无论是单独使用还是通过预先添加其他指令（例如，“div”）。

问题：除了 eta 扩展之外，还有其他方法可以使类型参数化吗？也许有一些涉及模块签名或继承的聪明技巧？如果这是不可能的，有没有什么方法可以缓解我上面提到的三个问题，使 eta 扩展成为一个可接受的解决方案？谢谢你！

原文

What I'm doing: I'm writing a small interpreter system that can parse a file, turn it into a sequence of operations, and then feed thousands of data sets into that sequence to extract some final value from each. A compiled interpreter consists of a list of pure functions that take two arguments: a data set, and an execution context. Each function returns the modified execution context:

type ('data, 'context) interpreter = ('data -> 'context -> 'context) list

The compiler is essentially a tokenizer with a final token-to-instruction mapping step that uses a map description defined as follows:

type ('data, 'context) map = (string * ('data -> 'context -> 'context)) list

Typical interpreter usage looks like this:

let pocket_calc = 
  let map = [ "add", (fun d c -> c # add d) ;
              "sub", (fun d c -> c # sub d) ;
              "mul", (fun d c -> c # mul d) ]
  in 
  Interpreter.parse map "path/to/file.txt"

let new_context = Interpreter.run pocket_calc data old_context

The problem: I'd like my pocket_calc interpreter to work with any class that supports add, sub and mul methods, and the corresponding data type (could be integers for one context class and floating-point numbers for another).

However, pocket_calc is defined as a value and not a function, so the type system does not make its type generic: the first time it's used, the 'data and 'context types are bound to the types of whatever data and context I first provide, and the interpreter becomes forever incompatible with any other data and context types.

A viable solution is to eta-expand the definition of the interpreter to allow its type parameters to be generic:

let pocket_calc data context = 
  let map = [ "add", (fun d c -> c # add d) ;
              "sub", (fun d c -> c # sub d) ;
              "mul", (fun d c -> c # mul d) ]
  in 
  let interpreter = Interpreter.parse map "path/to/file.txt" in
  Interpreter.run interpreter data context

However, this solution is unacceptable for several reasons:

It re-compiles the interpreter every time it's called, which significantly degrades performance. Even the mapping step (turning a token list into a interpreter using the map list) causes a noticeable slowdown.
My design relies on all interpreters being loaded at initialization time, because the compiler issues warnings whenever a token in the loaded file does not match a line in the map list, and I want to see all those warnings when the software launches (not when individual interpreters are eventually run).
I sometimes want to reuse a given map list in several interpreters, whether on its own or by prepending additional instructions (for instance, "div").

The questions: is there any way to make the type parametric other than eta-expansion? Maybe some clever trick involving module signatures or inheritance? If that's impossible, is there any way to alleviate the three issues I have mentioned above in order to make eta-expansion an acceptable solution? Thank you!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

梦归所梦 2024-10-06 10:51:31

一个可行的解决方案是 eta 扩展
解释器的定义允许
它的类型参数是通用的：

 let pocket_calc data context = 
   let map = [ "add", (fun d c -> c # add d) ;
               "sub", (fun d c -> c # sub d) ;
               "mul", (fun d c -> c # mul d) ]
   in 
   let interpreter = Interpreter.parse map "path/to/file.txt" in
   Interpreter.run interpreter data context

但是，这个解决方案是不可接受的
有几个原因：
每次调用时都会重新编译解释器，这
显着降低性能。
甚至映射步骤（将令牌
使用地图将列表放入解释器中
列表）导致明显的速度减慢。

它每次都会重新编译解释器，因为你做错了。正确的形式更像是这样（从技术上讲，如果 Interpreter.run 到 interpreter 的部分解释可以进行一些计算，您应该将其移出 也很有趣）。

 let pocket_calc = 
   let map = [ "add", (fun d c -> c # add d) ;
               "sub", (fun d c -> c # sub d) ;
               "mul", (fun d c -> c # mul d) ]
   in 
   let interpreter = Interpreter.parse map "path/to/file.txt" in
   fun data context -> Interpreter.run interpreter data context

A viable solution is to eta-expand the
definition of the interpreter to allow
its type parameters to be generic:

 let pocket_calc data context = 
   let map = [ "add", (fun d c -> c # add d) ;
               "sub", (fun d c -> c # sub d) ;
               "mul", (fun d c -> c # mul d) ]
   in 
   let interpreter = Interpreter.parse map "path/to/file.txt" in
   Interpreter.run interpreter data context

However, this solution is unacceptable
for several reasons:
It re-compiles the interpreter every time it's called, which
significantly degrades performance.
Even the mapping step (turning a token
list into a interpreter using the map
list) causes a noticeable slowdown.

It recompiles the interpreter every time because you are doing it wrong. The proper form is more something like this (and technically, if the partial interpretation of Interpreter.run to interpreter can do some computations, you should move it out of the fun too).

 let pocket_calc = 
   let map = [ "add", (fun d c -> c # add d) ;
               "sub", (fun d c -> c # sub d) ;
               "mul", (fun d c -> c # mul d) ]
   in 
   let interpreter = Interpreter.parse map "path/to/file.txt" in
   fun data context -> Interpreter.run interpreter data context

回复收藏 0 原文

新人笑 2024-10-06 10:51:31

我认为你的问题在于你的操作缺乏多态性，你希望有一个封闭的参数类型（适用于支持以下算术基元的所有数据），而不是有一个代表固定数据类型的类型参数。
但是，要确保确实如此有点困难，因为您的代码不够独立，无法对其进行测试。

假设给定的基元类型：

type 'a primitives = <
  add : 'a -> 'a;
  mul : 'a -> 'a; 
  sub : 'a -> 'a;
>

您可以使用结构和对象提供的一阶多态性：

type op = { op : 'a . 'a -> 'a primitives -> 'a }

let map = [ "add", { op = fun d c -> c # add d } ;
            "sub", { op = fun d c -> c # sub d } ;
            "mul", { op = fun d c -> c # mul d } ];;

您将得到以下与数据无关的类型：

 val map : (string * op) list

编辑：关于您对不同操作类型的评论，我'我不确定您想要哪种级别的灵活性。我认为您不能在同一列表中混合对不同原语的操作，并且仍然受益于每个原语的特殊性：充其量，您只能将“对 add/sub/mul 的操作”转换为“对 add/ 的操作” sub/mul/div”（因为我们在原语类型中是逆变的），但肯定不多。

在更务实的层面上，确实，通过这种设计，您需要为每个基元类型使用不同的“操作”类型。然而，您可以轻松地构建一个由基元类型参数化的函子并返回操作类型。

我不知道如何公开不同原始类型之间的直接子类型关系。问题是这需要函子级别的子类型关系，我认为 Caml 中没有这种关系。但是，您可以使用更简单的显式子类型（而不是转换 a :> b，而是使用函数 a -> b），构建第二个函子，逆变，即给定从一种基本类型到另一种类型的映射，将构建从一种操作类型到另一种操作类型的映射。

完全有可能的是，通过进化出不同且巧妙的类型表示，可以得到更简单的解决方案。 3.12 的一流模块也可能发挥作用，但它们往往对一流的存在类型有帮助，而在这里我们不愿使用通用类型。

解释开销和操作具体化

除了您本地的打字问题之外，我不确定您的方向是否正确。您试图通过“提前”（在使用操作之前）构建与操作的语言表示相对应的闭包来消除解释开销。

根据我的经验，这种方法通常不会消除解释开销，而是将其移动到另一层。如果您天真地创建闭包，您将在闭包层重现控制的解析流程：闭包将调用其他闭包等，因为您的解析代码在创建闭包时“解释”了输入。您消除了解析成本，但可能次优的控制流仍然相同。此外，直接操作闭包往往很痛苦：您必须非常小心比较操作，例如序列化等。

我认为您可能对代表您的操作的中间“具体化”语言的长期感兴趣：用于算术运算的简单代数数据类型，您可以根据文本表示构建它。您仍然可以尝试从中“提前”构建闭包，尽管我不确定性能是否比直接解释它好得多，如果内存中的表示不错的话。此外，插入中间分析器/转换器来优化您的操作会更容易，例如从“关联二元操作”模型到“n 元操作”模型，可以更有效地评估。

I think your problem lies in a lack of polymorphism in your operations, which you would like to have a closed parametric type (works for all data supporting the following arithmetic primitives) instead of having a type parameter representing a fixed data type.
However, it's a bit difficult to ensure it's exactly this, because your code is not self-contained enough to test it.

Assuming the given type for primitives :

type 'a primitives = <
  add : 'a -> 'a;
  mul : 'a -> 'a; 
  sub : 'a -> 'a;
>

You can use the first-order polymorphism provided by structures and objects :

type op = { op : 'a . 'a -> 'a primitives -> 'a }

let map = [ "add", { op = fun d c -> c # add d } ;
            "sub", { op = fun d c -> c # sub d } ;
            "mul", { op = fun d c -> c # mul d } ];;

You get back the following data-agnostic type :

 val map : (string * op) list

Edit: regarding your comment about different operation types, I'm not sure which level of flexibility you want. I don't think you could mix operations over different primitives in the same list, and still benefit from the specifities of each : at best, you could only transform an "operation over add/sub/mul" into an "operation over add/sub/mul/div" (as we're contravariant in the primitives type), but certainly not much.

On a more pragmatic level, it's true that, with that design, you need a different "operation" type for each primitives type. You could easily, however, build a functor parametrized by the primitives type and returning the operation type.

I don't know how one would expose a direct subtyping relation between different primitive types. The problem is that this would need a subtyping relation at the functor level, which I don't think we have in Caml. You could, however, using a simpler form of explicit subtyping (instead of casting a :> b, use a function a -> b), build second functor, contravariant, that, given a map from a primitive type to the other, would build a map from one operation type to the other.

It's entirely possible that, with a different and clever representation of the type evolved, a much simpler solution is possible. First-class modules of 3.12 might also come in play, but they tend to be helpful for first-class existential types, whereas here we rhater use universal types.

Interpretive overhead and operation reifications

Besides your local typing problem, I'm not sure you're heading the right way. You're trying to eliminate interpretive overhead by building, "ahead of time" (before using the operations), a closure corresponding to a in-language representation of your operation.

In my experience, this approach doesn't generally get rid of interpretive overhead, it rather moves it to another layer. If you create your closures naïvely, you will have the parsing flow of control reproduced at the closure layer : the closure will call other closures, etc., as your parsing code "interpreted" the input when creating the closure. You eliminated the cost of parsing, but the possibly suboptimal flow of control is still the same. Additionnaly, closures tend to be a pain to manipulate directly : you have to be very careful about comparison operations for example, serialization, etc.

I think you may be interested in the long term in an intermediate "reified" language representing your operations : a simple algebraic data type for arithmetic operations, that you would build from your textual representation. You can still try to build closures "ahead of time" from it, though I'm not sure the performances are much better than directly interpreting it, if the in-memory representation is decent. Moreover, it will be much easier to plug in intermediary analyzers/transformers to optimize your operations, for example going from an "associative binary operations" model to a "n-ary operations" model, which could be more efficiently evaluated.

回复收藏 0 原文

~没有更多了~