在 OCaml 中扩展不可变类型（或：不可变类型的快速缓存）

发布于 2025-01-02 00:01:35 字数 885 浏览 1 评论 0原文

我在 ocaml 中有一个递归的不可变数据结构，可以将其简化为如下所示：

type expr =
{
    eexpr : expr_expr;
    some_other_complex_field : a_complex_type;
}

and expr_expr =
    | TInt of int
    | TSum of (expr * expr)
    | TMul of (expr * expr)

它是一个 AST，有时它会变得非常复杂（非常深）。

有一个计算表达式的递归函数。例如，

let rec result expr =
    match expr.eexpr with
        | TInt i -> i
        | TSum (e1, e2) -> result e1 + result e2
        | TMul (e1, e2) -> result e1 * result e2

现在假设我将一个表达式映射到另一个表达式，并且我需要不断检查 expr 的结果，有时对同一个 expr 多次检查，有时检查最近使用模式映射的表达式

{ someExpr with eexpr = TSum(someExpr, otherExpr) }

现在，结果函数非常轻量级，但多次运行深度 AST 不会得到很好的优化。我知道我可以使用 Hashtbl 缓存该值，但据我所知，Hashtbl 只会执行结构相等，因此无论如何它都需要遍历我的长 AST。我知道最好的选择是在 expr 类型中包含一个可能不可变的“结果”字段。但我不能。

那么 Ocaml 有没有办法将值缓存为不可变类型，这样我就不必每次需要时都急切地计算它？

谢谢！

原文

I have a recursive immutable data structure in ocaml which can be simplified to something like this:

type expr =
{
    eexpr : expr_expr;
    some_other_complex_field : a_complex_type;
}

and expr_expr =
    | TInt of int
    | TSum of (expr * expr)
    | TMul of (expr * expr)

It's an AST, and sometimes it gets pretty complex (it's very deep).

there is a recursive function that evaluates an expression. For example, let's say,

let rec result expr =
    match expr.eexpr with
        | TInt i -> i
        | TSum (e1, e2) -> result e1 + result e2
        | TMul (e1, e2) -> result e1 * result e2

Now suppose I am mapping an expression to another expression, and I need to constantly check the result of an expr, sometimes more than once for the same expr, and sometimes for expressions that were recently mapped by using the pattern

{ someExpr with eexpr = TSum(someExpr, otherExpr) }

Now, the result function is very lightweight, but running it many times for a deep AST will not be very optimized. I know I could cache the value using a Hashtbl, but AFAIK the Hashtbl will only do structural equality, so it will need to traverse my long AST anyway.
I know the best option would be to include a probably immutable "result" field in the expr type. But I can't.

So is there any way in Ocaml to cache a value to an immutable type, so I don't have to calculate it eagerly every time I need it ?

Thanks!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

你的他你的她 2025-01-09 00:01:36

对 expr_expr 的值进行哈希构造。通过执行此操作，程序中结构相等的值将共享完全相同的内存表示形式，并且您可以用物理相等 (==) 替换结构相等 (=)。

这篇论文应该可以帮助您快速开始使用哈希consing奥卡姆。

回复收藏 0 原文

羁绊已千年 2025-01-09 00:01:36

您可以使用函子接口来控制哈希表使用的相等类型。我相信 (==) 的语义对于您的目的来说是合法的；即，对于任何纯函数 f，如果 A == B，则 f A = f B。因此，您可以缓存 f A 的结果。然后，如果您找到物理上等于 A 的 B，则缓存的值对于 B 来说是正确的。

使用 (==) 进行散列的缺点是，散列函数将发送所有结构上相等的结果对象到同一个哈希桶，在那里它们将被视为不同的对象。如果表中有很多结构相同的对象，则散列不会带来任何好处。该行为退化为线性搜索。

您无法定义哈希函数来处理物理地址，因为垃圾收集器可以随时更改物理地址。

但是，如果您知道您的表仅包含相对较少的大值，那么使用物理相等可能对您有用。

回复收藏 0 原文

我一直都在从未离去 2025-01-09 00:01:36

我认为您可以合并上面的两个想法：使用类似散列构造的技术来获取数据的“纯表达式”部分的散列，并使用该散列作为eval记忆表中的键函数。

当然，只有当您的 eval 函数确实仅依赖于函数的“纯表达式”部分时，这才有效，如您给出的示例所示。我认为这是一个相对普遍的情况，至少如果您限制自己存储成功评估（例如，不会返回包括某些位置信息的错误）。

编辑：一个小的概念证明：

type 'a _expr =
  | Int of int
  | Add of 'a * 'a

(* a constructor to avoid needing -rectypes *)
type pure_expr = Pure of pure_expr _expr

type loc = int
type loc_expr = {
  loc : loc;
  expr : loc_expr _expr;
  pure : pure_expr (* or any hash_consing of it for efficiency *)
}

(* this is where you could hash-cons *)
let pure x = Pure x

let int loc n =
  { loc; expr = Int n; pure = pure (Int n) }
let add loc a b =
  { loc; expr = Add (a, b); pure = pure (Add(a.pure, b.pure)) }

let eval =
  let cache = Hashtbl.create 251 in
  let rec eval term =
    (* for debug and checking memoization *)
    Printf.printf "log: %d\n" term.loc;
    try Hashtbl.find cache term.pure with Not_found ->
      let result =
        match term.expr with
          | Int n -> n
          | Add(a, b) -> eval a + eval b in
      Hashtbl.add cache term.pure result;
      result
  in eval



let test = add 3 (int 1 1) (int 2 2)
# eval test;;
log: 3
log: 2
log: 1
- : int = 3
# eval test;;
log: 3
- : int = 3

I think you can merge the two ideas above : use hash-consing-like techniques to get the hash of the "pure expression" part of your data, and use this hash as key in the memoization table for the eval function.

Of course this only works when your eval function indeed only depends on the "pure expression" part of the function, as in the example you gave. I believe that is a relatively general case, at least if you restrict yourself to storing the successful evaluations (that won't, for example, return an error including some location information).

Edit: a small proof of concept:

type 'a _expr =
  | Int of int
  | Add of 'a * 'a

(* a constructor to avoid needing -rectypes *)
type pure_expr = Pure of pure_expr _expr

type loc = int
type loc_expr = {
  loc : loc;
  expr : loc_expr _expr;
  pure : pure_expr (* or any hash_consing of it for efficiency *)
}

(* this is where you could hash-cons *)
let pure x = Pure x

let int loc n =
  { loc; expr = Int n; pure = pure (Int n) }
let add loc a b =
  { loc; expr = Add (a, b); pure = pure (Add(a.pure, b.pure)) }

let eval =
  let cache = Hashtbl.create 251 in
  let rec eval term =
    (* for debug and checking memoization *)
    Printf.printf "log: %d\n" term.loc;
    try Hashtbl.find cache term.pure with Not_found ->
      let result =
        match term.expr with
          | Int n -> n
          | Add(a, b) -> eval a + eval b in
      Hashtbl.add cache term.pure result;
      result
  in eval



let test = add 3 (int 1 1) (int 2 2)
# eval test;;
log: 3
log: 2
log: 1
- : int = 3
# eval test;;
log: 3
- : int = 3

回复收藏 0 原文

~没有更多了~