在 OCaml 中扩展不可变类型(或:不可变类型的快速缓存)

发布于 2025-01-02 00:01:35 字数 885 浏览 1 评论 0原文

我在 ocaml 中有一个递归的不可变数据结构,可以将其简化为如下所示:

type expr =
{
    eexpr : expr_expr;
    some_other_complex_field : a_complex_type;
}

and expr_expr =
    | TInt of int
    | TSum of (expr * expr)
    | TMul of (expr * expr)

它是一个 AST,有时它会变得非常复杂(非常深)。

有一个计算表达式的递归函数。例如,

let rec result expr =
    match expr.eexpr with
        | TInt i -> i
        | TSum (e1, e2) -> result e1 + result e2
        | TMul (e1, e2) -> result e1 * result e2

现在假设我将一个表达式映射到另一个表达式,并且我需要不断检查 expr 的结果,有时对同一个 expr 多次检查,有时检查最近使用模式映射的表达式

{ someExpr with eexpr = TSum(someExpr, otherExpr) }

现在,结果函数非常轻量级,但多次运行深度 AST 不会得到很好的优化。我知道我可以使用 Hashtbl 缓存该值,但据我所知,Hashtbl 只会执行结构相等,因此无论如何它都需要遍历我的长 AST。 我知道最好的选择是在 expr 类型中包含一个可能不可变的“结果”字段。但我不能。

那么 Ocaml 有没有办法将值缓存为不可变类型,这样我就不必每次需要时都急切地计算它?

谢谢!

I have a recursive immutable data structure in ocaml which can be simplified to something like this:

type expr =
{
    eexpr : expr_expr;
    some_other_complex_field : a_complex_type;
}

and expr_expr =
    | TInt of int
    | TSum of (expr * expr)
    | TMul of (expr * expr)

It's an AST, and sometimes it gets pretty complex (it's very deep).

there is a recursive function that evaluates an expression. For example, let's say,

let rec result expr =
    match expr.eexpr with
        | TInt i -> i
        | TSum (e1, e2) -> result e1 + result e2
        | TMul (e1, e2) -> result e1 * result e2

Now suppose I am mapping an expression to another expression, and I need to constantly check the result of an expr, sometimes more than once for the same expr, and sometimes for expressions that were recently mapped by using the pattern

{ someExpr with eexpr = TSum(someExpr, otherExpr) }

Now, the result function is very lightweight, but running it many times for a deep AST will not be very optimized. I know I could cache the value using a Hashtbl, but AFAIK the Hashtbl will only do structural equality, so it will need to traverse my long AST anyway.
I know the best option would be to include a probably immutable "result" field in the expr type. But I can't.

So is there any way in Ocaml to cache a value to an immutable type, so I don't have to calculate it eagerly every time I need it ?

Thanks!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

你的他你的她 2025-01-09 00:01:36

expr_expr 的值进行哈希构造。通过执行此操作,程序中结构相等的值将共享完全相同的内存表示形式,并且您可以用物理相等 (==) 替换结构相等 (=)。

这篇论文应该可以帮助您快速开始使用哈希consing奥卡姆。

Hash-cons the values of expr_expr. By doing this structurally equal values in your program will share exactly the same memory representation and you can substitute structural equality (=) by physical equality (==).

This paper should get you quickly started on hash-consing in OCaml.

羁绊已千年 2025-01-09 00:01:36

您可以使用函子接口来控制哈希表使用的相等类型。我相信 (==) 的语义对于您的目的来说是合法的;即,对于任何纯函数 f,如果 A == B,则 f A = f B。因此,您可以缓存 f A 的结果。然后,如果您找到物理上等于 A 的 B,则缓存的值对于 B 来说是正确的。

使用 (==) 进行散列的缺点是,散列函数将发送所有结构上相等的结果对象到同一个哈希桶,在那里它们将被视为不同的对象。如果表中有很多结构相同的对象,则散列不会带来任何好处。该行为退化为线性搜索。

您无法定义哈希函数来处理物理地址,因为垃圾收集器可以随时更改物理地址。

但是,如果您知道您的表仅包含相对较少的大值,那么使用物理相等可能对您有用。

You can use the functorial interface to control the kind of equality used by the hash table. I believe the semantics of (==) are legitimate for your purposes; i.e., if A == B then f A = f B for any pure function f. So you can cache the results of f A. Then if you find a B that's physically equal to A, the cached value is correct for B.

The downside of using (==) for hashing is that the hash function will send all structurally equal objects to the same hash bucket, where they will be treated as distinct objects. If you have a lot of structurally equal objects in the table, you get no benefit from the hashing. The behavior degenerates to a linear search.

You can't define the hash function to work with physical addresses, because the physical addresses can be changed at any time by the garbage collector.

However, if you know your table will only contain relatively few large-ish values, using physical equality might work for you.

我一直都在从未离去 2025-01-09 00:01:36

我认为您可以合并上面的两个想法:使用类似散列构造的技术来获取数据的“纯表达式”部分的散列,并使用该散列作为eval记忆表中的键函数。

当然,只有当您的 eval 函数确实仅依赖于函数的“纯表达式”部分时,这才有效,如您给出的示例所示。我认为这是一个相对普遍的情况,至少如果您限制自己存储成功评估(例如,不会返回包括某些位置信息的错误)。

编辑:一个小的概念证明:

type 'a _expr =
  | Int of int
  | Add of 'a * 'a

(* a constructor to avoid needing -rectypes *)
type pure_expr = Pure of pure_expr _expr

type loc = int
type loc_expr = {
  loc : loc;
  expr : loc_expr _expr;
  pure : pure_expr (* or any hash_consing of it for efficiency *)
}

(* this is where you could hash-cons *)
let pure x = Pure x

let int loc n =
  { loc; expr = Int n; pure = pure (Int n) }
let add loc a b =
  { loc; expr = Add (a, b); pure = pure (Add(a.pure, b.pure)) }

let eval =
  let cache = Hashtbl.create 251 in
  let rec eval term =
    (* for debug and checking memoization *)
    Printf.printf "log: %d\n" term.loc;
    try Hashtbl.find cache term.pure with Not_found ->
      let result =
        match term.expr with
          | Int n -> n
          | Add(a, b) -> eval a + eval b in
      Hashtbl.add cache term.pure result;
      result
  in eval



let test = add 3 (int 1 1) (int 2 2)
# eval test;;
log: 3
log: 2
log: 1
- : int = 3
# eval test;;
log: 3
- : int = 3

I think you can merge the two ideas above : use hash-consing-like techniques to get the hash of the "pure expression" part of your data, and use this hash as key in the memoization table for the eval function.

Of course this only works when your eval function indeed only depends on the "pure expression" part of the function, as in the example you gave. I believe that is a relatively general case, at least if you restrict yourself to storing the successful evaluations (that won't, for example, return an error including some location information).

Edit: a small proof of concept:

type 'a _expr =
  | Int of int
  | Add of 'a * 'a

(* a constructor to avoid needing -rectypes *)
type pure_expr = Pure of pure_expr _expr

type loc = int
type loc_expr = {
  loc : loc;
  expr : loc_expr _expr;
  pure : pure_expr (* or any hash_consing of it for efficiency *)
}

(* this is where you could hash-cons *)
let pure x = Pure x

let int loc n =
  { loc; expr = Int n; pure = pure (Int n) }
let add loc a b =
  { loc; expr = Add (a, b); pure = pure (Add(a.pure, b.pure)) }

let eval =
  let cache = Hashtbl.create 251 in
  let rec eval term =
    (* for debug and checking memoization *)
    Printf.printf "log: %d\n" term.loc;
    try Hashtbl.find cache term.pure with Not_found ->
      let result =
        match term.expr with
          | Int n -> n
          | Add(a, b) -> eval a + eval b in
      Hashtbl.add cache term.pure result;
      result
  in eval



let test = add 3 (int 1 1) (int 2 2)
# eval test;;
log: 3
log: 2
log: 1
- : int = 3
# eval test;;
log: 3
- : int = 3
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文