类似 ML 模式匹配的类型检查算法？

发布于 2024-12-11 20:09:04 字数 316 浏览 0 评论 0原文

对于 ML 风格的编程语言，如何确定给定模式是否“良好”，特别是它是否详尽且不重叠？

假设您有如下模式：

match lst with
  x :: y :: [] -> ...
  [] -> ...

或：

match lst with
  x :: xs -> ...
  x :: [] -> ...
  [] -> ...

一个好的类型检查器会警告第一个不详尽，第二个是重叠的。对于任意数据类型，类型检查器通常如何做出此类决策？

原文

How do you determine whether a given pattern is "good", specifically whether it is exhaustive and non-overlapping, for ML-style programming languages?

Suppose you have patterns like:

match lst with
  x :: y :: [] -> ...
  [] -> ...

or:

match lst with
  x :: xs -> ...
  x :: [] -> ...
  [] -> ...

A good type checker would warn that the first is not exhaustive and the second is overlapping. How would the type checker make those kinds of decisions in general, for arbitrary data types?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

暮光沉寂 2024-12-18 20:09:04

这是算法的草图。它也是 Lennart Augustsson 著名的高效编译模式匹配技术的基础。（论文收录在令人难以置信的 FPCA 程序中 (LNCS 201 ）哦，有这么多点击。）这个想法是通过重复地将最通用的模式分解为构造函数案例来重建详尽的、非冗余的分析。

一般来说，问题在于您的程序可能有一堆空的“实际”模式 {p₁, .., p_n}，并且您想知道是否它们涵盖了给定的“理想”模式 q。首先，将 q 设为变量 x。最初满足并随后保持的不变量是，对于将变量映射到模式的某些替换 σ_i，每个 p_i 都是 σ_iq。

如何进行。如果 n=0，则该串为空，因此可能有一种情况 q 未被模式覆盖。抱怨ps不详尽。如果 σ₁ 是变量的单射重命名，则 p₁ 捕获与 q 匹配的每个情况，所以我们很高兴：如果 n= 1、我们赢了；如果 n>1 则哎呀，根本不需要 p₂ 。否则，对于某些变量 x，我们有 σ₁x 是构造函数模式。在这种情况下，将问题拆分为多个子问题，每个子问题对应 x 类型的每个构造函数 c_j。即将原始q拆分为多个理想模式 q_j = [x:=c_j y₁ .. y_{arity (c_j)}]q，并对每个 q_j 相应地细化模式以保持不变性，删除那些不匹配的模式。

让我们以 {[], x :: y :: zs} 为例（使用 :: 表示 cons）。我们从 [xs := []] 开始

  xs covering  {[], x :: y :: zs}

，使第一个模式成为理想的实例。所以我们拆分 xs，得到

  [] covering {[]}
  x :: ys covering {x :: y :: zs}

其中第一个由空单射重命名证明是合理的，所以没问题。第二个需要 [x := x, ys := y :: zs]，所以我们再次离开，分割 ys，得到。

  x :: [] covering {}
  x :: y :: zs covering {x :: y :: zs}

从第一个子问题我们可以看出我们被banjaxed了。

重叠情况更加微妙，并且允许变化，具体取决于您是否想要标记任何重叠，或者只是在从上到下的优先级顺序中完全冗余的模式。你的基本摇滚乐是一样的。例如，从

  xs covering {[], ys}

[xs := []] 开始，证明其中第一个合理，因此拆分。请注意，我们必须使用构造函数案例来优化 ys 以保持不变量。

  [] covering {[], []}
  x :: xs covering {y :: ys}

显然，第一种情况完全是重叠的。另一方面，当我们注意到需要细化实际的程序模式来维持不变性时，我们可以过滤掉那些变得多余的严格细化，并检查至少有一个幸存下来（如 ::案例在这里）。

因此，该算法以实际程序模式 p 为动机的方式构建了一组理想的详尽重叠模式 q。每当实际模式需要特定变量的更多细节时，您就可以将理想模式拆分为构造函数案例。如果幸运的话，每个实际模式都被不相交的非空理想模式集覆盖，并且每个理想模式仅被一个实际模式覆盖。产生理想模式的案例拆分树为您提供了实际模式的高效跳转表驱动编译。

我提出的算法显然是终止的，但如果存在没有构造函数的数据类型，它可能无法接受空模式集是详尽的。这在依赖类型语言中是一个严重的问题，传统模式的详尽性是不可判定的：明智的方法是允许“反驳”以及方程。在 Agda 中，您可以在任何不可能对构造函数进行细化的地方编写 ()，发音为“my Aunt Fanny”，这样您就不需要用返回值来完成方程了。通过添加足够的反驳，每组详尽的模式都可以变得详尽无遗。

无论如何，这就是基本情况。

Here's a sketch of an algorithm. It's also the basis of Lennart Augustsson's celebrated technique for compiling pattern matching efficiently. (The paper is in that incredible FPCA proceedings (LNCS 201) with oh so many hits.) The idea is to reconstruct an exhaustive, non-redundant analysis by repeatedly splitting the most general pattern into constructor cases.

In general, the problem is that your program has a possibly empty bunch of ‘actual’ patterns {p₁, .., p_n}, and you want to know if they cover a given ‘ideal’ pattern q. To kick off, take q to be a variable x. The invariant, initially satisfied and subsequently maintained, is that each p_i is σ_iq for some substitution σ_i mapping variables to patterns.

How to proceed. If n=0, the bunch is empty, so you have a possible case q that isn't covered by a pattern. Complain that the ps are not exhaustive. If σ₁ is an injective renaming of variables, then p₁ catches every case that matches q, so we're warm: if n=1, we win; if n>1 then oops, there's no way p₂ can ever be needed. Otherwise, we have that for some variable x, σ₁x is a constructor pattern. In that case split the problem into multiple subproblems, one for each constructor c_j of x's type. That is, split the original q into multiple ideal patterns q_j = [x:=c_j y₁ .. y_{arity(c_j)}]q, and refine the patterns accordingly for each q_j to maintain the invariant, dropping those that don't match.

Let's take the example with {[], x :: y :: zs} (using :: for cons). We start with

  xs covering  {[], x :: y :: zs}

and we have [xs := []] making the first pattern an instance of the ideal. So we split xs, getting

  [] covering {[]}
  x :: ys covering {x :: y :: zs}

The first of these is justified by the empty injective renaming, so is ok. The second takes [x := x, ys := y :: zs], so we're away again, splitting ys, getting.

  x :: [] covering {}
  x :: y :: zs covering {x :: y :: zs}

and we can see from the first subproblem that we're banjaxed.

The overlap case is more subtle and allows for variations, depending on whether you want to flag up any overlap, or just patterns which are completely redundant in a top-to-bottom priority order. Your basic rock'n'roll is the same. E.g., start with

  xs covering {[], ys}

with [xs := []] justifying the first of those, so split. Note that we have to refine ys with constructor cases to maintain the invariant.

  [] covering {[], []}
  x :: xs covering {y :: ys}

Clearly, the first case is strictly an overlap. On the other hand, when we notice that refining an actual program pattern is needed to maintain the invariant, we can filter out those strict refinements that become redundant and check that at least one survives (as happens in the :: case here).

So, the algorithm builds a set of ideal exhaustive overlapping patterns q in a way that's motivated by the actual program patterns p. You split the ideal patterns into constructor cases whenever the actual patterns demand more detail of a particular variable. If you're lucky, each actual pattern is covered by disjoint nonempty sets of ideal patterns and each ideal pattern is covered by just one actual pattern. The tree of case splits which yield the ideal patterns gives you the efficient jump-table driven compilation of the actual patterns.

The algorithm I've presented is clearly terminating, but if there are datatypes with no constructors, it can fail to accept that the empty set of patterns is exhaustive. This is a serious issue in dependently typed languages, where exhaustiveness of conventional patterns is undecidable: the sensible approach is to allow "refutations" as well as equations. In Agda, you can write (), pronounced "my Aunt Fanny", in any place where no constructor refinement is possible, and that absolves you from the requirement to complete the equation with a return value. Every exhaustive set of patterns can be made recognizably exhaustive by adding in enough refutations.

Anyhow, that's the basic picture.

回复收藏 0 原文

凡间太子 2024-12-18 20:09:04

这是非专家的一些代码。它显示了如果将模式限制为列表构造函数，问题会是什么样子。换句话说，这些模式只能与包含列表的列表一起使用。以下是一些类似的列表：[]、[[]]、[[];[]]。

如果您在 OCaml 解释器中启用 -rectypes，则这组列表具有单一类型：('a list) as 'a。

type reclist = ('a list) as 'a

这是用于表示匹配模式的类型针对 reclist 类型：

type p = Nil | Any | Cons of p * p

要将 OCaml 模式转换为这种形式，首先使用 (::) 重写。然后替换[]
与 Nil、_ 与 Any、(::) 与 Cons。所以模式 [] :: _ 转换为
缺点（无，任何）

这是一个将模式与 relist 相匹配的函数：

let rec pmatch (p: p) (l: reclist) =
    match p, l with
    | Any, _ -> true
    | Nil, [] -> true
    | Cons (p', q'), h :: t -> pmatch p' h && pmatch q' t
    | _ -> false

以下是它在使用中的外观。请注意 -rectypes 的使用：

$ ocaml312 -rectypes
        Objective Caml version 3.12.0

# #use "pat.ml";;
type p = Nil | Any | Cons of p * p
type reclist = 'a list as 'a
val pmatch : p -> reclist -> bool = <fun>
# pmatch (Cons(Any, Nil)) [];;
- : bool = false
# pmatch (Cons(Any, Nil)) [[]];;
- : bool = true
# pmatch (Cons(Any, Nil)) [[]; []];;
- : bool = false
# pmatch (Cons (Any, Nil)) [ [[]; []] ];;
- : bool = true
#

模式 Cons (Any, Nil) 应该匹配任何长度为 1 的列表，而且它似乎确实有效。

因此，编写一个函数 intersect 似乎相当简单，它接受两个模式并返回一个与这两个模式匹配的交集相匹配的模式。由于模式可能根本不相交，因此当没有相交时它返回 None ，否则返回 Some p 。

let rec inter_exc pa pb =
    match pa, pb with
    | Nil, Nil -> Nil
    | Cons (a, b), Cons (c, d) -> Cons (inter_exc a c, inter_exc b d)
    | Any, b -> b
    | a, Any -> a
    | _ -> raise Not_found

let intersect pa pb =
    try Some (inter_exc pa pb) with Not_found -> None

let intersectn ps =
    (* Intersect a list of patterns.
     *)
    match ps with
    | [] -> None
    | head :: tail ->
        List.fold_left
            (fun a b -> match a with None -> None | Some x -> intersect x b)
            (Some head) tail

作为一个简单的测试，将模式 [_, []] 与模式 [[], _] 相交。
前者与_::[]::[]相同，Cons (Any, Cons (Nil, Nil))也相同。
后者与 [] :: _ :: [] 相同，Cons (Nil, (Cons (Any, Nil)) 也相同。

# intersect (Cons (Any, Cons (Nil, Nil))) (Cons (Nil, Cons (Any, Nil)));;
- : p option = Some (Cons (Nil, Cons (Nil, Nil)))

结果看起来不错 []]

似乎这足以回答有关重叠模式的问题，如果它们的交集不是None。

右：[[] , 您需要使用模式列表这是一个函数。
exhaust 用于测试给定的模式列表是否详尽：

let twoparts l =
    (* All ways of partitioning l into two sets.
     *)
    List.fold_left
        (fun accum x ->
            let absent = List.map (fun (a, b) -> (a, x :: b)) accum
            in
                List.fold_left (fun accum (a, b) -> (x :: a, b) :: accum)
                    absent accum)
        [([], [])] l

let unique l =
   (* Eliminate duplicates from the list.  Makes things
    * faster.
    *)
   let rec u sl=
        match sl with
        | [] -> []
        | [_] -> sl
        | h1 :: ((h2 :: _) as tail) ->
            if h1 = h2 then u tail else h1 :: u tail
    in
        u (List.sort compare l)

let mkpairs ps =
    List.fold_right
        (fun p a -> match p with Cons (x, y) -> (x, y) :: a | _ -> a) ps []

let rec submatches pairs =
    (* For each matchable subset of fsts, return a list of the
     * associated snds.  A matchable subset has a non-empty
     * intersection, and the intersection is not covered by the rest of
     * the patterns.  I.e., there is at least one thing that matches the
     * intersection without matching any of the other patterns.
     *)
    let noncovint (prs, rest) =
        let prs_firsts = List.map fst prs in
        let rest_firsts = unique (List.map fst rest) in
        match intersectn prs_firsts with
        | None -> false
        | Some i -> not (cover i rest_firsts)
    in let pairparts = List.filter noncovint (twoparts pairs)
    in
        unique (List.map (fun (a, b) -> List.map snd a) pairparts)

and cover_pairs basepr pairs =
    cover (fst basepr) (unique (List.map fst pairs)) &&
        List.for_all (cover (snd basepr)) (submatches pairs)

and cover_cons basepr ps =
    let pairs = mkpairs ps
    in let revpair (a, b) = (b, a)
    in
        pairs <> [] &&
        cover_pairs basepr pairs &&
        cover_pairs (revpair basepr) (List.map revpair pairs)

and cover basep ps =
    List.mem Any ps ||
        match basep with
        | Nil -> List.mem Nil ps
        | Any -> List.mem Nil ps && cover_cons (Any, Any) ps
        | Cons (a, b) -> cover_cons (a, b) ps

let exhaust ps =
    cover Any ps

模式就像一棵树，内部节点中有 Cons ，而 Nil 或 >叶子上的任何。基本思想是，如果您始终在至少一个模式中达到 Any（无论输入是什么样的），则一组模式是详尽的。在此过程中，您需要在每个点同时看到“零”和“缺点”。如果您在所有模式中的同一位置达到 Nil ，则意味着有一个较长的输入无法与任何模式匹配。另一方面，如果您在所有模式中的同一位置仅看到 Cons，则表明存在一个在该点结束的输入，该输入不会匹配。

困难的部分是检查 Cons 的两个子模式是否详尽。该代码的工作方式与我手动检查时的方式相同：它找到可以在左侧匹配的所有不同子集，然后确保相应的右侧子模式在每种情况下都是详尽的。然后同样的方法左右颠倒。由于我不是专家（对我来说一直更明显），因此可能有更好的方法来做到这一点。

这是一个使用此函数的会话：

# exhaust [Nil];;
- : bool = false
# exhaust [Any];;
- : bool = true
# exhaust [Nil; Cons (Nil, Any); Cons (Any, Nil)];;
- : bool = false
# exhaust [Nil; Cons (Any, Any)];;
- : bool = true
# exhaust [Nil; Cons (Any, Nil); Cons (Any, (Cons (Any, Any)))];;
- : bool = true

我根据 30,000 个随机生成的模式检查了此代码，因此我有信心它是正确的。我希望这些粗浅的观察可能会有一些用处。

Here is some code from a non-expert. It shows what the problem looks like if you restrict your patterns to list constructors. In other words, the patterns can only be used with lists that contain lists. Here are some lists like that: [], [[]], [[];[]].

If you enable -rectypes in your OCaml interpreter, this set of lists has a single type: ('a list) as 'a.

type reclist = ('a list) as 'a

Here's a type for representing patterns that match against the reclist type:

type p = Nil | Any | Cons of p * p

To translate an OCaml pattern into this form, first rewrite using (::). Then replace []
with Nil, _ with Any, and (::) with Cons. So the pattern [] :: _ translates to
Cons (Nil, Any)

Here is a function that matches a pattern against a reclist:

let rec pmatch (p: p) (l: reclist) =
    match p, l with
    | Any, _ -> true
    | Nil, [] -> true
    | Cons (p', q'), h :: t -> pmatch p' h && pmatch q' t
    | _ -> false

Here's how it looks in use. Note the use of -rectypes:

$ ocaml312 -rectypes
        Objective Caml version 3.12.0

# #use "pat.ml";;
type p = Nil | Any | Cons of p * p
type reclist = 'a list as 'a
val pmatch : p -> reclist -> bool = <fun>
# pmatch (Cons(Any, Nil)) [];;
- : bool = false
# pmatch (Cons(Any, Nil)) [[]];;
- : bool = true
# pmatch (Cons(Any, Nil)) [[]; []];;
- : bool = false
# pmatch (Cons (Any, Nil)) [ [[]; []] ];;
- : bool = true
#

The pattern Cons (Any, Nil) should match any list of length 1, and it definitely seems to be working.

So then it seems fairly straightforward to write a function intersect that takes two patterns and returns a pattern that matches the intersection of what is matched by the two patterns. Since the patterns might not intersect at all, it returns None when there's no intersection and Some p otherwise.

let rec inter_exc pa pb =
    match pa, pb with
    | Nil, Nil -> Nil
    | Cons (a, b), Cons (c, d) -> Cons (inter_exc a c, inter_exc b d)
    | Any, b -> b
    | a, Any -> a
    | _ -> raise Not_found

let intersect pa pb =
    try Some (inter_exc pa pb) with Not_found -> None

let intersectn ps =
    (* Intersect a list of patterns.
     *)
    match ps with
    | [] -> None
    | head :: tail ->
        List.fold_left
            (fun a b -> match a with None -> None | Some x -> intersect x b)
            (Some head) tail

As a simple test, intersect the pattern [_, []] against the pattern [[], _].
The former is the same as _ :: [] :: [], and so is Cons (Any, Cons (Nil, Nil)).
The latter is the same as [] :: _ :: [], and so is Cons (Nil, (Cons (Any, Nil)).

# intersect (Cons (Any, Cons (Nil, Nil))) (Cons (Nil, Cons (Any, Nil)));;
- : p option = Some (Cons (Nil, Cons (Nil, Nil)))

The result looks pretty right: [[], []].

It seems like this is enough to answer the question about overlapping patterns. Two patterns overlap if their intersection is not None.

For exhaustiveness you need to work with a list of patterns. Here is a function
exhaust that tests whether a given list of patterns is exhaustive:

let twoparts l =
    (* All ways of partitioning l into two sets.
     *)
    List.fold_left
        (fun accum x ->
            let absent = List.map (fun (a, b) -> (a, x :: b)) accum
            in
                List.fold_left (fun accum (a, b) -> (x :: a, b) :: accum)
                    absent accum)
        [([], [])] l

let unique l =
   (* Eliminate duplicates from the list.  Makes things
    * faster.
    *)
   let rec u sl=
        match sl with
        | [] -> []
        | [_] -> sl
        | h1 :: ((h2 :: _) as tail) ->
            if h1 = h2 then u tail else h1 :: u tail
    in
        u (List.sort compare l)

let mkpairs ps =
    List.fold_right
        (fun p a -> match p with Cons (x, y) -> (x, y) :: a | _ -> a) ps []

let rec submatches pairs =
    (* For each matchable subset of fsts, return a list of the
     * associated snds.  A matchable subset has a non-empty
     * intersection, and the intersection is not covered by the rest of
     * the patterns.  I.e., there is at least one thing that matches the
     * intersection without matching any of the other patterns.
     *)
    let noncovint (prs, rest) =
        let prs_firsts = List.map fst prs in
        let rest_firsts = unique (List.map fst rest) in
        match intersectn prs_firsts with
        | None -> false
        | Some i -> not (cover i rest_firsts)
    in let pairparts = List.filter noncovint (twoparts pairs)
    in
        unique (List.map (fun (a, b) -> List.map snd a) pairparts)

and cover_pairs basepr pairs =
    cover (fst basepr) (unique (List.map fst pairs)) &&
        List.for_all (cover (snd basepr)) (submatches pairs)

and cover_cons basepr ps =
    let pairs = mkpairs ps
    in let revpair (a, b) = (b, a)
    in
        pairs <> [] &&
        cover_pairs basepr pairs &&
        cover_pairs (revpair basepr) (List.map revpair pairs)

and cover basep ps =
    List.mem Any ps ||
        match basep with
        | Nil -> List.mem Nil ps
        | Any -> List.mem Nil ps && cover_cons (Any, Any) ps
        | Cons (a, b) -> cover_cons (a, b) ps

let exhaust ps =
    cover Any ps

A pattern is like a tree with Cons in the internal nodes and Nil or Any at the leaves. The basic idea is that a set of patterns is exhaustive if you always reach Any in at least one of the patterns (no matter what the input looks like). And along the way, you need to see both Nil and Cons at each point. If you reach Nil at the same spot in all the patterns, it means there's a longer input that won't be matched by any of them. On the other hand, if you see just Cons at the same spot in all the patterns, there's an input that ends at that point that won't be matched.

The difficult part is checking for exhaustiveness of the two subpatterns of a Cons. This code works the way I do when I check by hand: it finds all the different subsets that could match at the left, then makes sure that the corresponding right subpatterns are exhaustive in each case. Then the same with left and right reversed. Since I'm a nonexpert (more obvious to me all the time), there are probably better ways to do this.

Here is a session with this function:

# exhaust [Nil];;
- : bool = false
# exhaust [Any];;
- : bool = true
# exhaust [Nil; Cons (Nil, Any); Cons (Any, Nil)];;
- : bool = false
# exhaust [Nil; Cons (Any, Any)];;
- : bool = true
# exhaust [Nil; Cons (Any, Nil); Cons (Any, (Cons (Any, Any)))];;
- : bool = true

I checked this code against 30,000 randomly generated patterns, and so I have some confidence that it's right. I hope these humble observations may prove to be of some use.

回复收藏 0 原文