当前位置：文江博客话题详情

RUST pattern-matching compiler-construction

匹配表达式（模式匹配）如何不需要运行时类型信息来工作？

发布于 2025-01-11 20:17:07 字数 2482 浏览 0 评论 0 原文

匹配 表达式在高层实现？为了让编译器知道如何将某些代码片段定向到一个分支与另一个分支，并在编译时弄清楚它，在幕后会发生什么？我不明白如果不存储运行时使用的类型信息，这怎么可能。

像这样的例子：

fn tree_weight_v1(t: BinaryTree) -> i32 {
    match t {
        BinaryTree::Leaf(payload) => payload,
        BinaryTree::Node(left, payload, right) => {
            tree_weight_v1(*left) + payload + tree_weight_v1(*right)
        }
    }
}

/// Returns tree that Looks like:
///
///      +----(4)---+
///      |          |
///   +-(2)-+      [5]
///   |     |   
///  [1]   [3]
///
fn sample_tree() -> BinaryTree {
    let l1 = Box::new(BinaryTree::Leaf(1));
    let l3 = Box::new(BinaryTree::Leaf(3));
    let n2 = Box::new(BinaryTree::Node(l1, 2, l3));
    let l5 = Box::new(BinaryTree::Leaf(5));

    BinaryTree::Node(n2, 4, l5)
}

#[test]
fn tree_demo_1() {
    let tree = sample_tree();
    assert_eq!(tree_weight_v1(tree), (1 + 2 + 3) + 4 + 5);
}

通过运行时类型信息，我的意思是你实际上存储了一个指向类型定义的指针或类似的东西，所以当运行时执行到达 match 时，它只是检查 value.type ==给定_模式_表达式或它的一些变体。据我了解，事实并非如此。也就是说，Rust 不会在运行时将类型与结构/记录一起存储。根据我的理解，它是在编译时计算的。

如果我在每个结构/记录上存储.type，我可能可以实现这种功能。然后您只需查找类型并查看是否匹配即可。但是，我看不到在编译器级别实现此功能的方法，因此它在运行时提前计算出匹配。

我认为它发生在编译时，因为像是否可以对 Rust 中的泛型进行编译时类型检查？或运行时特征实现检查？，以及许多其他帖子似乎暗示了一切发生在编译时，除了极少数情况下您可以选择加入运行时检查。 （这是我这几天总结几十篇文章的理解）。或者另一个引用是：

确实需要检查给定的二叉树是叶子还是节点，但编译器静态地确保完成此类检查：您不能意外地将叶子的数据解释为节点，反之亦然。

对我来说，编译器静态地计算出我上面发布的 BinaryTree 案例中的模式匹配。不知何故。

将模式匹配与类型检查集成是一篇关于编程语言的随机帖子这与我的理解一致，您需要运行时类型信息来完成此操作。所以我很困惑。

如果类型是模式，则您需要能够在运行时确定值的类型。一般来说，这需要运行时类型信息（许多语言故意删除这些信息）；但它还需要一些包含这两种情况的超类型（许多语言故意没有）。

为 Rust

原文

How is the match expression implemented at a high level? What happens under the hood for the compiler to know how to direct certain strains of code to one branch vs. the other, figuring it out at compile time? I don't see how this is possible without storing type information for use at runtime.

Something like this example:

fn tree_weight_v1(t: BinaryTree) -> i32 {
    match t {
        BinaryTree::Leaf(payload) => payload,
        BinaryTree::Node(left, payload, right) => {
            tree_weight_v1(*left) + payload + tree_weight_v1(*right)
        }
    }
}

/// Returns tree that Looks like:
///
///      +----(4)---+
///      |          |
///   +-(2)-+      [5]
///   |     |   
///  [1]   [3]
///
fn sample_tree() -> BinaryTree {
    let l1 = Box::new(BinaryTree::Leaf(1));
    let l3 = Box::new(BinaryTree::Leaf(3));
    let n2 = Box::new(BinaryTree::Node(l1, 2, l3));
    let l5 = Box::new(BinaryTree::Leaf(5));

    BinaryTree::Node(n2, 4, l5)
}

#[test]
fn tree_demo_1() {
    let tree = sample_tree();
    assert_eq!(tree_weight_v1(tree), (1 + 2 + 3) + 4 + 5);
}

By runtime type information, I mean you literally store a pointer to the type definition or something like that, so under the hood when the runtime execution reaches the match, it simply checks value.type == given_pattern_expression or some variant of that. From my understanding, this is not the case. That is, Rust doesn't store the type with the structs/records in the runtime. From my understanding, somehow it is computed at compile time.

I could probably implement this sort of feature if I stored the .type on each struct/record. Then you just simply look up the type and see if it matches. However, I cannot see a way to implement this at the compiler level, so it figures out matching in advance of runtime.

I think it happens at compile time because of posts like Is it possible to do a compile time type check on a generic in Rust?, or Runtime trait implementation checking?, amongst many other posts which seems to suggest everything happens at compile time, except for a tiny few cases where you can opt-into runtime checking. (this is my understanding from summarizing dozens of articles the past few days). Or another quote is:

One does need to check whether a given BinaryTree is a Leaf or is a Node, but the compiler statically ensures such checks are done: you cannot accidentally interpret the data of a Leaf as if it were a Node, nor vice versa.

That to me says the compiler statically figures out the pattern matching in the BinaryTree case I posted above. Somehow.

Integrating Pattern Matching with Type Checking is a random post about programming languages that agrees with my understanding, that you need runtime type information to accomplish this. So I am confused.

If types are patterns, you need to be able to determine the type of a value at runtime. In general, this requires runtime type-information (which many languages intentionally erase); but it also requires some supertype that contains the two cases (which many languages intentionally do not have).

Building a runtime reflection system for Rust ????️ (Part 1) also explains how Rust doesn't have runtime reflection, which aids in my thinking that everything happens at compile time.

https://oswalt.dev/2021/06/polymorphism-in-rust/

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

遇到 2025-01-18 20:17:07

match 表达式不需要需要运行时类型信息；由于 match 只接受单个已知类型的单个表达式，根据定义，它可以利用编译时信息。

另请参阅：

`匹配`在编译时与运行时

在编译时，每个match 表达式将被验证为详尽：处理值的所有可能形状。

在运行时，代码将确定执行哪个特定的匹配臂。您可以将 match 视为通过奇特的 if-else 链实现的。

由于我们人类在沟通时往往不是非常精确，因此某些资源很可能模糊了这两个方面之间的界限。

具体关注枚举

枚举变体不是独立的类型。也就是说，给定枚举 Foo，Foo::Bar 不是类型 — 它是 Foo 类型的值。这与 false 不是类型相同 - 它是 bool 类型的值。相同的逻辑适用于i32（类型）和42（值）。

在大多数情况下，枚举被实现为与每个枚举变体可能的值相对应的大量字节，每个变体的数据彼此分层。这称为联合。

然后在这堆字节旁边添加一个判别式。这是一个整数值，指定该值是哪个变体。添加判别式使其成为标记联合。

概念上类似于此伪 Rust：

if discriminant(&enum_value) == VARIANT_1_DISCR {
    let data = mem::transmute(data(&enum_value));
} else if discriminant(&enum_value) == VARIANT_2_DISCR {
    let data = mem::transmute(data(&enum_value));
}

另请参阅：

为什么枚举需要额外的内存大小？

枚举上的匹配在

Rust 为何没有运行时反射

我不同意它没有有它，但它默认肯定没有它。 Rust 中的运行时类型信息通常会利用 Any< /code> 特征：

fn thing(value: &dyn std::any::Any) {
    if let Some(s) = value.downcast_ref::<String>() {
        dbg!(s.len());
    } else if let Some(i) = value.downcast_ref::<i32>() {
        dbg!(i + 1);
    }
}

匹配时缺少 Any 特征是一个额外的提示，表明不存在运行时类型信息。

其他语言

您需要能够在运行时确定值的类型

只有当你允许一个值有多种类型时，这才是正确的——Rust 不允许，因为它是一种静态类型语言。在动态类型语言中，您想要使用类型来匹配值，您确实需要拥有一定数量的运行时类型信息。

A match expression does not need runtime type information; as a match only accepts a single expression of a single known type, by definition it can leverage compile time information.

`match` at compile time vs runtime

At compile time, every match expression will be verified to be exhaustive: all possible shapes of the value are handled.

At run time, the code will determine which specific match arm is executed. You can think of a match as implemented via a fancy if-else chain.

As we humans tend to be not-extremely-precise when communicating, it's highly likely that some resources blur the line between these two aspects.

Concretely focusing on an enum

Enum variants are not standalone types. That is, given an enum Foo, Foo::Bar is not a type — it's a value of type Foo. This is the same as how false is not a type — it's a value of type bool. The same logic applies for i32 (type) and 42 (value).

In most cases, enums are implemented as a sea of bytes that correspond to the values each enum variant might be, with each variant's data layered on top of each other. This is known as a union.

Then a discriminant is added next to this soup of bytes. This is an integer value that specifies which variant the value is. Adding the discriminant makes it into a tagged union.

Matching on an enum is conceptually similar to this pseudo-Rust:

if discriminant(&enum_value) == VARIANT_1_DISCR {
    let data = mem::transmute(data(&enum_value));
} else if discriminant(&enum_value) == VARIANT_2_DISCR {
    let data = mem::transmute(data(&enum_value));
}

Reflection

how Rust doesn't have runtime reflection

I wouldn't agree that it doesn't have it, but it certainly doesn't have it by default. Runtime type information in Rust will usually leverage the Any trait:

fn thing(value: &dyn std::any::Any) {
    if let Some(s) = value.downcast_ref::<String>() {
        dbg!(s.len());
    } else if let Some(i) = value.downcast_ref::<i32>() {
        dbg!(i + 1);
    }
}

The absence of the Any trait when matching is an additional hint that no runtime type information is present.

Other languages

you need to be able to determine the type of a value at runtime

This is only true if you allow a value to be multiple types — Rust does not as it's a statically-typed language. In a dynamically-typed language where you want to match a value using a type, you would indeed need to have some amount of runtime type information.

回复收藏 0 原文

我不在是我 2025-01-18 20:17:07

Rust 使用的可以保存数据的枚举类型通常被称为“标记联合”。一般概念是它们或多或少通过应用区分标签以及可能值的联合来工作。

Rust 中的标记联合

例如，这里我使用匹配在 Rust 中打印一个枚举值：

pub enum Foo {
    Bar(i32),
    Baz {
        a: i32,
        b: i8,
    }
}

match foo {
    Foo::Bar(x) => println!("Bar({})", x),
    Foo::Baz { a, b } => println!("Baz {{ a: {}, b: {} }}", a, b),
}

内存中的标记联合

下面是如何使用没有标记联合的语言（如 C）做同样的事情：

struct Foo {
    enum {
        Bar, Baz
    } tag;

    union {
        // Data for Bar
        int32_t Bar;
        
        // Data for Baz
        struct {
            int32_t a;
            int8_t b;
        } Baz;
    } data;
}

// Check the tag to distinguish variants before accessing data
switch foo.tag {
    case Bar:
        printf("Bar(%d)", foo.data.Bar);
        break;
    case Baz:
        printf("Baz { a: %d, b: %d }", foo.data.Baz.a, foo.data.Baz.b);
}

现在，我可能犯了一个错误或我的 C 版本中有两个，有人可能能够指出一些改进，但我希望它通常能够显示标记联合如何在获取数据时区分变体。

Rust 中的标记联合

我不太了解 Rust 中如何实现标记联合，但很容易想象，Rust 可能会在这些系统上进行扩展，以进一步提高匹配效率。例如，如果您有一个枚举类型嵌套在另一个枚举类型中，您可以将它们的标签合并在一起，这样您就不需要进行多次比较（例如：编译器可能会压缩 Option> ） 到 OptionResultAB 中，只需要一个标签）。

另一个众所周知的例子是 Option> 与 *mut T 具有相同的大小，因为 null 可以用作 None > 价值。还有一些其他类似的情况，例如函数指针。

虽然我只是推测 Rust 中标记联合的形式，但

我确实找到了一个更好的研究解释，它很好地解释了这个概念：

https://patshaughnessy.net/2018/3/15/how-rust-implements-tagged-unions#:~:text=The%20Rust%20compiler%20implements%20tagged,unions%20at%20the%20same% 20次。

The style of enums Rust uses which can hold data are generally referred to as "tagged unions". The general concept is they more or less work by applying a distinguishing tag along with the union of possible values.

Tagged Unions in Rust

For example, here I print an enum value in Rust using a match:

pub enum Foo {
    Bar(i32),
    Baz {
        a: i32,
        b: i8,
    }
}

match foo {
    Foo::Bar(x) => println!("Bar({})", x),
    Foo::Baz { a, b } => println!("Baz {{ a: {}, b: {} }}", a, b),
}

Tagged Unions in Memory

And here is how you might do the same thing using a language without tagged unions like C:

struct Foo {
    enum {
        Bar, Baz
    } tag;

    union {
        // Data for Bar
        int32_t Bar;
        
        // Data for Baz
        struct {
            int32_t a;
            int8_t b;
        } Baz;
    } data;
}

// Check the tag to distinguish variants before accessing data
switch foo.tag {
    case Bar:
        printf("Bar(%d)", foo.data.Bar);
        break;
    case Baz:
        printf("Baz { a: %d, b: %d }", foo.data.Baz.a, foo.data.Baz.b);
}

Now, I probably made a mistake or two in my C version and someone will probably be able to point out some improvements, but I hope it generally shows how a tagged union can distinguish between variants when fetching data.

Tagged Unions in Rust

I don't know much about how tagged unions are implemented in rust, but it is easy to imagine that rust could likely expand on these systems to further increase the efficiency of the match. For instance if you have one enum type nested in another you may be able to merge their tags together so you don't need to do multiple comparisons (Ex: The compiler might compress Option<Result<A, B>> into a OptionResultAB which only requires a single tag).

Another well known example is how Option<NonNull<T>> has the same size as *mut T since null can be used as the None value. There are also a few other similar cases like function pointers.

References

While I am just speculating on the form of tagged unions in Rust, I did find a better researched explanation which does a good job explaining the concept:

https://patshaughnessy.net/2018/3/15/how-rust-implements-tagged-unions#:~:text=The%20Rust%20compiler%20implements%20tagged,unions%20at%20the%20same%20time.

回复收藏 0 原文

~没有更多了~

关于作者

阿楠

暂无简介

文章

25 人气

关注发私信

友情链接

文江博客

匹配表达式（模式匹配）如何不需要运行时类型信息来工作？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

`匹配`在编译时与运行时

具体关注枚举

枚举上的匹配在

其他语言

`match` at compile time vs runtime

Concretely focusing on an enum

Reflection

Other languages

Rust 中的标记联合

内存中的标记联合

Rust 中的标记联合

虽然我只是推测 Rust 中标记联合的形式，但

Tagged Unions in Rust

Tagged Unions in Memory

Tagged Unions in Rust

References

关于作者

相关话题

热门标签

推荐作者

尘曦

在梵高的星空下

善良天后

韬韬不绝

qq_CgiN62

不美如何

友情链接

匹配表达式（模式匹配）如何不需要运行时类型信息来工作？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

匹配在编译时与运行时

具体关注枚举

枚举上的匹配在

其他语言

match at compile time vs runtime

Concretely focusing on an enum

Reflection

Other languages

Rust 中的标记联合

内存中的标记联合

Rust 中的标记联合

虽然我只是推测 Rust 中标记联合的形式，但

Tagged Unions in Rust

Tagged Unions in Memory

Tagged Unions in Rust

References

关于作者

相关话题

热门标签

推荐作者

尘曦

在梵高的星空下

善良天后

韬韬不绝

qq_CgiN62

不美如何

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

`匹配`在编译时与运行时

`match` at compile time vs runtime