What C integration problems arise with stackless VM implementations?

Posted on 2024-07-18 11:34:09

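By a stackless VM I mean an implementation that maintains its own stack on the heap instead of using the system "C stack". This has a lot of advantages, such as continuations and serializable state, but it also has some disadvantages when it comes to C bindings, especially for C-VM-C kinds of callbacks (or VM-C-VM).

The question is: what exactly are these disadvantages? Can anyone give a good example of a real problem?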


贩梦商人 2024-07-25 11:34:09


It sounds like you're already familiar with some of the disadvantages and the advantages.

Some others:
a) Makes it possible to support proper tail call optimization even if the underlying implementation does not have any support for it
b) Easier to construct things like a language level "stack trace"
c) Easier to add proper continuations, as you pointed out
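Point a) can be illustrated with trampolining. This is a related technique, and a simpler one than the explicit task stack this answer goes on to describe. Instead of making a tail call, a function returns a description of the call, and a driver loop performs it. What follows is a minimal sketch under that assumption. `Bounce`, `Done`, `More` and `EvenOdd` are hypothetical names and are not part of the interpreter discussed here.

```csharp
using System;

// Hypothetical trampoline types: a step either finishes with a value (Done)
// or hands back the next step to run (More) instead of calling it directly.
public abstract class Bounce<T>
{
    // Runs steps in a loop: each tail call becomes one loop iteration,
    // so the .NET call stack does not grow with the number of calls.
    public static T Run(Bounce<T> step)
    {
        while (step is More<T> more)
            step = more.Next();
        return ((Done<T>)step).Value;
    }
}

public sealed class Done<T> : Bounce<T>
{
    public T Value { get; }
    public Done(T value) { Value = value; }
}

public sealed class More<T> : Bounce<T>
{
    public Func<Bounce<T>> Next { get; }
    public More(Func<Bounce<T>> next) { Next = next; }
}

public static class EvenOdd
{
    // Mutually tail-recursive functions: each "call" is returned as a More step.
    public static Bounce<bool> IsEven(int n) =>
        n == 0 ? (Bounce<bool>)new Done<bool>(true) : new More<bool>(() => IsOdd(n - 1));

    public static Bounce<bool> IsOdd(int n) =>
        n == 0 ? (Bounce<bool>)new Done<bool>(false) : new More<bool>(() => IsEven(n - 1));
}
```

Because `Run` loops rather than recursing, `Bounce<bool>.Run(EvenOdd.IsEven(1_000_000))` completes without growing the .NET stack. The same pair written as direct calls is not guaranteed to, because C# does not promise tail call elimination.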

I recently wrote a simple "Scheme" interpreter in C#, which initially used the .NET stack. I then re-wrote it to use an explicit stack - so perhaps the following will help you:

The first version used the implicit .NET runtime stack...

Initially, it was just a class hierarchy, with different forms (Lambda, Let, etc.) being implementations of the following interface:

// A "form" is an expression that can be evaluated with
// respect to an environment
// e.g.
// "(* x 3)"
// "x"
// "3"
public interface IForm
{
    object Evaluate(IEnvironment environment);
}

IEnvironment looked as you'd expect:

/// <summary>
/// Fundamental interface for resolving "symbols" subject to scoping.
/// </summary>
public interface IEnvironment
{
    object Lookup(string name);
    IEnvironment Extend(string name, object value);
}
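The answer doesn't show how IEnvironment is implemented. One plausible option is an immutable linked list of bindings, sketched below. `LinkedEnvironment` and its members are hypothetical. `Extend` returns a new environment instead of mutating the current one, so a closure that captured an older environment keeps seeing its original bindings.

```csharp
using System;

public interface IEnvironment
{
    object Lookup(string name);
    IEnvironment Extend(string name, object value);
}

// Hypothetical implementation: each frame holds one binding and a parent link.
public sealed class LinkedEnvironment : IEnvironment
{
    public static readonly LinkedEnvironment Empty = new LinkedEnvironment(null, null, null);

    private readonly string _name;
    private readonly object _value;
    private readonly LinkedEnvironment _parent; // null only for Empty

    private LinkedEnvironment(string name, object value, LinkedEnvironment parent)
    {
        _name = name; _value = value; _parent = parent;
    }

    // Never mutates: the new frame points at this one.
    public IEnvironment Extend(string name, object value) =>
        new LinkedEnvironment(name, value, this);

    // Walk outward from the innermost binding; the first match shadows outer ones.
    public object Lookup(string name)
    {
        for (var env = this; env._parent != null; env = env._parent)
            if (env._name == name)
                return env._value;
        throw new InvalidOperationException($"Unbound symbol: {name}");
    }
}
```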

For adding "builtins" to my Scheme interpreter, I initially had the following interface:

/// <summary>
/// A function is either a builtin function (i.e. implemented directly in CSharp)
/// or something that's been created by the Lambda form.
/// </summary>
public interface IFunction
{
    object Invoke(object[] args);
}
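The following minimal sketch makes the first, implicit-stack design concrete by evaluating "(* x 3)". Only IForm, IEnvironment and IFunction come from the answer. `DictEnvironment`, `Multiply`, `Constant`, `Variable` and `Call` are hypothetical names for illustration. Note that `Call.Evaluate` recurses into its sub-forms, so every level of nesting in the Scheme expression becomes a .NET stack frame.

```csharp
using System.Collections.Generic;
using System.Linq;

public interface IEnvironment
{
    object Lookup(string name);
    IEnvironment Extend(string name, object value);
}

public interface IForm { object Evaluate(IEnvironment environment); }

public interface IFunction { object Invoke(object[] args); }

// Hypothetical dictionary-backed environment (copies on Extend for simplicity).
public sealed class DictEnvironment : IEnvironment
{
    private readonly Dictionary<string, object> _vars;
    public DictEnvironment(Dictionary<string, object> vars) { _vars = vars; }
    public object Lookup(string name) => _vars[name];
    public IEnvironment Extend(string name, object value) =>
        new DictEnvironment(new Dictionary<string, object>(_vars) { [name] = value });
}

// Hypothetical builtin for "*", implemented directly in C#.
public sealed class Multiply : IFunction
{
    public object Invoke(object[] args) => args.Aggregate(1, (acc, a) => acc * (int)a);
}

public sealed class Constant : IForm
{
    private readonly object _value;
    public Constant(object value) { _value = value; }
    public object Evaluate(IEnvironment env) => _value;
}

public sealed class Variable : IForm
{
    private readonly string _name;
    public Variable(string name) { _name = name; }
    public object Evaluate(IEnvironment env) => env.Lookup(_name);
}

// "(f a b ...)": evaluating the operator and each argument is an ordinary
// recursive C# call, so expression depth maps onto the .NET call stack.
public sealed class Call : IForm
{
    private readonly IForm _op;
    private readonly IForm[] _args;
    public Call(IForm op, params IForm[] args) { _op = op; _args = args; }
    public object Evaluate(IEnvironment env)
    {
        var f = (IFunction)_op.Evaluate(env);
        var values = _args.Select(a => a.Evaluate(env)).ToArray();
        return f.Invoke(values);
    }
}
```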

That was when it used the implicit .NET runtime stack. There was definitely less code, but it was impossible to add things like proper tail recursion and, most importantly, it was awkward for my interpreter to provide a "language level" stack trace in the case of a runtime error.

So I rewrote it to have an explicit (heap allocated) stack.

My "IFunction" interface had to change to the following, so that I could implement things like "map" and "apply", which call back into the Scheme interpreter:

/// <summary>
/// A function that wishes to use the thread state to
/// evaluate its arguments. The function should either:
/// a) Push tasks (via threadState.PushTask) which, when evaluated, will
///   eventually push the result (via threadState.PushResult), or
/// b) Push its result directly (via threadState.PushResult)
/// </summary>
public interface IStackFunction
{
    void Evaluate(IThreadState threadState, object[] args);
}

And IForm changed to:

public interface IForm
{
    void Evaluate(IEnvironment environment, IThreadState s);
}

Where IThreadState is as follows:

/// <summary>
/// The state of the interpreter.
/// The implementation of a task which takes some arguments,
/// call them "x" and "y", and which returns a value "z",
/// should follow the following protocol:
/// a) Call "PopResult" to get x and y
/// b) Either
///   i) push "z" directly onto IThreadState using PushResult OR
///   ii) push a "task" on to the stack which will result in "z" being
///       pushed on to the result stack.
/// 
/// Note that ii) is "recursive" in its definition - that is, a task
/// that is pushed on to the task stack may in turn push other tasks
/// on the task stack which, when evaluated, 
/// ... ultimately will end up pushing the result via PushResult.
/// </summary>
public interface IThreadState
{
    void PushTask(ITask task);
    object PopResult();
    void PushResult(object result);
}
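The protocol in the comment above can be sketched with two hypothetical tasks. `AddTask` follows case b) i) and pushes its result directly. `DoubleTask` follows case b) ii) and pushes another task that eventually produces the result. `ThreadState` here is an assumed implementation built on two `Stack<T>` fields, not the author's. `ITask` is repeated so the block compiles on its own.

```csharp
using System.Collections.Generic;

// Interfaces from the answer, repeated so this block compiles on its own.
public interface ITask
{
    void Execute(IThreadState s);
}

public interface IThreadState
{
    void PushTask(ITask task);
    object PopResult();
    void PushResult(object result);
}

// Hypothetical concrete state: a task stack and a result stack on the heap.
public sealed class ThreadState : IThreadState
{
    private readonly Stack<ITask> _tasks = new Stack<ITask>();
    private readonly Stack<object> _results = new Stack<object>();

    public void PushTask(ITask task) => _tasks.Push(task);
    public ITask PopTask() => _tasks.Pop();
    public object PopResult() => _results.Pop();
    public void PushResult(object result) => _results.Push(result);
}

// Case b) i): pop the inputs x and y, push z directly.
// y comes off first because it was pushed after x.
public sealed class AddTask : ITask
{
    public void Execute(IThreadState s)
    {
        var y = (int)s.PopResult();
        var x = (int)s.PopResult();
        s.PushResult(x + y);
    }
}

// Case b) ii): push a task that will eventually push the result.
// Hypothetical "double": duplicate the input, then defer to AddTask.
public sealed class DoubleTask : ITask
{
    public void Execute(IThreadState s)
    {
        var x = s.PopResult();
        s.PushResult(x);
        s.PushResult(x);
        s.PushTask(new AddTask());
    }
}
```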

And ITask is:

public interface ITask
{
    void Execute(IThreadState s);
}

And my main "event" loop is:

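ThreadState threadState = new ThreadState();
threadState.PushTask(null); // sentinel: popping it ends the loop below
threadState.PushTask(new EvaluateForm(f, environment)); // evaluate form f in environment
ITask next = null;

// Each task runs to completion and returns here, pushing further tasks
// and/or results; nesting depth lives on the heap-allocated task stack.
while ((next = threadState.PopTask()) != null)
    next.Execute(threadState);

return threadState.PopResult(); // Get what EvaluateForm evaluated to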

EvaluateForm is just a task that calls IForm.Evaluate with a specific environment.
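Putting the pieces together, here is a minimal, self-contained sketch of an explicit-stack evaluator for expressions like "(* x 3)". The interfaces follow the answer, with IEnvironment trimmed to Lookup. `ThreadState`, `DictEnvironment`, `Constant`, `Variable`, `ApplyTask`, `Call`, `Multiply` and `Interpreter` are hypothetical. The key point is that `Call.Evaluate` only pushes tasks, so nesting depth is held in the heap-allocated task stack rather than in the .NET call stack.

```csharp
using System.Collections.Generic;
using System.Linq;

// Interfaces from the answer (IEnvironment trimmed to Lookup for brevity).
public interface IEnvironment { object Lookup(string name); }
public interface ITask { void Execute(IThreadState s); }
public interface IThreadState
{
    void PushTask(ITask task);
    object PopResult();
    void PushResult(object result);
}
public interface IForm { void Evaluate(IEnvironment environment, IThreadState s); }
public interface IStackFunction { void Evaluate(IThreadState threadState, object[] args); }

// Hypothetical concrete state: a task stack and a result stack, both on the heap.
public sealed class ThreadState : IThreadState
{
    private readonly Stack<ITask> _tasks = new Stack<ITask>();
    private readonly Stack<object> _results = new Stack<object>();
    public void PushTask(ITask task) => _tasks.Push(task);
    public ITask PopTask() => _tasks.Pop();
    public object PopResult() => _results.Pop();
    public void PushResult(object result) => _results.Push(result);
}

// Hypothetical environment: a dictionary that also implements IEnvironment.
public sealed class DictEnvironment : Dictionary<string, object>, IEnvironment
{
    public object Lookup(string name) => this[name];
}

// A task that calls IForm.Evaluate with a specific environment.
public sealed class EvaluateForm : ITask
{
    private readonly IForm _form;
    private readonly IEnvironment _env;
    public EvaluateForm(IForm form, IEnvironment env) { _form = form; _env = env; }
    public void Execute(IThreadState s) => _form.Evaluate(_env, s);
}

// Leaf forms push their value straight onto the result stack.
public sealed class Constant : IForm
{
    private readonly object _value;
    public Constant(object value) { _value = value; }
    public void Evaluate(IEnvironment env, IThreadState s) => s.PushResult(_value);
}
public sealed class Variable : IForm
{
    private readonly string _name;
    public Variable(string name) { _name = name; }
    public void Evaluate(IEnvironment env, IThreadState s) => s.PushResult(env.Lookup(_name));
}

// Pops argCount arguments (the last one is on top), then the operator, and calls it.
public sealed class ApplyTask : ITask
{
    private readonly int _argCount;
    public ApplyTask(int argCount) { _argCount = argCount; }
    public void Execute(IThreadState s)
    {
        var args = new object[_argCount];
        for (int i = _argCount - 1; i >= 0; i--) args[i] = s.PopResult();
        ((IStackFunction)s.PopResult()).Evaluate(s, args);
    }
}

// "(op a b ...)" schedules its sub-evaluations instead of recursing into them.
// Tasks run LIFO, so ApplyTask is pushed first and the operator last.
public sealed class Call : IForm
{
    private readonly IForm _op;
    private readonly IForm[] _args;
    public Call(IForm op, params IForm[] args) { _op = op; _args = args; }
    public void Evaluate(IEnvironment env, IThreadState s)
    {
        s.PushTask(new ApplyTask(_args.Length));
        for (int i = _args.Length - 1; i >= 0; i--) s.PushTask(new EvaluateForm(_args[i], env));
        s.PushTask(new EvaluateForm(_op, env));
    }
}

// A builtin that pushes its result directly (option b of IStackFunction).
public sealed class Multiply : IStackFunction
{
    public void Evaluate(IThreadState s, object[] args) =>
        s.PushResult(args.Aggregate(1, (acc, a) => acc * (int)a));
}

public static class Interpreter
{
    public static object Run(IForm f, IEnvironment environment)
    {
        var threadState = new ThreadState();
        threadState.PushTask(null); // sentinel that ends the loop
        threadState.PushTask(new EvaluateForm(f, environment));
        ITask next;
        while ((next = threadState.PopTask()) != null)
            next.Execute(threadState);
        return threadState.PopResult();
    }
}
```

For example, with `x` bound to 7, running `(* x (* 3 2))` yields 42. An expression nested 100,000 levels deep also evaluates, because no C# method recurses per level. A builtin such as "map" or "apply" would take option a) instead and push `EvaluateForm`/`ApplyTask` tasks.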

Personally, I found this new version much "nicer" to work with from an implementation point of view. It's easy to get a stack trace, and it would be easy to make it implement full continuations. I haven't done this yet, though. I'd need to make my "stacks" persistent linked lists rather than using the C# Stack, and have ITask "return" the new ThreadState rather than mutating it, so that I can have a "call-continuation" task.
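The persistent linked-list stack mentioned above might look like the following minimal sketch. `PStack<T>` is a hypothetical name; .NET also ships `System.Collections.Immutable.ImmutableStack<T>`, which is built on the same idea. Push and Pop return new stacks that share structure with the old one, so keeping a reference to an old stack is an O(1) snapshot. That property is what would let a "call-continuation" task capture the interpreter state and later resume from it.

```csharp
using System;

// Hypothetical persistent (immutable) stack. Push and Pop never mutate;
// they return stacks that share their tail with the original.
public sealed class PStack<T>
{
    public static readonly PStack<T> Empty = new PStack<T>(default, null);

    private readonly T _head;
    private readonly PStack<T> _tail; // null only for Empty

    private PStack(T head, PStack<T> tail) { _head = head; _tail = tail; }

    public bool IsEmpty => _tail == null;

    // O(1): the new node points at this stack, which is left untouched.
    public PStack<T> Push(T value) => new PStack<T>(value, this);

    // Returns the top value and the rest of the stack; this stack is unchanged.
    public (T Value, PStack<T> Rest) Pop()
    {
        if (IsEmpty) throw new InvalidOperationException("Pop from an empty stack");
        return (_head, _tail);
    }
}
```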

Basically, you're just less dependent on the underlying language implementation.

About the only downside I can find is performance, but in my case it's just an interpreter, so I don't care that much about performance anyway.

I'd also point you to this very nice article on the benefits of re-writing recursive code as iterative code with a stack, by one of the authors of the KAI C++ compiler: Considering Recursion
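Here is a minimal illustration of that kind of rewrite. It is not code from the article. Summing a tree recursively ties tree depth to call-stack depth. The iterative version keeps pending nodes in a heap-allocated `Stack<Node>`, so a very deep tree cannot overflow the call stack. `Node` and `TreeSum` are hypothetical names.

```csharp
using System.Collections.Generic;

// Hypothetical n-ary tree node.
public sealed class Node
{
    public int Value;
    public List<Node> Children = new List<Node>();
    public Node(int value) { Value = value; }
}

public static class TreeSum
{
    // Recursive: one .NET stack frame per level of the tree.
    public static long Recursive(Node n)
    {
        long sum = n.Value;
        foreach (var c in n.Children) sum += Recursive(c);
        return sum;
    }

    // Iterative: pending work lives in an explicit, heap-allocated stack.
    public static long Iterative(Node root)
    {
        long sum = 0;
        var pending = new Stack<Node>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            var n = pending.Pop();
            sum += n.Value;
            foreach (var c in n.Children) pending.Push(c);
        }
        return sum;
    }
}
```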

白色秋天 2024-07-25 11:34:09

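After an email conversation with Steve Dekorte (author of the Io programming language) and Konstantin Olenin, I found one problem and a (partial) solution to it.

Imagine a call from the VM into a C function that in turn calls back into a VM method. While the VM is executing the callback, part of the VM's state lives outside the VM: on the C stack and in registers. If you save the VM state at that moment, you are guaranteed not to be able to restore it correctly the next time the VM is loaded.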
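Below is a hypothetical C# sketch of the VM-C-VM problem, with a host delegate standing in for the C function. The VM's own state (`_values`) is on the heap and easy to copy. While `CallNative` is active, however, part of the computation lives in host frames that a snapshot cannot capture. This sketch simply refuses to snapshot during that window. `TinyVm` and its members are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical VM whose own state (a value stack) lives on the heap.
public sealed class TinyVm
{
    private readonly Stack<int> _values = new Stack<int>();
    private int _nativeDepth; // how many host ("C") frames are currently active

    public void Push(int v) => _values.Push(v);
    public int Pop() => _values.Pop();

    // VM -> host call; the host may call back into the VM (VM-C-VM).
    public void CallNative(Action<TinyVm> hostFunction)
    {
        _nativeDepth++;
        try { hostFunction(this); }
        finally { _nativeDepth--; }
    }

    // A snapshot copies only the heap stack. While a host frame is active, the
    // host's locals and return address are not part of it, so a restored copy
    // could not resume the computation; this sketch refuses instead.
    public int[] Snapshot()
    {
        if (_nativeDepth > 0)
            throw new InvalidOperationException("Cannot snapshot while host frames are active");
        return _values.ToArray();
    }
}
```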

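The solution is to model the VM as a message-receiving actor: the VM can send asynchronous notifications to native code, and native code can send asynchronous notifications to the VM. In other words, in a single-threaded environment, whenever the VM gains control, no additional state is stored outside it (apart from data that has nothing to do with the VM runtime).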
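The actor-style alternative can be sketched with two mailboxes. The VM never calls host code and waits for a return value. Instead, it posts a request and returns to its loop, and the host replies later with a message. Between `RunUntilIdle` calls no host frame is inside the VM, so a snapshot of its fields and inbox captures everything. `ActorVm`, `Host` and `Message` are hypothetical names. A real C binding would exchange messages through whatever queue the embedding provides.

```csharp
using System.Collections.Generic;

// Hypothetical message type used in both directions.
public sealed class Message
{
    public string Kind { get; }
    public int Payload { get; }
    public Message(string kind, int payload) { Kind = kind; Payload = payload; }
}

// Hypothetical single-threaded actor-style VM: all of its state is in fields.
public sealed class ActorVm
{
    public readonly Queue<Message> Inbox = new Queue<Message>();
    public readonly Queue<Message> ToHost = new Queue<Message>();
    public int Total;

    // Handle every pending message, then give control back to the caller.
    public void RunUntilIdle()
    {
        while (Inbox.Count > 0)
        {
            var m = Inbox.Dequeue();
            if (m.Kind == "start")
                ToHost.Enqueue(new Message("square", m.Payload)); // async request, no callback
            else if (m.Kind == "result")
                Total += m.Payload;
        }
    }

    // Safe between RunUntilIdle calls: no host frame is inside the VM.
    public (int Total, Message[] Pending) Snapshot() => (Total, Inbox.ToArray());
}

public static class Host
{
    // Host side: handle requests and reply by posting to the VM's inbox.
    public static void RunUntilIdle(ActorVm vm)
    {
        while (vm.ToHost.Count > 0)
        {
            var m = vm.ToHost.Dequeue();
            if (m.Kind == "square")
                vm.Inbox.Enqueue(new Message("result", m.Payload * m.Payload));
        }
    }
}
```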

This doesn't mean you can restore the VM state correctly in every case, but at least you can build your own reliable system on top of it.
