How do interpreters written in C and C++ work? Binding identifiers to C(++) functions

Posted 2024-08-27 11:48:59

I'm talking about C and/or C++ here because these are the only languages I know of that are used for interpreters and where the following could be a problem:

If we have an interpreted language X, how can a library written for it add functions to the language that can then be called from programs written in that language?

PHP example:

substr( $str, 5, 10 );
  • How is the function substr added to the "function pool" of PHP so it can be called from within scripts?

It is easy for PHP to store all registered function names in an array and search through it when a function is called in a script. However, as there obviously is no eval in C(++), how can the function then be called? I assume PHP doesn't have 100MB of code like:

if( identifier == "substr" )
{
   return PHP_SUBSTR(...);
} else if( ... ) {
   ...
}

Ha ha, that would be pretty funny. I hope you have understood my question so far.

  • How do interpreters written in C/C++ solve this problem?
  • How can I solve this for my own experimental toy interpreter written in C++?

Comments (4)

眼藏柔 2024-09-03 11:48:59

Actually, scripting languages do something like what you mentioned.
They wrap functions and register those functions with the interpreter engine.

Lua sample:

static int io_read (lua_State *L) {
  return g_read(L, getiofile(L, IO_INPUT), 1);
}


static int f_read (lua_State *L) {
  return g_read(L, tofile(L), 2);
}
...
static const luaL_Reg flib[] = {
  {"close", io_close},
  {"flush", f_flush},
  {"lines", f_lines},
  {"read", f_read},
  {"seek", f_seek},
  {"setvbuf", f_setvbuf},
  {"write", f_write},
  {"__gc", io_gc},
  {"__tostring", io_tostring},
  {NULL, NULL}
};
...
luaL_register(L, NULL, flib);  /* file methods */
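
For a toy interpreter in C++, the same registration-table idea might look roughly like the sketch below. All names here (Value, NativeFn, NativeReg, Interpreter, stringLib) are hypothetical and are not part of Lua or PHP; the sketch only assumes that script values can be modelled as strings.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy-interpreter types; none of these names come from Lua or PHP.
using Value = std::string;
using NativeFn = Value (*)(const std::vector<Value>& args);

struct NativeReg {
    const char* name;  // name visible to scripts
    NativeFn fn;       // C++ function that implements it
};

class Interpreter {
public:
    // Copy every {name, fn} pair into the lookup table,
    // stopping at the {nullptr, nullptr} sentinel.
    void registerAll(const NativeReg* regs) {
        for (; regs->name != nullptr; ++regs) {
            assert(regs->fn != nullptr);
            functions_[regs->name] = regs->fn;
        }
    }

    // Returns nullptr when no function is registered under that name.
    NativeFn find(const std::string& name) const {
        auto it = functions_.find(name);
        return it == functions_.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<std::string, NativeFn> functions_;
};

// Example library: two native functions and their registration table.
static Value nativeUpper(const std::vector<Value>& args) {
    Value out = args.at(0);
    for (char& c : out) {
        if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
    }
    return out;
}

static Value nativeLength(const std::vector<Value>& args) {
    return std::to_string(args.at(0).size());
}

static const NativeReg stringLib[] = {
    {"upper", nativeUpper},
    {"length", nativeLength},
    {nullptr, nullptr}
};
```

A library calls registerAll(stringLib) once at load time. When the interpreter later meets a call such as upper(...), it calls find("upper") and invokes the returned pointer, so no chain of string comparisons is needed.
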
苍景流年 2024-09-03 11:48:59

Interpreters probably just keep a hash map from function names to function definitions (which include parameter information, return type, function location/definition, etc.). That way, when your interpreter encounters a function name, it can simply look it up in the hash map. If the name exists, the interpreter uses the function info stored there to evaluate the call.

You obviously need to add provisions for different levels of scope, etc. but that's the gist of it.
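
One way to picture the hash map plus the scope provision is a chain of tables, one per scope. This is only a sketch: FunctionInfo, Scope, and the bodyId field are made-up names, and a real interpreter would store much more per function.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical description of a callable; a real interpreter would store more.
struct FunctionInfo {
    int arity;            // number of parameters expected
    std::string returns;  // name of the return type
    int bodyId;           // stand-in for "where the definition lives"
};

// One hash map per scope, chained to the enclosing scope.
class Scope {
public:
    explicit Scope(std::shared_ptr<Scope> parent = nullptr)
        : parent_(std::move(parent)) {}

    void define(const std::string& name, const FunctionInfo& info) {
        assert(info.arity >= 0);
        table_[name] = info;
    }

    // Search this scope first, then walk outward.
    // nullptr means the name is not a known function.
    const FunctionInfo* lookup(const std::string& name) const {
        for (const Scope* s = this; s != nullptr; s = s->parent_.get()) {
            auto it = s->table_.find(name);
            if (it != s->table_.end()) return &it->second;
        }
        return nullptr;
    }

private:
    std::unordered_map<std::string, FunctionInfo> table_;
    std::shared_ptr<Scope> parent_;
};
```

With this layout, a function defined in an inner scope shadows one of the same name further out, and built-ins can simply live in the outermost scope.
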

掩饰不了的爱 2024-09-03 11:48:59

Pretty much all compilers have a "symbol table" that they use to look up what an identifier represents. The symbol table holds function names, variable names, type names, etc. Anything that has a name goes in the symbol table, which is basically a map from names to everything the compiler knows about each name (I'm simplifying here). Then, when the compiler encounters an identifier, it looks it up in the symbol table and finds out that it's a function. In an interpreter, the symbol table has information on where to find the function so that interpretation can continue there. In a compiler, the symbol table has the address where that function will be in the compiled code (or a placeholder to fill in the address later). Assembly can then be produced that essentially says: put the arguments on the stack, and resume execution at some address.

So, for your example, an interpreter would look at

substr( $str, 5, 10 );

and find "substr" in its symbol table:

symbolTableEntry entry = symbolTable["substr"];

From there, it will gather up $str, 5 and 10 as arguments, and look at entry to check that the arguments are valid for the function. Then it will look in entry to find out where to jump to with the marshalled arguments.
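
The lookup, validate, and jump steps could be sketched in C++17 as follows. The types and helpers (Value, Args, SymbolTableEntry, nativeSubstr, callFunction) are assumptions made for illustration, validation is reduced to an argument-count check, and nativeSubstr follows std::string::substr, which only approximates PHP's substr.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// Hypothetical value type for a toy interpreter: an integer or a string.
using Value = std::variant<long, std::string>;
using Args = std::vector<Value>;

struct SymbolTableEntry {
    std::size_t arity;             // how many arguments the function expects
    Value (*target)(const Args&);  // where to "jump": the native implementation
};

// Native implementation behind the script-level name "substr".
static Value nativeSubstr(const Args& args) {
    const std::string& s = std::get<std::string>(args[0]);
    std::size_t start = static_cast<std::size_t>(std::get<long>(args[1]));
    std::size_t len = static_cast<std::size_t>(std::get<long>(args[2]));
    if (start > s.size()) return std::string();
    return s.substr(start, len);
}

// Look the name up, validate the argument count, then dispatch through the pointer.
Value callFunction(const std::unordered_map<std::string, SymbolTableEntry>& symbolTable,
                   const std::string& name, const Args& args) {
    auto it = symbolTable.find(name);
    if (it == symbolTable.end()) {
        throw std::runtime_error("undefined function: " + name);
    }
    const SymbolTableEntry& entry = it->second;
    if (args.size() != entry.arity) {
        throw std::runtime_error("wrong argument count for " + name);
    }
    assert(entry.target != nullptr);
    return entry.target(args);
}
```

The interpreter would evaluate each argument expression into a Value first, collect the results into an Args vector, and only then call callFunction with the identifier it parsed.
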

聊慰 2024-09-03 11:48:59

In C++ you'd probably use a mechanism similar to the one Nick D showed, but take advantage of its OO capabilities:

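#include <boost/function.hpp>
#include <map>
#include <string>

// Assumes f_read, f_close, etc. are wrappers with the signature void(lua_State&).
typedef boost::function<void(lua_State&)> luaFunction;
std::map<std::string, luaFunction> symbolTable;
symbolTable["read"] = f_read;
symbolTable["close"] = f_close; // etc.
// ...
// Look up the callee by name; find() would avoid inserting an empty entry for unknown names.
luaFunction& f = symbolTable[*symbolIterator++];
f(currentLuaState);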
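
With C++11 or later, std::function can play the same role without Boost, and it also lets the table hold callables that carry their own state. The sketch below is an illustration under that assumption; ScriptFunction, CallCounter, sumArgs, and makeSymbolTable are hypothetical names, and script values are simplified to int.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical signature for script-callable functions in a toy interpreter.
using ScriptFunction = std::function<int(const std::vector<int>&)>;

// A function object with its own state: counts how often it has been called.
struct CallCounter {
    int calls = 0;
    int operator()(const std::vector<int>&) { return ++calls; }
};

// Plain free function, registered the same way as the objects above.
inline int sumArgs(const std::vector<int>& args) {
    int total = 0;
    for (int a : args) total += a;
    return total;
}

// Build a table that mixes a free function, a capturing lambda, and a functor.
inline std::map<std::string, ScriptFunction> makeSymbolTable(int offset) {
    std::map<std::string, ScriptFunction> table;
    table["sum"] = sumArgs;
    table["add_offset"] = [offset](const std::vector<int>& args) {
        return args.at(0) + offset;
    };
    table["count_calls"] = CallCounter{};
    assert(table.size() == 3);
    return table;
}
```

Looking a name up with table.at(name) throws std::out_of_range for unknown identifiers, which an interpreter could catch and turn into an "undefined function" error.
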