C++ interpreter conceptual problem
I've built an interpreter in C++ for a language I created.
One main problem in the design was that the language has two different types: number and string. So I have to pass around a class like this:
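#include <string>

enum myInterpreterType { TYPE_NUMBER, TYPE_STRING };  // hypothetical tag names

class myInterpreterValue
{
public:
    myInterpreterType type;
    int intValue;
    std::string strValue;  // copied along with every myInterpreterValue copy
};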
Objects of this class are passed around millions of times a second, e.g. during a countdown loop in my language.
Profiling showed that 85% of the run time is spent in std::string's allocation function.
This much is clear to me: my interpreter is badly designed and doesn't use pointers enough. Yet I don't see an alternative: in most cases I can't use pointers, because I have to make copies.
What can I do about this? Would a class like this be a better idea?
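#include <string>
#include <vector>

enum myInterpreterType { TYPE_NUMBER, TYPE_STRING };  // hypothetical tag names

std::vector<std::string> strTable;
std::vector<int> intTable;

class myInterpreterValue
{
public:
    myInterpreterType type;
    int locationInTable;  // index into strTable or intTable, depending on type
};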
So the class only knows which type it represents and its position in the table.
This, however, has disadvantages too: I'd have to add temporary values to the string/int tables and then remove them again, which would eat a lot of performance again.
So: how do interpreters of languages like Python or Ruby do this? They must somehow have a struct that represents a value in the language, something that can be either an int or a string.
Comments (4)
I suspect many values aren't strings. So the first thing you can do is get rid of the string object when you don't need it: put it into a union. Another thing is that many of your strings are probably small, so you can avoid heap allocation by storing small strings in the object itself. LLVM has the SmallString template for that. And then you can use string interning, as another answer also says. LLVM has the StringPool class for that: call intern("foo") and get a smart pointer referring to a shared string that other myInterpreterValue objects may also use.
The union can be written with boost::variant, which does the type tagging for you. If you don't have Boost, you can implement the tagged union yourself. Alignment couldn't be obtained portably in C++ at the time (C++11 later added alignof and alignas), so the storage union should also contain some types that may require large alignment, which forces it to be suitably aligned. You get the idea.
I think some dynamic languages cache all equivalent strings at runtime with a hash lookup and store only pointers. So in each iteration of a loop where the string stays the same, there would be just a pointer assignment, or at most one string hash computation. I know some languages (Smalltalk, I think?) do this not only with strings but also with small numbers. See the Flyweight pattern.
I'm not an expert on this one. If that doesn't help, you should post the loop code and walk us through how it's being interpreted.
In both Python and Ruby, integers are objects. So it's not a question of a "value" being either an integer or a string; it can be anything at all. Furthermore, everything in both of those languages is garbage collected. There's no need to copy objects; pointers can be used internally, so long as they are stored safely somewhere the garbage collector will see them.
So, one solution to your problem would be a class hierarchy: an abstract base class for values, with one derived class per type.
Then use virtual calls and dynamic_cast to switch on or check types, instead of comparing against values of myInterpreterType. The usual thing to do at this point is to worry that virtual function calls and dynamic casts might be slow. Both Ruby and Python use virtual function calls all over the place, albeit not C++ virtual calls: the "standard" implementation of each language is written in C, with custom mechanisms for polymorphism. But there's no reason in principle to assume that "virtual" means "performance out the window".
That said, I expect they probably both have some clever optimisations for certain uses of integers, including as loop counters. But if you're currently seeing that most of your time is spent copying empty strings, then virtual function calls by comparison are near-instantaneous.
The real worry is how you're going to do resource-management - depending what your plans are for your interpreted language, garbage collection might be more trouble than you want to go to.
The easiest way to solve that would be to make it a pointer to a string, and allocate the string only when you create a string value. You can also use a union to save memory.