C++ STL Map vs. Vector speed

Posted on 2024-08-27 18:52:45

In the interpreter for my experimental programming language I have a symbol table. Each symbol consists of a name and a value (the value can be e.g.: of type string, int, function, etc.).

At first I represented the table with a vector and iterated through the symbols checking if the given symbol name fitted.

Then I thought that using a map, in my case map<string,symbol>, would be better than iterating through the vector all the time, but:

It's a bit hard to explain this part but I'll try.

If a variable is retrieved for the first time in a program in my language, its position in the symbol table of course has to be found (using the vector for now). If I iterated through the vector every time the line gets executed (think of a loop), it would be terribly slow (as it currently is, nearly as slow as Microsoft's batch).

So I could use a map to retrieve the variable: SymbolTable[ myVar.Name ]

But think of the following: If the variable, still using vector, is found the first time, I can store its exact integer position in the vector with it. That means: The next time it is needed, my interpreter knows that it has been "cached" and doesn't search the symbol table for it but does something like SymbolTable.at( myVar.CachedPosition ).
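The caching idea described above could be sketched roughly as follows. The `Symbol` and `VarRef` types and the `resolve` function are hypothetical names invented for this illustration, and the `int` value stands in for whatever variant type the interpreter really uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical symbol type; the post does not show the real one.
struct Symbol {
    std::string name;
    int value;  // stand-in for the real value (string, int, function, ...)
};

// Hypothetical per-variable node that remembers where its symbol lives.
struct VarRef {
    std::string name;
    bool cached = false;
    std::size_t cached_position = 0;
};

// Returns a pointer to the symbol, or nullptr if the name is not in the table.
// The linear search runs only until the first successful lookup for a VarRef.
Symbol* resolve(std::vector<Symbol>& table, VarRef& var) {
    if (var.cached) {
        return &table.at(var.cached_position);
    }
    for (std::size_t i = 0; i < table.size(); ++i) {
        if (table[i].name == var.name) {
            var.cached = true;
            var.cached_position = i;
            return &table[i];
        }
    }
    return nullptr;
}
```

Note that a cached index only stays correct as long as symbols are never erased or reordered; appending new symbols is fine.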

Now my (rather hard?) question:

  • Should I use a vector for the symbol table together with caching the position of the variable in the vector?

  • Should I rather use a map? Why? How fast is the [] operator?

  • Should I use something completely different?

Comments (12)

煩躁 2024-09-03 18:52:45

A map is a good thing to use for a symbol table, but operator[] for maps is not. In general, unless you are writing some trivial code, you should use the map's member functions insert() and find() instead of operator[]. The semantics of operator[] are somewhat complicated: if the symbol you are looking for is not in the map, it inserts a default-constructed value under that key, which is almost certainly not what you want.
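A minimal sketch of the difference, assuming a table that maps names to `int` values for brevity. The names `SymbolTable`, `lookup`, `define`, and `size_after_bracket_read` are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical table type: names mapped to int values for brevity.
using SymbolTable = std::map<std::string, int>;

// Lookup that never modifies the table: returns nullptr for unknown names.
const int* lookup(const SymbolTable& table, const std::string& name) {
    auto it = table.find(name);
    return it == table.end() ? nullptr : &it->second;
}

// Defines a symbol and reports whether the name was new. insert() leaves an
// existing entry untouched and returns false in the pair's second member.
bool define(SymbolTable& table, const std::string& name, int value) {
    return table.insert(std::make_pair(name, value)).second;
}

// Shows the operator[] pitfall on a copy of the table:
// merely reading a missing key inserts it.
std::size_t size_after_bracket_read(SymbolTable table, const std::string& name) {
    (void)table[name];  // default-constructs a value if name is absent
    return table.size();
}
```

With `find()`, a misspelled variable name can be reported as an error; with `operator[]`, it would silently become a new symbol with a default value.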

As for the choice between map and unordered_map, the difference in performance is highly unlikely to be significant when implementing a simple interpretive language. If you use map, you are guaranteed it will be supported by all current Standard C++ implementations.

相权↑美人 2024-09-03 18:52:45

You effectively have a number of alternatives.

Libraries exist:

Critique:

  • Map lookup and retrieval take O(log N), but the items may be scattered throughout memory, so they do not play well with caching.
  • Vectors are more cache-friendly, but unless you keep the vector sorted, find costs O(N). Is that acceptable?
  • Why not use an unordered_map? It provides O(1) lookup and retrieval on average (though the constant may be high) and is certainly suited to this task. If you have a look at Wikipedia's article on hash tables, you'll see that there are many strategies available, and you can certainly pick one that suits your particular usage pattern.
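The sorted-vector option from the second point can be sketched with `std::lower_bound`. The `Entry` alias and the function names are assumptions made for this example, with an `int` standing in for the real symbol value.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;  // hypothetical (name, value) entry

// Ordering used by both functions: compares an entry's name with a key.
static bool name_before(const Entry& e, const std::string& key) {
    return e.first < key;
}

// Inserts while keeping the vector sorted by name.
// The search is O(log N), but the insert itself is O(N) because elements shift.
void sorted_insert(std::vector<Entry>& table, const std::string& name, int value) {
    auto pos = std::lower_bound(table.begin(), table.end(), name, name_before);
    if (pos != table.end() && pos->first == name) {
        pos->second = value;  // overwrite an existing symbol
    } else {
        table.insert(pos, Entry(name, value));
    }
}

// Binary search over contiguous memory: O(log N) comparisons.
const int* sorted_find(const std::vector<Entry>& table, const std::string& name) {
    auto pos = std::lower_bound(table.begin(), table.end(), name, name_before);
    if (pos != table.end() && pos->first == name) {
        return &pos->second;
    }
    return nullptr;
}
```

This trades slower insertion for cache-friendly lookups, which may suit a table that is filled once and then read many times.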

小霸王臭丫头 2024-09-03 18:52:45

Normally you'd use a symbol table to look up the variable given its name as it appears in the source. In this case, you only have the name to work with, so there's nowhere to store the cached position of the variable in the symbol table. So I'd say a map is a good choice. The [] operator takes time proportional to the log of the number of elements in the map - if it turns out to be slow, you could use a hash map like std::tr1::unordered_map.

疯狂的代价 2024-09-03 18:52:45

std::map's operator[] takes O(log(n)) time. This means that it is quite efficient, but you should still avoid doing the same lookup over and over again. Instead of storing an index, perhaps you can store a reference to the value, or an iterator into the container? This avoids the lookup entirely.

孤千羽 2024-09-03 18:52:45

When most interpreters interpret code, they compile it into an intermediate language first. These intermediate languages often refer to variables by index or by pointer, instead of by name.

For example, Python (the C implementation) changes local variables into references by index, but global variables and class variables get referenced by name using a hash table.
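As a rough illustration of referring to variables by index, the sketch below assigns each name a slot once, at compile time, so that execution only indexes into a frame. `SlotAllocator`, `LoadVar`, and `execute` are hypothetical names, not taken from any real interpreter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical compile-time scope: maps each name to a slot index once.
class SlotAllocator {
public:
    // Returns the existing slot for name, or assigns the next free one.
    std::size_t slot_for(const std::string& name) {
        auto it = slots_.find(name);
        if (it != slots_.end()) {
            return it->second;
        }
        std::size_t slot = slots_.size();
        slots_.emplace(name, slot);
        return slot;
    }

    std::size_t count() const { return slots_.size(); }

private:
    std::unordered_map<std::string, std::size_t> slots_;
};

// Hypothetical instruction that carries a slot index instead of a name.
struct LoadVar {
    std::size_t slot;
};

// At run time a variable access is plain indexing, with no string comparison.
int execute(const LoadVar& op, const std::vector<int>& frame) {
    return frame[op.slot];
}
```

The hash lookup happens once per name while compiling; a loop body that runs a million times then pays only for the vector index.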

I suggest looking at an introductory text on compilers.

聊慰 2024-09-03 18:52:45

A std::map (O(log(n))) or a hash table ("amortized" O(1)) would be the first choice; use custom mechanisms only if you determine it's a bottleneck. Generally, using a hash or tokenizing the input is the first optimization.

Before you have profiled it, it's most important that you isolate the lookup, so you can easily replace and profile it.

std::map is likely a tad slower for a small number of elements (but then, it doesn't really matter).

岁月蹉跎了容颜 2024-09-03 18:52:45

Map is O(log N), so not as fast as positional lookup in an array. But the exact results will depend on a lot of factors, so the best approach is to interface with the container in a way that allows you to swap between implementations later on. That is, write a "lookup" function that can be efficiently implemented by any suitable container, so that you can switch and compare the speeds of different implementations.

方圜几里 2024-09-03 18:52:45

Map's operator[] is O(log(n)); see Wikipedia: http://en.wikipedia.org/wiki/Map_(C%2B%2B)

Since you look up symbols often, I think using a map is certainly right. A hash map (std::unordered_map) might give you even better performance.

ゝ偶尔ゞ 2024-09-03 18:52:45

If you're going to use a vector and go to the trouble of caching the most recent symbol look up result, you could do the same (cache the most recent look-up result) if your symbol table were implemented as a map (but there probably wouldn't be a whole lot of benefit to the cache in the case of using a map). With a map you'd have the additional advantage that any non-cached symbol look ups would be much more performant than searching in a vector (assuming that the vector isn't sorted - and keeping a vector sorted can be expensive if you have to do the sort more than once).

Take Neil's advice; map is generally a good data structure for a symbol table, but you need to make sure you're using it correctly (and not adding symbols accidentally).

舟遥客 2024-09-03 18:52:45

You say: "If the variable, still using vector, is found the first time, I can store its exact integer position in the vector with it."

You can do the same with the map: search for the variable using find and store the iterator pointing to it instead of the position.
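A minimal sketch of that suggestion, assuming an `int` value type. `VarRef` and `resolve` are hypothetical names for this example.

```cpp
#include <cassert>
#include <map>
#include <string>

using SymbolTable = std::map<std::string, int>;  // hypothetical value type

// Hypothetical per-variable node in the interpreter's program representation.
struct VarRef {
    std::string name;
    bool resolved;
    SymbolTable::iterator where;  // only meaningful once resolved is true

    explicit VarRef(const std::string& n) : name(n), resolved(false), where() {}
};

// Returns a pointer to the value, or nullptr if the name is unknown.
// find() runs only until the first successful lookup; later calls reuse
// the stored iterator.
int* resolve(SymbolTable& table, VarRef& var) {
    if (!var.resolved) {
        SymbolTable::iterator it = table.find(var.name);
        if (it == table.end()) {
            return nullptr;
        }
        var.where = it;
        var.resolved = true;
    }
    return &var.where->second;
}
```

This works because std::map iterators stay valid when other elements are inserted or erased; the cached iterator only becomes invalid if that particular symbol is erased.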

戈亓 2024-09-03 18:52:45

For looking up values by a string key, the map data type is the appropriate one, as mentioned by other users.

STL map implementations are usually built on self-balancing trees, such as the red-black tree, and their operations take O(log n) time.

My advice is to wrap the table manipulation code in functions, like table_has(name), table_put(name) and table_get(name).

That way you can change the inner symbol table representation easily if you experience slow run-time performance, plus you can embed cache functionality in those routines later.
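One possible shape for those wrappers is sketched below. The `int` value type, the extra `value` parameter of `table_put`, and the pointer return of `table_get` are assumptions added for illustration; the answer only names the three functions.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace {
// The container is an implementation detail. Replacing it with a
// std::unordered_map or a sorted vector only touches this block.
std::map<std::string, int> g_table;
}  // namespace

// True if a symbol with this name has been stored.
bool table_has(const std::string& name) {
    return g_table.find(name) != g_table.end();
}

// Stores or overwrites a symbol. The value parameter is an assumption;
// the answer writes table_put(name).
void table_put(const std::string& name, int value) {
    g_table[name] = value;
}

// Returns a pointer to the stored value, or nullptr for an unknown name.
const int* table_get(const std::string& name) {
    auto it = g_table.find(name);
    return it == g_table.end() ? nullptr : &it->second;
}
```

Because callers never see the container type, a profiler run can compare different representations without changing the rest of the interpreter.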

双马尾 2024-09-03 18:52:45

A map will scale much better, which will be an important feature. Also, don't forget that with a map you can take pointers and references to elements that stay valid when other elements are inserted, which a vector does not guarantee. So you could "cache" variables with a map just as validly as with a vector. A map is almost certainly the right choice here.
