我正在尝试使用哈希表构建符号表。总体思路是
int alpha;
2 int beta;
3 alpha = 0; // the alpha declared in line 1
4 beta = 0; // the beta declared in line 2
5 gamma = 0; // Error! gamma hasn't been declared.
6 {
7 int beta; // This beta shadows the one declared in line 2.
8 int gamma;
9 alpha = 0; // the alpha declared in line 1
10 beta = 0; // the beta declared in line 7
11 gamma = 0; // the gamma declared in line 8
12 }
等等。您只能使用向量、列表、堆栈和队列库,并尝试使其尽可能快。
我的想法是在每个作用域上,我在列表中声明哈希表并将所有信息保存到该表中,每当我有新的作用域时,我都会将新的哈希表推送到列表中。
但是,当程序正在寻找超出范围的很远的项目时,这种方法似乎非常慢,因为您必须查找每个范围才能找到该项目。
你们有什么想法来实现比这更快的范围吗?它应该更快,因为我的实现比“他们提供的慢速版本”慢
非常感谢!
I'm trying to build a symbol table using hash table. The general idea is
int alpha;
2 int beta;
3 alpha = 0; // the alpha declared in line 1
4 beta = 0; // the beta declared in line 2
5 gamma = 0; // Error! gamma hasn't been declared.
6 {
7 int beta; // This beta shadows the one declared in line 2.
8 int gamma;
9 alpha = 0; // the alpha declared in line 1
10 beta = 0; // the beta declared in line 7
11 gamma = 0; // the gamma declared in line 8
12 }
and so on. You only can use vector, list, stack, and queue library and try to make it as fast as it can.
My idea was on each scope, I declare hashtable in the list and save all the information to that table, and whenever I have new scope, I push_back new hashtable into the list.
But it seems like this approach is very slow when the program is looking for on item that is out of scope far far away since you have to look for each scope to find the item.
Do you guys have any idea to implement this scope that is way faster than this? It should be way faster cause my implementation is slower than the "slow version they provided"
Thanks alot!
发布评论
评论(3)
所有内容都只有一个哈希表,以及一堆上下文对象。每当看到“{”时,就将新对象推入堆栈。当您隐藏变量时,请记住上下文对象中的旧符号,然后在哈希表中覆盖它。当您看到“}”并弹出上下文时,恢复您在那里记住的符号。
Have just one hash table for everything, and a stack of context objects. Push a new object on the stack whenever you see a '{'. When you shadow a variable, remember the old symbol in your context object and just overwrite it in the hash table. When you see a '}' and pop the context, restore the symbols you have remembered there.
当我这样做时,我将哈希表用于符号,而不是实体。
对于每个符号,我都有一个实体列表。每个实体分为两个
列表,第一个来自符号,第二个来自范围。名单
来自符号的管理就像堆栈一样:当我访问符号时,
范围内的实体始终是列表中的第一个。当我离开瞄准镜时,
我遍历了它的实体列表,并将它们从列表中删除了
符号。
这是很久以前的事了,在STL之前(甚至在C++之前);我
手工实现一切,并使用侵入式链接
列表,其算法设计得可以删除一个元素
从列表中提取,而不知道列表本身在哪里。 (这是
手写双链表相对容易。)今天,
STL(以及访问局部性的重要性),我可能会简单地
在每个实体中放置一个指向列表头部的指针。使用STL,并且
考虑到局部性,我可能会使用
std::vector
作为列表对于每个实体(符号表因此是
std::string
到std::vector
)。一旦找到条目,符号表查找就很简单在向量上调用
back()
(在检查它不为空之后,如果当向量变空时,您不会清除条目)。当你
创建新实体,在新范围内,您对向量调用
push_back
,并将向量的地址保存在范围的条目中(a
std::vector*> 的
std::stack
);离开时范围内,您迭代此堆栈顶部的向量,调用
pop_back
在其所有条目上,然后将其从堆栈中弹出。 (和显然,当进入一个新的范围时,您将一个新的空向量
push
到范围堆栈。)
维护双列表意味着对于所有操作,您知道
确切地访问哪个元素,而无需迭代
任何事物;永远无法访问某个实体来查看它是否是那个实体
你正在寻找。在我那个时代,这相当简单,但是很多
代码,需要相当小心,我的实现可能
今天不会那么快,因为位置不好。今天,随着
STL,您需要的代码要少得多,并且通过使用
std::vector
来处理所有列表,你会得到更好的局部性;你还是要小心,
这样您就不会最终保存将失效的迭代器。
(但按照上述方式实现,您不需要;所有访问
总是到向量的最后一个元素,因此保存地址
向量本身就足够了。当然,前提是你的哈希值
实现不会移动向量。)
When I did this, I used the hash table for symbols, not for entities.
For each symbol, I had a list of entities. Each entity was in two
lists, one from the symbol, and the second from the scope. The list
from the symbol was managed like a stack: when I accessed a symbol, the
entity in scope was always the first in the list. When I left a scope,
I walked its list of entities, and removed them from the list of
symbols.
This was some time ago, before the STL (and even before C++); I
implemented everything by hand, and used invasive chaining for the
lists, with the algorithm so designed that I could remove an element
from a list without knowing where the list itself was. (This is
relatively easy with hand written double linked lists.) Today, with the
STL (and with the importance of locality of access), I'd probably simply
put a pointer to the head of the list in each entity. With the STL, and
considerations of locality, I'd probably use
std::vector
for the listfor each entity (the symbol table thus being a map of
std::string
tostd::vector
). Symbol table lookup, once the entry is found, is simplycalling
back()
on the vector (after checking that it isn't empty, ifyou don't purge the entry when the vector becomes empty). When you
create new entity, in a new scope, you call
push_back
on the vector,and save the address of the vector in an entry for the scope (a
std::stack
ofstd::vector<std::vector<Entity>*>
); when leavingscope, you iterate over the vector at the top of this stack, calling
pop_back
on all of its entries, then pop it off the stack. (Andobviously, when entering a new scope, you
push
a new empty vector ontothe scope stack.)
Maintaining the double list means that for all operations, you know
exactly which element to access, without having to iterate over
anything; there's never any access to an entity to see if it's the one
you're looking for. In my day, it was fairly straightforward, but a lot
of code, requiring considerable care, and my implemention probably
wouldn't be that fast today, because of poor locality. Today, with the
STL, you need a lot less code, and by using
std::vector
for all of thelists, you get slightly better locality; you still have to be careful,
so that you don't end up saving iterators which will be invalidated.
(But implemented as described above, you shouldn't need to; all accesses
are always to the last element of the vector, so saving the address of
the vector itself is sufficient. Provided, of course, that your hash
implementation doesn't move the vectors.)
这是个好主意。在大多数语言中,作用域很少嵌套得很深,平均可能有 3 或 4 个嵌套作用域。所以我不会担心那些“很远很远”的事情,因为这些都是病态的情况。
This is an OK idea. In most languages, scopes are rarely nested very deeply, perhaps an average of 3 or 4 nested scopes. So I wouldn't worry about things that are "far, far away", as these will be pathological cases.