Unique synthesised names
I would like to generate various data types in C++ with unique deterministic names. For example:
struct struct_int_double { int mem0; double mem1; };
At present my compiler synthesises names using a counter, which means the names don't agree when compiling the same data type in distinct translation units.
Here's what won't work:
Using the ABI mangled_name function, because it already depends on structs having unique names. Might it work in a C++11-compliant ABI by pretending the struct is anonymous?
Templates, e.g. struct2, because templates don't work with recursive types.
A complete mangling, because it gives names which are way too long (hundreds of characters!).
Apart from a global registry (YUK!) the only thing I can think of is to first create a unique long mangled name, and then use a digest or hash function to shorten it (and hope there are no clashes).
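As a rough illustration of the "mangle, then digest" idea just described, here is a minimal sketch. Everything in it is an assumption made for the example: the helper names long_name, fnv1a64 and short_name, the "ty_" prefix, and the choice of 64-bit FNV-1a as a stand-in for a real digest. A 64-bit hash can still collide, and the flat struct_int_double scheme is ambiguous for nested types, so treat this as a picture of the approach rather than a finished design.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: build the long structural name of a struct from
// its member type names, e.g. {"int", "double"} -> "struct_int_double".
std::string long_name(const std::vector<std::string>& members) {
    std::string name = "struct";
    for (const std::string& m : members) {
        name += '_';
        name += m;
    }
    return name;
}

// 64-bit FNV-1a over the bytes of the string (stand-in for a real digest).
std::uint64_t fnv1a64(const std::string& s) {
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 0x100000001b3ULL;                // FNV-1a 64-bit prime
    }
    return h;
}

// Short deterministic identifier: assumed "ty_" prefix plus 16 hex digits.
// The result depends only on the member list, not on a per-unit counter.
std::string short_name(const std::vector<std::string>& members) {
    char buf[17];
    std::snprintf(buf, sizeof buf, "%016llx",
                  static_cast<unsigned long long>(fnv1a64(long_name(members))));
    return std::string("ty_") + buf;
}
```

Because short_name is a pure function of the type's structure, two translation units that describe the same struct would arrive at the same identifier without consulting a registry. The price is the (small) collision risk already mentioned in the question.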
Actual problem: to generate libraries which can be called where the types are anonymous, eg tuples, sum types, function types.
Any other ideas?
EDIT: Additional description of the recursive type problem. Consider defining a linked list like this:
template<class T>
typedef pair<list<T>*, T> list;
This is actually what is required. It doesn't work for two reasons: first, you can't template a typedef. [NO, you can NOT use a template class with a typedef in it, it doesn't work] Second, you can't pass in list* as an argument because it isn't defined yet. In C without polymorphism you can do it:
struct list_int { struct list_int *next; int value; };
There are several workarounds. For this particular problem you can use a variant of the Barton-Nackman trick, but it doesn't generalise.
There is a general workaround, first shown to me by Gabriel Dos Reis, using a template with open recursion, and then a partial specialisation to close it. But this is extremely difficult to generate and would probably be unreadable even if I could figure out how to do it.
There's another problem with doing variants properly too, but that's not directly related (it's just made worse by the stupid restriction against declaring unions with constructible types).
Therefore, my compiler simply uses ordinary C types. It has to handle polymorphism anyhow: one of the reasons for writing it was to bypass the problems of the C++ type system, including templates. This then leads to the naming problem.
Comments (2)
Do you actually need the names to agree? Just define the structs separately, with different names, in the different translation units and reinterpret_cast<> where necessary to keep the C++ compiler happy. Of course that would be horrific in hand-written code, but this is code generated by your compiler, so you can (and I assume do) perform the necessary static type checks before the C++ code is generated.
If I've missed something and you really do need the type names to agree, then I think you already answered your own question: unless the compiler can share information between the translation of multiple translation units (through some global registry), I can't see any way of generating unique, deterministic names from the type's structural form except the obvious one of name-mangling.
As for the length of names, I'm not sure why it matters? If you're considering using a hash function to shorten the names then clearly you don't need them to be human-readable, so why do they need to be short?
Personally I'd probably generate semi-human-readable names, in a similar style to existing name-mangling schemes, and not bother with the hash function. So, instead of generating struct_int_double you might generate sid (struct, int, double) or si32f64 (struct, 32-bit integer, 64-bit float) or whatever. Names like that have the advantage that they can still be parsed directly (which seems like it would be pretty much essential for debugging).
Edit: Some more thoughts:
Edit 2: Sorry, brain failure -- sha-1 digests are 160 bits, not 128.
PS. Not sure why this question was down-voted -- it seems reasonable to me, although some more context about this compiler you're working on might help.
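To make the compact, parseable naming suggested in this answer a little more concrete, here is a small sketch of an encoder and its matching parser. The Type representation, the function names encode and decode, and the closing 'E' marker are all assumptions introduced for the example; the 'E' is borrowed from the general style of existing mangling schemes so that nested structs can be parsed back unambiguously, which means the example name comes out as si32f64E rather than si32f64.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical structural type: a 32-bit int, a 64-bit float, or a struct
// of member types. Requires C++17 (vector of an incomplete element type).
struct Type {
    enum Kind { I32, F64, Struct } kind;
    std::vector<Type> members;  // used only when kind == Struct
};

// Encode a type as a compact name: "i32", "f64", or 's' <members...> 'E'.
std::string encode(const Type& t) {
    switch (t.kind) {
    case Type::I32: return "i32";
    case Type::F64: return "f64";
    case Type::Struct: {
        std::string out = "s";
        for (const Type& m : t.members) out += encode(m);
        return out + "E";
    }
    }
    throw std::logic_error("unknown type kind");
}

// Parse a name produced by encode, starting at pos and advancing pos
// past the characters consumed. Throws on malformed input.
Type decode(const std::string& s, std::size_t& pos) {
    if (s.compare(pos, 3, "i32") == 0) { pos += 3; return Type{Type::I32, {}}; }
    if (s.compare(pos, 3, "f64") == 0) { pos += 3; return Type{Type::F64, {}}; }
    if (pos < s.size() && s[pos] == 's') {
        ++pos;  // consume 's'
        Type t{Type::Struct, {}};
        while (pos < s.size() && s[pos] != 'E')
            t.members.push_back(decode(s, pos));
        if (pos == s.size()) throw std::invalid_argument("missing closing E");
        ++pos;  // consume 'E'
        return t;
    }
    throw std::invalid_argument("unrecognised type name");
}
```

With this scheme, a struct of an i32 and an f64 encodes as si32f64E, and a struct containing that struct followed by another i32 encodes as ssi32f64Ei32E. Both are valid C++ identifiers and can be decoded back into the same structure, which is the property that makes such names useful when debugging generated code.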
I don't really understand your problem.