Why can't I store string keys in an associative array?
I'm new to the D programming language and have just started reading The D Programming Language (TDPL).
I ran into an error when trying one of the associative array examples:
    #!/usr/bin/rdmd
    import std.stdio, std.string;

    void main() {
        uint[string] dict;
        foreach (line; stdin.byLine()) {
            foreach (word; splitter(strip(line))) {
                if (word in dict) continue;
                auto newId = dict.length;
                dict[word] = newId;
                writeln(newId, '\t', word);
            }
        }
    }
DMD shows this error message:
./vocab.d(11): Error: associative arrays can only be assigned values with immutable keys, not char[]
I'm using the DMD compiler, version 2.051.
My guess is that the rules for associative arrays have changed since TDPL was written.
How should I use associative arrays with string keys?
Thanks.
Update:
I found the solution in later parts of the book.
Use idup to make an immutable duplicate of the value before putting it into the array, so
    dict[word.idup] = newId;
would do the job.
But is that efficient?
Comments (2)
Associative arrays require that their keys be immutable. It makes sense when you think about the fact that if it's not immutable, then it might change, which means that its hash changes, which means that when you go to get the value out again, the computer won't find it. And if you go to replace it, you'll end up with another value added to the associative array (so, you'll have one with the correct hash and one with an incorrect hash). However, if the key is immutable, it cannot change, and so there is no such problem.
Prior to dmd 2.051, the example worked (which was a bug). It has now been fixed though, so the example in TDPL is no longer correct. However, it's not so much the case that the rules for associative arrays have changed as that there was a bug in them which was not caught. The example compiled when it shouldn't have, and Andrei missed it. It's listed in the official errata for TDPL and should be fixed in future printings.
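To make the reasoning about immutable keys concrete, here is a minimal sketch. The function name keySurvivesMutation and the sample strings are made up for illustration. It stores an immutable copy of a mutable buffer as a key and then overwrites the buffer, much as stdin.byLine() is free to do with the lines it hands out:

```d
import std.stdio : writeln;

/// Stores an immutable copy of a mutable buffer as a key, then overwrites
/// the buffer. Returns true if the original key is still found afterwards.
/// (Hypothetical helper, for illustration only.)
bool keySurvivesMutation()
{
    size_t[string] dict;
    char[] buffer = "apple".dup;   // mutable data, like a line from byLine()

    dict[buffer.idup] = 0;         // the table holds its own immutable copy
    buffer[0] = 'b';               // changing the buffer cannot touch the key

    return ("apple" in dict) !is null && ("bpple" in dict) is null;
}

void main()
{
    writeln(keySurvivesMutation());
}
```

Because the key stored in the table is a separate immutable copy, its hash can never go stale, which is exactly the guarantee the compiler now enforces.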
The corrected code should use either dictionary[word.idup] or dictionary[to!string(word)]. word.idup creates an immutable duplicate of word. to!string(word), on the other hand, converts word to a string in the most appropriate manner. As word is a char[] in this case, that would be to use idup. However, if word were already a string, then it would simply return the value which was passed in and not needlessly copy it. So, in the general case, to!string(word) is the better choice (particularly in templated functions), but in this case, either works just fine (to!() is in std.conv).
It is technically possible to cast a char[] to a string, but it's generally a bad idea. If you know that the char[] will never change, then you can get away with it, but in the general case, you're risking problems, since the compiler will then assume that the resulting string can never change, and it could generate code which is incorrect. It may even segfault. So, don't do it unless profiling shows that you really need the extra efficiency of avoiding the copy, you can't otherwise avoid the copy by doing something like just using a string in the first place (so no conversion would be necessary), and you know that the string will never be changed.
In general, I wouldn't worry too much about the efficiency of copying strings. Generally, you should be using string instead of char[], so you can copy them around (that is, copy their references around, e.g. str1 = str2;, rather than copying their entire contents like dup and idup do) without worrying about it being particularly inefficient. The problem with the example is that stdin.byLine() returns a char[] rather than a string (presumably to avoid copying the data if it's not necessary). So, splitter() returns a char[], and so word is a char[] instead of a string. Now, you could do splitter(strip(line.idup)) or splitter(strip(line).idup) instead of idup-ing the key. That way, splitter() would return a string rather than a char[], but that's probably essentially just as efficient as idup-ing word. Regardless, because of where the text is coming from originally, it's a char[] instead of a string, which forces you to idup it somewhere along the line if you intend to use it as a key in an associative array. In the general case, however, it's better to just use string and not char[]. Then you don't need to idup anything.
EDIT:
Actually, even if you find a situation where casting from char[] to string seems both safe and necessary, consider using std.exception.assumeUnique(). It's essentially the preferred way of converting a mutable array to an immutable one when you need to and know that you can. It would typically be done in cases where you've constructed an array which you couldn't make immutable because you had to do it in pieces but which has no other references, and you don't want to create a deep copy of it. It wouldn't be useful in situations like the example that you're asking about though, since you really do need to copy the array.
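Putting the advice above together, a corrected version of the vocabulary example might look like the sketch below. It is not the book's code. The helper name buildVocabulary is hypothetical, and the value type is size_t rather than uint (an assumption made here so that dict.length can be stored without a cast on 64-bit targets). The key is converted with to!string, which copies a char[] but passes a string through unchanged:

```d
import std.algorithm.iteration : splitter;
import std.conv : to;
import std.stdio : stdin, writeln;
import std.string : strip;

/// Assigns each previously unseen word the next free ID.
/// Works for ranges of char[] (e.g. stdin.byLine()) and ranges of string.
/// (Hypothetical helper, for illustration only.)
size_t[string] buildVocabulary(R)(R lines)
{
    size_t[string] dict;
    foreach (line; lines)
    {
        foreach (word; splitter(strip(line)))
        {
            // Looking a key up with a mutable char[] is fine; only
            // storing it requires an immutable key.
            if (word in dict) continue;
            auto newId = dict.length;
            // For char[] this makes an immutable copy (like idup);
            // for string it returns the value as-is.
            dict[to!string(word)] = newId;
        }
    }
    return dict;
}

void main()
{
    // Note: iteration order of an associative array is unspecified.
    foreach (word, id; buildVocabulary(stdin.byLine()))
        writeln(id, '\t', word);
}
```

Unlike the original, this version prints the table after all input has been read, so the output order is not guaranteed to follow the IDs. Printing inside the inner loop, as the book's example does, works just as well.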
No, it's not efficient, since it obviously duplicates the string. If you can guarantee that the string you create will never be modified in memory, feel free to explicitly use a cast, cast(immutable)str, on it instead of duplicating it.
(Although, I've noticed that the garbage collector works well, so I suggest you don't actually try that unless you see a bottleneck, since you might decide to change the string later. Just place a comment in your code to help you find the bottleneck later, if it exists.)
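For the rare case where a copy really has to be avoided, the first answer's suggestion of std.exception.assumeUnique() can be sketched as follows. The function makeKey and its contents are invented for illustration. The pattern only applies when the buffer was built locally and nothing else holds a reference to it, which is not the situation with stdin.byLine():

```d
import std.exception : assumeUnique;

/// Builds a key piece by piece in a mutable buffer, then hands it over as
/// a string without copying. This is safe only because no other reference
/// to the buffer exists. (Hypothetical helper, for illustration only.)
string makeKey(size_t n)
{
    auto buf = new char[](n);
    foreach (i, ref c; buf)
        c = cast(char)('a' + i % 26);
    // assumeUnique converts to immutable and resets buf to null,
    // so the buffer cannot be modified through it afterwards.
    return assumeUnique(buf);
}

void main()
{
    size_t[string] dict;
    dict[makeKey(3)] = 0;
    assert("abc" in dict);
}
```

Compared with a bare cast(immutable), assumeUnique documents the intent and clears the mutable reference, which makes accidental later modification harder.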