Why can't I store string keys in an associative array?

Posted 2024-10-10 17:48:47

I'm new to the D programming language and have just started reading The D Programming Language (TDPL).

I ran into an error when trying one of its associative array examples:

#!/usr/bin/rdmd
import std.stdio, std.string;

void main() {
    uint[string] dict;
    foreach (line; stdin.byLine()) {
        foreach (word; splitter(strip(line))) {
            if (word in dict) continue;
            auto newId = dict.length;
            dict[word] = newId;
            writeln(newId, '\t', word);
        }
    }
}

DMD shows this Error message:

./vocab.d(11): Error: associative arrays can only be assigned values with immutable keys, not char[]

I'm compiling with DMD 2.051.

My guess is that the rules for associative arrays have changed since TDPL was written.

How should I use Associative arrays with string keys?

Thanks.

Update:

I found the solution later in the book.

Use idup to make an immutable duplicate of the key before putting it into the array:

dict[word.idup] = newId;

That does the job.

But is it efficient?

绅士风度i 2024-10-17 17:48:48

Associative arrays require that their keys be immutable. That makes sense: if a key were mutable, it could change after being inserted, which would change its hash, so when you went to get the value out again, the lookup would not find it. And if you then tried to replace it, you would end up adding another entry to the associative array (one stored under the correct hash and one under an incorrect hash). If the key is immutable, it cannot change, so there is no such problem.
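As a small sketch of that rule (the helper name storeCopy is made up for illustration), the following shows the compiler rejecting a mutable char[] key on assignment, and the stored copy staying intact when the original buffer is changed afterwards:

```d
// storeCopy is a hypothetical helper, not part of TDPL or Phobos.
size_t[string] storeCopy(char[] word)
{
    size_t[string] dict;

    // Assigning through a mutable key does not compile...
    static assert(!__traits(compiles, dict[word] = 0));

    // ...so the key is stored as an immutable copy instead.
    dict[word.idup] = 0;
    return dict;
}

void main()
{
    char[] word = "hello".dup;
    auto dict = storeCopy(word);

    // Mutating the original buffer does not affect the stored key.
    word[0] = 'j';
    assert("hello" in dict);
    assert("jello" !in dict);
}
```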

Prior to dmd 2.051, the example worked (which was a bug). It has now been fixed though, so the example in TDPL is no longer correct. However, it's not so much the case that the rules for associative arrays have changed as that there was a bug in them which was not caught. The example compiled when it shouldn't have, and Andrei missed it. It's listed in the official errata for TDPL and should be fixed in future printings.

The corrected code should use either dict[word.idup] or dict[to!string(word)]. word.idup creates an immutable duplicate of word. to!string(word), on the other hand, converts word to a string in the most appropriate manner. As word is a char[] in this case, that means making an immutable copy, just as idup does. However, if word were already a string, it would simply return the value that was passed in rather than needlessly copying it. So, in the general case, to!string(word) is the better choice (particularly in templated functions), but in this case either works just fine (to!() is in std.conv).
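Here is a minimal sketch of the corrected program, not the text of the TDPL erratum. The wrapper function buildVocab is a made-up name, added so the same loop works for both char[] and string input. The value type is changed from uint to size_t, because dict.length is a size_t and does not implicitly convert to uint on 64-bit targets.

```d
import std.algorithm : splitter;
import std.conv : to;
import std.stdio : stdin, writeln;
import std.string : strip;

// buildVocab is a hypothetical helper name used for this sketch.
// Lines may be any range of char[] or string lines.
size_t[string] buildVocab(Lines)(Lines lines)
{
    size_t[string] dict;
    foreach (line; lines)
    {
        foreach (word; splitter(strip(line)))
        {
            // Lookups accept a mutable char[] key; only insertion needs a string.
            if (word in dict) continue;
            auto newId = dict.length;
            // to!string copies a char[] but passes a string through unchanged.
            dict[to!string(word)] = newId;
            writeln(newId, '\t', word);
        }
    }
    return dict;
}

void main()
{
    buildVocab(stdin.byLine());
}
```

Replacing to!string(word) with word.idup behaves the same for stdin.byLine() input. The difference only shows up when the lines are already strings, where idup would make a copy that is not needed.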

It is technically possible to cast a char[] to a string, but it's generally a bad idea. If you know that the char[] will never change, then you can get away with it, but in the general case you're risking problems, since the compiler will then assume that the resulting string can never change, and it could generate incorrect code. It may even segfault. So, don't do it unless all three of the following hold: profiling shows that you really need the extra efficiency of avoiding the copy, you can't avoid the copy some other way (such as using a string in the first place, so that no conversion is necessary), and you know that the data will never be changed.

In general, I wouldn't worry too much about the efficiency of copying strings. Generally, you should be using string instead of char[], so you can copy them around (that is, copy their references around, e.g. str1 = str2;, rather than copying their entire contents like dup and idup do) without worrying about it being particularly inefficient.

The problem with the example is that stdin.byLine() yields each line as a char[] rather than a string (presumably to avoid copying the data if it's not necessary). So, splitter() yields char[] slices, and so word is a char[] instead of a string. Now, you could do splitter(strip(line.idup)) or splitter(strip(line).idup) instead of iduping the key. That way, splitter() would yield string rather than char[], but that's probably just as efficient as iduping word. Regardless, because the text originally arrives as a char[] instead of a string, you are forced to idup it somewhere along the line if you intend to use it as a key in an associative array. In the general case, however, it's better to just use string and not char[]. Then you don't need to idup anything.
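The "idup the line instead of the word" variant can be sketched as follows (the helper name wordsOf is made up for illustration). Because the whole line is copied once, every word that splitter() yields is already a string slice of that copy and can be stored directly:

```d
import std.algorithm : splitter;
import std.stdio : writeln;
import std.string : strip;

// wordsOf is a hypothetical helper; it accepts char[] or string input.
string[] wordsOf(const(char)[] line)
{
    string[] words;
    // One copy of the whole line; the words are slices of that copy.
    foreach (word; splitter(strip(line.idup)))
    {
        words ~= word; // compiles only because word is already a string
    }
    return words;
}

void main()
{
    char[] line = "  to be or not  ".dup;
    writeln(wordsOf(line));
}
```

One trade-off to keep in mind: each stored word keeps the entire copied line alive in memory, whereas word.idup copies only the words that are actually inserted. The std.stdio documentation also notes that byLine reuses its internal buffer between iterations, which is another reason a copy has to be made somewhere before storing the key.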

EDIT:
Actually, even if you find a situation where casting from char[] to string seems both safe and necessary, consider using std.exception.assumeUnique() instead. It's essentially the preferred way of converting a mutable array to an immutable one when you need to and know that you can. It would typically be used when you've constructed an array which you couldn't make immutable up front because you had to build it in pieces, but which has no other references, and you don't want to create a deep copy of it. It wouldn't be useful in situations like the example that you're asking about though, since you really do need to copy the array.
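A minimal sketch of that use case, with a made-up function name (makeGreeting). The buffer is built in pieces, nothing else refers to it, and assumeUnique turns it into a string without another copy:

```d
import std.exception : assumeUnique;
import std.stdio : writeln;

// makeGreeting is a hypothetical example function.
string makeGreeting(string name)
{
    char[] buf;          // mutable while it is being assembled
    buf ~= "Hello, ";
    buf ~= name;
    buf ~= '!';
    // Safe only because no other reference to buf exists.
    // assumeUnique also resets buf to null.
    return assumeUnique(buf);
}

void main()
{
    writeln(makeGreeting("world")); // prints "Hello, world!"
}
```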
