解释如何使用位向量来确定所有字符是否唯一
我对位向量如何执行此操作感到困惑(不太熟悉位向量)。这是给出的代码。有人可以引导我完成这个吗?
public static boolean isUniqueChars(String str) {
int checker = 0;
for (int i = 0; i < str.length(); ++i) {
int val = str.charAt(i) - 'a';
if ((checker & (1 << val)) > 0) return false;
checker |= (1 << val);
}
return true;
}
特别是,检查器
在做什么?
I am confused about how a bit vector would work to do this (not too familiar with bit vectors). Here is the code given. Could someone please walk me through this?
public static boolean isUniqueChars(String str) {
int checker = 0;
for (int i = 0; i < str.length(); ++i) {
int val = str.charAt(i) - 'a';
if ((checker & (1 << val)) > 0) return false;
checker |= (1 << val);
}
return true;
}
Particularly, what is the checker
doing?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(12)
我偷偷怀疑你从我正在读的同一本书中得到了这段代码...这里的代码本身并不像运算符 |=、& 和 << 那样神秘。我们外行通常不会使用它们——作者没有费心花额外的时间来解释这个过程,也没有解释这里涉及的实际机制是什么。我一开始对该线程的先前答案感到满意,但只是在抽象层面上。我回过头来是因为我觉得需要一个更具体的解释——缺乏一个总是让我有一种不安的感觉。
该运算符 <<是左移位器,它采用该数字或操作数的二进制表示形式,并将其移动到操作数或右侧数字指定的任意位置,就像仅在二进制中的十进制数一样。我们乘以 2 为基数 - 当我们向上移动许多位置而不是 10 时 - 因此右侧的数字是指数,左侧的数字是 2 的基数倍数
。此运算符 |= (称为按位或赋值)将左侧的操作数与右侧的操作数进行“或”运算,并将结果赋给左侧的操作数(x |= y 相当于 x = x | y)。同样,运算符('&')将'and'运算符的左侧与右侧相结合。这也有按位 AND 赋值(x &= y 相当于 x = x & y)。
因此,我们这里有一个哈希表,每次检查器与指定的值进行或运算(
checker |= (1 << val)
)时,该哈希表都会存储在 32 位二进制数中字母的二进制值及其对应位被设置为 true。字符的值与检查器 (
checker & (1 << val)) > 进行与运算。 0
) - 如果它大于 0,我们就知道我们有一个骗局 - 因为两个相同的位设置为 true 并在一起将返回 true 或“1”。有 26 个二进制位,每个位对应一个小写字母 - 作者确实说过假设字符串只包含小写字母 - 这是因为我们只剩下 6 个位(32 位整数)来消耗 - 并且比我们发生碰撞
因此,对于输入字符串 'azya',当我们一步步移动
string 'a'
string 'az'
string 'azy'
string 'azya'
现在,它声明了一个 duplicate
I have a sneaking suspicion you got this code from the same book I'm reading...The code itself here isn't nearly as cryptic as the the operators- |=, &, and << which aren't normally used by us layman- the author didn't bother taking the extra time out in explaining the process nor what the actual mechanics involved here are. I was content with the previous answer on this thread in the beginning but only on an abstract level. I came back to it because I felt there needed to be a more concrete explanation- the lack of one always leaves me with an uneasy feeling.
This operator << is a left bitwise shifter it takes the binary representation of that number or operand and shifts it over however many places specified by the operand or number on the right like in decimal numbers only in binaries. We are multiplying by base 2-when we move up however many places not base 10- so the number on the right is the exponent and the number on the left is a base multiple of 2.
This operator |= (called Bitwise OR assignment) take the operand on the left and or's it with the operand on the right and assigns the result to the left operand (x |= y is equivalent to x = x | y). Similarly, the operator ('&') will 'and' the left side of the operator with the right side. This also has a Bitwise AND assignment (x &= y is equivalent to x = x & y).
So what we have here is a hash table which is being stored in a 32 bit binary number every time the checker gets or'd (
checker |= (1 << val)
) with the designated binary value of a letter its corresponding bit it is being set to true.The character's value is and'd with the checker (
checker & (1 << val)) > 0
)- if it is greater than 0 we know we have a dupe- because two identical bits set to true and'd together will return true or '1''.There are 26 binary places each of which corresponds to a lowercase letter-the author did say to assume the string only contains lowercase letters- and this is because we only have 6 more (in 32 bit integer) places left to consume- and than we get a collision
So, for an input string 'azya', as we move step by step
string 'a'
string 'az'
string 'azy'
string 'azya'
Now, it declares a duplicate
int checker
在这里用作位的存储。整数值中的每一位都可以被视为一个标志,因此最终int
是一个位(标志)数组。代码中的每个位都表明是否在字符串中找到了具有位索引的字符。出于同样的原因,您可以使用位向量来代替int
。它们之间有两个区别:大小。
int
具有固定大小,通常为 4 个字节,即 8*4=32 位(标志)。位向量通常可以具有不同的大小,或者您应该在构造函数中指定大小。API。使用位向量,您将更容易阅读代码,可能是这样的:
vector.SetFlag(4, true); // 将索引 4 处的标志设置为 true
对于
int
,您将拥有较低级别的位逻辑代码:检查器 |= (1 << 5); // 将索引 5 处的标志设置为 true
另外,
int
可能会快一点,因为位操作的级别非常低,可以由 CPU 按原样执行。 BitVector 允许编写一些不那么神秘的代码,而且它可以存储更多标志。供将来参考:位向量也称为 bitSet 或 bitArray。以下是针对不同语言/平台的此数据结构的一些链接:
int checker
is used here as a storage for bits. Every bit in integer value can be treated as a flag, so eventuallyint
is an array of bits (flag). Each bit in your code states whether the character with bit's index was found in string or not. You could use bit vector for the same reason instead ofint
. There are two differences between them:Size.
int
has fixed size, usually 4 bytes which means 8*4=32 bits (flags). Bit vector usually can be of different size or you should specify the size in constructor.API. With bit vectors you will have easier to read code, probably something like this:
vector.SetFlag(4, true); // set flag at index 4 as true
for
int
you will have lower-level bit logic code:checker |= (1 << 5); // set flag at index 5 to true
Also probably
int
may be a little bit faster, because operations with bits are very low level and can be executed as-is by CPU. BitVector allows writing a little bit less cryptic code instead plus it can store more flags.For future reference: bit vector is also known as bitSet or bitArray. Here are some links to this data structure for different languages/platforms:
我认为所有这些答案确实解释了它是如何工作的,但是我想通过重命名一些变量,添加其他一些变量并添加注释来提供我如何更好地看待它的意见:
I think all these answers do explain how this works, however i felt like giving my input on how i saw it better, by renaming some variables, adding some others and adding comments to it:
我还假设您的示例来自破解代码面试一书和我的答案与这个背景有关。
为了使用这个算法来解决问题,我们必须承认我们只会将字符从a传递到z(小写)。
由于只有 26 个字母,并且这些字母在我们使用的编码表中正确排序,这保证了所有潜在差异
str.charAt(i) - 'a'
将小于 32( int 变量checker
的大小)。正如 Snowbear 所解释的,我们将使用
checker
变量作为位数组。让我们通过例子来说明一下:比方说
str 等于“test”
等等..直到我们通过条件在检查器中找到特定字符的已设置位
希望它有帮助
I also assume that your example comes from the book Cracking The Code Interview and my answer is related to this context.
In order to use this algorithm to solve the problem, we have to admit that we only are going to pass characters from a to z (lowercase).
As there is only 26 letters and these are properly sorted in the encoding table we use, this guarantees us that all the potential differences
str.charAt(i) - 'a'
will be inferior to 32 (the size of the int variablechecker
).As explained by Snowbear, we are about to use the
checker
variable as an array of bits. Lets have an approach by example :Let's say
str equals "test"
and so on.. until we find an already set bit in checker for a specific character via the condition
Hope it helps
阅读伊万上面的回答确实对我有帮助,尽管我的措辞有所不同。
(1 << val)
中的<<
是位移运算符。它采用1
(二进制表示为000000001
,前面有任意多个零/由内存分配)并将其向左移动val
空格。由于我们仅假设 az 并每次减去a
,因此每个字母的值为 0-25,这将是该字母在检查器
中从右侧开始的索引整数的布尔表示,因为我们会将1
向左移动checker
val
次。在每次检查结束时,我们都会看到
|=
运算符。这会合并两个二进制数,如果该索引处的任一操作数中存在1
,则将所有0
替换为1
。在这里,这意味着只要(1 << val)
中存在1
,该1
就会被复制到中checker
,而所有checker
现有的 1 将被保留。您可能会猜到,
1
在这里充当 true 的布尔标志。当我们检查某个字符是否已在字符串中表示时,我们会比较checker
,此时它本质上是一个布尔标志数组(1
值),位于已经表示的字符的索引,本质上是一个布尔值数组,当前字符的索引处带有1
标志。&
运算符完成此检查。与|=
类似,&
运算符将复制1
仅,如果两个操作数都有 < code>1 在该索引处。因此,本质上,只有checker
中已经存在且也在(1 << val)
中表示的标志才会被复制。在这种情况下,这意味着仅当当前字符已被表示时,checker & 的结果中的任何位置才会出现
。如果1
。 (1 << val)1
出现在该操作结果中的任何位置,则返回的布尔值是>
,该方法返回 false。1
。 0我猜这就是为什么位向量也被称为位数组。因为,即使它们不是数组数据类型,它们的使用方式也类似于使用数组来存储布尔标志。
Reading Ivan's answer above really helped me, although I would phrase it somewhat differently.
The
<<
in(1 << val)
is a bit shifting operator. It takes1
(which in binary is represented as000000001
, with as many preceding zeroes as you like / are allocated by memory) and shifts it to the left byval
spaces. Since we're assuming a-z only and subtractinga
each time, each letter will have a value of 0-25, which will be that letter's index from the right in thechecker
integer's boolean representation, since we will move the1
to the left inchecker
val
times.At the end of each check, we see the
|=
operator. This merges two binary numbers, replacing all0
's with1
's if a1
exists in either operand at that index. Here, that means that wherever a1
exists in(1 << val)
, that1
will be copied over intochecker
, while all ofchecker
's existing 1's will be preserved.As you can probably guess, a
1
functions here as a boolean flag for true. When we check to see if a character is already represented in the string, we comparechecker
, which at this point is essentially an array of boolean flags (1
values) at the indexes of characters that have already been represented, with what is essentially an array of boolean values with a1
flag at the index of the current character.The
&
operator accomplishes this check. Similar to the|=
, the&
operator will copy over a1
only if both operands have a1
at that index. So, essentially, only flags already present inchecker
that are also represented in(1 << val)
will be copied over. In this case, that means only if the current character has already been represented will there be a1
present anywhere in the result ofchecker & (1 << val)
. And if a1
is present anywhere in the result of that operation, then the value of the returned boolean is> 0
, and the method returns false.This is, I'm guessing, why bit vectors are also called bit arrays. Because, even though they aren't of the array data type, they can be used similar to the way arrays are used in order to store boolean flags.
上面已经提供了几个很好的答案。所以我不想重复已经说过的一切。但确实想添加一些内容来帮助完成上述程序,因为我刚刚完成了同一个程序并有几个问题,但花了一些时间后,我对该程序有了更清晰的了解。
首先,“检查器”用于跟踪字符串中已经遍历的字符,以查看是否有任何字符重复。
现在“checker”是一个 int 数据类型,因此它只能有 32 位或 4 个字节(取决于平台),因此该程序只能对 32 个字符范围内的字符集正确工作。这就是原因,该程序从每个字符中减去“a”,以使该程序仅针对小写字符运行。但是,如果混合使用小写和大写字符,则它将不起作用。
顺便说一句,如果您不从每个字符中减去“a”(请参见下面的语句),那么该程序将仅适用于包含大写字符的字符串或仅包含小写字符的字符串。因此,上述程序的范围也从小写字符增加到大写字符,但它们不能混合在一起。
然而,我想使用按位运算编写一个通用程序,它应该适用于任何 ASCII 字符,而不必担心大写、小写、数字或任何特殊字符。为了做到这一点,我们的“检查器”应该足够大以存储 256 个字符(ASCII 字符集大小)。但是 Java 中的 int 无法工作,因为它只能存储 32 位。因此,在下面的程序中,我使用 JDK 中可用的 BitSet 类,它可以在实例化 BitSet 对象时传递任何用户定义的大小。
这是一个与上面使用按位运算符编写的程序执行相同操作的程序,但该程序适用于具有 ASCII 字符集中任何字符的字符串。
There are couple of excellent answers already provided above. So I don't want to repeat what's everything already said. But did want to add couple of things to help with the above program as I just worked through the same program and had couple of questions but after spending some time, I have more clarity on this program.
First of all "checker" is used to track the character which is already traversed in the String in order to see if there are any characters are getting repeated.
Now "checker" is an int data type so it can only have 32 bits or 4 bytes (depending upon platform) so this program can only work correctly for a character set within a range of 32 characters. That's the reason, this program subtracts 'a' from each character in order to make this program run for only lower case characters. However if you mix lower and upper case characters then it would not work.
By the way, if you do not subtract 'a' from each character (see below statement) then this program will work correctly for only String with upper case characters or String with only lower case characters. So the scope of above program increases from just lower case characters to upper case characters too but they can't be mixed together.
However I wanted to write a generic program using Bitwise Operation which should work for any ASCII characters without worrying about upper case, lower case, numbers or any special character. In order to do this, our "checker" should be large enough to store 256 characters (ASCII Character Set size). But an int in Java would not work as it can only store 32 bits. Hence in below program, I am using BitSet class available in JDK which can have any user defined size passed while instantiating a BitSet object.
Here is a program which does the same thing as above program written using Bitwise operator but this program will work for a String with any character from ASCII character set.
简单解释(下面有JS代码)
32位
DEC64
。a
,则设置0th
索引,1st
代表b
&很快。操作总结:
checker
和checker
之间进行AND运算。字符的index
Int-32-Arrays
if
的输出 则操作为1
output == 1
checker
变量在两个数组中都设置了特定的索引位输出== 0
checker
和checker
之间执行OR 运算。字符的索引
1
检查器
假设:
97
下面给出的是 JavaScript 源代码。
注意,在 JS 中,尽管整数是 64 位,但按位运算总是在 32 位上完成。
示例:
如果字符串是
aa
则:i = 0
i = 1
Simple Explanation (with JS code below)
32-bit
DEC64
for JS.0th
index if we finda
in the string,1st
forb
& so on.Summary of operations:
checker
&index
of the characterInt-32-Arrays
if
the output of the operation was1
output == 1
checker
variable has that particular index-th bit set in both arraysoutput == 0
checker
&index
of the character1
checker
Assumptions:
a
is97
Given below is the JavaScript source code.
Note that in JS, despite integers being of 64 bits, a bit wise operation is always done on 32 bits.
Example:
If the string is
aa
then:i = 0
i = 1
让我们逐行分解代码。
int checker = 0; 我们正在启动一个检查器,它将帮助我们找到重复的值。
int val = str.charAt(i) - 'a'; 我们获取字符串第 'i' 个位置处的字符的 ASCII 值,并减去 'a' 的 ASCII 值'。由于假设字符串仅是低位字符,因此字符数限制为 26。因此,'val' 的值将始终 >= 0。
if ((checker & (10) return false;
checker |= (1 << val);
现在这是棘手的部分。让我们考虑一个带有字符串“abcda”的示例。理想情况下,这应该返回 false。
For 循环迭代 1:
检查器:00000000000000000000000000000000
val:97-97 = 0
1 << 0: 00000000000000000000000000000001
检查器 & (1 << val): 00000000000000000000000000000000 不> 0
因此检查器:00000000000000000000000000000001
For 循环迭代 2:
检查器:000000000000000000000000000000001
val:98-97 = 1
1 << 1: 00000000000000000000000000000010
检查器 & (1 << val): 00000000000000000000000000000000 不> 0
因此检查器:00000000000000000000000000000011
For 循环迭代 3:
检查器:00000000000000000000000000000011
val:99-97 = 2
1 << 2: 00000000000000000000000000000100
检查器 & (1 << val): 00000000000000000000000000000000 不> 0
因此检查器:00000000000000000000000000000111
For 循环迭代 4:
检查器:00000000000000000000000000000111
val:100-97 = 3
1 << 3:00000000000000000000000000001000
检查器& (1 << val): 00000000000000000000000000000000 不> 0
因此检查器:00000000000000000000000000001111
For 循环迭代 5:
检查器:00000000000000000000000000001111
val:97-97 = 0
1 << 0: 00000000000000000000000000000001
检查器 & (1<<val):00000000000000000000000000000001 是> 0
因此返回 false。
Lets break down the code line by line.
int checker = 0; We are initiating a checker which will help us find duplicate values.
int val = str.charAt(i) - 'a'; We are getting the ASCII value of the character at the 'i'th position of the string and subtracting it with the ASCII value of 'a'. Since the assumption is that the string is lower characters only, the number of characters in limited to 26. Hece, the value of 'val' will always be >= 0.
if ((checker & (1 << val)) > 0) return false;
checker |= (1 << val);
Now this is the tricky part. Lets us consider an example with string "abcda". This should ideally return false.
For loop iteration 1:
Checker: 00000000000000000000000000000000
val: 97-97 = 0
1 << 0: 00000000000000000000000000000001
checker & (1 << val): 00000000000000000000000000000000 is not > 0
Hence checker: 00000000000000000000000000000001
For loop iteration 2:
Checker: 00000000000000000000000000000001
val: 98-97 = 1
1 << 1: 00000000000000000000000000000010
checker & (1 << val): 00000000000000000000000000000000 is not > 0
Hence checker: 00000000000000000000000000000011
For loop iteration 3:
Checker: 00000000000000000000000000000011
val: 99-97 = 2
1 << 2: 00000000000000000000000000000100
checker & (1 << val): 00000000000000000000000000000000 is not > 0
Hence checker: 00000000000000000000000000000111
For loop iteration 4:
Checker: 00000000000000000000000000000111
val: 100-97 = 3
1 << 3: 00000000000000000000000000001000
checker & (1 << val): 00000000000000000000000000000000 is not > 0
Hence checker: 00000000000000000000000000001111
For loop iteration 5:
Checker: 00000000000000000000000000001111
val: 97-97 = 0
1 << 0: 00000000000000000000000000000001
checker & (1 << val): 00000000000000000000000000000001 is > 0
Hence return false.
以前的帖子很好地解释了代码块的作用,我想使用 BitSet java 数据结构添加我的简单解决方案:
Previous Posts explain well what the code block does and i want to add my simple Solution using the BitSet java Data structure :
我理解使用 Javascript 的方式。假设输入
var inputChar = "abca"; // 查找 inputChar 是否包含所有唯一字符
让我们开始
第 4 行:int val = str.charAt(i) - 'a';
在 Javascript 中,例如:
"a".charCodeAt().toString(2)
返回 1100001checker = 1100001 | checker;
//checker 变为 1100001 (在 32 位表示中,它变为 000000000.....00001100001)但我希望我的位掩码(
int checker
)只设置一位,但是 checker是 1100001让我们使用重置的 val
5 和 6 行很好解释了@Ivan 答案
The way I understood using Javascript. Assuming input
var inputChar = "abca"; //find if inputChar has all unique characters
Lets Start
Line 4: int val = str.charAt(i) - 'a';
In Javascript Eg:
"a".charCodeAt().toString(2)
returns 1100001checker = 1100001 | checker;
//checker becomes 1100001 (In 32 bit representation it becomes 000000000.....00001100001)But i want my bitmask (
int checker
) to set only one bit, but checker is 1100001Lets use
val
which is resettedLine 5 and Line 6 is well explained @Ivan answer
以防万一,如果有人正在使用位向量查找字符串中唯一字符的 kotlin 等效项
参考: https ://www.programiz.com/kotlin-programming/bitwise
Just in case if anyone is looking for kotlin equivalent of unique characters in a string using bit vector
Ref: https://www.programiz.com/kotlin-programming/bitwise