解释如何使用位向量来确定所有字符是否唯一

发布于 2025-01-02 15:08:53 字数 401 浏览 1 评论 0原文

我对位向量如何执行此操作感到困惑(不太熟悉位向量)。这是给出的代码。有人可以引导我完成这个吗?

public static boolean isUniqueChars(String str) {
    int checker = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - 'a';
        if ((checker & (1 << val)) > 0) return false;
        checker |= (1 << val);
    }
    return true;
}

特别是,检查器在做什么?

I am confused about how a bit vector would work to do this (not too familiar with bit vectors). Here is the code given. Could someone please walk me through this?

public static boolean isUniqueChars(String str) {
    int checker = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - 'a';
        if ((checker & (1 << val)) > 0) return false;
        checker |= (1 << val);
    }
    return true;
}

Particularly, what is the checker doing?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(12

绾颜 2025-01-09 15:08:53

我偷偷怀疑你从我正在读的同一本书中得到了这段代码...这里的代码本身并不像运算符 |=、& 和 << 那样神秘。我们外行通常不会使用它们——作者没有费心花额外的时间来解释这个过程,也没有解释这里涉及的实际机制是什么。我一开始对该线程的先前答案感到满意,但只是在抽象层面上。我回过头来是因为我觉得需要一个更具体的解释——缺乏一个总是让我有一种不安的感觉。

该运算符 <<是左移位器,它采用该数字或操作数的二进制表示形式,并将其移动到操作数或右侧数字指定的任意位置,就像仅在二进制中的十进制数一样。我们乘以 2 为基数 - 当我们向上移动许多位置而不是 10 时 - 因此右侧的数字是指数,左侧的数字是 2 的基数倍数

。此运算符 |= (称为按位或赋值)将左侧的操作数与右侧的操作数进行“或”运算,并将结果赋给左侧的操作数(x |= y 相当于 x = x | y)。同样,运算符('&')将'and'运算符的左侧与右侧相结合。这也有按位 AND 赋值(x &= y 相当于 x = x & y)。

因此,我们这里有一个哈希表,每次检查器与指定的值进行或运算(checker |= (1 << val))时,该哈希表都会存储在 32 位二进制数中字母的二进制值及其对应位被设置为 true。
字符的值与检查器 (checker & (1 << val)) > 进行与运算。 0) - 如果它大于 0,我们就知道我们有一个骗局 - 因为两个相同的位设置为 true 并在一起将返回 true 或“1”。

有 26 个二进制位,每个位对应一个小写字母 - 作者确实说过假设字符串只包含小写字母 - 这是因为我们只剩下 6 个位(32 位整数)来消耗 - 并且比我们发生碰撞

00000000000000000000000000000001 a 2^0

00000000000000000000000000000010 b 2^1

00000000000000000000000000000100 c 2^2

00000000000000000000000000001000 d 2^3

00000000000000000000000000010000 e 2^4

00000000000000000000000000100000 f 2^5

00000000000000000000000001000000 g 2^6

00000000000000000000000010000000 h 2^7

00000000000000000000000100000000 i 2^8

00000000000000000000001000000000 j 2^9

00000000000000000000010000000000 k 2^10

00000000000000000000100000000000 l 2^11

00000000000000000001000000000000 m 2^12

00000000000000000010000000000000 n 2^13

00000000000000000100000000000000 o 2^14

00000000000000001000000000000000 p 2^15

00000000000000010000000000000000 q 2^16

00000000000000100000000000000000 r 2^17

00000000000001000000000000000000 s 2^18

00000000000010000000000000000000 t 2^19

00000000000100000000000000000000 u 2^20

00000000001000000000000000000000 v 2^21

00000000010000000000000000000000 w 2^22

00000000100000000000000000000000 x 2^23

00000001000000000000000000000000 y 2^24

00000010000000000000000000000000 z 2^25

因此,对于输入字符串 'azya',当我们一步步移动

string 'a'

a      =00000000000000000000000000000001
checker=00000000000000000000000000000000

checker='a' or checker;
// checker now becomes = 00000000000000000000000000000001
checker=00000000000000000000000000000001

a and checker=0 no dupes condition

string 'az'

checker=00000000000000000000000000000001
z      =00000010000000000000000000000000

z and checker=0 no dupes 

checker=z or checker;
// checker now becomes 00000010000000000000000000000001  
      

string 'azy'

checker= 00000010000000000000000000000001    
y      = 00000001000000000000000000000000 

checker and y=0 no dupes condition 

checker= checker or y;
// checker now becomes = 00000011000000000000000000000001

string 'azya'

checker= 00000011000000000000000000000001
a      = 00000000000000000000000000000001

a and checker=1 we have a dupe

现在,它声明了一个 duplicate

I have a sneaking suspicion you got this code from the same book I'm reading...The code itself here isn't nearly as cryptic as the the operators- |=, &, and << which aren't normally used by us layman- the author didn't bother taking the extra time out in explaining the process nor what the actual mechanics involved here are. I was content with the previous answer on this thread in the beginning but only on an abstract level. I came back to it because I felt there needed to be a more concrete explanation- the lack of one always leaves me with an uneasy feeling.

This operator << is a left bitwise shifter it takes the binary representation of that number or operand and shifts it over however many places specified by the operand or number on the right like in decimal numbers only in binaries. We are multiplying by base 2-when we move up however many places not base 10- so the number on the right is the exponent and the number on the left is a base multiple of 2.

This operator |= (called Bitwise OR assignment) take the operand on the left and or's it with the operand on the right and assigns the result to the left operand (x |= y is equivalent to x = x | y). Similarly, the operator ('&') will 'and' the left side of the operator with the right side. This also has a Bitwise AND assignment (x &= y is equivalent to x = x & y).

So what we have here is a hash table which is being stored in a 32 bit binary number every time the checker gets or'd ( checker |= (1 << val)) with the designated binary value of a letter its corresponding bit it is being set to true.
The character's value is and'd with the checker (checker & (1 << val)) > 0)- if it is greater than 0 we know we have a dupe- because two identical bits set to true and'd together will return true or '1''.

There are 26 binary places each of which corresponds to a lowercase letter-the author did say to assume the string only contains lowercase letters- and this is because we only have 6 more (in 32 bit integer) places left to consume- and than we get a collision

00000000000000000000000000000001 a 2^0

00000000000000000000000000000010 b 2^1

00000000000000000000000000000100 c 2^2

00000000000000000000000000001000 d 2^3

00000000000000000000000000010000 e 2^4

00000000000000000000000000100000 f 2^5

00000000000000000000000001000000 g 2^6

00000000000000000000000010000000 h 2^7

00000000000000000000000100000000 i 2^8

00000000000000000000001000000000 j 2^9

00000000000000000000010000000000 k 2^10

00000000000000000000100000000000 l 2^11

00000000000000000001000000000000 m 2^12

00000000000000000010000000000000 n 2^13

00000000000000000100000000000000 o 2^14

00000000000000001000000000000000 p 2^15

00000000000000010000000000000000 q 2^16

00000000000000100000000000000000 r 2^17

00000000000001000000000000000000 s 2^18

00000000000010000000000000000000 t 2^19

00000000000100000000000000000000 u 2^20

00000000001000000000000000000000 v 2^21

00000000010000000000000000000000 w 2^22

00000000100000000000000000000000 x 2^23

00000001000000000000000000000000 y 2^24

00000010000000000000000000000000 z 2^25

So, for an input string 'azya', as we move step by step

string 'a'

a      =00000000000000000000000000000001
checker=00000000000000000000000000000000

checker='a' or checker;
// checker now becomes = 00000000000000000000000000000001
checker=00000000000000000000000000000001

a and checker=0 no dupes condition

string 'az'

checker=00000000000000000000000000000001
z      =00000010000000000000000000000000

z and checker=0 no dupes 

checker=z or checker;
// checker now becomes 00000010000000000000000000000001  
      

string 'azy'

checker= 00000010000000000000000000000001    
y      = 00000001000000000000000000000000 

checker and y=0 no dupes condition 

checker= checker or y;
// checker now becomes = 00000011000000000000000000000001

string 'azya'

checker= 00000011000000000000000000000001
a      = 00000000000000000000000000000001

a and checker=1 we have a dupe

Now, it declares a duplicate

长不大的小祸害 2025-01-09 15:08:53

int checker 在这里用作位的存储。整数值中的每一位都可以被视为一个标志,因此最终 int 是一个位(标志)数组。代码中的每个位都表明是否在字符串中找到了具有位索引的字符。出于同样的原因,您可以使用位向量来代替 int。它们之间有两个区别:

  • 大小int 具有固定大小,通常为 4 个字节,即 8*4=32 位(标志)。位向量通常可以具有不同的大小,或者您应该在构造函数中指定大小。

  • API。使用位向量,您将更容易阅读代码,可能是这样的:

    vector.SetFlag(4, true); // 将索引 4 处的标志设置为 true

    对于int,您将拥有较低级别的位逻辑代码:

    检查器 |= (1 << 5); // 将索引 5 处的标志设置为 true

另外,int 可能会快一点,因为位操作的级别非常低,可以由 CPU 按原样执行。 BitVector 允许编写一些不那么神秘的代码,而且它可以存储更多标志。

供将来参考:位向量也称为 bitSet 或 bitArray。以下是针对不同语言/平台的此数据结构的一些链接:

int checker is used here as a storage for bits. Every bit in integer value can be treated as a flag, so eventually int is an array of bits (flag). Each bit in your code states whether the character with bit's index was found in string or not. You could use bit vector for the same reason instead of int. There are two differences between them:

  • Size. int has fixed size, usually 4 bytes which means 8*4=32 bits (flags). Bit vector usually can be of different size or you should specify the size in constructor.

  • API. With bit vectors you will have easier to read code, probably something like this:

    vector.SetFlag(4, true); // set flag at index 4 as true

    for int you will have lower-level bit logic code:

    checker |= (1 << 5); // set flag at index 5 to true

Also probably int may be a little bit faster, because operations with bits are very low level and can be executed as-is by CPU. BitVector allows writing a little bit less cryptic code instead plus it can store more flags.

For future reference: bit vector is also known as bitSet or bitArray. Here are some links to this data structure for different languages/platforms:

嗳卜坏 2025-01-09 15:08:53

我认为所有这些答案确实解释了它是如何工作的,但是我想通过重命名一些变量,添加其他一些变量并添加注释来提供我如何更好地看待它的意见:

public static boolean isUniqueChars(String str) {

    /*
    checker is the bit array, it will have a 1 on the character index that
    has appeared before and a 0 if the character has not appeared, you
    can see this number initialized as 32 0 bits:
    00000000 00000000 00000000 00000000
     */
    int checker = 0;

    //loop through each String character
    for (int i = 0; i < str.length(); ++i) {
        /*
        a through z in ASCII are charactets numbered 97 through 122, 26 characters total
        with this, you get a number between 0 and 25 to represent each character index
        0 for 'a' and 25 for 'z'

        renamed 'val' as 'characterIndex' to be more descriptive
         */
        int characterIndex = str.charAt(i) - 'a'; //char 'a' would get 0 and char 'z' would get 26

        /*
        created a new variable to make things clearer 'singleBitOnPosition'

        It is used to calculate a number that represents the bit value of having that 
        character index as a 1 and the rest as a 0, this is achieved
        by getting the single digit 1 and shifting it to the left as many
        times as the character index requires
        e.g. character 'd'
        00000000 00000000 00000000 00000001
        Shift 3 spaces to the left (<<) because 'd' index is number 3
        1 shift: 00000000 00000000 00000000 00000010
        2 shift: 00000000 00000000 00000000 00000100
        3 shift: 00000000 00000000 00000000 00001000

        Therefore the number representing 'd' is
        00000000 00000000 00000000 00001000

         */
        int singleBitOnPosition = 1 << characterIndex;

        /*
        This peforms an AND between the checker, which is the bit array
        containing everything that has been found before and the number
        representing the bit that will be turned on for this particular
        character. e.g.
        if we have already seen 'a', 'b' and 'd', checker will have:
        checker = 00000000 00000000 00000000 00001011
        And if we see 'b' again:
        'b' = 00000000 00000000 00000000 00000010

        it will do the following:
        00000000 00000000 00000000 00001011
        & (AND)
        00000000 00000000 00000000 00000010
        -----------------------------------
        00000000 00000000 00000000 00000010

        Since this number is different than '0' it means that the character
        was seen before, because on that character index we already have a 
        1 bit value
         */
        if ((checker & singleBitOnPosition) > 0) {
            return false;
        }

        /* 
        Remember that 
        checker |= singleBitOnPosition is the same as  
        checker = checker | singleBitOnPosition
        Sometimes it is easier to see it expanded like that.

        What this achieves is that it builds the checker to have the new 
        value it hasnt seen, by doing an OR between checker and the value 
        representing this character index as a 1. e.g.
        If the character is 'f' and the checker has seen 'g' and 'a', the 
        following will happen

        'f' = 00000000 00000000 00000000 00100000
        checker(seen 'a' and 'g' so far) = 00000000 00000000 00000000 01000001

        00000000 00000000 00000000 00100000
        | (OR)
        00000000 00000000 00000000 01000001
        -----------------------------------
        00000000 00000000 00000000 01100001

        Therefore getting a new checker as 00000000 00000000 00000000 01100001

         */
        checker |= singleBitOnPosition;
    }
    return true;
}

I think all these answers do explain how this works, however i felt like giving my input on how i saw it better, by renaming some variables, adding some others and adding comments to it:

public static boolean isUniqueChars(String str) {

    /*
    checker is the bit array, it will have a 1 on the character index that
    has appeared before and a 0 if the character has not appeared, you
    can see this number initialized as 32 0 bits:
    00000000 00000000 00000000 00000000
     */
    int checker = 0;

    //loop through each String character
    for (int i = 0; i < str.length(); ++i) {
        /*
        a through z in ASCII are charactets numbered 97 through 122, 26 characters total
        with this, you get a number between 0 and 25 to represent each character index
        0 for 'a' and 25 for 'z'

        renamed 'val' as 'characterIndex' to be more descriptive
         */
        int characterIndex = str.charAt(i) - 'a'; //char 'a' would get 0 and char 'z' would get 26

        /*
        created a new variable to make things clearer 'singleBitOnPosition'

        It is used to calculate a number that represents the bit value of having that 
        character index as a 1 and the rest as a 0, this is achieved
        by getting the single digit 1 and shifting it to the left as many
        times as the character index requires
        e.g. character 'd'
        00000000 00000000 00000000 00000001
        Shift 3 spaces to the left (<<) because 'd' index is number 3
        1 shift: 00000000 00000000 00000000 00000010
        2 shift: 00000000 00000000 00000000 00000100
        3 shift: 00000000 00000000 00000000 00001000

        Therefore the number representing 'd' is
        00000000 00000000 00000000 00001000

         */
        int singleBitOnPosition = 1 << characterIndex;

        /*
        This peforms an AND between the checker, which is the bit array
        containing everything that has been found before and the number
        representing the bit that will be turned on for this particular
        character. e.g.
        if we have already seen 'a', 'b' and 'd', checker will have:
        checker = 00000000 00000000 00000000 00001011
        And if we see 'b' again:
        'b' = 00000000 00000000 00000000 00000010

        it will do the following:
        00000000 00000000 00000000 00001011
        & (AND)
        00000000 00000000 00000000 00000010
        -----------------------------------
        00000000 00000000 00000000 00000010

        Since this number is different than '0' it means that the character
        was seen before, because on that character index we already have a 
        1 bit value
         */
        if ((checker & singleBitOnPosition) > 0) {
            return false;
        }

        /* 
        Remember that 
        checker |= singleBitOnPosition is the same as  
        checker = checker | singleBitOnPosition
        Sometimes it is easier to see it expanded like that.

        What this achieves is that it builds the checker to have the new 
        value it hasnt seen, by doing an OR between checker and the value 
        representing this character index as a 1. e.g.
        If the character is 'f' and the checker has seen 'g' and 'a', the 
        following will happen

        'f' = 00000000 00000000 00000000 00100000
        checker(seen 'a' and 'g' so far) = 00000000 00000000 00000000 01000001

        00000000 00000000 00000000 00100000
        | (OR)
        00000000 00000000 00000000 01000001
        -----------------------------------
        00000000 00000000 00000000 01100001

        Therefore getting a new checker as 00000000 00000000 00000000 01100001

         */
        checker |= singleBitOnPosition;
    }
    return true;
}
天赋异禀 2025-01-09 15:08:53

我还假设您的示例来自破解代码面试一书和我的答案与这个背景有关。

为了使用这个算法来解决问题,我们必须承认我们只会将字符从a传递到z(小写)。

由于只有 26 个字母,并且这些字母在我们使用的编码表中正确排序,这保证了所有潜在差异 str.charAt(i) - 'a' 将小于 32( int 变量 checker 的大小)。

正如 Snowbear 所解释的,我们将使用 checker 变量作为位数组。让我们通过例子来说明一下:

比方说
str 等于“test”

  • 第一遍 (i = t)

检查器== 0 (00000000000000000000000000000000)

In ASCII, val = str.charAt(i) - 'a' = 116 - 97 = 19
What about 1 << val ?
1          == 00000000000000000000000000000001
1 << 19    == 00000000000010000000000000000000
checker |= (1 << val) means checker = checker | (1 << val)
so checker = 00000000000000000000000000000000 | 00000000000010000000000000000000
checker == 524288 (00000000000010000000000000000000)
  • 第二遍 (i = e)

检查器== 524288 (00000000000010000000000000000000)

val = 101 - 97 = 4
1          == 00000000000000000000000000000001
1 << 4     == 00000000000000000000000000010000
checker |= (1 << val) 
so checker = 00000000000010000000000000000000 | 00000000000000000000000000010000
checker == 524304 (00000000000010000000000000010000)

等等..直到我们通过条件在检查器中找到特定字符的已设置位

(checker & (1 << val)) > 0

希望它有帮助

I also assume that your example comes from the book Cracking The Code Interview and my answer is related to this context.

In order to use this algorithm to solve the problem, we have to admit that we only are going to pass characters from a to z (lowercase).

As there is only 26 letters and these are properly sorted in the encoding table we use, this guarantees us that all the potential differences str.charAt(i) - 'a' will be inferior to 32 (the size of the int variable checker).

As explained by Snowbear, we are about to use the checker variable as an array of bits. Lets have an approach by example :

Let's say
str equals "test"

  • First pass (i = t)

checker == 0 (00000000000000000000000000000000)

In ASCII, val = str.charAt(i) - 'a' = 116 - 97 = 19
What about 1 << val ?
1          == 00000000000000000000000000000001
1 << 19    == 00000000000010000000000000000000
checker |= (1 << val) means checker = checker | (1 << val)
so checker = 00000000000000000000000000000000 | 00000000000010000000000000000000
checker == 524288 (00000000000010000000000000000000)
  • Second pass (i = e)

checker == 524288 (00000000000010000000000000000000)

val = 101 - 97 = 4
1          == 00000000000000000000000000000001
1 << 4     == 00000000000000000000000000010000
checker |= (1 << val) 
so checker = 00000000000010000000000000000000 | 00000000000000000000000000010000
checker == 524304 (00000000000010000000000000010000)

and so on.. until we find an already set bit in checker for a specific character via the condition

(checker & (1 << val)) > 0

Hope it helps

蹲在坟头点根烟 2025-01-09 15:08:53

阅读伊万上面的回答确实对我有帮助,尽管我的措辞有所不同。

(1 << val) 中的 << 是位移运算符。它采用 1 (二进制表示为 000000001,前面有任意多个零/由内存分配)并将其向左移动 val 空格。由于我们仅假设 az 并每次减去 a,因此每个字母的值为 0-25,这将是该字母在 检查器 中从右侧开始的索引整数的布尔表示,因为我们会将 1 向左移动 checker val 次。

在每次检查结束时,我们都会看到 |= 运算符。这会合并两个二进制数,如果该索引处的任一操作数中存在 1,则将所有 0 替换为 1。在这里,这意味着只要 (1 << val) 中存在 1,该 1 就会被复制到 中checker,而所有 checker 现有的 1 将被保留。

您可能会猜到,1 在这里充当 true 的布尔标志。当我们检查某个字符是否已在字符串中表示时,我们会比较 checker,此时它本质上是一个布尔标志数组(1 值),位于已经表示的字符的索引,本质上是一个布尔值数组,当前字符的索引处带有 1 标志。

& 运算符完成此检查。与 |= 类似,& 运算符将复制 1 ,如果两个操作数都有 < code>1 在该索引处。因此,本质上,只有 checker 中已经存在且也在 (1 << val) 中表示的标志才会被复制。在这种情况下,这意味着仅当当前字符已被表示时,checker & 的结果中的任何位置才会出现1。 (1 << val)。如果 1 出现在该操作结果中的任何位置,则返回的布尔值是 >1 。 0,该方法返回 false。

我猜这就是为什么位向量也被称为位数组。因为,即使它们不是数组数据类型,它们的使用方式也类似于使用数组来存储布尔标志。

Reading Ivan's answer above really helped me, although I would phrase it somewhat differently.

The << in (1 << val) is a bit shifting operator. It takes 1 (which in binary is represented as 000000001, with as many preceding zeroes as you like / are allocated by memory) and shifts it to the left by val spaces. Since we're assuming a-z only and subtracting a each time, each letter will have a value of 0-25, which will be that letter's index from the right in the checker integer's boolean representation, since we will move the 1 to the left in checker val times.

At the end of each check, we see the |= operator. This merges two binary numbers, replacing all 0's with 1's if a 1 exists in either operand at that index. Here, that means that wherever a 1 exists in (1 << val), that 1 will be copied over into checker, while all of checker's existing 1's will be preserved.

As you can probably guess, a 1 functions here as a boolean flag for true. When we check to see if a character is already represented in the string, we compare checker, which at this point is essentially an array of boolean flags (1 values) at the indexes of characters that have already been represented, with what is essentially an array of boolean values with a 1 flag at the index of the current character.

The & operator accomplishes this check. Similar to the |=, the & operator will copy over a 1 only if both operands have a 1 at that index. So, essentially, only flags already present in checker that are also represented in (1 << val) will be copied over. In this case, that means only if the current character has already been represented will there be a 1 present anywhere in the result of checker & (1 << val). And if a 1 is present anywhere in the result of that operation, then the value of the returned boolean is > 0, and the method returns false.

This is, I'm guessing, why bit vectors are also called bit arrays. Because, even though they aren't of the array data type, they can be used similar to the way arrays are used in order to store boolean flags.

棒棒糖 2025-01-09 15:08:53

上面已经提供了几个很好的答案。所以我不想重复已经说过的一切。但确实想添加一些内容来帮助完成上述程序,因为我刚刚完成了同一个程序并有几个问题,但花了一些时间后,我对该程序有了更清晰的了解。

首先,“检查器”用于跟踪字符串中已经遍历的字符,以查看是否有任何字符重复。

现在“checker”是一个 int 数据类型,因此它只能有 32 位或 4 个字节(取决于平台),因此该程序只能对 32 个字符范围内的字符集正确工作。这就是原因,该程序从每个字符中减去“a”,以使该程序仅针对小写字符运行。但是,如果混合使用小写和大写字符,则它将不起作用。

顺便说一句,如果您不从每个字符中减去“a”(请参见下面的语句),那么该程序将仅适用于包含大写字符的字符串或仅包含小写字符的字符串。因此,上述程序的范围也从小写字符增加到大写字符,但它们不能混合在一起。

int val = str.charAt(i) - 'a'; 

然而,我想使用按位运算编写一个通用程序,它应该适用于任何 ASCII 字符,而不必担心大写、小写、数字或任何特殊字符。为了做到这一点,我们的“检查器”应该足够大以存储 256 个字符(ASCII 字符集大小)。但是 Java 中的 int 无法工作,因为它只能存储 32 位。因此,在下面的程序中,我使用 JDK 中可用的 BitSet 类,它可以在实例化 BitSet 对象时传递任何用户定义的大小。

这是一个与上面使用按位运算符编写的程序执行相同操作的程序,但该程序适用于具有 ASCII 字符集中任何字符的字符串。

public static boolean isUniqueStringUsingBitVectorClass(String s) {

    final int ASCII_CHARACTER_SET_SIZE = 256;

    final BitSet tracker = new BitSet(ASCII_CHARACTER_SET_SIZE);

    // if more than  256 ASCII characters then there can't be unique characters
    if(s.length() > 256) {
        return false;
    }

    //this will be used to keep the location of each character in String
    final BitSet charBitLocation = new BitSet(ASCII_CHARACTER_SET_SIZE);

    for(int i = 0; i < s.length(); i++) {

        int charVal = s.charAt(i);
        charBitLocation.set(charVal); //set the char location in BitSet

        //check if tracker has already bit set with the bit present in charBitLocation
        if(tracker.intersects(charBitLocation)) {
            return false;
        }

        //set the tracker with new bit from charBitLocation
        tracker.or(charBitLocation);

        charBitLocation.clear(); //clear charBitLocation to store bit for character in the next iteration of the loop

    }

    return true;

}

There are couple of excellent answers already provided above. So I don't want to repeat what's everything already said. But did want to add couple of things to help with the above program as I just worked through the same program and had couple of questions but after spending some time, I have more clarity on this program.

First of all "checker" is used to track the character which is already traversed in the String in order to see if there are any characters are getting repeated.

Now "checker" is an int data type so it can only have 32 bits or 4 bytes (depending upon platform) so this program can only work correctly for a character set within a range of 32 characters. That's the reason, this program subtracts 'a' from each character in order to make this program run for only lower case characters. However if you mix lower and upper case characters then it would not work.

By the way, if you do not subtract 'a' from each character (see below statement) then this program will work correctly for only String with upper case characters or String with only lower case characters. So the scope of above program increases from just lower case characters to upper case characters too but they can't be mixed together.

int val = str.charAt(i) - 'a'; 

However I wanted to write a generic program using Bitwise Operation which should work for any ASCII characters without worrying about upper case, lower case, numbers or any special character. In order to do this, our "checker" should be large enough to store 256 characters (ASCII Character Set size). But an int in Java would not work as it can only store 32 bits. Hence in below program, I am using BitSet class available in JDK which can have any user defined size passed while instantiating a BitSet object.

Here is a program which does the same thing as above program written using Bitwise operator but this program will work for a String with any character from ASCII character set.

public static boolean isUniqueStringUsingBitVectorClass(String s) {

    final int ASCII_CHARACTER_SET_SIZE = 256;

    final BitSet tracker = new BitSet(ASCII_CHARACTER_SET_SIZE);

    // if more than  256 ASCII characters then there can't be unique characters
    if(s.length() > 256) {
        return false;
    }

    //this will be used to keep the location of each character in String
    final BitSet charBitLocation = new BitSet(ASCII_CHARACTER_SET_SIZE);

    for(int i = 0; i < s.length(); i++) {

        int charVal = s.charAt(i);
        charBitLocation.set(charVal); //set the char location in BitSet

        //check if tracker has already bit set with the bit present in charBitLocation
        if(tracker.intersects(charBitLocation)) {
            return false;
        }

        //set the tracker with new bit from charBitLocation
        tracker.or(charBitLocation);

        charBitLocation.clear(); //clear charBitLocation to store bit for character in the next iteration of the loop

    }

    return true;

}
红颜悴 2025-01-09 15:08:53

简单解释(下面有JS代码)

  • 每个机器码的整型变量是一个32位数组
  • 所有按位操作都是32位
  • 它们与操作系统/CPU 体系结构或所选语言的数字系统无关,例如 JS 的 DEC64
  • 这种重复查找方法类似于在大小为 32 的数组中存储字符,其中,如果我们在字符串中找到a,则设置0th索引, 1st 代表 b &很快。
  • 字符串中的重复字符将占用其相应位,或者在本例中设置为 1。
  • Ivan 已经解释过:这个指数计算在之前的答案中是如何工作的

操作总结:

  • checkerchecker之间进行AND运算。字符的 index
  • 内部都是 Int-32-Arrays
  • 这是这两者之间的按位运算 2.
  • 检查 if 的输出 则操作为 1
  • 如果 output == 1
    • checker 变量在两个数组中都设置了特定的索引位
    • 因此它是重复的。
  • 如果输出== 0
    • 目前尚未找到该角色
    • checkerchecker 之间执行OR 运算。字符的索引
    • 从而将第索引位更新为 1
    • 将输出分配给检查器

假设:

  • 我们假设我们将获得所有小写字符
  • 并且大小 32 就足够了
  • 因此,考虑到 ascii 代码,我们从 96 作为参考点开始索引计数代码>a是97

下面给出的是 JavaScript 源代码。

function checkIfUniqueChars (str) {

    var checker = 0; // 32 or 64 bit integer variable 

    for (var i = 0; i< str.length; i++) {
        var index = str[i].charCodeAt(0) - 96;
        var bitRepresentationOfIndex = 1 << index;

        if ( (checker & bitRepresentationOfIndex) > 1) {
            console.log(str, false);
            return false;
        } else {
            checker = (checker | bitRepresentationOfIndex);
        }
    }
    console.log(str, true);
    return true;
}

checkIfUniqueChars("abcdefghi");  // true
checkIfUniqueChars("aabcdefghi"); // false
checkIfUniqueChars("abbcdefghi"); // false
checkIfUniqueChars("abcdefghii"); // false
checkIfUniqueChars("abcdefghii"); // false

注意,在 JS 中,尽管整数是 64 位,但按位运算总是在 32 位上完成。

示例:
如果字符串是 aa 则:

// checker is intialized to 32-bit-Int(0)
// therefore, checker is
checker= 00000000000000000000000000000000

i = 0

str[0] is 'a'
str[i].charCodeAt(0) - 96 = 1

checker 'AND' 32-bit-Int(1) = 00000000000000000000000000000000
Boolean(0) == false

// So, we go for the '`OR`' operation.

checker = checker OR 32-bit-Int(1)
checker = 00000000000000000000000000000001

i = 1

str[1] is 'a'
str[i].charCodeAt(0) - 96 = 1

checker= 00000000000000000000000000000001
a      = 00000000000000000000000000000001

checker 'AND' 32-bit-Int(1) = 00000000000000000000000000000001
Boolean(1) == true
// We've our duplicate now

Simple Explanation (with JS code below)

  • An integer variable per machine code is a 32-bit array
  • All bit wise operations are 32-bit
  • They're agnostic of OS / CPU architecture or chosen number system of the language, e.g. DEC64 for JS.
  • This duplication finding approach is similar to storing characters in an array of size 32 where, we set 0th index if we find a in the string, 1st for b & so on.
  • A duplicate character in the string will have its corresponding bit occupied, or, in this case, set to 1.
  • Ivan has already explained: How this index calculation works in this previous answer.

Summary of operations:

  • Perform AND operation between checker & index of the character
  • Internally both are Int-32-Arrays
  • It is a bit-wise operation between these 2.
  • Check if the output of the operation was 1
  • if output == 1
    • The checker variable has that particular index-th bit set in both arrays
    • Thus it's a duplicate.
  • if output == 0
    • This character hasn't been found so far
    • Perform an OR operation between checker & index of the character
    • Thereby, updating the index-th bit to 1
    • Assign the output to checker

Assumptions:

  • We've assumed we'll get all lower case characters
  • And, that size 32 is enough
  • Hence, we began our index counting from 96 as reference point considering the ascii code for a is 97

Given below is the JavaScript source code.

function checkIfUniqueChars (str) {

    var checker = 0; // 32 or 64 bit integer variable 

    for (var i = 0; i< str.length; i++) {
        var index = str[i].charCodeAt(0) - 96;
        var bitRepresentationOfIndex = 1 << index;

        if ( (checker & bitRepresentationOfIndex) > 1) {
            console.log(str, false);
            return false;
        } else {
            checker = (checker | bitRepresentationOfIndex);
        }
    }
    console.log(str, true);
    return true;
}

checkIfUniqueChars("abcdefghi");  // true
checkIfUniqueChars("aabcdefghi"); // false
checkIfUniqueChars("abbcdefghi"); // false
checkIfUniqueChars("abcdefghii"); // false
checkIfUniqueChars("abcdefghii"); // false

Note that in JS, despite integers being of 64 bits, a bit wise operation is always done on 32 bits.

Example:
If the string is aa then:

// checker is intialized to 32-bit-Int(0)
// therefore, checker is
checker= 00000000000000000000000000000000

i = 0

str[0] is 'a'
str[i].charCodeAt(0) - 96 = 1

checker 'AND' 32-bit-Int(1) = 00000000000000000000000000000000
Boolean(0) == false

// So, we go for the '`OR`' operation.

checker = checker OR 32-bit-Int(1)
checker = 00000000000000000000000000000001

i = 1

str[1] is 'a'
str[i].charCodeAt(0) - 96 = 1

checker= 00000000000000000000000000000001
a      = 00000000000000000000000000000001

checker 'AND' 32-bit-Int(1) = 00000000000000000000000000000001
Boolean(1) == true
// We've our duplicate now
撩发小公举 2025-01-09 15:08:53

让我们逐行分解代码。

int checker = 0; 我们正在启动一个检查器,它将帮助我们找到重复的值。

int val = str.charAt(i) - 'a'; 我们获取字符串第 'i' 个位置处的字符的 ASCII 值,并减去 'a' 的 ASCII 值'。由于假设字符串仅是低位字符,因此字符数限制为 26。因此,'val' 的值将始终 >= 0。

if ((checker & (1 0) return false;

checker |= (1 << val);

现在这是棘手的部分。让我们考虑一个带有字符串“abcda”的示例。理想情况下,这应该返回 false。

For 循环迭代 1:

检查器:00000000000000000000000000000000

val:97-97 = 0

1 << 0: 00000000000000000000000000000001

检查器 & (1 << val): 00000000000000000000000000000000 不> 0

因此检查器:00000000000000000000000000000001

For 循环迭代 2:

检查器:000000000000000000000000000000001

val:98-97 = 1

1 << 1: 00000000000000000000000000000010

检查器 & (1 << val): 00000000000000000000000000000000 不> 0

因此检查器:00000000000000000000000000000011

For 循环迭代 3:

检查器:00000000000000000000000000000011

val:99-97 = 2

1 << 2: 00000000000000000000000000000100

检查器 & (1 << val): 00000000000000000000000000000000 不> 0

因此检查器:00000000000000000000000000000111

For 循环迭代 4:

检查器:00000000000000000000000000000111

val:100-97 = 3

1 << 3:00000000000000000000000000001000

检查器& (1 << val): 00000000000000000000000000000000 不> 0

因此检查器:00000000000000000000000000001111

For 循环迭代 5:

检查器:00000000000000000000000000001111

val:97-97 = 0

1 << 0: 00000000000000000000000000000001

检查器 & (1<<val):00000000000000000000000000000001 是> 0

因此返回 false。

Lets break down the code line by line.

int checker = 0; We are initiating a checker which will help us find duplicate values.

int val = str.charAt(i) - 'a'; We are getting the ASCII value of the character at the 'i'th position of the string and subtracting it with the ASCII value of 'a'. Since the assumption is that the string is lower characters only, the number of characters in limited to 26. Hece, the value of 'val' will always be >= 0.

if ((checker & (1 << val)) > 0) return false;

checker |= (1 << val);

Now this is the tricky part. Lets us consider an example with string "abcda". This should ideally return false.

For loop iteration 1:

Checker: 00000000000000000000000000000000

val: 97-97 = 0

1 << 0: 00000000000000000000000000000001

checker & (1 << val): 00000000000000000000000000000000 is not > 0

Hence checker: 00000000000000000000000000000001

For loop iteration 2:

Checker: 00000000000000000000000000000001

val: 98-97 = 1

1 << 1: 00000000000000000000000000000010

checker & (1 << val): 00000000000000000000000000000000 is not > 0

Hence checker: 00000000000000000000000000000011

For loop iteration 3:

Checker: 00000000000000000000000000000011

val: 99-97 = 2

1 << 2: 00000000000000000000000000000100

checker & (1 << val): 00000000000000000000000000000000 is not > 0

Hence checker: 00000000000000000000000000000111

For loop iteration 4:

Checker: 00000000000000000000000000000111

val: 100-97 = 3

1 << 3: 00000000000000000000000000001000

checker & (1 << val): 00000000000000000000000000000000 is not > 0

Hence checker: 00000000000000000000000000001111

For loop iteration 5:

Checker: 00000000000000000000000000001111

val: 97-97 = 0

1 << 0: 00000000000000000000000000000001

checker & (1 << val): 00000000000000000000000000000001 is > 0

Hence return false.

锦上情书 2025-01-09 15:08:53
public static void main (String[] args)
{
    //In order to understand this algorithm, it is necessary to understand the following:

    //int checker = 0;
    //Here we are using the primitive int almost like an array of size 32 where the only values can be 1 or 0
    //Since in Java, we have 4 bytes per int, 8 bits per byte, we have a total of 4x8=32 bits to work with

    //int val = str.charAt(i) - 'a';
    //In order to understand what is going on here, we must realize that all characters have a numeric value
    for (int i = 0; i < 256; i++)
    {
        char val = (char)i;
        System.out.print(val);
    }

    //The output is something like:
    //             !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
    //There seems to be ~15 leading spaces that do not copy paste well, so I had to use real spaces instead

    //To only print the characters from 'a' on forward:
    System.out.println();
    System.out.println();

    for (int i=0; i < 256; i++)
    {
        char val = (char)i;
        //char val2 = val + 'a'; //incompatible types. required: char found: int
        int val2 = val + 'a';  //shift to the 'a', we must use an int here otherwise the compiler will complain
        char val3 = (char)val2;  //convert back to char. there should be a more elegant way of doing this.
        System.out.print(val3);
    }

    //Notice how the following does not work:
    System.out.println();
    System.out.println();

    for (int i=0; i < 256; i++)
    {
        char val = (char)i;
        int val2 = val - 'a';
        char val3 = (char)val2;
        System.out.print(val3);
    }
    //I'm not sure why this spills out into 2 lines:
    //EDIT I cant seem to copy this into stackoverflow!

    System.out.println();
    System.out.println();

    //So back to our original algorithm:
    //int val = str.charAt(i) - 'a';
    //We convert the i'th character of the String to a character, and shift it to the right, since adding shifts to the right and subtracting shifts to the left it seems

    //if ((checker & (1 << val)) > 0) return false;
    //This line is quite a mouthful, lets break it down:
    System.out.println(0<<0);
    //00000000000000000000000000000000
    System.out.println(0<<1);
    //00000000000000000000000000000000
    System.out.println(0<<2);
    //00000000000000000000000000000000
    System.out.println(0<<3);
    //00000000000000000000000000000000
    System.out.println(1<<0);
    //00000000000000000000000000000001
    System.out.println(1<<1);
    //00000000000000000000000000000010 == 2
    System.out.println(1<<2);
    //00000000000000000000000000000100 == 4
    System.out.println(1<<3);
    //00000000000000000000000000001000 == 8
    System.out.println(2<<0);
    //00000000000000000000000000000010 == 2
    System.out.println(2<<1);
    //00000000000000000000000000000100 == 4
    System.out.println(2<<2);
    // == 8
    System.out.println(2<<3);
    // == 16
    System.out.println("3<<0 == "+(3<<0));
    // != 4 why 3???
    System.out.println(3<<1);
    //00000000000000000000000000000011 == 3
    //shift left by 1
    //00000000000000000000000000000110 == 6
    System.out.println(3<<2);
    //00000000000000000000000000000011 == 3
    //shift left by 2
    //00000000000000000000000000001100 == 12
    System.out.println(3<<3);
    // 24

    //It seems that the -  'a' is not necessary
    //Back to if ((checker & (1 << val)) > 0) return false;
    //(1 << val means we simply shift 1 by the numeric representation of the current character
    //the bitwise & works as such:
    System.out.println();
    System.out.println();
    System.out.println(0&0);    //0
    System.out.println(0&1);       //0
    System.out.println(0&2);          //0
    System.out.println();
    System.out.println();
    System.out.println(1&0);    //0
    System.out.println(1&1);       //1
    System.out.println(1&2);          //0
    System.out.println(1&3);             //1
    System.out.println();
    System.out.println();
    System.out.println(2&0);    //0
    System.out.println(2&1);       //0   0010 & 0001 == 0000 = 0
    System.out.println(2&2);          //2  0010 & 0010 == 2
    System.out.println(2&3);             //2  0010 & 0011 = 0010 == 2
    System.out.println();
    System.out.println();
    System.out.println(3&0);    //0    0011 & 0000 == 0
    System.out.println(3&1);       //1  0011 & 0001 == 0001 == 1
    System.out.println(3&2);          //2  0011 & 0010 == 0010 == 2, 0&1 = 0 1&1 = 1
    System.out.println(3&3);             //3 why?? 3 == 0011 & 0011 == 3???
    System.out.println(9&11);   // should be... 1001 & 1011 == 1001 == 8+1 == 9?? yay!

    //so when we do (1 << val), we take 0001 and shift it by say, 97 for 'a', since any 'a' is also 97

    //why is it that the result of bitwise & is > 0 means its a dupe?
    //lets see..

    //0011 & 0011 is 0011 means its a dupe
    //0000 & 0011 is 0000 means no dupe
    //0010 & 0001 is 0011 means its no dupe
    //hmm
    //only when it is all 0000 means its no dupe

    //so moving on:
    //checker |= (1 << val)
    //the |= needs exploring:

    int x = 0;
    int y = 1;
    int z = 2;
    int a = 3;
    int b = 4;
    System.out.println("x|=1 "+(x|=1));  //1
    System.out.println(x|=1);     //1
    System.out.println(x|=1);      //1
    System.out.println(x|=1);       //1
    System.out.println(x|=1);       //1
    System.out.println(y|=1); // 0001 |= 0001 == ?? 1????
    System.out.println(y|=2); // ??? == 3 why??? 0001 |= 0010 == 3... hmm
    System.out.println(y);  //should be 3?? 
    System.out.println(y|=1); //already 3 so... 0011 |= 0001... maybe 0011 again? 3?
    System.out.println(y|=2); //0011 |= 0010..... hmm maybe.. 0011??? still 3? yup!
    System.out.println(y|=3); //0011 |= 0011, still 3
    System.out.println(y|=4);  //0011 |= 0100.. should be... 0111? so... 11? no its 7
    System.out.println(y|=5);  //so we're at 7 which is 0111, 0111 |= 0101 means 0111 still 7
    System.out.println(b|=9); //so 0100 |= 1001 is... seems like xor?? or just or i think, just or... so its 1101 so its 13? YAY!

    //so the |= is just a bitwise OR!
}

public static boolean isUniqueChars(String str) {
    int checker = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - 'a';  //the - 'a' is just smoke and mirrors! not necessary!
        if ((checker & (1 << val)) > 0) return false;
        checker |= (1 << val);
    }
    return true;
}

public static boolean is_unique(String input)
{
    int using_int_as_32_flags = 0;
    for (int i=0; i < input.length(); i++)
    {
        int numeric_representation_of_char_at_i = input.charAt(i);
        int using_0001_and_shifting_it_by_the_numeric_representation = 1 << numeric_representation_of_char_at_i; //here we shift the bitwise representation of 1 by the numeric val of the character
        int result_of_bitwise_and = using_int_as_32_flags & using_0001_and_shifting_it_by_the_numeric_representation;
        boolean already_bit_flagged = result_of_bitwise_and > 0;              //needs clarification why is it that the result of bitwise & is > 0 means its a dupe?
        if (already_bit_flagged)
            return false;
        using_int_as_32_flags |= using_0001_and_shifting_it_by_the_numeric_representation;
    }
    return true;
}
public static void main (String[] args)
{
    //In order to understand this algorithm, it is necessary to understand the following:

    //int checker = 0;
    //Here we are using the primitive int almost like an array of size 32 where the only values can be 1 or 0
    //Since in Java, we have 4 bytes per int, 8 bits per byte, we have a total of 4x8=32 bits to work with

    //int val = str.charAt(i) - 'a';
    //In order to understand what is going on here, we must realize that all characters have a numeric value
    for (int i = 0; i < 256; i++)
    {
        char val = (char)i;
        System.out.print(val);
    }

    //The output is something like:
    //             !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
    //There seems to be ~15 leading spaces that do not copy paste well, so I had to use real spaces instead

    //To only print the characters from 'a' on forward:
    System.out.println();
    System.out.println();

    for (int i=0; i < 256; i++)
    {
        char val = (char)i;
        //char val2 = val + 'a'; //incompatible types. required: char found: int
        int val2 = val + 'a';  //shift to the 'a', we must use an int here otherwise the compiler will complain
        char val3 = (char)val2;  //convert back to char. there should be a more elegant way of doing this.
        System.out.print(val3);
    }

    //Notice how the following does not work:
    System.out.println();
    System.out.println();

    for (int i=0; i < 256; i++)
    {
        char val = (char)i;
        int val2 = val - 'a';
        char val3 = (char)val2;
        System.out.print(val3);
    }
    //I'm not sure why this spills out into 2 lines:
    //EDIT I cant seem to copy this into stackoverflow!

    System.out.println();
    System.out.println();

    //So back to our original algorithm:
    //int val = str.charAt(i) - 'a';
    //We convert the i'th character of the String to a character, and shift it to the right, since adding shifts to the right and subtracting shifts to the left it seems

    //if ((checker & (1 << val)) > 0) return false;
    //This line is quite a mouthful, lets break it down:
    System.out.println(0<<0);
    //00000000000000000000000000000000
    System.out.println(0<<1);
    //00000000000000000000000000000000
    System.out.println(0<<2);
    //00000000000000000000000000000000
    System.out.println(0<<3);
    //00000000000000000000000000000000
    System.out.println(1<<0);
    //00000000000000000000000000000001
    System.out.println(1<<1);
    //00000000000000000000000000000010 == 2
    System.out.println(1<<2);
    //00000000000000000000000000000100 == 4
    System.out.println(1<<3);
    //00000000000000000000000000001000 == 8
    System.out.println(2<<0);
    //00000000000000000000000000000010 == 2
    System.out.println(2<<1);
    //00000000000000000000000000000100 == 4
    System.out.println(2<<2);
    // == 8
    System.out.println(2<<3);
    // == 16
    System.out.println("3<<0 == "+(3<<0));
    // != 4 why 3???
    System.out.println(3<<1);
    //00000000000000000000000000000011 == 3
    //shift left by 1
    //00000000000000000000000000000110 == 6
    System.out.println(3<<2);
    //00000000000000000000000000000011 == 3
    //shift left by 2
    //00000000000000000000000000001100 == 12
    System.out.println(3<<3);
    // 24

    //It seems that the -  'a' is not necessary
    //Back to if ((checker & (1 << val)) > 0) return false;
    //(1 << val means we simply shift 1 by the numeric representation of the current character
    //the bitwise & works as such:
    System.out.println();
    System.out.println();
    System.out.println(0&0);    //0
    System.out.println(0&1);       //0
    System.out.println(0&2);          //0
    System.out.println();
    System.out.println();
    System.out.println(1&0);    //0
    System.out.println(1&1);       //1
    System.out.println(1&2);          //0
    System.out.println(1&3);             //1
    System.out.println();
    System.out.println();
    System.out.println(2&0);    //0
    System.out.println(2&1);       //0   0010 & 0001 == 0000 = 0
    System.out.println(2&2);          //2  0010 & 0010 == 2
    System.out.println(2&3);             //2  0010 & 0011 = 0010 == 2
    System.out.println();
    System.out.println();
    System.out.println(3&0);    //0    0011 & 0000 == 0
    System.out.println(3&1);       //1  0011 & 0001 == 0001 == 1
    System.out.println(3&2);          //2  0011 & 0010 == 0010 == 2, 0&1 = 0 1&1 = 1
    System.out.println(3&3);             //3 why?? 3 == 0011 & 0011 == 3???
    System.out.println(9&11);   // should be... 1001 & 1011 == 1001 == 8+1 == 9?? yay!

    //so when we do (1 << val), we take 0001 and shift it by say, 97 for 'a', since any 'a' is also 97

    //why is it that the result of bitwise & is > 0 means its a dupe?
    //lets see..

    //0011 & 0011 is 0011 means its a dupe
    //0000 & 0011 is 0000 means no dupe
    //0010 & 0001 is 0011 means its no dupe
    //hmm
    //only when it is all 0000 means its no dupe

    //so moving on:
    //checker |= (1 << val)
    //the |= needs exploring:

    int x = 0;
    int y = 1;
    int z = 2;
    int a = 3;
    int b = 4;
    System.out.println("x|=1 "+(x|=1));  //1
    System.out.println(x|=1);     //1
    System.out.println(x|=1);      //1
    System.out.println(x|=1);       //1
    System.out.println(x|=1);       //1
    System.out.println(y|=1); // 0001 |= 0001 == ?? 1????
    System.out.println(y|=2); // ??? == 3 why??? 0001 |= 0010 == 3... hmm
    System.out.println(y);  //should be 3?? 
    System.out.println(y|=1); //already 3 so... 0011 |= 0001... maybe 0011 again? 3?
    System.out.println(y|=2); //0011 |= 0010..... hmm maybe.. 0011??? still 3? yup!
    System.out.println(y|=3); //0011 |= 0011, still 3
    System.out.println(y|=4);  //0011 |= 0100.. should be... 0111? so... 11? no its 7
    System.out.println(y|=5);  //so we're at 7 which is 0111, 0111 |= 0101 means 0111 still 7
    System.out.println(b|=9); //so 0100 |= 1001 is... seems like xor?? or just or i think, just or... so its 1101 so its 13? YAY!

    //so the |= is just a bitwise OR!
}

public static boolean isUniqueChars(String str) {
    int checker = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - 'a';  //the - 'a' is just smoke and mirrors! not necessary!
        if ((checker & (1 << val)) > 0) return false;
        checker |= (1 << val);
    }
    return true;
}

public static boolean is_unique(String input)
{
    int using_int_as_32_flags = 0;
    for (int i=0; i < input.length(); i++)
    {
        int numeric_representation_of_char_at_i = input.charAt(i);
        int using_0001_and_shifting_it_by_the_numeric_representation = 1 << numeric_representation_of_char_at_i; //here we shift the bitwise representation of 1 by the numeric val of the character
        int result_of_bitwise_and = using_int_as_32_flags & using_0001_and_shifting_it_by_the_numeric_representation;
        boolean already_bit_flagged = result_of_bitwise_and > 0;              //needs clarification why is it that the result of bitwise & is > 0 means its a dupe?
        if (already_bit_flagged)
            return false;
        using_int_as_32_flags |= using_0001_and_shifting_it_by_the_numeric_representation;
    }
    return true;
}
怀里藏娇 2025-01-09 15:08:53

以前的帖子很好地解释了代码块的作用,我想使用 BitSet java 数据结构添加我的简单解决方案:

private static String isUniqueCharsUsingBitSet(String string) {
  BitSet bitSet =new BitSet();
    for (int i = 0; i < string.length(); ++i) {
        int val = string.charAt(i);
        if(bitSet.get(val)) return "NO";
        bitSet.set(val);
    }
  return "YES";
}

Previous Posts explain well what the code block does and i want to add my simple Solution using the BitSet java Data structure :

private static String isUniqueCharsUsingBitSet(String string) {
  BitSet bitSet =new BitSet();
    for (int i = 0; i < string.length(); ++i) {
        int val = string.charAt(i);
        if(bitSet.get(val)) return "NO";
        bitSet.set(val);
    }
  return "YES";
}
兔小萌 2025-01-09 15:08:53
Line 1:   public static boolean isUniqueChars(String str) {
Line 2:      int checker = 0;
Line 3:      for (int i = 0; i < str.length(); ++i) {
Line 4:          int val = str.charAt(i) - 'a';
Line 5:          if ((checker & (1 << val)) > 0) return false;
Line 6:         checker |= (1 << val);
Line 7:      }
Line 8:      return true;
Line 9:   }

我理解使用 Javascript 的方式。假设输入 var inputChar = "abca"; // 查找 inputChar 是否包含所有唯一字符

让我们开始

第 4 行:int val = str.charAt(i) - 'a';

上面一行查找inputChar中第一个字符的二进制值,即a,ascii中的a = 97,然后将97转换为二进制为1100001 >.

在 Javascript 中,例如:"a".charCodeAt().toString(2) 返回 1100001

checker = 0 // 二进制 32 位表示 = 00000000000000000000000000

checker = 1100001 | checker; //checker 变为 1100001 (在 32 位表示中,它变为 000000000.....00001100001)

但我希望我的位掩码(int checker)只设置一位,但是 checker是 1100001

Line 4:          int val = str.charAt(i) - 'a';

现在上面的代码就派上用场了。我总是减去 97(a 的 ASCII 值)

val = 0; // 97 - 97  Which is  a - a
val = 1; // 98 - 97 Which is b - a
val = 1;  // 99 - 97 Which is c - a

让我们使用重置的 val

5 和 6 行很好解释了@Ivan 答案

Line 1:   public static boolean isUniqueChars(String str) {
Line 2:      int checker = 0;
Line 3:      for (int i = 0; i < str.length(); ++i) {
Line 4:          int val = str.charAt(i) - 'a';
Line 5:          if ((checker & (1 << val)) > 0) return false;
Line 6:         checker |= (1 << val);
Line 7:      }
Line 8:      return true;
Line 9:   }

The way I understood using Javascript. Assuming input var inputChar = "abca"; //find if inputChar has all unique characters

Lets Start

Line 4: int val = str.charAt(i) - 'a';

Above line Finds Binary value of first character in inputChar which is a, a = 97 in ascii, then convert 97 to binary becomes 1100001.

In Javascript Eg: "a".charCodeAt().toString(2) returns 1100001

checker = 0 // binary 32 bit representation = 0000000000000000000000000

checker = 1100001 | checker; //checker becomes 1100001 (In 32 bit representation it becomes 000000000.....00001100001)

But i want my bitmask (int checker) to set only one bit, but checker is 1100001

Line 4:          int val = str.charAt(i) - 'a';

Now above code comes handy. I just subtract 97 always (ASCII val of a)

val = 0; // 97 - 97  Which is  a - a
val = 1; // 98 - 97 Which is b - a
val = 1;  // 99 - 97 Which is c - a

Lets use val which is resetted

Line 5 and Line 6 is well explained @Ivan answer

烂人 2025-01-09 15:08:53

以防万一,如果有人正在使用位向量查找字符串中唯一字符的 kotlin 等效项

fun isUnique(str: String): Boolean {
    var checker = 0
    for (i in str.indices) {
        val bit = str.get(i) - 'a'
        if (checker.and(1 shl bit) > 0) return false
        checker = checker.or(1 shl bit)
    }
    return true
}

参考: https ://www.programiz.com/kotlin-programming/bitwise

Just in case if anyone is looking for kotlin equivalent of unique characters in a string using bit vector

fun isUnique(str: String): Boolean {
    var checker = 0
    for (i in str.indices) {
        val bit = str.get(i) - 'a'
        if (checker.and(1 shl bit) > 0) return false
        checker = checker.or(1 shl bit)
    }
    return true
}

Ref: https://www.programiz.com/kotlin-programming/bitwise

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文