在Java中检测字符串中重复字符的最有效方法是什么？

发布于 2024-10-27 07:55:13 字数 1333 浏览 1 评论 0原文

使用数据结构（HashMap）我能够做到这一点。

这是代码：

import java.util.*;

class unique{
    public static void main(String[] args){
        HashMap<Character, Integer> charMap = new HashMap<Character, Integer>();
        boolean isNotUnique = false;
            for ( char loop : args[0].toCharArray() ){
            Integer freq = charMap.get( loop );
            charMap.put( loop, ( freq == null )? 1 : freq+1 );
            if ( charMap.get( loop ) > 1 )
            {
                isNotUnique = true;
            }
        }
            System.out.println ( isNotUnique );
    }
}

没有数据结构，我想出了一个生硬的方法。这有 O(n^2)

class unique
{
    public static void main(String[] args)
    {
        String inputString = args[0];
        System.out.println( isUnique( inputString ) );

    }

    private static boolean isUnique(String inputString) {
        String methodString = inputString;
        for ( int i = 0; i < inputString.length(); i++ )
        {
            for ( int j = i+1; j < inputString.length(); j++ )
            {
                if ( methodString.charAt( i ) == methodString.charAt( j ) )
                {
                    return false;
                }
            }
        }
        return true;
    }
}

我想知道是否可以在 O(n) 时间复杂度内解决

原文

Using data structures (HashMap) I was able to do it.

This is the code:

import java.util.*;

class unique{
    public static void main(String[] args){
        HashMap<Character, Integer> charMap = new HashMap<Character, Integer>();
        boolean isNotUnique = false;
            for ( char loop : args[0].toCharArray() ){
            Integer freq = charMap.get( loop );
            charMap.put( loop, ( freq == null )? 1 : freq+1 );
            if ( charMap.get( loop ) > 1 )
            {
                isNotUnique = true;
            }
        }
            System.out.println ( isNotUnique );
    }
}

Without datastructures, I came up with a blunt approach.
This has O(n^2)

class unique
{
    public static void main(String[] args)
    {
        String inputString = args[0];
        System.out.println( isUnique( inputString ) );

    }

    private static boolean isUnique(String inputString) {
        String methodString = inputString;
        for ( int i = 0; i < inputString.length(); i++ )
        {
            for ( int j = i+1; j < inputString.length(); j++ )
            {
                if ( methodString.charAt( i ) == methodString.charAt( j ) )
                {
                    return false;
                }
            }
        }
        return true;
    }
}

I was wondering if it was possible to solve in O(n) time complexity

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

金兰素衣 2024-11-03 07:55:13

如果您需要支持不由代理字符对表示的 Unicode 字符，可以这样做：

private static boolean isUnique(String inputString) {
    long[] used = new long[1024];
    for (char c : inputString.toCharArray()) {
        if ((used[c >>> 6] & (1 << c)) > 0) {
            return false;
        }
        used[c >>> 6] |= 1 << c;
    }
    return true;
}

它使用位翻转来节省内存。从本质上讲，这与使用布尔数组是一样的：

private static boolean isUnique2(String inputString) {
    boolean[] used = new boolean[65536];
    for (char c : inputString.toCharArray()) {
        if (used[c]) {
            return false;
        }
        used[c] = true;
    }
    return true;
}

如果您只需要支持 ASCII 字符，则可以在任何一种情况下限制 used 的大小，以减少所需的内存（因此 long [4] 和 boolean[256]）。在 inputString 的特定长度以下，执行 n^2 检查可能比为此分配内存更快。因此，理想情况下，您可以根据长度将两者结合起来。

如果您需要支持所有可能的 Unicode 字符，则必须修改它以支持代理字符对。您可以使用 Character.isHighSurrogate(c) 检测它们。请参阅此页面了解一些帮助并搜索 Google 了解更多详细信息。

If you need to support Unicode characters that aren't represented by surrogate char pairs, this will do it:

private static boolean isUnique(String inputString) {
    long[] used = new long[1024];
    for (char c : inputString.toCharArray()) {
        if ((used[c >>> 6] & (1 << c)) > 0) {
            return false;
        }
        used[c >>> 6] |= 1 << c;
    }
    return true;
}

It's using bit flips to save memory. It's essentially the same thing as if you used an array of booleans:

private static boolean isUnique2(String inputString) {
    boolean[] used = new boolean[65536];
    for (char c : inputString.toCharArray()) {
        if (used[c]) {
            return false;
        }
        used[c] = true;
    }
    return true;
}

If you only need to support ASCII characters you could limit the size of used in either case to reduce the memory required (so long[4] and boolean[256]). Below a certain length of inputString it's probably faster to do the n^2 check than allocate the memory for this though. So ideally you do a combination of the two based on the length.

If you need to support all possible Unicode characters you'll have to modify this to support surrogate char pairs. You can detect them with Character.isHighSurrogate(c). See this page for some help and search Google for more details.

回复收藏 0 原文

素手挽清风 2024-11-03 07:55:13

性格的定义是什么？ az AZ 还是全部 unicode？

如果字符串的长度很大，比如一百万，你可以构建一个int数组，数组的长度是你的字符集的长度，并且数组将被初始化为零。

之后，根据每个字符遍历字符串：array[(int)char]++，这样，您就可以轻松地从数组中找到字符出现的时间。

然而，短字符串不需要这样的方法。

该方法的复杂度为 O(n)

回复收藏 0 原文

爱的那么颓废 2024-11-03 07:55:13

我想知道是否可以在 O(n) 时间复杂度内解决：

有两个简单的解决方案，时间复杂度为 O(N)：

HashSet方法在时间上为 O(N)，在空间上为 O(N)，其中 N 是字符串长度。（包含 N 个不同字符的常规 Java HashSet 将占用 O(N) 空间，并且具有相对较大的比例常数。）
位数组方法在时间上为 O(N)，在空间上为 O(1)，但 O(1) 为 8K 字节（或 64K 字节（如果使用 boolean[]），并且将添加到时间中的这么多内存归零会产生相关成本。