在Java中检测字符串中重复字符的最有效方法是什么?
使用数据结构(HashMap)我能够做到这一点。
这是代码:
import java.util.*;
class unique{
public static void main(String[] args){
HashMap<Character, Integer> charMap = new HashMap<Character, Integer>();
boolean isNotUnique = false;
for ( char loop : args[0].toCharArray() ){
Integer freq = charMap.get( loop );
charMap.put( loop, ( freq == null )? 1 : freq+1 );
if ( charMap.get( loop ) > 1 )
{
isNotUnique = true;
}
}
System.out.println ( isNotUnique );
}
}
没有数据结构,我想出了一个生硬的方法。 这有 O(n^2)
class unique
{
public static void main(String[] args)
{
String inputString = args[0];
System.out.println( isUnique( inputString ) );
}
private static boolean isUnique(String inputString) {
String methodString = inputString;
for ( int i = 0; i < inputString.length(); i++ )
{
for ( int j = i+1; j < inputString.length(); j++ )
{
if ( methodString.charAt( i ) == methodString.charAt( j ) )
{
return false;
}
}
}
return true;
}
}
我想知道是否可以在 O(n) 时间复杂度内解决
Using data structures (HashMap) I was able to do it.
This is the code:
import java.util.*;
class unique{
public static void main(String[] args){
HashMap<Character, Integer> charMap = new HashMap<Character, Integer>();
boolean isNotUnique = false;
for ( char loop : args[0].toCharArray() ){
Integer freq = charMap.get( loop );
charMap.put( loop, ( freq == null )? 1 : freq+1 );
if ( charMap.get( loop ) > 1 )
{
isNotUnique = true;
}
}
System.out.println ( isNotUnique );
}
}
Without datastructures, I came up with a blunt approach.
This has O(n^2)
class unique
{
public static void main(String[] args)
{
String inputString = args[0];
System.out.println( isUnique( inputString ) );
}
private static boolean isUnique(String inputString) {
String methodString = inputString;
for ( int i = 0; i < inputString.length(); i++ )
{
for ( int j = i+1; j < inputString.length(); j++ )
{
if ( methodString.charAt( i ) == methodString.charAt( j ) )
{
return false;
}
}
}
return true;
}
}
I was wondering if it was possible to solve in O(n) time complexity
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
如果您需要支持不由代理字符对表示的 Unicode 字符,可以这样做:
它使用位翻转来节省内存。从本质上讲,这与使用布尔数组是一样的:
如果您只需要支持 ASCII 字符,则可以在任何一种情况下限制
used
的大小,以减少所需的内存(因此long [4]
和boolean[256]
)。在inputString
的特定长度以下,执行 n^2 检查可能比为此分配内存更快。因此,理想情况下,您可以根据长度将两者结合起来。如果您需要支持所有可能的 Unicode 字符,则必须修改它以支持代理字符对。您可以使用
Character.isHighSurrogate(c)
检测它们。请参阅此页面了解一些帮助并搜索 Google 了解更多详细信息。If you need to support Unicode characters that aren't represented by surrogate char pairs, this will do it:
It's using bit flips to save memory. It's essentially the same thing as if you used an array of booleans:
If you only need to support ASCII characters you could limit the size of
used
in either case to reduce the memory required (solong[4]
andboolean[256]
). Below a certain length ofinputString
it's probably faster to do the n^2 check than allocate the memory for this though. So ideally you do a combination of the two based on the length.If you need to support all possible Unicode characters you'll have to modify this to support surrogate char pairs. You can detect them with
Character.isHighSurrogate(c)
. See this page for some help and search Google for more details.性格的定义是什么? az AZ 还是全部 unicode?
如果字符串的长度很大,比如一百万,你可以构建一个int数组,数组的长度是你的字符集的长度,并且数组将被初始化为零。
之后,根据每个字符遍历字符串:array[(int)char]++,这样,您就可以轻松地从数组中找到字符出现的时间。
然而,短字符串不需要这样的方法。
该方法的复杂度为 O(n)
What is the definition of character? a-z A-Z or all unicode?
if the length of string is quite big, such as one million, you could build an int array, the length of array is the length of you character set and the array would be initialized with zero.
after this, go through the string, according to each charactor : array[(int)char]++, so that, u can easily find the time of character appearing from the array.
however, short string won't need such method.
this method is O(n)
有两个简单的解决方案,时间复杂度为 O(N):
HashSet方法在时间上为
O(N)
,在空间上为O(N)
,其中N
是字符串长度。 (包含N
个不同字符的常规 JavaHashSet
将占用O(N)
空间,并且具有相对较大的比例常数。)位数组方法在时间上为
O(N)
,在空间上为O(1)
,但O(1)
为 8K 字节(或 64K 字节(如果使用boolean[]
),并且将添加到时间中的这么多内存归零会产生相关成本。这些都不是所有情况下的最佳答案。
对于足够小的
N
,简单的O(N^2)
嵌套循环将是最快的。 (并且它不使用额外的内存。)对于中等大小的
N
,在冲突时使用重新哈希的自定义哈希表将比HashSet
或位数组更好方法。HashSet
方法比位数组方法更好。对于足够大的
N
,位数组方法将是最快的并且使用最少的内存。(对于上述内容,我假设输入字符串不包含任何重复字符。如果包含任何重复字符,则
N
的实际值将小于字符串长度。)如果重复字符检测需要处理 UTF-16 代理对,那么简单的方法是即时转码为 Unicode 代码点,并更改数据结构以使用
HashSet
、更大的位数组等。相反,如果可以限制输入字符集的大小,则可以减小位数组的大小等。
这些调整将对小/中/大阈值可能下降的位置产生很大影响。
There are two simple solutions that are
O(N)
in time:The
HashSet
approach isO(N)
in time andO(N)
in space, whereN
is the String length. (A regular JavaHashSet
containingN
distinct characters will takeO(N)
space, with a relatively large constant of proportionality.)The bit array approach is
O(N)
in time andO(1)
in space, but theO(1)
is 8K bytes (or 64K bytes if you use aboolean[]
) and there is an associated cost of zeroing that much memory added to the time.Neither of these is the best answer in all cases.
For small enough
N
, a simpleO(N^2)
nested loop will be be fastest. (And it uses no extra memory.)For medium-sized
N
, a custom hash table that uses rehashing on collision will be better thanHashSet
or the bit array approach. TheHashSet
approach will be better than the bit array approach.For large enough
N
, the bit array approach will be fastest and use the least memory.(For the above, I'm assuming that input strings don't contain any duplicate characters. If they do, then the actual value of
N
will be less than the string length.)If duplicate character detection needs to cope with UTF-16 surrogate pairs, then the simple approach is to transcode on the fly to Unicode codepoints, and change the data structures to use a
HashSet<Integer>
, bigger bit arrays, etc.Conversely, if you can limit the size of the input character set, you can reduce the size of the bit arrays, etc.
These tweaks would make a big difference to where the small / medium / large thresholds are likely to fall.