What is the easiest/best/most correct way to iterate through the characters of a string in Java?

Posted 2024-07-06 18:32:09

Some ways to iterate through the characters of a string in Java are:

  1. Using StringTokenizer?
  2. Converting the String to a char[] and iterating over that.

What is the easiest/best/most correct way to iterate?

Comments (17)

只为一人 2024-07-13 18:32:10

Elaborating on this answer and this answer.

Above answers point out the problem of many of the solutions here which don't iterate by code point value -- they would have trouble with any surrogate chars. The java docs also outline the issue here (see "Unicode Character Representations"). Anyhow, here's some code that uses some actual surrogate chars from the supplementary Unicode set, and converts them back to a String. Note that .toChars() returns an array of chars: if you're dealing with surrogates, you'll necessarily have two chars. This code should work for any Unicode character.

    String supplementary = "Some Supplementary: 𠜎𠜱𠝹𠱓";
    supplementary.codePoints().forEach(cp -> 
            System.out.print(new String(Character.toChars(cp))));
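
As a minimal sketch of the surrogate issue described above, the following splits a string into one String per code point and contrasts length() with codePointCount(). The class and method names are illustrative only, and the sample text uses U+1F600 written as an escaped surrogate pair so the source stays ASCII-safe.

```java
import java.util.List;
import java.util.stream.Collectors;

class CodePointSplit {
    // Returns one String per Unicode code point; a surrogate pair stays together.
    static List<String> toCodePointStrings(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'
        System.out.println(text.length());                         // number of char values
        System.out.println(text.codePointCount(0, text.length())); // number of code points
        System.out.println(toCodePointStrings(text));
    }
}
```

The middle element of the resulting list is a two-char String, which is why Character.toChars() returns an array rather than a single char.
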
我恋#小黄人 2024-07-13 18:32:10

This example code will help you out!

import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class Solution {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 10);
        map.put("b", 30);
        map.put("c", 50);
        map.put("d", 40);
        map.put("e", 20);
        System.out.println(map);

        Map sortedMap = sortByValue(map);
        System.out.println(sortedMap);
    }

    public static Map sortByValue(Map unsortedMap) {
        Map sortedMap = new TreeMap(new ValueComparator(unsortedMap));
        sortedMap.putAll(unsortedMap);
        return sortedMap;
    }

}

class ValueComparator implements Comparator {
    Map map;

    public ValueComparator(Map map) {
        this.map = map;
    }

    public int compare(Object keyA, Object keyB) {
        Comparable valueA = (Comparable) map.get(keyA);
        Comparable valueB = (Comparable) map.get(keyB);
        return valueB.compareTo(valueA);
    }
}
乱了心跳 2024-07-13 18:32:10

There are typically two ways to iterate through a string in Java. Both have already been covered by several people in this thread; I am just adding my version.

The first uses charAt():

String s = sc.next(); // assuming a Scanner named sc is defined above
for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i); // constant-time access, so it adds hardly any overhead
}

The second copies the string into a char array:

char[] str = s.toCharArray(); // takes O(n) time to copy the string's contents into the array

If performance is at stake, I recommend the first one, since each charAt() call takes constant time and nothing is copied. If it is not, the second one can make your work easier: strings are immutable in Java, and the array gives you a copy you can modify.
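
To illustrate the immutability point, here is a small sketch (the class and method names are made up for this example) that edits the array returned by toCharArray() and builds a new String from it. The original String is not affected, because toCharArray() returns a copy.

```java
class CharArrayCopyDemo {
    // Replaces every digit with '*' by editing a copy of the string's chars.
    static String maskDigits(String s) {
        char[] chars = s.toCharArray(); // independent copy; s itself is untouched
        for (int i = 0; i < chars.length; i++) {
            if (Character.isDigit(chars[i])) {
                chars[i] = '*';
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String original = "a1b22";
        System.out.println(maskDigits(original));
        System.out.println(original); // still the original text
    }
}
```
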

强者自强 2024-07-13 18:32:10

If you need each character, one by one, as a String, you can use this:

String text = "text";
for(String s: text.split("")) {

}
寻找一个思念的角度 2024-07-13 18:32:09

I use a for loop to iterate the string and use charAt() to get each character to examine it. Since the String is implemented with an array, the charAt() method is a constant time operation.

String s = "...stuff...";

for (int i = 0; i < s.length(); i++){
    char c = s.charAt(i);        
    //Process char
}

That's what I would do. It seems the easiest to me.

As far as correctness goes, I don't believe that exists here. It is all based on your personal style.

Spring初心 2024-07-13 18:32:09

Two options

for(int i = 0, n = s.length() ; i < n ; i++) { 
    char c = s.charAt(i); 
}

or

for(char c : s.toCharArray()) {
    // process c
}

The first is probably faster; the second is probably more readable.

随遇而安 2024-07-13 18:32:09

Note that most of the other techniques described here break down if you're dealing with characters outside of the BMP (Unicode Basic Multilingual Plane), i.e. code points that are outside of the \u0000-\uFFFF range. This will only happen rarely, since the code points outside this range are mostly assigned to dead languages. But there are some useful characters outside it, for example some code points used for mathematical notation, and some used to encode proper names in Chinese.

In that case your code will be:

String str = "....";
int offset = 0, strLen = str.length();
while (offset < strLen) {
  int curChar = str.codePointAt(offset);
  offset += Character.charCount(curChar);
  // do something with curChar
}

The Character.charCount(int) method requires Java 5+.

Source: http://mindprod.com/jgloss/codepoint.html
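
A possible use of that loop, as a sketch: counting how many code points in a string lie outside the BMP. The class and method names are hypothetical. The sample string contains U+1D504 (a mathematical letter) and U+1F600, both written as escaped surrogate pairs.

```java
class SupplementaryCounter {
    // Counts code points above U+FFFF using the codePointAt/charCount loop.
    static int countSupplementary(String str) {
        int count = 0;
        int offset = 0;
        int strLen = str.length();
        while (offset < strLen) {
            int cp = str.codePointAt(offset);
            if (Character.isSupplementaryCodePoint(cp)) {
                count++;
            }
            offset += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "x\uD835\uDD04y\uD83D\uDE00";
        System.out.println(countSupplementary(sample));
    }
}
```
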

且行且努力 2024-07-13 18:32:09

In Java 8 we can solve it as:

String str = "xyz";
str.chars().forEachOrdered(i -> System.out.print((char)i));
str.codePoints().forEachOrdered(cp -> System.out.print(Character.toChars(cp))); // a (char) cast would truncate supplementary code points

The method chars() returns an IntStream as mentioned in doc:

Returns a stream of int zero-extending the char values from this
sequence. Any char which maps to a surrogate code point is passed
through uninterpreted. If the sequence is mutated while the stream is
being read, the result is undefined.

The method codePoints() also returns an IntStream as per doc:

Returns a stream of code point values from this sequence. Any
surrogate pairs encountered in the sequence are combined as if by
Character.toCodePoint and the result is passed to the stream. Any
other code units, including ordinary BMP characters, unpaired
surrogates, and undefined code units, are zero-extended to int values
which are then passed to the stream.

How are char and code point different? As mentioned in this article:

Unicode 3.1 added supplementary characters, bringing the total number
of characters to more than the 2^16 = 65536 characters that can be
distinguished by a single 16-bit char. Therefore, a char value no
longer has a one-to-one mapping to the fundamental semantic unit in
Unicode. JDK 5 was updated to support the larger set of character
values. Instead of changing the definition of the char type, some of
the new supplementary characters are represented by a surrogate pair
of two char values. To reduce naming confusion, a code point will be
used to refer to the number that represents a particular Unicode
character, including supplementary ones.

Finally why forEachOrdered and not forEach ?

The behaviour of forEach is explicitly nondeterministic, whereas forEachOrdered performs an action for each element of this stream in the stream's encounter order, if the stream has a defined encounter order. So forEach does not guarantee that the order is kept. Also check this question for more.

For the difference between a character, a code point, a glyph and a grapheme, check this question.
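
The following sketch (names are illustrative) puts both points together: chars() reports a surrogate pair as two values while codePoints() combines it into one, and forEachOrdered keeps the encounter order even on a parallel stream.

```java
class StreamOrderDemo {
    // Rebuilds the string from a parallel code point stream.
    // forEachOrdered processes elements in encounter order, one after another,
    // so appending to a plain StringBuilder is safe here.
    static String rebuildInOrder(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().parallel().forEachOrdered(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'
        System.out.println(text.chars().count());      // surrogates counted separately
        System.out.println(text.codePoints().count()); // surrogate pair counted once
        System.out.println(rebuildInOrder(text).equals(text));
    }
}
```

With forEach in place of forEachOrdered, the parallel version would carry no ordering guarantee, and the unsynchronized StringBuilder would no longer be safe to share.
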

小瓶盖 2024-07-13 18:32:09

I agree that StringTokenizer is overkill here. I actually tried out the suggestions above and timed them.

My test was fairly simple: create a StringBuilder with about a million characters, convert it to a String, and traverse it a thousand times with charAt() / after converting to a char array / with a CharacterIterator (of course making sure to do something with each character so the compiler can't optimize away the whole loop :-) ).

The result on my 2.6 GHz Powerbook (that's a mac :-) ) and JDK 1.5:

  • Test 1: charAt + String --> 3138msec
  • Test 2: String converted to array --> 9568msec
  • Test 3: StringBuilder charAt --> 3536msec
  • Test 4: CharacterIterator and String --> 12151msec

As the results are significantly different, the most straightforward way also seems to be the fastest one. Interestingly, charAt() of a StringBuilder seems to be slightly slower than the one of String.

BTW I suggest not to use CharacterIterator as I consider its abuse of the '\uFFFF' character as "end of iteration" a really awful hack. In big projects there's always two guys that use the same kind of hack for two different purposes and the code crashes really mysteriously.

Here's one of the tests:

    int count = 1000;
    ...

    System.out.println("Test 1: charAt + String");
    long t = System.currentTimeMillis();
    int sum=0;
    for (int i=0; i<count; i++) {
        int len = str.length();
        for (int j=0; j<len; j++) {
            if (str.charAt(j) == 'b')
                sum = sum + 1;
        }
    }
    t = System.currentTimeMillis()-t;
    System.out.println("result: "+ sum + " after " + t + "msec");
星光不落少年眉 2024-07-13 18:32:09

There are some dedicated classes for this:

import java.text.*;

final CharacterIterator it = new StringCharacterIterator(s);
for(char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
   // process c
   ...
}
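
One caveat raised elsewhere in this thread is that CharacterIterator.DONE is the character '\uFFFF', so the idiom above stops early if the text itself contains that character. The sketch below (class and method names are hypothetical) contrasts the DONE-based loop with a loop that checks the iterator's index instead.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

class IteratorDoneDemo {
    // Usual idiom: ends at the first DONE value, even if it is real data.
    static int countWithDone(String s) {
        int n = 0;
        CharacterIterator it = new StringCharacterIterator(s);
        for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
            n++;
        }
        return n;
    }

    // Index-based end check: visits every char; use it.current() in the body.
    static int countWithIndex(String s) {
        int n = 0;
        CharacterIterator it = new StringCharacterIterator(s);
        for (it.first(); it.getIndex() < it.getEndIndex(); it.next()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String tricky = "a\uFFFFb";
        System.out.println(countWithDone(tricky));
        System.out.println(countWithIndex(tricky));
    }
}
```
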
夜无邪 2024-07-13 18:32:09

If you have Guava on your classpath, the following is a pretty readable alternative. Guava even has a fairly sensible custom List implementation for this case, so this shouldn't be inefficient.

for(char c : Lists.charactersOf(yourString)) {
    // Do whatever you want     
}

UPDATE: As @Alex noted, with Java 8 there's also CharSequence#chars to use. Although its type is IntStream, it can be mapped to chars like this:

yourString.chars()
        .mapToObj(c -> Character.valueOf((char) c))
        .forEach(c -> System.out.println(c)); // Or whatever you want
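
If a List<Character> is wanted without adding Guava, the same stream can be collected. This is only a sketch using standard library calls; the class and method names are invented for the example, and it boxes every char, so it is not meant for hot loops.

```java
import java.util.List;
import java.util.stream.Collectors;

class CharListDemo {
    // Boxes each char value from chars() into a Character and collects them.
    static List<Character> toCharacterList(String s) {
        return s.chars()
                .mapToObj(c -> (char) c)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (char c : toCharacterList("abc")) {
            System.out.println(c);
        }
    }
}
```
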
雨巷深深 2024-07-13 18:32:09

If you need to iterate through the code points of a String (see this answer) a shorter / more readable way is to use the CharSequence#codePoints method added in Java 8:

for(int c : string.codePoints().toArray()){
    ...
}

or using the stream directly instead of a for loop:

string.codePoints().forEach(c -> ...);

There is also CharSequence#chars if you want a stream of the characters (although it is an IntStream, since there is no CharStream).

盗心人 2024-07-13 18:32:09

If you need performance, then you must test in your own environment. There is no other way.

Here is example code:

int tmp = 0;
String s = new String(new byte[64*1024]);
{
    long st = System.nanoTime();
    for(int i = 0, n = s.length(); i < n; i++) {
        tmp += s.charAt(i);
    }
    st = System.nanoTime() - st;
    System.out.println("1 " + st);
}

{
    long st = System.nanoTime();
    char[] ch = s.toCharArray();
    for(int i = 0, n = ch.length; i < n; i++) {
        tmp += ch[i];
    }
    st = System.nanoTime() - st;
    System.out.println("2 " + st);
}
{
    long st = System.nanoTime();
    for(char c : s.toCharArray()) {
        tmp += c;
    }
    st = System.nanoTime() - st;
    System.out.println("3 " + st);
}
System.out.println("" + tmp);

On Java online I get:

1 10349420
2 526130
3 484200
0

On Android x86 API 17 I get:

1 9122107
2 13486911
3 12700778
0
〃安静 2024-07-13 18:32:09

I wouldn't use StringTokenizer, as it is one of the legacy classes in the JDK.

The javadoc says:

StringTokenizer is a legacy class that
is retained for compatibility reasons
although its use is discouraged in new
code. It is recommended that anyone
seeking this functionality use the
split method of String or the
java.util.regex package instead.
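
For the tokenizing job StringTokenizer was designed for, the replacement suggested by the javadoc looks roughly like the sketch below. The class and method names are illustrative, and the whitespace regex is an assumption chosen to mirror the tokenizer's default delimiters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizeDemo {
    // Legacy approach: default delimiters are space, tab, newline, CR and form feed.
    static List<String> withTokenizer(String s) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Recommended approach: split on runs of whitespace.
    // Note: for a blank input this returns one empty string, unlike the tokenizer.
    static String[] withSplit(String s) {
        return s.trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(withTokenizer("a b  c"));
        System.out.println(String.join(",", withSplit("a b  c")));
    }
}
```
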

指尖上的星空 2024-07-13 18:32:09
public class Main {

public static void main(String[] args) {
     String myStr = "Hello";
     String myStr2 = "World";
      
     for (int i = 0; i < myStr.length(); i++) {    
            char result = myStr.charAt(i);
                 System.out.println(result);
     } 
        
     for (int i = 0; i < myStr2.length(); i++) {    
            char result = myStr2.charAt(i);
                 System.out.print(result);              
     }    
   }
}

Output:

H
e
l
l
o
World
再可℃爱ぅ一点好了 2024-07-13 18:32:09

See The Java Tutorials: Strings.

public class StringDemo {
    public static void main(String[] args) {
        String palindrome = "Dot saw I was Tod";
        int len = palindrome.length();
        char[] tempCharArray = new char[len];
        char[] charArray = new char[len];

        // put original string in an array of chars
        for (int i = 0; i < len; i++) {
            tempCharArray[i] = palindrome.charAt(i);
        } 

        // reverse array of chars
        for (int j = 0; j < len; j++) {
            charArray[j] = tempCharArray[len - 1 - j];
        }

        String reversePalindrome =  new String(charArray);
        System.out.println(reversePalindrome);
    }
}

Put the length into an int len and use a for loop.
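
If reversal is the actual goal rather than an exercise in iteration, a shorter alternative is StringBuilder.reverse(), sketched here with an illustrative class name. Unlike the manual loop above, it also keeps surrogate pairs in the right order.

```java
class ReverseDemo {
    // Reverses the text using the standard library instead of manual index loops.
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("Dot saw I was Tod"));
    }
}
```
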

少女情怀诗 2024-07-13 18:32:09

StringTokenizer is totally unsuited to the task of breaking a string into its individual characters. With String#split() you can do that easily by using a regex that matches nothing, e.g.:

String[] theChars = str.split("|");

But StringTokenizer doesn't use regexes, and there's no delimiter string you can specify that will match the nothing between characters. There is one cute little hack you can use to accomplish the same thing: use the string itself as the delimiter string (making every character in it a delimiter) and have it return the delimiters:

StringTokenizer st = new StringTokenizer(str, str, true);

However, I only mention these options for the purpose of dismissing them. Both techniques break the original string into one-character strings instead of char primitives, and both involve a great deal of overhead in the form of object creation and string manipulation. Compare that to calling charAt() in a for loop, which incurs virtually no overhead.
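
For completeness, both tricks can be sketched as follows, assuming Java 8 or later (earlier versions return a leading empty string from this kind of split). The class and method names are hypothetical. Each call allocates one String object per character, which is the overhead described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class SingleCharStrings {
    // The regex "|" matches the empty string between characters.
    static String[] viaSplit(String str) {
        return str.split("|");
    }

    // Every character of str is a delimiter, and delimiters are returned as tokens.
    static List<String> viaTokenizer(String str) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(str, str, true);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", viaSplit("abc")));
        System.out.println(viaTokenizer("abc"));
    }
}
```
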
