Converting a sentence string to a string array of words in Java

Posted 2024-10-11 14:09:15, 280 characters, 2 views, 0 comments

I need my Java program to take a string like:

"This is a sample sentence."

and turn it into a string array like:

{"this","is","a","sample","sentence"}

No periods, or punctuation (preferably). By the way, the string input is always one sentence.

Is there an easy way to do this that I'm not seeing? Or do we really have to search for spaces a lot and create new strings from the areas between the spaces (which are words)?

Comments (18)

居里长安 2024-10-18 14:09:15

String.split() will do most of what you want. You may then need to loop over the words to pull out any punctuation.

For example:

String s = "This is a sample sentence.";
String[] words = s.split("\\s+");
for (int i = 0; i < words.length; i++) {
    // You may want to check for a non-word character before blindly
    // performing a replacement
    // It may also be necessary to adjust the character class
    words[i] = words[i].replaceAll("[^\\w]", "");
}
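
To match the lowercase output the question asks for, the loop above can be combined with lowercasing and a filter for tokens that become empty once punctuation is stripped. This is only a minimal sketch; the class name `SentenceWords` and the method `toWords` are illustrative and not part of any library:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class SentenceWords {
    // Hypothetical helper: splits on whitespace, strips non-word characters,
    // lowercases each token, and drops tokens that end up empty.
    static String[] toWords(String sentence) {
        List<String> words = new ArrayList<>();
        for (String token : sentence.trim().split("\\s+")) {
            String cleaned = token.replaceAll("[^\\w]", "").toLowerCase(Locale.ROOT);
            if (!cleaned.isEmpty()) {
                words.add(cleaned);
            }
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toWords("This is a sample sentence.")));
        // prints [this, is, a, sample, sentence]
    }
}
```
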
陌路终见情 2024-10-18 14:09:15

This can be done with split alone, since it accepts a regular expression:

String s = "This is a sample sentence with []s.";
String[] words = s.split("\\W+");

This gives the words: {"This", "is", "a", "sample", "sentence", "with", "s"}

\\W+ matches one or more consecutive non-word characters, that is, anything other than a letter, digit, or underscore, so no separate replacement step is needed. Other patterns can be used in the same way.
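
One pitfall with this approach is that a delimiter at the very start of the input produces an empty first element. The sketch below shows the effect and one possible workaround; `NonWordSplit` and `splitWords` are hypothetical names chosen for illustration:

```java
import java.util.Arrays;

class NonWordSplit {
    // Hypothetical helper: strips leading delimiters before splitting so the
    // result does not start with an empty string.
    static String[] splitWords(String text) {
        String stripped = text.replaceFirst("^\\W+", "");
        return stripped.isEmpty() ? new String[0] : stripped.split("\\W+");
    }

    public static void main(String[] args) {
        // A delimiter at the very start yields a leading empty string.
        System.out.println(Arrays.toString("(Hello) world!".split("\\W+")));
        // prints [, Hello, world]
        System.out.println(Arrays.toString(splitWords("(Hello) world!")));
        // prints [Hello, world]
    }
}
```

Trailing empty strings are already dropped by split, so only the leading case needs handling.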

漆黑的白昼 2024-10-18 14:09:15

You can use BreakIterator.getWordInstance to find all words in a string.

// requires: import java.text.BreakIterator; import java.util.ArrayList; import java.util.List;
public static List<String> getWords(String text) {
    List<String> words = new ArrayList<String>();
    BreakIterator breakIterator = BreakIterator.getWordInstance();
    breakIterator.setText(text);
    int lastIndex = breakIterator.first();
    while (BreakIterator.DONE != lastIndex) {
        int firstIndex = lastIndex;
        lastIndex = breakIterator.next();
        if (lastIndex != BreakIterator.DONE && Character.isLetterOrDigit(text.charAt(firstIndex))) {
            words.add(text.substring(firstIndex, lastIndex));
        }
    }

    return words;
}

Test:

public static void main(String[] args) {
    System.out.println(getWords("A PT CR M0RT BOUSG SABN NTE TR/GB/(G) = RAND(MIN(XXX, YY + ABC))"));
}

Output:

[A, PT, CR, M0RT, BOUSG, SABN, NTE, TR, GB, G, RAND, MIN, XXX, YY, ABC]
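
Since the question asks for a String array, the same idea can be adapted to return one and to take an explicit locale. This is a sketch under those assumptions; `BreakIteratorWords` and `toWordArray` are illustrative names:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class BreakIteratorWords {
    // Hypothetical variant of getWords: takes an explicit locale and returns an array.
    static String[] toWordArray(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> words = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Keep only segments that begin with a letter or digit (skips spaces and punctuation).
            if (Character.isLetterOrDigit(text.charAt(start))) {
                words.add(text.substring(start, end));
            }
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toWordArray("This is a sample sentence.", Locale.ENGLISH)));
    }
}
```
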
桃气十足 2024-10-18 14:09:15

Try using the following:

String str = "This is a simple sentence";
String[] strgs = str.split(" ");

This splits the string at each space, and each resulting substring becomes one element of the array.
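
Splitting on a single space is sensitive to leading and repeated spaces, which show up as empty elements. A small sketch of the difference, where `SpaceSplitDemo` and `splitOnWhitespace` are hypothetical names:

```java
import java.util.Arrays;

class SpaceSplitDemo {
    // Hypothetical helper: tolerant of leading, trailing, and repeated whitespace.
    static String[] splitOnWhitespace(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String text = " This  is a sample ";
        System.out.println(Arrays.toString(text.split(" ")));
        // prints [, This, , is, a, sample]
        System.out.println(Arrays.toString(splitOnWhitespace(text)));
        // prints [This, is, a, sample]
    }
}
```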

征﹌骨岁月お 2024-10-18 14:09:15

You can split the string with a regular expression such as this one:

String l = "sofia, malgré tout aimait : la laitue et le choux !";
String[] words = l.split("[ ,.:/!?+]+");
画尸师 2024-10-18 14:09:15

The simplest answer I can think of is to use the following method defined on java.lang.String:

String[] split(String regex)

Just call "This is a sample sentence".split(" "). Because the argument is a regular expression, you can also do more complicated splits, for example on runs of punctuation and whitespace so that unwanted characters are dropped.

一百个冬季 2024-10-18 14:09:15

Use string.replace(".", "").replace(",", "").replace("?", "").replace("!", "").split(" ") to split your sentence into an array with no periods, commas, question marks, or exclamation marks. You can add or remove as many replace calls as you need.
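
As a possible alternative to the chain of replace calls, a single replaceAll with a character class removes the same four characters in one pass. A minimal sketch; `StripPunctuation` and `toWords` are illustrative names:

```java
import java.util.Arrays;

class StripPunctuation {
    // Hypothetical helper: one regex replaces the chain of replace() calls.
    static String[] toWords(String sentence) {
        return sentence.replaceAll("[.,?!]", "").split(" ");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toWords("Well, is this a sample sentence?")));
        // prints [Well, is, this, a, sample, sentence]
    }
}
```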

久光 2024-10-18 14:09:15

Try this:

// requires: import java.util.regex.Pattern;
String[] stringArray = Pattern.compile("\\s+").split(
"This is a sample sentence"
.replaceAll("[^\\p{Alnum}\\s]+", "") // removes everything except alphanumerics and whitespace
);

for (int j = 0; j < stringArray.length; j++) {
  System.out.println(j + " \"" + stringArray[j] + "\"");
}
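
When many sentences are processed, the Pattern can be compiled once and reused. The sketch below assumes Java 8 or later (for `Pattern.splitAsStream`) and also lowercases the words as the question requests; `StreamSplit` and `toWords` are hypothetical names:

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;

class StreamSplit {
    // Compiled once and reused; matches runs of non-alphanumeric ASCII characters.
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{Alnum}]+");

    // Hypothetical helper, assuming Java 8 or later for splitAsStream.
    static String[] toWords(String sentence) {
        return NON_ALNUM.splitAsStream(sentence)
                .filter(word -> !word.isEmpty())
                .map(word -> word.toLowerCase(Locale.ROOT))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toWords("This is a sample sentence.")));
        // prints [this, is, a, sample, sentence]
    }
}
```
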
时光匆匆的小流年 2024-10-18 14:09:15

I have posted this answer elsewhere and am repeating it here. This version avoids the main built-in methods such as split() and substring(). It produces a char array for each word, which you can then convert into a String. Hope it helps!

import java.util.Scanner;

public class SentenceToWord 
{
    public static int getNumberOfWords(String sentence)
    {
        int counter=0;
        for(int i=0;i<sentence.length();i++)
        {
            if(sentence.charAt(i)==' ')
            counter++;
        }
        return counter+1;
    }

    public static char[] getSubString(String sentence,int start,int end) //method to give substring, replacement of String.substring() 
    {
        int counter=0;
        char charArrayToReturn[]=new char[end-start];
        for(int i=start;i<end;i++)
        {
            charArrayToReturn[counter++]=sentence.charAt(i);
        }
        return charArrayToReturn;
    }

    public static char[][] getWordsFromString(String sentence)
    {
        int wordsCounter=0;
        int spaceIndex=0;
        int length=sentence.length();
        char wordsArray[][]=new char[getNumberOfWords(sentence)][]; 
        for(int i=0;i<length;i++)
        {
            boolean atSpace=sentence.charAt(i)==' ';
            if(atSpace || i+1==length)
            {
            int end=atSpace?i:i+1; //exclude the space itself from the word
            wordsArray[wordsCounter++]=getSubString(sentence, spaceIndex,end); //get each word as substring
            spaceIndex=i+1; //the next word starts after the space
            }
        }
        return  wordsArray; //return the 2 dimensional char array
    }


    public static void main(String[] args) 
    {
    System.out.println("Please enter the String");
    Scanner input=new Scanner(System.in);
    String userInput=input.nextLine().trim();
    int numOfWords=getNumberOfWords(userInput);
    char words[][]=getWordsFromString(userInput);
    System.out.println("Total number of words found in the String is "+(numOfWords));
    for(int i=0;i<numOfWords;i++)
    {
        System.out.println(" ");
        for(int j=0;j<words[i].length;j++)
        {
        System.out.print(words[i][j]);//print out each char one by one
        }
    }
    }

}
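
The conversion from char arrays to Strings mentioned in this answer could look roughly like the following. It is a sketch only; `CharWords` and `toStrings` are hypothetical names, and the trim() call is there in case a word carries a stray space:

```java
import java.util.Arrays;

class CharWords {
    // Hypothetical helper: turns each char[] word into a String, trimming stray spaces.
    static String[] toStrings(char[][] words) {
        String[] result = new String[words.length];
        for (int i = 0; i < words.length; i++) {
            result[i] = new String(words[i]).trim();
        }
        return result;
    }

    public static void main(String[] args) {
        char[][] words = { {'T', 'h', 'i', 's', ' '}, {'i', 's'} };
        System.out.println(Arrays.toString(toStrings(words)));
        // prints [This, is]
    }
}
```
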
行至春深 2024-10-18 14:09:15

String.replaceAll() does not work correctly with a locale other than the default one, at least in JDK 7u10.

This example builds a word dictionary from a text file encoded in the Windows Cyrillic charset CP1251:

    // requires: java.nio.charset.Charset, java.nio.file.Files, java.nio.file.Paths,
    //           java.util.List, java.util.Set, java.util.TreeSet
    public static void main(String[] args) {
        String fileName = "Tolstoy_VoinaMir.txt";
        try {
            List<String> lines = Files.readAllLines(Paths.get(fileName),
                                                    Charset.forName("CP1251"));
            Set<String> words = new TreeSet<>();
            for (String s : lines) {
                for (String w : s.split("\\s+")) {
                    w = w.replaceAll("\\p{Punct}", "");
                    if (!w.isEmpty()) { // skip tokens that were only punctuation
                        words.add(w);
                    }
                }
            }
            for (String w : words) {
                System.out.println(w);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
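
One possible cause of problems with non-English text is that `\p{Punct}` in java.util.regex covers only ASCII punctuation unless the `Pattern.UNICODE_CHARACTER_CLASS` flag is set, so marks such as guillemets are left in place. The sketch below uses the Unicode category `\p{P}` instead; `UnicodePunctuation` and `stripPunctuation` are hypothetical names, and the test string is written with Unicode escapes so it does not depend on the source file encoding:

```java
class UnicodePunctuation {
    // Hypothetical helper: \p{P} is the Unicode "punctuation" general category.
    static String stripPunctuation(String word) {
        return word.replaceAll("\\p{P}", "");
    }

    public static void main(String[] args) {
        // A three-letter Cyrillic word in guillemets, followed by a comma (6 chars in total).
        String word = "\u00AB\u043C\u0438\u0440\u00BB,";
        // The ASCII-only class removes the comma but leaves the guillemets.
        System.out.println(word.replaceAll("\\p{Punct}", "").length()); // prints 5
        // The Unicode category removes all three punctuation marks.
        System.out.println(stripPunctuation(word).length()); // prints 3
    }
}
```
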
吻安 2024-10-18 14:09:15

The following snippet splits a sentence into words and also counts how often each word occurs.

 import java.util.HashMap;
 import java.util.Map;

 public class StringToword {
     public static void main(String[] args) {
         String s = "a a a A A";
         String[] splitedString = s.split(" ");
         Map<String, Integer> m = new HashMap<>();
         for (String s1 : splitedString) {
             // start at 1 for a new word, otherwise increment that word's own count
             int count = m.containsKey(s1) ? m.get(s1) + 1 : 1;
             m.put(s1, count);
         }
         for (Map.Entry<String, Integer> entry : m.entrySet()) {
             System.out.println(entry);
         }
     }
 }
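
The counting step can also be written with `Map.merge`, assuming Java 8 or later. The sketch below adds an optional case-insensitive mode and uses a TreeMap so the output order is predictable; `WordCount`, `countWords`, and the `ignoreCase` parameter are hypothetical:

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class WordCount {
    // Hypothetical helper, assuming Java 8 or later for Map.merge.
    static Map<String, Integer> countWords(String sentence, boolean ignoreCase) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : sentence.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty input yields one empty token
            }
            String key = ignoreCase ? word.toLowerCase(Locale.ROOT) : word;
            counts.merge(key, 1, Integer::sum); // insert 1 or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("a a a A A", false)); // prints {A=2, a=3}
        System.out.println(countWords("a a a A A", true));  // prints {a=5}
    }
}
```
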
背叛残局 2024-10-18 14:09:15

Another way to do this is with StringTokenizer, for example:

 // requires: import java.util.StringTokenizer;
 public static void main(String[] args) {

    String str = "This is a sample string";
    StringTokenizer st = new StringTokenizer(str, " ");
    String starr[] = new String[st.countTokens()];
    int i = 0;
    while (st.hasMoreTokens()) {
        starr[i++] = st.nextToken(); // nextToken() returns String; nextElement() returns Object
    }
}
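
StringTokenizer treats every character of its delimiter string as a separator and skips empty tokens, so punctuation can be dropped by listing it alongside the space. A minimal sketch; `TokenizerWords` and `toWords` are illustrative names and the delimiter set is an assumption:

```java
import java.util.Arrays;
import java.util.StringTokenizer;

class TokenizerWords {
    // Hypothetical helper: every character in the delimiter string separates tokens.
    static String[] toWords(String sentence) {
        StringTokenizer tokenizer = new StringTokenizer(sentence, " .,!?");
        String[] words = new String[tokenizer.countTokens()];
        for (int i = 0; tokenizer.hasMoreTokens(); i++) {
            words[i] = tokenizer.nextToken();
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toWords("This is a sample sentence.")));
        // prints [This, is, a, sample, sentence]
    }
}
```
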
缺⑴份安定 2024-10-18 14:09:15

Most of the answers here convert the String to a String array, as the question asked. In practice a List is often more convenient, so this may be more useful:

String dummy = "This is a sample sentence.";
List<String> wordList= Arrays.asList(dummy.split(" "));
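
Note that `Arrays.asList` returns a fixed-size list backed by the array, so adding or removing elements throws `UnsupportedOperationException`. If a resizable list is needed, it can be copied into an ArrayList, as in this sketch (`WordListDemo` and `toMutableList` are hypothetical names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordListDemo {
    // Hypothetical helper: copies the fixed-size view into a resizable list.
    static List<String> toMutableList(String sentence) {
        return new ArrayList<>(Arrays.asList(sentence.split(" ")));
    }

    public static void main(String[] args) {
        List<String> fixed = Arrays.asList("This is a sample".split(" "));
        try {
            fixed.add("sentence"); // the view cannot grow
        } catch (UnsupportedOperationException e) {
            System.out.println("Arrays.asList result is fixed-size");
        }
        List<String> words = toMutableList("This is a sample");
        words.add("sentence");
        System.out.println(words); // prints [This, is, a, sample, sentence]
    }
}
```
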
黑凤梨 2024-10-18 14:09:15

You can use the following simple code:

String str = "This is a sample sentence.";
String[] words = str.split("[ .]+"); // split on runs of spaces and periods
for (int i = 0; i < words.length; i++)
    System.out.print(words[i] + " ");
給妳壹絲溫柔 2024-10-18 14:09:15

Here is a solution in plain C++ with no library helpers. It uses dynamic memory allocation (DMA) to create a string array and appends characters to the current word until it reaches a space.
Please refer to the code and comments below.
I hope it helps.

#include<bits/stdc++.h>
using namespace std;

int main()
{

string data="hello there how are you"; // a_size=5, char count =23
//getline(cin,data); 
int count=0; // initialize a count to count total number of spaces in string.
int len=data.length();
for (int i = 0; i < (int)data.length(); ++i)
{
    if(data[i]==' ')
    {
        ++count;
    }
}
//declare a string array +1 greater than the size 
// num of space in string.
string* str = new string[count+1];

int i, start=0;
for (int index=0; index<count+1; ++index) // index array to increment index of string array and feed data.
{   string temp="";
    for ( i = start; i <len; ++i)
    {   
        if(data[i]!=' ') //increment temp stored word till you find a space.
        {
            temp=temp+data[i];
        }else{
            start=i+1; // increment i counter to next to the space
            break;
        }
    }str[index]=temp;
}


//print data 
for (int i = 0; i < count+1; ++i)
{
    cout<<str[i]<<" ";
}

    delete[] str; // release the dynamically allocated array
    return 0;
}
冬天的雪花 2024-10-18 14:09:15

This should help:

 String s = "This is a sample sentence";
 String[] words = s.split(" ");

This creates an array whose elements are the substrings of s separated by " ".

ι不睡觉的鱼゛ 2024-10-18 14:09:15

Try this:

import java.util.Scanner;

public class test {
    public static void main(String[] args) {

        Scanner t = new Scanner(System.in);
        String x = t.nextLine();

        System.out.println(x);

        String[] starr = x.split(" ");

        System.out.println("reg no: "+ starr[0]);
        System.out.println("name: "+ starr[1]);
        System.out.println("district: "+ starr[2]);

    }
}
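
The code above throws `ArrayIndexOutOfBoundsException` when the input has fewer than three space-separated fields. One way to guard against that, and to let the last field contain spaces, is the two-argument `split(regex, limit)`. This is a sketch under the assumption that the input is "reg-no name district"; `FieldSplit` and `parseRecord` are hypothetical names:

```java
import java.util.Arrays;

class FieldSplit {
    // Hypothetical helper: returns an empty array when fewer than three fields are present.
    static String[] parseRecord(String line) {
        // At most 3 parts; the last part keeps any remaining spaces.
        String[] fields = line.trim().split("\\s+", 3);
        return fields.length == 3 ? fields : new String[0];
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseRecord("42 Alice New York")));
        // prints [42, Alice, New York]
        System.out.println(parseRecord("42 Alice").length);
        // prints 0
    }
}
```
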