Tokenizing a string with spaces in Java

Posted on 2024-08-06 11:40:38

I want to tokenize a string like this

String line = "a=b c='123 456' d=777 e='uij yyy'";

I cannot simply split on spaces like this

String [] words = line.split(" ");

Any idea how I can split it so that I get tokens like

a=b
c='123 456'
d=777
e='uij yyy'


懵少女 2024-08-13 11:40:38

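The simplest way to do this is to hand-implement a simple finite state machine. In other words, process the string one character at a time:

When you hit a space, break off a token;
When you hit a quote, keep consuming characters until you hit another quote.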
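
The two rules above can be sketched roughly as follows. This is only a minimal illustration, not the answerer's own code: the class name `QuotedTokenizer` and the method `tokenize` are made up for this example, and it assumes single quotes are the only quote character and that there is no escape character.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedTokenizer {

    // Splits on spaces, except for spaces that sit between a pair of single quotes.
    public static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false; // the only state this machine needs

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\'') {
                inQuotes = !inQuotes;   // entering or leaving a quoted run
                current.append(c);      // keep the quote as part of the token
            } else if (c == ' ' && !inQuotes) {
                if (current.length() > 0) { // a space outside quotes ends the token
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the final token
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("a=b c='123 456' d=777 e='uij yyy'")) {
            System.out.println(token);
        }
    }
}
```

With the question's input this prints a=b, c='123 456', d=777 and e='uij yyy' on separate lines. Runs of several spaces outside quotes produce no empty tokens, which is a choice made for this sketch.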


揪着可爱 2024-08-13 11:40:38

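Depending on the format of the original string, you should be able to pass a regular expression to Java's split method; the answer links to an example.

That example does not use the regular expression this task requires, though.

You can also use this SO thread as a guide (although it is in PHP), since it does something very close to what you need. A little manipulation might do the trick (although whether the quotes become part of the output may cause some issues). Keep in mind that regular expressions are very similar in most languages.

Edit: going too deep into this kind of task may be beyond what regular expressions can do, so you may need to write a simple parser.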
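
As one possible regular expression for split, the sketch below splits on a space only when an even number of single quotes remains after it, which means the space is outside any quoted value. It is not taken from the linked example or the PHP thread; the class name `SplitOutsideQuotes` is hypothetical, and the pattern assumes quotes are balanced and never escaped.

```java
import java.util.Arrays;

public class SplitOutsideQuotes {

    // Matches a space only when an even number of single quotes remains
    // between it and the end of the input, i.e. when the space is outside quotes.
    private static final String SPACE_OUTSIDE_QUOTES = " (?=(?:[^']*'[^']*')*[^']*$)";

    public static String[] split(String line) {
        return line.split(SPACE_OUTSIDE_QUOTES);
    }

    public static void main(String[] args) {
        String line = "a=b c='123 456' d=777 e='uij yyy'";
        System.out.println(Arrays.toString(split(line)));
        // prints [a=b, c='123 456', d=777, e='uij yyy']
    }
}
```

The lookahead rescans the rest of the line at every space, so this is convenient for short lines but a hand-written parser is likely the better choice for long input.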


疏忽 2024-08-13 11:40:38

line.split(" (?=[a-z]+=)")

correctly gives:

a=b
c='123 456'
d=777
e='uij yyy'

Make sure you adapt the [a-z]+ part in case the structure of your keys changes.

Edit: this solution can fail badly if the value part of a pair contains a space followed by something that looks like a key and a "=" character.
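
A small check of the lookahead split, including the failure case mentioned in the edit. This is a sketch under the assumption that keys are runs of lowercase letters (hence `[a-z]+`); the class name `KeyLookaheadSplit` is invented for the example.

```java
public class KeyLookaheadSplit {

    // Splits at each space that is immediately followed by "<lowercase key>=".
    public static String[] split(String line) {
        return line.split(" (?=[a-z]+=)");
    }

    public static void main(String[] args) {
        // Works as long as no value contains a space followed by "key=".
        for (String token : split("a=b cd='123 456' e=777")) {
            System.out.println(token);
        }
        // A quoted value that itself contains " q=" is split in the wrong place.
        for (String token : split("a='p q=r' b=1")) {
            System.out.println(token);
        }
    }
}
```

The first loop yields a=b, cd='123 456' and e=777; the second yields a='p, q=r' and b=1, which shows why this approach only suits input whose values never look like a new key.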


不羁少年 2024-08-13 11:40:38

StreamTokenizer can help, although it is easiest to set up to break on '=', as it will always break at the start of a quoted string:

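import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

public class StreamTokenizerDemo {
    public static void main(String[] args) throws IOException {
        String s = "Ta=b c='123 456' d=777 e='uij yyy'";
        StreamTokenizer st = new StreamTokenizer(new StringReader(s));
        // Reset the digits, then mark them as word characters so that 777
        // comes back as a word instead of being parsed as a number.
        st.ordinaryChars('0', '9');
        st.wordChars('0', '9');
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
            case StreamTokenizer.TT_NUMBER:
                System.out.println(st.nval);
                break;
            case StreamTokenizer.TT_WORD:
                System.out.println(st.sval);
                break;
            case '=':
                System.out.println("=");
                break;
            default:
                // For a quote character, sval holds the body of the quoted string.
                System.out.println(st.sval);
            }
        }
    }
}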

outputs

Ta
=
b
c
=
123 456
d
=
777
e
=
uij yyy

If you leave out the two lines that convert numeric characters to alpha, then you get d=777.0, which might be useful to you.


ゝ偶尔ゞ 2024-08-13 11:40:38

Assumptions:

Your variable name ('a' in the assignment 'a=b') can be of length 1 or more
Your variable name ('a' in the assignment 'a=b') cannot contain the space character; anything else is fine.
Validation of your input is not required (input assumed to be in valid a=b format)

This works fine for me.

Input:

a=b abc='123 456' &=777 #='uij yyy' ABC='slk slk'              123sdkljhSDFjflsakd@*#&=456sldSLKD)#(

Output:

a=b
abc='123 456'
&=777
#='uij yyy'
ABC='slk slk'             
123sdkljhSDFjflsakd@*#&=456sldSLKD)#(

Code:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {

    // SPACE CHARACTER                                          followed by
    // sequence of non-space characters of 1 or more            followed by
    // first occurring EQUALS CHARACTER
    final static String regex = " [^ ]+?=";


    // static pattern defined outside so that you don't have to compile it 
    // for each method call
    static final Pattern p = Pattern.compile(regex);

    public static List<String> tokenize(String input, Pattern p){
        input = input.trim(); // this is important for "last token case"
                                // see end of method
        Matcher m = p.matcher(input);
        ArrayList<String> tokens = new ArrayList<String>();
        int beginIndex=0;
        while(m.find()){
            int endIndex = m.start();
            tokens.add(input.substring(beginIndex, endIndex));
            beginIndex = endIndex+1;
        }

        // LAST TOKEN CASE
        //add last token
        tokens.add(input.substring(beginIndex));

        return tokens;
    }

    private static void println(List<String> tokens) {
        for(String token:tokens){
            System.out.println(token);
        }
    }


    public static void main(String args[]){
        String test = "a=b " +
                "abc='123 456' " +
                "&=777 " +
                "#='uij yyy' " +
                "ABC='slk slk'              " +
                "123sdkljhSDFjflsakd@*#&=456sldSLKD)#(";
        List<String> tokens = RegexTest.tokenize(test, p);
        println(tokens);
    }
}


疧_╮線 2024-08-13 11:40:38

Or, with a regex for tokenizing, and a little state machine that just adds the key/val to a map:

String line = "a = b c='123 456' d=777 e =  'uij yyy'";
Map<String,String> keyval = new HashMap<String,String>();
String state = "key";
Matcher m = Pattern.compile("(=|'[^']*?'|[^\\s=]+)").matcher(line);
String key = null;
while (m.find()) {
    String found = m.group();
    if (state.equals("key")) {
        if (found.equals("=") || found.startsWith("'"))
            { System.err.println ("ERROR"); }
        else { key = found; state = "equals"; }
    } else if (state.equals("equals")) {
        if (! found.equals("=")) { System.err.println ("ERROR"); }
        else { state = "value"; }
    } else if (state.equals("value")) {
        if (key == null) { System.err.println ("ERROR"); }
        else {
            if (found.startsWith("'"))
                found = found.substring(1,found.length()-1);
            keyval.put (key, found);
            key = null;
            state = "key";
        }
    }
}
if (! state.equals("key"))  { System.err.println ("ERROR"); }
System.out.println ("map: " + keyval);

prints out (HashMap iteration order is unspecified, so the entries may appear in a different order)

map: {d=777, e=uij yyy, c=123 456, a=b}

It does some basic error checking, and takes the quotes off the values.


薄荷梦 2024-08-13 11:40:38

This solution is both general and compact (it is effectively the regex version of cletus' answer):

String line = "a=b c='123 456' d=777 e='uij yyy'";
Matcher m = Pattern.compile("('[^']*?'|\\S)+").matcher(line);
while (m.find()) {
  System.out.println(m.group()); // or whatever you want to do
}

In other words, find all runs of characters that are combinations of quoted strings or non-space characters; nested quotes are not supported (there is no escape character).
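
If the tokens are then needed as key/value pairs, one way to build on the same pattern is sketched below. The class name `TokenMap`, the method `parse`, and the choice to skip tokens without an "=" are assumptions made for this illustration, not part of the answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenMap {

    // Same token pattern as above: runs of quoted strings or non-space characters.
    private static final Pattern TOKEN = Pattern.compile("('[^']*?'|\\S)+");

    public static Map<String, String> parse(String line) {
        Map<String, String> pairs = new LinkedHashMap<>(); // keeps insertion order
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            String token = m.group();
            int eq = token.indexOf('='); // split at the first '=' only
            if (eq < 0) {
                continue; // assumption: tokens without '=' are skipped
            }
            pairs.put(token.substring(0, eq), token.substring(eq + 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse("a=b c='123 456' d=777 e='uij yyy'"));
        // prints {a=b, c='123 456', d=777, e='uij yyy'}
    }
}
```

Splitting at the first "=" means a value may itself contain "=" characters, and the quotes are left on the values here.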


半暖夏伤 2024-08-13 11:40:38

// Requires java.util.HashMap and java.util.StringTokenizer.
public static void main(String[] args) {
    String value = "";
    HashMap<String, String> attributes = new HashMap<String, String>();
    String line = "a=b c='123  456' d=777 e='uij yyy'";
    StringTokenizer tokenizer = new StringTokenizer(line, " ");
    while (tokenizer.hasMoreTokens()) {
        String token = tokenizer.nextToken();
        // Append to the pending value while a quoted string is still open
        value = value.isEmpty() ? token : value + " " + token;
        if (!value.contains("'") || value.endsWith("'")) {
            // Split at the first '=' and store the pair in the map
            int eq = value.indexOf('=');
            attributes.put(value.substring(0, eq), value.substring(eq + 1));
            value = "";
        }
    }
    System.out.println(attributes);
}

Output:
{d=777, a=b, e='uij yyy', c='123 456'}

Note that consecutive spaces inside a quoted value are collapsed to a single space.
The attributes HashMap holds the resulting key/value pairs.


幽梦紫曦～ 2024-08-13 11:40:38

import java.io.*;
import java.util.Scanner;

public class ScanXan {
    public static void main(String[] args) throws IOException {

        Scanner s = null;

        try {
            s = new Scanner(new BufferedReader(new FileReader("<file name>")));

            // Scanner splits on whitespace by default
            while (s.hasNext()) {
                System.out.println(s.next());
                // <write for output file>
            }
        } finally {
            if (s != null) {
                s.close();
            }
        }
    }
}


枉心 2024-08-13 11:40:38

java.util.StringTokenizer tokenizer = new java.util.StringTokenizer(line, " ");
while (tokenizer.hasMoreTokens()) {
    String token = tokenizer.nextToken();
    int index = token.indexOf('=');
    String key = token.substring(0, index);
    String value = token.substring(index + 1);
}
