A shlex alternative for Java

Posted on 2024-07-25 17:22:37

Is there a shlex alternative for Java? I'd like to be able to split quote-delimited strings the way the shell would process them. For example, if I send:

one two "three four"

and perform a split, I'd like to receive the tokens

one
two
three four

Comments (3)

一梦等七年七年为一梦 2024-08-01 17:22:37

I had a similar problem today, and none of the standard options (StringTokenizer, StrTokenizer, Scanner) looked like a good fit. However, the basics are not hard to implement.

This example handles all the edge cases raised so far in comments on the other answers. Be warned: I haven't checked it for full POSIX compliance yet. A Gist including unit tests is available on GitHub, released into the public domain via the Unlicense.

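import java.util.ArrayList;
import java.util.List;

// Splits a string into tokens roughly the way a shell would (not verified against POSIX).
public List<String> shellSplit(CharSequence string) {
    List<String> tokens = new ArrayList<String>();
    boolean escaping = false;   // previous char was a backslash outside single quotes
    char quoteChar = ' ';       // quote character that opened the current quoted run
    boolean quoting = false;    // currently inside '...' or "..."
    // Index of the most recent closing quote; lets an empty quoted string ("") count as a token.
    int lastCloseQuoteIndex = Integer.MIN_VALUE;
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < string.length(); i++) {
        char c = string.charAt(i);
        if (escaping) {
            // Escaped character: take it literally.
            current.append(c);
            escaping = false;
        } else if (c == '\\' && !(quoting && quoteChar == '\'')) {
            // A backslash starts an escape everywhere except inside single quotes.
            escaping = true;
        } else if (quoting && c == quoteChar) {
            // Matching close quote.
            quoting = false;
            lastCloseQuoteIndex = i;
        } else if (!quoting && (c == '\'' || c == '"')) {
            // Opening quote.
            quoting = true;
            quoteChar = c;
        } else if (!quoting && Character.isWhitespace(c)) {
            // Unquoted whitespace ends the current token, if there is one.
            if (current.length() > 0 || lastCloseQuoteIndex == (i - 1)) {
                tokens.add(current.toString());
                current = new StringBuilder();
            }
        } else {
            current.append(c);
        }
    }
    // Flush the final token, including a trailing empty quoted string.
    if (current.length() > 0 || lastCloseQuoteIndex == (string.length() - 1)) {
        tokens.add(current.toString());
    }

    return tokens;
}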
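
To make the edge cases concrete, here is a small runnable sketch that wraps a copy of the method above in a class and runs it on a few inputs. The class name `ShellSplitDemo`, the `static` modifier, and the sample inputs are my own assumptions for illustration; they are not part of the original Gist or its unit tests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class; only the body of shellSplit comes from the answer.
public class ShellSplitDemo {
    public static List<String> shellSplit(CharSequence string) {
        List<String> tokens = new ArrayList<String>();
        boolean escaping = false;
        char quoteChar = ' ';
        boolean quoting = false;
        int lastCloseQuoteIndex = Integer.MIN_VALUE;
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            if (escaping) {
                current.append(c);
                escaping = false;
            } else if (c == '\\' && !(quoting && quoteChar == '\'')) {
                escaping = true;
            } else if (quoting && c == quoteChar) {
                quoting = false;
                lastCloseQuoteIndex = i;
            } else if (!quoting && (c == '\'' || c == '"')) {
                quoting = true;
                quoteChar = c;
            } else if (!quoting && Character.isWhitespace(c)) {
                if (current.length() > 0 || lastCloseQuoteIndex == (i - 1)) {
                    tokens.add(current.toString());
                    current = new StringBuilder();
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0 || lastCloseQuoteIndex == (string.length() - 1)) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "one two \"three four\"",  // quoted run containing a space
            "a \"\" b",                // empty quoted string is kept as a token
            "it\\'s",                  // backslash escapes the single quote
            "'single \\ quoted'",      // backslash is literal inside single quotes
            "foo\"bar baz\"qux"        // quotes in the middle of a word
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + shellSplit(input));
        }
    }
}
```

The first input yields the three tokens from the question (one, two, three four), and the second yields a, an empty string, and b, because the closing-quote index keeps the empty quoted token from being dropped.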
彼岸花似海 2024-08-01 17:22:37

Look at Apache Commons Lang:

org.apache.commons.lang.text.StrTokenizer should be able to do what you want:

new StrTokenizer("one two \"three four\"", ' ', '"').getTokenArray();
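
If adding a dependency is not an option, a similar delimiter-plus-quote split can be sketched with the JDK's own `java.io.StreamTokenizer`. This is a swapped-in technique, not what the answer above proposes, and the class name `StreamTokenizerSplit` is hypothetical. It is only an approximation of shell behaviour: a quoted run adjacent to a bare word comes back as a separate token, and backslash escapes inside quotes follow StreamTokenizer's C-style rules rather than POSIX ones.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class illustrating a JDK-only approach.
public class StreamTokenizerSplit {
    public static List<String> split(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.resetSyntax();            // drop the default number and comment handling
        st.wordChars(33, 255);       // treat every non-blank character as part of a word
        st.whitespaceChars(0, ' ');  // control characters and space separate tokens
        st.quoteChar('"');           // double-quoted runs become a single token
        st.quoteChar('\'');          // same for single-quoted runs
        List<String> tokens = new ArrayList<>();
        try {
            // For both words and quoted strings the token text is in sval.
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                tokens.add(st.sval);
            }
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("one two \"three four\""));  // prints [one, two, three four]
    }
}
```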
情感失落者 2024-08-01 17:22:37

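I had success with the following Scala code, which uses fastparse to parse comma-separated key=value pairs whose keys and values may be bare or double-quoted strings. I can't vouch for it being complete: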

val kvParser = {
  import fastparse._
  import NoWhitespace._
  def nonQuoteChar[_:P] = P(CharPred(_ != '"'))
  def quotedQuote[_:P] = P("\\\"")
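  // Try the escaped quote first; otherwise nonQuoteChar consumes the backslash and \" never matches.
  def quotedElement[_:P] = P(quotedQuote | nonQuoteChar)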
  def quotedContent[_:P] = P(quotedElement.rep)
  def quotedString[_:P] = P("\"" ~/ quotedContent.! ~ "\"")
  def alpha[_:P] = P(CharIn("a-zA-Z"))
  def digit[_:P] = P(CharIn("0-9"))
  def hyphen[_:P] = P("-")
  def underscore[_:P] = P("_")
  def bareStringChar[_:P] = P(alpha | digit | hyphen | underscore)
  def bareString[_:P] = P(bareStringChar.rep.!)
  def string[_:P] = P(quotedString | bareString)
  def kvPair[_:P] = P(string ~ "=" ~ string)
  def commaAndSpace[_:P] = P(CharIn(" \t\n\r").rep ~ "," ~ CharIn(" \t\n\r").rep)
  def kvPairList[_:P] = P(kvPair.rep(sep = commaAndSpace))
  def fullLang[_:P] = P(kvPairList ~ End)

  def res(str: String) = {
    parse(str, fullLang(_))
  }

  res _
}