Parsing quoted text in Java

Posted on 2024-12-01 19:28:58

Is there an easy way to parse quoted text into strings in Java? I have lines like these to parse:

author="Tolkien, J.R.R." title="The Lord of the Rings"
publisher="George Allen & Unwin" year=1954 

and all I want is "Tolkien, J.R.R.", "The Lord of the Rings", "George Allen & Unwin", and "1954" as strings.

Comments (3)

童话里做英雄 2024-12-08 19:28:58

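You could use a regex like

"([^"]*)"

It matches the text between a pair of double quotes. (The greedy "(.+)" also works for a single quoted value, but on a line with several quoted values it runs from the first quote to the last one.) In Java that would be:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("\"([^\"]*)\"");
Matcher m = p.matcher("author=\"Tolkien, J.R.R.\"");
while (m.find()) {
  System.out.println(m.group(1)); // text between the quotes
}

Note that group(1) is used: it is the first capturing group, whereas group(0) is the entire match, including the quotes.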
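To show the regex approach on a whole line from the question, here is a minimal sketch. The class name QuotedValues and the method extract are made up for illustration. It only picks up double-quoted values, so an unquoted value such as year=1954 is skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedValues {
    // One double-quoted run; [^"]* cannot cross a closing quote.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    /** Returns the text inside each pair of double quotes, in order. */
    static List<String> extract(String line) {
        List<String> values = new ArrayList<>();
        Matcher m = QUOTED.matcher(line);
        while (m.find()) {
            values.add(m.group(1)); // group(1) is the text between the quotes
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "author=\"Tolkien, J.R.R.\" title=\"The Lord of the Rings\"";
        // Print one extracted value per line.
        for (String value : extract(line)) {
            System.out.println(value);
        }
    }
}
```

Collecting the values into a List keeps the commas inside "Tolkien, J.R.R." from being confused with separators.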

Of course, if the string is just the quoted value, you could also use substring() to drop the first and last characters:

String quoted = "\"Tolkien, J.R.R.\"";
String unquoted;
// Strip the quotes only when the value both starts and ends with one.
if (quoted.length() >= 2 && quoted.indexOf("\"") == 0 && quoted.lastIndexOf("\"") == quoted.length() - 1) {
    unquoted = quoted.substring(1, quoted.length() - 1);
} else {
    unquoted = quoted;
}
十二 2024-12-08 19:28:58

There are some fancy pattern regex nonsense things that fancy people and fancy programmers like to use.

I like to use String.split(). It's a simple function and does what you need it to do.

So if I have a String word: "hello" and I want to take out "hello", I can simply do this:

myStr = string.split("\"")[1];

This will cut the string into bits based on the quote marks.

If I want to be more specific, I can do

myStr = string.split("word: \"")[1].split("\"")[0];

That way I cut it first at word: " and then at ".

Of course, you run into problems if word: " appears more than once, which is what patterns are for. I don't think you'll have to deal with that problem for your specific question.

Also, be careful with regex metacharacters such as . in the delimiter: split() takes a regex, so those characters trigger funny behavior. A backslash in front of the character escapes it; in a Java string literal the backslash itself must be doubled, so a literal dot is written "\\.".
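A small sketch of that pitfall, assuming the delimiter is a dot. The class name SplitEscaping and the helper splitLiteral are hypothetical; Pattern.quote() is the standard library call that turns any string into a literal pattern.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitEscaping {
    /** Splits on a literal delimiter by quoting it first. */
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String text = "J.R.R";
        // An unescaped "." matches every character, so only empty tokens
        // remain, and split() drops trailing empty tokens: length is 0.
        System.out.println(text.split(".").length);
        // "\\." in source is the regex \. (a literal dot): prints [J, R, R]
        System.out.println(Arrays.toString(text.split("\\.")));
        // Same result without hand-written escapes: prints [J, R, R]
        System.out.println(Arrays.toString(splitLiteral(text, ".")));
    }
}
```

Pattern.quote() is handy when the delimiter comes from a variable and you cannot know in advance which characters it contains.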

Best of luck!

不…忘初心 2024-12-08 19:28:58

Can you assume your document is well-formed and does not contain syntax errors? If so, you are simply interested in every other token after calling String.split() on the quote character.
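The "every other token" idea can be sketched as follows, assuming well-formed input with an even number of double quotes. The class name EveryOtherToken and the method quotedTokens are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class EveryOtherToken {
    /** Returns the tokens at odd indexes after splitting on double quotes. */
    static List<String> quotedTokens(String line) {
        // The limit -1 keeps trailing empty tokens, so an empty value at the
        // end of the line (for example a="") is not lost.
        String[] parts = line.split("\"", -1);
        List<String> values = new ArrayList<>();
        for (int i = 1; i < parts.length; i += 2) {
            values.add(parts[i]); // odd indexes sit between a quote pair
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "author=\"Tolkien, J.R.R.\" title=\"The Lord of the Rings\"";
        for (String value : quotedTokens(line)) {
            System.out.println(value);
        }
    }
}
```

If a quote is missing, the last odd-indexed token is an unterminated value, which is one reason this only suits input you trust.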

If you need something more robust, you may need to use the Scanner class (or a StringBuffer and a for loop ;-)) to pick out the valid tokens, taking into account additional criteria beyond "I saw a quotation mark somewhere".

For example, here are some reasons you might need a more robust solution than splitting the string blindly on quotation marks. Perhaps a token is only valid if its opening quotation mark comes immediately after an equals sign. Perhaps you need to handle unquoted values as well as quoted ones. Will \" need to be handled as an escaped quotation mark, or does it count as the end of the string? Can a value have either single or double quotes (e.g., as in HTML), or will it always be correctly formatted with double quotes?
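As one possible set of answers to those questions, the sketch below assumes that a value must follow key=, may be double-quoted, single-quoted, or bare (no whitespace), and that a backslash inside quotes escapes the next character. These rules are assumptions for illustration, not something the question states, and the class name KeyValueParser is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueParser {
    // key= followed by a double-quoted, single-quoted, or bare value.
    // Inside quotes, a backslash escapes the next character.
    private static final Pattern PAIR = Pattern.compile(
        "(\\w+)=(?:\"((?:[^\"\\\\]|\\\\.)*)\"|'((?:[^'\\\\]|\\\\.)*)'|(\\S+))");

    /** Parses key=value pairs into an insertion-ordered map. */
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // Group 2 is a double-quoted body, group 3 a single-quoted one.
            String quoted = m.group(2) != null ? m.group(2) : m.group(3);
            String value = quoted != null
                ? quoted.replaceAll("\\\\(.)", "$1") // drop escaping backslashes
                : m.group(4);                        // bare value such as 1954
            result.put(m.group(1), value);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "author=\"Tolkien, J.R.R.\" title=\"The Lord of the Rings\"\n"
            + "publisher=\"George Allen & Unwin\" year=1954 ";
        for (Map.Entry<String, String> e : parse(input).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Unlike splitting on quotes, this also returns the unquoted year and keeps each value attached to its key.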

One robust way would be to think like a compiler and use a Java-based lexer generator (such as JFlex), but that might be overkill for what you need.

If you prefer a low-level approach, you could iterate through your input character by character using a while loop. When you see =", start copying the characters into a StringBuffer until you find another non-escaped ", then either concatenate the result to the parsed values you want or add it to a List of some sort (depending on what you plan to do with your data). Then continue reading until you encounter your start token (e.g., =") again, and repeat.
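A rough sketch of that loop, under a few assumptions: the input is already in a String, a backslash escapes the next character, and unterminated values are dropped. It uses StringBuilder, the unsynchronized counterpart of StringBuffer, and the class name QuoteScanner is made up.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteScanner {
    /** Collects every value that starts with =" and ends at an unescaped ". */
    static List<String> scan(String input) {
        List<String> values = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            boolean startToken = input.charAt(i) == '='
                && i + 1 < input.length() && input.charAt(i + 1) == '"';
            if (!startToken) {
                i++;
                continue;
            }
            i += 2; // skip the =" start token
            StringBuilder value = new StringBuilder();
            boolean closed = false;
            while (i < input.length() && !closed) {
                char c = input.charAt(i);
                if (c == '\\' && i + 1 < input.length()) {
                    value.append(input.charAt(i + 1)); // keep the escaped char
                    i += 2;
                } else if (c == '"') {
                    closed = true; // unescaped quote ends the value
                    i++;
                } else {
                    value.append(c);
                    i++;
                }
            }
            if (closed) {
                values.add(value.toString()); // unterminated values are dropped
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String input = "author=\"Tolkien, J.R.R.\" title=\"The Lord of the Rings\"\n"
            + "publisher=\"George Allen & Unwin\" year=1954 ";
        for (String value : scan(input)) {
            System.out.println(value);
        }
    }
}
```

Adding the values to a List rather than concatenating them keeps each value separate even when it contains commas.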
