How to match the "escape" non-printable character in a regular expression?
I've found a howto, http://answers.oreilly.com/topic/214-how-to-match-nonprintable-characters-with-a-regular-expression/ , but none of the codes (\e, \x1b, \x1B) work for me in Java.
EDIT
I am trying to replace the ANSI escape sequences (specifically, color sequences) of a Linux terminal command's output.
In Python the replace pattern would look like "\x1b[34;01m", which means blue bold text. This same pattern does not work in Java. I tried to replace "[34;01m" separately, and it worked, so the problem is \x1b.
And I am doing the "[" escaping using Pattern.quote().
EDIT
Map<String,String> escapeMap = new HashMap<String,String>();
escapeMap.put("\\x1b[01;34m", "</span><span style=\"color:blue;font-weight:bold\">");
FileInputStream stream = new FileInputStream(new File("/home/ch00k/gun.output"));
FileChannel fc = stream.getChannel();
MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
String message = Charset.defaultCharset().decode(bb).toString();
stream.close();
String patternString = Pattern.quote(StringUtils.join(escapeMap.keySet(), "|"));
System.out.println(patternString);
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(message);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
    matcher.appendReplacement(sb, escapeMap.get(matcher.group()));
}
matcher.appendTail(sb);
String formattedMessage = sb.toString();
System.out.println(formattedMessage);
EDIT
Here is the code I've ended up with:
import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.nio.charset.*;
import java.util.*;
import java.util.regex.*;
import org.apache.commons.lang3.*;
class CreateMessage {
    public static void message() throws IOException {
        FileInputStream stream = new FileInputStream(new File("./gun.output"));
        FileChannel fc = stream.getChannel();
        MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        String message = Charset.defaultCharset().decode(bb).toString();
        stream.close();
        Map<String,String> tokens = new HashMap<String,String>();
        tokens.put("root", "nobody");
        tokens.put(Pattern.quote("[01;34m"), "qwe");
        String patternString = "(" + StringUtils.join(tokens.keySet(), "|") + ")";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(message);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            System.out.println(tokens.get(matcher.group()));
            matcher.appendReplacement(sb, tokens.get(matcher.group()));
        }
        matcher.appendTail(sb);
        System.out.println(sb.toString());
    }
}
The file gun.output contains the output of ls -la --color=always /
Now, the problem is that I get a NullPointerException when I try to match Pattern.quote("[01;34m"). Everything matches fine except the strings that contain [, even though I quote them. The exception is the following:
Exception in thread "main" java.lang.NullPointerException
at java.util.regex.Matcher.appendReplacement(Matcher.java:699)
at org.minuteware.jgun.CreateMessage.message(CreateMessage.java:32)
at org.minuteware.jgun.Main.main(Main.java:23)
EDIT
So, according to http://java.sun.com/developer/technicalArticles/releases/1.4regex/, the escape character should be matched with "\u001B", which indeed works in my case. The problem is that if I use tokens.put("\u001B" + Pattern.quote("[01;34m"), "qwe"); I still get the NPE mentioned above.
Comments (4)
quote() makes a pattern that matches the input string verbatim. Your string has pattern language in it. Look at the output from quote(): you'll see that it's trying to literally find the four characters \x1b.
The ANSI escape sequences are of the form \033[34;01m, where \033 is the character with code 033 in octal (1b in hex, 27 in decimal). You need to use the following regexp:
In a Java string literal you can write a non-printable character in octal (\033) or as a Unicode escape (\u001B). The hex form \x1b is understood only by the regex engine, so in Java source it has to be written as "\\x1b" and must not be passed through Pattern.quote().
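To make the quote() point concrete, here is a minimal sketch of the replacement loop from the question, assuming the NPE comes from the map lookup: the map keys in the question are the quoted patterns, while matcher.group() returns the matched text, so tokens.get(...) yields null. The class name AnsiReplace and the helper replaceTokens are hypothetical names made up for this illustration. The map is keyed by the literal text, and each key is quoted separately when the pattern is built, so the | between alternatives stays a regex operator.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnsiReplace {
    // Hypothetical helper: replaces every literal key of `tokens` found in `input` with its value.
    static String replaceTokens(String input, Map<String, String> tokens) {
        if (tokens.isEmpty()) {
            return input; // nothing to replace; avoids compiling an empty pattern
        }
        // Quote each key on its own so that '|' remains an alternation operator.
        StringBuilder alternation = new StringBuilder();
        for (String key : tokens.keySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(key));
        }
        Matcher matcher = Pattern.compile(alternation.toString()).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            // matcher.group() is the matched text, which equals the unquoted map key.
            String replacement = tokens.get(matcher.group());
            // quoteReplacement keeps '$' and '\' in the replacement from being interpreted.
            matcher.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> tokens = new LinkedHashMap<>();
        tokens.put("root", "nobody");
        tokens.put("\u001B[01;34m", "qwe"); // ESC followed by "[01;34m"
        String sample = "\u001B[01;34mbin owned by root";
        System.out.println(replaceTokens(sample, tokens)); // prints "qwebin owned by nobody"
    }
}
```

The sample string stands in for the contents of gun.output; reading the file is left out to keep the sketch self-contained.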
The proper value for the "escape" character in a regexp is
\u001B
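As a quick check, the sketch below tries several spellings of ESC that java.util.regex.Pattern documents, plus one quoted spelling for contrast. The class name EscapeSpellings and the helper matchesEsc are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

class EscapeSpellings {
    // Hypothetical helper: true if `regex` matches a string consisting of a single ESC character.
    static boolean matchesEsc(String regex) {
        String esc = String.valueOf((char) 27); // ESC, code point 0x1B
        return Pattern.matches(regex, esc);
    }

    public static void main(String[] args) {
        String[] regexes = {
            "\u001B",   // the compiler places a raw ESC character in the string
            "\\u001B",  // the regex engine decodes the Unicode escape itself
            "\\x1B",    // regex hex escape
            "\\033",    // regex octal escape (the leading 0 is required)
            "\\e"       // regex shorthand for ESC
        };
        for (String regex : regexes) {
            System.out.println(matchesEsc(regex)); // prints "true" for each spelling
        }
        // Quoting turns the escape into four literal characters, so ESC no longer matches.
        System.out.println(matchesEsc(Pattern.quote("\\x1B"))); // prints "false"
    }
}
```

The last line is the situation in the question: once the pattern string goes through Pattern.quote(), the regex engine never gets to interpret the escape.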
FWIW, I've been working on stripping ANSI color codes from colorized log4j files and this little pattern seems to do the trick for all of the cases I've come across:
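The pattern from this comment is not reproduced here. As a stand-in, the sketch below uses an assumed pattern (not the commenter's) that covers only color sequences of the form ESC [ digits-and-semicolons m, such as the ESC[01;34m sequence from the question. The names AnsiStrip and stripColors are hypothetical.

```java
import java.util.regex.Pattern;

class AnsiStrip {
    // Assumed pattern, not the original commenter's: ESC, '[', digits or semicolons, then 'm'.
    private static final Pattern COLOR_SEQUENCE = Pattern.compile("\\x1B\\[[0-9;]*m");

    // Hypothetical helper: removes every color sequence matched by COLOR_SEQUENCE.
    static String stripColors(String text) {
        return COLOR_SEQUENCE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String colored = "\u001B[01;34mbin\u001B[0m";
        System.out.println(stripColors(colored)); // prints "bin"
    }
}
```

Other ANSI sequences, such as cursor movement, are deliberately out of scope for this sketch and would need a broader pattern.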