"unmappable character for encoding UTF-8" error
I'm getting a compile error at the following method.
public static boolean isValidPasswd(String passwd) {
String reg = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¬.,-])(?=[^\\s]+$).{8,24}$";
return Pattern.matches(reg, passwd);
}
at Utility.java:[76,74] unmappable character for encoding UTF-8. The 74th character is '"'.
How can I fix this? Thanks.
Your source file has an encoding problem. It is probably ISO-8859-1 encoded, but the compiler was set to read it as UTF-8. That causes errors for any character whose byte representation differs between UTF-8 and ISO-8859-1, which is every character outside ASCII, for example ¬ (NOT SIGN).
You can simulate this with a small program that takes your line of source code, encodes it as an ISO-8859-1 byte array, and then decodes those bytes "wrongly" as UTF-8. The output shows the position at which the line gets corrupted. I added 2 spaces in front of your source line so that the ¬ (NOT SIGN) lands at position 74; it is the only character in the line that produces different bytes in ISO-8859-1 and in UTF-8. I guess this matches the indentation of the real source file.
See it "live" at https://ideone.com/ShZnB
To fix this, save the source files with UTF-8 encoding.
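The simulation described above can be sketched roughly as follows. This is not the program behind the ideone link but a minimal reconstruction of the idea; the class name EncodingMismatchDemo, its method names, and the shortened sample line are assumptions made for illustration. The NOT SIGN is written as the escape \u00AC so the sketch itself stays pure ASCII.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    /** Encodes the text as ISO-8859-1, then decodes those bytes as if they were UTF-8. */
    public static String decodeLatin1BytesAsUtf8(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        // new String(...) substitutes U+FFFD for malformed input instead of throwing.
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    /** Returns the 1-based column of the first replacement character, or -1 if there is none. */
    public static int firstCorruptedColumn(String decoded) {
        int index = decoded.indexOf('\uFFFD');
        return index < 0 ? -1 : index + 1;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the line from the question (hypothetical sample).
        String line = "String reg = \"[~#;:?/@&!%*=\u00AC.,-]\";";
        String decoded = decodeLatin1BytesAsUtf8(line);
        System.out.println(decoded);
        System.out.println("first corrupted column: " + firstCorruptedColumn(decoded));
    }
}
```

The single byte 0xAC that ISO-8859-1 uses for the NOT SIGN is not a valid UTF-8 sequence on its own, which is why the decoder replaces it and why javac reports that column.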
I'm in the process of setting up a CI build server on a Linux box for a legacy system started in 2000. There is a section that generates a PDF that contains non-UTF8 characters. We are in the final steps of a release, so I cannot replace the characters giving me grief, yet for Dilbertesque reasons, I cannot wait a week to solve this issue after the release. Fortunately, the "javac" command in Ant has an "encoding" parameter.
The Java compiler assumes that your input is UTF-8 encoded, either because you specified it to be or because it is your platform default encoding.
However, the data in your .java file is not actually encoded in UTF-8. The problem is probably the ¬ character. Make sure your editor (or IDE) of choice actually saves its files in UTF-8 encoding.
For IntelliJ users, this is pretty easy once you find out what the original encoding was. You can select the encoding from the bottom right corner of the window; you will then be prompted with a dialog box that offers 'Reload' and 'Convert'.
So if you happen to have a few characters saved in some odd encoding, first select 'Reload' to load the whole file in the encoding of the bad characters. For me this turned the ? characters into their proper values.
IntelliJ can tell if you most likely did not pick the right encoding and will warn you. Revert and try again.
Once you see the bad characters go away, change the encoding select box in the bottom right corner back to the format you originally intended (if you are Googling this error message, that will likely be UTF-8). This time select the 'Convert' button in the dialog.
For me, I needed to reload as 'windows-1252', then convert back to 'UTF-8'. The offending characters were single quotes (‘ and ’), likely pasted in from a Word doc (or e-mail) with the wrong encoding, and the above actions convert them to UTF-8.
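Outside the IDE, the same reload-then-convert idea can be sketched in a few lines of Java. This is only an illustration under assumptions: the class name ReencodeToUtf8 and its command-line arguments are made up here, and you still have to know (or guess) the original encoding yourself, just as in the dialog.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReencodeToUtf8 {

    /** Interprets the raw bytes in sourceCharset and returns the same text encoded as UTF-8. */
    public static byte[] reencode(byte[] raw, Charset sourceCharset) {
        String text = new String(raw, sourceCharset);   // corresponds to "Reload"
        return text.getBytes(StandardCharsets.UTF_8);   // corresponds to "Convert"
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ReencodeToUtf8 <file> <source-charset>");
            return;
        }
        Path file = Paths.get(args[0]);
        Charset source = Charset.forName(args[1]);      // for example windows-1252
        // Overwrites the file in place, so keep a backup or rely on version control.
        Files.write(file, reencode(Files.readAllBytes(file), source));
    }
}
```

Running it twice on the same file would double-encode the non-ASCII characters, so it should only be applied to files that are known not to be UTF-8 yet.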
In Eclipse, open the file properties (Alt+Enter) and change Resource → 'Text file encoding' → Other to UTF-8. Reopen the file and check for a junk character somewhere in the string or file. Remove it and save the file. Then change Resource → 'Text file encoding' back to Default. Compile and deploy the code.
The compiler is using the UTF-8 character encoding to read your source file, but the file must have been written by an editor using a different encoding. Open your file in an editor set to the UTF-8 encoding, fix the quote mark, and save it again.
Alternatively, you can find the Unicode code point for the character and use a Unicode escape in the source code. For example, the character A can be replaced with the Unicode escape \u0041.
By the way, you don't need the begin- and end-line anchors ^ and $ when using the matches() method. The entire sequence must be matched by the regular expression when using matches(). The anchors are only useful with the find() method.
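Applied to the method from the question, both suggestions might look like the sketch below. The NOT SIGN is U+00AC, so it is written as \u00AC, and the outer ^ and $ are dropped. The class name PasswordCheck and the precompiled VALID constant are choices made for this illustration, not part of the original code.

```java
import java.util.regex.Pattern;

public class PasswordCheck {

    // The NOT SIGN is written as a Unicode escape, so this file is pure ASCII and
    // does not depend on the encoding the compiler assumes for non-ASCII bytes.
    // The outer ^ and $ are omitted because matches() must consume the whole input;
    // the $ inside the last lookahead is still needed.
    private static final Pattern VALID = Pattern.compile(
            "(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=\u00AC.,-])(?=[^\\s]+$).{8,24}");

    public static boolean isValidPasswd(String passwd) {
        return VALID.matcher(passwd).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidPasswd("Abcdef1!"));   // prints true
        System.out.println(isValidPasswd("abcdef1!"));   // prints false (no uppercase letter)
    }
}
```

Compiling the pattern once into a constant avoids recompiling it on every call, which Pattern.matches(reg, passwd) would do.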
Thanks Michael Konietzka (https://stackoverflow.com/a/4996583/1019307) for your answer.
I did this in Eclipse / STS:
Bingo, error gone!
Just search for the “ character and change it to ".
The following compiles for me:
See:
"error: unmappable character for encoding UTF-8" means that the Java compiler found a byte sequence in the source file that is not valid UTF-8. Open the file in an editor with the character encoding set to UTF-8, and you should be able to spot the character that is not displayed correctly. Remove or replace this character and recompile.
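If the bad character is hard to spot by eye, a strict UTF-8 decoder can point at it. The following is a minimal sketch, assuming you just want the byte offset of the first invalid sequence; the class name Utf8Check and the method firstInvalidOffset are invented for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    /** Returns the 0-based offset of the first byte that is not valid UTF-8, or -1 if all bytes are valid. */
    public static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never decodes to more chars than it has bytes, so this buffer is large enough.
        CharBuffer out = CharBuffer.allocate(data.length + 1);
        CoderResult result = decoder.decode(in, out, true);
        // On an error the input position is left at the start of the offending bytes.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file>");
            return;
        }
        int offset = firstInvalidOffset(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(offset < 0 ? "valid UTF-8" : "first invalid byte at offset " + offset);
    }
}
```

The reported value is a byte offset into the file, not the line and column that javac prints, so use it with an editor or hex viewer that can jump to an offset.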
I observed this issue while using Eclipse. I needed to add the encoding to my pom.xml file, and that resolved it. http://ctrlaltsolve.blogspot.in/2015/11/encoding-properties-in-maven.html
I had a similar issue and fixed it from the bottom corner of my IntelliJ window.
I changed the setting there from LF to CRLF.