"unmappable character for encoding UTF-8" error

Posted on 2024-10-17 03:27:44

I'm getting a compile error in the following method.

public static boolean isValidPasswd(String passwd) {
    String reg = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¬.,-])(?=[^\\s]+$).{8,24}$";
    return Pattern.matches(reg, passwd);
}
at Utility.java:[76,74] unmappable character for encoding UTF-8. The 74th character is ' " '.

How can I fix this? Thanks.

Comments (12)

甜是你 2024-10-24 03:27:44

You have an encoding problem with your source file. It is probably ISO-8859-1 encoded, but the compiler is set to use UTF-8. This causes errors for any character whose byte representation differs between UTF-8 and ISO-8859-1, which is every character outside ASCII, for example ¬ (NOT SIGN).

You can simulate this with the following program. It takes your line of source code, encodes it as an ISO-8859-1 byte array, and then decodes those bytes "wrongly" as UTF-8, so you can see at which position the line gets corrupted. I added 2 spaces to your source line so that position 74 lands on ¬ (NOT SIGN), the only character in the line that produces different bytes in ISO-8859-1 and UTF-8. I guess this matches the indentation in the real source file.

 String reg = "      String reg = \"^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¬.,-])(?=[^\\s]+$).{8,24}$\";";
 String corrupt=new String(reg.getBytes("ISO-8859-1"),"UTF-8");
 System.out.println(corrupt+": "+corrupt.charAt(74));
 System.out.println(reg+": "+reg.charAt(74));     

which results in the following output (messed up because of markup):

String reg = "^(?=.[0-9])(?=.[a-z])(?=.[A-Z])(?=.[~#;:?/@&!"'%*=�.,-])(?=[^\s]+$).{8,24}$";: �

String reg = "^(?=.[0-9])(?=.[a-z])(?=.[A-Z])(?=.[~#;:?/@&!"'%*=¬.,-])(?=[^\s]+$).{8,24}$";: ¬

See "live" at https://ideone.com/ShZnB

To fix this, save the source files with UTF-8 encoding.

二智少女 2024-10-24 03:27:44

I'm in the process of setting up a CI build server on a Linux box for a legacy system started in 2000. There is a section that generates a PDF that contains non-UTF8 characters. We are in the final steps of a release, so I cannot replace the characters giving me grief, yet for Dilbertesque reasons, I cannot wait a week to solve this issue after the release. Fortunately, the "javac" command in Ant has an "encoding" parameter.

 <javac destdir="${classes.dir}" classpathref="production-classpath" debug="on"
     includeantruntime="false" source="${java.level}" target="${java.level}"

     encoding="iso-8859-1">

     <src path="${production.dir}" />
 </javac>

放血 2024-10-24 03:27:44

The Java compiler assumes that your input is UTF-8 encoded, either because you specified it to be or because it's your platform default encoding.

However, the data in your .java files is not actually encoded in UTF-8. The problem is probably the ¬ character. Make sure your editor (or IDE) of choice actually saves its files in UTF-8 encoding.
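One way to confirm that a file is not really UTF-8, independent of any editor, is to decode its bytes strictly and see whether decoding fails. The following is only a minimal sketch: the class name `Utf8Check` and the method `isValidUtf8` are illustrative names chosen here, not part of the JDK or of the original answer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: reports whether a byte sequence is well-formed UTF-8.
public class Utf8Check {

    public static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Each argument is a path to a source file to check.
        for (String name : args) {
            byte[] data = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": "
                    + (isValidUtf8(data) ? "valid UTF-8" : "NOT valid UTF-8"));
        }
    }
}
```

A file saved as ISO-8859-1 that contains ¬ holds the single byte 0xAC at that position, which is not a legal UTF-8 sequence on its own, so a check like this should flag it.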

挽梦忆笙歌 2024-10-24 03:27:44

For IntelliJ users, this is pretty easy once you find out what the original encoding was. Select the encoding from the bottom-right corner of the window; you will be prompted with a dialog box saying:

The encoding you've chosen ('[encoding type]') may change the contents
of '[Your file]'. Do you want to reload the file from disk or convert
the text and save in the new encoding?

So if you happen to have a few characters saved in some odd encoding, first select 'Reload' to load the whole file in the encoding of the bad characters. For me, this turned the ? characters into their proper values.

IntelliJ can tell if you most likely did not pick the right encoding and will warn you. If that happens, revert and try again.

Once you can see the bad characters go away, change the encoding select box in the bottom right corner back to the format you originally intended (if you are Googling this error message, that will likely be UTF-8). This time select the 'Convert' button on the dialog.

For me, I needed to reload as 'windows-1252', then convert back to 'UTF-8'. The offending characters were single quotes (‘ and ’) likely pasted in from a Word doc (or e-mail) with the wrong encoding, and the above actions will convert them to UTF-8.
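Outside the IDE, the same reload-then-convert idea amounts to decoding the bytes with the charset the file was really written in and re-encoding the text as UTF-8. Here is a rough sketch under that assumption; `Transcode` and `toUtf8` are made-up names for illustration, and the actual charset (for example windows-1252) has to be known or guessed by the user.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: re-encodes text from its real charset to UTF-8.
public class Transcode {

    public static byte[] toUtf8(byte[] original, Charset actualCharset) {
        // Decode with the charset the bytes were written in ("Reload") ...
        String text = new String(original, actualCharset);
        // ... then encode the same characters as UTF-8 ("Convert").
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Transcode <file> <actual-charset>
        Path file = Paths.get(args[0]);
        Charset actual = Charset.forName(args[1]);
        // Overwrites the file in place; keep a backup if the charset is a guess.
        Files.write(file, toUtf8(Files.readAllBytes(file), actual));
    }
}
```

If the wrong source charset is chosen, the result is still valid UTF-8 but contains the wrong characters, so it is worth inspecting the file afterwards.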

罪#恶を代价 2024-10-24 03:27:44

In Eclipse, open the file properties (Alt+Enter) and change Resource → 'Text file encoding' → Other to UTF-8. Reopen the file and look for a junk character somewhere in the string/file. Remove it and save the file.

Change the encoding Resource → 'Text File encoding' back to Default.

Compile and deploy the code.

孤独陪着我 2024-10-24 03:27:44

The compiler is using the UTF-8 character encoding to read your source file. But the file must have been written by an editor using a different encoding. Open your file in an editor set to the UTF-8 encoding, fix the quote mark, and save it again.

Alternatively, you can find the Unicode code point for the character and use a Unicode escape in the source code. For example, the character A can be replaced with the Unicode escape \u0041, and ¬ (NOT SIGN) with \u00AC.

By the way, you don't need to use the begin- and end-line anchors ^ and $ when using the matches() method. The entire sequence must be matched by the regular expression when using the matches() method. The anchors are only useful with the find() method.
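Both suggestions can be combined in one version of the method from the question. This is a sketch rather than a drop-in fix: the class name `PasswordCheck` is invented here, the pattern is precompiled as a design choice, and the only intended changes to the regex are the Unicode escape for ¬ and the removal of the outer ^ and $ anchors.

```java
import java.util.regex.Pattern;

// Hypothetical rewrite of the question's method with an ASCII-only source.
public class PasswordCheck {

    // The NOT SIGN is written as a Unicode escape, so the file contains no
    // non-ASCII bytes and compiles the same under UTF-8 or ISO-8859-1.
    // No leading ^ or trailing $: matches() already requires a full match.
    private static final Pattern VALID = Pattern.compile(
            "(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])"
            + "(?=.*[~#;:?/@&!\"'%*=\u00AC.,-])"
            + "(?=[^\\s]+$).{8,24}");

    public static boolean isValidPasswd(String passwd) {
        return VALID.matcher(passwd).matches();
    }
}
```

The $ inside the `(?=[^\\s]+$)` lookahead is kept on purpose, since it is what makes that lookahead reject whitespace anywhere in the input.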

£冰雨忧蓝° 2024-10-24 03:27:44

Thanks Michael Konietzka (https://stackoverflow.com/a/4996583/1019307) for your answer.

I did this in Eclipse / STS:

Preferences > General > Content Types > Selected "Text" 
    (which contains all types such as CSS, Java Source Files, ...)
Added "UTF-8" to the default encoding box down the bottom and hit 'Add'

Bingo, error gone!

睫毛上残留的泪 2024-10-24 03:27:44

Just search for the character and change it to ".

得不到的就毁灭 2024-10-24 03:27:44

The following compiles for me:

class E{
   String s = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¼.,-])(?=[^\\s]+$).{8,24}$";
}

"error: unmappable character for encoding UTF-8" means that the compiler has found a character in the file that it cannot decode as UTF-8. Open the file in an editor and set the character encoding to UTF-8. You should be able to find the character that is not displayed correctly. Remove that character and recompile.

尤怨 2024-10-24 03:27:44

I observed this issue while using Eclipse. I needed to add the encoding to my pom.xml file, and that resolved it. http://ctrlaltsolve.blogspot.in/2015/11/encoding-properties-in-maven.html

何处潇湘 2024-10-24 03:27:44

I had a similar issue and fixed it using the selector in the bottom corner of IntelliJ.

I changed it from LF to CRLF.