"unmappable character for encoding UTF-8" error

Posted 2024-10-17 03:27:44

I'm getting a compile error at the following method.

public static boolean isValidPasswd(String passwd) {
    String reg = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¬.,-])(?=[^\\s]+$).{8,24}$";
    return Pattern.matches(reg, passwd);
}

at Utility.java:[76,74] unmappable character for encoding UTF-8.
The 74th character is ' " '.

How can I fix this? Thanks.

甜是你 2024-10-24 03:27:44

Your source file has an encoding problem. It is probably ISO-8859-1 encoded, but the compiler is set to read it as UTF-8. This causes errors for any character whose byte representation differs between UTF-8 and ISO-8859-1, which is every character outside ASCII, for example ¬ (NOT SIGN).

You can simulate this with the following program. It takes your line of source code, encodes it as an ISO-8859-1 byte array, and then decodes those bytes "wrongly" as UTF-8, so you can see at which position the line gets corrupted. I added 2 spaces to your source line so that position 74 lands on ¬ (NOT SIGN), the only character in the line that produces different bytes in ISO-8859-1 and UTF-8. I guess this matches the indentation of the real source file.

 String reg = "      String reg = \"^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¬.,-])(?=[^\\s]+$).{8,24}$\";";
 String corrupt=new String(reg.getBytes("ISO-8859-1"),"UTF-8");
 System.out.println(corrupt+": "+corrupt.charAt(74));
 System.out.println(reg+": "+reg.charAt(74));

which results in the following output (messed up because of markup):

String reg = "^(?=.[0-9])(?=.[a-z])(?=.[A-Z])(?=.[~#;:?/@&!"'%*=�.,-])(?=[^\s]+$).{8,24}$";: �
String reg = "^(?=.[0-9])(?=.[a-z])(?=.[A-Z])(?=.[~#;:?/@&!"'%*=¬.,-])(?=[^\s]+$).{8,24}$";: ¬

See "live" at https://ideone.com/ShZnB

To fix this, save the source files with UTF-8 encoding.
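To make the byte-level difference concrete, here is a minimal sketch. The class name `NotSignBytes` and its helpers `hex` and `misdecode` are illustrative and not part of the answer above. The NOT SIGN is written as a Unicode escape so the example itself stays pure ASCII and compiles under any source encoding.

```java
import java.nio.charset.StandardCharsets;

public class NotSignBytes {
    /** Formats bytes as space-separated uppercase hex, e.g. "C2 AC". */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Encodes text as ISO-8859-1 and decodes the bytes as UTF-8, as a misconfigured compiler would. */
    public static String misdecode(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        // Invalid UTF-8 input is replaced with U+FFFD by this String constructor.
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String notSign = "\u00AC"; // NOT SIGN (U+00AC)
        System.out.println("ISO-8859-1: " + hex(notSign.getBytes(StandardCharsets.ISO_8859_1))); // AC
        System.out.println("UTF-8:      " + hex(notSign.getBytes(StandardCharsets.UTF_8)));      // C2 AC
        // A lone 0xAC byte is not valid UTF-8, so it decodes to the replacement character.
        System.out.println(misdecode(notSign).equals("\uFFFD")); // true
    }
}
```

The single byte AC is a UTF-8 continuation byte with no lead byte before it, which is why a compiler reading the file as UTF-8 rejects it.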

二智少女 2024-10-24 03:27:44

I'm in the process of setting up a CI build server on a Linux box for a legacy system started in 2000. There is a section that generates a PDF that contains non-UTF8 characters. We are in the final steps of a release, so I cannot replace the characters giving me grief, yet for Dilbertesque reasons, I cannot wait a week to solve this issue after the release. Fortunately, the "javac" command in Ant has an "encoding" parameter.

 <javac destdir="${classes.dir}" classpathref="production-classpath" debug="on"
     includeantruntime="false" source="${java.level}" target="${java.level}"

     encoding="iso-8859-1">

     <src path="${production.dir}" />
 </javac>

放血 2024-10-24 03:27:44

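The Java compiler assumes that your input is UTF-8 encoded, either because you specified it or because it is your platform's default encoding.

However, the data in your .java file is not actually encoded in UTF-8. The problem is probably the ¬ character. Make sure your editor (or IDE) of choice actually saves its files in UTF-8 encoding.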
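One way to confirm that a file is not valid UTF-8, and to see where it first goes wrong, is to decode its bytes with a strict decoder. The following is a sketch under assumptions: the class `Utf8Check` and the method `firstInvalidUtf8Offset` are hypothetical names, and the demo uses an in-memory byte array rather than a real source file.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    /** Returns the offset of the first byte that is not valid UTF-8, or -1 if everything decodes. */
    public static int firstInvalidUtf8Offset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never yields more chars than input bytes, so this buffer cannot overflow.
        CharBuffer out = CharBuffer.allocate(data.length + 1);
        CoderResult result = decoder.decode(in, out, true);
        // On an error the input position points at the start of the offending bytes.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        String line = "=\u00AC.,-"; // fragment of the regex from the question
        byte[] asLatin1 = line.getBytes(StandardCharsets.ISO_8859_1);
        byte[] asUtf8 = line.getBytes(StandardCharsets.UTF_8);
        System.out.println(firstInvalidUtf8Offset(asLatin1)); // 1
        System.out.println(firstInvalidUtf8Offset(asUtf8));   // -1
    }
}
```

For a real file, the byte array could come from `Files.readAllBytes`. The returned value is a byte offset into the file, not the line and column that javac reports.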

挽梦忆笙歌 2024-10-24 03:27:44

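For IntelliJ users this is pretty easy once you know what the original encoding was. Select the encoding in the bottom right corner of the window and you will be prompted with a dialog:

The encoding you've chosen ('[encoding type]') may change the contents
of '[your file]'. Do you want to reload the file from disk or convert
the text and save in the new encoding?

So if some characters happen to have been saved in an odd encoding, first choose 'Reload' to load the whole file in the encoding of the bad characters. For me this turned the ? characters into their proper values.

IntelliJ can tell when you most likely did not pick the right encoding and will warn you. Revert and try again.

Once the bad characters are gone, change the encoding select box in the bottom right corner back to the format you originally intended (if you are Googling this error message, that is probably UTF-8). This time choose the 'Convert' button in the dialog.

For me, I needed to reload as 'windows-1252' and then convert back to 'UTF-8'. The offending characters were single quote marks (‘ and ’), most likely pasted in from a Word document (or an e-mail) in the wrong encoding, and the steps above convert them to UTF-8.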
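The same reload-then-convert idea can be sketched outside the IDE. This is only an illustration: the class `Reencode`, the method `toUtf8`, and the two command-line arguments are hypothetical, and it assumes the JDK in use provides the `windows-1252` charset (common JDK builds do, but only a handful of charsets such as UTF-8 and ISO-8859-1 are guaranteed).

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Reencode {
    /** Interprets the bytes in the assumed source charset and returns the text re-encoded as UTF-8. */
    public static byte[] toUtf8(byte[] original, Charset assumedSource) {
        String text = new String(original, assumedSource); // corresponds to "Reload"
        return text.getBytes(StandardCharsets.UTF_8);      // corresponds to "Convert"
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Reencode <input-file> <output-file>");
            return;
        }
        // Assumption: the input really is windows-1252; a wrong guess silently produces wrong text.
        Charset source = Charset.forName("windows-1252");
        byte[] converted = toUtf8(Files.readAllBytes(Paths.get(args[0])), source);
        Files.write(Paths.get(args[1]), converted);
    }
}
```

Writing to a separate output file keeps the original bytes intact in case the guessed source encoding turns out to be wrong.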

罪#恶を代价 2024-10-24 03:27:44

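In Eclipse, try going to the file properties (Alt+Enter) and changing Resource → 'Text file encoding' → Other to UTF-8. Reopen the file and check whether there is a junk character somewhere in the string or file. Remove it. Save the file.

Change Resource → 'Text file encoding' back to Default.

Compile and deploy the code.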

孤独陪着我 2024-10-24 03:27:44

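The compiler is using the UTF-8 character encoding to read your source file, but the file must have been written by an editor using a different encoding. Open the file in an editor set to UTF-8, fix the quote mark, and save it again.

Alternatively, you can look up the Unicode code point of the character and use a Unicode escape in the source code. For example, the character A can be replaced with the Unicode escape \u0041.

By the way, you don't need the beginning- and end-of-line anchors ^ and $ when you use the matches() method. With matches(), the entire sequence must be matched by the regular expression. The anchors are only useful with the find() method.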
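Applied to the method from the question, both suggestions might look like the sketch below. The class name `PasswordCheck` and the precompiled `VALID` constant are illustrative choices. It assumes the problematic character is ¬ (U+00AC); if the real culprit is a different character, its own code point would go in the escape instead. The `$` inside the last lookahead is kept because it belongs to that lookahead; only the outer anchors are dropped.

```java
import java.util.regex.Pattern;

public class PasswordCheck {
    // NOT SIGN is written as a Unicode escape, so the source file contains only ASCII bytes.
    // The outer anchors are omitted because matches() already requires a full match.
    private static final Pattern VALID = Pattern.compile(
            "(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=\u00AC.,-])(?=[^\\s]+$).{8,24}");

    public static boolean isValidPasswd(String passwd) {
        return VALID.matcher(passwd).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidPasswd("Abcdef1!"));        // true
        System.out.println(isValidPasswd("abcdefg1!"));       // false: no uppercase letter
        System.out.println(isValidPasswd("Abcdef1\u00AC"));   // true: NOT SIGN counts as a special character
        System.out.println(Pattern.matches("abc", "xabc"));   // false: matches() needs the whole input
    }
}
```

Because the escape is translated by the compiler before the string literal is read, the regex engine sees exactly the same pattern as it would with a literal ¬ in a correctly encoded file.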

￡冰雨忧蓝° 2024-10-24 03:27:44

Thanks Michael Konietzka (https://stackoverflow.com/a/4996583/1019307) for your answer.

I did this in Eclipse / STS:

Preferences > General > Content Types > Selected "Text" 
    (which contains all types such as CSS, Java Source Files, ...)
Added "UTF-8" to the default encoding box down the bottom and hit 'Add'

Bingo, error gone!

睫毛上残留的泪 2024-10-24 03:27:44

Just search for the “ character and change it to ".

得不到的就毁灭 2024-10-24 03:27:44

The following compiles for me:

class E{
   String s = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¼.,-])(?=[^\\s]+$).{8,24}$";
}

习惯那些不曾习惯的习惯 2024-10-24 03:27:44

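"error: unmappable character for encoding UTF-8" means that Java found a character that has no representation in UTF-8. So open the file in an editor and set the character encoding to UTF-8. You should be able to find a character that is not represented in UTF-8. Remove that character and recompile.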

尤怨 2024-10-24 03:27:44

I ran into this problem while using Eclipse. I needed to add the encoding to my pom.xml file, and that resolved it. http://ctrlaltsolve.blogspot.in/2015/11/encoding-properties-in-maven.html

何处潇湘 2024-10-24 03:27:44

I ran into a similar problem and fixed it using the bottom corner of the IntelliJ window.

I changed it from LF to CRLF.
