Java: How do I add an accented "e" to a string?
With the help of tucuxi's answer to the existing post "Java remove HTML from String without regular expressions", I have built a method that parses basic HTML tags out of a string. Sometimes, however, the original string contains HTML hexadecimal character references such as &#x00E9; (an accented e, é). I have started to add functionality that translates these escaped characters into real characters.
You're probably asking: why not use regular expressions, or a third-party library? Unfortunately I cannot: I am developing on a BlackBerry platform, which does not support regular expressions, and I have never managed to add a third-party library to my project.
So, I have gotten to the point where any &#x00E9; is replaced with "e". My question now is: how do I add an actual accented e to the string?
Here is my code:
    public static String removeHTML(String synopsis) {
        char[] cs = synopsis.toCharArray();
        String sb = new String();
        boolean tag = false; // true while inside <...>
        for (int i = 0; i < cs.length; i++) {
            switch (cs[i]) {
            case '<':
                if (!tag) {
                    tag = true;
                    break;
                }
                // falls through when already inside a tag
            case '>':
                if (tag) {
                    tag = false;
                    break;
                }
                // falls through when not inside a tag
            case '&':
                // look at the 7 characters starting at '&'
                char[] copyTo = new char[7];
                System.arraycopy(cs, i, copyTo, 0, 7);
                String result = new String(copyTo);
                if (result.equals("&#x00E9")) {
                    sb += "e";
                }
                i += 7; // with the loop's i++, this also skips the trailing ';'
                break;
            default:
                if (!tag)
                    sb += cs[i];
            }
        }
        return sb.toString();
    }
Thanks!
Comments (4)
Java Strings are Unicode: an accented character such as é is stored as an ordinary char.
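A minimal sketch of what that means in practice: é is a single char whose value is 0x00E9, so it can be written with a Unicode escape and appended like any other character. The class and method names below are illustrative only and do not come from the original post.

```java
// Illustrative only: UnicodeCharDemo and cafe() are made-up names.
final class UnicodeCharDemo {
    // Returns "caf" followed by the single character U+00E9.
    static String cafe() {
        char eAcute = '\u00E9'; // one char, not 'e' plus a separate accent
        return "caf" + eAcute;
    }

    public static void main(String[] args) {
        String s = cafe();
        System.out.println(s.length());        // 4
        System.out.println((int) s.charAt(3)); // 233, i.e. 0xE9
    }
}
```

The example prints numbers rather than the string itself, because whether é displays correctly depends on the console's encoding, not on the String.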
You can print just about any character you like in Java, because it uses the Unicode character set.
To find the character you want, take a look at the charts here:
http://www.unicode.org/charts/
In the Latin-1 Supplement chart you'll see the Unicode code points for the accented characters. For example, you should see the hex number 00E9 listed for é. The common Western European accented characters are in this chart, so you should find it pretty useful.
To use the character in a String, just write the Unicode escape sequence \u followed by the four-digit hex code, like so:
    System.out.println("Let's go to the caf\u00E9");
Would produce: "Let's go to the café"
Depending on which version of Java you're using, you might also find a StringBuilder (or a StringBuffer if you're multi-threaded) more efficient than concatenating Strings with the + operator.
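As a rough sketch of both suggestions together, the snippet below builds the sentence with a StringBuffer and a Unicode escape. StringBuffer is used on the assumption that the target runtime predates StringBuilder (added in Java 5); the class and method names are illustrative.

```java
// Illustrative only: CafeSentence and build() are made-up names.
final class CafeSentence {
    // Builds the sentence piece by piece instead of using String +=.
    static String build() {
        StringBuffer sb = new StringBuffer();
        sb.append("Let's go to the caf");
        sb.append('\u00E9'); // the accented e as a single char
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

In a loop such as the one in the question, appending to one buffer avoids creating a new String object on every iteration.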
Try appending the actual é character (Unicode U+00E9) instead of the plain "e".
The thing is that you're not adding an accent on top of the 'e' character; é is a separate character altogether. This site lists the codes for characters (é is outside 7-bit ASCII, so look for its Unicode or Latin-1 value).
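One way to generalise that replacement, sketched without regular expressions or third-party libraries: parse the hex digits of each reference and cast the value to a char. This assumes the input uses hexadecimal references of the form &#x...; and that the handful of String, Integer and StringBuffer methods used here are available on the target platform. EntityDecoder and decodeHexEntities are hypothetical names, not part of the original code.

```java
// Sketch only: EntityDecoder and decodeHexEntities are hypothetical names.
final class EntityDecoder {
    // Replaces hexadecimal references such as &#x00E9; with the char they name.
    // Anything that is not a well-formed reference is copied through unchanged.
    static String decodeHexEntities(String in) {
        StringBuffer out = new StringBuffer();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == '&' && in.startsWith("&#x", i)) {
                int end = in.indexOf(';', i + 3);
                if (end > i + 3) { // at least one digit before ';'
                    try {
                        int code = Integer.parseInt(in.substring(i + 3, end), 16);
                        // Only values that fit in a single char are handled.
                        if (code >= 0 && code <= 0xFFFF) {
                            out.append((char) code);
                            i = end + 1; // continue after the ';'
                            continue;
                        }
                    } catch (NumberFormatException e) {
                        // Not hex digits: copy the '&' literally below.
                    }
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = decodeHexEntities("caf&#x00E9;");
        System.out.println(s.length());        // 4
        System.out.println((int) s.charAt(3)); // 233
    }
}
```

A method like this could be run on the output of removeHTML, or its body adapted for the '&' case. Named entities such as &eacute; are not covered and would need a lookup table.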
For a table of accented characters in Java, take a look at this reference.
To decode the HTML part, use StringEscapeUtils from Apache Commons Lang:
    import org.apache.commons.lang.StringEscapeUtils;
    ...
    String withCharacters = StringEscapeUtils.unescapeHtml(yourString);
See also this Stack Overflow thread:
Replace HTML codes with equivalent characters in Java