How to replace � in a string

Posted on 2024-08-06 05:27:08

I have a string that contains the character �, and I haven't been able to replace it correctly.

String.replace("�", "");

doesn't work. Does anyone know how to remove/replace the � in the string?

风启觞 2024-08-13 05:27:08

That's the Unicode Replacement Character, \uFFFD. (info)

Something like this should work:

String strImport = "For some reason my �double quotes� were lost.";
strImport = strImport.replaceAll("\uFFFD", "\"");
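
A minimal sketch of the same idea, assuming the string really does contain U+FFFD. It uses `String.replace`, which matches its argument literally, rather than `replaceAll`, which treats it as a regular expression. Java strings are immutable, so the result has to be assigned or returned; calling `replace` and discarding the result changes nothing. The class and method names here are hypothetical.

```java
class ReplacementCharDemo {
    // U+FFFD, the Unicode replacement character.
    static final String REPLACEMENT = "\uFFFD";

    // Returns a copy of the input with every U+FFFD removed (literal match, no regex).
    static String stripReplacementChar(String s) {
        return s.replace(REPLACEMENT, "");
    }

    // Returns a copy of the input with every U+FFFD replaced by the given text.
    static String substitute(String s, String with) {
        return s.replace(REPLACEMENT, with);
    }

    public static void main(String[] args) {
        String in = "For some reason my \uFFFDdouble quotes\uFFFD were lost.";
        System.out.println(substitute(in, "\""));
        System.out.println(stripReplacementChar(in));
    }
}
```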

夜血缘 2024-08-13 05:27:08

Character issues like this are difficult to diagnose because information is easily lost through misinterpretation of characters via application bugs, misconfiguration, cut'n'paste, etc.

As I (and apparently others) see it, you've pasted three characters:

codepoint   glyph   escaped    windows-1252    info
=======================================================================
U+00ef      ï       \u00ef     ef,             LATIN_1_SUPPLEMENT, LOWERCASE_LETTER
U+00bf      ¿       \u00bf     bf,             LATIN_1_SUPPLEMENT, OTHER_PUNCTUATION
U+00bd      ½       \u00bd     bd,             LATIN_1_SUPPLEMENT, OTHER_NUMBER

To identify the character, download and run the program from this page. Paste your character into the text field and select the glyph mode; paste the report into your question. It'll help people identify the problematic character.
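
One plausible explanation for those three characters, sketched below: the UTF-8 encoding of U+FFFD is the byte sequence EF BF BD, and decoding those bytes with a single-byte charset yields exactly 'ï', '¿' and '½'. The sketch uses ISO-8859-1 as a stand-in for windows-1252 (the two agree on these three bytes); the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encodes U+FFFD as UTF-8, then decodes those bytes with the wrong charset.
    static String misread() {
        byte[] utf8 = "\uFFFD".getBytes(StandardCharsets.UTF_8); // bytes EF BF BD
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = misread();
        // Print the code point of each resulting char.
        for (int i = 0; i < s.length(); i++) {
            System.out.printf("U+%04X%n", (int) s.charAt(i));
        }
    }
}
```

If this is what happened, the string in the program may hold one U+FFFD even though the pasted text shows three Latin-1 characters, or the other way around, which is why dumping the actual code points matters.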

寄离 2024-08-13 05:27:08

You are asking to replace the character "�", but for me that is coming through as three characters: 'ï', '¿' and '½'. This might be your problem. If you are using a Java version prior to 1.5, you only get the UCS-2 characters, that is, only the first 65,536 Unicode code points (the Basic Multilingual Plane). Based on other comments, the character you are looking for is most likely '�', the Unicode replacement character. This is the character that is "used to replace an incoming character whose value is unknown or unrepresentable in Unicode".

Actually, looking at the comment from Kathy, the other issue that you might be having is that javac is not interpreting your .java file as UTF-8, assuming that you are writing it in UTF-8. Try using:

javac -encoding UTF-8 xx.java

Or, modify your source code to do:

str = str.replaceAll("\uFFFD", "");

柳若烟 2024-08-13 05:27:08

As others have said, you posted 3 characters instead of one. I suggest you run this little snippet of code to see what's actually in your string:

public static void dumpString(String text)
{
    for (int i=0; i < text.length(); i++)
    {
        System.out.println("U+" + Integer.toString(text.charAt(i), 16) 
                           + " " + text.charAt(i));
    }
}

If you post the results of that, it'll be easier to work out what's going on. (I haven't bothered padding the string - we can do that by inspection...)
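
For anyone who does want padded output, here is a small variant of the same dump. It is only a sketch: it iterates over code points rather than `char` values (so characters outside the Basic Multilingual Plane show up as one entry) and zero-pads to four hex digits. The class and method names are hypothetical.

```java
import java.util.stream.Collectors;

class CodePointDump {
    // Formats each code point of the text as U+XXXX, separated by spaces.
    static String describe(String text) {
        return text.codePoints()
                   .mapToObj(cp -> String.format("U+%04X", cp))
                   .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // A string containing the replacement character between two letters.
        System.out.println(describe("a\uFFFDb"));
    }
}
```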

是伱的 2024-08-13 05:27:08

Change the encoding to UTF-8 while parsing. This will remove the special characters.
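
A hedged sketch of what "changing the encoding while parsing" could look like in plain Java, assuming the input really is UTF-8: pass the charset explicitly to the reader instead of relying on the platform default. The class and method names are hypothetical, and the in-memory stream stands in for whatever file or network stream is actually being parsed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Utf8ReadDemo {
    // Reads the first line of the stream, decoding the bytes explicitly as UTF-8.
    static String readFirstLine(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample input: one UTF-8 encoded line containing an accented letter.
        byte[] utf8Bytes = "bas\u00E9 sur\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readFirstLine(new ByteArrayInputStream(utf8Bytes)));
    }
}
```

This only helps if the bytes are still available. Once a string already contains U+FFFD, the original byte is gone and re-decoding the string cannot bring it back.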

唔猫 2024-08-13 05:27:08

Use the Unicode escape sequence. First you'll have to find the code point of the character you want to replace (let's just say it is ABCD in hex):

str = str.replaceAll("\uABCD", "");

别理我 2024-08-13 05:27:08

For details:

import java.io.UnsupportedEncodingException;

/**
 * File: BOM.java
 * 
 * check if the bom character is present in the given string print the string
 * after skipping the utf-8 bom characters print the string as utf-8 string on a
 * utf-8 console
 */

public class BOM
{
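    // The UTF-8 BOM bytes (EF BB BF) as they appear when misread as ISO-8859-1, followed by the text.
    private final static String BOM_STRING = "\u00EF\u00BB\u00BFHello World";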
    private final static String ISO_ENCODING = "ISO-8859-1";
    private final static String UTF8_ENCODING = "UTF-8";
    private final static int UTF8_BOM_LENGTH = 3;

    public static void main(String[] args) throws UnsupportedEncodingException {
        final byte[] bytes = BOM_STRING.getBytes(ISO_ENCODING);
        if (isUTF8(bytes)) {
            printSkippedBomString(bytes);
            printUTF8String(bytes);
        }
    }

    private static void printSkippedBomString(final byte[] bytes) throws UnsupportedEncodingException {
        int length = bytes.length - UTF8_BOM_LENGTH;
        byte[] barray = new byte[length];
        System.arraycopy(bytes, UTF8_BOM_LENGTH, barray, 0, barray.length);
        System.out.println(new String(barray, ISO_ENCODING));
    }

    private static void printUTF8String(final byte[] bytes) throws UnsupportedEncodingException {
        System.out.println(new String(bytes, UTF8_ENCODING));
    }

    private static boolean isUTF8(byte[] bytes) {
        if ((bytes[0] & 0xFF) == 0xEF && 
            (bytes[1] & 0xFF) == 0xBB && 
            (bytes[2] & 0xFF) == 0xBF) {
            return true;
        }
        return false;
    }
}

墨落画卷 2024-08-13 05:27:08

Dissect the URL encoding and the Unicode error. This symbol also showed up for me on Google Translate in Armenian text, and sometimes in broken Burmese.

爱你不解释 2024-08-13 05:27:08

profilage bas� sur l'analyse de l'esprit (French)

should be translated as:

profilage basé sur l'analyse de l'esprit

So, in this case, � = é.
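
A sketch of how text like this can end up with a �, assuming (the post does not say) that the original bytes were ISO-8859-1 and were decoded as UTF-8. In ISO-8859-1 the letter é is the single byte E9, which is not a valid UTF-8 sequence on its own, so the decoder substitutes U+FFFD. Decoding the same bytes with the right charset keeps the é. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decodes the bytes as UTF-8; malformed sequences become U+FFFD.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decodes the bytes as ISO-8859-1; every byte maps to exactly one character.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] latin1Bytes = "bas\u00E9 sur".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decodeAsUtf8(latin1Bytes));   // wrong charset: the accented letter is lost
        System.out.println(decodeAsLatin1(latin1Bytes)); // right charset: the accented letter survives
    }
}
```

Because many different bytes all collapse to the same U+FFFD, replacing � with é afterwards is only a guess; fixing the decoding step is the reliable repair.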

葬花如无物 2024-08-13 05:27:08

None of the answers above resolved my issue. When I download the XML, it appends <xml to my XML. I simply did:

xml = parser.getXmlFromUrl(url);

xml = xml.substring(3); // removes the first three characters from the string

Now it runs correctly.
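
An unconditional `substring(3)` will eat real content whenever the extra characters are absent. A more defensive sketch follows, under the assumption (not confirmed by the post) that the three stray characters are a UTF-8 byte order mark: it strips the prefix only if it is actually there, covering both a correctly decoded BOM (the single character U+FEFF) and a BOM misread as three ISO-8859-1 characters. The class and method names are hypothetical.

```java
class BomStrip {
    // A BOM decoded correctly as UTF-8 is the single character U+FEFF.
    static final String BOM = "\uFEFF";
    // The BOM bytes EF BB BF decoded as ISO-8859-1 become these three characters.
    static final String MISREAD_BOM = "\u00EF\u00BB\u00BF";

    // Returns the text without a leading BOM; text without a BOM is returned unchanged.
    static String stripBom(String s) {
        if (s.startsWith(BOM)) {
            return s.substring(BOM.length());
        }
        if (s.startsWith(MISREAD_BOM)) {
            return s.substring(MISREAD_BOM.length());
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(stripBom("\uFEFF<root/>"));
        System.out.println(stripBom("<root/>"));
    }
}
```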
