What could cause String.charAt(0) to print nothing, with a character type of "16"?

Posted 2025-01-05 00:33:37

Does anyone have an idea what could be going on here?

The first block shows what I would generally expect to see: the first character of the string is at index 0. Here the 'problem' string is commented out and replaced by the exact same literal, except that this one has never been run before.

public void finderTest(){
    String theDoc = "Hello, I want this to work, and work well! Do you think it will work, and if not, why not?";
    //String wordOne = "‭abc"; // old, pre-used string, used to hold a comma.
    String wordOne = "abc";// new, never run before with a comma
    String wordTwo = "and";
    System.out.println("Type of character at index '0' in theDoc: "+Character.getType(theDoc.charAt(0)));
    System.out.println("Character at index '0' in theDoc: "+theDoc.charAt(0));
    System.out.println();
    System.out.println("All of wordOne: "+"'"+wordOne+"'");
    System.out.println("Type of character at index '0' in wordOne: "+Character.getType(wordOne.charAt(0)));
    System.out.println("Character at index '0' in wordOne: "+wordOne.charAt(0));
    System.out.println();
    System.out.println("Type of Character at index '0' in wordTwo: "+Character.getType(wordTwo.charAt(0)));
    System.out.println("Character at index '0' in wordTwo: "+wordTwo.charAt(0));
}

This gives the output:

/*
Type of character at index '0' in theDoc: 1
Character at index '0' in theDoc: H

All of wordOne: 'abc'
Type of character at index '0' in wordOne: 2 // okay
Character at index '0' in wordOne: a // okay

Type of Character at index '0' in wordTwo: 2
Character at index '0' in wordTwo: a
*/

The second block has the 'new' string commented out instead, and the first character of 'wordOne' prints as nothing. It isn't a null character or a newline. I had been using that variable to find commas in 'theDoc', but when I ran it, index 0 held nothing and index 1 held the comma. If I copy and paste the string, the problem remains; however, commenting it out or deleting it gets rid of the issue.

public void finderTest(){
    String theDoc = "Hello, I want this to work, and work well! Do you think it will work, and if not, why not?";
    String wordOne = "‭abc"; // now running old string, used to hold comma
    //String wordOne = "abc"; 
    String wordTwo = "and";
    System.out.println("Type of character at index '0' in theDoc: "+Character.getType(theDoc.charAt(0)));
    System.out.println("Character at index '0' in theDoc: "+theDoc.charAt(0));
    System.out.println();
    System.out.println("All of wordOne: "+"'"+wordOne+"'");
    System.out.println("Type of character at index '0' in wordOne: "+Character.getType(wordOne.charAt(0)));
    System.out.println("Character at index '0' in wordOne: "+wordOne.charAt(0));
    System.out.println();
    System.out.println("Type of Character at index '0' in wordTwo: "+Character.getType(wordTwo.charAt(0)));
    System.out.println("Character at index '0' in wordTwo: "+wordTwo.charAt(0));
}

This gives the output:

/*  
    Type of character at index '0' in theDoc: 1
    Character at index '0' in theDoc: H
    
    All of wordOne: '‭abc'
    Type of character at index '0' in wordOne: 16 // What does this mean?
    Character at index '0' in wordOne: ‭   // where is the a? (well, its in wordOne index '1'... but why??)
    
    Type of Character at index '0' in wordTwo: 2
    Character at index '0' in wordTwo: a
*/

Is there something about commas or symbols in Java that would cause an issue like this? I tried using character arrays and cleaning the workspace to rebuild everything, but nothing has changed. This is a huge problem for finding the indices of n-grams within sentences when some grams are things like ", and". At one point last night it was working, and then all of a sudden it stopped. I'm quite confused.

Any ideas?

寄离 2025-01-12 00:33:37

I tried pasting your example into Eclipse and it told me this:

Some characters cannot be mapped using "Cp1252" character encoding.

and pointed me to the first character in the string:

String wordOne = "abc";

It appears there is a hidden (non-printable) character between the " and the a.
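One way to make such a character visible is to dump the string one char at a time with its index, hex code and Character.getType value. This is a minimal sketch, assuming for illustration that the hidden character is U+202D LEFT-TO-RIGHT OVERRIDE (the post doesn't say which character it is). RevealChars and describe are hypothetical names.

```java
// Sketch: list every UTF-16 char of a string with its index, hex code and getType value.
class RevealChars {
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(i)
              .append(": U+")
              .append(String.format("%04X", (int) c)) // zero-padded hex code of the char
              .append(" type=")
              .append(Character.getType(c))           // general category, e.g. 2 = LOWERCASE_LETTER
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: U+202D stands in for the hidden character; the post doesn't identify it.
        String wordOne = "\u202Dabc";
        System.out.print(describe(wordOne));
    }
}
```

For that assumed string it prints 0: U+202D type=16, then 1: U+0061 type=2 and so on, which would explain why the 'a' turns up at index 1.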

ˉ厌 2025-01-12 00:33:37

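Character type 16 is Character.FORMAT, the Unicode general category Cf ("Other, format"), which covers invisible control characters such as the bidirectional embedding and override marks U+202A to U+202E. The value 16 is also the constant Character.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING (the directionality of U+202B), but that constant is returned by Character.getDirectionality, not getType. Either way, it's a non-printable character; you can print its hex value to confirm.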
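A small sketch makes the distinction concrete: Character.getType and Character.getDirectionality consult different Unicode properties, and both can return 16 for a bidi control character. U+202B is the character this answer names; U+202D is an assumed second example, and the class name is hypothetical.

```java
// Sketch: getType (general category) and getDirectionality (bidi class) are separate lookups.
class TypeVersusDirectionality {
    public static void main(String[] args) {
        char rle = '\u202B'; // RIGHT-TO-LEFT EMBEDDING
        char lro = '\u202D'; // LEFT-TO-RIGHT OVERRIDE, an assumed example

        // General category: both are FORMAT, whose value is 16.
        System.out.println(Character.getType(rle) == Character.FORMAT);
        System.out.println(Character.getType(lro) == Character.FORMAT);

        // Bidi class: 16 here means RIGHT_TO_LEFT_EMBEDDING, which only RLE has.
        System.out.println(Character.getDirectionality(rle) == Character.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING);
        System.out.println(Character.getDirectionality(lro) == Character.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING);

        // Printing the hex value identifies the character unambiguously.
        System.out.println(Integer.toHexString(rle));
        System.out.println(Integer.toHexString(lro));
    }
}
```

It prints true, true, true, false, 202b and 202d, so a getType result of 16 alone says "format character" without pinning down which one.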

无可置疑 2025-01-12 00:33:37

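Your string contains a character you can't see (before the 'a'). There are dozens of characters in the Unicode set that have no meaningful visual representation; this is probably one of them.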
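If the goal is simply to make n-gram searches such as ", and" robust, one possible approach (not one proposed in this thread) is to strip invisible format characters before searching. This sketch uses the regex category \p{Cf}, which matches the same characters Character.getType reports as FORMAT; the hidden character is again assumed to be U+202D, and StripFormatChars and clean are hypothetical names.

```java
import java.util.regex.Pattern;

// Sketch: remove invisible format characters (general category Cf) before searching.
class StripFormatChars {
    // Matches any char whose general category is Cf, i.e. Character.FORMAT (16).
    private static final Pattern FORMAT_CHARS = Pattern.compile("\\p{Cf}");

    static String clean(String s) {
        return FORMAT_CHARS.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String theDoc = "Hello, I want this to work, and work well! Do you think it will work, and if not, why not?";
        // Assumption: the n-gram picked up a hidden U+202D in front of the comma.
        String gram = "\u202D, and";

        System.out.println(theDoc.indexOf(gram));        // -1: the hidden char never matches
        System.out.println(theDoc.indexOf(clean(gram))); // 26: first ", and" in theDoc
    }
}
```

Stripping the whole Cf category is a blunt choice: it also removes characters that are sometimes wanted, such as U+200D ZERO WIDTH JOINER inside emoji sequences.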

'16' is the character type, which is one of:

COMBINING_SPACING_MARK, CONNECTOR_PUNCTUATION, CONTROL, CURRENCY_SYMBOL, DASH_PUNCTUATION, DECIMAL_DIGIT_NUMBER, ENCLOSING_MARK, END_PUNCTUATION, FINAL_QUOTE_PUNCTUATION, FORMAT, INITIAL_QUOTE_PUNCTUATION, LETTER_NUMBER, LINE_SEPARATOR, LOWERCASE_LETTER, MATH_SYMBOL, MODIFIER_LETTER, MODIFIER_SYMBOL, NON_SPACING_MARK, OTHER_LETTER, OTHER_NUMBER, OTHER_PUNCTUATION, OTHER_SYMBOL, PARAGRAPH_SEPARATOR, PRIVATE_USE, SPACE_SEPARATOR, START_PUNCTUATION, SURROGATE, TITLECASE_LETTER, UNASSIGNED, UPPERCASE_LETTER

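All of these are defined in the Character class. I can't tell you which one it is, since in theory that's implementation-dependent; you should check those values. Or, better yet, use Character.getName to look up a human-readable description of the character.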
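A sketch of the Character.getName approach (available since Java 7), walking the string by code point so that surrogate pairs would be handled too. The hidden character is assumed to be U+202D for illustration; NameOfChar and describeAt are hypothetical names.

```java
// Sketch: print the Unicode name of every code point in a string.
class NameOfChar {
    // Returns "U+XXXX NAME" for the code point starting at the given char index.
    static String describeAt(String s, int index) {
        int cp = s.codePointAt(index);
        String name = Character.getName(cp); // null for unassigned code points
        return String.format("U+%04X %s", cp, name == null ? "<unassigned>" : name);
    }

    public static void main(String[] args) {
        // Assumption: the hidden character is U+202D; the post does not say which one it is.
        String wordOne = "\u202Dabc";
        for (int i = 0; i < wordOne.length(); i = wordOne.offsetByCodePoints(i, 1)) {
            System.out.println(describeAt(wordOne, i));
        }
    }
}
```

For the assumed string, the first line reads U+202D LEFT-TO-RIGHT OVERRIDE, followed by U+0061 LATIN SMALL LETTER A and so on.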