What could cause String.charAt(0) to print nothing and have a character type of '16'?

Posted on 2025-01-05 00:33:37

Does anyone have an idea what could be going on here?

The first block shows what I would generally expect to see: the first character of a string is at index '0'. Here the 'problem' string is commented out and replaced by what looks like the exact same thing, freshly typed and never run before.

public void finderTest(){
    String theDoc = "Hello, I want this to work, and work well! Do you think it will work, and if not, why not?";
    //String wordOne = "‭abc"; // old, pre-used string, used to hold a comma.
    String wordOne = "abc";// new, never run before with a comma
    String wordTwo = "and";
    System.out.println("Type of character at index '0' in theDoc: "+Character.getType(theDoc.charAt(0)));
    System.out.println("Character at index '0' in theDoc: "+theDoc.charAt(0));
    System.out.println();
    System.out.println("All of wordOne: "+"'"+wordOne+"'");
    System.out.println("Type of character at index '0' in wordOne: "+Character.getType(wordOne.charAt(0)));
    System.out.println("Character at index '0' in wordOne: "+wordOne.charAt(0));
    System.out.println();
    System.out.println("Type of Character at index '0' in wordTwo: "+Character.getType(wordTwo.charAt(0)));
    System.out.println("Character at index '0' in wordTwo: "+wordTwo.charAt(0));
}

Which gives output:

/*
Type of character at index '0' in theDoc: 1
Character at index '0' in theDoc: H

All of wordOne: 'abc'
Type of character at index '0' in wordOne: 2 // okay
Character at index '0' in wordOne: a // okay

Type of Character at index '0' in wordTwo: 2
Character at index '0' in wordTwo: a
*/

The second block has the 'new' string commented out, and the first character of 'wordOne' is nothing. It isn't a null character or a newline. I had been using that variable to find commas in 'theDoc', but when I ran it, index '0' held nothing and index 1 had the comma in it. If I copy and paste the string, the problem remains. However, commenting it out or deleting it gets rid of the issue.

public void finderTest(){
    String theDoc = "Hello, I want this to work, and work well! Do you think it will work, and if not, why not?";
    String wordOne = "‭abc"; // now running old string, used to hold comma
    //String wordOne = "abc"; 
    String wordTwo = "and";
    System.out.println("Type of character at index '0' in theDoc: "+Character.getType(theDoc.charAt(0)));
    System.out.println("Character at index '0' in theDoc: "+theDoc.charAt(0));
    System.out.println();
    System.out.println("All of wordOne: "+"'"+wordOne+"'");
    System.out.println("Type of character at index '0' in wordOne: "+Character.getType(wordOne.charAt(0)));
    System.out.println("Character at index '0' in wordOne: "+wordOne.charAt(0));
    System.out.println();
    System.out.println("Type of Character at index '0' in wordTwo: "+Character.getType(wordTwo.charAt(0)));
    System.out.println("Character at index '0' in wordTwo: "+wordTwo.charAt(0));
}

Which gives output:

/*  
    Type of character at index '0' in theDoc: 1
    Character at index '0' in theDoc: H
    
    All of wordOne: '‭abc'
    Type of character at index '0' in wordOne: 16 // What does this mean?
    Character at index '0' in wordOne: ‭   // where is the a? (well, its in wordOne index '1'... but why??)
    
    Type of Character at index '0' in wordTwo: 2
    Character at index '0' in wordTwo: a
*/

Is there something about commas or symbols in Java that would cause an issue like this? I tried using character arrays and cleaning the workspace to rebuild everything, but nothing changed. This is a huge problem for finding indices of 'ngrams' within sentences when some grams are things like ", and". At one point last night it was working, and then all of a sudden it stopped working. I'm quite confused.

Any ideas?

Comments (3)

寄离 2025-01-12 00:33:37

I tried pasting your example into Eclipse and it told me this:

Some characters cannot be mapped using "Cp1252" character encoding.

and pointed me to the first character in the string:

String wordOne = "abc";

It appears there is a hidden (non-printable) character between the " and the a.
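If a hidden character like this has crept into a search term, one possible workaround is to strip format characters before searching. The sketch below is only an illustration: the class and method names are hypothetical, and the literal uses an escaped U+202B as a stand-in, since the actual hidden code point in the question is not visible on the page.

```java
public class StripFormatChars {

    /** Removes every character in Unicode general category Cf (format). */
    static String stripFormat(String s) {
        return s.replaceAll("\\p{Cf}", "");
    }

    public static void main(String[] args) {
        String theDoc = "Hello, I want this to work, and work well!";
        // Stand-in for the pasted literal: an invisible format character
        // (assumed here to be U+202B) followed by the visible text ", and".
        String gram = "\u202B, and";

        // The invisible character is part of the search term, so nothing matches.
        System.out.println(theDoc.indexOf(gram));              // -1
        // After stripping, the visible text is found at the second comma.
        System.out.println(theDoc.indexOf(stripFormat(gram))); // 26
    }
}
```

Note that \p{Cf} also covers characters such as the zero-width joiner, so blanket stripping may be too aggressive for text in scripts that rely on them. Retyping the literal by hand in the source file avoids the problem at its origin.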

ˉ厌 2025-01-12 00:33:37

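Character type 16, as returned by Character.getType, is Character.FORMAT (Unicode general category Cf), the category that holds invisible formatting characters such as the bidirectional controls U+202A through U+202E. (DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING, the directionality of U+202B, also has the value 16, but that constant belongs to Character.getDirectionality, not getType.) It's an unprintable character; you can print its hex value to confirm.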
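A minimal sketch of printing the hex value, assuming the hidden character is a bidirectional control. The literal below uses an escaped U+202B as a stand-in for the pasted string, and the class and helper names are made up for this example.

```java
public class HexOfFirstChar {

    /** Formats a char as a Unicode code point label, e.g. U+0061. */
    static String hexOf(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] args) {
        // Stand-in for the problem string: an escaped invisible character, then "abc".
        String wordOne = "\u202Babc";

        System.out.println(hexOf(wordOne.charAt(0)));             // U+202B
        System.out.println(Character.getType(wordOne.charAt(0))); // 16
        System.out.println(hexOf(wordOne.charAt(1)));             // U+0061
    }
}
```

Running the same two calls on the real string from the question would show which code point is actually sitting at index 0.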

无可置疑 2025-01-12 00:33:37

Your string contains a character you're having trouble seeing (before the 'a'). There are dozens of characters in the Unicode set which have no meaningful visual representation - this is probably one of them.

The '16' is the character's type as returned by Character.getType, which is one of the following general categories:

COMBINING_SPACING_MARK, CONNECTOR_PUNCTUATION, CONTROL, CURRENCY_SYMBOL, DASH_PUNCTUATION, DECIMAL_DIGIT_NUMBER, ENCLOSING_MARK, END_PUNCTUATION, FINAL_QUOTE_PUNCTUATION, FORMAT, INITIAL_QUOTE_PUNCTUATION, LETTER_NUMBER, LINE_SEPARATOR, LOWERCASE_LETTER, MATH_SYMBOL, MODIFIER_LETTER, MODIFIER_SYMBOL, NON_SPACING_MARK, OTHER_LETTER, OTHER_NUMBER, OTHER_PUNCTUATION, OTHER_SYMBOL, PARAGRAPH_SEPARATOR, PRIVATE_USE, SPACE_SEPARATOR, START_PUNCTUATION, SURROGATE, TITLECASE_LETTER, UNASSIGNED, UPPERCASE_LETTER

All of these are defined as int constants in the Character class, and 16 is the value of Character.FORMAT. Rather than relying on the raw number, compare the result against the named constants. Or, better yet, use Character.getName to find the human-readable description of the character.
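A short sketch of both suggestions, comparing against a named constant and asking Character.getName (available since Java 7) for a description. The class and method names are hypothetical, and U+202B is only an assumed stand-in for the hidden character.

```java
public class DescribeChar {

    /** Describes a code point by hex value, Unicode name, and whether it is a format character. */
    static String describe(int codePoint) {
        String category =
                Character.getType(codePoint) == Character.FORMAT ? "FORMAT" : "other";
        return String.format("U+%04X %s (%s)",
                codePoint, Character.getName(codePoint), category);
    }

    public static void main(String[] args) {
        // Stand-in for the problem string: an escaped invisible character, then "abc".
        String wordOne = "\u202Babc";

        System.out.println(describe(wordOne.codePointAt(0)));
        System.out.println(describe(wordOne.codePointAt(1)));
    }
}
```

Using codePointAt instead of charAt keeps the description correct for characters outside the Basic Multilingual Plane as well.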
