What could cause String.charAt(0) to print nothing, with a character type of "16"?
Anyone have an idea about what could be going on here?
The first block shows what I would generally expect to see: the first character of a string is at index '0'. Here the 'problem' string is commented out and replaced by an identical-looking string that has never been run before.
public void finderTest() {
    String theDoc = "Hello, I want this to work, and work well! Do you think it will work, and if not, why not?";
    //String wordOne = "abc"; // old, pre-used string, used to hold a comma.
    String wordOne = "abc"; // new, never run before with a comma
    String wordTwo = "and";
    System.out.println("Type of character at index '0' in theDoc: " + Character.getType(theDoc.charAt(0)));
    System.out.println("Character at index '0' in theDoc: " + theDoc.charAt(0));
    System.out.println();
    System.out.println("All of wordOne: " + "'" + wordOne + "'");
    System.out.println("Type of character at index '0' in wordOne: " + Character.getType(wordOne.charAt(0)));
    System.out.println("Character at index '0' in wordOne: " + wordOne.charAt(0));
    System.out.println();
    System.out.println("Type of Character at index '0' in wordTwo: " + Character.getType(wordTwo.charAt(0)));
    System.out.println("Character at index '0' in wordTwo: " + wordTwo.charAt(0));
}
Which gives output:
/*
Type of character at index '0' in theDoc: 1
Character at index '0' in theDoc: H
All of wordOne: 'abc'
Type of character at index '0' in wordOne: 2 // okay
Character at index '0' in wordOne: a // okay
Type of Character at index '0' in wordTwo: 2
Character at index '0' in wordTwo: a
*/
The second block has the 'new' string commented out, and the first character of 'wordOne' is nothing. It isn't a null character or a newline. I had been using that variable to find commas in 'theDoc', but when I ran it, index '0' held nothing and index 1 had the comma in it. If I copy and paste the string, the problem remains. However, commenting it out or deleting it gets rid of the issue.
public void finderTest() {
    String theDoc = "Hello, I want this to work, and work well! Do you think it will work, and if not, why not?";
    String wordOne = "abc"; // now running old string, used to hold comma
    //String wordOne = "abc";
    String wordTwo = "and";
    System.out.println("Type of character at index '0' in theDoc: " + Character.getType(theDoc.charAt(0)));
    System.out.println("Character at index '0' in theDoc: " + theDoc.charAt(0));
    System.out.println();
    System.out.println("All of wordOne: " + "'" + wordOne + "'");
    System.out.println("Type of character at index '0' in wordOne: " + Character.getType(wordOne.charAt(0)));
    System.out.println("Character at index '0' in wordOne: " + wordOne.charAt(0));
    System.out.println();
    System.out.println("Type of Character at index '0' in wordTwo: " + Character.getType(wordTwo.charAt(0)));
    System.out.println("Character at index '0' in wordTwo: " + wordTwo.charAt(0));
}
Which gives output:
/*
Type of character at index '0' in theDoc: 1
Character at index '0' in theDoc: H
All of wordOne: 'abc'
Type of character at index '0' in wordOne: 16 // What does this mean?
Character at index '0' in wordOne: // where is the a? (well, it's in wordOne index '1'... but why??)
Type of Character at index '0' in wordTwo: 2
Character at index '0' in wordTwo: a
*/
Is there something about commas or symbols in Java that would cause an issue like this? I tried using character arrays and cleaning the workspace to rebuild everything, and nothing has changed. This is a huge problem for finding indices of 'ngrams' within sentences, when some grams are things like ", and". At one point last night it was working, and then all of a sudden it stopped working. I'm quite confused.
Any ideas?
Comments (3)
I tried pasting your example into Eclipse, and it flagged the first character in the string. It appears there is a hidden (non-printable) character between the opening " and the a.
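If a hidden character like this is suspected, one way to check that it is what breaks the search is to strip format characters and search again. This is only a sketch: the class and method names (StripFormatChars, stripFormatChars) are made up for illustration, the document string is shortened, and U+202B is an assumed stand-in for whatever invisible character is actually in the literal.

```java
class StripFormatChars {
    // Removes every character in Unicode general category Cf (format),
    // which is the category Character.getType reports as 16.
    static String stripFormatChars(String s) {
        return s.replaceAll("\\p{Cf}", "");
    }

    public static void main(String[] args) {
        String theDoc = "Hello, I want this to work, and work well!";
        // Assumption: an invisible U+202B sits in front of the visible text.
        String gram = "\u202B, and";

        System.out.println(theDoc.indexOf(gram));                   // -1
        System.out.println(theDoc.indexOf(stripFormatChars(gram))); // 26
    }
}
```

Retyping the literal by hand fixes the source for good; stripping is more useful when the grams come from pasted or external text.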
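Character type 16 is Character.FORMAT (Unicode general category Cf), which covers invisible characters such as U+202B RIGHT-TO-LEFT EMBEDDING. (DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING also has the value 16, but that constant is returned by Character.getDirectionality, not Character.getType.) It's an unprintable character; you can print its hex value to confirm.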
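To see what the invisible character actually is, a small sketch like this prints every code point of the string in hex. The names (CodePointDump, dump) are made up for illustration, and U+202B is only an assumed stand-in; the character in the original literal may be a different one.

```java
class CodePointDump {
    // Formats every code point of s as "U+XXXX", separated by spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the hidden character is U+202B.
        String wordOne = "\u202Babc";
        System.out.println(dump(wordOne));    // U+202B U+0061 U+0062 U+0063
        System.out.println(wordOne.length()); // 4, although only 3 characters are visible
    }
}
```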
Your string contains a character you're having trouble seeing (before the 'a'). There are dozens of characters in the Unicode set which have no meaningful visual representation, and this is probably one of them.
The '16' is the character type, one of the int constants defined in the Character class. I can't tell you which one it is, because that's implementation-dependent in theory; you should check against those values. Or, better yet, use Character.getName to find the human-readable description of the character.
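As a rough sketch of that check, the following prints the type and the Unicode name for each code point in the string. The class and method names (DescribeChar, describe) are hypothetical, and U+202B is an assumed example of a hidden character, not necessarily the one in the question. Character.getName requires Java 7 or later.

```java
class DescribeChar {
    // Returns the code point in hex, its Character.getType value, and its Unicode name.
    static String describe(int codePoint) {
        return String.format("U+%04X type=%d name=%s",
                codePoint,
                Character.getType(codePoint),
                Character.getName(codePoint));
    }

    public static void main(String[] args) {
        // Assumption: the hidden character is U+202B.
        String wordOne = "\u202Babc";
        wordOne.codePoints().forEach(cp -> System.out.println(describe(cp)));
        // First line printed: U+202B type=16 name=RIGHT-TO-LEFT EMBEDDING

        // Comparing against the named constant avoids relying on the raw number.
        System.out.println(Character.getType(wordOne.charAt(0)) == Character.FORMAT); // true
    }
}
```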