StreamTokenizer splits 001_to_003 into two tokens; how can I prevent it from doing so?
Java's StreamTokenizer seems to be too greedy in identifying numbers. It is relatively light on configuration options, and I haven't found a way to make it do what I want. The following test passes, IMO showing a bug in the implementation; what I'd really like is for the second token to be identified as a word "20001_to_30000". Any ideas?
public void testBrokenTokenizer()
    throws Exception
{
    final String query = "foo_bah 20001_to_30000";
    StreamTokenizer tok = new StreamTokenizer(new StringReader(query));
    tok.wordChars('_', '_');
    assertEquals(StreamTokenizer.TT_WORD, tok.nextToken());
    assertEquals("foo_bah", tok.sval);
    assertEquals(StreamTokenizer.TT_NUMBER, tok.nextToken());
    assertEquals(20001.0, tok.nval);
    assertEquals(StreamTokenizer.TT_WORD, tok.nextToken());
    assertEquals("_to_30000", tok.sval);
}
FWIW I could use a StringTokenizer instead, but it would require a lot of refactoring.
Comments (1)
IMO, the best solution is using a Scanner, but if you want to force the venerable StreamTokenizer to work for you, try the following:
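A minimal sketch of that approach, not the answerer's original code: it assumes you are free to call resetSyntax() and re-declare the character classes yourself, so that digits, '.', and '-' become plain word characters and StreamTokenizer never returns TT_NUMBER. The class name WordFirstTokenizer, the NUMBER pattern, and the "WORD:"/"NUMBER:" labels are hypothetical and only there for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class WordFirstTokenizer {
    // Hypothetical definition of "a number"; widen it if you need exponents etc.
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(\\.\\d+)?");

    static List<String> tokenize(String input) throws IOException {
        StreamTokenizer tok = new StreamTokenizer(new StringReader(input));
        tok.resetSyntax();            // clears every character class, including number parsing
        tok.whitespaceChars(0, ' ');
        tok.wordChars('a', 'z');
        tok.wordChars('A', 'Z');
        tok.wordChars('0', '9');      // digits are now ordinary word characters
        tok.wordChars('_', '_');
        tok.wordChars('.', '.');
        tok.wordChars('-', '-');

        List<String> out = new ArrayList<>();
        while (tok.nextToken() != StreamTokenizer.TT_EOF) {
            if (tok.ttype != StreamTokenizer.TT_WORD) {
                // Any character not declared above comes back as itself.
                out.add("CHAR:" + (char) tok.ttype);
            } else if (NUMBER.matcher(tok.sval).matches()) {
                // The regex has already vetted the token, so parseDouble cannot throw here.
                out.add("NUMBER:" + Double.parseDouble(tok.sval));
            } else {
                out.add("WORD:" + tok.sval);
            }
        }
        return out;
    }
}
```

With this setup, "foo_bah 20001_to_30000 42" should come back as two words followed by one number. Note that resetSyntax() also removes the default quote and comment characters, so re-add them with quoteChar() and commentChar() if your input relies on them.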
Essentially, you're offloading the tokenizing of numeric values from StreamTokenizer: every token comes back as a word, and you decide afterwards whether it is a number. The regex matching is to avoid relying on NumberFormatException to tell you that Double.parseDouble() can't parse the given token.
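For comparison, here is a rough sketch of the Scanner route mentioned above. It assumes whitespace-delimited input and pins the locale to Locale.US so the decimal separator is predictable; the class name ScannerTokens and the "WORD:"/"NUMBER:" labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

final class ScannerTokens {
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input).useLocale(Locale.US)) {
            while (sc.hasNext()) {
                // hasNextDouble() tests the whole whitespace-delimited token,
                // so "20001_to_30000" is not mistaken for a number.
                if (sc.hasNextDouble()) {
                    out.add("NUMBER:" + sc.nextDouble());
                } else {
                    out.add("WORD:" + sc.next());
                }
            }
        }
        return out;
    }
}
```

Because Scanner only looks at complete tokens, there is no need to reconfigure character classes; the trade-off is that it splits on a delimiter pattern only and has no notion of quotes or comments.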