Java Stanley NLP:加载第二个词典后的 ArrayIndexOutOfBounds
我正在使用斯坦福自然语言处理工具包。我一直在尝试使用 Lexicon 的 isKnown 方法查找拼写错误,但它产生了相当多的误报。所以我想我应该加载第二个词典,并检查一下。然而,这会导致一个问题。
private static LexicalizedParser lp = new LexicalizedParser(Constants.stdLexFile);
private static LexicalizedParser wsjLexParse = new LexicalizedParser(Constants.wsjLexFile);
static {
lp.setOptionFlags(Constants.lexOptionFlags);
wsjLexParse.setOptionFlags(Constants.lexOptionFlags);
}
public ParseTree(String input) throws IllegalArgumentException, IllegalAccessException, InvocationTargetException {
initialInput = input;
DocumentPreprocessor process = new DocumentPreprocessor();
sentences = process.getSentencesFromText(new StringReader(input));
for (List<? extends HasWord> sent : sentences) {
if(lp.parse(sent)) { // line 65
forest.add(lp.getBestParse()); //non determinism?
}
}
partsOfSpeech = pos();
runAnalysis();
}
生成以下失败跟踪:
java.lang.ArrayIndexOutOfBoundsException: 45547
at edu.stanford.nlp.parser.lexparser.BaseLexicon.initRulesWithWord(BaseLexicon.java:300)
at edu.stanford.nlp.parser.lexparser.BaseLexicon.isKnown(BaseLexicon.java:160)
at edu.stanford.nlp.parser.lexparser.BaseLexicon.ruleIteratorByWord(BaseLexicon.java:212)
at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.initializeChart(ExhaustivePCFGParser.java:1299)
at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.parse(ExhaustivePCFGParser.java:388)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.parse(LexicalizedParser.java:234)
at nth.compling.ParseTree.<init>(ParseTree.java:65)
at nth.compling.ParseTreeTest.constructor(ParseTreeTest.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.junit.internal.runners.BeforeAndAfterRunner.invokeMethod(BeforeAndAfterRunner.java:74)
at org.junit.internal.runners.BeforeAndAfterRunner.runBefores(BeforeAndAfterRunner.java:50)
at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:33)
at org.junit.internal.runners.TestClassRunner.run(TestClassRunner.java:52)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:45)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
如果我注释掉这一行:(以及对 wsjLexParse 的其他引用),
private static LexicalizedParser wsjLexParse = new LexicalizedParser(Constants.wsjLexFile);
则一切正常。我在这里做错了什么?
I am using the Stanford Natural Language processing toolkit. I've been trying to find spelling errors with Lexicon
's isKnown
method, but it produces quite a few false positives. So I thought I'd load a second lexicon, and check that too. However, that causes a problem.
private static LexicalizedParser lp = new LexicalizedParser(Constants.stdLexFile);
private static LexicalizedParser wsjLexParse = new LexicalizedParser(Constants.wsjLexFile);
static {
lp.setOptionFlags(Constants.lexOptionFlags);
wsjLexParse.setOptionFlags(Constants.lexOptionFlags);
}
public ParseTree(String input) throws IllegalArgumentException, IllegalAccessException, InvocationTargetException {
initialInput = input;
DocumentPreprocessor process = new DocumentPreprocessor();
sentences = process.getSentencesFromText(new StringReader(input));
for (List<? extends HasWord> sent : sentences) {
if(lp.parse(sent)) { // line 65
forest.add(lp.getBestParse()); //non determinism?
}
}
partsOfSpeech = pos();
runAnalysis();
}
The following fail trace is produced:
java.lang.ArrayIndexOutOfBoundsException: 45547
at edu.stanford.nlp.parser.lexparser.BaseLexicon.initRulesWithWord(BaseLexicon.java:300)
at edu.stanford.nlp.parser.lexparser.BaseLexicon.isKnown(BaseLexicon.java:160)
at edu.stanford.nlp.parser.lexparser.BaseLexicon.ruleIteratorByWord(BaseLexicon.java:212)
at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.initializeChart(ExhaustivePCFGParser.java:1299)
at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.parse(ExhaustivePCFGParser.java:388)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.parse(LexicalizedParser.java:234)
at nth.compling.ParseTree.<init>(ParseTree.java:65)
at nth.compling.ParseTreeTest.constructor(ParseTreeTest.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.junit.internal.runners.BeforeAndAfterRunner.invokeMethod(BeforeAndAfterRunner.java:74)
at org.junit.internal.runners.BeforeAndAfterRunner.runBefores(BeforeAndAfterRunner.java:50)
at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:33)
at org.junit.internal.runners.TestClassRunner.run(TestClassRunner.java:52)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:45)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
If I comment out this line: (and other references to wsjLexParse)
private static LexicalizedParser wsjLexParse = new LexicalizedParser(Constants.wsjLexFile);
then everything works fine. What am I doing wrong here?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
看起来像是斯坦福大学图书馆的一个错误。你应该向他们报告。
当您仅加载第二个词典(而不加载另一个词典)时,第二个词典是否有效?
当您以不同的顺序加载两个词库时,是否会出现相同的错误?
Looks like a bug in the Stanford library. You should report it to them.
Does the second lexicon work when you load only it (and not the other one)?
Does the same error occur when you load the two lexica in different order?