Java Stanford NLP: ArrayIndexOutOfBoundsException after loading a second lexicon

Posted on 2024-08-14 02:05:33

I am using the Stanford Natural Language Processing toolkit. I've been trying to find spelling errors with Lexicon's isKnown method, but it produces quite a few false positives, so I thought I'd load a second lexicon and check that too. However, that causes a problem.

private static LexicalizedParser lp = new LexicalizedParser(Constants.stdLexFile);
private static LexicalizedParser wsjLexParse = new LexicalizedParser(Constants.wsjLexFile);

static {
    lp.setOptionFlags(Constants.lexOptionFlags);
    wsjLexParse.setOptionFlags(Constants.lexOptionFlags);
}

public ParseTree(String input) throws IllegalArgumentException, IllegalAccessException, InvocationTargetException {
    initialInput = input;
    DocumentPreprocessor process = new DocumentPreprocessor();
    sentences = process.getSentencesFromText(new StringReader(input));

    for (List<? extends HasWord> sent : sentences) {
        if(lp.parse(sent)) { // line 65
            forest.add(lp.getBestParse()); //non determinism?
        }
    }

    partsOfSpeech = pos();
    runAnalysis();
}

This produces the following failure trace:

java.lang.ArrayIndexOutOfBoundsException: 45547
    at edu.stanford.nlp.parser.lexparser.BaseLexicon.initRulesWithWord(BaseLexicon.java:300)
    at edu.stanford.nlp.parser.lexparser.BaseLexicon.isKnown(BaseLexicon.java:160)
    at edu.stanford.nlp.parser.lexparser.BaseLexicon.ruleIteratorByWord(BaseLexicon.java:212)
    at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.initializeChart(ExhaustivePCFGParser.java:1299)
    at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.parse(ExhaustivePCFGParser.java:388)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.parse(LexicalizedParser.java:234)
    at nth.compling.ParseTree.<init>(ParseTree.java:65)
    at nth.compling.ParseTreeTest.constructor(ParseTreeTest.java:33)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.junit.internal.runners.BeforeAndAfterRunner.invokeMethod(BeforeAndAfterRunner.java:74)
    at org.junit.internal.runners.BeforeAndAfterRunner.runBefores(BeforeAndAfterRunner.java:50)
    at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:33)
    at org.junit.internal.runners.TestClassRunner.run(TestClassRunner.java:52)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:45)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)

If I comment out this line (and the other references to wsjLexParse):

private static LexicalizedParser wsjLexParse = new LexicalizedParser(Constants.wsjLexFile);

then everything works fine. What am I doing wrong here?

Comments (1)

青柠芒果 2024-08-21 02:05:33

Looks like a bug in the Stanford library. You should report it to them.

Does the second lexicon work when you load only it (and not the other one)?
Does the same error occur when you load the two lexica in a different order?
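
To show why load order matters for those two questions, here is a minimal sketch of one way this kind of failure can arise. It assumes, without confirmation from the trace or the library source, that both lexicons number their words through a single process-wide table. Every name below (SharedIndexDemo, GlobalIndex, ToyLexicon) is hypothetical and none of it is the Stanford API. In the toy, each lexicon sizes an array from the table as it exists at load time, so loading a second, larger vocabulary makes the first lexicon look up an index past the end of its own array.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy classes; none of these names come from the Stanford library.
class SharedIndexDemo {

    /** Process-wide word-to-number table shared by every lexicon (the assumption). */
    static final class GlobalIndex {
        private static Map<String, Integer> numbers = new HashMap<>();

        /** Returns the word's number, assigning the next free one if it is new. */
        static int number(String word) {
            Integer n = numbers.get(word);
            if (n == null) {
                n = numbers.size();
                numbers.put(word, n);
            }
            return n;
        }

        /** Loading a model replaces the table for everyone. */
        static void reset() {
            numbers = new HashMap<>();
        }
    }

    /** A lexicon whose array is sized from the table at load time. */
    static final class ToyLexicon {
        private final boolean[] seen;

        ToyLexicon(List<String> vocabulary) {
            GlobalIndex.reset();
            seen = new boolean[vocabulary.size()];
            for (String word : vocabulary) {
                seen[GlobalIndex.number(word)] = true;
            }
        }

        /** No bounds check, so a stale table can push the index out of range. */
        boolean isKnown(String word) {
            return seen[GlobalIndex.number(word)];
        }
    }

    /** Control case: with only one lexicon loaded, lookups of its words succeed. */
    static boolean singleLexiconWorks() {
        ToyLexicon only = new ToyLexicon(Arrays.asList("a", "b", "c"));
        return only.isKnown("a");
    }

    /** Loading a second, larger lexicon breaks lookups in the first one. */
    static boolean firstLexiconBreaks() {
        ToyLexicon small = new ToyLexicon(Arrays.asList("a", "b", "c"));
        new ToyLexicon(Arrays.asList("v", "w", "x", "y", "z")); // replaces the shared table
        try {
            small.isKnown("z"); // "z" is now number 4, but small's array has length 3
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("single lexicon works: " + singleLexiconWorks());
        System.out.println("first lexicon breaks after second load: " + firstLexiconBreaks());
    }
}
```

If the real cause is similar, loading only the second lexicon should work, and swapping the load order should move the failure to whichever parser was loaded first. Either result would be useful detail to include in a bug report.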
