Exception when indexing text documents with Lucene, using SnowballAnalyzer for cleanup

Posted 2024-08-30 08:51:51

I am indexing documents with Lucene and trying to use SnowballAnalyzer to remove punctuation and stop words from the text. I keep getting the following error:

    IllegalAccessError: tried to access method org.apache.lucene.analysis.Tokenizer.<init>(Ljava/io/Reader;)V from class org.apache.lucene.analysis.snowball.SnowballAnalyzer
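In the error, `Tokenizer.<init>(Ljava/io/Reader;)V` is written in the JVM's internal notation: `<init>` is the name the JVM uses for every constructor, and the descriptor `(Ljava/io/Reader;)V` means one `java.io.Reader` parameter and a `void` return. So the message is about the `Tokenizer(Reader)` constructor. As an illustration only (the class name `JvmDescriptors` is hypothetical and nothing in it is Lucene API), the sketch below rebuilds that notation with reflection for `java.io.FilterReader(Reader)`, a JDK constructor of the same shape, and prints its declared access level. Pointing it at `org.apache.lucene.analysis.Tokenizer` with Lucene on the classpath should show how that constructor is declared in whichever jar actually gets loaded.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

// Sketch: rebuild the JVM-style name of a constructor, e.g. "<init>(Ljava/io/Reader;)V",
// and report its declared access level. Nothing here is Lucene-specific.
final class JvmDescriptors {
    private JvmDescriptors() {}

    // Field descriptor of one type, per the JVM spec (int -> I, String -> Ljava/lang/String;).
    static String typeDescriptor(Class<?> t) {
        // For arrays, getName() already has the "[Ljava.lang.String;" / "[I" shape.
        if (t.isArray()) return t.getName().replace('.', '/');
        if (t == int.class) return "I";
        if (t == long.class) return "J";
        if (t == boolean.class) return "Z";
        if (t == byte.class) return "B";
        if (t == char.class) return "C";
        if (t == short.class) return "S";
        if (t == float.class) return "F";
        if (t == double.class) return "D";
        if (t == void.class) return "V";
        return "L" + t.getName().replace('.', '/') + ";";
    }

    // Constructors are named "<init>" in the JVM and always have a void (V) return.
    static String constructorDescriptor(Constructor<?> c) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : c.getParameterTypes()) sb.append(typeDescriptor(p));
        return sb.append(")V").toString();
    }

    static String access(int mods) {
        if (Modifier.isPublic(mods)) return "public";
        if (Modifier.isProtected(mods)) return "protected";
        if (Modifier.isPrivate(mods)) return "private";
        return "package-private";
    }

    public static void main(String[] args) throws Exception {
        // java.io.FilterReader(Reader) has the same shape as the constructor in the error.
        Constructor<?> c = java.io.FilterReader.class.getDeclaredConstructor(java.io.Reader.class);
        System.out.println(access(c.getModifiers()) + " " + c.getDeclaringClass().getName()
                + ".<init>" + constructorDescriptor(c));
        // prints: protected java.io.FilterReader.<init>(Ljava/io/Reader;)V
    }
}
```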

Here is the code. I would really appreciate any help; I am new to this.

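    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.Reader;
    import java.util.Date;

    import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class Indexer {

        private Indexer() {}

        private String[] stopWords = {....}; // stop-word list elided in the original post

        private String indexName;
        private IndexWriter iWriter;
        private static final String FILES_TO_INDEX = "/Users/ssi/forindexing";

        public static void main(String[] args) throws Exception {
            Indexer m = new Indexer();
            m.index("./newindex");
        }

        public void index(String indexName) throws Exception {
            this.indexName = indexName;

            final File docDir = new File(FILES_TO_INDEX);
            if (!docDir.exists() || !docDir.canRead()) {
                System.err.println("Something wrong... " + docDir.getPath());
                System.exit(1);
            }

            Date start = new Date();

            // SimpleAnalyzer for every field by default; SnowballAnalyzer
            // (English stemmer plus the stop words above) only for "text".
            PerFieldAnalyzerWrapper analyzers = new PerFieldAnalyzerWrapper(new SimpleAnalyzer());
            analyzers.addAnalyzer("text", new SnowballAnalyzer("English", stopWords));

            Directory directory = FSDirectory.open(new File(this.indexName));
            IndexWriter.MaxFieldLength maxLength = IndexWriter.MaxFieldLength.UNLIMITED;
            // create = true: overwrite any existing index in this directory
            iWriter = new IndexWriter(directory, analyzers, true, maxLength);

            System.out.println("Indexing to dir..........." + indexName);

            if (docDir.isDirectory()) {
                File[] files = docDir.listFiles();
                if (files != null) {
                    for (File file : files) {
                        try {
                            indexDocument(file);
                        } catch (FileNotFoundException fnfe) {
                            fnfe.printStackTrace();
                        }
                    }
                }
            }

            System.out.println("Optimizing...... ");
            iWriter.optimize();
            iWriter.close();
            Date end = new Date();
            System.out.println("Time to index was " + (end.getTime() - start.getTime()) + " milliseconds");
        }

        private void indexDocument(File someDoc) throws IOException {
            Document doc = new Document();
            Field name = new Field("name", someDoc.getName(), Field.Store.YES, Field.Index.ANALYZED);
            Reader reader = new FileReader(someDoc);
            try {
                // A Reader-valued field is tokenized and indexed but not stored;
                // the Reader is only read during addDocument().
                Field text = new Field("text", reader, Field.TermVector.WITH_POSITIONS_OFFSETS);
                doc.add(name);
                doc.add(text);
                iWriter.addDocument(doc);
            } finally {
                reader.close(); // safe to close once addDocument() has returned
            }
        }
    }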
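To make the goal of the question concrete: according to its Javadoc in the Lucene 2.9/3.0 era, `SnowballAnalyzer` runs a `StandardTokenizer` followed by `StandardFilter`, `LowerCaseFilter`, `StopFilter` and `SnowballFilter`. Punctuation therefore disappears at tokenization, stop words are dropped, and the remaining words are stemmed. The plain-Java sketch below imitates only the tokenize, lowercase and stop-word steps so their effect on a sentence is easy to see. It is a toy, not Lucene code: the class name `ToyAnalyzer` is hypothetical, splitting on anything that is not a letter or digit is a simplification of `StandardTokenizer`, and no stemming is done.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy approximation of an analysis chain: tokenize, lowercase, drop stop words.
// NOT Lucene's StandardTokenizer, and it does no Snowball stemming.
final class ToyAnalyzer {
    private final Set<String> stopWords = new HashSet<>();

    ToyAnalyzer(String... stopWords) {
        for (String w : stopWords) this.stopWords.add(w.toLowerCase(Locale.ROOT));
    }

    List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Punctuation and whitespace act as separators, so they never become tokens.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) continue; // split() yields "" when text starts with a separator
            String token = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        ToyAnalyzer a = new ToyAnalyzer("the", "of", "and");
        System.out.println(a.analyze("The Art of Computer Programming, Vol. 1 (and more)!"));
        // prints: [art, computer, programming, vol, 1, more]
    }
}
```

A real Snowball English stemmer would additionally reduce words such as "programming" to a stem, which this toy deliberately leaves out.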


Answer by 心碎无痕…, 2024-09-06 08:51:51


This says that one Lucene class is inconsistent with another: one is trying to access a member of the other that it is not allowed to access. This strongly suggests that you somehow have two different, incompatible versions of Lucene on your classpath.
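One way to check this, as a sketch: the JDK documents `IllegalAccessError` as something the compiler normally catches, so at run time it usually means a class changed incompatibly after the calling code was compiled. Mixing jars from different Lucene releases produces exactly that. The helper below (the class name `ClasspathDuplicates` is hypothetical) uses only the JDK's `ClassLoader.getResources` to list every classpath entry that contains a given class file. Run it with the same `-cp` setting you use for `Indexer`. More than one hit for `org.apache.lucene.analysis.Tokenizer`, or a snowball jar whose version differs from the lucene-core jar, would point to the kind of conflict the answer describes.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Lists every classpath entry that contains a given class file, e.g. to spot two
// different Lucene jars that both ship org.apache.lucene.analysis.Tokenizer.
final class ClasspathDuplicates {
    private ClasspathDuplicates() {}

    static List<URL> locate(String className, ClassLoader loader) throws IOException {
        // "org.apache.lucene.analysis.Tokenizer" -> "org/apache/lucene/analysis/Tokenizer.class"
        String resource = className.replace('.', '/') + ".class";
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = ClasspathDuplicates.class.getClassLoader();
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.lucene.analysis.Tokenizer",
            "org.apache.lucene.analysis.snowball.SnowballAnalyzer"
        };
        for (String name : names) {
            List<URL> hits = locate(name, loader);
            System.out.println(name + " found in " + hits.size() + " location(s):");
            for (URL u : hits) System.out.println("  " + u);
            if (hits.size() > 1) System.out.println("  WARNING: more than one copy on the classpath");
        }
    }
}
```

Because the classes are looked up by resource name rather than loaded, the helper still runs when a Lucene jar is missing; it then simply reports 0 locations. The printed jar paths usually include version numbers, which makes it easy to compare the core and snowball jars.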
