I am using Lucene.NET v4.8 beta, and I have a method that calls MaybeRefresh on a SearcherManager every 5 seconds. 99.9% of the time, everything works fine. However, about 0.1% of the time, I get a fatal AccessViolationException. I am not sure what is causing this fatal error. This is the full stack trace:
```
at System.IO.UnmanagedMemoryAccessor.ReadByte(Int64)
at Lucene.Net.Store.BufferedChecksumIndexInput.ReadByte()
at Lucene.Net.Store.DataInput.ReadInt32()
at Lucene.Net.Index.SegmentInfos+FindSegmentsFile.Run(Lucene.Net.Index.IndexCommit)
at Lucene.Net.Index.SegmentInfos.Read(Lucene.Net.Store.Directory)
at Lucene.Net.Index.StandardDirectoryReader.IsCurrent()
at Lucene.Net.Index.StandardDirectoryReader.DoOpenNoWriter(Lucene.Net.Index.IndexCommit)
at Lucene.Net.Index.DirectoryReader.OpenIfChanged(Lucene.Net.Index.DirectoryReader)
at Lucene.Net.Search.SearcherManager.RefreshIfNeeded(Lucene.Net.Search.IndexSearcher)
at Lucene.Net.Search.ReferenceManager`1[[System.__Canon, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].DoMaybeRefresh()
at Lucene.Net.Search.ReferenceManager`1[[System.__Canon, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MaybeRefresh()
...my method that calls MaybeRefresh...
```
Please note:
I have 2 separate services. One service periodically writes to the index via IndexWriter (service A), and the other searches the index and calls MaybeRefresh every 5 seconds (service B). It is service B that sees this fatal error. Service A works fine and does not have any errors. So I believe this has something to do with service B, but I am mentioning service A for full transparency in case I missed something.
If anyone can give any insight into this fatal error caused by Lucene methods, that would be appreciated!
Please also let me know of any additional details I should add to describe this error, if it helps.
Comments (1)
First of all, the error message most likely indicates you are opening the `MMapDirectory` multiple times on the same set of index files, and you are getting the exception because both instances are writing to the same memory space. I am not sure whether that can be considered a bug or not, but it should be noted that for writing, you don't need to open a RAM-intensive `MMapDirectory`; you can just use a `SimpleFSDirectory`.
That being said, the following advice will make the above point moot.
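To illustrate the point about the writer side, here is a minimal sketch of opening the index for writing through a `SimpleFSDirectory`. It assumes the Lucene.Net and Lucene.Net.Analysis.Common 4.8.0 beta packages are referenced; the class name, method name, index path, and field names are hypothetical and only serve the illustration.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class WriterSideExample
{
    // Hypothetical helper: writes a single document to the index at indexPath.
    public static void WriteOneDocument(DirectoryInfo indexPath)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        // SimpleFSDirectory goes through ordinary file streams
        // instead of memory-mapping the index files.
        using var directory = new SimpleFSDirectory(indexPath);
        using var analyzer = new StandardAnalyzer(version);
        using var writer = new IndexWriter(directory, new IndexWriterConfig(version, analyzer));

        // Field names "id" and "body" are assumptions for this sketch.
        var doc = new Document
        {
            new StringField("id", "1", Field.Store.YES),
            new TextField("body", "hello lucene", Field.Store.NO)
        };
        writer.AddDocument(doc);

        // Commit makes the change durable and visible to readers opened afterwards.
        writer.Commit();
    }

    public static void Main()
    {
        // Hypothetical index location under the system temp folder.
        var indexPath = new DirectoryInfo(Path.Combine(Path.GetTempPath(), "writer-example-index"));
        WriteOneDocument(indexPath);
        Console.WriteLine("Committed one document to " + indexPath.FullName);
    }
}
```

This only changes how the writing process accesses the files; it does not by itself address two processes sharing one index, which the options that follow deal with.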
Option 1
Typically, you should limit the number of processes opening a single index to 1. If you need to write at the same time that reads happen, you can use the near real-time search feature of Lucene.
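As a rough sketch of what near real-time search inside a single process might look like, the example below creates one `IndexWriter`, builds a `SearcherManager` from it, indexes a document, and then searches through the manager. It assumes Lucene.NET 4.8.0 beta with the Lucene.Net.Analysis.Common package; the class name, method name, index path, and the `id` field are hypothetical, and a real service would keep the writer and manager alive as singletons rather than disposing them per call.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class NearRealTimeExample
{
    // Hypothetical helper: indexes one document with the given id and
    // returns how many documents match that id afterwards.
    public static int IndexAndCount(DirectoryInfo indexPath, string id)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        using var directory = FSDirectory.Open(indexPath);
        using var analyzer = new StandardAnalyzer(version);
        using var writer = new IndexWriter(directory, new IndexWriterConfig(version, analyzer));

        // The manager is built from the writer, so refreshed searchers come
        // from the writer itself rather than from a second Directory instance.
        // Arguments: writer, applyAllDeletes, searcherFactory (null = default).
        var searcherManager = new SearcherManager(writer, true, null);
        try
        {
            writer.AddDocument(new Document { new StringField("id", id, Field.Store.YES) });
            writer.Commit();

            // Ask the manager to pick up the newly indexed document.
            searcherManager.MaybeRefresh();

            // Always pair Acquire with Release so old readers can be closed.
            IndexSearcher searcher = searcherManager.Acquire();
            try
            {
                TopDocs hits = searcher.Search(new TermQuery(new Term("id", id)), 10);
                return hits.TotalHits;
            }
            finally
            {
                searcherManager.Release(searcher);
            }
        }
        finally
        {
            searcherManager.Dispose();
        }
    }

    public static void Main()
    {
        // Hypothetical index location under the system temp folder.
        var indexPath = new DirectoryInfo(Path.Combine(Path.GetTempPath(), "nrt-example-index"));
        Console.WriteLine("Hits: " + IndexAndCount(indexPath, "1"));
    }
}
```

The design choice worth noting is that reads and writes share one process and one writer, so no second process ever opens the same index files.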
The steps involved in doing this are:
1. Create an `IndexWriter` and keep it open (register it as a singleton).
2. Create a `SearcherManager` with the `IndexWriter` as a parameter (or alternatively, use `writer.GetReader()`).
3. Use the `SearcherManager` to search.
4. Use the `IndexWriter` for indexing operations.
5. Call `Commit()` after indexing.
6. Call `searcherManager.MaybeRefresh()` after adding a document.

As pointed out in the linked tutorial, you can use `ControlledRealTimeReopenThread` to periodically refresh the `IndexReader` in the background.
Finally, to solve the problem of opening multiple `Directory` instances (which is what is ultimately causing this issue), use a single process for both writing and reading. Since writes typically happen less often than reads, I recommend doing all of this inside your searching service and then using network sockets (TCP, HTTP, etc.) to message the search service from the write service in order to write to, update, or delete from the index.
Option 2
If you want to open the same index in multiple processes, you can use the Lucene.Net.Replicator module to write your index with one service and then publish it for replication to other services.
`Lucene.Net.Replicator` is typically recommended for replicating the same index across multiple nodes in a web farm, but it can also be used for writing the index in one service and reading it in another. Essentially, for your use case you would have a separate index directory for each of your services.
However, it will also require you to build a network service to publish the updates to. The primary difference is that you wouldn't need to design a specialized web API to write/update/delete the index; instead, you could use an existing API to publish your index after it is written.
References: