添加到 Whoosh 索引时出现奇怪的错误
任何人都可以帮助我解决在向 Whoosh 索引添加新文档时遇到的这个奇怪错误吗?
这是代码:
def add_to_index(self, doc):
ix = index.open_dir(self.index_dir)
writer = AsyncWriter(ix) # use async writer to prevent write lock errors
writer.add_document(**self.get_doc_args(doc))
writer.commit()
def get_doc_args(self, doc):
return {
'id': u""+str(doc['id']),
'org': doc['org__id'],
'created': doc['created_date'],
'date': doc['received_date'],
'from_addr': doc['from_addr'],
'subject': doc['subject'],
'body': doc['messagebody__cleaned_message']
}
我收到以下错误:
TypeError('ord() expected a character, but string of length 0 found',)
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/celery/execute/trace.py", line 36, in trace
return cls(states.SUCCESS, retval=fun(*args, **kwargs))
File "/usr/local/lib/python2.6/dist-packages/celery/app/task/__init__.py", line 232, in __call__
return self.run(*args, **kwargs)
File "/usr/local/lib/python2.6/dist-packages/celery/app/__init__.py", line 172, in run
return fun(*args, **kwargs)
File "/mnt/deploy/prod/chorus/src/chorus/../chorus/search/__init__.py", line 131, in index_message
MessageSearcher().add_to_index(message)
File "/mnt/deploy/prod/chorus/src/chorus/../chorus/search/__init__.py", line 29, in add_to_index
writer.commit()
File "/usr/local/lib/python2.6/dist-packages/whoosh/writing.py", line 423, in commit
self.writer.commit(*args, **kwargs)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filewriting.py", line 501, in commit
new_segments = mergetype(self, self.segments)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filewriting.py", line 78, in MERGE_SMALL
reader = SegmentReader(writer.storage, writer.schema, seg)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filereading.py", line 63, in __init__
self.termsindex = TermIndexReader(tf)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filetables.py", line 590, in __init__
super(TermIndexReader, self).__init__(dbfile)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filetables.py", line 502, in __init__
OrderedHashReader.__init__(self, dbfile)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filetables.py", line 379, in __init__
HashReader.__init__(self, dbfile)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filetables.py", line 187, in __init__
self.hashtype = dbfile.read_byte()
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/structfile.py", line 219, in read_byte
return ord(self.file.read(1))
奇怪的是,使用标准编写器(即不是 AsyncWriter)的完全相同的代码工作得很好。我在这里缺少什么?请注意,在生产中我必须使用 AsyncWriter 以避免 LockErrors。
Can anyone help me with this strange error I'm getting when adding a new document to a Whoosh index?
Here's the code:
def add_to_index(self, doc):
ix = index.open_dir(self.index_dir)
writer = AsyncWriter(ix) # use async writer to prevent write lock errors
writer.add_document(**self.get_doc_args(doc))
writer.commit()
def get_doc_args(self, doc):
return {
'id': u""+str(doc['id']),
'org': doc['org__id'],
'created': doc['created_date'],
'date': doc['received_date'],
'from_addr': doc['from_addr'],
'subject': doc['subject'],
'body': doc['messagebody__cleaned_message']
}
I get the following error:
TypeError('ord() expected a character, but string of length 0 found',)
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/celery/execute/trace.py", line 36, in trace
return cls(states.SUCCESS, retval=fun(*args, **kwargs))
File "/usr/local/lib/python2.6/dist-packages/celery/app/task/__init__.py", line 232, in __call__
return self.run(*args, **kwargs)
File "/usr/local/lib/python2.6/dist-packages/celery/app/__init__.py", line 172, in run
return fun(*args, **kwargs)
File "/mnt/deploy/prod/chorus/src/chorus/../chorus/search/__init__.py", line 131, in index_message
MessageSearcher().add_to_index(message)
File "/mnt/deploy/prod/chorus/src/chorus/../chorus/search/__init__.py", line 29, in add_to_index
writer.commit()
File "/usr/local/lib/python2.6/dist-packages/whoosh/writing.py", line 423, in commit
self.writer.commit(*args, **kwargs)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filewriting.py", line 501, in commit
new_segments = mergetype(self, self.segments)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filewriting.py", line 78, in MERGE_SMALL
reader = SegmentReader(writer.storage, writer.schema, seg)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filereading.py", line 63, in __init__
self.termsindex = TermIndexReader(tf)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filetables.py", line 590, in __init__
super(TermIndexReader, self).__init__(dbfile)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filetables.py", line 502, in __init__
OrderedHashReader.__init__(self, dbfile)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filetables.py", line 379, in __init__
HashReader.__init__(self, dbfile)
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/filetables.py", line 187, in __init__
self.hashtype = dbfile.read_byte()
File "/usr/local/lib/python2.6/dist-packages/whoosh/filedb/structfile.py", line 219, in read_byte
return ord(self.file.read(1))
Strangely, the exact same code using a standard writer (i.e. not AsyncWriter) works just fine. What am I missing here? Note that in production I have to use AsyncWriter in order to avoid LockErrors.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
此错误是由某种索引损坏引起的。就我而言,在重建索引时,机器因另一个原因崩溃了。
您可以通过完全删除 whoosh_index 文件夹内容并重建索引来轻松解决此问题。
This error is caused by some kind of index corruption. In my case the machine crashed by another reason while index was being rebuild.
You can easily solve it by deleting whoosh_index folder contents completely and rebuilidng index.
最终找到了解决方案;它称为 Solr :-)
Ended up finding a solution; it's called Solr :-)