Serializable in-memory full-text indexing tool for Ruby
I am trying to find a way to build a full-text index stored in-memory in a format that can be safely passed through Marshal.dump/Marshal.load so I can take the index and encrypt it before storing it to disk.
My rationale for needing this functionality: I am designing a system where a user's content needs to be both encrypted using their own key, and indexed for full text searching. I realize there would be significant overhead and memory usage if for each user of the system I had to un-marshal and load the entire index of their content into memory. For this project security is far more important than efficiency.
A full text index would maintain far too many details about a user's content to leave unencrypted, and simply storing the index on an encrypted volume is insufficient as each user's index would need to be encrypted using the unique key for that user to maintain the level of security desired.
User content will be encrypted and likely stored in a traditional RDBMS. My thought is that loading/unloading the serialized index would be less overhead for a user with large amounts of content than decrypting all the DB rows belonging to them and doing a full scan for every search.
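To make the load/unload cycle described above concrete, here is a minimal sketch of sealing a marshaled index with a per-user key. The helper names seal_index and open_index are hypothetical, AES-256-GCM is my choice rather than anything the question specifies, and the key is assumed to be 32 bytes obtained elsewhere (for example, derived from the user's passphrase).

```ruby
require "openssl"

# Hypothetical helper: serialize any Marshal-safe index and encrypt it
# with the user's 32-byte key. Returns IV + auth tag + ciphertext.
def seal_index(index, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv            # 12 bytes for GCM, fresh on every write
  cipher.auth_data = ""
  ciphertext = cipher.update(Marshal.dump(index)) + cipher.final
  iv + cipher.auth_tag + ciphertext # auth tag is 16 bytes by default
end

# Hypothetical helper: verify and decrypt the blob, then un-marshal it.
# A wrong key or a tampered blob raises OpenSSL::Cipher::CipherError
# before Marshal.load ever sees the bytes.
def open_index(blob, key)
  iv         = blob.byteslice(0, 12)
  tag        = blob.byteslice(12, 16)
  ciphertext = blob.byteslice(28, blob.bytesize - 28)

  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.decrypt
  cipher.key = key
  cipher.iv = iv
  cipher.auth_tag = tag
  cipher.auth_data = ""
  Marshal.load(cipher.update(ciphertext) + cipher.final)
end
```

An authenticated mode matters here because Marshal.load should not be run on bytes that could have been modified on disk; with GCM the tag check fails first.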
My trials with ferret got me to the point of successfully creating an in-memory index. However, the index failed a Marshal.dump due to the use of Mutex. I am also evaluating xapian and solr but seem to be hitting roadblocks there as well.
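For illustration, here is a rough sketch of an index that does survive Marshal.dump even though it holds a Mutex. TinyIndex is a hypothetical toy class, not part of ferret, xapian, or solr; it keeps its postings in plain Hash and Array objects and uses Ruby's marshal_dump/marshal_load hooks to leave the Mutex out of the serialized form. Whether ferret's own index objects could be handled this way is not something established here.

```ruby
# Hypothetical toy inverted index: term => array of document ids.
class TinyIndex
  def initialize
    @postings = {}     # plain Hash; a Hash with a default proc cannot be dumped
    @lock = Mutex.new  # Marshal.dump raises TypeError on a Mutex
  end

  def add(doc_id, text)
    @lock.synchronize do
      tokenize(text).uniq.each do |term|
        (@postings[term] ||= []) << doc_id
      end
    end
  end

  # Returns ids of documents containing every term in the query.
  def search(query)
    terms = tokenize(query)
    return [] if terms.empty?
    terms.map { |term| @postings.fetch(term, []) }.reduce(:&)
  end

  # Marshal hooks: serialize only the postings, rebuild the Mutex on load.
  def marshal_dump
    @postings
  end

  def marshal_load(postings)
    @postings = postings
    @lock = Mutex.new
  end

  private

  # Naive ASCII tokenizer, just enough for the sketch.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end
```

The resulting byte string from Marshal.dump(index) can then be encrypted before it is written to disk. A real engine would also need stemming, ranking, and deletion, which this sketch ignores.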
Before I go any further I would like to know if this approach is even a sane one and what alternatives I might want to consider if it's not. I also want to know if anyone has had any success with serializing a full-text index in this manner, what tool you used, and any pointers you can provide.
Comments (1)
Why not use a standard full-text search engine and keep each client's index on a separate encrypted disk image, like TrueCrypt? Each client's disk image could have a unique key, it would use less RAM, and would probably take less time to implement.