Querying a database for similar full text
I am trying to match an OCR'd letter against a boilerplate template that will be stored in a database. I know how to compare strings to get a similarity score, but for obvious reasons I don't want to iterate over the entire table and compare every row. Is there a way to create an integer representation of a string so that the database could be queried with a range?
Ideally, I'd be able to take an OCR result of "This 1s a string" and, within a certain threshold, find a stored value in the database of "This is a string". The documents will be full-page letters.
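To make the threshold idea concrete, here is a minimal sketch of the pairwise comparison using Python's standard `difflib`. The `similarity` helper and the `THRESHOLD` value of 0.9 are assumptions chosen for illustration, not something prescribed by any particular database or OCR tool. This is the per-row comparison that would be too slow to run against every stored template.

```python
from difflib import SequenceMatcher

# Assumed cutoff for illustration; a real value would need tuning on actual OCR output.
THRESHOLD = 0.9


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a, b).ratio()


ocr_result = "This 1s a string"
stored_value = "This is a string"

score = similarity(ocr_result, stored_value)
is_match = score >= THRESHOLD
print(f"score={score:.4f} match={is_match}")
```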
My second thought is to create a boilerplate table and a child table that has a hash for each line of text. I could iterate through the OCR results and query for the hash value until I have a unique result.
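The parent/child table idea could be sketched roughly as follows, using an in-memory SQLite database. The table names (`boilerplate`, `boilerplate_line`), the whitespace and case normalization, and the use of SHA-1 are all assumptions for illustration. Instead of stopping at the first unique hit, this variant counts how many OCR lines hit each template and picks the one with the most hits. An exact hash only matches lines that were OCR'd without errors, so a damaged line such as "This 1s a string" simply contributes no vote.

```python
import hashlib
import sqlite3
from collections import Counter


def line_hash(line: str) -> str:
    """Hash a line after collapsing whitespace and lowercasing (assumed normalization)."""
    normalized = " ".join(line.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def build_db(templates: dict) -> sqlite3.Connection:
    """Create the parent table and the per-line hash child table, then load templates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE boilerplate (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE boilerplate_line (boilerplate_id INTEGER, hash TEXT)")
    conn.execute("CREATE INDEX idx_line_hash ON boilerplate_line (hash)")
    for name, text in templates.items():
        cur = conn.execute("INSERT INTO boilerplate (name) VALUES (?)", (name,))
        for line in text.splitlines():
            if line.strip():
                conn.execute(
                    "INSERT INTO boilerplate_line VALUES (?, ?)",
                    (cur.lastrowid, line_hash(line)),
                )
    return conn


def best_match(conn: sqlite3.Connection, ocr_text: str):
    """Return the name of the template sharing the most line hashes, or None."""
    votes = Counter()
    for line in ocr_text.splitlines():
        if not line.strip():
            continue
        rows = conn.execute(
            "SELECT DISTINCT boilerplate_id FROM boilerplate_line WHERE hash = ?",
            (line_hash(line),),
        )
        for (boilerplate_id,) in rows:
            votes[boilerplate_id] += 1
    if not votes:
        return None
    winner_id, _ = votes.most_common(1)[0]
    row = conn.execute("SELECT name FROM boilerplate WHERE id = ?", (winner_id,))
    return row.fetchone()[0]


# Hypothetical templates and OCR output, made up for this example.
templates = {
    "welcome": "Dear customer,\nThis is a string\nThank you for joining.",
    "overdue": "Dear customer,\nYour payment is overdue.\nPlease pay promptly.",
}
conn = build_db(templates)
ocr_text = "Dear customer,\nThis 1s a string\nThank you for joining."
print(best_match(conn, ocr_text))
```

Each lookup is an indexed equality query, so the cost grows with the number of OCR lines rather than with the number of stored templates. The trade-off is that a page where every line contains an OCR error would produce no votes at all.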
Any suggestions?