数据集成问题——如何集成相似的实体
我有一个数据库,在同一个表中有非常相似的行。这些行很相似,因为它们具有几乎相同的列值。我需要将这些相应的行集成到一行中。
例如,这两个用户(u1 和 u2)应该集成:
u1 = User(name = "William Henry Gates III",
age = 55,
nationality = "american",
alma_mater = "Harvard Univesity")
u2 = User(name: "William Henry 'Bill' Gates III",
age: 55,
nationality: "America",
alma_mater: "Harvard U.")
我正在考虑使用一些编辑距离 和词干提取技术。其他算法和技术建议?有什么有用的库可以使用(最好是 Python 或 Java)?
I have a database which has very similar rows within the same table. Those rows are similar because they have nearly equal column values. I need to integrate those corresponding rows into one single row.
For example, those two users (u1 and u2) should be integrated:
u1 = User(name = "William Henry Gates III",
age = 55,
nationality = "american",
alma_mater = "Harvard Univesity")
u2 = User(name: "William Henry 'Bill' Gates III",
age: 55,
nationality: "America",
alma_mater: "Harvard U.")
I am thinking of using some edit distance and stemming techniques. Other algorithms and techniques suggestions? Any helpful libraries to use (preferably in Python or Java)?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
考虑过类似 Refine 的东西吗?
Considered something like Refine?