Python: Merge 2 dataframes on 3 matching columns using fuzzy logic
I have 2 Excel sheets, A and B.
Sheet A has column A with product name and dose type, column B with size, and column C with country.
Sheet B has a single column A that combines product, dose type, size, and country abbreviation.
Sheet 1 columns:
name | size | Country
Brand Actified 100 mg/30 mg syrup | 21 | France
Sheet 2 Column:
Clubbed field
BRANDACTI 100mg/30mg 21 FR
In df2, a single common field holds the product, size, and country abbreviation.
This is just a direct example; the real data is not consistent between the two tables: some values are missing, and others are in a different format.
Solution I tried:
I fuzzy matched each column separately, then combined all 3 columns into one.
I cropped the product name into a separate column, kept the existing column with size, and cropped the country code into another column.
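The cropping step can be sketched roughly as below. This is only an illustration, and it assumes the clubbed field always ends with the size and then the country code as its last two whitespace-separated tokens, which the real data may not guarantee. The helper name `split_clubbed` is hypothetical.

```python
def split_clubbed(field):
    """Split 'NAME ... SIZE CC' into (name, size, country_code).

    Assumption: the last two tokens are the size and the country code.
    """
    parts = field.strip().rsplit(maxsplit=2)
    if len(parts) < 3:
        # Too few tokens to split safely; keep everything as the name.
        return field.strip(), None, None
    name, size, country = parts
    return name, size, country.upper()


print(split_clubbed("BRANDACTI 100mg/30mg 21 FR"))
```

Rows where the size or country is missing come back with `None` in those positions, so they can be reviewed separately instead of being matched blindly.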
But the issue is that merging all 3 columns into one string produces many more combinations that score above the threshold, because the string is long. For example, both dataframes have 4 matching rows that differ only slightly in size (say by 2 or 4 units); combining them gives 16 rows, because every combination scores high when the only difference in a long string is a number.
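To illustrate the effect, here is a small sketch using the standard library's `difflib.SequenceMatcher` as a stand-in for whatever fuzzy scorer is actually used (that substitution is an assumption, and the two sample strings are made up from the example above). The combined strings differ only in the size, so the whole-string score stays very high, while the size compared on its own clearly differs.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    # Similarity in [0, 1]; 1.0 means the strings are identical.
    return SequenceMatcher(None, a, b).ratio()


combined_a = "brandacti 100mg/30mg 21 fr"
combined_b = "brandacti 100mg/30mg 25 fr"  # same product, different size

whole_score = ratio(combined_a, combined_b)
size_score = ratio("21", "25")

print(f"combined: {whole_score:.2f}, size only: {size_score:.2f}")
```

With a threshold of, say, 0.9 on the combined string, both sizes would be accepted as a match, which is how 4 rows on each side can turn into 16 pairs.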
Is there a way I can merge the 2 dataframes based on data matching in all 3 columns with a fuzzy score?
In my case: combine the dataframes based on values matching in all 3 columns (name, size, and country) with a fuzzy threshold score.
What is the best possible solution for this?
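One possible direction, sketched under several assumptions rather than offered as a definitive answer: score each column on its own instead of scoring one concatenated string. In the sketch below the country and size must agree exactly after normalization, only the name is fuzzy matched, and just the best-scoring candidate per row is kept. The `COUNTRY_CODES` mapping, the `match_rows` helper, the 0.7 threshold, and the sample rows are all hypothetical; rows are plain dicts (for example what `DataFrame.to_dict("records")` returns), and `difflib` stands in for a dedicated fuzzy-matching library.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mapping from full country names to abbreviations.
COUNTRY_CODES = {"france": "FR", "germany": "DE"}


def norm(text):
    # Lowercase and drop everything except letters and digits.
    return re.sub(r"[^a-z0-9]", "", str(text).lower())


def name_score(a, b):
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def match_rows(rows_a, rows_b, name_threshold=0.7):
    """Pair each row of A with its best candidate in B, column by column."""
    matches = []
    for a in rows_a:
        best = None
        for b in rows_b:
            # Country and size must agree exactly; only the name is fuzzy.
            if COUNTRY_CODES.get(a["country"].lower()) != b["country"]:
                continue
            if norm(a["size"]) != norm(b["size"]):
                continue
            score = name_score(a["name"], b["name"])
            if score >= name_threshold and (best is None or score > best[1]):
                best = (b, score)
        if best is not None:
            matches.append((a, best[0], round(best[1], 2)))
    return matches


rows_a = [
    {"name": "Brand Actified 100 mg/30 mg syrup", "size": 21, "country": "France"},
    {"name": "Brand Actified 100 mg/30 mg syrup", "size": 25, "country": "France"},
]
rows_b = [
    {"name": "BRANDACTI 100mg/30mg", "size": "21", "country": "FR"},
    {"name": "BRANDACTI 100mg/30mg", "size": "25", "country": "FR"},
]

for a, b, score in match_rows(rows_a, rows_b):
    print(a["size"], "->", b["size"], score)
```

Because the size is compared separately, the two sample rows pair up one-to-one instead of producing all four combinations. If sizes can legitimately differ in format or by a small tolerance, the exact comparison could be swapped for a numeric tolerance or a per-column fuzzy threshold.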