Converting multiple lines of Python output into a single DataFrame and merging it with the source data

Posted on 2025-02-07 07:26:04

I apologize if this is a rudimentary question. I feel like it should be easy but I cannot figure it out. I have the code that is listed below that essentially looks at two columns in a CSV file and matches up job titles that have a similarity of 0.7. To do this, I use difflib.get_close_matches. However, the output is multiple single lines and whenever I try to convert to a DataFrame, every single line is its own DataFrame and I cannot figure out how to merge/concat them. All code, as well as current and desired outputs are below. Any help would be much appreciated.

Current Code is:

import pandas as pd
import difflib

df = pd.read_csv('name.csv')

aLists = list(df['JTs'])

bLists = list(df['JT'])

n=3
cutoff = 0.7

for aList in aLists:
    best = difflib.get_close_matches(aList, bLists, n, cutoff)
    print(best)

Current Output is:

['SW Engineer']
['Manu Engineer']
[]
['IT Help']

Desired Output is:

          Output
0    SW Engineer
1  Manu Engineer
2        (blank)
3        IT Help

The table I am attempting to do this on is:

[Image: required table format snapshot, https://i.sstatic.net/8va9u.png]

Any help would be greatly appreciated!

Comments (2)

孤星 2025-02-14 07:26:04

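Here is a simple way to achieve this. Each list of matches is first converted to a string, the leading and trailing brackets are stripped from that string, and the result is appended to a global list. Note that the quote characters around each match remain in the stored strings.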

import pandas as pd
import difflib
import numpy as np

df = pd.read_csv('name.csv')

aLists = list(df['JTs'])

bLists = list(df['JT'])

n = 3
cutoff = 0.7

best = []

for aList in aLists:
    temp = difflib.get_close_matches(aList, bLists, n, cutoff)
    temp = str(temp)
    strippedString = temp.lstrip("[").rstrip("]")
    # print(temp)    
    best.append(strippedString)

print(best)

Output

[ 
  "'SW Engineer'", 
  "'Manu Engineer'", 
  '', 
  "'IT Help'"
]
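
As a hedged alternative to stripping brackets from `str(temp)`, the matches can be joined directly, which avoids the leftover quote characters and yields an empty string when nothing matches. This is only a sketch: the inline DataFrame is a hypothetical stand-in for `name.csv`, built from the sample rows shown elsewhere on this page, and the names `candidates`, `rows` and `output` are made up for illustration.

```python
import difflib

import pandas as pd

# Hypothetical stand-in for pd.read_csv('name.csv'), using the sample rows from this page.
df = pd.DataFrame({
    "JTs": ["Software Engineer", "Manufacturing Engineer",
            "Human Resource Manager", "IT Help Desk"],
    "JT": ["Manu Engineer", "SW Engineer", "IT Help", "f"],
})

candidates = df["JT"].tolist()

rows = []
for title in df["JTs"]:
    matches = difflib.get_close_matches(title, candidates, n=3, cutoff=0.7)
    # Join the matches into one string; an empty list becomes "" (a blank cell).
    rows.append(", ".join(matches))

# One DataFrame with one row per source row, sharing the source index.
output = pd.DataFrame({"Output": rows}, index=df.index)
print(output)
```

Joining with ", " keeps every match when more than one passes the cutoff; using `matches[0]` instead would keep only the closest one.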

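Here is another, better way to achieve this.
You can simply use numpy to concatenate the per-row lists into a single array, and then convert it to a regular Python list if you want. Note that an empty match list contributes no elements, so the result no longer lines up row for row with the source column.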

import pandas as pd
import difflib
import numpy as np

df = pd.read_csv('name.csv')
aLists = list(df['JTs'])
bLists = list(df['JT'])
n = 3
cutoff = 0.7
best = []
for aList in aLists:
    temp = difflib.get_close_matches(aList, bLists, n, cutoff)
    best.append(temp)

# print(best)
# Use concatenate() to join all of the match lists into one array
combinedNumpyArray = np.concatenate(best)
# Convert the NumPy array to a regular Python list
normalArray = combinedNumpyArray.tolist()
print(normalArray)

Output

['SW Engineer', 'Manu Engineer', 'IT Help']
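
If the flattened result needs to stay aligned with the source rows, one possible sketch is to keep the match lists in a pandas Series and call `explode()`, which repeats the row index for each match and leaves a missing value where a list was empty. The inline DataFrame is again a hypothetical stand-in for `name.csv`, and `candidates`, `match_lists` and `long_form` are illustrative names only.

```python
import difflib

import pandas as pd

# Hypothetical stand-in for pd.read_csv('name.csv'), using the sample rows from this page.
df = pd.DataFrame({
    "JTs": ["Software Engineer", "Manufacturing Engineer",
            "Human Resource Manager", "IT Help Desk"],
    "JT": ["Manu Engineer", "SW Engineer", "IT Help", "f"],
})

candidates = df["JT"].tolist()

# One list of matches per source row (possibly empty).
match_lists = df["JTs"].apply(
    lambda title: difflib.get_close_matches(title, candidates, n=3, cutoff=0.7)
)

# explode() gives one row per match and keeps the original index;
# an empty list turns into a single missing value instead of disappearing.
long_form = match_lists.explode()
print(long_form)
```

Unlike the flat list above, the row with no match is still present here, so the result can be joined back to `df` on its index.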

Thanks

め可乐爱微笑 2025-02-14 07:26:04

You could use pandas' .apply() to run your function on each entry. The result can then either be added to the existing DataFrame as a new column or used to create a new DataFrame.

For example:

import pandas as pd
import difflib


def get_best_match(word):
    matches = difflib.get_close_matches(word, JT, n, cutoff)
    return matches[0] if matches else None

df = pd.read_csv('name.csv')
JT = df['JT']

n = 3
cutoff = 0.7

df['Output'] = df['JTs'].apply(get_best_match)

Or for a new dataframe:

df_output = pd.DataFrame({'Output' : df['JTs'].apply(get_best_match)})

Giving you:

                      JTs             JT         Output
0       Software Engineer  Manu Engineer    SW Engineer
1  Manufacturing Engineer    SW Engineer  Manu Engineer
2  Human Resource Manager        IT Help           None
3            IT Help Desk              f        IT Help

Or:

          Output
0    SW Engineer
1  Manu Engineer
2           None
3        IT Help
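
For the "merge with the source data" part of the question, a separate output DataFrame can be placed next to the source columns with `pd.concat(..., axis=1)`, and `fillna("")` turns the `None` entries into blanks. The following is a sketch under assumptions, not the answerer's code: the inline DataFrame stands in for `name.csv`, and this variant of `get_best_match` takes the candidate list and thresholds as parameters instead of reading globals.

```python
import difflib

import pandas as pd

# Hypothetical stand-in for pd.read_csv('name.csv'), using the sample rows from this page.
df = pd.DataFrame({
    "JTs": ["Software Engineer", "Manufacturing Engineer",
            "Human Resource Manager", "IT Help Desk"],
    "JT": ["Manu Engineer", "SW Engineer", "IT Help", "f"],
})


def get_best_match(word, possibilities, n=3, cutoff=0.7):
    """Return the closest match for word, or None if nothing passes the cutoff."""
    matches = difflib.get_close_matches(word, possibilities, n, cutoff)
    return matches[0] if matches else None


candidates = df["JT"].tolist()

# Build the output as its own DataFrame, sharing the source index.
df_output = pd.DataFrame(
    {"Output": df["JTs"].apply(lambda word: get_best_match(word, candidates))}
)

# Put the output next to the source columns, aligned on the index.
merged = pd.concat([df, df_output], axis=1)

# Show rows without a match as blank rather than None.
merged["Output"] = merged["Output"].fillna("")
print(merged)
```

Because both frames share the same index, `df.join(df_output)` would be an equivalent way to combine them.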