Compare data between two CSV files and count how many rows have the same data

Posted 2025-01-14 06:51:24

Let's say I have a list of all OUs (AllOU.csv):

NEWS
STORE
SPRINKLES
ICECREAM

I want to look through a csv file (samplefile.csv), at its third column, called 'column3', and check for each row whether it matches one of the OUs listed in AllOU.csv.
Then I want to group the rows by OU and count how many rows each OU has.

This is how the column looks:

column3
CN=Clark Kent,OU=news,dc=company,dc=com
CN=Mary Poppins,OU=ice cream, dc=company,dc=com
CN=Mary Jane,OU=news,OU=tv,dc=company,dc=com
CN=Pepper Jack,OU=store,OU=tv,dc=company,dc=com
CN=Monty Python,OU=store,dc=company,dc=com
CN=Anne Potts,OU=sprinkles,dc=company,dc=com

I want to sort them out like this (or a list):

CN=Clark Kent,OU=news,dc=company,dc=com
CN=Mary Jane,OU=news,OU=tv,dc=company,dc=com

CN=Pepper Jack,OU=tv,OU=store,dc=company,dc=com
CN=Monty Python,OU=store,dc=company,dc=com

CN=Mary Poppins,OU=ice cream, dc=company,dc=com

CN=Anne Potts,OU=sprinkles,dc=company,dc=com

This is what the final output should be:

2, news
2, store
1, icecream
1, sprinkles

Maybe a list would be a good way of sorting them? Like this?

holdingList = [
    ['CN=Clark Kent,OU=news,dc=company,dc=com', 'CN=Mary Jane,OU=news,OU=tv,dc=company,dc=com'],
    ['CN=Pepper Jack,OU=tv,OU=store,dc=company,dc=com', 'CN=Monty Python,OU=store,dc=company,dc=com'],
    ['CN=Mary Poppins,OU=ice cream, dc=company,dc=com'],
    ['CN=Anne Potts,OU=sprinkles,dc=company,dc=com'],
]

I had something like this so far:

import pandas as pd

# only the third column is needed
df = pd.read_csv('samplefile.csv', usecols=['column3'])

# file of all OUs, one per line, no header row
OUList = pd.read_csv('AllOU.csv', header=None)

for OU in OUList[0]:
    # number of rows whose column3 contains this OU (the match is case-sensitive)
    df_dept = df['column3'].str.contains(f'OU={OU}').sum()
    print(OU, df_dept)

日裸衫吸 2025-01-21 06:51:24

Read your file first and create a list of objects.
[{CN:'Clark Kent',OU:'news',dc:'company',dc:'com'},…{…}]

Once you have created the list, you can convert it to a pandas DataFrame and then use pandas' grouping, sorting and other features.
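
As a rough illustration of that conversion (a sketch, not the answerer's own code): assuming each parsed row ends up as a dict with CN and OU keys, pd.DataFrame accepts the list directly and groupby can count rows per OU. The records below are hand-written from the question's sample rows, and keeping only the first OU of each row is an assumption made for the example.

```python
import pandas as pd

# Hand-written records based on the question's sample rows.
# Assumption: only the first OU of each row is kept.
records = [
    {"CN": "Clark Kent", "OU": "news"},
    {"CN": "Mary Poppins", "OU": "ice cream"},
    {"CN": "Mary Jane", "OU": "news"},
    {"CN": "Pepper Jack", "OU": "store"},
    {"CN": "Monty Python", "OU": "store"},
    {"CN": "Anne Potts", "OU": "sprinkles"},
]

df = pd.DataFrame(records)

# Count rows per OU, largest groups first.
counts = df.groupby("OU").size().sort_values(ascending=False)
print(counts)
```

Note that "ice cream" here still differs from the "ICECREAM" entry in AllOU.csv, so some normalization would be needed before comparing against that list.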

To do this, first read your file's contents into a variable, say filedata. Next, split it into lines: lines = filedata.split('\n')
Now loop over each line:

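dataList = []
for line in lines:
    item = dict()
    elements = line.split(',')
    for element in elements:
        # skip pieces without '=' (e.g. the 'column3' header or a blank line)
        if '=' not in element:
            continue
        key, value = element.strip().split('=', 1)
        # note: a repeated key (OU, dc) overwrites the earlier value
        item[key] = value
    # append once per line, after all of its elements have been parsed
    if item:
        dataList.append(item)
print(dataList)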

Now you can load this into a pandas DataFrame and apply sorting and grouping. Once the DataFrame is structured, you can look up each key from the other file (AllOU.csv) in it and get your counts.
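
One possible way to put the pieces together is sketched below. It is an illustration under stated assumptions, not a definitive solution: the two files are replaced by in-memory stand-ins so the example runs on its own, the DN values are assumed to be quoted in the CSV (they contain commas), and the helper names normalize and ous_in are made up for this example. Matching ignores case and whitespace so that "ice cream" lines up with "ICECREAM", and a row counts toward every listed OU it contains.

```python
import io
import re

import pandas as pd

# In-memory stand-ins for samplefile.csv and AllOU.csv (assumption: the
# DN values are quoted in the real file, since they contain commas).
sample_csv = io.StringIO(
    'column3\n'
    '"CN=Clark Kent,OU=news,dc=company,dc=com"\n'
    '"CN=Mary Poppins,OU=ice cream, dc=company,dc=com"\n'
    '"CN=Mary Jane,OU=news,OU=tv,dc=company,dc=com"\n'
    '"CN=Pepper Jack,OU=store,OU=tv,dc=company,dc=com"\n'
    '"CN=Monty Python,OU=store,dc=company,dc=com"\n'
    '"CN=Anne Potts,OU=sprinkles,dc=company,dc=com"\n'
)
all_ou_csv = io.StringIO("NEWS\nSTORE\nSPRINKLES\nICECREAM\n")

df = pd.read_csv(sample_csv, usecols=["column3"])
all_ous = pd.read_csv(all_ou_csv, header=None)[0]


def normalize(name):
    """Upper-case and drop whitespace so 'ice cream' matches 'ICECREAM'."""
    return re.sub(r"\s+", "", str(name)).upper()


def ous_in(dn):
    """Return the set of normalized OU values found in one DN string."""
    return {normalize(v) for v in re.findall(r"OU=([^,]+)", dn, flags=re.IGNORECASE)}


# One set of normalized OUs per row.
row_ous = df["column3"].map(ous_in)

# For each OU listed in AllOU.csv, collect the rows that contain it.
groups = {}
for ou in all_ous.map(normalize):
    groups[ou] = df.loc[row_ous.map(lambda s: ou in s), "column3"].tolist()

# Print "count, ou", largest groups first, ties broken alphabetically.
for ou, rows in sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0])):
    print(f"{len(rows)}, {ou.lower()}")
```

The groups dict plays the role of the holdingList from the question: each value is the list of rows for one OU, and its length is the count. OUs that are not in AllOU.csv, such as tv, are simply ignored.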
