Compare data between two CSV files and count how many rows have the same data
Let's say I have a list of all OUs (AllOU.csv):
NEWS
STORE
SPRINKLES
ICECREAM
I want to go through the third column, called 'column3', of a CSV file (samplefile.csv) and check whether each row matches one of the OUs listed in AllOU.csv.
Then I want to sort the rows and count how many rows each OU has.
This is how the column looks:
column3
CN=Clark Kent,OU=news,dc=company,dc=com
CN=Mary Poppins,OU=ice cream, dc=company,dc=com
CN=Mary Jane,OU=news,OU=tv,dc=company,dc=com
CN=Pepper Jack,OU=store,OU=tv,dc=company,dc=com
CN=Monty Python,OU=store,dc=company,dc=com
CN=Anne Potts,OU=sprinkles,dc=company,dc=com
I want to sort them out like this (or a list):
CN=Clark Kent,OU=news,dc=company,dc=com
CN=Mary Jane,OU=news,OU=tv,dc=company,dc=com
CN=Pepper Jack,OU=tv,OU=store,dc=company,dc=com
CN=Monty Python,OU=store,dc=company,dc=com
CN=Mary Poppins,OU=ice cream, dc=company,dc=com
CN=Anne Potts,OU=sprinkles,dc=company,dc=com
This is what the final output should be:
2, news
2, store
1, icecream
1, sprinkles
Maybe a list would be a good way of sorting them? Like this?
holdingList = [
    ['CN=Clark Kent,OU=news,dc=company,dc=com', 'CN=Mary Jane,OU=news,OU=tv,dc=company,dc=com'],
    ['CN=Pepper Jack,OU=tv,OU=store,dc=company,dc=com', 'CN=Monty Python,OU=store,dc=company,dc=com'],
    ['CN=Mary Poppins,OU=ice cream, dc=company,dc=com'],
    ['CN=Anne Potts,OU=sprinkles,dc=company,dc=com'],
]
This is what I have so far:
import pandas as pd

df = pd.read_csv('samplefile.csv', usecols=['column3'])
# File of all OUs, one per line, no header row
OUList = pd.read_csv('AllOU.csv', header=None)
for OU in OUList[0]:
    # case=False because AllOU.csv is upper case and column3 is not
    df_dept = df['column3'].str.contains(f'OU={OU}', case=False).sum()
    print(OU, df_dept)
Read your file first and create a list of objects.
[{CN:'Clark Kent',OU:'news',dc:'company',dc:'com'},…{…}]
Once you have created the list, you can convert it to a DataFrame and then use the grouping, sorting, and other abilities of pandas.
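As a rough sketch of that idea, assuming the list of objects has already been built: the records below are typed in by hand from the question's sample data, and keeping only the first OU of each row is my assumption, not something the question specifies.

```python
import pandas as pd

# Hypothetical records, one per row of column3; only the first OU is kept.
records = [
    {"CN": "Clark Kent", "OU": "news"},
    {"CN": "Mary Poppins", "OU": "ice cream"},
    {"CN": "Mary Jane", "OU": "news"},
    {"CN": "Pepper Jack", "OU": "store"},
    {"CN": "Monty Python", "OU": "store"},
    {"CN": "Anne Potts", "OU": "sprinkles"},
]

df = pd.DataFrame(records)

# Stable sort so rows with the same OU sit together in their original order.
df_sorted = df.sort_values("OU", kind="mergesort").reset_index(drop=True)

# Count the rows in each OU group, largest groups first.
counts = df.groupby("OU").size().sort_values(ascending=False)

print(df_sorted)
print(counts)
```

The sort gives the grouped listing the question asks for, and the group sizes give the per-OU counts.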
To achieve this, first read your file into a variable, say var filedata = yourFileContents. Next, split it into lines: var lines = filedata.split('\n')
Now loop over the lines and build one object per line.
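The read, split, and loop steps could look roughly like this in Python. The helper name parse_dn is made up for illustration, and filedata is a stand-in string rather than a real file read. Since OU and dc can appear more than once in a row, this sketch collects the values of each key in a list instead of overwriting them. It splits naively on commas, so it assumes no value contains an escaped comma.

```python
from collections import defaultdict


def parse_dn(line):
    """Turn 'CN=a,OU=b,OU=c,dc=d' into {'CN': ['a'], 'OU': ['b', 'c'], 'dc': ['d']}."""
    record = defaultdict(list)
    for part in line.split(","):
        key, sep, value = part.strip().partition("=")
        if sep:  # ignore fragments that have no '='
            record[key].append(value.strip())
    return dict(record)


# Stand-in for the contents of column3 (three rows from the question).
filedata = """CN=Clark Kent,OU=news,dc=company,dc=com
CN=Mary Poppins,OU=ice cream, dc=company,dc=com
CN=Mary Jane,OU=news,OU=tv,dc=company,dc=com"""

lines = filedata.split("\n")
records = [parse_dn(line) for line in lines if line.strip()]

for record in records:
    print(record)
```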
Now you can load this into a pandas DataFrame and apply sorting and grouping. Once the DataFrame is structured, you can look up each key from the other file in it and get your counts.
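A possible sketch of that lookup follows, with the two files replaced by in-memory stand-ins built from the question's sample values. Two choices here are my assumptions: each row is counted under its first OU only, and keys are compared after upper-casing and removing spaces so that "ice cream" lines up with "ICECREAM".

```python
import pandas as pd

# Stand-ins for samplefile.csv and AllOU.csv (values taken from the question).
df = pd.DataFrame({"column3": [
    "CN=Clark Kent,OU=news,dc=company,dc=com",
    "CN=Mary Poppins,OU=ice cream, dc=company,dc=com",
    "CN=Mary Jane,OU=news,OU=tv,dc=company,dc=com",
    "CN=Pepper Jack,OU=store,OU=tv,dc=company,dc=com",
    "CN=Monty Python,OU=store,dc=company,dc=com",
    "CN=Anne Potts,OU=sprinkles,dc=company,dc=com",
]})
all_ous = ["NEWS", "STORE", "SPRINKLES", "ICECREAM"]

# Pull out the first OU of each row, then normalise it to the AllOU.csv style.
first_ou = df["column3"].str.extract(r"OU=([^,]+)", expand=False)
df["ou_key"] = first_ou.str.upper().str.replace(" ", "", regex=False)

# Keep rows whose key is in the reference list and count them per key.
# reindex keeps the order of all_ous and reports 0 for OUs with no rows.
matched = df[df["ou_key"].isin(all_ous)]
counts = matched["ou_key"].value_counts().reindex(all_ous, fill_value=0)

for ou, n in counts.items():
    print(f"{n}, {ou.lower()}")
```

With real files, df and all_ous would come from pd.read_csv instead, and sorting matched by ou_key would give the grouped listing of rows.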