Program to read data separated by delimiters, remove white space, then count
I have a program I'm working on where I need to read a .txt file which has multiple rows of data that look like this:
[ABC/DEF//25GHI////JKLM//675//]
My program below prints each sequence on a new line for analysis, but the function is where I'm having issues. I can get it to remove standalone numeric values and keep the alphanumeric ones (it removes "675" from the sample).
```python
import re

a = "string.txt"

# Read every line of the file, stripping the trailing newline
with open(a, 'r') as file:
    lines = [line.rstrip('\n') for line in file]
print(*lines, sep="\n")

cleaned_data = []

def split_lines(lines, delimiter, remove='[0-9]+$'):
    for line in lines:
        tokens = line.split(delimiter)
        # Strip trailing digits from each token: "675" becomes "", "25GHI" is unchanged
        tokens = [re.sub(remove, "", token) for token in tokens]
        # Drop the empty tokens left by consecutive delimiters and removed numbers
        clean_list = list(filter(lambda e: e.strip(), tokens))
        cleaned_data.append(clean_list)
        print(clean_list)  # Quick check if function works

split_lines(lines, "/")
```
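To see why "675" disappears while "25GHI" survives, here is the same per-line logic applied to the sample row on its own. This is a minimal sketch: `clean_row` is a hypothetical helper name, and it assumes the square brackets in the sample are only display and not part of the file.

```python
import re

def clean_row(line, delimiter="/", remove=r"[0-9]+$"):
    """Hypothetical helper: the per-line body of split_lines, returning the kept tokens."""
    tokens = line.split(delimiter)
    # "[0-9]+$" matches digits only at the END of a token:
    # "675" becomes "" while "25GHI" (leading digits) is left alone
    tokens = [re.sub(remove, "", token) for token in tokens]
    # Empty strings come from runs like "//" and from fully numeric tokens
    return [token for token in tokens if token.strip()]

sample = "ABC/DEF//25GHI////JKLM//675//"
print(sample.split("/"))
# ['ABC', 'DEF', '', '25GHI', '', '', '', 'JKLM', '', '675', '', '']
print(clean_row(sample))
# ['ABC', 'DEF', '25GHI', 'JKLM']
```

Because the pattern is anchored only at the end, a token such as "ABC12" would also be trimmed to "ABC". If only purely numeric tokens should be dropped, a pattern anchored at both ends, r"^[0-9]+$", leaves "ABC12" intact.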
This then prints out separated rows like this, with the empty strings (where the "/" delimiters and numeric values were) removed:
["ABC", "DEF", "25GHI", "JKLM"]
What I'm trying to do next is use the "cleaned_data" list, which contains these newly delimited rows, and count them to produce output like this:
4x ["ABC", "DEF", "25GHI", "JKLM"]
What can I do next using "cleaned_data" to read each row and print a count of duplicate strings?
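One way to get the "4x [...]" output is to count identical rows with the standard library's `collections.Counter`. Lists are unhashable, so each row is counted as a tuple. This sketch assumes "4x" means the same cleaned row appears four times in `cleaned_data`; the rows below are made up for illustration, and `json.dumps` is used only to print the row with double quotes as in the desired output.

```python
import json
from collections import Counter

# Illustrative stand-in for cleaned_data as built by split_lines (made-up rows)
cleaned_data = [
    ["ABC", "DEF", "25GHI", "JKLM"],
    ["ABC", "DEF", "25GHI", "JKLM"],
    ["XYZ", "QRS"],
    ["ABC", "DEF", "25GHI", "JKLM"],
    ["ABC", "DEF", "25GHI", "JKLM"],
]

# Lists cannot be dictionary keys, so count each row as a tuple
row_counts = Counter(tuple(row) for row in cleaned_data)

# most_common() yields rows from most to least frequent
report = [f"{count}x {json.dumps(list(row))}" for row, count in row_counts.most_common()]
print(*report, sep="\n")
# 4x ["ABC", "DEF", "25GHI", "JKLM"]
# 1x ["XYZ", "QRS"]
```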
If you just need to get rid of duplicates:
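A minimal sketch of this step, assuming `row_of_cleaned_data` is one row from `cleaned_data` (the example values are made up). `dict.fromkeys` keeps the first occurrence of each string in its original order; `set(row)` also works if order does not matter.

```python
# Made-up example row containing repeated strings
row_of_cleaned_data = ["ABC", "DEF", "ABC", "25GHI", "DEF", "JKLM"]

# dict keys are unique and keep insertion order (Python 3.7+)
deduped_row_of_cleaned_data = list(dict.fromkeys(row_of_cleaned_data))
print(deduped_row_of_cleaned_data)  # ['ABC', 'DEF', '25GHI', 'JKLM']
```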
If you need to know how many duplicates there are, just subtract len(deduped_row_of_cleaned_data) from len(row_of_cleaned_data).
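The subtraction as a short sketch, using the same made-up row and the order-preserving dedupe via `dict.fromkeys` (an assumption; any dedupe that keeps one copy of each string gives the same length):

```python
row_of_cleaned_data = ["ABC", "DEF", "ABC", "25GHI", "DEF", "JKLM"]
deduped_row_of_cleaned_data = list(dict.fromkeys(row_of_cleaned_data))

# Every element beyond the first occurrence of a string is a duplicate
duplicate_count = len(row_of_cleaned_data) - len(deduped_row_of_cleaned_data)
print(duplicate_count)  # 2
```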
If you need a count of every duplicate, you can create a dictionary from your deduped row, with each item as a key whose value starts at zero:
Then loop through the original row, adding one to the count for each value:
Then loop through the dictionary to get the counts:
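The three steps above can be sketched as follows. The dictionary name `counts` and the example row are assumptions for illustration, and "assigned empty dictionary" is read here as a dictionary whose keys are the deduped items, each starting at zero.

```python
row_of_cleaned_data = ["ABC", "DEF", "ABC", "25GHI", "DEF", "JKLM", "ABC"]
deduped_row_of_cleaned_data = list(dict.fromkeys(row_of_cleaned_data))

# Step 1: one key per distinct string, each starting at 0
counts = dict.fromkeys(deduped_row_of_cleaned_data, 0)

# Step 2: walk the original row and increment the matching key
for item in row_of_cleaned_data:
    counts[item] += 1

# Step 3: walk the dictionary to report each count
for item, count in counts.items():
    print(f"{item}: {count}")
```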
After that, you have the deduped data in
and counts of each item in
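As an alternative to the manual dictionary loop, the standard library's `collections.Counter` builds the same per-item counts in one call. A sketch with a made-up row:

```python
from collections import Counter

row_of_cleaned_data = ["ABC", "DEF", "ABC", "25GHI", "DEF", "JKLM", "ABC"]

# Counter maps each distinct string to how many times it occurs
counts = Counter(row_of_cleaned_data)
print(counts["ABC"])  # 3

# Keep only the strings that occur more than once
duplicates = {item: n for item, n in counts.items() if n > 1}
print(duplicates)  # {'ABC': 3, 'DEF': 2}
```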