Reading delimiter-separated data, stripping whitespace, then counting duplicates



I have a program I'm working on where I need to read a .txt file which has multiple rows of data that look like this:

[ABC/DEF//25GHI////JKLM//675//]

My program below can print each sequence on a new line for analysis, however the function is where I'm having issues. I can get it to remove the individual numerical values "675" and leave the alphanumeric ones. (Removes 675 from sample)

a = "string.txt"
file1 = open(a, "r")
with open(a, 'r') as file:
  lines = [line.rstrip('\n') for line in file]
  print(*lines, sep = "\n")

cleaned_data = []
def split_lines(lines, delimiter, remove = '[0-9]+

This then prints out separated rows like this, removing the
white spaces (where "/" was, and numerical values)

["ABC", "DEF", "25GHI", "JKLM"]

What I'm trying to do is then use the "cleaned_data" list that contains these newly delimited rows, and quantify them to output this:

4x ["ABC", "DEF", "25GHI", "JKLM"]

What can I do next using "cleaned_data" to read each row and print a count of duplicate strings?

): for line in lines: tokens = line.split(delimiter) tokens = [re.sub(remove, "", token) for token in tokens] clean_list = list(filter(lambda e:e.strip(), tokens)) cleaned_data.append(clean_list) print(clean_list) # Quick check if function works split_lines(lines, "/")
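One option would be collections.Counter (a minimal sketch; the cleaned_data rows below are hypothetical stand-ins, and each row is converted to a tuple because lists aren't hashable):

from collections import Counter

# Hypothetical rows shaped like the cleaned_data output above
cleaned_data = [["ABC", "DEF", "25GHI", "JKLM"]] * 4 + [["XYZ", "99Q"]]

# Count tuple versions of each row, since lists can't be dictionary keys
row_counts = Counter(tuple(row) for row in cleaned_data)
for row, count in row_counts.items():
    print(f"{count}x {list(row)}")  # e.g. 4x ['ABC', 'DEF', '25GHI', 'JKLM']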



Comments (2)

擦肩而过的背影 2025-02-02 09:01:21
from pprint import pprint

unique_data = {}
cleaned_data = [1, 2, 3, 4, 5, 'a', 'b', 'c', 'd', 3, 4, 5, 'a', 'b', [1, 2], [1, 2]]
for item in cleaned_data:
    key = str(item) # convert mutable objects like list to immutable string.
    if not unique_data.get(key):  # key does not exist
        unique_data[key] = 1, item  # Add count of 1 and the data
    else:  # A duplicate has been encountered
        # Increment the count
        unique_data[key] = (unique_data[key][0] + 1), item

for k, v in unique_data.items():
    print(f"{v[0]}:{v[1]}")

Output:

1:1
1:2
2:3
2:4
2:5
2:a
2:b
1:c
1:d
2:[1, 2]
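
Applied to rows shaped like the question's cleaned_data, the same str()-keyed dictionary gives the "Nx [row]" output the question asks for (a sketch; the rows variable below is a hypothetical stand-in):

# Hypothetical rows standing in for the question's cleaned_data
rows = [["ABC", "DEF", "25GHI", "JKLM"]] * 4 + [["XYZ"]]

unique_rows = {}
for row in rows:
    key = str(row)  # lists aren't hashable, so key on their string form
    count = unique_rows.get(key, (0, row))[0]
    unique_rows[key] = count + 1, row

for count, row in unique_rows.values():
    print(f"{count}x {row}")  # 4x ['ABC', 'DEF', '25GHI', 'JKLM'] / 1x ['XYZ']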
遥远的绿洲 2025-02-02 09:01:21


If you just need to get rid of duplicates:

    deduped_row_of_cleaned_data = list(set(row_of_cleaned_data))

If you need to know how many duplicates there are, just subtract len(deduped_row_of_cleaned_data) from len(row_of_cleaned_data).

If you need a count of every item, you can build a dictionary that maps each unique value from the deduped row to its own (initially empty) list:

    # A dict comprehension, so each key gets its own list
    # (dict.fromkeys(keys, []) would make every key share one list)
    empty_dict = {key: [] for key in set(row_of_cleaned_data)}

Then loop through the list to add each value:

    for item in row_of_cleaned_data:
        empty_dict[item].append(item)

Then loop through the dictionary to get the counts:

    for key, value in empty_dict.items():
        empty_dict[key] = len(value)

After that, you have the deduped data in

    list(empty_dict.keys()) 

and counts of each item in

    list(empty_dict.values()).
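
Putting those steps together (a runnable sketch; row_of_cleaned_data below is a hypothetical single row standing in for one entry of cleaned_data):

    row_of_cleaned_data = ["ABC", "DEF", "ABC", "JKLM", "ABC"]  # hypothetical row

    # Deduplicate the row
    deduped_row_of_cleaned_data = list(set(row_of_cleaned_data))
    # Number of duplicates in the row
    print(len(row_of_cleaned_data) - len(deduped_row_of_cleaned_data))  # 2

    # Map each unique value to its own (initially empty) list
    empty_dict = {key: [] for key in set(row_of_cleaned_data)}
    for item in row_of_cleaned_data:
        empty_dict[item].append(item)

    # Replace each list with its length, i.e. the item's count
    for key, value in empty_dict.items():
        empty_dict[key] = len(value)

    print(list(empty_dict.keys()))    # deduped items, e.g. ['DEF', 'ABC', 'JKLM']
    print(list(empty_dict.values()))  # matching counts, e.g. [1, 3, 1]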