Faster way to loop through a JSON file and remove specific strings?

Posted 2025-01-31 11:18:22


I have a JSON file of the following form:

    {'query': {'tool': 'domainquery', 'query': 'example.org'},
     'response': {'result_count': '1',
                  'total_pages': '1',
                  'current_page': '1',
                  'matches': [{'domain': 'example2.org',
                               'created_date': '2015-07-25',
                               'registrar': 'registrar_10'}]}}

I have a list of the following form:

    removal_list = ["example2.org", "example3.org", ...]

I am trying to loop through removal_list and remove every occurrence of each of its items from the JSON file. The problem is how long this takes, since removal_list contains 110,000 items. I tried to speed it up by using set() and isdisjoint(), but it does not seem to make any difference.

The code I currently have to do this is:

    remove_set = set(removal_list)
    for domain in removal_list:
        for i in range(len(JSON_file)):
            if int(JSON_file[i]['response']['result_count'])>0:  
                for j in range(len(JSON_file[i]['response']['matches'])):
                    for item in JSON_file[i]['response']['matches'][j]['domain']:
                        if not remove_set.isdisjoint(JSON_file[i]['response']['matches'][j]['domain']):
                            del(JSON_file[i]['response']['matches'][j]['domain'])
                        else: 
                            pass

Does anyone have any suggestions on how to speed this process up? Thanks in advance.


顾铮苏瑾 2025-02-07 11:18:22

The loops in the question are inverted. Rather than iterating over removal_list and rescanning all of JSON_File for every one of its 110,000 domains, iterate over JSON_File (which is evidently a list of dictionaries) just once, and check whether each dictionary in its 'matches' list has a domain that is in removal_list.
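A back-of-the-envelope count shows why the loop order matters. Let R be the removal set and M the total number of entries across all 'matches' lists (both symbols are introduced here purely for illustration). The question's loops check every match once per domain in R, whereas the inverted loop does a single set lookup per match, which is O(1) on average:

```latex
T_{\text{question}} \approx |R| \cdot M
\qquad\text{vs.}\qquad
T_{\text{inverted}} \approx M
```

With |R| = 110,000 and a hypothetical M = 1,000 matches, that is on the order of 110,000,000 checks against 1,000 lookups. The question's code also loops over every character of each domain, which inflates its count further.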

To illustrate, here are just two dictionaries in the JSON_File list, followed by the code that processes them:

    # A set gives O(1) average-time membership tests
    removal_list = {"example2.org", "example3.org"}

    d1 = {'query': {'tool': 'domainquery', 'query': 'example.org'},
          'response': {'result_count': '1',
                       'total_pages': '1',
                       'current_page': '1',
                       'matches': [{'domain': 'example2.org',
                                    'created_date': '2015-07-25',
                                    'registrar': 'registrar_10'}]}}
    d2 = {'query': {'tool': 'domainquery', 'query': 'example.org'},
          'response': {'result_count': '1',
                       'total_pages': '1',
                       'current_page': '1',
                       'matches': [{'domain': 'example3.org',
                                    'created_date': '2015-07-25',
                                    'registrar': 'registrar_10'}]}}

    JSON_File = [d1, d2]

    # Single pass over JSON_File; removal_list itself is never looped over
    for j in JSON_File:
        # matches is falsy when the key is missing or the list is empty
        if matches := j['response'].get('matches'):
            for match in matches:
                if match.get('domain') in removal_list:
                    del match['domain']

    print(JSON_File)
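The loop above deletes only the 'domain' key and leaves 'created_date' and 'registrar' behind, which mirrors the question's del. If the goal is instead to drop the whole match entry (an assumption, since the question does not say), a list comprehension with slice assignment does it in the same single pass. A minimal sketch; the function name `remove_matching_entries` and the 'keep.org' record are hypothetical:

```python
def remove_matching_entries(json_file, removal_domains):
    """Drop every match whose 'domain' is in removal_domains.

    Hypothetical helper: unlike the loop above, it removes the whole
    match dictionary rather than only its 'domain' key.
    """
    removal = set(removal_domains)  # O(1) average membership tests
    for record in json_file:
        matches = record.get('response', {}).get('matches')
        if matches:
            # Slice assignment mutates the existing list in place
            matches[:] = [m for m in matches if m.get('domain') not in removal]
    return json_file


data = [
    {'query': {'tool': 'domainquery', 'query': 'example.org'},
     'response': {'result_count': '2',
                  'total_pages': '1',
                  'current_page': '1',
                  'matches': [{'domain': 'example2.org',
                               'created_date': '2015-07-25',
                               'registrar': 'registrar_10'},
                              {'domain': 'keep.org',
                               'created_date': '2015-07-25',
                               'registrar': 'registrar_10'}]}},
]

remove_matching_entries(data, ["example2.org", "example3.org"])
print(data[0]['response']['matches'])
# [{'domain': 'keep.org', 'created_date': '2015-07-25', 'registrar': 'registrar_10'}]
```

Note that this sketch does not adjust 'result_count'; whether that field should change after filtering depends on how the data is used.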

Assumption:

If result_count is non-zero, there will be a non-empty 'matches' list, so there is no need to examine the 'result_count' value explicitly.

Note:

Requires Python 3.8+ (for the := assignment expression).
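On Python versions older than 3.8, the := can be avoided by moving the assignment onto its own line; the logic is otherwise unchanged. A sketch, wrapped in a hypothetical function `strip_removed_domains` so it can be reused (the sample records are trimmed for brevity):

```python
def strip_removed_domains(json_file, removal_set):
    """Same single-pass loop as the answer, written without :=.

    Hypothetical helper name; like the answer, it deletes only the
    'domain' key of matching entries.
    """
    for record in json_file:
        matches = record['response'].get('matches')
        if matches:  # skips both a missing key (None) and an empty list
            for match in matches:
                if match.get('domain') in removal_set:
                    del match['domain']
    return json_file


records = [
    {'response': {'result_count': '1',
                  'matches': [{'domain': 'example2.org',
                               'registrar': 'registrar_10'}]}},
    {'response': {'result_count': '0'}},
]
strip_removed_domains(records, {"example2.org", "example3.org"})
print(records[0]['response']['matches'])  # [{'registrar': 'registrar_10'}]
```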
