Faster way to loop through a JSON file and remove specific strings?

Posted on 2025-01-31 11:18:22

I have a JSON file of the following form:

 {'query': {'tool': 'domainquery', 'query': 'example.org'},
 'response': {'result_count': '1',
  'total_pages': '1',
  'current_page': '1',
  'matches': [{'domain': 'example2.org',
    'created_date': '2015-07-25',
    'registrar': 'registrar_10'}]}}

I have a list of the following form:

removal_list=["example2.org","example3.org"...]

I am trying to loop through removal_list and remove all instances of each item from the JSON file. The problem is how long this takes, since removal_list contains 110,000 items. I tried to speed it up by using set() and isdisjoint, but that does not seem to make it any faster.

The code I currently have to do this is:

    removal_list= set(removal_list)
    for domain in removal_list:
        for i in range(len(JSON_file)):
            if int(JSON_file[i]['response']['result_count'])>0:  
                for j in range(len(JSON_file[i]['response']['matches'])):
                    for item in JSON_file[i]['response']['matches'][j]['domain']:
                        if not remove_set.isdisjoint(JSON_file[i]['response']['matches'][j]['domain']):
                            del(JSON_file[i]['response']['matches'][j]['domain'])
                        else: 
                            pass

Does anyone have any suggestions on how to speed this process up? Thanks in advance.

Comments (1)

顾铮苏瑾 2025-02-07 11:18:22

The looping in the question is 'inverted'. JSON_File (which is clearly a list of dictionaries) should be iterated once, and each dictionary in its 'matches' list checked to see whether its domain is in removal_list. With removal_list held as a set, each match then costs a single lookup instead of a pass over all 110,000 removal items.

Let's have just two dictionaries in the JSON_File list and then show the code to process them.

removal_list = {"example2.org", "example3.org"}

d1 = {'query': {'tool': 'domainquery', 'query': 'example.org'},
     'response': {'result_count': '1',
                  'total_pages': '1',
                  'current_page': '1',
                  'matches': [{'domain': 'example2.org',
                               'created_date': '2015-07-25',
                               'registrar': 'registrar_10'}]}}
d2 = {'query': {'tool': 'domainquery', 'query': 'example.org'},
     'response': {'result_count': '1',
                  'total_pages': '1',
                  'current_page': '1',
                  'matches': [{'domain': 'example3.org',
                               'created_date': '2015-07-25',
                               'registrar': 'registrar_10'}]}}

JSON_File = [d1, d2]

for j in JSON_File:
    if matches := j['response'].get('matches'):
        for match in matches:
            if match.get('domain') in removal_list:
                del match['domain']

print(JSON_File)
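
As a side note on why the set() and isdisjoint attempt in the question did not help, here is a minimal sketch using the same domain strings. The name removal_set is only illustrative. When isdisjoint is given a string it iterates over that string character by character, so it never compares the whole domain against the set, whereas the in operator does one hash lookup on the full string.

```python
# Illustrative name; the answer above calls this removal_list.
removal_set = {"example2.org", "example3.org"}

domain = "example2.org"

# isdisjoint() iterates over its argument. A str yields single characters,
# none of which equals a full domain name, so the result is "disjoint".
chars_disjoint = removal_set.isdisjoint(domain)

# Wrapping the string in a list makes isdisjoint() compare the whole domain.
whole_disjoint = removal_set.isdisjoint([domain])

# The `in` operator is the direct way: one hash lookup per domain.
is_member = domain in removal_set

print(chars_disjoint, whole_disjoint, is_member)  # True False True
```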

Assumption:

If result_count is non-zero, there will be a non-empty 'matches' list, so there is no need to examine the 'result_count' value explicitly.
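
If you would rather verify that assumption on real data than take it on trust, something like the following sketch could be run first. The helper name find_inconsistent and the sample records are made up for illustration; the only thing assumed about the data is the 'response' / 'result_count' / 'matches' layout shown in the question.

```python
def find_inconsistent(records):
    """Return indexes of records whose result_count is non-zero
    but whose 'matches' list is missing or empty."""
    bad = []
    for index, record in enumerate(records):
        response = record.get('response', {})
        count = int(response.get('result_count', 0))
        if count > 0 and not response.get('matches'):
            bad.append(index)
    return bad


# Made-up sample data in the same shape as the question's records.
sample = [
    {'response': {'result_count': '1', 'matches': [{'domain': 'example2.org'}]}},
    {'response': {'result_count': '2'}},  # claims results but has no 'matches'
    {'response': {'result_count': '0', 'matches': []}},
]

print(find_inconsistent(sample))  # [1]
```

An empty result means the assumption holds for that data set.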

Note:

Requires Python 3.8+ (for the := assignment expression).
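
For an older interpreter, the same loop can be written without the assignment expression. This is a sketch only: the function name strip_domains and the example.net record are hypothetical, and the behaviour is meant to match the loop above (only the 'domain' key is deleted, the rest of the match is kept).

```python
def strip_domains(records, removal_set):
    """Delete the 'domain' key from every match whose domain is in removal_set."""
    for record in records:
        # Plain assignment followed by a truth test replaces the := form.
        matches = record['response'].get('matches')
        if matches:
            for match in matches:
                if match.get('domain') in removal_set:
                    del match['domain']
    return records


# Made-up records in the same shape as the question's data.
records = [
    {'response': {'result_count': '1',
                  'matches': [{'domain': 'example2.org',
                               'registrar': 'registrar_10'}]}},
    {'response': {'result_count': '1',
                  'matches': [{'domain': 'example.net',
                               'registrar': 'registrar_10'}]}},
]

strip_domains(records, {"example2.org", "example3.org"})

print(records[0]['response']['matches'])  # [{'registrar': 'registrar_10'}]
print(records[1]['response']['matches'])  # [{'domain': 'example.net', 'registrar': 'registrar_10'}]
```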
