Faster way to loop through a JSON file and remove specific strings?
I have a JSON file of the following form:
{'query': {'tool': 'domainquery', 'query': 'example.org'},
'response': {'result_count': '1',
'total_pages': '1',
'current_page': '1',
'matches': [{'domain': 'example2.org',
'created_date': '2015-07-25',
'registrar': 'registrar_10'}]}}
I have a list of the following form:
removal_list=["example2.org","example3.org"...]
I am trying to loop through removal_list and remove all instances of each item from the JSON file. The issue is how long this takes to compute, since removal_list contains 110,000 items. I have tried to make it faster by using set() and isdisjoint, but this does not seem to make it any faster.
The code I currently have to do this is:
removal_list= set(removal_list)
for domain in removal_list:
    for i in range(len(JSON_file)):
        if int(JSON_file[i]['response']['result_count'])>0:
            for j in range(len(JSON_file[i]['response']['matches'])):
                for item in JSON_file[i]['response']['matches'][j]['domain']:
                    if not remove_set.isdisjoint(JSON_file[i]['response']['matches'][j]['domain']):
                        del(JSON_file[i]['response']['matches'][j]['domain'])
                    else:
                        pass
Does anyone have any suggestions on how to speed this process up? Thanks in advance.
Comments (1)
The looping in the question is 'inverted'. That is, JSON_file (which is clearly a list of dictionaries) should be enumerated, and each dictionary examined to see whether its 'matches' list contains any entries whose domain is in removal_list. That way each domain costs one set lookup, instead of the whole file being rescanned for each of the 110,000 removal items.
Let's have just two dictionaries in the JSON_file list and then show the code to process them.
Assumption:
If result_count is non-zero, there will be a non-empty 'matches' list, so there is no need to examine the 'result_count' value explicitly.
Note:
Requires Python 3.8+
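The approach described above might look something like the following. This is only a sketch under stated assumptions: the second dictionary in JSON_file, the function name remove_domains, and the choice to drop the whole match entry (rather than just its 'domain' key) are illustrative and not taken from the question. The 'result_count' field is left untouched here.

```python
# Hypothetical sample data: the first dictionary is the one from the question,
# the second is invented purely for illustration.
JSON_file = [
    {'query': {'tool': 'domainquery', 'query': 'example.org'},
     'response': {'result_count': '1',
                  'total_pages': '1',
                  'current_page': '1',
                  'matches': [{'domain': 'example2.org',
                               'created_date': '2015-07-25',
                               'registrar': 'registrar_10'}]}},
    {'query': {'tool': 'domainquery', 'query': 'example.net'},
     'response': {'result_count': '2',
                  'total_pages': '1',
                  'current_page': '1',
                  'matches': [{'domain': 'example3.org',
                               'created_date': '2016-01-01',
                               'registrar': 'registrar_11'},
                              {'domain': 'example4.org',
                               'created_date': '2017-02-02',
                               'registrar': 'registrar_12'}]}},
]

# A set gives constant-time membership tests on average.
removal_set = {"example2.org", "example3.org"}


def remove_domains(records, removal_set):
    """Drop, in place, every match whose domain is in removal_set.

    Assumption: removing "an instance" means removing the whole match entry.
    """
    for record in records:
        # Assignment expression (walrus operator) needs Python 3.8+.
        # A missing or empty 'matches' list is simply skipped.
        if matches := record['response'].get('matches'):
            # Slice assignment rewrites the existing list object in place.
            matches[:] = [m for m in matches
                          if m.get('domain') not in removal_set]


remove_domains(JSON_file, removal_set)
```

Each record is visited once and each match costs a single set lookup, so the running time grows with the total number of matches rather than with the number of matches multiplied by the size of removal_list.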