Correctly scraping a CSV file from a government website
I am very new to Python, and I am trying to scrape this website: https://itc.aeso.ca/itc/public/atc/historic/. I have found the target link (I can see how the URL string is manipulated into something like "https://itc.aeso.ca/itc/public/api/v2/interchange?startDate=20220517&endDate=20220519&pageNo=1&pageSize=3").
I have tried to parse it two different ways, using requests and pandas, and both have given me results that are really hard to use or diagnose.
import requests
import pandas as pd
import json

payload = {'startDate': '20220517', 'endDate': '20220519'}
# Request the report page itself (r is not used again below)
r = requests.get("https://itc.aeso.ca/itc/public/atc/historic/", params=payload)

# Query the underlying API endpoint that the page calls
csv_url = "https://itc.aeso.ca/itc/public/api/v2/interchange?startDate=20220517&endDate=20220519&pageNo=1&pageSize=3"
req = requests.get(csv_url)
url_content = req.content
ss = url_content
dd = json.loads(req.text)
This gave me data that looks like this
{'message': 'Report successfully retrieved.',
'responseCode': '200',
'localTimestamp': '2022-05-19 10:17:59 MDT',
'return': {'BcIntertie': {'Allocations': [{'flowgate': False,
'date': '2022-05-17',
'he': '1',
'import': {'transferType': 'BC_IMPORT',
'reason': '2L113 outage',
'atc': 670,
'trmTotal': 130,
'ttc': 800,
'trmSystem': 130,
'trmAllocation': 0,
'grossOffer': 50,
'effectiveLocalTime': '2022-05-17 00:00:00 MDT',
'updatedLocalTime': '2022-05-16 19:35:40 MDT'},
'export': {'transferType': 'BC_EXPORT',
'reason': '',
'atc': 950,
'trmTotal': 50,
'ttc': 1000,
'trmSystem': 50,
'trmAllocation': 0,
'grossOffer': 0,
'effectiveLocalTime': '2022-05-17 00:00:00 MDT',
'updatedLocalTime': '2021-04-17 09:13:00 MDT'}},
{'flowgate': False,
'date': '2022-05-17',
'he': '2',
'import': {'transferType': 'BC_IMPORT',
'reason': '2L113 outage',
'atc': 670,
'trmTotal': 130,
'ttc': 800,
'trmSystem': 130,
'trmAllocation': 0,
'grossOffer': 50,
'effectiveLocalTime': '2022-05-17 01:00:00 MDT',
'updatedLocalTime': '2022-05-16 22:35:40 MDT'},
'export': {'transferType': 'BC_EXPORT',
'reason': '',
'atc': 950,
The second way I tried was using pandas:
import pandas as pd
df_names = pd.read_csv("https://itc.aeso.ca/itc/public/api/v2/interchange?startDate=20220517&endDate=20220519&pageNo=1&pageSize=3")
This gave me data that was all columns and only one row.
{"message":"Report successfully retrieved." responseCode:"200" localTimestamp:"2022-05-19 11:52:53 MDT" return:{"BcIntertie":{"Allocations":[{"flowgate":false date:"2022-05-17" he:"1" import:{"transferType":"BC_IMPORT" reason:"2L113 outage" atc:670 trmTotal:130 ... updatedLocalTime:"2022-05-19 07:59:40 MDT"}.3 export:{"transferType":"SYSTEM_EXPORT".71 reason:"".551 atc:1088.71 trmTotal:65.224 ttc:1153.71 trmSystem:65.225 grossOffer:0.387 effectiveLocalTime:"2022-05-19 23:00:00 MDT".9 updatedLocalTime:"2021-04-19 09:12:59 MDT"}}]}}}
Everything I find online has it working so cleanly. I have tried numerous solutions using things like gzip, etc., but I am not sure where I am going wrong.
1 Answer
It is returning a json string. You can convert it using the json library.
Or probably better yet, requests has json built into it:
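For instance, a minimal sketch reusing the csv_url from your code above (Response.json() decodes the body for you, so there is no need for json.loads):

import requests

csv_url = "https://itc.aeso.ca/itc/public/api/v2/interchange?startDate=20220517&endDate=20220519&pageNo=1&pageSize=3"
resp = requests.get(csv_url)
resp.raise_for_status()  # fail loudly if the request did not succeed
dd = resp.json()         # parse the JSON body straight into a Python dict
print(dd['message'])     # 'Report successfully retrieved.'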
A real example of how to access data in your returned dataset would then be:
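Something along these lines, assuming the nested structure shown in your first output (return -> BcIntertie -> Allocations):

# dd is the parsed response from above
allocations = dd['return']['BcIntertie']['Allocations']  # list of hourly records
first = allocations[0]
print(first['date'], first['he'])   # '2022-05-17' '1'
print(first['import']['atc'])       # 670

From there, pd.json_normalize(allocations) should flatten the nested import/export dicts into columns if what you ultimately want is a DataFrame rather than raw dicts.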