Correctly scraping a CSV file from a government website

Posted on 2025-01-30 14:46:22

I am very new to Python, and I am trying to scrape this website: https://itc.aeso.ca/itc/public/atc/historic/. I have found the target link (I can see how the URL string is built into something like "https://itc.aeso.ca/itc/public/api/v2/interchange?startDate=20220517&endDate=20220519&pageNo=1&pageSize=3").

I have tried to parse it two different ways, using requests and pandas, and both have given me dataframes that are really hard to use / diagnose.

import requests
import pandas as pd
import json

# Request the public page with the date range (this response is not used below)
payload = {'startDate': '20220517', 'endDate': '20220519'}
r = requests.get("https://itc.aeso.ca/itc/public/atc/historic/", params=payload)

# Request the API endpoint directly and parse the response body as JSON
csv_url = "https://itc.aeso.ca/itc/public/api/v2/interchange?startDate=20220517&endDate=20220519&pageNo=1&pageSize=3"
req = requests.get(csv_url)
url_content = req.content
ss = url_content
dd = json.loads(req.text)

This gave me data that looks like this:

{'message': 'Report successfully retrieved.',
 'responseCode': '200',
 'localTimestamp': '2022-05-19 10:17:59 MDT',
 'return': {'BcIntertie': {'Allocations': [{'flowgate': False,
     'date': '2022-05-17',
     'he': '1',
     'import': {'transferType': 'BC_IMPORT',
      'reason': '2L113 outage',
      'atc': 670,
      'trmTotal': 130,
      'ttc': 800,
      'trmSystem': 130,
      'trmAllocation': 0,
      'grossOffer': 50,
      'effectiveLocalTime': '2022-05-17 00:00:00 MDT',
      'updatedLocalTime': '2022-05-16 19:35:40 MDT'},
     'export': {'transferType': 'BC_EXPORT',
      'reason': '',
      'atc': 950,
      'trmTotal': 50,
      'ttc': 1000,
      'trmSystem': 50,
      'trmAllocation': 0,
      'grossOffer': 0,
      'effectiveLocalTime': '2022-05-17 00:00:00 MDT',
      'updatedLocalTime': '2021-04-17 09:13:00 MDT'}},
    {'flowgate': False,
     'date': '2022-05-17',
     'he': '2',
     'import': {'transferType': 'BC_IMPORT',
      'reason': '2L113 outage',
      'atc': 670,
      'trmTotal': 130,
      'ttc': 800,
      'trmSystem': 130,
      'trmAllocation': 0,
      'grossOffer': 50,
      'effectiveLocalTime': '2022-05-17 01:00:00 MDT',
      'updatedLocalTime': '2022-05-16 22:35:40 MDT'},
     'export': {'transferType': 'BC_EXPORT',
      'reason': '',
      'atc': 950,

The second way I tried was using pandas:

import pandas as pd

df_names = pd.read_csv("https://itc.aeso.ca/itc/public/api/v2/interchange?startDate=20220517&endDate=20220519&pageNo=1&pageSize=3")

This gave me data that was all columns and only one row:

{"message":"Report successfully retrieved." responseCode:"200"  localTimestamp:"2022-05-19 11:52:53 MDT"    return:{"BcIntertie":{"Allocations":[{"flowgate":false  date:"2022-05-17"   he:"1"  import:{"transferType":"BC_IMPORT"  reason:"2L113 outage"   atc:670 trmTotal:130    ... updatedLocalTime:"2022-05-19 07:59:40 MDT"}.3   export:{"transferType":"SYSTEM_EXPORT".71   reason:"".551   atc:1088.71 trmTotal:65.224 ttc:1153.71 trmSystem:65.225    grossOffer:0.387    effectiveLocalTime:"2022-05-19 23:00:00 MDT".9  updatedLocalTime:"2021-04-19 09:12:59 MDT"}}]}}}

Everything I find online shows this working cleanly. I have tried numerous solutions, using things like gzip, and I am not sure where I am going wrong.

Comments (1)

七月上 2025-02-06 14:46:22

It is returning a JSON string, not CSV. You can convert it using the json library:

import json

json_resp = json.loads(req.text)

Or, probably better yet, requests has JSON decoding built in:

json_resp = req.json()

A real example of how to access data in your returned dataset would then be:

    print(json_resp['return']['BcIntertie']['Allocations'][0]['date'])
returns:
    2022-05-17
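
If the goal is one row per hour rather than a single lookup, the nested 'import' and 'export' dicts under each allocation can be flattened into dotted keys. The following is only a minimal sketch: the sample dict is trimmed from the response quoted in the question so that it runs without a network call, and `flatten` is a hypothetical helper name, not part of requests or json. It also assumes the response keeps the 'return' -> 'BcIntertie' -> 'Allocations' layout shown above.

```python
# Sample shaped like the response quoted in the question, trimmed to two
# hours and a few fields. With a live response this would be req.json().
json_resp = {
    "return": {
        "BcIntertie": {
            "Allocations": [
                {
                    "flowgate": False,
                    "date": "2022-05-17",
                    "he": "1",
                    "import": {"transferType": "BC_IMPORT", "atc": 670, "ttc": 800},
                    "export": {"transferType": "BC_EXPORT", "atc": 950},
                },
                {
                    "flowgate": False,
                    "date": "2022-05-17",
                    "he": "2",
                    "import": {"transferType": "BC_IMPORT", "atc": 670, "ttc": 800},
                    "export": {"transferType": "BC_EXPORT", "atc": 950},
                },
            ]
        }
    }
}


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one dict with dotted keys (hypothetical helper)."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts such as 'import' and 'export'
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


allocations = json_resp["return"]["BcIntertie"]["Allocations"]
rows = [flatten(allocation) for allocation in allocations]

# Each row is now a flat dict: one entry per hour, one key per field
for row in rows:
    print(row["date"], row["he"], row["import.atc"], row["export.atc"])
```

A list of flat dicts like `rows` can be passed straight to `pd.DataFrame(rows)` to get one row per hour. pandas also ships `pd.json_normalize`, which does this kind of flattening for a list of nested records, so the helper above is mainly useful for seeing what happens to the nested keys.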