How to submit a query to extract a table from an .aspx page using Python (2022)

Posted 2025-02-13 23:39:47


I want to scrape data from https://www.nasdaqtrader.com/trader.aspx?id=TradeHalts. I tried different approaches, such as this (https://stackoverflow.com/questions/18840100), this (https://stackoverflow.com/questions/1480356), and this.

I can scrape static pages, but I still don't understand the ASPX format very well. I am copying here what I took from the first reference link:

import urllib.parse
import urllib.request
from bs4 import BeautifulSoup

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Origin': 'http://www.indiapost.gov.in',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Referer': 'http://www.nitt.edu/prm/nitreg/ShowRes.aspx',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
}  # note: these headers are never actually sent by the code below

class MyOpener(urllib.request.FancyURLopener):
    version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'

myopener = MyOpener()
url = 'https://www.nasdaqtrader.com/Trader.aspx?id=TradeHalts'
# first HTTP request without form data
f = myopener.open(url)
soup = BeautifulSoup(f, 'html.parser')
# parse and retrieve two vital form values
# (find() returns a single Tag; the field's value is in the "value" attribute)
viewstate = soup.find('input', {'type': 'hidden', 'name': '__VIEWSTATE'})['value']
eventvalidation = soup.find('input', {'type': 'hidden', 'name': '__EVENTVALIDATION'})['value']

formData = (
    ('__EVENTVALIDATION', eventvalidation),
    ('__VIEWSTATE', viewstate),
    ('__VIEWSTATEENCRYPTED', ''),
)

encodedFields = urllib.parse.urlencode(formData)
# second HTTP request with form data
f = myopener.open(url, encodedFields)
soup = BeautifulSoup(f, 'html.parser')

print(soup.prettify())

I cannot find the table information in the content. What am I missing?
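For reference, the ASP.NET postback pattern the code above attempts can be exercised offline against a minimal HTML fragment. The fragment and field values below are made up for illustration; only the field names (`__VIEWSTATE`, `__EVENTVALIDATION`) are the standard ASP.NET ones:

```python
import urllib.parse
from bs4 import BeautifulSoup

# Minimal stand-in for an ASP.NET page's hidden state fields
# (the values here are invented for the example).
html = """
<form method="post" action="Trader.aspx?id=TradeHalts">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTA3O2s=" />
  <input type="hidden" name="__EVENTVALIDATION" value="/wEWAgL=" />
</form>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag (or None); the value lives in the "value"
# attribute -- findAll() would return a list of Tags, which is why posting
# the findAll() result back to the server cannot work.
viewstate = soup.find("input", {"name": "__VIEWSTATE"})["value"]
eventvalidation = soup.find("input", {"name": "__EVENTVALIDATION"})["value"]

form_data = {
    "__VIEWSTATE": viewstate,
    "__EVENTVALIDATION": eventvalidation,
    "__VIEWSTATEENCRYPTED": "",
}
# urlencode() percent-escapes the values for the POST body.
body = urllib.parse.urlencode(form_data)
print(body)
```

Even with the fields extracted correctly, however, a postback alone will not surface this particular table, as the answer below explains.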


Comments (1)

沙与沫 2025-02-20 23:39:48


To get the data as a pandas DataFrame you can use the following example:

import requests
import pandas as pd
from io import StringIO

# The trade-halts table is loaded by JavaScript from this RPC endpoint,
# not embedded in the static Trader.aspx HTML.
url = "https://www.nasdaqtrader.com/RPCHandler.axd"

# A Referer header matching the page (the endpoint may reject
# requests without it).
headers = {
    "Referer": "https://www.nasdaqtrader.com/trader.aspx?id=TradeHalts",
}

# JSON-RPC-style payload as observed in the browser's network tab.
payload = {
    "id": 2,
    "method": "BL_TradeHalt.GetTradeHalts",
    "params": "[]",
    "version": "1.1",
}

data = requests.post(url, json=payload, headers=headers).json()
# The "result" field contains an HTML fragment holding the table.
data = StringIO(data["result"])

df = pd.read_html(data)[0]
print(df.head(10).to_markdown(index=False))

Prints:

| Halt Date  | Halt Time | Issue Symbol | Issue Name                                                | Market    | Reason Codes | Pause Threshold Price | Resumption Date | Resumption Quote Time | Resumption Trade Time |
|------------|-----------|--------------|-----------------------------------------------------------|-----------|--------------|-----------------------|-----------------|-----------------------|-----------------------|
| 07/06/2022 | 15:57:38  | COMSP        | 9.25% Srs A Cmltv Redm Prf Stk                            | NASDAQ    | LUDP         | nan                   | 07/06/2022      | 15:57:38              | nan                   |
| 07/06/2022 | 12:51:35  | BRPMU        | B. Riley Principal 150 Merg Ut                            | NASDAQ    | LUDP         | nan                   | 07/06/2022      | 12:51:35              | 12:56:35              |
| 07/06/2022 | 12:06:06  | VACC         | Vaccitech plc ADS                                         | NASDAQ    | LUDP         | nan                   | 07/06/2022      | 12:06:06              | 12:16:06              |
| 07/06/2022 | 11:15:10  | USEA         | United Maritime Corp Cm St                                | NASDAQ    | LUDP         | nan                   | 07/06/2022      | 11:15:10              | 11:29:25              |
| 07/06/2022 | 10:28:53  | USEA         | United Maritime Corp Cm St                                | NASDAQ    | LUDP         | nan                   | 07/06/2022      | 10:28:53              | 10:43:30              |
| 07/06/2022 | 10:18:19  | USEA         | United Maritime Corp Cm St                                | NASDAQ    | LUDP         | nan                   | 07/06/2022      | 10:18:19              | 10:28:19              |
| 07/06/2022 | 09:41:43  | GAMB         | Gambling.com Group Os                                     | NASDAQ    | LUDP         | nan                   | 07/06/2022      | 09:41:43              | 09:46:43              |
| 07/06/2022 | 09:37:16  | USEA         | United Maritime Corp Cm St                                | NASDAQ    | LUDP         | nan                   | 07/06/2022      | 09:37:16              | 10:17:41              |
| 07/06/2022 | 09:31:15  | JJN          | iPathA Series B Bloomberg Nickel Subindex Total Return ETN | NYSE Arca | M            | nan                   | 07/06/2022      | 09:36:15              | 09:36:15              |
| 07/06/2022 | 09:31:17  | AMTI         | Applied Molecular Transport Cm                            | NASDAQ    | LUDP         | nan                   | 07/06/2022      | 09:31:17              | 09:36:17              |
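Since `read_html` leaves the date and time columns as strings, a common follow-up step is to combine them into a proper datetime column. A small self-contained sketch, using sample rows that mimic the scraped table's column names:

```python
import pandas as pd

# Two sample rows mimicking the scraped table's string columns.
df = pd.DataFrame(
    {
        "Halt Date": ["07/06/2022", "07/06/2022"],
        "Halt Time": ["15:57:38", "12:51:35"],
        "Issue Symbol": ["COMSP", "BRPMU"],
    }
)

# Concatenate date and time strings, then parse them with an explicit
# format so pandas does not have to guess month/day order.
df["Halted At"] = pd.to_datetime(
    df["Halt Date"] + " " + df["Halt Time"], format="%m/%d/%Y %H:%M:%S"
)

print(df["Halted At"].dt.hour.tolist())  # → [15, 12]
```

With a real datetime column the frame can be sorted, filtered by time window, or resampled directly.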