How to Pythonically keep the Headers, API Params & Request Object [of a web scraper] separate but easily accessible and mutable?
I would like to write a scraper that has 3 different "groups" of attributes (or data) that would [and likely should] be kept separately.
I was hoping to use DataClasses and aim at Pythonic practices, but DataClasses don't feel appropriate for reasons stated in more detail later.
The 3 groups [or "interfaces"] are as follows:
#1: HTTP Header fields
- has defaults, but needs to be mutable at/after object instantiation of the (#3) request class object
- ideally acts like a dict when using a request method inside #3 request object
#2: API parameters for the URL query request
- has defaults, but also needs to be mutable at/after instantiation
- ideally acts like a dict when using a request method inside #3 request object
#3: Response Object (the data) after the request is returned to the user from the API server.
- I would later implement methods for the object to have output formats such as CSV, JSON, SQL DB, S3, etc. That would be [at least] a 4th interface.
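As a rough sketch of what that fourth interface might look like, the output methods could live on a small wrapper around the parsed rows. The class name `ResponseData`, its methods, and the sample rows below are all hypothetical; the layout of the real API response is not shown here, so the sketch assumes the data has already been turned into a list of dicts.

```python
import csv
import io
import json


class ResponseData:
    """Hypothetical wrapper around rows parsed from an API response."""

    def __init__(self, rows):
        # Each row is assumed to be a dict mapping column name -> value.
        self.rows = list(rows)

    def to_json(self):
        """Serialize all rows to a JSON string."""
        return json.dumps(self.rows)

    def to_csv(self, file):
        """Write the rows to any writable text file object."""
        if not self.rows:
            return
        writer = csv.DictWriter(file, fieldnames=list(self.rows[0]))
        writer.writeheader()
        writer.writerows(self.rows)


# Made-up sample rows, for illustration only.
data = ResponseData([{"id": 1, "name": "Player A"}, {"id": 2, "name": "Player B"}])
buffer = io.StringIO()
data.to_csv(buffer)
print(buffer.getvalue())
```

Keeping the output formats on a separate object like this means the request class never needs to know how the data is saved; `to_sql` or an S3 upload could be added the same way.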
The Task I've been trying to Accomplish
I want an interface where a user can instantiate a class, e.g. Players, with the API params they need and also update the HTTP header (as needed).
The HTTP Header and the API Params are both easily stored as Python dicts (or JSON). I have included them below.
=> The question is how do I make them mutable in the Request object (Class) at instantiation (creation) and able to be updated after instantiation (creation)?
Inheritance via dataclasses? I have tried putting these dictionaries in dataclasses, but dataclasses reject mutable defaults such as dicts, so each one has to be wrapped in default_factory using field from the dataclasses module, which feels like a hack. It's possible, but it defeats the purpose of using dataclasses to avoid all the extra syntax. Using dataclasses also means that MyDataClass.__dict__ has way more stuff in it than PythonClass.__dict__.
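For comparison, here is a minimal sketch of what the dataclass route could look like when the API params get their own dataclass and the headers stay a plain dict. The names `PlayersParams`, `PlayersRequest`, and `DEFAULT_HEADERS` are made up for illustration, and the header dict is shortened.

```python
from dataclasses import asdict, dataclass, field

# Hypothetical, shortened default headers for illustration.
DEFAULT_HEADERS = {"Referer": "https://www.nba.com/", "x-nba-stats-origin": "stats"}


@dataclass
class PlayersParams:
    """API parameters (#2); field names mirror the query-string keys."""
    IsOnlyCurrentSeason: int = 0
    LeagueID: str = "00"
    Season: str = "2021-22"


@dataclass
class PlayersRequest:
    # default_factory builds a fresh object per instance,
    # so two requests never share the same params or headers.
    params: PlayersParams = field(default_factory=PlayersParams)
    headers: dict = field(default_factory=lambda: dict(DEFAULT_HEADERS))


req = PlayersRequest(params=PlayersParams(Season="1999-00"))
req.headers["Referer"] = "https://www.another-website.com/"  # mutate after creation

print(asdict(req.params))          # plain dict, ready to pass as params=
print(DEFAULT_HEADERS["Referer"])  # the shared defaults are left untouched
```

`asdict()` returns only the declared fields, which sidesteps the problem of `__dict__` picking up unrelated attributes. Whether the two `field(default_factory=...)` lines count as too much extra syntax is a judgment call.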
=> Thus use a regular Python class or dict...
Using a regular Python class: there seem to be two options to allow mutability of the HTTP Header at creation.
1) Inheritance, but that muddies the waters by mixing the attributes of the HTTP Header with the API Params.
2) Composition, setting an attribute field to the HTTPClassHeader and doing some work to be able to convert it back to a dict for use in the request_data() method.
Putting the dicts into the Players (request class) doesn't allow mutability via a nice keyword interface (or I'm not aware how to implement it).
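One way the composition option might be sketched with a regular class is to keep the params and headers as two separate dicts on the instance, and to expose a small chainable method for keyword-style header updates. This is only an illustration under assumptions: the attribute names `params`, `headers`, and `default_headers` are invented here, the header dict is shortened, and no request is actually sent.

```python
class Players:
    endpoint = "commonallplayers"
    # Hypothetical, shortened defaults; the full header dict would go here.
    default_headers = {"Referer": "https://www.nba.com/", "x-nba-stats-origin": "stats"}

    def __init__(self, IsOnlyCurrentSeason=0, LeagueID="00", Season="2021-22", header=None):
        # (#2) API params live in their own dict, separate from everything else.
        self.params = {
            "IsOnlyCurrentSeason": IsOnlyCurrentSeason,
            "LeagueID": LeagueID,
            "Season": Season,
        }
        # (#1) Copy the defaults, then overlay any overrides given at creation.
        self.headers = {**self.default_headers, **(header or {})}

    def header(self, updates=None, **kwargs):
        """Update headers after creation; returns self so calls can be chained."""
        self.headers.update(updates or {}, **kwargs)
        return self


# Both desired interfaces end up with the same headers.
c = Players(Season="1999-00", header={"Referer": "https://www.another-website.com/"})
d = Players(Season="1999-00").header(Referer="https://www.another-website.com/")
print(c.params)
print(d.headers)
```

With this layout, `request_data()` could pass `params=self.params` and `headers=self.headers` straight to `requests.get` without any conversion step. Header names containing hyphens are not valid keyword arguments, so those would need the dict form: `d.header({"x-nba-stats-token": "true"})`.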
Here's my code in text form:
    class Players:
        __endpoint__ = "CommonallPlayers"

        def __init__(self, IsOnlyCurrentSeason=0, LeagueID="00", Season="2021-22", header=HTTPHeader) -> None:
            # these first 3 attributes constitute the (#2) API Params
            self.IsOnlyCurrentSeason = IsOnlyCurrentSeason
            self.LeagueID = LeagueID
            self.Season = Season
            self.header = header  # (1) inherit as a Class or Dict?

        def encode_api_params(self):
            # if only 3 attributes, this works, but not if I add more attributes (HTTP or self.request_data)
            return self.__dict__

        def get_http_header(self):
            # ideally can return the http_header as a dict
            pass

        # ideally this is NOT instantiated (as doesn't have data, shouldn't be accessible to user until AFTER request)
        def request_data(self):
            url_api = f"{BASE_URL}/{self.__endpoint__}"
            return requests.get(url_api,
                                params=self.encode_api_params(),
                                headers=self.get_http_header())

    # works, has current defaults (current season)
    c = Players()

    # a common use case, using a different Season than the default (current season)
    c = Players(Season="1999-00")

    # A possible needed change, with 2 possible desired interfaces
    c = Players(Season="1999-00", header={"Referer": "https://www.another-website.com/"})
    c = Players(Season="1999-00").header(Referer="https://www.another-website.com/")

    # Final outputs
    c.request_data().to_csv("downloads/my_data.csv")
    c.request_data().to_sql("table-name")
Here are the HTTP Header, the API Params, and the Request Object in their simplest form (running these together would return some data):
    import requests

    HTTP_HEADER = {
        "Accept": "application/json, text/plain, */*",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9",
        "Connection": "keep-alive",
        "Host": "stats.nba.com",
        "Origin": "https://www.nba.com",
        "Referer": "https://www.nba.com/",
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-site",
        "Sec-GPC": "1",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36",
        "x-nba-stats-origin": "stats",
        "x-nba-stats-token": "true",
    }

    params = {'IsOnlyCurrentSeason': 0, 'LeagueID': '00', 'Season': '2021-22'}

    r = requests.get("https://stats.nba.com/stats/commonallplayers",  # base url
                     params=params,         # expects (#2) params, the api parameters to be a dict
                     headers=HTTP_HEADER)   # expects (#1) headers to be a dict
    r.json()
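To see roughly what the params dict turns into on the wire, the query string can be built by hand with the standard library. This is only an approximation for illustration: requests does its own encoding internally, though for simple values like these the result should look the same.

```python
from urllib.parse import urlencode

params = {"IsOnlyCurrentSeason": 0, "LeagueID": "00", "Season": "2021-22"}

# Each key/value pair becomes key=value, joined with "&".
url = "https://stats.nba.com/stats/commonallplayers?" + urlencode(params)
print(url)
```

This is why a plain dict is a convenient shape for the API params: whatever object ends up holding them only needs to be convertible to a dict at request time.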