Error R14 (Memory quota exceeded) leading to TimeoutException when using Selenium with Python, FastAPI, and Celery on Heroku
I built a scraper that collects data from a page, formats it and adds it to a database. It then uses the scraped data to build models, except for one value that it scrapes. Everything is wrapped in Celery so that tasks run in the background.
@router.post("/run/{id}")
async def create(id: str):
    wallet_reputation.delay(id)
    return {"Status": "Task successfully add to execute"}
The endpoint above works fine. The ID passed to it is unique, and there are about 100 such values. To automate building a model for each ID, I made the following endpoint to call from time to time (the scraped data changes, so I need to update my models).
@router.post("/run")
async def create_all():
    for address in all_addresses_generator():
        wallet_reputation.delay(address)
    return {"Status": "Tasks successfully add to execute"}
I receive this error:
2022-03-26T15:25:52.051854+00:00 heroku[worker.1]: Process running mem=543M(104.1%)
2022-03-26T15:25:52.073256+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2022-03-26T15:26:02.875701+00:00 app[worker.1]: [2022-03-26 15:26:02,871: ERROR/ForkPoolWorker-8] Task walletReputation[2cca3c3e-8c58-4983-bbae-e55e52f33c1a] raised unexpected: TimeoutException('', None, ['#0 0x556bcd4bc7d3 <unknown>', '#1 0x556bcd218688 <unknown>', '#2 0x556bcd24ec21 <unknown>', '#3 0x556bcd24ede1 <unknown>', '#4 0x556bcd281d74 <unknown>', '#5 0x556bcd26c6dd <unknown>', '#6 0x556bcd27fa0c <unknown>', '#7 0x556bcd26c5a3 <unknown>', '#8 0x556bcd241ddc <unknown>', '#9 0x556bcd242de5 <unknown>', '#10 0x556bcd4ed49d <unknown>', '#11 0x556bcd50660c <unknown>', '#12 0x556bcd4ef205 <unknown>', '#13 0x556bcd506ee5 <unknown>', '#14 0x556bcd4e3070 <unknown>', '#15 0x556bcd522488 <unknown>', '#16 0x556bcd52260c <unknown>', '#17 0x556bcd53bc6d <unknown>', '#18 0x7f8e32957609 <unknown>', ''])
2022-03-26T15:26:02.875723+00:00 app[worker.1]: Traceback (most recent call last):
2022-03-26T15:26:02.875724+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.9/site-packages/celery/app/trace.py", line 451, in trace_task
2022-03-26T15:26:02.875724+00:00 app[worker.1]: R = retval = fun(*args, **kwargs)
2022-03-26T15:26:02.875724+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.9/site-packages/celery/app/trace.py", line 734, in __protected_call__
2022-03-26T15:26:02.875725+00:00 app[worker.1]: return self.run(*args, **kwargs)
2022-03-26T15:26:02.875725+00:00 app[worker.1]: File "/app/tasks.py", line 40, in wallet_reputation
2022-03-26T15:26:02.875725+00:00 app[worker.1]: WalletReputation(id).add_reputation_to_db()
2022-03-26T15:26:02.875727+00:00 app[worker.1]: File "/app/agents/walletReputation.py", line 261, in add_reputation_to_db
2022-03-26T15:26:02.875727+00:00 app[worker.1]: nc_balance=self.nc_balance(),
2022-03-26T15:26:02.875727+00:00 app[worker.1]: File "/app/agents/walletReputation.py", line 162, in nc_balance
2022-03-26T15:26:02.875727+00:00 app[worker.1]: WebDriverWait(self.driver, 20)
2022-03-26T15:26:02.875727+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.9/site-packages/selenium/webdriver/support/wait.py", line 89, in until
2022-03-26T15:26:02.875728+00:00 app[worker.1]: raise TimeoutException(message, screen, stacktrace)
2022-03-26T15:26:02.875728+00:00 app[worker.1]: selenium.common.exceptions.TimeoutException: Message:
2022-03-26T15:26:02.875729+00:00 app[worker.1]: Stacktrace:
2022-03-26T15:26:02.875729+00:00 app[worker.1]: #0 0x556bcd4bc7d3 <unknown>
2022-03-26T15:26:02.875729+00:00 app[worker.1]: #1 0x556bcd218688 <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #2 0x556bcd24ec21 <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #3 0x556bcd24ede1 <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #4 0x556bcd281d74 <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #5 0x556bcd26c6dd <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #6 0x556bcd27fa0c <unknown>
2022-03-26T15:26:02.875731+00:00 app[worker.1]: #7 0x556bcd26c5a3 <unknown>
2022-03-26T15:26:02.875731+00:00 app[worker.1]: #8 0x556bcd241ddc <unknown>
2022-03-26T15:26:02.875731+00:00 app[worker.1]: #9 0x556bcd242de5 <unknown>
2022-03-26T15:26:02.875731+00:00 app[worker.1]: #10 0x556bcd4ed49d <unknown>
2022-03-26T15:26:02.875732+00:00 app[worker.1]: #11 0x556bcd50660c <unknown>
2022-03-26T15:26:02.875732+00:00 app[worker.1]: #12 0x556bcd4ef205 <unknown>
2022-03-26T15:26:02.875732+00:00 app[worker.1]: #13 0x556bcd506ee5 <unknown>
2022-03-26T15:26:02.875732+00:00 app[worker.1]: #14 0x556bcd4e3070 <unknown>
2022-03-26T15:26:02.875733+00:00 app[worker.1]: #15 0x556bcd522488 <unknown>
2022-03-26T15:26:02.875733+00:00 app[worker.1]: #16 0x556bcd52260c <unknown>
2022-03-26T15:26:02.875733+00:00 app[worker.1]: #17 0x556bcd53bc6d <unknown>
2022-03-26T15:26:02.875733+00:00 app[worker.1]: #18 0x7f8e32957609 <unknown>
I don't understand why I suddenly get an error when the previous endpoint, which runs the same task in Celery, works normally. Below is the code of the generator and of the class method where the error occurs.
def all_addresses_generator():
    for row in session.query(DbNcTransaction).all():
        yield row.to
def nc_balance(self):
    base_url = "https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d?a="
    self.driver.get(base_url + self.address)
    nc_balance = (
        WebDriverWait(self.driver, 20)
        .until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "#ContentPlaceHolder1_divFilteredHolderBalance")
            )
        )
        .text
    )
    nc_balance = nc_balance.split()[1]
    nc_balance = round(float(nc_balance.replace(",", "")), 2)
    return nc_balance
How can I deal with this?
Comments (3)
As simple as this answer might be, it took me a long time to figure out. Heroku terminates any request that runs for more than 30 seconds. This is why you're getting the TimeoutException.
Read more: https://devcenter.heroku.com/articles/request-timeout
SOLUTION:
Use another platform to deploy.
This error message implies that the TimeoutException was raised because of an error initializing ForkPoolWorker-8 once your program exceeded the memory quota.
Deep Dive
This is a classic example of an out-of-memory error, where memory usage has exceeded the maximum level.
The log shows the process using 543M, which is 104.1% of the quota, presumably the limit set by the memory spec of the dyno type you are using.
Dynos
The Heroku Platform uses the container model to run and scale all Heroku apps, and the containers are called dynos. Dynos are isolated, virtualized Linux containers designed to execute code based on a user-specified command. Apps can scale to any specified number of dynos based on their resource demands.
Error R14 (Memory quota exceeded)
At times a dyno may require more memory than its assigned quota. In those cases the dyno pages to swap space to keep running, which can degrade process performance. Heroku then starts reporting the R14 error, which is calculated from the dyno's total memory: swap, RSS, and cache.
Resolving R14 memory error
In these scenarios you will want your application to use less memory, and you may need to tune it accordingly.
Adding capacity generally works well: as more servers/dynos come into operation, requests are spread out, and it becomes less likely that all threads on an individual machine are processing the largest requests at the same time. In the long run, however, the best path to reducing your overall memory requirement is to reduce object allocation.
This use case
In this use case it seems that with the first code block, create(id: str), which handles one ID per request, your application copes even across the roughly 100 ID values. The errors start once you call create_all(), which queues the tasks for every ID at once.
Solution
You can take a different approach instead of creating the models for every ID in one go. If possible, split the ID values into batches, each sized so that memory usage stays below the threshold.
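The batching idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: batched, schedule_in_batches, the enqueue callback, and the batch_size and gap_seconds values are all hypothetical names and numbers, not part of the original code. With Celery, the callback could wrap wallet_reputation.apply_async(args=[address], countdown=seconds), which delays a task's start. Staggering start times spreads the load out but does not by itself guarantee that batches never overlap.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


def schedule_in_batches(addresses, enqueue, batch_size=10, gap_seconds=120):
    """Enqueue addresses batch by batch, delaying each batch a bit more.

    `enqueue(address, countdown)` is a hypothetical callback. With Celery it
    might be:
        lambda a, c: wallet_reputation.apply_async(args=[a], countdown=c)
    """
    # Drop repeated addresses while keeping their original order.
    unique = dict.fromkeys(addresses)
    scheduled = []
    for index, batch in enumerate(batched(unique, batch_size)):
        countdown = index * gap_seconds  # batch 0 starts now, batch 1 later, ...
        for address in batch:
            enqueue(address, countdown)
            scheduled.append((address, countdown))
    return scheduled
```

Inside create_all(), the loop over all_addresses_generator() would then be replaced by a single call to schedule_in_batches. The deduplication step is there because the generator yields one address per transaction row, so the same address may appear more than once.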
The issue is not (initially) Selenium raising TimeoutException, but Heroku raising the R14 - Memory quota exceeded error, as shown on the second line of the error log you provided. The RAM usage of your application has exceeded the available quota. Since you are using a free dyno, the maximum RAM (quota) is 512 MB. However, your application, as shown on the first line of the error log (i.e., Process running mem=543M(104.1%)), requires more than that.
Thus, you may try reducing the number of workers (in case you are using more than one), reducing the RAM usage of your app, or upgrading to a different Heroku dyno (see How do I upgrade from Heroku's free tier).
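As a rough illustration of reducing the number of workers: the process name ForkPoolWorker-8 in the log suggests that the Celery pool was running several processes, each of which may have been driving its own browser. A minimal sketch of a Celery configuration module that caps this is shown below. The module name celeryconfig and all of the numbers are assumptions chosen for illustration, and the right values depend on how much memory each task really needs.

```python
# celeryconfig.py (hypothetical module name), loaded with
# app.config_from_object("celeryconfig")

# Run two pool processes instead of one per CPU core (the Celery default).
worker_concurrency = 2

# Let each pool process reserve only one task message at a time.
worker_prefetch_multiplier = 1

# Replace a pool process after it has executed 20 tasks, which releases
# any memory the process has accumulated.
worker_max_tasks_per_child = 20

# Replace a pool process once its resident memory exceeds about 150 MB.
# The value is in kilobytes and covers the Python process itself, not a
# browser that it launches as a separate process.
worker_max_memory_per_child = 150_000
```

With fewer pool processes, fewer browser instances run at the same time, so the /run endpoint can still queue every address while the worker handles only a couple of them at once.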
Update
Additionally, it would be preferable to instantiate WebDriverWait once (at startup) rather than on every call, and then reuse that instance. You may also need to increase the timeout value passed to WebDriverWait.
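The code that originally accompanied this update is missing from the page, so what follows is only a sketch of what reusing a single WebDriverWait might look like, not the answerer's code. The names build_browser, parse_balance, BASE_URL, and BALANCE_SELECTOR, the 30-second timeout, and the Chrome options are all assumptions. The sketch also assumes that the element text looks like "Balance 1,234.567 NC", since the original method takes the second whitespace-separated token.

```python
BASE_URL = "https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d?a="
BALANCE_SELECTOR = "#ContentPlaceHolder1_divFilteredHolderBalance"


def build_browser(timeout=30):
    """Create one headless Chrome driver and one WebDriverWait for reuse."""
    # Imported here so the helpers below can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    driver = webdriver.Chrome(options=options)
    return driver, WebDriverWait(driver, timeout)


def parse_balance(text):
    """Convert text such as 'Balance 1,234.567 NC' into a rounded float."""
    return round(float(text.split()[1].replace(",", "")), 2)


def nc_balance(driver, wait, address):
    """Load the token page for one address and read the balance element."""
    driver.get(BASE_URL + address)
    # "css selector" is the value of By.CSS_SELECTOR. The wait retries the
    # lookup until the element exists or the timeout expires.
    element = wait.until(lambda d: d.find_element("css selector", BALANCE_SELECTOR))
    return parse_balance(element.text)


# Possible usage inside a task (not executed here):
#     driver, wait = build_browser()
#     try:
#         balance = nc_balance(driver, wait, address)
#     finally:
#         driver.quit()  # always close the browser so its memory is freed
```

Calling driver.quit() in a finally block matters here because a browser left open after a TimeoutException keeps its memory, which would push the dyno further past its quota.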