Making a web interface for a script that takes 30 minutes to execute

Posted 2024-08-18 11:23:56


I wrote a Python script to process some data from CSV files. The script takes between 3 and 30 minutes to complete, depending on the size of the CSV.

Now I want to put a web interface in front of it, so I can upload the CSV data files from anywhere. I wrote a basic HTTP POST upload page and used Python's CGI module - but the script just times out after a while.

The script outputs HTTP headers at the start, and prints bits of data after iterating over each line of the CSV. As an example, this print statement would trigger every 30 seconds or so.

# at the very top, with the 'import's
print "Content-type: text/html\n\n Processing ... <br />"

# the really long loop.
count = 0
for currentRecord in csvRecords:
    count = count + 1
    print "On line " + str(count) + " <br />"

I assumed the browser would receive the headers and keep waiting, since it keeps receiving little bits of data. But what actually seems to happen is that it doesn't receive any data at all, and it eventually fails with a 504 error when given a CSV with lots of lines.

Perhaps there's some caching happening somewhere? From the logs,

[Wed Jan 20 16:59:09 2010] [error] [client ::1] Script timed out before returning headers: datacruncher.py, referer: http://localhost/index.htm
[Wed Jan 20 17:04:09 2010] [warn] [client ::1] Timeout waiting for output from CGI script /Library/WebServer/CGI-Executables/datacruncher.py, referer: http://localhost/index.htm

What's the best way to resolve this, or, is it not appropriate to run such scripts in a browser?

Edit:
This is a script for my own use - I normally intend to use it on my computer, but I thought a web-based interface could come in handy while travelling, or for example from a phone. Also, there's really nothing to download - the script will most probably e-mail a report off at the end.


Answers (6)

穿透光 2024-08-25 11:23:56


I would separate the work like this:

  1. A web app URL that accepts the POSTed CSV file. The web app puts the CSV content into an offline queue, for instance a database table. The web app's response should be a unique ID for the queued item (use an auto-incrementing ID column, for instance). The client must store this ID for part 3.

  2. A stand-alone service app that polls the queue for work and does the processing. Upon completion of the processing, it stores the results in another database table, using the unique ID as the key.

  3. A web app URL that can fetch processed results, http://server/getresults/uniqueid/. If the processing is finished (i.e. the unique ID is found in the results table), return the results. If not finished, the response should be a code that indicates this: for instance a custom HTTP header, an HTTP status code, or a response body of 'PENDING' or similar.
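The three parts above can be sketched with nothing but the standard library's sqlite3 module standing in for the queue and results tables. The table and function names here are illustrative, not from the answer:

```python
# Sketch of the queue design: enqueue (part 1), poll-and-process (part 2),
# fetch results or 'PENDING' (part 3). Uses an in-memory SQLite database;
# a real deployment would use a file-backed or server database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, csv TEXT)")
conn.execute("CREATE TABLE results (job_id INTEGER PRIMARY KEY, report TEXT)")

def enqueue(csv_text):
    """Part 1: store the POSTed CSV and hand back a unique job ID."""
    cur = conn.execute("INSERT INTO jobs (csv) VALUES (?)", (csv_text,))
    conn.commit()
    return cur.lastrowid

def work_once():
    """Part 2: the stand-alone service polls for one unprocessed job."""
    row = conn.execute(
        "SELECT id, csv FROM jobs WHERE id NOT IN (SELECT job_id FROM results)"
        " LIMIT 1").fetchone()
    if row is None:
        return False
    job_id, csv_text = row
    # placeholder for the real 30-minute crunch:
    report = "processed %d lines" % len(csv_text.splitlines())
    conn.execute("INSERT INTO results VALUES (?, ?)", (job_id, report))
    conn.commit()
    return True

def get_results(job_id):
    """Part 3: return the report, or 'PENDING' if not finished yet."""
    row = conn.execute(
        "SELECT report FROM results WHERE job_id = ?", (job_id,)).fetchone()
    return row[0] if row else "PENDING"
```

The client first calls `enqueue`, polls `get_results` until it stops returning 'PENDING', and the worker loops on `work_once` in its own process.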

爱已欠费 2024-08-25 11:23:56


I've had this situation before, and I used cron jobs. The HTTP script would just write a job to be performed into a queue (a DB, or a file in a directory), and the cron job would read it and execute that job.
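A minimal sketch of that pattern, using a spool directory as the queue; the directory path and function names are illustrative:

```python
# The CGI script calls submit_job() and returns immediately; a script run
# from cron (e.g. "* * * * * python worker.py") calls run_pending_jobs().
import os
import tempfile

QUEUE_DIR = tempfile.mkdtemp()  # in practice a fixed path such as a spool dir

def submit_job(csv_text):
    """Called from the HTTP handler: just write the work out and return."""
    fd, path = tempfile.mkstemp(suffix=".csv", dir=QUEUE_DIR)
    with os.fdopen(fd, "w") as f:
        f.write(csv_text)
    return path

def run_pending_jobs(process):
    """Called from cron: drain the queue, applying process() to each job."""
    done = 0
    for name in sorted(os.listdir(QUEUE_DIR)):
        path = os.path.join(QUEUE_DIR, name)
        with open(path) as f:
            process(f.read())
        os.remove(path)  # remove so the next cron run doesn't repeat it
        done += 1
    return done
```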

胡大本事 2024-08-25 11:23:56


You'll probably need to call sys.stdout.flush(), as the script isn't really writing anything to the web server until it has accumulated a page buffer's worth of data - which doesn't happen before the timeout.

But the proper way to solve this is, as others suggested, to do the processing in a separate thread/process, and show the user an auto-refreshing page that shows the status, with a progress bar or some other fancy visual to keep them from getting bored.
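The flush fix applied to the question's loop might look like this (shown in Python 3 syntax, where print is a function; the original question used Python 2):

```python
# Flushing after every line pushes the output to the web server
# immediately instead of letting it sit in the page buffer.
import sys

def report_progress(csv_records, out=sys.stdout):
    out.write("Content-type: text/html\n\n Processing ... <br />\n")
    out.flush()  # get the headers out before any timeout can trigger
    count = 0
    for current_record in csv_records:
        count += 1
        out.write("On line %d <br />\n" % count)
        out.flush()  # without this, output only leaves in buffer-sized chunks
    return count
```

Note this only keeps the connection alive; the browser still has to stay open for the full 30 minutes, which is why the queued-job approaches are preferable.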

在巴黎塔顶看东京樱花 2024-08-25 11:23:56


See Randal Schwartz's Watching long processes through CGI. The article uses Perl, but the technique does not depend on the language.

只是偏爱你 2024-08-25 11:23:56

Very similar question here. I suggest spawning off the lengthy process and returning an ajax-based progress bar to the user. This way the user has the luxury of the web interface, and you have the luxury of no time-outs.

从来不烧饼 2024-08-25 11:23:56


IMHO the best way would be to run an independent script which posts updates somewhere (flat file, database, etc.). I don't know how to fork an independent process from Python, so I can't give any code examples.

To show progress on the website, make an ajax request to a page that reads those status updates and, for example, shows a nice progress bar.

Add something like setTimeout("refreshProgressBar[...]) or a meta refresh for auto-refreshing.
