ASP.NET - web page request to unzip files on the server
We use SharpZipLib. We need to be able to unzip files on the server and place them in a separate folder. The request to unzip a file will come from a user on a web page. I imagine that if the files are large enough, unzipping will take a long time. We don't want users to be stuck on the page waiting for the unzip to complete before they can continue browsing the site.
What is a good way to handle this scenario: spin off a separate thread to take care of unzipping the file, create a separate Windows service that will unzip files, or... what?
What are the pros and cons of doing it via a separate thread versus a Windows service?
4 Answers
Advantages of a separate process
Work done in a separate process can be decoupled in time, as well as physically, and from a security standpoint, from the page flow. Decoupled in time: If you choose, you can buffer the requests to unzip things until "later" when load is lower and when you have spare cpu cycles to do it.
Also decoupled physically; for a large scale system, you could have multiple worker processes, even deployed on multiple independent machines, doing this work asynchronously, and that layer of processing can scale independently of the web page processing. In any system there are bottlenecks, and the advantage of distributed deployments is you can scale the separate workloads independently, to more efficiently eliminate bottlenecks.
I would say though, that this latter benefit is only useful in very very large scale systems. In most cases you won't have the kind of transaction volume that would benefit from an independent physical scaling layer. This is true not just of your workload, but of 98% of all workloads. The YAGNI principle applies to scalability, too.
Physical decoupling also allows the disparate workloads (page flow and zip unpack) to be developed independently. In other words, supposing the workitem was not a simple "unzip a file" but was something more complex, with multiple steps and decision points along the way. Designing the work processor in a separate process allows the page flow to be built and tested independently from the workitem processing. This can be a nice advantage if they have to evolve independently.
This physical decoupling is also nice if workitems will arrive via different channels. Suppose the web page is not the only way for a workitem to arrive. Suppose you have an ftp drop, a web service, or a machine-monitored email box that can also receive workitems. In that case it would make sense to have the workitem processing physically decoupled from the web page processing.
Finally, these things are decoupled in security at runtime. In some web app server deployments, security rules prohibit the web server from writing to the disk - web servers have no writable disk storage. A separate async worker process can be deployed in a separate part of the network, with plenty of storage, and it can be constrained by a separate set of security requirements. This may or may not be applicable to you.
Advantages of Threaded processing
The advantage of doing the work in a separate thread is that it is much simpler. Decoupling introduces complexity and cost. Managing the work in a separate thread, you don't have any of the operational overhead of managing a separate process, potentially a separate machine. There's no additional configuration, no new build/deployment step. No additional backup. No additional security identity to maintain. No communication interchange to worry about (beyond the thread dispatch).
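To make the comparison concrete, here is a minimal sketch of what the in-process approach might look like, assuming SharpZipLib's FastZip class and the .NET ThreadPool; the UnzipJob name and the status bookkeeping are illustrative, not anything from the question.

```csharp
using System;
using System.Threading;
using ICSharpCode.SharpZipLib.Zip;

// Hypothetical helper: the page handler calls QueueUnzip and returns immediately,
// while the extraction runs on a ThreadPool thread.
public static class UnzipJob
{
    public static void QueueUnzip(string zipPath, string targetFolder)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            try
            {
                // FastZip is SharpZipLib's convenience wrapper; a null file filter
                // extracts every entry in the archive.
                new FastZip().ExtractZip(zipPath, targetFolder, null);
                // Record completion somewhere (a db row, a flag file) so the UI
                // can discover it later - see the polling discussion below.
            }
            catch (Exception ex)
            {
                // There is no page left to show an error on, so log it instead.
                System.Diagnostics.Trace.TraceError(ex.ToString());
            }
        });
    }
}
```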
You could choose to get a little more sophisticated about workitem processing, and optionally do the work synchronously when the zipfile looks small enough. Suppose you establish a threshold of 4 seconds response time - above that, you need an asynchronous workload; below 4 seconds, you do it "inline". Of course you never know for sure how long a zipfile will take, but you could establish a good heuristic based on the size of the file. This optimization is available to you whether you use an external process for async work or a separate thread, but to be honest, it is simpler to take advantage of the optimization when using a separate thread. Less additional work to do. So this is an advantage for the threaded approach.
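A sketch of that heuristic, building on the hypothetical UnzipJob helper above; the 20 MB threshold is an invented stand-in for whatever size you measure as roughly a 4-second unzip on your hardware.

```csharp
using System.IO;

public static class UnzipDispatcher
{
    // Invented threshold: calibrate it against real measurements on the server.
    private const long InlineThresholdBytes = 20L * 1024 * 1024;

    public static void HandleUnzipRequest(string zipPath, string targetFolder)
    {
        long size = new FileInfo(zipPath).Length;
        if (size <= InlineThresholdBytes)
        {
            // Small enough: do it inline and let the page wait.
            new ICSharpCode.SharpZipLib.Zip.FastZip().ExtractZip(zipPath, targetFolder, null);
        }
        else
        {
            // Too big: hand it off to the background thread (see the sketch above).
            UnzipJob.QueueUnzip(zipPath, targetFolder);
        }
    }
}
```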
Non-Differentiators
If you choose to have an AJAX polling mechanism for notification of workitem status, that would work with either the separate process or the separate thread. I don't know how you would do work item tracking, but I would suppose that when a particular work item (zip file?) is completed, then you will update a record somewhere - a file in a filesystem, a table in a database. That update happens whether it is being done by a thread in the same process, or by a separate process (Windows Service). So the AJAX client that polls will just check the db table or filesystem in any case, and will get the notification of workitem status in the same way, regardless of your architecture decision.
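A sketch of that status record, assuming .NET 4's ConcurrentDictionary and an in-memory map for brevity (a database table works the same way and survives restarts); JobStatusStore, the Guid handle, and the status strings are all invented names.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical status store. The background worker (a thread, or a Windows
// service writing to a shared table) updates it; the endpoint the AJAX poller
// calls simply reads it back.
public static class JobStatusStore
{
    private static readonly ConcurrentDictionary<Guid, string> Statuses =
        new ConcurrentDictionary<Guid, string>();

    public static Guid Create()
    {
        var id = Guid.NewGuid();
        Statuses[id] = "Pending";
        return id;                      // hand this back to the page as the "job handle"
    }

    public static void Update(Guid id, string status)   // e.g. "Done" or "Failed: <reason>"
    {
        Statuses[id] = status;
    }

    public static string Get(Guid id)                    // what the AJAX poll returns
    {
        string status;
        return Statuses.TryGetValue(id, out status) ? status : "Unknown";
    }
}
```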
How to decide
The theory is interesting but ultimately useless, without actual operating constraints.
Workload is one of the key real-world items. You didn't say how large these zip files are, but I am guessing they are "regular sized". Something about 4 GB or less. Normally a zipfile like that takes 20-60 seconds to unpack on my laptop, but of course on a server with a real storage system and a faster CPU, it will be less. You also did not characterize the concurrency of transactions - how many of these things will be happening at any one time. I'm assuming concurrency is not particularly high.
If that is the case, I would stick to the simpler async thread approach. You are doing this in ASP.NET, I presume on a server OS. The CLR has good thread management, and ASP.NET has good process scale-out capability. So even in high workloads, you will get good CPU utilization and scale, without a ton of configuration effort.
If the workitems were longer running - let's say on the order of hours or even days, and the time was unpredictable (like the closing of a stock order) - well in that case I would lean toward an async process. If the concurrency was in the thousands per second, or again very unpredictable, that also would recommend a separate process. If the failure modes were complex enough, I might want the workitems to be in a separate process just to manage it. If the workitem processing were likely to change regularly (adding an additional step, according to evolving business conditions), I might want it in a separate process.
But none of those things seem to be true in your case - unpacking zip files.
The disadvantages of a separate thread are:
The advantages of a separate thread are:
The advantages and disadvantages of a Windows service are roughly the opposite of the above.
Personally I'd go down the Windows Service route, with messaging between them for progress - for example, returning a handle to the unzip which can be used to monitor status. However, I think you could also spin off a thread to do it, and that will happily execute while the page returns.
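If you did go the Windows-service direction, the "handle" could simply be a job ID in a table (or message queue) shared between the web app and the service. The sketch below only outlines the shape of that exchange; the UnzipJobs table, its columns, and the method names are invented for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical outline of the web-app/service handshake over a shared table.
public static class UnzipJobExchange
{
    // Web side: record the request and hand back a handle the page can poll with.
    public static Guid RequestUnzip(string zipPath, string targetFolder)
    {
        var jobId = Guid.NewGuid();
        // INSERT INTO UnzipJobs (JobId, ZipPath, TargetFolder, Status)
        // VALUES (@jobId, @zipPath, @targetFolder, 'Pending')   -- invented schema
        return jobId;
    }

    // Windows service side: poll the same table and do the work.
    public static void ServiceLoop()
    {
        while (true)
        {
            // SELECT the oldest 'Pending' row, mark it 'Running', run
            // FastZip.ExtractZip(ZipPath, TargetFolder, null), then set the
            // Status to 'Done' or 'Failed' so the polling page can see it.
            Thread.Sleep(TimeSpan.FromSeconds(5));
        }
    }
}
```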
I would use an asynchronous process that you can easily poll from an AJAX-enabled page. When it completes, the AJAX portion of the page can present the details you normally would have presented while the user waited for the process to complete synchronously.