CPU time required to bulk-upload a 2 GB database?

Posted 2024-11-24 01:32:04

I hired a programmer to port my web site -- originally implemented using Django and MySQL -- over to Google App Engine. The database for the original web app is about 2 GB in size, and the largest table has 5 million rows. To port these contents over, as I understand it, the programmer is serializing the database to JSON and then uploading it to Google App Engine.

So far, his upload has used 100 hours of CPU time, as billed by GAE, yet it looks like only about 50 or 100 MB has been loaded into the database. Is that a reasonable amount of CPU time for such a small amount of data? MySQL could load this much data in a few minutes, so I don't understand why GAE would be 1000x slower. Is he doing something inefficiently?
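For context, here is a minimal sketch of the kind of export step described above: serializing table rows to JSON for later upload. This is an illustration, not the hired programmer's actual code; the sample rows and helper name are invented.

```python
import io
import json

def export_rows(rows, out):
    """Write each row as one JSON object per line (newline-delimited
    JSON), so the dump can later be read and uploaded in batches
    instead of as a single 2 GB document."""
    for row in rows:
        out.write(json.dumps(row) + "\n")

# Invented sample rows standing in for the real 5-million-row table.
buf = io.StringIO()
export_rows([{"id": 1, "title": "a"}, {"id": 2, "title": "b"}], buf)
```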

Comments (3)

于我来说 2024-12-01 01:32:04

That seems high, and it's likely they're making the server do a lot of work (decoding the JSON, encoding and storing the entities) that could be done on the client. There's already a bulkloader provided with the SDK, and if that isn't suitable for some reason, remote_api, on which the bulkloader is based, provides a more efficient option than rolling your own.
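A rough sketch of what this answer is recommending, assuming nothing about the actual upload code: decode the JSON dump on the client, then send pre-built entities in datastore-sized batches. Here `put_batch` is a hypothetical stand-in for a remote_api put call, not a real App Engine function.

```python
import json

def parse_dump(json_text):
    # Client-side work: decode the JSON dump into plain entity dicts,
    # so no billed CPU is spent decoding on App Engine's side.
    return json.loads(json_text)

def upload(entities, put_batch, batch_size=500):
    # The datastore accepts up to 500 entities per put call, so send
    # batches of that size rather than one entity per request.
    for i in range(0, len(entities), batch_size):
        put_batch(entities[i:i + batch_size])

# Demo with a tiny dump and a list standing in for the remote put.
batches = []
upload(parse_dump('[{"id": 1}, {"id": 2}, {"id": 3}]'),
       batches.append, batch_size=2)
```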

如梦 2024-12-01 01:32:04

I have bulk-loaded a GB of data; however, I wrote my own bulk-load module (based on the interfaces they defined), and it took 25 hours of CPU time.

For more info, you could take a look at App Engine Bulk Loader Performance
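A back-of-the-envelope comparison of the two throughput figures reported in this thread (rounded, taking the asker's "about 50 or 100 MB" at its upper bound):

```python
# Asker: ~100 MB loaded in 100 CPU-hours -> ~1 MB per CPU-hour.
asker_rate = 100 / 100

# This answer: 1 GB (1024 MB) in 25 CPU-hours -> ~41 MB per CPU-hour.
answer_rate = 1024 / 25

# Even this answer's hand-rolled loader was roughly 40x faster
# per CPU-hour than what the asker is seeing.
speedup = answer_rate / asker_rate
```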

一个人的旅程 2024-12-01 01:32:04

That depends a great deal on how he's serializing the data. I strongly suspect he's doing something inefficient, and yes, that's ludicrous for that amount of data. The inefficiency probably lies in the transfer time and the start/stop overhead of each request. If he's serializing each row and posting it to a handler one at a time, then I can completely understand it both taking forever and consuming a lot of CPU time.
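To make the "one row at a time" cost concrete, here is the round-trip arithmetic for the 5-million-row table from the question, using the datastore's 500-entities-per-put batch limit:

```python
ROWS = 5_000_000   # largest table in the question
BATCH = 500        # max entities per datastore put call

def num_requests(rows, batch_size):
    # Ceiling division: how many upload requests are needed in total.
    return -(-rows // batch_size)

per_row = num_requests(ROWS, 1)       # one request per row
batched = num_requests(ROWS, BATCH)   # one request per full batch
```

Posting row by row means five million round trips, each with its own request start/stop overhead; batching cuts that to ten thousand.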
