Processing large amounts of data through an XML API
So, I searched around here a bit but couldn't find anything good; apologies if my search-fu is insufficient...
So, what I have today is that my users upload a CSV text file through a form to my PHP script, and I then import that file into a database after validating every line in it. The text file can be up to about 70,000 lines long, and each line contains 24 fields of values. Handling that amount of data is not a problem in itself. Every line needs to be validated, and I check the DB for duplicates (according to a dynamic key generated from the data) to determine whether the data should be inserted or updated.
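For context, a minimal sketch of what that import routine might look like, assuming a MySQL `posts` table with a unique `dynamic_key` column and a hypothetical `validate_row()` helper (all names are invented for illustration):

```php
<?php
// Minimal sketch: import a CSV line by line and upsert on a dynamic key.
// Table/column names and validate_row() are assumptions for illustration.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$stmt = $pdo->prepare(
    'INSERT INTO posts (dynamic_key, payload) VALUES (:k, :p)
     ON DUPLICATE KEY UPDATE payload = :p2'
);

$fh = fopen('upload.csv', 'r');
while (($fields = fgetcsv($fh)) !== false) {
    if (count($fields) !== 24 || !validate_row($fields)) {
        continue; // real code would log or report the rejected line
    }
    // Dynamic key derived from the data, e.g. a hash of the identifying fields
    $key = sha1($fields[0] . '|' . $fields[1]);
    $stmt->execute([
        ':k'  => $key,
        ':p'  => json_encode($fields),
        ':p2' => json_encode($fields),
    ]);
}
fclose($fh);
```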
Right, but my clients are now requesting an automatic API for this, so they don't have to manually create and upload a text file. Sure, but how would I do it?
If I were to use a REST server, memory would run out pretty quickly if one request contained XML for 70k posts to be inserted, so that's pretty much out of the question.
So, how should I do it? I have thought of three options; please help me decide, or add more options to the list:
1. One post per request. Not all clients have 70k posts, but an update to the DB could result in the API handling 70k requests in a short period, and that would probably happen daily either way.
2. X posts per request. Set a limit on the number of posts the API handles per request, say 100 at a time. That would mean 700 requests for a full 70k import.
3. The API requires the client script to upload a CSV file ready for import using the current routine. This seems "fragile" and not very modern.
Any other ideas?
Comments (3)
If you read up on SAX processing (http://en.wikipedia.org/wiki/Simple_API_for_XML) and HTTP chunked transfer encoding (http://en.wikipedia.org/wiki/Chunked_transfer_encoding), you will see that it should be feasible to parse the XML document while it is being sent.
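A minimal sketch of that streaming approach in PHP, using the built-in expat-based SAX parser and feeding it the request body in small chunks so memory use stays flat; the `<posts>`/`<post>` element names and the `handle_post()` callback are assumptions for illustration:

```php
<?php
// Minimal sketch: stream-parse a large XML request body with SAX.
// <posts>/<post> element names and handle_post() are assumptions.
$current = [];
$buffer  = '';

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$current, &$buffer) {
        $buffer = '';
        if ($name === 'POST') {    // element names are uppercased by default
            $current = [];
        }
    },
    function ($p, $name) use (&$current, &$buffer) {
        if ($name === 'POST') {
            handle_post($current); // validate + upsert one post, then forget it
        } elseif ($name !== 'POSTS') {
            $current[strtolower($name)] = trim($buffer); // a leaf field such as <title>
        }
    }
);
xml_set_character_data_handler($parser, function ($p, $data) use (&$buffer) {
    $buffer .= $data;
});

// Feed the parser 8 KB at a time instead of loading the whole document.
$in = fopen('php://input', 'r');
while (($chunk = fread($in, 8192)) !== false && $chunk !== '') {
    xml_parse($parser, $chunk, false);
}
xml_parse($parser, '', true); // signal end of document
xml_parser_free($parser);
fclose($in);
```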
I have now solved this by imposing a limit of 100 posts per request, and I am using REST through PHP to handle the data. Uploading 36,000 posts takes about two minutes with all the validation.
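For reference, a rough client-side sketch of that batching, with a hypothetical endpoint URL and a JSON payload assumed for illustration:

```php
<?php
// Minimal sketch: send posts to the API in batches of 100.
// Endpoint URL and payload shape are assumptions; $posts is an array
// of associative arrays built from the client's own data source.
const BATCH_SIZE = 100;
$endpoint = 'https://api.example.com/posts/batch';

foreach (array_chunk($posts, BATCH_SIZE) as $batch) {
    $ch = curl_init($endpoint);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => json_encode($batch),
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $response = curl_exec($ch);
    if (curl_getinfo($ch, CURLINFO_HTTP_CODE) !== 200) {
        error_log('Batch failed: ' . $response); // real code would retry or queue
    }
    curl_close($ch);
}
```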
First of all, don't use XML for this! Use JSON; it is faster than XML.
In my project I import from XLS files. The files are very large, but the script works fine; the client just has to create the files with the same structure for import.
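A minimal sketch of the receiving end of such a JSON batch endpoint, with `handle_post()` standing in for the validate-and-upsert routine (an assumption, not this poster's actual code):

```php
<?php
// Minimal sketch: accept a JSON array of posts and process each one.
// handle_post() is a stand-in for the validation-and-upsert routine.
$body  = file_get_contents('php://input');
$posts = json_decode($body, true);

if (!is_array($posts)) {
    http_response_code(400);
    exit('Invalid JSON payload');
}

foreach ($posts as $post) {
    handle_post($post);
}
http_response_code(200);
echo json_encode(['imported' => count($posts)]);
```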