Batch data processing technique

Posted on 2024-12-07 17:01:11

I'm looking for a technique to do the following, and I'd appreciate your advice.
I have a huge (really huge) table of registration IDs, and I need to send a message to the owner of each ID. I can't send the message to many recipients at once; it has to be sent one by one. So I would like a PHP script that can run as many parallel instances (processes), each fetching a batch of rows from the DB and processing it. In other words, every process needs to work on its own range of data. I would also like to be able to stop any process and later resume sending from where it stopped, so that only users who haven't received the message yet are processed.
Is this possible? Any tips and advice are welcome.

Comments (3)

小嗲 2024-12-14 17:01:11

You may wish to set up a cron job, which is typically one of the best approaches for running large batch operations with PHP scripts:

http://www.developertutorials.com/tutorials/php/running-php-cron-jobs-regular-scheduled-tasks-in-php-172/

Your cron job will need to point to a PHP script which does the following:

  1. Selects a subset of recipients from your large DB table, based on the
    flag set in step 3 (below), identifying the next batch to process
  2. Sends email to those selected recipients
  3. Saves a note of the success/failure of each send (e.g. you could set a
    flag next to each recipient in the DB who is successfully mailed; these are then not selected when the job is rerun)
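
The three steps above could be sketched roughly as follows. This is a minimal illustration, not a finished mailer: the table and column names (recipients, id, email, sent_at), the processBatch() function and the $send callback are all hypothetical, and the demo at the bottom uses an in-memory SQLite database only so that the sketch runs as-is. A real job would connect to your own database and call your real sending code.

```php
<?php
/**
 * One cron run: pick the next unsent batch, send, and flag each success.
 * Table/column names (recipients, id, email, sent_at) are hypothetical.
 */
function processBatch(PDO $pdo, callable $send, int $batchSize): int
{
    // Step 1: select recipients whose "sent" flag is still empty.
    $select = $pdo->prepare(
        'SELECT id, email FROM recipients WHERE sent_at IS NULL ORDER BY id LIMIT :n'
    );
    $select->bindValue(':n', $batchSize, PDO::PARAM_INT);
    $select->execute();
    $rows = $select->fetchAll(PDO::FETCH_ASSOC);

    $mark = $pdo->prepare('UPDATE recipients SET sent_at = :ts WHERE id = :id');
    $sent = 0;
    foreach ($rows as $row) {
        // Step 2: send one message; $send must return true on success.
        if ($send($row['email'])) {
            // Step 3: flag the row so that later runs skip it.
            $mark->execute([':ts' => date('Y-m-d H:i:s'), ':id' => $row['id']]);
            $sent++;
        }
    }
    return $sent;
}

// Demo on an in-memory SQLite database (assumption: pdo_sqlite is installed).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE recipients (id INTEGER PRIMARY KEY, email TEXT, sent_at TEXT)');
$pdo->exec("INSERT INTO recipients (email) VALUES ('a@example.com'), ('bad'), ('c@example.com')");

// Stand-in sender: pretends that addresses without "@" fail.
$send = function (string $email): bool {
    return strpos($email, '@') !== false;
};

echo 'Sent in this run: ' . processBatch($pdo, $send, 2) . PHP_EOL;
```

Because a failed row keeps its empty flag, it is selected again on the next run; adding an attempt counter is one way to stop a permanently failing address from being retried forever. A crontab entry such as `*/5 * * * * php /path/to/send_batch.php` (the path is a placeholder) would run the script every five minutes.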
攀登最高峰 2024-12-14 17:01:11

Parallel processing is possible only to the extent that your server's configuration allows. Many servers can serve pages in parallel, but only a limited number at a time. The rule of thumb is instead to make each request as fast as possible and move on to the next one.

As for processing a really large list of data in your database: first of all, you will need a list of recipient IDs for the mailing you are doing:

INSERT INTO `mymailinglisttable` (mailing_id, recipient_id, senton) SELECT 123 AS mailing_id, mycontacttable.recipient_id, NULL FROM mycontacttable WHERE [insert your criteria for your contacts]

Next, you will need either InnoDB or some clever logic of your own for the parallel processing:

With InnoDB, you can do row-level locking. I don't use InnoDB at all, so I can't tell you exactly how, but I know it is possible and the documentation covers it. You select and lock some rows, send the emails, mark the rows as sent, and then repeat the operation by calling your own script again (either with AJAX or with a PHP socket).
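
One possible shape for the InnoDB variant is sketched below. It is not taken from the answer, and it assumes PDO and MySQL 8.0 or later, where SELECT ... FOR UPDATE SKIP LOCKED lets each worker lock a different set of rows inside a transaction. The function name sendLockedBatch() and the $send callback are hypothetical; the table and columns are the ones from the INSERT above. The locking clause is added only on the mysql driver so that the sketch can also be dry-run on SQLite, where no row locking takes place.

```php
<?php
/**
 * Lock a batch of unsent rows, send, and mark them, all in one transaction.
 * sendLockedBatch() and $send are hypothetical names for this sketch.
 */
function sendLockedBatch(PDO $pdo, callable $send, int $mailingId, int $limit): int
{
    // Assumption: MySQL 8.0+ for SKIP LOCKED. Other drivers get no lock clause.
    $isMysql = $pdo->getAttribute(PDO::ATTR_DRIVER_NAME) === 'mysql';
    $lock = $isMysql ? ' FOR UPDATE SKIP LOCKED' : '';

    $pdo->beginTransaction();
    try {
        // Rows locked by another worker are skipped instead of waited on.
        $select = $pdo->prepare(
            'SELECT recipient_id FROM mymailinglisttable'
            . ' WHERE mailing_id = :m AND senton IS NULL'
            . ' ORDER BY recipient_id LIMIT :n' . $lock
        );
        $select->bindValue(':m', $mailingId, PDO::PARAM_INT);
        $select->bindValue(':n', $limit, PDO::PARAM_INT);
        $select->execute();
        $ids = $select->fetchAll(PDO::FETCH_COLUMN);

        $mark = $pdo->prepare(
            'UPDATE mymailinglisttable SET senton = :ts'
            . ' WHERE mailing_id = :m AND recipient_id = :r'
        );
        $sent = 0;
        foreach ($ids as $id) {
            if ($send((int) $id)) {
                $mark->execute([':ts' => date('Y-m-d H:i:s'), ':m' => $mailingId, ':r' => $id]);
                $sent++;
            }
        }
        $pdo->commit(); // Releases the row locks.
        return $sent;
    } catch (Throwable $e) {
        $pdo->rollBack();
        throw $e;
    }
}

// Dry run on in-memory SQLite (assumption: pdo_sqlite is installed).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE mymailinglisttable (mailing_id INTEGER, recipient_id INTEGER, senton TEXT)');
$pdo->exec('INSERT INTO mymailinglisttable (mailing_id, recipient_id) VALUES (123, 1), (123, 2), (123, 3)');

// Stand-in sender that always succeeds.
$send = function (int $recipientId): bool {
    return true;
};

echo 'Sent in this batch: ' . sendLockedBatch($pdo, $send, 123, 2) . PHP_EOL;
```

The locks are held while the messages are being sent, so small batches keep transactions short. If an exception interrupts a batch, the rollback also discards the senton values written in that batch, so those recipients could be mailed twice on the next run.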

Without InnoDB, you can simply add 2 fields to your table: one for a process ID (mypid in the code below) and a lockedon timestamp. When you want to lock some addresses for processing, do:

$mypid = getmypid().rand(1111,9999);
$now = date('Y-m-d H:i:s');
// Claim only rows that are unsent and not already claimed by another process.
// Note: the mysql_* functions were removed in PHP 7.0; use mysqli or PDO on current PHP.
mysql_query('UPDATE mymailinglisttable SET mypid = '.$mypid.', lockedon = "'.$now.'" WHERE senton IS NULL AND mypid IS NULL LIMIT 3');

This will lock 3 rows for your pid at the current time. Then select the rows that were locked using:

mysql_query('SELECT * FROM mymailinglisttable WHERE mypid = '.$mypid.' AND lockedon = "'.$now.'"');

You will retrieve exactly the 3 rows that you locked for processing. I tend to use this version rather than the InnoDB one because it is the method I learned first, not because it performs better. In fact, I'm sure the InnoDB version is much better; I have just never tried it.
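
With this scheme, a process that is stopped halfway leaves its rows claimed but unsent. One way to let other processes pick them up again is to release locks that are older than some timeout. The sketch below is an assumption layered on top of the answer, not part of it: releaseStaleLocks() and the timeout value are hypothetical, PDO is used instead of the mysql_* functions, and the demo runs on an in-memory SQLite database only so that it is self-contained.

```php
<?php
/**
 * Clear claims on unsent rows whose lock is older than $maxAgeSeconds,
 * so that another process can claim them. Returns the number of rows freed.
 * releaseStaleLocks() is a hypothetical helper for this sketch.
 */
function releaseStaleLocks(PDO $pdo, int $maxAgeSeconds): int
{
    // Same 'Y-m-d H:i:s' format as lockedon, so a plain comparison works.
    $cutoff = date('Y-m-d H:i:s', time() - $maxAgeSeconds);
    $stmt = $pdo->prepare(
        'UPDATE mymailinglisttable SET mypid = NULL, lockedon = NULL'
        . ' WHERE senton IS NULL AND lockedon IS NOT NULL AND lockedon < :cutoff'
    );
    $stmt->execute([':cutoff' => $cutoff]);
    return $stmt->rowCount();
}

// Demo on in-memory SQLite (assumption: pdo_sqlite is installed).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec(
    'CREATE TABLE mymailinglisttable'
    . ' (recipient_id INTEGER, mypid INTEGER, lockedon TEXT, senton TEXT)'
);
$insert = $pdo->prepare(
    'INSERT INTO mymailinglisttable (recipient_id, mypid, lockedon) VALUES (?, ?, ?)'
);
$insert->execute([1, 1001, date('Y-m-d H:i:s', time() - 3600)]); // abandoned an hour ago
$insert->execute([2, 1002, date('Y-m-d H:i:s')]);                // claimed just now

// Free claims older than 10 minutes; only the abandoned row qualifies.
echo 'Released: ' . releaseStaleLocks($pdo, 600) . PHP_EOL;
```

The timeout should be comfortably longer than the time a healthy process needs to finish one batch, otherwise rows that are still being worked on could be released and sent twice.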

紫竹語嫣☆ 2024-12-14 17:01:11

If you're comfortable using PEAR modules, I'd recommend having a look at the PEAR Mail_Queue module.

http://pear.php.net/package/Mail_Queue

It is well documented and has a nice tutorial. I've used a modified version of it before to send out thousands of emails to customers, and it hasn't given me a problem yet:

http://pear.php.net/manual/en/package.mail.mail-queue.mail-queue.tutorial.php
