Moving data from "raw" tables into optimized MySQL tables with PHP in a cron job
I'm working with an MLS real estate listing provider (RETS). Every 48 hours we will be pulling data from their server in a cron job to an SQL database. I'm charged with the task of writing a PHP script that will be run after the data from the remote server is dumped into our "raw" tables. In these raw tables, all columns are VARCHAR(255), and we want to move the data into optimized tables. Before I send my script to the guy in charge of setting up the cron job, I wondered if there is a more efficient way to do it so I don't look foolish.
Here's what I'm doing:
There are 8 total tables, 4 raw and 4 optimized - all in the same database. The raw table column names are non-descriptive, like c1, c2, c3, c4, etc. This is intentional because the data that goes in each column may change. The raw table column names are mapped to the correct optimized table columns with PHP, something like this:
$tables['optimized_table_name1']['raw_table'] = 'raw_table_name1';
$tables['optimized_table_name1']['data_map'] = array(
    'c1' => array( // <--- "c1" is the raw table column name
        'column_name' => 'id',
        // I use other values for table creation,
        // but they don't matter to the question.
        // Just explaining why the array looks like this
        //'type' => 'VARCHAR',
        //'max_length' => 45,
        //'primary_key' => FALSE,
        // etc.
    ),
    'c9'  => array('column_name' => 'address'),
    'c25' => array('column_name' => 'baths'),
    'c2'  => array('column_name' => 'bedrooms') //etc.
);
I'm doing the same thing for each of the 4 tables: SELECT * FROM the raw table, read the config array and create a huge SQL insert statement, TRUNCATE the optimized table, then run the INSERT query.
foreach ($tables as $table_name => $config):
    $raw_table = $config['raw_table'];
    $data_map  = $config['data_map'];
    $fields = array();
    $values = array();
    $count  = 0;

    // Get the raw data and create an array mapped to the optimized table columns.
    $query = mysql_query("SELECT * FROM dbname.{$raw_table}");
    while ($row = mysql_fetch_assoc($query))
    {
        // Reading column names from my config file on first pass
        // Setting up the array, will only run once per table
        if (empty($fields))
        {
            foreach ($row as $key => $val)
            {// Produces an array with the column names
                $fields[] = $data_map[$key]['column_name'];
            }
        }
        foreach ($row as $key => $val)
        {// Assigns data to an array to be imploded later
            $values[$count][] = $val;
        }
        $count++;
    }

    // Create the INSERT statement string
    $insert = array();
    $sql = "\nINSERT INTO `{$table_name}` (`".implode('`,`', $fields)."`) VALUES\n";
    foreach ($values as $key => $vals)
    {
        foreach ($vals as &$val)
        {
            // Escape the data
            $val = mysql_real_escape_string($val);
        }
        // Using implode for simplicity, could avoid the nested foreach if I wanted to
        $insert[] = "('".implode("','", $vals)."')";
    }
    $sql .= implode(",\n", $insert).";\n";

    // TRUNCATE optimized table and run INSERT query here
endforeach;
Which produces something like this (only larger - about 15,000 records max per table, and one insert statement per table):
INSERT INTO `optimized_table_name1` (`id`,`beds`,`baths`,`town`) VALUES
('50300584','2','1','Fairfield'),
('87560584','3','2','New Haven'),
('76545584','2','1','Bristol');
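(Since the raw and optimized tables live in the same database, the whole copy could in principle also be done in SQL alone, one INSERT ... SELECT per table. A minimal sketch for the first table, assuming the c1/c9/c25/c2 mapping from the config above:)
// Sketch only: rename and copy the columns entirely inside MySQL,
// using the raw-to-optimized mapping defined in $tables above.
mysql_query("TRUNCATE TABLE dbname.optimized_table_name1");
mysql_query("INSERT INTO dbname.optimized_table_name1 (id, address, baths, bedrooms)
             SELECT c1, c9, c25, c2 FROM dbname.raw_table_name1");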
Now I'll admit, I have been under the wing of an ORM for a long time and am not up on my vanilla mysql/php. This is a pretty simple task and I want to keep the code simple.
My questions:
- Is the TRUNCATE/INSERT method a good way to do this?
- Is there anything about my code that you can see being a problem? I know you see nested foreach loops and just shudder, but I want to keep the code as small and clean as possible and avoid lots of messy string concatenation (to produce the insert query). Like I said, I also haven't used native PHP functions for SQL in a long time.
- I feel like it really doesn't matter if the code is not optimized if it is run at 3AM every 2 days. Does it matter? Is this code going to perform OK?
- Is there a better overall strategy to accomplish this task?
- Do I need to be using transactions? (see the sketch after this list)
- How can I be aware of errors that may occur in cron scripts?
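(On the transactions question, one MySQL detail matters: TRUNCATE causes an implicit commit and cannot be rolled back, so an atomic swap would use DELETE inside a transaction instead. A minimal sketch, assuming the optimized tables are InnoDB and reusing $table_name and $sql from the loop above:)
// Sketch only: a transactional alternative to the TRUNCATE/INSERT stub.
// DELETE (unlike TRUNCATE) can be rolled back, so a failed INSERT leaves
// the previous data in place.
mysql_query("START TRANSACTION");
mysql_query("DELETE FROM `{$table_name}`");
if (mysql_query($sql)) {
    mysql_query("COMMIT");
} else {
    mysql_query("ROLLBACK");
}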
Apologies if I don't use the correct cron jargon; it's new to me.
Comments (1)
Keep it simple. ORM would be swell for this task.
Answers:
You can load the data into the server with
mysql -u username --password=password < import_file.sql
So I'd have my file stored somewhere on disk so I can always take a look at it, and maybe even edit it for a one-time correction load. You can still do it by writing your SQL statements into a file.
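(A minimal sketch of that idea, assuming a hypothetical /path/to/dumps directory and the $sql string built in the question's loop:)
// Sketch only: append each table's INSERT statement to a dated dump file,
// so the run can be inspected later or re-loaded with the mysql client.
$dump_file = '/path/to/dumps/import_' . date('Y-m-d') . '.sql';
file_put_contents($dump_file, $sql, FILE_APPEND);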
As for errors in the cron script, you can redirect stderr to a file in the crontab entry:
0 7 * * 0 /path/to/php -c /path/to/php.ini /path/to/script.php 2> /tmp/errors_file
And thus you can create a second script, running after the first one, to notify you about errors in script.php by email or... whatever way of notifying you prefer. I'd prefer register_shutdown_function: check for the errors_file and, if it is not empty, notify you and delete it afterwards.
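(A minimal sketch of that shutdown check, assuming the /tmp/errors_file path from the crontab line above and a hypothetical notification address:)
// Sketch only: register at the top of script.php; on shutdown, mail the
// contents of the error file (if any) and remove it for the next run.
function notify_cron_errors()
{
    $errors_file = '/tmp/errors_file';
    if (is_file($errors_file) && filesize($errors_file) > 0) {
        mail('you@example.com', 'Listing import errors', file_get_contents($errors_file));
        unlink($errors_file);
    }
}
register_shutdown_function('notify_cron_errors');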
Just my opinion, but I hope my answer helps.