在将行导入 mySQL 之前如何检查值是否唯一?
我有一个数据库表,需要将一堆记录导入其中。如果现有记录已经存在于数据库中,我不想覆盖它们,因此我设置了首先执行选择查询来检查是否有值,但显然行导入速度太快,索引无法跟上,因为我正在为我插入的每一行创建重复项。
我正在导入 CSV 文件。
这就是我正在做的事情(这是在 Joomla 系统内部,因此一些代码和对象是 joomla 特定的):
$fp = fopen(JPATH_ROOT.DS."tmp".DS.$filename, 'r');
//run insert query on each line of file
if(JRequest::getVar('importType')=="activated") {
while(!feof($fp)) {
while (($data = fgetcsv($fp, 1000, ",")) !== FALSE) {
if($this->checkUnique($data[0])) {
$this->runInsert2($data[0], $data[1], $data[2], $data[3]);
error_log("there is not already a code for ".$data[0]);
}
else {
error_log("there is already a code for ".$data[0]);
}
$row++;
}
}
}
fclose($fp);
这是 checkUnique:
function checkUnique($vouchNum) {
$db =& JFactory::getDBO();
$query = "select COUNT(*) from arrc_Voucher where VoucherNbr=".$db->quote($vouchNum);
if(!$db->query()) error_log("error running unique check on ".$vouchNum." - " . $db->stderr());
$db->setQuery($query);
$count = $db->loadResult();
if($count>0) {
return false;
}
else {
return true;
}
}
这是 runInsert2:
function runInsert2($vouchNum,$BalanceInit,$BalanceCurrent,$ActivatedDT) {
$rightNow = date('Y-m-d H:i:s');
$db =& JFactory::getDBO();
if($ActivatedDT <> "NULL") {
$activatedDTtmp = strtotime($ActivatedDT);
$activatedDT = date('Y-m-d H:i:s',$activatedDTtmp);
}
else {
$activatedDT = $rightNow;
}
$query = "insert into arrc_Voucher (VoucherNbr,BalanceInit, BalanceCurrent, ActivatedDT)
values (". $db->quote($vouchNum). ", ".$db->quote($BalanceInit).",".$db->quote($BalanceCurrent).",".$db->quote($activatedDT).")";
error_log("query: ".$query);
$db->setQuery($query);
if (!$db->query()) error_log("error inserting voucher number ". $vouchNum . "-" . $db->stderr());
}
我不知道我哪里出了问题,但如果有人可以帮助我(或为我指出一个更好的方向以避免重复),我将非常感激。仅供参考,我们认为“唯一”的字段 (VoucherNbr) 实际上不是主键,也不是以任何方式在表结构中标记为唯一的,也不可能是。这是我们现在需要在编码端解决的问题。
I have a database table that I need to import a bunch of records into. I don't want to overwrite existing records if they're already in the database, so I've set things up to do a select query first to check if there's a value, but apparently the rows are importing too quickly for the index to keep up, as I'm getting duplicates created for every single row that I'm inserting.
I'm importing a CSV file.
Here's what I'm doing (this is inside a Joomla system, so some of the code and objects are joomla-specific):
$fp = fopen(JPATH_ROOT.DS."tmp".DS.$filename, 'r');
//run insert query on each line of file
if(JRequest::getVar('importType')=="activated") {
while(!feof($fp)) {
while (($data = fgetcsv($fp, 1000, ",")) !== FALSE) {
if($this->checkUnique($data[0])) {
$this->runInsert2($data[0], $data[1], $data[2], $data[3]);
error_log("there is not already a code for ".$data[0]);
}
else {
error_log("there is already a code for ".$data[0]);
}
$row++;
}
}
}
fclose($fp);
Here's checkUnique:
function checkUnique($vouchNum) {
$db =& JFactory::getDBO();
$query = "select COUNT(*) from arrc_Voucher where VoucherNbr=".$db->quote($vouchNum);
if(!$db->query()) error_log("error running unique check on ".$vouchNum." - " . $db->stderr());
$db->setQuery($query);
$count = $db->loadResult();
if($count>0) {
return false;
}
else {
return true;
}
}
And here's runInsert2:
function runInsert2($vouchNum,$BalanceInit,$BalanceCurrent,$ActivatedDT) {
$rightNow = date('Y-m-d H:i:s');
$db =& JFactory::getDBO();
if($ActivatedDT <> "NULL") {
$activatedDTtmp = strtotime($ActivatedDT);
$activatedDT = date('Y-m-d H:i:s',$activatedDTtmp);
}
else {
$activatedDT = $rightNow;
}
$query = "insert into arrc_Voucher (VoucherNbr,BalanceInit, BalanceCurrent, ActivatedDT)
values (". $db->quote($vouchNum). ", ".$db->quote($BalanceInit).",".$db->quote($BalanceCurrent).",".$db->quote($activatedDT).")";
error_log("query: ".$query);
$db->setQuery($query);
if (!$db->query()) error_log("error inserting voucher number ". $vouchNum . "-" . $db->stderr());
}
I have no clue where I'm going wrong here, but if anyone can help me out (or point me in a better direction for avoiding duplicates) I'd be very grateful. FYI, the field that we're considering to be "unique" (VoucherNbr) is not actually a primary key or in any way marked as unique in the table structure, and cannot be. This is something we need to work around on the coding end right now.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
设置唯一约束并使用
insertignore
,这样您就不会出现重复项。也就是说,是否可以忽略重复的行。
您无法在列上设置唯一键而不是需要保留唯一值的原因是什么?
另一种解决方案是将数据导入具有相同结构的单独表中。
创建像arrc_Voucher一样的表arrc_buffer
。您在每次导入之前截断该表。
然后您可以从此缓冲区插入到您的arrc_Voucher 表中。
1.
从缓冲区中删除arrc_Voucher 中已有的所有行。
然后将其余部分插入arrc_Voucher。
除了这些导入之外,还有其他在 arrc_Voucher 中插入数据的例程吗?
Put an unique constraint and use
insert ignore
, this way you'll never have duplicates.That is if it's ok for the duplicate rows to be ignored.
What is the reason you can't set an unique key on a column than needs to keep unique values?
Another solution would be to import data in a separate table with the same structure.
create table arrc_buffer like arrc_Voucher
.You truncate this table before each import.
Then you can insert into your arrc_Voucher table from this buffer.
1.
Remove from the buffer all rows that are already in arrc_Voucher.
Then insert the rest in arrc_Voucher.
Besides these imports, is there any other routine inserting data in arrc_Voucher ?
如果您确实无法更改表,则可能必须在 INSERT 后检查重复项并删除它们,或者在检查现有行之前锁定表。您无法保证 SELECT 和 INSERT 语句之间不会发生 INSERT。
If you really can't change the tables, you might have to either check for duplicates and remove them after your INSERT, or lock the table before your check for existing rows. You can't guarantee an INSERT doesn't happen between your SELECT and INSERT statements.