What is the proper way to access BerkeleyDB with Perl?
I've been having some problems with using BerkeleyDB. I have multiple instances of the same code pointed to a single repository of DB files, and everything runs fine for 5-32 hours, then suddenly there is a deadlock. The command prompts stop right before executing a db_get or db_put or cursor creation call. So I'm simply asking for the proper way to handle these calls. Here's my general layout:
This is how the environment and DBs are created:
my $env = new BerkeleyDB::Env (
    -Home => "$dbFolder\\" ,
    -Flags => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL)
    or die "cannot open environment: $BerkeleyDB::Error\n";
my $unsortedHash = BerkeleyDB::Hash->new (
    -Filename => "$dbFolder/Unsorted.db",
    -Flags => DB_CREATE,
    -Env => $env
) or die "couldn't create: $!, $BerkeleyDB::Error.\n";
A single instance of this code runs, goes to a site and saves URLs to be parsed by another instance (I have the flag set so that every DB is locked when one is locked):
$lk = $unsortedHash->cds_lock();
while(@urlsToAdd){
    my $currUrl = shift @urlsToAdd;
    $unsortedHash->db_put($currUrl, '0');
}
$lk->cds_unlock();
It periodically checks if a certain number of items are in Unsorted:
$refer = $unsortedHash->db_stat();
$elements = $refer->{'hash_ndata'};
Before adding any element to any DB, it first checks all DBs to see if that element is already present:
if ($unsortedHash->db_get($search, $value) == 0){
    $value = "1:$value";
}elsif ($badHash->db_get($search, $value) == 0){
    $value = "2:$value";
....
This next code comes after, and many instances of it are run in parallel. First, it gets the next item in unsorted (that does not have the busy value '1'), then sets the value to busy '1', then does something with it, then moves the DB entry completely to another DB (it is removed from unsorted and stored in another DB):
my $pageUrl = '';
my $busy = '1';
my $curs;
my $lk = $unsortedHash->cds_lock(); #lock, change status to 1, unlock
########## GET AN ELEMENT FROM THE UNSORTED HASH #######
while(1){
    $busy = '1';
    $curs = $unsortedHash->db_cursor();
    while ($busy){
        $curs->c_get($pageUrl, $busy, DB_NEXT);
        print "$pageUrl:$busy:\n";
        if ($pageUrl eq ''){
            $busy = 0;
        }
    }
    $curs->c_close();
    $curs = undef;
    if ($pageUrl eq ''){
        print "Database empty. Sleeping...\n";
        $lk->cds_unlock();
        sleep(30);
        $lk = $unsortedHash->cds_lock();
    }else{
        last;
    }
}
####### MAKE THE ELEMENT 'BUSY' AND DOWNLOAD IT
$unsortedHash->db_put($pageUrl, '1');
$lk->cds_unlock();
$lk = undef;
And in every other place, if I call db_put or db_del on ANY DB, it is wrapped with a lock like so:
print "\n\nBad.\n\n";
$lk = $badHash->cds_lock();
$badHash->db_put($pageUrl, '0');
$unsortedHash->db_del($pageUrl);
$lk->cds_unlock();
$lk = undef;
However, my db_get commands are free-floating with no lock, because I don't think reading needs a lock.
I have looked over this code a million times and the algorithm is airtight. So I am just wondering if I am implementing any part of this wrong, using the locks wrong, etc. Or if there is a better way to prevent deadlocking (or even diagnose deadlocking) with BerkeleyDB and Strawberry Perl?
UPDATE: To be more specific, the problem is occurring on a Windows 2003 server (1.5 GB RAM, not sure if that is important). I can run this whole setup fine on my Windows 7 machine (4GB RAM). I also started printing out the lock stats using the following:
Adding this flag to the environment creation:
-MsgFile => "$dbFolder/lockData.txt"
And then calling this every 60 seconds:
my $status = $env->lock_stat_print();
print "Status:$status:\n";
The status is always returned as 0, which is success. Here is the last stat report:
29 Last allocated locker ID
0x7fffffff Current maximum unused locker ID
5 Number of lock modes
1000 Maximum number of locks possible
1000 Maximum number of lockers possible
1000 Maximum number of lock objects possible
40 Number of lock object partitions
24 Number of current locks
42 Maximum number of locks at any one time
5 Maximum number of locks in any one bucket
0 Maximum number of locks stolen by for an empty partition
0 Maximum number of locks stolen for any one partition
29 Number of current lockers
29 Maximum number of lockers at any one time
6 Number of current lock objects
13 Maximum number of lock objects at any one time
1 Maximum number of lock objects in any one bucket
0 Maximum number of objects stolen by for an empty partition
0 Maximum number of objects stolen for any one partition
3121958 Total number of locks requested
3121926 Total number of locks released
0 Total number of locks upgraded
24 Total number of locks downgraded
9310 Lock requests not available due to conflicts, for which we waited
0 Lock requests not available due to conflicts, for which we did not wait
8 Number of deadlocks
1000000 Lock timeout value
0 Number of locks that have timed out
1000000 Transaction timeout value
0 Number of transactions that have timed out
792KB The size of the lock region
59 The number of partition locks that required waiting (0%)
46 The maximum number of times any partition lock was waited for (0%)
0 The number of object queue operations that required waiting (0%)
27 The number of locker allocations that required waiting (0%)
0 The number of region locks that required waiting (0%)
1 Maximum hash bucket length
Of these, the one I am wary of is this:
8 Number of deadlocks
How did these deadlocks occur, and how were they resolved? (all parts of the code are still running). What exactly is a deadlock, in this case?
However, my db_get commands are free-floating with no lock, because I don't think reading needs a lock.
This assumption is wrong. As http://pybsddb.sourceforge.net/ref/lock/page.html says, BerkeleyDB has to issue read locks internally, because otherwise you could get undefined behavior if a reader tried to read data that was being changed out from under it. Therefore reads can easily be part of a deadlock situation.
This is particularly true in the presence of cursors. Read cursors maintain locks on everything that has been read until the cursor is closed. See http://pybsddb.sourceforge.net/ref/lock/am_conv.html for more details and ways that you can get into deadlock (in fact, you can even deadlock yourself).
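Because reads take part in locking too, it can help to keep every lock scope short and to make sure a lock is released even when the code inside it dies. The following is a minimal sketch of that idea, not something taken from the BerkeleyDB documentation: with_cds_lock is a hypothetical helper name, and it only assumes that the handle passed in provides cds_lock() and that the returned object provides cds_unlock(), as in the question's code.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code while holding the CDS lock of $db and
# always release the lock afterwards, even if $code dies.
# $db is assumed to be a BerkeleyDB handle from a DB_INIT_CDB environment.
sub with_cds_lock {
    my ($db, $code) = @_;

    my $lock = $db->cds_lock()
        or die "could not acquire CDS lock\n";

    my @result;
    my $ok  = eval { @result = $code->(); 1 };
    my $err = $@;

    $lock->cds_unlock();        # released on success and on failure

    die $err unless $ok;        # re-throw the original error
    return @result;             # call in list context to get the values
}

# Usage sketch with the handles from the question:
# with_cds_lock($badHash, sub {
#     $badHash->db_put($pageUrl, '0');
#     $unsortedHash->db_del($pageUrl);
# });
```

Keeping the lock object in a lexical variable inside the helper also avoids a lock lingering in a long-lived variable after an early exit from the calling code.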
In short, you need to do deadlock detection. I can see two possibilities to do that. First, you can use the db_deadlock utility. Second, and perhaps more conveniently, you can specify the -LockDetect flag when opening your environment, a flag that's not exactly explained in depth in the Perl docs for BerkeleyDB.pm. Both ways appear to work fine for me in version 4.5.20. (What's your version, by the way?)
Now for the details.
Specifying the -LockDetect flag is really just that. There are a couple of values to choose from. I chose DB_LOCK_DEFAULT and it appeared to work just fine. With more clues as to what's going on, you could certainly get more fancy.
Running the db_deadlock utility could be done like this:
Here's a quote from the db_deadlock manual:
I arrived at the conclusion that both ways work fine by repeatedly performing a test with two writers and one reader, which would deadlock a couple of times while putting new entries in the database in rapid succession (100 per second), or while going through a cursor of all keys in the database.
The flag method appears to deal with deadlocks very quickly; they didn't become noticeable in my tests.
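As a rough sketch of the flag method, assuming the same environment flags as in the question: the only change is the extra -LockDetect option. The folder path is a hypothetical placeholder, and DB_LOCK_DEFAULT is simply the value mentioned above, not a recommendation for every workload.

```perl
use strict;
use warnings;
use BerkeleyDB;

# Hypothetical location of the database files.
my $dbFolder = 'C:/crawler/db';

# Same environment as in the question, plus automatic deadlock detection.
my $env = BerkeleyDB::Env->new(
    -Home       => $dbFolder,
    -Flags      => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
    -LockDetect => DB_LOCK_DEFAULT,   # run the detector on lock conflicts
) or die "cannot open environment: $BerkeleyDB::Error\n";
```

Every process that opens the environment would presumably need to open it the same way; that is an assumption worth checking against the BerkeleyDB.pm documentation for your version.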
On the other hand, running the db_deadlock utility with verbose output in parallel with the scripts is instructive in that you see how they block and then continue after lockers have been aborted, especially when combined with the db_stat utility:
I lack the expertise to explain all the details, but you can see that in blocked situations there are certain entries there while in others there aren't. Also see the section entitled "Berkeley DB Concurrent Data Store locking conventions" (what is IWRITE?) in the Berkeley DB Programmer's Reference Guide.
You're asking how these deadlocks occurred. I can't say exactly, but I do see that they occur with concurrent access. You're also asking how they were resolved. I have no idea. In my test scenarios, blocked scripts will simply hang. Maybe in your scenario someone ran deadlock detection without you knowing about it?
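To watch the deadlock counter from the script side, one option is to parse the report that lock_stat_print writes to the -MsgFile and compare successive values. The sketch below is only an illustration: parse_lock_stats and deadlocks_increased are hypothetical helper names, and it assumes each report line has the "value description" shape shown in the question.

```perl
use strict;
use warnings;

# Parse lock_stat_print output into a hash keyed by the description column.
# Assumes one "value<whitespace>description" pair per line.
sub parse_lock_stats {
    my ($text) = @_;
    my %stats;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^\s*(\S+)\s+(.+?)\s*$/;
        $stats{$2} = $1;    # a later report overwrites an earlier one
    }
    return \%stats;
}

# True if the deadlock counter grew between two parsed reports.
sub deadlocks_increased {
    my ($previous, $current) = @_;
    return ($current->{'Number of deadlocks'} // 0)
         > ($previous->{'Number of deadlocks'} // 0);
}

# Excerpt of the report from the question, used here as sample input.
my $report = <<'END';
3121958 Total number of locks requested
3121926 Total number of locks released
8 Number of deadlocks
END

my $stats = parse_lock_stats($report);
print "deadlocks so far: $stats->{'Number of deadlocks'}\n";
```

Logging a timestamp whenever deadlocks_increased returns true would at least show when the counter moves relative to the hangs.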
For completeness, your application might simply hang because a thread has not closed resources before exiting. This might happen if you just Ctrl-C a process and there is no clean-up handler in place to close resources. But that doesn't appear to be your problem.
If it does become your problem, you should review the section on Handling failure in Data Store and Concurrent Data Store applications in the Reference Guide.
CDS and DS have no concept of recovery. Since CDS and DS don't support transactions and don't maintain a recovery log, they cannot run recovery. If the database gets corrupted in DS or CDS, you can only remove it and recreate it. (Taken more or less verbatim from the Berkeley DB Book by Himanshu Yadava.)
Finally, there are video tutorials on the Oracle site, including one on using CDS by Margo Seltzer.
While not a BerkeleyDB solution, you might be able to use alternative locking through Win32::Mutex, which uses underlying Windows mutexes. A very simple example is below:
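A minimal sketch of that idea, assuming the Win32::Mutex module from the Win32-IPC distribution is installed. The mutex name and the 10-second timeout are hypothetical choices for illustration, and the database calls are left as a placeholder comment.

```perl
use strict;
use warnings;
use Win32::Mutex;

# Hypothetical mutex name; every cooperating process must use the same one.
my $mutex = Win32::Mutex->new(0, 'CrawlerDbMutex')
    or die "cannot create mutex: $^E\n";

# wait() takes a timeout in milliseconds and returns a true value once
# this process owns the mutex, or 0 if the timeout expired first.
if ($mutex->wait(10_000)) {
    # ... db_get / db_put / db_del calls go here ...
    $mutex->release;
}
else {
    warn "timed out waiting for the mutex\n";
}
```

Since this serializes all access through one named mutex, it trades concurrency for simplicity, and it only works if every process touching the databases goes through the same mutex.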