How do I access a random element in a Perl DBM hash?

Posted 2024-08-19 09:07:21 · 167 words · 12 views · 0 comments

I have a Perl DBM hash containing a list of URLs, and I want to pick from it at random to load-balance the spidering of sites. So I want to either pick a key at random or select the nth element (so I can randomise n).

I'm aware this goes against the concept of a hash, but is this possible?

NOTE: I left out an important point: the hash is too large to load all the keys into memory and pick one at random.

Comments (4)

我做我的改变 2024-08-26 09:07:21

I don't think any of the DBM packages have an API for retrieving a random key, or for retrieving keys by index number. You can look up a particular key, or you can read through all the keys in whatever order the database chooses to return them in (which may change if the database is modified, and may or may not be "random" enough for whatever you want to do).

You could read through all the keys and pick one, but that would require reading the entire database each time (or at least a sizable chunk of it), and that's probably too slow.

I think you'll need to rearrange your data structure.

  1. You could use a real SQL database (like SQLite), so you could look up rows both by a sequential row number and by URL. This would be the most flexible.

  2. You could use a sequential integer as the key for your DBM file. That would make picking a random one easy, but you could no longer look up entries by URL.

  3. You could use two DBM files: the one you have now and a second keyed by sequential integer with the URL as value. (Actually, since URLs don't look like integers, you could store both sets of records in the same DBM file, but that would complicate any code that uses each.) This would use twice the disk space, and would make inserting/removing entries a bit more complicated. You'd probably be better off with approach #1, unless you can't install SQLite for some reason.
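As a rough illustration of approach 3, here is a minimal sketch with two DBM files. It assumes SDBM_File (bundled with Perl) as the DBM back end; the file names, the reserved `count` key, and the helpers `add_url` and `random_url` are made up for the example and are not part of any DBM API.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # assumption: any DBM module tied the same way would do
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Two DBM files: URL => data (the existing one) and integer => URL (the new index).
tie my %by_url, 'SDBM_File', "$dir/urls",  O_RDWR | O_CREAT, 0666 or die "urls: $!";
tie my %by_num, 'SDBM_File', "$dir/index", O_RDWR | O_CREAT, 0666 or die "index: $!";

# Hypothetical helper: store the URL under the next free integer key.
sub add_url {
    my ($url, $data) = @_;
    return if exists $by_url{$url};      # already indexed
    my $n = $by_num{count} // 0;         # 'count' is a reserved, non-integer key
    $by_num{$n}    = $url;
    $by_num{count} = $n + 1;
    $by_url{$url}  = $data;
}

# Hypothetical helper: one integer lookup, no scan over the keys.
sub random_url {
    my $n = $by_num{count} or return;    # empty index: nothing to pick
    return $by_num{ int rand $n };
}

add_url("http://example.com/$_", "page $_") for 1 .. 5;
my $url = random_url();
print "$url => $by_url{$url}\n";
```

Removing a URL is the part that gets more complicated: to keep the integers contiguous, one option is to move the last entry into the freed slot and decrement the count.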

萌无敌 2024-08-26 09:07:21

Picking a random element from an array is simpler, so you can use keys(%foo) to get the array of keys and pick randomly from that.

I believe this will return a random element $x from an array:

$x = $array[rand @array];

If you want to shuffle an array, consider List::Util::shuffle. See http://search.cpan.org/perldoc/List::Util#shuffle_LIST
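Put together, the two suggestions might look like the sketch below. The hash %foo here is an ordinary in-memory stand-in filled with made-up URLs; with a tied DBM hash, keys would still pull every key into memory, which is what the question's note rules out for a very large file.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Stand-in data; in practice %foo would be the tied DBM hash.
my %foo = map { ("http://example.com/$_" => 1) } 1 .. 10;

my @keys  = keys %foo;              # loads every key into memory
my $x     = $keys[ rand @keys ];    # the fractional index is truncated to an integer
my @order = shuffle @keys;          # random permutation, to visit each key once

print "picked $x\n";
```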

Of course, it is possible. First, get a list of the keys. Then, randomize the list, using shuffle from List::Util.

Then, loop over the keys.

If there are too many keys (so keeping them all in a list and shuffling is not possible), just remember that you are using tied hashes. Just use each to iterate over key value pairs.

The order will be deterministic but AFAIK, it will not be alphabetical or order of insertion. That, by itself, might be able to get you what you want.
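If the fixed iteration order is not random enough, one possible way to combine each with a random choice is single-pass reservoir sampling, which keeps only one key in memory at a time. This is a sketch of a swapped-in technique, not something the answer itself proposes; it assumes SDBM_File as the DBM back end, and `pick_random_key` is a hypothetical helper. It still walks the whole file on every pick, so it saves memory, not time.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # assumption: stands in for whichever DBM module is in use
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
tie my %urls, 'SDBM_File', "$dir/urls", O_RDWR | O_CREAT, 0666 or die "tie: $!";
$urls{"http://example.com/$_"} = 1 for 1 .. 100;    # stand-in data

# Hypothetical helper: the i-th key seen replaces the current pick with
# probability 1/i, so after one pass every key is equally likely.
sub pick_random_key {
    my ($hash) = @_;
    my ($pick, $seen) = (undef, 0);
    keys %$hash;                              # reset the each() iterator
    while (my ($key) = each %$hash) {
        $pick = $key if rand(++$seen) < 1;
    }
    return $pick;
}

print pick_random_key(\%urls), "\n";
```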

迎风吟唱 2024-08-26 09:07:21

You could use DBM::Deep instead of a traditional DB file to keep your data.

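use DBM::Deep;

tie my %hash, "DBM::Deep", {
    file      => "foo.db",
    locking   => 1,
    autoflush => 1,
};

# $hash{keys} = [ ... ]    an array holding all the URLs
# $hash{urls} = { ... }    <- same as your current DB file.

my $like_old = $hash{urls};                  # a ref to a hash you can use like your old hash.
my $count    = @{ $hash{keys} };             # number of URLs in the array
my $random   = $hash{keys}[ rand $count ];   # pick one URL by random index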

With that you can pull out random values as needed.
