agrepl: what does the 10% threshold mean?

Posted on 2025-02-02 10:55:57

I used agrepl for fuzzy matching between two sets of addresses. The documentation says that the default is:

If cost is not given, all defaults to 10%, and the other
transformation number bounds default to all. The component names can
be abbreviated.

However, reading this Q&A and its example, that doesn't seem to match up. Here is that example:

agrepl("cold", "cool")
#> [1] FALSE
agrepl("cool", "cold")
#> [1] TRUE

From the description, I'd imagine that calculating the 10% would be having 1 change in a 10 letter word, but this is 1 in 4. How exactly is this calculated?

Comments (1)

江南月 2025-02-09 10:55:57

This is admittedly very confusing (at least to me!), but here's my attempt to explain it. The linked answer says:

The default maximum amount of transformations for a pattern of length 4 is 1.

How do we get from 0.1 (cost) × 4 (pattern length) to 1? Well, ?agrepl notes that max.distance is expressed

as a fraction of the pattern length times
the maximal transformation cost (will be replaced by the
smallest integer not less than the corresponding fraction)

(emphasis added); I take the parenthetical clause to mean that the maximum number of transformations is ceiling(0.1*4) = 1. We would need a pattern with length ≥ 11 in order for ceiling(0.1*pattern_length) to increase from 1 to 2 ...
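
To make the rounding concrete, here is a minimal sketch in R of the ceiling rule as I read it from the documentation. `max_edits` is a hypothetical helper, not part of R, and it assumes the default case in which every transformation costs 1.

```r
# Hypothetical helper (not part of base R): the edit budget implied by a
# fractional max.distance, assuming each transformation has cost 1.
max_edits <- function(pattern, frac = 0.1) {
  ceiling(frac * nchar(pattern))
}

max_edits("cool")          # 4 letters:  ceiling(0.4) is 1
max_edits("abcdefghij")    # 10 letters: ceiling(1.0) is 1
max_edits("abcdefghijk")   # 11 letters: ceiling(1.1) is 2
```

So for short patterns the "10%" default behaves like "at most one edit", which is why a 4-letter pattern still tolerates one change.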

If you want to find out where this is actually implemented, you have to dig fairly deep into the C source code, i.e. lines 59-60 of agrep.c, in the amatch_regparams function, where we see

if(bound < 1) bound *= (patlen * max_cost);
params->max_cost = IntegerFromReal(ceil(bound), &warn);
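
As a rough illustration of those two C lines, the same logic can be sketched in R. `edit_budget` is a hypothetical name of my own, not R's actual code, and the default `max_cost = 1` is an assumption that all transformation costs are left at their defaults.

```r
# Hypothetical R re-expression of the two C lines above (not R's actual code).
# bound:    the max.distance value; treated as a fraction only when it is < 1
# patlen:   number of characters in the pattern
# max_cost: the maximal transformation cost (assumed to be 1 here)
edit_budget <- function(bound, patlen, max_cost = 1) {
  if (bound < 1) bound <- bound * patlen * max_cost
  as.integer(ceiling(bound))
}

edit_budget(0.1, nchar("cool"))   # fractional bound: ceiling(0.4) is 1
edit_budget(2,   nchar("cool"))   # a bound >= 1 is used as an absolute count: 2

# The same distinction through agrepl's max.distance argument:
agrepl("cold", "cool")                     # default 0.1, i.e. a budget of 1
agrepl("cold", "cool", max.distance = 2)   # absolute budget of 2
```

If the default turns out to be too loose or too strict for address matching, passing an integer `max.distance` avoids the fraction-and-ceiling step altogether.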