计算函数合理性的算法/蒙特卡罗方法

发布于 2024-12-04 18:20:09 字数 3691 浏览 1 评论 0原文

我正在编写一个程序,尝试复制本文开头讨论的算法,

http: //www-stat.stanford.edu/~cgates/PERSI/papers/MCMCRev.pdf

F 是一个从 char 到 char 的函数。假设 Pl(f) 是该函数的“合理性”度量。该算法是:

从对函数(例如 f)的初步猜测开始,然后是新函数 f* -

  • 计算 Pl(f)。
  • 通过对 f 分配给两个符号的值进行随机转置来更改为 f*。
  • 计算 Pl(f*);如果大于 Pl(f),则接受 f*。
  • 如果没有,则抛一枚 Pl(f)/Pl(f*) 硬币;如果出现正面,请接受 f*。
  • 如果抛硬币反面朝上,则停留在 f 处。

我正在使用以下代码来实现这一点。我正在使用 c#,但试图让它对每个人来说都更加简化。如果有更好的论坛,请告诉我。

 var current_f = Initial();    // current accepted function f
 var current_Pl_f = InitialPl();  // current plausibility of accepted function f

 for (int i = 0; i < 10000; i++)
        {
            var candidate_f = Transpose(current_f); // create a candidate function

            var candidate_Pl_f = ComputePl(candidate_f);  // compute its plausibility

            if (candidate_Pl_f > current_Pl_f)            // candidate Pl has improved
            {
                current_f = candidate_f;            // accept the candidate
                current_Pl_f = candidate_Pl_f; 
            }
            else                                    // otherwise flip a coin
            {
                int flip = Flip(); 

                if (flip == 1)                      // heads
                {
                    current_f = candidate_f;        // accept it anyway
                    current_Pl_f = candidate_Pl_f; 
                }
                else if (flip == 0)                 // tails
                {
                    // what to do here ?
                }
            }
        }

我的问题基本上是这看起来是否是实现该算法的最佳方法。尽管实现了这种方法,但似乎我可能会陷入一些局部最大值/局部最小值。

编辑 - 这是 Transpose() 方法背后的本质。我使用 << 类型的字典/哈希表字符,字符>>候选函数用于查找任何给定的字符 ->字符转换。因此转置方法只是交换字典中决定函数行为的两个值。

    private Dictionary<char, char> Transpose(Dictionary<char, char> map, params int[] indices)
    {
        foreach (var index in indices)
        {
            char target_val = map.ElementAt(index).Value;   // get the value at the index

            char target_key = map.ElementAt(index).Key;     // get the key at the index

            int _rand = _random.Next(map.Count);   // get a random key (char) to swap with

            char rand_key = map.ElementAt(_rand).Key;

            char source_val = map[rand_key]; // the value that currently is used by the source of the swap

            map[target_key] = source_val; // make the swap

            map[rand_key] = target_val;
        }

        return map; 
    }

请记住,使用底层字典的候选函数基本上只是:

public char GetChar(char in, Dictionary<char, char> theMap)
{
     return theMap[char]; 
}

这是计算 Pl(f) 的函数:

    public decimal ComputePl(Func<char, char> candidate, string encrypted, decimal[][] _matrix)
    {
        decimal product = default(decimal);

        for (int i = 0; i < encrypted.Length; i++)
        {
            int j = i + 1;

            if (j >= encrypted.Length)
            {
                break;
            }

            char a = candidate(encrypted[i]);
            char b = candidate(encrypted[j]);

            int _a = GetIndex(_alphabet, a); // _alphabet is just a string/char[] of all avl chars 
            int _b = GetIndex(_alphabet, b);

            decimal _freq = _matrix[_a][_b]; 

            if (product == default(decimal))
            {
                product = _freq;
            }
            else
            {
                product = product * _freq;
            }
        }

        return product;
    }

I am writing a program that attempts to duplicate the algorithm discussed at the beginning of this article,

http://www-stat.stanford.edu/~cgates/PERSI/papers/MCMCRev.pdf

F is a function from char to char. Assume that Pl(f) is a 'plausibility' measure of that function. The algorithm is:

Starting with a preliminary guess at the function, say f, and a then new function f* --

  • Compute Pl(f).
  • Change to f* by making a random transposition of the values f assigns to two symbols.
  • Compute Pl(f*); if this is larger than Pl(f), accept f*.
  • If not, flip a Pl(f)/Pl(f*) coin; if it comes up heads, accept f*.
  • If the coin toss comes up tails, stay at f.

I am implementing this using the following code. I'm using c# but tried to make it somewhat more simplified for everyone. If there is a better forum for this please let me know.

 var current_f = Initial();    // current accepted function f
 var current_Pl_f = InitialPl();  // current plausibility of accepted function f

 for (int i = 0; i < 10000; i++)
        {
            var candidate_f = Transpose(current_f); // create a candidate function

            var candidate_Pl_f = ComputePl(candidate_f);  // compute its plausibility

            if (candidate_Pl_f > current_Pl_f)            // candidate Pl has improved
            {
                current_f = candidate_f;            // accept the candidate
                current_Pl_f = candidate_Pl_f; 
            }
            else                                    // otherwise flip a coin
            {
                int flip = Flip(); 

                if (flip == 1)                      // heads
                {
                    current_f = candidate_f;        // accept it anyway
                    current_Pl_f = candidate_Pl_f; 
                }
                else if (flip == 0)                 // tails
                {
                    // what to do here ?
                }
            }
        }

My question is basically whether this looks like the optimal approach to implementing that algorithm. IT seems like I may be getting stuck in some local maxima / local minima despite implementing this method.

EDIT - Here is the essentially whats behind Transpose() method. I use a dictionary / hash table of type << char, char >> that the candidate function(s) use to look any given char -> char transformation. So the transpose method simply swaps two values in the dictionary that dictates the behavior of the function.

    private Dictionary<char, char> Transpose(Dictionary<char, char> map, params int[] indices)
    {
        foreach (var index in indices)
        {
            char target_val = map.ElementAt(index).Value;   // get the value at the index

            char target_key = map.ElementAt(index).Key;     // get the key at the index

            int _rand = _random.Next(map.Count);   // get a random key (char) to swap with

            char rand_key = map.ElementAt(_rand).Key;

            char source_val = map[rand_key]; // the value that currently is used by the source of the swap

            map[target_key] = source_val; // make the swap

            map[rand_key] = target_val;
        }

        return map; 
    }

Keep in mind that a candidate function that uses the underlying dictionary is basically just:

public char GetChar(char in, Dictionary<char, char> theMap)
{
     return theMap[char]; 
}

And this is the function that computes Pl(f):

    public decimal ComputePl(Func<char, char> candidate, string encrypted, decimal[][] _matrix)
    {
        decimal product = default(decimal);

        for (int i = 0; i < encrypted.Length; i++)
        {
            int j = i + 1;

            if (j >= encrypted.Length)
            {
                break;
            }

            char a = candidate(encrypted[i]);
            char b = candidate(encrypted[j]);

            int _a = GetIndex(_alphabet, a); // _alphabet is just a string/char[] of all avl chars 
            int _b = GetIndex(_alphabet, b);

            decimal _freq = _matrix[_a][_b]; 

            if (product == default(decimal))
            {
                product = _freq;
            }
            else
            {
                product = product * _freq;
            }
        }

        return product;
    }

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

素衣风尘叹 2024-12-11 18:20:09

暂时,codereview.stackexchange.com 可能是一个“更好的论坛”。
尽管如此,我还是会快速浏览一下:

  • 乍一看,显示的代码片段是算法的正确实现。
  • 算法是否会“卡在”局部极小值是一个与算法有关的问题,而不是与实现有关的问题。 (参见下面的讨论)
  • 您对“最佳方法”的追求似乎是针对算法中的调整(与原始算法的偏差),而不是实现中的调整(使其更快和/或消除一些可能的缺陷)。有关算法的注意事项,请参阅下面的讨论;对于有关实施的讨论,请考虑以下内容:
    • 确保 Flip() 方法是公平的
    • 确保 ComputePl() 正确:由于值函数中的算术精度问题,经常会出现一些错误。
    • 使用 Transpose() 方法确保公平性(等概率)。
    • 性能改进可能来自 ComputePl() 方法(未显示)中的优化,而不是主循环中的优化。

关于算法本身及其对不同问题的适用性的讨论。
简而言之,该算法是一种引导随机搜索,其中使用两个随机设备对[巨大]解空间进行采样:Transpose() 方法(每次非常轻微地修改)当前候选函数和 Flip() 方法决定[局部]次优解决方案是否应该生存。搜索由目标函数 ComputePl() 引导,该函数本身基于某个参考语料库中的一阶转换矩阵。

在这种情况下,可以通过增加选择“次优”函数的概率来避免局部最小值“焦油坑”:而不是公平的 50-50 Flip(),可以尝试保留 66% 的“次优”解决方案的概率或甚至75%。这种方法通常会延长收敛到最佳解决方案所需的代数,但是,如上所述,可以避免陷入局部最小值。

确保算法适用性的另一种方法是确保更好地评估给定函数的合理性。该算法相对成功和通用性的可能解释是

  • 英语中一阶转换的分布不是很均匀(因此通过奖励匹配的函数和惩罚匹配的函数,可以很好地模拟给定函数的合理性)它的不匹配)。这种多维统计数据比给定语言/语料库中字符的“0 阶”分布更能适应偏离参考语料库的情况。
  • 一阶转换的分布虽然是特定于语言的,但在不同语言和/或行话词汇的上下文中通常是相似的[基于所述语言]。在速记行话的情况下,事情确实会崩溃,因为元音等字母通常会被省略。

因此,为了提高算法对给定问题的适用性,请确保所使用的分布矩阵与底层文本的语言和领域尽可能匹配。

Tentatively, codereview.stackexchange.com may be a "better forum for this".
Never the less, I'll take a quick stab at it:

  • At a glance, the snippet shown is a correct implementation of the algorithm.
  • Whether the algorithm will "get stuck" in local minima is a issue pertaining to the algorithm not to the implementation. (see discussion below)
  • Your quest for an "optimal approach" seems to be directed at tweaks in the algorithm (deviation from the original algorithm) rather than at tweaks in the implementation (to make it faster and/or to eradicate some possible flaws). For considerations regarding the algorithm, see discussion below; for discussion regarding the implementation consider the following:
    • ensure the Flip() method is fair
    • ensure the ComputePl() is correct: some errors often arise because of issues with the arithmetic precision in the value function.
    • ensure fairness (equiprobability) with the Transpose() method.
    • Performance improvements would likely come from optimizations in the ComputePl() method (not shown) rather than in the main loop.

Discussion regarding the algorithm per-se and its applicability to different problems.
In a nutshell the algorithm is a guided stochastic search, whereby the [huge] solution space is sampled with two random devices: the Transpose() method which modifies (very slightly each time) the current candidate function and the Flip() method which decides whether a [locally] suboptimal solution should survive. The search is guided by an objective function, ComputePl() which is itself based on a matrix of first order transition in some reference corpus.

In this context, local minima "tar pits" can be avoided by augmenting the probability of selecting "suboptimal" functions: rather than a fair 50-50 Flip(), maybe try with a probablity of retaining "suboptimal" solutions of 66% or even 75%. This approach typically extend the number of generations necessary to converge toward the optimal solution, but, as said may avoid getting stuck in local minima.

Another way of ensuring the applicability of the algorithm is to ensure a better evaluation of the plausibility of given functions. Likely explanations for the relative success and generality of the algorithm are

  • The distribution of first order transitions in the English language is not very uniform (and hence models rather well the plausibility of a given function, by rewarding it for its matches and by punishing it for its mismatches). This kind of multi-dimensional statistic, is more resilient to departure from the reference corpus than say the "order 0" distribution of characters in a a given language/corpus.
  • The distribution of first order transitions while being language-specific, are generally similar across different languages and/or in the context of jargon vocabularies [based on said languages]. Things do break down in the case of short-hand jargons whereby letters such as vowals are typically omitted.

Therefore to improve the applicability of the algorithm to a given problem, ensure that the distribution matrix used matches as closely as possible the language and domain of the underlying text.

可遇━不可求 2024-12-11 18:20:09

从文章中的描述来看,您的实现似乎是正确的(您标记为“在这里做什么”的部分确实应该是什么)。

如果您遇到局部最大值问题(本文声称抛硬币应该避免这种情况),请确保您的 Initial、Transpose、ComputePl 和 Flip 的实现正确。

您还可以尝试使硬币抛掷有偏差(增加 Flip() == 1 的概率将使这更接近随机游走并且不易被卡住)。

这是代码的稍微严格的版本:

var current_f = Initial();    // current accepted function f
var current_Pl_f = ComputePl(current_f);  // current plausibility of accepted function f

for (int i = 0; i < 10000; i++)
{
    var candidate_f = Transpose(current_f); // create a candidate function
    var candidate_Pl_f = ComputePl(candidate_f);  // compute its plausibility

    if (candidate_Pl_f > current_Pl_f  ||  Flip() == 1)
    {
        // either candidate Pl has improved,
        // or it hasn't and we flipped the coin in candidate's favor
        //  - accept the candidate
        current_f = candidate_f;            
        current_Pl_f = candidate_Pl_f; 
    }
}

From the description in the article, your implementation seems correct (the part you mark as "what to do here" should indeed be nothing).

If you are having problems with local maxima (something the article claims the coin toss should avoid), make sure your implemenations of Initial, Transpose, ComputePl and Flip are correct.

You can also try making the coin tossed biased (increasing the probability of Flip() == 1 will make this closer to a random walk and less susceptible to getting stuck).

Here is a slightly tighter version of your code:

var current_f = Initial();    // current accepted function f
var current_Pl_f = ComputePl(current_f);  // current plausibility of accepted function f

for (int i = 0; i < 10000; i++)
{
    var candidate_f = Transpose(current_f); // create a candidate function
    var candidate_Pl_f = ComputePl(candidate_f);  // compute its plausibility

    if (candidate_Pl_f > current_Pl_f  ||  Flip() == 1)
    {
        // either candidate Pl has improved,
        // or it hasn't and we flipped the coin in candidate's favor
        //  - accept the candidate
        current_f = candidate_f;            
        current_Pl_f = candidate_Pl_f; 
    }
}
挖个坑埋了你 2024-12-11 18:20:09

该算法似乎与http://en.wikipedia.org/wiki/Simulated_annealing有关。如果是这种情况,那么通过改变您接受当前解决方案的较差替代方案的概率可能会有助于这种行为,特别是如果您随着时间的推移降低这种概率。

或者,您可以从多个随机起点尝试简单的爬山 - 永远不要接受较差的替代方案,这意味着您将更频繁地陷入局部最大值,但从不同的起点重复运行算法。

当您对此进行测试时,您通常会知道测试问题的正确答案。最好将正确答案的合理性值与您的算法得出的合理性值进行比较,以防万一合理性公式存在弱点,在这种情况下,您的算法将得出错误的答案,这些答案看起来比正确答案更合理。正确的一个。

This algorithm seems to be related to http://en.wikipedia.org/wiki/Simulated_annealing. If that is the case, the behaviour might be helped by changing the probability with which you accept poorer alternatives to the current solution, especially if you reduce this probability over time.

Alternatively, you could try simple hill-climbing from multiple random starts - never accept poorer alternatives, which means that you will get stuck in local maxima more often, but repeatedly run the algorithm from different starts.

When you test this out, you usually know the right answer for your test problem. It is a good idea to compare the plausibility value at the right answer with those your algorithm is coming up with, just in case the weakness is in the plausibility formula, in which case your algorithm will come up with wrong answers which appear more plausible than the correct one.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文