BLEU Score Implementation for Sentence Similarity Detection
I need to calculate a BLEU score to identify whether two sentences are similar or not. I have read some articles, but they are mostly about BLEU scores for measuring machine translation accuracy. I need a BLEU score to find the similarity between sentences in the same language, English (i.e., both sentences are in English). Thanks in anticipation.
For sentence-level comparisons, use smoothed BLEU
The standard BLEU score used for machine translation evaluation (BLEU:4) is only really meaningful at the corpus level, since any sentence that does not have at least one 4-gram match will be given a score of 0.
This happens because, at its core, BLEU is really just the geometric mean of n-gram precisions that is scaled by a brevity penalty to prevent very short sentences with some matching material from being given inappropriately high scores. Since the geometric mean is calculated by multiplying together all the terms to be included in the mean, having a zero for any of the n-gram counts results in the entire score being zero.
If you want to apply BLEU to individual sentences, you're better off using smoothed BLEU (Lin and Och 2004 - see sec. 4), whereby you add 1 to each of the n-gram counts before you calculate the n-gram precisions. This will prevent any of the n-gram precisions from being zero, and thus will result in non-zero values even when there are not any 4-gram matches.
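To make this concrete, here is a minimal sketch using NLTK (not mentioned in the answer itself, but a widely used implementation); `SmoothingFunction().method2` is NLTK's version of the add-one n-gram count smoothing described above. The example sentences are invented for demonstration.

```python
# A minimal sketch, assuming NLTK is installed (pip install nltk).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the cat sat on the mat".split()
candidate = "the cat is sitting on the mat".split()

# Without smoothing: there is no 4-gram match, so standard BLEU:4
# collapses to ~0 (NLTK also emits a warning about the zero count).
plain = sentence_bleu([reference], candidate)

# With add-one smoothing: every n-gram precision stays non-zero.
smooth = sentence_bleu([reference], candidate,
                       smoothing_function=SmoothingFunction().method2)

print(f"plain BLEU:  {plain:.4f}")
print(f"smooth BLEU: {smooth:.4f}")
```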
Java Implementation
You'll find a Java implementation of both BLEU and smooth BLEU in the Stanford machine translation package Phrasal.
Alternatives
As Andreas already mentioned, you might want to use an alternative scoring metric such as Levenshtein string edit distance. However, one problem with using the traditional Levenshtein string edit distance to compare sentences is that it isn't explicitly aware of word boundaries.
Other alternatives include:
Here you go: http://code.google.com/p/lingutil/
Well, if you just want to calculate the BLEU score, it's straightforward. Treat one sentence as the reference translation and the other as the candidate translation.
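A sketch of this idea, again assuming NLTK: since BLEU is asymmetric in reference and candidate, one reasonable hedge, shown in the hypothetical `bleu_similarity` helper below, is to score both directions and average.

```python
# A minimal sketch, assuming NLTK (pip install nltk). bleu_similarity is
# a hypothetical helper, not part of any library mentioned in this thread.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu_similarity(sent_a: str, sent_b: str) -> float:
    """Symmetrized sentence-level BLEU between two English sentences."""
    a, b = sent_a.split(), sent_b.split()
    smooth = SmoothingFunction().method2  # avoid zero n-gram precisions
    ab = sentence_bleu([a], b, smoothing_function=smooth)  # a as reference
    ba = sentence_bleu([b], a, smoothing_function=smooth)  # b as reference
    return (ab + ba) / 2

print(bleu_similarity("the weather is nice today",
                      "today the weather is nice"))
```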
Maybe the (Levenshtein) edit distance is also an option, or the Hamming distance. Either way, the BLEU score is also appropriate for the job; it measures the similarity of one sentence against a reference, which only makes sense when both are in the same language, as in your case.
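For illustration, here is a minimal word-level Levenshtein sketch (plain dynamic programming, no external libraries); operating on word tokens rather than characters also sidesteps the word-boundary problem raised in the earlier answer. Note that the Hamming distance only applies when both token sequences have equal length.

```python
# A minimal sketch of word-level Levenshtein (edit) distance.
def levenshtein(tokens_a, tokens_b):
    m, n = len(tokens_a), len(tokens_b)
    # dp[j] holds the edit distance between a prefix of tokens_a
    # and tokens_b[:j]; only one row is kept in memory at a time.
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cost = 0 if tokens_a[i - 1] == tokens_b[j - 1] else 1
            prev, dp[j] = dp[j], min(dp[j] + 1,      # deletion
                                     dp[j - 1] + 1,  # insertion
                                     prev + cost)    # substitution
    return dp[n]

print(levenshtein("the cat sat on the mat".split(),
                  "the cat is on the mat".split()))  # -> 1
```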
You can use the Moses multi-bleu script, which also supports multiple references: https://github.com/moses-smt/mosesdecoder/blob/RELEASE-2.1.1/scripts/generic/multi-bleu.perl
You are discouraged from implementing BLEU yourself; SacreBLEU is a standard implementation.
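For example, a minimal sketch with sacrebleu (pip install sacrebleu): `sentence_bleu` applies smoothing by default, so single-sentence scores are well defined, and `corpus_bleu` accepts multiple reference streams, analogous to the multi-bleu script mentioned above. The sentences are invented for demonstration.

```python
# A minimal sketch, assuming sacrebleu is installed (pip install sacrebleu).
import sacrebleu

hypothesis = "the cat is sitting on the mat"
references = ["the cat sat on the mat"]

# Sentence-level score; smoothing is applied by default.
score = sacrebleu.sentence_bleu(hypothesis, references)
print(score.score)  # BLEU on a 0-100 scale

# Corpus-level score with multiple reference streams.
corpus = sacrebleu.corpus_bleu(
    [hypothesis],
    [["the cat sat on the mat"], ["a cat was sitting on the mat"]],
)
print(corpus.score)
```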