How to detect the language of user-entered text?

Posted on 2024-09-08 22:15:55

Comments (7)

挽心 2024-09-15 22:15:55

This Language Detection Library for Java should give more than 99% accuracy for 53 languages.

Alternatively, there is Apache Tika, a library for content analysis that offers much more than just language detection.

骷髅 2024-09-15 22:15:55

Google offers an API that can do this for you. I just stumbled across this yesterday and didn't keep a link, but if you, umm, Google for it you should manage to find it.

This was somewhere near the description of their translation API, which will translate text for you into any language you like. There's another call just for guessing the input language.

Google is among the world's leaders in machine translation; they base their stuff on extremely large corpora of text (most of the Internet, kinda) and a statistical approach that usually "gets" it right simply by virtue of having a huge sample space.

EDIT: Here's the link: http://code.google.com/apis/ajaxlanguage/

EDIT 2: If you insist on "offline": a well-upvoted answer suggested Guess-Language. It's a C++ library and handles about 60 languages.
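
To make the "statistical approach" a bit more concrete for the offline case, here is a minimal sketch of one common technique: comparing character trigram frequencies of the input against per-language profiles. This is a toy illustration, not how Google's API or Guess-Language is actually implemented. The class name TrigramLanguageGuesser, its methods, and the tiny sample sentences are all made up for this example; a real detector would train on far larger corpora.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Toy character-trigram language guesser (hypothetical example class). */
final class TrigramLanguageGuesser {

    // Language code -> relative trigram frequencies learned from sample text.
    private final Map<String, Map<String, Double>> profiles = new LinkedHashMap<>();

    /** Counts overlapping character trigrams and normalizes them to frequencies. */
    static Map<String, Double> trigramFrequencies(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
        String padded = " " + cleaned + " ";
        Map<String, Double> freq = new HashMap<>();
        int total = 0;
        for (int i = 0; i + 3 <= padded.length(); i++) {
            freq.merge(padded.substring(i, i + 3), 1.0, Double::sum);
            total++;
        }
        if (total > 0) {
            final double n = total;
            freq.replaceAll((gram, count) -> count / n);
        }
        return freq;
    }

    /** Builds (or replaces) the profile for one language from sample text. */
    void train(String language, String sampleText) {
        profiles.put(language, trigramFrequencies(sampleText));
    }

    /** Returns the language with the highest cosine similarity, or "unknown". */
    String guess(String text) {
        Map<String, Double> input = trigramFrequencies(text);
        String best = "unknown";
        double bestScore = 0.0;
        for (Map.Entry<String, Map<String, Double>> e : profiles.entrySet()) {
            double score = cosine(input, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    private static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Illustrative profiles only; the sample sentences are assumptions. */
    static TrigramLanguageGuesser withToyProfiles() {
        TrigramLanguageGuesser g = new TrigramLanguageGuesser();
        g.train("en", "the quick brown fox jumps over the lazy dog and this is the language that we want to detect");
        g.train("de", "der schnelle braune fuchs springt auf den faulen hund und das ist die sprache die wir erkennen wollen");
        g.train("fr", "le renard brun rapide saute sur le chien paresseux et voici la langue que nous voulons trouver");
        return g;
    }

    public static void main(String[] args) {
        TrigramLanguageGuesser guesser = withToyProfiles();
        System.out.println(guesser.guess("this is the dog"));
        System.out.println(guesser.guess("das ist der hund"));
    }
}
```

With profiles this small the guess is only reliable for text that resembles the training sentences; the point made above about a huge sample space is exactly what turns this idea into something usable.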

天赋异禀 2024-09-15 22:15:55

An alternative is JLangDetect, but it's not very robust and has a limited language base. The good thing is that it's under an Apache license, so if it satisfies your requirements, you can use it.

In version 0.4 it is very robust. I have been using this in many projects of my own and never had any major problems. Also, when it comes to speed it is comparable to very specialized language detectors (e.g., few languages only).

情独悲 2024-09-15 22:15:55

The Detect Language API also provides a Java client.

Example:

List<Result> results = DetectLanguage.detect("Hello world");

Result result = results.get(0);

System.out.println("Language: " + result.language);
System.out.println("Is reliable: " + result.reliable);
System.out.println("Confidence: " + result.confidence);

北音执念 2024-09-15 22:15:55

Here is another option: Language Detection Library for Java

It is a library written in Java.

九八野马 2024-09-15 22:15:55
Here is working code based on the existing solution from Cybozu Labs:

package com.et.generate;

import java.util.ArrayList;
import com.cybozu.labs.langdetect.Detector;
import com.cybozu.labs.langdetect.DetectorFactory;
import com.cybozu.labs.langdetect.LangDetectException;
import com.cybozu.labs.langdetect.Language;

public class LanguageCodeDetection {

    // Load the language profiles once, before creating any detector.
    public void init(String profileDirectory) throws LangDetectException {
        DetectorFactory.loadProfile(profileDirectory);
    }

    // Return the single most likely language code for the text.
    public String detect(String text) throws LangDetectException {
        Detector detector = DetectorFactory.create();
        detector.append(text);
        return detector.detect();
    }

    // Return all candidate languages together with their probabilities.
    public ArrayList<Language> detectLangs(String text) throws LangDetectException {
        Detector detector = DetectorFactory.create();
        detector.append(text);
        return detector.getProbabilities();
    }

    public static void main(String[] args) {
        try {
            LanguageCodeDetection ld = new LanguageCodeDetection();

            // Directory containing the unpacked profile files.
            String profileDirectory = "C:/profiles/";
            ld.init(profileDirectory);
            String text = "Кремль россий";
            System.out.println(ld.detectLangs(text));
            System.out.println(ld.detect(text));
        } catch (LangDetectException e) {
            e.printStackTrace();
        }
    }

}

Output:
[ru:0.9999983255911719]
ru

Profiles can be downloaded from:
https://language-detection.googlecode.com/files/langdetect-09-13-2011.zip
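
A probability list like the one printed in the output can also be used to reject uncertain guesses, which tends to matter for short user input. The following is only a sketch of that idea: the Candidate class is a hypothetical stand-in for a (language code, probability) pair so that the example runs without the library, and the 0.9 threshold is an arbitrary assumption, not a recommended value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Accepts the top-ranked language only when its probability is high enough. */
final class ConfidentLanguagePicker {

    /** Hypothetical stand-in for a (language code, probability) pair. */
    static final class Candidate {
        final String lang;
        final double prob;

        Candidate(String lang, double prob) {
            this.lang = lang;
            this.prob = prob;
        }
    }

    /** Returns the most probable language if it reaches minProb, else empty. */
    static Optional<String> pick(List<Candidate> candidates, double minProb) {
        Candidate best = null;
        for (Candidate c : candidates) {
            if (best == null || c.prob > best.prob) {
                best = c;
            }
        }
        return (best != null && best.prob >= minProb)
                ? Optional.of(best.lang)
                : Optional.empty();
    }

    public static void main(String[] args) {
        // Probability taken from the output shown in the answer.
        List<Candidate> clear = Arrays.asList(new Candidate("ru", 0.9999983255911719));
        // Made-up values illustrating an ambiguous result.
        List<Candidate> ambiguous = Arrays.asList(
                new Candidate("no", 0.57), new Candidate("da", 0.43));

        System.out.println(pick(clear, 0.9).orElse("unknown"));
        System.out.println(pick(ambiguous, 0.9).orElse("unknown"));
    }
}
```

When the result is empty, the application can fall back to a default language or ask the user, rather than acting on a weak guess.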
