How do I compare String values in different languages in Java?

Posted on 2024-10-04 13:44:45

In my web application I use two different languages, English and Arabic.

My web application has a search box. When someone searches by a name or part of a name, it retrieves values from the DB by comparing the "Hometown" of the user.

Explanation:

For example, if a user whose hometown is "California" searches for the name "Victor", my query first selects the people who have the same hometown, "California". It then searches that list for "Victor" and retrieves the users who have "California" as their hometown and "victor" in their name or part of their name.

The problem: if the hometown "California" is saved in English, the comparison works and the values are retrieved. In Arabic, however, "California" is saved as "كاليفورنيا". In that case the hometown comparison fails and nothing is retrieved.

I want my query to recognize that both are the same hometown and retrieve the values. Is that possible?

What alternative should I consider for this comparison logic? I am confused. Any suggestions?

EDIT:
*I have an idea: once the hometown is known, is it possible to use Google Translate or a transliterator to convert it to the other language (to Arabic if it is in English, or to English if it is in Arabic) and return search results that combine both? Any suggestions?*

Comments (6)

℡寂寞咖啡 2024-10-11 13:44:46

How about using some localization on the client side to display the values? Or create a wrapper class for hometown that overrides equals(Object) so that the instance for California returns true for both "California" and "كاليفورنيا" (sorry if I made a mistake here, I just copy-pasted from above).
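A minimal sketch of the wrapper idea, assuming a hypothetical `Hometown` class that carries a language-neutral id plus every known localized spelling. One deviation from the suggestion above: making `equals(Object)` return true for a plain `String` would break the symmetry that `equals` is supposed to have (`"California".equals(hometown)` would still be false), so this sketch compares two `Hometown` objects by id and uses a separate `isCalled` method for names. The Arabic text is written with `\u` escapes only to avoid right-to-left rendering problems in source code.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical wrapper: one hometown that is known under several localized names. */
final class Hometown {
    private final String id;                          // language-neutral key (an assumption)
    private final Set<String> names = new HashSet<>(); // every known spelling, normalized

    Hometown(String id, String... localizedNames) {
        this.id = id;
        for (String n : localizedNames) {
            names.add(normalize(n));
        }
    }

    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    /** True if the given text is one of this hometown's known names. */
    boolean isCalled(String name) {
        return names.contains(normalize(name));
    }

    /** Two Hometown objects are equal when they share the same id. */
    @Override
    public boolean equals(Object o) {
        return o instanceof Hometown && ((Hometown) o).id.equals(id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }

    public static void main(String[] args) {
        // "\u0643\u0627\u0644\u064A\u0641\u0648\u0631\u0646\u064A\u0627" is the Arabic spelling from the question
        Hometown ca = new Hometown("california", "California",
                "\u0643\u0627\u0644\u064A\u0641\u0648\u0631\u0646\u064A\u0627");
        System.out.println(ca.isCalled("california"));
        System.out.println(ca.isCalled("\u0643\u0627\u0644\u064A\u0641\u0648\u0631\u0646\u064A\u0627"));
    }
}
```

Both calls in `main` report a match. The list of localized names still has to come from somewhere (a translation table, for instance), so this only moves the problem into data.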

瑶笙 2024-10-11 13:44:46

This sounds like a classic encoding problem. Whenever you transfer non-ASCII characters you need to make sure you are encoding them correctly. For Arabic and English I suspect you can use UTF-8 (but I don't know Arabic, so I may be wrong).

In your setup you will probably have the following points:

Browser <-> Servlet container <-> Database
                   |
                System.out

At every system interface where chars (16-bit) are converted to bytes (8-bit), you need to make sure the encoding is correct.
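As an illustration of what goes wrong at such an interface, here is a small sketch (the class and method names are made up for this example) that encodes the Arabic hometown to bytes and decodes it again, once with matching charsets and once with mismatched ones. It uses only `String.getBytes(Charset)` and `new String(byte[], Charset)` from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingDemo {
    /** Encodes s to bytes with one charset and decodes the bytes with another. */
    static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // Arabic spelling of "California", written with escapes to keep the source ASCII-only
        String hometown = "\u0643\u0627\u0644\u064A\u0641\u0648\u0631\u0646\u064A\u0627";

        // Same charset on both sides: the string survives unchanged.
        String ok = roundTrip(hometown, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        // UTF-8 bytes read as ISO-8859-1: each Arabic letter becomes two junk chars.
        String mojibake = roundTrip(hometown, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // Encoding into a charset without Arabic letters: every letter becomes '?'.
        String lost = roundTrip(hometown, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);

        System.out.println(hometown.equals(ok));       // true
        System.out.println(hometown.equals(mojibake)); // false
        System.out.println(lost);                      // ??????????
    }
}
```

A string damaged in either way will never compare equal to the value stored in the database, even when both sides "look" like the same word on screen.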

Browser to Servlet container

When you make GET or POST requests from a web page, the browser decides how to encode the data it sends by looking at the HTTP headers from the server, especially Content-Type: text/html; charset=UTF-8, which, if present, overrides the HTML meta header <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">.

On the servlet container side, HttpServletRequest.getParameter() decodes parameters using an encoding that you will most likely need to set in the server settings.

Example: Tomcat's server.xml

<Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8"
           maxThreads="2000"                
           connectionTimeout="20000" 
           redirectPort="8443" />

Servlet container to Database

The database needs to use the correct encodings, or sorting and comparison will not work properly.

Example: my.cnf for MySQL

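[mysqld]
 ....
init_connect='SET collation_connection = utf8_general_ci'
init_connect='SET NAMES utf8'
# Note: MySQL 5.5 and later no longer accept default-character-set under [mysqld];
# character-set-server below replaces it there.
default-character-set=utf8
character-set-server = utf8
collation-server = utf8_general_ci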

[mysql] 
 ....
default-character-set=utf8 

Then the JDBC-driver needs to be set for UTF-8.

Example JDBC connect string

jdbc:mysql://localhost:3306/rimario?useUnicode=true&characterEncoding=utf-8

System.out

System.out.println() cannot be relied upon to verify things. First, it depends on the Java VM's default encoding, set with the system property -Dfile.encoding=UTF-8. Second, the terminal in which you view System.out must be set to, and support, UTF-8. Don't trust System.out!

Once a String in the VM holds the correct characters, it is no longer affected by encoding. In memory, every char in a String is a 16-bit UTF-16 code unit, which covers (almost) all the characters UTF-8 can encode; the rest are stored as pairs of chars (surrogate pairs). To really know whether you got the correct characters in your VM, write the string to a file and inspect the file.
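A variation on the file check that avoids the terminal problem altogether: print the string's code points as plain ASCII hex, which looks the same whatever encoding the console uses. This is only a sketch; `CodePointDump` and `dump` are names made up for the example, while `String.codePoints()` is standard since Java 8.

```java
final class CodePointDump {
    /** Renders each code point of s as U+XXXX, separated by spaces. */
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // First two letters of the Arabic hometown: kaf, alif
        System.out.println(dump("\u0643\u0627"));   // U+0643 U+0627

        // A character outside the 16-bit range takes two chars but is one code point
        String emoji = "\uD83D\uDE00";
        System.out.println(emoji.length());         // 2
        System.out.println(dump(emoji));            // U+1F600
    }
}
```

If the dump of a request parameter shows values such as U+00D9 U+0083 where U+0643 was expected, the bytes were decoded with the wrong charset somewhere before they reached the VM.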

千纸鹤 2024-10-11 13:44:45

The problem you are running into is that you need information in two or more languages and you want the users of your application to be able to use both. One possible approach is to keep multiple records per item and include a language code as part of the primary key. For instance, if your record is

id   hometown   name
001  California Victor

you could introduce a language code and store

id   lang hometown   name
001  en   California Victor
001  ar   كاليفورنيا Victor

then your search would match either "California" or "كاليفورنيا", giving you the id 001, which you can then use to load all translations of your data (or just the data in the current output language). This scheme can be used with any number of languages and has the added advantage that you don't need to prefill the table. You can add new translations for records when they become known.

(Caveat: I just repeated your Arabic string, which I can't read. 'ar' is the ISO 639-1 language code for Arabic.)
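The lookup described here can be sketched in plain Java with an in-memory list standing in for the table. Everything in it (`HometownSearch`, `Row`, the method names) is hypothetical; in a real application the two steps would be SQL queries against the (id, lang) keyed table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class HometownSearch {
    /** One row of the hypothetical (id, lang, hometown, name) table. */
    static final class Row {
        final String id, lang, hometown, name;
        Row(String id, String lang, String hometown, String name) {
            this.id = id; this.lang = lang; this.hometown = hometown; this.name = name;
        }
    }

    /** Step 1: ids of all records whose hometown matches in ANY language. */
    static Set<String> idsForHometown(List<Row> rows, String hometown) {
        Set<String> ids = new HashSet<>();
        for (Row r : rows) {
            if (r.hometown.equalsIgnoreCase(hometown)) {
                ids.add(r.id);
            }
        }
        return ids;
    }

    /** Step 2: rows for those ids in the output language whose name contains namePart. */
    static List<Row> search(List<Row> rows, String hometown, String namePart, String outputLang) {
        Set<String> ids = idsForHometown(rows, hometown);
        String needle = namePart.toLowerCase(Locale.ROOT);
        List<Row> hits = new ArrayList<>();
        for (Row r : rows) {
            if (ids.contains(r.id) && r.lang.equals(outputLang)
                    && r.name.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(r);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String ar = "\u0643\u0627\u0644\u064A\u0641\u0648\u0631\u0646\u064A\u0627"; // Arabic "California"
        List<Row> rows = Arrays.asList(
                new Row("001", "en", "California", "Victor"),
                new Row("001", "ar", ar, "Victor"));
        // Searching with the Arabic hometown still finds the English row for id 001.
        for (Row r : search(rows, ar, "vic", "en")) {
            System.out.println(r.id + " " + r.hometown + " " + r.name); // 001 California Victor
        }
    }
}
```

A record that has no row in the requested output language is not returned by this sketch, which mirrors the point that translations can be added as they become known.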

芸娘子的小脾气 2024-10-11 13:44:45

Does the Arabic sound like "California"? If so, you will need to compare on a "sounds-like" basis, which will most likely involve a phoneme conversion.
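To make the "sounds-like" idea concrete, here is a deliberately tiny toy. It is not a real transliteration scheme: the letter table covers only the eight Arabic letters that occur in the hometown from the question, the vowel choices are guesses made so that this one example works, and the only English rule is "c becomes k". All names are invented for the sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class SoundsLike {
    // Toy table (an assumption, far from complete): Arabic letter -> rough Latin sound.
    private static final Map<Character, String> ARABIC = new HashMap<>();
    static {
        ARABIC.put('\u0643', "k"); // kaf
        ARABIC.put('\u0627', "a"); // alif
        ARABIC.put('\u0644', "l"); // lam
        ARABIC.put('\u064A', "i"); // ya
        ARABIC.put('\u0641', "f"); // fa
        ARABIC.put('\u0648', "o"); // waw
        ARABIC.put('\u0631', "r"); // ra
        ARABIC.put('\u0646', "n"); // nun
    }

    /** Reduces a name in either script to a crude lower-case Latin sound key. */
    static String key(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            String mapped = ARABIC.get(c);
            if (mapped != null) {
                sb.append(mapped);
            } else if (c == 'c') {
                sb.append('k');          // crude rule: treat every c as a hard k
            } else if (c >= 'a' && c <= 'z') {
                sb.append(c);
            }                            // anything else is dropped
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ar = "\u0643\u0627\u0644\u064A\u0641\u0648\u0631\u0646\u064A\u0627";
        System.out.println(key("California")); // kalifornia
        System.out.println(key(ar));           // kalifornia
    }
}
```

Real Arabic text usually omits short vowels and the same letter can stand for several sounds, so exact key equality will rarely work in practice; an approximate comparison of the keys is the more realistic goal.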

青春有你 2024-10-11 13:44:45

Transliterate all names into the same language (e.g. English) for searching, and use Levenshtein edit distance to compute the similarity between the phonetic representations of the names. This will be slow if you simply compare your query with every name, but if you pre-index all of the place names in your database into a Burkhard-Keller tree (BK-tree), they can be searched efficiently by edit distance from the query term.

This technique allows you to sort names by how closely they actually match. You're probably more likely to find a match this way than with Metaphone or Double Metaphone, though it is more difficult to implement.
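A compact sketch of both pieces, assuming the names have already been transliterated into Latin letters. `BkTree` is a name made up for this example: `distance` is the standard dynamic-programming Levenshtein distance, and the tree stores each child under its distance to the parent so that a search only has to visit children whose key lies within `maxDist` of the query's distance to the current node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BkTree {
    private final String word;
    private final Map<Integer, BkTree> children = new HashMap<>();

    BkTree(String word) {
        this.word = word;
    }

    /** Levenshtein distance, computed with two rolling rows. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    /** Inserts w, descending along the edge labelled with its distance to each node. */
    void add(String w) {
        int d = distance(w, word);
        if (d == 0) {
            return; // already present
        }
        BkTree child = children.get(d);
        if (child == null) {
            children.put(d, new BkTree(w));
        } else {
            child.add(w);
        }
    }

    /** Collects every stored word within maxDist edits of query. */
    void search(String query, int maxDist, List<String> out) {
        int d = distance(query, word);
        if (d <= maxDist) {
            out.add(word);
        }
        // Triangle inequality: matches can only sit under edges in [d - maxDist, d + maxDist].
        for (int k = Math.max(1, d - maxDist); k <= d + maxDist; k++) {
            BkTree child = children.get(k);
            if (child != null) {
                child.search(query, maxDist, out);
            }
        }
    }

    public static void main(String[] args) {
        BkTree tree = new BkTree("kalifornia");
        tree.add("texas");
        tree.add("florida");
        tree.add("kalifornya");
        List<String> hits = new ArrayList<>();
        tree.search("california", 2, hits);
        System.out.println(hits); // [kalifornia, kalifornya]
    }
}
```

Sorting the hits by their distance to the query gives the "closest match first" ordering mentioned above.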

我不在是我 2024-10-11 13:44:45

Your Google suggestion sounds like it might also be a good one, but you should play around with it, and be sure that you're happy with its accuracy. In testing how it worked going between Hebrew and English, I noticed that sometimes Google just leaves English place names in English letters when translating to Hebrew.
