How do I compare String values in different languages in Java?

Posted on 2024-10-04 13:44:45

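My web application uses two languages, English and Arabic.

The application has a search box. When someone searches by name or part of a name, it retrieves matching records from the database, restricted to people whose hometown is the same as the searching user's.

Explanation:

For example, if a user's hometown is "California" and he searches for a name such as "Victor", my query first selects the people whose hometown is "California", then searches the *name* within that list, and returns the people who have both "California" as hometown and "Victor" as name.

The problem: if the hometown "California" is stored in English, the comparison works and the values are retrieved. But "California" may also be stored in Arabic as "كاليفورنيا". In that case the hometown comparison fails and nothing is retrieved.

I want my query to recognize the two hometowns as the same and retrieve the values. Is that possible?

What alternative should I consider for this comparison logic? I am confused. Any suggestions?

Edit: *I have an idea: once the hometown is known, could I use Google Translate or a transliterator to convert it to the other language (to Arabic if it is English, or to English if it is Arabic) and return search results that combine both? Any suggestions?*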

℡寂寞咖啡 2024-10-11 13:44:46

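Could you use some localization on the client side just to display the values? Or create a wrapper class for hometown that overrides equals(Object), so that the California instance returns true for both "California" and "كاليفورنيا" (sorry if I got this wrong, I just copy-pasted it from above).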
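
A minimal sketch of such a wrapper, under some assumptions: the class name Hometown, the canonical key "US-CA" and the matches method are all hypothetical and not part of the question. Making equals(Object) return true for a raw String would break the symmetry that equals is expected to have, so this sketch keeps equals between Hometown instances and uses a separate matches(String) for the text comparison. The Arabic name is written with Unicode escapes so the example does not depend on the source file encoding.

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical wrapper: one hometown that is known under several spellings.
final class Hometown {
    private final String key;                        // assumed language-independent key
    private final Set<String> names = new HashSet<>();

    Hometown(String key, String... localizedNames) {
        this.key = key;
        for (String n : localizedNames) {
            names.add(normalize(n));
        }
    }

    // Trim, NFC-normalize and lower-case so equivalent spellings compare equal.
    private static String normalize(String s) {
        return Normalizer.normalize(s.trim(), Normalizer.Form.NFC).toLowerCase(Locale.ROOT);
    }

    // True if the text is one of the known names of this hometown.
    boolean matches(String text) {
        return text != null && names.contains(normalize(text));
    }

    // Two Hometown objects are equal when they share the same key.
    @Override
    public boolean equals(Object o) {
        return o instanceof Hometown && key.equals(((Hometown) o).key);
    }

    @Override
    public int hashCode() {
        return key.hashCode();
    }

    public static void main(String[] args) {
        // The escapes spell the Arabic name of California used in the question.
        String arabic = "\u0643\u0627\u0644\u064A\u0641\u0648\u0631\u0646\u064A\u0627";
        Hometown california = new Hometown("US-CA", "California", arabic);
        System.out.println(california.matches("california")); // true
        System.out.println(california.matches(arabic));       // true
        System.out.println(california.matches("Texas"));      // false
    }
}
```

The list of names still has to come from somewhere (a translation table, for instance), so this only moves the comparison into one place; it does not produce the translations.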

瑶笙 2024-10-11 13:44:46

This sounds like a classic encoding problem. Whenever you transfer non-ASCII characters, you need to make sure you are encoding them correctly. For Arabic and English I suspect you can use UTF-8 (but I don't know Arabic, so I may be wrong).

In your setup you will probably have the following points:

Browser <-> Servlet container <-> Database
                   |
                System.out

At every system interface where chars (16-bit) are converted to bytes (8-bit), you need to make sure the encoding is correct.
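
To make that concrete, here is a small sketch of what happens at such a boundary when the two sides agree or disagree on the charset. The class and method names are made up for the illustration; the escapes spell the Arabic hometown from the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingRoundTrip {
    // Encode text to bytes with one charset, then decode those bytes with another,
    // as two sides of an interface would when they are configured differently.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file pure ASCII.
        String arabic = "\u0643\u0627\u0644\u064A\u0641\u0648\u0631\u0646\u064A\u0627";

        // Same charset on both sides: the text survives.
        String ok = roundTrip(arabic, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(ok.equals(arabic));        // true

        // Written as UTF-8 but read as ISO-8859-1: 10 chars turn into 20 wrong ones.
        String mojibake = roundTrip(arabic, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(mojibake.equals(arabic));  // false
        System.out.println(mojibake.length());        // 20

        // Written with a charset that has no Arabic letters: each one becomes '?'.
        String lost = roundTrip(arabic, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        System.out.println(lost);                     // ??????????
    }
}
```

The last case is the usual origin of strings made only of question marks: the information is already gone at that point, so it has to be fixed where the bytes are produced.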

Browser to Servlet container

When you do GET or POST requests from a web page, the browser looks at the HTTP headers from the server, especially Content-Type: text/html; charset=UTF-8, which, if present, overrides the HTML meta header <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">.

On the servlet container side, HttpServletRequest.getParameter() decodes parameters with an encoding that you most likely need to set in the server settings.

Example: Tomcat's server.xml

<Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8"
           maxThreads="2000"                
           connectionTimeout="20000" 
           redirectPort="8443" />
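
To see why that setting matters, the following sketch percent-encodes the hometown the way a browser would in a query string and then decodes it with two different charsets. It uses java.net.URLEncoder and URLDecoder only as a stand-in for what the container does; it is not Tomcat's own code, and the class and method names are made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class QueryDecoding {
    // Percent-encode a parameter value using the given charset name.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Decode a percent-encoded value using the given charset name.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The escapes spell the Arabic hometown from the question.
        String hometown = "\u0643\u0627\u0644\u064A\u0641\u0648\u0631\u0646\u064A\u0627";

        String onTheWire = encode(hometown, "UTF-8");
        System.out.println(onTheWire);  // pure ASCII, starts with %D9%83

        // Decoding with the charset the browser used gives the text back.
        System.out.println(decode(onTheWire, "UTF-8").equals(hometown));       // true

        // Decoding with another charset gives a different string, so a database
        // comparison against the stored hometown would fail.
        System.out.println(decode(onTheWire, "ISO-8859-1").equals(hometown));  // false
    }
}
```

As far as I know, URIEncoding only covers the URI and its query string; for POST bodies the usual counterpart is calling request.setCharacterEncoding("UTF-8") before the first getParameter() call.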

Servlet container to Database

The database needs to have the correct encodings, or sorting etc. will not be right.

Example my.cnf for MySQL

[mysqld] 
 ....
init_connect='SET collation_connection = utf8_general_ci'
init_connect='SET NAMES utf8' 
default-character-set=utf8 
character-set-server = utf8 
collation-server = utf8_general_ci 

[mysql] 
 ....
default-character-set=utf8

Then the JDBC driver needs to be set for UTF-8.

Example JDBC connect string

jdbc:mysql://localhost:3306/rimario?useUnicode=true&characterEncoding=utf-8

System.out

System.out.println() cannot be relied upon to verify things. First, it depends on the Java VM default encoding, set with the system property -Dfile.encoding=UTF-8; second, the terminal in which you view System.out needs to be set to, and support, UTF-8. Don't trust System.out!

Once a String in the VM holds the proper characters, it is not affected by encoding. In memory every char in a String is 16-bit, which (almost) covers all the characters that UTF-8 can encode; the remaining ones are stored as two chars (a surrogate pair). To really know whether you got the correct chars in your VM, write the string to a file and inspect the file.
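
One way to do that check without trusting the console is to print only ASCII: the code points of the String, and the hex bytes of a file written explicitly as UTF-8. This is a sketch with made-up helper names, not a complete diagnostic tool.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class StringInspector {
    // Render each code point as U+XXXX; the output is ASCII, so any console shows it.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    // Write the string to a temporary file as UTF-8 and return the bytes read back.
    static byte[] utf8BytesViaFile(String s) {
        try {
            Path file = Files.createTempFile("hometown", ".txt");
            try {
                Files.write(file, s.getBytes(StandardCharsets.UTF_8));
                return Files.readAllBytes(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hex dump of a byte array, two upper-case digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String firstTwoLetters = "\u0643\u0627";  // start of the Arabic hometown
        System.out.println(codePoints(firstTwoLetters));             // U+0643 U+0627
        System.out.println(hex(utf8BytesViaFile(firstTwoLetters)));  // D9 83 D8 A7

        // A character outside the 16-bit range takes two chars but is one code point.
        String emoji = "\uD83D\uDE00";
        System.out.println(emoji.length());     // 2
        System.out.println(codePoints(emoji));  // U+1F600
    }
}
```

If the code points printed for a value read from the request or the database are not the Arabic letters you expect (for example a run of U+003F), the damage happened before the String reached this point.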

千纸鹤 2024-10-11 13:44:45

The problem you are facing is that you want or need information in two or more languages, and you want the users of your application to be able to use either language. One possible approach is to keep multiple records per item and include a language code as part of the primary key. For instance, if your record is

id   hometown   name
001  California Victor

you could introduce a language code and store

id   lang hometown   name
001  en   California Victor
001  ar   كاليفورنيا Victor

then your search would match either "California" or "كاليفورنيا", giving you the id 001, which you can then use to load all translations of your data (or just the data in the current output language). This scheme works with any number of languages and has the added advantage that you don't need to prefill the table: you can add new translations for records as they become known.

(Caveat: I just repeated your Arabic string, which I can't read; also, I am not sure 'ar' is the correct language code for Arabic, but you get the idea.)
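
A rough in-memory sketch of that two-step lookup, using the two rows from the table above. In a real application the rows would live in the database and both steps would be SQL queries; the class, field and method names here are made up, and the Arabic hometown is written with Unicode escapes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class HometownSearch {
    // One row of the table: (id, lang, hometown, name).
    static final class Row {
        final String id;
        final String lang;
        final String hometown;
        final String name;

        Row(String id, String lang, String hometown, String name) {
            this.id = id;
            this.lang = lang;
            this.hometown = hometown;
            this.name = name;
        }
    }

    // The escapes spell the Arabic hometown from the table.
    static final String CALIFORNIA_AR = "\u0643\u0627\u0644\u064A\u0641\u0648\u0631\u0646\u064A\u0627";

    static final List<Row> TABLE = Arrays.asList(
            new Row("001", "en", "California", "Victor"),
            new Row("001", "ar", CALIFORNIA_AR, "Victor"));

    // Step 1: collect the ids whose hometown matches in any language.
    static Set<String> idsForHometown(List<Row> table, String hometown) {
        Set<String> ids = new LinkedHashSet<>();
        for (Row r : table) {
            if (r.hometown.equalsIgnoreCase(hometown)) {
                ids.add(r.id);
            }
        }
        return ids;
    }

    // Step 2: for those ids, keep the rows in the output language whose name
    // contains the search term (case-insensitive).
    static List<Row> search(List<Row> table, String hometown, String namePart, String outputLang) {
        Set<String> ids = idsForHometown(table, hometown);
        String needle = namePart.toLowerCase(Locale.ROOT);
        List<Row> hits = new ArrayList<>();
        for (Row r : table) {
            if (ids.contains(r.id) && r.lang.equals(outputLang)
                    && r.name.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(r);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The searching user's hometown is stored in Arabic; results are shown in English.
        List<Row> hits = search(TABLE, CALIFORNIA_AR, "vic", "en");
        for (Row r : hits) {
            System.out.println(r.id + " " + r.hometown + " " + r.name);  // 001 California Victor
        }
    }
}
```

The point of the design is that the comparison never has to translate anything at query time: both spellings resolve to the same id, and the id is what the rest of the query works with.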
