Change foreign characters to their Roman equivalent
I am using PHP and I was wondering whether there is a predefined way to convert foreign characters to their non-foreign alternatives.
Characters such as ê, ë and é should all result in 'e'.
I'm looking for a function that takes a string and returns it without the special characters.
Any ideas would be greatly appreciated!
After failing to find suitable converters, I created my own collection that suits my needs, including my favorite Cyrillic conversion, which by default has numerous variations.
My first recommendation is the iconv function, mainly because it's built into PHP and so doesn't require any external or third-party libraries. In addition, it's designed to do precisely what you are trying to accomplish: accept one character set as input and output another, in this case going from UTF-8 to ASCII. Below is an example of how to call this function:
More information about the specifics of this PHP function can be found here: http://php.net/manual/en/function.iconv.php
Note: The iconv function accepts a single string as input, so if your data is a collection you'll want to iterate over it and pass in one string at a time.
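The call described above might look roughly like the sketch below. The helper name `to_ascii` is made up for illustration, and the input is assumed to be valid UTF-8. What `//TRANSLIT` produces depends on the underlying iconv implementation and the current locale (some builds return `?` or other substitutes instead of a plain `e`, and some reject `//TRANSLIT` entirely), so check the output on your own platform.

```php
<?php
// Hypothetical helper: transliterate a UTF-8 string to ASCII with iconv().
// Assumption: $text is valid UTF-8.
function to_ascii(string $text): string
{
    // //TRANSLIT asks iconv to approximate characters that have no ASCII form.
    $converted = iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    // iconv() returns false on failure; fall back to the untouched input.
    return $converted === false ? $text : $converted;
}

// iconv() works on one string at a time, so map it over collections.
$names = ['Crème brûlée', 'Noël'];
$ascii = array_map('to_ascii', $names);
print_r($ascii);
```

Falling back to the original string on failure is a design choice for this sketch; you may prefer to throw an exception instead.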
I coded this function, which uses the HTML entities translation table built into PHP to romanize characters:
It works by applying htmlentities() and then removing common entity suffixes. A simple example:
Beware that for this to work properly, your files need to be encoded in UTF-8 (without a BOM).
See also my other answer for another example.
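A minimal sketch of the htmlentities() approach described in this answer, not the answerer's original code. The function name `romanize` and the exact list of entity suffixes are assumptions chosen for illustration; the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper illustrating the htmlentities() technique.
function romanize(string $text): string
{
    // Step 1: turn accented characters into named entities, e.g. "é" => "&eacute;".
    $entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

    // Step 2: keep the base letter(s) and drop the accent suffix,
    // e.g. "&eacute;" => "e", "&szlig;" => "sz", "&AElig;" => "AE".
    $stripped = preg_replace(
        '~&([a-z]{1,2})(?:acute|caron|cedil|circ|grave|lig|ring|slash|tilde|uml);~i',
        '$1',
        $entities
    );

    // Step 3: decode whatever entities remain (such as "&amp;") back to text.
    return html_entity_decode($stripped ?? $entities, ENT_QUOTES, 'UTF-8');
}

echo romanize('Crème brûlée'), "\n"; // prints "Creme brulee"
```

Only characters that have a named HTML entity with one of the listed suffixes are affected; everything else passes through unchanged.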
Try iconv() (http://www.php.net/manual/en/function.iconv.php) with the //TRANSLIT option, or recode_string() (http://www.php.net/manual/en/function.recode-string.php), or mb_convert_encoding() (http://www.php.net/manual/en/function.mb-convert-encoding.php).
Saw this old question and still don't know what the best answer is.
In case it helps others, here is an array I generated automatically from
http://www.fileformat.info/info/charset/UTF-8/list.htm
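The generated array itself is not reproduced here. As an illustration of how such a lookup table can be applied, here is a small sketch using strtr(); the function name `replace_accents` and the handful of entries in the map are made up for the example, and a real table would be far longer.

```php
<?php
// Hypothetical helper: replace accented characters via a lookup table.
function replace_accents(string $text): string
{
    // Tiny illustrative subset; NOT the array mentioned in the answer.
    $map = [
        'à' => 'a', 'ç' => 'c',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'ñ' => 'n', 'ö' => 'o', 'ü' => 'u',
        'ß' => 'ss',
    ];

    // With an array argument, strtr() replaces each key with its value
    // in a single pass and never re-scans text it has already replaced.
    return strtr($text, $map);
}

echo replace_accents('Straße, café'), "\n"; // prints "Strasse, cafe"
```

A table gives full control over each replacement (for example 'ß' to 'ss'), at the cost of having to list every character you care about.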
I hope this will be useful to someone:
https://github.com/infralabs/DiacriticsRemovePHP
This class removes diacritics from strings containing Latin-1 Supplement, Latin Extended-A and Latin Extended-B special characters.
usage:
source:
result:
This worked for me. You might have to edit line 934 of your php.ini where it says
Remove the semicolon.
The most generic way to solve this is to use Unicode normalization, as it works automatically on all accents, so you don't have to prepare a list up front. I don't know if it's easily available in PHP; I have used it in C and Java. Essentially, you first transform the string so that every accented character is represented by a regular character plus a so-called combining diacritical mark (a built-in or external library should provide this function), and then remove the combining marks (using a specialized library, character properties the language provides, or regular expression extensions).
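In PHP, the two steps above can be sketched with the `Normalizer` class from the intl extension and a Unicode-aware regular expression. This assumes intl is installed and the input is valid UTF-8; the function name `strip_diacritics` is made up for illustration.

```php
<?php
// Hypothetical helper: decompose, then drop combining marks.
function strip_diacritics(string $text): string
{
    if (!class_exists('Normalizer')) {
        throw new RuntimeException('The intl extension is required.');
    }

    // Step 1: canonical decomposition (NFD), e.g. "é" => "e" + U+0301.
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        throw new InvalidArgumentException('Input could not be normalized.');
    }

    // Step 2: remove nonspacing marks (Unicode category Mn).
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed);
    if ($stripped === null) {
        throw new RuntimeException('Regular expression failed.');
    }

    return $stripped;
}
```

Letters that have no canonical decomposition, such as 'ø' or 'ß', are left unchanged by this approach, so it is often combined with a small lookup table.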