How to remove diacritics from text?

Posted on 2024-08-12 03:10:44


I am making a Swedish website, and the Swedish letters are å, ä and ö.

I need to make a string entered by a user URL-safe with PHP.

Basically, I need to convert all characters to underscores, EXCEPT these:

 A-Z, a-z, 1-9

and all Swedish letters should be converted like this:

'å' to 'a', 'ä' to 'a' and 'ö' to 'o' (just remove the dots above).

The rest should become underscores, as I said.

I'm not good at regular expressions, so I would appreciate the help, guys!

Thanks

NOTE: NOT urlencode. I need to store it in a database, etc., so urlencode won't work for me.


Comments (9)

我的鱼塘能养鲲 2024-08-19 03:10:44

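This should be useful, as it handles almost all cases:

function Unaccent($string)
{
    // htmlentities() turns accented letters into named entities ("å" -> "&aring;");
    // the regex keeps only the base letter(s), e.g. "&aring;" -> "a", "&aelig;" -> "ae".
    $entities = htmlentities($string, ENT_COMPAT, 'UTF-8');
    $stripped = preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml|caron);~i', '$1', $entities);
    // Decode any other entities (e.g. "&amp;") back into plain characters.
    return html_entity_decode($stripped, ENT_COMPAT, 'UTF-8');
}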
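A minimal usage sketch, assuming the page and the input are UTF-8: Unaccent() only strips the accents, so a second step is still needed to turn every remaining character outside A-Z, a-z, 1-9 into an underscore, as the question asks. The copy of Unaccent() below also decodes leftover entities such as &amp; so they do not end up as "_amp_"; the helper name makeUrlSafe is hypothetical.

```php
<?php
// Accented letter -> HTML entity -> base letter, then decode remaining entities.
function Unaccent($string)
{
    $entities = htmlentities($string, ENT_COMPAT, 'UTF-8');
    $stripped = preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml|caron);~i', '$1', $entities);
    return html_entity_decode($stripped, ENT_COMPAT, 'UTF-8');
}

// Hypothetical helper: strip accents, then replace each character outside
// the question's whitelist (A-Z, a-z, 1-9) with an underscore.
function makeUrlSafe($string)
{
    return preg_replace('/[^A-Za-z1-9]/', '_', Unaccent($string));
}

echo makeUrlSafe('Räksmörgås och köttbullar'), "\n"; // Raksmorgas_och_kottbullar
```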
小猫一只 2024-08-19 03:10:44

Use iconv to convert strings from a given encoding to ASCII, then replace non-alphanumeric characters using preg_replace:

$input = 'räksmörgås och köttbullar'; // UTF8 encoded
$input = iconv('UTF-8', 'ASCII//TRANSLIT', $input);
$input = preg_replace('/[^a-zA-Z0-9]/', '_', $input);
echo $input;

Result:

raksmorgas_och_kottbullar
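One caveat on the iconv approach, hedged because it depends on the platform: with glibc, ASCII//TRANSLIT consults the LC_CTYPE locale, and in the default "C" locale letters like "ä" may come out as "?"; other iconv implementations (for example the one on macOS) may produce output such as "a with a leading quote mark. Because the final preg_replace() turns such leftovers into underscores, the result stays URL-safe, just less readable. A defensive sketch, assuming a UTF-8 locale such as en_US.UTF-8 is installed; the name toUrlSafe is hypothetical:

```php
<?php
// Hypothetical helper. glibc's ASCII//TRANSLIT depends on LC_CTYPE, so
// temporarily switch to a UTF-8 locale (assumption: one of these is installed).
function toUrlSafe($input)
{
    $previous = setlocale(LC_CTYPE, '0'); // query the current locale
    setlocale(LC_CTYPE, 'en_US.UTF-8', 'en_US.utf8', 'C.UTF-8');
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $input);
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous); // restore the original locale
    }

    if ($ascii === false) {
        $ascii = $input; // conversion failed: fall back to the raw bytes
    }

    // Anything that is not a plain ASCII letter or digit, including a "?" or
    // quote mark that some iconv builds emit, becomes an underscore.
    return preg_replace('/[^a-zA-Z0-9]/', '_', $ascii);
}

echo toUrlSafe('räksmörgås och köttbullar'), "\n";
```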
找回味觉 2024-08-19 03:10:44
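// decompose accented letters (e.g. "ä" -> "a" + combining diaeresis) using PHP's *intl* extension;
// the default form, Normalizer::FORM_C, would leave the accents in place
$data = normalizer_normalize($data, Normalizer::FORM_D);
// remove the combining (accent) marks
$data = preg_replace('/\p{Mn}/u', '', $data);

// replace everything NOT in the sets you specified with an underscore
$data = preg_replace("#[^A-Za-z1-9]#", "_", $data);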
浴红衣 2024-08-19 03:10:44

and all Swedish should be converted like this:

'å' to 'a' and 'ä' to 'a' and 'ö' to 'o' (just remove the dots above).

Use normalizer_normalize() with Normalizer::FORM_D to split each accented letter into its base letter plus a combining mark, then remove those marks (Unicode category \p{Mn}). The default form, NFC, keeps the accents.

The rest should become underscores as I said.

Use preg_replace() with a pattern of \W (in other words: any character that is not a letter, digit or underscore) to replace them with underscores.

Final result should look like:

$data = preg_replace('/\W/', '_',
    preg_replace('/\p{Mn}/u', '', normalizer_normalize($data, Normalizer::FORM_D)));
怂人 2024-08-19 03:10:44

If the intl PHP extension is enabled, you can use Transliterator like this:

protected function removeDiacritics($string)
{
    $transliterator = \Transliterator::create('NFD; [:Nonspacing Mark:] Remove; NFC;');
    return $transliterator->transliterate($string);
}

To also remove other special characters (not only diacritics, e.g. 'æ'):

protected function removeDiacritics($string)
{
    $transliterator = \Transliterator::createFromRules(
        ':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;',
        \Transliterator::FORWARD
    );
    return $transliterator->transliterate($string);
}
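A usage sketch for the question's requirements, assuming the intl extension is loaded: transliterate first, then replace whatever is left outside A-Z, a-z, 0-9 with an underscore. The standalone function name slugify is hypothetical; the transliterator chain follows the rule list from the second snippet above, written as a compound ID for Transliterator::create().

```php
<?php
// Hypothetical helper; requires the intl extension (class Transliterator).
function slugify($string)
{
    // Any-Latin: convert other scripts to Latin; Latin-ASCII: map letters such as
    // "å", "ä", "ö", "æ" to ASCII; then drop any combining marks that remain.
    $transliterator = \Transliterator::create('Any-Latin; Latin-ASCII; NFD; [:Nonspacing Mark:] Remove; NFC;');
    if ($transliterator === null) {
        throw new RuntimeException('Could not create transliterator');
    }
    $ascii = $transliterator->transliterate($string);

    // Everything outside the whitelist becomes an underscore.
    return preg_replace('/[^A-Za-z0-9]/', '_', $ascii);
}

if (class_exists('Transliterator')) {
    echo slugify('Räksmörgås och köttbullar'), "\n";
}
```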
深巷少女 2024-08-19 03:10:44

If you're just interested in making things URL safe, then you want urlencode.

Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the RFC 1738 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.

If you really want to strip all characters other than A-Z, a-z, 1-9 (what's wrong with 0, by the way?), then you want:

$mynewstring = preg_replace('/[^A-Za-z1-9]/', '', $str);
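For comparison, a small sketch (assuming UTF-8 input) of what urlencode() and rawurlencode() actually return for the question's kind of input. Both are reversible percent-encodings of the UTF-8 bytes, not the plain-letters-and-underscores form the question asks for.

```php
<?php
$input = 'räksmörgås och köttbullar'; // assumed UTF-8

// urlencode(): form-style encoding, spaces become "+"
echo urlencode($input), "\n";    // r%C3%A4ksm%C3%B6rg%C3%A5s+och+k%C3%B6ttbullar

// rawurlencode(): spaces become "%20" instead
echo rawurlencode($input), "\n"; // r%C3%A4ksm%C3%B6rg%C3%A5s%20och%20k%C3%B6ttbullar
```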
层林尽染 2024-08-19 03:10:44

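As simple as:

 // uppercase too: strtolower() does not lowercase multi-byte letters such as 'Å'
 $str = str_replace(array('å', 'ä', 'ö', 'Å', 'Ä', 'Ö'), array('a', 'a', 'o', 'A', 'A', 'O'), $str);
 $str = preg_replace('/[^a-z0-9]+/', '_', strtolower($str));

assuming you use the same encoding for your data and your code.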

小巷里的女流氓 2024-08-19 03:10:44

One simple solution is to use str_replace function with search and replace letter arrays.
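A minimal sketch of that idea, assuming UTF-8 input: map both lowercase and uppercase Swedish letters with str_replace(), then apply the question's whitelist exactly (one underscore per character, no lowercasing). The function name swedishToUrlSafe is hypothetical.

```php
<?php
// Hypothetical helper for the str_replace approach.
function swedishToUrlSafe($string)
{
    // Search and replace arrays line up index by index: 'å' => 'a', 'Å' => 'A', ...
    $search  = array('å', 'ä', 'ö', 'Å', 'Ä', 'Ö');
    $replace = array('a', 'a', 'o', 'A', 'A', 'O');
    $string  = str_replace($search, $replace, $string);

    // Every character outside A-Z, a-z, 1-9 becomes an underscore. Without the
    // "u" modifier this works on bytes, so any other multi-byte character
    // (e.g. "é") turns into several underscores.
    return preg_replace('/[^A-Za-z1-9]/', '_', $string);
}

echo swedishToUrlSafe('Åsa köper 2 ägg'), "\n"; // Asa_koper_2_agg
```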

爱,才寂寞 2024-08-19 03:10:44

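You don't need fancy regexps to filter the Swedish characters; just use the strtr function to "translate" them. Passing an array maps whole characters, which also works when the text is UTF-8 (where each of these letters takes two bytes):

$your_URL = "www.mäåö.com";
$good_URL = strtr($your_URL, array('ä' => 'a', 'å' => 'a', 'ö' => 'o', 'ë' => 'e' /* etc. */));
echo $good_URL;

Output: www.maao.com :)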
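Why the array form of strtr() matters, as a quick check assuming the script file is saved as UTF-8: each Swedish letter is two bytes there, and the three-argument form maps single bytes, so the two string arguments no longer line up letter for letter.

```php
<?php
// Assumes this file is saved as UTF-8.
$url = "www.mäåö.com";

var_dump(strlen('ä')); // int(2): one letter, two bytes in UTF-8

// Three-argument form: translates byte by byte, which garbles multi-byte letters.
var_dump(strtr($url, "äåö", "aao"));

// Array form: replaces whole (multi-byte) substrings.
var_dump(strtr($url, array('ä' => 'a', 'å' => 'a', 'ö' => 'o'))); // string(12) "www.maao.com"
```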
