How do I remove diacritics from text?

Posted on 2024-08-12 03:10:44

I am building a Swedish website, and Swedish uses the letters å, ä and ö.

I need to make a string entered by a user URL-safe with PHP.

Basically, every character should be converted to an underscore, except these:

 A-Z, a-z, 1-9

and the Swedish letters should be converted like this:

'å' to 'a', 'ä' to 'a' and 'ö' to 'o' (just remove the marks above the letters).

Everything else should become an underscore, as I said.

I'm not good at regular expressions, so I would appreciate the help!

Thanks

NOTE: NOT urlencode. I need to store the string in a database, among other things, so urlencode won't work for me.


我的鱼塘能养鲲 2024-08-19 03:10:44

This should be useful; it handles almost all cases. It converts each accented letter to its HTML entity (for example &auml;) and then keeps only the base letter:

function Unaccent($string)
{
    return preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml|caron);~i', '$1', htmlentities($string, ENT_COMPAT, 'UTF-8'));
}
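To get from this to the URL-safe string the question asks for, the result still has to go through an underscore replacement. The following is a minimal sketch, assuming UTF-8 input. The names `unaccent` and `slugify` are illustrative, and the `html_entity_decode` step is an addition (not part of the answer) so that entities such as `&amp;`, which `htmlentities` also produces, do not leak into the result.

```php
<?php
// Strip accents via the HTML entity names (&auml; -> a, &aring; -> a, &ouml; -> o).
function unaccent(string $string): string
{
    $entities = htmlentities($string, ENT_COMPAT, 'UTF-8');
    $stripped = preg_replace(
        '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml|caron);~i',
        '$1',
        $entities
    );
    // Turn any remaining entities (&amp;, &lt;, ...) back into plain characters.
    return html_entity_decode($stripped, ENT_COMPAT, 'UTF-8');
}

// Replace everything that is not an ASCII letter or digit with an underscore.
function slugify(string $string): string
{
    return preg_replace('/[^A-Za-z0-9]/', '_', unaccent($string));
}

echo slugify('räksmörgås och köttbullar'), "\n"; // raksmorgas_och_kottbullar
```

Note that the character class here includes 0, unlike the `1-9` range in the question; narrow it if the digit really should be excluded.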

小猫一只 2024-08-19 03:10:44

Use iconv to convert the string from its encoding to ASCII, then replace the non-alphanumeric characters with preg_replace:

$input = 'räksmörgås och köttbullar'; // UTF-8 encoded
$input = iconv('UTF-8', 'ASCII//TRANSLIT', $input);
$input = preg_replace('/[^a-zA-Z0-9]/', '_', $input);
echo $input;

Result:

raksmorgas_och_kottbullar
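The output of `//TRANSLIT` depends on the iconv implementation and on the current locale, so an accented letter may come back as `?` or with extra punctuation instead of the plain base letter. The sketch below is one hedged way to make the approach more robust. The function name `toAsciiSlug` and the locale names are assumptions and may not be installed on every system, so no particular output is promised for accented input.

```php
<?php
function toAsciiSlug(string $input): string
{
    // Transliteration consults LC_CTYPE; try a UTF-8 locale if one is installed.
    // Note that setlocale() changes the locale for the whole process.
    setlocale(LC_CTYPE, 'en_US.UTF-8', 'C.UTF-8');

    // @ suppresses the notice iconv raises for input it cannot convert.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $input);
    if ($ascii === false) {
        // Conversion failed: fall back to the raw input, which is still sanitized below.
        $ascii = $input;
    }

    // Whatever iconv produced, only ASCII letters and digits survive this step.
    return preg_replace('/[^a-zA-Z0-9]/', '_', $ascii);
}

echo toAsciiSlug('räksmörgås och köttbullar'), "\n";
```

Because the final `preg_replace` runs in every case, the returned string contains only ASCII letters, digits and underscores even when the transliteration itself is poor.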

找回味觉 2024-08-19 03:10:44

// decompose accented letters (NFD) using PHP's *intl* extension, then drop the combining marks;
// the default form (NFC) would leave the accents in place
$data = normalizer_normalize($data, Normalizer::FORM_D);
$data = preg_replace('/\p{Mn}+/u', '', $data);

// replace everything NOT in the sets you specified with an underscore
$data = preg_replace("#[^A-Za-z1-9]#", "_", $data);
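On its own, `normalizer_normalize()` only changes how accented letters are encoded; with its default form (NFC) the accents stay. To actually drop them, the string can be decomposed (NFD) and the combining marks removed. A minimal sketch, assuming the intl extension is loaded and the input is UTF-8; `stripMarks` is an illustrative name, not part of the answer.

```php
<?php
function stripMarks(string $text): string
{
    // NFD splits "ä" (U+00E4) into "a" followed by the combining diaeresis U+0308.
    $decomposed = normalizer_normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // normalization failed (e.g. invalid UTF-8); leave the input untouched
    }
    // \p{Mn} matches nonspacing marks; the /u flag makes PCRE read the string as UTF-8.
    return preg_replace('/\p{Mn}+/u', '', $decomposed);
}

$data = stripMarks('räksmörgås och köttbullar');
$data = preg_replace('/[^A-Za-z0-9]/', '_', $data);
echo $data, "\n"; // raksmorgas_och_kottbullar
```

This only helps for letters that decompose into a base letter plus a mark; a letter such as 'æ' has no such decomposition and would end up as underscores.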

浴红衣 2024-08-19 03:10:44

"and all swedish should be converted like this:
'å' to 'a' and 'ä' to 'a' and 'ö' to 'o' (just remove the dots above)."

Use normalizer_normalize() with Normalizer::FORM_D to decompose the accented letters, then strip the combining marks (\p{Mn}) to get rid of the diacritics.

"The rest should become underscores as I said."

Use preg_replace() with the pattern \W (in other words, any character that is not a letter, a digit or an underscore) to replace those characters with underscores.

The final result should look like:

$data = preg_replace('/\W/', '_', preg_replace('/\p{Mn}+/u', '', normalizer_normalize($data, Normalizer::FORM_D)));

怂人 2024-08-19 03:10:44

If the intl PHP extension is enabled, you can use Transliterator like this:

protected function removeDiacritics($string)
{
    $transliterator = \Transliterator::create('NFD; [:Nonspacing Mark:] Remove; NFC;');
    return $transliterator->transliterate($string);
}

To also convert other special characters that are not simple diacritics, such as 'æ':

protected function removeDiacritics($string)
{
    $transliterator = \Transliterator::createFromRules(
        ':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;',
        \Transliterator::FORWARD
    );
    return $transliterator->transliterate($string);
}
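The methods above are written as class members. As a hedged, standalone sketch of the same idea, the function below combines the transliteration with the underscore replacement from the question. It assumes the intl extension is available; `asciiSlug` is an illustrative name, and the shorter compound ID `Any-Latin; Latin-ASCII` is used here in place of the rule string from the answer.

```php
<?php
function asciiSlug(string $text): string
{
    // Latin-ASCII also maps letters such as "æ" that have no combining-mark decomposition.
    $transliterator = Transliterator::create('Any-Latin; Latin-ASCII');
    if ($transliterator === null) {
        throw new RuntimeException('Could not create the transliterator');
    }

    $ascii = $transliterator->transliterate($text);
    if ($ascii === false) {
        throw new RuntimeException('Transliteration failed');
    }

    // Everything that is still not an ASCII letter or digit becomes an underscore.
    return preg_replace('/[^A-Za-z0-9]/', '_', $ascii);
}

echo asciiSlug('räksmörgås och köttbullar'), "\n"; // raksmorgas_och_kottbullar
```

Creating the transliterator is comparatively expensive, so in real code it may be worth building it once and reusing it across calls.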

深巷少女 2024-08-19 03:10:44

If you're just interested in making things URL-safe, then you want urlencode:

"Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the RFC 1738 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs."

If you really want to strip everything that is not A-Z, a-z or 1-9 (what's wrong with 0, by the way?), then you want:

$mynewstring = preg_replace('/[^A-Za-z1-9]/', '', $str);
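For comparison, this short sketch shows what `urlencode` and `rawurlencode` produce for a Swedish string, which may explain why the asker does not want to store the result. It assumes the source file and the string are UTF-8 encoded; the variable name `$title` is illustrative.

```php
<?php
$title = 'räksmörgås och köttbullar'; // assumed to be UTF-8

// Each byte of a non-ASCII letter is percent-encoded; urlencode turns spaces into "+".
echo urlencode($title), "\n";    // r%C3%A4ksm%C3%B6rg%C3%A5s+och+k%C3%B6ttbullar

// rawurlencode follows the URL encoding rules for paths and uses "%20" for spaces.
echo rawurlencode($title), "\n"; // r%C3%A4ksm%C3%B6rg%C3%A5s%20och%20k%C3%B6ttbullar

// The encoding is lossless: decoding gives the original string back.
var_dump(urldecode(urlencode($title)) === $title); // bool(true)
```

The encoded form is safe in a URL but hard to read, whereas the transliteration approaches in the other answers are readable but lossy.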

层林尽染 2024-08-19 03:10:44

It is as simple as

 $str = str_replace(array('å', 'ä', 'ö'), array('a', 'a', 'o'), $str);
 $str = preg_replace('/[^a-z0-9]+/', '_', strtolower($str));

assuming you use the same encoding for your data and your code. Note that this also lowercases the string and collapses each run of other characters into a single underscore.

小巷里的女流氓 2024-08-19 03:10:44

One simple solution is to use the str_replace function with an array of letters to search for and an array of replacements.
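A minimal sketch of that suggestion, assuming the source file and the data are both UTF-8. The uppercase letters and the final `preg_replace` step are additions to cover the rest of the question; the variable names are illustrative.

```php
<?php
// Search and replacement arrays are matched by position.
$search  = array('å', 'ä', 'ö', 'Å', 'Ä', 'Ö');
$replace = array('a', 'a', 'o', 'A', 'A', 'O');

$str = str_replace($search, $replace, 'Räksmörgås & Öl');

// Every remaining character that is not an ASCII letter or digit becomes an underscore.
$str = preg_replace('/[^A-Za-z0-9]/', '_', $str);

echo $str, "\n"; // Raksmorgas___Ol
```

`str_replace` compares whole byte sequences, so it is safe with multibyte UTF-8 letters, but only the letters listed in `$search` are handled.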

爱，才寂寞 2024-08-19 03:10:44

You don't need fancy regular expressions to filter the Swedish characters; just use the strtr function to "translate" them. Pass the replacements as an array, because the two-string form of strtr works byte by byte and breaks multibyte UTF-8 letters:

$your_URL = "www.mäåö.com";
$good_URL = strtr($your_URL, array("ä" => "a", "å" => "a", "ö" => "o", "ë" => "e")); // etc...
echo $good_URL;

->output: www.maao.com :)
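The difference between the two calling styles of `strtr` matters for UTF-8 text. The sketch below, which assumes a UTF-8 source file and uses illustrative variable names, contrasts the array form with the two-string form.

```php
<?php
$url = 'www.mäåö.com'; // UTF-8: each of ä, å, ö occupies two bytes

// Array form: whole substrings are replaced, so multibyte letters are safe.
$map  = array('ä' => 'a', 'å' => 'a', 'ö' => 'o', 'ë' => 'e');
$good = strtr($url, $map);
echo $good, "\n"; // www.maao.com

// Two-string form: single bytes are mapped to single bytes, so the string keeps
// its byte length and the two-byte letters are corrupted rather than replaced.
$broken = strtr($url, 'äåö', 'aao');
var_dump($broken === 'www.maao.com');      // bool(false)
var_dump(strlen($broken) === strlen($url)); // bool(true)
```

With a single-byte encoding such as ISO-8859-1 the two-string form would work, which is presumably what the original example relied on.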

About the author: 絕版丫頭
