Remove accents from 2-byte characters in an ASCII string
I am reading data from a server, which is returned as JSON and parsed using JSON::Parse. The JSON data includes accented characters such as é. These are encoded in the output strings as 2-byte sequences such as \xc3\xa9. The rest of the string consists of standard 1-byte ASCII characters.
How can I remove the accents from these characters? I have tried all of the following methods without success:
use Unicode::Normalize;
use utf8;
use Text::Iconv;
use Encode qw(from_to);
use Text::Unidecode;

sub normalise_text {
    my $text = shift;
    my $decomposed = NFKD( $text );
    $decomposed =~ s/\p{NonspacingMark}//g;
    return $decomposed;
}

sub convert {
    my $converter = Text::Iconv->new("utf16", "utf8");
    return $converter->convert($_);
}

sub fromto {
    return from_to($_, 'UTF-16LE', 'UTF-8');
}
These libraries tend to process the string byte by byte rather than character by character, which is no good. For the short term, I am doing the conversion as follows:
sub mine {
    my $text = $_;
    $text =~ s/\xc3\xa9/e/g;
    $text =~ s/\xc3\xa1/a/g;
    return $text;
}
There must be a better way! Any suggestions?
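One direction that may be worth trying, assuming the strings really do hold undecoded UTF-8 bytes as described above: decode the bytes into Perl characters first, and only then normalise and strip the combining marks. Applied to raw bytes, NFKD sees \xc3 and \xa9 as two separate characters, which would explain the byte-by-byte behaviour. The sketch below is only an illustration; `strip_accents` is a hypothetical helper name, and the sample input is made up.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: takes a string of raw UTF-8 bytes and returns
# the text with combining accents removed.
sub strip_accents {
    my ($bytes) = @_;

    # Turn the UTF-8 byte sequence into Perl characters first, so that
    # "\xc3\xa9" becomes the single character U+00E9.
    my $chars = decode('UTF-8', $bytes);

    # Decompose each accented character into base letter + combining mark.
    my $decomposed = NFD($chars);

    # Drop the combining marks (general category Mn, nonspacing mark).
    $decomposed =~ s/\p{Mn}//g;

    return $decomposed;
}

# Made-up sample input containing the two byte sequences from the question.
my $input = "caf\xc3\xa9 ol\xc3\xa1";
print strip_accents($input), "\n";   # prints "cafe ola"
```

Characters that have no canonical decomposition (for example ø or ß) are left unchanged by this approach, so a transliteration module such as Text::Unidecode, applied to the decoded string rather than to the raw bytes, might still be needed for those.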