Remove accents from 2-byte characters in an ASCII string

Posted on 2025-02-08 19:13:43

I am reading data from a server. The data is returned as JSON and parsed using JSON::Parse. The JSON data includes accented characters such as é. These are encoded in the output strings as 2-byte sequences such as \xc3\xa9. The rest of the string consists of standard 1-byte ASCII characters.

How can I remove the accents from these characters? I have tried all of the following methods without success:

use Unicode::Normalize;
use utf8;
use Text::Iconv;
use Encode qw(from_to);
use Text::Unidecode;

sub normalise_text {
  my $text = shift;
  my $decomposed = NFKD( $text );
  $decomposed =~ s/\p{NonspacingMark}//g;
  return $decomposed;
}

sub convert {
  my $converter = Text::Iconv->new("utf16", "utf8");
  return $converter->convert($_);
}

sub fromto {
  return from_to($_, 'UTF-16LE', 'UTF-8');
}

These libraries tend to convert each character on a byte-by-byte basis, which is no good. For the short term, I am doing the conversion as follows:

sub mine {
  my $text = $_;
  $text =~ s/\xc3\xa9/e/g;
  $text =~ s/\xc3\xa1/a/g;
  return $text;
}

There must be a better way! Any suggestions?
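
One possible direction, as a minimal sketch: the sequence \xc3\xa9 is the UTF-8 encoding of é, so this assumes the strings are still undecoded UTF-8 octets rather than UTF-16. Under that assumption, decoding them into characters before normalising lets Unicode::Normalize see é as one character instead of two bytes. The helper name strip_accents is hypothetical and not part of any module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(NFD);

# strip_accents is a hypothetical helper name used for illustration.
# Assumption: $bytes holds UTF-8 encoded octets (e.g. "\xc3\xa9" for é).
sub strip_accents {
    my ($bytes) = @_;

    # Turn the UTF-8 octets into a Perl character string.
    my $chars = decode('UTF-8', $bytes);

    # Canonically decompose: é becomes "e" followed by U+0301.
    my $decomposed = NFD($chars);

    # Drop the combining marks, leaving only the base letters.
    $decomposed =~ s/\p{Mn}//g;

    # Encode back to UTF-8 octets for output.
    return encode('UTF-8', $decomposed);
}

print strip_accents("caf\xc3\xa9 ol\xc3\xa1"), "\n";
```

Letters that have no canonical decomposition (ø, for example) pass through this sketch unchanged, so it only covers characters that are a base letter plus combining marks.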
