Removing accents from 2-byte characters in an ASCII string

Posted 2025-02-08 19:13:43 · 792 words · 1 view · 0 comments

I am reading data from a server that returns JSON, which I parse with JSON::Parse. The JSON data includes accented characters such as é. In the parsed output strings these appear as two-byte sequences such as \xc3\xa9 (the UTF-8 encoding of é). The rest of each string is standard single-byte ASCII.
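The sketch below shows the difference between bytes and characters using only modules that ship with Perl. `\xc3\xa9` is the UTF-8 encoding of é (U+00E9), and until a string is decoded Perl counts each byte as a separate character. JSON::PP is used here only as a stand-in for JSON::Parse, because its `decode_json` is documented to take UTF-8 bytes and return character strings. Whether JSON::Parse decodes its input in the same way is not shown here; its documentation on Unicode handling is the place to check.

```perl
use strict;
use warnings;
use Encode qw(decode);
use JSON::PP qw(decode_json);   # core module, stand-in for JSON::Parse

# "café" as raw UTF-8 bytes: "\xc3\xa9" is the two-byte encoding of U+00E9.
my $raw = "caf\xc3\xa9";
printf "bytes: %d\n", length $raw;                       # 5

# Decoding merges the two bytes into one character.
my $chars = decode('UTF-8', $raw);
printf "characters: %d\n", length $chars;                # 4
printf "last char: U+%04X\n", ord substr($chars, -1);    # U+00E9

# decode_json takes UTF-8 bytes and returns decoded character strings.
my $body = qq({"name":"caf\xc3\xa9"});
my $data = decode_json($body);
printf "parsed length: %d\n", length $data->{name};      # 4
```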

How can I remove the accents from these characters? I have tried all of the following methods without success:

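use strict;
use warnings;
use Unicode::Normalize;   # provides NFKD()
use utf8;                 # marks this source file as UTF-8; does not decode runtime data
use Text::Iconv;
use Encode qw(from_to);
use Text::Unidecode;

# NFKD splits é into e + a combining accent, which the substitution then deletes
sub normalise_text {
  my $text = shift;
  my $decomposed = NFKD( $text );
  $decomposed =~ s/\p{NonspacingMark}//g;
  return $decomposed;
}

# Treats the input as UTF-16 and converts it to UTF-8
sub convert {
  my $text = shift;
  my $converter = Text::Iconv->new("utf16", "utf8");
  return $converter->convert($text);
}

# from_to converts $text in place and returns the new length, not the string
sub fromto {
  my $text = shift;
  from_to($text, 'UTF-16LE', 'UTF-8');
  return $text;
}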
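The `normalise_text` attempt uses the usual technique: decompose with NFKD, then delete the combining marks. A likely reason it has no useful effect here is that it receives undecoded bytes. In that case `\xc3\xa9` reaches NFKD as the two characters Ã and ©, which fits the byte-by-byte behaviour the question goes on to describe. The following is a minimal sketch that assumes the strings really are UTF-8 bytes, as described above. `strip_accents` is a hypothetical name.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFKD);

# Hypothetical helper: remove accents from a string of UTF-8 *bytes*.
sub strip_accents {
    my ($bytes) = @_;

    # Step 1: decode, so "\xc3\xa9" becomes the single character U+00E9.
    # Skipping this leaves NFKD looking at two Latin-1 characters (Ã and ©).
    my $text = decode('UTF-8', $bytes);

    # Step 2: compatibility decomposition turns U+00E9 into
    # "e" followed by U+0301 COMBINING ACUTE ACCENT.
    $text = NFKD($text);

    # Step 3: delete combining marks (general category Mn, Nonspacing_Mark).
    $text =~ s/\p{Mn}//g;

    return $text;
}

print strip_accents("caf\xc3\xa9 \xc3\xa1 la carte"), "\n";   # prints "cafe a la carte"
```

If the strings turn out to be decoded already, drop the `decode` call, because decoding a second time garbles or rejects non-ASCII text. NFKD also only helps letters that have a decomposition. Letters such as ø, ß, æ and ł pass through unchanged.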

These libraries seem to convert the string byte by byte, treating \xc3 and \xa9 as two separate characters, which is no good. For the short term, I am doing the conversion by hand:

sub mine {
  my $text = shift;
  $text =~ s/\xc3\xa9/e/g;   # é as raw UTF-8 bytes
  $text =~ s/\xc3\xa1/a/g;   # á as raw UTF-8 bytes
  return $text;
}
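The stopgap in `mine` matches raw byte pairs, so it needs one entry for every accented letter. Below is a sketch of a table-driven alternative that assumes the same decode-then-NFKD approach. NFKD removes most accents on its own, and a small table is kept only for letters that have no decomposition. `to_ascii` and the contents of `%fallback` are hypothetical and illustrative, not a complete transliteration table.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFKD);

# Hypothetical fallback table: these letters have no decomposition,
# so NFKD followed by mark-stripping leaves them untouched.
my %fallback = (
    "\x{00F8}" => 'o',    # ø
    "\x{00D8}" => 'O',    # Ø
    "\x{00DF}" => 'ss',   # ß
    "\x{00E6}" => 'ae',   # æ
    "\x{0142}" => 'l',    # ł
);
# Character class built from the single-character keys above.
my $fallback_re = '[' . join('', keys %fallback) . ']';

sub to_ascii {
    my ($bytes) = @_;
    my $text = NFKD(decode('UTF-8', $bytes));    # bytes -> characters -> decomposed
    $text =~ s/\p{Mn}//g;                        # drop combining accents
    $text =~ s/($fallback_re)/$fallback{$1}/g;   # letters NFKD cannot split
    return $text;                                # anything else stays as-is
}

print to_ascii("Stra\xc3\x9fe"), "\n";   # prints "Strasse"
```

Characters that are neither decomposable nor listed in the table remain non-ASCII. Text::Unidecode, which the question already imports, aims at fuller transliteration and presumably needs the same decode step first.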

There must be a better way! Any suggestions?

About the author: 野却迷人
