How to split Chinese characters one by one?
Suppose there is no special character (such as whitespace, a colon, etc.) between the first name and the last name.
How can I split Chinese names like the one below?
use strict;
use warnings;
use Data::Dumper;
my $fh = \*DATA;
my $fname; # 小三 (given name)
my $lname; # 张 (family name)
while (my $name = <$fh>)
{
    $name =~ ??? ;
    print $fname, "\n";
    print $lname;
}
__DATA__
张小三
Expected output:
小三
张
[Update]
Windows XP, using ActivePerl 5.10.1.
Comments (3)
Your problems arise because you neglect to decode binary data into Perl strings on input and to encode Perl strings back into binary data on output. This matters because regular expressions and their friend split only work properly on Perl strings.
(?<=.) means "after the first character". As such, this program will not work correctly on 复姓 (compound family names); keep in mind that they are rare, but do exist. To always split a name correctly into its family name and given name parts, you need a dictionary of family names.
The answer gave the program in a Linux version and a Windows version, each followed by its output.
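The decode, split, encode sequence described in this answer might look roughly like the sketch below. It is not the answer's original program. The UTF-8 encoding, the tiny @COMPOUND dictionary, and the names split_name and split_name_bytes are all assumptions made for illustration; on a Chinese Windows console the encoding may need to be something else (for example cp936).

```perl
use strict;
use warnings;
use utf8;                          # this source file is assumed to be saved as UTF-8
use Encode qw(decode encode);

# Assumption: input and output bytes are UTF-8. Change this if your
# console or files use a different encoding.
my $ENCODING = 'UTF-8';

# Hypothetical, deliberately tiny dictionary of compound family names.
my @COMPOUND = ('欧阳', '司马', '诸葛');

# Split a decoded character string into (family name, given name).
# Call this in list context.
sub split_name {
    my ($name) = @_;
    for my $surname (@COMPOUND) {
        # Known compound surname followed by at least one more character.
        return ($surname, $1) if $name =~ /\A\Q$surname\E(.+)\z/s;
    }
    # Fallback: split after the first character. The limit of 2 keeps
    # the rest of the name in one piece.
    return split /(?<=.)/s, $name, 2;
}

# Decode on input, work on characters, encode again on output.
sub split_name_bytes {
    my ($bytes) = @_;
    my $name = decode($ENCODING, $bytes);     # bytes -> characters
    $name =~ s/\s+\z//;                       # drop the line ending
    return map { encode($ENCODING, $_) } split_name($name);
}

# Simulate one raw input line as bytes, then print the encoded parts.
my ($lname, $fname) = split_name_bytes(encode($ENCODING, "张小三\n"));
print "$fname\n$lname\n";
```

Without the decode step, `.` would match a single byte rather than a whole character, so the split would cut a multi-byte character in half. A real dictionary of family names would be far larger than the three entries used here.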
You'll need some kind of heuristic to separate the first and last names. Here's some working code that assumes that the last name (surname) is one character (the first) and all the remaining characters (at least one) belong to the first name (given name):
EDIT: Changed program to ignore invalid lines rather than dying.
When I run this program from the command line, I get this output:
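As a rough illustration of the heuristic this answer describes (a one-character family name, at least one further character of given name, and invalid lines skipped rather than fatal), a minimal sketch could look like this. The function name parse_name and the sample lines are hypothetical, and the input is assumed to be already decoded into Perl character strings.

```perl
use strict;
use warnings;

# Returns (given name, family name) for a decoded name, or an empty list
# when the input is not two or more Han characters.
sub parse_name {
    my ($name) = @_;
    return unless defined $name;
    $name =~ s/\s+\z//;                        # drop the line ending
    # Heuristic: one Han character of family name, the rest is the given name.
    my ($lname, $fname) = $name =~ /\A(\p{Han})(\p{Han}+)\z/
        or return;
    return ($fname, $lname);
}

binmode STDOUT, ':encoding(UTF-8)';            # assumption: the terminal expects UTF-8

# Sample lines: a valid name, a non-Chinese line, and a name that is too short.
for my $line ("\x{5F20}\x{5C0F}\x{4E09}\n", "not a name\n", "\x{5F20}\n") {
    my @parts = parse_name($line);
    next unless @parts;                        # ignore invalid lines rather than dying
    print "$parts[0]\n$parts[1]\n";
}
```

Only the first sample line produces output here; the other two are silently skipped. The \p{Han} property restricts matches to Chinese characters, which is a stricter test than `.` and is one way to decide what counts as an invalid line.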
This splits the characters and assigns them to $fname and $lname.
Though I think your example and your question don't really match (the last name has two characters).
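One way to split a name into individual characters and then assign them, sketched under the assumption that the string has already been decoded into Perl characters. The function name split_chars is hypothetical, and this is not the answer's original code.

```perl
use strict;
use warnings;

# Split a decoded name into single characters, treat the first one as the
# family name, and join the rest back together as the given name.
sub split_chars {
    my ($name) = @_;
    $name =~ s/\s+\z//;                 # drop the line ending
    my @chars = split //, $name;        # one list element per character
    return unless @chars;               # nothing to split
    my $lname = shift @chars;           # first character
    my $fname = join '', @chars;        # remaining characters
    return ($fname, $lname);
}

binmode STDOUT, ':encoding(UTF-8)';     # assumption: the terminal expects UTF-8
my ($fname, $lname) = split_chars("\x{5F20}\x{5C0F}\x{4E09}\n");
print "$fname\n$lname\n";
```

An empty pattern makes split return every character separately, which is the "one by one" split asked about in the title. On undecoded bytes the same call would return individual bytes instead.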