Regular expression in Perl to uppercase the first letter of a word after punctuation

Posted on 2024-12-10 07:12:15

Can anybody tell me a regular expression in Perl that uppercases (ucfirst) the first letter of the word that comes after a dot, question mark, or exclamation mark?

My program reads the string character by character.

Requirement:

input string : "abcd[.?!]\s*abcd"
output: "Abcd[.?!]\s*Abcd"

My program is as follows:

#!/usr/bin/perl

use strict;

my $str = <STDIN>;
my $len=length($str);
my $ch;

my $i;
for($i=0;$i<=length($str);$i++)
{
$ch = substr($str,$i,1);
print "$ch";
if($ch =~ 's/([.?!]\s*[a-z])/uc($1)/ge')
{
    $i=$i+1;
    $ch = substr($str, $i,1);
    my $ch = uc($ch);
    print "$ch";
}
#elsif($ch eq "?")
#{
#   $i=$i+1;
#   $ch = substr($str, $i,1);
#   my $ch = uc($ch);
#   print "$ch";
#}
#elsif($ch eq "!")
#{
#   $i=$i+1;
#   $ch = substr($str, $i,1);
#   my $ch = uc($ch);
#   print"$ch";
#}
#elsif($ch eq " ")
#{
#   $i=$i+1;
#   $ch = substr($str, $i,1);
#   my $ch = uc($ch);
#   print"$ch";
#}
#else
#{
#print "";
#}
}
print "\n";

Comments (4)

白昼 2024-12-17 07:12:15

Looping over the string, and then looping over the match, is completely redundant. Your entire program can be replaced with this:

perl -pe 's/(^|[.?!]\s*)([a-z])/$1\U$2/g' inputfile >outputfile

I added beginning of line to the first parenthesized expression, although your explanation doesn't include that (but your example does).
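
As a rough sketch, the same substitution can live inside a script instead of a one-liner. The sub name `capitalize_sentences` and the sample text are made up for illustration; the `/m` flag is an assumption added so that `^` matches at the start of every line, which is what `-p` gives you by processing one line at a time.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): uppercase the first
# letter of each line and of every word that follows '.', '?' or '!'
# plus optional whitespace.
sub capitalize_sentences {
    my ($text) = @_;
    # $1 is the line start or the punctuation with trailing whitespace,
    # $2 is the lowercase letter that \U turns into uppercase.
    $text =~ s/(^|[.?!]\s*)([a-z])/$1\U$2/gm;
    return $text;
}

print capitalize_sentences("abcd. abcd?efgh!  ijkl\n");
# prints "Abcd. Abcd?Efgh!  Ijkl"
```

Note that the pattern has no notion of abbreviations, so input such as "e.g. foo" would come out as "E.G. Foo".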

从﹋此江山别 2024-12-17 07:12:15

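Normally, any one of these will do (none of them has a capture group, so the replacement uses $& rather than $1):

$s =~ s/(?:(?<=[.?!])|^)\s*[a-z]/\U$&/g;

$s =~ s/(?<![^.?!])\s*[a-z]/\U$&/g;

$s =~ s/(?:^|[.?!])\s*\K[a-z]/\U$&/g;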

But if you only read one character at a time,

my $after_punc = 1;
while (my $ch = ...) {
    if ($ch =~ /^[.?!]\z/) {
       $after_punc = 1;
    }
    elsif ($ch =~ /^[a-z]\z/) {
       $ch = uc($ch) if $after_punc;
       $after_punc = 0;
    }
    elsif ($ch =~ /^\s\z/) {
       # Ignore whitespace.
    }
    else {
       $after_punc = 0;
    }

    ...
}
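
The loop above leaves the character source and the output step elided. A minimal runnable sketch of the same state machine, assuming the characters come from `split //` on an in-memory string and the result is collected in a string (the sub name `capitalize_by_char` is hypothetical), might look like this:

```perl
use strict;
use warnings;

# Hypothetical wrapper: walk the string one character at a time and
# uppercase the first letter seen after '.', '?', '!' or the string start.
sub capitalize_by_char {
    my ($text) = @_;
    my $after_punc = 1;    # the start of the string counts as "after punctuation"
    my $out = '';
    for my $ch (split //, $text) {
        my $emit = $ch;
        if ($ch =~ /^[.?!]\z/) {
            $after_punc = 1;
        }
        elsif ($ch =~ /^[a-z]\z/) {
            $emit = uc $ch if $after_punc;
            $after_punc = 0;
        }
        elsif ($ch !~ /^\s\z/) {
            # Any other non-whitespace character cancels the pending uppercase.
            $after_punc = 0;
        }
        # Whitespace falls through and leaves the state unchanged.
        $out .= $emit;
    }
    return $out;
}

print capitalize_by_char("abcd. abcd! 1abc? xyz\n");
# prints "Abcd. Abcd! 1abc? Xyz"
```

The digit in "1abc" resets the flag, so the "a" after it stays lowercase.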
尘曦 2024-12-17 07:12:15

Can anybody tell me a regular expression in Perl that uppercases (ucfirst) the first letter of the word that comes after a dot, question mark, or exclamation mark?

My program reads the string character by character.

Requirement:

input string : "abcd[.?!]\s*abcd"

output: "Abcd[.?!]\s*Abcd"

Your output does not match your explanation. In the input, the initial "a" does not follow a period, question mark, or exclamation mark, but was changed to upper case.

You can and should do this sort of processing with a single substitution. To do exactly as you said:

s/[.?!]\K[[:lower:]]/uc($&)/ge

The \K discards the character matched by [.?!], leaving only the lower-case letter in the matched string. $& is the matched string. The e flag says to evaluate uc($&).

If you also want to make an initial letter uppercase:

s/(?:^|[.?!])\K[[:lower:]]/uc($&)/ge
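
To see how `\K`, `$&`, and the `e` flag behave together, here is a small sketch on a made-up sample string. The second substitution is an assumed variant, not part of the answer as written: it adds `\s*` so that whitespace between the punctuation and the letter is allowed, as in the question's requirement.

```perl
use strict;
use warnings;

my $text = "abcd.efgh? ijkl";    # sample input, invented for illustration

# Only a letter directly after the punctuation mark is changed.
(my $strict = $text) =~ s/[.?!]\K[[:lower:]]/uc($&)/ge;

# Assumed variant: also accept the start of the string and optional whitespace.
(my $loose = $text) =~ s/(?:^|[.?!])\s*\K[[:lower:]]/uc($&)/ge;

print "$strict\n";    # prints "abcd.Efgh? ijkl"
print "$loose\n";     # prints "Abcd.Efgh? Ijkl"
```

Because everything before `\K` is dropped from the match, `$&` holds only the single letter, and the punctuation and whitespace are left in place untouched.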
夏日落 2024-12-17 07:12:15

If you have a Unicode string, you could use:

$str =~ s/(\pP|^)(\s*\pL)/$1\U$2/g;
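
A short sketch of that substitution on non-ASCII text follows. It assumes the script file is saved as UTF-8 (hence `use utf8`) and that output should be encoded as UTF-8; the sub name `capitalize_unicode` and the sample sentence are invented for illustration.

```perl
use strict;
use warnings;
use utf8;                               # string literals in this file are UTF-8
binmode STDOUT, ':encoding(UTF-8)';     # encode characters on output

# Hypothetical helper wrapping the substitution from the answer above.
sub capitalize_unicode {
    my ($str) = @_;
    # \pP is any Unicode punctuation character, \pL any Unicode letter.
    $str =~ s/(\pP|^)(\s*\pL)/$1\U$2/g;
    return $str;
}

print capitalize_unicode("привет. как дела? хорошо"), "\n";
# prints "Привет. Как дела? Хорошо"
```

Keep in mind that `\pP` matches every punctuation character, commas included, so "да, нет" would become "Да, Нет"; a narrower class such as `[.?!]` may be closer to the original requirement.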