Perl: multiple regular expressions on a single HTML file

Posted on 2024-12-11 08:57:10

I need to run multiple Perl regexes on a single HTML file and store each value in an array.

The HTML file looks like this:

<a href="/jobs_qa">Job QA</a>

Title:
Commercial Bank 
<p></p>
City:
TX   
State:
TX  
Country:

<p></p>
Full Description:
<p></p>
<p> Citi North America Consumer Banking group serves customers through Retail Banking, Credit Cards, Personal Banking and Wealth Management, Small Business Banking and Commercial Banking.     </p>

<p>Commercial Bank Head - Houston-11030087</p>

<p>Description </p>

<p>POSITION SUMMARY</p>

<p>Lead the sales, relationship, and credit management for commercial banking customers in a given marketplace.  Build and motivate talented relationship teams to effectively penetrate the market and gain market share.  Current business segment includes those clients with revenues from $20 to $500+ million annually.    Clients in this segment typically require more complex product offerings and customized credit decisions made in the field.</p>

<p> </p>

<p> </p>



<p>Qualifications </p>

<p>EXPERIENCE
<br />-MBA or equivalent experience
<br />-Minimum 10 years business and/or commercial banking with increasing levels of responsibility

<p> </p>


<a href="http://www.mysite.com/jobs/">http://www.e.com/jobs/commercial-bank-head-houston-citi-houston-tx</a>
<hr>
Title:
Sr Business Relationship 
<p></p>
City:
CO   
State:
CO  
Country:

<p></p>
Full Description:
<p></p>
<p>Effectively acquires, manages and grows profitable account relationships with an extensive percentage of moderately complex and medium sized business customers that have annual gross sales of generally more than $2MM and less than $20MM. Ensures the overall success & growth of an assigned portfolio by deepening relationships of existing customers and through the acquisition of new customers. 
<p></p>
<a href="http://www.mysite.com/jobs/">http://www.e.com/jobs/sr-business-relationship-mgr-wells-fargo-avon-co</a>
<hr>
Title:
Implementation Associate
<p></p>
City:
WI   
State:
WI  
Country:

<p></p>
Full Description:
<p></p>
<p>Works with project managers and project teams to determine implementation strategy, methods and plans for initiatives that typically impact single systems, workflows or products with low risk and complexity or where work is completed under guidance. Coordinates development of business requirements. Develops standard communication and training plans and materials. Implements communications and training plans. Tracks implementation tasks and budgets, identifies and reports issues or escalates as needed and reports project status. Documents or updates best practices, workflows or procedures. May also be responsible to miscellaneous business administrative initiatives.2+ years experience in one or more of the following: administrative support; project management; implementation; or participation in project teams as part of on-going responsibilities in a postion supporting the line of business.Relevant project management and/or implementation experience- Proven organizational, motivational, time management, prioritization, detail orientation
<br /> and multi-tasking skills. 
<br />- Proven oral and written communication skills to support each line of business. 
<br />- Experience with PC applications - Word, Excel, Access, Power Point and Visio.</p>
<p></p>
<a href="http://www.mysite.com/jobs/">http://www.e.com/jobs/implementation-associate-wells-fargo-milwaukee-wi</a>
<hr>
Title:
......... ... ..... ........ 

...............

And so on. That is, I want to group all content from one "Title:" up to the next "Title:",
i.e. $array[0] = "Title: Commercial Bank <p></p>City:TX ........."
and $array[1] = "Title: Sr Business Relationship <p></p> ", and so forth.

I would have approximately 300 such values.

I also need to keep the HTML tags inside each value,
because I need to validate that the tags are used correctly.
I do not know in advance what the content between the tags will be.

What I have tried:

my $i=0;
my @array;
while ($html =~ m/.*(Title:.*?)Title:/ig)
{
    $array[$i]=$1;
    $i++;
}

foreach (@array)
{
    print "$_";
}

But absolutely nothing gets picked up.
Please advise.

Comments (1)

晨曦÷微暖 2024-12-18 08:57:10

Don't use regular expressions to parse HTML. Use an HTML parser. There are many on CPAN. One of my favorites is HTML::TokeParser::Simple.
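As a rough illustration of the parser route, here is a minimal sketch with HTML::TokeParser::Simple. It assumes every record ends with an <hr> tag, as in the sample from the question; the shortened $html string, the link texts, and the names @records and $current are made up for the example. Each token's as_is text is appended unchanged, so the tags stay in the record for later validation.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;    # CPAN module, not part of core Perl

# Hypothetical miniature of the input; the real file has about 300 records.
my $html = <<'HTML';
<a href="/jobs_qa">Job QA</a>

Title:
Commercial Bank
<p></p>
City:
TX
<a href="http://www.mysite.com/jobs/">link one</a>
<hr>
Title:
Sr Business Relationship
<p></p>
City:
CO
<a href="http://www.mysite.com/jobs/">link two</a>
<hr>
HTML

my $parser = HTML::TokeParser::Simple->new( string => $html );

my @records;
my $current = '';
while ( my $token = $parser->get_token ) {
    if ( $token->is_start_tag('hr') ) {
        # Assumption: <hr> closes a record, so store what was collected.
        push @records, $current;
        $current = '';
        next;
    }
    # as_is returns the token's original source text, tags included.
    $current .= $token->as_is;
}
# Keep a trailing record that has no closing <hr>.
push @records, $current if $current =~ /\S/;

# Drop anything in front of "Title:" (the page header link, stray newlines).
s/\A.*?(?=Title:)//s for @records;

printf "%d records\n", scalar @records;
print "--- record $_ ---\n$records[$_]\n" for 0 .. $#records;
```

Because the records are rebuilt from as_is text, they can be passed on to a checker without losing any markup.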

HTML::Tidy and the W3 validator can help you check HTML documents.
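As for why the attempt in the question captures nothing: without the /s modifier, "." does not match a newline, and in the sample every "Title:" sits on its own line, so Title:.*?Title: can never match. Adding /s alone would not fix it either, because the greedy leading .* would run to the end of the file and backtrack to the last pair only, and each match also consumes the "Title:" that should start the next record. If the goal is only to cut the file into records on that literal marker, with validation left to a parser, a minimal core-Perl sketch could look like this. It assumes "Title:" appears only at the start of a record; the shortened $html string is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical miniature of the input; the real file has about 300 records.
my $html = <<'HTML';
<a href="/jobs_qa">Job QA</a>

Title:
Commercial Bank
<p></p>
City:
TX
<hr>
Title:
Sr Business Relationship
<p></p>
City:
CO
<hr>
HTML

# (?=Title:) is a zero-width lookahead: split cuts in front of each marker
# without consuming it, so every chunk keeps its own "Title:" and its tags.
# The grep drops the page header that precedes the first record.
my @array = grep { /\ATitle:/ } split /(?=Title:)/, $html;

print scalar(@array), " records\n";
print "[$_]\n" for @array;
```

The match here is case-sensitive, unlike the /i in the original attempt, so a lowercase "title:" inside a description would not start a new record.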
