I have a folder named v3. It contains JSP files, and it also contains subfolders that hold more JSP files.
I want to detect which of those files contain invalid definitions.
EDIT: "Valid" and "invalid" are notions specific to my company's system. We are moving our css and js folders to another location, where they will sit under a folder named v3.
The program should work like this:
The JSP files are under the v3 folder. I will run the program from anywhere, and it will check all the JSP files under that folder (I will hard-code the full path of the v3 folder in the program).
It will find all lines that start with <link or <script.
For a <link line, it will find href="
For a <script line, it will find src="
Once it finds one of them, it will check whether the value starts with /static/.
If it does, the file is OK; if it doesn't, the program will write the file name to an output text file (or anywhere else).
You might look into ack; it is a Perl-based replacement for grep and should be able to do what you want.
Truthfully, though, you don't want to do pattern matching on [xht]ml; you want to use a parser. A Perl script using HTML::TokeParser can find all tags with certain attributes.
Here is a quick mock-up, though you may want some more useful output:
#!/usr/bin/perl
use strict;
use warnings;
use File::chdir;
use Cwd 'abs_path';
use HTML::TokeParser;

my @paths = @ARGV;
my @files;

# Collect the absolute path of every .jsp file directly inside each directory
foreach my $path (@paths) {
    local $CWD = $path;    # File::chdir: change directory for this block only
    opendir( my $dh, $CWD ) or die "Cannot open $path: $!";
    push @files, map { abs_path($_) } grep { /\.jsp$/ } readdir $dh;
    closedir $dh;
}

# Print each file that has a <link href> or <script src> outside /static/
foreach my $file (@files) {
    my $parser = HTML::TokeParser->new($file) or die "Cannot read $file: $!";
    while ( my $tag = $parser->get_tag( "link", "script" ) ) {
        if ( $tag->[0] eq "link" and exists $tag->[1]{'href'} ) {
            print "$file\n" unless $tag->[1]{'href'} =~ m#^/static/#;
        }
        elsif ( $tag->[0] eq "script" and exists $tag->[1]{'src'} ) {
            print "$file\n" unless $tag->[1]{'src'} =~ m#^/static/#;
        }
    }
}
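As one possible take on the "more useful output" mentioned above, here is a minimal sketch that prints the offending tag and attribute value next to the file name instead of the file name alone. The helper name bad_refs and the output format are my own assumptions, not part of the original answer; it relies only on HTML::TokeParser->new and get_tag as used in the mock-up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Attribute that carries the URL for each tag we care about.
my %url_attr = ( link => 'href', script => 'src' );

# Hypothetical helper (not from the original answer): returns one
# '<tag attr="value">' string per tag whose URL is outside /static/.
sub bad_refs {
    my ($source) = @_;    # a file name, or a reference to an HTML string
    my $parser = HTML::TokeParser->new($source)
        or die "Cannot parse $source: $!";
    my @bad;
    while ( my $tag = $parser->get_tag( keys %url_attr ) ) {
        my ( $name, $attrs ) = @$tag;
        my $attr  = $url_attr{$name};
        my $value = $attrs->{$attr};
        next unless defined $value;        # e.g. inline <script> blocks
        next if $value =~ m{^/static/};    # already points at /static/
        push @bad, qq{<$name $attr="$value">};
    }
    return @bad;
}

# Report every offending tag, prefixed with the file it came from.
for my $file (@ARGV) {
    print "$file: $_\n" for bad_refs($file);
}
```

Run it with the JSP files as arguments; each output line names the file and the tag that needs fixing, so a file with several bad references is listed once per reference.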
Use HTML::TreeBuilder (and thus HTML::Element) and look_down to find your stuff.
use strict;
use warnings;
use 5.012;
use HTML::TreeBuilder;

my @filelist = @ARGV;    ### or some other method ofc.

for my $file (@filelist) {
    my $tree = HTML::TreeBuilder->new_from_file($file)->elementify;
    # Select every <link> whose href is missing or does not start with /static/
    for my $e ( $tree->look_down( '_tag' => 'link',
                    sub { ( $_[0]->attr('href') // '' ) !~ m|^/static/| } ) ) {
        say "$file: ", $e->as_HTML;
    }
}
It's also then pretty trivial to modify your HTML with the same script (i.e., prepend '/static/' to the bad attributes with $e->attr('href', '/static/' . $e->attr('href'))) and spit it back out through HTML::Tidy with HTML::Tidy->new->clean($tree->as_HTML).
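A rough sketch of that rewrite step, under some assumptions of mine: the helper name fix_refs is hypothetical, <script src> is handled alongside <link href>, leading slashes are dropped before prefixing, and the HTML::Tidy pass is left out. JSP directives may not survive a round trip through an HTML parser, so this is best tried on a copy first.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: prefix '/static/' to every <link href> and
# <script src> that does not already start with it; returns the new HTML.
sub fix_refs {
    my ($html) = @_;
    my $tree     = HTML::TreeBuilder->new_from_content($html);
    my %url_attr = ( link => 'href', script => 'src' );

    for my $tag ( sort keys %url_attr ) {
        my $attr = $url_attr{$tag};
        for my $e ( $tree->look_down( _tag => $tag ) ) {
            my $old = $e->attr($attr);
            next if !defined $old or $old =~ m{^/static/};
            ( my $path = $old ) =~ s{^/+}{};       # assumption: drop leading slashes
            $e->attr( $attr, "/static/$path" );    # attr(NAME, VALUE) sets the value
        }
    }

    my $out = $tree->as_HTML;
    $tree->delete;    # release the tree explicitly
    return $out;
}
```

Note that attr() is called with two arguments to set a value; assigning to the result of attr('href') does not work because it is not an lvalue.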
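Edit: Re your question about the file list, assigning @ARGV to the file list was done simply because it was tangential to the question. See "How do I search a directory for all .XXX files and get a list of them in Perl?" for example. I'd use File::Find:
use File::Find;
my @rootdirs = @ARGV or die "Please pass all root directories to search as arguments!\n";
my @filelist;
# find() has no useful return value, so collect the matches inside the callback
find( sub { push @filelist, $File::Find::name if /\.jsp$/ }, @rootdirs );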
That would recurse through your starting directories (passed by argument -- you could also assign them directly to @rootdirs if you prefer) and give you all the .jsp files in them as items in @filelist.
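To make the recursion concrete, here is a small self-contained sketch that wraps the File::Find call in a function. The name jsp_files is an assumption for illustration; the only library call used is find() with a wanted callback, which is invoked once for each file and directory below the roots.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return every .jsp file at any depth below the
# given root directories, as sorted paths.
sub jsp_files {
    my @rootdirs = @_;
    my @found;
    find(
        sub {
            # $_ is the bare file name; $File::Find::name includes the path
            push @found, $File::Find::name if -f $_ and /\.jsp\z/;
        },
        @rootdirs
    );
    return sort @found;
}

# Usage: perl jsp_files.pl /path/to/v3
print "$_\n" for jsp_files(@ARGV);
```

Because find() descends into subfolders on its own, JSP files nested inside v3 are picked up without any extra directory handling.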
Well, something like this will get you going:
Edited to contain changes discussed in comments