Search the files under a folder for valid patterns? (Maybe using Perl, some Java API, or anything else)

Posted 2024-10-26 20:47:58

I have a folder named v3. It contains JSP files, and it also has subfolders that contain more JSP files.

My JSP files contain links like these:

<link rel="stylesheet" href="/static/css/main.css" type="text/css" />
<link rel="stylesheet" type="text/css" href="/css<s:text name="scripts"/>/general_styles.css">
<link rel="stylesheet" type="text/css" href="/v3/css<s:text name="scripts"/>/something.css" >

and scripts:

<script language="javascript" type="text/javascript" src="/static/scripts/jquery-1.4.2.min.js"></script>
<script type="text/javascript" src="/scripts<s:text name="scripts"/>/prototype-1.6.0.2.js"></script>
<script language="javascript" type="text/javascript" src="/scripts<s:text name="scripts"/>/${a.name}/<s:text name="genericJs"/>"></script>

For links:

href should start with /static/. For example, this is valid:

<link rel="stylesheet" href="/static/css/main.css" type="text/css" />

For scripts:

src should also start with /static/. For example, this is valid:

<script language="javascript" type="text/javascript" src="/static/scripts/jquery-1.4.2.min.js"></script>

What I want to do is detect which files have invalid definitions.

EDIT: Valid vs. invalid is a notion specific to my company's system. We are moving our css and js folders somewhere else; they will be under a folder named v3.

The program should work like this:

  • The JSP files are under the v3 folder. I will run the program from anywhere, and it will check all the JSP files under that folder (I will define the full path of the v3 folder in the program itself).

  • It will find all lines that start with <link or <script.

  • If the line is a <link, it will find href=".

  • If the line is a <script, it will find src=".

  • After it finds one of them, it will check whether the value starts with /static/.

  • If it does, the line is OK; if it doesn't, the program will write the file name to an output/text file or something similar.
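The steps above map fairly directly onto the JDK's java.nio.file API. The following is a minimal sketch, not a tested tool: it assumes Java 8+, UTF-8 encoded JSP files, and that each <link>/<script> tag starts on its own line, exactly as the bullet list describes. The class and method names (V3StaticChecker, attrValue, isInvalidLine) are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class V3StaticChecker {

    /** Returns the text between attr=" and the next double quote, or null if the attribute is absent. */
    static String attrValue(String line, String attr) {
        String marker = attr + "=\"";
        int start = line.indexOf(marker);
        if (start < 0) {
            return null;
        }
        start += marker.length();
        int end = line.indexOf('"', start);
        return end < 0 ? line.substring(start) : line.substring(start, end);
    }

    /** Applies the rules from the question to a single line of a JSP file. */
    static boolean isInvalidLine(String line) {
        String trimmed = line.trim(); // tags are often indented
        String value;
        if (trimmed.startsWith("<link")) {
            value = attrValue(trimmed, "href");
        } else if (trimmed.startsWith("<script")) {
            value = attrValue(trimmed, "src");
        } else {
            return false; // not a line the rules care about
        }
        // A tag without the attribute (e.g. an inline <script>) is not flagged.
        return value != null && !value.startsWith("/static/");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real v3 folder as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "/path/to/v3");
        List<Path> jsps;
        try (Stream<Path> walk = Files.walk(root)) { // recurses into subfolders
            jsps = walk.filter(Files::isRegularFile)
                       .filter(p -> p.toString().endsWith(".jsp"))
                       .collect(Collectors.toList());
        }
        for (Path jsp : jsps) {
            // Assumes the JSP files are UTF-8 encoded.
            List<String> lines = Files.readAllLines(jsp, StandardCharsets.UTF_8);
            if (lines.stream().anyMatch(V3StaticChecker::isInvalidLine)) {
                System.out.println(jsp); // or write it to a report file
            }
        }
    }
}
```

Because it works line by line, a tag that is split across several lines, or that follows other markup on the same line, will be missed; the answers below discuss more robust approaches.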


Answers (3)

握住我的手 2024-11-02 20:47:58

You might look into ack; it is a Perl-based replacement for grep and should be able to do what you want.

Truthfully, you don't want to do pattern matching on [xht]ml; you want to use a parser. A Perl script using HTML::TokeParser can find all tags with certain attributes.

Here is a quick mock-up, though you may want some more useful output:

#!/usr/bin/perl

use strict;
use warnings;

use File::Find;
use HTML::TokeParser;

# Collect every .jsp file under the directories given on the command line.
# File::Find descends into subfolders; a plain readdir only lists one level.
my @paths = @ARGV or die "usage: $0 DIR...\n";
my @files;
find( sub { push @files, $File::Find::name if -f $_ && /\.jsp$/ }, @paths );

foreach my $file (@files) {
  my $parser = HTML::TokeParser->new($file) or die "Can't open $file: $!";

  # get_tag returns [$tagname, \%attributes, ...] for each matching start tag
  while (my $tag = $parser->get_tag("link", "script")) {

    if ($tag->[0] eq "link" and exists $tag->[1]{'href'}) {
      print "$file\n" unless $tag->[1]{'href'} =~ m#^/static/#;

    } elsif ($tag->[0] eq "script" and exists $tag->[1]{'src'}) {
      print "$file\n" unless $tag->[1]{'src'} =~ m#^/static/#;
    }

  }

}
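If you would rather stay in Java, the main point of this answer (look at whole tags, not lines) can be approximated with a regular expression over the entire file content. This is a hypothetical sketch using only java.util.regex; it is still not a real HTML parser, so the usual caveats about regex and markup apply. TagLevelChecker and badReferences are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagLevelChecker {

    // An opening <link ...> or <script ...> tag; [^>]* also spans line breaks.
    // Caveat: a '>' inside an attribute value (such as an embedded <s:text .../>)
    // ends the match early, but the start of the value is still captured.
    private static final Pattern TAG =
            Pattern.compile("<(link|script)\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    private static final Pattern HREF =
            Pattern.compile("\\bhref\\s*=\\s*[\"']([^\"']*)", Pattern.CASE_INSENSITIVE);
    private static final Pattern SRC =
            Pattern.compile("\\bsrc\\s*=\\s*[\"']([^\"']*)", Pattern.CASE_INSENSITIVE);

    /** Returns the href/src values of <link>/<script> tags that do not start with /static/. */
    static List<String> badReferences(String content) {
        List<String> bad = new ArrayList<>();
        Matcher tag = TAG.matcher(content);
        while (tag.find()) {
            // <link> is checked on href, <script> on src
            Pattern attr = tag.group(1).equalsIgnoreCase("link") ? HREF : SRC;
            Matcher value = attr.matcher(tag.group(2));
            if (value.find() && !value.group(1).startsWith("/static/")) {
                bad.add(value.group(1));
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        String jsp = "<link rel=\"stylesheet\"\n      href=\"/css/main.css\" />\n"
                + "<script src=\"/static/a.js\"></script>";
        System.out.println(badReferences(jsp)); // prints [/css/main.css]
    }
}
```

The <link> in main is caught even though its href sits on the second line, which a line-by-line check keyed on "<link" would miss.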
︶ ̄淡然 2024-11-02 20:47:58

Use HTML::TreeBuilder (and thus HTML::Element) and look_down to find your stuff.

use strict;
use warnings;
use 5.012;

use HTML::TreeBuilder;

my @filelist = @ARGV; # or collect the files some other way

for my $file (@filelist) {
  my $tree = HTML::TreeBuilder->new_from_file($file)->elementify;
  # <link> elements whose href is missing or does not start with /static/
  for my $e ( $tree->look_down( '_tag' => 'link',
                                sub { ($_[0]->attr('href') // '') !~ m|^/static/| }
                              )
            ) {
    say "$file: ", $e->as_HTML;
  }
  $tree->delete;    # free the tree explicitly
}

It's also pretty trivial to modify your HTML with the same script (i.e., prefix the bad attributes with /static, since the existing values already start with a slash: $e->attr('href', '/static' . $e->attr('href'))) and write it back out through HTML::Tidy with HTML::Tidy->new->clean($tree->as_HTML).
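A rough Java analogue of that rewrite, without HTML::TreeBuilder or HTML::Tidy, is a regex replacement restricted to <link> and <script> tags. This is a hypothetical sketch (StaticPrefixer is an illustrative name); it requires Java 9+ for Matcher.replaceAll(Function), and like the Perl suggestion it prefixes /static because the existing values already begin with a slash.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StaticPrefixer {

    // Opening <link ...> / <script ...> tags only, so <a href> etc. are left alone.
    private static final Pattern TAG = Pattern.compile("<(?:link|script)\\b[^>]*>");
    // href="/..." or src="/..." whose value does not already start with /static/.
    // Group 1 = attribute name plus opening quote, group 2 = the old value.
    private static final Pattern BAD_REF =
            Pattern.compile("\\b((?:href|src)=\")(/(?!static/)[^\"]*)");

    /** Prefixes "/static" onto bad href/src values inside <link> and <script> tags. */
    static String addStaticPrefix(String html) {
        // quoteReplacement keeps JSP EL such as ${a.name} from being read as a group reference
        return TAG.matcher(html).replaceAll(tag -> Matcher.quoteReplacement(
                BAD_REF.matcher(tag.group()).replaceAll(ref ->
                        Matcher.quoteReplacement(ref.group(1) + "/static" + ref.group(2)))));
    }

    public static void main(String[] args) {
        System.out.println(addStaticPrefix("<link rel=\"stylesheet\" href=\"/css/main.css\" />"));
        // prints <link rel="stylesheet" href="/static/css/main.css" />
    }
}
```

Relative values without a leading slash (e.g. css/main.css) are deliberately left untouched; reviewing the changes before overwriting any files is advisable.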

Edit: Regarding your question about the file list: I assigned @ARGV to the file list only because collecting the files was tangential to the question. See, for example, "How do I search a directory for all .XXX files and get a list of them in Perl?". I'd use File::Find:

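use File::Find;
my @rootdirs = @ARGV or die "Please pass all root directories to search as arguments!";
# find() itself returns nothing useful; the callback has to collect the matches
my @filelist;
find( sub { push @filelist, $File::Find::name if /\.jsp$/ }, @rootdirs );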

That would recurse through your starting directories (passed as arguments; you could also assign them directly to @rootdirs if you prefer) and collect all the .jsp files in them into @filelist.
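The same recursive collection in plain Java could look like the following sketch, using Files.walk from java.nio.file (Java 8+). JspFinder and findJsps are hypothetical names, and the error message simply mirrors the Perl version.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JspFinder {

    /** Recursively collects every regular file ending in .jsp under the given roots. */
    static List<Path> findJsps(List<Path> roots) throws IOException {
        List<Path> result = new ArrayList<>();
        for (Path root : roots) {
            try (Stream<Path> walk = Files.walk(root)) { // closes the directory handles
                result.addAll(walk
                        .filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".jsp"))
                        .collect(Collectors.toList()));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("Please pass all root directories to search as arguments!");
            System.exit(1);
        }
        List<Path> roots = new ArrayList<>();
        for (String arg : args) {
            roots.add(Paths.get(arg));
        }
        findJsps(roots).forEach(System.out::println);
    }
}
```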

少跟Wǒ拽 2024-11-02 20:47:58

Well, something like this will get you going:

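import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;

import org.apache.commons.io.FileUtils; // Apache Commons IO

public class JspLinkChecker {
    public static void main(String[] args) throws IOException {
        // Recursively iterate over every .jsp file under the v3 folder
        Iterator<File> files = FileUtils.iterateFiles(new File("/path/to/v3"), new String[]{"jsp"}, true);
        while (files.hasNext()) {
            File jsp = files.next();
            List<String> list = FileUtils.readLines(jsp, StandardCharsets.UTF_8);
            for (String line : list) {
                String trimmed = line.trim(); // tags are usually indented
                if (trimmed.startsWith("<link") || trimmed.startsWith("<script")) {
                    // crude check: the tag must reference a "/static/..." path
                    if (!trimmed.contains("\"/static/")) {
                        System.out.println("invalid file found: " + jsp.getAbsolutePath());
                        break; // report each file only once
                    }
                }
            }
        }
    }
}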

Edited to include the changes discussed in the comments.
