Regular expression to delete lines in a file that end with the same or defined letters

Posted on 2024-12-05 14:04:37

I need a bash script for Mac OS X that works like this:

./script.sh * folder/to/files/ 
#
# or #
#
./script.sh xx folder/to/files/

This script should:

  • read a list of files,
  • open each file and read each line,
  • if a line ends with a repeated letter ('*' mode) or with the given custom letters ('xx'),
    remove the line and re-save the file,
  • back up the original file.

My first approach:

#!/bin/bash

# ck init params
if [ $# -le 0 ]
then
  echo "Usage: $0 <letters>"
  exit 0
fi

# list files in current dir
list=`ls BRUTE*` 
for i in $list 
do 

  # prepare regex    
  case $1 in
       "*") REGEXP="^.*(.)\1+$";;
       *) REGEXP="^.*[$1]$";;
  esac    
  FILE=$i

  # backup file
  cp $FILE $FILE.bak

  # removing line with same letters
  sed -Ee "s/$REGEXP//g" -i '' $FILE
  cat $FILE | grep -v "^$"

done

exit 0

But it doesn't work the way I want.

What's wrong?
How can I fix this script?


Example:

$cat BRUTE02.dat BRUTE03.dat
aa
ab
ac
ad
ee
ef
ff
hhh
$

If I use '*', I want every line that ends with a repeated letter to be removed from the files.
If I use 'ff', I want every line that ends with 'ff' to be removed from the files.


Ah, it's on Mac OS X. Remember that its sed is a little different from the classic Linux sed.

man sed

 sed [-Ealn] command [file ...]
 sed [-Ealn] [-e command] [-f command_file] [-i extension] [file ...]

DESCRIPTION
 The sed utility reads the specified files, or the standard input if no
 files are specified, modifying the input as specified by a list of
 commands. The input is then written to the standard output.

 A single command may be specified as the first argument to sed. Multiple
 commands may be specified by using the -e or -f options. All commands are
 applied to the input in the order they are specified regardless of their
 origin.

 The following options are available:

 -E      Interpret regular expressions as extended (modern) regular
         expressions rather than basic regular expressions (BRE's). The
         re_format(7) manual page fully describes both formats.

 -a      The files listed as parameters for the ``w'' functions are
         created (or truncated) before any processing begins, by default.
         The -a option causes sed to delay opening each file until a
         command containing the related ``w'' function is applied to a
         line of input.

 -e command
         Append the editing commands specified by the command argument to
         the list of commands.

 -f command_file
         Append the editing commands found in the file command_file to the
         list of commands. The editing commands should each be listed on a
         separate line.

 -i extension
         Edit files in-place, saving backups with the specified extension.
         If a zero-length extension is given, no backup will be saved. It
         is not recommended to give a zero-length extension when in-place
         editing files, as you risk corruption or partial content in
         situations where disk space is exhausted, etc.

 -l      Make output line buffered.

 -n      By default, each line of input is echoed to the standard output
         after all of the commands have been applied to it. The -n option
         suppresses this behavior.

 The form of a sed command is as follows:

       [address[,address]]function[arguments]

 Whitespace may be inserted before the first address and the function
 portions of the command.

 Normally, sed cyclically copies a line of input, not including its
 terminating newline character, into a pattern space, (unless there is
 something left after a ``D'' function), applies all of the commands with
 addresses that select that pattern space, copies the pattern space to the
 standard output, appending a newline, and deletes the pattern space.

 Some of the functions use a hold space to save all or part of the pattern
 space for subsequent retrieval.

Anything else?
Is my problem clear?

Thanks.

Comments (4)

不再见 2024-12-12 14:04:37

I don't know bash shell too well so I can't evaluate what the failure is.
This is just an observation of the regex as understood (this may be wrong).

The * mode regex looks OK:
^.*(.)\1+$   (lines that end with the same letters)

But the literal mode might not do what you think.
Current: ^.*[$1]$   (intended: lines that end with the 'literal string')
This shouldn't use a character class, because a character class matches any single one of the listed characters.

Change it to: ^.*$1$

Be aware, though, that the string in $1 should be escaped before it goes into the regex,
in case it contains any regex metacharacters.

Otherwise, do you intend to have a character class?
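
The escaping step mentioned above could look roughly like the sketch below. The helper names escape_bre and drop_lines_ending_with are made up for illustration and do not come from the original post. The sketch assumes the suffix is meant as a literal string and uses sed's delete command with a basic regular expression.

```shell
#!/bin/bash
# Hypothetical helper: put a backslash in front of every character that is
# special in a basic regular expression ( [ \ . * ^ $ ) and in front of '/',
# which the caller uses as the sed address delimiter.
escape_bre() {
  printf '%s\n' "$1" | sed 's|[[\.*^$/]|\\&|g'
}

# Hypothetical helper: copy stdin to stdout, dropping every line that ends
# with the literal suffix given as the first argument.
drop_lines_ending_with() {
  local suffix
  suffix=$(escape_bre "$1")
  sed "/${suffix}\$/d"
}

# 'ff' contains no metacharacters, so it is used unchanged.
printf '%s\n' aa ab a.f ff | drop_lines_ending_with 'ff'    # prints aa, ab, a.f

# '.b' is escaped to '\.b', so "aXb" is kept and only "a.b" is dropped.
printf '%s\n' aXb 'a.b' | drop_lines_ending_with '.b'       # prints aXb
```

Without the escaping step, the second call would treat the dot as "any character" and delete both lines.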

篱下浅笙歌 2024-12-12 14:04:37
perl -ne '
    BEGIN {$arg = shift; $re = $arg eq "*" ? qr/([[:alpha:]])\1$/ : qr/$arg$/}
    /$re/ && next || print
'

Example:

echo "aa
ab
ac
ad
ee
ef
ff" | perl -ne '
    BEGIN {$arg = shift; $re = $arg eq "*" ? qr/([[:alpha:]])\1$/ : qr/$arg$/}
    /$re/ && next || print
' '*'

produces

ab
ac
ad
ef

帅哥哥的热头脑 2024-12-12 14:04:37

A possible issue:

  • When you put an unquoted * on the command line, the shell replaces it with the names of the files in your current directory before the script starts, so $1 will normally not equal *. Quote it ('*') to pass it through unchanged.

And some tips:

  • You can replace this:

# list files in current dir
list=`ls BRUTE*` 
for i in $list 

with:

for i in BRUTE*

  • And this:

cat $FILE | grep -v "^$"

with:

grep -v "^$" $FILE

Besides the possible issue, nothing else jumps out at me. What do you mean by "clean"? Can you give an example of what a file should look like before and after, and what the command would look like?
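
Putting the quoting point and the loop tip together, a minimal sketch of the whole task might look like this. The function name clean_dir, the argument order, and the BRUTE*.dat file pattern are assumptions for illustration. It also swaps in sed's d (delete) command and the -i.bak backup form, neither of which appears in the original script.

```shell
#!/bin/bash
# Sketch only. Call it as:  clean_dir '*' dir   or   clean_dir ff dir
# The star must be quoted so the shell passes it through unexpanded.
clean_dir() {
  local mode=$1 dir=$2 re f
  case $mode in
    '*') re='\(.\)\1$' ;;    # last two characters identical (BRE back-reference)
    *)   re="${mode}\$" ;;   # assumes $mode contains no regex metacharacters
  esac
  for f in "$dir"/BRUTE*.dat; do
    [ -f "$f" ] || continue  # skip the unexpanded pattern when nothing matches
    # -i.bak edits in place and keeps the original as "$f.bak"; the attached
    # suffix form is accepted by both BSD (macOS) sed and GNU sed.
    sed -i.bak -e "/$re/d" "$f"
  done
}

# Demo on a throwaway directory with the sample data from the question.
tmp=$(mktemp -d)
printf '%s\n' aa ab ac ad ee ef ff hhh > "$tmp/BRUTE02.dat"
clean_dir '*' "$tmp"
cat "$tmp/BRUTE02.dat"       # prints ab, ac, ad, ef on separate lines
```

Deleting matching lines with d avoids the empty lines that a substitution leaves behind, so no separate grep pass is needed, and the backup suffix makes the explicit cp unnecessary.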

强辩 2024-12-12 14:04:37

This is the problem!

grep '\(.\)\1[^\r\n]

on MAC OSX, ( ) { }, etc... must be quoted!!!

Solved, thanks.

*
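
One reading of "must be quoted" is that grep's default basic regular expressions need backslashes before grouping parentheses, while extended regular expressions do not. The sketch below illustrates that reading with a simpler pattern than the truncated one above. The sample data is taken from the question, and back-references under -E are an extension that common grep implementations accept rather than something POSIX guarantees.

```shell
#!/bin/bash
# Basic regular expressions (grep's default): grouping is written \( and \).
printf '%s\n' aa ab ee ef hhh | grep -v '\(.\)\1$'     # prints ab, ef

# Extended regular expressions (-E): plain parentheses group.
printf '%s\n' aa ab ee ef hhh | grep -Ev '(.)\1$'      # prints ab, ef
```

The single quotes keep the shell from interpreting the backslashes, parentheses, and dollar sign before grep sees them.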

