Regex to delete lines in a file that end with the same or specified letters

Posted 2024-12-05 14:04:37 · 2746 characters · 1 view · 0 comments

I need a bash script for Mac OS X that works like this:

./script.sh * folder/to/files/ 
#
# or #
#
./script.sh xx folder/to/files/

The script should:

read a list of files;
open each file and read every line;
if a line ends with a repeated letter ('*' mode) or with the given letters ('xx'), remove that line and re-save the file;
back up the original file.

My first attempt:

#!/bin/bash

# ck init params
if [ $# -le 0 ]
then
  echo "Usage: $0 <letters>"
  exit 0
fi

# list files in current dir
list=`ls BRUTE*` 
for i in $list 
do 

  # prepare regex    
  case $1 in
       "*") REGEXP="^.*(.)\1+$";;
       *) REGEXP="^.*[$1]$";;
  esac    
  FILE=$i

  # backup file
  cp $FILE $FILE.bak

  # removing line with same letters
  sed -Ee "s/$REGEXP//g" -i '' $FILE
  cat $FILE | grep -v "^$"

done

exit 0

But it doesn't work the way I want.

What's wrong?
How can I fix this script?

Example:

$ cat BRUTE02.dat BRUTE03.dat
aa
ab
ac
ad
ee
ef
ff
hhh
$

If I use '*', I want every line that ends with a repeated letter removed from the files.
If I use 'ff', I want every line that ends with 'ff' removed from the files.

Note that this is on Mac OS X, where sed is a little different from the usual Linux sed.

man sed

     sed [-Ealn] command [file ...]
     sed [-Ealn] [-e command] [-f command_file] [-i extension] [file ...]

DESCRIPTION
     The sed utility reads the specified files, or the standard input if no files are specified, modifying the input as specified by a list of commands. The input is then written to the standard output.

     A single command may be specified as the first argument to sed. Multiple commands may be specified by using the -e or -f options. All commands are applied to the input in the order they are specified regardless of their origin.

     The following options are available:

     -E      Interpret regular expressions as extended (modern) regular expressions rather than basic regular expressions (BRE's). The re_format(7) manual page fully describes both formats.

     -a      The files listed as parameters for the ``w'' functions are created (or truncated) before any processing begins, by default. The -a option causes sed to delay opening each file until a command containing the related ``w'' function is applied to a line of input.

     -e command
             Append the editing commands specified by the command argument to the list of commands.

     -f command_file
             Append the editing commands found in the file command_file to the list of commands. The editing commands should each be listed on a separate line.

     -i extension
             Edit files in-place, saving backups with the specified extension. If a zero-length extension is given, no backup will be saved. It is not recommended to give a zero-length extension when in-place editing files, as you risk corruption or partial content in situations where disk space is exhausted, etc.

     -l      Make output line buffered.

     -n      By default, each line of input is echoed to the standard output after all of the commands have been applied to it. The -n option suppresses this behavior.

     The form of a sed command is as follows:

           [address[,address]]function[arguments]

     Whitespace may be inserted before the first address and the function portions of the command.

     Normally, sed cyclically copies a line of input, not including its terminating newline character, into a pattern space, (unless there is something left after a ``D'' function), applies all of the commands with addresses that select that pattern space, copies the pattern space to the standard output, appending a newline, and deletes the pattern space.

     Some of the functions use a hold space to save all or part of the pattern space for subsequent retrieval.

Anything else?
Is my problem clear?

Thanks.


不再见 2024-12-12 14:04:37

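I don't know the bash shell very well, so I can't tell what the failure is.
This is just an observation about the regexes as written (and it may be wrong).

The * mode regex looks OK:
^.*(.)\1+$ ends with the same letters.

But the literal mode might not do what you think.
Current: ^.*[$1]$ is meant to match lines that end with the literal string.
This shouldn't use a character class.

Change it to: ^.*$1$

Be aware, though, that the string in $1 should be escaped before it goes into the regex
if it contains any regex metacharacters.

Otherwise, did you intend to have a character class?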
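A minimal sketch of that suggestion, assuming the argument is meant as a literal suffix: escape any regex metacharacters in it, anchor it with `$`, and use sed's `d` command so that matching lines are deleted instead of being left behind as empty lines. The helper name `escape_ere` and the sample data are illustrative only, not part of the original script.

```shell
#!/bin/bash
# escape_ere is a hypothetical helper: it backslash-escapes extended-regex
# metacharacters (and the / delimiter) so the argument is matched literally.
escape_ere() {
  printf '%s' "$1" | sed 's#[][\.^$*+?(){}|/]#\\&#g'
}

suffix='ff'
regexp="$(escape_ere "$suffix")\$"   # literal suffix anchored at end of line

# Delete every line that ends with the literal suffix.
printf 'aa\nab\nff\nhhh\n' | sed -E "/$regexp/d"
```

With the sample input this leaves `aa`, `ab` and `hhh`.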


篱下浅笙歌 2024-12-12 14:04:37

perl -ne '
    BEGIN {$arg = shift; $re = $arg eq "*" ? qr/([[:alpha:]])\1$/ : qr/$arg$/}
    /$re/ && next || print
'

Example:

echo "aa
ab
ac
ad
ee
ef
ff" | perl -ne '
    BEGIN {$arg = shift; $re = $arg eq "*" ? qr/([[:alpha:]])\1$/ : qr/$arg$/}
    /$re/ && next || print
' '*'

produces

ab
ac
ad
ef


帅哥哥的热头脑 2024-12-12 14:04:37

A possible issue:

When you put an unquoted * on the command line, the shell replaces it with the names of all the files in the current directory, so your $1 will never equal *.
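This is easy to check with a small function standing in for the script. A sketch, assuming a scratch directory that holds two sample files; `show_first` is an illustrative name.

```shell
#!/bin/bash
# show_first stands in for the script: it prints its first argument.
show_first() { printf '%s\n' "$1"; }

dir=$(mktemp -d)
cd "$dir" || exit 1
touch BRUTE02.dat BRUTE03.dat

unquoted=$(show_first *)     # the shell expands * to file names before the call
quoted=$(show_first '*')     # quoting passes a literal asterisk

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

The script would therefore need to be called as `./script.sh '*' folder/to/files/`.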

And some tips:

You can replace this:

# list files in current dir
list=`ls BRUTE*` 
for i in $list

with this:

for i in BRUTE*

And this:

cat $FILE | grep -v "^$"

with this:

grep -v "^$" $FILE

Apart from that possible issue, nothing else jumps out at me. What do you mean by "clean"? Can you give an example of what a file should look like before and after, and what the command would look like?
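Putting these tips together with the question's requirements, one possible shape for the script is sketched below. It rests on several assumptions: the mode is passed quoted (`'*'`), "clean" means deleting the matching lines, the files are named `BRUTE*` as in the question, and the sed at hand accepts backreferences with `-E`. The function name `clean_files` is hypothetical.

```shell
#!/bin/bash
# Sketch only; invoke as: ./script.sh '*' folder/to/files/
# clean_files MODE DIR: MODE is '*' (repeated last character) or a literal suffix.
clean_files() {
  local mode=$1 dir=$2 regexp file
  case $mode in
    '*') regexp='(.)\1$' ;;   # line ends with a doubled character
    *)   # escape regex metacharacters so the suffix is matched literally
         regexp="$(printf '%s' "$mode" | sed 's#[][\.^$*+?(){}|/]#\\&#g')\$" ;;
  esac
  for file in "$dir"/BRUTE*; do
    case $file in *.bak) continue ;; esac   # skip backups from earlier runs
    [ -f "$file" ] || continue              # the glob matched nothing
    # -i.bak keeps the original as FILE.bak; d deletes the matching lines.
    sed -i.bak -E "/$regexp/d" "$file"
  done
}

if [ $# -ge 2 ]; then
  clean_files "$1" "$2"
else
  echo "Usage: $0 '<letters>|*' <folder>" >&2
fi
```

Attaching the extension directly to `-i` (`-i.bak`) is accepted by both the BSD sed on Mac OS X and GNU sed, and it makes the separate `cp` backup unnecessary.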


强辩 2024-12-12 14:04:37

This is the problem!

grep '\(.\)\1[^\r\n]
 *

On Mac OS X, ( ) { }, etc. must be quoted!

Solved, thanks.
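The quoting requirement comes from the regex flavour, not from the operating system as such: in a basic regular expression (grep's default) a group is written `\(...\)`, while with `-E` plain parentheses do the grouping. A sketch on sample data, assuming a grep that accepts backreferences with `-E` (a common extension, not something POSIX guarantees):

```shell
#!/bin/bash
data=$(printf 'aa\nab\nee\nef\nhhh\n')

# Basic regex: the group needs backslashes.
bre=$(printf '%s\n' "$data" | grep '\(.\)\1$')

# Extended regex: unescaped parentheses form the group.
ere=$(printf '%s\n' "$data" | grep -E '(.)\1$')

printf 'BRE matches:\n%s\n' "$bre"
printf 'ERE matches:\n%s\n' "$ere"
```

Both commands select the same lines here: `aa`, `ee` and `hhh`.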
