bash: split text into character-limited buckets (array members)

Posted 2024-08-31 11:17:47

I have text such as

http://pastebin.com/H8zTbG54

The text is a set of rules separated by "OR" at the end of each line.

I need to put groups of lines (rules) into buckets (Bash array members), but each array member has a character limit of 1024.

So each array member should contain a group of rules, and the character count of any array member must not exceed 1024.

Suppose the rule text looks like
a OR b OR c OR d OR e OR f OR g OR h

The output should be
array member 1 = a OR b

array member 2 = c OR d OR e

array member 3 = f OR g

array member 4 = h

Can anybody help me do that?

I am working on a Solaris 10 server.

Comments (2)

≈。彩虹 2024-09-07 11:17:48

This is not entirely trivial and would need a bit more clarification, but basically you first split the text on OR/AND (and maybe some other patterns, depending on your needs) and then split again any chunks that are still larger than 1024 characters.

P.S. This seems to be one of those cases where a fully fledged scripting language such as Perl, Python, or PHP would get the result more conveniently.

E.g. a basic version in PHP (not sure it is completely correct, I haven't done PHP in a while) could go like this:

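function splitByOr($input)
{
  $result = array();
  foreach (explode(" OR ", $input) as $t) {
    if (strlen($t) > 1024) {
      // Still too long and contains no " OR ": split on a secondary
      // pattern instead (" AND " is only an assumed example).
      $result = array_merge($result, explode(" AND ", $t));
    } else {
      $result[] = $t;
    }
  }
  return $result;
}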
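For comparison, the first step described above (splitting on " OR ") can also be done in plain Bash without an external tool. This is only a sketch: the function name `split_on_or` and the array name `tokens` are made up for illustration, and it assumes the separator is exactly ` OR ` with a single space on each side.

```shell
#!/bin/bash
# Split a string on the literal separator " OR " into the global array "tokens".
# split_on_or and tokens are illustrative names, not from the original answer.
split_on_or() {
    local rest=$1 sep=" OR "
    tokens=()
    while [[ $rest == *"$sep"* ]]; do
        tokens[${#tokens[@]}]=${rest%%"$sep"*}   # text before the first separator
        rest=${rest#*"$sep"}                     # drop that text and the separator
    done
    tokens[${#tokens[@]}]=$rest                  # whatever is left is the last token
}

split_on_or "a OR b OR c OR d"
for t in "${tokens[@]}"; do
    # Report tokens that would not fit into a 1024-character array member.
    (( ${#t} > 1024 )) && echo "token too long: ${#t} characters" >&2
    printf '%s\n' "$t"
done
```

The array is filled with `tokens[${#tokens[@]}]=...` rather than `+=` so that the sketch does not depend on a recent Bash version.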
墨离汐 2024-09-07 11:17:48

None of the individual rules in the samplerule file exceed 148 characters in length - far less than the 1024 character limit. You don't say what should be done with the rules if they do exceed that limit.

This is a very simple Bash script that will split your sample on the literal "\n" into an array called "rules". It skips lines that exceed 1024 characters and prints an error message for each of them:

#!/bin/bash
count=0
while read -r line
do
    (( count++ ))
    [[ -z $line ]] && continue    # skip blank lines
    if (( ${#line} > 1024 ))
    then
        echo "Line length limit of 1024 characters exceeded: Length: ${#line} Line no.: $count"
        echo "$line"
        continue
    fi
    rules+=("$line")    # quoted so that each line becomes exactly one array member
done < <(echo -e "$(<samplerule)")

This variation will truncate the line length without regard to the consequences:

#!/bin/bash
while read -r line
do
    [[ -z $line ]] && continue          # skip blank lines
    rules+=("${line:0:1024}")           # keep only the first 1024 characters
done < <(echo -e "$(<samplerule)")

If the literal "\n" is not actually in the file and you need to use Bash arrays rather than coding this entirely in AWK, change the line in either version above that says this:

done < <(echo -e "$(<samplerule)")

to say this:

done < <(awk 'BEGIN {RS="OR"} {print $0,"OR"}' samplerule)
if [[ "${rules[${#rules[@]}-1]}" == "OR" ]]
then
    unset "rules[${#rules[@]}-1]"
fi

which will split the lines on the "OR".

Edit: Added a command to remove an extra "OR" at the end.
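Neither script above combines several rules into one array member, which is what the question asks for. A minimal sketch of that grouping step follows. It assumes that each rule sits on its own line, that rules are joined with a single space, and that a rule longer than the limit simply gets a member of its own. `pack_rules`, `buckets`, and the demo input are illustrative names and data, not part of the answer. It also avoids awk, because POSIX leaves the behaviour of a multi-character `RS` unspecified, so the awk on a Solaris 10 machine may not split on "OR" the way gawk does.

```shell
#!/bin/bash
# Greedily pack lines from stdin into the global array "buckets" so that no
# member exceeds $1 characters (default 1024). Names are illustrative only.
pack_rules() {
    local limit=${1:-1024} line current=""
    buckets=()
    while read -r line || [[ -n $line ]]; do
        [[ -z $line ]] && continue                      # ignore blank lines
        if [[ -z $current ]]; then
            current=$line
        elif (( ${#current} + 1 + ${#line} <= limit )); then
            current="$current $line"                    # still fits: append
        else
            buckets[${#buckets[@]}]=${current% OR}      # full: store without trailing OR
            current=$line
        fi
    done
    [[ -n $current ]] && buckets[${#buckets[@]}]=${current% OR}
    return 0
}

# Demo with a tiny limit of 9 characters so the grouping is visible.
pack_rules 9 <<'EOF'
a OR
b OR
c OR
d
EOF
for i in "${!buckets[@]}"; do
    printf 'array member %d = %s\n' "$((i + 1))" "${buckets[i]}"
done
```

With the demo input this prints `array member 1 = a OR b` and `array member 2 = c OR d`. For the real file the call would be along the lines of `pack_rules 1024 < samplerule`. The length check counts the trailing " OR" before it is stripped, so a member can end up a few characters shorter than the limit but never longer, unless a single rule is itself over the limit.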
