Is it possible to pass a bash array as a variable to awk?
I have a large amount of data that I am importing from text files. The files are preformatted so that I can import each column as a bash array:
2GYS chain=(A B) hresname=(BMA FUC NAG NDG) hresnumber=( ) hatom=( )
Now I would like to extract information from files containing several lines formatted like this:
ATOM 1 N THR A 4 30.127 13.123 1.297 1.00 39.96 N
For instance, I would like to extract all lines in which the first column is ATOM and the fifth column matches any element of the chain array (in this case, A or B).
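For reference, here is a minimal check of how awk numbers the columns of the sample line when it uses its default whitespace splitting (which is what the attempts below rely on). The variable names are only for illustration; note that real PDB files are fixed-column, so this numbering is an assumption that holds only while neighbouring columns stay separated by blanks.

```shell
#!/usr/bin/env bash
# Sample line copied from the question.
line='ATOM 1 N THR A 4 30.127 13.123 1.297 1.00 39.96 N'

# With the default field separator, awk splits on runs of blanks,
# so the record type is $1, the residue name is $4 and the chain ID is $5.
fields=$(printf '%s\n' "$line" | awk '{ print $1, $4, $5 }')
printf '%s\n' "$fields"
```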
UPDATE. This is what I have tried:
for c in "${chain[@]}" ; do
    # ATOM lines of chain $c go to one output file per chain
    awk -v c="$c" '$1 == "ATOM" && $5 == c { print $0 }' "${pdbid}.pdb" >> "../../properpdb/${pdbid}_${c}.pdb"
done
for c in "${chain[@]}" ; do
    for r in "${hresname[@]}" ; do
        # HETATM lines of chain $c with residue name $r are appended to the same per-chain file
        awk -v c="$c" -v r="$r" '$1 == "HETATM" && $5 == c && $4 == r { print $0 }' "${pdbid}.pdb" >> "../../properpdb/${pdbid}_${c}.pdb"
    done
done
The problem is that, as expected, this produces files with either chain A or chain B, but not a file with both. In addition, it does not produce all possible combinations of the arrays "chain" and "hresname"; it just adds "hresname" to the files for which only one "chain" was available.
My solution would be to build part of your awk script in bash, specifically the matching function.
You seem to want the lines that match
$1 == "ATOM" && ($5==c[0] || $5==c[1]...) {print $0}
exported to the file. In bash, construct the matching condition as:
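A sketch of this step, not necessarily the answerer's exact code: loop over the array and append one `$5=="X"` test per element, joining them with `||`. The `${cmatch:+ || }` expansion inserts the separator only once cmatch is non-empty. This assumes the chain IDs contain no quotes or backslashes, which holds for single-character PDB chain IDs.

```shell
#!/usr/bin/env bash
# Example array as in the question; normally it is read from the preformatted files.
chain=(A B)

# Build the awk condition  $5=="A" || $5=="B"  from the array.
cmatch=""
for c in "${chain[@]}"; do
  # \$ and \" keep a literal $ and " in the string for awk to see later.
  cmatch+="${cmatch:+ || }\$5==\"$c\""
done

printf '%s\n' "$cmatch"
```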
Now your awk scripts can be adjusted to include the needed bits. (Quoting continues to be a pain, since you need to make sure $1 gets down to awk unmolested, while $cmatch is expanded by bash.)
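One way the quoting can work, sketched under assumptions: the awk program is split into single-quoted pieces so bash leaves `$1` alone, and `"$cmatch"` sits between them in double quotes so bash expands it. The parentheses around the spliced condition matter because `&&` binds tighter than `||` in awk. The sample file, the `pdbid` value and the output name `${pdbid}_chains.pdb` are hypothetical stand-ins, and fields are assumed to be whitespace-separated.

```shell
#!/usr/bin/env bash
pdbid=2GYS
chain=(A B)
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Tiny whitespace-separated sample standing in for the real ${pdbid}.pdb.
cat > "${pdbid}.pdb" <<'EOF'
ATOM 1 N THR A 4 30.127 13.123 1.297 1.00 39.96 N
ATOM 2 CA THR B 4 31.000 13.500 1.800 1.00 40.10 C
ATOM 3 C THR C 4 32.000 14.000 2.000 1.00 41.00 C
EOF

# Condition built from the array:  $5=="A" || $5=="B"
cmatch=""
for c in "${chain[@]}"; do
  cmatch+="${cmatch:+ || }\$5==\"$c\""
done

# Single quotes protect $1 and $0 from bash; "$cmatch" is expanded by bash
# before awk sees the program:  $1 == "ATOM" && ($5=="A" || $5=="B") { print $0 }
awk '$1 == "ATOM" && ('"$cmatch"') { print $0 }' "${pdbid}.pdb" > "${pdbid}_chains.pdb"

cat "${pdbid}_chains.pdb"
```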
So now your matching script should be complete.
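An alternative that answers the title question more directly and avoids splicing shell text into the awk program: join each array into one string, pass it with `-v`, and rebuild it inside awk with `split()`, storing the values as array keys so membership can be tested with awk's `in` operator. This is a swapped-in technique, not the answer's method. It assumes elements contain no whitespace, that IFS has its default value (`"${chain[*]}"` joins with the first character of IFS), and uses a hypothetical sample file and output name.

```shell
#!/usr/bin/env bash
pdbid=2GYS
chain=(A B)
hresname=(BMA FUC NAG NDG)
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical whitespace-separated sample input.
cat > "${pdbid}.pdb" <<'EOF'
ATOM 1 N THR A 4 30.127 13.123 1.297 1.00 39.96 N
ATOM 2 CA THR C 4 31.000 13.500 1.800 1.00 40.10 C
HETATM 100 C1 NAG B 501 10.000 10.000 10.000 1.00 20.00 C
HETATM 101 O HOH A 601 11.000 11.000 11.000 1.00 20.00 O
EOF

awk -v chains="${chain[*]}" -v hres="${hresname[*]}" '
  BEGIN {
    # split() clears tmp before filling it, so it can be reused.
    n = split(chains, tmp, " ")
    for (i = 1; i <= n; i++) want_chain[tmp[i]] = 1
    n = split(hres, tmp, " ")
    for (i = 1; i <= n; i++) want_res[tmp[i]] = 1
  }
  # A pattern without an action prints the matching line.
  ($5 in want_chain) && ($1 == "ATOM" || ($1 == "HETATM" && ($4 in want_res)))
' "${pdbid}.pdb" > "${pdbid}_selected.pdb"

cat "${pdbid}_selected.pdb"
```

With an empty `hresname`, `split()` returns 0 and no HETATM line matches, so no special case is needed.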
I don't really understand the output file name,
../../properpdb/${pdbid}_${c}.pdb
, since that would seem to indicate separate files for each element of c, which is what you don't want? If you want these divided by elements of c, then it's slightly simpler: construct rmatch (the residue-name condition) the same way as cmatch above, and then do something like
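A sketch of building rmatch the same way as cmatch, assuming it tests the residue name in column 4 as the question's HETATM loop does. If `hresname` is empty, rmatch stays empty and `()` would be an awk syntax error, so a caller may want a fallback such as `rmatch=${rmatch:-0}` (a condition that is never true).

```shell
#!/usr/bin/env bash
hresname=(BMA FUC NAG NDG)

# Turn the residue names into
#   $4=="BMA" || $4=="FUC" || $4=="NAG" || $4=="NDG"
rmatch=""
for r in "${hresname[@]}"; do
  rmatch+="${rmatch:+ || }\$4==\"$r\""
done

printf '%s\n' "$rmatch"
```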
If you want all ATOM lines first, or...
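A possible per-chain loop for the "ATOM lines first" case: two awk passes per chain, the first writing the chain's ATOM lines and the second appending its HETATM lines whose residue name matches rmatch. The sample file, the temporary directory standing in for ../../properpdb, and whitespace-separated fields are all assumptions. Unlike the question's loop, the first pass truncates with `>`, so rerunning the script does not keep appending to old output.

```shell
#!/usr/bin/env bash
pdbid=2GYS
chain=(A B)
hresname=(BMA FUC NAG NDG)
workdir=$(mktemp -d)
outdir="$workdir/properpdb"   # stands in for ../../properpdb
mkdir -p "$outdir"
cd "$workdir" || exit 1

# Hypothetical sample; the HETATM line comes first on purpose.
cat > "${pdbid}.pdb" <<'EOF'
HETATM 100 C1 NAG A 501 10.000 10.000 10.000 1.00 20.00 C
ATOM 1 N THR A 4 30.127 13.123 1.297 1.00 39.96 N
ATOM 2 CA THR B 4 31.000 13.500 1.800 1.00 40.10 C
HETATM 101 O HOH A 601 11.000 11.000 11.000 1.00 20.00 O
EOF

# Residue condition, or 0 (never true) when hresname is empty.
rmatch=""
for r in "${hresname[@]}"; do
  rmatch+="${rmatch:+ || }\$4==\"$r\""
done
rmatch=${rmatch:-0}

for c in "${chain[@]}"; do
  out="$outdir/${pdbid}_${c}.pdb"
  # Pass 1: all ATOM lines of chain $c.
  awk -v c="$c" '$1 == "ATOM" && $5 == c' "${pdbid}.pdb" > "$out"
  # Pass 2: append the HETATM lines of chain $c with a matching residue name.
  awk -v c="$c" '$1 == "HETATM" && $5 == c && ('"$rmatch"')' "${pdbid}.pdb" >> "$out"
done
```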
if you want them intermixed in their original file order:
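For the intermixed case, a sketch that uses a single awk pass per chain, so ATOM and matching HETATM lines keep the order in which they appear in the input. As in the previous sketch, the sample file, the temporary output directory and whitespace-separated fields are assumptions, and rmatch falls back to `0` when `hresname` is empty.

```shell
#!/usr/bin/env bash
pdbid=2GYS
chain=(A B)
hresname=(BMA FUC NAG NDG)
workdir=$(mktemp -d)
outdir="$workdir/properpdb"   # stands in for ../../properpdb
mkdir -p "$outdir"
cd "$workdir" || exit 1

cat > "${pdbid}.pdb" <<'EOF'
HETATM 100 C1 NAG A 501 10.000 10.000 10.000 1.00 20.00 C
ATOM 1 N THR A 4 30.127 13.123 1.297 1.00 39.96 N
ATOM 2 CA THR B 4 31.000 13.500 1.800 1.00 40.10 C
HETATM 101 O HOH A 601 11.000 11.000 11.000 1.00 20.00 O
EOF

# Residue condition, or 0 (never true) when hresname is empty.
rmatch=""
for r in "${hresname[@]}"; do
  rmatch+="${rmatch:+ || }\$4==\"$r\""
done
rmatch=${rmatch:-0}

for c in "${chain[@]}"; do
  # One pass: a line is kept if it belongs to chain $c and is either
  # an ATOM line or a HETATM line with a matching residue name.
  awk -v c="$c" '$5 == c && ($1 == "ATOM" || ($1 == "HETATM" && ('"$rmatch"')))' \
    "${pdbid}.pdb" > "$outdir/${pdbid}_${c}.pdb"
done
```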