How to read a file and create a record for each line
I'm looking for help writing a Perl program that takes an input file and performs manipulations based on follow-up commands. I'm a beginning Perl student, so please don't get too advanced in your suggestions. The structure I have so far is a main program and 4 subs.
I'm having trouble with two parts:
Writing the portion of the main segment that creates a unique record for each line of the input file (which is in fixed-width format). I think this should be done with substr, but I don't know much more about how it should be structured. unpack is beyond the scope of my learning so far.
One of the functions called in the main program is a "distance" sub, which will calculate the distance between atoms. I'm thinking this should be a for loop inside a for loop. Any thoughts on what approach I should take?
The program should store an array of atom records (one record/atom per line), each holding:
• The atom's serial number, 5 digits (cols 7-11)
• The three-letter name of the amino acid to which it belongs (cols 18-20)
• The atom's three orthogonal coordinates (x, y, z), real numbers in decimal notation (cols 31-54):
X in Angstroms, cols 31-38
Y in Angstroms, cols 39-46
Z in Angstroms, cols 47-54
• The atom's one- or two-letter element name (e.g. C, O, N, Na) (cols 77-78)
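The column ranges above map onto substr calls fairly directly, because substr takes a 0-based offset and a length: a field in columns M-N starts at offset M - 1 and has length N - M + 1. The following is only a minimal sketch under some assumptions: make_record is a hypothetical name, the sample lines have had their spacing reconstructed from the stated column positions, and every atom line is assumed to be at least 78 characters long (substr warns on shorter lines).

```perl
use strict;
use warnings;

# Hypothetical helper: cut one fixed-width line into a hash reference.
# substr uses 0-based offsets, so column N starts at offset N - 1.
sub make_record {
    my ($line) = @_;

    # Only lines that start with ATOM or HETATM describe atoms.
    return unless $line =~ /^(?:ATOM|HETATM)/;

    my $element = substr($line, 76, 2);          # cols 77-78
    $element =~ s/^\s+|\s+$//g;                  # drop the space padding

    my %record = (
        serialnumber => substr($line, 6, 5) + 0, # cols 7-11, as a number
        aminoacid    => substr($line, 17, 3),    # cols 18-20
        coordinates  => [
            substr($line, 30, 8) + 0,            # x, cols 31-38
            substr($line, 38, 8) + 0,            # y, cols 39-46
            substr($line, 46, 8) + 0,            # z, cols 47-54
        ],
        element      => $element,
    );
    return \%record;
}

# Sample lines, spacing reconstructed from the column positions (an assumption).
my @lines = (
    'ATOM   4743  CG  GLN A 704      19.896  32.017  54.717  1.00 66.44           C',
    'ATOM   4744  CD  GLN A 704      19.589  30.757  55.525  1.00 73.28           C',
    'ATOM   4745  OE1 GLN A 704      18.801  29.892  55.098  1.00 75.91           O',
    'REMARK lines that are not atoms are skipped',
);

# make_record returns an empty list for non-atom lines, so map drops them.
my @recs = map { make_record($_) } @lines;

for my $rec (@recs) {
    printf "%d %s %s (%s)\n",
        $rec->{serialnumber}, $rec->{aminoacid}, $rec->{element},
        join(', ', @{ $rec->{coordinates} });
}
```

Adding 0 to a space-padded field is just a compact way of turning text such as "  19.896" into a number; splitting on whitespace is avoided because fixed-width fields may be empty or run together.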
sub Distance
# take an array of atom records and return the max distance
# between all pairs of atoms in that array. (cols 31-54)
Here is sample text from an input file.
# truncating for testing purposes. Actual data is approx. 100 columns
# and starts with ATOM or HETATM
__DATA__
ATOM 4743 CG GLN A 704 19.896 32.017 54.717 1.00 66.44 C
ATOM 4744 CD GLN A 704 19.589 30.757 55.525 1.00 73.28 C
ATOM 4745 OE1 GLN A 704 18.801 29.892 55.098 1.00 75.91 O
Here is what I have so far for the main program and the sub that makes records. I hate to be lame, but I don't have anything to show for the Distance sub yet, so don't worry about giving code; any suggestions on how to approach it would be much appreciated.
use warnings;
use strict;

my @fields;
my @recs;

while ( <DATA> ) {
    chomp;
    @fields = split(/\s+/);
    push @recs, makeRecord(@fields);
}

for (my $i = 0; $i < @recs; $i++) {
    printRec( $recs[$i] );
}

my %command_table = (
    freq    => \&freq,
    length  => \&length,
    density => \&density,
    help    => \&help,
    quit    => \&quit
);

print "Enter a command: ";
while ( <STDIN> ) {
    chomp;
    my @line = split( /\s+/);
    my $command = shift @line;
    if ($command !~ /^freq$|^density$|length|^help$|^quit$/ ) {
        print "Command must be: freq, length, density or quit\n";
    }
    else {
        $command_table{$command}->();
    }
    print "Enter a command: ";
}

sub makeRecord
# Read the entire line and make records from the lines that contain the
# word ATOM or HETATM in the first column. Not sure how to do this:
{
    my %record =
    (
        serialnumber => shift,
        aminoacid    => shift,
        coordinates  => shift,
        element      => [ @_ ]
    );
    return \%record;
}
Comments (3)
There's Perl code available online for working with PDB files (which is obviously what you are doing). I'm not suggesting that you just use a module you downloaded and be done with it, as surely your instructor wouldn't approve, and you wouldn't learn that much ;) But you could take a look at some of the code on offer and see whether some bits of it address your problem.
I did a quick bit of googling and saw that there's ParsePDB.pm (for example). You can find the web page here. I didn't have a look at the code or the functionality, though; I'm just hoping there will be something in there that you may find helpful.
EDIT 1
Okay, it's 14 hours later now, and I felt like doing some coding, so as you have not yet accepted an answer I thought I could just ignore my own advice and draw up something (as you will notice I have copied Zaid's data structure)...
EDIT 2
Concerning the distance subroutine: the for loop inside the for loop should do the job, but this is the brute-force way, which might take quite a while depending on the size of your input molecule (the number of distance calculations grows roughly as (number_of_atoms)^2). For the purpose of your assignment the brute-force approach is probably acceptable; in other cases you'd have to decide whether to favour ease of coding or computational speed. If your instructor also wants you to keep the latter in mind, you could take a look at this page (I know you actually want the largest distance, and you're in 3D, not 2D...)
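For what it's worth, the brute-force nested loop could look roughly like the sketch below. It rests on assumptions: max_distance is a hypothetical name, and each atom record is assumed to be a hash reference with a coordinates key holding [x, y, z] (that layout is an assumption, not something the question fixes). The inner loop starts one position after the outer loop, so each pair is measured once.

```perl
use strict;
use warnings;

# Hypothetical sub: return the largest distance between any two atoms.
# Assumes each record looks like { coordinates => [ x, y, z ] }.
sub max_distance {
    my @atoms = @_;
    my $max   = 0;

    # Outer loop picks the first atom of a pair, inner loop the second.
    # Starting at $i + 1 avoids comparing an atom with itself and
    # avoids measuring the same pair twice.
    for my $i (0 .. $#atoms - 1) {
        for my $j ($i + 1 .. $#atoms) {
            my ($x1, $y1, $z1) = @{ $atoms[$i]{coordinates} };
            my ($x2, $y2, $z2) = @{ $atoms[$j]{coordinates} };

            # Euclidean distance in three dimensions.
            my $d = sqrt( ($x1 - $x2)**2 + ($y1 - $y2)**2 + ($z1 - $z2)**2 );
            $max = $d if $d > $max;
        }
    }
    return $max;
}

# Coordinates taken from the three sample lines in the question.
my @atoms = (
    { coordinates => [ 19.896, 32.017, 54.717 ] },
    { coordinates => [ 19.589, 30.757, 55.525 ] },
    { coordinates => [ 18.801, 29.892, 55.098 ] },
);

printf "Max distance: %.3f Angstroms\n", max_distance(@atoms);
```

With fewer than two atoms both loops run zero times and the sub returns 0, which may or may not be the behaviour the assignment wants.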
Ok, now I just hope that you managed to find some helpful bits and pieces in here :)
It is strange that unpack is out of scope when I can see use of a dispatch table. It would be silly to overlook using unpack if fixed-format files are being processed. There is nothing 'advanced' going on in the code below:
Your records have a fixed-width format, so use unpack to break each record into the fields of interest. Use the stated column positions of each field to construct a template for use with unpack.
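As an illustration of building a template from the stated column positions, here is a minimal sketch (not the code originally posted with these answers). In an unpack template, @N jumps to the 0-based offset N and An reads n characters, dropping trailing spaces. The name parse_atom is hypothetical, the sample lines have had their spacing reconstructed from the column positions, and lines are assumed to be at least 77 characters long, since unpack dies if the template points past the end of the string.

```perl
use strict;
use warnings;

# Hypothetical helper: split one fixed-width line with unpack.
sub parse_atom {
    my ($line) = @_;

    # Only lines that start with ATOM or HETATM describe atoms.
    return unless $line =~ /^(?:ATOM|HETATM)/;

    # Template built from the column positions (column N is offset N - 1):
    #   @6  A5        serial number    cols 7-11
    #   @17 A3        amino acid name  cols 18-20
    #   @30 A8 A8 A8  x, y, z          cols 31-54
    #   @76 A2        element name     cols 77-78
    my $template = '@6 A5 @17 A3 @30 A8 A8 A8 @76 A2';

    my ($serial, $residue, $x, $y, $z, $element) = unpack $template, $line;

    # 'A' only strips trailing spaces; the element is right-justified.
    $element =~ s/^\s+//;

    return {
        serialnumber => $serial + 0,
        aminoacid    => $residue,
        coordinates  => [ $x + 0, $y + 0, $z + 0 ],
        element      => $element,
    };
}

# Sample lines, spacing reconstructed from the column positions (an assumption).
my @lines = (
    'ATOM   4743  CG  GLN A 704      19.896  32.017  54.717  1.00 66.44           C',
    'ATOM   4744  CD  GLN A 704      19.589  30.757  55.525  1.00 73.28           C',
    'ATOM   4745  OE1 GLN A 704      18.801  29.892  55.098  1.00 75.91           O',
);

my @atoms = map { parse_atom($_) } @lines;

printf "%d %s %s\n", $_->{serialnumber}, $_->{aminoacid}, $_->{element}
    for @atoms;
```

The template is the only part that encodes the file layout, so if a field moves, only that one string has to change.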