How do I filter specific lines out of file output bound to a scalar variable?
I am trying to use a regex to filter out everything but one very specific line of text from a VMware VMX file. I run it in a foreach loop because there is one such file for each VM. Each time the loop runs, it binds the output of Net::OpenSSH, which runs cat against the file sitting on the VM server, to a scalar variable.
I am not sure if that actually made any sense.
Anyhow, the problem I am running into is that when the script runs, nothing matches my regex; it just displays all of the cat'ed VMX files one after another. I can't figure out what I am missing.
Here is the sample of code I am working on.
sub get_virtual_machines {
    my $esx_host = config_file()->{ESX}{host};
    my $ssh_port = config_file()->{ESX}{port};
    my $esx_user = config_file()->{ESX}{user};
    my $esx_password = config_file()->{ESX}{password};
    my %options = (
        port => $ssh_port,
        user => $esx_user,
        password => $esx_password
    );
    my $ssh1 = Net::OpenSSH->new($esx_host, %options);
    print color 'blue';
    print "Collecting virtual machine data for $esx_host\n";
    my @virtual_machines = $ssh1->capture('vim-cmd vmsvc/getallvms');
    shift @virtual_machines;
    print color 'reset';
    # Filter data from ESX\ESXi output
    my %virtual_machines = ();
    foreach my $vm (@virtual_machines) {
        # Replace "[" with "/"
        $vm =~ s/\[/\//;
        # Replace "]" with "/"
        $vm =~ s/\]/\//;
        # Match ID, NAME and VMX location
        $vm =~ m/^(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\D+)(\D)(\d)(\d)/x;
        # Build hash table of discovered virtual machines
        $virtual_machines{"$2"}{"ID"} = "$1";
        $virtual_machines{"$2"}{"VMX"} = "/vmfs/volumes$3$4";
        $virtual_machines{"$2"}{"Version"} = "$9";
    }
    undef @virtual_machines;
    foreach my $vm (keys %virtual_machines) {
        $vm = $ssh1->capture("cat $virtual_machines{$vm}{VMX}");
        $vm =~ m/^(\bguestOSAltName\b)/x;
        print "$1\n";
    }
    #print Dumper (\%virtual_machines);
}
The part in question is after the "undef @virtual_machines" line (line 38 in the sample).
My first goal is to match the line containing the word "guestOSAltName". I think once I get that part done I will be on my way again; I have just hit a roadblock.
Here is a sample VMX file to look at too.
.encoding = "UTF-8"
config.version = "8"
virtualHW.version = "7"
pciBridge0.present = "TRUE"
pciBridge4.present = "TRUE"
pciBridge4.virtualDev = "pcieRootPort"
pciBridge4.functions = "8"
pciBridge5.present = "TRUE"
pciBridge5.virtualDev = "pcieRootPort"
pciBridge5.functions = "8"
pciBridge6.present = "TRUE"
pciBridge6.virtualDev = "pcieRootPort"
pciBridge6.functions = "8"
pciBridge7.present = "TRUE"
pciBridge7.virtualDev = "pcieRootPort"
pciBridge7.functions = "8"
vmci0.present = "TRUE"
nvram = "NS02.nvram"
deploymentPlatform = "windows"
virtualHW.productCompatibility = "hosted"
unity.customColor = "|23C0C0C0"
tools.upgrade.policy = "useGlobal"
powerType.powerOff = "default"
powerType.powerOn = "default"
powerType.suspend = "default"
powerType.reset = "default"
displayName = "NS02"
extendedConfigFile = "NS02.vmxf"
scsi0.present = "TRUE"
scsi0.sharedBus = "none"
scsi0.virtualDev = "lsilogic"
memsize = "512"
scsi0:0.present = "TRUE"
scsi0:0.fileName = "NS02.vmdk"
scsi0:0.deviceType = "scsi-hardDisk"
ide1:0.present = "TRUE"
ide1:0.clientDevice = "FALSE"
ide1:0.deviceType = "cdrom-image"
ide1:0.startConnected = "FALSE"
ethernet0.present = "TRUE"
ethernet0.virtualDev = "e1000"
ethernet0.networkName = "solignis.local"
ethernet0.addressType = "generated"
chipset.onlineStandby = "FALSE"
guestOSAltName = "Ubuntu Linux (64-bit)"
guestOS = "ubuntu-64"
uuid.location = "56 4d ab a6 1e 7b c5 43-02 45 7c 24 1f fc 28 d9"
uuid.bios = "56 4d ab a6 1e 7b c5 43-02 45 7c 24 1f fc 28 d9"
vc.uuid = "52 50 c1 4b be 91 07 d5-22 0e 86 ee db 88 6d 8a"
snapshot.action = "keep"
sched.cpu.min = "0"
sched.cpu.units = "mhz"
sched.cpu.shares = "normal"
sched.mem.minsize = "0"
sched.mem.shares = "normal"
sched.scsi0:0.shares = "normal"
bios.forceSetupOnce = "FALSE"
floppy0.present = "FALSE"
ethernet0.generatedAddress = "00:0c:29:fc:28:d9"
tools.syncTime = "FALSE"
cleanShutdown = "FALSE"
replay.supported = "FALSE"
sched.swap.derivedName = "/vmfs/volumes/4cbcad5b-b51efa39-c3d8-001517585013/NS02/NS02-510988a0.vswp"
scsi0:0.redo = ""
vmotion.checkpointFBSize = "4194304"
pciBridge0.pciSlotNumber = "17"
pciBridge4.pciSlotNumber = "21"
pciBridge5.pciSlotNumber = "22"
pciBridge6.pciSlotNumber = "23"
pciBridge7.pciSlotNumber = "24"
scsi0.pciSlotNumber = "16"
ethernet0.pciSlotNumber = "32"
vmci0.pciSlotNumber = "33"
ethernet0.generatedAddressOffset = "0"
vmci0.id = "536619225"
hostCPUID.0 = "0000000a756e65476c65746e49656e69"
hostCPUID.1 = "000006fb000408000000e3bdbfebfbff"
hostCPUID.80000001 = "00000000000000000000000120100800"
guestCPUID.0 = "0000000a756e65476c65746e49656e69"
guestCPUID.1 = "000006fb00010800800022010febfbff"
guestCPUID.80000001 = "00000000000000000000000120100800"
userCPUID.0 = "0000000a756e65476c65746e49656e69"
userCPUID.1 = "000006fb000408000000e3bdbfebfbff"
userCPUID.80000001 = "00000000000000000000000120100800"
evcCompatibilityMode = "FALSE"
ide1:0.fileName = "/usr/lib/vmware/isoimages/linux.iso"
Comments (3)
It's hard to say with the information you've given, but I think the problem is that the regex doesn't match the file you've given, because the ^ assertion matches start-of-string, not start-of-line. Since the regex doesn't match, $1 keeps its old value from earlier in the program, which gets printed out. For safety, you should check that the regex actually matched before using captures, or grab the captures by putting the match in list context.
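A minimal sketch of both safety measures, not the answerer's own code. The `$vmx` text and the `$listing` string are hypothetical stand-ins for the captured `cat` output and an earlier `getallvms` line. The pattern deliberately has no /m, so it fails the same way the original does.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the text captured from `cat file.vmx`.
my $vmx = join "\n",
    'config.version = "8"',
    'guestOSAltName = "Ubuntu Linux (64-bit)"',
    'guestOS = "ubuntu-64"',
    '';

# An earlier successful match (hypothetical input) leaves "42" in $1.
my $listing = '42 NS02';
$listing =~ /^(\d+)/ or die "unexpected listing format\n";

# 1. Guard the capture. Without /m, ^ only matches at the very start of
#    the string, so this match fails and $1 still holds the old value.
my $report;
if ( $vmx =~ /^(guestOSAltName)\b/ ) {
    $report = "matched: $1";
}
else {
    $report = "no match, and \$1 is still '$1'";
}
print "$report\n";

# 2. List context. A failed match returns an empty list, so $name is
#    undef instead of silently carrying a stale capture.
my ($name) = $vmx =~ /^(guestOSAltName)\b/;
print 'list context: ', ( defined $name ? $name : 'undef' ), "\n";
```

Either form makes the failure visible instead of printing whatever an earlier match left behind.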
To make ^ match start-of-line, you need the /m modifier, which changes ^ and $ to match linewise instead of stringwise. This is why Damian Conway in Perl Best Practices recommends that you always use /m, because then ^ and $ always do what you intuitively think they should do. [He in fact recommends always using /xms. You're one-third of the way there :) ]
PS: Everything from this point on is general code-review criticism, not directly related to the question. I hope it's useful, but feel free to ignore it.
I find that overuse of escaped chars in regexes and other double-quotish contexts is often better rewritten in a single-quotish context.
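One way to cut down on the backslashes is sketched below. It uses alternate delimiters, which may not be the exact rewrite the answer had in mind. The `$vm` line is a hypothetical example in the style of `vim-cmd vmsvc/getallvms` output.

```perl
use strict;
use warnings;

# Hypothetical line in the style of `vim-cmd vmsvc/getallvms` output.
my $vm = '16 NS02 [datastore1] NS02/NS02.vmx ubuntu64Guest vmx-07';

# Original style: with "/" as the delimiter, the replacement slash has
# to be escaped as well as the bracket.
( my $escaped = $vm ) =~ s/\[/\//;
$escaped =~ s/\]/\//;

# With brace delimiters the slash needs no escaping; only the bracket,
# which is a regex metacharacter, keeps its backslash.
( my $plain = $vm ) =~ s{\[}{/};
$plain =~ s{\]}{/};

print "$plain\n";
```

Both versions produce the same string; the second is simply easier to scan.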
Furthermore, the regex that matches the ID, name, and VMX location is pretty hard to read. You're using the /x flag, so why not take advantage of it? I'd also consider using named captures.
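A sketch of what that could look like, assuming Perl 5.10 or later for named captures. The input line and the capture names `NAME`, `STORE`, and `PATH` are illustrative choices, not taken from the original answer; only the groups needed for the hash are captured.

```perl
use strict;
use warnings;

# Hypothetical getallvms line after the brackets were replaced with "/".
my $vm = '16 NS02 /datastore1/ NS02/NS02.vmx ubuntu64Guest vmx-07';

my %virtual_machines;
if ( $vm =~ m{
        ^ (?<ID>    \d+ ) \s+    # numeric VM id
          (?<NAME>  \S+ ) \s+    # display name
          (?<STORE> \S+ ) \s+    # datastore, already rewritten with slashes
          (?<PATH>  \S+ ) \s+    # path of the .vmx file inside the datastore
          (?: \S+ )       \s+    # guest OS type, not needed here
          (?: \D+ \d )           # version prefix up to its first digit
          (?<VERSION> \d )       # last digit of the hardware version
    }xms )
{
    # Only touch the captures after a successful match.
    $virtual_machines{ $+{NAME} } = {
        ID      => $+{ID},
        VMX     => '/vmfs/volumes' . $+{STORE} . $+{PATH},
        Version => $+{VERSION},
    };
}

print "$virtual_machines{NS02}{VMX}\n";
```

Spreading the pattern over several lines is what makes /x pay off: each field gets its own line and comment.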
Now instead of cryptic references to $2 and $9 afterward, you have clear, obvious, self-documenting references to $+{ID} and $+{VERSION}. I've made the rest of the groups into non-capturing groups (?:regex), but if I want to capture one at a later date I can make it into a named capture without changing the indices of all the other captures, unlike with positional capturing.
Named captures are also less likely to suffer from the old value problem mentioned above, where a failed capture leaves all the $1 variables in their old state.
If I guess right at what you want, it's probably something like
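A hedged guess at the kind of snippet meant here, assuming the goal is to pull the guestOSAltName value out of the captured text. `$vmx` is a hypothetical stand-in for one `cat` result.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the text captured from `cat file.vmx`.
my $vmx = join "\n",
    'chipset.onlineStandby = "FALSE"',
    'guestOSAltName = "Ubuntu Linux (64-bit)"',
    'guestOS = "ubuntu-64"',
    '';

# /m lets ^ match at the start of any line, not just the whole string.
# List context gives undef rather than a stale $1 when nothing matches.
my ($os_name) = $vmx =~ /^guestOSAltName\s*=\s*"([^"]*)"/m;

print defined $os_name ? "$os_name\n" : "guestOSAltName not found\n";
```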
As @canavanin has said, the problem is that you have a multiline text, so you need to use m//m in order to have ^ and $ mean start and end of line (instead of start and end of string). It is also better (safer) to capture the match to a variable (in Perl 5.10 and later you also have named captures, as @Potter pointed out). Finally, m//x is very useful, but only if you write your regex over several lines, so that you can add comments and stop worrying about spacing. On a single line it is useless and error prone, because people forget to write whitespace explicitly with \s or \s+ and instead put in literal whitespace, which /x ignores.
Also, as you said you wanted to print the line, not only 'guestOSAltName', you need to capture until the end of the line: m/(^guestOSAltName .+$)/m. (If you add single-line mode to multi-line mode, //ms, then you need to make the .+ non-greedy, .+?, to allow $ to match the end of the line before it is consumed by the greedy .+ in single-line mode.)
If you have more than one such line, you will want the multiple-match modifier /g and capture into an array.
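A small sketch of the last two points, using a hypothetical four-line excerpt in place of a real VMX file. The `ethernet0.` prefix is only an illustration of a key that occurs on several lines.

```perl
use strict;
use warnings;

# Hypothetical excerpt standing in for the captured VMX text.
my $vmx = <<'VMX';
ethernet0.present = "TRUE"
memsize = "512"
ethernet0.virtualDev = "e1000"
guestOSAltName = "Ubuntu Linux (64-bit)"
VMX

# /m anchors ^ and $ per line; /g in list context returns every capture,
# so each matching line becomes one array element.
my @ethernet = $vmx =~ /(^ethernet0\..+$)/mg;
print "$_\n" for @ethernet;

# With /s added, "." also matches newlines, so .+ must be non-greedy
# to stop at the first end of line instead of running to the last one.
my ($alt_line) = $vmx =~ /(^guestOSAltName .+?$)/ms;
print "$alt_line\n";
```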