How do I filter a specific line out of file output bound to a scalar variable?

Posted 2024-10-10 12:56:18

I am trying to use a regex to filter out everything but one very specific line of text from a VMware VMX file. I run this inside a foreach loop because there is one such file for each VM. On each iteration, the loop binds the output of Net::OpenSSH, which runs cat against the file on the VM server, to a scalar variable.

I am not sure if that actually made any sense.

Anyhow, the problem I am running into is that when the script runs, nothing matches my regex; it just displays all of the cat-ed VMX files one after another. I can't figure out what I am missing.

Here is the sample of code I am working on.

sub get_virtual_machines {
my $esx_host = config_file()->{ESX}{host};
my $ssh_port = config_file()->{ESX}{port};
my $esx_user = config_file()->{ESX}{user};
my $esx_password = config_file()->{ESX}{password};
my %options = (
    port => $ssh_port,
    user => $esx_user, 
    password => $esx_password
);
my $ssh1 = Net::OpenSSH->new($esx_host, %options);
print color 'blue';
print "Collecting virtual machine data for $esx_host\n";
my @virtual_machines = $ssh1->capture('vim-cmd vmsvc/getallvms');
shift @virtual_machines;
print color 'reset';
# Filter data from ESX\ESXi output
my %virtual_machines = ();

foreach my $vm (@virtual_machines) {

    # Replace "[" with "/"

    $vm =~ s/\[/\//;

    # Replace "]" with "/"

    $vm =~ s/\]/\//;

    # Match ID, NAME and VMX location
    $vm =~  m/^(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\D+)(\D)(\d)(\d)/x;
    # Build hash table of discovered virtual machines
    $virtual_machines{"$2"}{"ID"} = "$1";
    $virtual_machines{"$2"}{"VMX"} = "/vmfs/volumes$3$4";
    $virtual_machines{"$2"}{"Version"} = "$9";
}
undef @virtual_machines;
foreach my $vm (keys %virtual_machines) {
$vm = $ssh1->capture("cat $virtual_machines{$vm}{VMX}");
$vm =~ m/^(\bguestOSAltName\b)/x;
print "$1\n";
}
#print Dumper (\%virtual_machines);

}

The part in question comes after the "undef @virtual_machines" line (line 38 in the sample). My first goal is to match the line containing the word "guestOSAltName". I think once I get that part done I will be on my way again; I have just hit a roadblock.

Here is a sample VMX file to look at too.

.encoding = "UTF-8"
config.version = "8"
virtualHW.version = "7"
pciBridge0.present = "TRUE"
pciBridge4.present = "TRUE"
pciBridge4.virtualDev = "pcieRootPort"
pciBridge4.functions = "8"
pciBridge5.present = "TRUE"
pciBridge5.virtualDev = "pcieRootPort"
pciBridge5.functions = "8"
pciBridge6.present = "TRUE"
pciBridge6.virtualDev = "pcieRootPort"
pciBridge6.functions = "8"
pciBridge7.present = "TRUE"
pciBridge7.virtualDev = "pcieRootPort"
pciBridge7.functions = "8"
vmci0.present = "TRUE"
nvram = "NS02.nvram"
deploymentPlatform = "windows"
virtualHW.productCompatibility = "hosted"
unity.customColor = "|23C0C0C0"
tools.upgrade.policy = "useGlobal"
powerType.powerOff = "default"
powerType.powerOn = "default"
powerType.suspend = "default"
powerType.reset = "default"

displayName = "NS02"
extendedConfigFile = "NS02.vmxf"

scsi0.present = "TRUE"
scsi0.sharedBus = "none"
scsi0.virtualDev = "lsilogic"
memsize = "512"
scsi0:0.present = "TRUE"
scsi0:0.fileName = "NS02.vmdk"
scsi0:0.deviceType = "scsi-hardDisk"
ide1:0.present = "TRUE"
ide1:0.clientDevice = "FALSE"
ide1:0.deviceType = "cdrom-image"
ide1:0.startConnected = "FALSE"
ethernet0.present = "TRUE"
ethernet0.virtualDev = "e1000"
ethernet0.networkName = "solignis.local"
ethernet0.addressType = "generated"
chipset.onlineStandby = "FALSE"
guestOSAltName = "Ubuntu Linux (64-bit)"
guestOS = "ubuntu-64"
uuid.location = "56 4d ab a6 1e 7b c5 43-02 45 7c 24 1f fc 28 d9"
uuid.bios = "56 4d ab a6 1e 7b c5 43-02 45 7c 24 1f fc 28 d9"
vc.uuid = "52 50 c1 4b be 91 07 d5-22 0e 86 ee db 88 6d 8a"
snapshot.action = "keep"
sched.cpu.min = "0"
sched.cpu.units = "mhz"
sched.cpu.shares = "normal"
sched.mem.minsize = "0"
sched.mem.shares = "normal"

sched.scsi0:0.shares = "normal"
bios.forceSetupOnce = "FALSE"
floppy0.present = "FALSE"

ethernet0.generatedAddress = "00:0c:29:fc:28:d9"
tools.syncTime = "FALSE"
cleanShutdown = "FALSE"
replay.supported = "FALSE"
sched.swap.derivedName = "/vmfs/volumes/4cbcad5b-b51efa39-c3d8-001517585013/NS02/NS02-510988a0.vswp"
scsi0:0.redo = ""
vmotion.checkpointFBSize = "4194304"
pciBridge0.pciSlotNumber = "17"
pciBridge4.pciSlotNumber = "21"
pciBridge5.pciSlotNumber = "22"
pciBridge6.pciSlotNumber = "23"
pciBridge7.pciSlotNumber = "24"
scsi0.pciSlotNumber = "16"
ethernet0.pciSlotNumber = "32"
vmci0.pciSlotNumber = "33"
ethernet0.generatedAddressOffset = "0"
vmci0.id = "536619225"
hostCPUID.0 = "0000000a756e65476c65746e49656e69"
hostCPUID.1 = "000006fb000408000000e3bdbfebfbff"
hostCPUID.80000001 = "00000000000000000000000120100800"
guestCPUID.0 = "0000000a756e65476c65746e49656e69"
guestCPUID.1 = "000006fb00010800800022010febfbff"
guestCPUID.80000001 = "00000000000000000000000120100800"
userCPUID.0 = "0000000a756e65476c65746e49656e69"
userCPUID.1 = "000006fb000408000000e3bdbfebfbff"
userCPUID.80000001 = "00000000000000000000000120100800"
evcCompatibilityMode = "FALSE"
ide1:0.fileName = "/usr/lib/vmware/isoimages/linux.iso"

守护在此方 2024-10-17 12:56:18

It's hard to say with the information you've given, but I think the problem is that the regex

$vm =~ m/^(\bguestOSAltName\b)/x;

doesn't match the file you've given, because the ^ assertion matches start-of-string, not start-of-line. Since the regex doesn't match, $1 keeps its old value from earlier in the program, which gets printed out. For safety, you should check the regex actually matched before using captures:

# carp is exported by the core Carp module, so the script needs "use Carp;"
if ($vm =~ m/^(\bguestOSAltName\b)/x) {
    print "$1\n";
}
else {
    carp "Couldn't find guestOSAltName!";
}

Or grab the captures by putting the match in list context:

# $result gets $1 if the match succeeds, undef if it fails.
my ($result) = $vm =~ m/^(\bguestOSAltName\b)/x;

To make ^ match start-of-line, you need the /m modifier, which changes ^ and $ to match linewise instead of stringwise:

if ($vm =~ m/^(\bguestOSAltName\b)/xm) { ... }

This is why Damian Conway in Perl Best Practices recommends that you always use /m: because then ^ and $ always do what you intuitively think they should do. [He in fact recommends always using /xms. You're one-third of the way there :) ]
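
As a quick check of the difference, here is a minimal, self-contained sketch. The three-line `$vmx` string is a made-up stand-in for the captured `cat` output, not data taken from the asker's server.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the text returned by $ssh1->capture("cat ...").
my $vmx = join "\n",
    'config.version = "8"',
    'guestOSAltName = "Ubuntu Linux (64-bit)"',
    'guestOS = "ubuntu-64"',
    '';

# Without /m, ^ only matches at the very start of the string.
my $without_m = $vmx =~ /^guestOSAltName\b/ ? 1 : 0;

# With /m, ^ also matches right after every newline.
my $with_m = $vmx =~ /^guestOSAltName\b/m ? 1 : 0;

print "without /m: $without_m\n";   # prints "without /m: 0"
print "with /m:    $with_m\n";      # prints "with /m:    1"
```

The only change between the two matches is the /m flag, which is what the original one-liner was missing.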

PS: Everything from this point on is general code-review criticism, not directly related to the question. I hope it's useful, but feel free to ignore it.

I find that overuse of escaped chars in regexes and other double-quotish contexts

$vm =~ s/\[/\//;

is often better rewritten with different delimiters, so the slash no longer needs escaping (the [ is still a regex metacharacter, so it stays escaped):

$vm =~ s{\[}{/};

Furthermore, this regex is pretty hard to read:

$vm =~  m/^(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\D+)(\D)(\d)(\d)/x;

You're using the /x flag, so why not take advantage of it?

$vm =~  m/^(\d+) \s+ # $1: number of some sort
           (\S+) \s+ # $2: identifier we're interested in
           (\S+) \s+ # $3: VMX filename part a
           (\S+) \s+ # $4: VMX filename part b
           (\S+) \s+ # $5: another identifier
           (\D+)(\D) # $6, $7: at least two nondigits
           (\d)   # $8: digit
           (\d)   # $9: version digit
           /x;

I'd also consider using named captures:

$vm =~  m/^(?:      \d+) \s+ # number of some sort
           (?<ID>   \S+) \s+ # $+{ID}: identifier we're interested in
           (?<VMXa> \S+) \s+ # $+{VMXa}: VMX filename part a
           (?<VMXb> \S+) \s+ # $+{VMXb}: VMX filename part b
           (?:      \S+) \s+
           (?:\D+)(?:\D)     # at least two nondigits
           (?:\d)            # one digit
           (?<VERSION> \d)   # $+{VERSION}: version digit
           /x;

Now instead of cryptic references to $2 and $9 afterward, you have clear, obvious, self-documenting references to $+{ID} and $+{VERSION}. I've made the rest of the groups into non-capturing groups (?:regex), but if I want to capture one at a later date I can make it into a named capture without changing the indices of all the other captures, unlike with positional capturing.

Named captures are also less likely to suffer from the old value problem mentioned above, where a failed capture leaves all the $1 variables in their old state.
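
The stale-value problem is easy to reproduce in isolation. The sketch below uses two invented strings (`$first` and `$second` are illustrative names, not part of the original script) to show that a failed match leaves `$1` untouched, while a list-context assignment yields undef instead.

```perl
use strict;
use warnings;

my $first  = 'ID 42';
my $second = 'no digits here';

$first  =~ /(\d+)/;   # succeeds, $1 is now "42"
$second =~ /(\d+)/;   # fails, $1 is left untouched
my $stale = $1;       # still "42", left over from the earlier match

# In list context a failed match returns an empty list, so $fresh is undef.
my ($fresh) = $second =~ /(\d+)/;

print "stale: $stale\n";                                    # prints "stale: 42"
print "fresh: ", defined $fresh ? $fresh : 'undef', "\n";   # prints "fresh: undef"
```

This is the same effect that made the original loop print leftover text: the guestOSAltName match failed, and `$1` still held a value from the earlier getallvms match.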

只等公子 2024-10-17 12:56:18

If I guess right at what you want, it's probably something like

if( $vm =~ /^guestOSAltName = (.+)\n/m )   # /m so that ^ matches at the start of any line
{
  print "$1\n";
}
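
If more than one setting is needed, the same idea extends to reading every `key = "value"` pair in one pass. This is only a sketch under the assumption that each line has exactly that shape, as in the sample file; `$vmx` is a made-up excerpt and `%setting` is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical excerpt standing in for the captured VMX text.
my $vmx = <<'END_VMX';
displayName = "NS02"
guestOSAltName = "Ubuntu Linux (64-bit)"
guestOS = "ubuntu-64"
scsi0:0.redo = ""
END_VMX

# With /g in list context the match returns every (key, value) pair, and the
# flat list assigns straight into a hash. /m lets ^ and $ anchor at each line.
my %setting = $vmx =~ /^(\S+)[ \t]*=[ \t]*"([^"]*)"[ \t]*$/mg;

print "$setting{guestOSAltName}\n";   # prints "Ubuntu Linux (64-bit)"
```

Unlike the capture in the answer above, the value here is stored without its surrounding double quotes.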

与往事干杯 2024-10-17 12:56:18

As @canavanin has said, the problem is that you have multi-line text, so you need m//m for ^ and $ to mean start and end of line (instead of start and end of string). It is also better (safer) to capture the match into a variable (from Perl 5.10 on you also have named captures, as @Potter pointed out). Finally, m//x is very useful, but only if you write your regex across several lines, so that you can add comments and stop worrying about spaces. On a single line it is useless and error-prone, because people forget to write spaces explicitly as \s or \s+ and type literal whitespace, which /x ignores.

Also, since you said you want to print the whole line, not only 'guestOSAltName', you need to capture up to the end of the line: m/(^guestOSAltName .+$)/m. (If you add single-line mode to multi-line mode, //ms, you need to make the .+ non-greedy, .+?, so that $ can match at the end of the line before the greedy .+ swallows the rest of the string.)

[not working code]
$vm =~ m/^(\bguestOSAltName\b)/x;
print "$1\n";

[working code]
# make list context with parentheses
# m{...} delimiters are used because a "/" inside a comment would end an m/.../ pattern early
(my $guest_os_alias_line) = $vm =~ m{^  # start of line (using /m)
                                   (   # start capturing
                                     guestOSAltName
                                     \b  # just in case guestOSAltName is a substring in an unwanted line
                                     .+  # everything else in the line (not matching \n because no /s)
                                   )  # end capturing
                                   $  # end of line (because /m)
                                 }xm; # multiline mode
print "$guest_os_alias_line\n";

If you have more than one such line, you will want the multiple-matching mode /g and an array to capture into:

(my @guest_os_alias_lines) = $vm =~ m{^  # start of line (using /m)
                                   (   # start capturing
                                     guestOSAltName
                                     \b  # just in case guestOSAltName is a substring in an unwanted line
                                     .+  # everything else in the line (not matching \n because no /s)
                                   )  # end capturing
                                   $  # end of line (because /m)
                                 }xmg; # multiline mode (m) and multi-matching (g)
print join("\n", @guest_os_alias_lines), "\n"; # join is needed: the captured lines do not include the trailing "\n"
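
The earlier remark about combining /s with /m can be checked with a small experiment. This is a sketch on a made-up two-line string, not on the asker's real file.

```perl
use strict;
use warnings;

# Hypothetical two-line stand-in for the VMX text.
my $vmx = qq{guestOSAltName = "Ubuntu Linux (64-bit)"\nguestOS = "ubuntu-64"\n};

# /s lets . match newlines, so a greedy .+ runs on to the end of the string.
my ($greedy) = $vmx =~ /^(guestOSAltName\b.+)$/ms;

# The non-greedy .+? stops at the first position where $ can match.
my ($lazy) = $vmx =~ /^(guestOSAltName\b.+?)$/ms;

print "greedy captured ", ($greedy =~ tr/\n//), " newline(s)\n";  # prints "greedy captured 2 newline(s)"
print "lazy: $lazy\n";  # prints: lazy: guestOSAltName = "Ubuntu Linux (64-bit)"
```

Leaving /s off, as in the working code above, sidesteps the issue, because . then never crosses a newline.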
