How to add missing entries (URLs containing page numbers) to an array (like seq in Linux)

Posted 2025-01-22 21:03:30


I have an array consisting of URLs of the form:

$URLs = @("https://somesite.com/folder1/page/1/"
,"https://somesite.com/folder222/page/1/"
,"https://somesite.com/folder222/page/2/"
,"https://somesite.com/folder444/page/1/"
,"https://somesite.com/folder444/page/3/"
,"https://somesite.com/folderBBB/page/1/"
,"https://somesite.com/folderBBB/page/5/")

They always have /page/1/. I need to add (or reconstruct) all missing URLs from the highest page down to 1, so that the array ends up like this:

$URLs = @("https://somesite.com/folder1/page/1/"
,"https://somesite.com/folder222/page/1/"
,"https://somesite.com/folder222/page/2/"
,"https://somesite.com/folder444/page/1/"
,"https://somesite.com/folder444/page/2/"
,"https://somesite.com/folder444/page/3/"
,"https://somesite.com/folderBBB/page/1/"
,"https://somesite.com/folderBBB/page/2/"
,"https://somesite.com/folderBBB/page/3/"
,"https://somesite.com/folderBBB/page/4/"
,"https://somesite.com/folderBBB/page/5/")

I'd imagine the pseudocode would be something like:

For each folder, extract the highest page number:

hxxps://somesite.com/folderBBB/page/5/

Expand this out from (5) to (1)

  hxxps://somesite.com/folderBBB/page/1/
  hxxps://somesite.com/folderBBB/page/2/
  hxxps://somesite.com/folderBBB/page/3/
  hxxps://somesite.com/folderBBB/page/4/
  hxxps://somesite.com/folderBBB/page/5/

Output this into an array

Any pointers would be welcome!
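The pseudocode above can be sketched in PowerShell roughly as follows. This is a minimal sketch, not a definitive implementation. It assumes that every URL ends in "/page/<number>/" and, as the question states, that numbering always starts at 1, so only the highest page number per folder matters and the input order is irrelevant.

```powershell
# Sample input from the question.
$URLs = @(
  'https://somesite.com/folder1/page/1/'
  'https://somesite.com/folder222/page/1/'
  'https://somesite.com/folder222/page/2/'
  'https://somesite.com/folder444/page/1/'
  'https://somesite.com/folder444/page/3/'
  'https://somesite.com/folderBBB/page/1/'
  'https://somesite.com/folderBBB/page/5/'
)

# Map each prefix (everything up to and including "/page/") to the highest
# page number seen for it. Assumes the ".../page/<number>/" URL shape.
$maxPage = [ordered] @{}
foreach ($url in $URLs) {
  if ($url -match '^(.+/page/)(\d+)/$') {
    $prefix = $Matches[1]
    $page   = [int] $Matches[2]
    if (-not $maxPage.Contains($prefix) -or $page -gt $maxPage[$prefix]) {
      $maxPage[$prefix] = $page
    }
  }
}

# Expand each prefix from page 1 up to its highest page and collect the
# resulting URLs in an array (assigning a foreach statement captures its output).
$expanded = foreach ($prefix in $maxPage.Keys) {
  foreach ($i in 1..$maxPage[$prefix]) { "$prefix$i/" }
}

$expanded
```

Because [ordered] preserves insertion order, the folders come out in the order in which they first appear in the input. That order matches the expected array shown in the question.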


日记撕了你也走了 2025-01-29 21:03:30


You can use a pipeline-based solution via the Group-Object cmdlet as follows:

$URLs = @("https://somesite.com/folder1/page/1/"
  , "https://somesite.com/folder222/page/1/"
  , "https://somesite.com/folder222/page/2/"
  , "https://somesite.com/folder444/page/1/"
  , "https://somesite.com/folder444/page/3/"
  , "https://somesite.com/folderBBB/page/1/"
  , "https://somesite.com/folderBBB/page/5/")

$URLs |
  Group-Object { $_ -replace '[^/]+/$' } | # Group by shared prefix
    ForEach-Object {
      # Extract the start and end number for the group at hand.
      [int] $from, [int] $to =
        ($_.Group[0], $_.Group[-1]) -replace '^.+/([^/]+)/$', '$1'
      # Generate the output URLs.
      # You can assign the entire pipeline to a variable
      # ($generatedUrls = $URLs | ...) to capture them in an array.
      foreach ($i in $from..$to) { $_.Name + $i + '/' }
    }

Note:

- The assumption is that the first and last element in each group of URLs that share the same prefix always contain the start and end point of the desired enumeration, respectively.
  - If that assumption doesn't hold, use the following instead:

      $minMax = $_.Group -replace '^.+/([^/]+)/$', '$1' | Measure-Object -Minimum -Maximum
      $from, $to = $minMax.Minimum, $minMax.Maximum

- The regex-based -replace operator is used for two things:
  - -replace '[^/]+/$' eliminates the last component from each URL, so as to group them by their shared prefix.
  - -replace '^.+/([^/]+)/$', '$1' effectively extracts the last component from each given URL, i.e. the numbers that represent the start and end point of the desired enumeration.

Procedural alternative:

# Build a map (ordered hashtable) that maps URL prefixes
# to the number suffixes that occur among the URLs sharing
# the same prefix.
$map = [ordered] @{}
foreach ($url in $URLs) {
  if ($url -match '^(.+)/([^/]+)/$') {
    $prefix, [int] $num = $Matches[1], $Matches[2]
    $map[$prefix] = [array] $map[$prefix] + $num
  }
}

# Process the map to generate the URLs.
# Again, use something like
#    $generatedUrls = foreach ...
# to capture them in an array.
foreach ($prefix in $map.Keys) {
  $nums = $map[$prefix]
  $from, $to = $nums[0], $nums[-1]
  foreach ($num in $from..$to) {
    '{0}/{1}/' -f $prefix, $num  # synthesize URL and output it.
  }
}
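
If the URLs for a folder can arrive in any order, the first/last-element assumption from the note no longer holds. The following is a minimal sketch of the min/max idea that swaps in a numeric Sort-Object for the note's Measure-Object call. It assumes that the page number is always the last path component. The shuffled sample input is hypothetical and not taken from the question.

```powershell
# Hypothetical, deliberately unsorted input.
$URLs = @(
  'https://somesite.com/folderBBB/page/5/'
  'https://somesite.com/folder444/page/3/'
  'https://somesite.com/folderBBB/page/1/'
  'https://somesite.com/folderBBB/page/2/'
  'https://somesite.com/folder444/page/1/'
)

$generatedUrls = $URLs |
  Group-Object { $_ -replace '[^/]+/$' } |   # group by shared prefix
    ForEach-Object {
      # Extract the page numbers, convert them to integers and sort them
      # numerically, so the range no longer depends on the input order.
      $nums = @([int[]] ($_.Group -replace '^.+/([^/]+)/$', '$1') | Sort-Object)
      $from, $to = $nums[0], $nums[-1]
      # $_ still refers to the current group inside the foreach statement.
      foreach ($i in $from..$to) { $_.Name + $i + '/' }
    }

$generatedUrls
```

The [int[]] cast before sorting matters. Compared as strings, "10" would sort before "2", and the range would come out wrong.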
