How can I substring a specific number of bytes from a string in PowerShell?

Posted on 2025-02-07 01:12:39


I have a scenario where I need to obtain an installer embedded within a JSON REST response that is base64-encoded. Since the size of the JSON string is rather large (180 MB), it causes problems when decoding the REST response using standard PowerShell tooling as it causes OutOfMemoryException to be thrown quite often in limited memory scenarios (such as hitting WinRM memory quotas).

It's not desirable to raise the memory quota in our environment over a single installation, and we don't have standard tooling to prepare a package whose payload does not exist at a simple HTTP endpoint (I don't have direct permissions to publish packages not performed through our build system). My solution in this case is to decode the base64 string in chunks. However, while I have this working, I am stuck on one last bit of optimization for this process.

Currently I am using a MemoryStream to read from the string, but I need to provide a byte[]:

# $Base64String is a [ref] type
$memStream = [IO.MemoryStream]::new([Text.Encoding]::UTF8.GetBytes($Base64String.Value))

This unsurprisingly results in copying the byte[] representation of the entire base64-encoded string, and is even less memory-efficient than built-in tooling in its current form. The code you don't see here reads from $memStream in chunks of 1024 bytes at a time, decoding the base64 string and writing the bytes to disk using BinaryWriter. This all works well, if slow since I'm forcing garbage collection fairly often. However, I want to extend this byte-counting to the initial MemoryStream and only read n bytes from the string at a time. My understanding is that base64 strings must be decoded in chunks of bytes divisible by 4.

The problem is that [string].Substring([int], [int]) works based on string length, not number of bytes per character. The JSON response can be assumed to be UTF-8 encoded, but even with this assumption UTF-8 characters vary between 1-4 bytes in length. How can I (directly or indirectly) substring a specific number of bytes in PowerShell so I can create the MemoryStream from this substring instead of the full $Base64String?
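As a small illustration of the distinction described above (not part of the original question), the following sketch compares a string's Length, which counts UTF-16 code units, with its UTF-8 byte count. The sample strings and variable names are made up for the example. Note that the base64 alphabet consists only of ASCII characters, so for a pure base64 string the two numbers coincide.

```powershell
# Hypothetical sample strings, used only for illustration.
$ascii = 'SGVsbG8='                           # base64 text: ASCII characters only
$mixed = "h" + [char]0x00E9 + [char]0x20AC    # 'h' (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes)

$utf8 = [Text.Encoding]::UTF8

# Length counts characters (UTF-16 code units); GetByteCount counts encoded bytes.
$asciiChars = $ascii.Length
$asciiBytes = $utf8.GetByteCount($ascii)
$mixedChars = $mixed.Length
$mixedBytes = $utf8.GetByteCount($mixed)

"{0} chars, {1} bytes" -f $asciiChars, $asciiBytes   # 8 chars, 8 bytes
"{0} chars, {1} bytes" -f $mixedChars, $mixedBytes   # 3 chars, 6 bytes
```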

I will note that I have explored the use of the [Text.Encoding].GetBytes([string], [int], [int]) overload, however, I face the same issue in that the method expects a character count, not byte count, for the length of the string to get the byte[] for from the starting index.


流绪微梦 2025-02-14 01:12:39

To answer the base question "How can I substring a specific number of bytes from a string in PowerShell", I was able to write the following function:

function Get-SubstringByByteCount {
  [CmdletBinding()]
  Param(
    [Parameter(Mandatory)]
    [ValidateScript({ $null -ne $_ -and $_.Value -is [string] })]
    [ref]$InputString,
    [int]$FromIndex = 0,
    [Parameter(Mandatory)]
    [int]$ByteCount,
    [ValidateScript({ [Text.Encoding]::$_ })]
    [string]$Encoding = 'UTF8'
  )
  
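  [long]$byteCounter = 0
  # Start reading at the position given by -FromIndex.
  [int]$i = $FromIndex
  [System.Text.StringBuilder]$sb = New-Object System.Text.StringBuilder $ByteCount

  try {
    # Append one character at a time until at least $ByteCount bytes have been consumed.
    # A multi-byte final character can push the total slightly past $ByteCount.
    while ( $byteCounter -lt $ByteCount -and $i -lt $InputString.Value.Length ) {
      [char]$char = $InputString.Value[$i++]
      [void]$sb.Append($char)
      $byteCounter += [Text.Encoding]::$Encoding.GetByteCount($char)
    }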

    $sb.ToString()
  } finally {
    if( $sb ) {
      $sb = $null
      [System.GC]::Collect()
    }
  }
}

Invocation works like so:

Get-SubstringByByteCount -InputString ( [ref]$someString ) -ByteCount 8

Some notes on this implementation:

Takes the string as a [ref] type since the original goal was to avoid copying the full string in a limited-memory scenario. This function could be re-implemented using the [string] type instead.
This function essentially adds each character to a StringBuilder until the specified number of bytes has been written.
The number of bytes of each character is determined by using one of the [Text.Encoding]::GetByteCount overloads. Encoding can be specified via a parameter, but the encoding value should match one of the static encoding properties available from [Text.Encoding]. Defaults to UTF8 as written.
$sb = $null and [System.GC]::Collect() are intended to forcibly clean up the StringBuilder in a memory-constrained environment, but could be omitted if this is not a concern.
-FromIndex takes the start position within -InputString to begin the substring operation from. Defaults to 0 to evaluate from the start of the -InputString.
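For the base64 case specifically, a character count can stand in for a byte count, because every character in the base64 alphabet is ASCII and therefore one byte in UTF-8. Under that assumption, chunked decoding could be sketched roughly as follows. This is not the code from the question: the function name Convert-Base64StringToFile, its parameters, and the chunk size are hypothetical, and the sketch assumes the string contains no whitespace or line breaks and that its total length is a multiple of 4.

```powershell
function Convert-Base64StringToFile {
  # Hypothetical helper; name and parameters are illustrative only.
  param(
    [Parameter(Mandatory)][ref]$Base64String,
    [Parameter(Mandatory)][string]$Path,      # use a full path; .NET resolves relative paths on its own
    [int]$ChunkChars = 4096                   # characters per chunk; must be a multiple of 4
  )

  if ($ChunkChars -le 0 -or $ChunkChars % 4 -ne 0) {
    throw 'ChunkChars must be a positive multiple of 4.'
  }

  $source = $Base64String.Value               # reference to the same string, not a copy
  $buffer = [char[]]::new($ChunkChars)        # reused for every chunk
  $out = [IO.File]::Create($Path)
  try {
    for ($offset = 0; $offset -lt $source.Length; $offset += $ChunkChars) {
      $count = [Math]::Min($ChunkChars, $source.Length - $offset)
      # Copy only this chunk's characters into the buffer; no substring is allocated.
      $source.CopyTo($offset, $buffer, 0, $count)
      $bytes = [Convert]::FromBase64CharArray($buffer, 0, $count)
      $out.Write($bytes, 0, $bytes.Length)
    }
  } finally {
    $out.Dispose()
  }
}

# Usage with a small, made-up payload and a deliberately tiny chunk size.
$payload = [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes('hello, chunked world'))
$target = Join-Path ([IO.Path]::GetTempPath()) 'decoded-sample.bin'
Convert-Base64StringToFile -Base64String ([ref]$payload) -Path $target -ChunkChars 8
$roundTrip = [Text.Encoding]::UTF8.GetString([IO.File]::ReadAllBytes($target))
```

Because each chunk other than the last contains a whole number of 4-character groups, padding can only appear in the final chunk, which is where base64 expects it. Only one chunk-sized char[] and one small byte[] are live at a time, so the full string is never duplicated.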