How can I substring a specific number of bytes from a string in PowerShell?
I have a scenario where I need to obtain a base64-encoded installer embedded within a JSON REST response. Since the JSON string is rather large (180 MB), decoding the REST response with standard PowerShell tooling causes problems: OutOfMemoryException is thrown quite often in limited-memory scenarios (such as hitting WinRM memory quotas).
It's not desirable to raise the memory quota in our environment over a single installation, and we don't have standard tooling to prepare a package whose payload does not exist at a simple HTTP endpoint (I don't have direct permissions to publish packages not performed through our build system). My solution in this case is to decode the base64 string in chunks. However, while I have this working, I am stuck on one last bit of optimization for this process.
Currently I am using a MemoryStream to read from the string, but I need to provide a byte[]:
# $Base64String is a [ref] type
$memStream = [IO.MemoryStream]::new([Text.Encoding]::UTF8.GetBytes($Base64String.Value))
This unsurprisingly results in copying the byte[] representation of the entire base64-encoded string, and is even less memory-efficient than built-in tooling in its current form. The code you don't see here reads from $memStream in chunks of 1024 bytes at a time, decoding the base64 string and writing the bytes to disk using BinaryWriter. This all works well, if slowly, since I'm forcing garbage collection fairly often. However, I want to extend this byte-counting to the initial MemoryStream and only read n bytes from the string at a time. My understanding is that base64 strings must be decoded in chunks whose byte count is divisible by 4.
The problem is that [string].Substring([int], [int]) works based on string length, not number of bytes per character. The JSON response can be assumed to be UTF-8 encoded, but even with this assumption UTF-8 characters vary between 1-4 bytes in length. How can I (directly or indirectly) substring a specific number of bytes in PowerShell so I can create the MemoryStream from this substring instead of the full $Base64String?
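To make the distinction concrete, here is a small illustration of how a string's character count and its UTF-8 byte count can diverge. The sample string is made up for the example and is built from code points so that it does not depend on the script file's encoding.

```powershell
# 'a' is 1 byte, U+00E9 is 2 bytes, and U+20AC is 3 bytes in UTF-8.
$text = 'a' + [char]0x00E9 + [char]0x20AC
$utf8 = [Text.Encoding]::UTF8

$charCount = $text.Length                # counts UTF-16 code units: 3
$byteCount = $utf8.GetByteCount($text)   # counts encoded bytes: 6

"Characters: $charCount, UTF-8 bytes: $byteCount"
```

Substring and the character-based GetBytes overloads index by the first number, while a MemoryStream is sized by the second.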
I will note that I have explored the use of the [Text.Encoding].GetBytes([string], [int], [int]) overload; however, I face the same issue, in that the method expects a character count, not a byte count, for the portion of the string to convert to a byte[] from the starting index.
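One observation may sidestep the problem for this particular payload. The base64 alphabet (A-Z, a-z, 0-9, +, /, =) is pure ASCII, and every ASCII character encodes to exactly one byte in UTF-8, so for the base64 text itself the character count and the byte count coincide. Under that assumption, chunked decoding with Substring could be sketched as below. The function name Convert-Base64StringToFile and its parameters are hypothetical, and the sketch assumes the string contains no whitespace or line breaks and that padding appears only at the very end.

```powershell
# Hypothetical helper: decodes a base64 string to a file in fixed-size chunks.
function Convert-Base64StringToFile {
    param(
        [Parameter(Mandatory)] [ref]$Base64String,  # passed by reference, as in the question
        [Parameter(Mandatory)] [string]$Path,       # full path of the output file
        [int]$ChunkChars = 4096                     # characters per chunk; must be a multiple of 4
    )

    if ($ChunkChars -le 0 -or ($ChunkChars % 4) -ne 0) {
        throw 'ChunkChars must be a positive multiple of 4.'
    }

    $source = $Base64String.Value   # another reference to the same string, not a copy
    $out = [IO.File]::Create($Path)
    try {
        for ($i = 0; $i -lt $source.Length; $i += $ChunkChars) {
            $len = [Math]::Min($ChunkChars, $source.Length - $i)
            # Each aligned 4-character group decodes independently to at most 3 bytes.
            $bytes = [Convert]::FromBase64String($source.Substring($i, $len))
            $out.Write($bytes, 0, $bytes.Length)
        }
    }
    finally {
        $out.Dispose()
    }
}

# Usage: round-trip 256 sample bytes through a temporary file.
$payload = [Convert]::ToBase64String([byte[]](0..255))
$path = Join-Path ([IO.Path]::GetTempPath()) 'base64-chunk-demo.bin'
Convert-Base64StringToFile -Base64String ([ref]$payload) -Path $path -ChunkChars 8
$decoded = [IO.File]::ReadAllBytes($path)
```

Only one chunk-sized substring and its decoded bytes are alive at a time, so peak memory beyond the original string stays small.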
Comments (1)
To answer the base question "How can I substring a specific number of bytes from a string in PowerShell", I was able to write the following function:
Invocation works like so:
Some notes on this implementation:
- The string is passed as a [ref] type since the original goal was to avoid copying the full string in a limited-memory scenario. This function could be re-implemented using the [string] type instead.
- Characters are added to a StringBuilder until the specified number of bytes has been written.
- The number of bytes per character is determined by the [Text.Encoding]::GetByteCount overloads. Encoding can be specified via a parameter, but the encoding value should match one of the static encoding properties available from [Text.Encoding]. Defaults to UTF8 as written.
- $sb = $null and [System.GC]::Collect() are intended to forcibly clean up the StringBuilder in a memory-constrained environment, but could be omitted if this is not a concern.
- -FromIndex takes the start position within -InputString to begin the substring operation from. Defaults to 0 to evaluate from the start of the -InputString.
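The notes above can be sketched roughly as follows. This is an illustrative reconstruction under assumptions, not the answer's original code: only -InputString and -FromIndex are named in the notes, so the function name Get-ByteLimitedSubstring and the -ByteCount and -Encoding parameter names are hypothetical, as is the handling of surrogate pairs.

```powershell
# Hypothetical sketch: returns the longest run of whole characters, starting at
# -FromIndex, whose encoded size does not exceed -ByteCount.
function Get-ByteLimitedSubstring {
    param(
        [Parameter(Mandatory)] [ref]$InputString,
        [Parameter(Mandatory)] [int]$ByteCount,
        [int]$FromIndex = 0,
        [string]$Encoding = 'UTF8'   # name of a static property on [Text.Encoding]
    )

    $enc = [Text.Encoding]::$Encoding
    if ($null -eq $enc) { throw "Unknown encoding '$Encoding'." }

    $source = $InputString.Value
    $sb = [Text.StringBuilder]::new()
    $written = 0
    $i = $FromIndex

    while ($i -lt $source.Length) {
        # Treat a surrogate pair as one unit so a character is never split.
        $width = 1
        if ([char]::IsHighSurrogate($source[$i]) -and
            (($i + 1) -lt $source.Length) -and
            [char]::IsLowSurrogate($source[$i + 1])) {
            $width = 2
        }

        $size = $enc.GetByteCount($source.Substring($i, $width))
        if (($written + $size) -gt $ByteCount) { break }

        [void]$sb.Append($source, $i, $width)
        $written += $size
        $i += $width
    }

    $result = $sb.ToString()
    $sb = $null              # release the builder, as described in the notes
    [System.GC]::Collect()
    $result
}

# Usage: the sample string encodes to 1 + 2 + 3 + 1 bytes in UTF-8.
$sample = 'a' + [char]0x00E9 + [char]0x20AC + 'b'
$firstFour = Get-ByteLimitedSubstring -InputString ([ref]$sample) -ByteCount 4
$fromOne   = Get-ByteLimitedSubstring -InputString ([ref]$sample) -ByteCount 5 -FromIndex 1
```

Because the sketch stops at a character boundary, the result can encode to fewer bytes than requested; here $firstFour holds two characters (3 bytes) because the next character would exceed the 4-byte limit.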