Maximum string array size in Visual Basic (WSH)
I'm writing a WSH script in VBScript to read a massive directory listing, generated by running a directory-listing command through the .Run method with its output redirected.
The directory listing is about 8400 lines, but every time I run the script, the following loop
Do Until DirList.AtEndOfStream
    ReDim Preserve arrData(i)
    arrData(i) = DirList.ReadLine
    i = i + 1
Loop
cuts out early, at a seemingly random point between 1800 and 3500 lines. Does this sound like an array size issue or a shell memory limit?
I have heard of people parsing LARGE log files by reading them in all at once, as I am doing.
Comments (3)
Would it not be better, in this instance, to cycle through the file first and count the number of lines, then ReDim the array to the exact size required? Then close the file and open it again, this time actually assigning the lines to the array elements.
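A minimal sketch of that two-pass idea, assuming the redirected listing has already been written to a text file on disk. The path and the variable names other than arrData are hypothetical, and the question does not show how DirList was opened, so the OpenTextFile calls are an assumption too.

```vbscript
Option Explicit

Const ForReading = 1

Dim fso, listPath, stream, lineCount, i
Dim arrData()

' Hypothetical path: wherever the redirected listing was written.
listPath = "C:\Temp\dirlist.txt"

Set fso = CreateObject("Scripting.FileSystemObject")

' Pass 1: count the lines without storing them.
lineCount = 0
Set stream = fso.OpenTextFile(listPath, ForReading)
Do Until stream.AtEndOfStream
    stream.SkipLine
    lineCount = lineCount + 1
Loop
stream.Close

' Pass 2: size the array exactly once, then fill it.
If lineCount > 0 Then
    ReDim arrData(lineCount - 1)
    Set stream = fso.OpenTextFile(listPath, ForReading)
    i = 0
    ' The second condition guards against the file having grown between passes.
    Do Until stream.AtEndOfStream Or i >= lineCount
        arrData(i) = stream.ReadLine
        i = i + 1
    Loop
    stream.Close
    WScript.Echo "Read " & i & " lines"
Else
    WScript.Echo "The listing is empty"
End If
```

The trade-off is that the file is read twice, in exchange for a single array allocation.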
The size of arrays in VBScript is limited by a few different things, whichever comes first:
A maximum of (2 ^ 31) - 1 elements (because the number of elements is stored internally as a Long value, and because there is no larger data type available to use as an indexer).
A maximum of 60 dimensions.
Available system memory.
Most of these limits are of no use beyond purely theoretical exercise, however, and I'd be very suspicious of any code written that had to be concerned with them.
Because you say that you only have 8400 lines to process, I doubt you're running into the theoretical limits placed on the size of an array. Instead, the biggest problem with your code is that you're using Redim Preserve inside of a loop.
The MSDN reference explains that Redim Preserve is used to resize the last dimension of the array dynamically, while preserving its existing contents. What it doesn't necessarily mention is how it works. Each time you use Redim Preserve, a new array is created with the number of elements that you specify, and the values of the elements in the previous array are copied into the new one. This should be sending up red flags immediately, because it means that on each iteration of your loop, you're allocating space for and filling an entirely new array. The problem only gets worse the more iterations you've made in the loop, because the size of each new array that is created is growing incrementally larger.
Thus, it's more likely that you're overflowing the stack space that VBScript allocates for local variables. (How appropriate: a stack overflow error?) Eventually those unused arrays will be garbage collected, but you're putting a giant amount of pressure on memory and resources when you do this in a tight loop.
You are far better off simply allocating enough space in your array to hold all of the directory listings that you'll need to hold. You don't necessarily have to get the maximum size exactly right. Simply allocating more than you'll need is still far cheaper than continually creating and destroying new arrays. If you're still concerned that this won't be enough, you can check in the loop whether the current index is greater than the upper bound of the array, and if so, allocate a lot more space (by, say, doubling its current size). After you get finished, you can deallocate the excess space while retaining the good data using the Redim Preserve command, but this time only once!
This approach just might save your memory from the abuse of the constant allocation and deallocation of arrays.
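A rough sketch of the pre-allocate, double-when-full, trim-once approach described above. The initial capacity, the file path, and the names stream and count are assumptions made for illustration; only arrData comes from the question.

```vbscript
Option Explicit

Const ForReading = 1
Const InitialSize = 10000   ' assumed capacity, a bit above the ~8400 lines mentioned

Dim fso, stream, count
Dim arrData()

Set fso = CreateObject("Scripting.FileSystemObject")
Set stream = fso.OpenTextFile("C:\Temp\dirlist.txt", ForReading)  ' hypothetical path

ReDim arrData(InitialSize - 1)
count = 0

Do Until stream.AtEndOfStream
    ' Grow only when the array is full, doubling the capacity each time.
    If count > UBound(arrData) Then
        ReDim Preserve arrData(2 * (UBound(arrData) + 1) - 1)
    End If
    arrData(count) = stream.ReadLine
    count = count + 1
Loop
stream.Close

' Trim the unused tail with a single ReDim Preserve.
If count > 0 Then
    ReDim Preserve arrData(count - 1)
Else
    Erase arrData
End If

WScript.Echo "Read " & count & " lines"
```

With doubling, an input of n lines triggers only about log2(n / InitialSize) resizes instead of n, which is the point of the suggestion.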
Using ReDim Preserve inside of a loop is a fine practice. It's the only method of dynamically resizing an array in VBScript, and Microsoft's own sample code does it all of the time. I've personally done this with arrays well into the tens of thousands of iterations without a problem.
The problem you are encountering is that you are running out of system resources. While a new array is allocated on each iteration, the old one is released, so the overhead of resizing is nominal and in most cases completely negligible.
You have to keep in mind that your scripts are executed within an executable environment. In the case of the WSH, this means that all of your actions are performed in a single thread. The WSH does not provide any methods of managing memory usage within your scripts.
The best advice I can give you is to limit the number of iterations or read the input file in chunks (releasing them on each iteration). Without seeing the file that causes the error, or the actual error message you are receiving, I can't give any more direct advice. I can only say that this situation does not arise very often and almost always points to a poor machine configuration or poorly written code.
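One way the chunked reading suggested above might look, as a sketch only. The chunk size, the file path, and the ProcessChunk handler are hypothetical placeholders, not something given in the question or the answers.

```vbscript
Option Explicit

Const ForReading = 1
Const ChunkSize = 500   ' assumed chunk size; tune as needed

Dim fso, stream, n, total
Dim chunk()

Set fso = CreateObject("Scripting.FileSystemObject")
Set stream = fso.OpenTextFile("C:\Temp\dirlist.txt", ForReading)  ' hypothetical path

total = 0
Do Until stream.AtEndOfStream
    ' A plain ReDim (no Preserve) discards the previous chunk's contents.
    ReDim chunk(ChunkSize - 1)
    n = 0
    Do Until stream.AtEndOfStream Or n = ChunkSize
        chunk(n) = stream.ReadLine
        n = n + 1
    Loop
    ProcessChunk chunk, n
    total = total + n
Loop
stream.Close

WScript.Echo "Processed " & total & " lines"

' Hypothetical handler: replace the body with the real per-line work.
Sub ProcessChunk(lines, count)
    Dim k
    For k = 0 To count - 1
        ' e.g. inspect lines(k) here
    Next
End Sub
```

Only ChunkSize lines are held in memory at any time, so this fits cases where each line can be handled independently and the full listing never needs to be in one array.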