Need information on Excel and VBA string processing and storage limits, plus suggested workarounds

Posted 2024-10-09 00:21:15 · 550 characters · 2 views · 0 comments

According to this Microsoft blog post, the maximum string length per cell in MS Office 2010 is 32k (32,767 characters); I've also confirmed this via testing. The problem is that my strings (DNA sequences) are well above that length, and I'm matching DNA sub-sequences against the whole 32k+ sequence. A sub-sequence could match anywhere on the main sequence, so I cannot simply break the main sequence into 32k chunks: a match could straddle a chunk boundary, and I need to match the "sub-string-sequence" against the entire "main-string-sequence".

What is not clear is whether VBA supports processing strings larger than 32k. If VBA supports string concatenation in excess of 32k, that might be a workaround: I would split the "main-string-sequence" into 32k chunks stored along a row, from column 1 to column N; then, when I need to process a match, concatenate the strings in that row from column 1 to column N, run the match, and discard the temporary "main-string-sequence" held in VBA.

So, basically, the issue is that MS Office 2010 only supports strings of up to 32k characters per cell, and my strings are much larger than that and need to be processed whole for the string matching to work.

Comments (1)

骷髅 2024-10-16 00:21:15

VBA variable-length strings (Dim x As String) can hold up to approximately 2 billion (2^31) characters, so you should be fine there. I'm not sure why you are writing them to cells; unless you need to for some other reason, I would do it all in VBA. It's not clear where you're getting the strings from.
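
If the sequences do need to live on the worksheet, the chunk-and-rejoin idea from the question could be sketched roughly as follows. This is only a sketch under assumptions: WriteChunked, ReadChunked and CHUNK_SIZE are hypothetical names, the chunks are assumed to sit in consecutive cells of one row, and the first empty cell is assumed to mark the end of the sequence.

```vba
Option Explicit

' Hypothetical constant: Excel's documented per-cell text limit is 32,767 characters.
Private Const CHUNK_SIZE As Long = 32767

' Split sSeq into cell-sized chunks and write them left to right, starting at rStart.
Sub WriteChunked(ByVal rStart As Range, ByVal sSeq As String)
    Dim lCol As Long
    Dim lPos As Long

    lPos = 1
    Do While lPos <= Len(sSeq)
        ' Mid$ simply returns fewer characters when the last chunk is short
        rStart.Offset(0, lCol).Value = Mid$(sSeq, lPos, CHUNK_SIZE)
        lPos = lPos + CHUNK_SIZE
        lCol = lCol + 1
    Loop
End Sub

' Concatenate the chunks back into one string, stopping at the first empty cell.
Function ReadChunked(ByVal rStart As Range) As String
    Dim lCol As Long
    Dim sSeq As String

    Do While Len(rStart.Offset(0, lCol).Value) > 0
        sSeq = sSeq & rStart.Offset(0, lCol).Value
        lCol = lCol + 1
    Loop

    ReadChunked = sSeq
End Function
```

With these in place, something like `WriteChunked Range("A1"), sMain` would store a sequence, and `InStr(1, ReadChunked(Range("A1")), sSub)` would search the rejoined string, so a match that straddles a cell boundary is still found.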

Here's some code that builds a fake 40,000-character DNA sequence and a fake 5-character substring, then finds where the substring occurs in the sequence.

Sub FindDNASubString()

    Dim lRnd As Long
    Dim i As Long
    Dim sMain As String
    Dim sSub As String
    Dim vaLetters As Variant
    Dim lPos As Long

    Const lUPPER As Long = 3
    Const lLOWER As Long = 0

    vaLetters = Array("A", "C", "T", "G")

    'Create fake main string
    For i = 1 To 40000
        lRnd = Int((lUPPER - lLOWER + 1) * Rnd + lLOWER)
        sMain = sMain & vaLetters(lRnd)
    Next i

    'create fake substring
    For i = 1 To 5
        lRnd = Int((lUPPER - lLOWER + 1) * Rnd + lLOWER)
        sSub = sSub & vaLetters(lRnd)
    Next i

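    'find position of sub in main (InStr is 1-based and returns 0 if there is no match)
    lPos = InStr(1, sMain, sSub)

    If lPos > 0 Then
        MsgBox "Substring is at position " & lPos
    Else
        MsgBox "Substring not found"
    End If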

End Sub
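
InStr only reports the first match. Since a sub-sequence can occur at several places in the main sequence, a loop over InStr is one way to collect every position. The sketch below is an illustration, not part of the original answer: FindAllPositions is a hypothetical helper name, and it assumes a case-sensitive (binary) comparison is what you want.

```vba
' Hypothetical helper: returns a Collection of the 1-based start positions
' of every occurrence of sSub in sMain, including overlapping ones.
Function FindAllPositions(ByVal sMain As String, ByVal sSub As String) As Collection
    Dim cPos As New Collection
    Dim lPos As Long

    ' Guard against an empty search string, which InStr treats as matching everywhere
    If Len(sSub) > 0 Then
        lPos = InStr(1, sMain, sSub, vbBinaryCompare)
        Do While lPos > 0
            cPos.Add lPos
            ' Advance by one character so overlapping matches are also found
            lPos = InStr(lPos + 1, sMain, sSub, vbBinaryCompare)
        Loop
    End If

    Set FindAllPositions = cPos
End Function
```

For example, `FindAllPositions("ACACAC", "ACAC")` would return a collection holding positions 1 and 3. Stepping forward by `Len(sSub)` instead of 1 would skip overlapping matches, if those are not wanted.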