这是用于匹配 Excel 公式中任何单元格引用的正则表达式吗？

发布于 2024-08-14 18:45:53 字数 6151 浏览 5 评论 0原文

我一直在尝试创建一个正则表达式模式，该模式可以匹配任何 Excel 公式中的任何引用，包括绝对引用、相对引用和外部引用。我需要返回整个参考，包括工作表和工作簿名称。

我无法找到有关 Excel A1 表示法的详尽文档，但经过大量测试，我确定了以下内容：

公式前面带有等号“=”
公式中的字符串用双引号括起来，需要在查找真正的引用之前删除，否则 =A1&"A1" 会破坏正则
表达式工作表名称的长度最多为 31 个字符，不包括 \ / ? * [ ] :
外部引用中的工作表名称必须以感叹号结尾 =Sheet1!A1
外部引用中的工作簿名称必须用方括号括起来 =[Book1.xlsx]Sheet1!A1=[Book1.xlsx]Sheet1!A1代码>
工作簿路径（如果引用是对封闭工作簿中的范围的引用，Excel 将添加该路径）始终用单引号括起来，并位于工作簿名称 'C:\[Book1.xlsx]Sheet1 的括号左侧'!A1
某些字符（例如不间断空格）会导致 Excel 将外部引用中的工作簿和工作表名称括在单引号中，但我不知道具体是哪些字符 ='[Book 1.xlsx]Sheet 1'!A1
即使启用了 R1C1 表示法，Range.Formula 仍会以 A1 表示法返回引用。 Range.FormulaR1C1 返回 R1C1 表示法中的引用。
3D 参考样式允许一个工作簿上的一系列工作表名称 =SUM([Book5]Sheet1:Sheet3!A1)
命名范围可以在公式中指定：

名称的第一个字符必须是字母，下划线字符 (_) 或反斜杠 (\)。名称中的其余字符可以是字母、数字、句点和下划线字符。
不能使用大写和小写字符“C”、“c”、“R”或“r”作为定义的名称，因为它们都用作在您选择当前选定单元格的行或列时的简写。在“名称”或“转到”文本框中输入它们。
名称不能与单元格引用相同，例如 Z$100 或 R1C1。
名称中不允许包含空格。
名称的长度最多可达 255 个字符。
名称可以包含大写和小写字母。 Excel 不区分名称中的大小写字符。

这是我想出的封装在 VBA 程序中进行测试的内容。我也更新了代码来处理名称：

Sub ReturnFormulaReferences()

    Dim objRegExp As New VBScript_RegExp_55.RegExp
    Dim objCell As Range
    Dim objStringMatches As Object
    Dim objReferenceMatches As Object
    Dim objMatch As Object
    Dim intReferenceCount As Integer
    Dim intIndex As Integer
    Dim booIsReference As Boolean
    Dim objName As Name
    Dim booNameFound As Boolean

    With objRegExp
        .MultiLine = True
        .Global = True
        .IgnoreCase = True
    End With

    For Each objCell In Selection.Cells
        If Left(objCell.Formula, 1) = "=" Then

            objRegExp.Pattern = "\"".*\"""
            Set objStringMatches = objRegExp.Execute(objCell.Formula)

            objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
            & "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
            & "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
            & "|\$[a-z]{1,3}\:\$[a-z]{1,3}" _
            & "|[a-z]{1,3}\:[a-z]{1,3}" _
            & "|\$[0-9]{1,7}\:\$[0-9]{1,7}" _
            & "|[0-9]{1,7}\:[0-9]{1,7}" _
            & "|[a-z_\\][a-z0-9_\.]{0,254})"
            Set objReferenceMatches = objRegExp.Execute(objCell.Formula)

            intReferenceCount = 0
            For Each objMatch In objReferenceMatches
                intReferenceCount = intReferenceCount + 1
            Next

            Debug.Print objCell.Formula
            For intIndex = intReferenceCount - 1 To 0 Step -1
                booIsReference = True
                For Each objMatch In objStringMatches
                    If objReferenceMatches(intIndex).FirstIndex > objMatch.FirstIndex _
                    And objReferenceMatches(intIndex).FirstIndex < objMatch.FirstIndex + objMatch.Length Then
                        booIsReference = False
                        Exit For
                    End If
                Next

                If booIsReference Then
                    objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
                    & "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
                    & "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
                    & "|\$[a-z]{1,3}\:\$[a-z]{1,3}" _
                    & "|[a-z]{1,3}\:[a-z]{1,3}" _
                    & "|\$[0-9]{1,7}\:\$[0-9]{1,7}" _
                    & "|[0-9]{1,7}\:[0-9]{1,7})"
                    If Not objRegExp.Test(objReferenceMatches(intIndex).Value) Then 'reference is not A1
                        objRegExp.Pattern = "^(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
                        & "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)" _
                        & "[a-z_\\][a-z0-9_\.]{0,254}$"
                        If Not objRegExp.Test(objReferenceMatches(intIndex).Value) Then 'name is not external
                            booNameFound = False
                            For Each objName In objCell.Worksheet.Parent.Names
                                If objReferenceMatches(intIndex).Value = objName.Name Then
                                    booNameFound = True
                                    Exit For
                                End If
                            Next
                            If Not booNameFound Then
                                objRegExp.Pattern = "^(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
                                & "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)"
                                For Each objName In objCell.Worksheet.Names
                                    If objReferenceMatches(intIndex).Value = objRegExp.Replace(objName.Name, "") Then
                                        booNameFound = True
                                        Exit For
                                    End If
                                Next
                            End If
                            booIsReference = booNameFound
                        End If
                    End If
                End If

                If booIsReference Then
                    Debug.Print "  " & objReferenceMatches(intIndex).Value _
                    & " (" & objReferenceMatches(intIndex).FirstIndex & ", " _
                    & objReferenceMatches(intIndex).Length & ")"
                End If
            Next intIndex
            Debug.Print

        End If
    Next

    Set objRegExp = Nothing
    Set objStringMatches = Nothing
    Set objReferenceMatches = Nothing
    Set objMatch = Nothing
    Set objCell = Nothing
    Set objName = Nothing

End Sub

任何人都可以打破或改进这个吗？如果没有有关 Excel 公式语法的详尽文档，很难知道这是否正确。

谢谢！

原文

I have been trying to create a regular expressions pattern that matches any reference in any Excel formula, including absolute, relative, and external references. I need to return the entire reference, including the worksheet and workbook name.

I haven't been able to find exhaustive documentation about Excel A1-notation, but with a lot of testing I have determined the following:

Formulas are preceded with an equal sign "="
Strings within formulas are enclosed in double quotes and need to be removed before looking for real references, otherwise =A1&"A1" would break regex
Worksheet names can be up to 31 characters long, excluding \ / ? * [ ] :
Worksheet names in external references must be succeeded with bang =Sheet1!A1
Workbook names in external references must be enclosed in square brackets =[Book1.xlsx]Sheet1!A1
Workbook paths, which Excel adds if a reference is to a range in a closed workbook, are always enclosed in single quotes and to the left of the brackets for the workbook name 'C:\[Book1.xlsx]Sheet1'!A1
Some characters (non-breaking space, for example) cause Excel to enclose the workbook and worksheet name in an external reference in single quotes, but I don't know specifically which characters ='[Book 1.xlsx]Sheet 1'!A1
Even if R1C1-notation is enabled, Range.Formula still returns references in A1-notation. Range.FormulaR1C1 returns references in R1C1 notation.
3D reference style allows a range of sheet names on one workbook =SUM([Book5]Sheet1:Sheet3!A1)
Named ranges can be specified in formulas:

The first character of a name must be a letter, an underscore character (_), or a backslash (\). Remaining characters in the name can be letters, numbers, periods, and underscore characters.
You cannot use the uppercase and lowercase characters "C", "c", "R", or "r" as a defined name, because they are all used as a shorthand for selecting a row or column for the currently selected cell when you enter them in a Name or Go To text box.
Names cannot be the same as a cell reference, such as Z$100 or R1C1.
Spaces are not allowed as part of a name.
A name can be up to 255 characters in length.
Names can contain uppercase and lowercase letters. Excel does not distinguish between uppercase and lowercase characters in names.

Here is what I came up with wrapped in a VBA procedure for testing. I updated the code to handle names as well:

Sub ReturnFormulaReferences()

    Dim objRegExp As New VBScript_RegExp_55.RegExp
    Dim objCell As Range
    Dim objStringMatches As Object
    Dim objReferenceMatches As Object
    Dim objMatch As Object
    Dim intReferenceCount As Integer
    Dim intIndex As Integer
    Dim booIsReference As Boolean
    Dim objName As Name
    Dim booNameFound As Boolean

    With objRegExp
        .MultiLine = True
        .Global = True
        .IgnoreCase = True
    End With

    For Each objCell In Selection.Cells
        If Left(objCell.Formula, 1) = "=" Then

            objRegExp.Pattern = "\"".*\"""
            Set objStringMatches = objRegExp.Execute(objCell.Formula)

            objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
            & "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
            & "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
            & "|\$[a-z]{1,3}\:\$[a-z]{1,3}" _
            & "|[a-z]{1,3}\:[a-z]{1,3}" _
            & "|\$[0-9]{1,7}\:\$[0-9]{1,7}" _
            & "|[0-9]{1,7}\:[0-9]{1,7}" _
            & "|[a-z_\\][a-z0-9_\.]{0,254})"
            Set objReferenceMatches = objRegExp.Execute(objCell.Formula)

            intReferenceCount = 0
            For Each objMatch In objReferenceMatches
                intReferenceCount = intReferenceCount + 1
            Next

            Debug.Print objCell.Formula
            For intIndex = intReferenceCount - 1 To 0 Step -1
                booIsReference = True
                For Each objMatch In objStringMatches
                    If objReferenceMatches(intIndex).FirstIndex > objMatch.FirstIndex _
                    And objReferenceMatches(intIndex).FirstIndex < objMatch.FirstIndex + objMatch.Length Then
                        booIsReference = False
                        Exit For
                    End If
                Next

                If booIsReference Then
                    objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
                    & "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
                    & "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
                    & "|\$[a-z]{1,3}\:\$[a-z]{1,3}" _
                    & "|[a-z]{1,3}\:[a-z]{1,3}" _
                    & "|\$[0-9]{1,7}\:\$[0-9]{1,7}" _
                    & "|[0-9]{1,7}\:[0-9]{1,7})"
                    If Not objRegExp.Test(objReferenceMatches(intIndex).Value) Then 'reference is not A1
                        objRegExp.Pattern = "^(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
                        & "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)" _
                        & "[a-z_\\][a-z0-9_\.]{0,254}$"
                        If Not objRegExp.Test(objReferenceMatches(intIndex).Value) Then 'name is not external
                            booNameFound = False
                            For Each objName In objCell.Worksheet.Parent.Names
                                If objReferenceMatches(intIndex).Value = objName.Name Then
                                    booNameFound = True
                                    Exit For
                                End If
                            Next
                            If Not booNameFound Then
                                objRegExp.Pattern = "^(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
                                & "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)"
                                For Each objName In objCell.Worksheet.Names
                                    If objReferenceMatches(intIndex).Value = objRegExp.Replace(objName.Name, "") Then
                                        booNameFound = True
                                        Exit For
                                    End If
                                Next
                            End If
                            booIsReference = booNameFound
                        End If
                    End If
                End If

                If booIsReference Then
                    Debug.Print "  " & objReferenceMatches(intIndex).Value _
                    & " (" & objReferenceMatches(intIndex).FirstIndex & ", " _
                    & objReferenceMatches(intIndex).Length & ")"
                End If
            Next intIndex
            Debug.Print

        End If
    Next

    Set objRegExp = Nothing
    Set objStringMatches = Nothing
    Set objReferenceMatches = Nothing
    Set objMatch = Nothing
    Set objCell = Nothing
    Set objName = Nothing

End Sub

Can anyone break or improve this? Without exhaustive documentation on Excel's formula syntax it is difficult to know if this is correct.

Thanks!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

微凉 2024-08-21 18:45:53

我来晚了几年，但我一直在寻找类似的东西，所以对此进行了深入研究。您使用的主要模式是这样的：

objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
& "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
& "|\$[a-z]{1,3}\:\$[a-z]{1,3}" _
& "|[a-z]{1,3}\:[a-z]{1,3}" _
& "|\$[0-9]{1,7}\:\$[0-9]{1,7}" _
& "|[0-9]{1,7}\:[0-9]{1,7}" _
& "|[a-z_\\][a-z0-9_\.]{0,254})"

基本上，您有六个用于范围引用的替代方案（第 3-8 行），其中任何一个都将自行生成匹配项，还有两个用于可选文件名/工作表名称前缀的替代方案（第 1-2 行））。

对于两个前缀替代方案，唯一的区别是第一个前缀用单引号引起来，在初始引号后有一个额外的点星。这些单引号主要在工作表名称中存在空格时出现。点星的目的是匹配初始单引号后的不受约束的文本，这一点尚不清楚，而且似乎会产生问题。我将在下面讨论这些问题。除此之外，两个替代前缀是相同的，我将它们统称为可选外部前缀 (OEP)。

OEP 有自己的两个可选前缀（任一选项都相同）。第一个是工作簿名称，括号中是一个开放式点星。

(\[.*\])?

第二个是“3D”单元格引用，两个工作表名称用冒号分隔；它是包含冒号的初始工作表名称。这里的模式是一个否定字符类，允许最多 31 个字符，除了正斜杠、反斜杠、问号、星号、方括号或冒号，后跟冒号：

([^\:\\\/\?\*\[\]]{1,31}\:)?

最后，对于 OEP 来说，它是唯一必需的部分：工作表名称，与可选工作表名称相同，但不带冒号。效果是（如果这些都正常工作）所需的工作表名称将尽可能匹配，然后仅当存在 3d 参考或其他先前的括号文本时，其可选前缀也会匹配。

工作簿/工作表名称前缀的问题：首先，第一行开头的点星包含过多。同样，工作表名称的否定字符类似乎需要其他字符，包括括号、逗号、加号、减号、等于和感叹号。否则，额外的材料将被解释为图纸名称的一部分。在我的测试中，这种过度包含发生在以下任何一个中：

=SUM(Sheet1!A1,Sheet2!A2)
=Sheet1!A1+Sheet2!A2
=Sheet1!A1-Sheet2!A2

工作表名称可以包含其中一些字符，因此考虑到这一点需要一些额外的措施。例如，一个工作表可以命名为“(Sheet1)”，给出一个奇怪的公式，例如：

=SUM('(Sheet1)'!A1:A2)

您希望获得带有工作表名称的内部括号，而不是外部括号。 Excel 在该单引号上添加单引号，就像在工作表名称中使用空格一样。然后，您可以在非单引号版本中排除括号，因为在单引号内是可以的。但要注意 Excel 似乎甚至允许在工作表名称中使用单引号。将这些命名怪癖发挥到极致，我刚刚成功地将一个工作表命名为“Hi'Sheet1'SUM('Sheet2'!A1,A2)!”。这很荒谬，但它指出了可能发生的事情。我在这样做时了解到，如果我在工作表名称中包含单引号，则公式会使用第二个单引号转义单引号。因此，引用我刚刚创建的工作表的 SUM(A1:A2) 最终看起来像这样：

=SUM('Hi''Sheet1''SUM(''Sheet2''!A1,A2)!'!A1:A2)

这实际上确实让我们对 Excel 解析器本身有了一些了解。我怀疑要充分处理这个问题，您可能需要单独（在正则表达式之外）将潜在的工作表名称或工作簿名称与实际的工作表名称进行比较，就像您对命名范围所做的那样。

这导致正则表达式中允许使用六种形式的单元格引用（其中任何一种，如果满足，都将产生匹配）：

1.) 具有行和列的单单元格或多单元格范围

"(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?"

此处的左括号已关闭在6个选项的最后。否则，此行允许在多单元格区域中进行“$A$1”、“A1”、“$A1”、“A$1”或这些类型的任意组合的基本单元格引用（“$A1:A$2 “， ETC。）。

2.) 仅具有绝对引用的全列或多列范围

"|\$[a-z]{1,3}\:\$[a-z]{1,3}"

此范围允许“$A:$B”类型的单元格引用，并且两者上都带有美元符号。请注意，仅一侧的美元符号将不匹配。

3.) 仅具有相对引用的全列或多列范围

"|[a-z]{1,3}\:[a-z]{1,3}"

此行与最后一行类似，但仅匹配没有美元符号的情况。请注意，仅一侧的美元符号此处也不匹配。

4.) 仅具有绝对引用的全行或多行范围

"|\$[0-9]{1,7}\:\$[0-9]{1,7}"

此行允许“$1:$2”类型的单元格引用，并且两者上都带有美元符号。

5.) 仅具有相对引用的全行或多行范围

"|[0-9]{1,7}\:[0-9]{1,7}"

此版本与上一个版本类似，但仅匹配没有美元符号的情况。

6.) 可以是命名范围的其他文本

 "|[a-z_\\][a-z0-9_\.]{0,254})"

最后，第六个选项允许文本。该文本稍后将与子中的实际命名范围进行比较。

我在这里看到的主要遗漏是具有绝对和相对引用的范围，类型为“A:$A”或“1:$1”。虽然 $A:A 由于包含“A:A”而被捕获，但不会捕获“A:$A”。您可以通过组合 2 和 3 以及将 4 和 5 与可选的美元符号组合来解决这个问题并简化正则表达式：

objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
& "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
& "|\$?[a-z]{1,3}\:\$?[a-z]{1,3}" _
& "|\$?[0-9]{1,7}\:\$?[0-9]{1,7}" _
& "|[a-z_\\][a-z0-9_\.]{0,254})"

进一步组合这些似乎会遇到一切都是可选的问题。

另一个问题是匹配字符串的初始正则表达式模式，您可以使用该模式来删除落在带引号的字符串内的潜在范围：
objRegExp.Pattern = "\"".*\"""
当我在公式开头和结尾都有字符串的公式上测试这一点时，点星的贪婪捕获了从初始引用到最终引用的所有内容（换句话说，它将整个公式解释为一个大引用字符串，即使中间有非绳材料）。看来您可以通过使点星变得懒惰（在其后添加问号）来解决此问题。这引发了关于引号内引号的问题，但它们可能不是问题。例如，我测试了这个公式：

="John loves his A1 steak sauce, but said the ""good A1 steak sauce price"" is $" & A2+A3 & " less than the ""bad price"" of $" & A4 & "."

插入单元格值后，该公式的计算结果为：

约翰喜欢他的 A1 牛排酱，但他说“好的 A1 牛排酱”
价格”比“糟糕价格”8 美元低 5 美元。

将惰性修饰符添加到字符串模式后，上面的两个版本“A1”都被识别为出现在字符串中，因此被删除，而 A2、A3 和 A4 则被删除。我确信我上面的

一些语言存在一些技术问题，但希望分析仍然有用。

I'm a few years late here, but I was looking for something similar and so dug into this. The main pattern you use is this:

objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
& "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
& "|\$[a-z]{1,3}\:\$[a-z]{1,3}" _
& "|[a-z]{1,3}\:[a-z]{1,3}" _
& "|\$[0-9]{1,7}\:\$[0-9]{1,7}" _
& "|[0-9]{1,7}\:[0-9]{1,7}" _
& "|[a-z_\\][a-z0-9_\.]{0,254})"

Basically you have six alternatives for a range reference (lines 3-8), any of which will produce a match by itself, with two alternatives for an optional filename/sheet name prefix (lines 1-2).

For the two prefix alternatives, the only difference is that the first is wrapped in single quotes, with an extra dot star after the initial quote. These single quotes occur mainly when there is a space in a sheet name. The purpose of the dot star, matching unconstrained text after an initial single quote, is unclear and it appears to create problems. I'll discuss those problems below. Besides that the two alternative prefixes are the same, and I'll refer to them collectively as the Optional External Prefix (OEP).

The OEP has its own two optional prefixes (the same in either alternative). The first is for the workbook name, an open-ended dot star in brackets.

(\[.*\])?

The second is for a "3D" cell reference, with two sheet names separated by a colon; it is the initial sheet name including the colon. The pattern here is a negated character class allowing up to 31 characters of anything except forward slash, back slash, question mark, asterisk, brackets, or colon, followed by a colon:

([^\:\\\/\?\*\[\]]{1,31}\:)?

Finally for the OEP is its only required part: a sheet name, same as the optional sheet name but with no colon. The effect is (if these all worked correctly) that the required sheet name will match if it can, and then only if there is a 3d reference or additional prior bracketed text will its optional prefixes also match.

Issues with the Workbook/Sheet name prefix: First, the dot star at the beginning of the first line is over-inclusive. Similarly, the negated character class for the sheet name appears to need additional characters including parens, comma, plus, minus, equals, and bang. Otherwise, extra material is interpreted as part of the sheet name. On my testing, this overinclusion happened with any of these:

=SUM(Sheet1!A1,Sheet2!A2)
=Sheet1!A1+Sheet2!A2
=Sheet1!A1-Sheet2!A2

Sheet names can include some of these characters, so accounting for that would require some additional measure. For instance, a sheet could be named "(Sheet1)", giving an odd formula like:

=SUM('(Sheet1)'!A1:A2)

You'd like to get the inner parens with the sheet name there, but not the outer paren. Excel puts the single quotes on that one, as it would with a space in the sheet name. You could then exclude parens in the non-single quote version since within the single quote it's ok. But then beware Excel seems to even allow single quotes in sheet names. Taking these naming quirks to the extreme, I just successfully named a sheet "Hi'Sheet1'SUM('Sheet2'!A1,A2)!". That's absurd but it points to what could happen. I learned in doing this that if I include a single quote in a sheet name, formulas escape the single quote with a second single quote. So a SUM(A1:A2) referring to the sheet I just created ends up looking like this:

=SUM('Hi''Sheet1''SUM(''Sheet2''!A1,A2)!'!A1:A2)

That actually does give some insight into the Excel parser itself. I suspect to adequately deal with this you may want separately (outside the regex) to compare the potential sheet names or workbook names to the actual sheet names, as you have done with the named ranges.

This leads to the six forms of cell references allowed in the regex (any one of which, if met, will produce a match):

1.) A one-cell or multi-cell range with rows and columns

"(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?"

The open paren here is closed at the end of the 6 options. Otherwise, this line allows a basic cell reference of the type "$A$1", "A1", "$A1", "A$1", or any combination of these in a multi-cell range ("$A1:A$2", etc.).

2.) A full-column or multi-column range with absolute references only

"|\$[a-z]{1,3}\:\$[a-z]{1,3}"

This one allows a cell reference of the type "$A:$B" with a dollar sign on both. Note a dollar sign on only one side will not match.

3.) A full-column or multi-column range with relative references only

"|[a-z]{1,3}\:[a-z]{1,3}"

This line is like the last, but matches only with no dollar signs. Note a dollar sign on only one side will not match here either.

4.) A full-row or multi-row range with absolute references only

"|\$[0-9]{1,7}\:\$[0-9]{1,7}"

This line allows a cell reference of the type "$1:$2" with a dollar sign on both.

5.) A full-row or multi-row range with relative references only

"|[0-9]{1,7}\:[0-9]{1,7}"

This version is like the last, but matches only with no dollar signs.

6.) Other text that could be a named range

 "|[a-z_\\][a-z0-9_\.]{0,254})"

Finally, the sixth option allows text. This text is compared to actual named ranges later in sub.

The main omission that I see here is ranges that have both absolute and relative references, of the type "A:$A" or "1:$1". While $A:A is captured because it includes "A:A", "A:$A" is not captured. You could address this and simplify the regex by combining 2 and 3 and combining 4 and 5 with optional dollar signs:

objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
& "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
& "|\$?[a-z]{1,3}\:\$?[a-z]{1,3}" _
& "|\$?[0-9]{1,7}\:\$?[0-9]{1,7}" _
& "|[a-z_\\][a-z0-9_\.]{0,254})"

Combining these further would seem to come up against the everything-is-optional problem.

One other issue is in the initial regex pattern for matching strings, which you use to expunge potential ranges that fall inside a quoted string:
objRegExp.Pattern = "\"".*\"""
When I test this on a formula with a string at the beginning and end of a formula, the greediness of the dot star captures everything from the initial quote to the final quote (in other words it interprets the entire formula as one big quoted string, even though there is non-string material in the middle). It appears you can fix this by making the dot star lazy (adding a question mark after it). That raises questions about quotes within quotes, but they may not be a problem. For instance, I tested this formula:

="John loves his A1 steak sauce, but said the ""good A1 steak sauce price"" is $" & A2+A3 & " less than the ""bad price"" of $" & A4 & "."

With cell values plugged in, this formula evaluates to:

John loves his A1 steak sauce, but said the "good A1 steak sauce
price" is $5 less than the "bad price" of $8.

With the lazy modifier added to your string pattern, both versions of "A1" above were recognized as occurring within a string and so were expunged, while A2, A3 and A4 were recognized as cell references.

I'm sure there are some technical issues with some of my language above, but hopefully the analysis is still useful.

回复收藏 0 原文

删除会话 2024-08-21 18:45:53

jtolle 引导我走向正确的方向。据我所知，这就是我试图做的事情。我一直在测试，它似乎有效。

stringOriginFormula = rangeOrigin.Formula
rangeOrigin.Cut rangeDestination
rangeOrigin.Formula = stringOriginFormula

谢谢杰托勒！

jtolle steered me in the right direction. As far as I can tell, this is what I was trying to do. I've been testing and it seems to work.

stringOriginFormula = rangeOrigin.Formula
rangeOrigin.Cut rangeDestination
rangeOrigin.Formula = stringOriginFormula

Thanks jtolle!

回复收藏 0 原文

卖梦商人 2024-08-21 18:45:53

谢谢 Ben（我是新来的，尽管 Stackoverflow 多年来因其高质量的技术内容而引起了我的注意，所以我不确定我是否正确地为作者 J 阅读了此页面）

我尝试了发布的解决方案（测试，测试更新，以及使用 range.precendents 的测试（正确指出，不涵盖对其他工作表或其他工作簿的引用）并发现一个小缺陷：仅当外部工作表名称包含在“单引号”中时它是一个数字；如果它包含空格（可能还包含原始帖子中列出的 Ben (?) 等其他字符），只需在正则表达式中添加一个简单的内容（打开 [）即可纠正（添加“[”，请参阅下面的代码）此外，出于我自己的目的，我将 sub 转换为一个函数，该函数将返回一个删除重复项的逗号分隔列表（注意，这仅删除相同的引用符号，而不是包含在多个范围中的单元格）：

Public Function CellReflist(Optional r As Range)  ' single cell
Dim result As Object: Dim testExpression As String: Dim objRegEx As Object
If r Is Nothing Then Set r = ActiveCell ' Cells(1, 2)  ' INPUT THE CELL HERE , e.g.    RANGE("A1")
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.IgnoreCase = True: objRegEx.Global = True: objRegEx.Pattern = """.*?"""    ' remove expressions
testExpression = CStr(r.Formula)
testExpression = objRegEx.Replace(testExpression, "")
'objRegEx.Pattern = "(([A-Z])+(\d)+)"  'grab the address

objRegEx.Pattern = "(['\[].*?['!])?([[A-Z0-9_]+[!])?(\$?[A-Z]+\$?(\d)+(:\$?[A-Z]+\$?(\d)+)?|\$?[A-Z]+:\$?[A-Z]+|(\$?[A-Z]+\$?(\d)+))"
If objRegEx.Test(testExpression) Then
    Set result = objRegEx.Execute(testExpression)
    If result.Count > 0 Then CellReflist = result(0).Value
    If result.Count > 1 Then
        For i = 1 To result.Count - 1 'Each Match In result
            dbl = False ' poistetaan tuplaesiintymiset
            For j = 0 To i - 1
                If result(i).Value = result(j).Value Then dbl = True
            Next j
            If Not dbl Then CellReflist = CellReflist & "," & result(i).Value 'Match.Value
        Next i 'Match
    End If
End If

End Function

Thanks Ben (I'm new to post here, even though Stackoverflow has caught my attention for years for high quality technical stuff, so I'm not sure if I read this page correctly for the author J)

I tried the posted solutions (testing, testing updated, as well as the one using range.precendents (which as correctly pointed, does not cover references to other sheets or other workbooks) and found a minor flaw: the external sheet name is enclosed in 'single quotation marks' only if it is a number; if it contains space (and possibly other characters as Ben (?) listed in the orginal post. with a simple addition to the regEx (opening [) this can be corrected (added "[", see code below). In addition, for my own purpose I converted the sub to a function that will return a comma-separated list with duplicates removed (note, this removes just identical reference notation, not cells that are included in multiple ranges):

Public Function CellReflist(Optional r As Range)  ' single cell
Dim result As Object: Dim testExpression As String: Dim objRegEx As Object
If r Is Nothing Then Set r = ActiveCell ' Cells(1, 2)  ' INPUT THE CELL HERE , e.g.    RANGE("A1")
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.IgnoreCase = True: objRegEx.Global = True: objRegEx.Pattern = """.*?"""    ' remove expressions
testExpression = CStr(r.Formula)
testExpression = objRegEx.Replace(testExpression, "")
'objRegEx.Pattern = "(([A-Z])+(\d)+)"  'grab the address

objRegEx.Pattern = "(['\[].*?['!])?([[A-Z0-9_]+[!])?(\$?[A-Z]+\$?(\d)+(:\$?[A-Z]+\$?(\d)+)?|\$?[A-Z]+:\$?[A-Z]+|(\$?[A-Z]+\$?(\d)+))"
If objRegEx.Test(testExpression) Then
    Set result = objRegEx.Execute(testExpression)
    If result.Count > 0 Then CellReflist = result(0).Value
    If result.Count > 1 Then
        For i = 1 To result.Count - 1 'Each Match In result
            dbl = False ' poistetaan tuplaesiintymiset
            For j = 0 To i - 1
                If result(i).Value = result(j).Value Then dbl = True
            Next j
            If Not dbl Then CellReflist = CellReflist & "," & result(i).Value 'Match.Value
        Next i 'Match
    End If
End If

End Function

回复收藏 0 原文

梦里的微风 2024-08-21 18:45:53

我在 Google 表格中解决了类似的问题。

以下内容从公式中添加/减去行引用。因为我只需要更新行引用，而不是提取公式，所以我只是使用此 /((?<=[A-Za-z\$:\!])\d+( ?![A-Za-z\(!]))|(\d+(?=[:]))/

String.prototype.replaceAt = function(index, replacement, diff = 0) {
    let end = this.substr(index + replacement.length + diff)
    if((this.length - 1) === index) end = ""
    return this.substr(0, index) + replacement + end;
}
// Ref: https://stackoverflow.com/a/1431113/2319414

/**
 * @param row - positive integer to add, negative to subtract rows.
 */
function updateRowReference(formula, row){

  let masked = formula
  const mask = "#"

  // masking double quotes in string literals
  let exp = /""/g
  let result;
  while((result = exp.exec(masked)) !== null){
    masked = masked.replaceAt(result.index, new Array(result[0].length).fill(mask).join(""))
  }
  // masking string literals
  exp = /\"([^\\\"]|\\.)*\"/g
  // Ref: https://stackoverflow.com/a/9260547
  while((result = exp.exec(masked)) !== null){
    masked = masked.replaceAt(result.index, new Array(result[0].length).fill(mask).join(""))
  }

  // updating row references
  const sRow = row.toString()
  // The magic is happening here
  // Just matching a number which is part of range address
  exp = /((?<=[A-Za-z\$:\!])\d+(?![A-Za-z\(!]))|(\d+(?=[:]))/g
  while((result = exp.exec(masked)) !== null){
    const oldRow = Number(result[0])
    // adding/subtracting rows
    const newRow = (row + oldRow).toString()
    
    // preserving formula string length integrity if number of digits of new row is different than old row
    const diff = result[0].length - newRow.length
    masked = masked.replaceAt(result.index, newRow, diff)
    formula = formula.replaceAt(result.index, newRow, diff)
    exp.lastIndex -= diff
  }

  let updated = masked;

  // revert mask
  const array = formula.split("")
  while((result = updated.search(mask)) !== -1){
    updated = updated.replaceAt(result, array[result])
  }

  return updated
}

function test(){
  const cases = [
    "=$A$1", 
    "=A1", 
    "=$A1", 
    "=A$1", 
    "=$A1:B$1", 
    "=1:1", 
    "=Sheet1!1:1", 
    "=Sheet1!$A1:B$1", 
    "=Sheet1!A$1",
    '=IF(AND($C6 <> ""; NOT(ISBLANK(B$6))); IF(SUM(FILTER($F$6:$F$7;$C$6:$C$7 = $C6)) < $G6; 1; IF($E6 = 0; 1; 0)); 0)',
    "=$A$111", "=A111", "=$A111", "=A$111", "=$A111:B$111", 
    "=111:111", 
    "=Sheet1!111:111", 
    "=Sheet1!$A111:B$111", 
    "=Sheet1!A$111",
    '=IF(AND($C111 <> ""; NOT(ISBLANK(B$111))); IF(SUM(FILTER($F$111:$F$112;$C$111:$C$112 = $C111)) < $G111; 1; IF($E111 = 0; 1; 0)); 0)',

    // if string literals have addresses they shouldn't be affected
    '=IF(AND($C111 <> "A1 $A1 $A1:B$1";$C111 <> "Sheet1!1:1";$C111 <> "Sheet1!$A1:B$1"); 1 , 0)'
  ]

  const expectedAdd = [
    '=$A$16',
    '=A16',
    '=$A16',
    '=A$16',
    '=$A16:B$16',
    '=16:16',
    '=Sheet1!16:16',
    '=Sheet1!$A16:B$16',
    '=Sheet1!A$16',
    '=IF(AND($C21 <> ""; NOT(ISBLANK(B$21))); IF(SUM(FILTER($F$21:$F$22;$C$21:$C$22 = $C21)) < $G21; 1; IF($E21 = 0; 1; 0)); 0)',
    '=$A$126',
    '=A126',
    '=$A126',
    '=A$126',
    '=$A126:B$126',
    '=126:126',
    '=Sheet1!126:126',
    '=Sheet1!$A126:B$126',
    '=Sheet1!A$126',
    '=IF(AND($C126 <> ""; NOT(ISBLANK(B$126))); IF(SUM(FILTER($F$126:$F$127;$C$126:$C$127 = $C126)) < $G126; 1; IF($E126 = 0; 1; 0)); 0)',
    '=IF(AND($C126 <> "A1 $A1 $A1:B$1";$C126 <> "Sheet1!1:1";$C126 <> "Sheet1!$A1:B$1"); 1 , 0)' 
    ]

  let results = cases.map(_case => updateRowReference(_case, 15))

  console.log('Test Add')
  console.log(results.every((result, i) => result === expectedAdd[i]))

  console.log('Test Subtract')
  results = results.map(_case => updateRowReference(_case, -15))
  console.log(results.every((result, i) => result === cases[i]))
}

test()

地址为字符串的“INDIRECT”函数将不会更新

I resolved a similar problem in Google Sheets.

The following adds/subtract row references from a formula. Because I just needed to update row references, rather than extracting the formula I just extracted and updated the row reference with this /((?<=[A-Za-z\$:\!])\d+(?![A-Za-z\(!]))|(\d+(?=[:]))/

String.prototype.replaceAt = function(index, replacement, diff = 0) {
    let end = this.substr(index + replacement.length + diff)
    if((this.length - 1) === index) end = ""
    return this.substr(0, index) + replacement + end;
}
// Ref: https://stackoverflow.com/a/1431113/2319414

/**
 * @param row - positive integer to add, negative to subtract rows.
 */
function updateRowReference(formula, row){

  let masked = formula
  const mask = "#"

  // masking double quotes in string literals
  let exp = /""/g
  let result;
  while((result = exp.exec(masked)) !== null){
    masked = masked.replaceAt(result.index, new Array(result[0].length).fill(mask).join(""))
  }
  // masking string literals
  exp = /\"([^\\\"]|\\.)*\"/g
  // Ref: https://stackoverflow.com/a/9260547
  while((result = exp.exec(masked)) !== null){
    masked = masked.replaceAt(result.index, new Array(result[0].length).fill(mask).join(""))
  }

  // updating row references
  const sRow = row.toString()
  // The magic is happening here
  // Just matching a number which is part of range address
  exp = /((?<=[A-Za-z\$:\!])\d+(?![A-Za-z\(!]))|(\d+(?=[:]))/g
  while((result = exp.exec(masked)) !== null){
    const oldRow = Number(result[0])
    // adding/subtracting rows
    const newRow = (row + oldRow).toString()
    
    // preserving formula string length integrity if number of digits of new row is different than old row
    const diff = result[0].length - newRow.length
    masked = masked.replaceAt(result.index, newRow, diff)
    formula = formula.replaceAt(result.index, newRow, diff)
    exp.lastIndex -= diff
  }

  let updated = masked;

  // revert mask
  const array = formula.split("")
  while((result = updated.search(mask)) !== -1){
    updated = updated.replaceAt(result, array[result])
  }

  return updated
}

function test(){
  const cases = [
    "=$A$1", 
    "=A1", 
    "=$A1", 
    "=A$1", 
    "=$A1:B$1", 
    "=1:1", 
    "=Sheet1!1:1", 
    "=Sheet1!$A1:B$1", 
    "=Sheet1!A$1",
    '=IF(AND($C6 <> ""; NOT(ISBLANK(B$6))); IF(SUM(FILTER($F$6:$F$7;$C$6:$C$7 = $C6)) < $G6; 1; IF($E6 = 0; 1; 0)); 0)',
    "=$A$111", "=A111", "=$A111", "=A$111", "=$A111:B$111", 
    "=111:111", 
    "=Sheet1!111:111", 
    "=Sheet1!$A111:B$111", 
    "=Sheet1!A$111",
    '=IF(AND($C111 <> ""; NOT(ISBLANK(B$111))); IF(SUM(FILTER($F$111:$F$112;$C$111:$C$112 = $C111)) < $G111; 1; IF($E111 = 0; 1; 0)); 0)',

    // if string literals have addresses they shouldn't be affected
    '=IF(AND($C111 <> "A1 $A1 $A1:B$1";$C111 <> "Sheet1!1:1";$C111 <> "Sheet1!$A1:B$1"); 1 , 0)'
  ]

  const expectedAdd = [
    '=$A$16',
    '=A16',
    '=$A16',
    '=A$16',
    '=$A16:B$16',
    '=16:16',
    '=Sheet1!16:16',
    '=Sheet1!$A16:B$16',
    '=Sheet1!A$16',
    '=IF(AND($C21 <> ""; NOT(ISBLANK(B$21))); IF(SUM(FILTER($F$21:$F$22;$C$21:$C$22 = $C21)) < $G21; 1; IF($E21 = 0; 1; 0)); 0)',
    '=$A$126',
    '=A126',
    '=$A126',
    '=A$126',
    '=$A126:B$126',
    '=126:126',
    '=Sheet1!126:126',
    '=Sheet1!$A126:B$126',
    '=Sheet1!A$126',
    '=IF(AND($C126 <> ""; NOT(ISBLANK(B$126))); IF(SUM(FILTER($F$126:$F$127;$C$126:$C$127 = $C126)) < $G126; 1; IF($E126 = 0; 1; 0)); 0)',
    '=IF(AND($C126 <> "A1 $A1 $A1:B$1";$C126 <> "Sheet1!1:1";$C126 <> "Sheet1!$A1:B$1"); 1 , 0)' 
    ]

  let results = cases.map(_case => updateRowReference(_case, 15))

  console.log('Test Add')
  console.log(results.every((result, i) => result === expectedAdd[i]))

  console.log('Test Subtract')
  results = results.map(_case => updateRowReference(_case, -15))
  console.log(results.every((result, i) => result === cases[i]))
}

test()

'INDIRECT' function with addresses as strings will not be updated

回复收藏 0 原文

~没有更多了~

关于作者

流年里的时光

暂无简介

0 文章

0 评论

22 人气

关注发私信

友情链接

文江博客

这是用于匹配 Excel 公式中任何单元格引用的正则表达式吗？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（4）

1.) 具有行和列的单单元格或多单元格范围

2.) 仅具有绝对引用的全列或多列范围

3.) 仅具有相对引用的全列或多列范围

4.) 仅具有绝对引用的全行或多行范围

5.) 仅具有相对引用的全行或多行范围

6.) 可以是命名范围的其他文本

1.) A one-cell or multi-cell range with rows and columns

2.) A full-column or multi-column range with absolute references only

3.) A full-column or multi-column range with relative references only

4.) A full-row or multi-row range with absolute references only

5.) A full-row or multi-row range with relative references only

6.) Other text that could be a named range

关于作者

相关话题

热门标签

推荐作者

留蓝

18790681156

zach7772

Wini

ayeshaaroy

初雪

友情链接

这是用于匹配 Excel 公式中任何单元格引用的正则表达式吗？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（4）

1.) 具有行和列的单单元格或多单元格范围

2.) 仅具有绝对引用的全列或多列范围

3.) 仅具有相对引用的全列或多列范围

4.) 仅具有绝对引用的全行或多行范围

5.) 仅具有相对引用的全行或多行范围

6.) 可以是命名范围的其他文本

1.) A one-cell or multi-cell range with rows and columns

2.) A full-column or multi-column range with absolute references only

3.) A full-column or multi-column range with relative references only

4.) A full-row or multi-row range with absolute references only

5.) A full-row or multi-row range with relative references only

6.) Other text that could be a named range

关于作者

相关话题

热门标签

推荐作者

留蓝

18790681156

zach7772

Wini

ayeshaaroy

初雪

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。