Regex: trying to improve this regular expression

Posted on 2024-12-20 20:21:30 · Length 1800 · Views 3 · Comments 0

I am using this regex :

[']?[%]?[^"]#([^#]*)#[%]?[']?

on this text:

insert into table (id,name,age) values ('#var1#' ,#var2#,'#var3#', 3, 'name') where id = '#id#' like "" 
and test=<cfqueryparam value="#id#">

For some reason it is catching the comma between #var2# and '#var3#',
but when I include a [^,] it starts doing weird stuff.
Can someone help me with this one?

As I read my regex now, it should find anything that:

  • might have a single quote
  • might have a percentage
  • doesn't have a double quote
  • then has a hash (#)
  • followed by no hash, but all other characters
  • then has a hash and followed by a percentage or quote

So why does the regex break when I add "no comma" in front?


Updated Question:

Okay, I'll try to explain. A query can look like this:

SELECT  e.*, m.man_id, m.man_title, c.cat_id, c.cat_name
FROM    ec_products e, ec_categories c, ec_manufacturers m
WHERE   c.cat_id = e.prod_category AND
        e.prod_manufacturer = m.man_id AND
        e.prod_title LIKE <cfqueryparam value="%#attributes.keyword#%"> and
test='#var1#'
ORDER BY e.prod_title  

Now I want every value between ##, but not the values that are surrounded by a queryparam tag. So in the example I do want #var1# but not #attributes.keyword#. The reason is that any param in the query that is not wrapped in such a tag is unsafe and can cause SQL injection. My current regex is

(?!")'?%?#(?!\d)[\w.\(\)]+#%?'?(?!")

and it is almost there. It still finds attributes.keyword, because of the %. I just want anything that has ## but is not surrounded by double quotes, so not "##". This will give me all unsafe params in the SQL, like '#var#', or #aNumber#, or '%##', or '%##%', or '##%', but NOT things like

<cfqueryparam value="#variable#">

I hope that makes my intention clear.
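The behaviour described above can be reproduced with a minimal sketch. Python's re module is used here only as a stand-in, on the assumption that ColdFusion treats these constructs the same way, and the variable names are illustrative. The leading (?!") only inspects the character at the position where a match attempt starts, so nothing stops the engine from starting the match at the %, one character after the double quote.

```python
import re

# The two lines of the query that contain #...# values.
QUERY = """e.prod_title LIKE <cfqueryparam value="%#attributes.keyword#%"> and
test='#var1#'
"""

# The asker's current pattern, copied verbatim.
CURRENT = re.compile(r"""(?!")'?%?#(?!\d)[\w.\(\)]+#%?'?(?!")""")

# The match attempt that starts AT the double quote is rejected by (?!"),
# but the next attempt starts at '%', where (?!") is satisfied.
matches = CURRENT.findall(QUERY)
print(matches)
```

In this sketch the protected value is reported as %#attributes.keyword# (without its trailing %, because the final lookahead forces the engine to give that character back).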

Comments (3)

月寒剑心 2024-12-27 20:21:30

I think you might be misunderstanding [^"]. It doesn't mean "doesn't have a double quote", but rather means, "one character, which is not a double-quote". Similarly, [^,] means "one character, which is not a comma". So your regex:

[']?[%]?[^"]#([^#]*)#[%]?[']?

will match — for example — this:

2#,'#

which consists of zero single-quotes, zero percent-signs, one character-which-is-not-a-double-quote (namely 2), one hash-sign, two characters-which-are-not-hash-signs (namely ,'), one hash-sign, zero percent-signs, and zero apostrophes. The ,' is what will be captured by the parentheses.
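The point can be checked with a short sketch. Python's re module is assumed here as a stand-in for the ColdFusion engine (character classes and quantifiers behave the same way in both, as far as I know), and the names ORIGINAL and SAMPLE are illustrative.

```python
import re

# The asker's original pattern, copied verbatim.
ORIGINAL = re.compile(r"""[']?[%]?[^"]#([^#]*)#[%]?[']?""")

# The sample text from the question.
SAMPLE = (
    "insert into table (id,name,age) values ('#var1#' ,#var2#,'#var3#', 3, 'name') "
    "where id = '#id#' like \"\" \n"
    'and test=<cfqueryparam value="#id#">'
)

# [^"] must consume exactly one character, so "2" is swallowed by it
# and the capture group receives the comma and the quote.
captured = ORIGINAL.fullmatch("2#,'#").group(1)

# Scanning the whole sample: the character in front of each '#' is always
# part of the match, which is how the comma ends up in ",#var2#".
matches = [hit.group(0) for hit in ORIGINAL.finditer(SAMPLE)]

print(repr(captured))
print(matches)
```

In a left-to-right scan the comma shows up in front of #var2# because [^"] has to match something there. Adding [^,] would only require one more non-comma character; it would not forbid commas elsewhere.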


Update for updated question:

I don't think that what you describe is possible using just a ColdFusion regex, because it would require "lookbehind" (to ensure that something is not preceded by a double-quote), which ColdFusion regexes apparently (according to a Google-search) do not support. However:

  • This StackOverflow answer gives a way of using Java regexes in ColdFusion. If you use that technique, then you can use the Java regex
    '?%?(?<!")(?<!"')(?<!"%)(?<!"'%)#(?!\d)[\w.()]+#(?!%?'?")%?'?

    to ensure that there's no preceding double-quote.

  • You never mentioned how you're actually using this regex. Would it work for you to match
    .'?%?#(?!\d)[\w.()]+#%?'?(?!")

    (i.e., to match not just the section of interest, but also the preceding character), and then separately confirm that the matched substring doesn't start with a double-quote?
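Both suggestions can be tried out in a small sketch. It is written in Python rather than ColdFusion or Java, on the assumption that the fixed-width lookbehinds used here behave the same way in Python's re module as in java.util.regex. The function names and the QUERY constant are hypothetical.

```python
import re

# The query from the updated question.
QUERY = """SELECT  e.*, m.man_id, m.man_title, c.cat_id, c.cat_name
FROM    ec_products e, ec_categories c, ec_manufacturers m
WHERE   c.cat_id = e.prod_category AND
        e.prod_manufacturer = m.man_id AND
        e.prod_title LIKE <cfqueryparam value="%#attributes.keyword#%"> and
test='#var1#'
ORDER BY e.prod_title
"""

# Approach 1: the lookbehind pattern from the answer, copied verbatim.
LOOKBEHIND = re.compile(
    r"""'?%?(?<!")(?<!"')(?<!"%)(?<!"'%)#(?!\d)[\w.()]+#(?!%?'?")%?'?"""
)

# Approach 2: match one extra leading character, then filter in code.
WITH_PREFIX = re.compile(r""".'?%?#(?!\d)[\w.()]+#%?'?(?!")""")


def unsafe_params_lookbehind(sql):
    """Return #...# values that are not directly preceded by a double quote."""
    return LOOKBEHIND.findall(sql)


def unsafe_params_prefix_filter(sql):
    """Return hits (including the extra leading character) that do not
    start with a double quote."""
    return [hit for hit in WITH_PREFIX.findall(sql) if not hit.startswith('"')]


print(unsafe_params_lookbehind(QUERY))
print(unsafe_params_prefix_filter(QUERY))
```

Note that the second approach returns each hit together with the extra leading character matched by the dot (here the = sign), so the caller would still need to strip it.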

I also feel compelled to mention, since it sounds like you're trying to use regex-based pattern-matching to help detect and address points of possible SQL injection, that this is a bad idea; you will never be able to do this perfectly, so if anything, I think it will end up increasing your risk of SQL injection (by increasing your reliance on a buggy methodology).

铁轨上的流浪者 2024-12-27 20:21:30

Preserving your capture group from the initial regex, here is a revised expression.

'?%?(?!")#([^#]+)#%?'?

岁月无声 2024-12-27 20:21:30

Based on the information you provided this should be correct.

'?%?(?!")#[^#]+#%?'?
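
A quick check of this pattern against the sample text from the question, sketched in Python as a stand-in for the ColdFusion engine, suggests a caveat. The (?!") lookahead is evaluated immediately before the #, so the next character there is always # and never a double quote. Under that reading, the value inside the cfqueryparam tag is still reported. The names SUGGESTED and SAMPLE are illustrative.

```python
import re

# The suggested pattern, copied verbatim.
SUGGESTED = re.compile(r"""'?%?(?!")#[^#]+#%?'?""")

# The sample text from the question.
SAMPLE = (
    "insert into table (id,name,age) values ('#var1#' ,#var2#,'#var3#', 3, 'name') "
    "where id = '#id#' like \"\" \n"
    'and test=<cfqueryparam value="#id#">'
)

# The last hit comes from value="#id#": the match simply starts at '#',
# one character after the double quote, where (?!") is trivially satisfied.
matches = SUGGESTED.findall(SAMPLE)
print(matches)
```

The comma is no longer captured, but excluding the double-quoted case appears to need a lookbehind or a separate check on the preceding character.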