如何清除原始 sql 文件中的注释

发布于 2024-11-04 10:51:47 字数 675 浏览 6 评论 0原文

我在清理现有 sql 文件中的注释和空行时遇到问题。 该文件有超过 10k 行,因此无法手动清理它。

我有一个小 python 脚本,但我不知道如何处理多行插入中的注释。

代码:

f = file( 'file.sql', 'r' )
t = filter( lambda x: not x.startswith('--') \
            and not x.isspace() 
  , f.readlines() )
f.close()
t #<- here the cleaned data should be

它应该如何工作:

这应该被清理:

-- normal sql comment

这应该保持原样:

CREATE FUNCTION func1(a integer) RETURNS void
    LANGUAGE plpgsql
    AS $$
BEGIN
        -- comment
       [...]
END;
$$;

INSERT INTO public.texts (multilinetext) VALUES ('
and more lines here \'
-- part of text 
\'
[...]

');

I have problem with cleaning comments and empty lines from already existing sql file.
The file has over 10k lines so cleaning it manually is not an option.

I have a little python script, but I have no idea how to handle comments inside multi line inserts.

Code:

f = file( 'file.sql', 'r' )
t = filter( lambda x: not x.startswith('--') \
            and not x.isspace() 
  , f.readlines() )
f.close()
t #<- here the cleaned data should be

How it should work:

This should be cleaned:

-- normal sql comment

This should stay as it is:

CREATE FUNCTION func1(a integer) RETURNS void
    LANGUAGE plpgsql
    AS $
BEGIN
        -- comment
       [...]
END;
$;

INSERT INTO public.texts (multilinetext) VALUES ('
and more lines here \'
-- part of text 
\'
[...]

');

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

无可置疑 2024-11-11 10:51:52
# Remove comments i.e. lines beginning with whitespace and '--' (using multi-line flag)
re.sub('^\s*--.*\n?', '', query, flags=re.MULTILINE)

正则表达式字符串解释:

  • ^ 行开头
  • \s 空格
  • \s* 零个或多个空格字符
  • - 两个连字符(静态字符串模式)
  • 。* 零个或多个任何字符(即行的其余部分)
  • \n 换行符
  • ? end of string
  • flags = re.M 是多行修饰符

“指定后,模式字符 '^' 匹配字符串的开头和每行的开头(紧跟每个换行符)”

有关更多详细信息,请参阅 Python 正则表达式文档:

https://docs.python.org/3/library/re.html

# Remove comments i.e. lines beginning with whitespace and '--' (using multi-line flag)
re.sub('^\s*--.*\n?', '', query, flags=re.MULTILINE)

Regex string explained:

  • ^ start of line
  • \s whitespace
  • \s* zero or more whitespace characters
  • -- two hypens (static string pattern)
  • .* zero or more of any characters (i.e. the rest of the line)
  • \n newline character
  • ? end of string
  • flags = re.M is the multiline modifier

"When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline)"

See the Python regular expressions documentation for more details:

https://docs.python.org/3/library/re.html

飘过的浮云 2024-11-11 10:51:52

可以使用正则表达式来做到这一点。首先,您必须按字符串拆分文件,然后您可以按注释拆分文件。下面的 Perl 程序可以做到这一点:

#! /usr/bin/perl -w

# Read hole file.
my $file = join ('', <>);

# Split by strings including the strings.
my @major_parts = split (/('(?:[^'\\]++|\\.)*+')/, $file);

foreach my $part (@major_parts) {
    if ($part =~ /^'/) {
        # Print the part if it is a string.
        print $part; 
    }
    else {
        # Split by comments removing the comments
        my @minor_parts = split (/^--.*$/m, $part);
        # Print the remaining parts.
        print join ('', @minor_parts);
    }
}

It is possible to do it with regular expressions. First you have to split the file by strings and after this you can split the file by comments. The following Perl program does it:

#! /usr/bin/perl -w

# Read hole file.
my $file = join ('', <>);

# Split by strings including the strings.
my @major_parts = split (/('(?:[^'\\]++|\\.)*+')/, $file);

foreach my $part (@major_parts) {
    if ($part =~ /^'/) {
        # Print the part if it is a string.
        print $part; 
    }
    else {
        # Split by comments removing the comments
        my @minor_parts = split (/^--.*$/m, $part);
        # Print the remaining parts.
        print join ('', @minor_parts);
    }
}
蓝戈者 2024-11-11 10:51:51

这是示例偏差答案的扩展,适用于您的示例:

import sqlparse

sql_example = """--comment
SELECT * from test;
INSERT INTO test VALUES ('
-- test
a
');
"""

new_sql = []

for statement in sqlparse.parse(sql_example):
    new_tockens = [stm for stm in statement.tokens 
                   if not isinstance(stm, sqlparse.sql.Comment)]

    new_statement = sqlparse.sql.TokenList(new_tockens)
    new_sql.append(new_statement.to_unicode())

print sqlparse.format("\n".join(new_sql))

输出:

SELECT * from test;

INSERT INTO test VALUES ('
-- test
a
');

This is an extend of samplebias answer that work with your example :

import sqlparse

sql_example = """--comment
SELECT * from test;
INSERT INTO test VALUES ('
-- test
a
');
"""

new_sql = []

for statement in sqlparse.parse(sql_example):
    new_tockens = [stm for stm in statement.tokens 
                   if not isinstance(stm, sqlparse.sql.Comment)]

    new_statement = sqlparse.sql.TokenList(new_tockens)
    new_sql.append(new_statement.to_unicode())

print sqlparse.format("\n".join(new_sql))

Output:

SELECT * from test;

INSERT INTO test VALUES ('
-- test
a
');
深巷少女 2024-11-11 10:51:50

尝试使用 sqlparse 模块。

更新的示例:在插入值内留下注释,并在 CREATE FUNCTION 块内留下注释。您可以进一步调整以调整行为:

import sqlparse
from sqlparse import tokens

queries = '''
CREATE FUNCTION func1(a integer) RETURNS void
    LANGUAGE plpgsql
        AS $
        BEGIN
                -- comment
       END;
       $;
SELECT -- comment
* FROM -- comment
TABLE foo;
-- comment
INSERT INTO foo VALUES ('a -- foo bar');
INSERT INTO foo
VALUES ('
a 
-- foo bar'
);

'''

IGNORE = set(['CREATE FUNCTION',])  # extend this

def _filter(stmt, allow=0):
    ddl = [t for t in stmt.tokens if t.ttype in (tokens.DDL, tokens.Keyword)]
    start = ' '.join(d.value for d in ddl[:2])
    if ddl and start in IGNORE:
        allow = 1
    for tok in stmt.tokens:
        if allow or not isinstance(tok, sqlparse.sql.Comment):
            yield tok

for stmt in sqlparse.split(queries):
    sql = sqlparse.parse(stmt)[0]
    print sqlparse.sql.TokenList([t for t in _filter(sql)])

输出:

CREATE FUNCTION func1(a integer) RETURNS void
    LANGUAGE plpgsql
        AS $
        BEGIN
                -- comment
       END;
       $;

SELECT * FROM TABLE foo;

INSERT INTO foo VALUES ('a -- foo bar');

INSERT INTO foo
VALUES ('
a
-- foo bar'
);

Try the sqlparse module.

Updated example: leaving comments inside insert values, and comments within CREATE FUNCTION blocks. You can tweak further to tune the behavior:

import sqlparse
from sqlparse import tokens

queries = '''
CREATE FUNCTION func1(a integer) RETURNS void
    LANGUAGE plpgsql
        AS $
        BEGIN
                -- comment
       END;
       $;
SELECT -- comment
* FROM -- comment
TABLE foo;
-- comment
INSERT INTO foo VALUES ('a -- foo bar');
INSERT INTO foo
VALUES ('
a 
-- foo bar'
);

'''

IGNORE = set(['CREATE FUNCTION',])  # extend this

def _filter(stmt, allow=0):
    ddl = [t for t in stmt.tokens if t.ttype in (tokens.DDL, tokens.Keyword)]
    start = ' '.join(d.value for d in ddl[:2])
    if ddl and start in IGNORE:
        allow = 1
    for tok in stmt.tokens:
        if allow or not isinstance(tok, sqlparse.sql.Comment):
            yield tok

for stmt in sqlparse.split(queries):
    sql = sqlparse.parse(stmt)[0]
    print sqlparse.sql.TokenList([t for t in _filter(sql)])

Output:

CREATE FUNCTION func1(a integer) RETURNS void
    LANGUAGE plpgsql
        AS $
        BEGIN
                -- comment
       END;
       $;

SELECT * FROM TABLE foo;

INSERT INTO foo VALUES ('a -- foo bar');

INSERT INTO foo
VALUES ('
a
-- foo bar'
);
雅心素梦 2024-11-11 10:51:49

添加更新的答案:)

import sqlparse

sql_example = """--comment
SELECT * from test;
INSERT INTO test VALUES ('
-- test
a
');
 """
print sqlparse.format(sql_example, strip_comments=True).strip()

输出:

从测试中选择*;
INSERT INTO 测试值 ('
 -  测试
一个
');

它实现了相同的结果,但也涵盖了所有其他极端情况并且更简洁

Adding an updated answer :)

import sqlparse

sql_example = """--comment
SELECT * from test;
INSERT INTO test VALUES ('
-- test
a
');
 """
print sqlparse.format(sql_example, strip_comments=True).strip()

Output:

SELECT * from test;
INSERT INTO test VALUES ('
-- test
a
');

It achieves the same result but also covers all other corner cases and more concise

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文