Insert SQL statements via the command line without reopening the connection to the remote database

Published 2024-12-11 02:51:41


I have a large number of data files to process and store in a remote database. Each line of a data file represents a row in the database, but it must be formatted before being inserted.

My first solution was to process the data files with bash scripts and produce SQL dump files, then import those dump files into the database. This solution turned out to be too slow and, as you can see, involves the extra step of creating an intermediary SQL file.

My second solution was to write a bash script that, while processing each line of the data file, creates an INSERT INTO ... statement and sends the SQL statement to the remote database:

echo sql_statement | psql -h remote_server -U username -d database

i.e. it does not create an SQL file. This solution, however, has one major issue that I am seeking advice on:
Each time, I have to reconnect to the remote database just to insert a single row.

Is there a way to connect to the remote database, stay connected, and then "pipe" or "send" the INSERT statements without creating one huge SQL file?


Comments (2)

山川志 2024-12-18 02:51:41


Answer to your actual question

Yes. You can use a named pipe instead of creating a file. Consider the following demo.

Create a schema x in my database event for testing:

-- DROP SCHEMA x CASCADE;
CREATE SCHEMA x;
CREATE TABLE x.x (id int, a text);

Create a named pipe (fifo) from the shell like this:

postgres@db:~$ mkfifo --mode=0666 /tmp/myPipe

Either 1) call the SQL command COPY using a named pipe on the server:

postgres@db:~$ psql event -p5433 -c "COPY x.x FROM '/tmp/myPipe'"

This will acquire an exclusive lock on the table x.x in the database. The connection stays open until the fifo gets data. Be careful not to leave this open for too long! You can call this after you have filled the pipe to minimize blocking time. You can choose the sequence of events: the command executes as soon as two processes bind to the pipe; the first waits for the second.

Or 2) you can execute SQL from the pipe on the client:

postgres@db:~$ psql event -p5433 -f /tmp/myPipe

This is better suited for your case. Also, the table is not locked until the SQL executes as a whole.

Bash will appear blocked. It is waiting for input to the pipe. To do it all from one bash instance, you can send the waiting process to the background instead. Like this:

postgres@db:~$ psql event -p5433 -f /tmp/myPipe 2>&1 &

Either way, from the same bash or a different instance, you can fill the pipe now.
Demo with three rows for variant 1):

postgres@db:~$ echo -e '1\tfoo' >> /tmp/myPipe; echo -e '2\tbar' >> /tmp/myPipe; echo -e '3\tbaz' >> /tmp/myPipe;

(Take care to use tabs as delimiters or instruct COPY to accept a different delimiter using WITH DELIMITER 'delimiter_character')
That will trigger the pending psql with the COPY command to execute and return:

COPY 3

Demo for variant 2):

postgres@db:~$ (echo -n "INSERT INTO x.x VALUES (1,'foo')" >> /tmp/myPipe; echo -n ",(2,'bar')" >> /tmp/myPipe; echo ",(3,'baz')" >> /tmp/myPipe;)

INSERT 0 3

Delete the named pipe after you are done:

postgres@db:~$ rm /tmp/myPipe

Check success:

event=# select * from x.x;
 id |         a
----+-------------------
  1 | foo
  2 | bar
  3 | baz

Useful links for the code above

Reading compressed files with postgres using named pipes
Introduction to Named Pipes
Best practice to run bash script in background
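
The fifo choreography can be rehearsed without a database: in this sketch, cat stands in for the blocked psql -f reader, and mktemp -u merely picks a unique path for the pipe. The statement text is a placeholder.

```shell
#!/bin/sh
# Stand-alone rehearsal of the fifo pattern: 'cat' plays the role of the
# waiting "psql -f /tmp/myPipe" reader. No database needed.
pipe="$(mktemp -u)"             # unique path for the fifo
mkfifo "$pipe"

# Reader blocks in the background until a writer opens the pipe,
# just like psql waiting on the fifo.
cat "$pipe" > "$pipe.out" &

# Writer side: fill the pipe; the reader wakes, consumes it, and exits.
printf "INSERT INTO x.x VALUES (1,'foo');\n" > "$pipe"

wait                            # let the background reader finish
received="$(cat "$pipe.out")"   # what the "server" side saw
echo "$received"                # prints: INSERT INTO x.x VALUES (1,'foo');
rm -f "$pipe" "$pipe.out"
```

The same order of operations applies with psql: either process may bind to the pipe first; execution starts once both ends are attached.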


Advice you may or may not need

For bulk INSERT you have better solutions than a separate INSERT per row. Use this syntax variant:

INSERT INTO mytable (col1, col2, col3) VALUES
 (1, 'foo', 'bar')
,(2, 'goo', 'gar')
,(3, 'hoo', 'har')
...
;

Write your statements to a file and do one mass INSERT like this:

psql -h remote_server -U username -d database -p 5432 -f my_insert_file.sql

(5432 or whatever port the db-cluster is listening on)
my_insert_file.sql can hold multiple SQL statements. In fact, it's common practice to restore / deploy whole databases like that. Consult the manual about the -f parameter, or in bash: man psql.
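
Generating that multi-row statement from a data file is a few lines of awk. This is a sketch under assumptions: tab-separated "id, text" input, and mytable with columns col1 and col2 as placeholders for your own schema.

```shell
# Build one multi-row INSERT statement from tab-separated "id<TAB>text"
# lines on stdin. Table and column names are placeholders.
make_bulk_insert() {
  awk -F'\t' '
    NR == 1 { printf "INSERT INTO mytable (col1, col2) VALUES\n (%s, '\''%s'\'')", $1, $2; next }
            { printf "\n,(%s, '\''%s'\'')", $1, $2 }
    END     { if (NR > 0) print "\n;" }
  '
}

# Usage:
# make_bulk_insert < data.tsv > my_insert_file.sql
# psql -h remote_server -U username -d database -f my_insert_file.sql
```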

Or, if you can transfer the (compressed) file to the server, you can use COPY to insert the (decompressed) data even faster.

You can also do some or all of the processing inside PostgreSQL. For that you can COPY TO (or INSERT INTO) a temporary table and use plain SQL statements to prepare and finally INSERT / UPDATE your tables. I do that a lot. Be aware that temporary tables live and die with the session.
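
A staging-table script along those lines might look as follows; piped into psql, the whole sequence runs over a single connection. The table names and the trim() transform are placeholders, not anything prescribed above.

```shell
# Emit a staging-and-transform script; piping it into psql runs the whole
# thing over one connection. Names and the transform are placeholders.
staging_sql() {
  cat <<'SQL'
CREATE TEMP TABLE staging (id int, a text);
COPY staging FROM '/tmp/myPipe';
INSERT INTO mytable (id, a)
SELECT id, trim(a) FROM staging;
SQL
}

# Usage: staging_sql | psql -h remote_server -U username -d database
```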

You could use a GUI like pgAdmin for comfortable handling. A session in an SQL Editor window remains open until you close the window. (Therefore, temporary tables live until you close the window.)

酷炫老祖宗 2024-12-18 02:51:41


I know I'm late to the party, but why couldn't you combine all your INSERT statements into a single string, with a semicolon marking the end of each statement? (Warning! Pseudocode ahead...)

Instead of:

for each line
  sql_statement="INSERT whatever YOU want"
  echo $sql_statement | psql ...
done

Use:

sql_statements=""
for each line
  sql_statement="INSERT whatever YOU want;"
  sql_statements="$sql_statements $sql_statement"
done
echo $sql_statements | psql ...
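
A concrete bash version of that pseudocode might look like this; the tab-separated "id, text" input format and the mytable target are assumptions, and values are interpolated unescaped, so use it only with trusted data.

```shell
# Accumulate one statement per input line in a single string, then
# invoke psql once. Input format and table name are placeholders.
build_statements() {
  sql_statements=""
  while IFS="$(printf '\t')" read -r id text; do
    sql_statements="$sql_statements INSERT INTO mytable (id, a) VALUES ($id, '$text');"
  done
  printf '%s\n' "$sql_statements"
}

# Usage: build_statements < data.tsv | psql -h remote_server -U username -d database
```

Note that for very large files the accumulated string grows without bound; at some point streaming (a pipe or fifo, as in the other answer) scales better.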

That way you don't have to create anything on your filesystem, do a bunch of redirection, run any tasks in the background, remember to delete anything on your filesystem afterwards, or even remind yourself what a named pipe is.
