Multithreaded BASH programming: a generalized method?

Posted on 2024-08-10 20:02:33

Ok, I was running POV-Ray on all the demos, but POV's still single-threaded and wouldn't utilize more than one core. So, I started thinking about a solution in BASH.

I wrote a general function that takes a list of commands and runs them in the designated number of sub-shells. This actually works but I don't like the way it handles accessing the next command in a thread-safe multi-process way:

  • It takes, as an argument, a file with commands (1 per line),
  • To get the "next" command, each process ("thread") will:
    • Wait until it can create a lock file, with: ln $CMDFILE $LOCKFILE
    • Read the command from the file,
    • Modify $CMDFILE by removing the first line,
    • Remove the $LOCKFILE.

Is there a cleaner way to do this? I couldn't get the sub-shells to read a single line from a FIFO correctly.


Incidentally, the point of this is to enhance what I can do on a BASH command line, and not to find non-bash solutions. I tend to perform a lot of complicated tasks from the command line and want another tool in the toolbox.

Meanwhile, here's the function that handles getting the next line from the file. As you can see, it modifies an on-disk file each time it reads/removes a line. That's what seems hackish, but I'm not coming up with anything better, since FIFOs didn't work without setvbuf() in bash.

#
# Get/remove the first line from FILE, using LOCK as a semaphore (with
# short sleep for collisions).  Returns the text on standard output,
# returns zero on success, non-zero when file is empty.
#
parallel__nextLine() 
{
  local line rest file=$1 lock=$2

  # Wait for lock...
  until ln "${file}" "${lock}" 2>/dev/null
  do sleep 1
     [ -s "${file}" ] || return $?
  done

  # Open, read one "line", save "rest" back to the file:
  exec 3<"$file"
  read line <&3 ; rest=$(cat<&3)
  exec 3<&-

  # After last line, make sure file is empty:
  ( [ -z "$rest" ] || echo "$rest" ) > "${file}"

  # Remove lock and 'return' the line read:
  rm -f "${lock}"
  [ -n "$line" ] && echo "$line"
}
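
For reference, a minimal sketch of the kind of driver described above: it starts a fixed number of worker sub-shells that each pull commands with parallel__nextLine until the file runs dry. The parallel__run name, its parameters, and the use of eval are illustrative assumptions, not the original code:

#
# Sketch only: start N worker sub-shells that each consume commands from
# CMDFILE via parallel__nextLine.  parallel__run and its parameters are
# hypothetical, not part of the original post.
#
parallel__run()
{
  local cmdfile=$1 nworkers=$2 i cmd
  local lockfile="${cmdfile}.lock"

  for ((i = 0; i < nworkers; i++)); do
    (
      # Each worker pulls one command at a time until the file is empty.
      while cmd=$(parallel__nextLine "$cmdfile" "$lockfile"); do
        eval "$cmd"
      done
    ) &
  done
  wait    # block until every worker sub-shell has exited
}

# Hypothetical usage: run the commands in commands.txt, 4 at a time:
#   parallel__run commands.txt 4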

Comments (4)

怕倦 2024-08-17 20:02:33
#adjust these as required
args_per_proc=1 #1 is fine for long running tasks
procs_in_parallel=4

xargs -n$args_per_proc -P$procs_in_parallel povray < list

Note: the nproc command coming soon to coreutils will automatically determine
the number of available processing units, which can then be passed to -P.
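
For instance, a hedged sketch of that combination (assumes a coreutils build that already ships nproc; the fallback value of 4 is arbitrary):

# Use nproc when available, otherwise fall back to 4 parallel processes.
procs_in_parallel=$(nproc 2>/dev/null || echo 4)

xargs -n"$args_per_proc" -P"$procs_in_parallel" povray < list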

七度光 2024-08-17 20:02:33

If you need real thread safety, I would recommend migrating to a better scripting system.

With Python, for example, you can create real threads with safe synchronization using semaphores/queues.

独守阴晴ぅ圆缺 2024-08-17 20:02:33

Sorry to bump this after so long, but I pieced together a fairly good solution for this, IMO.
It doesn't work perfectly, but it will limit the script to a certain number of child tasks running, and then wait for all the rest at the end.

#!/bin/bash

pids=()

# Keep at most 6 background jobs: wait on the oldest PIDs until only 6
# remain, then remember those 6.
thread() {
  local this
  while [ ${#} -gt 6 ]; do
    this=${1}
    wait "$this"
    shift
  done
  pids=($1 $2 $3 $4 $5 $6)
}

for i in 1 2 3 4 5 6 7 8 9 10
do
  sleep 5 &                      # stand-in for the real long-running task
  pids=( ${pids[@]-} $! )        # append the new job's PID
  thread ${pids[@]}
done

# Wait for whatever is still running at the end.
for pid in ${pids[@]}
do
  wait "$pid"
done

It seems to work great for what I'm doing (handling parallel uploading of a bunch of files at once) and keeps it from breaking my server, while still making sure all the files get uploaded before the script finishes.
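
As a side note, a rough sketch of the same throttling using wait -n, which bash 4.3 and later provide; the upload command and the file list below are placeholders, not part of this answer:

#!/bin/bash
# Sketch only: keep at most max_jobs uploads running (needs bash >= 4.3 for wait -n).
max_jobs=6

for f in file1 file2 file3            # placeholder file list
do
  while (( $(jobs -rp | wc -l) >= max_jobs )); do
    wait -n                           # wait for any one background job to finish
  done
  upload "$f" &                       # placeholder for the real upload command
done
wait                                  # wait for whatever is still running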

三岁铭 2024-08-17 20:02:33

I believe you're actually forking processes here, not threading. I would recommend looking for threading support in a different scripting language like Perl, Python, or Ruby.
