Finding the next slave in round-robin scheduling by bit twiddling

Posted on 2024-07-12 22:25:29

Consider the following problem. You have a bit-string that represents the current scheduled slave in one-hot encoding. For example, "00000100" (with the leftmost bit being #7 and rightmost #0) means that slave #2 is scheduled.

Now, I want to pick the next scheduled slave in a round-robin scheduling scheme, with a twist. I have a "request mask" which says which slaves actually want to be scheduled. The next slave will be picked only from those that want to.

Some examples (assume round-robin scheduling is done by rotating left).
Example 1:

  • Current: "00000100"
  • Mask: "01100000"
  • Next schedule: "00100000" - in normal round-robin, #3 and then #4 should come after #2, but they don't request, so #5 is picked.

Example 2:

  • Current: "01000000"
  • Mask: "00001010"
  • Next: "00000010" - because scheduling is done by cycling left, and #1 is the first requesting slave in that order.

Now, this can be easily coded in a loop, I know. But I actually want to get my result by a bit-twiddling operation, without loops. The motivation: I want to implement this in hardware (in an FPGA) in VHDL/Verilog.
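
For reference, the loop version mentioned above could look like the following C model. This is only a sketch of the intended behaviour, not hardware code. The function name `next_grant_loop` and the parameter `n` (number of slaves, at most 32) are made up for illustration, and returning the current slave again when it is the only requester is an assumption the question does not state.

```c
#include <stdint.h>

/* Reference model (hypothetical helper): walk left from `current`,
 * wrapping around, and return the first one-hot position whose request
 * bit is set in `mask`. `n` is the number of slaves (1..32).
 * Returns 0 if nobody is requesting. */
uint32_t next_grant_loop(uint32_t current, uint32_t mask, unsigned n)
{
    uint32_t all = (n >= 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    uint32_t probe = current & all;

    for (unsigned step = 0; step < n; ++step) {
        /* rotate the one-hot probe left by one position within n bits */
        probe = ((probe << 1) | (probe >> (n - 1))) & all;
        if (probe & mask)
            return probe;
    }
    return 0;
}
```

With the two examples from the question, `next_grant_loop(0x04, 0x60, 8)` gives `0x20` and `next_grant_loop(0x40, 0x0A, 8)` gives `0x02`.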

A bonus would be an algorithm that is generic for any number of slaves N.

By the way, this is not a homework question. It's an important problem whenever one wants to schedule slaves in some manner, and condition the scheduling by the slaves' requests. My current solution is somewhat "heavy" and I wanted to know if I'm missing something obvious.

Comments (9)

倾听心声的旋律 2024-07-19 22:25:29

A loop does not have to be bad.

I would simply do

current[i] = current[i-1] & mask[i] |                         // normal shift logic
                mask[i] & current[i-2] & !mask[i-1] |         // here build logic 
                ...                                          // expression for 
                                                             // remaining 

And then put it into a generate loop (ie it will get unrolled into hardware), which will produce parallel hardware for the expressions.
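
To check what the unrolled expression computes, here is a small C model of it. It is a software sketch only: the outer loop stands in for the generate loop (one instance per output bit) and the inner loop builds the product terms one by one. The names `bit_at` and `next_grant_unrolled` are invented for this example, and including the last term (distance `n`, which grants the current slave again when it is the only requester) is an assumption.

```c
#include <stdint.h>

/* Bit i of a word, with the index wrapped into 0..n-1 (hypothetical helper). */
static unsigned bit_at(uint32_t word, int i, int n)
{
    return (word >> (((i % n) + n) % n)) & 1u;
}

/* next[i] = mask[i] & ( current[i-1]
 *                     | current[i-2] & !mask[i-1]
 *                     | current[i-3] & !mask[i-1] & !mask[i-2] | ... ) */
uint32_t next_grant_unrolled(uint32_t current, uint32_t mask, int n)
{
    uint32_t next = 0;
    for (int i = 0; i < n; ++i) {          /* one "generate" instance per bit */
        unsigned skipped_clear = 1;        /* AND of !mask[i-1] .. !mask[i-k+1] */
        unsigned hit = 0;
        for (int k = 1; k <= n; ++k) {     /* one product term per distance k */
            hit |= bit_at(current, i - k, n) & skipped_clear;
            skipped_clear &= !bit_at(mask, i - k, n);
        }
        next |= (uint32_t)(bit_at(mask, i, n) & hit) << i;
    }
    return next;
}
```

In hardware every term is evaluated in parallel, so the logic depth grows far more slowly than the nested loops here suggest, at the cost of wide AND gates for the long-distance terms.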

The other solutions mentioned here use several "-" operations. I can only discourage that, as it gets you a really expensive operation. Especially with one-hot encoding you can easily end up with more than 32 bits, which is not easy to implement in hardware, because the borrow has to ripple through all the bits (the dedicated carry logic on certain FPGAs makes it workable for a small number of bits).

嘦怹 2024-07-19 22:25:29

I've found the following Verilog code for implementing the task in the Altera advanced synthesis cookbook.

// 'base' is a one hot signal indicating the first request
// that should be considered for a grant.  Followed by higher
// indexed requests, then wrapping around.
//

module arbiter (
    req, grant, base
);

parameter WIDTH = 16;

input [WIDTH-1:0] req;
output [WIDTH-1:0] grant;
input [WIDTH-1:0] base;

wire [2*WIDTH-1:0] double_req = {req,req};
wire [2*WIDTH-1:0] double_grant = double_req & ~(double_req-base);
assign grant = double_grant[WIDTH-1:0] | double_grant[2*WIDTH-1:WIDTH];

endmodule

It uses subtraction (only once, though), so conceptually it's quite similar to Doug's solution.
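
A C model of the same trick can help to see why it works: subtracting `base` from the doubled request word borrows upward until it hits the first set request bit at or above `base`, and `x & ~(x - base)` isolates exactly that bit. The sketch below assumes `WIDTH` is at most 32 so the doubled word fits in 64 bits. The function names are invented here, and choosing `base` as `current` rotated left by one is my reading of how the module maps onto the question.

```c
#include <stdint.h>

/* Software model of the Verilog module above (width <= 32). */
uint32_t arbiter_grant(uint32_t req, uint32_t base, unsigned width)
{
    uint64_t low_mask = (width >= 32) ? 0xFFFFFFFFull : ((1ull << width) - 1ull);
    uint64_t r = req & low_mask;
    uint64_t double_req = (r << width) | r;                 /* {req, req} */
    uint64_t double_grant = double_req & ~(double_req - (base & low_mask));
    /* OR the two halves together, as the assign statement does */
    return (uint32_t)((double_grant | (double_grant >> width)) & low_mask);
}

/* For the question: the first candidate is the slave after `current`. */
uint32_t next_grant_subtract(uint32_t current, uint32_t mask, unsigned width)
{
    uint32_t low_mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    uint32_t base = ((current << 1) | (current >> (width - 1))) & low_mask;
    return arbiter_grant(mask, base, width);
}
```

For Example 1 this gives `next_grant_subtract(0x04, 0x60, 8) == 0x20`, and for Example 2 `next_grant_subtract(0x40, 0x0A, 8) == 0x02`.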

失去的东西太少 2024-07-19 22:25:29

The following solution works for any number of slaves (K), and is O(n) in your FPGA. For each bit in the field, you will require three logic gates and two inverters. I tested out the concept with a basic logic simulator, and it works.

The chain of logic gates between current and mask essentially creates a priority system that favors bits "lower down" in the chain. This chain is looped at the ends, but the current bits are used to break the chain.

To visualize the operation, imagine that bit 3 is set in the current field, and follow the signal downwards in the diagram. The logical one at bit 3 places a logical zero at the input to the first AND gate, which guarantees that the output of that AND gate will also be zero (this is where the OR-gate chain is broken). The zero at the output of the first AND gate places a one at the input to the second AND gate. This makes bit 2 of next directly dependent on bit 2 of mask.

Now, the chain of OR gates comes into play.

If bit 2 of mask was set, the logical output of the OR gate directly to the left of it will also be a one, which will place a logical one at the input to the AND gate below bit 2 of current (which will be zero, since only one bit in current can be set at a time). The logical one at the output of the top AND gate places a logical zero at the input of the bottom AND gate, thus setting bit 1 of next equal to zero.

If bit 2 of mask was not set, both inputs to the OR gate would be zero, so the output of the AND gate below bit 2 of current would be a zero, placing a one at the input to the bottom AND gate, and therefore making bit 1 of next dependent on bit 1 of mask.

This logic follows the chain of OR gates "up" the bits, looping around from the left side back over to the right, ensuring that only one bit in next can be set to a one. The loop stops once it makes its way back to bit 3 of current, as a result of that bit being set. This prevents the circuit from staying in a perpetual loop.

I have no experience with Verilog or VHDL, so I'll leave the actual code up to you and the rest of stackoverflow.

(Logic diagram: http://img145.imageshack.us/img145/5125/bitshifterlogicdiagramkn7.jpg)

notes:

  1. This solution is only partial. It will still require some kind of latching mechanism to hold the bit fields.
  2. Keep in mind that as you increase the number of bits, the time required for the gate voltages to settle will also increase.
  3. There will have to be some logic in place to handle the case where the current field is equal to zero. See this stackoverflow question.
铁轨上的流浪者 2024-07-19 22:25:29

Interesting problem! I can't help but wonder whether you could simplify your scheduler so that this sort of operation wouldn't be necessary.

Given that you know VHDL, I won't go into detail, but my suggestion would be the following:

Use a 3 bit encoder to turn the currently scheduled task into a number:

01000000 --> 6

Then use a barrel shifter to rotate the mask right by that number + 1 (to skip the current task), so that the task after the current one lands on bit 0:

00001010 --> 00010100

Then use a priority encoder to find the first available "next" task:

00010100 --> 00000100 --> 2

Then reverse the barrel shift by addition:

(2+7) % 8 = 1

Which when re-encoded will give the next scheduled task:

00000010

This should be very fast and straightforward, although the barrel shifter is 'expensive' in terms of real estate; I don't see an easy way around that at the moment.
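
The four steps can be modelled in C roughly as follows. This is a sketch under assumptions: the names `lowest_set_index` and `next_grant_barrel` are invented, `n` is the number of tasks (at most 32), and the small `while` loop merely stands in for what would be an encoder in hardware.

```c
#include <stdint.h>

/* Index of the lowest set bit; the caller guarantees word != 0. */
static unsigned lowest_set_index(uint32_t word)
{
    unsigned idx = 0;
    while (((word >> idx) & 1u) == 0)
        ++idx;
    return idx;
}

uint32_t next_grant_barrel(uint32_t current, uint32_t mask, unsigned n)
{
    uint32_t all = (n >= 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    mask &= all;
    if (mask == 0 || (current & all) == 0)
        return 0;                                   /* nothing to schedule */

    /* 1. encoder: one-hot -> number */
    unsigned cur_idx = lowest_set_index(current & all);
    unsigned shift = (cur_idx + 1) % n;             /* +1 skips the current task */

    /* 2. barrel shifter: rotate right so the task after current is bit 0 */
    uint32_t rotated = shift
        ? (((mask >> shift) | (mask << (n - shift))) & all)
        : mask;

    /* 3. priority encoder: first requesting task in rotated order */
    unsigned pick = lowest_set_index(rotated);

    /* 4. undo the rotation by addition and re-encode as one-hot */
    return 1u << ((pick + shift) % n);
}
```

For the example in this answer, `next_grant_barrel(0x40, 0x0A, 8)` goes through 6, then 00010100, then 2, then (2+7) % 8 = 1, and returns `0x02`.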

Edit: Doug's solution is significantly more elegant...

-Adam

盛夏已如深秋| 2024-07-19 22:25:29

Subtracting 1 is the essential idea here. It's used to cascade borrows through the bits to find the next task.

bits_before_current = ~(current-1) & ~current
bits_after_current = current-1
todo = (mask & bits_before_current) 
if todo==0: todo = (mask & bits_after_current) // second part is if we have to wrap around
next = last_bit_of_todo = todo & -todo

This will use a loop internally though...

九公里浅绿 2024-07-19 22:25:29

Assuming twos complement representation, call your two words mask and current, in C:

mask_lo = (current << 1) - 1; // the bits to the right and including current
mask_hi = ~mask_lo;           // the bits to the left of current
                              // the left bits, otherwise right:
next = (mask & mask_hi) ? (mask & mask_hi) : (mask & mask_lo);
return (next & -next);        // the least significant bit set
池予 2024-07-19 22:25:29

This should do what you want:

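number_of_tasks= <number of tasks, in the example this is 8>
// next_mask is the request mask (the slaves that want to be scheduled)
skip_mask= current | (current - 1);                       // current and every bit below it
next_barrel= next_mask | (next_mask << number_of_tasks);  // two copies of the request mask
next_barrel&= ~skip_mask;                                 // drop the low copy's bits at or below current
next_barrel&= -next_barrel;                               // lowest remaining set bit
next_barrel|= next_barrel >> number_of_tasks;             // fold the high copy back in
next_task_mask= next_barrel & -next_barrel;               // lowest bit is the next task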

Basically, duplicate the bits of the next task mask, mask off the bits we don't want to consider, find the lowest set bit, fold the high bits back in, then take the lowest bit set. This runs in constant time.

Edit: Updating to take into account current == 00010000 and next_mask == 00111000

南街九尾狐 2024-07-19 22:25:29

Untested, but off the top of my head, I'd be surprised if this didn't produce a reasonable synthesis... It has the advantage of being relatively readable (to me anyway), unlike typical bit-twiddling hacks.

for i in current'range loop
  current := rotate_left(current, 1);
  if or_reduce(mask and current) = '1' then
     current := mask and current;
     exit;  -- first requesting slave found; without this the loop keeps rotating
  end if;
end loop;
浮云落日 2024-07-19 22:25:29

Complete parametrizable arbiter implementation that can be configured for round-robin or priority arbitration:

https://github.com/alexforencich/verilog-axis/blob/master/rtl/arbiter.v

This design uses a pair of priority encoders to select the next output in the sequence. The priority encoders used are implemented efficiently as trees.
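
The two-encoder idea can be sketched in C as below. To be clear, this is not the code from the linked repository, only a model of the general technique as I understand it: one priority encoder looks at the requests above the current grant, a second one looks at all requests, and the second result is used only when the first finds nothing (the wrap-around case). The names `lowest_set_bit` and `next_grant_two_encoders` are hypothetical, and `n` is assumed to be at most 32.

```c
#include <stdint.h>

/* Isolate the lowest set bit (0 if word == 0); stands in for a priority encoder. */
static uint32_t lowest_set_bit(uint32_t word)
{
    return word & (0u - word);
}

uint32_t next_grant_two_encoders(uint32_t current, uint32_t request, unsigned n)
{
    uint32_t all = (n >= 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    request &= all;

    /* bits strictly above current: these come first in rotate-left order */
    uint32_t above = ~((current << 1) - 1u) & all;
    uint32_t masked = request & above;

    uint32_t hi_pick = lowest_set_bit(masked);    /* encoder 1: masked requests */
    uint32_t lo_pick = lowest_set_bit(request);   /* encoder 2: all requests */

    return masked ? hi_pick : lo_pick;            /* fall back on wrap-around */
}
```

Both encoders work in parallel and neither needs a rotation, which is what makes a tree implementation attractive for large N.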
