Improving the performance of 32-bit math on a 16-bit processor
I am working on some firmware for an embedded device that uses a 16-bit PIC operating at 40 MIPS, programmed in C. The system will control the position of two stepper motors and maintain the step position of each motor at all times. The maximum position of each motor is around 125000 steps, so I cannot use a 16-bit integer to keep track of the position; I must use a 32-bit unsigned integer (DWORD). The motor moves at 1000 steps per second, and I have designed the firmware so that steps are processed in a timer ISR. The timer ISR does the following:
1) Compare the current position of one motor to the target position. If they are the same, set the isMoving flag to false and return. If they are different, set the isMoving flag to true.
2) If the target position is larger than the current position, move one step forward, then increment the current position.
3) If the target position is smaller than the current position, move one step backward, then decrement the current position.
Here is the code:
void _ISR _NOPSV _T4Interrupt(void)
{
    static char StepperIndex1 = 'A';

    if (Device1.statusStr.CurrentPosition == Device1.statusStr.TargetPosition)
    {
        Device1.statusStr.IsMoving = 0;
        // Do Nothing
    }
    else if (Device1.statusStr.CurrentPosition > Device1.statusStr.TargetPosition)
    {
        switch (StepperIndex1) // MOVE OUT
        {
            case 'A':
                SetMotor1PosB();
                StepperIndex1 = 'B';
                break;
            case 'B':
                SetMotor1PosC();
                StepperIndex1 = 'C';
                break;
            case 'C':
                SetMotor1PosD();
                StepperIndex1 = 'D';
                break;
            case 'D':
            default:
                SetMotor1PosA();
                StepperIndex1 = 'A';
                break;
        }
        Device1.statusStr.CurrentPosition--;
        Device1.statusStr.IsMoving = 1;
    }
    else
    {
        switch (StepperIndex1) // MOVE IN
        {
            case 'A':
                SetMotor1PosD();
                StepperIndex1 = 'D';
                break;
            case 'B':
                SetMotor1PosA();
                StepperIndex1 = 'A';
                break;
            case 'C':
                SetMotor1PosB();
                StepperIndex1 = 'B';
                break;
            case 'D':
            default:
                SetMotor1PosC();
                StepperIndex1 = 'C';
                break;
        }
        Device1.statusStr.CurrentPosition++;
        Device1.statusStr.IsMoving = 1;
    }

    _T4IF = 0; // Clear the Timer 4 Interrupt Flag.
}
The target position is set in the main program loop when move requests are received. The SetMotor1PosA() to SetMotor1PosD() calls are just macros that turn specific port pins on or off.
My question is: is there any way to improve the efficiency of this code? The code works fine as is if the positions are 16-bit integers, but as 32-bit integers there is too much processing. This device must communicate with a PC without hesitation, and as written there is a noticeable performance hit. I really only need 18-bit math, but I don't know of an easy way of doing that! Any constructive input or suggestions would be most appreciated.
Warning: all numbers are made up...
Supposing that the above ISR compiles to about 200 instructions (likely fewer), including the instructions that save and restore the CPU registers before and after the ISR, that each instruction takes 5 clock cycles (likely 1 to 3), and that you have two such ISRs (one per motor), each called 1000 times a second, we end up with 2*1000*200*5 = 2 million clock cycles per second, or 2 MIPS.
Do you actually consume the remaining 38 MIPS elsewhere?
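Written out, the deliberately pessimistic estimate above looks like this (the 200 instructions and 5 cycles per instruction are the made-up figures from the warning, and each cycle is treated as one of the 40 million instruction cycles available per second):

```latex
\[
\underbrace{2}_{\text{ISRs}} \times \underbrace{1000\ \mathrm{s^{-1}}}_{\text{calls per ISR}} \times \underbrace{200}_{\text{instr./call}} \times \underbrace{5}_{\text{cycles/instr.}} = 2\times10^{6}\ \text{cycles/s}
\]
\[
\frac{2\times10^{6}\ \text{cycles/s}}{40\times10^{6}\ \text{cycles/s}} = 0.05 = 5\%\ \text{of the CPU}
\]
```

Even with inflated numbers, the ISR would use only a small fraction of the processor.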
The only thing that may be important here and I can't see it, is what's done inside of the SetMotor*Pos*() functions. Do they do any complex calculations? Do they perform some slow communication with the motors, e.g. wait for them to respond to the commands sent to them?
At any rate, it's doubtful that such simple code would be noticeably slower when working with 32-bit integers than with 16-bit.
If your code is slow, find out where the time is spent and how much: profile it. Generate a square pulse in the ISR (set a pin to 1 when the ISR starts and back to 0 just before it returns) and measure its duration with an oscilloscope, or use whatever method is easier for you. Measure the time spent in all parts of the program, then optimize where it is really necessary, not where you previously thought it would be.
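A minimal sketch of the pulse instrumentation, assuming a spare output pin is available. `debug_pin`, `DEBUG_PIN_SET()`, `DEBUG_PIN_CLEAR()`, `step_motor_work()` and `timer4_isr_instrumented()` are hypothetical names; on the PIC the macros would write a port latch bit, but here the pin is a plain variable so the sketch compiles on a PC:

```c
#include <stdint.h>

/* Hypothetical debug pin. On the target, map these macros to a spare
   output latch bit; here a variable stands in so the sketch runs anywhere. */
static volatile uint8_t debug_pin = 0;
#define DEBUG_PIN_SET()   (debug_pin = 1)
#define DEBUG_PIN_CLEAR() (debug_pin = 0)

/* Records the pin level seen while the measured work runs (host check only). */
static uint8_t pin_level_during_work = 0;

/* Hypothetical stand-in for the real compare/step logic of the ISR. */
static void step_motor_work(void)
{
    pin_level_during_work = debug_pin;
}

/* The pin is high for exactly the duration of the ISR body, so the pulse
   width on a scope is the body's execution time. The compiler-generated
   register save/restore around the body is NOT included in the pulse. */
static void timer4_isr_instrumented(void)
{
    DEBUG_PIN_SET();      /* rising edge: ISR body starts */
    step_motor_work();
    DEBUG_PIN_CLEAR();    /* falling edge: ISR is about to return */
}
```

Multiplying the measured pulse width by the interrupt rate (here 1000 per second per motor) gives the fraction of CPU time the ISR body actually uses.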
The difference between 16- and 32-bit arithmetic shouldn't be that big, I think, since you only use increment and comparison. But maybe the problem is that each 32-bit arithmetic operation implies a function call (if the compiler isn't able or willing to inline simpler operations).
One suggestion would be to do the arithmetic yourself, by breaking Device1.statusStr.CurrentPosition into two halves, say Device1.statusStr.CurrentPositionH and Device1.statusStr.CurrentPositionL. Then use some macros to do the operations, like:
#define INC(xH, xL) do { (xL)++; if ((xL) == 0) (xH)++; } while (0) /* xL must be an unsigned 16-bit variable so it wraps to 0 */
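A minimal sketch extending the split-position idea to decrement and comparison, written as small functions instead of macros. `SplitPos`, `split_inc()`, `split_dec()` and `split_cmp()` are hypothetical names; whether this beats the compiler's own 32-bit code depends on the compiler, so compare the generated assembly before adopting it:

```c
#include <stdint.h>

/* Hypothetical split position: value = hi * 65536 + lo. */
typedef struct {
    uint16_t hi;
    uint16_t lo;
} SplitPos;

/* Increment: if the low half wraps from 0xFFFF to 0, carry into the high half. */
static void split_inc(SplitPos *p)
{
    if (++p->lo == 0)
        p->hi++;
}

/* Decrement: if the low half was 0 before decrementing, borrow from the high half. */
static void split_dec(SplitPos *p)
{
    if (p->lo-- == 0)
        p->hi--;
}

/* Three-way compare: returns -1 if a < b, 0 if equal, 1 if a > b.
   The high halves decide first; the low halves only break ties. */
static int split_cmp(const SplitPos *a, const SplitPos *b)
{
    if (a->hi != b->hi)
        return a->hi < b->hi ? -1 : 1;
    if (a->lo != b->lo)
        return a->lo < b->lo ? -1 : 1;
    return 0;
}
```

In the ISR, `split_cmp(&current, &target)` would replace the two 32-bit comparisons, and `split_inc()`/`split_dec()` the `++`/`--` on CurrentPosition.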
I would get rid of the StepperIndex1 variable and instead use the two low-order bits of CurrentPosition to keep track of the current step index. Alternately, keep track of the current position in complete A-B-C-D phase cycles (rather than individual steps), so it can fit in a 16-bit variable. When moving, you only increment/decrement the position when moving to phase 'A'. Of course, this means you can only target each full cycle, rather than every step.
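A minimal sketch of the first suggestion, assuming position 0 corresponds to phase 'A' at power-up. The lookup table is ordered so that incrementing follows the original "MOVE IN" sequence (A, D, C, B) and decrementing follows "MOVE OUT" (A, B, C, D). `Motor`, `motor_step()` and `set_motor1_phase()` are hypothetical; the last stands in for the SetMotor1PosA() to SetMotor1PosD() macros:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for SetMotor1PosA()..SetMotor1PosD(): records the phase driven. */
static char last_phase_driven = 'A';
static void set_motor1_phase(char phase) { last_phase_driven = phase; }

/* Phase to drive for each value of (position & 3), assuming position 0 is phase 'A'. */
static const char kPhaseForLowBits[4] = { 'A', 'D', 'C', 'B' };

typedef struct {
    uint32_t current;
    uint32_t target;
    bool is_moving;
} Motor;

/* One timer tick: the phase is derived from the new position,
   so no separate StepperIndex1 variable or switch is needed. */
static void motor_step(Motor *m)
{
    if (m->current == m->target) {
        m->is_moving = false;
        return;
    }
    if (m->current > m->target)
        m->current--;   /* MOVE OUT */
    else
        m->current++;   /* MOVE IN */
    set_motor1_phase(kPhaseForLowBits[m->current & 3u]);
    m->is_moving = true;
}
```

The second suggestion (counting whole phase cycles in 16 bits) would instead increment or decrement a cycle counter only when the driven phase becomes 'A', at the cost of step-level targeting.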
Sorry, but you are using bad program design.
Let's check the difference between 16-bit and 32-bit PIC24 or dsPIC33 asm code...
16-bit increment:
So a 16-bit increment takes one cycle.
32-bit increment:
And a 32-bit increment takes three cycles.
The total difference is 2 cycles, or 50 ns (nanoseconds).
A simple calculation will show you everything. You have 1000 steps per second and a 40 MIPS DSP, so you have 40000 instructions per step. More than enough!
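The same numbers worked out, assuming one instruction cycle per cycle at 40 MIPS and the two-cycle difference quoted above:

```latex
\[
t_{\text{cycle}} = \frac{1}{40\times10^{6}\ \mathrm{s^{-1}}} = 25\ \text{ns},
\qquad
\Delta t = 2 \times 25\ \text{ns} = 50\ \text{ns}
\]
\[
\frac{40\times10^{6}\ \text{cycles/s}}{1000\ \text{steps/s}} = 40\,000\ \text{cycles per step},
\qquad
\frac{2}{40\,000} = 5\times10^{-5} = 0.005\%
\]
```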
When you changed it from 16-bit to 32-bit, did you change any of the compiler flags, for example to tell it to compile as a 32-bit application instead?
Have you tried compiling with the 32-bit extensions but using only 16-bit integers? Do you still get such a performance drop?
It's likely that just changing from 16-bit to 32-bit causes some operations to be compiled differently. Perhaps diff the two sets of compiled assembly and see what is actually different: is it a lot, or only a couple of lines?
One solution might be, instead of using a 32-bit integer, to use two 16-bit integers:
when valueA is at int16.Max, set it to 0 and increment valueB by 1; otherwise just increment valueA by 1. Because each wrap of valueA accounts for 32768 steps, the motor has reached 125000 (= 3 × 32768 + 26696) when valueB > 3, or when valueB is 3 and valueA >= 26696. The exact constants depend on whether you use signed or unsigned 16-bit integers.
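A minimal sketch of the two-word scheme using signed 16-bit halves, so position = valueB * 32768 + valueA. `TwoWordPos`, `twoword_inc()` and `twoword_reached_125000()` are hypothetical names; only the increment direction is shown:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-word position: valueA runs 0..INT16_MAX (32767),
   valueB counts completed blocks of 32768 steps. */
typedef struct {
    int16_t valueA;
    int16_t valueB;
} TwoWordPos;

/* When valueA is at INT16_MAX, wrap it to 0 and carry into valueB;
   otherwise just increment valueA. */
static void twoword_inc(TwoWordPos *p)
{
    if (p->valueA == INT16_MAX) {
        p->valueA = 0;
        p->valueB++;
    } else {
        p->valueA++;
    }
}

/* True once the position is at or beyond 125000 = 3 * 32768 + 26696. */
static bool twoword_reached_125000(const TwoWordPos *p)
{
    return p->valueB > 3 || (p->valueB == 3 && p->valueA >= 26696);
}
```

With unsigned 16-bit halves the block size becomes 65536, so the same limit would be 1 × 65536 + 59464.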