How to optimize this image copy function for an embedded system

Posted on 2024-11-08 01:41:45

The function below reads an image one page at a time using read_page(pageIter, pageArr, PAGESIZE) and outputs the data on the DOUT and CCLK pins.

I was told it is inefficient, but I can't find a way to make it faster. It is basically a pipe, running on a 64-pin microprocessor, between two memory spaces. One holds the image and the other receives it.

I've used the register keyword and replaced array indexing with pointer arithmetic, but it needs to be faster.

Thanks!

/*
Port C Pin Out
*/
#define     BIT0        0x01    // CCLK
#define     BIT1        0x02    // CS_B
#define     BIT2        0x04    // INIT_B
#define     BIT3        0x08    // PROG_B
#define     BIT4        0x10    // RDRW_B
#define     BIT5        0x20    // BUSY_OUT
#define     BIT6        0x40    // DONE
#define     BIT7        0x80    // DOUT (DIN)

/*
PAGE
*/

#define     PAGESIZE    1024    // Example

void copyImage(ulong startAddress, ulong endAddress)
  {
  ulong pageIter;
  uchar *eByte, *byteIter, pageArr[PAGESIZE];
  register uchar bitIter, portCvar;
  portCvar = PORTC;
  /* Loops through pages in an image using ulong type*/
  for(pageIter = startAddress ;  pageIter <= endAddress ; pageIter += PAGESIZE)
    {
    read_page(pageIter, pageArr, PAGESIZE);
    eByte = pageArr+PAGESIZE;
    /* Loops through bytes in a page using pointer to uchar (pointer to a byte).
       eByte points one past the last byte, so the test must be <, not <=. */
    for(byteIter = pageArr; byteIter < eByte; byteIter++)
      {
      /* Loops through bits in byte and writes to PORTC - DIN and CCLK  */
      for(bitIter = 0x01; bitIter != 0x00; bitIter = bitIter << 1)
        {
        PORTC = portCvar | BIT0;
        (bitIter & *byteIter) ? (PORTC = portCvar & ~BIT7) : (PORTC = portCvar | BIT7);
        PORTC = portCvar & ~BIT0;
        }
      }
    }
  }

云柯 2024-11-15 01:41:45

You can probably go faster by unrolling the transmission of each byte, with something like:

PORTC = clock_1; PORTC = (value & 0x01 ? data1 : data0); PORTC = clock_0;
PORTC = clock_1; PORTC = (value & 0x02 ? data1 : data0); PORTC = clock_0;
PORTC = clock_1; PORTC = (value & 0x04 ? data1 : data0); PORTC = clock_0;
PORTC = clock_1; PORTC = (value & 0x08 ? data1 : data0); PORTC = clock_0;
PORTC = clock_1; PORTC = (value & 0x10 ? data1 : data0); PORTC = clock_0;
PORTC = clock_1; PORTC = (value & 0x20 ? data1 : data0); PORTC = clock_0;
PORTC = clock_1; PORTC = (value & 0x40 ? data1 : data0); PORTC = clock_0;
PORTC = clock_1; PORTC = (value & 0x80 ? data1 : data0); PORTC = clock_0;

after precomputing these values once, outside the image loop:

unsigned char clock_1 = portC | BIT0;
unsigned char clock_0 = portC & ~BIT0;
unsigned char data1 = portC | BIT7;
unsigned char data0 = portC & ~BIT7;
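
The unrolled transmission above can be sketched as a small self-contained routine. This is only an illustration under stated assumptions: port_write(), port_trace, struct port_levels, make_levels() and send_byte_unrolled() are hypothetical names, and port_write() is a host-side stand-in that records each value instead of storing to the real PORTC register, so the pin sequence can be checked off-target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIT0 0x01u /* CCLK */
#define BIT7 0x80u /* DOUT (DIN) */

/* Hypothetical stand-in for the port register: each write is appended to a
   trace. On the target this function body would be `PORTC = v;`. */
static uint8_t port_trace[32];
static size_t port_trace_len = 0;

static void port_write(uint8_t v)
{
    assert(port_trace_len < sizeof port_trace);
    port_trace[port_trace_len++] = v;
}

/* The four precomputed port values; build this once, outside the image loop. */
struct port_levels {
    uint8_t clock_1, clock_0, data1, data0;
};

static struct port_levels make_levels(uint8_t portC)
{
    struct port_levels lv;
    lv.clock_1 = (uint8_t)(portC | BIT0);
    lv.clock_0 = (uint8_t)(portC & ~BIT0);
    lv.data1 = (uint8_t)(portC | BIT7);
    lv.data0 = (uint8_t)(portC & ~BIT7);
    return lv;
}

/* One clock-high / data / clock-low triple for the bit selected by mask. */
#define SEND_BIT(lv, value, mask)                                   \
    do {                                                            \
        port_write((lv)->clock_1);                                  \
        port_write(((value) & (mask)) ? (lv)->data1 : (lv)->data0); \
        port_write((lv)->clock_0);                                  \
    } while (0)

/* Sends one byte, least significant bit first, with no loop counter. */
static void send_byte_unrolled(const struct port_levels *lv, uint8_t value)
{
    SEND_BIT(lv, value, 0x01u);
    SEND_BIT(lv, value, 0x02u);
    SEND_BIT(lv, value, 0x04u);
    SEND_BIT(lv, value, 0x08u);
    SEND_BIT(lv, value, 0x10u);
    SEND_BIT(lv, value, 0x20u);
    SEND_BIT(lv, value, 0x40u);
    SEND_BIT(lv, value, 0x80u);
}
```

Each mask is a compile-time constant here, so the compiler can emit a plain bit test per line instead of maintaining and shifting a loop variable. Whether that is actually faster depends on the target and compiler, so it is worth checking the disassembly.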

还在原地等你 2024-11-15 01:41:45

/* Loops through bits in byte and writes to PORTC - DIN and CCLK  */
for(bitIter = 0x01; bitIter <= 0x80; bitIter = bitIter << 1)
{
    PORTC = portC | BIT0;
    (bitIter & byteIter) ? (PORTC = portC & ~BIT7) : (PORTC = portC | BIT7);
    PORTC = portC & ~BIT0;
}

That loop is the key. I would compile it with production optimization flags and then look at the disassembly. The compiler may do all kinds of clever things like unroll the loop or simplify the loop condition. If I didn't like what I saw there I'd start tweaking the C code to help the compiler find a good optimization. If that proved impossible then I might use inline assembly to get what I want.

Assuming we can go as fast as possible (and the delays in the loop aren't there to satisfy setup and hold times at the receiver), I'd want to get that loop down to as few instructions as possible. Can you set BIT0 and the data bit at the same time, or does that create a hazard at the receiver? If you can, that would save an instruction or two. Lots of micro-optimizations rely on the specific instruction set. If the data has lots of 0x00 or 0xFF bytes, you could make special unrolled cases where the data bit doesn't change and BIT0 toggles 8 times. You could also make 16 unrolled cases for a single nybble and switch into that twice for each byte.
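
The special case for 0x00 and 0xFF bytes might look like the sketch below. It rests on assumptions that are not in the question: port_write() is a hypothetical host-side stand-in for a store to PORTC, base is the port value with CCLK and DOUT both low, a set bit drives DOUT high, and the receiver only needs DOUT to be stable around each clock pulse. The general path sets the data line, then pulses the clock, which is a slightly different write order from the original loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIT0 0x01u /* CCLK */
#define BIT7 0x80u /* DOUT (DIN) */

/* Hypothetical stand-in for the port register: each write is appended to a
   trace. On the target this function body would be `PORTC = v;`. */
static uint8_t port_trace[32];
static size_t port_trace_len = 0;

static void port_write(uint8_t v)
{
    assert(port_trace_len < sizeof port_trace);
    port_trace[port_trace_len++] = v;
}

/* Sends one byte, least significant bit first.
   base: port value with CCLK and DOUT low (assumption of this sketch). */
static void send_byte_fast(uint8_t value, uint8_t base)
{
    uint8_t mask;
    int i;

    if (value == 0x00u || value == 0xFFu) {
        /* Constant byte: drive DOUT once, then only toggle the clock. */
        const uint8_t level = (uint8_t)(value ? (base | BIT7) : base);
        port_write(level);
        for (i = 0; i < 8; i++) {
            port_write((uint8_t)(level | BIT0)); /* clock high */
            port_write(level);                   /* clock low  */
        }
        return;
    }

    /* General case: three writes per bit (data, clock high, clock low). */
    for (mask = 0x01u; mask != 0; mask = (uint8_t)(mask << 1)) {
        const uint8_t level = (uint8_t)((value & mask) ? (base | BIT7) : base);
        port_write(level);
        port_write((uint8_t)(level | BIT0));
        port_write(level);
    }
}
```

In this sketch a constant byte costs 17 port writes instead of 24. The gain only matters if the image really contains long runs of 0x00 or 0xFF, and the extra comparison slows down every other byte slightly, so it should be measured on real data.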

云巢 2024-11-15 01:41:45

/* Loops through bits in byte and writes to PORTC - DIN and CCLK  */
for(bitIter = 0x01; bitIter <= 0x80; bitIter = bitIter << 1)
{
    PORTC = portC | BIT0;
    (bitIter & byteIter) ? (PORTC = portC & ~BIT7) : (PORTC = portC | BIT7);
    PORTC = portC & ~BIT0;
}

To start with, this loop is broken. bitIter is a uchar (which I assume is an unsigned 8-bit character). Shifting it to the left eventually gives the value 0x80 for the intended final iteration. After the next shift it becomes 0, which still satisfies bitIter <= 0x80, so the loop never terminates.

On to efficiency. Depending on the architecture, the operation PORTC = PORTC | BIT0 might compile to a single bit-set instruction. However, it might also compile to a read, a bit set in a register, and a store.

As mentioned before, if possible, try to set the BIT0 and BIT7 at the same time (if the hardware permits this).

I would try something like:

bitIter = 0x01;
do
{
  if (byteIter & bitIter)
  {
    PORTC = BIT0;
  }
  else
  {
    PORTC = (BIT0 | BIT7);
  }
  PORTC = 0;

  bitIter <<= 1;
} while (bitIter != 0); /* exits after the 0x80 iteration, when the shift overflows to 0 */

Using a do ... while loop fixes the termination problem, and it also gets rid of the unnecessary loop test before the first iteration (unless your compiler has already optimized it away).

You could also try to unroll the loop by hand eight times, once for every bit.
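
The termination behaviour described above is easy to check on a host machine. The sketch below is only an illustration: count_do_while_iterations() and for_loop_still_running_after() are hypothetical helpers, and uint8_t stands in for uchar on the assumption that uchar is an unsigned 8-bit type.

```c
#include <stdint.h>

/* Counts the iterations of a do ... while loop that shifts an 8-bit mask
   left until it equals `stop`. */
static unsigned count_do_while_iterations(uint8_t stop)
{
    uint8_t bitIter = 0x01u;
    unsigned n = 0;
    do {
        n++; /* the loop body would send one bit here */
        bitIter = (uint8_t)(bitIter << 1);
    } while (bitIter != stop);
    return n;
}

/* Replays `for (bitIter = 0x01; bitIter <= 0x80; bitIter <<= 1)` for at most
   `limit` iterations and returns 1 if the loop condition is still true. */
static int for_loop_still_running_after(unsigned limit)
{
    uint8_t bitIter = 0x01u;
    unsigned n = 0;
    while (bitIter <= 0x80u && n < limit) {
        n++;
        bitIter = (uint8_t)(bitIter << 1);
    }
    return bitIter <= 0x80u;
}
```

Stopping when the mask reaches 0 gives eight iterations, one per bit. Stopping when it reaches 0x80 gives only seven, because the exit test runs before the 0x80 bit is sent. The for loop with <= 0x80 is still running after any number of iterations, because the mask wraps to 0 and 0 <= 0x80 holds.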

滿滿的愛 2024-11-15 01:41:45

I'm assuming that PORTC is in a known state when you enter this function: i.e. the Data and Clock lines are 0? (or Clock is low and Data is high?)

If that assumption is true you should be able to even avoid the conditionals in @6502's answer by first setting value = ~(*byteIter); then doing this 8 times:

 PORTC|=BIT0;PORTC|=(value<<7)&BIT7;PORTC&=~(BIT7|BIT0);value>>=1;

-or, if Bit7 starts high -

 PORTC|=(BIT7|BIT0);PORTC&=(~BIT7|(value<<7));PORTC&=~BIT0;value>>=1;

The advantage here is that it avoids the conditionals, which can play havoc with the speed of a heavily pipelined processor.
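
A minimal sketch of the first variant (data and clock lines low on entry) follows. The names PORTC_sim, port_or(), port_and() and send_byte_branchless() are hypothetical, and the register is simulated by an ordinary variable whose value is logged after every read-modify-write, so the sequence can be checked on a host. The answer repeats the statement eight times by hand; a counted loop is used here only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIT0 0x01u /* CCLK */
#define BIT7 0x80u /* DOUT (DIN) */

/* Hypothetical simulated port register and a trace of its value after each
   read-modify-write. On the target these would be `PORTC |= m;` and
   `PORTC &= m;`. */
static uint8_t PORTC_sim = 0;
static uint8_t port_trace[32];
static size_t port_trace_len = 0;

static void log_port(void)
{
    assert(port_trace_len < sizeof port_trace);
    port_trace[port_trace_len++] = PORTC_sim;
}

static void port_or(uint8_t m)  { PORTC_sim |= m; log_port(); }
static void port_and(uint8_t m) { PORTC_sim &= m; log_port(); }

/* Sends one byte, least significant bit first, without a data-dependent
   branch. Assumes CCLK and DOUT are both low on entry. The byte is inverted
   first, so a set bit in `byte` leaves DOUT low, as in the question's loop. */
static void send_byte_branchless(uint8_t byte)
{
    uint8_t value = (uint8_t)~byte;
    int i;

    for (i = 0; i < 8; i++) {
        port_or(BIT0);                            /* clock high            */
        port_or((uint8_t)((value << 7) & BIT7));  /* move bit 0 onto DOUT  */
        port_and((uint8_t)~(BIT7 | BIT0));        /* clock and data low    */
        value >>= 1;
    }
}
```

The only branch left is the loop counter, which does not depend on the data. Note that each |= or &= on a real port may compile to a read, a modify and a store, so this version can cost more bus accesses than plain stores of precomputed values.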

About the author

Hello爱情风
