Low memory throughput on embedded Linux (ARM)

Posted on 2024-08-04 17:23:30


I am using an ARM926EJ-S. In a memory copy test I get about 20% more memory throughput without Linux (running as a bare-metal "Getting Started" executable). Under Linux, the same code runs 20% slower.

The code is:

 
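/// Burst-mode memory copy test: copies iSize blocks of 8 words (32 bytes) from a to b.
void asmcpy(void *a, void *b, int iSize)
{
  do
  {
    asm volatile (
             "ldmia %0!, {r3-r10} \n\t"   /* load 8 words from a, post-increment a */
             "stmia %1!, {r3-r10} \n\t"   /* store 8 words to b, post-increment b */
             : "+r"(a), "+r"(b)
             :
             : "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "memory"
             );
  } while (--iSize > 0);   /* runs exactly iSize times for iSize >= 1 */
}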

I verified that no other process is taking CPU time on Linux. (I checked this with the time command: it shows that real time is the same as user time.)

What could be the problem under Linux?

Thanks and regards.

ADDED:

My test code is:

int main()
{
  int a[320 * 120], b[320 * 120];

 for(int i=0; i != 10000; i++)
 {
   /// Size is divided by 8 because our memcpy function performs 8 integer load stores in the iteration
   asmcpy(a, b, (320 * 120) / 8);
 }
}
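
For reference, here is a minimal sketch of how the copy could be timed from inside the program instead of with the time command, so that the bare-metal and Linux figures are computed the same way. It assumes a POSIX system with clock_gettime and a GCC-compatible compiler. The names now_seconds and copy_throughput_mbs are hypothetical, and memcpy stands in for asmcpy so the sketch builds on any target.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: current time in seconds on the monotonic clock. */
double now_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Copy `bytes` from src to dst `reps` times and return the throughput
 * in MB/s (1 MB = 1e6 bytes). memcpy is a stand-in for asmcpy here. */
double copy_throughput_mbs(void *dst, const void *src, size_t bytes, int reps)
{
    double start = now_seconds();
    for (int i = 0; i < reps; i++) {
        memcpy(dst, src, bytes);
        /* Compiler barrier so repeated copies are not optimized away. */
        __asm__ volatile ("" : : "r"(dst) : "memory");
    }
    double elapsed = now_seconds() - start;
    if (elapsed <= 0.0)
        return 0.0;   /* clock resolution too coarse for this run */
    return ((double)bytes * (double)reps) / elapsed / 1e6;
}
```

Calling copy_throughput_mbs(b, a, sizeof a, 10000) with the arrays from main above would give one number per environment to compare. Replacing the memcpy call with asmcpy(a, b, bytes / 32) would measure the original loop.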

The Getting Started executable is a bin file that is sent to RAM over the serial port and executed directly by jumping to its address in RAM (no OS needed).

ADDED:

I haven't seen such a performance difference on other processors. They were using SDRAM; this processor is using DDR RAM. Could that be the reason?

ADDED:
The data cache is not enabled in the Getting Started code, whereas it is enabled under Linux. Ideally all data should then be cached and accessed without any RAM latency, yet Linux is still 20% slower.

ADDED:
My microcontroller is the LPC3250. Both tests were run on the same external DDR RAM.


青萝楚歌 2024-08-11 17:23:30

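That chip has an MMU, so Linux is most likely using it to manage memory. Simply enabling it may carry some performance cost. In addition, Linux allocates memory lazily: a page is assigned to a process only when the process first accesses it. If you are copying a large block of memory, the MMU generates page faults inside your loop, asking the kernel to allocate a page each time. On a low-end processor, all these context switches cause cache flushes and a noticeable slowdown.

If your system is small enough, try an MMU-less version of Linux (such as uClinux). It might let you use a cheaper chip with similar performance. On embedded systems, every penny counts.

Update: some extra details.

Every Linux process has its own memory mappings, which at first include only the kernel and (possibly) the executable code. The rest of the linear 4GB address space (on 32-bit) appears to be available, but no RAM pages are assigned to it. As soon as you read or write an unallocated address, the MMU signals a page fault and control switches to the kernel. The kernel sees that it still has plenty of free RAM pages, picks one, maps it at the faulting address, and returns to your code, which then completes the interrupted instruction. The next access will not fault, because the whole page (typically 4KB) is now assigned; but a few iterations later the loop reaches another unassigned page, and the MMU invokes the kernel again.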
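
The lazy allocation described above can be observed from user space. The following is a minimal sketch, assuming Linux with getrusage and anonymous mmap. The function names fresh_buffer, minor_faults and faults_while_touching are hypothetical. It counts the minor page faults taken while touching each page of a buffer, so the first pass over fresh memory can be compared with a second pass over the same, already mapped, memory.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Map len bytes of fresh anonymous memory. The pages are not backed
 * by RAM until they are first touched. Returns NULL on failure. */
unsigned char *fresh_buffer(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (unsigned char *)p;
}

/* Minor page faults taken by this process so far. */
long minor_faults(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;
    return ru.ru_minflt;
}

/* Write one byte in every page of buf and return how many minor
 * page faults occurred while doing so. */
long faults_while_touching(volatile unsigned char *buf, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    long before = minor_faults();
    for (size_t off = 0; off < len; off += (size_t)page)
        buf[off] = 1;
    return minor_faults() - before;
}
```

Calling faults_while_touching twice on the same fresh buffer would typically show roughly one fault per page on the first pass and close to none on the second. If the benchmark buffers are touched once (for example with memset) before the timed loop starts, these first-touch faults are kept out of the measurement, which makes it possible to check how much of the slowdown they actually account for.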
