Time cost of memory copy in REE vs. QSEE
First, here is the test code:
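#include <stddef.h>
#include <stdint.h>
/* ftk_ta_malloc, ftk_ta_free, ftk_ta_get_uptime, LOGD and LOGE come from the TA SDK headers (not shown). */

#define DATA_TYPE float
#define _1KB (1024)

/* Copies elemCnt elements from pSrc to pDst (despite the name, nothing is swapped). */
static inline __attribute__((__always_inline__)) void swap_data_value(DATA_TYPE* pSrc, DATA_TYPE* pDst, uint32_t elemCnt)
{
    for (uint32_t i = 0; i < elemCnt; ++i) {
        pDst[i] = pSrc[i];
    }
}

void test_func(void)
{
    const int DATA_NUM = _1KB * _1KB;   /* elements per segment */
    uint32_t calc_len = 64;             /* elements copied per call */
    int loop_cnt = DATA_NUM / calc_len;
    if ((DATA_NUM % calc_len) != 0) {
        LOGE("loop_cnt not match calc_len");
    }
    for (int k = 0; k < 666; ++k) {
        /* Two segments in one buffer; the destination starts k * 1024 elements in, so it overlaps the source. */
        DATA_TYPE* pData = (DATA_TYPE*)ftk_ta_malloc(2 * DATA_NUM * sizeof(DATA_TYPE));
        if (pData == NULL) {
            LOGE("ftk_ta_malloc failed");
            return;
        }
        for (int i = 0; i < DATA_NUM * 2; ++i) {
            pData[i] = (DATA_TYPE)i;
        }
        DATA_TYPE* pSeg1 = pData;
        DATA_TYPE* pSeg2 = pData + k * 1024;
        ftk_millisecond_t t0 = ftk_ta_get_uptime();
        for (int j = 0; j < 400; ++j) {
            DATA_TYPE* p1 = pSeg1;
            DATA_TYPE* p2 = pSeg2;
            for (int i = 0; i < loop_cnt; i++) {
                swap_data_value(p1, p2, calc_len);
                p1 += calc_len;
                p2 += calc_len;
            }
        }
        t0 = ftk_ta_get_uptime() - t0;
        /* Average time of one pass over DATA_NUM elements (400 passes timed). */
        LOGD("swap_data_value[%d: %ux%d]: %0.4f ms", k, calc_len, loop_cnt, t0 / 400.0f);
        ftk_ta_free(pData);
    }
}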
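The ftk_* helpers above are specific to the TA SDK. As a point of comparison outside QSEE, here is a minimal sketch of the same access pattern for a hosted POSIX system, assuming `clock_gettime(CLOCK_MONOTONIC)` is available and substituting `malloc`/`free` for `ftk_ta_malloc`/`ftk_ta_free`. The names `copy_floats`, `now_ms`, `g_sink` and `bench_copy_pass_ms` are hypothetical. The timer used here does not depend on the resolution of `ftk_ta_get_uptime()`.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Assumption: POSIX clock_gettime; malloc/free stand in for ftk_ta_malloc/ftk_ta_free. */

/* Sink so the optimizer cannot treat the copies as dead stores. */
static volatile float g_sink;

/* Same element-by-element copy as swap_data_value. */
static void copy_floats(const float *src, float *dst, uint32_t n)
{
    for (uint32_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

/* Monotonic time in milliseconds with sub-millisecond resolution. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}

/*
 * Average time in ms of one pass that copies data_num floats from the start of a
 * 2 * data_num buffer to offset k * 1024, in chunks of calc_len elements.
 * Returns a negative value on bad parameters or allocation failure.
 */
double bench_copy_pass_ms(uint32_t data_num, uint32_t calc_len, uint32_t k, uint32_t reps)
{
    if (data_num == 0 || calc_len == 0 || reps == 0 ||
        data_num % calc_len != 0 || (size_t)k * 1024 > data_num) {
        return -1.0;
    }
    float *data = malloc(2 * (size_t)data_num * sizeof *data);
    if (data == NULL) {
        return -1.0;
    }
    for (size_t i = 0; i < 2 * (size_t)data_num; ++i) {
        data[i] = (float)i;
    }

    const float *seg1 = data;
    float *seg2 = data + (size_t)k * 1024;
    double t0 = now_ms();
    for (uint32_t j = 0; j < reps; ++j) {
        const float *p1 = seg1;
        float *p2 = seg2;
        for (uint32_t i = 0; i < data_num / calc_len; ++i) {
            copy_floats(p1, p2, calc_len);
            p1 += calc_len;
            p2 += calc_len;
        }
    }
    double elapsed = now_ms() - t0;

    g_sink = seg2[data_num - 1];
    free(data);
    return elapsed / reps;
}
```

For example, `bench_copy_pass_ms(1024 * 1024, 64, k, 400)` corresponds to one iteration of the outer `k` loop in `test_func`.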
I ran this code on the SDM865 platform and saw a huge performance difference between the REE and QSEE (Qualcomm's TrustZone implementation).
In the REE, each pass consistently takes 0.1325 to 0.1375 ms.
In QSEE, however, it takes anywhere from 0.7275 to 10.37 ms, increasing erratically.
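To put these numbers in terms of throughput: each pass copies DATA_NUM = 1024 × 1024 elements, which is 4 MiB assuming a 4-byte `float`. The small helper below (hypothetical name `copy_rate_gbps`) converts a per-pass time into bytes written per second, counting 1 GB as 10^9 bytes and ignoring the matching 4 MiB of reads. With the figures above this gives roughly 31.7 GB/s in the REE, versus about 5.8 GB/s down to 0.4 GB/s in QSEE. Every reported value is also a multiple of 1/400 ms (for example 0.1325 ms = 53/400 ms), which is consistent with `ftk_ta_get_uptime()` counting whole milliseconds; if so, the REE figure rests on a total of only 53 to 55 ms for 400 passes.

```c
#include <stdint.h>

/* Bytes written by one pass: 1024 * 1024 floats (4 MiB, assuming a 4-byte float). */
#define BYTES_PER_PASS ((uint64_t)1024 * 1024 * sizeof(float))

/* Effective copy rate in GB/s (1 GB = 1e9 bytes) for `bytes` copied in `ms` milliseconds. */
double copy_rate_gbps(uint64_t bytes, double ms)
{
    return (double)bytes / (ms * 1e-3) / 1e9;
}
```

For example, `copy_rate_gbps(BYTES_PER_PASS, 0.1325)` evaluates to about 31.66.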
I suspect this is caused by some cache limitation. However, I can't get the cache information in QSEE: the following code crashes the TA (it exits immediately):
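uint64_t ctr_el0 = 0;
/* Read the Cache Type Register. At EL0 this traps to EL1 unless SCTLR_EL1.UCT is set, which may be why the TA exits. */
asm volatile("mrs %0, CTR_EL0" : "=r"(ctr_el0) : );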
In the REE, I get a cache line size of 64 bytes.
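For reference, a 64-byte figure corresponds to the DminLine field of CTR_EL0 (bits [19:16]), which holds log2 of the smallest data cache line size in 4-byte words; IminLine (bits [3:0]) encodes the smallest instruction cache line the same way. A minimal decoding sketch, assuming the raw register value was obtained in the REE; the function names are hypothetical.

```c
#include <stdint.h>

/* Smallest data cache line in bytes: CTR_EL0.DminLine (bits [19:16]) is log2(words), 4 bytes per word. */
uint32_t ctr_dcache_line_bytes(uint64_t ctr)
{
    return 4u << ((ctr >> 16) & 0xF);
}

/* Smallest instruction cache line in bytes: CTR_EL0.IminLine (bits [3:0]), same encoding. */
uint32_t ctr_icache_line_bytes(uint64_t ctr)
{
    return 4u << (ctr & 0xF);
}
```

A DminLine value of 4 decodes to 4 << 4 = 64 bytes.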
So, is this problem caused by QSEE (TrustZone) limiting the cache size or cache access performance?