-
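Overview
In many applications, code execution speed is critical, for example in medical, surveillance, motor-control and other end equipment with strict timing requirements. The drawback of running from internal flash is that flash accesses require wait states, which slows program execution.
The solution presented here lets the designer copy compiler-initialized sections from flash to RAM at run time to obtain maximum execution speed. Code execution goes from as many as 15 wait states down to 0 wait states. An alternative solution is to copy only selected functions from flash to RAM. The conventional program startup routine is as follows: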
-
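CMD file and memory map
CMD stands for linker command file. It deals with two kinds of memory, ROM/flash and RAM: by writing a CMD file, the user manages and allocates all physical memory and address space in the system. The C28388 memory map is as follows:
A CMD file covers two things:
1) The user declares the memory resources of the system. Every memory and address range, whether on the DSP chip or external, must be declared one by one: which memories exist, where they are located and how large they are.
2) The user declares how those resources are allocated. This is the main work of writing a CMD file.
MEMORY and SECTIONS are the core directives of a CMD file. The MEMORY directive describes the memory regions; the SECTIONS directive assigns sections to those regions, that is, it fixes the physical memory each section occupies.
MEMORY directive. Purpose: describe the memory regions.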
MEMORY
{
/* BEGIN is used for the "boot to Flash" bootloader mode */
BEGIN : origin = 0x080000, length = 0x000002
BOOT_RSVD : origin = 0x000002, length = 0x0001AE /* Part of M0, BOOT rom will use this for stack */
RAMM0 : origin = 0x0001B1, length = 0x00024F
RAMM1 : origin = 0x000400, length = 0x0003F8 /* on-chip RAM block M1 */
/* RAMM1_RSVD : origin = 0x0007F8, length = 0x000008 */ /* Reserve and do not use for code as per the errata advisory "Memory: Prefetching Beyond Valid Memory" */
RAMD0 : origin = 0x00C000, length = 0x000800
RAMD1 : origin = 0x00C800, length = 0x000800
RAMLS0 : origin = 0x008000, length = 0x000800 /* LS0~LS7 are mainly reserved for CLA access; the main program should avoid occupying them */
RAMLS1 : origin = 0x008800, length = 0x000800
RAMLS2 : origin = 0x009000, length = 0x000800
RAMLS3 : origin = 0x009800, length = 0x000800
RAMLS4 : origin = 0x00A000, length = 0x000800
RAMLS5 : origin = 0x00A800, length = 0x000800
RAMLS6 : origin = 0x00B000, length = 0x000800
RAMLS7 : origin = 0x00B800, length = 0x000800
RAMGS0_1 : origin = 0x00D000, length = 0x002000
// RAMGS1 : origin = 0x00E000, length = 0x001000
RAMGS2_9 : origin = 0x00F000, length = 0x008000
/* RAMGS3 : origin = 0x010000, length = 0x001000
RAMGS4 : origin = 0x011000, length = 0x001000
RAMGS5 : origin = 0x012000, length = 0x001000
RAMGS6 : origin = 0x013000, length = 0x001000
RAMGS7 : origin = 0x014000, length = 0x001000
RAMGS8 : origin = 0x015000, length = 0x001000
RAMGS9 : origin = 0x016000, length = 0x001000*/
RAMGS10 : origin = 0x017000, length = 0x001000
RAMGS11 : origin = 0x018000, length = 0x001000
RAMGS12 : origin = 0x019000, length = 0x001000
RAMGS13 : origin = 0x01A000, length = 0x001000
RAMGS14 : origin = 0x01B000, length = 0x001000
RAMGS15 : origin = 0x01C000, length = 0x000FF8
/* RAMGS15_RSVD : origin = 0x01CFF8, length = 0x000008 */ /* Reserve and do not use for code as per the errata advisory "Memory: Prefetching Beyond Valid Memory" */
/* Flash sectors */
FLASH0 : origin = 0x080002, length = 0x001FFE /* on-chip Flash */
FLASH1 : origin = 0x082000, length = 0x002000 /* on-chip Flash */
FLASH2 : origin = 0x084000, length = 0x002000 /* on-chip Flash */
FLASH3 : origin = 0x086000, length = 0x002000 /* on-chip Flash */
FLASH4 : origin = 0x088000, length = 0x008000 /* on-chip Flash */
FLASH5 : origin = 0x090000, length = 0x008000 /* on-chip Flash */
FLASH6 : origin = 0x098000, length = 0x008000 /* on-chip Flash */
FLASH7 : origin = 0x0A0000, length = 0x008000 /* on-chip Flash */
FLASH8 : origin = 0x0A8000, length = 0x008000 /* on-chip Flash */
FLASH9 : origin = 0x0B0000, length = 0x008000 /* on-chip Flash */
FLASH10 : origin = 0x0B8000, length = 0x002000 /* on-chip Flash */
FLASH11 : origin = 0x0BA000, length = 0x002000 /* on-chip Flash */
FLASH12 : origin = 0x0BC000, length = 0x002000 /* on-chip Flash */
FLASH13 : origin = 0x0BE000, length = 0x001FF0 /* on-chip Flash */
/* FLASH13_RSVD : origin = 0x0BFFF0, length = 0x000010 */ /* Reserve and do not use for code as per the errata advisory "Memory: Prefetching Beyond Valid Memory" */
CPU1TOCPU2RAM : origin = 0x03A000, length = 0x000800
CPU2TOCPU1RAM : origin = 0x03B000, length = 0x000800
CPUTOCMRAM : origin = 0x039000, length = 0x000200
CMTOCPURAM : origin = 0x038000, length = 0x000200
CPUTOCMRAM_ECAT : origin = 0x039200, length = 0x000200
CMTOCPURAM_ECAT : origin = 0x038200, length = 0x000200
CANA_MSG_RAM : origin = 0x049000, length = 0x000800
CANB_MSG_RAM : origin = 0x04B000, length = 0x000800
RESET : origin = 0x3FFFC0, length = 0x000002
CLA1_MSGRAMLOW : origin = 0x001480, length = 0x000080
CLA1_MSGRAMHIGH : origin = 0x001500, length = 0x000080
FPU32_FAST_TABLES : origin = 0x3F6946, length = 0x00081a
FPU64_FAST_TABLES : origin = 0x3F7160, length = 0x000D30
}
The first column (for example BEGIN or RAMLS0) is the name of the memory region; origin gives the start address of the region; length gives its size.
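The origin/length pairs can be sanity-checked outside the linker. The following is a minimal host-side sketch, not part of the original project: the type `mem_region_t` and the helper names are hypothetical, and the table holds only a few entries copied from the MEMORY block above. It computes the last address of each region and verifies that no two regions overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One MEMORY entry: name, origin and length (hypothetical helper type). */
typedef struct {
    const char *name;
    uint32_t    origin;
    uint32_t    length;
} mem_region_t;

/* A few entries copied from the MEMORY block above. */
static const mem_region_t regions[] = {
    { "BEGIN",    0x080000u, 0x000002u },
    { "FLASH0",   0x080002u, 0x001FFEu },
    { "FLASH1",   0x082000u, 0x002000u },
    { "RAMGS0_1", 0x00D000u, 0x002000u },
    { "RAMGS2_9", 0x00F000u, 0x008000u },
    { "RAMGS10",  0x017000u, 0x001000u },
};
#define REGION_COUNT (sizeof regions / sizeof regions[0])

/* Last address that still belongs to the region (inclusive). */
static uint32_t region_last(const mem_region_t *r)
{
    assert(r->length > 0u);
    return r->origin + r->length - 1u;
}

/* Two regions overlap when each one starts before the other one ends. */
static bool regions_overlap(const mem_region_t *a, const mem_region_t *b)
{
    return a->origin <= region_last(b) && b->origin <= region_last(a);
}

/* Returns true when no two entries of the table share an address. */
static bool table_is_disjoint(const mem_region_t *tab, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            if (regions_overlap(&tab[i], &tab[j])) {
                return false;
            }
        }
    }
    return true;
}
```

With these values, BEGIN ends at 0x080001 and FLASH0 starts at 0x080002, which is why FLASH0 is two words shorter than the other 0x2000-word sectors.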
SECTIONS directive. Purpose: assign sections to memory regions, that is, fix the physical memory each section occupies.
SECTIONS
{
codestart : > BEGIN, ALIGN(8)
wddisable : > FLASH0/* Used by file CodeStartBranch.asm */
copysections : > FLASH0/* Used by file SectionCopy.asm */
// program code
.text : LOAD = FLASH5, /* can be ROM */
RUN = RAMGS2_9, /* must be CSM secured RAM */
LOAD_START(_text_loadstart),
RUN_START(_text_runstart),
LOAD_SIZE(_text_size)
// initialization data
.cinit : LOAD = FLASH0, /* can be ROM */
RUN = RAMGS0_1, /* must be CSM secured RAM */
LOAD_START(_cinit_loadstart),
RUN_START(_cinit_runstart),
LOAD_SIZE(_cinit_size)
// switch tables
.switch : LOAD = FLASH0, /* can be ROM */
RUN = RAMGS0_1, /* must be CSM secured RAM */
LOAD_START(_switch_loadstart),
RUN_START(_switch_runstart),
LOAD_SIZE(_switch_size)
.reset : > RESET, TYPE = DSECT /* not used, */
.stack : > RAMM1 // stack; must reside in the low 64K words of data memory (the SP is 16 bits wide)
#if defined(__TI_EABI__)
.init_array : > FLASH0, ALIGN(8) // table of C++ constructors called at startup
.bss : > RAMLS6 | RAMLS7 // global and static variables
.bss:output : > RAMLS6 | RAMLS7
.bss:cio : > RAMLS6 | RAMLS7
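.data : > RAMLS6 | RAMLS7 // data produced while the program runs
.sysmem : > RAMLS6 | RAMLS7 // section used by malloc
/* In EABI mode .const is handled like .cinit here: the initialized section is copied to RAM after power-up */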
.const : LOAD = FLASH0, /* can be ROM */
RUN = RAMLS6 | RAMLS7, /* must be CSM secured RAM */
LOAD_START(_const_loadstart),
RUN_START(_const_runstart),
LOAD_SIZE(_const_size)
#else
.pinit : > FLASH0, ALIGN(8) // table of constructors called at startup
.ebss : > RAMLS6 | RAMLS7 // space for global and static variables in the large memory model; at startup, data from .cinit is copied into .ebss
.esysmem : > RAMLS6 | RAMLS7
.cio : > RAMLS6 | RAMLS7 // section holding the buffers used by printf and other I/O functions
/* Initialized sections go in Flash */
.econst : >> FLASH0, ALIGN(8)
#endif
MSGRAM_CPU1_TO_CPU2 : > CPU1TOCPU2RAM, type=NOINIT
MSGRAM_CPU2_TO_CPU1 : > CPU2TOCPU1RAM, type=NOINIT
MSGRAM_CPU_TO_CM : > CPUTOCMRAM, type=NOINIT
MSGRAM_CM_TO_CPU : > CMTOCPURAM, type=NOINIT
MSGRAM_CPU_TO_CM_ECAT > CPUTOCMRAM_ECAT, type=NOINIT
MSGRAM_CM_TO_CPU_ECAT > CMTOCPURAM_ECAT, type=NOINIT
dclfuncs : > FLASH0, ALIGN(8)
/* CLA specific sections */
#if defined(__TI_EABI__)
Cla1Prog : LOAD = FLASH4,
RUN = RAMLS4 | RAMLS5,
LOAD_START(Cla1funcsLoadStart),
LOAD_END(Cla1funcsLoadEnd),
RUN_START(Cla1funcsRunStart),
LOAD_SIZE(Cla1funcsLoadSize),
ALIGN(8)
#else
Cla1Prog : LOAD = FLASH4,
RUN = RAMLS4 | RAMLS5,
LOAD_START(_Cla1funcsLoadStart),
LOAD_END(_Cla1funcsLoadEnd),
RUN_START(_Cla1funcsRunStart),
LOAD_SIZE(_Cla1funcsLoadSize),
PAGE = 0, ALIGN(8)
#endif
ClaData : > RAMLS3
Cla1ToCpuMsgRAM : > CLA1_MSGRAMLOW, type=NOINIT
CpuToCla1MsgRAM : > CLA1_MSGRAMHIGH, type=NOINIT
/* SFRA specific sections */
SFRA_F32_Data : > RAMGS10, ALIGN = 64
sfra_data : > RAMGS10
#ifdef CLA_C
/* CLA C compiler sections */
//
// Must be allocated to memory the CLA has write access to
//
CLAscratch :
{ *.obj(CLAscratch)
. += CLA_SCRATCHPAD_SIZE;
*.obj(CLAscratch_end) } > RAMLS2
.scratchpad : > RAMLS2
.bss_cla : > RAMLS2
#if defined(__TI_EABI__)
.const_cla : LOAD = FLASH2,
RUN = RAMLS2,
RUN_START(Cla1ConstRunStart),
LOAD_START(Cla1ConstLoadStart),
LOAD_SIZE(Cla1ConstLoadSize),
ALIGN(8)
#else
.const_cla : LOAD = FLASH2,
RUN = RAMLS2,
RUN_START(Cla1ConstRunStart),
LOAD_START(Cla1ConstLoadStart),
LOAD_SIZE(Cla1ConstLoadSize),
ALIGN(8)
#endif
#endif //CLA_C
#if defined(__TI_EABI__)
.TI.ramfunc : {
} LOAD = FLASH0,
RUN = RAMGS0_1,
LOAD_START(RamfuncsLoadStart),
LOAD_SIZE(RamfuncsLoadSize),
LOAD_END(RamfuncsLoadEnd),
RUN_START(RamfuncsRunStart),
RUN_SIZE(RamfuncsRunSize),
RUN_END(RamfuncsRunEnd),
ALIGN(8)
#else
.TI.ramfunc : {
} LOAD = FLASH0,
RUN = RAMGS0_1,
LOAD_START(_RamfuncsLoadStart),
LOAD_SIZE(_RamfuncsLoadSize),
LOAD_END(_RamfuncsLoadEnd),
RUN_START(_RamfuncsRunStart),
RUN_SIZE(_RamfuncsRunSize),
RUN_END(_RamfuncsRunEnd),
PAGE = 0, ALIGN(8)
#endif
/* Allocate FPU math areas: */
FPUmathTables : > FPU32_FAST_TABLES, TYPE = NOLOAD
}
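Initialized sections
.text Instruction code generated from main(), subroutines and interrupt service routines is placed in this section by default;
.cinit Initial values of initialized global and static variables;
.const In EABI mode it is handled like .cinit here: as an initialized section it is copied to RAM after power-up;
.econst String constants, and global and static variables declared far const;
.pinit Table of constructors called at startup; under EABI the .init_array section plays this role;
.switch Constant tables generated for switch statements;
Uninitialized sections
.bss Space reserved for global and static variables; at startup, data from .cinit is copied into .bss;
.ebss Space reserved for global and static variables in the large memory model; at startup, data from .cinit is copied into .ebss;
.stack Stack space, mainly used to pass function arguments and to allocate local variables;
.sysmem Space reserved for dynamic allocation (malloc); it is used only if the malloc functions are called;
.esysmem Space reserved for dynamic allocation (far malloc); it is used only if the far allocation functions are called.
The CLA sections must be placed in RAMLS0~RAMLS7, and the allocation in the CMD file must match the configuration in the code. (This is the key point!)
CLA memory allocation
Code that configures and initializes the CLA memory: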
void configureCLA()
{
EALLOW;
#ifdef _FLASH
//
// Copy CLA code from its load address (FLASH) to CLA program RAM
//
// Note: during debug the load and run addresses can be
// the same as Code Composer Studio can load the CLA program
// RAM directly.
//
// The Cla1funcsLoadStart, Cla1funcsLoadSize, and Cla1funcsRunStart
// symbols are created by the linker.
//
memcpy((uint32_t*)&Cla1funcsRunStart, (uint32_t*)&Cla1funcsLoadStart, (uint32_t)&Cla1funcsLoadSize);
memcpy((uint32_t*)&Cla1ConstRunStart, (uint32_t*)&Cla1ConstLoadStart, (uint32_t)&Cla1ConstLoadSize);
#endif //_FLASH
// Initialize and wait for CLA1ToCPUMsgRAM
MemCfg_initSections(MEMCFG_SECT_MSGCLA1TOCPU);
while (MemCfg_getInitStatus(MEMCFG_SECT_MSGCLA1TOCPU) != 1)
;
// Initialize and wait for CPUToCLA1MsgRAM
MemCfg_initSections(MEMCFG_SECT_MSGCPUTOCLA1);
while (MemCfg_getInitStatus(MEMCFG_SECT_MSGCPUTOCLA1) != 1)
;
// Select LS4RAM and LS5RAM to be the program space for the CLA
// First configure the CLA to be the master for each block and then
// set the block to be a program block
MemCfg_setLSRAMMasterSel(MEMCFG_SECT_LS4, MEMCFG_LSRAMMASTER_CPU_CLA1);
MemCfg_setCLAMemType(MEMCFG_SECT_LS4, MEMCFG_CLA_MEM_PROGRAM);
MemCfg_setLSRAMMasterSel(MEMCFG_SECT_LS5, MEMCFG_LSRAMMASTER_CPU_CLA1);
MemCfg_setCLAMemType(MEMCFG_SECT_LS5, MEMCFG_CLA_MEM_PROGRAM);
// Next configure LS2RAM and LS3RAM as data spaces for the CLA
// First configure the CLA to be the master and then
// set the spaces to be data blocks
MemCfg_setLSRAMMasterSel(MEMCFG_SECT_LS2, MEMCFG_LSRAMMASTER_CPU_CLA1);
MemCfg_setCLAMemType(MEMCFG_SECT_LS2, MEMCFG_CLA_MEM_DATA);
MemCfg_setLSRAMMasterSel(MEMCFG_SECT_LS3, MEMCFG_LSRAMMASTER_CPU_CLA1);
MemCfg_setCLAMemType(MEMCFG_SECT_LS3, MEMCFG_CLA_MEM_DATA);
// Compute all CLA task vectors
// On Type-1 CLAs the MVECT registers accept full 16-bit task addresses as
// opposed to offsets used on older Type-0 CLAs
#pragma diag_suppress = 770
CLA_mapTaskVector(CLA1_BASE, CLA_MVECT_1, (uint16_t)(&Cla1Task1));
CLA_mapTaskVector(CLA1_BASE, CLA_MVECT_2, (uint16_t)(&Cla1Task2));
CLA_mapTaskVector(CLA1_BASE, CLA_MVECT_3, (uint16_t)(&Cla1Task3));
CLA_mapTaskVector(CLA1_BASE, CLA_MVECT_4, (uint16_t)(&Cla1Task4));
CLA_mapTaskVector(CLA1_BASE, CLA_MVECT_5, (uint16_t)(&Cla1Task5));
CLA_mapTaskVector(CLA1_BASE, CLA_MVECT_6, (uint16_t)(&Cla1Task6));
CLA_mapTaskVector(CLA1_BASE, CLA_MVECT_7, (uint16_t)(&Cla1Task7));
CLA_mapTaskVector(CLA1_BASE, CLA_MVECT_8, (uint16_t)(&Cla1Task8));
#pragma diag_warning = 770
// Enable the IACK instruction to start a task on CLA in software
// for all 8 CLA tasks. Also, globally enable all 8 tasks (or a
// subset of tasks) by writing to their respective bits in the
// MIER register
CLA_enableIACK(CLA1_BASE);
CLA_enableTasks(CLA1_BASE, CLA_TASKFLAG_ALL);
EDIS;
return;
}
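The requirement that the CMD allocation and the code agree can be made explicit with a small table. The sketch below is a host-side illustration only: `ls_role_t`, `cmd_roles`, `code_roles` and `first_mismatch` are hypothetical names, not driverlib APIs. The first table records what the SECTIONS block places in each LS block (Cla1Prog in LS4/LS5, ClaData in LS3, the CLA scratchpad and constants in LS2), and the second records what the MemCfg calls in configureCLA() set.

```c
#include <assert.h>
#include <stddef.h>

#define LS_BLOCKS 8

/* Role of one LSx RAM block (hypothetical helper type). */
typedef enum { LS_CPU_ONLY = 0, LS_CLA_PROGRAM, LS_CLA_DATA } ls_role_t;

/* Roles implied by the SECTIONS block of the CMD file.
 * Unlisted blocks default to LS_CPU_ONLY (value 0). */
static const ls_role_t cmd_roles[LS_BLOCKS] = {
    [2] = LS_CLA_DATA,      /* CLAscratch, .scratchpad, .bss_cla, .const_cla */
    [3] = LS_CLA_DATA,      /* ClaData */
    [4] = LS_CLA_PROGRAM,   /* Cla1Prog run address */
    [5] = LS_CLA_PROGRAM,   /* Cla1Prog run address */
};

/* Roles set by the MemCfg_setCLAMemType() calls in configureCLA(). */
static const ls_role_t code_roles[LS_BLOCKS] = {
    [2] = LS_CLA_DATA,
    [3] = LS_CLA_DATA,
    [4] = LS_CLA_PROGRAM,
    [5] = LS_CLA_PROGRAM,
};

/* Returns the index of the first LS block whose roles differ, or -1. */
static int first_mismatch(const ls_role_t *cmd, const ls_role_t *code)
{
    assert(cmd != NULL && code != NULL);
    for (int i = 0; i < LS_BLOCKS; ++i) {
        if (cmd[i] != code[i]) {
            return i;
        }
    }
    return -1;
}
```

If, for example, Cla1Prog were moved to another LS block in the CMD file without updating configureCLA(), the two tables would differ at that index.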
-
Copying the program from flash to RAM and running it there
-
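Copying selected functions to RAM
First, space has to be set aside in the CMD file: a flash region that stores the functions and a RAM region they run from. The listing below is the default allocation of the original project and needs no changes.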
#if defined(__TI_EABI__)
.TI.ramfunc : {
} LOAD = FLASH0,
RUN = RAMGS0_1,
LOAD_START(RamfuncsLoadStart),
LOAD_SIZE(RamfuncsLoadSize),
LOAD_END(RamfuncsLoadEnd),
RUN_START(RamfuncsRunStart),
RUN_SIZE(RamfuncsRunSize),
RUN_END(RamfuncsRunEnd),
ALIGN(8)
#else
.TI.ramfunc : {
} LOAD = FLASH0,
RUN = RAMGS0_1,
LOAD_START(_RamfuncsLoadStart),
LOAD_SIZE(_RamfuncsLoadSize),
LOAD_END(_RamfuncsLoadEnd),
RUN_START(_RamfuncsRunStart),
RUN_SIZE(_RamfuncsRunSize),
RUN_END(_RamfuncsRunEnd),
PAGE = 0, ALIGN(8)
#endif
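With the addresses assigned, the next step is to mark the functions to be relocated. Specifically, just add the following before the function definition:
__attribute__((ramfunc)) // attribute to add
void FCL_runPICtrl(void) // function to copy to RAM, using the PI controller routine as an example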
{
}
or
#pragma CODE_SECTION(motorControlISR, ".TI.ramfunc")
__interrupt void motorControlISR(void)
{
}
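Marking a function only assigns it to the .TI.ramfunc section; the section contents still have to be copied from the load address to the run address before the function is first called. In driverlib-based projects this copy is typically done with memcpy in the device initialization code, which is an assumption about the surrounding project rather than something shown above. The sketch below is a host-side stand-in: `copy_ramfuncs` is a hypothetical helper, and ordinary arrays replace the linker symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * On the target, the symbols come from the .TI.ramfunc entry in the CMD
 * file (EABI names shown) and the copy is commonly written as:
 *
 *   extern uint16_t RamfuncsLoadStart, RamfuncsLoadSize, RamfuncsRunStart;
 *   memcpy(&RamfuncsRunStart, &RamfuncsLoadStart, (size_t)&RamfuncsLoadSize);
 *
 * The linker encodes the size as the *address* of RamfuncsLoadSize,
 * which is why the address is cast to size_t.
 */

/* Host-side stand-in (hypothetical helper): copy size_words 16-bit words. */
static void copy_ramfuncs(uint16_t *run_start,
                          const uint16_t *load_start,
                          size_t size_words)
{
    assert(run_start != NULL && load_start != NULL);
    /* sizeof(uint16_t) is 1 on C28x and 2 on a typical host. */
    memcpy(run_start, load_start, size_words * sizeof(uint16_t));
}
```

The copy has to run before any function in .TI.ramfunc is called, including interrupt service routines such as motorControlISR.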
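Copying all functions to RAM
The standard software flow is code_start->wd_disable->c_int00->main(). The flow used here differs from it only by the added call to the section-copy routine: code_start->wd_disable->copy_sections->c_int00->main().
The code for code_start and wd_disable is provided by DSP28xxx_CodeStartBranch.asm. After power-up, code_start executes normally because it is placed at the flash boot entry address 0x00080000 (the first flash address). See the TI document "Copying Compiler Sections From Flash to RAM on the TMS320F28xxx DSCs".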
WD_DISABLE .set 1 ;set to 1 to disable WD, else set to 0
.ref copy_sections
.global code_start
***********************************************************************
* Function: codestart section
*
* Description: Branch to code starting point
***********************************************************************
.sect "codestart"
code_start:
.if WD_DISABLE == 1
LB wd_disable ;Branch to watchdog disable code
.else
LB copy_sections ;Branch to copy_sections
.endif
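copy_sections replaces the _c_int00 branch target of the original CodeStartBranch.asm. This branch in code_start is taken only when WD_DISABLE is 0. In the code above, WD_DISABLE is set to 1, so wd_disable runs. The wd_disable code is as follows: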
***********************************************************************
* Function: wd_disable
*
* Description: Disables the watchdog timer
***********************************************************************
.if WD_DISABLE == 1
.sect "wddisable"
wd_disable:
SETC OBJMODE ;Set OBJMODE for 28x object code
EALLOW ;Enable EALLOW protected register access
MOVZ DP, #7029h>>6 ;Set data page for WDCR register
MOV @7029h, #0068h ;Set WDDIS bit in WDCR to disable WD
EDIS ;Disable EALLOW protected register access
LB copy_sections ;Branch to copy_sections
.endif
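copy_sections: the code of the copy_sections routine is defined in DSP28xxx_SectionCopy_nonBIOS.asm. When execution first arrives here, the watchdog is disabled and the sections are ready to be copied. The section size is placed in the accumulator, the load address in XAR6 and the run address in XAR7, as shown here for .text: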
MOVL XAR5,#_text_size ; Store Section Size in XAR5
MOVL ACC,@XAR5 ; Move Section Size to ACC
MOVL XAR6,#_text_loadstart ; Store Load Starting Address in XAR6
MOVL XAR7,#_text_runstart ; Store Run Address in XAR7
LCR copy ; Branch to Copy
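The section size, load start and run start symbols are all generated by the linker, as discussed in the linker command file section above.
Once the addresses and the section length are stored, the copy routine is called. It first determines whether the compiler generated the section at all, by testing whether the accumulator is 0.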
copy:
B return,EQ ; Return if ACC is Zero (No section to copy)
RPT AL ; Copy Section From Load Address to
|| PWRITE *XAR7, *XAR6++ ; Run Address
return:
LRETR ; Return
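If the accumulator is 0, the routine returns to its caller. If the accumulator is not 0, there is a section to copy. The copy is done by the PWRITE instruction shown above, which copies the contents of the memory pointed to by XAR6 to the location pointed to by XAR7, that is, from the load address of the code to its run address. The instruction repeats until the whole section has been copied. When all sections have been copied, the program branches to c_int00: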
LB _c_int00 ; Branch to start of boot.asm in RTS library
At this point the C environment is set up and the entry address of main() can be reached.
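For readers less familiar with C28x assembly, the logic of copy_sections and copy can be sketched in C. This is a host-side illustration of the intent, not a replacement for the assembly, which must run before the C environment exists. The descriptor type `section_desc_t` and the function names are hypothetical; each descriptor stands for one triple of linker symbols such as _text_loadstart, _text_runstart and _text_size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One section to relocate (hypothetical helper type). */
typedef struct {
    const uint16_t *load_start;  /* where the linker stored the section (flash) */
    uint16_t       *run_start;   /* where the code expects it at run time (RAM) */
    size_t          size;        /* section size in 16-bit words */
} section_desc_t;

/* Mirrors "copy": return at once if the size is zero, else copy word by word. */
static size_t copy_one(const section_desc_t *s)
{
    if (s->size == 0u) {
        return 0u;               /* section was not generated: nothing to copy */
    }
    assert(s->load_start != NULL && s->run_start != NULL);

    const uint16_t *src = s->load_start;
    uint16_t *dst = s->run_start;
    for (size_t i = 0; i < s->size; ++i) {
        *dst++ = *src++;         /* one word from load address to run address */
    }
    return s->size;
}

/* Mirrors "copy_sections": handle every section, then the caller continues
 * to the C startup code. Returns the total number of words copied. */
static size_t copy_sections(const section_desc_t *tab, size_t count)
{
    size_t total = 0u;
    for (size_t i = 0; i < count; ++i) {
        total += copy_one(&tab[i]);
    }
    return total;
}
```

A table-driven loop is used here only for brevity; the assembly file instead repeats the load-registers-and-call sequence once per section.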
-
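Implementation steps
- Open the project.
- Add the files DSP280x_CodeStartBranch_flash.asm and DSP28xxx_CodeStartBranch.asm to the project.
- Exclude the files DSP280x_CodeStartBranch.asm and iddk_servo_2838x_ram_lnk_cpu1.cmd from the project.
- Add the file iddk_servo_2838x_flash_lnk_cpu1.cmd to the project.
- Compile and link the program as described above, then download it to the chip and run it.
References
Code resources: https://download.csdn.net/download/zxnwpu/90867508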