Out of memory (ADSP 21065L)
I have encountered the ADSP linker error [Error li1040] more times than I can remember while working on my thesis. The message goes something like this:
[Error li1040] ".\EZLAB_21065L_debugger.ldf":97 Out of memory in output section 'seg_pmco' in processor 'p0'
Total of 0x88e word(s) were not mapped.
For those of you who have already programmed a DSP, this is no surprise. The ADSP 21065L’s on-chip SRAM has only 544 Kbits to hold both instructions and data. This on-chip memory is partitioned into two blocks: Block0 has 288 Kbits (6k x 48) and Block1 has 256 Kbits (8k x 32). For everyone else, the reaction would be something like “What can I do with 544 Kbits of memory?!” (Hehe, a LOT actually!) I remember a software developer once telling me, “Memory, to me, is an infinite resource until the SysAd gets mad at me”. Makes sense… but not to a DSP programmer.
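As a quick sanity check on those figures, here is a minimal C sketch of the arithmetic. The function names are made up for illustration, and it assumes "k" means 1024 words.

```c
#include <stdint.h>

/* Bits in a memory block holding `words` words of `width_bits` bits each. */
static uint32_t block_bits(uint32_t words, uint32_t width_bits)
{
    return words * width_bits;
}

/* Total on-chip SRAM in Kbits, using the block sizes quoted in the text:
 * Block0 = 6k x 48 bits, Block1 = 8k x 32 bits (assuming 1k = 1024). */
static uint32_t onchip_kbits(void)
{
    uint32_t block0 = block_bits(6u * 1024u, 48u); /* 294912 bits = 288 Kbits */
    uint32_t block1 = block_bits(8u * 1024u, 32u); /* 262144 bits = 256 Kbits */
    return (block0 + block1) / 1024u;
}
```

Calling onchip_kbits() gives 544, which matches the total above.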
At first, I was only having trouble with data memory: I had a handful of arrays in my code, plus the codebooks used in the vector quantization. There’s an easy solution to this, which is to store the data in external memory. In my implementation, memory is organized so that I can store data in the address range START(0x03000000) END(0x030ffeff). The trade-off, though, is that accessing data in external memory takes more cycles (it takes only a single cycle when the data is in on-chip memory). So if the number of processing cycles is a concern (as it is in real-time systems), you have to choose carefully which data goes to external memory.
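To put rough numbers on that trade-off, here is a small C sketch. It computes how many words the external range above can hold and estimates the extra cycles a buffer costs once it moves off-chip. The helper names are hypothetical, and the external access cost is a parameter you would have to take from your own board setup (wait states, memory type); it is not a figure from the datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Number of addressable words in an inclusive START..END range,
 * following the START()/END() convention quoted in the text. */
static uint32_t segment_words(uint32_t start, uint32_t end)
{
    assert(end >= start);
    return end - start + 1u;
}

/* Extra cycles spent when `accesses` reads/writes move from on-chip
 * memory (1 cycle each, per the text) to external memory.
 * `ext_cycles_per_access` is an assumed, board-specific value. */
static uint64_t extra_cycles(uint32_t accesses, uint32_t ext_cycles_per_access)
{
    assert(ext_cycles_per_access >= 1u);
    return (uint64_t)accesses * (ext_cycles_per_access - 1u);
}
```

For example, segment_words(0x03000000, 0x030ffeff) is 0xFFF00, a little under 1M words, so capacity is rarely the issue. With a made-up cost of 3 cycles per external access, a codebook that is read 1000 times per frame would add extra_cycles(1000, 3) = 2000 cycles per frame, which is the kind of figure to weigh against the real-time budget.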
But what if the problem is in program memory? The error message shown above is one of the ones I have encountered. The segment seg_pmco, defined in the LDF file, is where my instructions are stored. In my experience, running out of program memory is harder to fix than running out of data memory. I’m still experiencing it, actually.
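To get a feel for how big the overflow in that message is, here is a small C sketch that converts the hex word count into decimal and into a share of a block. It assumes 48-bit instruction words, and, purely for illustration, that seg_pmco would have the whole of Block0 (6k words) to itself; the real segment size depends on the LDF. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bits of `words` instruction words, assuming 48-bit instructions. */
static uint32_t instruction_bits(uint32_t words)
{
    return words * 48u;
}

/* Overflow as a whole-number percentage of a block of `block_words` words
 * (rounded down). */
static uint32_t overflow_percent(uint32_t unmapped_words, uint32_t block_words)
{
    assert(block_words > 0u);
    return unmapped_words * 100u / block_words;
}
```

The 0x88e in the message is 2190 words, so instruction_bits(0x88e) is 105120 bits, a bit over 100 Kbits. Under the Block0 assumption, overflow_percent(0x88e, 6 * 1024) comes out at 35, meaning roughly a third of the block's worth of code has nowhere to go.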
This is where code optimization comes in. My entire code is in assembly, which is already quite compact compared to what a C or C++ compiler would typically produce.
Another way to solve the program memory problem is modularity. I already do that, but I had missed some bits of code that could be turned into subroutines. So I had to go back through my code (all of it!) and look for more optimization opportunities, and I was able to save a few more words of program memory. Right now, all the most commonly used functions, such as filtering, inner product computation, math functions (square root, division, acos, log, etc.), and scalar and vector quantization, are written as subroutines, so that wherever they are needed in the program it only takes a function call. However, that still didn’t solve my problem…
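The saving from turning repeated code into a subroutine can be estimated with a simple model, sketched here in C. All names and word counts are hypothetical; the call and return overheads in particular depend on how arguments are passed and which registers need saving.

```c
#include <stdint.h>

/* Program-memory words used when the same body is repeated inline
 * at every one of `sites` call sites. */
static int32_t inline_words(int32_t body_words, int32_t sites)
{
    return body_words * sites;
}

/* Words used when the body is kept once as a subroutine: one copy of the
 * body, its return overhead, and the call overhead at every site. */
static int32_t subroutine_words(int32_t body_words, int32_t sites,
                                int32_t call_words, int32_t return_words)
{
    return body_words + return_words + sites * call_words;
}

/* Positive result: the subroutine version is smaller by that many words.
 * Negative result: factoring out the code makes the program bigger. */
static int32_t words_saved(int32_t body_words, int32_t sites,
                           int32_t call_words, int32_t return_words)
{
    return inline_words(body_words, sites)
         - subroutine_words(body_words, sites, call_words, return_words);
}
```

With made-up numbers, a 40-word routine used in 5 places with 1 word each for the call and the return gives words_saved(40, 5, 1, 1) = 154. A 2-word snippet used twice gives words_saved(2, 2, 1, 1) = -1, so very short fragments are not worth factoring out. The model ignores the extra cycles each call costs, which matters for the real-time budget.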
By the way, by ‘entire code’ I mean encoder + decoder. So now, the way to solve the problem is to implement the encoder on one board and the decoder on another.
I’m sure there will still be problems along the way (such as bit syncing, etc.), but I’m very excited to implement it.
Wish me luck!