Embedded C

EMBEDDED C

KRUNAL SIDDHAPATHAK

HIGH LEVEL OPTIMIZATIONS IN CODE

1. Floating-point to fixed-point conversion

2. Simple loop transformations

3. Loop tiling/blocking

4. Loop splitting

5. Array folding

CODE OPTIMIZATION

1) Floating-point to Fixed-point Conversion:

• Reduces cycle count by 75% and energy consumption by 76% for an MPEG-2 video compression algorithm.

• Involves a trade-off between the cost of implementation and the quality of the algorithm.

• Done using Fixed-C data types.

• E.g. fixed a,*b,c[8]; a=fixed(5,4,s,wt,*b);
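The idea behind the conversion can be sketched in plain C without the Fixed-C types. The example below is a minimal illustration, assuming a Q15 format (1 sign bit, 15 fractional bits) stored in an int16_t; the names q15_t, q15_from_float, q15_to_float and q15_mul are hypothetical and do not come from the slides or from any particular tool.

```c
#include <stdint.h>

/* Q15 format (an assumption of this sketch): value = raw / 32768. */
typedef int16_t q15_t;

#define Q15_ONE 32768  /* scale factor 2^15 */

/* Convert a float in [-1, 1) to Q15, rounding to nearest and saturating. */
q15_t q15_from_float(float x)
{
    float scaled = x * (float)Q15_ONE;
    scaled += (scaled >= 0.0f) ? 0.5f : -0.5f;
    if (scaled > 32767.0f)  return INT16_MAX;
    if (scaled < -32768.0f) return INT16_MIN;
    return (q15_t)scaled;
}

/* Convert back to float, e.g. for checking the quantization error. */
float q15_to_float(q15_t x)
{
    return (float)x / (float)Q15_ONE;
}

/* Multiply two Q15 values using integer arithmetic only.
 * The 32-bit product is in Q30; adding 2^14 rounds it, and the shift
 * brings it back to Q15. The right shift of a negative value is assumed
 * to be arithmetic, which holds on common embedded compilers. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    p = (p + (1 << 14)) >> 15;
    if (p > INT16_MAX) p = INT16_MAX;  /* only (-1) * (-1) can overflow */
    return (q15_t)p;
}
```

The trade-off from the bullets shows up directly: the multiply needs only one integer multiplication and a shift, but every value is quantized to steps of 1/32768.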

2) Array Folding:

• Options for reducing the storage requirements of large arrays must be explored, since memory space is limited in embedded systems.

• Inter-array folding shares memory space among arrays that are not needed during overlapping time intervals.

• Intra-array folding exploits the fact that only a subset of an array's elements is needed at any one time, so only a limited set of components has to be stored.
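Inter-array folding can be sketched with a C union. This is only an illustration under stated assumptions: the two arrays, the two processing phases and all names (folded_buf_t, phase1_acquire, phase2_analyze) are hypothetical, and the technique is valid only if the arrays really are never live at the same time.

```c
#include <stdint.h>

#define N 256

/* Phase 1 needs only `samples`; phase 2 needs only `spectrum`.
 * Their lifetimes do not overlap, so both share one memory region. */
typedef union {
    int16_t samples[N];       /* live during phase 1 */
    int32_t spectrum[N / 2];  /* live during phase 2 */
} folded_buf_t;

folded_buf_t folded;

/* Phase 1: fill the sample array and reduce it to a single value. */
int32_t phase1_acquire(void)
{
    int32_t sum = 0;
    for (int i = 0; i < N; i++) {
        folded.samples[i] = (int16_t)i;
        sum += folded.samples[i];
    }
    return sum;
}

/* Phase 2: `samples` is dead now, so its storage is reused. */
int32_t phase2_analyze(int32_t seed)
{
    for (int i = 0; i < N / 2; i++)
        folded.spectrum[i] = seed + i;
    return folded.spectrum[N / 2 - 1];
}
```

With these sizes the union occupies as much memory as the larger of the two arrays rather than the sum of both.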

CODE OPTIMIZATION

3) Loop tiling/blocking:

• It is essential to reuse the contents of “small” memories, including caches and scratch-pad memories.

• Blocked or tiled algorithms improve locality of reference.

• The innermost loop is restricted so that it accesses fewer array elements.

• If a proper blocking factor is selected, the elements are still in the cache when the next iteration of the innermost loop starts.

• Improves performance for matrix multiplication by reducing the number of memory references through a higher reuse factor.
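A tiled matrix multiplication might look like the sketch below. The matrix size N and blocking factor TILE are arbitrary values chosen for illustration (TILE is assumed to divide N), and the function names are hypothetical; a real blocking factor would be tuned to the cache or scratch-pad size of the target.

```c
#define N    8   /* matrix dimension (illustrative) */
#define TILE 4   /* blocking factor; assumed to divide N */

/* Reference version: the innermost loop sweeps a full row of b and c. */
void matmul_naive(int a[N][N], int b[N][N], int c[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            c[i][j] = 0;
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
        }
}

/* Tiled version: the inner loops are restricted to a TILE x TILE block
 * of b, so that block can stay in the cache while every row i reuses it. */
void matmul_tiled(int a[N][N], int b[N][N], int c[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            c[i][j] = 0;

    for (int kk = 0; kk < N; kk += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            for (int i = 0; i < N; i++)
                for (int k = kk; k < kk + TILE; k++) {
                    int r = a[i][k];
                    for (int j = jj; j < jj + TILE; j++)
                        c[i][j] += r * b[k][j];
                }
}
```

Both functions compute the same product; only the order in which the elements are visited differs.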

4) Loop Splitting:

• The efficiency of an algorithm improves if a loop is split so that one loop body handles the regular cases and a second one handles the exceptions.

• Splitting nested loops can save cycles for various applications and target processors.

• Cycle count can be reduced by 75%.
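As a small illustration of loop splitting, consider a 3-point smoothing filter whose first and last elements need special treatment. The filter itself and the function names are hypothetical examples, not taken from the slides.

```c
/* Before: the boundary tests are evaluated on every iteration. */
void smooth_checked(const int *in, int *out, int n)
{
    for (int i = 0; i < n; i++) {
        int left  = (i > 0)     ? in[i - 1] : in[i];
        int right = (i < n - 1) ? in[i + 1] : in[i];
        out[i] = (left + in[i] + right) / 3;
    }
}

/* After: the exceptions (first and last element) are handled outside,
 * and the loop for the regular cases contains no conditionals. */
void smooth_split(const int *in, int *out, int n)
{
    if (n <= 0)
        return;
    if (n == 1) {
        out[0] = in[0];
        return;
    }
    out[0] = (in[0] + in[0] + in[1]) / 3;
    for (int i = 1; i < n - 1; i++)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3;
    out[n - 1] = (in[n - 2] + in[n - 1] + in[n - 1]) / 3;
}
```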

CODE OPTIMIZATION

Simple Loop Transformations

Loop Permutation

• Helps in reuse of array elements in the cache, as the next iteration of the loop body will access an adjacent location in memory.

Loop Fusion, Loop Fission

• Two loops can be merged into a single loop: loop fusion.

• A single loop can be split into two loops: loop fission.

Loop Unrolling

• The number of copies of the loop body is called the unrolling factor (factors greater than 2 are possible).

• Reduces loop overhead (fewer branches per instruction) and improves speed, but increases code size.

• Restricted to loops with a constant number of iterations.
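The three simple transformations can be sketched side by side in C. The array sizes, the unrolling factor of 4 and all function names are assumptions made for this illustration only.

```c
#define ROWS 4
#define COLS 8
#define LEN  8   /* constant trip count, a multiple of the unrolling factor */

/* Loop permutation, before: the inner loop walks down a column,
 * so consecutive iterations are COLS elements apart in memory. */
long sum_by_columns(int m[ROWS][COLS])
{
    long s = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            s += m[i][j];
    return s;
}

/* After: C stores arrays row by row, so with j innermost the next
 * iteration accesses the adjacent memory location. */
long sum_by_rows(int m[ROWS][COLS])
{
    long s = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            s += m[i][j];
    return s;
}

/* Two separate loops over the same input (the result of loop fission). */
void two_loops(const int *x, int *dbl, int *inc)
{
    for (int i = 0; i < LEN; i++)
        dbl[i] = 2 * x[i];
    for (int i = 0; i < LEN; i++)
        inc[i] = x[i] + 1;
}

/* Loop fusion: both bodies merged, so x[i] is loaded once per iteration. */
void fused_loop(const int *x, int *dbl, int *inc)
{
    for (int i = 0; i < LEN; i++) {
        dbl[i] = 2 * x[i];
        inc[i] = x[i] + 1;
    }
}

/* Loop unrolling with factor 4: one branch per four additions. */
long sum_unrolled4(const int *v)
{
    long s = 0;
    for (int i = 0; i < LEN; i += 4)
        s += v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    return s;
}
```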

EMBEDDED C FOR HIGH PERFORMANCE DSP PROGRAMMING

• Performance is the key to digital signal processing because it translates into application-based end-user systems.

• Changes in technological and economic requirements make it more and more expensive to continue programming DSP processors in assembly language.

• DSP architectures are not easy to program optimally due to their non-orthogonality.

• Stronger error correction and encryption algorithms must be added to match up to the increased complexity in DSP.

• Communication protocols have become more sophisticated and require much more code to implement.

• Multiple protocol stacks have been implemented to be compatible with multiple service providers.

• In addition, backward compatibility with older protocols is also needed to stay synchronized with provider networks that are in a slow process of upgrading.

ENTERING WITH EMBEDDED C

• Embedded C is designed to bridge the performance mismatch between the signal processing algorithms, standard C and the architecture.

• It is an extension of the C language with the primitives that are needed by signal processing applications and that are commonly provided by DSP processors.

• Maintainability and portability of code are the key winners in this process.

REQUIREMENTS FOR I/O HARDWARE ADDRESSING INTERFACE

1. The device driver source code must be portable.

2. The interface must not prevent implementations from producing machine code that is as efficient as other methods.

3. The design should permit encapsulation of the system dependent access method.
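One way to meet these three requirements can be sketched as follows. This is an assumption-laden illustration, not the interface defined by Embedded C itself: the handle type io_reg8_t, the access functions io_rd8, io_wr8 and io_or8, and the UART control register are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical access handle: the only place that knows how the
 * register is reached (here: a memory-mapped 8-bit location). */
typedef struct {
    volatile uint8_t *addr;
} io_reg8_t;

/* Access functions encapsulate the system-dependent access method.
 * Being static inline, they can compile down to single load/store
 * instructions, so the abstraction need not cost efficiency. */
static inline uint8_t io_rd8(io_reg8_t r)
{
    return *r.addr;
}

static inline void io_wr8(io_reg8_t r, uint8_t v)
{
    *r.addr = v;
}

static inline void io_or8(io_reg8_t r, uint8_t mask)
{
    *r.addr = (uint8_t)(*r.addr | mask);
}

#define UART_CTRL_ENABLE 0x01u  /* hypothetical enable bit */

/* Portable driver code: uses only the handle and the access functions,
 * never a hard-coded address. */
void uart_enable(io_reg8_t ctrl)
{
    io_or8(ctrl, UART_CTRL_ENABLE);
}

int uart_is_enabled(io_reg8_t ctrl)
{
    return (io_rd8(ctrl) & UART_CTRL_ENABLE) != 0;
}
```

Porting the driver to a platform with a different access method would then mean changing only the handle and the three access functions.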

MEMORY MANAGEMENT IN AN AEROSPACE EMBEDDED CODE

• Dynamic Allocation eases development by providing system memory to application processes as needed at runtime and retrieving the memory when it is no longer needed.

• C’s runtime library function malloc() can exhibit wildly unpredictable performance and become a bottleneck in multithreaded programs on multicore systems.

• Hence, dynamic memory allocation is forbidden in safety-critical embedded avionics code.

WHY NOT DYNAMIC MEMORY ALLOCATION IN AVIONICS?

• Dynamic memory allocation is a poor choice for mission-critical code because it is based on list allocator algorithms that organize memory pools into contiguous locations in a singly linked list.

• These list allocators allocate memory using malloc() and de-allocate it for reuse using free(). However, this places a burden on the programmer to balance each call to malloc() with a corresponding call to free().

THEN WHAT IS THE SOLUTION?

• Customized memory allocation functions that more closely match specific allocation scenarios are used, such as:

1. Stack-based allocator

2. Thread-local allocator

3. In-Memory Database Systems (IMDS)

• The performance, stability and predictability of safety-critical code increase with the above custom allocators.

STACK-BASED ALLOCATOR

• In this algorithm, each allocation returns the address of the current position of the stack pointer and advances the pointer by the amount of the request.

• When memory is no longer needed, the stack pointer is rewound.

• Processing overhead is reduced because there is no chain of pointers to manage, and there are no allocation sizes or contiguous locations to track.

• A memory leak can’t be accidentally introduced through improper de-allocation because the application does not have to track specific allocations.
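A stack-based allocator of this kind can be sketched in a few lines of C. This is a minimal illustration, not the allocator from the referenced figure: the names stack_alloc_t, sa_init, sa_alloc, sa_mark and sa_rewind are hypothetical, requests are rounded up to an assumed 8-byte alignment, and the caller is assumed to supply a suitably aligned buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define SA_ALIGN 8u  /* assumed alignment; must be a power of two */

typedef struct {
    uint8_t *base;  /* start of the memory pool */
    size_t   size;  /* pool size in bytes */
    size_t   top;   /* stack pointer, as an offset from base */
} stack_alloc_t;

void sa_init(stack_alloc_t *sa, void *buf, size_t size)
{
    sa->base = (uint8_t *)buf;
    sa->size = size;
    sa->top  = 0;
}

/* Return the current stack position and advance it by the request. */
void *sa_alloc(stack_alloc_t *sa, size_t n)
{
    if (n > SIZE_MAX - (SA_ALIGN - 1))
        return NULL;                       /* rounding would overflow */
    n = (n + SA_ALIGN - 1) & ~(size_t)(SA_ALIGN - 1);
    if (n > sa->size - sa->top)
        return NULL;                       /* pool exhausted */
    void *p = sa->base + sa->top;
    sa->top += n;
    return p;
}

/* Remember the current stack position. */
size_t sa_mark(const stack_alloc_t *sa)
{
    return sa->top;
}

/* Rewind: releases everything allocated after the mark in one step. */
void sa_rewind(stack_alloc_t *sa, size_t mark)
{
    if (mark <= sa->top)
        sa->top = mark;
}
```

Allocation and release are both constant-time, which is what makes the timing predictable. The price is that memory can only be released in the reverse order of allocation.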

CUSTOM STACK-BASED ALLOCATOR

http://i.opensystemsmedia.com/?bg=ffffff&q=90&w=871&f=jpg&src=http://attachments.opensystemsmedia.com/MES5050/figures/1

THREAD-LOCAL ALLOCATOR

• A custom thread-local allocator avoids conflicts by assigning a specific memory pool to each thread.

• The thread’s allocation is performed from this block without interference from other threads’ requests, thus enhancing performance and predictability.

• It uses a Pending Request List (PRL) for each thread to coordinate the release of memory blocks that are freed by a thread other than the one that performed the original allocation.

• Memory that is allocated and de-allocated by the same thread requires no coordination, and therefore no lock conflicts occur.
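The scheme can be sketched with C11 thread-local storage and atomics. This is a simplified illustration under several assumptions: blocks have a fixed size, each thread owns a small static pool, the PRL is a lock-free list that foreign threads push onto, and a thread's pool is assumed to outlive all blocks handed out from it. All names (block_t, thread_pool_t, tl_alloc, tl_free) are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

#define BLOCKS_PER_THREAD 4
#define BLOCK_PAYLOAD     32

typedef struct block {
    struct block       *next;   /* link in the free list or in the PRL */
    struct thread_pool *owner;  /* pool the block was allocated from */
    unsigned char       payload[BLOCK_PAYLOAD];
} block_t;

typedef struct thread_pool {
    block_t            blocks[BLOCKS_PER_THREAD];
    block_t           *free_list;  /* touched only by the owning thread */
    _Atomic(block_t *) prl;        /* blocks freed by other threads */
    int                ready;
} thread_pool_t;

static _Thread_local thread_pool_t tls_pool;

/* Return the calling thread's pool, building its free list on first use. */
static thread_pool_t *my_pool(void)
{
    thread_pool_t *p = &tls_pool;
    if (!p->ready) {
        p->free_list = NULL;
        for (int i = 0; i < BLOCKS_PER_THREAD; i++) {
            p->blocks[i].owner = p;
            p->blocks[i].next  = p->free_list;
            p->free_list       = &p->blocks[i];
        }
        atomic_init(&p->prl, (block_t *)NULL);
        p->ready = 1;
    }
    return p;
}

void *tl_alloc(void)
{
    thread_pool_t *p = my_pool();
    if (!p->free_list)  /* take over the whole PRL in one atomic step */
        p->free_list = atomic_exchange(&p->prl, (block_t *)NULL);
    block_t *b = p->free_list;
    if (!b)
        return NULL;
    p->free_list = b->next;
    return b->payload;
}

void tl_free(void *ptr)
{
    block_t *b = (block_t *)((unsigned char *)ptr - offsetof(block_t, payload));
    thread_pool_t *p = my_pool();
    if (b->owner == p) {           /* same thread: no coordination needed */
        b->next = p->free_list;
        p->free_list = b;
    } else {                       /* other thread: queue on the owner's PRL */
        block_t *head = atomic_load(&b->owner->prl);
        do {
            b->next = head;
        } while (!atomic_compare_exchange_weak(&b->owner->prl, &head, b));
    }
}
```

In this sketch the owner's own allocation and release never touch shared state; only a release from a different thread goes through the atomic PRL.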


IN-MEMORY DATABASE SYSTEMS (IMDS)

• Benefits of custom memory allocators can also be harnessed by integrating third-party software such as an IMDS.

• An IMDS manages application objects in RAM.

• Memory allocation and de-allocation of application objects is also done using malloc() and free().

• With an IMDS, concurrency among multiple threads is maintained automatically via transactions.


APPLICATIONS IN MILITARY

• A sensor object could represent either optical sensors for tracking missile targets or biosensors for defense in chemical warfare or motion sensors to aid in navigating an aircraft.

• The sensor object occupies memory from the memory pool; when the code completes, free() returns the memory to the heap and the space is relinquished for reuse.

• malloc() is responsible for memory fragmentation and for deciding the allocator type.


EMBEDDED C IN FPGA SWITCHING TECHNOLOGY

• C algorithms can be applied to programmable and flexible FPGAs to achieve ultra-low latency.

• Parallelism involves unrolling a software process into multiple parallel hardware processes.

• Recently applied on Wall Street.

• Possesses potential use for military purposes.



THANK YOU
