A set of highly optimized libraries for PowerPC 7447 and 7447A CPUs, comprising a VSIPL implementation of the Core Profile functionality, a standard C implementation of the same functionality, and a large, systematic set of vector operations.
Introduction
To be of value to developers, DSP libraries must use processor resources efficiently to provide the fastest possible execution times, yet they must also be easy to use. For many years, the VSIPL (Vector, Signal and Image Processing Library) standard has provided an application programming interface (API) that simplifies development by hiding many implementation details, which has made it widely used in DSP applications.
For some applications it is preferable to use a standard C API or other facilities which are not defined in the VSIPL standard. The VSIPLus libraries meet these requirements by providing:
VSIPL - an optimized implementation conforming to the VSIPL standard
CSIPL - a library with the same functionality as VSIPL, but using a standard C API
VECLIB - a large set of optimized, low-level vector routines providing functionality beyond the VSIPL standard.
Ease of Use
The package includes both Development and Production versions of the libraries. The Development versions include full parameter and error checking to help track down programming bugs. The Production versions omit this checking for greater efficiency.
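The split between Development and Production builds can be illustrated with a minimal C sketch. Everything here is hypothetical: the `SKETCH_DEVELOPMENT_BUILD` switch, the `sketch_status` codes and the `sketch_vadd` function are invented for illustration and are not the VSIPLus API. The sketch only shows one common way such checking can be compiled in or out.

```c
#include <stddef.h>

/* Hypothetical build switch (an assumption, not a VSIPLus macro):
 * set to 0 to mimic a production build without argument checks. */
#ifndef SKETCH_DEVELOPMENT_BUILD
#define SKETCH_DEVELOPMENT_BUILD 1
#endif

/* Hypothetical status codes, invented for this sketch. */
enum sketch_status { SKETCH_OK = 0, SKETCH_BAD_ARGUMENT = 1 };

/* Elementwise add of two float vectors of length n. */
enum sketch_status sketch_vadd(const float *a, const float *b,
                               float *out, size_t n)
{
#if SKETCH_DEVELOPMENT_BUILD
    /* Development build: validate arguments before touching memory. */
    if (a == NULL || b == NULL || out == NULL) {
        return SKETCH_BAD_ARGUMENT;
    }
#endif
    /* The arithmetic is identical in both builds; only the check
     * above disappears when the switch is set to 0. */
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
    return SKETCH_OK;
}
```

With the switch left at 1, passing a null pointer returns `SKETCH_BAD_ARGUMENT`; with it set to 0, the check costs nothing at run time but invalid arguments are no longer caught.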
No artificial constraints are placed on the data. The libraries handle data management automatically, allowing:
the data to be strided
the data to have any memory alignment
the data to be of any length (not just a multiple of the processor vector length, which is 4 for the PowerPC 74xx)
the data to be of any basic type supported by the processor
complex vector data to be either split or interleaved.
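As a rough illustration of what "strided" and "any length" mean for a caller, the following C sketch adds two vectors whose elements are separated by arbitrary strides. The function name `sketch_vadd_strided` and its signature are assumptions made for this example and are not taken from the VSIPLus libraries.

```c
#include <stddef.h>

/* Hypothetical strided add, invented for this sketch:
 *   out[i * stride_out] = a[i * stride_a] + b[i * stride_b], 0 <= i < n.
 * Strides are counted in elements, not bytes, and n may be any length. */
void sketch_vadd_strided(const float *a, ptrdiff_t stride_a,
                         const float *b, ptrdiff_t stride_b,
                         float *out, ptrdiff_t stride_out,
                         size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        const ptrdiff_t k = (ptrdiff_t)i;
        out[k * stride_out] = a[k * stride_a] + b[k * stride_b];
    }
}
```

Calling this with a stride of 2 on an interleaved complex array reads only the real parts, which is one reason stride handling and complex data layout are closely related.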
VSIPL and CSIPL Functionality
The VSIPL and CSIPL libraries provide the functionality specified for the VSIPL Core Profile; there are a total of 517 functions in the Core Profile.
The range of functions supported includes:
elementwise vector operations (e.g. vector add)
vector math functions (e.g. sin, cos)
elementwise matrix operations (e.g. matrix add)
reduction operations (e.g. dot product)
matrix operations (e.g. transpose)
matrix-vector operations (e.g. general matrix-vector product, GEMV)
matrix-matrix operations (e.g. general matrix-matrix product, GEMM)
windowing and filter operations (e.g. moving average)
FFT and convolution
Table 1: VSIPL Supported Data Types
| Data Type | Comments |
| --- | --- |
| vsip_scalar_vi | Scalar vector index |
| vsip_scalar_mi | Scalar matrix index (not in Core Lite) |
| vsip_scalar_bl | Scalar boolean |
| vsip_scalar_f | 32-bit (single precision) float |
| vsip_cscalar_f | Complex (single precision) float |
| vsip_scalar_i | 32-bit signed integer |
VECLIB Functionality
VECLIB provides highly optimized implementations of a systematic set of vector operations that take scalar operands and one or more vector operands. The library includes:
Binary Functions:
Contains all possible combinations of real, complex, scalar or vector arguments. A total of 110 independent functions, 128 functions in all.
Real Ternary Functions:
Contains all possible combinations of real scalar or real vector arguments. A total of 149 independent functions, 336 functions in all.
Complex Ternary Functions:
Contains all possible combinations of complex scalar and complex vector arguments. A total of 1263 independent functions, 3024 functions in all.
Real vector Quaternary Functions:
Contains all possible combinations of real vector arguments. A total of 62 independent functions, 64 functions in all.
Complex vector Quaternary Functions:
Contains all possible combinations of complex vector arguments. A total of 805 independent functions, 1024 functions in all.
In total, the VECLIB library contains 4576 functions of 2, 3, or 4 operands, real or complex, scalar or vector.
Table 2: Number of Routines in Core VSIPL Libraries
| Algorithm Area | Core Functions | Comments |
| --- | --- | --- |
| Initialize/finalize | 2 | Service routines |
| Block support | 43 | Block/matrix memory management |
| Vector support | 104 | Vector memory management |
| Vector copy | 12 | Real and complex |
| Matrix support | 52 | Matrix memory management |
| Matrix copy | 2 | Real and complex |
| Scalar functions | 47 | Indices, arithmetic, random numbers |
| Vector elementwise functions | 147 | Arithmetic, math (e.g. sin, cos), min/max, fill, gather, scatter, random numbers, ... |
| 1D and 2D FFT functions | 24 | Complex-complex, real-complex, complex-real, in-place, out-of-place |
| Window creation | 4 | Hanning, Blackman, Kaiser, Chebyshev |
| FIR Filter | 8 | Create, filter, destroy, get attributes |
| Convolution | 4 | Create, filter, destroy, get attributes |
| Correlation | 8 | Create, filter, destroy, get attributes |
| Histogram | 1 | |
| Matrix functions | 19 | Products, Transpose, Sum, Special Products |
| Linear Algebra | 40 | LU, Cholesky, QRD, Special solvers inc. Toeplitz |
| TOTAL FUNCTIONS | 517 | |
Optimization and Efficiency
Every routine in the libraries has been specifically optimized for the target processor. The implementations do the following:
block the data: the block sizes are tailored to the target processor and are typically a multiple of 4 or 8.
unroll block loops: the depth of unrolling is optimized and depends on the operation and the data size.
prefetch blocks when this is helpful.
re-order and group low-level operations such as fetch, prefetch, and arithmetic operations.
implement advanced cache management strategies.
implement strategies that depend on the details of the data. For example, aligned and unaligned data are treated separately, as are vectors with stride 1 (contiguous data), stride 2 (typically interleaved complex data), and general strided data.
handle "edge effects" (vector or matrix sizes that are not a multiple of the SIMD length) in a transparent but optimal manner.
utilize a mix of optimized C and assembler modules.
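Several of the techniques listed above (blocking, loop unrolling, stride-dependent code paths and edge handling) can be sketched in portable C. This is only an illustration under stated assumptions: the function `sketch_vscale`, the block size of 4 and the scalar arithmetic are all invented for the example, whereas a tuned library would be expected to use SIMD instructions, prefetching and assembler that are not shown here.

```c
#include <stddef.h>

/* Assumed block size for this sketch: 4 floats, matching the
 * vector length quoted for the PowerPC 74xx. */
#define SKETCH_BLOCK 4

/* Hypothetical routine: out[i] = k * in[i], with element strides. */
void sketch_vscale(const float *in, ptrdiff_t stride_in,
                   float *out, ptrdiff_t stride_out,
                   float k, size_t n)
{
    size_t i = 0;

    if (stride_in == 1 && stride_out == 1) {
        /* Contiguous fast path: whole blocks of 4, unrolled by hand. */
        const size_t n_blocked = n - (n % SKETCH_BLOCK);
        for (; i < n_blocked; i += SKETCH_BLOCK) {
            out[i + 0] = k * in[i + 0];
            out[i + 1] = k * in[i + 1];
            out[i + 2] = k * in[i + 2];
            out[i + 3] = k * in[i + 3];
        }
        /* Edge handling: the 0 to 3 elements left after the last block. */
        for (; i < n; ++i) {
            out[i] = k * in[i];
        }
        return;
    }

    /* General strided path: one element at a time. */
    for (; i < n; ++i) {
        const ptrdiff_t j = (ptrdiff_t)i;
        out[j * stride_out] = k * in[j * stride_in];
    }
}
```

The caller sees a single function regardless of length or stride; the choice of code path and the handling of the leftover elements stay hidden inside the routine.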
Efficiency for complex data routines is also improved by the choice of data representation. Within VSIPL, the representation of complex data is transparent to the user. The VSIPLus VSIPL library uses a split (rather than interleaved) representation internally, and this is reflected in the excellent performance figures.
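The difference between split and interleaved complex storage can be made concrete with a short C sketch. The function names below are hypothetical and the code is plain scalar C, not the library's implementation. In split form the real and imaginary parts live in separate contiguous arrays, so each arithmetic step works on like-with-like data, which generally suits SIMD units better than alternating real and imaginary values.

```c
#include <stddef.h>

/* Hypothetical helper: convert interleaved storage
 * (re0, im0, re1, im1, ...) into split real and imaginary arrays. */
void sketch_deinterleave(const float *z, float *re, float *im, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        re[i] = z[2 * i];
        im[i] = z[2 * i + 1];
    }
}

/* Hypothetical split-form complex multiply:
 *   (out_re + j*out_im) = (ar + j*ai) * (br + j*bi), elementwise. */
void sketch_cvmul_split(const float *ar, const float *ai,
                        const float *br, const float *bi,
                        float *out_re, float *out_im, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* Compute both parts before storing, so the outputs may
         * safely alias the inputs. */
        const float re = ar[i] * br[i] - ai[i] * bi[i];
        const float im = ar[i] * bi[i] + ai[i] * br[i];
        out_re[i] = re;
        out_im[i] = im;
    }
}
```

For example, deinterleaving `{1, 2, 0, 1}` gives real parts `{1, 0}` and imaginary parts `{2, 1}`, that is the values 1+2j and j.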