Q-NOTE QN-7000HX Technical Information Page 15

Software Performance Optimization Methods
XAPP1206 v1.1 June 12, 2014 www.xilinx.com 15
Using the techniques above, you can rewrite the C source code in the following style to help
the compiler vectorize the loop automatically.
float dot_product(float * restrict pa, float * restrict pb, unsigned int len)
{
    float sum = 0.0f;
    unsigned int i;

    /* len & ~3 rounds len down to a multiple of 4 */
    for (i = 0; i < (len & ~3); i++)
        sum += pa[i] * pb[i];
    return sum;
}
GCC also supports the alternative forms __restrict__ and __restrict when not
compiling for C99. You can select the language standard that the compiler follows with the
-std option, for example -std=c99. Other possible values include c90 and gnu99.
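The choice between restrict and __restrict__ can be made at compile time. The following is a
minimal sketch, not code from this application note: the RESTRICT macro and the function name
dot_product_portable are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* RESTRICT is a hypothetical helper macro, not part of GCC or of this document. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define RESTRICT restrict        /* C99 and later: standard keyword */
#elif defined(__GNUC__)
#define RESTRICT __restrict__    /* GCC extension, usable with -std=c90 */
#else
#define RESTRICT                 /* unknown compiler: no aliasing hint */
#endif

/* Same loop as dot_product above, but it also compiles when C99 is not selected. */
float dot_product_portable(const float *RESTRICT pa, const float *RESTRICT pb,
                           unsigned int len)
{
    float sum = 0.0f;
    unsigned int i;

    assert(pa != NULL && pb != NULL);
    for (i = 0; i < (len & ~3u); i++)
        sum += pa[i] * pb[i];
    return sum;
}
```

With this arrangement the same source should build under both -std=c90 and -std=c99, and the
aliasing hint is kept in either case when the compiler is GCC.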
Some publications state that manually unrolling the loop, as shown in the example below,
makes automatic vectorization by the compiler easier. However, recent GCC compilers are
better at recognizing and automatically vectorizing the simple loop above than the manually
unrolled version. In practice, the compiler might not vectorize the manually unrolled loop at all.
float dot_product(float * restrict pa, float * restrict pb, unsigned int len)
{
    float sum[4] = {0.0f, 0.0f, 0.0f, 0.0f};   /* one partial sum per unrolled lane */
    unsigned int i;

    for (i = 0; i < (len & ~3); i += 4)
    {
        sum[0] += pa[i]   * pb[i];
        sum[1] += pa[i+1] * pb[i+1];
        sum[2] += pa[i+2] * pb[i+2];
        sum[3] += pa[i+3] * pb[i+3];
    }
    return sum[0] + sum[1] + sum[2] + sum[3];
}
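Both listings above iterate over only len & ~3 elements, so up to three trailing elements are
ignored when len is not a multiple of four. One possible way to cover them is a short scalar
loop after the main loop. This is a sketch under that assumption, and the name
dot_product_full is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product over all len elements: a main loop whose trip count is a
   multiple of 4, followed by a scalar loop for the 0 to 3 leftover elements. */
float dot_product_full(const float * restrict pa, const float * restrict pb,
                       unsigned int len)
{
    const unsigned int main_len = len & ~3u;   /* len rounded down to a multiple of 4 */
    float sum = 0.0f;
    unsigned int i;

    assert(pa != NULL && pb != NULL);

    /* Main loop: the candidate for automatic vectorization. */
    for (i = 0; i < main_len; i++)
        sum += pa[i] * pb[i];

    /* Tail loop: at most three iterations. */
    for (i = main_len; i < len; i++)
        sum += pa[i] * pb[i];

    return sum;
}
```

Keeping the tail in a separate loop leaves the main loop with a trip count that is a multiple
of four, which is the property the len & ~3 mask is meant to give the compiler.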
Use Suitable Data Types
When optimizing algorithms that operate on 16-bit or 8-bit data without SIMD, treating the data
as 32-bit variables can sometimes yield better performance. This is because the compiler must
generate additional instructions to ensure that the result does not overflow a half-word or byte.
However, when targeting automatic vectorization with NEON, using the smallest data type that
can hold the required values is always the best choice. In a given time period, the NEON engine
can process twice as many 8-bit values as 16-bit values. Also, some NEON instructions do not
support some data types, and some data types are supported only for certain operations. For
example, NEON does not support double-precision floating-point data types, so using a double
where a single-precision float is adequate can prevent the compiler from vectorizing code. NEON
supports 64-bit integers only for certain operations, so avoid 64-bit integer variables (long
long on this 32-bit target) where possible.
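The two recommendations above can be illustrated as follows. This is a hedged sketch: the
function names add_u8 and scale_half are hypothetical, and whether a given compiler actually
vectorizes these loops depends on its version and options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element-wise sum of two 8-bit arrays. Keeping the data as uint8_t lets a
   vectorizing compiler pack more elements into each NEON register than it
   could with 16-bit or 32-bit elements. */
void add_u8(uint8_t * restrict dst, const uint8_t * restrict a,
            const uint8_t * restrict b, unsigned int len)
{
    unsigned int i;

    assert(dst != NULL && a != NULL && b != NULL);
    for (i = 0; i < len; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);   /* result wraps modulo 256 */
}

/* Halve every element. The constant is written as 0.5f: a plain 0.5 has type
   double and would promote the multiplication to double precision. */
void scale_half(float * restrict dst, const float * restrict src,
                unsigned int len)
{
    unsigned int i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < len; i++)
        dst[i] = src[i] * 0.5f;
}
```

The unsuffixed constant is an easy way to introduce double-precision arithmetic by accident,
so it is worth checking floating-point literals in loops intended for vectorization.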
NEON includes a group of instructions that can perform structured load and store operations.
These instructions can only be used for vectorized access to data structures where all
members are of the same size. Accessing 2/3/4-channel interleaved data with these
instructions can also speed up NEON memory accesses.
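As an illustration of data laid out for structured access, the sketch below separates
3-channel interleaved pixels into one array per channel. The struct rgb_pixel and the function
split_rgb are hypothetical names. All three members have the same size, which is the condition
stated above. Whether the compiler maps the loop to a structured load such as VLD3 depends on
the compiler version and options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Three interleaved channels, all members the same size (8 bits each). */
struct rgb_pixel {
    uint8_t r;
    uint8_t g;
    uint8_t b;
};

/* De-interleave count pixels into separate red, green and blue arrays. */
void split_rgb(const struct rgb_pixel * restrict px,
               uint8_t * restrict r, uint8_t * restrict g,
               uint8_t * restrict b, unsigned int count)
{
    unsigned int i;

    assert(px != NULL && r != NULL && g != NULL && b != NULL);
    for (i = 0; i < count; i++) {
        r[i] = px[i].r;
        g[i] = px[i].g;
        b[i] = px[i].b;
    }
}
```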
Deviation of NEON Computation Results
For integers, the order of computation does not matter. For example, summing an array of
integers forward or backward always produces the same result. However, this is not true for
floating-point numbers: they have finite precision, and each operation rounds its result, so
changing the order of operations can change the result. Thus, the NEON-optimized code might