Q-NOTE QN-7000HX Technical Information download pdf (Page 15)

Software Performance Optimization Methods

XAPP1206 v1.1 June 12, 2014 www.xilinx.com 15

Using the techniques above, you can rewrite the C source code in the following style to help the compiler vectorize it automatically.

float dot_product(float * restrict pa, float * restrict pb, unsigned int len)
{
    float sum = 0.0f;
    unsigned int i;
    /* len & ~3 clears the low two bits, so the trip count is a multiple of 4 */
    for (i = 0; i < (len & ~3); i++)
        sum += pa[i] * pb[i];
    return sum;
}

GCC also supports the alternative forms __restrict__ and __restrict when not compiling for C99. You can specify the language standard the compiler uses with the option -std=c99. Other possible values include c90 and gnu99.
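
As a minimal sketch of how the keyword choice could be hidden behind a macro, the listing below selects restrict under C99 or later and falls back to __restrict__ on GCC-compatible compilers. The macro name RESTRICT and the function name dot_product_r are hypothetical and are not part of the application note.

```c
/* RESTRICT is a hypothetical macro name chosen for this sketch. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define RESTRICT restrict        /* C99 or later: standard keyword */
#elif defined(__GNUC__)
#define RESTRICT __restrict__    /* GCC extension, accepted under -std=c90 too */
#else
#define RESTRICT                 /* unknown compiler: drop the hint */
#endif

/* Same loop as the dot_product example, written with the macro. */
float dot_product_r(const float * RESTRICT pa, const float * RESTRICT pb,
                    unsigned int len)
{
    float sum = 0.0f;
    unsigned int i;
    /* Only the first (len & ~3) elements are processed, as in the original. */
    for (i = 0; i < (len & ~3u); i++)
        sum += pa[i] * pb[i];
    return sum;
}
```

With this arrangement the same source should build with either -std=c99 or -std=c90 on GCC, assuming no other C99 features are used.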

Some publications state that manually unrolling the loop, as shown in the example below, makes automatic vectorization by the compiler easier. However, recent GCC compilers are better at recognizing and automatically vectorizing the code above than the manually unrolled code. In practice, compilers might not vectorize the manually unrolled loop.

float dot_product(float * restrict pa, float * restrict pb, unsigned int len)
{
    float sum[4] = {0.0, 0.0, 0.0, 0.0};
    unsigned int i;
    for (i = 0; i < (len & ~3); i += 4)
    {
        sum[0] += pa[i] * pb[i];
        sum[1] += pa[i+1] * pb[i+1];
        sum[2] += pa[i+2] * pb[i+2];
        sum[3] += pa[i+3] * pb[i+3];
    }
    return sum[0] + sum[1] + sum[2] + sum[3];
}
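
Both listings above process only the first len & ~3 elements, so up to three trailing elements are ignored when len is not a multiple of 4. One possible way to cover them, sketched here under the assumption that a short scalar tail loop is acceptable, is to keep the multiple-of-4 main loop and add the remainder separately. The name dot_product_full is hypothetical.

```c
/* Hypothetical name; not part of the original application note. */
float dot_product_full(const float * restrict pa, const float * restrict pb,
                       unsigned int len)
{
    float sum = 0.0f;
    unsigned int i;
    unsigned int main_len = len & ~3u;   /* largest multiple of 4 <= len */

    /* Main loop: trip count is a multiple of 4, as in the examples above. */
    for (i = 0; i < main_len; i++)
        sum += pa[i] * pb[i];

    /* Tail loop: at most three leftover elements, handled one at a time. */
    for (i = main_len; i < len; i++)
        sum += pa[i] * pb[i];

    return sum;
}
```

Whether the compiler vectorizes the main loop still depends on the compiler version and options in use.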

Use Suitable Data Types

When optimizing algorithms that operate on 16-bit or 8-bit data without SIMD, treating the data as 32-bit variables can sometimes yield better performance. This is because the compiler must generate additional instructions to ensure the result does not overflow a half-word or byte.

However, when targeting automatic vectorization with NEON, using the smallest data type that can hold the required values is always the best choice. In a given time period, the NEON engine can process twice as many 8-bit values as 16-bit values. Also, some NEON instructions do not support some data types, and some only support certain operations. For example, NEON does not support double-precision floating-point data types, so using a double-precision variable where a single-precision float is adequate can prevent the compiler from vectorizing code. NEON supports 64-bit integers only for certain operations, so avoid the use of long long variables where possible.
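
As an illustration of choosing the narrowest adequate type, the sketch below averages two arrays of 8-bit values. A 128-bit NEON register holds sixteen 8-bit lanes but only eight 16-bit lanes, so keeping the arrays as uint8_t gives the vectorizer twice as many elements per operation. The function name average_u8 is hypothetical, and whether a given compiler actually vectorizes this loop is an assumption to verify on the target toolchain.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only. */
void average_u8(uint8_t * restrict dst, const uint8_t * restrict a,
                const uint8_t * restrict b, unsigned int len)
{
    unsigned int i;
    /* len & ~15 makes the trip count a multiple of 16 (sixteen 8-bit lanes). */
    for (i = 0; i < (len & ~15u); i++) {
        /* Operands are promoted to int, so the sum cannot overflow here;
           the rounded average always fits back into 8 bits. */
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
    }
}
```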

NEON includes a group of instructions that can perform structured load and store operations. These instructions can only be used for vectorized access to data structures where all members are of the same size. Accessing two-, three-, or four-channel interleaved data with these instructions can also accelerate NEON memory access performance.
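
The kind of data layout this paragraph describes can be sketched in plain C as follows. The structure has three members of the same size, stored interleaved in memory, and the loop separates them into one array per channel. A vectorizing compiler may be able to map such a loop onto a structured load such as VLD3, but that is an assumption that depends on the compiler version and options. The names rgb_pixel and split_channels are hypothetical.

```c
#include <stdint.h>

/* Hypothetical pixel layout: three same-size members, interleaved in memory. */
struct rgb_pixel {
    uint8_t r;
    uint8_t g;
    uint8_t b;
};

/* Split interleaved RGB data into three separate channel arrays. */
void split_channels(const struct rgb_pixel * restrict src,
                    uint8_t * restrict r, uint8_t * restrict g,
                    uint8_t * restrict b, unsigned int count)
{
    unsigned int i;
    /* count & ~7 keeps the trip count a multiple of 8 pixels. */
    for (i = 0; i < (count & ~7u); i++) {
        r[i] = src[i].r;
        g[i] = src[i].g;
        b[i] = src[i].b;
    }
}
```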

Deviation of NEON Computation Results

For integers, the order of computation does not matter. For example, summing an array of integers forward or backward always produces the same result. However, this is not true for floating-point numbers, because each operation rounds its result to the finite precision of the floating-point encoding. Thus, the NEON-optimized code might
