Q-NOTE QN-7000HX Technical Information Page 14

Software Performance Optimization Methods
XAPP1206 v1.1 June 12, 2014 www.xilinx.com 14
Use Suitable Data Types
A standard dot product algorithm is used as the example here. The following function calculates
the dot product of two float vectors, each containing len elements of type float.
float dot_product(float *pa, float *pb, unsigned int len)
{
    float sum = 0.0;
    unsigned int i;

    for (i = 0; i < len; i++)
        sum += pa[i] * pb[i];
    return sum;
}
Indicate the Number of Loop Iterations
You can write code to permit the compiler to perform otherwise unsafe operations under the
following conditions:
  • A loop has a fixed iteration count
  • The iteration count can be decided as a multiple of N (register length/data type size) at the
    coding stage
For the example above, if you know that the value of len is always a multiple of four, you can
indicate this to the compiler by masking off the bottom two bits of len when comparing the loop
counter to it. Because the loop now always executes a multiple of four iterations, the compiler
knows it is safe to vectorize it.
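The masking idiom described above can be sketched as follows. This is a minimal illustration, assuming the caller guarantees that len is a multiple of four; the function name dot_product_x4 is hypothetical and does not come from the application note. Whether the loop is actually vectorized still depends on the compiler and its options.

```c
/* Sketch only: dot product for callers that guarantee len is a multiple of 4.
   The name dot_product_x4 is hypothetical. */
float dot_product_x4(float *pa, float *pb, unsigned int len)
{
    float sum = 0.0f;
    unsigned int i;

    /* Clearing the two low bits of len makes the trip count a visible
       multiple of four. */
    for (i = 0; i < (len & ~3u); i++)
        sum += pa[i] * pb[i];

    return sum;
}
```

Note that if len is not a multiple of four, the mask silently drops the last one to three elements, so the guarantee has to hold at every call site.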
Note:
The iteration count as a multiple of four is only an example. In general, it should be a multiple of
the number of lanes in a vector. For example, if you plan to use the NEON Quad-word register and the
data type is 32-bit float, the iteration count should be a multiple of 4, as indicated by masking off
the 2 low bits as shown in the next code snippet (page 15).
A fixed iteration count, or one decided at the coding stage, is not mandatory. When the iteration
count can only be decided at run time, you can split the loop into two loops. One has an iteration
count that is a multiple of the number of lanes, and the other processes the remaining iterations.
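The two-loop split described in the note can be sketched as follows. This is an illustrative example, assuming four lanes (32-bit float in a Quad-word register); the names dot_product_split and main_len are hypothetical.

```c
/* Sketch only: run-time len handled by a main loop plus a remainder loop.
   The names dot_product_split and main_len are hypothetical. */
float dot_product_split(float *pa, float *pb, unsigned int len)
{
    float sum = 0.0f;
    unsigned int i;
    unsigned int main_len = len & ~3u;  /* largest multiple of 4 <= len */

    /* Main loop: trip count is a multiple of four, so it is a
       candidate for vectorization. */
    for (i = 0; i < main_len; i++)
        sum += pa[i] * pb[i];

    /* Remainder loop: at most three leftover elements. */
    for (; i < len; i++)
        sum += pa[i] * pb[i];

    return sum;
}
```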
Avoid Loop-Carried Dependencies
If the code contains a loop in which the result of one iteration is affected by the result of
previous iterations, the compiler will be unable to vectorize it. Re-structuring the code, if
possible, to remove any loop-carried dependencies is necessary for the compiler to do
vectorization.
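As a small illustration of such restructuring, the sketch below fills an array with an arithmetic ramp in two ways. Both functions and their names are hypothetical examples, not taken from the application note. In the first version each iteration depends on the value of x left by the previous one; in the second, each element is computed from the loop index alone. Some compilers can perform this particular transformation themselves, so treat it as an illustration of the principle rather than a required rewrite.

```c
/* Sketch only: both function names are hypothetical. */

/* Loop-carried dependency: x in iteration i depends on iteration i - 1. */
void ramp_dependent(int *out, int start, int step, unsigned int len)
{
    int x = start;
    unsigned int i;

    for (i = 0; i < len; i++) {
        x += step;
        out[i] = x;
    }
}

/* Restructured: every element depends only on the loop index. */
void ramp_independent(int *out, int start, int step, unsigned int len)
{
    unsigned int i;

    for (i = 0; i < len; i++)
        out[i] = start + (int)(i + 1) * step;
}
```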
Avoid Conditions Inside Loops
If possible, limit the loop body to data processing only. Generally speaking, it is difficult for the compiler to
vectorize loops containing conditional sequences. In the best of cases, it duplicates the loop,
but in many cases, this kind of code cannot be vectorized at all.
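One way to apply this advice is to move a loop-invariant condition out of the loop by hand, which amounts to duplicating the loop yourself. The sketch below is a hypothetical example (the function names and the do_scale flag are assumptions for illustration); it only applies when the condition does not change between iterations.

```c
/* Sketch only: function names and the do_scale flag are hypothetical. */

/* Condition evaluated on every iteration. */
void scale_or_copy_branchy(float *out, const float *in, float k,
                           int do_scale, unsigned int len)
{
    unsigned int i;

    for (i = 0; i < len; i++) {
        if (do_scale)
            out[i] = in[i] * k;
        else
            out[i] = in[i];
    }
}

/* Condition hoisted: each loop body now contains only data processing. */
void scale_or_copy_hoisted(float *out, const float *in, float k,
                           int do_scale, unsigned int len)
{
    unsigned int i;

    if (do_scale) {
        for (i = 0; i < len; i++)
            out[i] = in[i] * k;
    } else {
        for (i = 0; i < len; i++)
            out[i] = in[i];
    }
}
```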
Use the Restrict Keyword
C99 introduced a new keyword, restrict, which can inform the compiler that the location
accessed through a specific pointer is not to be accessed through any other pointer within the
current scope. In other words, the memory regions targeted by the pointers in the current scope
do not overlap with each other.
Without this keyword, the compiler might assume that pointer pa refers to the same location as
pointer pb. This implies the possibility of a loop-carried dependency, which prevents the
compiler from vectorizing the sequence. With the restrict keyword, you inform the compiler
that memory to which pa and pb point does not overlap. The compiler ignores the possibility of
aliasing and assumes that it can vectorize the sequence without creating errors.
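The effect of restrict is easiest to see on a loop that also writes through a pointer. The sketch below is a hypothetical element-wise multiply (the function vec_mul and the parameter out are assumptions for illustration, not part of the application note) and requires a C99 or later compiler. The caller is responsible for keeping the promise: passing overlapping buffers to a function declared this way is undefined behavior.

```c
/* Sketch only: vec_mul and out are hypothetical names. Requires C99. */
void vec_mul(float * restrict out,
             const float * restrict pa,
             const float * restrict pb,
             unsigned int len)
{
    unsigned int i;

    /* With restrict, a store to out[i] cannot change any pa[] or pb[]
       element, so iterations are independent of one another. */
    for (i = 0; i < len; i++)
        out[i] = pa[i] * pb[i];
}
```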