
Boost NEON Performance by Improving Memory Access Efficiency

XAPP1206 v1.1 June 12, 2014 www.xilinx.com 21

Generally, load and store multiple instructions (LDM/STM) can yield better performance than the equivalent sequence of single load and store instructions, especially when the cache is not enabled or a memory region is marked as non-cacheable in the translation table. To understand why, study the AMBA specification carefully. Each memory access carries a fixed overhead on the AXI bus. To improve bus efficiency, use the burst transfers that AXI supports: group N consecutive accesses into one burst, and the overhead is incurred only once. If you access N words as N single-beat transfers, the overhead is incurred N times. This not only degrades internal bus throughput, but also increases latency.
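The overhead argument can be made concrete with a toy cost model. The per-transaction overhead `TXN_OVERHEAD` and the function names are hypothetical, chosen only for illustration and not measured on a Zynq-7000 device; the model also ignores the maximum AXI burst length and any overlap of outstanding transactions.

```c
/* Hypothetical per-transaction overhead in bus cycles (address phase,
 * arbitration, response). Illustrative value only, not a measured figure. */
#define TXN_OVERHEAD 4u

/* n words moved as n single-beat transactions:
 * every word pays the full overhead plus one data beat. */
static unsigned single_beat_cycles(unsigned n)
{
    return n * (TXN_OVERHEAD + 1u);
}

/* n words moved as one burst:
 * the overhead is paid once, then one data beat per word. */
static unsigned burst_cycles(unsigned n)
{
    return TXN_OVERHEAD + n;
}
```

With these assumed numbers, eight words cost 40 cycles as single beats but 12 cycles as one burst; the gap grows linearly with N.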

Normally, the compiler uses load and store multiple instructions only for stack operations. When a routine is memory-access intensive, such as a memory copy, you might need to try LDM/STM manually.
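Before hand-writing assembly, a copy loop can be structured so that each iteration moves a fixed group of four words, which is the shape an LDMIA/STMIA pair with a four-register list implements. This is a sketch: `copy_words_x4` is a hypothetical name, and whether a particular compiler turns the loop body into LDM/STM (or into NEON instructions, or a call to `memcpy`) depends on the compiler and its flags, so inspect the generated code.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy n_words 32-bit words, four per iteration. Loading four words into
 * locals before storing them mirrors an LDMIA/STMIA pair with a
 * four-register list; LDM/STM emission is NOT guaranteed. */
void copy_words_x4(uint32_t *restrict dst, const uint32_t *restrict src,
                   size_t n_words)
{
    size_t i = 0;
    for (; i + 4 <= n_words; i += 4) {
        uint32_t a = src[i], b = src[i + 1], c = src[i + 2], d = src[i + 3];
        dst[i] = a;
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
    }
    /* Tail: the remaining 0-3 words are copied one at a time. */
    for (; i < n_words; i++)
        dst[i] = src[i];
}
```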

An example of such an instruction is:

LDMIA R10!, { R0-R3, R12 }

This instruction loads five registers from consecutive addresses starting at the address held in R10 and, because of the write-back specifier (!), increases R10 by 20 (5 × 4 bytes) at the end.
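The behavior of the example instruction can be modeled in portable C. This is a behavioral sketch, not an emulator: `cpu_t` and `ldmia_wb` are hypothetical names, memory is modeled as an array of 32-bit words with word-aligned byte addresses, and the case where the base register also appears in the list is not modeled.

```c
#include <stdint.h>

/* Toy register file R0-R15 (hypothetical model, not a real API). */
typedef struct {
    uint32_t r[16];
} cpu_t;

/* Model of LDMIA Rn!, {reglist}.
 * reglist: 16-bit mask, bit k set means Rk is loaded.
 * mem:     memory image as 32-bit words; byte address A maps to mem[A / 4].
 * Registers are filled in ascending register order from ascending
 * addresses starting at Rn; write-back then adds 4 bytes per register. */
static void ldmia_wb(cpu_t *cpu, const uint32_t *mem, unsigned rn,
                     uint16_t reglist)
{
    uint32_t addr = cpu->r[rn];
    for (unsigned k = 0; k < 16; k++) {
        if (reglist & (1u << k)) {
            cpu->r[k] = mem[addr / 4];
            addr += 4;
        }
    }
    cpu->r[rn] = addr; /* the '!' write-back */
}
```

For `LDMIA R10!, { R0-R3, R12 }` the mask is `0x100F` (bits 0 to 3 and bit 12), so five words are loaded and R10 advances by 20.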

The register list is comma-separated, with hyphens indicating ranges. The order in which registers are listed is not important: ARM processors always transfer them in a fixed order, with the lowest-numbered register mapped to the lowest address.
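One way to see why the written order does not matter: in the instruction encoding, the register list is a 16-bit mask in which bit k selects Rk, so any ordering of the same registers produces the same mask. The sketch below (hypothetical helper names) builds that mask and computes where each register lands relative to the lowest transfer address.

```c
#include <stddef.h>
#include <stdint.h>

/* Build the 16-bit register-list mask: bit k set means Rk is transferred.
 * The textual order of regs[] is irrelevant to the result. */
static uint16_t reglist_mask(const unsigned *regs, size_t n)
{
    uint16_t mask = 0;
    for (size_t i = 0; i < n; i++)
        mask |= (uint16_t)(1u << regs[i]);
    return mask;
}

/* Byte offset of register `reg` from the lowest transfer address:
 * 4 times the number of lower-numbered registers in the mask.
 * Returns -1 if reg is not in the list. */
static int reg_offset(uint16_t mask, unsigned reg)
{
    if (!(mask & (1u << reg)))
        return -1;
    int below = 0;
    for (unsigned k = 0; k < reg; k++)
        if (mask & (1u << k))
            below++;
    return below * 4;
}
```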

The instruction must also specify how to proceed from the base register, using one of four modes: IA (increment after), IB (increment before), DA (decrement after), and DB (decrement before). These specifiers also have four aliases (FD, FA, ED, and EA) that describe the operation from a stack perspective: they specify whether the stack pointer points to a full or an empty top of the stack, and whether the stack ascends or descends in memory.
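The four modes and the stack aliases can be summarized as address arithmetic. This sketch (hypothetical names) returns, for an n-register transfer with base b, the lowest address accessed and the base value after write-back, and maps each stack alias to the underlying mode for loads (pop) and stores (push).

```c
#include <stdint.h>

typedef enum { IA, IB, DA, DB } ldm_mode;
typedef enum { FD, FA, ED, EA } stack_alias;

/* Lowest address accessed by an n-register LDM/STM with base b;
 * registers are then transferred to ascending addresses from here. */
static uint32_t lowest_addr(ldm_mode m, uint32_t b, unsigned n)
{
    switch (m) {
    case IA: return b;
    case IB: return b + 4;
    case DA: return b - 4 * n + 4;
    case DB: return b - 4 * n;
    }
    return b;
}

/* Base register value after write-back ('!'). */
static uint32_t writeback(ldm_mode m, uint32_t b, unsigned n)
{
    return (m == IA || m == IB) ? b + 4 * n : b - 4 * n;
}

/* Resolve a stack alias; is_load selects LDM (pop) versus STM (push). */
static ldm_mode resolve_alias(stack_alias s, int is_load)
{
    switch (s) {
    case FD: return is_load ? IA : DB; /* full descending */
    case FA: return is_load ? DA : IB; /* full ascending */
    case ED: return is_load ? IB : DA; /* empty descending */
    case EA: return is_load ? DB : IA; /* empty ascending */
    }
    return IA;
}
```

For example, a full descending stack (the usual ARM convention) pushes with STMDB and pops with LDMIA.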

Correspondingly, NEON supports load/store multiple in a similar way. The syntax is:

VLDM{mode}{cond} Rn{!}, Registers

VSTM{mode}{cond} Rn{!}, Registers
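The NEON form works like LDM/STM, except that the list names consecutive extension registers; each D register is 64 bits wide, so write-back advances the base by 8 bytes per register. The model below is a sketch under assumptions: `neon_t`, `vldmia_d`, and `vstmia_d` are hypothetical names, and memory is modeled as 64-bit words with an 8-byte-aligned base.

```c
#include <stdint.h>

/* Toy NEON register file: doubleword registers D0-D31. */
typedef struct {
    uint64_t d[32];
} neon_t;

/* Model of VLDMIA Rn!, {D<first>-D<first+count-1>}.
 * Returns the new Rn value after write-back (8 bytes per D register). */
static uint32_t vldmia_d(neon_t *v, const uint64_t *mem, uint32_t rn,
                         unsigned first, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        v->d[first + i] = mem[rn / 8 + i];
    return rn + 8 * count;
}

/* Model of VSTMIA Rn!, {D<first>-D<first+count-1>}. */
static uint32_t vstmia_d(const neon_t *v, uint64_t *mem, uint32_t rn,
                         unsigned first, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        mem[rn / 8 + i] = v->d[first + i];
    return rn + 8 * count;
}
```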

The mode must be one of the following:

• IA - Increment address after each transfer. This is the default and can be omitted.

• DB - Decrement address before each transfer. Write-back (!) is required in this mode.

• EA - Empty ascending stack operation. This is the same as DB for loads and IA for stores.

• FD - Full descending stack operation. This is the same as IA for loads and DB for stores.
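The FD entry above is the usual way to save and restore NEON registers on the stack (for example, D8-D15 around a function body); the VPUSH and VPOP mnemonics are equivalent to VSTMDB SP! and VLDMIA SP!. The sketch below models that round trip in C; the function names and the 64-bit-word memory layout are hypothetical.

```c
#include <stdint.h>

/* Full descending stack: SP points at the last pushed item and the stack
 * grows toward lower addresses. FD store = DB, FD load = IA. */
static uint32_t vstmfd_d(const uint64_t *regs, unsigned count,
                         uint64_t *mem, uint32_t sp)
{
    sp -= 8 * count;                  /* DB: decrement before */
    for (unsigned i = 0; i < count; i++)
        mem[sp / 8 + i] = regs[i];    /* lowest register at lowest address */
    return sp;                        /* written back to SP */
}

static uint32_t vldmfd_d(uint64_t *regs, unsigned count,
                         const uint64_t *mem, uint32_t sp)
{
    for (unsigned i = 0; i < count; i++)
        regs[i] = mem[sp / 8 + i];    /* IA: increment after */
    return sp + 8 * count;
}
```

Pushing eight D registers from an SP of 128 leaves SP at 64; popping them restores both the values and SP.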

Note that NEON has some special instructions for interleaving and de-interleaving:

• VLDn (Vector load multiple n-element structures) loads multiple n-element structures from memory into one or more NEON registers, with de-interleaving (unless n == 1). Every element of each register is loaded.

• VSTn (Vector store multiple n-element structures) writes multiple n-element structures to memory from one or more NEON registers, with interleaving (unless n == 1). Every element of each register is stored.
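A typical use of n == 3 is packed RGB pixel data: VLD3.8 into three D registers reads 24 bytes (eight pixels) and separates the R, G, and B bytes into one register each, and VST3.8 reverses the process. The plain-C model below illustrates that data movement; the function names are hypothetical, and on real hardware the corresponding intrinsic is `vld3_u8` from `<arm_neon.h>`.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* one D register holds eight 8-bit elements */

/* Model of VLD3.8 {d0, d1, d2}, [Rn]: reads 24 consecutive bytes holding
 * eight 3-element structures and de-interleaves them, so element 0 of
 * each structure goes to d0, element 1 to d1, and element 2 to d2. */
static void model_vld3_u8(uint8_t d0[LANES], uint8_t d1[LANES],
                          uint8_t d2[LANES], const uint8_t *src)
{
    for (size_t i = 0; i < LANES; i++) {
        d0[i] = src[3 * i];
        d1[i] = src[3 * i + 1];
        d2[i] = src[3 * i + 2];
    }
}

/* Model of VST3.8: the inverse, interleaving three registers back into
 * 24 consecutive bytes. */
static void model_vst3_u8(uint8_t *dst, const uint8_t d0[LANES],
                          const uint8_t d1[LANES], const uint8_t d2[LANES])
{
    for (size_t i = 0; i < LANES; i++) {
        dst[3 * i] = d0[i];
        dst[3 * i + 1] = d1[i];
        dst[3 * i + 2] = d2[i];
    }
}
```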
