Boost NEON Performance by Improving Memory Access Efficiency
XAPP1206 v1.1 June 12, 2014 www.xilinx.com 21
Generally, load-multiple and store-multiple instructions can yield better performance than the
equivalent sequence of single load and store instructions, especially when the cache is not
enabled or a memory region is marked as non-cacheable in the translation table. To understand
why, you must study the AMBA® specification carefully. Each memory access incurs overhead on the
AXI bus. To improve bus efficiency, AXI supports bursts; that is, N consecutive accesses are
grouped together, and the overhead is paid only once. If you access N words in a single-beat
manner, the overhead is paid N times. This not only degrades internal bus throughput, but also
increases latency.
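The overhead argument can be illustrated with a deliberately simplified cost model. The sketch below assumes a fixed overhead per bus transaction and one cycle per data beat. These figures and the function names are hypothetical and are not taken from the AMBA specification.

```c
#include <stdint.h>

/* Simplified model (assumption): every bus transaction pays a fixed
 * overhead, and every data beat costs one cycle. Real AXI timing depends
 * on the interconnect, the slave, and how many transactions are in flight. */

/* N words transferred as N separate single-beat transactions. */
uint32_t single_beat_cycles(uint32_t n_words, uint32_t overhead_cycles)
{
    return n_words * (overhead_cycles + 1u);
}

/* N words transferred as one burst of N beats: overhead is paid once. */
uint32_t burst_cycles(uint32_t n_words, uint32_t overhead_cycles)
{
    if (n_words == 0u) {
        return 0u;  /* nothing to transfer, no transaction issued */
    }
    return overhead_cycles + n_words;
}
```

Under this model, with an assumed overhead of 10 cycles, eight words cost 88 cycles as single beats and 18 cycles as one burst.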
Normally, the compiler uses load-multiple and store-multiple instructions only for stack
operations. When a routine is memory-access intensive, such as a memory copy, you might need to
write LDM/STM instructions by hand.
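As a rough illustration of using LDM/STM by hand, the sketch below copies words four at a time with GCC-style inline assembly when built for 32-bit ARM, and falls back to plain C elsewhere. The function name copy_words_x4, the choice of scratch registers R3-R6, and the block size of four words are assumptions made for this example, not code from the application note.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy n_words 32-bit words from src to dst. Both pointers are assumed
 * to be word-aligned and the buffers must not overlap. */
void copy_words_x4(uint32_t *dst, const uint32_t *src, size_t n_words)
{
    size_t blocks = n_words / 4u;   /* groups of four words */
    size_t tail = n_words % 4u;     /* leftover words copied one by one */

    while (blocks-- > 0u) {
#if defined(__GNUC__) && defined(__arm__)
        /* Load four words and advance src, then store them and advance
         * dst. R3-R6 are scratch registers declared as clobbered. */
        __asm__ volatile(
            "ldmia %[s]!, {r3-r6}\n\t"
            "stmia %[d]!, {r3-r6}\n\t"
            : [s] "+r"(src), [d] "+r"(dst)
            :
            : "r3", "r4", "r5", "r6", "memory");
#else
        /* Portable fallback with the same effect. */
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst += 4;
        src += 4;
#endif
    }

    while (tail-- > 0u) {
        *dst++ = *src++;
    }
}
```

Whether this beats the compiler's own copy loop depends on the cache configuration and the memory region attributes, so it is worth measuring on the target.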
An example of such an instruction is:
LDMIA R10!, { R0-R3, R12 }
This instruction loads five registers (R0-R3 and R12) from five consecutive words starting at
the address held in R10, and then increases R10 by 20 (5 × 4 bytes) because of the write-back
specifier (!).
The register list is comma separated, with hyphens indicating ranges. The order in which the
registers are written in the list does not matter. ARM processors always proceed in a fixed
fashion, with the lowest-numbered register mapped to the lowest address.
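The behavior described above can be modeled in a few lines of C. This is a sketch of the semantics only, assuming a 16-entry register file, a register list encoded as a bit mask, and memory represented by a plain array whose first element sits at byte address 0. The function name ldmia_writeback is hypothetical.

```c
#include <stdint.h>

/* Model of "LDMIA Rn!, {reglist}".
 * reg_mask has bit i set when Ri is in the list. The base register is
 * assumed not to appear in the list. */
void ldmia_writeback(uint32_t regs[16], unsigned base_reg,
                     uint16_t reg_mask, const uint32_t *mem)
{
    uint32_t addr = regs[base_reg];

    /* Walk registers from lowest to highest number, so the lowest
     * numbered register receives the word at the lowest address,
     * regardless of how the list was written in the source. */
    for (unsigned r = 0; r < 16u; ++r) {
        if (reg_mask & (1u << r)) {
            regs[r] = mem[addr / 4u];
            addr += 4u;
        }
    }

    regs[base_reg] = addr;  /* write-back: base advances by 4 bytes per register */
}
```

For LDMIA R10!, { R0-R3, R12 } the mask is 0x100F, and R10 ends up 20 bytes past its starting value.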
The instruction must also specify how to proceed from the base register, using one of four
modes: IA (increment after), IB (increment before), DA (decrement after), and DB (decrement
before). These specifiers also have four aliases (FD, FA, ED and EA) that work from a stack
perspective. They specify whether the stack pointer points to a full or empty top of the stack,
and whether the stack ascends or descends in memory.
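The four addressing modes and their stack aliases can be summarized as a small C model. The type and function names below are hypothetical. The mapping assumes 4-byte registers and follows the usual ARM convention that, for example, STMFD is STMDB and LDMFD is LDMIA.

```c
#include <stdint.h>

typedef enum { MODE_IA, MODE_IB, MODE_DA, MODE_DB } ldm_mode;
typedef enum { STACK_FD, STACK_FA, STACK_ED, STACK_EA } stack_alias;

/* Lowest byte address touched when n_regs registers are transferred. */
uint32_t lowest_address(ldm_mode m, uint32_t base, uint32_t n_regs)
{
    switch (m) {
    case MODE_IA: return base;
    case MODE_IB: return base + 4u;
    case MODE_DA: return base - 4u * n_regs + 4u;
    default:      return base - 4u * n_regs;      /* MODE_DB */
    }
}

/* Value written back to the base register when "!" is specified. */
uint32_t written_back_base(ldm_mode m, uint32_t base, uint32_t n_regs)
{
    if (m == MODE_IA || m == MODE_IB) {
        return base + 4u * n_regs;
    }
    return base - 4u * n_regs;
}

/* Translate a stack alias into the equivalent addressing mode.
 * The result differs between loads (pop) and stores (push). */
ldm_mode alias_to_mode(stack_alias a, int is_load)
{
    switch (a) {
    case STACK_FD: return is_load ? MODE_IA : MODE_DB;
    case STACK_FA: return is_load ? MODE_DA : MODE_IB;
    case STACK_ED: return is_load ? MODE_IB : MODE_DA;
    default:       return is_load ? MODE_DB : MODE_IA;  /* STACK_EA */
    }
}
```

A push of five registers with STMFD followed by a pop with LDMFD returns the base register to its starting value, which is why the same alias can be used on both sides of a stack operation.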
NEON supports load/store multiple in a similar way. The syntax is:
VLDMmode{cond} Rn{!}, Registers
VSTMmode{cond} Rn{!}, Registers
The mode must be one of the following:
IA - Increment address after each transfer. This is the default, and can be omitted.
DB - Decrement address before each transfer.
EA - Empty ascending stack operation. This is the same as DB for loads and IA for stores.
FD - Full descending stack operation. This is the same as IA for loads and DB for stores.
Note that NEON has some special instructions for interleaving and de-interleaving:
VLDn (Vector load multiple n-element structures) loads multiple n-element structures from
memory into one or more NEON registers, with de-interleaving (unless n == 1). Every
element of each register is loaded.
VSTn (Vector store multiple n-element structures) writes multiple n-element structures to
memory from one or more NEON registers, with interleaving (unless n == 1). Every
element of each register is stored.
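To make the interleaving behavior concrete, the following portable C sketch models what a 3-element structure load and store (in the style of VLD3.8 and VST3.8 on three D registers) do to the data. It is a model of the data movement only, not NEON code. The function names and the choice of eight 8-bit lanes per register are assumptions for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8  /* one 64-bit D register holds eight 8-bit elements */

/* Model of a 3-element structure load: read LANES packed {x, y, z}
 * structures and de-interleave them into three registers. */
void vld3_u8_model(const uint8_t *mem, uint8_t d0[LANES],
                   uint8_t d1[LANES], uint8_t d2[LANES])
{
    for (size_t i = 0; i < LANES; ++i) {
        d0[i] = mem[3u * i];       /* all x elements */
        d1[i] = mem[3u * i + 1u];  /* all y elements */
        d2[i] = mem[3u * i + 2u];  /* all z elements */
    }
}

/* Model of a 3-element structure store: interleave three registers
 * back into LANES packed {x, y, z} structures in memory. */
void vst3_u8_model(uint8_t *mem, const uint8_t d0[LANES],
                   const uint8_t d1[LANES], const uint8_t d2[LANES])
{
    for (size_t i = 0; i < LANES; ++i) {
        mem[3u * i]      = d0[i];
        mem[3u * i + 1u] = d1[i];
        mem[3u * i + 2u] = d2[i];
    }
}
```

A typical use is packed RGB pixel data: after the load, each register holds one color channel, so per-channel arithmetic can operate on whole registers.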