/* Written by Richard P. Curnow, SuperH (UK) Ltd.

Tight version of memcpy for the case of just copying a page.
Prefetch strategy empirically optimised against RTL simulations
of SH5-101 cut2 eval chip with Cayman board DDR memory.

r2 : source effective address (start of page)
r3 : destination effective address (start of page)

Always copies 4096 bytes.

* Currently the prefetch is 4 lines ahead and the alloco is 2 lines ahead.
  It seems like the prefetch needs to be at least 4 lines ahead to get
  the data into the cache in time, and the allocos contend with outstanding
  prefetches for the same cache set, so it's better to have the numbers
	.section .text..SHmedia32,"ax"
	.global sh64_page_copy
	/* Copy 4096 bytes worth of data from r2 to r3.
	   Do prefetches 4 lines ahead.
	   Do alloco 2 lines ahead */
/* Minimal code size. The extra branches inside the loop don't cost much
   because they overlap with the time spent waiting for prefetches to
	bge/u r3, r6, tr2	! skip prefetch for last 4 lines
	ldx.q r3, r22, r63	! prefetch 4 lines hence
	bge/u r3, r7, tr3	! skip alloco for last 2 lines
	alloco r3, 0x40		! alloc destination line 2 lines ahead
	blink tr0, r63		! return
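
/* Illustrative C model of the loop guards above. It is not part of the
   build and is not the routine's actual implementation. It only sketches
   how "prefetch 4 lines ahead, alloco 2 lines ahead, skip both near the
   end of the page" could be expressed. Assumptions: a 32-byte cache line
   (inferred from the 0x40 offset being described as 2 lines), and the
   hypothetical helpers prefetch_line() and alloc_line(), which merely
   count calls in place of the real cache operations. Any cache priming
   done before the loop is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    PAGE_BYTES     = 4096, // the routine always copies one page
    LINE_BYTES     = 32,   // assumed cache line size (0x40 == 2 lines)
    PREFETCH_AHEAD = 4,    // source prefetch distance, in lines
    ALLOC_AHEAD    = 2     // destination alloco distance, in lines
};

// Hypothetical stand-ins for the cache operations: they only count calls.
static unsigned prefetch_count;
static unsigned alloc_count;

static void prefetch_line(const void *p) { (void)p; prefetch_count++; }
static void alloc_line(void *p)          { (void)p; alloc_count++; }

// Copy one page line by line, issuing the look-ahead hints first.
static void page_copy_model(uint8_t *dst, const uint8_t *src)
{
    const size_t nlines = PAGE_BYTES / LINE_BYTES;

    assert(PAGE_BYTES % LINE_BYTES == 0);

    for (size_t line = 0; line < nlines; line++) {
        const size_t off = line * LINE_BYTES;

        // Skip the prefetch for the last 4 lines: nothing left to fetch.
        if (line + PREFETCH_AHEAD < nlines)
            prefetch_line(src + off + PREFETCH_AHEAD * LINE_BYTES);

        // Skip the alloco for the last 2 lines, for the same reason.
        if (line + ALLOC_AHEAD < nlines)
            alloc_line(dst + off + ALLOC_AHEAD * LINE_BYTES);

        memcpy(dst + off, src + off, LINE_BYTES);
    }
}
```

   Under these assumptions a 4096-byte page has 128 lines, so the model
   issues 124 in-loop prefetches and 126 in-loop allocos. */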