FFmpeg/libavcodec/aarch64
Krzysztof Pyrkosz 83e4b068d9 avcodec/aarch64/aacencdsp: NEON implementation
This patch supplies handwritten NEON code for AAC.

The benchmarks below were collected by invoking these two commands on
each of my boards, A78, A72 and Thinkpad x13s:
1) ./tests/checkasm/checkasm --test=aacencdsp --bench --runs=12
2) ./ffmpeg -y -t 10:00 -f lavfi -i sine /tmp/foo.aac (the first line is
speed without the patch, second, with)

- A78
abs_pow34_c:                                          4161.5 ( 1.00x)
abs_pow34_neon:                                       3586.2 ( 1.16x)
quant_bands_signed_c:                                 5548.0 ( 1.00x)
quant_bands_signed_neon:                              1126.8 ( 4.92x)
quant_bands_unsigned_c:                               3979.2 ( 1.00x)
quant_bands_unsigned_neon:                             800.2 ( 4.97x)

size=    5251KiB time=00:10:00.00 bitrate=  71.7kbits/s speed=71.6x
size=    5251KiB time=00:10:00.00 bitrate=  71.7kbits/s speed=82.3x

- A72
abs_pow34_c:                                         15362.2 ( 1.00x)
abs_pow34_neon:                                      15382.5 ( 1.00x)
quant_bands_signed_c:                                 9926.5 ( 1.00x)
quant_bands_signed_neon:                              2467.8 ( 4.02x)
quant_bands_unsigned_c:                               5469.8 ( 1.00x)
quant_bands_unsigned_neon:                            2089.5 ( 2.62x)

size=    5251KiB time=00:10:00.00 bitrate=  71.7kbits/s speed=34.3x
size=    5251KiB time=00:10:00.00 bitrate=  71.7kbits/s speed=37.8

- x13s
abs_pow34_c:                                          2413.4 ( 1.00x)
abs_pow34_neon:                                       1796.2 ( 1.34x)
quant_bands_signed_c:                                 2968.9 ( 1.00x)
quant_bands_signed_neon:                               675.6 ( 4.39x)
quant_bands_unsigned_c:                               2311.9 ( 1.00x)
quant_bands_unsigned_neon:                             477.1 ( 4.85x)

size=    5251KiB time=00:10:00.00 bitrate=  71.7kbits/s speed= 135x
size=    5251KiB time=00:10:00.00 bitrate=  71.7kbits/s speed= 159x

Signed-off-by: Martin Storsjö <martin@martin.st>
2025-01-28 10:44:40 +02:00
..
h26x aarch64: h26x: Fix the indentation of one function 2024-09-26 13:42:11 +03:00
vvc aarch64/vvc: Add apply_bdof 2024-12-21 11:54:44 +08:00
aacencdsp_init.c avcodec/aarch64/aacencdsp: NEON implementation 2025-01-28 10:44:40 +02:00
aacencdsp_neon.S avcodec/aarch64/aacencdsp: NEON implementation 2025-01-28 10:44:40 +02:00
aacpsdsp_init_aarch64.c
aacpsdsp_neon.S
ac3dsp_init_aarch64.c
ac3dsp_neon.S
cabac.h
fdct.h lavc/aarch64/fdct: add neon-optimized fdct for aarch64 2024-05-13 14:54:10 +02:00
fdctdsp_init_aarch64.c lavc/aarch64/fdct: add neon-optimized fdct for aarch64 2024-05-13 14:54:10 +02:00
fdctdsp_neon.S lavc/aarch64/fdct: add neon-optimized fdct for aarch64 2024-05-13 14:54:10 +02:00
fmtconvert_init.c
fmtconvert_neon.S
h264chroma_init_aarch64.c
h264cmc_neon.S
h264dsp_init_aarch64.c
h264dsp_neon.S
h264idct_neon.S
h264pred_init.c
h264pred_neon.S lavc/aarch64: Fix ff_pred16x16_plane_neon_10 2024-12-17 14:50:29 +02:00
h264qpel_init_aarch64.c
h264qpel_neon.S
hevcdsp_deblock_neon.S
hevcdsp_idct_neon.S
hevcdsp_init_aarch64.c aarch64/hevc: Move epel/qpel to h26x directory 2024-09-14 16:36:34 +08:00
hpeldsp_init_aarch64.c
hpeldsp_neon.S
idct.h
idctdsp_init_aarch64.c lavc/aarch64: fix include for cpu.h 2024-05-13 14:50:38 +02:00
idctdsp_neon.S
Makefile avcodec/aarch64/aacencdsp: NEON implementation 2025-01-28 10:44:40 +02:00
me_cmp_init_aarch64.c avcodec/aarch64/me_cmp: add dotprod implementations of sse16 and vsse_intra16 2024-08-17 15:31:48 +02:00
me_cmp_neon.S avcodec/aarch64/me_cmp: add dotprod implementations of sse16 and vsse_intra16 2024-08-17 15:31:48 +02:00
mpegaudiodsp_init.c
mpegaudiodsp_neon.S
mpegvideoencdsp_init.c avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t 2024-09-01 13:42:30 +02:00
mpegvideoencdsp_neon.S avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t 2024-09-01 13:42:30 +02:00
neon.S
neontest.c
opusdsp_init.c lavc/opus*: move to opus/ subdir 2024-09-02 11:56:53 +02:00
opusdsp_neon.S
pixblockdsp_init_aarch64.c
pixblockdsp_neon.S
rv40dsp_init_aarch64.c
sbrdsp_init_aarch64.c
sbrdsp_neon.S
simple_idct_neon.S
synth_filter_init.c
synth_filter_neon.S
vc1dsp_init_aarch64.c
vc1dsp_neon.S
videodsp.S
videodsp_init.c
vorbisdsp_init.c
vorbisdsp_neon.S
vp8dsp.h
vp8dsp_init_aarch64.c
vp8dsp_neon.S
vp9dsp_init.h
vp9dsp_init_10bpp_aarch64.c
vp9dsp_init_12bpp_aarch64.c
vp9dsp_init_16bpp_aarch64_template.c
vp9dsp_init_aarch64.c
vp9itxfm_16bpp_neon.S
vp9itxfm_neon.S
vp9lpf_16bpp_neon.S
vp9lpf_neon.S
vp9mc_16bpp_neon.S
vp9mc_aarch64.S
vp9mc_neon.S aarch64: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter 2025-01-03 17:53:46 -05:00