Commit graph

118127 commits

Author SHA1 Message Date
Niklas Haas
a8d01dff9a swscale/utils: add HDR metadata to SwsFormat
Only add the condensed values that we actually care about. Group them into
a new struct to make it easier to discard or replace this metadata.

Define a special comparison function that does not choke on undefined/unknown
metadata.
2024-12-23 12:33:43 +01:00
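The comparison described in a8d01dff9a can be illustrated with a small sketch. It assumes, purely for illustration, that each metadata value is a rational and that 0/0 marks "undefined"; the type and function names are hypothetical and are not taken from the swscale sources.

```c
#include <stdint.h>

/* Hypothetical rational type; num == 0 && den == 0 marks "undefined". */
struct rational {
    int num;
    int den;
};

static inline int q_is_undefined(struct rational q)
{
    return q.num == 0 && q.den == 0;
}

/* Two values compare equal if both are undefined, or if both are defined
 * and represent the same fraction. A naive cross-multiplication alone
 * would report 0/0 as equal to every value. */
static inline int q_equal(struct rational a, struct rational b)
{
    if (q_is_undefined(a) || q_is_undefined(b))
        return q_is_undefined(a) && q_is_undefined(b);
    return (int64_t)a.num * b.den == (int64_t)b.num * a.den;
}
```

With this sketch, 1/2 equals 2/4, two undefined values are equal to each other, and an undefined value never equals a defined one.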
Niklas Haas
b9dfe8138e swscale/utils: check for supported color transfers
We will use the av_csp_itu_eotf() functions to decode these internally, so
check this function to see if it succeeds.
2024-12-23 12:33:43 +01:00
Niklas Haas
6c9218d748 swscale/unscaled: allow semiplanar copies
As fixed in the previous commit, this enables semipacked range and
bit depth conversions. Previously these would go through the general
purpose path.

Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-23 11:32:02 +01:00
Niklas Haas
77db7f9b87 swscale/unscaled: correctly copy semiplanar formats
This fixes multiple bugs with semiplanar formats like NV12. Not only did these
trigger false positives in the grayscale format checks (because dst[2] is NULL),
but they also copied an incorrect number of pixels.

Fixes conversions such as nv12 -> nv12, gray8 -> nv12, nv20le -> nv20be, etc.

Fixes: #11239
Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-23 11:31:58 +01:00
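As background for 77db7f9b87, the sketch below shows how the size of the interleaved chroma plane of an 8-bit NV12 image follows from the luma dimensions. It does not reproduce the actual fix; the helper names are hypothetical.

```c
#include <stddef.h>

/* Bytes per row of the interleaved UV plane of an 8-bit NV12 image
 * whose luma plane is `w` pixels wide. */
static inline size_t nv12_chroma_row_bytes(int w)
{
    int chroma_w = (w + 1) >> 1;   /* 4:2:0: one UV pair per two luma pixels */
    return (size_t)chroma_w * 2;   /* U and V samples are interleaved */
}

/* Number of rows in the UV plane for a luma height of `h`. */
static inline int nv12_chroma_rows(int h)
{
    return (h + 1) >> 1;
}
```

A copy routine that treats the second plane like a planar chroma plane would move only half of each row, which is one way to end up with an incorrect pixel count.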
Niklas Haas
c6bf7f6645 swscale/unscaled: correctly round yuv2yuv when not dithering
We should at least bias towards the nearest integer, instead of always
rounding down, when not dithering. This is a bit more correct.

The FATE changes are only in the cases where sws_dither was explicitly set
to "none", which is exactly as expected.

Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-23 11:29:22 +01:00
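The rounding change in c6bf7f6645 can be illustrated with a minimal fixed-point sketch. The helper names are hypothetical, and the sketch assumes non-negative inputs and shift >= 1; it is not the code from the commit.

```c
#include <stdint.h>

/* Drop `shift` fractional bits by truncation (always rounds down). */
static inline int32_t shift_floor(int32_t v, int shift)
{
    return v >> shift;
}

/* Drop `shift` fractional bits after adding half of the output step,
 * so the result is biased towards the nearest integer. */
static inline int32_t shift_nearest(int32_t v, int shift)
{
    return (v + (1 << (shift - 1))) >> shift;
}
```

For example, 15 with three fractional bits represents 1.875: truncation yields 1, while the biased version yields 2.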
Niklas Haas
a9ae2cc14d checkasm/sw_rgb: add alpToYV12 check
Mirroring lumToYV12 and chrToYV12.

Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-23 11:20:59 +01:00
Niklas Haas
c601bb8df5 checkasm/sw_rgb: add tests for yuv2packed{1,2,X}
Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-23 11:20:58 +01:00
Niklas Haas
57bbdb4fb1 checkasm/sw_scale: add test for yuv2nv12cX
Mirroring yuv2yuvX.

Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-23 11:20:58 +01:00
Niklas Haas
fe9bf7cd52 checkasm/sw_scale: add assertion for hscale assumption
This code only checks hcScale. In practice this is not an issue because
the function pointers should always be identical to hyScale for the same
filter size.

Add an assertion just to make sure this assumption never regresses.

Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-23 11:20:58 +01:00
Scott Theisen
9da1d2e66a libavcodec/v4l2_buffers.c: set AVFrame interlaced flags
Originally from:
669955c6cb

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-12-23 16:16:16 +08:00
1b8cd00da6
configure: add option to statically link to libvulkan
This may be useful in weird setups and on platforms where
static linking to libvulkan is supported.

libplacebo also has this fallback.
2024-12-23 04:25:09 +09:00
8fbecfd1a0
vulkan_decode: add queue_flags field to specify queue used 2024-12-23 04:25:09 +09:00
2e06b84e27
vulkan: do not reinvent a queue context struct
We recently introduced a public field which was a superset
of the queue context we used to have.

Switch to using it entirely.

This also allows us to get rid of the NIH function which was
valid only for video queues.
2024-12-23 04:25:09 +09:00
157cd820ad
vulkan: remove pointless mutex locks
This code was simply incorrect through and through. It did not
protect what actually has to be protected in a multi-threaded setup.
Perhaps it was used to silence threading errors?

Either way, remove it, and document the correct way to use execution
pools in a threaded environment.
2024-12-23 04:25:09 +09:00
7239be07be
vulkan_decode: use a single execution pool
Originally, the decoder had a single execution pool, with one
execution context per thread. Execution pools were always intended
to be thread-safe, as long as there were enough execution contexts
in the pool to satisfy all threads.

Due to synchronization issues, the threading part was removed at some
point, and, for decoding, each thread had its own execution pool.
Having a single execution pool per context is hacky, not to mention
wasteful.
Most importantly, we *cannot* associate single shaders across multiple
execution pools for a single application. This means that we cannot
use shaders to either apply film grain, or use this framework for
software-defined decoders.

The recent commits added threading capabilities back to the execution
pool, and the number of contexts in each pool was increased. This was
done with the assumption that the execution pool was singular, which
it was not. This led to increased parallelism and number of frames
in flight, which is taxing on memory.

This commit finally restores proper threading behaviour.
The validation layer has issues that are reported and addressed in the
earlier commit.
2024-12-23 04:25:08 +09:00
4ca2b86ed5
hwcontext_vulkan: disable validation layer threading warnings
The layer is buggy currently:
https://github.com/KhronosGroup/Vulkan-ValidationLayers/issues/9045
2024-12-23 04:25:08 +09:00
18af3a1db2
hwcontext_vulkan: do not enable portability subset by default
It doesn't make sense to, and could result in the implementation
picking emulation layers.
2024-12-23 04:25:01 +09:00
Benjamin Cheng
bf9f921ef7 avcodec/hw_base_encode: restrict size of next_prev
Some drivers are more strict about the size of the reference lists given
(i.e. VAOn12 [1]). The next_prev list is used to handle multiple "L0"
references in AV1 encode. Restrict the size of next_prev based on the
value of ref_l0 when the GOP structure is initialized.

[1] https://github.com/intel/cartwheel-ffmpeg/issues/278

v2: fix indentation issues
2024-12-23 04:24:54 +09:00
Nuo Mi
0a6388d1da avcodec/hevcdec: remove hevc prefix for x86 asm files 2024-12-22 21:00:06 +08:00
Nuo Mi
8d27256a74 avcodec/vvcdec: remove vvc prefix for x86 and riscv 2024-12-22 21:00:06 +08:00
Peter Ross
350ebef112
avformat/iff: remove surplus if statement
Fixes CID 1636854
2024-12-22 06:24:49 -05:00
Peter Ross
b2cba76d4f avformat/riff: map 0069 twocc to ADPCM IMA XBOX decoder 2024-12-22 16:08:33 +11:00
Paul B Mahol
c3083b3266 avcodec: add ADPCM IMA XBOX decoder 2024-12-22 16:08:33 +11:00
Niklas Haas
095f8038fa swscale/output: fix bilinear yuv2rgb chroma interpolation
These functions were divided into two special cases; one assuming that
uvalpha == 0, and the other assuming that uvalpha == 2048. This worked fine
for simple 2x chroma upscaling but broke for e.g. yuv410p, non-centered chroma,
or other special cases that involved non-aligned chroma filters.

Fix it by instead dividing this check into two cases, a uvalpha==0 fast path
and a uvalpha>0 general path. Instead of (A+B)/2 the general path now multiplies
in the true uvalpha weight.

I tried preserving the old fast path for the case of uvalpha == 2048, but this
was significantly slower in practice versus having just one general path.
However, we still need a uvalpha == 0 path for the unscaled case.

Fixes: ticket #5083
Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-21 10:57:54 +01:00
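The difference between the old (A+B)/2 special case and the weighted general path in 095f8038fa can be sketched as follows. This assumes a 12-bit weight in which 4096 is full scale, inferred from uvalpha == 2048 being the midpoint; the function names are hypothetical and the real code operates on whole rows at a higher intermediate precision.

```c
/* Old special case: plain average, only correct when uvalpha == 2048. */
static inline int chroma_avg(int a, int b)
{
    return (a + b) >> 1;
}

/* General path: blend the two chroma samples with the true weight.
 * uvalpha is assumed to lie in [0, 4096). */
static inline int chroma_blend(int a, int b, int uvalpha)
{
    int uvalpha1 = 4096 - uvalpha;   /* weight of the first sample */
    return (a * uvalpha1 + b * uvalpha) >> 12;
}
```

At uvalpha == 2048 both functions agree, but for a weight such as 1024 only the blend gives the expected result (125 for samples 100 and 200, rather than 150).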
sunyuechi
6b31e42c47 lavc/riscv: vset macro for simplify if-else 2024-12-21 12:03:45 +08:00
Zhao Zhili
952508ae05 aarch64/vvc: Add apply_bdof
Test on rpi 5 with gcc 12:

apply_bdof_8_8x16_c:                                  7315.2 ( 1.00x)
apply_bdof_8_8x16_neon:                               1876.8 ( 3.90x)
apply_bdof_8_16x8_c:                                  7170.5 ( 1.00x)
apply_bdof_8_16x8_neon:                               1752.8 ( 4.09x)
apply_bdof_8_16x16_c:                                14695.2 ( 1.00x)
apply_bdof_8_16x16_neon:                              3490.5 ( 4.21x)
apply_bdof_10_8x16_c:                                 7371.5 ( 1.00x)
apply_bdof_10_8x16_neon:                              1863.8 ( 3.96x)
apply_bdof_10_16x8_c:                                 7172.0 ( 1.00x)
apply_bdof_10_16x8_neon:                              1766.0 ( 4.06x)
apply_bdof_10_16x16_c:                               14551.5 ( 1.00x)
apply_bdof_10_16x16_neon:                             3576.0 ( 4.07x)
apply_bdof_12_8x16_c:                                 7236.5 ( 1.00x)
apply_bdof_12_8x16_neon:                              1863.8 ( 3.88x)
apply_bdof_12_16x8_c:                                 7316.5 ( 1.00x)
apply_bdof_12_16x8_neon:                              1758.8 ( 4.16x)
apply_bdof_12_16x16_c:                               14691.2 ( 1.00x)
apply_bdof_12_16x16_neon:                             3480.5 ( 4.22x)
2024-12-21 11:54:44 +08:00
Peter Ross
7aeae8d1ae avcodec/Makefile: include aom_film_grain.o file for h264_sei component
h264_sei depends on h2645_sei, which in turn depends on aom_film_grain for
ff_aom_uninit_film_grain_params()
2024-12-21 11:38:57 +11:00
Peter Ross
6bf9252807 avformat/Makefile: include object files for image_vbn_pipe demuxer 2024-12-21 11:38:48 +11:00
Peter Ross
c90e0777da avformat/iff: SndAnim decoding
Fixes ticket #5553.
2024-12-20 20:40:12 +11:00
James Almer
4e2b9df48c avformat/isom: use more of the existing channel layout bitmap defines
Signed-off-by: James Almer <jamrial@gmail.com>
2024-12-19 22:06:22 -03:00
James Almer
76049d1c45 avformat/iamf_writer: fix setting num_samples_per_frame for OPUS
As per section 3.11.1 of the IAMF spec, the sample rate used in Codec Config
for Opus shall be 48kHz, regardless of the original sample rate used during
encoding.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-12-19 22:06:22 -03:00
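One plausible reading of 76049d1c45 is that the frame size has to be expressed in 48 kHz samples rather than in samples at the encoder's input rate. The sketch below shows that conversion under this assumption; the helper name is hypothetical and the actual change may compute the value differently.

```c
#include <stdint.h>

/* Convert a frame size given in samples at `sample_rate` into the
 * equivalent number of samples at 48 kHz. */
static inline int64_t samples_at_48k(int64_t frame_size, int sample_rate)
{
    return frame_size * 48000 / sample_rate;
}
```

A 20 ms frame is 320 samples at 16 kHz but 960 samples at 48 kHz, so samples_at_48k(320, 16000) returns 960.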
Dmitrii Ovchinnikov
95217872ad avcodec/amfenc: B-Frame support for av1_amf encoder. 2024-12-20 00:43:52 +01:00
Dmitrii Ovchinnikov
c037eb8424 amfenc: Update the min version to 1.4.35.0 for AMF SDK. 2024-12-20 00:43:42 +01:00
Cameron Gutman
a40cbf9792 avcodec/amfenc: Implement async_depth option
This option, which is also available on other FFmpeg hardware encoders,
allows the user to trade throughput for reduced output latency. This is
useful for ultra low latency applications like game streaming.

Signed-off-by: Cameron Gutman <aicommander@gmail.com>
2024-12-20 00:43:30 +01:00
Peter Ross
494c961379 avformat/Makefile: add iso_writer golomb_tab from shared library dependency 2024-12-19 10:42:29 +11:00
Niklas Haas
b38f6f9990 tests/swscale: allow nonzero positive return codes from sws_scale_frame()
See previous commit.

Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-18 17:30:48 +01:00
Niklas Haas
e05a1bb879 swscale: fix documentation of sws_scale_frame()
Since its introduction, this function has claimed to return 0 on success, yet
never actually did so (until the introduction of the new graph based API). It
always returned the number of scaled lines, and continues to do so.

To avoid confusion, but also avoid regressing possible clients that relied on
the existing semantics, simply update the documentation to reflect the actual
behavior. Remain ambiguous about the exact interpretation of the return value
on account of the unfortunate difference in behavior between the legacy and
new scaling APIs.

Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-18 17:30:48 +01:00
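For callers, the practical consequence of e05a1bb879 is that success should be detected with a sign check rather than a comparison against zero. A minimal sketch, using a hypothetical helper name:

```c
/* Treat any non-negative return value from sws_scale_frame() as success.
 * The exact meaning of a positive value differs between the legacy and
 * the new scaling API, so only the sign is inspected here. */
static inline int scale_succeeded(int ret)
{
    return ret >= 0;
}
```

A check written as `ret != 0` would misreport a successful legacy call that returns the number of scaled lines.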
Niklas Haas
2df655bc2c swscale/utils: fix sws_getCachedContext check
This logic was inverted, but || was not replaced by &&.

Fixes: ed5dd67562
Fixes: ticket #11353
Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-18 17:30:26 +01:00
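The class of bug fixed in 2df655bc2c can be shown with a generic sketch. The parameters below are hypothetical stand-ins and are not the fields that sws_getCachedContext actually compares.

```c
/* Original sense: the cached context must be rebuilt if anything differs. */
static int needs_rebuild(int w, int h, int cached_w, int cached_h)
{
    return w != cached_w || h != cached_h;
}

/* Correct inversion (De Morgan): reusable only if everything matches. */
static int can_reuse(int w, int h, int cached_w, int cached_h)
{
    return w == cached_w && h == cached_h;
}

/* Buggy inversion: the comparisons were flipped but || was kept, so a
 * single matching parameter is enough to report "reusable". */
static int can_reuse_buggy(int w, int h, int cached_w, int cached_h)
{
    return w == cached_w || h == cached_h;
}
```

With a matching width and a different height, can_reuse() correctly returns 0 while can_reuse_buggy() returns 1.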
Martin Storsjö
d1e37eb0cd avutil/mem_internal: Don't include stdalign.h on MSVC
It's currently actually not used in MSVC builds, since
6e49b86996.

Older versions of MSVC (or, in particular, older versions of UCRT)
don't have stdalign.h; it's available since WinSDK 10.0.20348.0;
such a new enough version has been installed by default only since
MSVC 2022 17.4 and newer.

With this change, ffmpeg can still be built with MSVC 2019 16.8
(v19.28).

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-12-18 16:03:06 +02:00
Martin Storsjö
2bb00ef59c aarch64: vvc: Fix building the dmvr_hv assembly with older MSVC versions
Explicitly use ldur for unaligned offsets; newer versions of
armasm64 implicitly convert ldr to ldur as necessary, but older
versions require it explicitly written out.

This fixes these build errors:

    ffmpeg\libavcodec\aarch64\vvc\inter.o.asm(2039) :
     error A2518: operand 2: Memory offset must be aligned
            ldr             s5, [x1, #1]
    ffmpeg\libavcodec\aarch64\vvc\inter.o.asm(2250) :
     error A2518: operand 2: Memory offset must be aligned
            ldr             d7, [x1, #2]

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-12-18 13:45:09 +02:00
Peter Ross
8272d34377 configure: add iso_writer golomb dependency
since commit fce0622d0b, libavformat/hevc.c
depends on golomb vlc tables.
2024-12-18 16:30:47 +11:00
David Rosca
d0facac679 lavc/vaapi_encode_h265: Use surface alignment
This is needed to correctly set conformance window crop with Mesa AMD.

Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2024-12-17 21:36:05 +01:00
David Rosca
bcfbf2bac8 lavc/vaapi_encode: Query surface alignment
It needs to create a temporary config to query the surface attribute.

Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2024-12-17 21:36:01 +01:00
Bin Peng
72a3656e84 lavc/aarch64: Fix ff_pred16x16_plane_neon_10
Fix test failure on aarch64:
./tests/checkasm/checkasm --test=h264pred 367840

Signed-off-by: Peng Bin <pengbin@visionular.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2024-12-17 14:50:29 +02:00
Bin Peng
decc9e643c lavc/aarch64: Fix ff_pred8x8_plane_neon_10
Fix test failure on aarch64:
./tests/checkasm/checkasm --test=h264pred 479612

The mismatch between neon and C functions can also be reproduced using the following bitstream and command line.

wget https://streams.videolan.org/ffmpeg/incoming/intra8x8pred_10bit.264
 ./ffmpeg -cpuflags 0  -threads 1 -i intra8x8pred_10bit.264  -f framemd5 -y md5_ref
 ./ffmpeg              -threads 1 -i intra8x8pred_10bit.264  -f framemd5 -y md5_neon

Signed-off-by: Bin Peng <pengbin@visionular.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2024-12-17 14:50:29 +02:00
Zhao Zhili
7b0bd6c4a7 avutil/vulkan_glslang: Fix build failure
compile_only isn't available until 13.1.0. Let default initialization set
it to zero, so the code works with versions before and after 13.1.0.
2024-12-17 19:28:35 +09:00
Rémi Denis-Courmont
bd226fdd74 lavc/h264dsp: R-V V intra loop filter
As with the inter loop filter, performance metrics seem to be biased in
favour of the C implementation because checkasm inputs almost always
fall in the no-op case.

h264_h_loop_filter_chroma_intra_8bpp_c:                 82.8 ( 1.00x)
h264_h_loop_filter_chroma_intra_8bpp_rvv_i32:           72.6 ( 1.14x)
h264_h_loop_filter_chroma_mbaff_intra_8bpp_c:           41.1 ( 1.00x)
h264_h_loop_filter_chroma_mbaff_intra_8bpp_rvv_i32:     72.6 ( 0.57x)
h264_h_loop_filter_luma_intra_8bpp_c:                  166.1 ( 1.00x)
h264_h_loop_filter_luma_intra_8bpp_rvv_i32:            395.4 ( 0.42x)
h264_h_loop_filter_luma_mbaff_intra_8bpp_c:             93.3 ( 1.00x)
h264_h_loop_filter_luma_mbaff_intra_8bpp_rvv_i32:      395.4 ( 0.24x)
h264_v_loop_filter_chroma_intra_8bpp_c:                134.8 ( 1.00x)
h264_v_loop_filter_chroma_intra_8bpp_rvv_i32:           51.6 ( 2.61x)
h264_v_loop_filter_luma_intra_8bpp_c:                  468.1 ( 1.00x)
h264_v_loop_filter_luma_intra_8bpp_rvv_i32:            134.8 ( 3.47x)
2024-12-17 09:00:28 +02:00
sunyuechi
16d4945e9a lavc/vvc_mc R-V V sad
                            k230              banana_f3
sad_8x16_c:                 387.7 ( 1.00x)    394.9 ( 1.00x)
sad_8x16_rvv_i32:           109.7 ( 3.53x)    103.5 ( 3.82x)
sad_16x8_c:                 378.2 ( 1.00x)    384.7 ( 1.00x)
sad_16x8_rvv_i32:            82.0 ( 4.61x)    61.7 ( 6.24x)
sad_16x16_c:                748.7 ( 1.00x)    759.7 ( 1.00x)
sad_16x16_rvv_i32:          128.5 ( 5.83x)    113.7 ( 6.68x)
2024-12-17 09:21:20 +08:00
sunyuechi
b3f7440298 lavc/hevc: R-V V put_pixels(pow2)
                                      k230                banana_f3
put_hevc_pel_pixels4_8_c:               61.6 ( 1.00x)    69.5 ( 1.00x)
put_hevc_pel_pixels4_8_rvv_i32:         24.6 ( 2.50x)    28.0 ( 2.48x)
put_hevc_pel_pixels8_8_c:              209.8 ( 1.00x)    215.5 ( 1.00x)
put_hevc_pel_pixels8_8_rvv_i32:         52.6 ( 3.99x)    38.2 ( 5.64x)
put_hevc_pel_pixels16_8_c:             839.4 ( 1.00x)    830.0 ( 1.00x)
put_hevc_pel_pixels16_8_rvv_i32:       126.6 ( 6.63x)    90.5 ( 9.17x)
put_hevc_pel_pixels32_8_c:            3246.6 ( 1.00x)    3246.7 ( 1.00x)
put_hevc_pel_pixels32_8_rvv_i32:       311.6 (10.42x)    257.0 (12.63x)
put_hevc_pel_pixels64_8_c:           12894.6 ( 1.00x)    12892.7 ( 1.00x)
put_hevc_pel_pixels64_8_rvv_i32:      1135.8 (11.35x)    778.0 (16.57x)
2024-12-17 09:21:20 +08:00
sunyuechi
dad062c4f8 lavc/vvc_mc: R-V V put_pixels
                                                     k230                banana_f3
put_chroma_pixels_8_4x4_c:                              63.5 ( 1.00x)    59.2 ( 1.00x)
put_chroma_pixels_8_4x4_rvv_i32:                        26.5 ( 2.39x)    28.0 ( 2.12x)
put_chroma_pixels_8_8x8_c:                             211.8 ( 1.00x)    215.5 ( 1.00x)
put_chroma_pixels_8_8x8_rvv_i32:                        54.3 ( 3.90x)    48.8 ( 4.42x)
put_chroma_pixels_8_16x16_c:                           841.3 ( 1.00x)    830.0 ( 1.00x)
put_chroma_pixels_8_16x16_rvv_i32:                     137.5 ( 6.12x)    121.8 ( 6.82x)
put_chroma_pixels_8_32x32_c:                          3248.8 ( 1.00x)    3288.2 ( 1.00x)
put_chroma_pixels_8_32x32_rvv_i32:                     350.5 ( 9.27x)    288.5 (11.40x)
put_chroma_pixels_8_64x64_c:                         12998.3 ( 1.00x)    12976.2 ( 1.00x)
put_chroma_pixels_8_64x64_rvv_i32:                    1100.5 (11.81x)    924.0 (14.04x)
put_chroma_pixels_8_128x128_c:                       54284.0 ( 1.00x)    52654.5 ( 1.00x)
put_chroma_pixels_8_128x128_rvv_i32:                  7192.8 ( 7.55x)    2934.2 (17.94x)
put_luma_pixels_8_4x4_c:                                63.5 ( 1.00x)    69.5 ( 1.00x)
put_luma_pixels_8_4x4_rvv_i32:                          26.5 ( 2.39x)    28.0 ( 2.48x)
put_luma_pixels_8_8x8_c:                               211.5 ( 1.00x)    225.8 ( 1.00x)
put_luma_pixels_8_8x8_rvv_i32:                          54.3 ( 3.90x)    38.5 ( 5.86x)
put_luma_pixels_8_16x16_c:                             850.5 ( 1.00x)    830.0 ( 1.00x)
put_luma_pixels_8_16x16_rvv_i32:                       137.5 ( 6.18x)    100.8 ( 8.24x)
put_luma_pixels_8_32x32_c:                            3248.8 ( 1.00x)    3257.2 ( 1.00x)
put_luma_pixels_8_32x32_rvv_i32:                       341.3 ( 9.52x)    246.8 (13.20x)
put_luma_pixels_8_64x64_c:                           13007.5 ( 1.00x)    13038.8 ( 1.00x)
put_luma_pixels_8_64x64_rvv_i32:                      1119.0 (11.62x)    684.2 (19.06x)
put_luma_pixels_8_128x128_c:                         54219.3 ( 1.00x)    52060.8 ( 1.00x)
put_luma_pixels_8_128x128_rvv_i32:                    6813.5 ( 7.96x)    2548.8 (20.43x)
2024-12-17 09:21:20 +08:00