585 commits

Author SHA1 Message Date
Andreas Rheinhardt
5a72266d49 tests/checkasm/sw_rgb: Fix leaks
Also use loop-scope for variables where appropriate.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-01-12 15:41:40 +01:00
James Almer
658a645e18 tests/checkasm/sw_rgb: remove bogus value truncation in check_yuv2packed1()
Fixes out of array accesses.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-12-31 11:53:18 -03:00
Niklas Haas
a9ae2cc14d checkasm/sw_rgb: add alpToYV12 check
Mirroring lumToYV12 and chrToYV12.

Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-23 11:20:59 +01:00
Niklas Haas
c601bb8df5 checkasm/sw_rgb: add tests for yuv2packed{1,2,X}
Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-23 11:20:58 +01:00
Niklas Haas
57bbdb4fb1 checkasm/sw_scale: add test for yuv2nv12cX
Mirroring yuv2yuvX.

Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-23 11:20:58 +01:00
Niklas Haas
fe9bf7cd52 checkasm/sw_scale: add assertion for hscale assumption
This code only checks hcScale. In practice this is not an issue because
the function pointers should always be identical to hyScale for the same
filter size.

Add an assertion just to make sure this assumption never regresses.

Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-23 11:20:58 +01:00
Martin Storsjö
4b524649ff checkasm: Print benchmarks of C-only functions
This corresponds to commit 9278a14cf406f8edb5052c42b83750112bf5b515
in dav1d.

Omitting the C-only functions doesn't speed up benchmarking
anyway (as those have to be benchmarked before we know if we have
any corresponding assembly functions), and being able to benchmark
those functions without corresponding assembly can be valuable in
a number of cases.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-12-11 10:51:15 +02:00
sunyuechi
82da769492 checkasm/rv40dsp: cover more cases
Co-Authored-By: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2024-12-10 11:24:45 -05:00
Martin Storsjö
47b1e1bd84 checkasm: vvc: Use checkasm_check for printing failing output
Share the checkasm_check_pixel macro from hevc_pel in checkasm.h,
to allow other tests to use the same. (To use it in other tests,
those tests need to have a similar setup for high bitdepth pixels,
with a local variable named "bit_depth".)

This simplifies the code for checking the output, and can print
the failing output (including a map of matching/mismatching
elements) if checkasm is run with the -v/--verbose option.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-12-10 11:26:09 +02:00
Zhao Zhili
018ec4fe5f tests/checkasm: Simplify logic for WASI signal handling
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Reviewed-by: Martin Storsjö <martin@martin.st>
2024-12-06 10:48:11 +08:00
Ramiro Polla
384fe39623 swscale/range_convert: fix mpeg ranges in yuv range conversion for non-8-bit pixel formats
There is an issue with the constants used in YUV to YUV range conversion,
where the upper bound is not respected when converting to mpeg range.

With this commit, the constants are calculated at runtime, depending on
the bit depth. This approach also allows us to more easily understand how
the constants are derived.

For bit depths <= 14, the number of fixed point bits has been set to 14
for all conversions, to simplify the code.
For bit depths > 14, the number of fixed point bits has been raised and
set to 18, to allow for the conversion to be accurate enough for the mpeg
range to be respected.

The convert functions now take the conversion constants (coeff and offset)
as function arguments.
For bit depths <= 14, coeff is unsigned 16-bit and offset is 32-bit.
For bit depths > 14, coeff is unsigned 32-bit and offset is 64-bit.

x86_64:
chrRangeFromJpeg8_1920_c:    2127.4   2125.0  (1.00x)
chrRangeFromJpeg16_1920_c:   2325.2   2127.2  (1.09x)
chrRangeToJpeg8_1920_c:      3166.9   3168.7  (1.00x)
chrRangeToJpeg16_1920_c:     2152.4   3164.8  (0.68x)
lumRangeFromJpeg8_1920_c:    1263.0   1302.5  (0.97x)
lumRangeFromJpeg16_1920_c:   1080.5   1299.2  (0.83x)
lumRangeToJpeg8_1920_c:      1886.8   2112.2  (0.89x)
lumRangeToJpeg16_1920_c:     1077.0   1906.5  (0.56x)

aarch64 A55:
chrRangeFromJpeg8_1920_c:   28835.2  28835.6  (1.00x)
chrRangeFromJpeg16_1920_c:  28839.8  32680.8  (0.88x)
chrRangeToJpeg8_1920_c:     23074.7  23075.4  (1.00x)
chrRangeToJpeg16_1920_c:    17318.9  24996.0  (0.69x)
lumRangeFromJpeg8_1920_c:   15389.7  15384.5  (1.00x)
lumRangeFromJpeg16_1920_c:  15388.2  17306.7  (0.89x)
lumRangeToJpeg8_1920_c:     19227.8  19226.6  (1.00x)
lumRangeToJpeg16_1920_c:    15387.0  21146.3  (0.73x)

aarch64 A76:
chrRangeFromJpeg8_1920_c:    6324.4   6268.1  (1.01x)
chrRangeFromJpeg16_1920_c:   6339.9  11521.5  (0.55x)
chrRangeToJpeg8_1920_c:      9656.0   9612.8  (1.00x)
chrRangeToJpeg16_1920_c:     6340.4  11651.8  (0.54x)
lumRangeFromJpeg8_1920_c:    4422.0   4420.8  (1.00x)
lumRangeFromJpeg16_1920_c:   4420.9   5762.0  (0.77x)
lumRangeToJpeg8_1920_c:      5949.1   5977.5  (1.00x)
lumRangeToJpeg16_1920_c:     4446.8   5946.2  (0.75x)

NOTE: all simd optimizations for range_convert have been disabled.
      they will be re-enabled when they are fixed for each architecture.

NOTE2: the same issue still exists in rgb2yuv conversions, which is not
       addressed in this commit.
2024-12-05 21:10:29 +01:00
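The runtime derivation of conversion constants described in 384fe39623 can be illustrated with a minimal sketch. This is not the swscale code: the struct and function names are hypothetical, the sketch works on samples at their native bit depth, and it only covers full-range to limited-range luma (16..235 scaled to the bit depth). The real functions operate on swscale's intermediate sample format and may derive their constants differently. Only the choice of 14 or 18 fixed point bits is taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the constants; not a swscale type. */
typedef struct RangeConsts {
    uint64_t coeff;   /* fixed-point scale factor */
    uint64_t offset;  /* fixed-point offset, including the rounding term */
    int      shift;   /* number of fractional bits */
} RangeConsts;

/* Derive constants for full-range -> limited-range luma at a given depth. */
static RangeConsts luma_to_limited_consts(int bit_depth)
{
    assert(bit_depth >= 8 && bit_depth <= 16);

    RangeConsts k;
    uint64_t src_max = (UINT64_C(1) << bit_depth) - 1;    /* 255 at 8 bits */
    uint64_t dst_min = UINT64_C(16)  << (bit_depth - 8);  /* 16 at 8 bits  */
    uint64_t dst_max = UINT64_C(235) << (bit_depth - 8);  /* 235 at 8 bits */

    /* 14 fractional bits up to 14-bit input, 18 above (per the commit text). */
    k.shift  = bit_depth <= 14 ? 14 : 18;
    /* Round the coefficient down so the scaled maximum cannot overshoot. */
    k.coeff  = ((dst_max - dst_min) << k.shift) / src_max;
    /* Offset carries dst_min plus one half for round-to-nearest. */
    k.offset = (dst_min << k.shift) + (UINT64_C(1) << (k.shift - 1));
    return k;
}

/* Apply the conversion to one sample. */
static uint64_t luma_to_limited(uint64_t v, RangeConsts k)
{
    return (v * k.coeff + k.offset) >> k.shift;
}
```

Because the coefficient is rounded down, the largest input can never map above the limited-range maximum, which is the property the commit message says the old constants failed to guarantee.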
Ramiro Polla
536a44e8dc checkasm/sw_range_convert: test negative input values
2024-12-05 21:10:29 +01:00
Zhao Zhili
ea3d21c349 tests/checkasm: Add partial support for wasm
WASI is missing signal and siglongjmp support. This patch works around
the build error and adds the simd128 flag. Please note that many tests use
large arrays on the stack, so you need to increase the stack size when
building checkasm, e.g., --extra-ldflags='-Wl,-z,stack-size=10485760'

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-12-04 16:43:07 +08:00
Niklas Haas
6a91a165fd swscale: eliminate redundant SwsInternal accesses
This is a purely cosmetic commit aimed at replacing accesses to
SwsInternal.opts by direct access to SwsContext wherever convenient.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-11-25 10:59:52 +01:00
Niklas Haas
2d077f9acd swscale/internal: group user-facing options together
This is a preliminary step to separating these into a new struct. This
commit contains no functional changes, it is a pure search-and-replace.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-11-21 12:49:56 +01:00
James Almer
9d8f7bf4b8 tests/checkasm/diracdsp: fix alignment for src and ombc_weight buffers
They are supposed to be 16 byte aligned, not 8.
Should fix crashes in some systems.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-11-19 12:32:49 -03:00
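The 16-byte alignment requirement mentioned in 9d8f7bf4b8 can be shown with a small standalone sketch using plain C11 `alignas`. The helper name `is_aligned` and the buffer are hypothetical; FFmpeg has its own alignment macros, which this sketch deliberately does not try to reproduce.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if p is aligned to `align` bytes (align must be a power of 2). */
static bool is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* A buffer that SIMD code expecting 16-byte aligned loads can safely use. */
static alignas(16) uint8_t example_src[64];
```

A buffer that is only 8-byte aligned may still pass by luck on some systems, which would be consistent with the crashes being reported only in some environments.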
Rémi Denis-Courmont
55aa81d5cc checkasm: add RISC-V vector width to arch info
2024-11-17 11:28:21 +02:00
Kyosuke Kawakami
711290f9a3 checkasm/diracdsp: test add_dirac_obmc
Signed-off-by: Kyosuke Kawakami <kawakami150708@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2024-11-15 13:44:53 -05:00
Ramiro Polla
562524587e checkasm/sw_range_convert: indent after previous couple of commits
2024-10-27 13:20:56 +01:00
Ramiro Polla
031d98790e checkasm/sw_range_convert: test all supported bit depths
This commit also reduces the number of times ff_sws_init_scale() gets
called (only once per bit depth), and the number of times randomize_buffers()
gets called (only if the function must be checked).

Benchmarks are only performed on bit depths 8 and 16 (since they are
different functions, and not only different constants).
2024-10-27 13:20:56 +01:00
Ramiro Polla
2c44393c01 checkasm/sw_range_convert: only run benchmarks on largest input width
2024-10-27 13:20:56 +01:00
Ramiro Polla
e308d09fba checkasm/sw_range_convert: reduce number of input sizes tested
Reduce input sizes to 8 (to test that the function works with widths
smaller than the vector length) and 1920 (raising the largest input
size to improve benchmark results).
2024-10-27 13:20:56 +01:00
Ramiro Polla
d1acd68d73 checkasm/sw_range_convert: use YUV pixel formats instead of YUVJ
We are already setting the range, so we can use regular YUV pixel
formats instead of YUVJ.
2024-10-27 13:20:56 +01:00
Ramiro Polla
a8ef1fac0d checkasm: use FF_ARRAY_ELEMS instead of hardcoding size of arrays
2024-10-27 13:20:56 +01:00
Niklas Haas
67adb30322 swscale: rename SwsContext to SwsInternal
And preserve the public SwsContext as separate name. The motivation here
is that I want to turn SwsContext into a public struct, while keeping the
internal implementation hidden. Additionally, I also want to be able to
use multiple internal implementations, e.g. for GPU devices.

This commit does not include any functional changes. For the most part, it is
a simple rename. The only complications arise from the public facing API
functions, which preserve their current type (and hence require an additional
unwrapping step internally), and the checkasm test framework, which directly
accesses SwsInternal.

For consistency, the affected functions that need to maintain a distinction
have generally been changed to refer to the SwsContext as *sws, and the
SwsInternal as *c.

In an upcoming commit, I will provide a backing definition for the public
SwsContext, and update `sws_internal()` to dereference the internal struct
instead of merely casting it.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-24 22:50:00 +02:00
James Almer
e1d1ba4cbc tests/checkasm/sw_rgb: don't write random data past the end of the buffer
Should fix fate-checkasm-sw_rgb under gcc-ubsan.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
2024-10-17 13:08:39 +02:00
Martin Storsjö
6668268e16 checkasm: lls: Use relative tolerances rather than absolute ones
Depending on the magnitude of the output values, the potential
errors can be larger.

This fixes errors in the lls tests on x86_32 for some seeds,
observed with GCC 11 (on Ubuntu 22.04, with the distro compiler,
with -m32).

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-10-09 15:52:56 +03:00
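The idea behind 6668268e16, scaling the allowed error with the magnitude of the values, could be sketched as follows. The function name, parameters and the small absolute floor are assumptions made for illustration; this is not checkasm's float comparison helper.

```c
#include <stdbool.h>

static double abs_d(double x)           { return x < 0 ? -x : x; }
static double max_d(double a, double b) { return a > b ? a : b; }

/*
 * Compare a reference value against a test value using a tolerance that
 * grows with the magnitude of the operands. abs_floor keeps the check
 * meaningful when both values are at or near zero.
 */
static bool close_enough(double ref, double out, double rel_eps, double abs_floor)
{
    double scale = max_d(abs_d(ref), abs_d(out));
    return abs_d(ref - out) <= max_d(rel_eps * scale, abs_floor);
}
```

With a relative epsilon of 1e-6, an error of 0.5 is accepted for values around 1e6 but rejected for values around 1, whereas a fixed absolute tolerance would have to be either too loose for small values or too strict for large ones.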
Martin Storsjö
c65a294f79 checkasm: Print the SVE vector length at startup
Signed-off-by: Martin Storsjö <martin@martin.st>
2024-09-27 00:06:55 +03:00
Martin Storsjö
e6eabb7ce7 aarch64: Add CPU feature flags for SVE and SVE2
Add code for detecting the feature on Linux and Windows.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-09-27 00:04:30 +03:00
Martin Storsjö
157ce21939 checkasm/sw_rgb: Revert test additions from e18b46d95f
The unaligned width test cases fail on i386; we have an assembly
function of rgb24toyv12 which is enabled only within
"#if ARCH_X86_32 && HAVE_7REGS", which seems to fail these new
test cases for unaligned widths.

As that assembly function has existed for a long time in that form,
the issue probably isn't very recent, thus skip testing these cases
for now.

Once the assembly function has been fixed, these test cases can
be readded.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-09-26 13:16:56 +03:00
Zhao Zhili
e18b46d95f swscale/aarch64: Fix rgb24toyv12 only works with aligned width
Since c0666d8b, rgb24toyv12 is broken for width non-aligned to 16.
Add a simple wrapper to handle the non-aligned part.

Co-authored-by: johzzy <hellojinqiang@gmail.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-09-24 10:24:14 +08:00
Ramiro Polla
e0cc06184c checkasm/sw_rgb: add rgb24toyv12 tests
2024-09-06 23:06:35 +02:00
Ramiro Polla
c08bb33e41 checkasm/sw_rgb: add deinterleaveBytes
2024-09-06 23:05:06 +02:00
James Almer
2a6f84718b fate/checkasm/sw_gbrp: don't randomly set internal values
They are set by sws_init_context().
May help with signed integer overflows reported by gcc-usan.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-09-05 22:19:47 -03:00
Rémi Denis-Courmont
d9f594209f checkasm/riscv: print official extension names
2024-09-04 22:04:11 +03:00
Anton Khirnov
3f9ca51015 lavc/opus*: move to opus/ subdir
2024-09-02 11:56:53 +02:00
Ramiro Polla
6aafe61285 avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t
2024-09-01 13:42:30 +02:00
Nuo Mi
7175544c0b checkasm: add vvc_bdof test
apply_bdof_8_8x16_c: 5776.5
apply_bdof_8_8x16_avx2: 396.2
apply_bdof_8_16x8_c: 5722.0
apply_bdof_8_16x8_avx2: 216.0
apply_bdof_8_16x16_c: 11213.2
apply_bdof_8_16x16_avx2: 434.5
apply_bdof_10_8x16_c: 5657.7
apply_bdof_10_8x16_avx2: 1096.0
apply_bdof_10_16x8_c: 5531.7
apply_bdof_10_16x8_avx2: 212.5
apply_bdof_10_16x16_c: 11043.7
apply_bdof_10_16x16_avx2: 1252.7
apply_bdof_12_8x16_c: 5680.0
apply_bdof_12_8x16_avx2: 1096.5
apply_bdof_12_16x8_c: 5646.2
apply_bdof_12_16x8_avx2: 624.5
apply_bdof_12_16x16_c: 11076.0
apply_bdof_12_16x16_avx2: 1241.5
2024-08-31 14:08:54 +08:00
J. Dekker
e758b24396 checkasm: add wildcompares for test & functions
Added:

  --test=<pattern>    Filter tests by glob style pattern.
  --bench[=<pattern>] Run benchmark and optionally filter functions
                      by glob style pattern.

Example:

$ ./tests/checkasm/checkasm --bench=yuva*
[...]
yuva420p_bgr24_8_c:                                     34.5 ( 1.00x)
yuva420p_bgr24_8_ssse3:                                 31.1 ( 1.11x)
yuva420p_bgr24_128_c:                                  310.6 ( 1.00x)
yuva420p_bgr24_128_ssse3:                              178.1 ( 1.74x)
yuva420p_bgr24_1080_c:                                2509.6 ( 1.00x)
yuva420p_bgr24_1080_ssse3:                            1471.5 ( 1.71x)
yuva420p_bgr24_1920_c:                                4462.6 ( 1.00x)
yuva420p_bgr24_1920_ssse3:                            2331.1 ( 1.91x)
[...]

Ported from dav1d.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
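The glob-style filtering added in e758b24396 can be approximated with a short recursive matcher. This sketch is not the checkasm implementation: the name `glob_match` is hypothetical, and support for exactly `*` (any run of characters) and `?` (any single character) is an assumption about what "glob style" covers.

```c
#include <stdbool.h>

/* Match str against pat, where '*' matches any run of characters
 * (including none) and '?' matches exactly one character. */
static bool glob_match(const char *pat, const char *str)
{
    for (; *pat; pat++, str++) {
        if (*pat == '*') {
            /* Collapse consecutive stars, then try every possible suffix. */
            while (*pat == '*')
                pat++;
            if (!*pat)
                return true;            /* trailing '*' matches the rest */
            for (; *str; str++)
                if (glob_match(pat, str))
                    return true;
            return false;
        }
        if (!*str || (*pat != '?' && *pat != *str))
            return false;
    }
    return !*str;                       /* both must end together */
}
```

Under these assumptions, the pattern `yuva*` from the example above would select `yuva420p_bgr24_8_c` and `yuva420p_bgr24_8_ssse3` but not a function whose name starts with `yuv420p`.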
J. Dekker
d0986709a8 checkasm: improve print format
Port dav1d's checkasm output format to FFmpeg's checkasm, includes
relative speedups and aligns results.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
J. Dekker
03f26549cd checkasm: print only results to stdout
Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
J. Dekker
42528ff835 checkasm: add csv/tsv bench output
When collecting performance information from checkasm it is common
to parse the output for use in graphs to compare vs different
architectures.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
Ramiro Polla
834964ce1a checkasm/mpegvideoencdsp: add pix_sum, pix_norm1, and draw_edges
2024-08-26 12:48:09 +02:00
Ramiro Polla
a2e01cade8 checkasm/yuv2yuv: add tests for semiplanar unscaled converters
2024-08-26 11:04:46 +02:00
Ramiro Polla
4545205a26 swscale/yuv2rgb: add yuv42{0,2}p -> gbrp unscaled colorspace converters
2024-08-18 22:26:11 +02:00
Nuo Mi
7eb1df44ae checkasm: add tests for vvc dmvr
dmvr_8_12x20_c: 186.2
dmvr_8_12x20_avx2: 25.7
dmvr_8_20x12_c: 181.7
dmvr_8_20x12_avx2: 25.2
dmvr_8_20x20_c: 283.2
dmvr_8_20x20_avx2: 32.0
dmvr_10_12x20_c: 90.0
dmvr_10_12x20_avx2: 15.7
dmvr_10_20x12_c: 41.0
dmvr_10_20x12_avx2: 14.7
dmvr_10_20x20_c: 81.5
dmvr_10_20x20_avx2: 26.7
dmvr_12_12x20_c: 190.7
dmvr_12_12x20_avx2: 20.2
dmvr_12_20x12_c: 187.2
dmvr_12_20x12_avx2: 20.2
dmvr_12_20x20_c: 292.7
dmvr_12_20x20_avx2: 27.2
dmvr_h_8_12x20_c: 317.0
dmvr_h_8_12x20_avx2: 37.0
dmvr_h_8_20x12_c: 340.0
dmvr_h_8_20x12_avx2: 41.0
dmvr_h_8_20x20_c: 540.7
dmvr_h_8_20x20_avx2: 64.0
dmvr_h_10_12x20_c: 322.7
dmvr_h_10_12x20_avx2: 30.7
dmvr_h_10_20x12_c: 344.2
dmvr_h_10_20x12_avx2: 34.0
dmvr_h_10_20x20_c: 529.0
dmvr_h_10_20x20_avx2: 51.5
dmvr_h_12_12x20_c: 326.7
dmvr_h_12_12x20_avx2: 33.5
dmvr_h_12_20x12_c: 331.7
dmvr_h_12_20x12_avx2: 51.2
dmvr_h_12_20x20_c: 534.0
dmvr_h_12_20x20_avx2: 62.7
dmvr_hv_8_12x20_c: 650.0
dmvr_hv_8_12x20_avx2: 57.2
dmvr_hv_8_20x12_c: 676.2
dmvr_hv_8_20x12_avx2: 70.0
dmvr_hv_8_20x20_c: 1068.5
dmvr_hv_8_20x20_avx2: 103.2
dmvr_hv_10_12x20_c: 649.0
dmvr_hv_10_12x20_avx2: 48.2
dmvr_hv_10_20x12_c: 677.7
dmvr_hv_10_20x12_avx2: 59.7
dmvr_hv_10_20x20_c: 1093.5
dmvr_hv_10_20x20_avx2: 91.7
dmvr_hv_12_12x20_c: 660.0
dmvr_hv_12_12x20_avx2: 58.7
dmvr_hv_12_20x12_c: 682.7
dmvr_hv_12_20x12_avx2: 72.0
dmvr_hv_12_20x20_c: 1094.0
dmvr_hv_12_20x20_avx2: 113.2
dmvr_v_8_12x20_c: 325.7
dmvr_v_8_12x20_avx2: 31.2
dmvr_v_8_20x12_c: 326.2
dmvr_v_8_20x12_avx2: 38.5
dmvr_v_8_20x20_c: 538.5
dmvr_v_8_20x20_avx2: 54.2
dmvr_v_10_12x20_c: 318.5
dmvr_v_10_12x20_avx2: 23.7
dmvr_v_10_20x12_c: 330.7
dmvr_v_10_20x12_avx2: 40.5
dmvr_v_10_20x20_c: 567.5
dmvr_v_10_20x20_avx2: 48.0
dmvr_v_12_12x20_c: 335.2
dmvr_v_12_12x20_avx2: 30.0
dmvr_v_12_20x12_c: 330.2
dmvr_v_12_20x12_avx2: 39.5
dmvr_v_12_20x20_c: 535.2
dmvr_v_12_20x20_avx2: 60.0
2024-08-15 20:19:45 +08:00
Rémi Denis-Courmont
d1326b6347 lavu/riscv: drop probing for zba CPU capability
2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont
1b2a925e94 lavc/riscv: drop probing for F & D extensions
F and D extensions are included in all RISC-V application profiles ever
made (so starting from RV64GC a.k.a. RVA20). Realistically they need to be
selected at compilation time.

Currently, there are no consumers for these two flags. If there is ever a
need to reintroduce F- or D-specific optimisations, we can always use
__riscv_f or __riscv_d compiler predefined macros respectively.
2024-08-01 22:56:50 +03:00
Rémi Denis-Courmont
656a9664bf checkasm/riscv: preserve T1 whilst calling...
This preserves T1 whilst calling the instrumented function. In a Sci-Fi
setting where type-based Control Flow Integrity (CFI) is supported, the
calling code (i.e., the `checkasm` test case) will set T1 to the expected
value of the landing pad label (LPL) of the instrumented function.

The call wrapper will always use LPL zero which is a wild card. We should
preserve the value of T1 at least until the indirect call to the
instrumented function. Of course this is Sci-Fi, because:
1) there is no hardware (or even QEMU) support yet,
2) all our assembler functions currently use LPL zero anyway.

This uses T3 rather than T2 because indirect branches with T2 are reserved
for notionally direct calls made with an indirect call instruction (e.g.
due to GOT indirection), and are exempted from forward-edge CFI checks.
2024-08-01 18:44:01 +03:00
Rémi Denis-Courmont
8030876d1c checkasm/riscv: align the landing pads
2024-07-25 23:10:14 +03:00