
Block splitter #4136

Open · wants to merge 26 commits into `dev`
Conversation

@Cyan4973 (Contributor) commented Sep 3, 2024

Instead of ingesting full blocks only (128 KB),
make an a priori analysis of the data,
and infer a position to split the block at a more appropriate boundary.

This can notably happen in an archive scenario, at the boundary between 2 files of different nature within the archive.

This leads to some non-trivial compression gains, for an acceptable speed cost.
The benefit is higher when there isn't already a post-splitter (as in the btopt levels and above (16+)),
but even when a post-splitter is active, there is still a small compression ratio benefit, making this strategy desirable even for higher compression modes.

However, this input analysis is not free. Therefore, it's currently reserved for higher compression strategies (currently btlazy2 and above), where the speed cost is considered negligible (< 5%).
For other modes, the analysis is skipped and replaced by a static split size, since the block size is no longer limited to 128 KB. Through tests, it appears that a static 92 KB block size brings a higher compression ratio, at a small-to-negligible compression speed loss (mostly due to the increased number of blocks, hence of block headers).
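The analysis described above can be sketched roughly as follows: maintain a byte-value fingerprint (histogram) of the data seen so far, compare it against the fingerprint of each incoming chunk, and propose a split where the two diverge. This is only an illustrative sketch under assumed names and constants (`CHUNK_SIZE`, the L1 distance metric, the threshold); the PR's actual heuristic lives in `zstd_preSplit.c` and may differ.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 8192   /* scan granularity (illustrative) */
#define NB_EVENTS  256    /* one bucket per byte value */
#define SCALE      1024   /* fixed-point scale for normalized histograms */

typedef struct { unsigned events[NB_EVENTS]; unsigned total; } Fingerprint;

static void addToFingerprint(Fingerprint* fp, const unsigned char* p, size_t size)
{
    size_t n;
    for (n = 0; n < size; n++) fp->events[p[n]]++;
    fp->total += (unsigned)size;
}

/* L1 distance between two normalized byte histograms, in [0, 2*SCALE] */
static unsigned long long fpDistance(const Fingerprint* a, const Fingerprint* b)
{
    unsigned long long dist = 0;
    int i;
    for (i = 0; i < NB_EVENTS; i++) {
        long long na = (long long)a->events[i] * SCALE / (a->total ? a->total : 1);
        long long nb = (long long)b->events[i] * SCALE / (b->total ? b->total : 1);
        dist += (unsigned long long)(na > nb ? na - nb : nb - na);
    }
    return dist;
}

/* Scan @src chunk by chunk; return a proposed split point (in bytes),
 * or @srcSize when no boundary stands out. Threshold is an assumption. */
static size_t proposeSplit(const unsigned char* src, size_t srcSize)
{
    Fingerprint past, recent;
    size_t pos = 0;
    memset(&past, 0, sizeof(past));
    while (pos + CHUNK_SIZE <= srcSize) {
        memset(&recent, 0, sizeof(recent));
        addToFingerprint(&recent, src + pos, CHUNK_SIZE);
        if (past.total && fpDistance(&past, &recent) > 400 /* illustrative */)
            return pos;   /* statistics changed: split here */
        addToFingerprint(&past, src + pos, CHUNK_SIZE);
        pos += CHUNK_SIZE;
    }
    return srcSize;
}
```

On an archive whose member files have different byte statistics, such a scan reports a split at the file boundary, which is exactly the scenario the PR description mentions.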

Here are some benchmarks, focusing on compression savings:

silesia.tar :

| level | dev | this PR | savings | note |
|------:|---------:|---------:|--------:|------|
| 1 | 73422432 | 73324476 | -97956 | static block size |
| 12 | 58213308 | 58149669 | -63639 | last static block size |
| 13 | 57994603 | 57720597 | -274006 | first dynamic block size |
| 15 | 57176243 | 56905822 | -270421 | last dynamic block size alone |
| 16 | 55338671 | 55256977 | -81694 | combine pre+post splitting |
| 19 | 52887208 | 52842629 | -44579 | idem |
| 22 | 52322760 | 52284324 | -38436 | idem |

calgary.tar :

| level | dev | this PR | savings | note |
|------:|--------:|--------:|-------:|------|
| 1 | 1143607 | 1129229 | -14378 | new static block size |
| 12 | 930831 | 924326 | -6505 | last static block size |
| 13 | 922061 | 914326 | -7735 | first dynamic size |
| 15 | 921606 | 912055 | -9951 | last dynamic size alone |
| 16 | 882353 | 881217 | -1136 | combine pre+post splitting |
| 19 | 861336 | 859737 | -1599 | idem |
| 22 | 861200 | 859578 | -1622 | idem |

Follow-up:

  • Consider multiple variants, with incremental speed / accuracy trade-offs
  • Make the choice of splitting strategy selectable via compression parameter

@Cyan4973 (Contributor, Author) commented Sep 4, 2024

mmmh,
this could be a problem:

```
test 45 : in-place decompression : Error : test (margin <= ZSTD_DECOMPRESSION_MARGIN(CNBuffSize, ZSTD_BLOCKSIZE_MAX)) failed
```

I suspect the problem is that the macro ZSTD_DECOMPRESSION_MARGIN() may assume that block sizes are always full (128 KB), and derive from that an assumption about maximum expansion, which is no longer respected when block sizes default to 92 KB (lower compression levels). In contrast, ZSTD_decompressionMargin() scans the compressed data, so it probably ends up with a different result, and now the two results differ.

I'm not entirely sure what the intended behavior is here...

edit: confirmed that, when I change the default block size to anything other than 128 KB, this test breaks.
On the other hand, the ZSTD_DECOMPRESSION_MARGIN() macro requires a blockSize parameter, and the test passes ZSTD_BLOCKSIZE_MAX, aka 128 KB, so it's pretty clear what it's expecting.
So the question is: how could this test pass "reasonably", i.e. without inner knowledge of how the internal block splitting decision is made?
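For context, the macro's estimate necessarily depends on the block size it is told to assume. The following is a paraphrase of the general shape of such a margin formula (frame header + checksum + a few header bytes per block + one block of slack), not the actual macro from zstd.h; the constants are assumptions. It shows why an estimate computed with ZSTD_BLOCKSIZE_MAX can disagree with one derived from the real 92 KB blocks.

```c
#include <stddef.h>

/* Illustrative approximation of an in-place decompression margin.
 * The real ZSTD_DECOMPRESSION_MARGIN() macro follows a similar shape,
 * but the exact constants below (frame header bound, per-block header
 * bytes) are assumptions for demonstration only. */
static size_t marginEstimate(size_t originalSize, size_t blockSize)
{
    size_t const frameHeaderMax = 18;  /* assumed frame header upper bound */
    size_t const checksum = 4;
    size_t const nbBlocks =
        originalSize ? (originalSize + blockSize - 1) / blockSize : 0;
    /* a few bytes of header per block, plus one block of slack */
    return frameHeaderMax + checksum + 3 * nbBlocks + blockSize;
}
```

Evaluating this with a 92 KB block size versus a 128 KB one yields different margins for the same input size: the per-block overhead term grows with the number of blocks while the one-block slack term shrinks, so a scan-based margin over real 92 KB blocks need not stay under the macro's 128 KB-based bound.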

Commits:
  • instead of ingesting only full blocks, make an analysis of data, and infer where to split.
  • for better portability on Linux kernel (though I really wonder if this is a property worth maintaining).
@Cyan4973 (Contributor, Author) commented Oct 17, 2024

Who knew adding a single source file (zstd_preSplit.c) would be such a big problem...

Currently blocked trying to get the single-file library builder to work: it doesn't include the new file, resulting in a link-stage error.

And then each and every build system also requires updating its own list of files, in its own format and location.

@Cyan4973 (Contributor, Author) commented Oct 17, 2024

Weird stuff:

```
error: undefined reference to '__mulodi4'
```

It only happens during compilation of the clang-asan-ubsan-fuzz32 test, i.e. the undefined-behavior sanitizer enabled for 32-bit compilation with clang (the same test with gcc compiles fine).

The failure seems to correspond to places where `*` multiplication operations on S64 (aka `long long`) variables happen.

And of course, it happens every time on GitHub CI, but not on any other system where I can test the same code and build rule.
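For reference, `__mulodi4` is a compiler-rt builtin that clang's UBSan emits for overflow-checked 64-bit signed multiplication on 32-bit targets; it is not provided by libgcc, which is why linking can fail in that one configuration. A minimal reproducer might look like the following (the build flags in the comment are the usual suspects for this failure, not taken from the PR's build rules):

```c
/* mulodi4_repro.c : minimal reproducer (illustrative).
 * Compiling with:  clang -m32 -fsanitize=undefined mulodi4_repro.c
 * can fail at link time with "undefined reference to '__mulodi4'",
 * because the overflow check for the 64-bit multiply is routed to a
 * compiler-rt builtin that the default runtime (libgcc) doesn't provide. */
#include <stdint.h>

int64_t scaledProduct(int64_t a, int64_t b)
{
    return a * b;   /* UBSan instruments this signed 64-bit multiply */
}
```

Common workarounds are linking compiler-rt explicitly (`--rtlib=compiler-rt`) or restructuring the arithmetic so the instrumented 64x64 signed multiply disappears.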

@Cyan4973 Cyan4973 marked this pull request as ready for review October 20, 2024 23:52
Commit:
  • let's fill the initial stats directly into target fingerprint
@Cyan4973 (Contributor, Author) commented:
All tests passed, ready for review

@Cyan4973 Cyan4973 changed the title Experiment : pre-splitter Block splitter Oct 21, 2024