Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New implementation of HeightTree #371

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

New implementation of HeightTree #371

wants to merge 1 commit into from

Conversation

dgrunwald
Copy link
Member

The HeightTree is the data structure that maps between document line numbers and visual Y positions. It handles both the varying heights of lines (e.g. due to word-wrapping) and the collapsing of lines (e.g. due to folding).

Where the old implementation was based on a red-black tree, the new implementation is inspired by B+ trees.
We no longer need a separate memory allocation for every line in the document. This should dramatically reduce the memory consumption: a comment in the old code indicated that each line took 56 bytes of memory.
But that was in 32-bit days; in a 64-bit process the old implementation actually needed 88 bytes per line.

The new implementation should need 168 bytes per leaf node which holds 8-16 lines, i.e. we need 10.5-21 bytes per line for the leaf nodes.
It additionally needs 456 bytes per inner node (for the inner node and its children array). The level just above the leafs holds 64-256 lines per inner node, i.e. we need 1.8-7.2 bytes per line for the second level.
Thus the total memory consumption of the new tree should in the worst case still be below 29.2 bytes per line (168/8 + sum(456/(8**lvl) for lvl in range(2,100))).
However after freshly loading a large document from disk we should be close to the best case which in total is just 12.4 bytes per line (168/16 + sum(456/(16**lvl) for lvl in range(2,100))).

Apart from the reduced memory usage, the behavior of the new tree should be identical to the old behavior.
In particular, all operations (line insertion, line deletion, collapsing, uncollapsing, line<->Y queries) maintain their externally-visible behavior and their O(lg N) run-time.
set_DefaultLineHeight runs in O(N) now (previously could take O(N lg N)).

The HeightTree is the data structure that maps between document line numbers and visual Y positions. It handles both the varying heights of lines (e.g. due to word-wrapping) and the collapsing of lines (e.g. due to folding).

Where the old implementation was based on a red-black tree, the new implementation is inspired by B+ trees.
We no longer need a separate memory allocation for every line in the document. This should dramatically reduce the memory consumption: a comment in the old code indicated that each line took 56 bytes of memory.
But that was in 32-bit days; in a 64-bit process the old implementation actually needed 88 bytes per line.

The new implementation should need 168 bytes per leaf node which holds 8-16 lines, i.e. we need 10.5-21 bytes per line for the leaf nodes.
It additionally needs 456 bytes per inner node (for the inner node and its children array). The level just above the leafs holds 64-256 lines per inner node, i.e. we need 1.8-7.2 bytes per line for the second level.
Thus the total memory consumption of the new tree should in the worst case still be below 29.2 bytes per line (`168/8 + sum(456/(8**lvl) for lvl in range(2,100))`).
However after freshly loading a large document from disk we should be close to the best case which in total is just 12.4 bytes per line (`168/16 + sum(456/(16**lvl) for lvl in range(2,100))`).

Apart from the reduced memory usage, the behavior of the new tree should be identical to the old behavior.
In particular, all operations (line insertion, line deletion, collapsing, uncollapsing, line<->Y queries) maintain their externally-visible behavior and their O(lg N) run-time.
set_DefaultLineHeight runs in O(N) now (previously could take O(N lg N)).
@sopgenorth
Copy link

I saw the comment in mainline about changing the implementation, and was curious what an improvement might look like, particularly from my other interests in trying to add some sort of lazy file loading mechanism (just a dream right now, see: #382 )

On a 750MB test file this change reduces RAM gives a healthy reduction from ~7.5GB to ~5GB.

Nice work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants