Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unit tests for CLUSTER NODES parsing #112

Merged
merged 3 commits into from
Oct 14, 2024

Conversation

bjosv
Copy link
Collaborator

@bjosv bjosv commented Oct 9, 2024

Includes OOM testing of replica parsing and fixes for memory issues when parsing fails.

Fixes #33

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
Fix memory issues when the parsing of replicas fails.

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
Copy link
Collaborator

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -1042,8 +1041,19 @@ static dict *parse_cluster_nodes(valkeyClusterContext *cc, valkeyReply *reply) {
error:
sdsfreesplitres(part, count_part);
sdsfreesplitres(slot_start_end, count_slot_start_end);
if (nodes_name != NULL) {
/* Only free parsed replicas since the `nodes` dict owns primary nodes. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little odd and maybe fragile.

What about the corner case primaries without any slots? Does the nodes dict own them too?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit odd.
The nodes dict owns all valkeyClusterNodes after the parsing is complete, these are nodes with-or-without slots. This dict holds key=address, value=primary node, and each primary owns a list of replicas.
Deleting the nodes dict will delete all its objects, primaries and their known replicas, when there is an error while parsing.

But then there is the temporary nodes_name dict which holds references to all currently parsed cluster nodes.
The found issue is that this dict is the actual owner of an replica node, that has a primary, which is not yet parsed.
So, these are the replica nodes that needs to be deleted while the parsing fails.

I have a coming PR where the cluster nodes parsing is refactored.

@@ -170,14 +171,14 @@ void test_alloc_failure_handling(void) {

// Connect
{
for (int i = 0; i < 128; ++i) {
for (int i = 0; i < 148; ++i) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the stack-allocated iterator uses fewer allocations?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the config to also parse replicas in this testcase, This required a couple of more allocations to succeed.

@bjosv bjosv merged commit a97a74e into valkey-io:main Oct 14, 2024
43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve tests of parsing CLUSTER NODES response
2 participants