
enable home raft log store UT #515

Merged

Conversation

@JacksonYao287 (Contributor) commented Aug 20, 2024

1. Enable the home_raft_log_store UT and add it to the Dockerfile and nightly runs.
2. Remove the Android directory in the GitHub CI build-cache stage (previously done in create-and-test-package), so the space in the GitHub CI VM is definitely freed for every pipeline, since one of the CI pipelines does not have a create-and-test-package stage.
3. Increase the disk size to 2GB for the UT.
4. Fix a log dev flush bug: add a loop that creates multiple logGroups to make sure all the expected logs are flushed.
5. Fix a bug in home_raft_log_store#pack to make sure the available size of the packing buffer can hold entry.size() plus the length field of the entry.
6. Use truncation instead of flush in home_raft_log_store#compact. This is only an in-memory change; the real truncation will be scheduled by the resource manager. Also created issue #530 to handle the start_index case when recovering from a crash; we will do that in a separate PR.
7. Add a commit_config log so we can see when a config change is made.
8. Fix a bug in handle_raft_event: we need to identify whether a log entry should be appended to the log store. If req#lsn is not -1, this log has already been localized and appended before, so we should skip it.
9. Add a force_leave API for now to handle the case where a follower and the leader have destroyed the raft group, but the second follower fails to receive this message and gets stuck. This is used to fix the raft_repl_dev UT; we can revisit this later if necessary.
10. Fix a raft_repl_dev UT bug when setting last_committed_idx in write_snapshot_data on the follower side: we should get the committed_index from the raft_server itself. In our UT, commit_config is not taken into account when increasing commit_count.
11. Fix a raft_repl_dev UT bug in read_snapshot on the leader side. When we search for the next start lsn (the start of the data to send), we cannot use std::map::find: if the requested next_lsn is a config change, it is never put into the kvDB (lsn_index_), so std::map::find returns std::end() and nothing is sent to the follower. Instead, we should use std::map::lower_bound, so that we get the first data KV to send and config entries are skipped.

@codecov-commenter commented Aug 20, 2024

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 52.00000% with 12 lines in your changes missing coverage. Please review.

Project coverage is 68.21%. Comparing base (1a0cef8) to head (65f7777).
Report is 55 commits behind head on master.

Files with missing lines Patch % Lines
src/lib/logstore/log_dev.cpp 50.00% 6 Missing and 2 partials ⚠️
src/include/homestore/replication/repl_dev.h 0.00% 2 Missing ⚠️
src/lib/replication/repl_dev/raft_repl_dev.cpp 0.00% 0 Missing and 1 partial ⚠️
src/lib/replication/repl_dev/raft_repl_dev.h 0.00% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##           master     #515       +/-   ##
===========================================
+ Coverage   56.51%   68.21%   +11.70%     
===========================================
  Files         108      109        +1     
  Lines       10300    10433      +133     
  Branches     1402     1400        -2     
===========================================
+ Hits         5821     7117     +1296     
+ Misses       3894     2638     -1256     
- Partials      585      678       +93     


}
#endif

m_log_store->truncate(to_store_lsn(compact_lsn));
JacksonYao287 (Contributor, Author) commented:

In compact here we need to truncate, which updates the start_index; flush does not update the start_index.

Contributor commented:
This is done purposefully. We let resource manager do truncation @yamingk can confirm this.

JacksonYao287 (Contributor, Author) replied:
Then the start_index will not be updated on time, which will likely look wrong to the upper layer.
From the upper layer's view, if it compacts to lsn n, the start_index it sees after compact should be n + 1.
So I believe we should truncate here, and not depend on the resource manager.

Contributor replied:
This needs to be discussed with @yamingk

@@ -165,6 +165,7 @@ jobs:
- name: Build Cache
run: |
pre=$([[ "${{ inputs.build-type }}" != "Debug" ]] && echo "-o sisl:prerelease=${{ inputs.prerelease }}" || echo "")
sudo rm -rf $ANDROID_HOME
JacksonYao287 (Contributor, Author) commented Aug 21, 2024:
In the failing CI, Create and Test Package is not scheduled, so `sudo rm -rf /usr/local/lib/android/` is not executed in that CI, which can cause a no-space-left issue.

Here I add the removal to the Build Cache phase, which is definitely scheduled in every CI pipeline.

@sanebay

xiaoxichen (Collaborator) previously approved these changes Aug 21, 2024:

lgtm. 3 issues are solved:

  1. Clean up the Android home so we can enlarge the disk size to 2GB.
  2. Add a for loop for flush; this makes sense based on @JacksonYao287's explanation, and it is great that we caught this in the re-enabled UT.
  3. For the compact() call, we do compact and move the flush into compact, to avoid flushing logs before the start_index.

@@ -264,8 +264,8 @@ raft_buf_ptr_t HomeRaftLogStore::pack(ulong index, int32_t cnt) {
[this, &out_buf, &remain_cnt]([[maybe_unused]] store_lsn_t cur, const log_buffer& entry) mutable -> bool {
if (remain_cnt-- > 0) {
size_t avail_size = out_buf->size() - out_buf->pos();
if (avail_size < entry.size()) {
avail_size += std::max(out_buf->size() * 2, (size_t)entry.size());
if (avail_size < entry.size() + sizeof(uint32_t)) {
JacksonYao287 (Contributor, Author) commented:
avail_size should be able to hold entry.size() plus the length field of this entry, which is a uint32_t.

src/lib/logstore/log_dev.cpp (conversation resolved)
@@ -177,6 +177,7 @@ void HomeLogStore::on_log_found(logstore_seq_num_t seq_num, const logdev_key& ld

void HomeLogStore::truncate(logstore_seq_num_t upto_lsn, bool in_memory_truncate_only) {
if (upto_lsn < m_start_lsn) { return; }
flush();
Contributor commented:
why do we need a flush here ?

JacksonYao287 (Contributor, Author) replied Aug 23, 2024:
If the caller does not explicitly flush before truncating, m_tail_lsn will not be updated, and truncation might not reach the expected lsn.
The flush here makes sure we flush before truncating; if a flush was already done before the truncate, this flush is a no-op and just returns.


@@ -878,14 +878,6 @@ TEST_F(RaftReplDevTest, BaselineTest) {
LOGINFO("Homestore replica={} setup completed", g_helper->replica_num());
g_helper->sync_for_test_start();

Contributor commented:

We need this as we are not truncating when raft calls compact

JacksonYao287 (Contributor, Author) replied:

Let's first make a decision on this topic:
https://github.com/eBay/HomeStore/pull/515/files/c3e0ab3f3210c42a63b6b5f6a97386115f1126a9#r1724163097

Then we can revisit this. If we need to do a real truncate, then this can be removed.

@JacksonYao287 force-pushed the enable-home-raft-log-store-UT branch 3 times, most recently from a8f5fcb to 1f85dd3 on August 26, 2024 07:57
@JacksonYao287 self-assigned this Aug 27, 2024
@JacksonYao287 force-pushed the enable-home-raft-log-store-UT branch 3 times, most recently from 65453a2 to 6b10cae on August 28, 2024 09:40
@JacksonYao287 force-pushed the enable-home-raft-log-store-UT branch 2 times, most recently from 97dfa30 to 65f7777 on August 29, 2024 08:08
@@ -1050,7 +1050,13 @@ std::pair< bool, nuraft::cb_func::ReturnCode > RaftReplDev::handle_raft_event(nu
if (entry->get_val_type() != nuraft::log_val_type::app_log) { continue; }
if (entry->get_buf_ptr()->size() == 0) { continue; }
auto req = m_state_machine->localize_journal_entry_prepare(*entry);
if (req == nullptr) {
// TODO :: we need to indentify whether this log entry should be appended to log store.
Collaborator commented:
Is #1 safe here? As we don't have the term in the rreq, how can we ensure this is not a rewritten entry?

JacksonYao287 (Contributor, Author) replied:

We do have the term in rreq: every rreq is identified by {originator, term, dsn}.

@@ -264,8 +264,8 @@ raft_buf_ptr_t HomeRaftLogStore::pack(ulong index, int32_t cnt) {
[this, &out_buf, &remain_cnt]([[maybe_unused]] store_lsn_t cur, const log_buffer& entry) mutable -> bool {
if (remain_cnt-- > 0) {
size_t avail_size = out_buf->size() - out_buf->pos();
if (avail_size < entry.size()) {
avail_size += std::max(out_buf->size() * 2, (size_t)entry.size());
if (avail_size < entry.size() + sizeof(uint32_t)) {
Contributor commented:
Can you convert your comment into a code comment as well?

JacksonYao287 (Contributor, Author) replied:
Ack, will do it in a later PR.

// destroyed for ever. we need handle this in raft_repl_dev. revisit here after making changes at
// raft_repl_dev side to hanle this case. this is a workaround to avoid the infinite loop for now.
if (i++ > 10 && !force_leave) {
LOGWARN("Waiting for repl dev to get destroyed and it is leader, so do a force leave");
Collaborator commented:
The log is not accurate, as this will run on every replica, not only the leader.

JacksonYao287 (Contributor, Author) replied:
Yes, it's a mistake; it can run on any replica. Will change this in a later PR.

@JacksonYao287 merged commit fc1e7a2 into eBay:master Aug 30, 2024
43 of 44 checks passed
@JacksonYao287 deleted the enable-home-raft-log-store-UT branch August 30, 2024 01:01
Development

Successfully merging this pull request may close these issues.

fix raft_repl_dev UT enable home raft log store UT and fix failure
5 participants