PXB-3113 : Improve debug sync framework to allow PXB to pause and res… #1548

Closed
wants to merge 1 commit into from

@satya-bodapati satya-bodapati commented Apr 3, 2024

…ume threads

https://perconadev.atlassian.net/browse/PXB-3113

The current debug-sync option in PXB suspends the entire PXB process; the user resumes it by sending a SIGCONT signal.
This is useful for scenarios where PXB is paused, certain operations are performed on the server, and PXB is then resumed to complete.

But many of the bugs we found during testing involve multiple threads in PXB. The goal of this work is to be able to
pause and resume an individual thread.

Since many tests use the existing debug-sync option, I don't want to disturb them. We can convert them to
the new mechanism later.

How to use?
-----------
The new mechanism is enabled with the option --debug-sync-thread="sync_point_name"

In the code, place a debug_sync_thread("debug_sync_point1") call to stop the thread at that location.

You can then pass the same debug_sync point name on the command line: --debug-sync-thread="debug_sync_point1"

PXB creates a file named after the debug_sync point in the backup directory, suffixed with a thread number.
Please ensure that no two debug_sync points use the same name (it doesn't make sense to have two sync points with the same name).
```
2024-03-28T15:58:23.310386-00:00 0 [Note] [MY-011825] [Xtrabackup] DEBUG_SYNC_THREAD: sleeping 1sec.  Resume this thread by deleting file /home/satya/WORK/pxb/bld/backup//xb_before_file_copy_4860396430306702017
```
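The pause protocol implied by this log line can be simulated in shell. This is only an illustration of the behaviour described above (create a marker file, sleep in a loop, resume once the file is deleted); the real sync point is implemented inside PXB, and the function name, its arguments, and the use of the shell PID in place of the thread number are all assumptions made for this sketch.

```shell
#!/usr/bin/env bash
# Simulation only: mimics the pause protocol described above.
# pause_at_sync_point is a hypothetical name, not a PXB function.

# Usage: pause_at_sync_point <backup_dir> <sync_point_name> [poll_seconds]
pause_at_sync_point() {
  local backup_dir=$1 name=$2 poll=${3:-1}
  # The log above shows a file named xb_<sync_point>_<thread number>.
  # The shell PID stands in for the thread number in this simulation.
  local marker="$backup_dir/xb_${name}_$$"
  : > "$marker"
  # Keep sleeping for as long as the marker file exists.
  while [ -e "$marker" ]; do
    echo "DEBUG_SYNC_THREAD: sleeping ${poll}sec. Resume by deleting $marker"
    sleep "$poll"
  done
  echo "resumed from $name"
}
```

Running `pause_at_sync_point /tmp/backup before_file_copy &` and then deleting the marker file from another shell lets the background job print its "resumed" line and exit.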
In the test, after activating the syncpoint, you can use wait_for_debug_sync_thread_point <syncpoint_name>

Do some stuff now. This thread is sleeping.

Once you are done and want the thread to resume, you can do so by deleting the file: `rm backup_dir/sync_point_name_*`
Preferably use resume_debug_sync_thread_point <syncpoint_name> <backup_dir>. It deletes the syncpoint file and additionally checks that the syncpoint has
indeed resumed.
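
A minimal sketch of what such wait and resume helpers could look like is shown here. These are not the wait_for_debug_sync_thread_point and resume_debug_sync_thread_point helpers from the PXB test suite; the function names, the argument order, and the timeout parameter are hypothetical. The `xb_<sync_point>_<thread number>` file pattern is taken from the log line above, and the resume step here only confirms that the file is gone, whereas the real helper is described as also checking that the thread resumed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not the PXB test-suite functions.

# List marker files for a sync point.
# Assumed naming, based on the log above: xb_<sync_point>_<thread number>
sync_point_files() {
  local backup_dir=$1 name=$2
  find "$backup_dir" -maxdepth 1 -type f -name "xb_${name}_*"
}

# Usage: wait_for_sync_point <backup_dir> <sync_point_name> [timeout_seconds]
# Returns 0 once a marker file appears, 1 if the timeout expires first.
wait_for_sync_point() {
  local backup_dir=$1 name=$2 timeout=${3:-60} waited=0
  while [ -z "$(sync_point_files "$backup_dir" "$name")" ]; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
}

# Usage: resume_sync_point <backup_dir> <sync_point_name>
# Deletes the marker file(s) and succeeds only if none remain.
resume_sync_point() {
  local backup_dir=$1 name=$2
  find "$backup_dir" -maxdepth 1 -type f -name "xb_${name}_*" -delete
  [ -z "$(sync_point_files "$backup_dir" "$name")" ]
}
```

Because the match is a prefix glob, a sync point whose name is a prefix of another one would match both files, which is one more reason to keep sync point names unique.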

More common/complicated scenario:
----------------------------------

The scenario is to signal another thread to stop after the first sync point has been reached. To achieve this, first do steps 1 to 3 (above) so that the first thread is paused at its sync point.

Then write the debug_sync point name into a file named "xb_debug_sync_thread" and signal PXB:

4. Echo the sync point name into the file. Example: echo "xtrabackup_copy_logfile_pause" > backup/xb_debug_sync_thread

5. Send the SIGUSR1 signal to the PXB process: kill -SIGUSR1 496102

6. Wait for the syncpoint to be reached: wait_for_debug_sync_thread <syncpoint_name>

PXB acknowledges the signal:
2024-03-28T16:05:07.849926-00:00 0 [Note] [MY-011825] [Xtrabackup] SIGUSR1 received. Reading debug_sync point from xb_debug_sync_thread file in backup directory
2024-03-28T16:05:07.850004-00:00 0 [Note] [MY-011825] [Xtrabackup] DEBUG_SYNC_THREAD: Deleting  file/home/satya/WORK/pxb/bld/backup//xb_debug_sync_thread

and then prints this once the sync point is reached:
2024-03-28T16:05:08.508830-00:00 1 [Note] [MY-011825] [Xtrabackup] DEBUG_SYNC_THREAD: sleeping 1sec.  Resume this thread by deleting file /home/satya/WORK/pxb/bld/backup//xb_xtrabackup_copy_logfile_pause_10389933572825668634

At this point, two threads are sleeping at two sync points. Either of them can be resumed by deleting the file named in the error log
(or by using resume_debug_sync_thread()).
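
Steps 4 and 5 can be sketched as a small shell function. The function name, its arguments, and the timeout are hypothetical. The control file name xb_debug_sync_thread and the SIGUSR1 signal come from the description above, and the wait at the end relies on the log line showing that PXB deletes the control file after reading it.

```shell
#!/usr/bin/env bash
# Sketch of steps 4 and 5; activate_sync_point is a hypothetical helper.

# Usage: activate_sync_point <backup_dir> <pxb_pid> <sync_point_name> [timeout_seconds]
activate_sync_point() {
  local backup_dir=$1 pid=$2 name=$3 timeout=${4:-30} waited=0
  local control="$backup_dir/xb_debug_sync_thread"
  # Step 4: write the sync point name into the control file.
  echo "$name" > "$control"
  # Step 5: signal the running process (same effect as kill -SIGUSR1 <pid>).
  kill -s USR1 "$pid" || return 1
  # Per the log above, PXB deletes the control file once it has read it,
  # so the file disappearing is treated here as the acknowledgement.
  while [ -e "$control" ]; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
}
```

A call such as `activate_sync_point backup 496102 xtrabackup_copy_logfile_pause` would then be followed by step 6, waiting for the second sync point to be reached.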

@satya-bodapati satya-bodapati self-assigned this Apr 3, 2024
@satya-bodapati
Contributor Author

This will be merged along with a feature. Closing this, as it is already merged there.

@satya-bodapati
Contributor Author

Fixed by commit 00ecb25
