Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PoC: config write guard #2360

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

schlimmchen
Copy link
Contributor

I was trying to tackle application of changed Dynamic Power Limiter settings in my feature branch implementing support for multiple inverters managed by the DPL. Then I noticed, that a bunch of nasty problem would go away if I knew that I could trust the config being consistent, i.e., if the config was only written synchronously with the main loop.

The latter ist currently not true: The config is usually manipulated in the context of the async webserver task, while it is read in the context of the main loop.

I wanted to keep the overhead when reading the config to a minimum, so locking a mutex each time the config shall be read is out of the picture, IMHO.

Writing the config happens rather seldom. So it should be okay if that takes a couple of milliseconds longer -- in the worst case.

My approach: Writing the config is only possible when acquiring a WriteGuard, which locks the "config write mutex" and waits for the signal (from the main loop) to start writing. The signal is needed to make writing the config synchronous with the main loop, and the mutex is needed to use the condition_variable but also to lock multiple writers from each other (which is less of a concern, to be honest).

I like this approach, because I like RAII. However, I am not sure if this is the best approach. Another idea would be to enqueue a callback which is then executed within the context of the main loop. We would need to copy and bind to that callback a significant amount of data, which seems off-putting. Also, access to the queue of callbacks then also need to be locked by a mutex.

@tbnobody I would like to know your opinion on this. Do you like this approach?

@LennartF22 We talked about this briefly. What are your thoughts on this approach?

@LennartF22
Copy link
Contributor

@schlimmchen I have not looked at the changes in detail yet, but it's definitely an improvement.

There are more synchronization issue between the WebApi and the main loop though, like here (if I'm not mistaken):

auto inv = Hoymiles.addInverter(inverter->Name, inverter->Serial);

@tbnobody
Copy link
Owner

Looks great! I am just curious if we would ever have to write the config in the context of the main loop (as it will generate a dead lock). Currently I cannot imagine any use case.

@schlimmchen
Copy link
Contributor Author

Thanks for your feedback!

I am just curious if we would ever have to write the config in the context of the main loop (as it will generate a dead lock).

Yes. I saw that, too. If one would inadvertently try to do this, the deadlock would happend when testing the change for the first time, and the config would not be saved at all, so it should be noticed early. And if we need to do that, we can adjust the WriteGuard c'tor to see whether the current task it the main loop task and then only lock the mutex, not waiting for the signal.

Let me polish and double-check this, then I will un-draft this PR (next week).

@schlimmchen
Copy link
Contributor Author

@tbnobody are you passionate about

auto const& config = Configuration.get();

versus

const auto& config = Configuration.get();

or whether it's consistent across the code base?

the configuration write guard is now required when the configuration
struct shall be mutated. the write guards locks multiple writers against
each other and also, more importantly, makes the writes synchronous to
the main loop. all code running in the main loop can now be sure that
(1) reads from the configuration struct are non-preemtive and (2) the
configuration struct as a whole is in a consistent state when reading
from it.

NOTE that acquiring a write guard from within the main loop's task will
immediately cause a deadlock and the watchdog will trigger a reset. if
writing from inside the main loop should ever become necessary, the
write guard must be updated to only lock the mutex but not wait for a
signal.
@schlimmchen
Copy link
Contributor Author

I am still happy with this. I double checked that generating a DTU serial works as expected using an erased ESP32:

[2024-10-22 21:27:07.684] rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
[2024-10-22 21:27:07.690] configsip: 0, SPIWP:0xee
[2024-10-22 21:27:07.690] clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
[2024-10-22 21:27:07.695] mode:DIO, clock div:2
[2024-10-22 21:27:07.701] load:0x3fff0030,len:1184
[2024-10-22 21:27:07.701] load:0x40078000,len:13232
[2024-10-22 21:27:07.707] load:0x40080400,len:3028
[2024-10-22 21:27:07.707] entry 0x400805e4
[2024-10-22 21:27:08.379] E (694) esp_core_dump_flash: No core dum���ѥѥ���found!
[2024-10-22 21:27:08.379] E (694) esp_core_dump_flash: No core dump partition found!
[2024-10-22 21:27:08.398] 
[2024-10-22 21:27:08.398] Starting OpenDTU
[2024-10-22 21:27:08.398] Initialize FS... E (13) esp_littlefs: ./components/esp_littlefs/src/littlefs/lfs.c:1367:error: Corrupted dir pair at {0x0, 0x1}
[2024-10-22 21:27:08.409] 
[2024-10-22 21:27:08.409] E (20) esp_littlefs: mount failed,  (-84)
[2024-10-22 21:27:08.415] E (23) esp_littlefs: Failed to initialize LittleFS
[2024-10-22 21:27:08.415] [    37][E][LittleFS.cpp:98] begin(): Mounting LittleFS failed! Error: -1
[2024-10-22 21:27:08.426] failed... trying to format...E (46) esp_littlefs: ./components/esp_littlefs/src/littlefs/lfs.c:1367:error: Corrupted dir pair at {0x0, 0x1}
[2024-10-22 21:27:08.437] 
[2024-10-22 21:27:08.437] E (47) esp_littlefs: mount failed,  (-84)
[2024-10-22 21:27:08.437] E (51) esp_littlefs: Failed to initialize LittleFS
[2024-10-22 21:27:08.475] failedReading configuration... [   105][E][vfs_api.cpp:105] open(): /littlefs/config.json does not exist, no permits for creation
[2024-10-22 21:27:08.486] Failed to read file, using default configuration
[2024-10-22 21:27:08.486] Check for default DTU serial... generate serial based on ESP chip id: 199980103760... done
[2024-10-22 21:27:08.754] done
[2024-10-22 21:27:08.754] Reading PinMapping... [   384][E][vfs_api.cpp:105] open(): /littlefs/pin_mapping.json does not exist, no permits for creation
[2024-10-22 21:27:08.767] using default config done
[2024-10-22 21:27:08.767] Initialize Network... done
[2024-10-22 21:27:09.039] Setting Hostname... Initialize NTP... done
[2024-10-22 21:27:09.039] Initialize SunPosition... done
[2024-10-22 21:27:09.044] Initialize MqTT... done
[2024-10-22 21:27:09.044] Initialize WebApi... done
[2024-10-22 21:27:09.054] Initialize Display... done
[2024-10-22 21:27:09.054] Initialize LEDs... done
[2024-10-22 21:27:09.054] Initialize Hoymiles interface... Invalid pin config
[2024-10-22 21:27:09.060] Switch to WiFi mode
[2024-10-22 21:27:09.064] Setting Hostname... done
[2024-10-22 21:27:39.403] Disable search for AP... done

@schlimmchen schlimmchen marked this pull request as ready for review October 22, 2024 19:29
bool read();
bool write();
void migrate();
CONFIG_T& get();
CONFIG_T const& get();
Copy link

@mathieucarbou mathieucarbou Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this getter could become problematic: how it ensures that reads going through it will be protected by the mutex ?
What if another task / core modifies the config while another one calls this method to access a value ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants