-
-
Notifications
You must be signed in to change notification settings - Fork 589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
I fixed every single bug in wm
#1334
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## next #1334 +/- ##
==========================================
- Coverage 53.90% 53.77% -0.13%
==========================================
Files 70 70
Lines 15165 15195 +30
==========================================
- Hits 8174 8171 -3
- Misses 6991 7024 +33
|
yshui
force-pushed
the
wm-fixes
branch
3 times, most recently
from
September 7, 2024 08:48
8858de1
to
b384889
Compare
To make it possible to stub them out. These are the only requests wm.c makes to the X server, we will be able to test the wm code on its own if we can provide mocked version of these functions. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
If a new window reuses the ID of a previously destroyed window, we would get confused and will not send a query tree request for this new window, thus misses all of its children. This is because we saved destroyed window as orphans, and we assume all orphans are either imported or in the process of being imported, which is not true for destroyed windows. With this commit we no longer save destroyed windows. We did that to gracefully handle a window being destroyed with a query tree request still in-flight. Now we handle it the more obvious way - we mark any in-flight query trees for omission when a window is destroyed. (Now, event if we don't save explicitly destroyed windows anymore, some orphans might get destroyed and have their ID reused without us knowing. This is because we don't get DestroyNotify for orphans. To catch this case, we pessimistically re-query orphans that don't have children.) Even if the window ID is reused after by a newly created window and the query tree actually succeeds with information from the new window, we can't use that result anyway, because at the point when the query tree is answered, we wouldn't have set up the event mask for the new window yet. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Before, if we got a reparent event for a previously unknown window, we will ignore that event because we expect this window will later be found in a query tree reply. This is not always true. If a previously unknown window is reparented to a window that is already fully imported, because we won't send query tree for a fully imported window, this new will just never be imported. We will now explicitly initiate the import process in this case. This also applies to new window reusing the ID of a previously destroyed window in our orphan tree. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
`node` can legitimately be NULL inside wm_handle_get_wm_state_reply. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Consider this scenario: 1. Window created with ID A, which generates a CreateNotify for A. 2. Window with ID A is destroyed. 3. Another window is created, reusing ID A. 4. We receive the CreateNotify for A. At step 4, we will be querying a different window than what we think we were querying. It might have a different parent from the one in the CreateNotify, and we should also ignore the destroy event in (2), as it wasn't for the window we are querying. To fix that, when we initiate the import process for a window, we no longer attach it to its declared parent, instead we orphan it. We also ignore any destroy event for it while the query tree is in-flight. We let the query tree reply event decide: 1. what's the window's parent, 2. whether the window still exists. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Turns out inferring a window's life cycle based on DestroyNotify sent by its parent is incredibly hard. Especially if the window is at any point reparented to a window we haven't imported yet, because we won't be receiving any DestroyNotify for it. And we also can't reliably know if such reparenting has happened. So we now set the StructureNotifyMask, which will give us DestroyNotify sent from the window itself. With this the tracking window life cycle from the point when the event mask is set becomes easy, and we always know if a window is destroyed or not. Setting StructureNotifyMask also means we will get copies of Circulate/Configure/Map/Unmap/ReparentNotify from the window itself, we filter this so existing code doesn't have to change. Only for DestroyNotify we favor the one sent from the window itself. Also, setting event masks on window MUST be done synchronously. We use to just send out a ChangeWindowAttributes request and ignore its result. But it is actually possible, though extremely unlikely, for the ChangeWindowAttributes to fail but the QueryTree later to succeed, if the window is destroyed and another created (during the time between we send these two requests) with the same ID. In that case we would thought we have set up the event mask, but actually didn't. Which would be detrimental. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Well, we tried to use query tree replies to attach windows to correct parents. But that's not good enough. Because the parent can also actively acquire children via reparent events. The result is windows fighting each other for control of their parent pointer. And this is also bad because in we don't know the stacking order of a window from its query tree reply. The conclusion is, a window must be fully responsible for updating its own children list. Because of ABA problem caused by window ID reuse, it's possible for a window to be attached to a wrong parent temporarily. This also means receiving a DestroyNotify from a window X about one of its children with ID Y, doesn't mean the window with ID Y is actually destroyed, if Y is currently attached to the wrong parent. From that DestroyNotify, the only thing we can deduce is that X lost a child with ID Y. Which means events must be handled in two steps: If we get a DestroyNotify from window X about its child Y, we disconnect Y from X. If we get a DestroyNotify from window Y for itself, only then can we really destroy Y. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
We could be getting reparent events for windows we already found out to be non-existent by trying to set its event mask. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
It is possible for a window in the process of being queried to have an incomplete children list, in which case wm_disconnect might see child/parent mismatches. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Orphan windows were once imported, so should have their event masks set, which means we should be getting DestroyNotify when they are destroyed. That should be enough to guarantee that orphans will be destroyed eventually, we don't need to reap them actively. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
The first thing we do when we get a new toplevel, is sending a GetWindowAttributes request, and based on its reply, we may allocate a managed window struct for it. Currently we find which tree node to attach this managed window struct by its X window ID only, but we should use the full treeid. When a new toplevel is added, we queue a creation tree change for it which contains a pointer to the tree node. If the tree node is destroyed before the creation tree change is dequeued, we must "cancel" that creation tree change, because otherwise the pointer in it would be dangling. This also means there will be no kill tree change for it either. As a consequence, we have no way to tell anybody about its zombie node if we create one. So we just don't. If we don't create a zombie node for a destroyed node, there will be nowhere to put the managed window struct attached to it if there is one. Which means there cannot be a one. Summing it all up - a toplevel cannot have a managed window struct, if its creation tree change was never dequeued. You can imagine the creation tree change is handing out a "key" to the tree node, and this tree node can only have managed window attached to it via this key. However, if we are getting the tree node via its X window ID only, we bypass this "key" mechanism, essentially we will find tree node with the same X ID even if we don't have the key for it yet. This happens if the window was destroyed, then another window took its ID while we were waiting for the GetWindowAttributes reply, we shouldn't start managing it. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
If a node is removed from toplevel then comes back, it will generate a new creation tree change. Because of the assumption explained in the previous commit, we can only attach managed window struct to it using the "key" from the creation tree change. But the node's key hasn't changed! Since it was toplevel, it will have a previous, older creation tree change which also handed out this key, which can be used to attach managed window to it, thus breaks the assumption. This means every time a node becomes a toplevel, it must have a different key. We bump its gen number when it's removed from toplevel to ensure that. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
We care about whether setting the event masks succeeded, we can't deduced that from the query tree request because of race conditions and window ID reuse. So we have to check if the request for setting the event masks itself succeeded or not. But `xcb_request_check` is out of order w.r.t. to other X events/replies. So essentially we are "looking into the future" with it, sensing that a window no longer exists before the usual events and replies we received from X would have told us so. This breaks assumptions we based many of our assertions on. For example, in many places when we can't find a window in our tree, we assume we must have an incomplete tree (i.e. based on the events/replies, there are windows on X's side we haven't queried). But it's actually possible that we looked into the future and saw it no longer exists. We have updated the async request mechanism so it can support requests that don't expect replies. This commit uses that to make setting event masks async, so it no longer uses `xcb_request_check` and is no longer out of order. Also updated how query tree requests are handled to make it work together with this change. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
If a toplevel loses toplevel status, its "client window" will no longer be updated. If the client window is then destroyed, it becomes a dangling pointer. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Previously we would just ignore these updates. But `wm_tree_find_client` does treat a toplevel with WM_STATE set as its own client window. So to keep things consistent, client window refresh is still needed in this case. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
This is already accounted for by wm_tree_find_toplevel_for, but in wm_tree_set_wm_state, there is a bit of copy-and-pasted code that didn't. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Since we know have split reparenting into 2 steps, the first of which will create an orphaned window, we now can have orphaned windows when there are no query trees in-flight. Update wm_is_consistent condition to account for that. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Pass tree node instead of window ID. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
See comments for proof why it's not reachable. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
.... or at least the ones AFL++/libFuzzer can find. Some of these are very tricky, and my brain hurts.
This all came from @Monsterovich reporting an assertion failure in wm. And I thought, the wm code is nicely isolated now, maybe I can fuzz it, see if there are more bugs... and boy are there more 😮💨
Unfortunately some of these changes are big, so I think this warrants an rc4.