occupant-id not stable in some cases #4236

Open
lovetox opened this issue Jun 23, 2024 · 13 comments


lovetox commented Jun 23, 2024

Environment

  • ejabberd version: 24.02

Bug description

Two different Gajim users reported a problem with occupant-id in two different MUCs on different servers.

When we investigated, we found that the occupant-id in the presence is different from the occupant-id the server adds to their messages.

<message xmlns="jabber:client" to="en@chat.404.city" type="groupchat" id="8dcee6be-1769-4d7d-acbf-1e545b7a950f">
  <body>test</body>
  <origin-id xmlns="urn:xmpp:sid:0" id="8dcee6be-1769-4d7d-acbf-1e545b7a950f" />
</message>

<message xmlns="jabber:client" xml:lang="en" to="myname@myserver.com/gajim.J39W82BR" from="en@chat.404.city/myname" type="groupchat" id="8dcee6be-1769-4d7d-acbf-1e545b7a950f">
  <archived xmlns="urn:xmpp:mam:tmp" by="en@chat.404.city" id="1719035916432908" />
  <stanza-id xmlns="urn:xmpp:sid:0" by="en@chat.404.city" id="1719035916432908" />
  <origin-id xmlns="urn:xmpp:sid:0" id="8dcee6be-1769-4d7d-acbf-1e545b7a950f" />
  <occupant-id xmlns="urn:xmpp:occupant-id:0" id="g20AAAAgAozSRwPfZaleTLJ0hNM73x5BBMGnB2HfuA12Mm9MpNU=" />
  <body>test</body>
</message>

<presence xmlns="jabber:client" xml:lang="en" to="myname@myserver.com/gajim.J39W82BR" from="en@chat.404.city/myname" id="2aadd6b4-8c2a-4471-9b4f-3a098416a765">
  ...
  <occupant-id xmlns="urn:xmpp:occupant-id:0" id="g20AAAAgSp/6G+pFGi1tumup3r8yinT6Kk4yPZzoPGdFIzLrteI=" />
  <x xmlns="http://jabber.org/protocol/muc#user">
    <item jid="myname@myserver.com/gajim.J39W82BR" role="participant" affiliation="none" />
    <status code="110" />
  </x>
  <show>xa</show>
</presence>

Multiple devices were joined in the MUC, in case that could be an issue.
We have no reproducible case for now, but it happens.

Could you please check the code to see under which conditions this could happen?


badlop commented Jun 24, 2024

I tried to reproduce the problem, but could not.

Does the problem appear from time to time on your server, even if you are not able to reproduce it on demand?

In that case, you could apply a small patch to the module so that it generates an ID we can later decode, to determine what the difference between the two IDs is:

diff --git a/src/mod_muc_occupantid.erl b/src/mod_muc_occupantid.erl
index fabd69e28..42449e563 100644
--- a/src/mod_muc_occupantid.erl
+++ b/src/mod_muc_occupantid.erl
@@ -74,8 +74,8 @@ add_occupantid_packet(Packet, RoomJid) ->
     xmpp:set_subtag(Packet, OccupantElement).
 
 calculate_occupantid(From, RoomJid) ->
-    Term = {jid:remove_resource(From), get_salt(RoomJid)},
-    misc:term_to_base64(crypto:hash(sha256, io_lib:format("~p", [Term]))).
+    Term = {jid:remove_resource(From), RoomJid, get_salt(RoomJid)},
+    misc:term_to_base64(io_lib:format("~p", [Term])).
 
 %%%
 %%% Table storing rooms' salt

Then, the ID is a very long string like:

  <occupant-id id='g2wAAAABbAAAAANhe2wAAAAJbAAAAANhe2wAAAANawADamlkYSxsAAAABWE8YTxrAAcidXNlcjIiYT5hPmphLGwAAAAFYTxhPGsACyJsb2NhbGhvc3QiYT5hPmphLGsABDw8Pj5hLGwAAAAFYTxhPGsAByJ1c2VyMiJhPmE+amEsbAAAAAVhPGE8awALImxvY2FsaG9zdCJhPmE+amEsawAEPDw+PmphfWphLGEKawABIGwAAAADYXtsAAAAD2sAA2ppZGEsbAAAAAVhPGE8awAHInNhbGExImE+YT5qYSxsAAAABWE8YTxrABYiY29uZmVyZW5jZS5sb2NhbGhvc3QiYT5hPmphLGsABDw8Pj5hLGwAAAAFYTxhPGsAByJzYWxhMSJhPmE+amEsYQpsAAAAAmwAAAAEYSBrAAIgIGEgYSBqYSBqbAAAAAVhPGE8awAWImNvbmZlcmVuY2UubG9jYWxob3N0ImE+YT5qYSxrAAQ8PD4+amF9amEsYQprAAEgbAAAAAVhPGE8awAWIjE1OTU4NTQzNDU1MjgyNzcwNjEyImE+YT5qamF9amo='
	xmlns='urn:xmpp:occupant-id:0'/>

That can be easily decoded to find out which user, which room and which room salt were used:

misc:base64_to_term("g2wAAAABbAAAAANhe2wAAAAJbAAAAANhe2wAAAANawADamlkYSxsAAAABWE8YTxrAAcidXNlcjIiYT5hPmphLGwAAAAFYTxhPGsACyJsb2NhbGhvc3QiYT5hPmphLGsABDw8Pj5hLGwAAAAFYTxhPGsAByJ1c2VyMiJhPmE+amEsbAAAAAVhPGE8awALImxvY2FsaG9zdCJhPmE+amEsawAEPDw+PmphfWphLGEKawABIGwAAAADYXtsAAAAD2sAA2ppZGEsbAAAAAVhPGE8awAHInNhbGExImE+YT5qYSxsAAAABWE8YTxrABYiY29uZmVyZW5jZS5sb2NhbGhvc3QiYT5hPmphLGsABDw8Pj5hLGwAAAAFYTxhPGsAByJzYWxhMSJhPmE+amEsYQpsAAAAAmwAAAAEYSBrAAIgIGEgYSBqYSBqbAAAAAVhPGE8awAWImNvbmZlcmVuY2UubG9jYWxob3N0ImE+YT5qYSxrAAQ8PD4+amF9amEsYQprAAEgbAAAAAVhPGE8awAWIjE1OTU4NTQzNDU1MjgyNzcwNjEyImE+YT5qamF9amo=").

{term,[[123,
        [[123,
          ["jid",44,
           [60,60,"\"user2\"",62,62],
           44,
           [60,60,"\"localhost\"",62,62],
           44,"<<>>",44,
           [60,60,"\"user2\"",62,62],
           44,
           [60,60,"\"localhost\"",62,62],
           44,"<<>>"],
          125],
         44,10," ",
         [123,
          ["jid",44,
           [60,60,"\"sala1\"",62,62],
           44,
           [60,60,"\"conference.localhost\"",62,62],
           44,"<<>>",44,
           [60,60,"\"sala1\"",62,62],
           44,10,
           [[32,"  ",32,32],32],
           [60,60,"\"conference.localhost\"",62|...],
           44,"<<>>"],
          125],
         44,10," ",
         [60,60,"\"15958543455282770612\"",62,62]],
        125]]}

If you compile ejabberd from source code, simply apply the patch, recompile, install the resulting file, and restart the module or restart ejabberd entirely.

If you cannot compile ejabberd yourself, ping me and I'll generate an updated module *.beam file so you can install it.

badlop self-assigned this Jun 24, 2024

prefiks commented Jun 24, 2024

It could be that the room was recreated between those two stanzas; that would change the salt (which I think is expected). The only thing worth checking is whether room hibernation triggers salt regeneration; if it does, we don't want that.

Also, Badlop, was there any reason why you didn't want to put the salt in the muc_room state but used an external Mnesia table?


badlop commented Jun 24, 2024

Also, Badlop, was there any reason why you didn't want to put the salt in the muc_room state but used an external Mnesia table?

Yes, the reasoning was something like this:

  • The occupant id must be stable (the same id for the same bare user + room), see https://xmpp.org/extensions/xep-0421.html#id-generation and the sketch after this list
  • The room salt is a random string (it cannot be reproduced, so it must be stored persistently for as long as the room exists)
  • In mod_muc, the room state is lost when mod_muc stops, so we must use persistent storage
  • The first design was to store the room salt in the mod_muc_room #config record, but this salt is used only by mod_muc_occupantid... keeping the data in a specific table managed by mod_muc_occupantid seemed reasonable.
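
To make that stability requirement concrete, here is a minimal sketch (not part of the module) that mirrors the calculate_occupantid derivation shown in the patch above: the same bare JID and the same room salt always yield the same occupant-id, regardless of the sending resource. The salt value and JIDs are made up for illustration.

%% Minimal sketch mirroring calculate_occupantid/2; salt and JIDs are illustrative only.
-module(occupantid_sketch).
-export([demo/0]).

occupant_id(UserJid, Salt) ->
    Term = {jid:remove_resource(UserJid), Salt},
    misc:term_to_base64(crypto:hash(sha256, io_lib:format("~p", [Term]))).

demo() ->
    Salt = <<"15958543455282770612">>,
    Id1 = occupant_id(jid:decode(<<"user2@localhost/laptop">>), Salt),
    Id2 = occupant_id(jid:decode(<<"user2@localhost/phone">>), Salt),
    Id1 = Id2,  %% same bare JID + same salt => same occupant-id
    ok.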

It could be that the room was recreated between those two stanzas; that would change the salt (which I think is expected).

Right, that could be the case here, and it makes sense that the occupant-id changes, as the room is different (maybe a different creator, owner, ...; both rooms just happen to have the same room name).

If that's what happened in this case, it will be revealed once there are new logs with occupant-ids that we can decode to determine their source information.

The only thing worth checking is whether room hibernation triggers salt regeneration; if it does, we don't want that.

Aha, that could be the case here too, and in that case the module should be improved.

Looking at the code, a room salt is only removed when the room is removed (and the remove_room hook is called):

  • close room if temporary and empty
  • change config: remove persistence option when mnesia db
  • destroy room
  • restoring hibernated non-persistent room when mnesia db

Maybe the room is non-persistent, uses Mnesia storage, got hibernated, and the module deleted its salt when it got unhibernated. I'll check and update this comment.


prefiks commented Jun 24, 2024

Another thing related to hibernation, even if hibernation doesn't reset the salt (and I think we call the remove_room hook only when a room is truly removed, not just hibernated), could be restoring the room on a different node: I don't think that Mnesia table is synced between nodes, so if the room is restored on a different node that doesn't have the salt, it will create a new one.

But I think it's most probably a room that is not set to persistent, which just gets deleted when everyone leaves and then gets recreated.
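
If clustering is involved, one quick check (on each node, in an ejabberd Erlang shell) is the storage type and copy placement of the salt table. The table name below is an assumption; substitute whatever table mod_muc_occupantid actually creates.

%% Standard Mnesia introspection; the table name muc_occupant_id is a placeholder.
mnesia:table_info(muc_occupant_id, storage_type).  %% ram_copies means memory only on this node
mnesia:table_info(muc_occupant_id, disc_copies).   %% nodes holding a disc-backed copy
mnesia:table_info(muc_occupant_id, ram_copies).    %% nodes holding a RAM-only copy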


badlop commented Jun 24, 2024

A temporary (that is, not-persistent) room is deleted when the last occupant leaves and it has no subscribers.

I was able to reproduce the problem:

  • A temporary room loses its last occupant but still has some subscribers: it doesn't get destroyed.
  • After being inactive for as long as hibernation_time, it gets hibernated (the Erlang process is destroyed and removed from muc_online_room, but the room config is kept in muc_room).
  • Later, when a user joins the room, it gets restored from hibernation and an Erlang process is started, but there is code in mod_muc that calls forget_room, which means the room salt will be newly created.

I wonder what the reason was for calling forget_room when restoring a temporary room.

Removing it solves this particular problem:

diff --git a/src/mod_muc.erl b/src/mod_muc.erl
index e297d866f..3184fe90a 100644
--- a/src/mod_muc.erl
+++ b/src/mod_muc.erl
@@ -903,14 +903,7 @@ load_room(RMod, Host, ServerHost, Room, ResetHibernationTime) ->
 		    Res2;
 		_ ->
 		    ?DEBUG("Restore hibernated non-persistent room: ~ts", [Room]),
-		    Res = start_room(RMod, Host, ServerHost, Room, Opts0),
-		    case erlang:function_exported(Mod, get_subscribed_rooms, 3) of
-			true ->
-			    ok;
-			_ ->
-			    forget_room(ServerHost, Host, Room)
-		    end,
-		    Res
+		    start_room(RMod, Host, ServerHost, Room, Opts0)
 	    end
     end.
 


lovetox commented Jun 24, 2024

Hi,

I think it's a fair assumption that a salt is regenerated after a room is destroyed.

The room in question here is a group chat with >60 people, and it's configured as persistent (at least today); I think it's unlikely that it was destroyed.

A bit more info: we could see the problem happening consistently, as in messages always had a different id than the presence. It was not a case of sending a message on one day and checking the presence on the next day.

So both occupant-ids were present at the same time; it was not a case of the occupant-id changing because of an event and being different everywhere from that moment on.

I'll try to get more logs from the user; maybe we can see what the occupant-id was over a longer period of time.


prefiks commented Jun 24, 2024

@badlop Ah, I missed that forget_room call. That call was there to delete the entry from the db, as we would not delete it otherwise. We should probably add a function that just deletes the room from the db, and keep another that removes it from the db and also calls that hook; see the sketch below.
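
A rough sketch of that split, assuming nothing beyond what is quoted above (function names and the hook argument order are illustrative, not the actual mod_muc code):

%% erase_room/3: only remove the stored room record, no side effects (illustrative name).
erase_room(ServerHost, Host, Room) ->
    LServer = jid:nameprep(ServerHost),
    Mod = gen_mod:db_mod(LServer, mod_muc),
    Mod:forget_room(LServer, Host, Room).

%% forget_room/3: today's behaviour, remove from the db *and* run the remove_room
%% hook (argument order assumed), so mod_muc_occupantid drops its salt only here.
forget_room(ServerHost, Host, Room) ->
    LServer = jid:nameprep(ServerHost),
    ejabberd_hooks:run(remove_room, LServer, [LServer, Room, Host]),
    erase_room(ServerHost, Host, Room).

The hibernation-restore path in load_room could then call erase_room/3, so restoring a non-persistent room would no longer run the remove_room hook and hence no longer regenerate the salt.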

@lovetox Do you know if the server where this is deployed uses clustering, or is it just a single machine?


lovetox commented Jun 28, 2024

It's 404.city; we are trying to reach the admin, but so far without success.


lovetox commented Jun 28, 2024

EDIT: Sorry, I got this wrong the first time.

Seems like @prefiks found the reason:

we are storing those in a mnesia table, not sure if it's set to be stored on disk
yes, the table that is used by it only keeps that in memory
yeah, I will see if we can change that
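
For reference, whether those salts survive a restart comes down to how the table is declared; a RAM-only Mnesia table loses its contents when the node stops. A disc-backed declaration would look roughly like this (record and table names are placeholders, not the actual mod_muc_occupantid code):

%% Placeholder names, shown only to illustrate disc_copies vs ram_copies.
%% With disc_copies the salts survive an ejabberd restart.
-record(muc_occupant_id_salt, {room_jid, salt}).

create_salt_table() ->
    mnesia:create_table(muc_occupant_id_salt,
                        [{disc_copies, [node()]},   %% instead of {ram_copies, [node()]}
                         {attributes, record_info(fields, muc_occupant_id_salt)}]).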


lovetox commented Oct 18, 2024

This is still happening. Could it be a problem with multiple devices of the same user being joined in the same MUC?


UntrimmedParasail commented Oct 21, 2024

The room in question here is a group chat with >60 people, and it's configured as persistent (at least today); I think it's unlikely that it was destroyed.

I've had this issue occur occasionally for a while, and I do have two devices I connect with. But I just met someone else who has the same issue, and apparently they only have one device connecting to the account that has the issue.

@UntrimmedParasail

For reference, this is what it looks like on the client side:

91a8a04f-2f86-4487-8d1a-b4aaf35302cb_df14a6ef4e629d45f42cc44125d5340957035440


UntrimmedParasail commented Oct 21, 2024

Looks like we can reproduce as well (this was in the DivestOS Mobile chat - xmpp:divestos-mobile@conference.konvers.me?join):

This happened when sending messages after reconnecting once the server restarted (we were connected before the server restarted).

1dd5c65e-dbf6-4b9f-8fa4-e03348208d2f_b4009d82e4a4ee71322ab6661810d6b04e43255d

The person I spoke with experienced the same issue after sending a message, so it's probably not related to multiple clients but just to the server restarting.

In case this information is helpful: the account I am connecting from is hosted on conversations.im.
