
Cluster task error caused by conflicting logins #85

Open
helloworldtech1024 opened this issue Oct 17, 2022 · 6 comments

helloworldtech1024 commented Oct 17, 2022

When I repeatedly trigger conflicting logins with the same account (e.g. zhangsan@qq.com/abc) on two clients (strophe.js), the cluster starts throwing errors.
The error not only affects the account involved in the conflict; it also paralyzes the Openfire node, so that all message sending and receiving on that node becomes abnormal.
The exception is easy to reproduce and occurs every time. The error is:

2022.10.17 13:46:54 org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory - Failed to execute cluster task within org.jivesoftware.util.SystemProperty@46d41f17 seconds
java.util.concurrent.TimeoutException: MemberCallableTaskOperation failed to complete within 30 SECONDS. Invocation{op=com.hazelcast.executor.impl.operations.MemberCallableTaskOperation{serviceName='hz:impl:executorService', identityHash=1687865074, partitionId=-1, replicaIndex=0, callId=35822684, invocationTime=1665985584809 (2022-10-17 13:46:24.809), waitTimeout=-1, callTimeout=30000, name=openfire::cluster::executor}, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeoutMillis=30000, firstInvocationTimeMs=1665985584840, firstInvocationTime='2022-10-17 13:46:24.840', lastHeartbeatMillis=1665985610034, lastHeartbeatTime='2022-10-17 13:46:50.034', target=[10.201.1.12]:5701, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=null}
  at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.newTimeoutException(InvocationFuture.java:68) ~[hazelcast-3.12.5.jar!/:?]
  at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:202) ~[hazelcast-3.12.5.jar!/:?]
  at com.hazelcast.util.executor.DelegatingFuture.get(DelegatingFuture.java:88) ~[hazelcast-3.12.5.jar!/:?]
  at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory.doSynchronousClusterTask(ClusteredCacheFactory.java:459) [hazelcast-2.5.0.jar!/:?]
  at org.jivesoftware.util.cache.CacheFactory.doSynchronousClusterTask(CacheFactory.java:736) [xmppserver-4.6.7.jar:4.6.7]
  at org.jivesoftware.openfire.plugin.session.RemoteSession.doSynchronousClusterTask(RemoteSession.java:194) [hazelcast-2.5.0.jar!/:?]
  at org.jivesoftware.openfire.plugin.session.RemoteSession.isClosed(RemoteSession.java:138) [hazelcast-2.5.0.jar!/:?]
  at org.jivesoftware.openfire.plugin.session.RemoteSessionTask.run(RemoteSessionTask.java:97) [hazelcast-2.5.0.jar!/:?]
  at org.jivesoftware.openfire.plugin.session.ClientSessionTask.run(ClientSessionTask.java:70) [hazelcast-2.5.0.jar!/:?]
  at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory$CallableTask.call(ClusteredCacheFactory.java:591) [hazelcast-2.5.0.jar!/:?]
  at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_121]
  at com.hazelcast.executor.impl.DistributedExecutorService$CallableProcessor.run(DistributedExecutorService.java:270) [hazelcast-3.12.5.jar!/:?]
  at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:227) [hazelcast-3.12.5.jar!/:?]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
  at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
  at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64) [hazelcast-3.12.5.jar!/:?]
  at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80) [hazelcast-3.12.5.jar!/:?]
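
For reference, the conflicting login described above amounts to roughly the following strophe.js sketch; the websocket endpoint, JID and password here are placeholders, and the same script would run in two browser tabs at once:

```js
// Hypothetical repro sketch: run the same script in two browser tabs.
// Both tabs bind the identical full JID, so each successful login kicks
// the session in the other tab, producing a conflict loop.
var FULL_JID = "zhangsan@example.com/abc"; // same resource in both tabs
var PASSWORD = "secret";                   // placeholder
var conn = new Strophe.Connection("ws://openfire.example.com:7070/ws/");

function onStatus(status) {
  if (status === Strophe.Status.CONNECTED) {
    console.log("logged in as " + FULL_JID);
  } else if (status === Strophe.Status.DISCONNECTED) {
    // Log straight back in after being kicked, re-creating the conflict.
    setTimeout(function () {
      conn.reset();
      conn.connect(FULL_JID, PASSWORD, onStatus);
    }, 0);
  }
}
conn.connect(FULL_JID, PASSWORD, onStatus);
```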
guusdk (Member) commented Oct 17, 2022

Thank you for your report. I have some additional questions:

Which version of Openfire are you using?

Which version of the Hazelcast plugin are you using?

Does this problem also occur with other clients? Does it occur with clients that do not use HTTP-Bind/BOSH or websockets?

What is the setting of the "Resource Policy" that you use (see screenshot)?
[screenshot of the "Resource Policy" setting]

helloworldtech1024 (Author) commented Oct 17, 2022

Which version of Openfire are you using?
Both 4.6.7 and 4.7.3; the error occurs on both versions.

Which version of the Hazelcast plugin are you using?
2.5.0 and 2.6.0, corresponding to the Openfire versions above.

Does this problem also occur with other clients?
I have not tried any other client yet; I'll try Smack later.

Does it occur with clients that do not use HTTP-Bind/BOSH or websockets?
I use websockets.

What is the setting of the "Resource Policy" that you use (see screenshot)?
I set "Always kick". The errors seemed to persist after changing the setting to "Assign kick value = 5", but I can't remember clearly.

helloworldtech1024 (Author) commented Oct 17, 2022

See the answers above.

helloworldtech1024 (Author) commented Oct 17, 2022

Does it occur with clients that do not use HTTP-Bind/BOSH or websockets?
I use websockets: http://ip:7070/ws/

guusdk (Member) commented Oct 17, 2022

I am having trouble reproducing this problem. I have a cluster of three Openfire nodes. I am using strophe clients that log in to two different cluster nodes at the same time, using the same username, password and resource. The last client to log in always seems to kick the previous client, which is the intended behavior. I do not see stack traces in the log file.

helloworldtech1024 (Author) commented Oct 17, 2022

https://www.bilibili.com/video/BV1ee4y1m7kf/?vd_source=7df86661fecef0bfbae11b1b8d74bc9c
I recorded a video that reproduces this exception.
At 00:09, the full JID is c02023020@10.201.2.88/sdk and everything works well.
At 00:22, I open a new tab in the browser and log in with the same full JID, so the JS scripts in the two tabs conflict. Because the full JID conflicts, the strophe client disconnects, but my JS scripts drive the strophe client to reconnect, so you can see a lot of WS connections in F12.
Then at 01:37 the server exception appears; the current node becomes unavailable, the admin console is affected at 01:49, and all messages sent and received by the node become abnormal.
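
The "a lot of WS connections in F12" behaviour suggests a reconnect loop that opens a fresh websocket for every attempt, along these lines (a sketch; the endpoint is assumed from the ws URL mentioned above, and the password is a placeholder):

```js
// Hypothetical sketch of the reconnect loop seen in the video: a new
// Strophe.Connection (and therefore a new websocket) is created on every
// attempt, which is why the browser's network tab fills with WS entries.
var FULL_JID = "c02023020@10.201.2.88/sdk"; // full JID shown in the video
var PASSWORD = "secret";                    // placeholder

function connectOnce() {
  var conn = new Strophe.Connection("ws://10.201.2.88:7070/ws/"); // assumed endpoint
  conn.connect(FULL_JID, PASSWORD, function (status) {
    if (status === Strophe.Status.DISCONNECTED) {
      // No backoff: reconnect immediately after the conflict kick.
      setTimeout(connectOnce, 0);
    }
  });
}
connectOnce();
```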
