
Tinq Desktop <-> RTI Connext interoperability #26

vmayoral opened this issue Nov 20, 2014 · 13 comments

@vmayoral

When testing against RTI Connext, it seems that discovery does not complete properly (using the Desktop implementation):

!!sdisca
Domain 0 (pid=174): {1}
    GUID prefix: 8f937eb8:00ae729f:03e90000
    RTPS Protocol version: v2.1
    Vendor Id: 1.14 - Technicolor, Inc. - Qeo
    Technicolor DDS version: 4.0-0, Forward: 0
    SecureTransport: none
    Authorisation: Authenticated
    Entity name: Technicolor Chatroom
    Flags: Enabled
    Meta Unicast: 
        UDP:172.23.1.215:7758(3) {MD,UC} H:3
    Meta Multicast: 
        UDP:239.255.0.1:7400(4) {MD,MC} H:4
    Default Unicast: 
        UDP:172.23.1.215:7759(1) {UD,UC} H:1
    Default Multicast: 
        UDP:239.255.0.1:7401(2) {UD,MC} H:2
    Manual Liveliness: 0
    Lease duration: 50.000000000s
    Endpoints: 10 entries (5 readers, 5 writers).
        000001-3, {22}, InlineQoS: No, Writer, imu/simple_msgs::dds_::Vector3_
        000002-4, {24}, InlineQoS: No, Reader, imu/simple_msgs::dds_::Vector3_
    Topics: 
        BuiltinParticipantMessageReader/ParticipantMessageData 
        BuiltinParticipantMessageWriter/ParticipantMessageData 
        SEDPbuiltinPublicationsReader/PublicationBuiltinTopicData 
        SEDPbuiltinPublicationsWriter/PublicationBuiltinTopicData 
        SEDPbuiltinSubscriptionsReader/SubscriptionBuiltinTopicData 
        SEDPbuiltinSubscriptionsWriter/SubscriptionBuiltinTopicData 
        SPDPbuiltinParticipantReader/ParticipantBuiltinTopicData 
        SPDPbuiltinParticipantWriter/ParticipantBuiltinTopicData 
        imu/simple_msgs::dds_::Vector3_ 
    Security: level=Unclassified, access=any, RTPS=clear
    Resend period: 10.000000000s
    Destination Locators: 
        UDP:239.255.0.1:7400(4) {MD,MC} H:4
        TCP:239.255.0.1:7400 {MD,MC}
    Discovered participants:
        Peer #0: {25} - Local activity: 9.07s
        GUID prefix: ac1701d7:00007297:00000001
        RTPS Protocol version: v2.1
        Vendor Id: 1.1 - Real-Time Innovations, Inc. - Connext DDS
        Meta Unicast: 
            UDPv6:2:7:207:::7410 {MD,UC}
            UDP:172.23.1.215:7410 {MD,UC}
        Meta Multicast: 
            UDP:239.255.0.1:7400(4) {MD,MC} H:4
        Default Unicast: 
            UDPv6:2:7:207:::7411 {UD,UC}
            UDP:172.23.1.215:7411 {UD,UC}
        Manual Liveliness: 0
        Lease duration: 100.000000000s
        Endpoints: 4 entries (2 readers, 2 writers).
        Topics:  <none>
        Source: 
            UDP:172.23.1.215:35149 {MD,UC}
        Timer = 90.30s

It seems like SPDP does its job, but SEDP does not.

@vmayoral vmayoral added the bug label Nov 20, 2014
@vmayoral vmayoral changed the title Tinq <-> RTI Connext interoperability Tinq Desktop <-> RTI Connext interoperability Nov 20, 2014

jvoe commented Nov 25, 2014

@vmayoral Would it be possible to send a Wireshark trace of this communication problem?

@vmayoral

@jvoe, @GerardoPardo, a capture between RTI Connext and Tinq is available here (RTI Connext subscriber, Tinq publisher).

Some remarks:

  • Order of events:
    • RTI's Connext subscriber is launched
    • Tinq's publisher is launched
    • the trace is left running for a few seconds
  • SEDP doesn't seem to happen (note that it does happen between Tinq and OpenSplice)
  • Tinq's DDS Debug Shell confirms that there's no information exchanged about the topics:
!!sdisca
Domain 0 (pid=223): {1}
    GUID prefix: 8f937eb8:00df0d4f:03e90000
    RTPS Protocol version: v2.1
    Vendor Id: 1.14 - Technicolor, Inc. - Qeo
    Technicolor DDS version: 4.0-0, Forward: 0
    SecureTransport: none
    Authorisation: Authenticated
    Entity name: Technicolor Chatroom
    Flags: Enabled
    Meta Unicast: 
        UDP:172.23.1.215:7856(3) {MD,UC} H:3
    Meta Multicast: 
        UDP:239.255.0.1:7400(4) {MD,MC} H:4
    Default Unicast: 
        UDP:172.23.1.215:7857(1) {UD,UC} H:1
    Default Multicast: 
        UDP:239.255.0.1:7401(2) {UD,MC} H:2
    Manual Liveliness: 0
    Lease duration: 50.000000000s
    Endpoints: 10 entries (5 readers, 5 writers).
        000001-3, {22}, InlineQoS: No, Writer, imu/simple_msgs::dds_::Vector3_
        000002-4, {24}, InlineQoS: No, Reader, imu/simple_msgs::dds_::Vector3_
    Topics: 
        BuiltinParticipantMessageReader/ParticipantMessageData 
        BuiltinParticipantMessageWriter/ParticipantMessageData 
        SEDPbuiltinPublicationsReader/PublicationBuiltinTopicData 
        SEDPbuiltinPublicationsWriter/PublicationBuiltinTopicData 
        SEDPbuiltinSubscriptionsReader/SubscriptionBuiltinTopicData 
        SEDPbuiltinSubscriptionsWriter/SubscriptionBuiltinTopicData 
        SPDPbuiltinParticipantReader/ParticipantBuiltinTopicData 
        SPDPbuiltinParticipantWriter/ParticipantBuiltinTopicData 
        imu/simple_msgs::dds_::Vector3_ 
    Security: level=Unclassified, access=any, RTPS=clear
    Resend period: 10.000000000s
    Destination Locators: 
        UDP:239.255.0.1:7400(4) {MD,MC} H:4
        TCP:239.255.0.1:7400 {MD,MC}
    Discovered participants:
        Peer #0: {25} - Local activity: 18.05s
        GUID prefix: ac1701d7:00000d46:00000001
        RTPS Protocol version: v2.1
        Vendor Id: 1.1 - Real-Time Innovations, Inc. - Connext DDS
        Meta Unicast: 
            UDPv6:2:7:207:::7410 {MD,UC}
            UDP:172.23.1.215:7410 {MD,UC}
        Meta Multicast: 
            UDP:239.255.0.1:7400(4) {MD,MC} H:4
        Default Unicast: 
            UDPv6:2:7:207:::7411 {UD,UC}
            UDP:172.23.1.215:7411 {UD,UC}
        Manual Liveliness: 0
        Lease duration: 100.000000000s
        Endpoints: 4 entries (2 readers, 2 writers).
        Topics:  <none>
        Source: 
            UDP:172.23.1.215:59433 {MD,UC}
        Timer = 81.51s


jvoe commented Nov 25, 2014

@vmayoral It looks like you are trying to communicate between two different DDS processes on the same machine, the first being RTI Connext and the second Tinq DDS.

Just a few thoughts ...

  1. Might RTI Connext be trying to connect via its in-memory transport, which Tinq doesn't use since it's a proprietary transport? That might be one explanation for why nothing is happening, although it is a bit strange that Tinq DDS is not initiating AckNacks and Heartbeats.
  2. Maybe a firewall is blocking some packets?
  3. Does the capture only show the multicasts because you didn't capture on all interfaces? You should, since local-host traffic typically uses the loopback interface.

Hope this helps?

@GerardoPardo

Yes, if you want to run RTI Connext DDS on the same machine as another implementation, you need to disable the RTI Connext DDS shared memory transport. Apologies for this; we should be smarter and detect this situation...

Disabling the shared memory transport can be done using the XML QoS configuration (recommended, so you do not touch application code) or programmatically. This is controlled by the transport_builtin.mask in the DomainParticipantQos.

You can find examples of each of the two approaches here:
http://community.rti.com/comment/851#comment-851
http://community.rti.com/kb/why-doesnt-my-rti-connext-application-communicate-rti-connext-application-installed-windows
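
For illustration, a rough sketch of the programmatic route using the traditional C++ API (treat it as illustrative rather than copy-paste code; the links above have the complete, authoritative examples):

```cpp
#include "ndds/ndds_cpp.h"  // RTI Connext traditional C++ API

int main()
{
    // Start from the default DomainParticipant QoS and keep only the
    // UDPv4 builtin transport, i.e. drop shared memory from the mask.
    DDS_DomainParticipantQos qos;
    DDSTheParticipantFactory->get_default_participant_qos(qos);
    qos.transport_builtin.mask = DDS_TRANSPORTBUILTIN_UDPv4;

    DDSDomainParticipant *participant =
        DDSTheParticipantFactory->create_participant(
            0 /* domain id */, qos,
            NULL /* listener */, DDS_STATUS_MASK_NONE);
    if (participant == NULL)
        return 1;  // participant creation failed

    /* ... create topics, readers and writers as usual ... */

    DDSTheParticipantFactory->delete_participant(participant);
    return 0;
}
```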

@vmayoral

@jvoe thanks for taking a look at the capture. You are right, my bad. Please find a new capture with all the interfaces enabled here.

@GerardoPardo thanks for your input; however, the shared memory transport is already disabled (otherwise we would not be able to interoperate between PrismTech's OpenSplice and Connext on the same machine, which we are doing). The issue seems to lie somewhere else.

Thanks both for your support.


jvoe commented Nov 26, 2014

@vmayoral It looks like RTI Connext is sending to the loopback address instead of to one of the announced Tinq DDS locators. This of course leads to the ICMP Destination Unreachable messages, since we only have sockets on the announced locators (see the 'scx' output).

@GerardoPardo Any reason why RTI Connext is doing this? Using the loopback address as a source is normal, but I would expect that the SPDP announced locators would be used as the destinations.

@GerardoPardo

@jvoe Yes I noticed the same thing. A few thoughts come to mind:

(1) We considered that it would not make sense to announce 127.0.0.1, since it is not a routable IP address and can only be used if you are on the same host, which you can deduce from looking at the IP addresses. So we never expect to see it in the announced locators...

(2) Sending to localhost avoids the NIC hardware, so it is presumably more efficient than sending to the external IP address.

(3) In case there are multiple NICs and multiple announced locators, sending to just the one localhost address is less work than sending to all the IPs.

So our UDP transport assumes that if it is enabled, then "localhost" is being listened to...
I can see why this is confusing, especially given it is not specified anywhere...

Do you see a problem with always listening on localhost?

BTW I can confirm that the "default to shared memory" issue is already fixed and our next product release will not exhibit this OOB interoperability annoyance.


jvoe commented Nov 26, 2014

@GerardoPardo There are a few reasons why we don't listen to the loopback locator by default:

  • In order to have fine-grained control over which interface is used for DDS traffic, we can enable any subset of source IP interfaces, both for sending (unicast and/or multicast source) and for receiving, so we don't have wildcard (*:port) UDP receive sockets, which would automatically enable localhost. All our receive locators are completely bound.
  • In practice, the OS routing software is smart enough not to send traffic destined for a local IP address to the NIC. Actually, it wouldn't work if it did send it to the NIC; the NIC would never loop the traffic back anyway. So there is no difference in efficiency between sending to localhost or to a local IP address, except that you might hit different firewall rules.
  • For the case where there are multiple DDS locators for a single destination, as when there are multiple local IP addresses, Tinq DDS optimizes the DDS flow on reliable topics by 'learning' which interfaces are responsive and caching the fastest one locally, based on the source IP address of received packets. If changes in topology occur and the cached locator is no longer useful, this is detected and the cached reply locator is flushed, after which all the addresses are tried again. This mechanism optimizes the data flows a lot, avoiding a lot of duplicated traffic in these cases. We really needed it (especially where wireless is the dominant medium), since many of our test machines have a lot of local IP addresses (up to 12 in an extreme case); VirtualBox and VMware, for example, both contribute to this, and I'm sure a lot of customers would have the same problems if we didn't do this. The argument that the multicast IP destination should be used in this case isn't an option in a lot of use cases, such as over wireless, since IP multicast is extremely lossy there. That's also why we disable multicast by default for devices without wired Ethernet (smartphones, for example). Multicast/broadcast bandwidth is severely limited on wireless and tends both to drown out the SPDP traffic, leading to devices disappearing and coming back again, and to cause extreme packet loss when used for data. So we only use multicast for SPDP over wireless, never for data. It's not that we can't use it (the locators are present, after all); it just gives much better behavior to avoid it.
  • As to listening by default to localhost, I guess this could be done, if really necessary and there is no alternative, but it would require an additional locator, and we need to make this configurable. So possible, but requires some extra work for setting up, extra configuration parameters, etc.

@GerardoPardo

@jvoe thank you for the detailed explanation. I see how what you are doing makes sense in your situation.

It was my understanding that the handling of loopback vs. external IP addresses was OS-specific: while most desktop/server OSs may be smart enough to automatically avoid going to the NIC, other embedded OSs would rely on having the routing table correctly configured, and the actual path followed could be different. I have not looked at this in recent years, so this information could be dated or even wrong...

The approach of being smart about which interface is "responsive" seems very neat, but if I understood correctly it would only work for reliable traffic, so best-effort traffic would still be sent multiple times? The "localhost" trick would work for best effort as well. That said, I like very much that your approach is able to optimize the multi-NIC/IP traffic even when sending from a different computer, which our "localhost" trick cannot handle...

I need to think about this more, but it would seem that, depending on how the internal middleware is architected, this caching may not be so trivial to implement. We process the ACKNACKs completely at a layer above the transport, so when we receive one and correlate it to the HB we no longer remember how the ACKNACK was received. In fact, it would be legal for us to send a HB on one transport, like UDP, and receive the ACKs on a completely different transport. The RTPS layer is happy as long as an ACK is received and does not care how...

So I think we need to answer two things:

(a) What is the best approach that we can follow quickly to avoid this type of out-of-the-box interoperability issue?

(b) Going forward, what should the best way to handle this multi-NIC/IP situation be, and how do we get it into RTPS 2.3?
It would be nice to have something that would:
b1) Work for all kinds of traffic (reliable and best effort)
b2) Also work when sending to a different machine. However, it should be smart about it, because if there are different physical networks the application may want to send on the different paths for redundancy (this is something many of our customers rely on).

As far as (a), being biased here :), it would seem that if by default you listened on localhost for incoming packets, it would address the interoperability issue... For us it would be hard to use the actual IP addresses rather than localhost in the short term, as we would need to implement something similar to the caching/learning you describe. We would rather do that in conjunction with (b).

Regarding (b), I think it would be very good to have you as a member of the RTPS 2.3 revision task force. Any chance you guys may join OMG? If not, we can work with you on the side, but it would be nice to have you influence our direction more directly... If we can close some of these issues by March 2015, when RTPS 2.3 comes out, it would be great.

@ClarkTucker

Just for an additional data point:
CoreDX DDS does listen on localhost, by default; specifically to support on-machine interop with RTI.
We do not write to localhost unless specifically configured to do so.


jvoe commented Nov 26, 2014

@GerardoPardo Some implementations have a specific route entry for local IP addresses, specifying the interface, and others don't. The first will force a loopback to occur in all cases; with the second, it is not so clear whether this is detected before the packet is sent to the NIC.

So it is indeed a bit murky whether we can always assume, in all cases, that specifying a local non-loopback address is effective for looping back. On the other hand, I haven't seen an IP stack implementation yet that didn't handle this properly, efficiently or not.

I did a bit of testing on both Linux and Windows to see if there is a difference in latency between a local address and the loopback address, but I don't see any difference. In fact, the latency variations are larger than the difference between the two addressing methods. The advantage of the loopback address is that it makes it possible to communicate before an actual IP address is assigned (via DHCP or manually), and that can indeed be important for bootstrap purposes.

I suppose that if we add a loopback receive socket, this would enable communication between the two DDS implementations :-) ...

The best approach might be to use it as a configurable implicit fallback locator in Tinq DDS, so that anything received on it is handled as a normal valid receive, and the source loopback address is then handled as a valid source locator for reply purposes.
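
At the socket level, such a fallback locator would boil down to something like the following (plain POSIX sockets, just a sketch and not the actual Tinq locator code; the port would come from the participant's configuration):

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// Open an extra receive socket bound to 127.0.0.1 on the given port
// (e.g. the participant's metatraffic unicast port), so that packets a
// peer sends to the loopback address still reach the DDS participant.
int open_loopback_receive_socket(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  // bind to 127.0.0.1 only

    if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;  // add this fd to the normal RTPS receive loop / select() set
}
```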

Use of the loopback address as a destination when there are still other locators for that destination should always be an implementation option, I think, but should be encouraged to handle cases when there are no IP addresses assigned yet.

Not every implementation will check if an IP address is really a local one. In fact, this is next to impossible for TCP locators where NAT is used and both local and remote might have the same (local) IP address.

If the data is received via the loopback IP or IPv6 receive socket, then you can really be sure that it comes from the local host. Alternatively, if it is received on a UDP or UDPv6 receive locator socket with the source IP identical to the destination IP address, you can also be sure. All other cases should be treated as non-local-host receives, though.

As to the notion of the source/reply IP locator in Tinq DDS, this is handled by requiring every transport subsystem to add the source locator as an extra argument when calling the rtps_receive() function. One of the first actions there is to store this locator in the RTPS receive context. Specific submessage receive functions will then use this data to update the reply locator when it is still empty.

The reply locator is kept in the proxy context. Note that an InfoReply has precedence, so that the InfoReply data will be used in preference to the source locator in the reply. If the reply locator is not set in the proxy, we send on all participant locators (meta or user). Whenever we detect that there is a communication problem, i.e. no reply after N HeartBeat transmissions, the reply locator is cleared.

This mechanism might still be useful for Best Effort connections, albeit somewhat less safe, by optionally registering either the SPDP or the Builtin Participant Message topics source locator IP addresses, converting the meta port numbers to user port numbers. Of course, strictly speaking, this shouldn't be done, since in theory there is no clean relationship between meta and user locators.
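
In condensed form, the reply-locator handling described above looks roughly like this (simplified types and names for illustration, not the actual Tinq sources):

```cpp
#include <cstdint>
#include <optional>

// Simplified illustration of the reply-locator handling described above;
// the real Tinq structures and functions are more elaborate.
struct Locator { uint32_t ip; uint16_t port; };

struct Proxy {
    std::optional<Locator> info_reply;    // set when an InfoReply was seen
    std::optional<Locator> reply_locator; // learned from the packet source
    unsigned silent_heartbeats = 0;       // HBs sent without any reply
};

// The transport layer passes the source locator with every received
// message; store it in the proxy while the reply locator is still empty.
void on_receive(Proxy &proxy, const Locator &source) {
    if (!proxy.reply_locator)
        proxy.reply_locator = source;
    proxy.silent_heartbeats = 0;          // the peer is responsive again
}

// Pick where to reply: InfoReply data has precedence over the learned
// source locator; if neither is set, send on all participant locators.
std::optional<Locator> pick_reply_locator(const Proxy &proxy) {
    return proxy.info_reply ? proxy.info_reply : proxy.reply_locator;
}

// After N HeartBeats without a reply, flush the learned locator so that
// all announced locators are tried again on the next transmission.
void on_heartbeat_sent(Proxy &proxy, unsigned max_silent) {
    if (++proxy.silent_heartbeats >= max_silent)
        proxy.reply_locator.reset();
}
```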

Regarding OMG membership, this is no longer an option, I'm afraid. Company politics have decided to go the Allseen/AllJoyn road for future consumer IoT strategies and are no longer interested in DDS based solutions.

There are various reasons why this happened, the main one being the desire to be part of a bigger group of companies, especially because that group is backed by companies like Qualcomm and Microsoft. We were almost alone in promoting a DDS-based IoT solution for consumer devices and didn't have enough backing for it, even though many people still think that we have/had a superior solution.

I wouldn't mind helping out personally, of course, but it would, by necessity, have to be outside of the scope of the OMG :-)


jvoe commented Nov 26, 2014

@ClarkTucker Thanks for your input regarding the CDR encapsulation offset. I guess that means that all implementations have the same behavior now ... :-)

I'll add a loopback receive socket just as CoreDX DDS has (as explained in a previous post) for interop with RTI.

@vmayoral I'll let you know when I have something ready ..


jvoe commented Dec 2, 2014

@vmayoral Using this loopback socket mechanism is not so simple to do, to be honest. I managed to get something working on a device without IP addresses configured (all interfaces disabled) when only the multicast destination addresses are used, but this is clearly not a nice solution.

Learning the reply locators doesn't seem to work in this case -- the combination of using send_udp locators for sending and separate receive locators for receiving, as used on the embedded board, currently precludes learning a correct source port. The send_udp locator uses a random source port (since no bind() is done, as it is used for any destination, user or meta), which can't be correlated to any proper participant locator, since there are none!

The alternative, i.e. using the receive locators as sending sockets directly, would lead to issues on NuttX, as the sockets can't be bidirectional there because they are used in different threads.

Another alternative, which I haven't explored fully, would be to have two sending UDPv4 locators (and two for UDPv6) per domain, one for user data and one for meta data. This might work by binding them to the correct source port, but it requires a lot more work to set things up, and I'm not sure whether it would really work. Since there would then be two locators for the same port, i.e. one with a wildcard IP on the sending UDP socket and one bound completely on the receiving UDP socket, this will clearly cause issues. Whether and how this would work is OS-specific. So I'm not eager to go in this direction, unless we abandon separate send/receive locators altogether, which would clearly be an issue for you.
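
For reference, the "bound sending locator" idea is essentially a bind() to a fixed source port before sendto(); a rough sketch, with the same caveat as above about clashing with a fully bound receive socket on the same port:

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// Create a UDP sending socket with a fixed source port (e.g. the announced
// metatraffic or user unicast port) instead of an ephemeral one, so that
// receivers can correlate the packet source with a known locator.
// Whether this coexists with a fully bound receive socket on the same
// port is OS-specific, as noted above.
int open_bound_send_socket(uint16_t source_port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    sockaddr_in src;
    std::memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_port = htons(source_port);        // fixed, announced source port
    src.sin_addr.s_addr = htonl(INADDR_ANY);  // wildcard IP, any interface

    if (bind(fd, reinterpret_cast<sockaddr *>(&src), sizeof(src)) < 0) {
        close(fd);
        return -1;
    }
    return fd;  // use with sendto() for any destination, user or meta
}
```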

The two requirements thus seem to be mutually exclusive: separate sockets for send/receive (a NuttX select() limitation) and multiple DDS instances on the same host without IP addresses assigned.

Once a valid IP address is assigned, the correlation can be done, of course. If this is good enough for you, I could send you a patch.
