Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tcp reassembly / sip-stream following buggy, misses messages #168

Closed
xhantu opened this issue May 14, 2020 · 7 comments
Closed

tcp reassembly / sip-stream following buggy, misses messages #168

xhantu opened this issue May 14, 2020 · 7 comments

Comments

@xhantu
Copy link
Contributor

xhantu commented May 14, 2020

Hi,

I noticed that captures of sip over tcp will sometimes miss messages, even with tcp assembly switched on.
When looking through the sources I noticed that the sip extraction from the tcp stream seems to be extremely simplified and relying on some assumptions that are not always true. I see the following problems:

If a sip/http/websocket message is detected, all data is consumed, not just the first message.
This assumes that the collected data contains exactly one sip message, nothing more.

A sip message is only detected if the collected data contains exactly one sip message.
This is only the case if a large enough pause between sending messages is used, so that reading from the stream will return after a message.

Both together can be problematic if the sender combines parts of multiple (provisional) messages into one packet or the goroutine scheduling does not schedule the stream reader after the message was finished and before the new one starts.
If the stream reading only once misses the border between messages, it will never detect a message again, but will collect everything in its collection buffer until the stream is closed.

Reading sip, http and websocket is mixed together, but changing between protocols in a stream is limited.

I propose the following changes:

  • check if collected data at least contains one sip/http/websocket message, not exactly one.
  • extract only the detected message from the buffer and retain the remainder for the next message.
  • if the start of the buffer does not look like the start of a sip/http/websocket message, skip to the next point that looks like the start of a sip message. This will resync if part of the stream is skipped for any reason.
  • remember what protocol is used. If it starts with sip it will always be sip. If it starts with http it may upgrade to websocket. If it is a websocket it will always be websocket.
  • skip over data that will be ignore anyway, e.g. http content when waiting for upgrade to websocket. For http only the headers are useful to determine where to find the next message start. It may be needed to follow http chunked encoding.
  • limit the amount of data that is collected. If unreasonable amounts are collected than drop some data and try to resync.
  • handle that capturing may start in the middle of a tcp stream, e.g. if heplify starts. If this is the case can be checked by examining the syn flag of the first packet of the captured stream.

I had started a try to do this with a state machine, but have currently no time to pursue this further.

@lmangani
Copy link
Member

Are you using af_packet or pcap sockets?

@xhantu
Copy link
Contributor Author

xhantu commented May 15, 2020

Currently af_packet is used.

@negbie
Copy link
Member

negbie commented May 15, 2020

@xhantu you have good eyes and a sharp mind ;) All of your points are correct! There are a LOT of shortcomings in the current implementation. I very quickly made it without heavy testing and there is even a race condition inside it currently. Besides that there are still some known issues inside the gopacket lib for tcpassembly aswell. It would take too long to explain them in depth but you can check it under the gopacket repo.
Your suggested changes would help and would be a good starting point but it would still need more work to detect some corner cases where some client's don't behave correctly.

If I will find some time I will check this but I don't think that it will be this year ;)

@xhantu
Copy link
Contributor Author

xhantu commented May 15, 2020

@negbie thanks for the praise :)
If you know some corner cases that should be considered feel free to describe them in this issue.
Maybe I will find some time too fix at least some of the problems. Working tcp capturing is one of the things I would like to have around end of the year. I do not need SIP over websockets, but as somebody probably uses that feature I would not want to break it more than it is already.

@negbie
Copy link
Member

negbie commented May 15, 2020

One corner case is that you need to make sure that RST packets close the stream. You can fix this by calling FlushAll more often but this is error prone and could create race conditions depending on the implementation. Check gopacket issue 425. Another corner case is around selective ACKs. Check gopacket issue 81. Some corner cases are around malfromed SIP messages itself. It's questionable to handle them but as you might know in SIP everything is possible :D

@negbie
Copy link
Member

negbie commented May 18, 2020

@xhantu I added PACKET_FANOUT_HASH with the DEFRAG flag to af_packet. Instead of David here: david415/HoneyBadger#64 I didn't do it because of performance. Performance with -fg and -fw set should be even a bit worse but it should make your life easier to deal with TCP reassembly.

With this each TCP flow is enqueued onto only one socket in the group. You can specify the count with the -fw flag and the group ID with the -fg flag.

The DEFRAG options is needed because when a TCP/UDP packet gets fragmented, only the first packet has the 'original' header in it. This breaks a 5-tuple hashing function, assuming the IP reassembly happens after the hashing. The reassembly is happening within the context of the decode thread, so the subsequent reassembled packets are going to go to a different thread than the first one and incrementing this counter.

There are some af_packet gotchas you should know about:
http://forums.trisul.org/d/27-af-packet-gotchas

@lmangani
Copy link
Member

Closing. Please reopen if anyone still are experiencing this issue with the latest available versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants