FreeBSD exhibits similar behavior to OSX for socket timeouts #14

notsonic · 2022-11-13T20:23:11Z

I just set up the tools as a web server in a jail on my TrueNAS server, which runs FreeBSD.

My cabinets were rebooting rapidly whenever I tried to send games. I found PR #10, swapped out "darwin" for "freebsd12", and all is well. I see there's also an env var to trigger the same behavior.
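For anyone landing here later: the thread doesn't show the exact check from PR #10, so this is only a sketch of how a check covering both platforms might look. It relies on `sys.platform` being `"darwin"` on macOS and `"freebsd"` plus the major version (e.g. `"freebsd12"`) on FreeBSD, which is why a prefix match is sturdier than comparing against `"freebsd12"` exactly. The function name and the `NETDIMM_FORCE_BLOCKING` environment variable are hypothetical placeholders; the real env var isn't named here.

```python
import os
import sys

# Hypothetical name: the real environment variable is not named in this thread.
FORCE_BLOCKING_ENV = "NETDIMM_FORCE_BLOCKING"


def use_blocking_sockets(platform: str = sys.platform, environ=os.environ) -> bool:
    """Return True on platforms where socket timeouts reportedly misbehave.

    sys.platform is "darwin" on macOS and "freebsdNN" on FreeBSD
    (e.g. "freebsd12", "freebsd13"), so a prefix match also covers
    future FreeBSD releases instead of pinning one version.
    """
    # An explicit override always wins.
    if environ.get(FORCE_BLOCKING_ENV):
        return True
    return platform.startswith(("darwin", "freebsd"))
```

A caller could then do something like `sock.settimeout(None if use_blocking_sockets() else 10.0)`, where `None` means plain blocking mode.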

Raising an issue may not be entirely necessary since I don't really have a problem, but I figured it might be worthwhile to have this in the repo's history in case someone else comes across it.

DragonMinded · 2022-11-14T16:34:07Z

Maybe we should just make that check match either platform? It's kind of a pain, because I want it to quickly figure out when a game has been turned off remotely, which means you need a timeout, but BSD/Darwin breaks that. We could also move to a dedicated thread with no timeout and a monitor that kills it when there hasn't been any progress for a while? Dunno. The original script I upgraded worked "better" because it didn't try to be fully in control of the process, but then you lose the ability to treat the game like a kiosk and ensure it is running.
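A rough sketch of the "dedicated thread plus monitor" idea, assuming the transfer code can report progress after each successful `send()`/`recv()`. `StallWatchdog` and `receive_until_closed` are hypothetical names, not anything in netdimm. The socket stays in plain blocking mode (no `settimeout()`), and a separate thread calls `shutdown()` on it once no bytes have moved for `stall_seconds`, which makes the blocked call return so the state machine can move on.

```python
import socket
import threading
import time


class StallWatchdog:
    """Hypothetical helper: if no progress is reported for `stall_seconds`,
    shut the socket down so a blocking send()/recv() on it returns."""

    def __init__(self, sock: socket.socket, stall_seconds: float) -> None:
        self._sock = sock
        self._stall_seconds = stall_seconds
        self._last_progress = time.monotonic()
        self._lock = threading.Lock()
        self._stopped = threading.Event()
        self.tripped = False
        self._thread = threading.Thread(target=self._watch, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def progress(self) -> None:
        """Call from the transfer thread whenever bytes move."""
        with self._lock:
            self._last_progress = time.monotonic()

    def stop(self) -> None:
        self._stopped.set()
        self._thread.join()

    def _watch(self) -> None:
        # Check a few times per window; Event.wait() returns True early on stop().
        while not self._stopped.wait(self._stall_seconds / 4):
            with self._lock:
                idle = time.monotonic() - self._last_progress
            if idle > self._stall_seconds:
                self.tripped = True
                try:
                    # shutdown() wakes a thread blocked in recv()/send() on this
                    # socket; close() from another thread is not guaranteed to.
                    self._sock.shutdown(socket.SHUT_RDWR)
                except OSError:
                    pass  # socket already closed or disconnected
                return


def receive_until_closed(sock: socket.socket, watchdog: StallWatchdog) -> bytes:
    """Blocking receive loop with no socket timeout; the watchdog ends it on a stall."""
    chunks = []
    while True:
        try:
            data = sock.recv(4096)
        except OSError:
            break
        if not data:  # peer closed, or the watchdog shut the socket down
            break
        watchdog.progress()
        chunks.append(data)
    return b"".join(chunks)
```

The cost is an extra thread per cabinet, but the send/recv path never relies on a socket timeout, which may sidestep the Darwin/BSD timeout behavior. Whether `shutdown()` reliably unblocks a stuck `send()` on every OS is an assumption worth testing on FreeBSD.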

notsonic · 2022-11-14T17:00:02Z

I don't fully understand the implications of the different timeout code paths, to be honest.
The behavior for my cabinets seems fine. If I turn them on, the game boots. If I change games while one is running, the cabinet receives the new game. The cabinet status is accurately reflected the whole time (maybe with a bit of delay). Is there some behavior lost without the fast timeout?

DragonMinded · 2022-11-14T23:20:20Z

The idea behind setting a timeout was to detect a stalled connection caused by a device going offline mid-send. On some systems, sockets hang forever in that state, which means control never returns from the send or recv call. I could experiment with dropping the timeout altogether (as the old system did) and see whether it still behaves correctly, at least on Linux. I think that might fix things across the board, but it might also have the side effect of getting the state machine stuck.
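A minimal sketch of that intent, assuming the image is pushed over a plain TCP socket (the helper name and chunk size are illustrative, not taken from netdimm). One Python detail matters here: since 3.5, the timeout on a single `sendall()` call is the total time allowed for the whole call, not an idle timeout, so sending in chunks turns it into a per-chunk stall detector.

```python
import socket


def send_with_stall_detection(sock: socket.socket, payload: bytes,
                              chunk_size: int = 8192, timeout: float = 10.0) -> bool:
    """Send payload in chunks; return False if the peer stops accepting data.

    Hypothetical helper, not netdimm's code. The 10-second default follows
    the value mentioned for Naomi in this thread.
    """
    sock.settimeout(timeout)
    view = memoryview(payload)
    try:
        for start in range(0, len(view), chunk_size):
            # Each sendall() gets its own deadline, so the timeout acts as an
            # inactivity limit per chunk rather than a limit on the whole image.
            sock.sendall(view[start:start + chunk_size])
    except socket.timeout:
        # No progress within `timeout` seconds: treat the cabinet as gone.
        return False
    return True
```

On a `False` return, the caller would close the socket and go back to waiting for the cabinet instead of sitting at a frozen percentage.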

notsonic · 2022-11-15T00:12:26Z

I turned one of my cabinets off while it was loading (again, I'm using the server setup) and saw the status hang at the same percentage in the web interface. After I turned it back on, it restarted from 0% and seems to have transferred the game successfully. I don't know if this means there's a dangling thread from the previous boot.

Is the 1 or 10 second timeout maybe just too aggressive? I'll try out some different values and report back.

DragonMinded · 2022-11-15T00:18:19Z

That's exactly the issue that the timeout was attempting to fix. I didn't want it hung forever (basically until the next time the cab was powered) sitting at the hung percentage. I wanted the state machine to be able to go back to "waiting for cabinet power".

A 1 second timeout is FAR FAR too aggressive. Are you netbooting a Chihiro/Triforce? Try a larger timeout. 10 seconds seems fine for Naomi.

notsonic · 2022-11-15T00:48:22Z

Hey, sorry, it's a Naomi. The 1 second timeout I was referring to is this one here: https://github.com/DragonMinded/netboot/blob/trunk/netdimm/netdimm.py#L341

I've been messing with these 3 lines of code, but I haven't really used the sockets library before. Could you explain them (lines 341-343)?

Changing the timeout values doesn't seem to do anything. It really seems like the major difference is setting it to blocking.

I noticed in the docs that settimeout changed in 3.7, and I'm on 3.9. Is this relevant?

Changed in version 3.7: The method no longer toggles [SOCK_NONBLOCK](https://docs.python.org/3/library/socket.html#socket.SOCK_NONBLOCK) flag on [socket.type](https://docs.python.org/3/library/socket.html#socket.socket.type).
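A quick way to see how these calls relate on 3.7+ (nothing netdimm-specific, just the standard `socket` API). `setblocking(False)` is the same as `settimeout(0.0)`, and `setblocking(True)` the same as `settimeout(None)`. A positive timeout still reports `getblocking() == True`, even though the docs note that at the OS level CPython puts sockets in timeout mode into non-blocking mode and handles the waiting itself. Per the note quoted above, `sock.type` stays `SOCK_STREAM` throughout.

```python
import socket


def describe(sock: socket.socket) -> tuple:
    """Return (gettimeout(), getblocking(), type is still SOCK_STREAM)."""
    return (sock.gettimeout(), sock.getblocking(), sock.type == socket.SOCK_STREAM)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print(describe(s))  # (None, True, True): default, fully blocking
    s.settimeout(10.0)
    print(describe(s))  # (10.0, True, True): "timeout mode" at the Python level
    s.setblocking(False)
    print(describe(s))  # (0.0, False, True): same as settimeout(0.0)
    s.setblocking(True)
    print(describe(s))  # (None, True, True): same as settimeout(None)
```

The expected outputs assume nothing has called `socket.setdefaulttimeout()` earlier in the process.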

DragonMinded · 2022-11-15T01:19:25Z

Oh, good catch, that would definitely screw things up. Setting the timeout used to carry blocking implications as well. Hmm, ugh. I really don't know. It's basically impossible to test all permutations of Linux/OSX/BSD with Naomi/Triforce/Chihiro, especially given I don't have any Chihiros, Triforces, or native BSD devices.

notsonic · 2022-11-15T01:36:06Z

If it were me, I just wouldn't support BSD lol. I'm only using it because I already had the TrueNAS server running. I could just run this in a linux vm instead of a jail.

I assume the difference comes down to how the native socket implementations differ between Linux and BSD; they must have different defaults or something. I tried using socket.setsockopt to set the SO_RCVTIMEO and SO_SNDTIMEO timeouts (socket.settimeout apparently exists only in the Python layer), and that didn't seem to do anything.
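One possible reason `setsockopt` appeared to do nothing: if `settimeout()` is also in effect, CPython keeps the descriptor non-blocking and does its own waiting, so the kernel's `SO_RCVTIMEO` never comes into play. That's a guess, not something verified on FreeBSD here. Below is a sketch of using the kernel option directly with the socket left in plain blocking mode; the helper names are hypothetical, and the `struct timeval` layout is an assumption noted in the code.

```python
import errno
import socket
import struct


def set_kernel_recv_timeout(sock: socket.socket, seconds: float) -> None:
    """Set SO_RCVTIMEO on the OS-level socket.

    Assumes struct timeval is two native C longs, which holds on 64-bit
    Linux and FreeBSD/amd64; other ABIs may need a different layout.
    """
    # A Python-level timeout puts the descriptor in non-blocking mode, where
    # SO_RCVTIMEO never comes into play, so switch to plain blocking first.
    sock.settimeout(None)
    whole, micros = divmod(int(round(seconds * 1_000_000)), 1_000_000)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO,
                    struct.pack("ll", whole, micros))


def recv_or_none(sock: socket.socket, nbytes: int):
    """recv() that returns None when the kernel receive timeout expires."""
    try:
        return sock.recv(nbytes)
    except OSError as exc:
        # An expired SO_RCVTIMEO surfaces as EAGAIN/EWOULDBLOCK.
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return None
        raise
```

The same approach would apply to `SO_SNDTIMEO` for the send side.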

I did notice one bad behavior with the blocking sockets: the server won't come up if one of the cabinets is already on.

I wonder if there's some magic in socket.create_connection that socket.connect might be missing.
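For reference, `socket.create_connection()` mostly adds name resolution and retry logic around `connect()`. The sketch below is a simplified re-implementation for illustration, not the stdlib's actual code (it omits `source_address` and the module's default-timeout sentinel): resolve with `getaddrinfo()`, try each returned address in order, and apply the timeout before calling `connect()` so the connect itself can time out.

```python
import socket


def connect_like_create_connection(host, port, timeout=None):
    """Simplified sketch of socket.create_connection(); not the stdlib code."""
    last_error = None
    # getaddrinfo() may return several addresses (e.g. IPv6 and IPv4).
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
            host, port, 0, socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        try:
            if timeout is not None:
                # Set before connect() so a dead host can't hang the connect.
                sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            last_error = exc
            sock.close()  # try the next address
    if last_error is not None:
        raise last_error
    raise OSError("getaddrinfo returned no addresses")
```

With a bare `socket.socket()` plus `connect()`, the timeout only covers the connect if `settimeout()` is called first, which is the main behavioral difference worth checking.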
