Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to detect device not sending anymore #5

Open
guihomework opened this issue Nov 8, 2022 · 3 comments
Open

How to detect device not sending anymore #5

guihomework opened this issue Nov 8, 2022 · 3 comments

Comments

@guihomework
Copy link
Collaborator

Devices running a FW (teensy or PIC) using the SerialProtocol (SP) FW library can sometimes lock-up (*), and not send data anymore although they were streaming fine in the first place.

Such case is currently handled in SW agni_serial_protocol with a timeout and a message "sp: nothing to read" repeated as long as the driver is running.

start streaming
streaming (press ctrl-c to quit)
(... streamed data for a while)
sp: nothing to read
sp: nothing to read
(...)
sp: nothing to read

For higher-level systems, detecting the device is in a dead state, is not immediate (maybe detecting no data on the awaited topics), and only a driver restart would throw an exception and confirm the device is dead due to not initializing well anymore.

initializing serial protocol
 sp: nothing to read (did you start streaming ?)
sp: header 0x0 0x0
sp: data 0x0 0x0 0x0 0x0 0x0
sp: init failed:sp: nothing to read (did you start streaming ?)
initialization failed
disconnecting
disconnected

With this issue, I raise the question, should one have a "status" message for the state of the sp_to_ros node, instead of just printing to console. Maybe a diagnostic message would be even cleaner as this is used throughout ROS to monitor nodes and see if they are healthy. The other difference between a basic status and diagnostic message is also that a diagnostic message is sent regularly (to prove the node is alive I suppose).
The higher level system could react to errors in the status or diagnostic messages without monitoring other topics (whose names could be as various as they are devices)

There are also no service available in agni_serial_protocol that directly requeries the device except changing the period of a logical sensor. Maybe a is_alive service doing a ping awaiting the ping response of SP could be an alternative to restarting the driver to test the device.

@rhaschke what do you think ?

(*) device that currently locks-up is the fingernail mock-up device if nail is pressed, and sending a period change request that goes to the device apparently is also not working (says wrong header) even when the device still streams. Probably a too old FW SP lib inside.

@rhaschke
Copy link
Member

rhaschke commented Nov 8, 2022

Introducing a status topic is a good idea. However, I suggest limiting messages to changes of the status. If the topic is latched, any subscriber will get the latest status anyway. I suggest an enum with all relevant states.
I don't see the need to frequently send diagnostic messages - they won't change most of the time, right?
In any case, warnings/errors should be printed to the console as well.

An is_alive service in the firmware is not required in my opinion. The driver sees that device is streaming (or not), doesn't it?

@guihomework
Copy link
Collaborator Author

Right, that make sense to have a latched status topic only. I hesitate to create a specific status message with enums, because that means any tool that wants to read it, then depends on the build of agni_serial_protocol and its messages.
What about using a diagnostic_msgs/DiagnosticStatus that already has 4 states and can hold more info if required about the state if required.

byte OK=0
byte WARN=1
byte ERROR=2
byte STALE=3
byte level
string name
string message
string hardware_id
diagnostic_msgs/KeyValue[] values
  string key
  string value

the agni_serial_protocol does not have so many states in fact, and I believe they all fit in the existing ones.

  • OK = everything runs and data streams fine
  • WARN = some errors occurred at init or during streaming (for instance lost data) that do not hinder the tool completely from continuing its work at least partially.
    Maybe one should then also create an acknowledge service for the user/higher-level app to notify the warning has been seen and processed, so that the state becomes OK if the warning is not repeated.
    We have for instance the checksum bug warnings, that are totally fine, and we have some "failed to find IMU" one time message that could be cleared.
    However we have repeated "IMU not initialized" that repeat until the IMU has moved enough to be initialized. This type of message could be cleaned but would come again. One has to wait for an OK state to come automatically
  • ERROR = errors that hinders the driver to run. But this will probably not be accessible as the driver throws and dies when such error occurs, so the topic would also disappear. Maybe a last error message before the node destroys is still fine for higher-level app monitoring the status soon enough.
  • STALE = there is no response from the device anymore, this is the most interesting case for my usage in fact.

yes the is_alive was supposed to be a user request, to make a specific query to the device through the driver. If the device is not streaming, the driver does not (yet) regularly ping to see if the device is still responding. With the introduction of a status topic, the driver will have to figure it out regularly, and in case not streaming, then ping the device itself to see if it is still alive. So no service required but an additional internal alive check.

@guihomework
Copy link
Collaborator Author

I believe the discussion on PR #6 show that a full rework of all the throwing system is required before being able to identify the correct state of the driver at top level. I won't be able to tackle this project. You can close this topic here. I will work on my fork with a minimalist solution not suitable for upstream

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants