Fetching logs crashes edge agent #7074
Comments
Hi @WDoughty - Can you share some additional info so we can potentially repro and better understand the context of the errors you're seeing? For example, please add the following info from our bug template if you can:
Runtime Versions
Note: when using Windows containers on Windows, run
Logs
aziot-edged logs
edge-agent logs
edge-hub logs
Additional Information
Please provide any additional information that may be helpful in understanding the issue. |
@nlcamp I don't have direct access to the device in question, so I will list what I know. If the device is used more today, I will try to reproduce the issue and get more data.
Runtime Versions
Logs
aziot-edged logs
edge-agent logs
edge-hub logs
|
@nlcamp any ideas? |
@WDoughty - I'm not able to repro this error. I'm using aziot-edged 1.4.16, edgeHub 1.4.18, and edgeAgent 1.4.18. When I deploy a SimulatedTemperatureSensor and then view logs from Azure Portal, I see the following output when tailing the edgeAgent logs via
At the same time, I see the following output from tailing the aziot-edged logs via
Can you answer a few more questions to help us diagnose the cause:
|
Unfortunately, I can't run any commands on the device because it is not in my possession but it seems like the error only occurs when it has spotty internet connectivity, which is often. We have also been able to request the logs at other times than the day this happened and the agent didn't crash. |
@WDoughty - Is there anyone on your team who has remote access to the device (e.g. via SSH)? I'm unable to reproduce the error so in order to efficiently troubleshoot it at this point I'll need you to run some commands on the device to gather more info. |
@nlcamp We are going to send someone out to the device today. Which commands would you like us to run? Just the ones mentioned above? |
@WDoughty - Yes, please run the commands from above plus the following which will generate a full support bundle:
Please then post the output from the commands I mentioned in my earlier comment plus the check.json output from the support bundle. Note: please redact any sensitive info from the logs and/or check.json output prior to posting in this public forum. Please save the support-bundle -- I may request additional items from it later. Thanks. |
@nlcamp Finally got some of the logs. I will post the necessary parts here.
aziot-edged [run iotedge version]:
Logs
aziot-edged logs
edge-agent logs
docker logs command
check.json
|
@WDoughty - Thanks for the additional logs. I've been trying unsuccessfully to repro the issue on my end by initiating streaming logs while my test device is losing network connectivity. However, I'm not seeing the edgeAgent crash you're seeing. I may need to write some lower-level tests so I can better control the order of events to achieve a repro. I suspect there may be a bug on our end in the handling of IOExceptions during the
In the meanwhile, can you please check the edgeAgent logs for the exact version you are running? From your earlier response, I understand you're probably specifying 1.4 in your deployment manifest, but I'd like to see which image is being pulled to make sure it's the latest (i.e. 1.4.18). If you have access to the device you can run the following command and share the output:
If you don't have access to the device, you can stream the logs from portal (at a time when your connection is strong enough) and search for a line like the following:
|
@nlcamp We ended up running
As of now, the device does not have an internet connection, if that helps at all. |
@WDoughty - When you run into this issue, do you see any warning messages and notifications in Portal? Could you share a screen capture of what you see? Here's an example: We're expecting that you'll see an "Unexpected end of stream" error message somewhere there, but need to confirm. |
@nlcamp From what I remember, there were no warnings or notifications, and the only thing that popped up was something like "<Unexpected end of stream>" where the logs should have been. Unfortunately, the device in question isn't online right now. I do remember that we were able to pull the edgeAgent logs at the time and could see that error in them, but for any other custom module the "Unexpected end of stream" message appeared in place of the logs themselves. |
@WDoughty - Ok, what you remember is what I would expect. The part that I can't yet explain is why edgeAgent is terminated after the IO Exception is reported:
The error you see when running
If so, could you try to recapture logs (and a full support bundle) at a time when the device is back online and you're seeing (or recently saw) the issue? You can add
Also, can you upgrade aziot-edge, aziot-identity-service, moby-cli, moby-engine, and the edgeHub/edgeAgent images to the latest releases so we can see if the issue may have resolved itself in some of the changes to our code (and dependencies) since the 1.4.10 version you're currently using?
In the meantime, I'll take a look at the part of our edge daemon code that interacts with the Docker Engine to see if anything stands out as a potential root cause. |
@nlcamp I have the same problem as WDoughty, but with access to the device. I noticed that this error occurs when I disconnect the device from the power supply and restart it. After that, I can only get rid of this "Log" error by redeploying all modules. |
@nlcamp I'm actually seeing this issue on a separate device right now. I have 6 custom modules, and only one of them is unable to fetch its logs. |
@huf-92 - Thanks for providing the repro steps that work for you. I'll give them a try on one of my test devices. In the meantime, could you share the aziot-edged logs from a time when you're seeing the issue? @WDoughty - Thanks for the screen shots. Do you have SSH access to the new device that's showing this error? If so, could you also provide the aziot-edged logs from a time when you're seeing the error? |
This issue is being marked as stale because it has been open for 30 days with no activity. |
@huf-92 / @WDoughty May I ask what your log configuration looks like? Is it similar to or different from this one (https://learn.microsoft.com/en-us/azure/iot-edge/production-checklist?view=iotedge-1.4#set-up-default-logging-driver)? |
It was the config that has the size set. |
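For anyone comparing their own setup, here is a minimal sketch of how to summarize the logging section of a Docker `daemon.json`. The keys `log-driver`, `log-opts`, `max-size`, and `max-file` are standard Docker daemon settings, and `json-file` is Docker's default driver when none is set. The helper name `summarize_logging` and the example values are assumptions for illustration only; they are not the configuration from the device in this issue.

```python
import json


def summarize_logging(daemon_json_text: str) -> dict:
    """Return the log driver and size limits found in a daemon.json document.

    Hypothetical helper for illustration. An empty document is treated as
    "no overrides", in which case Docker falls back to the json-file driver.
    """
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    log_opts = config.get("log-opts", {})
    return {
        "driver": config.get("log-driver", "json-file"),
        "max_size": log_opts.get("max-size"),
        "max_file": log_opts.get("max-file"),
    }


# Illustrative values only (not taken from the reporter's device).
example = '{"log-driver": "local", "log-opts": {"max-size": "10m", "max-file": "3"}}'
print(summarize_logging(example))
```

On a real device the text would typically come from `/etc/docker/daemon.json`; knowing which driver is active matters here because each driver stores container logs in a different on-disk format.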
@WDoughty IoT Edge uses Docker to get module logs, and that is where it runs into the issue. Other folks seem to run into the issue in various environments (AWS and VMs) as well: moby/moby#46699. @cpuguy83 Do you have thoughts on the issue? |
Is the error log from docker being printed anywhere? |
@cpuguy83 looks like the error is in the edgeAgent "Error grabbing logs: error unmarshalling log entry (size=15370): proto: LogEntry: illegal tag 0 (wire type 6)" |
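As background on the "illegal tag 0" part of that message: in the protobuf wire format, every field starts with a varint key equal to `(field_number << 3) | wire_type`, and field number 0 is never valid. The sketch below decodes such a key in Python. It is only an illustration of the wire format, not the Go code that Docker actually runs, and the function names are made up for this example.

```python
from typing import Tuple


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:              # high bit clear marks the last byte
            return value, pos
        shift += 7
        if shift >= 64:
            raise ValueError("varint too long")


def describe_first_key(buf: bytes) -> str:
    """Split the first field key into field number and wire type (hypothetical helper)."""
    key, _ = decode_varint(buf)
    field_number, wire_type = key >> 3, key & 0x7
    if field_number == 0:
        return f"illegal field number 0 (wire type {wire_type})"
    return f"field {field_number}, wire type {wire_type}"


print(describe_first_key(b"\x0a"))  # a well-formed key: field 1, length-delimited
print(describe_first_key(b"\x06"))  # a key whose field number is 0
```

A decoder that lands on zeroed or otherwise garbage bytes where it expects a message will read a key like this and reject it, which is consistent with the log file itself being damaged rather than the reader being wrong.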
@cpuguy83 - Here are the docker logs from the partial repro I mentioned earlier in this thread:
|
Thanks @nlcamp. So far I'm unable to reproduce this issue, which is not to say it's not a problem or environmental. Given the panic in your logs, I've also gone through the code a bunch to try to hunt down what could be causing it, and I haven't come up with anything yet. If we have the actual container log content (from |
@cpuguy83 - I was able to repro again on a pi with the latest version of docker/moby (24.0.9-1 -- see full version printout in the console output copied below). Here is the output of
|
@cpuguy83 Have you been able to look at the logs provided by Noel? |
I've spent a ton of time trying to reproduce the corruption which I've been unable to do. It doesn't look like this change ever got backported to moby 20.10, which some of the reports seem to have. |
@WDoughty - I noticed that you're using an old version of moby-cli from before the fix that @cpuguy83 referenced in his above comment. Could you upgrade to the latest moby-cli and moby-engine, stop iotedge, remove any existing containers so that the old logs will be deleted, and restart iotedge?
While there may be a remaining moby issue related to log corruption when power is removed on Raspberry Pis, it's possible your issue is different and related to the bug that has been fixed. Please let us know if you still see the issue after upgrading moby and clearing the logs with the steps listed above. |
@nlcamp Thanks for your help. I am not actually on a project using IoT Edge anymore. |
I think I have an idea of what's happening here.
On 2, what appears to be happening is on power loss the metadata for a given write is in the journal but the data for it is not, leading to these null bytes. |
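To check whether a given container log file shows this pattern, a rough sketch like the following could scan it for long runs of null bytes. The function name `find_nul_runs` and the 16-byte threshold are assumptions chosen for illustration; which file to inspect depends on the logging driver configured on the device.

```python
import re
from typing import List, Tuple


def find_nul_runs(data: bytes, min_len: int = 16) -> List[Tuple[int, int]]:
    """Return (offset, length) for each run of at least min_len NUL bytes.

    Hypothetical helper: short runs are ignored because binary log formats
    can legitimately contain a few zero bytes (for example in length prefixes).
    """
    pattern = re.compile(rb"\x00{" + str(min_len).encode() + rb",}")
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]


# Synthetic example: a 20-byte hole between two records, then a short run
# that stays below the threshold.
sample = b"record-1" + b"\x00" * 20 + b"record-2" + b"\x00" * 3
print(find_nul_runs(sample))
```

For a real file, the bytes would be read with `open(path, "rb").read()`, where `path` is whichever container log file is under suspicion. A long run reported in the middle of the file would be consistent with the journal scenario described above, though it would not prove it.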
Expected Behavior
Viewing logs via the Azure portal should show the logs.
Logs
Device Information