Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetching logs crashes edge agent #7074

Closed
WDoughty opened this issue Aug 3, 2023 · 34 comments
Closed

Fetching logs crashes edge agent #7074

WDoughty opened this issue Aug 3, 2023 · 34 comments

Comments

@WDoughty
Copy link

WDoughty commented Aug 3, 2023

Expected Behavior

Viewing logs via azure portal should show logs

Logs

<6> 2023-08-03 17:09:20.328 +00:00 [INF] - Received request ping with payload
<6> 2023-08-03 17:09:20.337 +00:00 [INF] - Successfully handled request ping
<6> 2023-08-03 17:09:20.656 +00:00 [INF] - Received direct method call - GetModuleLogs
<6> 2023-08-03 17:09:20.656 +00:00 [INF] - Received request GetModuleLogs with payload
<6> 2023-08-03 17:09:20.731 +00:00 [INF] - Processing request to get logs for {"schemaVersion":"1.0","items":{"id":"\\btcmservice\\b","filter":{"tail":1500,"since":"15m","until":null,"loglevel":null,"regex":""}},"encoding":1,"contentType":1}
<6> 2023-08-03 17:09:20.845 +00:00 [INF] - Initiating streaming logs for tcmservice
<4> 2023-08-03 17:09:21.406 +00:00 [WRN] - Error handling request GetModuleLogs
System.IO.IOException: Unexpected end of stream.
   at Microsoft.Azure.Devices.Edge.Util.Uds.HttpBufferedStream.ReadLine() in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/uds/HttpBufferedStream.cs:line 91
   at Microsoft.Azure.Devices.Edge.Util.Uds.HttpChunkedStreamReader.Read(Byte[] buffer, Int32 offset, Int32 count) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/uds/HttpChunkedStreamReader.cs:line 87
   at System.IO.DelegatingStream.Read(Byte[] buffer, Int32 offset, Int32 count)
   at Akka.Streams.Implementation.IO.InputStreamPublisher.ReadAndEmit()
--- End of stack trace from previous location ---
   at Microsoft.Azure.Devices.Edge.Agent.Core.Logs.LogsProcessor.GetText(String id, Stream stream, ModuleLogFilter filter) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/logs/LogsProcessor.cs:line 93
   at Microsoft.Azure.Devices.Edge.Agent.Core.Logs.LogsProvider.ProcessByContentType(String id, Stream logsStream, ModuleLogOptions logOptions) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/logs/LogsProvider.cs:line 99
   at Microsoft.Azure.Devices.Edge.Agent.Core.Logs.LogsProvider.GetProcessedLogs(String id, Stream logsStream, ModuleLogOptions logOptions) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/logs/LogsProvider.cs:line 85
   at Microsoft.Azure.Devices.Edge.Agent.Core.Logs.LogsProvider.GetLogs(String id, ModuleLogOptions logOptions, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/logs/LogsProvider.cs:line 38
   at Microsoft.Azure.Devices.Edge.Agent.Core.Requests.ModuleLogsRequestHandler.<>c__DisplayClass8_0.<<HandleRequestInternal>b__1>d.MoveNext() in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/requests/ModuleLogsRequestHandler.cs:line 58
--- End of stack trace from previous location ---
   at Microsoft.Azure.Devices.Edge.Agent.Core.Requests.ModuleLogsRequestHandler.HandleRequestInternal(Option`1 payloadOption, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/requests/ModuleLogsRequestHandler.cs:line 76
   at Microsoft.Azure.Devices.Edge.Agent.Core.Requests.RequestHandlerBase`2.HandleRequest(Option`1 payloadJson, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/requests/RequestHandlerBase.cs:line 30
   at Microsoft.Azure.Devices.Edge.Agent.Core.Requests.RequestManager.ProcessRequest(String request, String payloadJson) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/requests/RequestManager.cs:line 54
<6> 2023-08-03 17:09:23.457 +00:00 [INF] - Termination requested, initiating shutdown.
<6> 2023-08-03 17:09:23.458 +00:00 [INF] - Main thread terminated
<6> 2023-08-03 17:09:23.466 +00:00 [INF] - Initiating shutdown cleanup.
<6> 2023-08-03 17:09:23.482 +00:00 [INF] - Stopping all modules...

Device Information

  • Host OS: Ubuntu 20.04
  • Architecture: amd64
  • Container OS: Linux Containers
@nlcamp
Copy link
Contributor

nlcamp commented Aug 4, 2023

Hi @WDoughty - Can you share some additional info so we can potentially repro and better understand the context of the errors you're seeing? For example, please add the following info from our bug template if you can:

Runtime Versions

  • aziot-edged [run iotedge version]:
  • Edge Agent [image tag (e.g. 1.0.0)]:
  • Edge Hub [image tag (e.g. 1.0.0)]:
  • Docker/Moby [run docker version]:

Note: when using Windows containers on Windows, run docker -H npipe:////./pipe/iotedge_moby_engine version instead

Logs

aziot-edged logs

<Paste here between the triple backticks>

edge-agent logs

<Paste here between the triple backticks>

edge-hub logs

<Paste here between the triple backticks>

Additional Information

Please provide any additional information that may be helpful in understanding the issue.

@WDoughty
Copy link
Author

WDoughty commented Aug 7, 2023

@nlcamp I don't have direct access to the device in question. So I will list what I know. If the device is used more today I will try and reproduce and get more data.

Runtime Versions

  • Edge Agent [image tag (e.g. 1.0.0)]: 1.4
  • Edge Hub [image tag (e.g. 1.0.0)]: 1.4
  • Docker/Moby [run docker version]: 20.10.25+azure-2

Logs

aziot-edged logs

<Paste here between the triple backticks>

edge-agent logs

<6> 2023-08-03 17:09:20.323 +00:00 [INF] - Received direct method call - ping
<6> 2023-08-03 17:09:20.328 +00:00 [INF] - Received request ping with payload
<6> 2023-08-03 17:09:20.337 +00:00 [INF] - Successfully handled request ping
<6> 2023-08-03 17:09:20.656 +00:00 [INF] - Received direct method call - GetModuleLogs
<6> 2023-08-03 17:09:20.656 +00:00 [INF] - Received request GetModuleLogs with payload
<6> 2023-08-03 17:09:20.731 +00:00 [INF] - Processing request to get logs for {"schemaVersion":"1.0","items":{"id":"\\btcmservice\\b","filter":{"tail":1500,"since":"15m","until":null,"loglevel":null,"regex":""}},"encoding":1,"contentType":1}
<6> 2023-08-03 17:09:20.845 +00:00 [INF] - Initiating streaming logs for tcmservice
<4> 2023-08-03 17:09:21.406 +00:00 [WRN] - Error handling request GetModuleLogs
System.IO.IOException: Unexpected end of stream.
   at Microsoft.Azure.Devices.Edge.Util.Uds.HttpBufferedStream.ReadLine() in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/uds/HttpBufferedStream.cs:line 91
   at Microsoft.Azure.Devices.Edge.Util.Uds.HttpChunkedStreamReader.Read(Byte[] buffer, Int32 offset, Int32 count) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/uds/HttpChunkedStreamReader.cs:line 87
   at System.IO.DelegatingStream.Read(Byte[] buffer, Int32 offset, Int32 count)
   at Akka.Streams.Implementation.IO.InputStreamPublisher.ReadAndEmit()
--- End of stack trace from previous location ---
   at Microsoft.Azure.Devices.Edge.Agent.Core.Logs.LogsProcessor.GetText(String id, Stream stream, ModuleLogFilter filter) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/logs/LogsProcessor.cs:line 93
   at Microsoft.Azure.Devices.Edge.Agent.Core.Logs.LogsProvider.ProcessByContentType(String id, Stream logsStream, ModuleLogOptions logOptions) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/logs/LogsProvider.cs:line 99
   at Microsoft.Azure.Devices.Edge.Agent.Core.Logs.LogsProvider.GetProcessedLogs(String id, Stream logsStream, ModuleLogOptions logOptions) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/logs/LogsProvider.cs:line 85
   at Microsoft.Azure.Devices.Edge.Agent.Core.Logs.LogsProvider.GetLogs(String id, ModuleLogOptions logOptions, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/logs/LogsProvider.cs:line 38
   at Microsoft.Azure.Devices.Edge.Agent.Core.Requests.ModuleLogsRequestHandler.<>c__DisplayClass8_0.<<HandleRequestInternal>b__1>d.MoveNext() in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/requests/ModuleLogsRequestHandler.cs:line 58
--- End of stack trace from previous location ---
   at Microsoft.Azure.Devices.Edge.Agent.Core.Requests.ModuleLogsRequestHandler.HandleRequestInternal(Option`1 payloadOption, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/requests/ModuleLogsRequestHandler.cs:line 76
   at Microsoft.Azure.Devices.Edge.Agent.Core.Requests.RequestHandlerBase`2.HandleRequest(Option`1 payloadJson, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/requests/RequestHandlerBase.cs:line 30
   at Microsoft.Azure.Devices.Edge.Agent.Core.Requests.RequestManager.ProcessRequest(String request, String payloadJson) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/requests/RequestManager.cs:line 54
<6> 2023-08-03 17:09:23.457 +00:00 [INF] - Termination requested, initiating shutdown.
<6> 2023-08-03 17:09:23.458 +00:00 [INF] - Main thread terminated
<6> 2023-08-03 17:09:23.466 +00:00 [INF] - Initiating shutdown cleanup.
<6> 2023-08-03 17:09:23.482 +00:00 [INF] - Stopping all modules...

edge-hub logs

<Paste here between the triple backticks>

@jlian
Copy link
Member

jlian commented Aug 15, 2023

@nlcamp any ideas?

@nlcamp
Copy link
Contributor

nlcamp commented Aug 19, 2023

@WDoughty - I'm not able to repro this error. I'm using aziot-edged 1.4.16, edgeHub 1.4.18, and edgeAgent 1.4.18. When I deploy a SimulatedTemperatureSensor and then view logs from Azure Portal, I see the following output when tailing the edgeAgent logs via sudo iotedge logs -f edgeAgent:

<6> 2023-08-19 00:21:09.481 +00:00 [INF] - Received direct method call - GetModuleLogs
<6> 2023-08-19 00:21:09.481 +00:00 [INF] - Received request GetModuleLogs with payload
<6> 2023-08-19 00:21:09.496 +00:00 [INF] - Processing request to get logs for {"schemaVersion":"1.0","items":{"id":"\bSimulatedTemperatureSensor\b","filter":{"tail":1500,"since":"15m","until":null,"loglevel":null,"regex":""}},"encoding":1,"contentType":1}
<6> 2023-08-19 00:21:09.531 +00:00 [INF] - Initiating streaming logs for SimulatedTemperatureSensor
<6> 2023-08-19 00:21:09.670 +00:00 [INF] - Received 1060 bytes of logs for SimulatedTemperatureSensor
<6> 2023-08-19 00:21:09.674 +00:00 [INF] - Successfully handled request GetModuleLogs

At the same time, I see the following output from tailing the aziot-edged logs via sudo journalctl -f -u aziot-edged:

Aug 18 20:21:09 noel-Virtual-Machine aziot-edged[214573]: 2023-08-19T00:21:09Z [INFO] - <-- GET /modules?api-version=2022-08-03 {"accept": "application/json", "host": "mgmt.sock:80", "connection": "close"}
Aug 18 20:21:09 noel-Virtual-Machine aziot-edged[214573]: 2023-08-19T00:21:09Z [INFO] - --> 200 {"content-type": "application/json"}
Aug 18 20:21:09 noel-Virtual-Machine aziot-edged[214573]: 2023-08-19T00:21:09Z [INFO] - <-- GET /modules/SimulatedTemperatureSensor/logs?api-version=2022-08-03&follow=false&since=15m&tail=1500 {"host": "mgmt.sock:80", "connection": "close"}
Aug 18 20:21:09 noel-Virtual-Machine aziot-edged[214573]: 2023-08-19T00:21:09Z [INFO] - Getting logs for module SimulatedTemperatureSensor...
Aug 18 20:21:09 noel-Virtual-Machine aziot-edged[214573]: 2023-08-19T00:21:09Z [INFO] - --> 200 {"content-type": "text/plain"}

Can you answer a few more questions to help us diagnose the cause:

  1. As I did above, can you run sudo journalctl -f -u aziot-edged and share the output you see when you request logs for your custom module?
  2. Can you add a SimulatedTemperatureSensor module to your deployment and repeat the above, requesting logs for it? I'd like to see if the crash only happens when you request logs for your custom module.
  3. Can you share the output of running iotedge version on your device?
  4. Can you run the following commands to give me an idea of the size of the logs generated by your custom module in 15 minutes?: sudo docker logs --tail 1500 --since 15m tcmservice >& logs.txt && ls -lrtha logs.txt

@WDoughty
Copy link
Author

Unfortunately, I can't run any commands on the device because it is not in my possession but it seems like the error only occurs when it has spotty internet connectivity, which is often.

We have also been able to request the logs at other times than the day this happened and the agent didn't crash.

@nlcamp
Copy link
Contributor

nlcamp commented Aug 28, 2023

@WDoughty - Is there anyone on your team who has remote access to the device (e.g. via SSH)? I'm unable to reproduce the error so in order to efficiently troubleshoot it at this point I'll need you to run some commands on the device to gather more info.

@WDoughty
Copy link
Author

@nlcamp We are going to send someone out to the device today. What are the commands you would like to be ran? Just the ones mentioned above?

@nlcamp
Copy link
Contributor

nlcamp commented Aug 29, 2023

@WDoughty - Yes, please run the commands from above plus the following which will generate a full support bundle:

sudo iotedge support-bundle

Please then post the output from the commands I mentioned in my earlier comment plus the check.json output from the support bundle. Note: please redact any sensitive info from the logs and/or check.json output prior to posting in this public forum.

Please save the support-bundle -- I may request additional items from it later. Thanks.

@WDoughty
Copy link
Author

@nlcamp Finally got some of the logs will post necessary stuff here

aziot-edged [run iotedge version]: iotedge 1.4.10
Docker/Moby [run docker version]:

Client:
 Version:           20.10.25+azure-2
 API version:       1.41
 Go version:        go1.19.10
 Git commit:        b82b9f3a0e763304a250531cb9350aa6d93723c9
 Built:             Thu Apr  6 10:55:17 UTC 2023
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          23.0.6+azure-2
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.19.10
  Git commit:       9dbdbd4b6d7681bd18c897a6ba0376073c2a72ff
  Built:            Fri May  5 20:50:10 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21+azure-3
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.8
  GitCommit:        82f18fe0e44a59034f3e1f45e475fa5636e539aa
 docker-init:
  Version:          0.19.0
  GitCommit:    

Logs

aziot-edged logs
-- Logs begin at Wed 2023-07-26 19:31:04 UTC. --
Aug 31 17:41:30 rpc aziot-edged[2268]: 2023-08-31T17:41:30Z [INFO] - <-- GET /modules?api-version=2022-08-03 {"accept": "application/json", "host": "mgmt.sock:80", "connection": "close"}
Aug 31 17:41:30 rpc aziot-edged[2268]: 2023-08-31T17:41:30Z [INFO] - --> 200 {"content-type": "application/json"}
Aug 31 17:41:35 rpc aziot-edged[2268]: 2023-08-31T17:41:35Z [INFO] - <-- GET /modules?api-version=2022-08-03 {"accept": "application/json", "host": "mgmt.sock:80", "connection": "close"}
Aug 31 17:41:35 rpc aziot-edged[2268]: 2023-08-31T17:41:35Z [INFO] - --> 200 {"content-type": "application/json"}
Aug 31 17:41:40 rpc aziot-edged[2268]: 2023-08-31T17:41:40Z [INFO] - <-- GET /modules?api-version=2022-08-03 {"accept": "application/json", "host": "mgmt.sock:80", "connection": "close"}
Aug 31 17:41:40 rpc aziot-edged[2268]: 2023-08-31T17:41:40Z [INFO] - --> 200 {"content-type": "application/json"}
Aug 31 17:41:45 rpc aziot-edged[2268]: 2023-08-31T17:41:45Z [INFO] - <-- GET /modules?api-version=2022-08-03 {"accept": "application/json", "host": "mgmt.sock:80", "connection": "close"}
Aug 31 17:41:45 rpc aziot-edged[2268]: 2023-08-31T17:41:45Z [INFO] - --> 200 {"content-type": "application/json"}
Aug 31 17:41:50 rpc aziot-edged[2268]: 2023-08-31T17:41:50Z [INFO] - <-- GET /modules?api-version=2022-08-03 {"accept": "application/json", "host": "mgmt.sock:80", "connection": "close"}
Aug 31 17:41:50 rpc aziot-edged[2268]: 2023-08-31T17:41:50Z [INFO] - --> 200 {"content-type": "application/json"}
Aug 31 17:41:55 rpc aziot-edged[2268]: 2023-08-31T17:41:55Z [INFO] - <-- GET /modules?api-version=2022-08-03 {"accept": "application/json", "host": "mgmt.sock:80", "connection": "close"}
Aug 31 17:41:56 rpc aziot-edged[2268]: 2023-08-31T17:41:56Z [INFO] - --> 200 {"content-type": "application/json"}
Aug 31 17:41:56 rpc aziot-edged[2268]: 2023-08-31T17:41:56Z [INFO] - <-- GET /modules?api-version=2022-08-03 {"accept": "application/json", "host": "mgmt.sock:80", "connection": "close"}
Aug 31 17:41:56 rpc aziot-edged[2268]: 2023-08-31T17:41:56Z [INFO] - --> 200 {"content-type": "application/json"}
Aug 31 17:41:56 rpc aziot-edged[2268]: 2023-08-31T17:41:56Z [INFO] - <-- GET /modules/tcmservice/logs?api-version=2022-08-03&follow=false&since=15m&tail=1500 {"host": "mgmt.sock:80", "connection": "close"}
Aug 31 17:41:56 rpc aziot-edged[2268]: 2023-08-31T17:41:56Z [INFO] - Getting logs for module tcmservice...
Aug 31 17:41:56 rpc aziot-edged[2268]: 2023-08-31T17:41:56Z [INFO] - --> 200 {"content-type": "text/plain"}
Aug 31 17:42:01 rpc aziot-edged[2268]: 2023-08-31T17:42:01Z [INFO] - <-- GET /modules?api-version=2022-08-03 {"accept": "application/json", "host": "mgmt.sock:80", "connection": "close"}
Aug 31 17:42:01 rpc aziot-edged[2268]: 2023-08-31T17:42:01Z [INFO] - --> 200 {"content-type": "application/json"}
Aug 31 17:42:01 rpc aziot-edged[2268]: 2023-08-31T17:42:01Z [INFO] - <-- GET /modules?api-version=2022-08-03 {"accept": "application/json", "host": "mgmt.sock:80", "connection": "close"}
Aug 31 17:42:01 rpc aziot-edged[2268]: 2023-08-31T17:42:01Z [INFO] - --> 200 {"content-type": "application/json"}
Aug 31 17:42:01 rpc aziot-edged[2268]: 2023-08-31T17:42:01Z [INFO] - <-- GET /modules/edgeAgent/logs?api-version=2022-08-03&follow=false&since=15m&tail=1500 {"host": "mgmt.sock:80", "connection": "close"}
Aug 31 17:42:01 rpc aziot-edged[2268]: 2023-08-31T17:42:01Z [INFO] - Getting logs for module edgeAgent...
Aug 31 17:42:01 rpc aziot-edged[2268]: 2023-08-31T17:42:01Z [INFO] - --> 200 {"content-type": "text/plain"}
Aug 31 17:42:06 rpc aziot-edged[2268]: 2023-08-31T17:42:06Z [INFO] - <-- GET /modules?api-version=2022-08-03 {"accept": "application/json", "host": "mgmt.sock:80", "connection": "close"}
Aug 31 17:42:06 rpc aziot-edged[2268]: 2023-08-31T17:42:06Z [INFO] - --> 200 {"content-type": "application/json"}
Aug 31 17:42:11 rpc aziot-edged[2268]: 2023-08-31T17:42:11Z [INFO] - <-- GET /modules?api-version=2022-08-03 {"accept": "application/json", "host": "mgmt.sock:80", "connection": "close"}
Aug 31 17:42:11 rpc aziot-edged[2268]: 2023-08-31T17:42:11Z [INFO] - --> 200 {"content-type": "application/json"}

edge-agent logs
Error grabbing logs: error unmarshalling log entry (size=15370): proto: LogEntry: illegal tag 0 (wire type 6)

docker logs command
-rw-rw-r-- 1 - - 15K Aug 31 17:55 logs.txt
check.json
{
	"additional_info": {
		"aziot_edged_version": "1.4.10",
		"docker_version": "23.0.6+azure-2",
		"now": "2023-08-31T17:44:22.685402342Z",
		"os": {
			"arch": "x86_64",
			"bitness": 64,
			"id": "ubuntu",
			"version_id": "20.04"
		},
		"system_info": {
			"disks": [],
			"total_ram": "3.72 GiB",
			"total_swap": "3.73 GiB",
			"used_ram": "1.53 GiB",
			"used_swap": "0 B"
		}
	},
	"checks": {
		"aziot-edge-version": {
			"result": {
				"result": "warning",
				"details": [
					"Installed IoT Edge daemon has version 1.4.10 but 1.4.16 is the latest stable version available.\nPlease see https://aka.ms/iotedge-update-runtime for update instructions."
				]
			},
			"additional_info": {
				"actual_version": "1.4.10",
				"expected_version": "1.4.16"
			}
		},
		"aziot-edged-config-well-formed": {
			"result": { "result": "ok" },
			"additional_info": {}
		},
		"aziot-version": {
			"result": {
				"result": "warning",
				"details": [
					"Installed aziot-identity-service package has version 1.4.4 but 1.4.5 is the latest stable version available.\nPlease see https://aka.ms/aziot-update-runtime for update instructions."
				]
			},
			"additional_info": {
				"actual_version": "1.4.4",
				"expected_version": "1.4.5"
			}
		},
		"certd-config-well-formed": {
			"result": { "result": "ok" },
			"additional_info": {}
		},
		"certd-running": {
			"result": { "result": "ok" },
			"additional_info": {}
		},
		"certs-match-private-keys": {
			"result": { "result": "ok" },
			"additional_info": {}
		},
		"certs-preloaded": {
			"result": { "result": "ok" },
			"additional_info": {}
		},
		"certs-read": { "result": { "result": "ok" }, "additional_info": {} },
		"check-agent-image": {
			"result": { "result": "skipped" },
			"additional_info": {}
		},
		"config-up-to-date": {
			"result": { "result": "ok" },
			"additional_info": {}
		},
		"configs-up-to-date": {
			"result": { "result": "ok" },
			"additional_info": {}
		},
		"connect-management-uri": {
			"result": { "result": "ok" },
			"additional_info": {
				"connect_management_uri": "unix:///var/run/iotedge/mgmt.sock",
				"listen_management_uri": "fd://aziot-edged.mgmt.socket"
			}
		},
		"container-connect-upstream-amqp": {
			"result": { "result": "skipped" },
			"additional_info": {
				"diagnostics_image_name": null,
				"network_name": null,
				"proxy": null,
				"upstream_hostname": null,
				"upstream_port": "Amqp"
			}
		},
		"container-connect-upstream-https": {
			"result": { "result": "skipped" },
			"additional_info": {
				"diagnostics_image_name": null,
				"network_name": null,
				"proxy": null,
				"upstream_hostname": null,
				"upstream_port": "Https"
			}
		},
		"container-connect-upstream-mqtt": {
			"result": { "result": "skipped" },
			"additional_info": {
				"diagnostics_image_name": null,
				"network_name": null,
				"proxy": null,
				"upstream_hostname": null,
				"upstream_port": "Mqtt"
			}
		},
		"container-default-connect-upstream-amqp": {
			"result": { "result": "skipped" },
			"additional_info": {
				"diagnostics_image_name": null,
				"network_name": null,
				"proxy": null,
				"upstream_hostname": null,
				"upstream_port": "Amqp"
			}
		},
		"container-default-connect-upstream-https": {
			"result": { "result": "skipped" },
			"additional_info": {
				"diagnostics_image_name": null,
				"network_name": null,
				"proxy": null,
				"upstream_hostname": null,
				"upstream_port": "Https"
			}
		},
		"container-default-connect-upstream-mqtt": {
			"result": { "result": "skipped" },
			"additional_info": {
				"diagnostics_image_name": null,
				"network_name": null,
				"proxy": null,
				"upstream_hostname": null,
				"upstream_port": "Mqtt"
			}
		},
		"container-engine-dns": {
			"result": {
				"result": "warning",
				"details": [
					"Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.\nPlease see https://aka.ms/iotedge-prod-checklist-dns for best practices.\nYou can ignore this warning if you are setting DNS server per module in the Edge deployment."
				]
			},
			"additional_info": {
				"container_engine_config_path": "/etc/docker/daemon.json",
				"dns": null
			}
		},
		"container-engine-ipv6": {
			"result": { "result": "ignored" },
			"additional_info": {
				"actual_use_ipv6": null,
				"expected_use_ipv6": false
			}
		},
		"container-engine-logrotate": {
			"result": { "result": "ok" },
			"additional_info": {
				"daemon_config": {
					"log-driver": "local",
					"log-opts": { "max-file": "3", "max-size": "10m" }
				}
			}
		},
		"container-engine-uri": {
			"result": { "result": "ok" },
			"additional_info": {
				"docker_host_arg": "unix:///var/run/docker.sock",
				"docker_server_version": "23.0.6+azure-2"
			}
		},
		"container-local-time": {
			"result": { "result": "ok" },
			"additional_info": {
				"actual_duration": { "nanos": 0, "secs": 1693503883 },
				"diff": 0,
				"expected_duration": { "nanos": 799210059, "secs": 1693503883 }
			}
		},
		"container-resolve-parent-hostname": {
			"result": { "result": "ignored" },
			"additional_info": {}
		},
		"edge-agent-storage-mounted-from-host": {
			"result": {
				"result": "warning",
				"details": [
					"The edgeAgent module is not configured to persist its /tmp/edgeAgent directory on the host filesystem.\nData might be lost if the module is deleted or updated.\nPlease see https://aka.ms/iotedge-storage-host for best practices."
				]
			},
			"additional_info": {
				"container_directories": [
					"/var/run/iotedge/mgmt.sock",
					"/var/run/iotedge/workload.sock"
				],
				"storage_directory": "/tmp/edgeAgent"
			}
		},
		"edge-hub-storage-mounted-from-host": {
			"result": {
				"result": "warning",
				"details": [
					"The edgeHub module is not configured to persist its /tmp/edgeHub directory on the host filesystem.\nData might be lost if the module is deleted or updated.\nPlease see https://aka.ms/iotedge-storage-host for best practices."
				]
			},
			"additional_info": {
				"container_directories": ["/var/run/iotedge/workload.sock"],
				"storage_directory": "/tmp/edgeHub"
			}
		},
		"est-identity-and-bootstrap-certificate-expiry": {
			"result": { "result": "ignored" },
			"additional_info": {
				"bootstrap_certificate_info": null,
				"identity_certificate_info": null
			}
		},
		"est-server-https": {
			"result": { "result": "ok" },
			"additional_info": {}
		},
		"host-connect-dps-endpoint": {
			"result": { "result": "ok" },
			"additional_info": {
				"dps_endpoint": "https://global.azure-devices-provisioning.net/",
				"dps_hostname": "global.azure-devices-provisioning.net",
				"proxy": null
			}
		},
		"host-connect-iothub-amqp": {
			"result": {
				"result": "warning",
				"details": [
					"Could not retrieve iothub_hostname from provisioning file.\nPlease specify the backing IoT Hub name using --iothub-hostname switch if you have that information.\nSince no hostname is provided, all hub connectivity tests will be skipped."
				]
			},
			"additional_info": {
				"iothub_hostname": null,
				"port_number": 5671,
				"proxy": null
			}
		},
		"host-connect-iothub-https": {
			"result": {
				"result": "warning",
				"details": [
					"Could not retrieve iothub_hostname from provisioning file.\nPlease specify the backing IoT Hub name using --iothub-hostname switch if you have that information.\nSince no hostname is provided, all hub connectivity tests will be skipped."
				]
			},
			"additional_info": {
				"iothub_hostname": null,
				"port_number": 443,
				"proxy": null
			}
		},
		"host-connect-iothub-mqtt": {
			"result": {
				"result": "warning",
				"details": [
					"Could not retrieve iothub_hostname from provisioning file.\nPlease specify the backing IoT Hub name using --iothub-hostname switch if you have that information.\nSince no hostname is provided, all hub connectivity tests will be skipped."
				]
			},
			"additional_info": {
				"iothub_hostname": null,
				"port_number": 8883,
				"proxy": null
			}
		},
		"host-local-time": {
			"result": { "result": "ok" },
			"additional_info": { "offset": 0 }
		},
		"hostname": {
			"result": { "result": "ok" },
			"additional_info": {
				"config_hostname": "rpc",
				"machine_hostname": "rpc"
			}
		},
		"identity-certificate-expiry": {
			"result": { "result": "ignored" },
			"additional_info": {
				"certificate_info": null,
				"provisioning_mode": "dps-other"
			}
		},
		"identityd-config-well-formed": {
			"result": { "result": "ok" },
			"additional_info": {}
		},
		"identityd-running": {
			"result": { "result": "ok" },
			"additional_info": {}
		},
		"key-pairs-read": {
			"result": { "result": "ok" },
			"additional_info": {}
		},
		"keyd-config-well-formed": {
			"result": { "result": "ok" },
			"additional_info": {}
		},
		"keyd-running": { "result": { "result": "ok" }, "additional_info": {} },
		"local-ca-certificate-expiry": {
			"result": { "result": "ignored" },
			"additional_info": { "certificate_info": null }
		},
		"parent_hostname": {
			"result": { "result": "ignored" },
			"additional_info": { "config_parent_hostname": null }
		},
		"proxy-settings": {
			"result": { "result": "ok" },
			"additional_info": {}
		},
		"tpmd-config-well-formed": {
			"result": { "result": "ok" },
			"additional_info": {}
		},
		"tpmd-running": { "result": { "result": "ok" }, "additional_info": {} }
	}
}

@nlcamp
Copy link
Contributor

nlcamp commented Sep 8, 2023

@WDoughty - Thanks for the additional logs. I've been trying unsuccessfully to repro the issue on my end by initiating streaming logs while my test device is losing network connectivity. However, I'm not seeing the edgeAgent crash you're seeing.

I may need to write some lower-level tests so I can better control the order of events to achieve a repro. I suspect there may be a bug on our end in the handling of IOExceptions during the GetModuleLogs direct method call, but I need to consult with the code owners on my team.

In the meanwhile, can you please check the edgeAgent logs for the exact version you are running? From your earlier response, I understand you're probably specifying 1.4 in your deployment manifest, but I'd like to see which image is being pulled to make sure it's the latest (i.e. 1.4.18). If you have access to the device you can run the following command and share the output:

sudo iotedge logs edgeAgent -f | grep "Version - 1\.4\."

If you don't have access to the device, you can stream the logs from portal (at a time when your connection is strong enough) and search for a line like the following:

<6> 2023-09-08 15:52:55.529 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Agent.Service.Program] - Version - 1.4.18.77829725 (e4f4d92ccd8c284ee7603e31a44bcdb21136f711)

@WDoughty
Copy link
Author

@nlcamp
We kept running into an issue running with the iotedge logs command and it would return this
Error grabbing logs: error unmarshalling log entry (size=15370): proto: LogEntry: illegal tag 0 (wire type 6)

We ended up running docker logs edgeAgent and the version was this Version - 1.4.15.76247273 (9e713419235666f04ee5fa8823386890ef087193)

As for now the device does not have internet connection, if that helps at all.

@nlcamp
Copy link
Contributor

nlcamp commented Sep 20, 2023

@WDoughty - When you run into this issue, do you see any warning messages and notifications in Portal? Could you share a screen capture of what you see? Here's an example:

portal-stream-logs-error

We're expecting that you'll see an "Unexpected end of stream" error message somewhere there, but need to confirm.

@WDoughty
Copy link
Author

WDoughty commented Sep 21, 2023

@nlcamp From what I remember there were no warning or notifications and the only thing that popped up was something like "<Unexpected end of stream>" where the logs should have been. Unfortunately, the device in question isn't online right now.

I do remember we were able to pull the logs from the edgeAgent at the time and we could see that error in the edgeAgent logs, but for any other custom module it would pop up that "Unexpected end of stream" for the logs themselves.

@nlcamp
Copy link
Contributor

nlcamp commented Sep 21, 2023

@WDoughty - Ok, what you remember is what I would expect. The part that I can't yet explain is why edgeAgent is terminated after the IO Exception is reported:

<6> 2023-08-03 17:09:23.457 +00:00 [INF] - Termination requested, initiating shutdown.

The error you see when running sudo iotedge logs edgeAgent is a clue that something is failing in the edge daemon when it makes a call to the Docker Engine to retrieve container logs. I would expect to see logs in aziot-edged when this error occurs. Is it possible that the aziot-edged logs you posted above were from a period when you were not seeing the issue?

If so, could you try to recapture logs (and a full support bundle) at a time when the device is back online and you're seeing (or recently saw) the issue? You can add --since to go back more than the 1-day default when running both iotedge logs and iotedge support-bundle.

Also, can you upgrade aziot-edge, aziot-identity-service, moby-cli, moby-engine, and the edgeHub/edgeAgent images to the latest releases so we can see if the issue may have resolved itself in some of the changes to our code (and dependencies) since the 1.4.10 version you're currently using?

In the meantime, I'll take a look at the part of our edge daemon code that interacts with the Docker Engine to see if anything stands out as a potential root cause.

@huf-92
Copy link

huf-92 commented Oct 9, 2023

@nlcamp I have the same problem as WDoughty, but with access to the device. I noticed that this error occurs when i disconnect the device from the power supply and restart it. After that, i only can get rid off this "Log" error by redeploying all modules.

@WDoughty
Copy link
Author

@nlcamp Actually had this issue with a separate device right now
image
image
image

I have 6 custom modules and only the 1 is unable to fetch the logs.

@nlcamp
Copy link
Contributor

nlcamp commented Oct 16, 2023

@huf-92 - Thanks for providing the repro steps that work for you. I'll give them a try on one of my test devices. In the meantime, could you share the aziot-edged logs from a time when you're seeing the issue?

@WDoughty - Thanks for the screen shots. Do you have SSH access to the new device that's showing this error? If so, could you also provide the aziot-edged logs from a time when you're seeing the error?

@nlcamp
Copy link
Contributor

nlcamp commented Nov 8, 2023

@huf-92 / @WDoughty - Just a quick update that I have a partial repro using a raspberry pi and the steps huf-92 mentioned of disconnecting power (while streaming logs in Portal). We don't yet have a fix, but now that we can repro, we'll start working on one.

Copy link

github-actions bot commented Dec 9, 2023

This issue is being marked as stale because it has been open for 30 days with no activity.

@yophilav
Copy link
Contributor

@huf-92 / @WDoughty May I ask what's your log configuration look like? Is it similar or different than this (https://learn.microsoft.com/en-us/azure/iot-edge/production-checklist?view=iotedge-1.4#set-up-default-logging-driver) ?

@WDoughty
Copy link
Author

It was the config that has the size set.

@yophilav
Copy link
Contributor

@WDoughty IoTEdge uses docker to get log modules which runs into the issue. Other folks seems to run into the issue with various environment setup (AWS, and VMs) as well: moby/moby#46699 .

@cpuguy83 Do you have thoughts on the issue?

@jlian
Copy link
Member

jlian commented Jan 4, 2024

@cpuguy83 would you mind sharing some thoughts on if the issue @yophilav mentioned is the cause?

@cpuguy83
Copy link
Member

cpuguy83 commented Jan 9, 2024

Is the error log from docker being printed anywhere?

@jlian
Copy link
Member

jlian commented Jan 16, 2024

@cpuguy83 looks like the error is in the edgeAgent "Error grabbing logs: error unmarshalling log entry (size=15370): proto: LogEntry: illegal tag 0 (wire type 6)"

@nlcamp
Copy link
Contributor

nlcamp commented Jan 24, 2024

Is the error log from docker being printed anywhere?

@cpuguy83 - Here are the docker logs from the partial repro I mentioned earlier in this thread:

Nov 03 18:51:58 ubuntu systemd[1]: Stopped Docker Application Container Engine.
Nov 03 18:51:58 ubuntu systemd[1]: Starting Docker Application Container Engine...
Nov 03 18:51:59 ubuntu dockerd[4498]: time="2023-11-03T18:51:59.090224688Z" level=info msg="Starting up"
Nov 03 18:51:59 ubuntu dockerd[4498]: time="2023-11-03T18:51:59.092196448Z" level=info msg="detected 127.0.0.53 nameserver, assuming systemd-resolved, so using resolv.conf: /run/systemd/resolve/resolv.conf"
Nov 03 18:51:59 ubuntu dockerd[4498]: time="2023-11-03T18:51:59.612183085Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
Nov 03 18:51:59 ubuntu dockerd[4498]: time="2023-11-03T18:51:59.653813151Z" level=info msg="Loading containers: start."
Nov 03 18:52:09 ubuntu dockerd[4498]: time="2023-11-03T18:52:09.825331596Z" level=info msg="Container failed to exit within 10s of signal 15 - using the force" container=bca34506fefeab9c4bee6def8889f901f53bf8efc7fd6ee053bc2c41a17b8b4f
Nov 03 18:52:09 ubuntu dockerd[4498]: time="2023-11-03T18:52:09.825330985Z" level=info msg="Container failed to exit within 10s of signal 15 - using the force" container=b3de453567c6a1ced79fe60f12ef93d097a9b0268a7620166988fd8cbd3f6db6
Nov 03 18:52:09 ubuntu dockerd[4498]: time="2023-11-03T18:52:09.835380507Z" level=info msg="Container failed to exit within 10s of signal 15 - using the force" container=7996dfda521adf76084c68e4da10a1309b1ebeae588a1e11dc6d68a290698917
Nov 03 18:52:09 ubuntu dockerd[4498]: time="2023-11-03T18:52:09.982252581Z" level=info msg="ignoring event" container=7996dfda521adf76084c68e4da10a1309b1ebeae588a1e11dc6d68a290698917 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 03 18:52:10 ubuntu dockerd[4498]: time="2023-11-03T18:52:10.001226226Z" level=info msg="ignoring event" container=bca34506fefeab9c4bee6def8889f901f53bf8efc7fd6ee053bc2c41a17b8b4f module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 03 18:52:10 ubuntu dockerd[4498]: time="2023-11-03T18:52:10.001359175Z" level=info msg="ignoring event" container=b3de453567c6a1ced79fe60f12ef93d097a9b0268a7620166988fd8cbd3f6db6 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 03 18:52:11 ubuntu dockerd[4498]: time="2023-11-03T18:52:11.367255463Z" level=info msg="Removing stale sandbox 418857c915681bd06b73b0201e93ad23a3b7a07f65d9e6b2a311bd6134377024 (7996dfda521adf76084c68e4da10a1309b1ebeae588a1e11dc6d68a290698917)"
Nov 03 18:52:11 ubuntu dockerd[4498]: time="2023-11-03T18:52:11.476774019Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint a314dcb0d455ca17b491e026b82b91ab72e4298de9815ce45307d63ac878d4bc 9f360f09001b37fe4bb2fbea3de3c9df6f939450da5bb05ff3b41a6723788178], retrying...."
Nov 03 18:52:11 ubuntu dockerd[4498]: time="2023-11-03T18:52:11.788343618Z" level=info msg="Removing stale sandbox 61f7a459e83f6aab04b411777cc2077f848ba2cc37e723d243c43a4f3f2df634 (bca34506fefeab9c4bee6def8889f901f53bf8efc7fd6ee053bc2c41a17b8b4f)"
Nov 03 18:52:11 ubuntu dockerd[4498]: time="2023-11-03T18:52:11.801629786Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint a314dcb0d455ca17b491e026b82b91ab72e4298de9815ce45307d63ac878d4bc 38fe603429ee8686712c09f7b824f8e355ade56f40f515ee9e3bb932abc39e8e], retrying...."
Nov 03 18:52:12 ubuntu dockerd[4498]: time="2023-11-03T18:52:12.101599782Z" level=info msg="Removing stale sandbox 8915c29f418444d7a7e3052bf9dd93eee04751a441b16a8c2557a00c1f97d711 (b3de453567c6a1ced79fe60f12ef93d097a9b0268a7620166988fd8cbd3f6db6)"
Nov 03 18:52:12 ubuntu dockerd[4498]: time="2023-11-03T18:52:12.226305170Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint a314dcb0d455ca17b491e026b82b91ab72e4298de9815ce45307d63ac878d4bc 2d20332179d88d05c23e6d55034d3975dcf593a9527600289cf3dc020bc66b3b], retrying...."
Nov 03 18:52:12 ubuntu dockerd[4498]: time="2023-11-03T18:52:12.363180120Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Nov 03 18:52:12 ubuntu dockerd[4498]: time="2023-11-03T18:52:12.508013714Z" level=info msg="Loading containers: done."
Nov 03 18:52:13 ubuntu dockerd[4498]: time="2023-11-03T18:52:13.243422029Z" level=warning msg="WARNING: No memory limit support"
Nov 03 18:52:13 ubuntu dockerd[4498]: time="2023-11-03T18:52:13.243544830Z" level=warning msg="WARNING: No swap limit support"
Nov 03 18:52:13 ubuntu dockerd[4498]: time="2023-11-03T18:52:13.243593269Z" level=warning msg="WARNING: No kernel memory TCP limit support"
Nov 03 18:52:13 ubuntu dockerd[4498]: time="2023-11-03T18:52:13.243634024Z" level=warning msg="WARNING: No oom kill disable support"
Nov 03 18:52:13 ubuntu dockerd[4498]: time="2023-11-03T18:52:13.243756492Z" level=info msg="Docker daemon" commit=606a7a76949ec08cb26940037b1c71c0024c6b44 graphdriver=overlay2 version=23.0.7+azure-1
Nov 03 18:52:13 ubuntu dockerd[4498]: time="2023-11-03T18:52:13.243974543Z" level=info msg="Daemon has completed initialization"
Nov 03 18:52:13 ubuntu systemd[1]: Started Docker Application Container Engine.
Nov 03 18:52:13 ubuntu dockerd[4498]: time="2023-11-03T18:52:13.326040156Z" level=info msg="API listen on /run/docker.sock"
Nov 03 18:52:13 ubuntu dockerd[4498]: time="2023-11-03T18:52:13.331541785Z" level=error msg="Handler for GET /containers/json returned error: write unix /run/docker.sock->@: write: broken pipe"
Nov 03 18:52:13 ubuntu dockerd[4498]: 2023/11/03 18:52:13 http: superfluous response.WriteHeader call from github.com/docker/docker/api/server/httputils.WriteJSON (httputils.go:92)
Nov 03 18:52:13 ubuntu dockerd[4498]: time="2023-11-03T18:52:13.346241996Z" level=error msg="Error streaming logs: error unmarshalling log entry (size=15370): proto: LogEntry: illegal tag 0 (wire type 6)" container=bca34506fefeab9c4bee6def8889f901f53bf8efc7fd6ee053bc2c41a17b8b4f method="(*Daemon).ContainerLogs" module=daemon
Nov 03 18:52:57 ubuntu dockerd[4498]: time="2023-11-03T18:52:57.910687774Z" level=info msg="ignoring event" container=bca34506fefeab9c4bee6def8889f901f53bf8efc7fd6ee053bc2c41a17b8b4f module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 03 18:53:09 ubuntu dockerd[4498]: time="2023-11-03T18:53:09.493448131Z" level=error msg="Error streaming logs: error unmarshalling log entry (size=15370): proto: LogEntry: illegal tag 0 (wire type 6)" container=bca34506fefeab9c4bee6def8889f901f53bf8efc7fd6ee053bc2c41a17b8b4f method="(*Daemon).ContainerLogs" module=daemon
Nov 03 18:53:15 ubuntu dockerd[4498]: time="2023-11-03T18:53:15.700071700Z" level=error msg="Error streaming logs: error unmarshalling log entry (size=15370): proto: LogEntry: illegal tag 0 (wire type 6)" container=bca34506fefeab9c4bee6def8889f901f53bf8efc7fd6ee053bc2c41a17b8b4f method="(*Daemon).ContainerLogs" module=daemon
Nov 03 18:53:20 ubuntu dockerd[4498]: time="2023-11-03T18:53:20.671895510Z" level=error msg="Error streaming logs: error unmarshalling log entry (size=15370): proto: LogEntry: illegal tag 0 (wire type 6)" container=edgeAgent method="(*Daemon).ContainerLogs" module=daemon
Nov 03 18:53:48 ubuntu dockerd[4498]: panic: runtime error: invalid memory address or nil pointer dereference
Nov 03 18:53:48 ubuntu dockerd[4498]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xaaaae56e5510]
Nov 03 18:53:48 ubuntu dockerd[4498]: goroutine 693 [running]:
Nov 03 18:53:48 ubuntu dockerd[4498]: io.(*multiReader).Read(0x4001172048, {0x4000833800, 0x4, 0x800})
Nov 03 18:53:48 ubuntu dockerd[4498]: 	/usr/local/go/src/io/multi.go:26 +0xb0
Nov 03 18:53:48 ubuntu dockerd[4498]: io.ReadAtLeast({0xaaaae777f520, 0x4001172048}, {0x4000833800, 0x4, 0x800}, 0x4)
Nov 03 18:53:48 ubuntu dockerd[4498]: 	/usr/local/go/src/io/io.go:332 +0xa0
Nov 03 18:53:48 ubuntu dockerd[4498]: io.ReadFull(...)
Nov 03 18:53:48 ubuntu dockerd[4498]: 	/usr/local/go/src/io/io.go:351
Nov 03 18:53:48 ubuntu dockerd[4498]: github.com/docker/docker/daemon/logger/local.(*decoder).readRecord(0x4000e609c0, 0x4)
Nov 03 18:53:48 ubuntu dockerd[4498]: 	/build/moby-engine/.gopath/src/github.com/docker/docker/daemon/logger/local/read.go:103 +0xa0
Nov 03 18:53:48 ubuntu dockerd[4498]: github.com/docker/docker/daemon/logger/local.(*decoder).decodeSizeHeader(0x4000e609c0)
Nov 03 18:53:48 ubuntu dockerd[4498]: 	/build/moby-engine/.gopath/src/github.com/docker/docker/daemon/logger/local/read.go:183 +0x24
Nov 03 18:53:48 ubuntu dockerd[4498]: github.com/docker/docker/daemon/logger/local.(*decoder).Decode(0x4000e609c0)
Nov 03 18:53:48 ubuntu dockerd[4498]: 	/build/moby-engine/.gopath/src/github.com/docker/docker/daemon/logger/local/read.go:131 +0x17c
Nov 03 18:53:48 ubuntu dockerd[4498]: github.com/docker/docker/daemon/logger/loggerutils.tailFiles({0x4001007b10, 0x1, 0xedcd73538?}, 0x400102eba0, {0xaaaae778ba30, 0x4000e609c0}, 0xaaaae776cf58, {{0x0, 0xedcd73538, 0xaaaae893e300}, ...})
Nov 03 18:53:48 ubuntu dockerd[4498]: 	/build/moby-engine/.gopath/src/github.com/docker/docker/daemon/logger/loggerutils/logfile.go:614 +0x400
Nov 03 18:53:48 ubuntu dockerd[4498]: github.com/docker/docker/daemon/logger/loggerutils.(*LogFile).readLogsLocked(0x4000dc0f00, {0xaaaae59a8630?, 0x0?}, {{0x0, 0xedcd73538, 0xaaaae893e300}, {0x0, 0x0, 0x0}, 0x5dc, ...}, ...)
Nov 03 18:53:48 ubuntu dockerd[4498]: 	/build/moby-engine/.gopath/src/github.com/docker/docker/daemon/logger/loggerutils/logfile.go:452 +0x3ec
Nov 03 18:53:48 ubuntu dockerd[4498]: created by github.com/docker/docker/daemon/logger/loggerutils.(*LogFile).ReadLogs
Nov 03 18:53:48 ubuntu dockerd[4498]: 	/build/moby-engine/.gopath/src/github.com/docker/docker/daemon/logger/loggerutils/logfile.go:392 +0x298
Nov 03 18:53:48 ubuntu systemd[1]: docker.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Nov 03 18:53:48 ubuntu systemd[1]: docker.service: Failed with result 'exit-code'.
Nov 03 18:53:50 ubuntu systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Nov 03 18:53:50 ubuntu systemd[1]: Stopped Docker Application Container Engine.
Nov 03 18:53:50 ubuntu systemd[1]: Starting Docker Application Container Engine...
Nov 03 18:53:51 ubuntu dockerd[5937]: time="2023-11-03T18:53:51.096707880Z" level=info msg="Starting up"

@cpuguy83
Copy link
Member

Thanks @nclamp

I'm so far unable to reproduce this issue, not to say its not a problem or environmental.
That's definitely a bug in docker.

Given the panic in your logs I've also gone through the code a bunch to try and hunt down what could be causing that and also haven't come up with anything yet.

If we have the actual container log content (from /var/lib/docker/containers/<id>/container-cached.log or /var/lib/docker/containers/<id>/local-logs depending on the log configuration) this would be helpful to try and reproduce.

@nlcamp
Copy link
Contributor

nlcamp commented Mar 7, 2024

@cpuguy83 - I was able to repro again on a pi with the latest version of docker/moby (24.0.9-1 -- see full version printout in the console output copied below).

Here is the output of docker logs <container-id> as well as the corresponding contents of /var/lib/docker/containers/<container-id>/local-logs/container.log:

pi@ubuntu:~$ sudo docker ps
CONTAINER ID   IMAGE                                                             COMMAND                   CREATED          STATUS         PORTS                                                                                                                                   NAMES
8e6aad9da7d7   mcr.microsoft.com/azureiotedge-hub:1.4                            "/bin/sh -c 'echo \"$…"   13 minutes ago   Up 7 minutes   0.0.0.0:443->443/tcp, :::443->443/tcp, 0.0.0.0:5671->5671/tcp, :::5671->5671/tcp, 0.0.0.0:8883->8883/tcp, :::8883->8883/tcp, 1883/tcp   edgeHub
146a28a13972   mcr.microsoft.com/azureiotedge-simulated-temperature-sensor:1.0   "/bin/sh -c 'echo \"$…"   13 minutes ago   Up 7 minutes                                                                                                                                           SimulatedTemperatureSensor
023db9834127   mcr.microsoft.com/azureiotedge-agent:1.4                          "/bin/sh -c 'exec /a…"    13 minutes ago   Up 7 minutes                                                                                                                                           edgeAgent
pi@ubuntu:~$ sudo docker logs 146a28a13972
2024-03-07 16:18:32 +00:00 Starting Module
SimulatedTemperatureSensor Main() started.
Initializing simulated temperature sensor to send 500 messages, at an interval of 5 seconds.
To change this, set the environment variable MessageCount to the number of messages that should be sent (set it to -1 to send unlimited messages).
[Information]: Trying to initialize module client using transport type [Amqp_Tcp_Only].
[Information]: Successfully initialized module client of transport type [Amqp_Tcp_Only].
        03/07/2024 16:18:55> Sending message: 1, Body: [{"machine":{"temperature":22.102442439576816,"pressure":1.1255947083062194},"ambient":{"temperature":21.469049792722355,"humidity":25},"timeCreated":"2024-03-07T16:18:55.6101547Z"}]
        03/07/2024 16:19:00> Sending message: 2, Body: [{"machine":{"temperature":22.57950201541162,"pressure":1.179943267578539},"ambient":{"temperature":21.310193751384595,"humidity":26},"timeCreated":"2024-03-07T16:19:00.9264706Z"}]
        03/07/2024 16:19:05> Sending message: 3, Body: [{"machine":{"temperature":22.715802069970312,"pressure":1.195471121895352},"ambient":{"temperature":20.57475486727187,"humidity":26},"timeCreated":"2024-03-07T16:19:05.9429249Z"}]
        03/07/2024 16:19:10> Sending message: 4, Body: [{"machine":{"temperature":23.56161603706964,"pressure":1.2918296751091995},"ambient":{"temperature":21.369712012759276,"humidity":26},"timeCreated":"2024-03-07T16:19:10.9601739Z"}]
        03/07/2024 16:19:15> Sending message: 5, Body: [{"machine":{"temperature":23.753015581985476,"pressure":1.3136346865553075},"ambient":{"temperature":20.76553803554994,"humidity":24},"timeCreated":"2024-03-07T16:19:15.9775807Z"}]
        03/07/2024 16:19:21> Sending message: 6, Body: [{"machine":{"temperature":24.266670646270118,"pressure":1.3721523521067223},"ambient":{"temperature":20.98909418121404,"humidity":25},"timeCreated":"2024-03-07T16:19:21.0040903Z"}]
        03/07/2024 16:19:26> Sending message: 7, Body: [{"machine":{"temperature":24.955209874876406,"pressure":1.4505935300492108},"ambient":{"temperature":21.38531716069454,"humidity":24},"timeCreated":"2024-03-07T16:19:26.0242546Z"}]
        03/07/2024 16:19:31> Sending message: 8, Body: [{"machine":{"temperature":24.858433448177962,"pressure":1.439568367513945},"ambient":{"temperature":21.336172424180514,"humidity":24},"timeCreated":"2024-03-07T16:19:31.0387658Z"}]
        03/07/2024 16:19:36> Sending message: 9, Body: [{"machine":{"temperature":25.554813315768172,"pressure":1.5189027828090322},"ambient":{"temperature":21.45598522897623,"humidity":25},"timeCreated":"2024-03-07T16:19:36.0537525Z"}]
        03/07/2024 16:19:41> Sending message: 10, Body: [{"machine":{"temperature":26.244536073992276,"pressure":1.5974787932396262},"ambient":{"temperature":21.162087186082307,"humidity":25},"timeCreated":"2024-03-07T16:19:41.0677896Z"}]
        03/07/2024 16:19:46> Sending message: 11, Body: [{"machine":{"temperature":26.060037749963826,"pressure":1.5764599968313218},"ambient":{"temperature":21.17788335246867,"humidity":25},"timeCreated":"2024-03-07T16:19:46.0812487Z"}]
        03/07/2024 16:19:52> Sending message: 12, Body: [{"machine":{"temperature":26.166192736088384,"pressure":1.588553602845512},"ambient":{"temperature":20.534086932444055,"humidity":26},"timeCreated":"2024-03-07T16:19:52.2961596Z"}]
        03/07/2024 16:19:57> Sending message: 13, Body: [{"machine":{"temperature":27.186876609193565,"pressure":1.704834044085343},"ambient":{"temperature":20.71243130332438,"humidity":24},"timeCreated":"2024-03-07T16:19:57.3072524Z"}]
        03/07/2024 16:20:02> Sending message: 14, Body: [{"machine":{"temperature":27.68378070331354,"pressure":1.761443371263568},"ambient":{"temperature":20.522139865915356,"humidity":26},"timeCreated":"2024-03-07T16:20:02.3182719Z"}]
        03/07/2024 16:20:07> Sending message: 15, Body: [{"machine":{"temperature":27.62082924152297,"pressure":1.754271685743123},"ambient":{"temperature":21.296942980399795,"humidity":26},"timeCreated":"2024-03-07T16:20:07.3320173Z"}]
        03/07/2024 16:20:12> Sending message: 16, Body: [{"machine":{"temperature":27.86536106158297,"pressure":1.7821297411929966},"ambient":{"temperature":20.557960025993157,"humidity":24},"timeCreated":"2024-03-07T16:20:12.3488403Z"}]
        03/07/2024 16:20:17> Sending message: 17, Body: [{"machine":{"temperature":28.23314502811205,"pressure":1.8240291804178286},"ambient":{"temperature":21.100762649253365,"humidity":26},"timeCreated":"2024-03-07T16:20:17.366256Z"}]
        03/07/2024 16:20:22> Sending message: 18, Body: [{"machine":{"temperature":29.21004133471755,"pressure":1.9353211647146573},"ambient":{"temperature":20.840698345722956,"humidity":26},"timeCreated":"2024-03-07T16:20:22.3826944Z"}]
        03/07/2024 16:20:27> Sending message: 19, Body: [{"machine":{"temperature":29.893873267641226,"pressure":2.013226068465456},"ambient":{"temperature":21.299479244183505,"humidity":24},"timeCreated":"2024-03-07T16:20:27.4314665Z"}]
        03/07/2024 16:20:32> Sending message: 20, Body: [{"machine":{"temperature":30.376896226022804,"pressure":2.0682540004329777},"ambient":{"temperature":21.353786472163065,"humidity":26},"timeCreated":"2024-03-07T16:20:32.4451792Z"}]
        03/07/2024 16:20:37> Sending message: 21, Body: [{"machine":{"temperature":31.048282107710033,"pressure":2.1447409996125355},"ambient":{"temperature":21.131051530889724,"humidity":25},"timeCreated":"2024-03-07T16:20:37.4577484Z"}]
        03/07/2024 16:20:42> Sending message: 22, Body: [{"machine":{"temperature":31.625054591160755,"pressure":2.210449257220845},"ambient":{"temperature":21.18687486075185,"humidity":24},"timeCreated":"2024-03-07T16:20:42.4704962Z"}]
error from daemon in stream: Error grabbing logs: log message is too large (1937007727 > 1000000)

pi@ubuntu:~$ sudo more /var/lib/docker/containers/146a28a13972c53a39a8a7eb92bee2d9ce453596ecfd5df8458724192bc1009f/local-logs/container.log

�stdout���������*2024-03-07 16:18:32 +00:00 Starting Module
�stdout����������*SimulatedTemperatureSensor Main() started.
�stdout���Ħ������\Initializing simulated temperature sensor to send 500 messages, at an interval of 5 seconds.
�stdout���̦��������To change this, set the environment variable MessageCount to the number of messages that should be sent (set it to -1 to send unlimited messages).
�stdout�����������W[Information]: Trying to initialize module client using transport type [Amqp_Tcp_Only].
�stdout�����������X[Information]: Successfully initialized module client of transport type [Amqp_Tcp_Only].reated":"2024-03-07T16:18:55.6101547Z"}]/2024 16:19:05> Sending message: 3, Body: [{"machine":{"temperature":22.715802069970312,"pressure":1.195471121895352},"ambient":{"temperature":20.57475486727187,"humidity":26},"timeCreated":"2024-03-07T16:19:05.9429249Z"}]56161603706964,"pressure":1.2918296751091995},"ambient":{"temperature":21.369712012759276,"humidity":26},"timeCreated":"2024-03-07T16:19:10.9601739Z"}]20.76553803554994,"humidity":24},"timeCreated":"2024-03-07T16:19:15.9775807Z"}]903Z"}] 8, Body: [{"machine":{"temperature":24.858433448177962,"pressure":1.439568367513945},"ambient":{"temperature":21.336172424180514,"humidity":24},"timeCreated":"2024-03-07T16:19:31.0387658Z"}]9027828090322},"ambient":{"temperature":21.45598522897623,"humidity":25},"timeCreated":"2024-03-07T16:19:36.0537525Z"}]5},"timeCreated":"2024-03-07T16:19:41.0677896Z"}]����      03/07/2024 16:19:52> Sending message: 12, Body: [{"machine":{"temperature":26.166192736088384,"pressure":1.588553602845512},"ambient":{"temperature":20.534086932444055,"humidity":26},"timeCreated":"2024-03-07T16:19:52.2961596Z"}]"temperature":27.186876609193565,"pressure":1.704834044085343},"ambient":{"temperature":20.71243130332438,"humidity":24},"timeCreated":"2024-03-07T16:19:57.3072524Z"}]{"temperature":20.522139865915356,"humidity":26},"timeCreated":"2024-03-07T16:20:02.3182719Z"}]07T16:20:07.3320173Z"}]message: 17, Body: [{"machine":{"temperature":28.23314502811205,"pressure":1.8240291804178286},"ambient":{"temperature":21.100762649253365,"humidity":26},"timeCreated":"2024-03-07T16:20:17.366256Z"}]e":1.9353211647146573},"ambient":{"temperature":20.840698345722956,"humidity":26},"timeCreated":"2024-03-07T16:20:22.3826944Z"}]05,"humidity":24},"timeCreated":"2024-03-07T16:20:27.4314665Z"}]]y: [{"machine":{"temperature":31.625054591160755,"pressure":2.210449257220845},"ambient":{"temperature":21.18687486075185,"humidity":24},"timeCreated":"2024-03-07T16:20:42.4704962Z"}]178},"ambient":{"temperature":20.556829705860853,"humidity":24},"timeCreated":"2024-03-07T16:20:47.4923738Z"}]ated":"2024-03-07T16:20:52.5397534Z"}]:21:02> Sending message: 26, Body: [{"machine":{"temperature":33.007752572888386,"pressure":2.3679718121012083},"ambient":{"temperature":21.477450023394752,"humidity":24},"timeCreated":"2024-03-07T16:21:02.5633504Z"}].33128153163999,"pressure":2.404829541579239},"ambient":{"temperature":20.801232031686805,"humidity":26},"timeCreated":"2024-03-07T16:21:07.576183Z"}]20.714625861130013,"humidity":24},"timeCreated":"2024-03-07T16:21:12.5854124Z"}]1:17.5976429Z"}]message: 31, Body: [{"machine":{"temperature":35.70288892157975,"pressure":2.675012661952123},"ambient":{"temperature":20.858440609815737,"humidity":25},"timeCreated":"2024-03-07T16:21:27.6217249Z"}],"pressure":2.7264216390013223},"ambient":{"temperature":21.3836788082885,"humidity":25},"timeCreated":"2024-03-07T16:21:32.6296758Z"}]138,"humidity":24},"timeCreated":"2024-03-07T16:21:39.3141035Z"}]y: [{"machine":{"temperature":38.557130508849916,"pressure":3.000179425058851},"ambient":{"temperature":21.23632585757241,"humidity":24},"timeCreated":"2024-03-07T16:21:54.3354602Z"}]54343},"ambient":{"temperature":20.678017213557855,"humidity":26},"timeCreated":"2024-03-07T16:21:59.344184Z"}]},"timeCreated":"2024-03-07T16:22:04.35177Z"}]/2024 16:22:14> Sending message: 40, Body: [{"machine":{"temperature":41.09127888856049,"pressure":3.2888798733803086},"ambient":{"temperature":21.329929344742524,"humidity":25},"timeCreated":"2024-03-07T16:22:14.36295Z"}].123244368877835,"pressure":3.406445561011399},"ambient":{"temperature":21.338562286849395,"humidity":25},"timeCreated":"2024-03-07T16:22:19.3694515Z"}]rature":21.418399781416355,"humidity":24},"timeCreated":"2024-03-07T16:22:24.3735588Z"}]7T16:22:29.3835774Z"}]message: 45, Body: [{"machine":{"temperature":46.09741198706785,"pressure":3.859198833969755},"ambient":{"temperature":21.283635398737918,"humidity":25},"timeCreated":"2024-03-07T16:22:39.3939161Z"}]:3.9839846683809617},"ambient":{"temperature":20.821951046735958,"humidity":25},"timeCreated":"2024-03-07T16:22:44.3993047Z"}]dity":25},"timeCreated":"2024-03-07T16:22:49.4058473Z"}]�������������     03/07/2024 16:22:59> Sending message: 49, Body: [{"machine":{"temperature":47.431801161114954,"pressure":4.0112178537979055},"ambient":{"temperature":21.04399436225369,"humidity":24},"timeCreated":"2024-03-07T16:22:59.4167902Z"}]"temperature":48.39093674714254,"pressure":4.1204864648643404},"ambient":{"temperature":21.011321064788532,"humidity":25},"timeCreated":"2024-03-07T16:23:04.4248145Z"}]mbient":{"temperature":21.46935205346409,"humidity":26},"timeCreated":"2024-03-07T16:23:09.4315005Z"}]
�stdout�Ý��샢���*2024-03-07 16:24:34 +00:00 Starting Module
�stdout�dz�����*SimulatedTemperatureSensor Main() started.
�stdout�ǖ������\Initializing simulated temperature sensor to send 500 messages, at an interval of 5 seconds.
�stdout�����������To change this, set the environment variable MessageCount to the number of messages that should be sent (set it to -1 to send unlimited messages).
�stdout��Ҫ�����W[Information]: Trying to initialize module client using transport type [Amqp_Tcp_Only].
�stdout���ñ������X[Information]: Successfully initialized module client of transport type [Amqp_Tcp_Only].24-03-07T16:24:50.4396443Z"}]Sending message: 3, Body: [{"machine":{"temperature":21.622120207581723,"pressure":1.0708744540282975},"ambient":{"temperature":21.103554299847946,"humidity":26},"timeCreated":"2024-03-07T16:25:00.7991494Z"}]210174,"pressure":1.0581335453027298},"ambient":{"temperature":20.506001058968717,"humidity":26},"timeCreated":"2024-03-07T16:25:05.8524269Z"}]21.193563898416965,"humidity":25},"timeCreated":"2024-03-07T16:25:10.8691783Z"}]15.8891134Z"}] 8, Body: [{"machine":{"temperature":23.343765919489673,"pressure":1.2670113072836338},"ambient":{"temperature":21.120805087322744,"humidity":25},"timeCreated":"2024-03-07T16:25:26.4092003Z"}]e":1.3546149540101031},"ambient":{"temperature":20.894845011362268,"humidity":25},"timeCreated":"2024-03-07T16:25:31.4260672Z"}]332,"humidity":26},"timeCreated":"2024-03-07T16:25:36.4405437Z"}]y: [{"machine":{"temperature":26.010742078423842,"pressure":1.5708440342508174},"ambient":{"temperature":20.827176505386447,"humidity":26},"timeCreated":"2024-03-07T16:25:51.4905674Z"}]65182714301885},"ambient":{"temperature":20.501683817711513,"humidity":26},"timeCreated":"2024-03-07T16:25:56.5038511Z"}]idity":25},"timeCreated":"2024-03-07T16:26:01.515598Z"}]�ֆ������     03/07/2024 16:26:11> Sending message: 17, Body: [{"machine":{"temperature":27.048639841283965,"pressure":1.6890855515386796},"ambient":{"temperature":21.300260320212814,"humidity":24},"timeCreated":"2024-03-07T16:26:11.5494992Z"}]"temperature":27.89832624625336,"pressure":1.7858852685605096},"ambient":{"temperature":21.008004596693443,"humidity":26},"timeCreated":"2024-03-07T16:26:16.5625251Z"}]ambient":{"temperature":20.992091644318815,"humidity":26},"timeCreated":"2024-03-07T16:26:21.5857289Z"}]reated":"2024-03-07T16:26:26.5975974Z"}]:26:36> Sending message: 22, Body: [{"machine":{"temperature":28.340041442699746,"pressure":1.836207252965794},"ambient":{"temperature":20.586517997126336,"humidity":24},"timeCreated":"2024-03-07T16:26:36.6158845Z"}].23287220647692,"pressure":1.9379221501049655},"ambient":{"temperature":21.397597690065204,"humidity":26},"timeCreated":"2024-03-07T16:26:41.6272356Z"}]rature":21.37435556849202,"humidity":26},"timeCreated":"2024-03-07T16:26:46.6383604Z"}]6:52.1936252Z"}]message: 27, Body: [{"machine":{"temperature":31.228033709562396,"pressure":2.165219030203311},"ambient":{"temperature":20.869110911325137,"humidity":24},"timeCreated":"2024-03-07T16:27:02.2157158Z"}],"pressure":2.2108409579858392},"ambient":{"temperature":21.086539512773296,"humidity":26},"timeCreated":"2024-03-07T16:27:07.2230901Z"}]66489240745,"humidity":25},"timeCreated":"2024-03-07T16:27:12.2322896Z"}]21Z"}]y: [{"machine":{"temperature":32.40575913195766,"pressure":2.2993902808559357},"ambient":{"temperature":20.587950812693663,"humidity":25},"timeCreated":"2024-03-07T16:27:27.2609696Z"}]589535},"ambient":{"temperature":20.654190473330296,"humidity":26},"timeCreated":"2024-03-07T16:27:32.2708724Z"}]},"timeCreated":"2024-03-07T16:27:37.3323787Z"}]/2024 16:27:47> Sending message: 36, Body: [{"machine":{"temperature":33.01805037889538,"pressure":2.3691449798741573},"ambient":{"temperature":21.093550368488557,"humidity":26},"timeCreated":"2024-03-07T16:27:47.3504645Z"}]ture":33.575530445354765,"pressure":2.4326553671923152},"ambient":{"temperature":21.292386830222043,"humidity":26},"timeCreated":"2024-03-07T16:27:52Sending message: 41, Body: [{"machine":{"temperature":35.28310019198483,"pressure":2.627188629466626},"ambient":{"temperature":21.03503506050214,"humidity":25},"timeCreated":"2024-03-07T16:28:12.4360429Z"}],"pressure":2.6361859009522104},"ambient":{"temperature":21.081995744529177,"humidity":25},"timeCreated":"2024-03-07T16:28:17.4477447Z"}]97,"humidity":25},"timeCreated":"2024-03-07T16:28:22.4532778Z"}]��㊢�����     03/07/2024 16:28:32> Sending message: 45, Body: [{"machine":{"temperature":37.131634310065586,"pressure":2.8377811239315225},"ambient":{"temperature":21.02051458252618,"humidity":26},"timeCreated":"2024-03-07T16:28:32.4684455Z"}]"temperature":36.890469118901734,"pressure":2.8103066084824757},"ambient":{"temperature":20.654490653963055,"humidity":26},"timeCreated":"2024-03-07T16:28:37.4741127Z"}]ambient":{"temperature":20.78078398028425,"humidity":26},"timeCreated":"2024-03-07T16:28:42.4802462Z"}]:"2024-03-07T16:28:47.5270453Z"}]Sending message: 50, Body: [{"machine":{"temperature":40.56291686583445,"pressure":3.22868673155076},"ambient":{"temperature":20.642666611421234,"humidity":24},"timeCreated":"2024-03-07T16:28:57.5489675Z"}],"pressure":3.2851670131757973},"ambient":{"temperature":20.5940921502626,"humidity":25},"timeCreated":"2024-03-07T16:29:02.5540966Z"}]93,"humidity":24},"timeCreated":"2024-03-07T16:29:07.5595103Z"}]]y: [{"machine":{"temperature":43.15916067241186,"pressure":3.524461342426667},"ambient":{"temperature":21.347216073352477,"humidity":25},"timeCreated":"2024-03-07T16:29:22.5764895Z"}]524998345992},"ambient":{"temperature":20.559978528441853,"humidity":26},"timeCreated":"2024-03-07T16:29:27.5820014Z"}],"timeCreated":"2024-03-07T16:29:32.5891688Z"}]/2024 16:29:42> Sending message: 59, Body: [{"machine":{"temperature":43.373678873583515,"pressure":3.548900124838628},"ambient":{"temperature":21.38440062985029,"humidity":26},"timeCreated":"2024-03-07T16:29:42.6160661Z"}]ture":43.12502296484308,"pressure":3.5205722365011107},"ambient":{"temperature":21.100673056952967,"humidity":25},"timeCreated":"2024-03-07T16:29:47.6232132Z"}]{"temperature":20.647152324741313,"humidity":25},"timeCreated":"2024-03-07T16:29:52.6292375Z"}]07T16:29:57.6351274Z"}]message: 64, Body: [{"machine":{"temperature":46.73577705944691,"pressure":3.931923968797749},"ambient":{"temperature":20.813264608529053,"humidity":24},"timeCreated":"2024-03-07T16:30:07.6519389Z"}]e":4.052083895753128},"ambient":{"temperature":21.21195736188067,"humidity":26},"timeCreated":"2024-03-07T16:30:12.6573986Z"}]ity":26},"timeCreated":"2024-03-07T16:30:17.6631053Z"}]�鐎������        03/07/2024 16:30:27> Sending message: 68, Body: [{"machine":{"temperature":50.95248060461714,"pressure":4.4123079169817},"ambient":{"temperature":21.079125414406473,"humidity":26},"timeCreated":"2024-03-07T16:30:27.67446Z"}]ture":51.153521366605304,"pressure":4.435211294929719},"ambient":{"temperature":20.975383471453274,"humidity":24},"timeCreated":"2024-03-07T16:30:32.6826512Z"}]ature":21.309685255777875,"humidity":24},"timeCreated":"2024-03-07T16:30:37.6874889Z"}]:42.6927119Z"}]y: [{"machine":{"temperature":51.830834090281655,"pressure":4.5123735039561375},"ambient":{"temperature":21.133466716685085,"humidity":25},"timeCreated":"2024-03-07T16:30:52.7033453Z"}]132397309501},"ambient":{"temperature":21.047747771510736,"humidity":26},"timeCreated":"2024-03-07T16:30:57.7085823Z"}]ty":26},"timeCreated":"2024-03-07T16:31:02.7140019Z"}]03/07/2024 16:31:12> Sending message: 77, Body: [{"machine":{"temperature":53.70620111397289,"pressure":4.72602291171843},"ambient":{"temperature":20.745733897781808,"humidity":25},"timeCreated":"2024-03-07T16:31:12.7258105Z"}]ture":53.463981601625655,"pressure":4.6984282837295055},"ambient":{"temperature":20.5911805411294,"humidity":25},"timeCreated":"2024-03-07T16:31:17.7311468Z"}]ature":20.58075876304915,"humidity":25},"timeCreated":"2024-03-07T16:31:22.7372934Z"}]:27.7472233Z"}]y: [{"machine":{"temperature":56.80980656845951,"pressure":5.079598216659944},"ambient":{"temperature":21.140747596342464,"humidity":25},"timeCreated":"2024-03-07T16:31:37.7563111Z"}]8826},"ambient":{"temperature":20.885630801033987,"humidity":24},"timeCreated":"2024-03-07T16:31:42.7599456Z"}],"timeCreated":"2024-03-07T16:31:47.7659759Z"}]/2024 16:31:57> Sending message: 86, Body: [{"machine":{"temperature":59.730110395341214,"pressure":5.4122910576971},"ambient":{"temperature":20.980893836114973,"humidity":24},"timeCreated":"2024-03-07T16:31:57.7740902Z"}].603062183993416,"pressure":5.397817210834693},"ambient":{"temperature":21.078969927774263,"humidity":24},"timeCreated":"2024-03-07T16:32:02.780709Z"}]0.55846333273615,"humidity":25},"timeCreated":"2024-03-07T16:32:07.7860725Z"}]
pi@ubuntu:~$ docker version
Client:
 Version:           24.0.9-1
 API version:       1.43
 Go version:        go1.20.13
 Git commit:        293681613032e6d1a39cc88115847d3984195c24
 Built:             Wed Jan 31 20:53:14 UTC 2024
 OS/Arch:           linux/arm64
 Context:           default
permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/version": dial unix /var/run/docker.sock: connect: permission denied
pi@ubuntu:~$ sudo docker version
Client:
 Version:           24.0.9-1
 API version:       1.43
 Go version:        go1.20.13
 Git commit:        293681613032e6d1a39cc88115847d3984195c24
 Built:             Wed Jan 31 20:53:14 UTC 2024
 OS/Arch:           linux/arm64
 Context:           default

Server:
 Engine:
  Version:          24.0.9-1
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.13
  Git commit:       fca702de7f71362c8d103073c7e4a1d0a467fadd
  Built:            Thu Feb  1 00:12:23 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.28-1
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12-1
  GitCommit:        51d5e94601ceffbbd85688df1c928ecccbfa4685
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
pi@ubuntu:~$

@wvangeem
Copy link

I experience the same behaviour at a customer.
On RPI with IoTEdge 1.4 special characters arrive in the container logs, causing the log to be unretrievable by iotedge logs <modulename>

image

@konichi3
Copy link

@cpuguy83 Have you been able to look at the logs provided by Noel?

@cpuguy83
Copy link
Member

I've spent a ton of time trying to reproduce the corruption which I've been unable to do.
I can see that the logs are indeed corrupted and I'm wondering if the logs were created before this upstream change: moby/moby@7493342

It doesn't look like this change ever got backported to moby 20.10, which some of the reports seem to have.
But even on moby 23.0, which has that patch, if the logs were already corrupted, we can't go back and fix them, they need to be deleted.

@nlcamp
Copy link
Contributor

nlcamp commented Apr 9, 2024

@WDoughty - I noticed that you're using an old version of moby-cli from before the fix that @cpuguy83 referenced in his above comment. Could you upgrade to the latest moby-cli and moby-engine, stop iotedge, remove any existing containers so that the old logs will be deleted, and restart iotedge?

sudo apt-get update
sudo apt-get install moby-engine moby-cli
sudo iotedge system stop
sudo docker rm <container-id-1> <container-id-2> ...
sudo iotedge system restart

While there may be a remaining moby issue related to log corruption when power is removed on raspberry pis, it's possible your issue is different and related to the bug that has been fixed.

Please let us know if you still see the issue after upgrading moby and clearing the logs with the steps listed above.

@WDoughty
Copy link
Author

@nlcamp Thanks for your help. I am not actually on a project using iot edge anymore.

@cpuguy83
Copy link
Member

I think I have an idea of what's happening here.
There's nothing really we can do to prevent it from happening in dockerd, however:

  1. We should be able to scan the log file on startup for trailing null bytes and truncate them.
  2. You can change your ext4 mount options to use "data=journal" which would disable ext4's "Delayed Allocation" feature and require all writes (data and metadata) to hit the journal first. There is going to be a performance hit for this, though.

On 2, what appears to be happening is on power loss the metadata for a given write is in the journal but the data for it is not, leading to these null bytes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants