pmm-agent.service causes Too many open files crash on MongoDB and memory leaks on MariaDB #3262

Open
1 task done
Bobzemob opened this issue Oct 22, 2024 · 0 comments
Labels
bug Bug report

Description

MongoDB

pmm-agent.service on a MongoDB cluster opened 40,000+ pages, causing MongoDB to crash with a "Too many open files" error.
This behavior occurred on a MongoDB server cluster running version 6.0.18.

MariaDB

The pmm-agent.service on several MariaDB servers caused errors that coincided with the MySQL process allocating multiple GB of memory and not releasing it until the process was restarted. After the restart, the memory growth continued on servers that had pmm-agent.service active and stopped on the ones that had it disabled. (See logs below.)
Affected servers were using MariaDB version 10.3.39.

Expected Results

Expected pmm-agent not to cause crashes or memory leaks.

Actual Results

PMM agent opened too many pages on the MongoDB servers, causing the MongoDB process to crash.
Disabling pmm-agent caused the number of open pages to drop from 40,000+ to around 500.
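
A count like the one above can be sampled per process. The following is a minimal sketch, assuming a Linux host with /proc mounted and assuming that the "pages" counted above correspond to open file descriptors. The function name `count_open_fds` and the `proc_root` parameter are illustrative and not part of PMM.

```python
import os


def count_open_fds(pid: int, proc_root: str = "/proc") -> int:
    """Return the number of open file descriptors held by a process.

    Assumption: on Linux, each entry in <proc_root>/<pid>/fd is one open
    descriptor. Reading another user's process usually requires root.
    """
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


if __name__ == "__main__":
    # Hypothetical usage: inspects this script's own process.
    # Substitute the mongod PID to watch the database instead.
    print(count_open_fds(os.getpid()))
```

Sampling this value with pmm-agent enabled and again with it disabled would show whether the descriptor count follows the agent's state.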

PMM agent caused MariaDB to allocate multiple GB of memory without releasing it, eventually leading to a crash.
Disabling the pmm-agent service stopped this behavior from occurring.
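
The memory growth described above could be tracked by sampling the resident set size of the database process. This is a minimal sketch, assuming a Linux host where /proc/<pid>/status exposes a `VmRSS` line in kB. The helper names `parse_vm_rss_kib` and `read_rss_kib` are hypothetical.

```python
import re
from typing import Optional

# Matches a line such as "VmRSS:     2048 kB" in /proc/<pid>/status.
_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vm_rss_kib(status_text: str) -> Optional[int]:
    """Extract the resident set size in KiB, or None if the line is absent."""
    match = _VMRSS.search(status_text)
    return int(match.group(1)) if match else None


def read_rss_kib(pid: int) -> Optional[int]:
    """Read the resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status", encoding="ascii", errors="replace") as fh:
        return parse_vm_rss_kib(fh.read())
```

Calling `read_rss_kib` at a fixed interval on servers with and without the agent running would give comparable growth curves.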

Version

PMM Server 2.43.1
PMM Agent 2.43.1-6

Steps to reproduce

No response

Relevant logs

------- MariaDB-DB1 -------

Oct 19 20:26:09 wc-demo-db1 pmm-agent[7740]: time="2024-10-19T20:26:09.537-04:00" level=warning msg="Action terminated with error: Error 1193 (HY000): Unknown system variable 'binlog_expire_logs_seconds
'\ngithub.com/percona/pmm/agent/runner/actions.(*mysqlQuerySelectAction).Run\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/actions/mysql_query_select_action.go:75\ngithub.com/percona/pmm/agent/runn
er.(*Runner).handleAction.func1\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/runner.go:382\nruntime/pprof.Do\n\t/usr/local/go/src/runtime/pprof/runtime.go:51\nruntime.goexit\n\t/usr/local/go/src/r
untime/asm_amd64.s:1695" component=runner id=/action_id/3ee7b8a6-c981-4a8c-8588-9386a2f2f49d type=mysql-query-select

Oct 19 20:26:48 wc-demo-db1 pmm-agent[7740]: time="2024-10-19T20:26:48.992-04:00" level=warning msg="Action terminated with error: Error 1146 (42S02): Table 'performance_schema.global_variables' doesn't
 exist\ngithub.com/percona/pmm/agent/runner/actions.(*mysqlQuerySelectAction).Run\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/actions/mysql_query_select_action.go:75\ngithub.com/percona/pmm/agent
/runner.(*Runner).handleAction.func1\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/runner.go:382\nruntime/pprof.Do\n\t/usr/local/go/src/runtime/pprof/runtime.go:51\nruntime.goexit\n\t/usr/local/go/
src/runtime/asm_amd64.s:1695" component=runner id=/action_id/c5a00902-0466-48bd-9737-e96481ba336a type=mysql-query-select

Oct 19 20:27:08 wc-demo-db1 pmm-agent[7740]: time="2024-10-19T20:27:08.295-04:00" level=warning msg="Action terminated with error: Error 1193 (HY000): Unknown system variable 'default_password_lifetime'
\ngithub.com/percona/pmm/agent/runner/actions.(*mysqlQuerySelectAction).Run\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/actions/mysql_query_select_action.go:75\ngithub.com/percona/pmm/agent/runne
r.(*Runner).handleAction.func1\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/runner.go:382\nruntime/pprof.Do\n\t/usr/local/go/src/runtime/pprof/runtime.go:51\nruntime.goexit\n\t/usr/local/go/src/ru
ntime/asm_amd64.s:1695" component=runner id=/action_id/872cc710-2347-406a-9ff4-2dc860098e1d type=mysql-query-select

Oct 19 20:27:08 wc-demo-db1 pmm-agent[7740]: time="2024-10-19T20:27:08.303-04:00" level=warning msg="Action terminated with error: Error 1193 (HY000): Unknown system variable 'default_password_lifetime'
\ngithub.com/percona/pmm/agent/runner/actions.(*mysqlQuerySelectAction).Run\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/actions/mysql_query_select_action.go:75\ngithub.com/percona/pmm/agent/runne
r.(*Runner).handleAction.func1\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/runner.go:382\nruntime/pprof.Do\n\t/usr/local/go/src/runtime/pprof/runtime.go:51\nruntime.goexit\n\t/usr/local/go/src/ru
ntime/asm_amd64.s:1695" component=runner id=/action_id/4c68b175-81ee-477a-9843-c8bca1b99c1b type=mysql-query-select

------- MariaDB-DB2 -------

Oct 19 20:26:03 wc-demo-db2 pmm-agent[1061]: time="2024-10-19T20:26:03.262-04:00" level=warning msg="Action terminated with error: Error 1193 (HY000): Unknown system variable 'binlog_expire_logs_seconds
'\ngithub.com/percona/pmm/agent/runner/actions.(*mysqlQuerySelectAction).Run\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/actions/mysql_query_select_action.go:75\ngithub.com/percona/pmm/agent/runn
er.(*Runner).handleAction.func1\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/runner.go:382\nruntime/pprof.Do\n\t/usr/local/go/src/runtime/pprof/runtime.go:51\nruntime.goexit\n\t/usr/local/go/src/r
untime/asm_amd64.s:1695" component=runner id=/action_id/edf95dab-c4af-4f03-9ffb-aa33a4a0b7dc type=mysql-query-select


Oct 19 20:26:42 wc-demo-db2 pmm-agent[1061]: time="2024-10-19T20:26:42.635-04:00" level=warning msg="Action terminated with error: Error 1146 (42S02): Table 'performance_schema.global_variables' doesn't
 exist\ngithub.com/percona/pmm/agent/runner/actions.(*mysqlQuerySelectAction).Run\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/actions/mysql_query_select_action.go:75\ngithub.com/percona/pmm/agent
/runner.(*Runner).handleAction.func1\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/runner.go:382\nruntime/pprof.Do\n\t/usr/local/go/src/runtime/pprof/runtime.go:51\nruntime.goexit\n\t/usr/local/go/
src/runtime/asm_amd64.s:1695" component=runner id=/action_id/8881d5c2-e578-456c-8743-6b3a7078a7c7 type=mysql-query-select


Oct 19 20:27:01 wc-demo-db2 pmm-agent[1061]: time="2024-10-19T20:27:01.915-04:00" level=warning msg="Action terminated with error: Error 1193 (HY000): Unknown system variable 'default_password_lifetime'
\ngithub.com/percona/pmm/agent/runner/actions.(*mysqlQuerySelectAction).Run\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/actions/mysql_query_select_action.go:75\ngithub.com/percona/pmm/agent/runne
r.(*Runner).handleAction.func1\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/runner.go:382\nruntime/pprof.Do\n\t/usr/local/go/src/runtime/pprof/runtime.go:51\nruntime.goexit\n\t/usr/local/go/src/ru
ntime/asm_amd64.s:1695" component=runner id=/action_id/a4bca5b4-b0a7-412c-985f-2be52b6f1cbc type=mysql-query-select


Oct 19 20:27:01 wc-demo-db2 pmm-agent[1061]: time="2024-10-19T20:27:01.923-04:00" level=warning msg="Action terminated with error: Error 1193 (HY000): Unknown system variable 'default_password_lifetime'
\ngithub.com/percona/pmm/agent/runner/actions.(*mysqlQuerySelectAction).Run\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/actions/mysql_query_select_action.go:75\ngithub.com/percona/pmm/agent/runne
r.(*Runner).handleAction.func1\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/runner.go:382\nruntime/pprof.Do\n\t/usr/local/go/src/runtime/pprof/runtime.go:51\nruntime.goexit\n\t/usr/local/go/src/ru
ntime/asm_amd64.s:1695" component=runner id=/action_id/374783d2-aeec-495d-a68b-9a521217d66c type=mysql-query-select

------- MariaDB-DB3 -------

Oct 19 20:17:32 wc-demo-db3 pmm-agent[930]: time="2024-10-19T20:17:32.657-04:00" level=warning msg="Action terminated with error: Error 1146 (42S02): Table 'performance_schema.replication_connection_con
figuration' doesn't exist\ngithub.com/percona/pmm/agent/runner/actions.(*mysqlQuerySelectAction).Run\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/actions/mysql_query_select_action.go:75\ngithub.co
m/percona/pmm/agent/runner.(*Runner).handleAction.func1\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/runner.go:382\nruntime/pprof.Do\n\t/usr/local/go/src/runtime/pprof/runtime.go:51\nruntime.goexi
t\n\t/usr/local/go/src/runtime/asm_amd64.s:1695" component=runner id=/action_id/403df0eb-ac90-419d-9c38-0bf8e7d1e261 type=mysql-query-select

Oct 19 20:22:34 wc-demo-db3 pmm-agent[930]: time="2024-10-19T20:22:34.654-04:00" level=warning msg="Action terminated with error: context deadline exceeded\ngithub.com/percona/pmm/agent/runner/actions.(
*mysqlQuerySelectAction).Run\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/actions/mysql_query_select_action.go:81\ngithub.com/percona/pmm/agent/runner.(*Runner).handleAction.func1\n\t/tmp/go/src/g
ithub.com/percona/pmm/agent/runner/runner.go:382\nruntime/pprof.Do\n\t/usr/local/go/src/runtime/pprof/runtime.go:51\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695" component=runner id=/ac
tion_id/8b49d229-5348-4ed3-8c79-12aa241e8dfa type=mysql-query-select

Oct 19 20:26:13 wc-demo-db3 pmm-agent[930]: time="2024-10-19T20:26:13.820-04:00" level=warning msg="Action terminated with error: Error 1193 (HY000): Unknown system variable 'binlog_expire_logs_seconds'\ngithub.com/percona/pmm/agent/runner/actions.(*mysqlQuerySelectAction).Run\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/actions/mysql_query_select_action.go:75\ngithub.com/percona/pmm/agent/runner.(*Runner).handleAction.func1\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/runner.go:382\nruntime/pprof.Do\n\t/usr/local/go/src/runtime/pprof/runtime.go:51\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695" component=runner id=/action_id/01bdc6f9-7c8b-4045-b495-d27e255656d8 type=mysql-query-select

Oct 19 20:26:53 wc-demo-db3 pmm-agent[930]: time="2024-10-19T20:26:53.193-04:00" level=warning msg="Action terminated with error: Error 1146 (42S02): Table 'performance_schema.global_variables' doesn't exist\ngithub.com/percona/pmm/agent/runner/actions.(*mysqlQuerySelectAction).Run\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/actions/mysql_query_select_action.go:75\ngithub.com/percona/pmm/agent/runner.(*Runner).handleAction.func1\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/runner.go:382\nruntime/pprof.Do\n\t/usr/local/go/src/runtime/pprof/runtime.go:51\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695" component=runner id=/action_id/e8d6680c-8af9-4cf5-8cfb-7f7f0ce07e57 type=mysql-query-select

Oct 19 20:27:12 wc-demo-db3 pmm-agent[930]: time="2024-10-19T20:27:12.498-04:00" level=warning msg="Action terminated with error: Error 1193 (HY000): Unknown system variable 'default_password_lifetime'\ngithub.com/percona/pmm/agent/runner/actions.(*mysqlQuerySelectAction).Run\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/actions/mysql_query_select_action.go:75\ngithub.com/percona/pmm/agent/runner.(*Runner).handleAction.func1\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/runner.go:382\nruntime/pprof.Do\n\t/usr/local/go/src/runtime/pprof/runtime.go:51\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695" component=runner id=/action_id/fff68a5f-1185-441d-944a-94eaa8521a55 type=mysql-query-select

Oct 19 20:27:12 wc-demo-db3 pmm-agent[930]: time="2024-10-19T20:27:12.506-04:00" level=warning msg="Action terminated with error: Error 1193 (HY000): Unknown system variable 'default_password_lifetime'\ngithub.com/percona/pmm/agent/runner/actions.(*mysqlQuerySelectAction).Run\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/actions/mysql_query_select_action.go:75\ngithub.com/percona/pmm/agent/runner.(*Runner).handleAction.func1\n\t/tmp/go/src/github.com/percona/pmm/agent/runner/runner.go:382\nruntime/pprof.Do\n\t/usr/local/go/src/runtime/pprof/runtime.go:51\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695" component=runner id=/action_id/80935cc1-dc5a-4352-b8b3-c3d6b9894e98 type=mysql-query-select

------- MongoDB-DB1 -------

Oct 20 23:28:08 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:08.156+00:00" level=error msg="couldn't create system.profile iterator, reason: server selection error: server selection timeout, c
urrent topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: connect: connection refused }, ] }" agentID=/agent_id/2d65dcef-
a89b-441f-9982-13a39d4282ad component=agent-builtin db=local type=qan_mongodb_profiler_agent
Oct 20 23:28:08 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:08.156+00:00" level=error msg="couldn't create system.profile iterator, reason: server selection error: server selection timeout, c
urrent topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: connect: connection refused }, ] }" agentID=/agent_id/2d65dcef-
a89b-441f-9982-13a39d4282ad component=agent-builtin db=DatabaseProd type=qan_mongodb_profiler_agent
Oct 20 23:28:08 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:08.156+00:00" level=error msg="couldn't create system.profile iterator, reason: server selection error: server selection timeout, c
urrent topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: connect: connection refused }, ] }" agentID=/agent_id/2d65dcef-
a89b-441f-9982-13a39d4282ad component=agent-builtin db=Database type=qan_mongodb_profiler_agent
Oct 20 23:28:09 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:09.479+00:00" level=error msg="time=\"2024-10-20T23:28:09Z\" level=error msg=\"Registry - Cannot get node type to check if this is
a mongos : server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: c
onnect: connection refused }, ] }\"" agentID=/agent_id/11bb5598-ae08-4a2a-a3a6-eb4d6145957d component=agent-process type=mongodb_exporter
Oct 20 23:28:10 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:10.010+00:00" level=error msg="time=\"2024-10-20T23:28:10Z\" level=error msg=\"Registry - Cannot get node type to check if this is
a mongos : server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: c
onnect: connection refused }, ] }\"" agentID=/agent_id/11bb5598-ae08-4a2a-a3a6-eb4d6145957d component=agent-process type=mongodb_exporter
Oct 20 23:28:10 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:10.157+00:00" level=error msg="couldn't create system.profile iterator, reason: server selection error: server selection timeout, c
urrent topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: connect: connection refused }, ] }" agentID=/agent_id/2d65dcef-
a89b-441f-9982-13a39d4282ad component=agent-builtin db=DatabaseProd type=qan_mongodb_profiler_agent
Oct 20 23:28:10 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:10.157+00:00" level=error msg="couldn't create system.profile iterator, reason: server selection error: server selection timeout, c
urrent topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: connect: connection refused }, ] }" agentID=/agent_id/2d65dcef-
a89b-441f-9982-13a39d4282ad component=agent-builtin db=Database type=qan_mongodb_profiler_agent
Oct 20 23:28:10 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:10.157+00:00" level=error msg="couldn't create system.profile iterator, reason: server selection error: server selection timeout, c
urrent topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: connect: connection refused }, ] }" agentID=/agent_id/2d65dcef-
a89b-441f-9982-13a39d4282ad component=agent-builtin db=local type=qan_mongodb_profiler_agent
Oct 20 23:28:10 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:10.384+00:00" level=error msg="time=\"2024-10-20T23:28:10Z\" level=error msg=\"error while checking mongodb connection: server sele
ction error: context canceled, current topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: connect: connection refused },
] }. mongo_up is set to 0\" collector=general" agentID=/agent_id/11bb5598-ae08-4a2a-a3a6-eb4d6145957d component=agent-process type=mongodb_exporter
Oct 20 23:28:10 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:10.401+00:00" level=error msg="time=\"2024-10-20T23:28:10Z\" level=error msg=\"Cannot get node type: server selection error: contex
t canceled, current topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: connect: connection refused }, ] }\" component=dia
gnosticDataCollector" agentID=/agent_id/11bb5598-ae08-4a2a-a3a6-eb4d6145957d component=agent-process type=mongodb_exporter
Oct 20 23:28:10 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:10.477+00:00" level=info msg="2024-10-20T23:28:10.476Z\twarn\tVictoriaMetrics/lib/promscrape/scrapework.go:387\tcannot scrape targe
t \"http://127.0.0.1:42000/metrics?collect%5B%5D=diagnosticdata&collect%5B%5D=replicasetstatus&collect%5B%5D=topmetrics\" ({agent_id=\"/agent_id/11bb5598-ae08-4a2a-a3a6-eb4d6145957d\",agent_type=\"mongo
db_exporter\",cluster=\"BHProdCluster\",instance=\"/agent_id/11bb5598-ae08-4a2a-a3a6-eb4d6145957d\",job=\"mongodb_exporter_agent_id_11bb5598-ae08-4a2a-a3a6-eb4d6145957d_hr\",machin..node_id=\"/node_id/c0273607-9363-4359-b2f9-d29b6ffc082d\",node_name=\"mongo-db1\",node_type=\"generic\",replication_set=\"bh-prod-rs\",service_id=\"/service_id/16332cef-07e6-424e-b998-7aa6884471ba\",service_name=\"mongo-db1-mongodb\",service_type=\"mongodb\"}) 1 out of 1 times during -promscrape.suppressScrapeErrorsDelay=0s; the last error: error when scraping \"http://127.0.0.1:42000/metrics?collect%5B%5D=diagnosticdata&collect%5B%5D=replicasetstatus&collect%5B%5D=topmetrics\" with timeout 4.5s: timeout" agentID=/agent_id/3d2d20fb-419d-4940-9617-7ceb9b29820a component=agent-process type=vm_agent
Oct 20 23:28:10 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:10.478+00:00" level=error msg="time=\"2024-10-20T23:28:10Z\" level=error msg=\"error while checking mongodb connection: server selection error: context canceled, current topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: connect: connection refused }, ] }. mongo_up is set to 0\" collector=general" agentID=/agent_id/11bb5598-ae08-4a2a-a3a6-eb4d6145957d component=agent-process type=mongodb_exporter
Oct 20 23:28:10 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:10.478+00:00" level=error msg="time=\"2024-10-20T23:28:10Z\" level=error msg=\"Cannot get node type: server selection error: context canceled, current topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: connect: connection refused }, ] }\" component=diagnosticDataCollector" agentID=/agent_id/11bb5598-ae08-4a2a-a3a6-eb4d6145957d component=agent-process type=mongodb_exporter
Oct 20 23:28:10 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:10.972+00:00" level=error msg="time=\"2024-10-20T23:28:10Z\" level=error msg=\"Registry - Cannot get node type to check if this is a mongos : server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: connect: connection refused }, ] }\"" agentID=/agent_id/11bb5598-ae08-4a2a-a3a6-eb4d6145957d component=agent-process type=mongodb_exporter
Oct 20 23:28:11 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:11.972+00:00" level=error msg="time=\"2024-10-20T23:28:11Z\" level=error msg=\"error while checking mongodb connection: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: connect: connection refused }, ] }. mongo_up is set to 0\" collector=general" agentID=/agent_id/11bb5598-ae08-4a2a-a3a6-eb4d6145957d component=agent-process type=mongodb_exporter
Oct 20 23:28:12 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:12.158+00:00" level=error msg="couldn't create system.profile iterator, reason: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: connect: connection refused }, ] }" agentID=/agent_id/2d65dcef-a89b-441f-9982-13a39d4282ad component=agent-builtin db=DatabaseProd type=qan_mongodb_profiler_agent
Oct 20 23:28:12 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:12.158+00:00" level=error msg="couldn't create system.profile iterator, reason: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: connect: connection refused }, ] }" agentID=/agent_id/2d65dcef-a89b-441f-9982-13a39d4282ad component=agent-builtin db=local type=qan_mongodb_profiler_agent
Oct 20 23:28:12 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:12.158+00:00" level=error msg="couldn't create system.profile iterator, reason: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: connect: connection refused }, ] }" agentID=/agent_id/2d65dcef-a89b-441f-9982-13a39d4282ad component=agent-builtin db=Database type=qan_mongodb_profiler_agent
Oct 20 23:28:30 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:30.460+00:00" level=error msg="time=\"2024-10-20T23:28:30Z\" level=error msg=\"Registry - Cannot get node type to check if this is
a mongos : server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: c
onnect: connection refused }, ] }\"" agentID=/agent_id/11bb5598-ae08-4a2a-a3a6-eb4d6145957d component=agent-process type=mongodb_exporter
Oct 20 23:28:30 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:30.477+00:00" level=info msg="2024-10-20T23:28:30.477Z\twarn\tVictoriaMetrics/lib/promscrape/scrapework.go:387\tcannot scrape targe
t \"http://127.0.0.1:42000/metrics?collect%5B%5D=diagnosticdata&collect%5B%5D=replicasetstatus&collect%5B%5D=topmetrics\" ({agent_id=\"/agent_id/11bb5598-ae08-4a2a-a3a6-eb4d6145957d\",agent_type=\"mongo
db_exporter\",cluster=\"BHProdCluster\",instance=\"/agent_id/11bb5598-ae08-4a2a-a3a6-eb4d6145957d\",job=\"mongodb_exporter_agent_id_11bb5598-ae08-4a2a-a3a6-eb4d6145957d_hr\",machin..node_id=\"/node_id/c
0273607-9363-4359-b2f9-d29b6ffc082d\",node_name=\"mongo-db1\",node_type=\"generic\",replication_set=\"bh-prod-rs\",service_id=\"/service_id/16332cef-07e6-424e-b998-7aa6884471ba\",service_name=\"bh-fwd
c-db1-mongodb\",service_type=\"mongodb\"}) 1 out of 1 times during -promscrape.suppressScrapeErrorsDelay=0s; the last error: error when scraping \"http://127.0.0.1:42000/metrics?collect%5B%5D=diagnostic
data&collect%5B%5D=replicasetstatus&collect%5B%5D=topmetrics\" with timeout 4.5s: timeout" agentID=/agent_id/3d2d20fb-419d-4940-9617-7ceb9b29820a component=agent-process type=vm_agent
Oct 20 23:28:30 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:30.656+00:00" level=error msg="time=\"2024-10-20T23:28:30Z\" level=error msg=\"error while checking mongodb connection: server sele
ction error: context canceled, current topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: connect: connection refused },
] }. mongo_up is set to 0\" collector=general" agentID=/agent_id/11bb5598-ae08-4a2a-a3a6-eb4d6145957d component=agent-process type=mongodb_exporter
Oct 20 23:28:30 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:30.673+00:00" level=error msg="time=\"2024-10-20T23:28:30Z\" level=error msg=\"error while checking mongodb connection: server sele
ction error: context canceled, current topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: connect: connection refused },
] }. mongo_up is set to 0\" collector=general" agentID=/agent_id/11bb5598-ae08-4a2a-a3a6-eb4d6145957d component=agent-process type=mongodb_exporter
Oct 20 23:28:30 mongo-db1 pmm-agent[4189141]: time="2024-10-20T23:28:30.690+00:00" level=error msg="time=\"2024-10-20T23:28:30Z\" level=error msg=\"Cannot get node type: server selection error: contex
t canceled, current topology: { Type: Single, Servers: [{ Addr: mongo-db1.com:27017, Type: Unknown, Last error: dial tcp 172.27.238.77:27017: connect: connection refused }, ] }\" component=dia
gnosticDataCollector" agentID=/agent_id/11bb5598-ae08-4a2a-a3a6-eb4d6145957d component=agent-process type=mongodb_exporter
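
To see which server errors dominate the MariaDB entries above, the log text can be tallied by error code. This is a minimal sketch that assumes the `Error <code> (<SQLSTATE>)` format visible in those entries. The function name `count_mysql_error_codes` is illustrative.

```python
import re
from collections import Counter

# Matches fragments such as "Error 1193 (HY000)" or "Error 1146 (42S02)".
_MYSQL_ERROR = re.compile(r"Error (\d+) \(([0-9A-Z]+)\)")


def count_mysql_error_codes(log_text: str) -> Counter:
    """Count occurrences of each numeric server error code in the log text."""
    return Counter(code for code, _sqlstate in _MYSQL_ERROR.findall(log_text))
```

Passing the saved journal output for one host to this function returns a mapping from error code to the number of times it was logged.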

Code of Conduct

  • I agree to follow Percona Community Code of Conduct
Bobzemob added the "bug" (Bug report) label on Oct 22, 2024
Bobzemob changed the title on Oct 22, 2024 from "pmm-agent.service causes Too many open file crash on MongoDB and memory leaks on and MariaDB" to "pmm-agent.service causes Too many open files crash on MongoDB and memory leaks on MariaDB"