
Cockpit fails to launch in pods #20829

Open
jrafanie opened this issue Nov 18, 2020 · 7 comments

Comments

@jrafanie
Member

I'm not sure if this works in appliances, but in pods it fails after #20827 and #20823. Opening here since the problematic code is in core.

{"@timestamp":"2020-11-18T22:51:15.378843 ","hostname":"orchestrator-6b56cdc5fd-plh9j","pid":1293,"tid":"2ab9f2e63968","level":"info","message":"MiqCockpitWsWorker::Runner started. ID [24], PID [1293], GUID [88d8560d-9edf-4a85-8fe5-758e78ee4de4], Zone [default], Role [automate,cockpit_ws,database_operations,database_owner,ems_inventory,ems_operations,event,remote_console,reporting,scheduler,smartstate,user_interface,web_services]"}
[----] I, [2020-11-18T22:51:15.379001 #1293:2ab9f2e63968]  INFO -- : MiqCockpitWsWorker::Runner started. ID [24], PID [1293], GUID [88d8560d-9edf-4a85-8fe5-758e78ee4de4], Zone [default], Role [automate,cockpit_ws,database_operations,database_owner,ems_inventory,ems_operations,event,remote_console,reporting,scheduler,smartstate,user_interface,web_services]
[----] I, [2020-11-18T22:51:15.427438 #1293:2ab9f2e63968]  INFO -- : MIQ(MiqCockpitWsWorker::Runner#stop_drb_service) MIQ(MiqCockpitWsWorker::Runner) stopped drb Process at
[----] I, [2020-11-18T22:51:15.437525 #1293:2ab9f2e63968]  INFO -- : MIQ(MiqCockpitWsWorker::Runner#start_drb_service) MIQ(MiqCockpitWsWorker::Runner) Started drb Process at drbunix:///tmp/cockpit20201118-1293-gp32ay
[----] I, [2020-11-18T22:51:15.437711 #1293:2ab9f2e63968]  INFO -- : MIQ(MiqCockpitWsWorker::Runner#start_cockpit_ws) MIQ(MiqCockpitWsWorker::Runner) Starting cockpit-ws Process
[----] I, [2020-11-18T22:51:15.437848 #1293:2ab9f2e63968]  INFO -- : MIQ(MiqCockpitWsWorker::Runner#cockpit_ws_run) MIQ(MiqCockpitWsWorker::Runner) cockpit-ws process starting
{"@timestamp":"2020-11-18T22:51:15.427282 ","hostname":"orchestrator-6b56cdc5fd-plh9j","pid":1293,"tid":"2ab9f2e63968","level":"info","message":"MIQ(MiqCockpitWsWorker::Runner#stop_drb_service) MIQ(MiqCockpitWsWorker::Runner) stopped drb Process at "}
{"@timestamp":"2020-11-18T22:51:15.437346 ","hostname":"orchestrator-6b56cdc5fd-plh9j","pid":1293,"tid":"2ab9f2e63968","level":"info","message":"MIQ(MiqCockpitWsWorker::Runner#start_drb_service) MIQ(MiqCockpitWsWorker::Runner) Started drb Process at drbunix:///tmp/cockpit20201118-1293-gp32ay"}
{"@timestamp":"2020-11-18T22:51:15.437627 ","hostname":"orchestrator-6b56cdc5fd-plh9j","pid":1293,"tid":"2ab9f2e63968","level":"info","message":"MIQ(MiqCockpitWsWorker::Runner#start_cockpit_ws) MIQ(MiqCockpitWsWorker::Runner) Starting cockpit-ws Process"}
{"@timestamp":"2020-11-18T22:51:15.437763 ","hostname":"orchestrator-6b56cdc5fd-plh9j","pid":1293,"tid":"2ab9f2e63968","level":"info","message":"MIQ(MiqCockpitWsWorker::Runner#cockpit_ws_run) MIQ(MiqCockpitWsWorker::Runner) cockpit-ws process starting"}
[----] E, [2020-11-18T22:51:15.448402 #1293:2ab9f2e63968] ERROR -- : AwesomeSpawn: which exit code: 1
[----] E, [2020-11-18T22:51:15.448608 #1293:2ab9f2e63968] ERROR -- : AwesomeSpawn: which: no apachectl in (/opt/manageiq/manageiq-gemset/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)

{"@timestamp":"2020-11-18T22:51:15.448188 ","hostname":"orchestrator-6b56cdc5fd-plh9j","pid":1293,"tid":"2ab9f2e63968","level":"err","message":"AwesomeSpawn: which exit code: 1"}
{"@timestamp":"2020-11-18T22:51:15.448510 ","hostname":"orchestrator-6b56cdc5fd-plh9j","pid":1293,"tid":"2ab9f2e63968","level":"err","message":"AwesomeSpawn: which: no apachectl in (/opt/manageiq/manageiq-gemset/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)\n"}

I think it's failing in code that expects access to apachectl, or Apache in general:

has_apache = current_ui_server?(miq_server) ? MiqEnvironment::Command.supports_apache? : true
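For context, a minimal Ruby sketch of what a check like `MiqEnvironment::Command.supports_apache?` boils down to (an illustration, not the actual ManageIQ implementation): it looks for `apachectl` on the PATH, which is exactly the `which apachectl` call failing in the logs above.

```ruby
# Illustrative sketch only -- not the real MiqEnvironment::Command code.
# The supports_apache? check effectively asks "is apachectl on the PATH?",
# which is what the failing `which apachectl` in the logs is doing.
def supports_apache?
  ENV["PATH"].split(File::PATH_SEPARATOR).any? do |dir|
    File.executable?(File.join(dir, "apachectl"))
  end
end

# In the pod image apachectl is not installed, so this returns false.
puts supports_apache?
```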

The monitor thread launches but the runner seems to fail. Thankfully, it doesn't appear to restart constantly.

Related to ManageIQ/manageiq-pods#531
and ManageIQ/manageiq-pods#595

@jrafanie
Member Author

Note, see this comment showing how cockpit still doesn't work in pods, but at least it doesn't thrash the system anymore.

@kbrock
Member

kbrock commented Jun 4, 2021

Users or administrators should never need to start this program as it is automatically started by systemd(1) on bootup
ref

cockpit will then run various services to administer this server.
I need to look a little closer to see if our goal is to run the bridge, which allows us to skip security. We may have wanted to do this since we can use our user lookup tables to look up security and not rely upon the OS. I'm not saying this is the way we want to go, just offering a guess as to the intent.

I don't think we want this running on our main server, and not as a local custom service either.
They do implement single sign-on type logic, but that is via IPA.

@Fryguy
Member

Fryguy commented Jun 24, 2021

Wonder if it's mostly because we never merged ManageIQ/manageiq-pods#97

@Fryguy
Member

Fryguy commented Jun 24, 2021

To be more accurate, we don't have apache in the pods either, so that makes sense.

@Fryguy
Member

Fryguy commented Jun 24, 2021

So I dug into this with @jrafanie and this is how it works, more or less:

The cockpit integration is more or less like a remote console where users can proxy cockpit traffic to some other machine (in our case a Vm, Host, or a ContainerNode) through the appliance that has the cockpit role.

When someone turns on the cockpit role, a thread is started (used to be a full blown worker, but now it's just a thread). The thread eventually checks that Apache is available [1], but that's mostly not important anymore because we have the apache config baked into our appliance [2]. In the past that configuration was actually dynamically generated, but not any longer. This is what is currently failing in pods, because which is not available. That seems like a rather simple fix to include which, but it will likely still fail as we move forward because of what else it does.

Eventually it will try to start cockpit-ws as a child process. What this tool does is start a local webservice on port 9002 and is designed to proxy cockpit traffic to other systems. With the local appliance's apache config set to redirect to localhost:9002 [3], this effectively exposes that webservice through our Apache instance on the appliance.
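A rough Ruby sketch of that child-process step, assuming a plain `Process.spawn` and the port mentioned above (the real runner's flags and lifecycle handling differ):

```ruby
# Illustrative sketch, not the actual MiqCockpitWsWorker::Runner code.
# cockpit-ws listens locally on 9002; Apache reverse-proxies /cws/ to it.
COCKPIT_WS_PORT = 9002

def start_cockpit_ws
  # Spawn cockpit-ws as a child process and let it run in the background;
  # Process.detach reaps the child so it doesn't become a zombie.
  pid = Process.spawn("cockpit-ws", "--port", COCKPIT_WS_PORT.to_s)
  Process.detach(pid)
  pid
end
```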

In the ManageIQ UI, a button is tied to the cockpit of a Vm, Node, or ContainerNode by presenting a URL that looks roughly like https://<hostname_of_miq_server_with_cockpit_role>/cws/=<ip_or_hostname_of_remote_server>. That URL goes through that appliance's Apache, redirects to its localhost:9002, and cockpit-ws sends that traffic over to the real server, and then it's all reverse proxied back to the user.
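The URL construction described above can be illustrated with a tiny helper (the method name is made up for illustration; only the URL shape comes from the discussion):

```ruby
# Hypothetical helper -- not an actual ManageIQ method -- showing the
# shape of the proxied cockpit URL described above.
def cockpit_proxy_url(cockpit_role_server_host, target_address)
  "https://#{cockpit_role_server_host}/cws/=#{target_address}"
end

puts cockpit_proxy_url("miq.example.com", "10.1.2.3")
# => https://miq.example.com/cws/=10.1.2.3
```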

So, overall, the cockpit integration is, IMO, a glorified remote console, just instead of binary console traffic it's cockpit https traffic being proxied through the appliance that has the cockpit role. The rationale for it makes sense and was described in #12506:

ManageIQ currently links to cockpit by providing a web interface button. This takes users to https://domain.or.ip:9090. This does not work well for many common setups, because:

  1. The target server must be reachable by the end-user's machine via the browser. This doesn't work when the target servers are not routable from the user's network, or are behind firewalls where port 9090 is not exposed publicly.
  2. The target server needs to expose a certificate that the user's browser trusts. This can be problematic especially when addressing machines directly by IP. Asking users to accept self-signed certificates is not good practice.

I sort of lump this together conceptually with other remote consoles. So, the question is, do we keep it or remove it? If we keep it, how can we do this in podified? In my opinion, since I see it like another remote console, whatever decision we make probably has similar rationales for keeping or removing other remote consoles as well.

If we keep it, I think what we should do is either bake this into the remote console worker instead of in the manageiq-orchestrator where it currently lives as a thread, or we should expose it as a separate worker that the httpd container can route to. Either way, that would allow us to keep some parity between podified and appliances. We will probably need to also investigate whether cockpit-ws itself can be run inside a container, and more importantly as non-root.

cc @chessbyte @agrare @jrafanie @kbrock

@jrafanie
Member Author

This is what is currently failing in pods, because which is not available. That seems like a rather simple fix to include which, but it will likely still fail as we move forward because of what else it does.

Just to clarify, which exists in the podified cockpit worker (a thread in the server's monitor code) but apachectl doesn't:

AwesomeSpawn: which: no apachectl in (/opt/manageiq/manageiq-gemset/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)

@miq-bot
Member

miq-bot commented Feb 27, 2023

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

@Fryguy Fryguy removed the stale label Mar 2, 2023