DNSSEC validation issues with generic/ubuntu1804 on Hyper-V #106

Open
Wenzel opened this issue Dec 16, 2019 · 8 comments

@Wenzel

Wenzel commented Dec 16, 2019

Hi, I'm cross-posting this issue from hashicorp/vagrant#11256

Basically, every time I try to use your image, it fails with a random network access issue in the guest.

Vagrant version

Vagrant 2.2.6

Host operating system

Windows 10 build 1909

Guest operating system

generic/ubuntu1804

Vagrantfile

Please look at the following repository:
https://github.com/Wenzel/vagrant-oswatcher

Provider: Hyper-V

Vagrant.configure("2") do |config|
    config.vm.box = "generic/ubuntu1804"
    config.vm.box_version = "2.0.6"
    config.vm.define "oswatcher"
    # OSWatcher local git repo
    oswatcher_path = "/path/to/oswatcher"

    config.vm.provider "hyperv" do |hyperv|
        hyperv.vmname = "oswatcher"
        hyperv.cpus = 2
        hyperv.memory = 2048
        # allow nested virtualization
        hyperv.enable_virtualization_extensions = true
    end

    config.vm.synced_folder ".", "/vagrant"
    # make oswatcher local path accessible as /vagrant/oswatcher
    config.vm.synced_folder oswatcher_path, "/vagrant/oswatcher"

    config.vm.provision "ansible_local" do |ansible|
        ansible.compatibility_mode = "2.0"
        ansible.playbook = "ansible/playbook.yml"
        ansible.extra_vars = {
          'oswatcher_path': "/vagrant/oswatcher",
        }
    end
end

Debug output

I don't believe Vagrant's debug logs are useful here,
since the box configuration seems to be the root cause of this issue.

Expected behavior

The box should have been provisioned without any issues.

Actual behavior

The Ansible playbook systematically fails because of a network issue:

  • Temporary failure resolving us.archive.ubuntu.com
  • 400 bad request
  • 404 not found

Or even before the Ansible playbook is executed, the network fails.
[screenshots: dns1, dns2, dns3]

And if it manages to execute the playbook, it systematically fails here:
[screenshot: dns4]

I had to add a retry statement in my Ansible task to force retry until the network is reachable:

    - name: install OSWatcher system dependencies
      package:
        name: "{{ item }}"
      with_items:
        - virtualenv
        - python3-virtualenv
        - libguestfs0
        - libguestfs-dev
        - python3-guestfs
        - python3-dev
        - pkg-config
        - libvirt-dev
      register: result
      until: result is succeeded
      retries: 3
      delay: 10

Result:
[screenshot: retry]

Steps to reproduce

  1. git clone https://github.com/Wenzel/vagrant-oswatcher
  2. git clone https://github.com/Wenzel/oswatcher
  3. Edit vagrant-oswatcher/Vagrantfile and set the local path to oswatcher's repo
  4. vagrant up --provider hyperv

➡️ Any ideas on what the root cause could be (systemd-resolved, hardcoded DNS servers, the IPv6 stack being disabled)?

Note: my host network is absolutely fine, I only have issues like this with Vagrant and your image so far.

Thank you for providing a Hyper-V Ubuntu image, guys!

@Wenzel
Author

Wenzel commented Dec 22, 2019

Hi,

Looking at journalctl in the VM, I can see DNS failures (NXDOMAIN and DNSSEC validation).
[screenshot: dns_failure]

Maybe systemd-resolved is configured to validate DNSSEC and us.archive.ubuntu.com has a faulty configuration?
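
A minimal sketch of how that guess could be checked from inside the guest. The helper name `configured_dnssec` is hypothetical, and it only reads the main config file, so drop-in files under /etc/systemd/resolved.conf.d/ are not considered. The commented commands assume Ubuntu 18.04, where the status tool is `systemd-resolve` (newer releases ship `resolvectl` instead).

```shell
# Print the DNSSEC= value set in a resolved.conf-style file.
# Prints "unset" when the option is absent or only present as a comment.
configured_dnssec() {
    conf="${1:-/etc/systemd/resolved.conf}"
    # Keep the last uncommented DNSSEC= assignment, if there is one.
    value=$(sed -n 's/^[[:space:]]*DNSSEC=[[:space:]]*//p' "$conf" | tail -n 1)
    echo "${value:-unset}"
}

# Possible follow-up checks, run by hand in the guest:
#   configured_dnssec                            # value from /etc/systemd/resolved.conf
#   systemd-resolve --status | grep -i dnssec    # what the running resolver reports
#   journalctl -u systemd-resolved --no-pager | grep -i dnssec
```

If the helper prints `yes`, the box is enforcing validation, and any upstream resolver that drops DNSSEC records would be a plausible cause of the failures.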

@ladar
Member

ladar commented Apr 30, 2020

Yeah, this is part of the discussion here ... but it sounds like the hardcoded public DNS servers might be overridden, and your local DNS servers don't support/properly forward DNSSEC information.

I'm open to suggestions for improvement? I'd like to avoid disabling DNSSEC validation, since that opens up security holes.

@ladar changed the title from "Network is unreliable in generic/ubuntu1804 box" to "DNSSEC validation issues with generic/ubuntu1804 on Hyper-V" on Apr 30, 2020
@Wenzel
Author

Wenzel commented May 12, 2020

> I'm open to suggestions for improvement? I'd like to avoid disabling DNSSEC validation, since that opens up security holes.

Well, I always prefer to have a reliable DNS resolution when I'm building a VM with Vagrant, even if the security is not state of the art.

With this image I have trust issues every time I type vagrant up... will it fail randomly 30 min later?

You can always build security later on, but reliability is a more pressing issue, I think.

@ladar
Member

ladar commented May 12, 2020

@Wenzel I'm a big fan of reliability, and I've wrestled with vagrant quite a bit with that goal in mind. Sadly, the number of host, guest, provider, and vagrant combinations is massive. If you add multiple versions and network setups into the mix, the combinations become infinite. As a result, the boxes I use more, on the platforms I use most, tend to have the most bug fixes and workarounds.

Long story short... can you tell me what files inside the guest need to be changed (and changed to what)? And which boxes the change should be applied to?

If there is an opportunistic setting I'm happy to incorporate it.

@NightTsarina

I have spent most of the day today trying to work around this issue. I am creating vagrant instances using generic/ubuntu2004, and I have not yet found a way to use the SSHFS plugin, because it tries to install sshfs before I have a chance to run a provisioning script to disable DNSSEC.

The fix is just to set DNSSEC=no in /etc/systemd/resolved.conf, but I can't find a way to do this early enough. It seems to me that this would be a sane default for the robox images, as many users will have difficult-to-solve DNS issues because of it.
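
A minimal sketch of that edit as a reusable shell function, assuming GNU sed (as shipped in the Ubuntu guests) and root privileges when pointed at the real file. The name `set_dnssec` is hypothetical and is not part of the robox scripts. The sketch does not solve the ordering problem described above; it only shows the change itself.

```shell
# set_dnssec VALUE [FILE]
# Set the DNSSEC= option in a resolved.conf-style file. An existing line,
# commented out or not, is rewritten in place; otherwise one is appended.
# Appending assumes the file already contains its [Resolve] section header.
set_dnssec() {
    value="$1"
    conf="${2:-/etc/systemd/resolved.conf}"
    if grep -Eq '^[[:space:]]*#?[[:space:]]*DNSSEC=' "$conf"; then
        sed -i -E "s/^[[:space:]]*#?[[:space:]]*DNSSEC=.*/DNSSEC=${value}/" "$conf"
    else
        printf 'DNSSEC=%s\n' "$value" >> "$conf"
    fi
}

# Example, run as root inside the guest:
#   set_dnssec no
#   systemctl restart systemd-resolved
```

Besides `yes` and `no`, systemd-resolved documents an `allow-downgrade` value, which attempts validation but falls back when the upstream server does not support DNSSEC. That may be close to the "opportunistic setting" asked about earlier in the thread, though whether it avoids the failures reported here is untested.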

@yajo

yajo commented Sep 8, 2021

FTR, I lost 3 work days before I got here.

@jerrac

jerrac commented Jan 10, 2022

I just ran into this issue with generic/ubuntu2004 on libvirtd. It looks like the ubuntu defaults are changed here:

sed -i -e "s/#DNSSEC=.*/DNSSEC=yes/g" /etc/systemd/resolved.conf
As are the defaults in all the other scripts/< ubuntu version >/network.sh scripts.

Is the issue actually that the DNS server that sits between the VM and the host isn't able to use DNSSEC? As in the dnsmasq instance libvirtd sets up, or whatever Windows Hyper-V does.

Any ideas how we could test that?
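
One possible way to test it, as a sketch: ask the intermediate resolver for a name in a signed zone with the DNSSEC OK bit set, and see whether any RRSIG records come back. The function names are hypothetical, `192.0.2.1` is a placeholder for whatever resolver address the guest received, and `ietf.org` is assumed to be a DNSSEC-signed zone.

```shell
# Read `dig +noall +answer` output on stdin; succeed if any record is an RRSIG.
# Answer lines look like: NAME TTL CLASS TYPE RDATA..., so TYPE is field 4.
has_rrsig() {
    awk '$4 == "RRSIG" { found = 1 } END { exit !found }'
}

# resolver_returns_rrsig NAME RESOLVER
# Query RESOLVER directly with DNSSEC records requested and check the answer.
resolver_returns_rrsig() {
    name="$1"
    resolver="$2"
    dig +dnssec +noall +answer "@${resolver}" "$name" A | has_rrsig
}

# Example (placeholder resolver address):
#   resolver_returns_rrsig ietf.org 192.0.2.1 && echo "RRSIG records returned"
```

If a public resolver returns RRSIG records for the same name but the resolver between the VM and the host does not, that would point at the intermediate resolver rather than at the queried zone.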

@jerrac

jerrac commented Jan 10, 2022

Judging from the man page, dnsmasq needs to be configured to support dnssec.

My libvirt dnsmasq conf file doesn't do that.

cat /var/lib/libvirt/dnsmasq/dockerstack1.conf
##WARNING:  THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
##OVERWRITTEN AND LOST.  Changes to this configuration should be made using:
##    virsh net-edit dockerstack1
## or other application using the libvirt API.
##
## dnsmasq conf file created by libvirt
strict-order
user=libvirt-dnsmasq
pid-file=/run/libvirt/network/dockerstack1.pid
except-interface=lo
bind-dynamic
interface=virbr4
dhcp-range=172.16.250.1,172.16.250.254,255.255.255.0
dhcp-no-override
dhcp-authoritative
dhcp-lease-max=254
dhcp-hostsfile=/var/lib/libvirt/dnsmasq/dockerstack1.hostsfile
addn-hosts=/var/lib/libvirt/dnsmasq/dockerstack1.addnhosts

And, according to ps, the running dnsmasq instance doesn't have the --dnssec flag set.
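
Both checks can be scripted, as a rough sketch. The function name is hypothetical; `dnssec` and `proxy-dnssec` are the dnsmasq options assumed to matter here, written without leading dashes in a conf file and with them on the command line.

```shell
# dnsmasq_conf_has_dnssec FILE
# Succeed if a dnsmasq conf file enables DNSSEC validation (dnssec) or
# passes upstream validation results through (proxy-dnssec).
dnsmasq_conf_has_dnssec() {
    grep -Eq '^[[:space:]]*(dnssec|proxy-dnssec)[[:space:]]*$' "$1"
}

# Examples, run on the libvirt host:
#   dnsmasq_conf_has_dnssec /var/lib/libvirt/dnsmasq/dockerstack1.conf
#   pgrep -a dnsmasq | grep -E -- '--(proxy-)?dnssec'   # same check on running processes
```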

libvirt says you can pass custom config into the dnsmasq files. But I think you'd need to also configure some trust anchors. So I'm not really inclined to try and make that work when I'm only using vagrant in a dev environment.

Anyway, I'm torn on what the actual solution should be. If I were working in a production environment, I'd definitely want to keep things more secure and put in the effort to configure DNSSEC. But my primary use of vagrant is dev, so I really don't want to put in that kind of effort. I guess it boils down to what the primary use of these boxes is. Are they meant to be used in production? Or mostly just dev?

For now I'll just manually update things to not use dnssec.

Hopefully my research helps someone down the line. :)
