Server fails to start after CentOS 7 update

From Cor ad Cor

The Problem

HTTPD failed after yum updated CentOS 7.

SSH also failed.

There were over 400 updates or installations in over 700 steps during the update. A few that caught my eye just now in reviewing the log:

Updated: centos-release-7-4.1708.el7.centos.x86_64
Updated: firewalld-filesystem-0.4.4.4-6.el7.noarch
Updated: iptables-1.4.21-18.0.1.el7.centos.x86_64
Updated: 1:NetworkManager-libnm-1.8.0-9.el7.x86_64

Updated: device-mapper-persistent-data-0.7.0-0.1.rc6.el7.x86_64

Updated: initscripts-9.49.39-1.el7.x86_64

Updated: cronie-anacron-1.4.11-17.el7.x86_64

Installed: 1:NetworkManager-1.8.0-9.el7.x86_64
Installed: 1:NetworkManager-ppp-1.8.0-9.el7.x86_64

Updated: cloud-init-0.7.9-9.el7.centos.2.x86_64

Installed: 1:grub2-2.02-0.64.el7.centos.x86_64

Updated: 1:NetworkManager-tui-1.8.0-9.el7.x86_64
Updated: 1:NetworkManager-team-1.8.0-9.el7.x86_64

Installed: kernel-3.10.0-693.2.2.el7.x86_64

Fortunately, I was able to access the server through an emergency console via the Rackspace cloud server interface. The emergency console never let me down all through this process. I'm so grateful for that small mercy!

The boot.log showed the heart of the matter:

Starting LSB: Bring up/down networking
Failed to start LSB: Bring up/down networking

It said to run systemctl status network.service for more details. The screen capture below shows the output from that command.

network.service failed to bring up networking because /etc/sysconfig/network-scripts/ifcfg-eth0 had been changed from "BOOTPROTO=static" to "BOOTPROTO=dhcp". The crucial piece of information in the screen capture is "Determining IP information for eth0... failed." The message "RTNETLINK answers: File exists" is just another symptom of the loss of the static definition of eth0.

Sometime during the update, the configuration for eth0 was changed from static to dhcp. Below is an old copy of ifcfg-eth0 from a server that I crashed in May of 2016. It tells eth0 that my IP address is "166.78.150.236". There is no way dhcp could figure that out by itself. The IP addresses assigned to my server, the gateway, and the DNS servers are precious bits of information that must be fed into the system on boot.

# Automatically generated, do not edit

# Label public
DEVICE=eth0
BOOTPROTO=static
HWADDR=bc:76:4e:05:75:f4
IPADDR=166.78.150.236
NETMASK=255.255.255.0
DEFROUTE=yes
GATEWAY=166.78.150.1
IPV6INIT=yes
IPV6_AUTOCONF=no
IPV6ADDR=2001:4800:7812:0514:7cbc:4d9b:ff05:75f4/64
IPV6_DEFAULTGW=fe80::def%eth0
DNS1=72.3.128.241
DNS2=72.3.128.240
ONBOOT=yes
NM_CONTROLLED=no

I didn't save a copy of the mangled file. The crucial change was BOOTPROTO=dhcp. No IPADDR or GATEWAY or DNS servers were configured in the system-generated file, either.
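
A quick way to confirm whether an ifcfg file has been flipped back to DHCP is to read the BOOTPROTO line directly. The following is only a minimal sketch, not something from the original troubleshooting session; the function names ifcfg_get and ifcfg_is_dhcp are made up for illustration, and it assumes simple KEY=value lines as in the listing above.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of any package).

# Print the value of KEY from an ifcfg-style file (KEY=value lines).
# Commented-out lines do not match; surrounding double quotes are stripped.
# If the key appears more than once, the last occurrence wins.
ifcfg_get() {
    file=$1
    key=$2
    sed -n "s/^[[:space:]]*${key}=//p" "$file" | tail -n 1 | sed 's/^"\(.*\)"$/\1/'
}

# Return 0 when the file asks for DHCP, non-zero otherwise.
ifcfg_is_dhcp() {
    [ "$(ifcfg_get "$1" BOOTPROTO)" = "dhcp" ]
}
```

For example, running ifcfg_is_dhcp /etc/sysconfig/network-scripts/ifcfg-eth0 && echo "eth0 is set to DHCP" after a reboot would show whether the static settings survived.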

SSH and HTTPD failed because eth0 was uninformed. My server was running OK in all other respects, but it was cut off from the internet.

The Solutions

Choose between network.service and NetworkManager.service

My system was trying to use network.service. Many advice pages recommend not trying to run both network.service and NetworkManager. See this one, for example.

Stopping NetworkManager and disabling it did not solve my problem as it did for that fellow.

I eventually decided to disable network.service and do my best to learn how NetworkManager works. I thought it was the direction that Red Hat is taking. I'm not so sure anymore.

use nmtui to examine eth1 and configure eth0

- "Network configuration using sysconfig files."
- nmcli connection reload
- nmcli dev disconnect interface-name
- nmcli con up interface-name
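
The nmcli steps in the list above can be strung together roughly as follows. This is a sketch under assumptions: the helper names run and nm_reapply and the DRY_RUN switch are invented here, and the last step uses the "ifname" form of "nmcli connection up" rather than a connection name, since the profile name (for example "System eth0") may differ from the interface name. Disconnecting eth0 over SSH would cut the session, so this kind of thing is best done from the emergency console.

```shell
#!/bin/sh
# Reload the ifcfg files and bounce one interface with nmcli.
# DRY_RUN defaults to 1 so the commands are only printed; set DRY_RUN=0
# to actually run them (needs root and a running NetworkManager).
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the three nmcli steps.
nm_reapply() {
    iface=$1
    run nmcli connection reload               # re-read the ifcfg-* files from disk
    run nmcli device disconnect "$iface"      # take the interface down
    run nmcli connection up ifname "$iface"   # bring it back up with the reloaded profile
}
```

With the default DRY_RUN=1, calling nm_reapply eth0 only prints the three commands, which makes it easy to check them before running anything for real.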

configure /etc/sysconfig/network-scripts/ifcfg-eth0 to be static

After a lot of tweaking and testing and rebooting, this is what I finally came up with for /etc/sysconfig/network-scripts/ifcfg-eth0:

#Label public

DEVICE=eth0
BOOTPROTO=static
HWADDR=BC:76:4E:05:85:D3
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
IPADDR=162.242.144.228
NETMASK=255.255.255.0
PREFIX=24
GATEWAY=162.242.144.1
DNS1=72.3.128.241
DNS2=72.3.128.240
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV4_ROUTE_METRIC=0
IPV4_DNS_PRIORITY=100
IPV6INIT=yes
IPV6_AUTOCONF=no
IPV6ADDR=2001:4800:7817:101:be76:4eff:fe05:85d3/64
IPV6_DEFAULTGW=fe80::def%eth0
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
# .... IPV6_ADDR_GEN_MODE=stable-privacy
IPV6_DNS_PRIORITY=100
NAME="System eth0"
ONBOOT=yes
UUID=5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03
NM_CONTROLLED=yes

stop cloud-init from rewriting ifcfg-eth0

No matter how many times I edited ifcfg-eth0 to salt it with the IP, gateway, and DNS server addresses, every time I rebooted, the BOOTPROTO=dhcp file came back, along with a line at the top that said "Automatically generated, do not edit." I blame cloud-init for the rewriting, but I never could figure out where it was getting the bad configuration from. /etc/resolv.conf was being rewritten as well, and quite stupidly, in fact: the warning line was added multiple times.

To stop cloud-init from mangling eth0, I created /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with this single line in it:

network: {config: disabled}

That stopped the system from trashing my static configuration. Unfortunately, with network.service also stopped, it left nothing to activate eth0 and eth1.
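
Creating that one-line cloud-init file can be scripted. The sketch below is illustrative only; the function name disable_cloud_init_network is hypothetical, and the directory is taken as an argument so it can be tried in a scratch directory before touching /etc/cloud/cloud.cfg.d.

```shell
#!/bin/sh
# Write the cloud-init drop-in that turns off its network configuration.
# The target directory defaults to /etc/cloud/cloud.cfg.d but can be
# overridden, e.g. with a scratch directory for a trial run.
disable_cloud_init_network() {
    dir=${1:-/etc/cloud/cloud.cfg.d}
    mkdir -p "$dir" || return 1
    printf '%s\n' 'network: {config: disabled}' > "$dir/99-disable-network-config.cfg"
}
```

On the real server this would be run as root with no argument; the resulting file holds exactly the single line quoted above.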

let NetworkManager manage the connections

In /etc/sysconfig/network-scripts/ifcfg-eth0 and /etc/sysconfig/network-scripts/ifcfg-eth1, I set "NM_CONTROLLED=yes". This pretty much ran contrary to all the advice I saw, but with network.service and cloud-init disabled, that is what it took to wake up the connections.
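
Flipping NM_CONTROLLED in both files by hand is easy to get wrong, so here is a small sketch of how the change could be scripted. The helper name ifcfg_set is made up for this example, and it assumes plain KEY=value lines and values that contain no "/" characters.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in an ifcfg-style file.
# Replaces the line if the key is already present, appends it otherwise.
# A .bak copy of the original file is kept next to it.
ifcfg_set() {
    file=$1 key=$2 value=$3
    cp -p "$file" "$file.bak" || return 1
    if grep -q "^${key}=" "$file"; then
        # Rewrite from the backup into the original file in place
        sed "s/^${key}=.*/${key}=${value}/" "$file.bak" > "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

It would be called once per interface, for example ifcfg_set /etc/sysconfig/network-scripts/ifcfg-eth0 NM_CONTROLLED yes and the same for ifcfg-eth1.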

start httpd after NetworkManager configures network

The final problem was that the Apache daemon was called before the network was configured and online. This may not be the smartest or most elegant solution, but it's working at present for me. I created /etc/NetworkManager/dispatcher.d/22-httpd as shown below. The system calls it when NetworkManager thinks it is open for business.

#!/bin/sh
# This is a NetworkManager dispatcher script to start httpd
# when the network is ready and stop it when the network goes down.  MXM
# /etc/NetworkManager/dispatcher.d/22-httpd
# https://wiki.archlinux.org/index.php/NetworkManager#Network_services_with_NetworkManager_dispatcher
# NetworkManager passes the interface name as $1 and the action as $2.

if [ "$2" = "up" ]; then
	systemctl start httpd
fi

if [ "$2" = "down" ]; then
	systemctl stop httpd
fi

exit 0
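
NetworkManager only runs dispatcher scripts that are regular executable files owned by root and not writable by group or other, so a script with the wrong permissions is silently skipped. The following is a rough sketch for checking the mode bits; the function name dispatcher_script_ok is invented for this example, and it deliberately does not check ownership, so that it can be tried on a scratch copy as an ordinary user (ownership can be confirmed separately with ls -l).

```shell
#!/bin/sh
# Hypothetical check: return 0 if FILE is a regular, executable file
# that is not writable by group or other. Ownership is NOT checked here.
dispatcher_script_ok() {
    file=$1
    [ -f "$file" ] || { echo "$file: not a regular file"; return 1; }
    [ -x "$file" ] || { echo "$file: not executable"; return 1; }
    # find prints the path only if the group-write or other-write bit is set
    if [ -n "$(find "$file" -prune \( -perm -020 -o -perm -002 \) -print)" ]; then
        echo "$file: writable by group or other"
        return 1
    fi
    return 0
}
```

A file that fails the check can usually be fixed with chmod 755 and, as root, chown root:root on /etc/NetworkManager/dispatcher.d/22-httpd.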