Server fails to start after CentOS 7 update

From Cor ad Cor

The Problem

httpd failed to start after yum updated CentOS 7.

SSH also failed.

There were over 400 updates or installations in over 700 steps during the update. A few that caught my eye just now in reviewing the log:

Updated: centos-release-7-4.1708.el7.centos.x86_64
Updated: firewalld-filesystem-0.4.4.4-6.el7.noarch
Updated: iptables-1.4.21-18.0.1.el7.centos.x86_64
Updated: 1:NetworkManager-libnm-1.8.0-9.el7.x86_64

Updated: device-mapper-persistent-data-0.7.0-0.1.rc6.el7.x86_64

Updated: initscripts-9.49.39-1.el7.x86_64

Updated: cronie-anacron-1.4.11-17.el7.x86_64

Installed: 1:NetworkManager-1.8.0-9.el7.x86_64
Installed: 1:NetworkManager-ppp-1.8.0-9.el7.x86_64

Updated: cloud-init-0.7.9-9.el7.centos.2.x86_64

Installed: 1:grub2-2.02-0.64.el7.centos.x86_64

Updated: 1:NetworkManager-tui-1.8.0-9.el7.x86_64
Updated: 1:NetworkManager-team-1.8.0-9.el7.x86_64

Installed: kernel-3.10.0-693.2.2.el7.x86_64

Fortunately, I was able to access the server through an emergency console via the Rackspace cloud server interface. The emergency console never let me down all through this process. I'm so grateful for that small mercy!

The boot.log showed the heart of the matter:

Starting LSB: Bring up/down networking
Failed to start LSB: Bring up/down networking

It said to run systemctl status network.service for more details. The screen capture below shows the output from that command.

network.service failed to bring up networking because /etc/sysconfig/network-scripts/ifcfg-eth0 had been changed from BOOTPROTO=static to BOOTPROTO=dhcp. The crucial piece of information in the screen capture is the line "Determining IP information for eth0... failed."

Sometime during the update, the configuration for eth0 was changed from static to DHCP. Below is an old copy of ifcfg-eth0 from a server that I crashed in May of 2016. It tells eth0 that my IP address is 166.78.150.236. There is no way DHCP could figure that out by itself. It is a precious bit of information that must be fed into the system on boot.

# Automatically generated, do not edit

# Label public
DEVICE=eth0
BOOTPROTO=static
HWADDR=bc:76:4e:05:75:f4
IPADDR=166.78.150.236
NETMASK=255.255.255.0
DEFROUTE=yes
GATEWAY=166.78.150.1
IPV6INIT=yes
IPV6_AUTOCONF=no
IPV6ADDR=2001:4800:7812:0514:7cbc:4d9b:ff05:75f4/64
IPV6_DEFAULTGW=fe80::def%eth0
DNS1=72.3.128.241
DNS2=72.3.128.240
ONBOOT=yes
NM_CONTROLLED=no

I didn't save a copy of the mangled file. The crucial change was BOOTPROTO=dhcp. No IPADDR or GATEWAY or DNS servers were configured in the system-generated file, either.
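A quick check from the emergency console would have caught the change sooner. The sketch below reads the BOOTPROTO value out of an ifcfg file. The function name get_bootproto is hypothetical, and it assumes the file uses plain KEY=value lines like the ones shown above.

```shell
#!/bin/sh
# Print the BOOTPROTO value from an ifcfg file (hypothetical helper).
# Usage: get_bootproto /etc/sysconfig/network-scripts/ifcfg-eth0
get_bootproto() {
    # Keep only the last BOOTPROTO= assignment, then strip any quotes.
    sed -n 's/^[[:space:]]*BOOTPROTO=//p' "$1" | tail -n 1 | tr -d "\"'"
}
```

If the function prints dhcp for an interface that should have a fixed address, the file has been rewritten.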

SSH and httpd failed because eth0 never received an IP address. My server was dead in the water.

The Solutions

Choose between network.service and NetworkManager.service

My system was trying to use network.service. Many advice pages recommend not trying to run both network.service and NetworkManager. See this one, for example.

Stopping NetworkManager and disabling it did not solve my problem as it did for that fellow.

I eventually decided to disable network.service and do my best to learn how NetworkManager works. I thought it was the direction that Red Hat is taking. I'm not so sure any more.

Use nmtui to examine eth1 and configure eth0

- "Network configuration using sysconfig files."
- nmcli connection reload
- nmcli dev disconnect interface-name
- nmcli con up interface-name
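The three nmcli steps above can be wrapped in one small function, sketched below. The names run and reapply_ifcfg and the DRY_RUN switch are hypothetical. This sketch brings the connection back up with "ifname", which activates whatever connection NetworkManager finds for that device; that is an assumption about the setup, not something taken from the notes above.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Re-read the ifcfg files and bounce one interface (hypothetical helper).
# Usage: reapply_ifcfg eth0
reapply_ifcfg() {
    iface="$1"
    run nmcli connection reload &&
    run nmcli device disconnect "$iface" &&
    run nmcli connection up ifname "$iface"
}
```

Disconnecting the interface that carries an SSH session cuts that session off, so this is best run from the emergency console. Setting DRY_RUN=1 first prints the commands without touching the network.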

Configure /etc/sysconfig/network-scripts/ifcfg-eth0 to be static

After a lot of tweaking and testing and rebooting, this is what I finally came up with for /etc/sysconfig/network-scripts/ifcfg-eth0:

#Label public

DEVICE=eth0
BOOTPROTO=static
HWADDR=BC:76:4E:05:85:D3
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
IPADDR=162.242.144.228
NETMASK=255.255.255.0
PREFIX=24
GATEWAY=162.242.144.1
DNS1=72.3.128.241
DNS2=72.3.128.240
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV4_ROUTE_METRIC=0
IPV4_DNS_PRIORITY=100
IPV6INIT=yes
IPV6_AUTOCONF=no
IPV6ADDR=2001:4800:7817:101:be76:4eff:fe05:85d3/64
IPV6_DEFAULTGW=fe80::def%eth0
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
# .... IPV6_ADDR_GEN_MODE=stable-privacy
IPV6_DNS_PRIORITY=100
NAME="System eth0"
ONBOOT=yes
UUID=5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03
NM_CONTROLLED=yes
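The file above carries both NETMASK and PREFIX, which describe the same thing and have to agree. A minimal sketch for checking that is a netmask-to-prefix conversion. The function name netmask_to_prefix is hypothetical, and the sketch does not verify that the mask bits are contiguous.

```shell
#!/bin/sh
# Convert a dotted-quad netmask to a prefix length (hypothetical helper).
# Usage: netmask_to_prefix 255.255.255.0
netmask_to_prefix() {
    prefix=0
    old_ifs=$IFS
    IFS=.
    # Split the netmask on dots into the positional parameters.
    set -- $1
    IFS=$old_ifs
    for octet in "$@"; do
        case "$octet" in
            255) prefix=$((prefix + 8)) ;;
            254) prefix=$((prefix + 7)) ;;
            252) prefix=$((prefix + 6)) ;;
            248) prefix=$((prefix + 5)) ;;
            240) prefix=$((prefix + 4)) ;;
            224) prefix=$((prefix + 3)) ;;
            192) prefix=$((prefix + 2)) ;;
            128) prefix=$((prefix + 1)) ;;
            0) ;;
            *) echo "invalid netmask octet: $octet" >&2; return 1 ;;
        esac
    done
    echo "$prefix"
}
```

For the NETMASK in the file above, the result should match the PREFIX line.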

Stop cloud-init from rewriting ifcfg-eth0

No matter how many times I edited ifcfg-eth0 to salt it with the IP, gateway, and DNS server addresses, every time I rebooted, the BOOTPROTO=dhcp file came back, along with a line at the top that said "Automatically generated, do not edit." I blame cloud-init for the rewriting, but I never could figure out where it was getting the bad configuration from. Similarly, /etc/resolv.conf was also being rewritten, and quite stupidly, in fact: the warning line was added multiple times.

To stop cloud-init from mangling eth0, I created /99-disable-network-config.cfg with this single line:

network: {config: disabled}
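Creating that file can be scripted. In the sketch below, the function name write_disable_network_cfg is hypothetical, and the default directory /etc/cloud/cloud.cfg.d is an assumption based on where cloud-init normally reads drop-in configuration; the note above gives only the file name.

```shell
#!/bin/sh
# Write the cloud-init drop-in that disables its network configuration.
# The directory is a parameter; /etc/cloud/cloud.cfg.d is an assumed default.
# Usage: write_disable_network_cfg [directory]
write_disable_network_cfg() {
    dir="${1:-/etc/cloud/cloud.cfg.d}"
    printf '%s\n' 'network: {config: disabled}' > "$dir/99-disable-network-config.cfg"
}
```

Passing a scratch directory first makes it easy to inspect the result before writing to the real location as root.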


Start httpd after NetworkManager configures network

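The final problem was that httpd needed to start only after NetworkManager had configured the network. This dispatcher script takes care of that: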

#!/bin/sh
# This is a NetworkManager dispatcher script to turn http on
# when the network is ready.  MXM
# /etc/NetworkManager/dispatcher.d/22-httpd
# https://wiki.archlinux.org/index.php/NetworkManager#Network_services_with_NetworkManager_dispatcher

if [ "$2" = "up" ]; then
	systemctl start httpd
fi

if [ "$2" = "down" ]; then
	systemctl stop httpd
fi

exit 0
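NetworkManager passes the interface name to a dispatcher script as $1 and the action as $2. The script above reacts to every interface, so a "down" event on eth1 would also stop httpd. A possible refinement, sketched here, reacts to one interface only. The function name httpd_action_for and the default of eth0 are assumptions for illustration.

```shell
#!/bin/sh
# Decide what to do with httpd for a given interface and action.
# Prints "start", "stop", or nothing (hypothetical helper).
# Usage: httpd_action_for INTERFACE ACTION [WANTED_INTERFACE]
httpd_action_for() {
    iface="$1"
    action="$2"
    wanted="${3:-eth0}"
    # Ignore events for any interface other than the one we care about.
    [ "$iface" = "$wanted" ] || return 0
    case "$action" in
        up) echo start ;;
        down) echo stop ;;
    esac
}
```

Inside a dispatcher script this could be used as act=$(httpd_action_for "$1" "$2"), followed by systemctl "$act" httpd when act is not empty. NetworkManager only runs dispatcher scripts that are executable and owned by root.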