Central on Prem
As with other things, the domain is mobility.nis.vt.edu. For example, the hostname central has the FQDN central.mobility.nis.vt.edu.
Hostname | Interface | IPv4 |
---|---|---|
central | ens1f0 | 198.82.169.222/24 |
central-node-1 | ens1f0 | 198.82.169.223/24 |
central-node-2 | ens1f0 | 198.82.169.224/24 |
central-node-3 | ens1f0 | 198.82.169.225/24 |
central-node-4 | ens1f0 | 198.82.169.226/24 |
central-node-5 | ens1f0 | 198.82.169.227/24 |
Additional VIP hostnames:
- central-central
- apigw-central
- ccs-user-api-central
- sso-central
POD IP Range: 10.0.0.0/16
Service IP Range: 10.1.0.0/16
iLO Configuration
Access credentials
- Local credentials only
- See password repository for details
Network
iLO Dedicated Network Port > IPv4:
- Not posting IPs because iLO is hella insecure. They are documented in the NEO password repo.
- DNS: 172.19.128.3
- IPv6 is currently not configured.
iLO Dedicated Network Port > SNTP:
- Disable DHCPv4/6 Supplied Time Settings
- Disable Propagate NTP Time to Host
- Primary Time Server: 172.19.131.253
- Secondary Time Server: conehead or grub
- Time Zone: Bogota, Lima, Quito, Eastern Time (US & Canada) (GMT-05:00:00)
NOTE: Changing SNTP values will likely require an iLO reset.
Monitoring
SNMP
Management > SNMP Settings:
- System location: ISB 118
- System contact: nis-wifi-g@vt.edu
- System role: Central on Prem
- System Role Detail: Node 1, Node 2, ...
- Disable SNMPv1
- SNMPv3 Users:
  - Security Name: nisnmp
  - See password repo for credentials
  - User Engine ID: blank
- SNMP Alert Destinations:
  - akips.nis.ipv4.vt.edu
  - Trap Community: blank
  - SNMP Protocol: SNMPv3 Inform
  - SNMPv3 User: nisnmp
Syslog
Management > Remote Syslog:
- Enable iLO Remote Syslog
- Remote Syslog Port: 514
- Remote Syslog Server: akips.nis.ipv4.vt.edu
Disable iLO Federation
iLO Federation > Setup:
- Delete the default group
- Disable multicast options:
  - iLO Federation Management
  - Multicast Discovery
IPv6
IPv6 is not supported at all. There is no way to configure an IPv6 address. Not only that, but when configuring the network settings, we see:
Created symlink /etc/systemd/system/basic.target.wants/disable-ipv6.service → /etc/systemd/system/disable-ipv6.service.
SMTP
Allowlist for mailrelay.smtp.vt.edu:
198.82.169.222,central.mobility.nis.vt.edu,"Central on Prem neo-central@vt.edu","NIS"
198.82.169.223,central-node-1.mobility.nis.vt.edu,"Central on Prem neo-central@vt.edu","NIS"
198.82.169.224,central-node-2.mobility.nis.vt.edu,"Central on Prem neo-central@vt.edu","NIS"
198.82.169.225,central-node-3.mobility.nis.vt.edu,"Central on Prem neo-central@vt.edu","NIS"
198.82.169.226,central-node-4.mobility.nis.vt.edu,"Central on Prem neo-central@vt.edu","NIS"
198.82.169.227,central-node-5.mobility.nis.vt.edu,"Central on Prem neo-central@vt.edu","NIS"
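The allowlist entries above follow a fixed pattern, so they can be regenerated rather than typed by hand. The following is a minimal sketch, assuming the node addresses stay consecutive as in the table at the top of this page; the function names `allowlist_line` and `generate_allowlist` are made up for illustration.

```shell
# Sketch: regenerate the mail relay allowlist lines from hostname/IP pairs.
# Assumption: node N uses 198.82.169.(222 + N), as in the host table.
domain="mobility.nis.vt.edu"

allowlist_line() {
  # $1 = short hostname, $2 = IPv4 address (without the /24 prefix length)
  printf '%s,%s.%s,"Central on Prem neo-central@vt.edu","NIS"\n' "$2" "$1" "$domain"
}

generate_allowlist() {
  allowlist_line central 198.82.169.222
  for n in 1 2 3 4 5; do
    allowlist_line "central-node-$n" "198.82.169.$((222 + n))"
  done
}
```

Running `generate_allowlist` prints the six lines in the same order as the list above, which makes it easy to diff against what the mail relay currently has.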
Parts for redundancy
iLO Administrator and firmware password
The iLO "Administrator" account uses a password derived from the baseboard serial number. This is done by the COP installation media. The same password is used for access to the firmware interface.
NOTE: This means that the serial numbers of the nodes are sensitive information! They are stored in the NEO password vault.
The script itself derives the password with the following commands (and some unnecessary file and variable creation...):
dmidecode -t baseboard \
| grep Serial \
| grep -o '[^ ]\+$' \
| md5sum \
| grep -Eo '^[^ ]+' \
| cut -c1-8
We can simplify this to:
dmidecode -s baseboard-serial-number | md5sum | head -c 8
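To check that the short form really matches the original pipeline without needing root or `dmidecode`, both can be wrapped in functions that take the serial as an argument. This is only a sketch: the function names and the serial `EXAMPLE123` are made up, and the legacy function is fed a line shaped like the "Serial Number:" line from `dmidecode -t baseboard`.

```shell
# Simplified form: hash the serial plus trailing newline, keep the first 8 hex characters.
derive_password() {
  printf '%s\n' "$1" | md5sum | head -c 8
}

# Original pipeline, fed a fake "Serial Number:" line instead of real dmidecode output.
derive_password_legacy() {
  printf '\tSerial Number: %s\n' "$1" \
    | grep Serial \
    | grep -o '[^ ]\+$' \
    | md5sum \
    | grep -Eo '^[^ ]+' \
    | cut -c1-8
}

serial="EXAMPLE123"   # made-up serial, not a real node
if [ "$(derive_password "$serial")" = "$(derive_password_legacy "$serial")" ]; then
  echo "pipelines agree"
fi
```

One caveat: the original pipeline keeps only the last space-separated token of the serial line, so the two forms would differ for a serial number that contains spaces.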
Managing the RAID from a live environment
HPE has a variation of secure boot enabled, so we cannot just boot to whatever we want. However, secure boot is just looking for something signed by Canonical... so just grab Ubuntu and be off. Other distros signed with common keys may or may not work, but COP is built on Ubuntu 18.04, so that is the least likely to cause issues.
Unlike the COP ISO, the Ubuntu image can be dd'd to a USB drive to create bootable media.
iLO can also be used to mount virtual media to boot from.
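For the USB route, writing the image might look like the sketch below. The function name `write_image` is made up, and the ISO filename and `/dev/sdX` in the comment are placeholders; confirm the target device with `lsblk` first, since `dd` overwrites whatever it is pointed at.

```shell
# Sketch: write an installer image to a target and flush it to disk.
# $1 = source image, $2 = target (a whole block device in real use, not a partition)
write_image() {
  dd if="$1" of="$2" bs=4M conv=fsync status=progress
}

# Real use needs root, so the equivalent direct command would be
# (hypothetical names):
#   sudo dd if=ubuntu-18.04.iso of=/dev/sdX bs=4M conv=fsync status=progress
```

`conv=fsync` makes `dd` flush the data before exiting, so the drive is safe to remove once the command returns.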
Add HPE repositories
The ssacli utility allows us to reconfigure the RAID setup. The best way to get this is by adding the HPE software delivery repository Management Component Pack.
/etc/apt/sources.list.d/mcp.list:
# HPE Management Component Pack
deb https://downloads.linux.hpe.com/SDR/repo/mcp bionic/current non-free
Now, install the keys:
curl https://downloads.linux.hpe.com/SDR/hpPublicKey2048.pub | sudo apt-key add -
curl https://downloads.linux.hpe.com/SDR/hpPublicKey2048_key1.pub | sudo apt-key add -
curl https://downloads.linux.hpe.com/SDR/hpePublicKey2048_key1.pub | sudo apt-key add -
Then update the repositories and install the utility:
sudo apt update
sudo apt install ssacli
Convert array to RAID 10
This will take a long time. If building a new system, create a new array instead of migrating an existing one.
# ssacli
=> ctrl slot=0 ld 1 add drives=allunassigned
=> ctrl slot=0 ld 1 show status
logicaldrive 1 (3.49 TB, RAID 0): Transforming, 0.83%
=> ctrl slot=0 ld 1 show status
logicaldrive 1 (3.49 TB, RAID 0): Transforming, 0.83%
=> ctrl slot=0 ld 1 modify raid=1+0
=> ctrl slot=0 ld 1 show status
logicaldrive 1 (3.49 TB, RAID 1+0): Transforming, 0.07%
=> ctrl slot=0 ld 1 show status
logicaldrive 1 (3.49 TB, RAID 1+0): OK
=>
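Since the transformation takes a long time, it can be convenient to poll the status from a script instead of retyping the command. `ssacli` also accepts a command as arguments outside the interactive prompt, so a minimal sketch, assuming the status line keeps the format shown in the session above, could be the following. The function names and the 60 second interval are arbitrary choices.

```shell
# Extract the state text (for example "Transforming, 0.07%" or "OK")
# from a "logicaldrive N (...): STATE" status line read on stdin.
ld_state() {
  sed -n 's/^[[:space:]]*logicaldrive [0-9][0-9]* (.*): //p'
}

# Poll logical drive 1 on the controller in slot 0 until it reports OK.
# Needs root and ssacli; the interval is an arbitrary choice.
wait_for_ld_ok() {
  until [ "$(ssacli ctrl slot=0 ld 1 show status | ld_state)" = "OK" ]; do
    sleep 60
  done
}
```

Call `wait_for_ld_ok` as root once the `modify raid=1+0` command has been issued; it returns when the drive reports OK.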
Build a new RAID 10 array
This is a destructive process, but much faster than migrating an array. It is necessary to install COP from an ISO afterwards.
# ssacli
=> ctrl slot=0 ld 1 delete
[confirm]
=> ctrl slot=0 create type=ld drives=allunassigned raid=1+0
=>
Drive replacement (RAID 0)
A failed drive in a RAID 0 array is catastrophic, so COP must be re-installed from the ISO afterwards.
- Physically replace the bad drive with a good one
- Reboot the system
- Press F9 during the boot to enter System Utilities, a BIOS-like environment. You may need to press F1 to continue past the warning message (telling you a drive has failed and been replaced).
- Select "System Configuration"
- Select "Embedded RAID 1: HPE Smart Array P408i-a SR Gen 10"
- Select "Array Configuration"
- Select "Manage Arrays"
- Select "Array A"
- Select "List Logical Drives"
- Select "Logical Drive 1 (...)"
- Select "Re-Enable Logical Drive"
- Confirm that you want to Re-Enable the Logical Drive. We are not expecting the data to be recoverable.
- Exit the menus until you can exit the system utilities. Re-enabling the array does not count as a change, so there is no need to save.