Overview
PKS 1.1.5 was recently released and has a number of important bug fixes and improvements. These include:
- Support for NSX-T 2.2
- NCP 2.2.1
- TLS support for Kubernetes Ingress
- NCP is no longer a Kubernetes Pod and is now a Linux process running on the master nodes
- NCP no longer creates duplicate virtual servers when it restarts, which used to happen whenever a master VM was restarted. This was a problem because the LB supports only 10 virtual servers. Once that limit was reached, you could no longer create Kubernetes load-balanced services or ingresses.
- All virtual servers are now removed when deleting multi-port Kubernetes services. Previously virtual servers would be left behind, which again would cause the LB to hit the maximum of 10 virtual servers.
- Running the pks delete-cluster command now cleans up NSX-T related resources even if the cluster is in a bad state. Previously this required running the PKS NSX-T cleanup script.
See the release notes for more information.
Recovering from some of these issues required running multiple API calls and was kind of a pain. I’ve been putting the 1.1.5 release through its paces in multiple lab environments and it’s resolved all of the issues that I was running into.
This walkthrough shows how to upgrade from PKS 1.1.x to 1.1.5.
Upgrade Checklist
- Read the Release Notes
- Read the Documentation
- Verify the health of the current environment:
  - Run kubectl get nodes for each Kubernetes context and verify that all nodes are in the Ready state.
  - Run kubectl get pods --all-namespaces for each Kubernetes context and verify that all pods are running.
  - Run bosh -d service-instance_<UUID> instances --ps for each BOSH Kubernetes deployment and verify that all processes are in a running state.
- Make sure there are no issues at the IaaS layer. If using vSphere, verify that datastores have enough space, hosts have enough memory, there are no alarms, hosts are in a good state, etc.
- Back up the environment
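If you have more than a couple of clusters, the kubectl checks in the list above get tedious by hand. Here is a minimal sketch of how I would script them. The function names are my own invention, and the parsing assumes the default column layout of kubectl get (STATUS in column 2 for nodes, column 4 for pods with --all-namespaces), so treat it as a starting point rather than a finished tool.

```shell
#!/usr/bin/env bash
# Sketch of the kubectl health checks from the upgrade checklist.
# The function names are hypothetical; only the kubectl commands
# themselves come from the checklist.

# Print the names of nodes whose STATUS column is not exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Print namespace/name for pods that are neither Running nor Completed.
# Reads `kubectl get pods --all-namespaces --no-headers` output on stdin.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 }'
}

# Run both checks against every context in the current kubeconfig.
check_all_contexts() {
  local ctx
  for ctx in $(kubectl config get-contexts -o name); do
    echo "== ${ctx} =="
    kubectl --context "${ctx}" get nodes --no-headers | not_ready_nodes
    kubectl --context "${ctx}" get pods --all-namespaces --no-headers | unhealthy_pods
  done
}
```

Source the file and call check_all_contexts. A context that prints nothing under its heading passed both checks. Note that a pod can report Running while its containers are not all ready, so it is still worth glancing at the READY column before a major upgrade.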
Files Used
- PKS CLI – Linux and/or Windows
- kubectl 1.10.5 – Linux and/or Windows
- Pivotal Container Service – 1.1.5-build.4
- Stemcell 3586.36
Upgrade the PKS Tile
In the Ops Manager portal, select Import a Product, browse to the PKS file and select it. When using Chrome, you can monitor the upload progress in the status bar:
It can take a while once it gets to Waiting for 10.40.14.3…
Once it’s finished you’ll need to select the + sign to add the product:
Import Stemcell
This release of PKS requires Stemcell 3586.36. After you download the stemcell, select Stemcell Library and then Import Stemcell.
Browse to where you downloaded the stemcell, select it and then select Apply Stemcell to Products.
Verify that the stemcell is applied to PKS:
Now the dashboard should be all green:
Upgrade the worker node size
- Navigate to the Ops Manager Installation Dashboard.
- Click the Pivotal Container Service tile.
- Click Plan 1.
- Under Worker VM Type, select a K8 worker VM type with a minimum disk size of 16 GB.
Verify the NSX-T Manager CA Cert settings
- Navigate to the Ops Manager Installation Dashboard.
- Click the Pivotal Container Service tile.
- Click Networking.
- Under NSX Manager CA Cert, make sure you either provide a valid NSX-T Manager cert or check Disable SSL certificate verification, but not both.
Apply Changes
In the upper-right of the Ops Manager portal you should see the pending changes, which include updating PKS. Select Apply Changes to upgrade the environment.
Post Upgrade Checklist
Verify the health of the current environment:
- Run kubectl get nodes for each Kubernetes context and verify that all nodes are in the Ready state.
- Run kubectl get pods --all-namespaces for each Kubernetes context and verify that all pods are running.
- Run bosh -d service-instance_<UUID> instances --ps for each BOSH Kubernetes deployment and verify that all processes are in a running state.
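The BOSH process check in the list above can be scripted along these lines. This is a rough sketch: it assumes the plain-text table that bosh instances --ps prints has the columns Instance, Process, Process State, AZ, IPs in that order, with per-process rows starting with "~". The function names are hypothetical, so verify the parsing against your own BOSH CLI output before trusting it.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the table from `bosh instances --ps` has the
# columns Instance, Process, Process State, AZ, IPs, and that process
# rows start with "~". Function names are hypothetical.

# Print "instance process state" for every row whose state is not "running".
# Reads `bosh -d <deployment> instances --ps` output on stdin.
non_running_processes() {
  awk '
    $1 ~ /\// { inst = $1 }    # instance row: remember which VM we are on
    ($1 ~ /\// || $1 == "~") && $3 != "running" {
      print inst, $2, $3
    }
  '
}

# Run the check for one deployment, passed as the first argument.
check_deployment() {
  bosh -d "$1" instances --ps | non_running_processes
}
```

Call check_deployment with the name of each service-instance deployment. No output means every instance and process reported running.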
NCP Changes
NCP runs as a BOSH host process starting in PKS 1.1.5. Each master VM runs one NCP process. One of them is active and the others are on standby.
Check NCP Process
Use BOSH to ssh into the master node and run monit summary
The Monit daemon 5.2.5 uptime: 8d 21h 33m
Process 'kube-apiserver' running
Process 'kube-controller-manager' running
Process 'kube-scheduler' running
Process 'etcd' running
Process 'blackbox' running
Process 'ncp' running <<<<<<< this is the NCP process
Process 'bosh-dns' running
Process 'pks-helpers-bosh-dns-resolvconf' running
System 'system_localhost' running
Check if the NCP process in this master is active or standby
Use BOSH to ssh into the master node and run /var/vcap/jobs/ncp/bin/nsxcli -c get ncp-master status
This instance is the NCP master
Current NCP Master id is 03631258-f37d-41f7-8d78-9e4233995a23
Current NCP Instance id is 03631258-f37d-41f7-8d78-9e4233995a23
Last master update at Thu Aug 23 17:26:57 2018
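With several master VMs it helps to reduce that output to a single word. A minimal sketch, assuming that an active instance always prints the "This instance is the NCP master" line shown above and that anything else means standby (I have not confirmed the exact standby wording, and the function name is my own):

```shell
#!/usr/bin/env bash
# Sketch only: classifies the output of
#   /var/vcap/jobs/ncp/bin/nsxcli -c get ncp-master status
# The function name is hypothetical. Any output that lacks the
# "is the NCP master" line is treated as standby, including errors.

# Reads the nsxcli status output on stdin; prints "active" or "standby".
ncp_role() {
  if grep -q 'This instance is the NCP master'; then
    echo active
  else
    echo standby
  fi
}
```

On a master node, pipe the nsxcli command from the step above into ncp_role. Because errors also fall through to standby, check the raw output if no master reports active.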
Restart NCP service
Use BOSH to ssh into the master node and run monit restart ncp
Monitor the restart status with monit summary
The Monit daemon 5.2.5 uptime: 8d 21h 36m
Process 'kube-apiserver' running
Process 'kube-controller-manager' running
Process 'kube-scheduler' running
Process 'etcd' running
Process 'blackbox' running
Process 'ncp' not monitored - restart pending <<<<<<< NCP is restarting
Process 'bosh-dns' running
Process 'pks-helpers-bosh-dns-resolvconf' running
System 'system_localhost' running
Note: Restarting the NCP process also triggers a cache rebuild.
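Rather than rerunning monit summary by hand, you can poll until NCP reports running again. The sketch below assumes the summary lines look like the output shown earlier (Process 'name' followed by the state) and that you run it as a user allowed to call monit, which on BOSH VMs typically means root. The function names, retry count, and 10-second interval are arbitrary choices of mine.

```shell
#!/usr/bin/env bash
# Sketch only: function names, retry count, and sleep interval are
# hypothetical choices, not anything PKS or monit prescribes.

# Print the state monit reports for one process.
# Reads `monit summary` output on stdin; $1 is the process name.
monit_state() {
  awk -v want="'$1'" '
    $1 == "Process" && $2 == want {
      $1 = ""; $2 = ""        # drop the "Process" and name fields
      sub(/^ +/, "")          # trim the leading blanks left behind
      print
    }
  '
}

# Poll until ncp is "running" or the attempts run out.
# $1 is the number of attempts (default 30), 10 seconds apart.
wait_for_ncp() {
  local tries="${1:-30}" state
  while (( tries-- > 0 )); do
    state=$(monit summary | monit_state ncp)
    [[ "${state}" == "running" ]] && return 0
    echo "ncp state: ${state:-unknown}"
    sleep 10
  done
  return 1
}
```

After monit restart ncp, call wait_for_ncp. It returns 0 once the process is back and 1 if it never reaches running within the allotted attempts.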