Tech Team Lead News: pods

Introduction

During a recent project we saw that HTTP requests were still arriving in pods (Spring Boot MVC controllers) even though Kubernetes' kubelet had told the pod to exit by sending it a SIGTERM.

Not nice, because it means that HTTP requests that still get routed to the (shutting down) pod will most likely fail, since the Spring Boot Java process has, for example, already closed all its connection pools.

See this post for an overview of the Kubernetes architecture, e.g. regarding kubelets.

Analysis

The process for Kubernetes to terminate a pod is as follows:

- The kubelet always sends a SIGTERM before a SIGKILL.
- Only when a pod does not finish within the grace period (default 30 seconds) after the SIGTERM does the kubelet send a SIGKILL.
- Kubernetes keeps routing traffic to a pod until the readiness probe fails, even after the pod has received a SIGTERM.
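
The SIGTERM-then-SIGKILL sequence can be mimicked on a plain child process. This is only a sketch of the pattern, not kubelet code: the function name stop_with_grace and the 30 second default are my own choices for illustration, and the signal names in the comments assume a POSIX system.

```python
import subprocess
import sys


def stop_with_grace(proc, grace_seconds=30.0):
    """Ask a process to exit, and force it only if the grace period runs out."""
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "exited after SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()
        return "killed with SIGKILL"


def demo():
    # A child that just sleeps; it does not handle SIGTERM, so it exits on it.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    return stop_with_grace(child, grace_seconds=10.0)
```

Calling demo() starts a sleeping child and stops it again. A process that ignores SIGTERM would instead take the second branch once the grace period has passed.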

So for a pod there is always an interval between receiving the SIGTERM and the next readiness probe request for that pod. In that interval requests can (and most likely will) still be routed to that pod, and (business) logic can even still be executed in the terminating pod.

This means that after the SIGTERM is sent, the readiness probes must fail as soon as possible to prevent the SIGTERMed pod from receiving more HTTP requests. But there will still be a (small) period of time in which requests can be routed to the pod.
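
To get a feeling for how long that period can be, here is a back-of-the-envelope sketch. It assumes the model described above (traffic stops once the readiness probe is considered failed) and uses the Kubernetes probe defaults periodSeconds=10 and failureThreshold=3. The function names are made up for this example, and probe timeouts and endpoint propagation delays are ignored.

```python
def worst_case_routing_window(period_seconds=10, failure_threshold=3):
    """Upper bound, in seconds, until the readiness probe counts as failed.

    Worst case: the app starts failing right after a successful probe, so it
    takes one full period for the first failure, plus one period for each
    further failure needed to reach the threshold.
    """
    return period_seconds * failure_threshold


def fits_in_grace_period(window_seconds, grace_period_seconds=30):
    """True if the routing window ends before a SIGKILL would be sent."""
    return window_seconds < grace_period_seconds


default_window = worst_case_routing_window()
tuned_window = worst_case_routing_window(period_seconds=2, failure_threshold=1)
```

With the defaults the window is as long as the default grace period itself, which is why a shorter probe period (and a lower threshold) helps.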

A solution would be to gracefully shut down the webserver within the pod's process (in this case Spring Boot's webserver) immediately after receiving a SIGTERM. This way any requests still directed to the pod before the readiness probe fails will fail anyway, i.e. no more requests are accepted.

So you would still have some failing requests getting passed on to the pod. But at least no business logic will be executed anymore.
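
A minimal sketch of that idea, in Python rather than Spring Boot, and as a variant: instead of closing the listening socket, it answers every request with a 503 once the SIGTERM has arrived, so the readiness probe fails right away too. The /ready path and all function names are assumptions for this example.

```python
import signal
import threading

# Set once SIGTERM arrives; request handling checks it before doing any work.
shutting_down = threading.Event()


def on_sigterm(signum, frame):
    """Signal handler: from now on, refuse all new requests."""
    shutting_down.set()


def handle_request(path):
    """Return (status, body) for a request to the given (hypothetical) path."""
    if shutting_down.is_set():
        # Covers the readiness probe and normal traffic alike.
        return 503, "shutting down"
    if path == "/ready":
        return 200, "ready"
    return 200, "business logic ran for " + path


def install_handler():
    """Register the handler; call this from the main thread at startup."""
    signal.signal(signal.SIGTERM, on_sigterm)
```

Requests that still reach the pod after the signal fail fast with a 503, and no business logic runs for them. Letting requests that were already in flight finish is left out of this sketch.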

This and other options/considerations are discussed here.

OpenVPN how to route all IPV4 traffic through the OpenVPN tunnel

Introduction

Originally I was connected from a Windows 10 machine via OpenVPN to a network (segment?) for "our" project. I could access all servers and websites related to it. But after switching to another project (using the same OpenVPN settings) I could only access the new project's servers when on the premises of that project. From home or any other place, I could not get to the servers, e.g. Jenkins. The error shown in Chrome was "This site can't be reached".

[Screenshot: the "This site can't be reached" error in Chrome]

But I could get to the microservices pods directly by IP address, e.g. 172.18.33.xyz (the xyz parts are just obfuscation; they are not the same number in the example IP addresses below). So quite strange.

The administrator of the OpenVPN server didn't know how to fix the problem either. The suggestion was to make sure "to route all IPV4 traffic through VPN". That made me search the interwebs, and I found that the solution below works, without having to change any server settings. (I did not even have access to those server settings.)

Analyzing the problem

A) Trying the website with the hostname:
C:\Users\moi>tracert website.eu
Tracing route to website.eu [183.45.163.xyz] over a maximum of 30 hops:
1 1 ms 1 ms 1 ms MODEM [192.169.178.x]
2 20 ms 19 ms 20 ms d13.xs4all.com [195.109.5.xyz]
3 22 ms 22 ms 22 ms 3d13.xs4all.com [195.109.7.xyz]
...

B) Trying the well-known Google public DNS server:
C:\Users\moi>tracert 8.8.8.8
Tracing route to google-public-dns-a.google.com [8.8.8.8] over a maximum of 30 hops:
1 1 ms 1 ms 1 ms MODEM [192.169.178.x]
2 21 ms 20 ms 21 ms d12.xs4all.com [195.109.5.xyz]
...

Hmm, so the route goes via the same initial gateway (the local modem) for both the hostname and the external IP, so not via the VPN.

C) Trying the IP that works (note: not the IP for the hostname from above!):
C:\Users\moi>tracert 172.18.33.xyz
Tracing route to 172.18.33.xyz over a maximum of 30 hops
1 97 ms 21 ms 20 ms 192.169.200.xyz
2 45 ms 98 ms 29 ms 172.16.11.xyz
3 130 ms 65 ms 68 ms 172.16.11.xyz
...

As you can see, the first hop here is a different gateway (192.169.200.xyz instead of the modem). So the gateway used in A and B is most likely the wrong one for reaching the project's servers.

The solution

The solution was to add this to the .ovpn OpenVPN configuration file:

route-method exe
route-delay 2
redirect-gateway def1

For me the last line alone (redirect-gateway def1) was sufficient, but for others the other two lines had to be added too.
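
Why does that one line help? As far as I understand the OpenVPN manual, the def1 flag does not replace the existing default route (0.0.0.0/0). It adds the two more specific routes 0.0.0.0/1 and 128.0.0.0/1 via the VPN, and the most specific matching route wins. Here is a small sketch of that lookup. The routing table entries (including the 172.16.0.0/12 route) and the gateway names "modem" and "vpn" are assumptions for illustration, not taken from my machine, and the host route OpenVPN adds for reaching the VPN server itself is left out.

```python
import ipaddress


def pick_gateway(destination, routes):
    """Return the gateway of the most specific route containing destination."""
    address = ipaddress.ip_address(destination)
    matches = []
    for network_text, gateway in routes:
        network = ipaddress.ip_network(network_text)
        if address in network:
            matches.append((network.prefixlen, gateway))
    # Longest prefix (highest prefix length) wins.
    return max(matches)[1]


# Hypothetical table before the change: only some private ranges use the VPN.
routes_before = [
    ("0.0.0.0/0", "modem"),
    ("172.16.0.0/12", "vpn"),
]

# With redirect-gateway def1: two half-default routes via the VPN are added.
routes_after = routes_before + [
    ("0.0.0.0/1", "vpn"),
    ("128.0.0.0/1", "vpn"),
]
```

With routes_before, a lookup for 8.8.8.8 ends up at the modem while 172.18.33.10 goes via the VPN, which matches what the traces in A, B and C show. With routes_after, every IPv4 destination matches one of the two /1 routes and therefore goes via the VPN.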

D) After adding the setting, you can see that the IP of the first gateway changed to what turns out to be the correct one:
C:\Users\moi>tracert website.eu
Tracing route to website.eu [183.45.163.yyy] over a maximum of 30 hops:
1 143 ms 31 ms 21 ms 192.169.200.xyz
2 21 ms 20 ms 21 ms static.services.de [88.20.160.xyz]
3 21 ms 21 ms 25 ms 10.31.17.xyz
4 25 ms 21 ms 91 ms 10.31.17.xyz
...
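
If you want to check this without reading the trace by eye, the first hop can be pulled out of the tracert output with a few lines of Python. This is a small helper of my own (the name first_hop is made up). It assumes the Windows tracert layout shown above, where the hop number starts the line and the address is the last token.

```python
import re
from typing import Optional


def first_hop(tracert_output: str) -> Optional[str]:
    """Return the address of hop 1, or None if it is missing or timed out."""
    for line in tracert_output.splitlines():
        # Hop lines start with the hop number; we only want hop 1.
        if re.match(r"\s*1\s", line):
            if line.rstrip().endswith("timed out."):
                return None
            # The address is the last token, sometimes wrapped in brackets.
            return line.split()[-1].strip("[]")
    return None


trace_via_modem = """Tracing route to website.eu [183.45.163.xyz] over a maximum of 30 hops:
1 1 ms 1 ms 1 ms MODEM [192.169.178.x]
2 20 ms 19 ms 20 ms d13.xs4all.com [195.109.5.xyz]"""

trace_via_vpn = """Tracing route to website.eu [183.45.163.yyy] over a maximum of 30 hops:
1 143 ms 31 ms 21 ms 192.169.200.xyz
2 21 ms 20 ms 21 ms static.services.de [88.20.160.xyz]"""
```

Comparing first_hop(...) for the same destination before and after the config change shows directly whether the traffic now enters through the VPN gateway.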

References used:
- http://superuser.com/questions/120069/routing-all-traffic-through-openvpn-tunnel
- http://askubuntu.com/questions/665394/how-to-route-all-traffic-through-openvpn-using-network-manager

Friday, August 11, 2017

How do Kubernetes and its pods behave regarding SIGTERM, SIGKILL and HTTP request routing

Wednesday, January 18, 2017

OpenVPN how to route all IPV4 traffic through the OpenVPN tunnel
