Unable to communicate between pods running on different nodes in Kubernetes





I have been building a distributed load testing application using Kubernetes and Locust (similar to this).



I currently have a multi-node cluster running on bare-metal (running on an Ubuntu 18.04 server, set up using Kubeadm, and with Flannel as my pod networking addon).



The architecture of my cluster is as follows:



As of now, I don't believe that is happening. My cluster shows that all of my deployments are healthy and running; however, I am unable to access the logs of any of my slave instances running on nodes other than my master node. This leads me to believe that my pods are unable to communicate with each other across different nodes.
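For reference, the check that fails is simply fetching logs from a slave pod scheduled on a worker node (the pod name below is just a placeholder for one of my slave pods):

kubectl get pods -o wide          # slave pods show as Running on the worker nodes
kubectl logs locust-slave-xxxxx   # times out for any pod on a node other than the master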



Is this an issue with my current networking or deployment setup (I followed the linked guides pretty much verbatim)? Where should I start in debugging this issue?





Can pods connect to another pod within the same host?
– BMW
Jul 2 at 23:58





So that 10.0.2.15:10250 i/o timeout is almost certainly the firewall failing to expose the kubelet port to the outside world; kubectl logs contacts the API server to obtain the correct URL, but then redirects kubectl directly to the Node to obtain the logs (otherwise the logs would have to travel from the Node through the API server down to you, which would be a huge bottleneck) -- you should fix the firewall, but in the interim you can ssh onto the Nodes and use docker logs to have a peek for yourself what's going on
– Matthew L Daniel
Jul 3 at 2:55
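A rough sketch of that workaround, assuming you can SSH to the worker node and that the slave container can be spotted in docker ps (node IP and container ID are placeholders):

# from the machine running kubectl: check whether the kubelet port is reachable at all
nc -vz <worker-node-ip> 10250

# interim workaround: read the container logs directly on the node
ssh <worker-node-ip>
docker ps                  # find the container ID of the slave pod's container
docker logs <container-id>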







@BMW pods within the same host are able to connect to each other
– whiletrue
Jul 3 at 15:23




3 Answers



How do the slave instances try to join the master instance? You have to create a master service (with labels) to access the master pod. Also, make sure your SDN is up and the master is reachable from the slave instances. You can test this using telnet to the master pod IP from the slave instances.
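A minimal sketch of that test, assuming the Locust master listens on port 5557 and that nc (or telnet) is available inside the slave image (pod names and IPs are placeholders):

# find the master pod IP
kubectl get pods -o wide

# probe the master's port from inside one of the slave pods
kubectl exec -it <slave-pod-name> -- nc -vz <master-pod-ip> 5557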





Slave instances of my application join the master instance by connecting to a port exposed on the master container (5557-5558, I believe). As for my SDN, Flannel is up and running on all of my nodes, but the core of my problem is that the slave instances cannot connect to my master instance. They can connect when the master and slave instances are running on the same node, but I get that timeout/connection error when a slave instance tries to connect to the master from another node.
– whiletrue
Jul 3 at 15:34



Based on your description of the problem, my guess is that you have a connectivity problem caused by a firewall or network misconfiguration.



From the network perspective, the Kubernetes documentation lists these requirements for the cluster network:

pods can communicate with all other pods on any node without NAT
agents on a node (e.g. the kubelet) can communicate with all pods on that node
the IP a pod sees itself as is the same IP that others see it as



From the firewall perspective, you need to ensure that cluster traffic can pass through the firewall on the nodes.



Here is the list of ports you should have open on the nodes, as provided on the CoreOS website:


Master node inbound:
TCP: 443 from Worker Nodes, API Requests, and End-Users
UDP: 8285, 8472 from Master & Worker Nodes

Worker node inbound:
TCP: 10250 from Master Nodes
TCP: 10255 from Heapster
TCP: 30000-32767 from External Application Consumers
TCP: 1-32767 from Master & Worker Nodes
TCP: 179 from Worker Nodes
UDP: 8472 from Master & Worker Nodes
UDP: 179 from Worker Nodes

Etcd node inbound:
TCP: 2379-2380 from Master & Worker Nodes
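If you are using ufw on Ubuntu 18.04, opening the Flannel and kubelet ports would look roughly like this (a sketch; 8472/udp is Flannel's VXLAN backend and 8285/udp its UDP backend, so adjust to whichever backend you actually use, and run it on every node):

sudo ufw allow 10250/tcp          # kubelet API (kubectl logs/exec go through this port)
sudo ufw allow 8472/udp           # Flannel VXLAN overlay traffic
sudo ufw allow 8285/udp           # Flannel UDP backend (only if you use it)
sudo ufw allow 30000:32767/tcp    # NodePort range for services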





Please excuse my lack of understanding here, as I'm completely new to K8s, but would I need to manually open those ports on my nodes? Just to give some more context, I'm joining nodes to my cluster via the "kubeadm join" command; doesn't that command take care of opening those ports for me? Thanks!
– whiletrue
Jul 3 at 15:28





It depends on your firewall policy. If your default rule is Deny/Drop, you should explicitly allow all that traffic. If your default rule is Allow/Pass, just check that no deny rule blocks that traffic before it reaches the default allow. While you're just starting to learn Kubernetes, I would suggest building your cluster inside a trusted network and not using a firewall on the nodes, or adding your cluster network interfaces to the trusted zone in the firewall settings on all nodes in the cluster.
– VAS
Jul 3 at 15:40
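On Ubuntu, a quick way to see which of those cases applies to you (assuming ufw is in use):

sudo ufw status verbose   # shows the default incoming/outgoing policy plus explicit rules
sudo iptables -L -n       # raw rule list, in case something other than ufw added rules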






Right now I'm running the cluster on VMs/bare metal. My master node is running on an Ubuntu machine, and that same machine is running VirtualBox and spinning up my VM worker nodes. I don't believe this is a firewall issue then, because all of my nodes are communicating over a host-only network in promiscuous mode?
– whiletrue
Jul 5 at 19:56



Check that IP forwarding is enabled on all the nodes:


# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1



If it isn't, enable it like this and test again:


echo 1 > /proc/sys/net/ipv4/ip_forward
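Note that writing to /proc only lasts until the next reboot; to make the setting persistent you can put it into a sysctl config file (a sketch, the file name is arbitrary):

echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/99-ip-forward.conf
sudo sysctl --system   # reload sysctl settings from all config files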






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

api-platform.com Unable to generate an IRI for the item of type

How to set up datasource with Spring for HikariCP?

Display dokan vendor name on Woocommerce single product pages