OSI Model
The OSI model is a conceptual model for understanding and describing how different network protocols interact and work together.
In the real world, protocols don’t always fit neatly into a single layer and may operate across multiple layers.
Standard OSI model
- Layer 1 - Physical
- electric signals / driver
- the data gets converted into a bit stream (1/0)
- Bits -> Signal
- Layer 2 - Data Link
- frames / mac address / ethernet
- facilitates data transfer between two devices on the same network
- Frame (breaks packets into smaller pieces called frames)
- Layer 3 - Network
- IP / ARP / ICMP
- destination and source IP
- Packet (break segments into smaller units called packets and reassembles these packets on the receiving device)
- Layer 4 - Transport
- TCP / UDP
- destination and source Port to identify services or applications
- Segment (break the data into chunks called segments)
- Layer 5 - Session
- connection establishment / TLS
- state / stateful / cookie session
- Layer 6 - Presentation
- encoding / serialisation
- JSON object / UTF-8
- Layer 7 - Application
TCP/IP model
- Network Access (Data Link + Physical)
- Internetwork (Network)
- Transport (Transport)
- Application (Session + Presentation + Application)
An example: how the OSI model handles sending a POST request to an HTTPS endpoint
- Application
- This is where the HTTP protocol lives.
- A POST request is formed here with all necessary elements including the request method (POST), headers, and the message body.
- Presentation
- This layer is responsible for data representation and encryption.
- In the case of HTTPS, the data is encrypted for security. This is also where data compression could occur if needed.
- Session
- This layer establishes, manages, and terminates connections between applications.
- In this context, it would be maintaining the connection for the HTTPS session.
- Transport
- This layer is responsible for end-to-end communication services for applications.
- In the case of HTTPS, it typically uses TCP (Transmission Control Protocol) to ensure that packets are sent and received in the correct order.
- This is also where services like error checking and data recovery occur.
- Network
- This layer is responsible for packet forwarding, including routing through different networks.
- The IP (Internet Protocol) address of the HTTPS endpoint would be used here to route the request to the correct location.
- Data Link
- This layer provides reliable transit of data across a physical or logical link.
- It’s involved in error detection and correction, as well as defining the protocol for the next layer to use.
- Physical
- This is the lowest layer of the OSI model, and it’s responsible for the actual physical connection between the devices.
- It defines the characteristics of the hardware to be used for the transmission like the cables, connectors, and the binary transmission of data.
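The application and presentation steps above can be illustrated by building the raw HTTP request by hand. This is only a sketch: the endpoint path, host, and JSON payload are invented for the example.

```python
import json

# Presentation (L6): serialise a Python dict to JSON and encode as UTF-8.
body = json.dumps({"name": "alice"}).encode("utf-8")

# Application (L7): assemble the HTTP POST request (method, headers, body).
request = (
    b"POST /api/users HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: " + str(len(body)).encode() + b"\r\n"
    b"\r\n"
) + body

# Over HTTPS, this byte string would then be encrypted by TLS before TCP
# (Transport) splits it into segments and IP (Network) routes the packets.
print(request.decode())
```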
Heartbeat packet
A heartbeat packet is a signal sent at regular intervals between two systems to check that the connection is still alive and functioning properly.
It's used to check the health of servers.
Heartbeat doesn't belong to a specific OSI layer; it's an operational concept that can be implemented at different layers.
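A minimal heartbeat sketch: one side sends a small probe at a fixed interval and marks the peer unhealthy if a reply doesn't arrive in time. The 4-byte ping/pong payload and the 1-second timeout are arbitrary choices, and a socketpair stands in for a real network link.

```python
import socket
import threading

def responder(sock: socket.socket, rounds: int) -> None:
    """Echo each heartbeat probe so the sender knows we are alive."""
    for _ in range(rounds):
        if sock.recv(4) == b"ping":
            sock.sendall(b"pong")

# socketpair() simulates a connection between two hosts.
a, b = socket.socketpair()
threading.Thread(target=responder, args=(b, 3), daemon=True).start()

a.settimeout(1.0)       # a missed reply within 1s marks the peer unhealthy
healthy = True
for _ in range(3):
    a.sendall(b"ping")  # the periodic probe
    try:
        healthy = a.recv(4) == b"pong"
    except socket.timeout:
        healthy = False

print("peer healthy:", healthy)
```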
Load Balancer
- L3/L4 Load Balancer (Network/Transport Layer)
- It makes decisions based on packet headers without inspecting the payload
- It can distribute traffic based on IP addresses and TCP or UDP port numbers (port numbers are strictly L4 information)
- L7 Load Balancer (Application Layer)
- It’s capable of “reading” the contents of the packets and making decisions based on this.
- It can distribute traffic based on cookies, headers, or other information in the HTTP request itself
- It can do SSL termination
- slower and more resource-intensive than L3/L4
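An L7 routing decision can be sketched as a pure function over request metadata. The backend names, cookie format, and rules below are invented for the example.

```python
# Hypothetical backend pool for the sketch.
BACKENDS = ["app-1", "app-2"]

def route(path: str, headers: dict) -> str:
    """Pick a backend using L7 information (path and cookies)."""
    # Sticky session: a cookie pins the client to a backend.
    if "backend=app-2" in headers.get("Cookie", ""):
        return "app-2"
    # Path-based rule: static assets go to a dedicated backend.
    if path.startswith("/static/"):
        return "app-2"
    # Default: a deterministic hash of the path spreads the rest.
    return BACKENDS[sum(map(ord, path)) % len(BACKENDS)]

print(route("/static/logo.png", {}))
print(route("/api/users", {"Cookie": "sid=1; backend=app-2"}))
```

None of this is possible for an L3/L4 balancer, which never sees paths, headers, or cookies.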
IP
Basics
192.168.1.14
- 192.168.1: the network portion of the address
- .14: the host portion, assigned to each host; uniquely identifies a machine on that network
- broadcast address: the address whose host ID bits are all set to 1
Subnet mask
- IP: 192.168.123.132/24 (mask 255.255.255.0)
    - ip address: 192.168.123.132
    - subnet mask: 255.255.255.0
    - network address: 192.168.123.0
    - broadcast address: 192.168.123.255
    - usable host IP range: 192.168.123.1 - 192.168.123.254
- IP: 1.163.70.205/30 (mask 255.255.255.252)
    - ip address: 1.163.70.205
    - subnet mask: 255.255.255.252
    - network address: 1.163.70.204
    - broadcast address: 1.163.70.207
    - usable host IP range: 1.163.70.205 - 1.163.70.206
    - last octet in binary:
        - 11001100 = 204 (network address)
        - 11001101 = 205
        - 11001110 = 206
        - 11001111 = 207 (broadcast address)
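These calculations can be double-checked with Python's standard-library `ipaddress` module:

```python
import ipaddress

# strict=False lets us pass a host address instead of the network address.
net = ipaddress.ip_network("1.163.70.205/30", strict=False)
print(net.network_address)    # 1.163.70.204
print(net.broadcast_address)  # 1.163.70.207
hosts = list(net.hosts())     # usable range: .205 and .206
print(hosts[0], hosts[-1])

net24 = ipaddress.ip_network("192.168.123.132/24", strict=False)
print(net24.network_address)     # 192.168.123.0
print(net24.num_addresses - 2)   # 254 usable hosts (minus network + broadcast)
```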
Classes
- Class A
    - leading bit: 0 (0xxxxxxx.xxxxxxxx.xxxxxxxx.xxxxxxxx)
    - range: 0.0.0.0 - 127.255.255.255
    - private range (10.0.0.0/8): 10.0.0.0 - 10.255.255.255
- Class B
    - leading bits: 10 (10xxxxxx.xxxxxxxx.xxxxxxxx.xxxxxxxx)
    - range: 128.0.0.0 - 191.255.255.255
    - private range (172.16.0.0/12): 172.16.0.0 - 172.31.255.255
- Class C
    - leading bits: 110 (110xxxxx.xxxxxxxx.xxxxxxxx.xxxxxxxx)
    - range: 192.0.0.0 - 223.255.255.255
    - private range (192.168.0.0/16): 192.168.0.0 - 192.168.255.255
- Class D (multicast)
    - leading bits: 1110 (1110xxxx.xxxxxxxx.xxxxxxxx.xxxxxxxx)
    - range: 224.0.0.0 - 239.255.255.255
- Class E (experimental)
    - leading bits: 1111 (1111xxxx.xxxxxxxx.xxxxxxxx.xxxxxxxx)
    - range: 240.0.0.0 - 255.255.255.255
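Classful addressing is historical (CIDR replaced it), but the class follows directly from the leading bits of the first octet; a small sketch:

```python
def ip_class(addr: str) -> str:
    """Classify an IPv4 address by the leading bits of its first octet."""
    first = int(addr.split(".")[0])
    if first < 128:   # 0xxxxxxx
        return "A"
    if first < 192:   # 10xxxxxx
        return "B"
    if first < 224:   # 110xxxxx
        return "C"
    if first < 240:   # 1110xxxx (multicast)
        return "D"
    return "E"        # 1111xxxx (experimental)

print(ip_class("10.0.0.1"))      # A
print(ip_class("192.168.1.14"))  # C
```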
Special IP address ranges used for special purposes:
- 0.0.0.0/8: addresses used to refer to "this host" on "this network"
- 127.0.0.0/8: loopback addresses
- 255.255.255.255: the limited broadcast address, used to send a packet to all devices on the local network
Proxy server vs Reverse proxy server
regular proxy server
- When a client sends a request to access a server, the request is first intercepted by the proxy server.
- The proxy server then forwards the request to the destination server on behalf of the client, and the server responds to the proxy server.
- The proxy server then forwards the response back to the client.
flow
client -> proxy server -> server
For example: a corporate web proxy or Squid
reverse proxy server
- When a client sends a request to access a server, the request is first intercepted by the reverse proxy server.
- The reverse proxy server then forwards the request to one of the servers in the group on behalf of the client, and that server responds to the reverse proxy server.
- The reverse proxy server then forwards the response back to the client.
flow
client -> reverse proxy server
|--------------> server A
|--------------> server B
|--------------> server C
For example: a load balancer, Nginx, or a CDN
The difference between regular proxy server and reverse proxy server
a reverse proxy server needs a set of rules to decide which backend server each request is forwarded to,
while a regular (forward) proxy simply forwards the request to the destination the client specified.
Put differently: a forward proxy acts on behalf of clients and hides them from servers, while a reverse proxy acts on behalf of servers and hides them from clients.
P2P connection
Introduction
When two servers behind their own NATs need to establish a P2P connection,
they need to discover and exchange their public (NAT-mapped) IP addresses and ports, since their private addresses are not directly reachable from outside.
This process is called NAT traversal.
Techniques for NAT Traversal
- STUN (Session Traversal Utilities for NAT)
- Lets a server discover its own public IP and port as seen from outside its NAT
- TURN (Traversal Using Relays around NAT)
- A relay server that allows the servers to communicate even if they are behind NAT types that block direct connections
- ICE (Interactive Connectivity Establishment)
- A framework that combines STUN and TURN: candidate paths are tried in order, the direct (STUN-derived) path is preferred when it works, and TURN relaying is the fallback.
The steps to establish a p2p connection between 2 servers behind their own NATs
The first server sends a connection request to the second server. The connection request includes the first server’s private IP and port as well as the public IP and port of the NAT device that the first server is behind.
The second server receives the connection request from the first server, but because it is behind a NAT device, the request appears to come from the public IP and port of the NAT device and not the private IP and port of the first server.
SDP offer
First Server (FS) ─────>>>───[Connection Request]───────>>> Second Server (SS)
Private IP & Port (FS)
Public IP & Port (NAT FS)
The second server sends a response back to the first server. This response includes the second server’s private IP and port as well as the public IP and port of the NAT device that the second server is behind.
The first server receives the response from the second server, but because it is behind a NAT device, the response appears to come from the public IP and port of the NAT device and not the private IP and port of the second server.
SDP answer
First Server (FS) <<<─────────────[Response]──────<<<────── Second Server (SS)
Private IP & Port (SS)
Public IP & Port (NAT SS)
The first server sends a second connection request to the second server. This request includes the private IP and port of the first server and the public IP and port of the NAT device that the first server is behind, as well as the private IP and port of the second server and the public IP and port of the NAT device that the second server is behind.
First Server (FS) ───>>>───[2nd Connection Request]─────>>> Second Server (SS)
Private IP & Port (FS)
Public IP & Port (NAT FS)
Private IP & Port (SS)
Public IP & Port (NAT SS)
The second server receives the second connection request from the first server, and because it now has both private and public IP and port information for both servers, it is able to create a mapping in its NAT device that allows incoming traffic from the first server to be routed to the second server.
First Server (FS) ───> NAT ────[2nd Connection Request]───> NAT ───> Second Server (SS)
├─ Private IP & Port (FS) ├─ Private IP & Port (FS)
├─ Public IP & Port (NAT FS) ├─ Public IP & Port (NAT FS)
├─ Private IP & Port (SS) ├─ Private IP & Port (SS)
└─ Public IP & Port (NAT SS) └─ Public IP & Port (NAT SS)
The two servers are now able to establish a P2P connection and communicate directly with each other.
First Server (FS) <────────────[P2P Connection]──────────> Second Server (SS)
TCP
3-way handshake
- SYN: The client sends a packet with the SYN (synchronize) flag set to the server.
- SYN-ACK: The server acknowledges the client’s SYN packet by sending a packet with both the SYN and ACK (acknowledge) flags set.
- ACK: The client acknowledges the server’s SYN-ACK packet by sending a packet with the ACK flag set.
diagram
Client Server
| |
|----- SYN (Sequence Number: X) ------------------------------------->|
| |
|<---- SYN-ACK (Sequence Number: Y, Acknowledgment Number: X + 1) ----|
| |
|----- ACK (Acknowledgment Number: Y + 1) --------------------------->|
| |
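The handshake itself is performed by the kernel: on the client it completes inside `connect()`, and on the server `accept()` returns once the final ACK arrives. A loopback sketch:

```python
import socket
import threading

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen()                 # server state: LISTEN
port = server.getsockname()[1]

def serve():
    conn, _ = server.accept()   # returns once the final ACK arrives
    conn.sendall(b"hello")
    conn.close()

threading.Thread(target=serve, daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))  # SYN -> SYN-ACK -> ACK happens here
data = client.recv(5)                # both sides ESTABLISHED; data flows
client.close()
server.close()
print(data)
```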
TCP states
- CLOSED: No connection is active or pending.
- LISTEN: The server is waiting for an incoming call. This state signifies that a socket is ready to accept incoming connections.
- SYN-SENT: The application has started to open a connection and the system has sent a SYN message to start the three-way handshake. This state signifies that the client has sent a connection request and is waiting for the SYN-ACK reply.
- SYN-RECEIVED: The server just received a SYN request from a client and has responded with SYN-ACK. It is now waiting for an ACK from the client.
- ESTABLISHED: The connection is active, and both devices can send and receive data. The system has received an acknowledgment of the connection.
- FIN-WAIT-1: The application has said it is finished with the connection. This state indicates that the system is waiting for the remote side of the connection to send its own FIN signal.
- FIN-WAIT-2: The system is waiting for the other side to terminate its half of the connection. This follows the receipt of the first FIN signal.
- CLOSE-WAIT: The remote side has shut down, and the system is waiting for the application to close its own side of the connection.
- CLOSING: Both sides have sent but not yet acknowledged the FIN signals.
- LAST-ACK: The system is in the process of terminating the connection and is waiting for a final acknowledgment.
- TIME-WAIT: The connection has been closed, and the system is waiting to be sure that the remote system received the last acknowledgment.
TIME_WAIT
When a connection is closed, the side that performs the active close enters the TIME_WAIT state. Real-world networks are unpredictable:
there is no guarantee that packets are delivered in order, and a delayed packet asking to close a connection can arrive after that
connection has already been torn down. Without TIME_WAIT, the server could mistake such a stale packet for part of a newly
established connection reusing the same addresses and ports, and close it. TIME_WAIT keeps the 4-tuple (source address, source port,
destination address, destination port), so when the server receives a delayed close request it can recognize that the old connection
is already gone. Its purpose is to prevent stale packets from disrupting a new connection between the same endpoints.
Note that closing the socket's file descriptor releases the descriptor itself, but the kernel still keeps the socket in TIME_WAIT,
so the 4-tuple stays occupied and kernel memory is consumed until the timeout (typically 2 x MSL) expires.
Check system TCP-related settings
MacOS:
sysctl net.inet.tcp
Maximum simultaneous connections for a server
ref: https://www.quora.com/What-is-the-maximum-number-of-concurrent-tcp-connections-system-can-support
The theoretical maximum number of connections per client per server port is 65534, because source ports are 16-bit numbers.
Assuming one network interface (i.e., 1 IP on your host),
you could then potentially accept roughly 4 billion (the size of the IPv4 address space) x 65534 connections.
What will bite you long before that will be other issues: the design of the application which is making or handling such connections,
your OS's TCP/IP stack design, and ultimately the amount of memory.
If memory is not the issue and you want to increase that number,
you can add another network interface and double this number further (i.e., increase the number of clients).
ref: https://medium.com/fantageek/understanding-socket-and-port-in-tcp-2213dc2e9b0c
What is the maximum number of concurrent TCP connections that a server can handle, in theory?
A single listening port can accept more than one connection simultaneously: each connection is identified by the 4-tuple (source address, source port, destination address, destination port).
There is a '64K' limit that is often cited, but that is per client per server port, and needs clarifying.
If a client has many connections to the same port on the same destination,
then three of those four fields are the same and only source_port varies to differentiate the different connections.
Ports are 16-bit numbers, therefore the maximum number of connections any given client can have to any given host port is 64K.
However, multiple clients can each have up to 64K connections to some server's port,
and if the server has multiple ports or either side is multi-homed, you can multiply that further.
So the real limit is file descriptors. Each individual socket connection is given a file descriptor,
so the limit is really the number of file descriptors that the system has been configured to allow and has the resources to handle.
That limit is typically well over 300K and is configurable, e.g. with sysctl.
nf_conntrack: table full, dropping packet
What's nf_conntrack?
It is a Linux kernel (netfilter) feature that lets the kernel keep track of network connections; you will commonly run into it on NAT devices, firewalls, and proxy or load-balancer hosts such as HAProxy servers.
When NAT or a stateful firewall works, it's nf_conntrack under the hood. nf_conntrack records connection info,
including the mapping between public IPs (external IPs) and private IPs (internal IPs) for NAT,
so that the packets can be sent to the right end.
What are common causes?
- High volume of connections: If your system is handling a high volume of concurrent connections, the nf_conntrack table might become full. This can happen in cases where your system is serving as a heavily utilized NAT device, firewall, or high-traffic web server.
- Many short-lived connections: A large number of short-lived connections (e.g., connections that are opened and closed quickly) can cause the nf_conntrack table to fill up, especially if the connection tracking entries are not being cleared quickly enough.
- Inadequate connection timeouts: The nf_conntrack subsystem relies on connection timeouts to determine when to remove connection entries from the table. If the connection timeouts are set too high, the entries will remain in the table longer, which can cause the table to become full.
- Connection tracking table size: The maximum size of the nf_conntrack table is determined by the nf_conntrack_max parameter. If this parameter is set too low for your system’s requirements, the table may fill up quickly.
- DDoS or flood attacks: Your system might be experiencing a Distributed Denial of Service (DDoS) attack or a SYN flood attack, causing a massive influx of connections that fill up the nf_conntrack table.
How to solve this?
Note: when the table is full, the kernel drops packets for new connections. There are 3 ways to fix it:
- Disable connection tracking (only viable if you don't rely on NAT or stateful firewall rules)
- Reduce the timeouts for tracked connections
- Increase the maximum size of the connection tracking table (nf_conntrack_max)
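On Linux these settings live under `net.netfilter` and can be inspected or changed with sysctl; the timeout and table-size values below are examples, not recommendations:

```shell
# Current usage vs. capacity of the tracking table
sysctl net.netfilter.nf_conntrack_count
sysctl net.netfilter.nf_conntrack_max

# Shorten how long established connections stay tracked (seconds)
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=3600

# Enlarge the table (persist in /etc/sysctl.conf to survive reboots)
sysctl -w net.netfilter.nf_conntrack_max=262144
```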
If nf_conntrack is disabled, what would be the impact?
- NAT functionality impairment: Connection tracking plays a crucial role in NAT, which is responsible for translating between private IP addresses and public IP addresses. Disabling nf_conntrack would impair NAT functionality, causing issues with IP address translations and breaking communication between internal and external networks.
- Firewall rule impact: If your firewall rules (iptables) are based on connection state (e.g., NEW, ESTABLISHED, RELATED), disabling nf_conntrack would cause those rules to fail. This could result in unexpected behavior in your firewall, potentially making it more permissive or more restrictive.
- Performance impact: In some cases, disabling connection tracking might improve system performance if your device is dealing with a very high number of connections and connection tracking is causing resource exhaustion. However, it could also lead to performance issues if your system relies on connection tracking for stateful packet filtering, NAT, or other functions that require maintaining state information.
VPN
Lets a private network be accessed through specific, authorized connections
- OpenVPN (SSL VPN) or IPsec
- OpenVPN's security is generally considered better than IPsec's
Unix Domain Socket, aka IPC socket
Unix Domain Socket is a data communications endpoint for exchanging data between processes executing on the same host operating system.
It supports transmission of a reliable stream of bytes, as well as ordered and reliable transmission of datagrams.
The API for Unix domain sockets is similar to that of an Internet socket, but rather than using an underlying network protocol, all communication occurs entirely within the operating system kernel.
Unix domain sockets use the file system as their address name space.
Processes reference Unix domain sockets as file system inodes, so two processes can communicate by opening the same socket.
Instead of identifying a server by an IP address and port, a Unix domain socket is known by a pathname. Obviously the client and server have to agree on the pathname for them to find each other.
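A minimal sketch of the "pathname as address" idea (Unix-only; the socket path is a temp file, and a thread stands in for a second process):

```python
import os
import socket
import tempfile
import threading

path = os.path.join(tempfile.mkdtemp(), "demo.sock")  # the agreed-upon pathname

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(path)       # creates a socket file at `path`
server.listen()

def serve():
    conn, _ = server.accept()
    conn.sendall(b"via unix socket")
    conn.close()

threading.Thread(target=serve, daemon=True).start()

client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(path)    # no IP or port: the filesystem path is the address
msg = client.recv(32)
client.close()
server.close()
os.unlink(path)         # the socket file must be removed explicitly
print(msg)
```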
RPC
Remote procedure call (RPC) is an Inter-process communication technology that allows a computer program to cause a subroutine or procedure to execute in another address space (commonly on another computer on a shared network) without the programmer explicitly coding the details for this remote interaction.
An RPC (remote procedure call) is a form of IPC (inter-process communication)
gRPC
- Improved RPC
- Developed by google
- Uses HTTP/2 for transport and Protocol Buffers (protobuf) as the interface description language
- Provides features such as authentication, bidirectional streaming and flow control, blocking or nonblocking bindings, and cancellation and timeouts.
RPC vs IPC
- RPC - Remote Procedure Call - is a particular type of communication, but can be on a single machine, or across a network between machines.
- IPC - Inter-Process Communication - is a general term for communication between different processes (which are usually on a single machine).
- RPC: remotely preferred, IPC: locally preferred
Named Pipe vs IPC
- Duplex: Stream sockets provide bi-directional communication while named pipes are uni-directional.
- Distinct clients: Clients using sockets each have an independent connection to the server. With named pipes, many clients may write to the pipe, but the server cannot distinguish the clients from each other: the server has only one descriptor to read from the named pipe. Because the named pipe has only one read descriptor and possibly multiple writers, random interleaving can also occur if a client writes more than PIPE_BUF bytes in one operation. Because of these limitations, UNIX domain sockets should be used if there are multiple clients that need to be distinguishable or which write long messages to the server.
- Method of creating and opening: Sockets are created using socket and assigned their identity via bind. Named pipes are created using mkfifo. To connect to a Unix domain socket the normal socket/connect calls are used, but a named pipe is written using regular file open and write. That makes them easier to use from a shell script for example.
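The creation/opening difference can be shown in a few lines (Unix-only; the FIFO path is a temp file, and a thread stands in for a separate writer process):

```python
import os
import tempfile
import threading

path = os.path.join(tempfile.mkdtemp(), "demo.fifo")
os.mkfifo(path)               # create the named pipe (like `mkfifo` in a shell)

def writer():
    # Plain file open/write: no socket()/connect() calls needed, which is
    # why FIFOs are easy to use from shell scripts (echo hi > demo.fifo).
    with open(path, "wb") as w:
        w.write(b"one-way message")

threading.Thread(target=writer, daemon=True).start()

with open(path, "rb") as r:   # blocks until the writer side opens
    data = r.read()
os.unlink(path)
print(data)
```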
Use traceroute to observe the hops along the path to a host
$ traceroute google.com
traceroute to google.com (172.217.31.142), 30 hops max, 60 byte packets
1 ec2-175-41-192-150.ap-northeast-1.compute.amazonaws.com (175.41.192.150) 16.685 ms ec2-175-41-192-144.ap-northeast-1.compute.amazonaws.com (175.41.192.144) 19.225 ms ec2-175-41-192-146.ap-northeast-1.compute.amazonaws.com (175.41.192.146) 16.309 ms
2 100.64.1.200 (100.64.1.200) 17.298 ms 100.64.3.78 (100.64.3.78) 13.307 ms 100.64.0.78 (100.64.0.78) 21.296 ms
3 100.66.3.36 (100.66.3.36) 17.270 ms 100.66.3.108 (100.66.3.108) 20.964 ms 100.66.3.192 (100.66.3.192) 14.131 ms
(... omitted ...)
16 108.170.242.193 (108.170.242.193) 4.186 ms 108.170.242.161 (108.170.242.161) 3.221 ms 108.170.242.193 (108.170.242.193) 5.237 ms
17 74.125.251.237 (74.125.251.237) 3.611 ms 3.618 ms 5.141 ms
18 nrt20s08-in-f14.1e100.net (172.217.31.142) 2.942 ms 4.001 ms 2.928 ms