Software engineering notes

Linux netstat

常用

列出在監聽的網路服務:

netstat -tunl

列出所有的連線

netstat -antp

列出已連線的網路連線狀態:

netstat -tun

刪除已建立或在監聽當中的連線:

netstat -tunp   #看他的pid ex: 10270
kill -9 10270

netstat -nltp

顯示 timer

$ netstat -antop

tcp        0      0 0.0.0.0:55206           0.0.0.0:*               LISTEN      -                off (0.00/0/0)
tcp        0      0 10.173.27.211:11211     0.0.0.0:*               LISTEN      -                off (0.00/0/0)
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      -                off (0.00/0/0)
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -                off (0.00/0/0)
tcp        0      0 182.92.229.142:40879    110.75.102.62:80        ESTABLISHED -                off (0.00/0/0)
tcp        0      0 10.173.27.211:58364     10.171.39.175:22        TIME_WAIT   -                timewait (15.42/0/0)
tcp        0      0 182.92.229.142:22       125.227.251.150:51042   ESTABLISHED -                keepalive (5318.20/0/0)
tcp        0      0 182.92.229.142:22       125.227.251.150:56130   ESTABLISHED -                keepalive (5187.13/0/0)
tcp        0      0 10.173.27.211:2049      10.171.39.175:825       ESTABLISHED -                off (0.00/0/0)
tcp        0      0 10.173.27.211:52061     10.242.174.13:80        TIME_WAIT   -                timewait (48.05/0/0)
tcp        0      0 10.173.27.211:2049      10.170.189.142:696      ESTABLISHED -                off (0.00/0/0)
tcp6       0      0 :::56705                :::*                    LISTEN      -                off (0.00/0/0)
tcp6       0      0 :::2049                 :::*                    LISTEN      -                off (0.00/0/0)

顯示連線狀態數量

$ netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c

     16 CLOSING
    130 ESTABLISHED
    298 FIN_WAIT1
     13 FIN_WAIT2
      9 LAST_ACK
      7 LISTEN
    103 SYN_RECV
   5204 TIME_WAIT

netstat command example to find out gateway/router IP

netstat -r -n

or

route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         192.168.79.xxx  0.0.0.0         UG    0      0        0 em1
192.168.79.0    *               255.255.255.128 U     0      0        0 em1

顯示主機上所有已建立的連線 (己開放的port)

netstat -na

顯示所有 port 80 的連線,並把結果排序。

netstat -an | grep :80 | sort

列出主機上有多少個 SYNC_REC,一般上這個數字應該相當低。

netstat -n -p|grep SYN_REC | wc -l

同樣是列出 SYNC_REC,但不只列出數字,而是將每個 SYNC_REC 的連線列出。

netstat -n -p | grep SYN_REC | sort -u

列出發送 SYNC_REC 的所有 ip 地址。

netstat -n -p | grep SYN_REC | awk ‘{print $5}’ | awk -F: ‘{print $1}’

計算每一個 ip 在主機上建立的連線數量。

netstat -ntu | awk ‘{print $5}’ | cut -d: -f1 | sort | uniq -c | sort -n

列出從 TCP 或 UDP 連線到主機的 ip 的數量。

netstat -anp |grep ‘tcp\|udp’ | awk ‘{print $5}’ | cut -d: -f1 | sort | uniq -c | sort -n

列出每個 ip 建立的 ESTABLISHED 連線數量。

netstat -ntu | grep ESTAB | awk ‘{print $5}’ | cut -d: -f1 | sort | uniq -c | sort -nr

列出每個 ip 建立的 port 80 連線數量。

netstat -plan|grep :80|awk {‘print $5′}|cut -d: -f 1|sort|uniq -c|sort -nk 1

ref: http://www.hkcode.com/linux-bsd-notes/559

狀態: 描述

ref: https://i.stack.imgur.com/5H0Cr.png

大量 TIME_WAIT 問題

(..略..)
tcp        0      0 127.0.0.1:9000              127.0.0.1:37260             TIME_WAIT   -
tcp        0      0 10.0.11.229:58492           10.0.23.32:3306             TIME_WAIT   -
(..略..)

問題 :

php-fpm 使用 9000 port, mysql 使用 3306 如果一個 request 進來最少就會有兩個 TIME_WAIT,要等系統 1 分鐘才會回收,在這期間就佔用了兩個 TCP 連線, 如果 request 量大的時候就會沒 port 可用造成錯誤

產生原因

ref : 打開linux-tcp端口快速回收

  1. nginx現有的負載均衡模塊實現php fastcgi負載均衡,nginx使用了短連接方式,所以會造成大量處於TIME_WAIT狀態的連接。
  2. TCP/IP設計者本來是這麼設計的, 主要有兩個原因
    • 防止上一次連接中的包,迷路後重新出現,影響新連接(經過2MSL,上一次連接中所有的重複包都會消失)
    • 可靠的關閉TCP連接, 在主動關閉方發送的最後一個ack(fin) ,有可能丟失,這時被動方會重新發fin, 如果這時主動方處於CLOSED 狀態,就會響應rst 而不是ack。所以主動方要處於TIME_WAIT 狀態,而不能是CLOSED 。

解決方法 : 開啟 Linux TCP port 快速回收

修改 /etc/sysctl.conf, 但重啟後會還原(?)

net.ipv4.tcp_tw_reuse = 1                   # Allow to reuse TIME-WAIT sockets for new connections when it is safe from protocol viewpoint. Default value is 0.  It should not be changed without advice/request of technical experts.
net.ipv4.tcp_tw_recycle = 1                 # Enable fast recycling TIME-WAIT sockets. Default value is 0.  It should not be changed without advice/request of technical experts.

再執行 sudo sysctl -p 生效

注意!! 修改 tcp_tw_recycle 在使用 load balancer 可能會造成問題 ref : 記一次TIME_WAIT網絡故障 下面別人的回覆訂正原文錯誤也值得一看

結果

當 request 一執行完畢 TIME_WAIT 只會存在一下子就消失了

TIME_WAIT 補充

TCP is layer 4, HTTP layer 7.

In HTTP 1.0, HTTP Keep-Alive is used at layer 7 to simulate persistent connections using Connection header.

In HTTP 1.1, connections are assumed persistent by default and then rely on TCP only to do that job. Requests can be pipelined in the same TCP connection, then one side will set Connection: close in the last request or response headers, so both side knows that no more HTTP request can be exchanged and the connection will then be closed.

Usually in the case of a web server, the TIME_WAIT state will be the state after which, once decided to actively close the connection, it received client’s FIN packet and is sending the last ACK back in the four-way tear-down. After this, it waits for 2 * MSL : it’s a way to be sure that the connection is closed. That’s where the 60s compiled in the kernel comes from. In this way we are sure that we won’t receive in a new connection, using the same 4 tuple, packets out of sequence arising from the previous connection.

You don’t want to change it.

In the other side server.max-keep-alive-idle is the timeout after which an ESTABLISHED connection will be considered idle if no HTTP request comes in and will be actively closed by the web server. When this decision is made, as you understand now, the TCP tear-down will take place.

Be very careful with tcp_tw_recycle, if your visitors come from behind a wide NATed network then it could lead to multiple TCP connections with the same 4 tuple taking place with out of order timestamps resulting in silently dropping client connections attempts on the server side.

So the best option is to adjust the parameter you saw in lighttpd. System-wide, you can safely lower FIN_WAIT2 state and raise buckets for sockets in TIME_WAIT state with net.ipv4.tcp_fin_timeout and net.ipv4.tcp_max_tw_buckets.

lsof

查看 port 被那隻程式佔用

lsof -i :3306

查看有些建立連線 (4 stands for ipv4)

lsof -i 4

查看某個 process 建立的連線

lsof -p 12084

找出哪個 pid 開啟某個特定的檔案

lsof -t /tmp/qq.log

Show process name, pid and port

$ lsof -i -n -P | grep TCP
Spotify   60924  test   47u  IPv4 0x5f4ad82c61e46fc7      0t0  TCP 127.0.0.1:4381 (LISTEN)
Spotify   60924  test   48u  IPv4 0x5f4ad82c620734df      0t0  TCP 127.0.0.1:4371 (LISTEN)

LISTEN + ESTABLISHED

sudo lsof -n -i | grep -e LISTEN -e ESTABLISHED

Show processes of connection over specific hostname and port

lsof -i @localhost:1883

Compatible with OS X

Or

lsof -i 6tcp:1883 -sTCP:ESTABLISHED

6 means IPv6

Kill a tcp connection

netstat + rmsock

$ netstat -Aan | grep 30542
f10000f303321b58 tcp4 0 0 *.30542 *.* LISTEN

$ rmsock f10000f303321b58 tcpcb
The socket 0x3321800 is being held by proccess 692476 (db2sysc).

sudo kill -9 692476

lsof

lsof -i :1883
kill 53457