df reports the disk as full while du shows far less space in use

df -hT reports the disk as full, but du -sh / shows that far less space is actually in use.

A server in our test environment raised a low-disk-space alert. df -hT reported the root filesystem as 100% full, yet du -sh / --exclude=/proc counted only 18G of files:

[[email protected] ~]# du -sh / --exclude=/proc
18G     /

[[email protected] ~]# df -hT
Filesystem     Type      Size  Used Avail Use% Mounted on
/dev/vda1      ext4       99G   97G     0 100% /

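This happens when files have been deleted but are still held open by running processes. The directory entry is gone, so du no longer counts the file, but the filesystem cannot release its blocks until the last open descriptor is closed, so df still counts them as used. We can find these processes and kill them: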
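
The effect is easy to reproduce on a small scale. The following is a minimal sketch, assuming a Linux system with /proc and GNU stat; the scratch file from mktemp and the choice of descriptor 3 are purely illustrative:

```shell
#!/bin/sh
# Sketch: reproduce a "deleted but still open" file (Linux-specific).
set -eu

tmp=$(mktemp)                 # hypothetical scratch file
exec 3>"$tmp"                 # keep the file open on descriptor 3
printf 'some log data\n' >&3
rm -f "$tmp"                  # the name is gone, but fd 3 still holds the inode

# /proc/<pid>/fd/<n> still points at the inode; the link text ends in " (deleted)".
fd_target=$(readlink "/proc/$$/fd/3")
# The data is still allocated on disk even though no path refers to it.
held_bytes=$(stat -L -c %s "/proc/$$/fd/3")

echo "$fd_target"
echo "$held_bytes bytes still held by PID $$"

exec 3>&-                     # closing the last descriptor releases the space
```

While descriptor 3 is open, du cannot see the file because no path leads to it, whereas df keeps counting its blocks as used.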

Use lsof +L1 to list open files whose link count is below 1, i.e. files that have already been unlinked:

[[email protected] /]# lsof +L1
COMMAND     PID USER   FD   TYPE DEVICE    SIZE/OFF NLINK    NODE NAME
tomcat     7266 root   43w   REG  253,1         405     0 5685359 /data1/tomcat/log/tomcat_localhost_obs_modify_user_passwd_20191018.log (deleted)
router_mc  7356 root    3u  FIFO  253,1         0t0     0   24595 /tmp/router_ccd_2_mcd.fifo (deleted)
router_mc  7356 root    4u  FIFO  253,1         0t0     0   24596 /tmp/router_dcc_2_mcd.fifo (deleted)
router_cc  7360 root    4u  FIFO  253,1         0t0     0   24595 /tmp/router_ccd_2_mcd.fifo (deleted)
router_cc  7360 root    5u  FIFO  253,1         0t0     0   24601 /tmp/router_mcd_2_ccd.fifo (deleted)
tail      10927 test    3r   REG  253,1  4608138057     0 5685273 /data1/tomcat/log/tomcat_localhost_Data_20191016.log (deleted)
tail      11665 test    3r   REG  253,1  5166861678     0 5686750 /data1/tomcat/log/tomcat_localhost_Data_20191015.log (deleted)
tail      11758 test    3r   REG  253,1   552673328     0 5686749 /data1/tomcat/log/tomcat_localhost_Transaction_20191015.log (deleted)
tail      12417 test    3r   REG  253,1   552673328     0 5686749 /data1/tomcat/log/tomcat_localhost_Transaction_20191015.log (deleted)
tail      12594 test    3r   REG  253,1   552673328     0 5686749 /data1/tomcat/log/tomcat_localhost_Transaction_20191015.log (deleted)
tail      12617 test    3r   REG  253,1  5166861678     0 5686750 /data1/tomcat/log/tomcat_localhost_Data_20191015.log (deleted)
tail      18777 test    3r   REG  253,1  4608138057     0 5685273 /data1/tomcat/log/tomcat_localhost_Data_20191016.log (deleted)
superviso 20768 root    3w   REG  253,1       74295     0   24585 /tmp/supervisord.log (deleted)
tail      22050 test    3r   REG  253,1  4608138057     0 5685273 /data1/tomcat/log/tomcat_localhost_Data_20191016.log (deleted)
tail      24794 test    3r   REG  253,1  4608138057     0 5685273 /data1/tomcat/log/tomcat_localhost_Data_20191016.log (deleted)
python3   29773 root  cwd    DIR  253,1           0     0   90141 /home/worker/app/NodeMgr (deleted)
python3   29773 root    1w   REG  253,1 71502671917     0   90193 /home/worker/app/NodeMgr/out.log (deleted)
python3   29773 root    2w   REG  253,1 71502671917     0   90193 /home/worker/app/NodeMgr/out.log (deleted)
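
To estimate how much space these files are holding, the SIZE/OFF column can be summed. This is a rough sketch that assumes the ten-column layout shown above (lsof output can shift when a column is empty); the function name deleted_bytes is hypothetical. Each inode is counted once, since several processes may hold the same file open:

```shell
# Sum the sizes of deleted-but-open regular files from `lsof +L1` output on stdin.
# Assumed columns: $5 = TYPE, $6 = DEVICE, $7 = SIZE/OFF, $9 = NODE.
deleted_bytes() {
  awk 'NR > 1 && $5 == "REG" && !seen[$6 SUBSEP $9]++ { total += $7 }
       END { printf "%.0f\n", total }'
}

# Usage (not run here): lsof +L1 | deleted_bytes
```

In the listing above, most of the space is held by /home/worker/app/NodeMgr/out.log (71502671917 bytes), which python3 (PID 29773) has open on both stdout and stderr.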

Use awk to extract the PIDs and kill the processes:

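# kill takes PIDs as arguments and does not read stdin, so pass them via xargs.
# NR > 1 skips the header line; sort -u removes duplicate PIDs.
lsof +L1 | awk 'NR > 1 {print $2}' | sort -u | xargs -r kill

# The command above may only terminate some of the processes; use the following to force-kill the rest
lsof +L1 | awk 'NR > 1 {print $2}' | sort -u | xargs -r kill -9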

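This approach works but is heavy-handed: it kills every process that lsof +L1 lists, including services such as tomcat and supervisord in the output above. Use it with caution.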
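
A gentler alternative, where a process cannot simply be restarted, is to truncate the deleted file through its /proc entry so that the blocks are released while the process keeps running. This is only a sketch under assumptions: it is Linux-specific, the helper name free_deleted_fd is hypothetical, and it requires write permission on the file. The PID and FD values come from the lsof +L1 output:

```shell
# free_deleted_fd PID FD
# Truncate a deleted-but-open file to zero length via /proc/PID/fd/FD.
free_deleted_fd() {
  case "$(readlink "/proc/$1/fd/$2")" in
    *" (deleted)")
      : > "/proc/$1/fd/$2"    # reopen the inode with truncation; the process keeps its fd
      ;;
    *)
      echo "fd $2 of PID $1 is not a deleted file" >&2
      return 1
      ;;
  esac
}

# Usage (not run here), for the out.log held on fd 1 by PID 29773:
#   free_deleted_fd 29773 1
```

If the process did not open the file in append mode, it keeps writing at its old offset, so the file becomes sparse: the apparent size stays large, but the truncated blocks are still freed.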