I have often noticed that some folks reboot systems to fix stale NFS mount problems, which can be disruptive.

Fortunately, that often isn’t necessary. Usually all you have to do is restart the nfs and autofs services. However, that sometimes fails because user processes have files open on the stale partition or users are cd’ed to the stale partition.

Both conditions are easy to fix. The steps below fix stale mounts by addressing each condition in turn.
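Since every step below repeats the same four service commands, they can be wrapped in a small helper. This is only a minimal sketch and not part of the original procedure: the function name `restart_nfs_autofs` and the `DRY_RUN` variable are made up for illustration, and the service names `nfs` and `autofs` are the ones used in this article, so they may differ on your distribution.

```shell
# Hypothetical helper: stop both services, then start both, in the same
# order as the steps in this article (nfs first, then autofs).
# Assumption: the services are called "nfs" and "autofs" on your system.
# Set DRY_RUN=1 to print the commands instead of running them.
restart_nfs_autofs() {
    for action in stop start; do
        for svc in nfs autofs; do
            if [ "${DRY_RUN:-0}" = "1" ]; then
                echo "service $svc $action"
            else
                service "$svc" "$action"
            fi
        done
    done
}
```

With `DRY_RUN=1` set, calling `restart_nfs_autofs` only prints the four `service` commands, which is a cheap way to check the order before running it for real as root.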

Step 1. Kill processes with open files on the partition

Use lsof to find the processes that have files open on the partition and then kill those processes using kill or pkill.

% sudo su -
 
% # Find the jobs that are accessing the stale partition and kill them.
% # NOTE: change /stale/fs and /export/backup to the paths of your stale partitions.
% kill -9 $(lsof |\
    egrep '/stale/fs|/export/backup' |\
    awk '{print $2;}' |\
    sort -fu )
 
% # Restart the NFS and AUTOFS services
% service nfs stop
% service autofs stop
% service nfs start
% service autofs start

% # Check it
% ls /stale/fs
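The `egrep` filter above matches the path anywhere on the line, so `/stale/fs` would also match a different mount such as `/stale/fs2`. As a hedged alternative, here is a small filter that only accepts the mount point itself or paths beneath it. The name `pids_on_path` is hypothetical, and the sketch assumes the default `lsof` column layout, where the PID is the second field.

```shell
# Hypothetical filter: read lsof output on stdin and print the unique PIDs
# that have something open at or below the given mount point.
# $1: mount point without a trailing slash, e.g. /stale/fs
pids_on_path() {
    awk -v p="$1" '
        # Skip the header line: its second field is "PID", not a number.
        $2 ~ /^[0-9]+$/ {
            for (i = 3; i <= NF; i++) {
                # Exact match, or the mount point followed by a slash.
                if ($i == p || index($i, p "/") == 1) { print $2; break }
            }
        }' | sort -un
}
```

It would slot into the command above as `kill -9 $(lsof | pids_on_path /stale/fs)`. It may be worth printing the PID list and reviewing it before killing anything. If `lsof` itself hangs on the stale mount, its `-b` option is documented as avoiding kernel calls that might block. I have not verified that it helps in every case.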

Typically this is sufficient, but if it fails, go to step 2.
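A plain `ls` on a mount whose server has gone away can block for a long time, which makes the "check it" step awkward. One possible way around that is to put a time limit on the probe. This is a sketch under assumptions: the name `mount_responds` is made up, it relies on the GNU coreutils `timeout` command, and it assumes the blocked `stat` can actually be killed, which may not hold on every kernel or mount option.

```shell
# Hypothetical probe: succeed if the path answers a stat within the time
# limit; fail if it is stale, hung, or simply does not exist.
# $1: path to probe, $2: time limit in seconds (default 5)
mount_responds() {
    timeout -s KILL "${2:-5}" stat "$1" >/dev/null 2>&1
}
```

For example, `mount_responds /stale/fs || echo "still stale"` reports a failure after at most five seconds instead of hanging the shell.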

Step 2. Kill processes that have cd’ed to the partition

Look at the current working directory of each user’s processes. Any process whose working directory is on the partition has to be killed.

% sudo su -

% # List the users that are cd'ed to the stale partition and kill their jobs.
% # NOTE: change /stale/fs to the path to your stale partition.
% kill -9 $( for u in $( who | awk '{print $1;}' | sort -fu ) ; do \
    pwdx $(pgrep -u $u) |\
    grep '/stale/fs' |\
    awk -F: '{print $1;}' ; \
done)

% # umount the stale partition
% umount -f /stale/fs

% # Restart the NFS and AUTOFS services
% service nfs stop
% service autofs stop
% service nfs start
% service autofs start

% # Check it
% ls /stale/fs
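The `grep '/stale/fs'` in the loop above also matches unrelated directories such as `/stale/fs2`. A stricter filter for `pwdx` output might look like the following sketch. The name `pids_cwd_under` is hypothetical, and it assumes `pwdx` prints one `PID: /path` line per process.

```shell
# Hypothetical filter: read pwdx output ("PID: /cwd") on stdin and print
# the unique PIDs whose working directory is at or below the mount point.
# $1: mount point without a trailing slash, e.g. /stale/fs
pids_cwd_under() {
    awk -v p="$1" '
        {
            pid = $1
            sub(/:$/, "", pid)            # "1234:" -> "1234"
            cwd = $0
            sub(/^[0-9]+: /, "", cwd)     # keep the path, spaces included
            if (pid ~ /^[0-9]+$/ && (cwd == p || index(cwd, p "/") == 1))
                print pid
        }' | sort -un
}
```

Inside the loop it would replace the `grep` and `awk` stages: `pwdx $(pgrep -u $u) | pids_cwd_under /stale/fs`.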

If that fails, you have to kill all of the user processes.

Step 3. Kill all of the users

If step 2 doesn’t work, then there is something strange going on, but killing all of the user processes will usually fix it. That is done as follows.

% sudo su -

% # Kill all user processes.
% # NOTE: this also kills your own login session if who lists you.
% for u in $( who | awk '{print $1;}' | sort -fu ) ; do \
    kill -9 $(pgrep -u $u) ; \
done

% # umount the stale partition
% umount -f /stale/fs

% # Restart the NFS and AUTOFS services
% service nfs stop
% service autofs stop
% service nfs start
% service autofs start

% # Check it
% ls /stale/fs

As you can see, it is basically the same as step 2 except that all user processes are killed.
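Because `who` also lists the account you logged in with, the loop in step 3 can take down your own session before it finishes. One possible variation is to drop a chosen user from the list first. This is a sketch only: the name `users_except` is made up, and `alice` in the usage note stands in for whatever account you want to spare.

```shell
# Hypothetical filter: read who output on stdin and print each logged-in
# user name once, leaving out the one given as $1.
users_except() {
    awk -v me="$1" '$1 != me { print $1 }' | sort -u
}
```

The loop header would then become `for u in $( who | users_except alice ) ; do`, with the rest of step 3 unchanged.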

If that doesn’t work, you need to resort to the nuclear option: rebooting.

Step 4. Reboot

This is the option of last resort, but it should always work.

If you know of any other tips for fixing stale NFS mounts, I would really like to hear about them.

Enjoy!

10 thoughts on “How to fix stale NFS mounts on linux without rebooting”

Dave says:

April 30, 2012 at 11:48 pm

Often,
umount -lf /stale/fs
is sufficient, instead of restarting nfs and autofs.

1. jlinoff says:
  
  May 2, 2012 at 3:07 pm
  
  That is an excellent suggestion. Thank you.
  
Dave says:

October 2, 2012 at 9:49 pm

Often, “sudo umount -lf” is sufficient to fix things. It’s amazingly effective.

Dan says:

May 23, 2013 at 9:19 pm

The problem is that hung nfs mounts also hang lsof. Definitely true for SLES10 and 11. Having that trouble right now… typed lsof half an hour ago and it still hasn’t come back. The same thing happens with a forced lazy umount. May be Suse specific behavior? Still a PITA.

Richard says:

October 16, 2013 at 7:51 pm

Just a note, for OpenSuse 12.3:

lsof hangs

but

umount -lf

did the trick instantly,

Thanks — saved me a reboot on a busy production system.

1. jlinoff says:
  
  October 16, 2013 at 9:29 pm
  
  Thank you. That is an excellent suggestion.
  
Gerry says:

March 20, 2014 at 1:42 am

Be careful with umount -lf commands. I had a situation where a super-block error occurred.

I would try umount --force first, but I’m not that inclined to use lazy umount. Perhaps my filesystem was busy at the time, even though lsof looked OK.

smolarek999 says:

October 7, 2015 at 3:19 am

What about lsof hanging out? Is there any way to prevent this?

1. jlinoff says:
  
  October 8, 2015 at 10:12 am
  
  What do you mean by “lsof hanging out”?
  