How to fix stale NFS mounts on Linux without rebooting

I often see people reboot a system to fix a stale NFS mount, which can be disruptive.

Fortunately, that often isn’t necessary. Usually all you have to do is restart the nfs and autofs services. However, that sometimes fails because user processes have files open on the stale partition or because users are cd’ed to it.
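Here is a minimal sketch of that restart. It assumes the `service` wrapper is available and that the services really are called nfs and autofs on your system. On systemd hosts the equivalent is `systemctl restart`, and the unit names may differ (nfs-server, for example). The function name restart_services and the DRY_RUN variable are made up for this illustration.

```shell
# Restart the services named on the command line, defaulting to the two
# mentioned above. The names are assumptions; adjust them for your distribution.
# With DRY_RUN=1 the commands are only printed, which is handy for checking
# what would happen before touching a production box.
restart_services() {
    [ "$#" -gt 0 ] || set -- nfs autofs
    for svc in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "service $svc restart"
        else
            service "$svc" restart || echo "restart of $svc failed" >&2
        fi
    done
}
```

Run `DRY_RUN=1 restart_services` first to see the commands, then `restart_services` as root to actually restart.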

Both conditions are easy to fix. The steps below address each one in turn.

Step 1. Kill processes with open files on the partition

Use lsof to find the processes that have files open on the partition and then kill those processes using kill or pkill.
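A rough sketch of this step, not a definitive recipe. The function names and the 30 second limit are my own choices. `lsof -t` prints bare PIDs, and given a mount point it reports files open anywhere on that filesystem. The `timeout` wrapper is only a precaution, because lsof itself can block when the NFS server is not answering.

```shell
# Print the PIDs of processes with files open on the filesystem mounted at $1.
# timeout gives up after 30 seconds (an arbitrary limit) in case lsof blocks.
pids_with_open_files() {
    timeout 30 lsof -t -- "$1" 2>/dev/null
}

# Send SIGTERM to the given PIDs, wait $1 seconds, then SIGKILL any survivors.
# Processes stuck in uninterruptible I/O on the dead mount may ignore even
# SIGKILL until the server responds.
terminate_pids() {
    grace=$1
    shift
    [ "$#" -gt 0 ] || return 0
    kill -TERM "$@" 2>/dev/null || true
    sleep "$grace"
    for pid in "$@"; do
        if kill -0 "$pid" 2>/dev/null; then
            kill -KILL "$pid" 2>/dev/null || true
        fi
    done
}
```

For a mount at the hypothetical path /mnt/stale, you would run `terminate_pids 5 $(pids_with_open_files /mnt/stale)` as root.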

Typically this is sufficient, but if it fails, go to step 2.

Step 2. Kill processes that have cd’ed to the partition

Look at the current working directory of every user process. If any of them is on the partition, that process has to be killed.
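One way to do that check, sketched under the assumption that /proc is mounted (Linux only). The function name is hypothetical. It reads the /proc/PID/cwd symlinks, which should not need to contact the NFS server, and prints the PIDs whose working directory is the given directory or somewhere below it.

```shell
# Print the PIDs of processes whose current working directory is $1 or a
# subdirectory of it. Run as root to see processes owned by other users;
# entries that cannot be read are skipped.
pids_with_cwd_under() {
    dir=${1%/}
    for link in /proc/[0-9]*/cwd; do
        cwd=$(readlink "$link" 2>/dev/null) || continue
        case "$cwd" in
            "$dir"|"$dir"/*)
                pid=${link#/proc/}
                echo "${pid%/cwd}"
                ;;
        esac
    done
}
```

With the hypothetical mount point /mnt/stale, `pids_with_cwd_under /mnt/stale` lists the offenders, and `kill $(pids_with_cwd_under /mnt/stale)` gets rid of them.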

If that fails, you have to kill all of the users.

Step 3. Kill all of the users

If step 2 doesn’t work, then there is something strange going on, but killing all of the user processes will usually fix it. That is done as follows.
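The sketch below is one possible way to do it and is not necessarily the exact commands originally used here. It assumes /proc is available and that regular accounts have UIDs of 1000 and up, which is a common default but not universal. The function name user_pids is hypothetical.

```shell
# Print the PIDs of all processes whose real UID is at least $1
# (default 1000, an assumed cutoff for regular accounts), skipping the
# calling shell. Processes that exit while we are looking are ignored.
user_pids() {
    uid_min=${1:-1000}
    for status in /proc/[0-9]*/status; do
        uid=$(awk '/^Uid:/ { print $2; exit }' "$status" 2>/dev/null) || continue
        [ -n "$uid" ] || continue
        pid=${status#/proc/}
        pid=${pid%/status}
        if [ "$uid" -ge "$uid_min" ] && [ "$pid" != "$$" ]; then
            echo "$pid"
        fi
    done
}
```

To actually kill everything it finds, run `user_pids | xargs -r kill -KILL` (`-r` is a GNU xargs option). Do this from a root login on the console: if your own session belongs to a regular user, it will be killed along with everyone else's.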

As you can see, it is basically the same as step 2 except that all user processes are killed.

If that doesn’t work, you need to resort to the nuclear option: rebooting.

Step 4. Reboot

This is the option of last resort but it should always work.

If you know of any other tips for fixing stale NFS mounts, I would really like to hear about them.

Enjoy!

10 thoughts on “How to fix stale NFS mounts on linux without rebooting”

  1. The problem is that hung NFS mounts also hang lsof. Definitely true for SLES10 and 11. Having that trouble right now… typed lsof half an hour ago and it still hasn’t come back. The same thing happens with a forced lazy umount. Maybe SUSE-specific behavior? Still a PITA.

  2. Be careful with umount -lf commands. I had a situation where a super-block error occurred.

    I would try umount --force first, but I’m not that inclined to use lazy umount. Perhaps my filesystem was busy at the time even though lsof looked OK.
