Linux Kernel sync() Bug
13 May 2011
There is a bug in the Linux kernel (in versions before 2.6.37) that can cause a sync() call to take many minutes instead of several seconds.
The exact nature of the bug is unclear, but it shows up when using dpkg, since dpkg now uses sync() instead of fsync(). As a result, updating an Ubuntu or Debian system can hang when dpkg goes to synchronize the file system.
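To make the difference between the two calls concrete, here is a minimal Python sketch. fsync() flushes a single open file, while sync() asks the kernel to flush every dirty buffer on the system, which is why unrelated I/O can slow it down. The helper name write_durably is made up for this illustration and is not how dpkg is implemented.

```python
import os


def write_durably(path, data):
    """Write bytes to path and flush only that one file to disk.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    with open(path, "wb") as fh:
        fh.write(data)
        fh.flush()             # push Python's userspace buffer to the kernel
        os.fsync(fh.fileno())  # ask the kernel to flush this file only
    return len(data)


# By contrast, os.sync() takes no file argument: it asks the kernel to
# flush all dirty buffers for every mounted file system.
```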
The symptom also shows up when running the sync command directly.
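One way to check whether a machine is affected is to time a single sync() call. The sketch below assumes Python 3.3 or later on Linux (where os.sync() is available). The function name time_sync is made up for this example.

```python
import os
import time


def time_sync():
    """Call sync() once and return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    os.sync()  # asks the kernel to flush all dirty buffers to disk
    return time.monotonic() - start


if __name__ == "__main__":
    print(f"sync() took {time_sync():.2f} s")
```

On an affected system with heavy background I/O, the reported time would be expected to run into minutes rather than seconds.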
The dpkg command was updated in 1.16.0 to include a force-unsafe-io option, which re-enables the previous behavior and so bypasses this bug. This version is not yet available in Ubuntu Lucid Lynx (10.04) but should show up in 10.04 LTS sooner or later. To make the option a default setting, add it to /etc/dpkg/dpkg.cfg or to a file in /etc/dpkg/dpkg.cfg.d.
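As a rough sketch of the second approach, the Python function below writes a one-line fragment into the dpkg.cfg.d directory. The file name force-unsafe-io and the function name write_dpkg_dropin are arbitrary choices for this example. The option is written without the leading dashes, as dpkg configuration files expect. It has to run as root to write under /etc.

```python
import os


def write_dpkg_dropin(cfg_dir="/etc/dpkg/dpkg.cfg.d", name="force-unsafe-io"):
    """Write a dpkg configuration fragment and return its path.

    The fragment file name is a hypothetical choice; the content is the
    option name only, with no leading "--".
    """
    path = os.path.join(cfg_dir, name)
    with open(path, "w") as fh:
        fh.write("force-unsafe-io\n")
    return path


# Example (as root, with dpkg 1.16.0 or later installed):
#     write_dpkg_dropin()
```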
It is possible to upgrade to dpkg 1.16.0 on Lucid, but it requires adding the Natty Narwhal repositories, setting apt to prefer the Lucid repositories, and then pulling in the Natty version of dpkg specifically.
Suggestions on how to fix the problem (aside from replacing the kernel) now include:
- Change the I/O scheduler for a disk
- Change the max_writeback_mb_bump parameter from 128 to 4 for the disk
- Quiet down disks or stop processes that are doing a lot of I/O
- Unmount NFS shares
- Unmount or remove USB disks
- Upgrade dpkg to 1.16.0 (workaround)
- Upgrade Linux kernel to 2.6.37
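For the first suggestion in the list, the I/O scheduler of a disk can be inspected and changed through /sys/block/<device>/queue/scheduler, where the active scheduler is shown in square brackets. The following is a minimal sketch only. The function names are made up for this example, and the device name and target scheduler in the usage comment are assumptions, since nothing here says which scheduler helps. Changing the scheduler requires root.

```python
from pathlib import Path


def parse_schedulers(text):
    """Split sysfs text such as 'noop deadline [cfq]' into (active, available)."""
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]  # the bracketed entry is the active scheduler
            active = name
        available.append(name)
    return active, available


def scheduler_path(device):
    """Return the sysfs scheduler file for a block device such as 'sda'."""
    return Path("/sys/block") / device / "queue" / "scheduler"


def set_scheduler(device, name):
    """Switch the device to the named scheduler if the kernel offers it."""
    path = scheduler_path(device)
    _, available = parse_schedulers(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} not offered for {device}: {available}")
    path.write_text(name)


# Example (as root; 'sda' and 'deadline' are assumptions):
#     set_scheduler("sda", "deadline")
```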
Correction: bug #15426 is a Linux kernel bug report, not a Debian bug report. We apologize for the error.