While creating a DB2 pureScale instance on RHEL 7.2, the node may become unresponsive. If you reboot the node and look at /var/log/messages, you may notice several messages like this:
kernel:BUG: soft lockup - CPU#1 stuck for 23s! [mmfsd:3280]
mmfsd is the GPFS (aka IBM Spectrum Scale) file system daemon, and it appears to be the cause of this CPU soft lockup.
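To pull just these messages out of the log after a reboot, a small helper along the following lines may be handy. It is only a sketch: the function name find_mmfsd_lockups is made up for illustration, and the pattern assumes the message format shown above.

```shell
# Print kernel soft-lockup messages that name mmfsd.
# The log path defaults to /var/log/messages (the file mentioned above);
# pass another file as the first argument to search elsewhere.
find_mmfsd_lockups() {
    grep -E 'soft lockup.*\[mmfsd:[0-9]+\]' "${1:-/var/log/messages}"
}

# Typical use on the affected node (reading the log usually needs root):
#   find_mmfsd_lockups
#   find_mmfsd_lockups /var/log/messages | wc -l
```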
The other symptom of a CPU soft lockup is a long run queue in the vmstat output. Look at the first column, 'r', under procs; its value will be very high in this case.
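As a rough way to read that column without eyeballing it, the sketch below takes vmstat output on standard input and prints the largest 'r' value it sees. The helper name max_run_queue is hypothetical, and what counts as "very high" depends on the number of CPUs, so comparing the result with the output of nproc is only a rule of thumb.

```shell
# Read vmstat output on stdin and print the largest value of the
# first column ('r', the run queue). Header lines are skipped because
# their first field is not a plain number.
max_run_queue() {
    awk '$1 ~ /^[0-9]+$/ { if ($1 + 0 > max) max = $1 + 0 } END { print max + 0 }'
}

# Typical use: sample once per second, five times, then compare
# the result with the CPU count.
#   vmstat 1 5 | max_run_queue
#   nproc
```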
The cause appears to be the Supervisor Mode Access Prevention (SMAP) feature of the Intel Xeon v4 (Broadwell) processor, which the Linux kernel supports from version 3.7 onward. SMAP, present in Broadwell and later Intel CPUs (including the Intel Core i7-6820HQ), prevents code running in kernel mode from accessing user-space memory unless the access is explicitly permitted, a feature aimed at making it harder to exploit software bugs. GPFS runs partly at kernel level, and with SMAP enabled its access to user-space memory is blocked, with the result that a CPU soft lockup occurs and the system appears to hang.
The node appears to hang, but it is actually a CPU soft lockup, as seen with the above command. The soft lockup also causes the long run queue, with the result that the system becomes unresponsive.
The RHEL 7.2 kernel supports SMAP by default. To check whether your CPU has this feature, run cat /proc/cpuinfo | grep smap; if you see smap in the flags section, Supervisor Mode Access Prevention (SMAP) is enabled.
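The same check can be wrapped so that it gives a yes/no answer and does not match longer flag names by accident. This is a minimal sketch; the function name has_smap_flag is made up here, and the optional file argument exists only so the check can be tried against a saved copy of cpuinfo.

```shell
# Succeed (exit status 0) if 'smap' appears as a whole word in the
# given cpuinfo file; the default is /proc/cpuinfo.
has_smap_flag() {
    grep -qw smap "${1:-/proc/cpuinfo}"
}

# Typical use:
#   if has_smap_flag; then echo "SMAP is enabled"; else echo "no smap flag"; fi
```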
GPFS fixed this issue in v4.2.1.1, but the version that ships with DB2 11.1 FP 1 is v4.1.1.9. If you are using a later version of DB2, you can find out which version of GPFS will be installed by looking at the file spec in the <db2softwaredir>/server_t/db2 directory.
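Once you know the GPFS version, comparing it with 4.2.1.1 by eye is easy enough, but a scripted check could look like the sketch below. The helper name version_lt is hypothetical, and the sketch deliberately does not try to extract the version from the spec file, since its layout is not described here; you pass the version string in yourself.

```shell
# Succeed (exit status 0) if dotted version $1 is strictly lower than $2.
# Components are compared numerically from left to right; missing
# components count as 0.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if ((x[i] + 0) < (y[i] + 0)) exit 0
            if ((x[i] + 0) > (y[i] + 0)) exit 1
        }
        exit 1
    }'
}

# Typical use, with the versions mentioned above:
#   if version_lt 4.1.1.9 4.2.1.1; then echo "GPFS predates the fix"; fi
```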
You can disable the SMAP feature in the Linux kernel by adding the kernel parameter nosmap as shown below (for RHEL 7.2).
- Locate grub.cfg. It can be in different places depending on whether the system uses legacy or EFI boot. In my case, it was in /boot/grub2/grub.cfg
- Edit grub.cfg, find the line associated with the system image (the line containing vmlinuz), and add the nosmap parameter at the end.
Alternatively, you can edit /etc/default/grub, add the nosmap parameter at the end of the line containing GRUB_CMDLINE_LINUX, then run grub2-mkconfig -o /boot/grub2/grub.cfg and reboot the system.
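The /etc/default/grub edit can also be scripted. The following is a sketch only: the function name add_nosmap is made up, and it assumes the GRUB_CMDLINE_LINUX value sits on a single line enclosed in double quotes, as it does in a default RHEL 7 install. It keeps a .bak copy of the file and does nothing if nosmap is already there.

```shell
# Append nosmap to the GRUB_CMDLINE_LINUX="..." line of a grub defaults
# file (default: /etc/default/grub). A backup is saved with a .bak suffix.
add_nosmap() {
    grub_defaults="${1:-/etc/default/grub}"
    # Nothing to do if the parameter is already on that line.
    if grep -E '^GRUB_CMDLINE_LINUX=' "$grub_defaults" | grep -qw nosmap; then
        return 0
    fi
    # Insert " nosmap" just before the closing double quote.
    sed -i.bak -E 's/^(GRUB_CMDLINE_LINUX=".*)"[[:space:]]*$/\1 nosmap"/' "$grub_defaults"
}

# Typical use, as root, followed by the steps from the text:
#   add_nosmap
#   grub2-mkconfig -o /boot/grub2/grub.cfg
#   reboot
```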
You can also run the grubby command to add this kernel parameter.
# grubby --update-kernel=ALL --args=nosmap
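Whichever method you use, it is worth confirming after the reboot that the running kernel actually received the parameter. A minimal sketch, with the hypothetical helper name booted_with_nosmap, is to look for it in /proc/cmdline:

```shell
# Succeed (exit status 0) if 'nosmap' is present as a whole word in the
# kernel command line; the default source is /proc/cmdline.
booted_with_nosmap() {
    grep -qw nosmap "${1:-/proc/cmdline}"
}

# Typical use after the reboot:
#   if booted_with_nosmap; then echo "nosmap is active"; else echo "nosmap missing"; fi
```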