1. Issue Description
The DB2 PureScale database in the client’s core marketing system experienced a sudden failure and was unable to restart after crashing.
2. Problem Analysis
The engineer determined that the database cluster had unexpectedly stopped, with the error message:
"Node unable to establish a session with the cluster manager."
The database log showed a "Repair Domain failed" error, indicating that an attempt to repair the cluster was unsuccessful. Manual restart attempts for the database also failed.
Reviewing the GPFS logs revealed errors indicating the GPFS file system was not able to prepare properly, resulting in a failure to mount the GPFS file system on both nodes.
Since DB2 PureScale relies on GPFS as its shared file system, a GPFS failure renders the DB2 database unusable.
3. Problem Resolution
To address the GPFS 6027-305 issue, the engineer consulted the official guide, modified the verifyGpfsReady
setting to “no,” and disabled the verifyGpfsReady
feature. This resolved the issue with /var/mmfs/etc/gpfsready
not executing successfully. After applying this fix, GPFS started normally, and the file system was able to auto-mount. The DB2 PureScale cluster service comprises three main components:
Once the GPFS issue was resolved, RSCT commands like lsrpdomain
and lsrpnode
were operable on one node. However, attempts to run these commands on the second node returned error 2612-022.
This error indicated that the RSCT resource group had not started properly, preventing access to the configuration resource manager.
To resolve this, steps were taken to re-establish the remote client connection between the two nodes:
bash/usr/sbin/rsct/bin/rmcctrl -A
/usr/sbin/rsct/bin/rmcctrl -p
After these steps, the database was able to start successfully.
4. Summary of Lessons Learned