DB2 pureScale Install Problem Determination


RSCT License Issue

$ db2start 128
SQL1677N  DB2START or DB2STOP processing failed due to a DB2 cluster services error

DATA #9 : SQLHA Remote Command Output, PD_TYPE_SQLHA_COMMAND_RESPONSE, 3508 bytes
commandResponse->callRC: 0x00000000
commandResponse->output: Error: Product license is invalid and needs to be upgraded.

2016-06-30-09.40.31.819767-240 I6015E554             LEVEL: Error
PID     : 18164                TID : 140258826409760 PROC : db2start
INSTANCE: db2psc               NODE : 000
HOSTNAME: purescale.zinox.com
FUNCTION: DB2 UDB, high avail services, sqlhaVerifyHostLicenses, probe:18163
MESSAGE : The cluster manager license for the host is not ok:
DATA #1 : String, 42 bytes
purescale.zinox.com
DATA #2 : SQLHA_LICENSE_STATUS, PD_TYPE_SQLHA_LICENSE_STATUS, 4 bytes
SQLHA_LICENSE_STATUS_EVALUATION_PERIOD_EXPIRED

You applied the RSCT license using samlicm -i <sam32.lic or sam41.lic>, but you still see the above message in db2diag.log even though samlicm -i <license file> did not report any error. The license may still be invalid; this can occur for a variety of reasons not known to me. In any case, it is always a good idea to check whether the applied license is valid or not.

# samlicm -t
# echo $?

The first command tests whether the license is OK. The second command prints its return code: 0 means the license is valid, 1 means it is invalid. If it is invalid, download the license file again from the IBM Passport Advantage site and reapply it.

For example:

# samlicm -t
# echo $?
1
# samlicm -i sam32.lic
# samlicm -t
# echo $?
0

Reload License

Applying a license does not mean that the running processes know about it. Either reboot the machine so the license is picked up, or kill the IBM.ConfigRMd process (without -9) so that it restarts. This may or may not work, because the critical resource protection method may be invoked and RSCT may reboot the server.

# ps -ef | grep -i config
root      1704  6398  0 09:52 pts/0    00:00:00 grep -i config
root      2106   992  0 09:36 ?        00:00:00 /usr/sbin/rsct/bin/IBM.ConfigRMd
# kill 2106

netmon.cf

If you specified entries in netmon.cf for the layer 2 network having an outside IP address, you must make sure that you are able to ping the IP address using the interface. For example:

# cd /var/ct/cfg
# cat netmon.cf
!IBQPORTONLY !ALL
!REQD eth0 10.10.120.11
!REQD eth1 192.168.120.11

Make sure that you are able to ping each IP address using its interface. If ping produces no output, either the interface name or the IP address is wrong, or something has changed since the last good configuration (for example, a NIC card was replaced and the interface name changed while the IP address stayed the same).

$ ping -I eth0 10.10.120.11
$ ping -I eth1 192.168.120.11

SSH Key has changed

When a machine is rebuilt and a backup is restored, the SSH host key may change; that node will then stop working in the cluster, or you will see the following messages in your db2diag.log file. Fix your SSH keys on all hosts and make sure that you can ssh using localhost, the IP address, the FQDN and the short name.

2016-06-30-08.33.53.397828-240 E2122E2289            LEVEL: Severe
PID     : 32035                TID : 140342334629664 PROC : db2cluster
INSTANCE: db2psc               NODE : 000
HOSTNAME: purescale.zinox.com
FUNCTION: DB2 UDB, high avail services, sqlhaExecuteCommandLocal, probe:1264
DATA #1 : String, 25 bytes
/var/db2/db2ssh/db2locssh
DATA #2 : String, 21 bytes
root@vpdb202 hostname
DATA #3 : signed integer, 8 bytes
6
DATA #4 : unsigned integer, 4 bytes
32047
DATA #5 : Boolean, 1 bytes
true
DATA #6 : unsigned integer, 8 bytes
853
DATA #7 : SQLHA Remote Command Output, PD_TYPE_SQLHA_COMMAND_RESPONSE, 3508 bytes
commandResponse->callRC: 0x00000000
commandResponse->output: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
c9:96:96:d1:3e:f5:e1:96:0f:b9:9b:64:43:89:0e:63.
Please contact your system administrator.
Add correct host key in /home/db2psc/.ssh/known_hosts to get rid of this message.
Offending key in /home/db2psc/.ssh/known_hosts:8
RSA host key for purescale.zinox.com has changed and you have requested strict checking.
Host key verification failed.
failure - examine the system log on the remotehost for additional information
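
A minimal cleanup sketch for the fix described above, assuming the instance owner is db2psc and the rebuilt host is purescale.zinox.com with IP 10.10.120.11 (all three are examples; substitute your own values). Run it as the instance owner on every host, then re-verify passwordless ssh:

#!/bin/bash
HOST_FQDN=purescale.zinox.com      # example: the rebuilt host
HOST_SHORT=purescale
HOST_IP=10.10.120.11               # example: its IP address

for h in localhost $HOST_IP $HOST_FQDN $HOST_SHORT; do
    ssh-keygen -R $h               # drop the offending known_hosts entry, if any
    ssh $h hostname || echo "ssh to $h still failing - re-exchange the keys"
done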



DB2 11.1 Rebuild TSA Resources


When you get errors like this:

"SQL1517N db2start failed because the cluster manager resource states are inconsistent." or something like "Cluster manager resource states for the DB2 instance are inconsistent. Refer to the db2diag.log for more information on inconsistencies."

Or, when you try to repair resources using the command db2cluster -cm -repair -resources, you may receive an error stating that repairing the resources failed: "Refer to the db2cluster command log file. This log file can be found in /tmp/ibm.db2.cluster.*."

Then, you may need to repair / rebuild the TSA resources.

Follow this procedure to save the basic information that we will need later to rebuild the resources (a small collection script is sketched after the list).

  1. Save the output from db2hareg -dump and db2greg -dump as db2 instance owner in a file or notepad.
  2. Save the output from ~/sqllib/db2nodes.cfg, if you are able to access GPFS. If not, don’t worry about it now.
  3. Save the output from lsrpdomain as root to note the name of the RSCT domain name. We want to keep the same name when we rebuild the resources.
  4. Save the output from db2cluster -cm -list -hostfailuredetectiontime as root to know the current host failure detection time set for your cluster. If you do not get the output, or if it takes a very long time, we will set it to the default value of 4.
  5. Save the output from lscomg as root.
  6. Save the output from db2cluster -cm -list -tiebreaker to know the tie breaker. If on AIX, note the PVID value of the disk.
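
A rough collection sketch (run as root on one host; db2psc and the DB2 install path are examples, substitute your own):

#!/bin/bash
DB2BIN=/opt/IBM/db2/V11.1/bin                  # example install path
OUT=/tmp/tsa_rebuild_info.$(hostname -s).txt
{
  echo "== db2hareg / db2greg dumps (as instance owner) =="
  su - db2psc -c "db2hareg -dump; db2greg -dump"
  echo "== db2nodes.cfg ==";                 cat ~db2psc/sqllib/db2nodes.cfg
  echo "== RSCT domain ==";                  lsrpdomain
  echo "== host failure detection time ==";  "$DB2BIN"/db2cluster -cm -list -hostfailuredetectiontime
  echo "== communication groups ==";         lscomg
  echo "== tiebreaker ==";                   "$DB2BIN"/db2cluster -cm -list -tiebreaker
} > "$OUT" 2>&1
echo "Saved to $OUT"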

Destroy the RSCT domain – Procedure – 1

# export CT_MANAGEMENT_SCOPE=2
# rmrpdomain -f <domainName>

After the above runs successfully, run lsrpdomain as root on each host; you should not see any output, which means the domain was destroyed successfully. If the rmrpdomain command was successful, skip the next step.

If rmrpdomain takes very long, does not complete, or just hangs, use this alternate procedure to destroy the domain and reset RSCT completely.

Destroy the RSCT domain – Procedure – 2

On each host as root, run commands to save netmon.cf and trace.conf

# cp /var/ct/cfg/netmon.cf /tmp/netmon.cf.`hostname -s`
# cp /var/ct/cfg/trace.conf /tmp/trace.conf.`hostname -s`

Reset RSCT domain on all hosts.

# /usr/sbin/rsct/install/bin/recfgct –> Run on all hosts

After RSCT domain is reset, restore netmon.cf and trace.conf

# cp /tmp/netmon.cf.`hostname -s` /var/ct/cfg/netmon.cf
# cp /tmp/trace.conf.`hostname -s` /var/ct/cfg/trace.conf 

Reboot all hosts

Please do not ignore this.

Check RSCT license

Run samlicm -t command on each host to make sure that the RSCT license is applied successfully.

# samlicm -t
# echo $?

The output of echo $? should be zero. If the output is 1, reapply the SAM license using samlicm -i <lic file>.

Exchange Keys.

Run the preprpnode command on all hosts to exchange keys.

# preprpnode <host1> <host2> <host3> <host4> .... <hostn>

If you have 3 hosts named node01, node02 and node03, run preprpnode node01 node02 node03 on all hosts so that the keys are exchanged.

Create RSCT Domain and add hosts

Go to your DB2 software bin directory and run the db2cluster command to create the RSCT domain. Use the same domain name that you saved from the lsrpdomain output.

# cd /opt/IBM/db2/V11.1/bin
# ./db2cluster -cm -create -host <firsthostName> -domain <domainname>
# ./db2cluster -cm -add -host <secondhostname>

Repeat the above command to add all remaining hosts, as sketched below.
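
A small loop sketch for the remaining hosts (node02 through node05 are example names; the domain was already created from the first host above):

# cd /opt/IBM/db2/V11.1/bin
# for h in node02 node03 node04 node05; do ./db2cluster -cm -add -host $h; done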

Stop GPFS domain to set host failure detection time

# ./db2cluster -cfs -stop -all
# ./db2cluster -cm -set -option hostfailuredetectiontime -value 4

Set Tie Breaker Disk

Please note the name of the tie-breaker disk from your output. For AIX, get the PVID of the tie-breaker disk.

# ./db2cluster -cm -set -tiebreaker -disk PVID=<pvid>

Start GPFS domain and fix network

# ./db2cluster -cfs -start -all
# ./db2cluster -cfs -repair -network_resiliency -all

Rebuild Resources

Log in as the DB2 instance owner and run the command:

$ db2cluster -cm -repair -resources

Do sanity check

After the resources are rebuilt successfully, do the following checks as root:

  1. Check lsrpdomain output on all hosts and the domain should be online on all hosts
  2. Check lsrpnode output and all hosts should show online on all hosts
  3. Check /usr/lpp/mmfs/bin/mmgetstate -a on all hosts and all GPFS should show as active and not arbitrating.

Run these commands as db2 instance owner to check the following.

  1. Check the contents of ~/sqllib/db2nodes.cfg and make sure they are correct. If a host failure was in effect at the time of the cluster failure, fix the entry for that host so that it points to the correct host. For example:
0 node01 1 node02-r1,node02-r2 - MEMBER
1 node02 0 node02-r1,node02-r2 - MEMBER
2 node03 0 node03-r1,node03-r2 - MEMBER
128 node04 0 node04-r1,node05-r2 - CF 
129 node05 0 node05-r1,node05-r2 - CF

In the above output, member 0 had failed over, and we need to fix this line because we rebuilt the resources from scratch. The corrected db2nodes.cfg is as follows.

0 node01 0 node01-r1,node01-r2 - MEMBER
1 node02 0 node02-r1,node02-r2 - MEMBER
2 node03 0 node03-r1,node03-r2 - MEMBER
128 node04 0 node04-r1,node05-r2 - CF 
129 node05 0 node05-r1,node05-r2 - CF

After db2nodes.cfg is corrected, run the db2start command to start all hosts.
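
After db2start completes, a quick sanity sketch as the instance owner (db2instance -list and the alert listing are standard pureScale checks):

$ db2instance -list            --> members should show STARTED, CFs PRIMARY / PEER
$ db2cluster -cm -list -alert  --> there should be no outstanding alerts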

Rebuild TSA Resources in HADR Environment


This is about how to rebuild TSA resources in an HADR environment if you used db2haicu in the past but added resources, such as tie-breaker disks, that are not supported by the db2haicu tool.

So, in case of difficulty, how do you get back up quickly?

You can always recover from TSA issues such as failed-offline etc. if you know the proper command to execute.

Sometimes, timeliness is of the essence. Can I afford to spend 1-2 hours fixing an existing problem, or should I just use 5 minutes of a scripted way of destroying the existing configuration and then rebuilding it again?

For a good working TSA cluster (using HADR), follow this procedure.

1. Find the name of your domain

# lsrpdomain

2. Save your current config by using the command:

# export CT_MANAGEMENT_SCOPE=2
# sampolicy -s hadrdomain.xml

3. Disable critical protection method

# chrsrc -c IBM.PeerNode CritRsrcProtMethod=5

4. Disable TSA

$ db2haicu -disable

5. Drop RSCT domain

# rmrpdomain -f domainName

6. Make sure that the domain is gone from all nodes by using lsrpdomain.

7. Copy the following files to a safe place.

# cp /var/ct/cfg/netmon.cf /tmp
# cp /var/ct/cfg/trace.conf /tmp
# cp /var/ct/cfg/ConfigRM.cfg /tmp

8. Run the following on all nodes.

# /usr/sbin/rsct/install/bin/recfgct

9. Copy back those files on all hosts

# cp /tmp/netmon.cf /var/ct/cfg
# cp /tmp/trace.conf /var/ct/cfg
# cp /tmp/ConfigRM.cfg /var/ct/cfg

10. Run this on all nodes to exchange keys

# preprpnode node1 node2 node3

11. Create domain

# mkrpdomain domainName node1 node2 node3

12. Start the domain

# startrpdomain domainName

13. Restore the domain

# sampolicy -a hadrdomain.xml

14. Set Critical protection method to 3

# chrsrc -c IBM.PeerNode CritRsrcProtMethod=3

15. Enable TSA at DB2 level

$ db2haicu


Automatic Client Reroute – Templates


Here are two templates that one can use to make Automatic Client Reroute work properly in an HADR environment.

Please note that these instructions are only valid for an HADR pair in which the second machine is ready to take over. I will update this article later to include instructions for an HADR pair in which the standby is used for RoS (Read on Standby) or multiple standbys are used.

Non-Java

Use a db2dsdriver.cfg file and place it in the ~/sqllib/cfg directory. If you are using a thin client and do not have an sqllib directory, make sure that the environment variable DB2DSDRIVER_CFG_PATH points to the location of the db2dsdriver.cfg file.

Even if you are using Virtual IP address (VIP), use the VIP address in both places for primary and standby.

Myth: People think that if they use a VIP, there is no need for ACR. Even if you are using a VIP, you still need ACR, as it is what triggers the retry logic in the DB2 drivers (Java and non-Java) to re-establish the connection if the connection to the primary is lost.

Here is a sample db2dsdriver.cfg file.

<configuration>
 <dsncollection>
   <dsn alias="ALIASNAME" name="DBNAME" host="serverPrimary" 
      port="port_num"/>
 </dsncollection>
 <databases>
   <database name="DBNAME" host="serverPrimary" port="port_num">
     <parameter name="KeepAliveTimeout" value="15"/>
     <parameter name="tcpipConnectTimeout" value="5"/> 
     <parameter name="ConnectionTimeout" value="15"/>
     <acr>
       <parameter name="enableAcr" value="true"/>
       <parameter name="enableSeamlessAcr" value="true"/>
       <parameter name="enableAlternateServerListFirstConnect" 
           value="true"/>
       <parameter name="maxAcrRetries" value="60"/> 
       <parameter name="acrRetryInterval" value="3"/> 
       <alternateserverlist>
          <server name="server1" hostname="serverPrimary" 
              port="port_num"/>
          <server name="server2" hostname="serverStandby" 
              port="port_num"/>
       </alternateserverlist>
     </acr>
   </database>
 </databases>
</configuration>

Additionally, on the client you can set the following DB2 registry variables:

db2set DB2TCP_CLIENT_CONTIMEOUT=15
db2set DB2TCP_CLIENT_RCVTIMEOUT=15
db2set DB2TCP_CLIENT_KEEPALIVE_TIMEOUT=15

On DB2 servers (both HADR pair), you must set the following:

Primary:

UPDATE ALTERNATE SERVER FOR DATABASE <dbname> 
    USING HOSTNAME <standbyhost> PORT <standbyport>

Standby:

UPDATE ALTERNATE SERVER FOR DATABASE <dbname> 
    USING HOSTNAME <primaryhost> PORT <primaryport>

Optionally, you can set the following additional registry variables, depending upon the replication rate, to tune the TCP/IP send and receive buffers.

Primary:

db2set DB2_HADR_SOSNDBUF=4096  --> small workload
db2set DB2_HADR_SOSNDBUF=16385 --> medium workload
db2set DB2_HADR_SOSNDBUF=65536 --> heavy workload

db2set DB2_HADR_SORCVBUF=4096  --> small workload
db2set DB2_HADR_SORCVBUF=16385 --> medium workload
db2set DB2_HADR_SORCVBUF=65536 --> heavy workload

Standby:

db2set DB2_HADR_SOSNDBUF=4096  --> small workload
db2set DB2_HADR_SOSNDBUF=16385 --> medium workload
db2set DB2_HADR_SOSNDBUF=65536 --> heavy workload

db2set DB2_HADR_SORCVBUF=4096  --> small workload
db2set DB2_HADR_SORCVBUF=16385 --> medium workload
db2set DB2_HADR_SORCVBUF=65536 --> heavy workload

Java

For Java, set the following properties wherever applicable (in the application server or in a standalone application) for an HADR pair without read on standby.

clientProgramName=<name of your program>
driverType=4
enableSeamlessFailover=1
clientRerouteAlternateServerName=hadrprimaryservername,hadrstandbyservername
clientRerouteAlternatePortNumber=hadrserverdb2portnumber,hadrserverdb2portnumber
maxRetriesForClientReroute=60
retryIntervalForClientReroute=3
blockingReadConnectionTimeout=5
tcpipConnectTimeout=5
loginTimeout=15
keepAliveTimeOut=15

The other settings, such as UPDATE ALTERNATE SERVER and the DB2 registry variables, should be the same as for the non-Java case.

Please note: I have used TCP/IP timeouts etc. and they are all optional. They come into the picture when you have network glitches; in the absence of these parameters, the application may wait a long time before switching to the standby. Each of these parameters has a history and a story behind when and why it was implemented in DB2. So, without thinking too much, use these parameters as-is and your life will be simpler.

For example, keepAliveTimeOut in Java applies to connections that are already established, whereas tcpipConnectTimeout, loginTimeout and blockingReadConnectionTimeout come into the picture for new connections. Bottom line: don't think too much, use the above template, and be happy.

SQL fishing using WLM


SQLs can be captured while DB2 is up and running in different ways, such as:

  • Query column STMT_TEXT from table functions such as
    • MON_GET_ACTIVITY or WLM_GET_WORKLOAD_OCCURRENCE_ACTIVITIES, and MON_GET_ACTIVITY_DETAILS –> SQLs that have been submitted but not yet completed or currently running SQLs
    • MON_GET_PKG_CACHE_STMT and MON_GET_PKG_CACHE_STMT_DETAILS –> For SQLs that have completed and are in package cache

It is necessary to understand the difference between SQLs that are currently running and those that are in the package cache because they have completed. For currently running SQLs we can find out details such as their origin: the workload they came from, the client connection attributes, and so on. These are like rivers (currently running SQLs) ready to flow into an ocean (the package cache). An SQL that is in the package cache has lost its origin; you can no longer tell where it came from.

There is no guarantee that you will always find an SQL in the package cache; it depends upon how active the system is and how frequently SQLs are flushed out of the cache to make room for new ones.

The view MON_CURRENT_SQL shows similar information for the SQLs that are currently running, and the view TOP_DYNAMIC_SQL shows SQLs from the package cache. These two views show a subset of the information that you get from the table functions above. Which should you use? The monitoring table functions, as they are efficient, particularly for multi-partition databases and pureScale members.

DB2 also provides an event monitor to collect SQLs, including ones that were evicted from the package cache, and both static and dynamic SQL statements. You can create an event monitor like this:

create event monitor evmon_pkgcache for package cache write to table manualstart
set event monitor evmon_pkgcache state 1

and run SELECT * FROM PKGCACHE_EVMON_PKGCACHE to get the SQL statement text, choosing other columns for analysis as needed. Notice MANUALSTART: the default is AUTOSTART, in which the event monitor starts automatically on the next database activation. That is a bad idea; you should control when to start and stop event monitors, as they are expensive.

But this is a sledgehammer approach and I normally do not use it. It is too much of a drag on the system.

I like to use an approach that has the least impact and concerns itself only with the particular type of workload I am interested in. The idea is to minimize the impact on the monitored database.

This approach requires that you use a proper method of identifying the workload. If you run db2 list applications and you see the name db2jcc_appl, you are not doing your job right: you have failed to ask the application developers to properly identify their application names. This can be done by a variety of methods, such as setting the proper JCC driver properties; it is a long topic that deserves a separate write-up. It is easy to do from any application server and requires some work for Java and non-Java applications. Assuming that you have a proper way of defining your workload, follow this least-impact method to capture the SQL.

We like to call it SQL fishing.

Use these SQLs and customize them as per your needs.

Step-1

create workload wl_dsm applname('DSMRtMonBg','DSMRtMonFg', 'DSMRtMonPram') 
disable collect aggregate unit of work data;

Step-2

create work class set user_wcs (work class expensive_dml 
work type dml for timeroncost from 1000 to unbounded);

Step-3

create work action set user_was 
for workload wl_dsm using work class set user_wcs (
work action expensive_action on work class expensive_dml 
collect activity data with details
) disable;

Step-4

create threshold th_expsql for workload wl_dsm activities 
enforcement database disable 
when estimatedsqlcost > 1000
collect activity data on all members
with details continue;

Step-5

create event monitor db2thresviolations for 
threshold violations write to table manualstart;

Step-6

create event monitor db2activities for 
activities write to table manualstart;

set event monitor db2thresviolations state 1;
set event monitor db2activities state 1;
alter workload wl_dsm enable;
alter work action set user_was enable;
alter threshold th_expsql enable;
set workload to automatic;

I wrote the above to find the expensive SQL statements issued by the Data Server Manager application: capture only the SQLs whose cost is more than 1000 timerons.

First I define the WORKLOAD; this is the statement you have to adapt to your needs, and you should look at the DB2 Knowledge Center syntax to see what other attributes you could use. Do this step carefully so that you focus on only one application at a time and bundle all of its database connections into a single workload.

The second and third statements are optional; I created them as a turnstile counter to report the expensive SQLs against the total SQLs, so that I have something against which to measure improvements. Since my work action set name was USER_WAS, I could then just run SELECT * FROM TABLE (MON_GET_WORK_ACTION_SET_STATS('USER_WAS',-1)) AS T for the total activity under expensive SQL.

The 4th step is the most important, as I am creating a threshold for the workload (only the thing I selected) for SQL cost greater than 1000. This is why I call it SQL fishing: I am throwing a net into the ocean and catching not all the fish but only those above a particular size. If you understand how THRESHOLDS can be used, you can do wonderful things in DB2. Note that the threshold's domain is only the workload, but the enforcement is at the database level.

The 5th and 6th steps create the event monitors for the threshold violations and for capturing the SQLs. The rest are the enable statements and the statements that start the event monitors.

Start your application and let it run for the designated time and then run the following SQL to capture the count of threshold violations and the SQLs that were expensive.

select count(*) from THRESHOLDVIOLATIONS_DB2THRESVIOLATIONS;

SELECT TIME_OF_VIOLATION,
       A.STMT_TEXT
FROM THRESHOLDVIOLATIONS_DB2THRESVIOLATIONS TV,     
     ACTIVITYSTMT_DB2ACTIVITIES A
WHERE TV.APPL_ID = A.APPL_ID
AND TV.UOW_ID = A.UOW_ID
AND TV.ACTIVITY_ID = A.ACTIVITY_ID;

Once you have captured the SQL text, do not forget to turn off the event monitors, for example as shown below.
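
A sketch of the tear-down statements (disabling the threshold and work action set is optional if you plan another round of fishing):

$ db2 "set event monitor db2thresviolations state 0"
$ db2 "set event monitor db2activities state 0"
$ db2 "alter threshold th_expsql disable"
$ db2 "alter work action set user_was disable"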

This is the lightweight approach; use the threshold intelligently to filter out the SQLs you do not need.

Here is the template for SQL fishing and control; I helped one of the largest banks use it to monitor and control execution selectively.

SET WORKLOAD TO SYSDEFAULTADMWORKLOAD;

CREATE SERVICE CLASS USERS_SC DISABLE;

CREATE WORKLOAD USER_WL SESSION_USER ROLE ('BATCHROLE') DISABLE;

ALTER WORKLOAD USER_WL SERVICE CLASS USERS_SC;

GRANT USAGE ON WORKLOAD USER_WL TO ROLE BATCHROLE;

CREATE WORK CLASS SET USER_WL_WORK_CLASS_SET 
( 
WORK CLASS COSTLY_DML WORK TYPE DML FOR TIMERONCOST FROM 500000 TO 6000000, 
WORK CLASS EXPENSIVE_DML WORK TYPE DML FOR TIMERONCOST FROM 6000001 TO UNBOUNDED
);

CREATE WORK ACTION SET USER_WL_WORK_ACTION_SET 
FOR WORKLOAD USER_WL 
USING WORK CLASS SET USER_WL_WORK_CLASS_SET 
( 
   WORK ACTION COSTLY_ACTION_ALLOW ON WORK CLASS COSTLY_DML 
   WHEN CONCURRENTDBCOORDACTIVITIES > 5 
        AND QUEUEDACTIVITIES UNBOUNDED 
   COLLECT ACTIVITY DATA WITH DETAILS CONTINUE, 
   WORK ACTION EXPENSIVE_ACTION_PREVENT ON WORK CLASS 
        EXPENSIVE_DML PREVENT EXECUTION 
) DISABLE;

ALTER SERVICE CLASS USERS_SC ENABLE;
ALTER WORKLOAD USER_WL ENABLE;
ALTER WORK ACTION SET USER_WL_WORK_ACTION_SET ENABLE;

SET WORKLOAD TO AUTOMATIC;

Now the above template is simple. I just defined a service class without any attributes. You can use resource control as you like. To be very frank, I am not a fan of resource control in the first place. But, that is my personal preference as I sincerely believe that the true WLM is achieved through concurrency control and through the proper usage of the thresholds.

Then I create a workload. This is the most important step: how you create it determines how activities are grouped into what you see as an application.

Then I create a work class set to define costly and expensive SQLs. Please don't agonize over the difference between costly and expensive; you can call them Peter and Pan.

Then I define a work action set that provides concurrency control and a threshold to prevent expensive SQLs from executing. The number you use for concurrency control should be 2 to 4 less than the number of cores/threads in your system, so that a large number of concurrent queries cannot overwhelm it. If no cores are available, the query waits, which is far better than running more queries than you have cores.

Optionally, you can create an ACTIVITY event monitor as shown previously and capture the SQLs that are costly; the expensive ones are zapped automatically. Notice that we did not create a threshold in this case: we used the WORK ACTION SET capability to do two things in one step for the two defined work classes.

WLM is one of the best features in DB2, but also one of the least understood and least implemented. What you learned above will work provided you define the workload appropriately.

GPFS Disk Cleanup


During pureScale instance creation, you might run into an instance-creation-failed condition. Under some circumstances, db2icrt does not perform a proper clean-up of GPFS, and the next attempt may fail complaining that the disk is already a GPFS disk. When NSDs are created, GPFS writes the inode table at several places for recovery purposes, so simply using the dd command to wipe out the first few KB at the start of the disk may not always be sufficient; GPFS signatures may still be there.

What is the easiest way to clean / remove GPFS signatures from the disk?

You can use the following commands to first create a GPFS cluster, start it, create an NSD on the disk, and then delete the NSD. This cleans the disk properly.

#!/bin/bash
DOMAIN=zinox.com
NODE=db2test01
PEERDOMAIN=gpfs
DISK=/dev/sdc

mmcrcluster -N $NODE:manager-quorum -p $NODE \ 
     -R /usr/bin/scp -r /usr/bin/ssh -C $PEERDOMAIN -U $DOMAIN
mmchlicense server --accept -N $NODE
mmstartup -a
mmgetstate -a
sleep 10
cat << EOF | tee disk.txt
%nsd:
 device=$DISK
 nsd=nsd01
 usage=dataAndMetadata
EOF

mmcrnsd -F disk.txt -v no
mmdelnsd -F disk.txt
mmshutdown
mmdelnode -N $NODE
mmlscluster

The most important commands are mmcrnsd -F disk.txt -v no (with the -v switch set to no) and then mmdelnsd -F disk.txt.

Kernel upgrade and impact on pureScale


In a Linux environment, it is common practice to upgrade the kernel for security patches, vulnerability fixes, and so on. If you are running a DB2 pureScale environment, you are likely to break the GPFS GPL (GPFS portability layer) as soon as you upgrade the kernel, and DB2 pureScale's GPFS will stop working. If this happens to a production environment, you have invited big trouble for yourself.

The reason: when you create a DB2 pureScale instance, it compiles the GPL, and those kernel modules are stored in the /lib/modules/`uname -r`/extra folder. When the kernel is upgraded, the uname -r value changes and GPFS can no longer find its modules, since the new /lib/modules/`uname -r`/extra does not exist yet. So what procedure should one follow to safely upgrade the Linux kernel and also get the appropriate GPFS GPL for that kernel?

You need to ask your SA one question before they attempt to install a new kernel for whatever reason: do we have a GPFS version that is compatible with the kernel we are upgrading to? Who can answer this? Go to this IBM link https://www.ibm.com/support/knowledgecenter/SSFKCN/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html and search for the latest supported Linux distributions; it lists the supported Linux kernel versions. If you are on the latest and greatest kernel, it may happen that you do not see the kernel version you are looking for. How do you move forward? Here are the steps.

Get a test environment with the same kernel version as your pureScale environment (it is not necessary to have pureScale in this test environment). Install the GPFS RPMs from your DB2 software directory. The sample commands are:

# cd /root/download
# tar xvfz gpfs-4.1.tar.gz 
# tar xvfz gpfs-4.1.1.11.tar.gz

# cd 4.1
# rpm -ivh gpfs.base-4.1.0-0.x86_64.rpm
# rpm -ivh gpfs.gskit-8.0.50-16.x86_64.rpm
# rpm -ivh gpfs.msg.en_US-4.1.0-0.noarch.rpm
# rpm -ivh gpfs.gpl-4.1.0-0.noarch.rpm

# cd ../4.1.1.11

# rpm -Uvh gpfs.base-4.1.1-11.x86_64.update.rpm
# rpm -Uvh gpfs.gpl-4.1.1-11.noarch.rpm
# rpm -Uvh gpfs.gskit-8.0.50-47.x86_64.rpm
# rpm -Uvh gpfs.msg.en_US-4.1.1-11.noarch.rpm

# rpm -qa | grep -i gpfs
gpfs.msg.en_US-4.1.1-11.noarch
gpfs.gpl-4.1.1-11.noarch
gpfs.base-4.1.1-11.x86_64
gpfs.gskit-8.0.50-47.x86_64

OR

# rpm -ivh 4.1/*.rpm
# rpm -Uvh 4.1.1.11/*.rpm

After installation, run /usr/lpp/mmfs/bin/mmbuildgpl to build the GPFS portability layer. The GPFS kernel modules will be under /lib/modules/`uname -r`/extra.
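
A quick check sketch (as root) to confirm that the portability layer modules exist for the kernel you are currently running:

# ls /lib/modules/$(uname -r)/extra/ | grep -i mmfs || echo "no GPFS modules for $(uname -r) - run mmbuildgpl"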

Now do the kernel upgrade, reboot the machine, and run /usr/lpp/mmfs/bin/mmbuildgpl again. If it succeeds, you are good to go, as the new GPFS modules are in the new /lib/modules/`uname -r`/extra.

But mmbuildgpl may fail, and it will fail if the kernel you upgraded to has not been validated for GPFS. What are the choices?

Check from the above IBM link whether your new kernel is supported. If not, open a PMR and request a special build of GPFS for the kernel you have to upgrade to, install / upgrade the new RPMs, then repeat /usr/lpp/mmfs/bin/mmbuildgpl; you are good to go if this succeeds.

If the kernel is supported but your DB2 software does not ship the newer GPFS version, look for the DB2 fix pack that has the GPFS version supporting the new kernel. A Google search on DB2 Software Compatibility Matrix Report will take you to an IBM link where you can find the DB2 fix pack with the desired GPFS version. Get the fix pack, go to the server/db2/linuxamd64/gpfs folder, and run the commands:

# ./db2ckgpfs -v media
# ./db2ckgpfs -v install

These show the GPFS version on the media and the installed version. The base and fp directories contain the GPFS base and fix pack RPMs; run the appropriate rpm commands to install or upgrade them.

Your other option is to search IBM Fix Central and look for the desired GPFS fixpacks and then install / upgrade it. But, this is not my preferred path. I would always like to get the GPFS software from the db2 software directory.

How to detect log full condition


Working with hundreds of customers, I have seen a chronic LOG FULL condition, and the shortcut people use is to increase LOGPRIMARY and LOGSECOND. That solution works well only when a single unit of work cannot fit into the total log space available.

So, first let’s calculate the total log space available:

LOG SIZE (KB) = (LOGPRIMARY + LOGSECOND) * 4 * LOGFILSIZ   (LOGFILSIZ is in 4 KB pages)

For example, if my LOGPRIMARY=2, LOGSECOND=1 and LOGFILSIZ=4096, then my LOG SIZE = (2+1) * 4 * 4096 = 49,152 KB = 48 MB.
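
A small sketch that reads the three parameters and does the same arithmetic (SAMPLE is a placeholder database name; a single-partition database is assumed, and LOGSECOND = -1 means infinite logging, in which case the formula does not apply):

#!/bin/bash
DB=SAMPLE
db2 connect to $DB > /dev/null
LOGFILSIZ=$(db2 -x "select int(value) from sysibmadm.dbcfg where name = 'logfilsiz'")
LOGPRIMARY=$(db2 -x "select int(value) from sysibmadm.dbcfg where name = 'logprimary'")
LOGSECOND=$(db2 -x "select int(value) from sysibmadm.dbcfg where name = 'logsecond'")
echo "Total log space: $(( (LOGPRIMARY + LOGSECOND) * 4 * LOGFILSIZ )) KB"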

If you look at the physical log files:

$ ls -l *.LOG
-rw------- 1 db2psc db2iadm 16785408 Mar 16 14:38 S0000000.LOG
-rw------- 1 db2psc db2iadm 16785408 Mar 16 14:21 S0000001.LOG
-rw------- 1 db2psc db2iadm 16785408 Mar 16 14:23 S0000002.LOG

If you look at the size of each log file, it is 8 KB more per file than our calculation shows. This is the log file overhead.

LOG FULL conditions most often appear for the following two reasons:

  • The size of a single unit of work is greater than the total available log space. In this case, if we are doing INSERTs and a unit of work exceeds 48 MB, we will run out of log space. This issue can be happily resolved by increasing some or all of the database configuration parameters LOGPRIMARY, LOGSECOND and LOGFILSIZ.
  • The second most common cause of LOG FULL is not a single unit of work exceeding all available log space. Rather, a transaction started a unit of work and has not yet finished, so its first active log file stays open. Other work continues, and when it is that log file's turn to be archived (archive logging) or reused (circular logging), you get a LOG FULL condition. If you are using the LOAD utility, the LOAD record gets logged and that log file remains open for the duration of the LOAD command; if during that time the log needs to be archived or reused, you will get a LOG FULL condition.

You can use the following SQL to determine the connection (doing the work on behalf of the application) that is holding the oldest uncommitted transaction. For this type of LOG FULL condition, it is usually best to kill that connection and let the rest of the work keep going.

SELECT MEMBER,
       TOTAL_LOG_AVAILABLE / 1048576 AS LOG_AVAILABLE_MB,
       TOTAL_LOG_USED / 1048576 AS LOG_USED_MB,
       CAST (((CASE WHEN (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) = 0
            OR (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) IS NULL
            OR TOTAL_LOG_AVAILABLE = -1 THEN NULL
            ELSE ((CAST ((TOTAL_LOG_USED) AS DOUBLE) / CAST (
               (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) AS DOUBLE))) * 100 
       END)) AS DECIMAL (5,2)) AS USED_PCT,
       APPLID_HOLDING_OLDEST_XACT
FROM TABLE (MON_GET_TRANSACTION_LOG(-2))
ORDER BY USED_PCT DESC;

You may see a sample output as shown below:

MEMBER LOG_AVAILABLE_MB     LOG_USED_MB          USED_PCT APPLID_HOLDING_OLDEST_XACT
------ -------------------- -------------------- -------- --------------------------
     0                    0                    46   98.37                        185

Notice that more than 98% of the log is used and that the connection holding the log file hostage is application handle 185. Find out who that is and get it resolved so the active log file is released and the log full condition goes away, for example as shown below.
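
If you decide to kill it, a one-line sketch (185 is the handle from the output above):

$ db2 "force application (185)"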

If you want to go further and see how long this transaction has been on hold, run the following SQL.

select a.application_handle, 
       a.workload_occurrence_state as status, 
       substr(a.session_auth_id,1, 10) as authid,
       substr(c.application_name, 1, 10) as applname, 
       int(a.uow_log_space_used/1024/1024) as logusedM, 
       timestampdiff(4, char(current timestamp 
          - b.agent_state_last_update_time)) as idleformin 
from table(mon_get_unit_of_work(NULL,-2)) as a, 
     table(mon_get_agent(NULL,NULL,NULL,-2)) as b, 
     table(mon_get_connection(NULL, -2)) as c 
where a.application_handle = b.application_handle 
  and a.coord_member = b.member
  and a.coord_member = a.member 
  and b.agent_type = 'COORDINATOR'
  and a.uow_stop_time is null
  and a.application_handle = c.application_handle 
  and a.coord_member = c.member 
  and b.event_type = 'WAIT' 
  and b.event_object = 'REQUEST' 
  and b.event_state = 'IDLE';

The sample output is shown as below:

APPLICATION_HANDLE   STATUS                           AUTHID     APPLNAME      LOGUSEDM  IDLEFORMIN 
-------------------- -------------------------------- ---------- ---------- ----------- -----------
                 185 UOWWAIT                          DB2PSC     db2bp                0          72

Notice that connection 185 has not used much log space (less than 1 MB) but has been idle for the last 72 minutes. This is a drag on the system and has the potential to bring it to its knees.

Please note that the LOG FULL condition for case number 2 above depends upon which log file is active, and that can happen for a variety of reasons. Another common cause I have seen is bad programming in which ROLLBACK is not issued on an exception and the COMMIT never happens for that unit of work. This situation will lead to a LOG FULL condition if the transaction is not resolved.


DB2 pureScale Health Check


More and more customers are asking for a way to check the health of a DB2 pureScale system. Let's focus on a few early diagnostics before we jump into deep exploration at the SQL statement level. This is akin to going to a primary care physician to catch symptoms early.

The heart of DB2 pureScale is the Cluster Caching Facility (CF), and if it slows down, it affects everything.

If we just focus early on the symptoms exhibited by CF, it is equivalent to finding the cure for the disease.

In my experience working with hundreds of customers all over North America, almost 99.9% of folks jump right to SQL tuning. There is nothing wrong with that, but I consider it treating the symptom rather than the disease. A particular SQL could well be the real issue, and I do not deny that, but it is last on my to-do list when tuning a system. Let's get on track to check the health of a pureScale system.

I am copying a few SQLs here that I have used in many situations to draw conclusions about the health of the CF.

CF Health Check

If the CF becomes the bottleneck, everything slows down, so checking its health is at the top of my priority list. Following advice from Steve Rees, a performance guru, I have settled on the following 3 SQLs to examine CF health.

SLS and WARM Rate

SELECT CF_CMD_NAME, 
       DEC(CAST(SUM(TOTAL_CF_WAIT_TIME_MICRO) AS FLOAT) / 
       CAST(SUM(TOTAL_CF_REQUESTS) AS FLOAT),5,0) AS CF_CMD_RATE 
FROM   TABLE (SYSPROC.MON_GET_CF_WAIT_TIME(-2)) 
WHERE  ID = (select ID from sysibmadm.db2_cf where STATE = 'PRIMARY') 
AND    CF_CMD_NAME IN ('SetLockState', 'WriteAndRegisterMultiple') 
GROUP BY CF_CMD_NAME 
HAVING SUM(TOTAL_CF_WAIT_TIME_MICRO) > 0 
AND    SUM(TOTAL_CF_REQUESTS) > 0
;

In the above SQL, we take 2 CF commands, SetLockState and WriteAndRegisterMultiple, and compute the average wait time in microseconds. SetLockState (SLS) is a small packet; it should have a quick turnaround and gives a baseline for all other messages. WriteAndRegisterMultiple (WARM) is a larger packet and includes XI time. High WARM times can indicate both heavier network activity and more page invalidation, and measuring WARM times periodically can reveal when statement times are being adversely affected. (Courtesy: Toby Haynes of the IBM Toronto Lab.)

Overall CF Wait Rate

The following SQL gives the average wait time over all CF commands. This is another overall indicator, but it has to be compared against a baseline that you collected on a healthy system.

-- AVG_CF_WT_MICRO is an indication if CF is the bottleneck
-- This is an average over all CF calls
-- Best way to judge good or bad number - Look for a change
-- from what is normal for your system

SELECT INT(SUM(RECLAIM_WAIT_TIME)) RECLAIMTIME_MILLI,
 INT(SUM(CF_WAITS)) AS NUM_CF_WAITS,
 INT(SUM(CF_WAIT_TIME)) CF_WAIT_MILLI, 
 CAST ((CASE WHEN SUM(CF_WAITS) = 0 THEN NULL 
 ELSE (INT(1000 * DEC(SUM(CAST(CF_WAIT_TIME AS FLOAT))/
 SUM(CAST(CF_WAITS AS FLOAT)),5,4)))
 END) AS DECIMAL(10,4)) AVG_CF_WT_MICRO 
FROM TABLE(SYSPROC.MON_GET_WORKLOAD('',-2)) AS t
;

However, the output from the above two SQLs should be viewed along with the page negotiation rate. Page negotiation refers to a page being tossed around because of a request made by another member. If the page negotiation rate is high, performance degrades as more and more time is spent shuffling pages between members through the CF. Use the following SQL to determine this rate.

Page Negotiation Rate

-- For example: If Member 'A' acquires a page P and modifies a row on it.
-- 'A' holds an exclusive page lock on page until 'A' commits
-- Member 'B' wants to modify a different row on the same page.
-- 'B' does not have to wait until 'A' commits
-- CF will negotiate the page back from 'A' on 'B's behalf.
-- Provides better concurrency - Good but excessive can cause
-- contention, low CPU usage, reduced throughput.

-- Recommendations to reduce excessive page reclaim
-- What is excessive is debatable and sort it in desc order 
-- for all tables and pick top 5 tables
-- If excessive page reclaim then do these
-- Consider smaller page size
-- For small HOT tables with frequent updates, increase PCTFREE
-- PCTFREE will spread rows over more pages
-- Side effect - More space consumption (but this is small table anyway)
 
SELECT MEMBER, 
       SUBSTR(TABSCHEMA,1,12) AS SCHEMA, 
       SUBSTR(TABNAME,1,16) AS NAME,
       SUBSTR(OBJTYPE,1,16) AS TYPE,
      (PAGE_RECLAIMS_X + PAGE_RECLAIMS_S) AS PAGE_RECLAIMS, 
       RECLAIM_WAIT_TIME
FROM TABLE( MON_GET_PAGE_ACCESS_INFO(NULL,NULL, NULL) ) AS WAITMETRICS
ORDER BY SCHEMA, NAME
;

The output from the above 3 SQLs can be collected at 15-minute intervals; plot the time series and observe the trend, for example with the collection loop sketched below. An increase in the SLS and WARM rates by a factor of 2 to 3 relative to the baseline indicates that the CF network is becoming the bottleneck.
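
A simple collection-loop sketch, assuming the three SQLs above are saved as sls_warm.sql, cf_wait.sql and page_negotiation.sql (file and database names are examples):

#!/bin/bash
DB=MYDB                                         # example database name
OUT=/tmp/cf_health.$(hostname -s).log
db2 connect to $DB > /dev/null
while true; do
    echo "==== $(date '+%Y-%m-%d %H:%M:%S') ====" >> "$OUT"
    for f in sls_warm.sql cf_wait.sql page_negotiation.sql; do
        db2 -tf "$f" >> "$OUT"
    done
    sleep 900                                   # 15 minutes
done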

Adding more adapters at the CF will help resolve this bottleneck.

Group Buffer Pool Full Condition

Use the following SQL to determine the group buffer pool full condition. This condition causes a stall in which synchronous I/O runs on the members to flush pages and make room in the group buffer pool, leading to a burst of I/O activity. Increasing the CF memory may delay this burst activity.

-- GBP Full Condition
-- Good value: < 5 per 10000 transactions
-- If higher, GBP is small
-- The castout engines might not be keeping up
-- SOFTMAX is set too high

WITH GBPC AS ( 
 SELECT 10000.0 * SUM(GBP.NUM_GBP_FULL) / 
        SUM(COMMIT_SQL_STMTS) AS GBP_FULL_CONDITION 
 FROM TABLE(MON_GET_GROUP_BUFFERPOOL(-2)) as GBP, SYSIBMADM.SNAPDB
) SELECT CASE WHEN GBP_FULL_CONDITION < 5.0 THEN 'GOOD VALUE' 
 ELSE 'GBP FULL CONDITION' END AS RESULT,
 CASE WHEN GBP_FULL_CONDITION < 5.0 THEN 'NO GBP FULL CONDITION' 
 ELSE 'INCREASE CF_GBP_SZ OR DECREASE SOFTMAX OR INCREASE NUM_IOSERVERS' 
 END AS RECOMMENDATION
 FROM GBPC 
;

CF Swap Usage

We may not be able to avoid a few hundred pages of swap, but excessive swapping will adversely affect the performance of the system as a whole. Keep an eye on the CF swap usage; if you are not dipping into swap space, you have kept one problem at bay.

WITH TYPES AS (SELECT NAME FROM SYSIBMADM.ENV_CF_SYS_RESOURCES 
               GROUP BY NAME)
SELECT A.ID, 
 SUBSTR(MIN(DECODE(T.NAME, 'HOST_NAME', A.VALUE)),1,36) HOST_NAME,
 INT(MIN(DECODE(T.NAME, 'MEMORY_TOTAL', A.VALUE)) -
 MIN(DECODE(T.NAME, 'MEMORY_FREE', A.VALUE))) MEMORY_IN_USE ,
 INT(MIN(DECODE(T.NAME, 'MEMORY_SWAP_TOTAL', A.VALUE)) -
 MIN(DECODE(T.NAME, 'MEMORY_SWAP_FREE', A.VALUE))) SWAP_IN_USE
FROM SYSIBMADM.ENV_CF_SYS_RESOURCES A, TYPES T
WHERE A.NAME = T.NAME
GROUP BY A.ID
;

Member Swap Usage

WITH TYPES AS (SELECT NAME FROM SYSIBMADM.ENV_SYS_RESOURCES 
 GROUP BY NAME)
SELECT SUBSTR(MIN(DECODE(T.NAME, 'HOST_NAME', A.VALUE)),1,36) HOST_NAME,
 INT(MIN(DECODE(T.NAME, 'MEMORY_TOTAL', A.VALUE)) -
 MIN(DECODE(T.NAME, 'MEMORY_FREE', A.VALUE))) MEMORY_IN_USE ,
 INT(MIN(DECODE(T.NAME, 'MEMORY_SWAP_TOTAL', A.VALUE)) -
 MIN(DECODE(T.NAME, 'MEMORY_SWAP_FREE', A.VALUE))) SWAP_IN_USE
FROM SYSIBMADM.ENV_SYS_RESOURCES A, TYPES T
WHERE A.NAME = T.NAME
;

SAN System Performance

SAN performance cannot be measured accurately from inside DB2, but we can measure the rate at which pages are flushed to disk from the buffer pool and from the DB2 logs. This is a crude indication of the performance of the I/O subsystem.

Castout is the term used for DB2 on the z/OS sysplex; it is similar to page cleaning in DB2 LUW. Calculate the number of writes per transaction and the time per write. If this time rises above 10 ms, it is an indication that the I/O subsystem is not able to keep up.

-- Calculate number of writes / transactions (CASTOUTS_PER_TRANSACTION)
-- Calculate time per write (CASTOUT_TIME_MILLI_PER_TRANSACTION)
-- 
-- Bursty write activity may be a sign of SOFTMAX being high
-- Long Write Time (CASTOUT_TIME_MILLI_PER_TRANSACTION) is an
-- indication that I/O subsystem may not be able to keep up
-- 
-- Castout activity is influenced by
-- SOFTMAX - Lower value means faster group crash recovery but more
-- aggressive cleaning
-- Consider setting SOFTMAX higher than equivalent EE system

SELECT CASE WHEN SUM(W.TOTAL_APP_COMMITS) < 100 
       THEN NULL ELSE
 CAST( FLOAT(SUM(B.POOL_DATA_WRITES+B.POOL_INDEX_WRITES))
 / SUM(W.TOTAL_APP_COMMITS) AS DECIMAL(6,1)) END 
 AS "CASTOUTS_PER_TRANSACTION",
 CASE WHEN SUM(B.POOL_DATA_WRITES+B.POOL_INDEX_WRITES) < 1000 
    THEN NULL ELSE
 CAST( FLOAT(SUM(B.POOL_WRITE_TIME))
 / SUM(B.POOL_DATA_WRITES+B.POOL_INDEX_WRITES) AS DECIMAL(5,1)) END 
 AS "CASTOUT_TIME_MILLI_PER_TRANSACTION"
FROM TABLE(MON_GET_WORKLOAD(NULL,NULL)) AS W, 
TABLE(MON_GET_BUFFERPOOL(NULL,NULL)) AS B
;

If the value of CASTOUTS_PER_TRANSACTION is less than 10 and CASTOUT_TIME_MILLI_PER_TRANSACTION is less than 1 ms, the I/O subsystem is keeping up well, which suggests modern flash-based SAN storage.

Size of Database

select member, 
 decimal(sum(double(tbsp_used_pages) * tbsp_page_size ) / 
 1024 / 1024, 10, 2 ) as db_mb_used 
from table( mon_get_tablespace(null, -2)) 
group by member
;

Detect Log Full Condition

SELECT MEMBER,
       TOTAL_LOG_AVAILABLE / 1048576 AS LOG_AVAILABLE_MB,
       TOTAL_LOG_USED / 1048576 AS LOG_USED_MB,
       CAST (((CASE WHEN (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) = 0
            OR (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) IS NULL
            OR TOTAL_LOG_AVAILABLE = -1 THEN NULL
            ELSE ((CAST ((TOTAL_LOG_USED) AS DOUBLE) / CAST (
               (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) AS DOUBLE))) * 100
       END)) AS DECIMAL (5,2)) AS USED_PCT,
       APPLID_HOLDING_OLDEST_XACT
FROM TABLE (MON_GET_TRANSACTION_LOG(-2))
ORDER BY USED_PCT DESC;

From the above, find out the % of log space used and, most importantly, the application ID that is holding a log file active, which may lead to a log full condition.

Detect Log Hog / Drag Application

select a.application_handle,
       a.workload_occurrence_state as status,
       substr(a.session_auth_id,1, 10) as authid,
       substr(c.application_name, 1, 10) as applname,
       int(a.uow_log_space_used/1024/1024) as logusedM,
       timestampdiff(4, char(current timestamp
          - b.agent_state_last_update_time)) as idleformin
from table(mon_get_unit_of_work(NULL,-2)) as a,
     table(mon_get_agent(NULL,NULL,NULL,-2)) as b,
     table(mon_get_connection(NULL, -2)) as c
where a.application_handle = b.application_handle
  and a.coord_member = b.member
  and a.coord_member = a.member
  and b.agent_type = 'COORDINATOR'
  and a.uow_stop_time is null
  and a.application_handle = c.application_handle
  and a.coord_member = c.member
  and b.event_type = 'WAIT'
  and b.event_object = 'REQUEST'
  and b.event_state = 'IDLE';

Use the above SQL to find the application that has been idle or waiting for a long time; such an application can hold a log file active.

DB2 pureScale instance creation hangs after GPFS


While creating a DB2 pureScale instance under RHEL 7.2, the node may appear to become unresponsive. If you reboot the node and look at /var/log/messages, you may notice several messages like this:

kernel:BUG: soft lockup - CPU#1 stuck for 23s! [mmfsd:3280]

mmfsd is the GPFS (aka IBM Spectrum Scale) file system daemon, and it appears to be the cause of this CPU soft lockup.

Another symptom of the soft CPU lockup is a high run queue in the vmstat output: look at the first column, 'r', under procs. The value of 'r' will be very high in this case.

It appears that the Supervisor Mode Access Prevention (SMAP) feature of the Intel Xeon v4 processor (Broadwell), together with Linux kernel 3.7 or later, prevents mmfsd from accessing some memory. SMAP in the Intel Broadwell family of CPUs (including the Intel Core i7 6820HQ) disallows access from kernel-space code to user-space memory, a protection aimed at making software bugs harder to exploit. GPFS runs at the kernel level, and this feature blocks its access to that memory, with the result that a CPU soft lockup occurs and the system appears to hang.

This causes the node to appear to hang, but it is actually the soft CPU lockup issue seen in the messages above. The soft CPU lockup also causes the high run queue, with the result that the system becomes non-responsive.

The RHEL 7.2 kernel supports the SMAP feature by default. To check whether your CPU has this feature, look at the output of cat /proc/cpuinfo | grep smap; if you see smap in the flags section, Supervisor Mode Access Prevention (SMAP) is available.

GPFS fixed this issue in v4.2.1.1, but the version that comes with DB2 11.1 FP1 is v4.1.1.9. If you are using a later version of DB2, you can find out the version of GPFS that will be installed by looking at the spec file in the <db2softwaredir>/server_t/db2 folder.

You can disable the SMAP feature in the Linux kernel by adding the kernel parameter nosmap as shown below (for RHEL 7.2).

  1. Locate grub.cfg  –> It can be in different places depending upon legacy or EFI boot. In my case, this was in /boot/grub2/grub.cfg
  2. Edit grub.cfg, find the line associated with the system image (the line containing vmlinuz), and add the "nosmap" parameter at the end.

Alternatively, you can edit /etc/default/grub, add the nosmap parameter at the end of the line containing GRUB_CMDLINE_LINUX, then run grub2-mkconfig -o /boot/grub2/grub.cfg and reboot the system.
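
For example, a sketch of that sequence (RHEL 7.x with legacy BIOS boot assumed; back up the files first and use the EFI grub.cfg path on EFI systems):

# cp /etc/default/grub /etc/default/grub.bak
# sed -i 's/^\(GRUB_CMDLINE_LINUX=".*\)"/\1 nosmap"/' /etc/default/grub
# grub2-mkconfig -o /boot/grub2/grub.cfg
# reboot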

You can also run the grubby command to add this kernel parameter.

# grubby --update-kernel=ALL --args=nosmap


DB2 pureScale – No Product License found


In db2diag.log, you may see this message: No product license found.

Sample output:

 2017-05-19-13.00.58.511626-240 E2575E1007 LEVEL: Severe
PID : 35363 TID : 70366589481392 PROC : db2start
INSTANCE: purf01 NODE : 000
HOSTNAME: va33dlvudb001
FUNCTION: DB2 UDB, high avail services, sqlhaExecuteCommandSQO, probe:1062
DATA #1 : String, 13 bytes
va33dlvudb003
DATA #2 : String, 26 bytes
/usr/sbin/rsct/bin/samlicm
DATA #3 : String, 2 bytes
-s
DATA #4 : String, 4 bytes
root
DATA #5 : String, 6 bytes
purf01
DATA #6 : String, 6 bytes
purf01
DATA #7 : unsigned integer, 4 bytes
1055
DATA #8 : Boolean, 1 bytes
false
DATA #9 : SQLHA Remote Command Output, PD_TYPE_SQLHA_COMMAND_RESPONSE, 3508 bytes
commandResponse->callRC: 0x00000000
commandResponse->output: No product license found.
 
 
DATA #10: Hex integer, 4 bytes
0x00000000
DATA #11: Hex integer, 4 bytes
0x00000001
CALLSTCK: (Static functions may not be resolved correctly, 
as they are resolved to the nearest symbol)
 [0] 0x0000000000000000 ?unknown + 0x0
 [1] 0x0000000000000000 ?unknown + 0x0
 
2017-05-19-13.00.58.513621-240 I3583E741 LEVEL: Error
PID : 35363 TID : 70366735337216 PROC : db2start
INSTANCE: purf01 NODE : 000
HOSTNAME: va33dlvudb001
FUNCTION: DB2 UDB, high avail services, 
sqlhaGetLicenseStatusFromEachHost, probe:24843
MESSAGE : Problem running command on the host.
DATA #1 : SQLHA Remote Command Set, PD_TYPE_SQLHA_COMMAND_SET, 292120 bytes
commandSet->numCommands: 4
commandSet->options: NONE
commandSet->previousDb2RshCmd:
DATA #2 : SQLHA Remote Command Output, PD_TYPE_SQLHA_COMMAND_RESPONSE, 8 bytes
commandResponse->callRC: 0x0D9C36A0
commandResponse->output: NOT_POPULATED
DATA #3 : unsigned integer, 8 bytes
3
DATA #4 : String, 27 bytes
node03.zinox.com

From the above output, it appears that node03.zinox.com has the product license issue.

Fix Problem

Go to /var/opt/sam/lic

# cd /var/opt/sam/lic
# ls -l
-rw-r----- 1 root root 86 May  8 14:02 nodelock

If the permission shows 640, change it to 644:

# chmod 644 nodelock

Somehow, in the process of locking or unlocking this file, RSCT forgets to change the permission back. Most probably this issue is the result of a time difference between hosts, but I am not sure; I will ask RSCT development about this.

db2haicu disable / enable and scripting


It is recommended that you always update RSCT / TSA from the DB2 software media and not from IBM Fix Central. The reason: DB2 is tested and certified with the version that ships with the DB2 software.

If you go to server_t/db2/linuxamd64/tsamp and run the db2cktsa command with -v install (what is installed) or -v media (what is on the media), you can decide whether to update the software.

[root@node03 tsamp]# pwd
/root/download/server_t/db2/linuxamd64/tsamp
[root@node03 tsamp]# ./db2cktsa -v install
4.1.0.3

[root@node03 tsamp]# ./db2cktsa -v media
4.1.0.3

How to find RSCT version installed?

# /usr/sbin/rsct/install/bin/ctversion -b

How to find TSA version installed

# /usr/sbin/rsct/bin/samversion

How to use db2haicu -enable or db2haicu -disable for scripting?

A. For disable, create a file called db2haicu.disable as shown:

$ cat << EOF > db2haicu.disable
> 1
> EOF

The above command just adds one line containing 1. You can also create the same file using vi or any other editor.

Then, you can run db2haicu through script to disable as:

$ db2haicu -disable < db2haicu.disable

B. For enable, create a file called db2haicu.enable for scripting as:

$ cat << EOF > db2haicu.enable
> 1
> 1
> EOF

Then, you can run db2haicu through script to enable as:

$ db2haicu < db2haicu.enable

DB2 11.1 BLU with DPF Monitoring scripts


I just wrapped up a very large PoC for BLU with DPF, with a very high ingestion rate (110K messages per second) as well as very high-volume point queries (1,500 queries per second), and I used the following queries to measure the various parameters and find the optimum tuning.

STMM is off by default for DPF with BLU, but we should still take advantage of some automation.

Rule – 1: If you are using more than one MLN (multiple logical node) per physical host (or LPAR), divide the total memory of the host by the number of MLNs and then do the calculations. For example, if each host has 256 GB of RAM and you decide to use 2 MLNs per host, the available memory per MLN is 128 GB. (Even this is a slight misnomer: the actual memory Linux sees is usually less than the installed memory, and we should go with what the free command reports as available rather than what is installed. For example, the installed memory is 256 GB but Linux reports only 251 GB available, so I would divide 251 GB by the number of MLNs, say 2.)

Rule – 2: Set INSTANCE_MEMORY to 90-95% of the memory available to the MLN, and keep INSTANCE_MEMORY non-automatic.

Rule – 3: Set DATABASE_MEMORY to 90% of INSTANCE_MEMORY, and keep DATABASE_MEMORY AUTOMATIC. (Please note this.)

Rule – 4: Set SHEAPTHRES_SHR to 40% of DATABASE_MEMORY and SORTHEAP to 1/20 of SHEAPTHRES_SHR. For my workload I had to keep SHEAPTHRES_SHR at 60% and the ratio at 1/5, as my system did not have sufficient memory to process a very large data set.

Rule – 5: If continuous LOADs are performed, set UTIL_HEAP_SZ to 5000 pages and AUTOMATIC and let DB2 adjust the memory. There is no need to bump this up if UTIL_HEAP_SZ and DATABASE_MEMORY are AUTOMATIC (even though STMM is turned off).

Rule – 6: Set the buffer pool to 40% of DATABASE_MEMORY and keep it non-automatic. Again, adjust this size based upon query performance requirements. (A command sketch for these rules follows.)
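
A command sketch of these rules for one MLN with 128 GB of usable memory (the database name BLUDB, the buffer pool name bp_data, and the starting sizes are examples taken from the rules above; adjust for your workload):

#!/bin/bash
MLN_MEM_4K=$((128 * 1024 * 1024 / 4))    # 128 GB expressed in 4 KB pages
INST_MEM=$((MLN_MEM_4K * 90 / 100))      # Rule 2: 90% of MLN memory
DB_MEM=$((INST_MEM * 90 / 100))          # Rule 3: 90% of INSTANCE_MEMORY
SHEAP_SHR=$((DB_MEM * 40 / 100))         # Rule 4: 40% of DATABASE_MEMORY
SORTHEAP=$((SHEAP_SHR / 20))             # Rule 4: 1/20 of SHEAPTHRES_SHR
BP_SIZE=$((DB_MEM * 40 / 100))           # Rule 6: 40% of DATABASE_MEMORY

db2 "update dbm cfg using INSTANCE_MEMORY $INST_MEM"             # non-automatic
db2 connect to BLUDB
db2 "update db cfg using DATABASE_MEMORY $DB_MEM AUTOMATIC"      # automatic with a starting size
db2 "update db cfg using SHEAPTHRES_SHR $SHEAP_SHR SORTHEAP $SORTHEAP"
db2 "update db cfg using UTIL_HEAP_SZ 5000 AUTOMATIC"            # Rule 5
db2 "alter bufferpool bp_data size $BP_SIZE"                     # Rule 6: fixed size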

Monitoring queries for BLU with DPF.

 $ cat mondpf
#!/bin/bash

SECONDS=25200

db2 connect to <database>

endTime=$(($(date +%s) + SECONDS))
while [ $(date +%s) -lt $endTime ]
do
 echo Skew report;
 db2 -tf skew.sql
 echo Sort info;
 db2 -tf sort.sql
 echo logfull info;
 db2 -tf logfull.sql
 echo BP hit ratio;
 db2 -tf bp.sql
 echo Members info;
 db2 -tf members.sql
 sleep 60
 #db2 "call monreport.dbsummary(60)"
done

This skew script is very important in a multi-member environment (either pureScale or DPF) to determine whether any member is slow in writing or reading information to or from disk. No matter how much monitoring you do of FC card performance, you cannot tell anything by looking at values in isolation when one member is slow in writing. This is one of the most difficult problems to solve. It took me 2 weeks to figure it out, and this SQL was the saver through which we found the members that were slow in writing and reading. This led us to change the FC cards, and when that was done, the performance gain was 4 times. Through this script, by comparing the direct_write_time of every member, we determined which members' FC cards were slow in writing. It was a big help.

 $ cat skew.sql
select member,
 bp_cur_buffsz,
 direct_writes,
 pool_data_writes,
 direct_write_time,
 pool_write_time,
 pool_col_writes,
 pool_async_col_writes
from TABLE(MON_GET_BUFFERPOOL('',-2))
where bp_name in ('MINBP','MAXBP')
order by member
;
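
As a small follow-on (not part of the original PoC scripts), the query below reduces the skew report to one number per member, the average milliseconds per direct write request, which makes a slow member stand out immediately. The MINBP and MAXBP bufferpool names are carried over from skew.sql.

SELECT member,
 SUM(direct_write_time) AS direct_write_time_ms,
 SUM(direct_write_reqs) AS direct_write_reqs,
 DEC(SUM(direct_write_time) * 1.0 /
     NULLIF(SUM(direct_write_reqs), 0), 12, 3) AS ms_per_write_req
FROM TABLE(MON_GET_BUFFERPOOL('',-2))
WHERE bp_name IN ('MINBP','MAXBP')
GROUP BY member
ORDER BY ms_per_write_req DESC;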

In BLU with DPF, the maximum performance gain comes from a properly sized SHEAPTHRES_SHR and the right ratio between SHEAPTHRES_SHR and SORTHEAP. Proper tuning and adjustment of the values will be helpful, but use this script to know whether sort overflows are occurring and minimize them by increasing SHEAPTHRES_SHR.

 $ cat sort.sql
echo Monitor Sort Memory Usage - Obtain current and maximum sort usage for the database;

SELECT MEMBER, SORT_SHRHEAP_ALLOCATED,
 SORT_SHRHEAP_TOP
FROM TABLE(MON_GET_DATABASE(-2))
ORDER BY MEMBER
;

echo Extract percentage of sort operations that have spilled and high watermark sort usage;

WITH ops AS (
SELECT
 member, (total_sorts + total_hash_joins + total_hash_grpbys)as sort_ops,
 (sort_overflows + hash_join_overflows + hash_grpby_overflows) as overflows,
 sort_shrheap_top as sort_heap_top
FROM TABLE (mon_get_database(-2))
) SELECT member, sort_ops,
 overflows,
 (overflows * 100) / nullif(sort_ops,0) as pctoverflow,
 sort_heap_top
from ops
order by member;

The log full script is the one each DBA must keep in their pocket at all times. I have seen hundreds of DBAs who will just increase LOGPRIMARY and LOGSECOND whenever they run into a LOG FULL condition without understanding what is causing it. Find the offending application that is holding a log file hostage and thus leading to the LOG FULL condition; a follow-up query for identifying that application is sketched after the script. I have covered this in detail elsewhere as well.

 $ cat logfull.sql
SELECT MEMBER,
 TOTAL_LOG_AVAILABLE / 1048576 AS LOG_AVAILABLE_MB,
 TOTAL_LOG_USED / 1048576 AS LOG_USED_MB,
 CAST (((CASE WHEN (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) = 0
 OR (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) IS NULL
 OR TOTAL_LOG_AVAILABLE = -1 THEN NULL
 ELSE ((CAST ((TOTAL_LOG_USED) AS DOUBLE) / CAST (
 (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) AS DOUBLE))) * 100
 END)) AS DECIMAL (5,2)) AS USED_PCT,
 APPLID_HOLDING_OLDEST_XACT
FROM TABLE (MON_GET_TRANSACTION_LOG(-2))
ORDER BY member, USED_PCT DESC;
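
Once APPLID_HOLDING_OLDEST_XACT points to an offender, a follow-up like the sketch below tells you what that connection is; the handle 1234 is just a placeholder for the value returned by logfull.sql.

SELECT application_handle,
 VARCHAR(application_name, 30) AS application_name,
 VARCHAR(client_hostname, 30) AS client_hostname
FROM TABLE(MON_GET_CONNECTION(CAST(1234 AS BIGINT), -2));

If the application really is holding a log file hostage, you can then decide whether to fix it on the application side or force it off with db2 "force application (1234)".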

The buffer pool hit ratio is close to meaningless to me. It will be good 99.9% of the time, so why do we measure it? The key is to see whether we have a large enough pool or not, and I am still searching for the silver bullet there. In my view this is the least important thing to monitor, but I am still including the script.

 $ cat bphit.sql
WITH BPMETRICS AS (
 SELECT bp_name,
 pool_data_l_reads + pool_temp_data_l_reads +
 pool_index_l_reads + pool_temp_index_l_reads +
 pool_xda_l_reads + pool_temp_xda_l_reads as logical_reads,
 pool_data_p_reads + pool_temp_data_p_reads +
 pool_index_p_reads + pool_temp_index_p_reads +
 pool_xda_p_reads + pool_temp_xda_p_reads as physical_reads,
 member
 FROM TABLE(MON_GET_BUFFERPOOL('',-2)) AS METRICS)
 SELECT
 VARCHAR(bp_name,20) AS bp_name,
 logical_reads,
 physical_reads,
 CASE WHEN logical_reads > 0
 THEN DEC((1 - (FLOAT(physical_reads) / FLOAT(logical_reads))) * 100,5,2)
 ELSE NULL
 END AS HIT_RATIO,
 member
 FROM BPMETRICS;

This script requires that you create a workload for the purpose of monitoring. This is the best way to find out the performance of a client application at the member level. I have covered how to do this in detail elsewhere; a minimal example of creating such a workload follows the script below.

 $ cat members.sql
select substr(host_name,1,20) host, 
       t.member, act_completed_total + act_aborted_total AS TOTAL_TRANS,
       cast(total_act_time/1000.0 as decimal(16,2)) ACT_TIME_SECS 
from table(mon_get_workload('Your Work Load Name',-2)) as t,
     table (env_get_system_resources()) m 
where m.member = t.member 
order by host, t.member;
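
As promised above, here is a minimal sketch of creating such a workload; the workload name MON_WL and the application name MYAPP are placeholders, and you would substitute your workload name into members.sql.

CREATE WORKLOAD MON_WL APPLNAME('MYAPP')
 COLLECT ACTIVITY METRICS BASE;
GRANT USAGE ON WORKLOAD MON_WL TO PUBLIC;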

I like to run as few scripts as possible for the purpose of monitoring and let the database do its job, rather than monitor every aspect of it, which is not a good idea once you have tuned the system. Monitoring creates drag, in the same way that 20 managers asking a high-performing employee for a status update every hour does. I have seen this in every major corporation I have visited in the last 14 years. We have more status-seeking managers than high-performing employees these days. Does that sound right?


 

Data Server Manager 2.1.4 announced on June 22, 2017

DB2 pureScale Utility Scripts


Most of the utility scripts here are written for Linux using the bash shell. You can use them elsewhere after making the syntax changes from bash to ksh.

Most of the scripts require that you create a node information file at /root/bin/backup/ip.txt.

A sample /root/bin/backup/ip.txt is given as below.

01 192.168.100.101 192.168.100.1 node01.zinox.com
02 192.168.100.102 192.168.100.1 node02.zinox.com
03 192.168.100.103 192.168.100.1 node03.zinox.com
04 192.168.100.104 192.168.100.1 node04.zinox.com
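
As a minimal sketch of how these scripts typically use the file (assuming the first, second and fourth columns are node number, node IP and FQDN, and that passwordless root ssh to each node is configured), a wrapper can loop over every node like this; the script name is hypothetical.

 $ cat run_on_all_nodes.sh
#!/bin/bash
# Run the command given on the command line on every node listed in ip.txt.

NODEFILE=/root/bin/backup/ip.txt

while read -r num ip third fqdn
do
  echo "== node $num ($fqdn) =="
  # -n stops ssh from swallowing the rest of the node list on stdin
  ssh -n -o BatchMode=yes root@"$ip" "$@"
done < "$NODEFILE"

For example, ./run_on_all_nodes.sh "df -h" shows the file system usage on every node in turn.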

DB2 Backup Tuning


One of my colleagues, Mukesh V Desai, did backup tuning for a pureScale customer and, just by making the following change, reduced the backup time from 4.5 hours to 30 minutes.

Original config: Number of buffers = 8, Parallelism = 4 and Compress = YES

New Config: Number of buffers = 16, Parallelism = 8 and Compress = NO.
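
As an illustration only (MYDB and /backup are placeholders), the two configurations correspond roughly to the following backup commands:

# Original configuration
db2 "BACKUP DATABASE MYDB TO /backup WITH 8 BUFFERS PARALLELISM 4 COMPRESS"
# Tuned configuration (no compression)
db2 "BACKUP DATABASE MYDB TO /backup WITH 16 BUFFERS PARALLELISM 8"

Compression trades CPU time for a smaller backup image; in this case dropping it, together with more buffers and higher parallelism, was the bigger win.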

 

 

 

IBM Claims Big Breakthrough in Deep Learning


A Fortune article by Barb Darrow.

A few interesting highlights:

  • IBM used 64 of its own Power 8 servers, each of which links general-purpose Intel microprocessors with Nvidia graphical processors through a fast NVLink interconnect to facilitate rapid data flow between the two types of chips.
  • Expanding deep learning from a single eight-processor server to 64 servers with eight processors each can boost performance some 50 to 60 times.
  • The system achieved 95% scaling efficiency across 256 processors.
  • In terms of image recognition, the IBM system claimed a 33.8% accuracy rate working with 7.5 million images over seven hours; the previous record, set by Microsoft, was 29.8% accuracy, and that effort took 10 days.
  • The Caffe deep learning framework, created at the University of California at Berkeley, was used in the technology.
  • The popular Google TensorFlow framework can likewise run atop this new technology.

New Era in Distributed Computing with Blockchains and Databases


C Mohan, IBM Fellow

Mohan gave his opening keynote address at the 37th IEEE International Conference on Distributed Computing Systems (ICDCS) in Atlanta (USA) on 6 June 2017.
A new era is emerging in the world of distributed computing with the growing popularity of blockchains (shared, replicated and distributed ledgers) and the associated databases as a way of integrating inter-organizational work. Originally, the concept of a distributed ledger was invented as the underlying technology of the cryptocurrency Bitcoin. But the adoption and further adaptation of it for use in the commercial or permissioned environments is what is of utmost interest to me and hence will be the focus of this keynote. Computer companies like IBM and Microsoft, and many key players in different vertical industry segments have recognized the applicability of blockchains in environments other than cryptocurrencies. IBM did some pioneering work by architecting and implementing Fabric, and then open sourcing it. Now Fabric is being enhanced via the Hyperledger Consortium as part of The Linux Foundation. A few of the other efforts include Enterprise Ethereum, R3 Corda and BigchainDB.
While there is no standard in the blockchain space currently, all the ongoing efforts involve some combination of database, transaction, encryption, consensus and other distributed systems technologies. Some of the application areas in which blockchain pilots are being carried out are: smart contracts, supply chain management, know your customer, derivatives processing and provenance management. In this talk, I will survey some of the ongoing blockchain projects with respect to their architectures in general and their approaches to some specific technical areas. I will focus on how the functionality of traditional and modern data stores are being utilized or not utilized in the different blockchain projects. I will also distinguish how traditional distributed database management systems have handled replication and how blockchain systems do it. Since most of the blockchain efforts are still in a nascent state, the time is right for database and other distributed systems researchers and practitioners to get more deeply involved to focus on the numerous open problems.

Videos and Slides

Video (latest) of C Mohan’s presentation at the University of Waterloo on 6 July 2017: http://bit.ly/uOwBlK Presentation Slides (latest): http://bit.ly/uOwBcP
Video of presentation at Royal Bank of Canada (RBC) Toronto on 4 July 2017: http://bit.ly/RbCtBc Presentation Slides: http://bit.ly/rBcPpT
Slides from 27 June 2017 talk (http://bit.ly/neDCsv) at the IEEE Computer Society Silicon Valley Chapter meeting at Texas Instruments, Santa Clara, USA: http://bit.ly/iEeEbc
Slides from 6 June 2017 keynote at the 37th IEEE International Conference on Distributed Computing Systems (ICDCS 2017): http://bit.ly/kBCdCs
Updated version of Mohan’s Hong Kong blockchain presentation that he gave to IBMers worldwide on 19 April 2017: http://bit.ly/iBmDBb
Link to video of presentation given at the Hong Kong University of Science and Technology (HKUST) on 7 April 2017: http://bit.ly/bcUSTv Slides: http://bit.ly/BCustS

 

C Mohan's presentation at the Hong Kong University of Science and Technology.

Db2 pureScale netname does not exist

ERROR: DBI20122E The instance was not created or updated because the following netname does not exist: "node01.zinox.com".

The above error can occur for different reasons, but in our case it was due to misplaced entries in the /etc/hosts file.

The natural convention for the /etc/hosts file is "IP address FQDN ShortName", but some customers use a different syntax: "IP address ShortName FQDN". This causes an issue when the db2icrt command is run.

If the net names used for the member and CF in the db2icrt command are FQDNs and the /etc/hosts file uses the "IP address ShortName FQDN" syntax, you are likely to get this error.
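
For illustration (the host name and address are just examples based on the error message above), the problematic and conventional /etc/hosts entries look like this:

# Problematic order: short name before FQDN
192.168.100.101   node01   node01.zinox.com

# Conventional order: FQDN first, then short name
192.168.100.101   node01.zinox.com   node01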

Two options: either follow the accepted convention for the /etc/hosts file syntax, or do not use the FQDN when defining the net names for the CF and members.

Hope this helps someone who runs into this issue.

 

Big Data Hype and Reality


I cannot put a name to the customer, but they spent millions to port their data warehouse to Hadoop and, one year later, rolled everything back to Db2 because their results were not matching and data was getting lost. This is a classic example of misuse of the technology. It stems from the fact that even enterprises that earn money using software tend to wade into the murky water of jumping on the bandwagon of not paying for software and using open source instead. People generally cite the examples of Google, Facebook and others, but they forget that these companies employ the best brains, who are highly paid and have the ability to get things done using open source.

Dr. C Mohan gave a talk some time back on Big Data: Hype and Reality, and it echoed very well with my experience.
