Viewing block chains on SQL Server using DMVs and a recursive cte

Locks and subsequently blocks are an integral part of a database in order to ensure that data integrity is met. However, if a process is blocked for any significant period of time, then this has a detrimental effect on your application’s performance.

I find that it’s rare that there’s a single process being blocked; usually if a block is occurring then it will be a bunch of processes, and these in turn may end up causing subsequent blocks: before you know it, we have a whole swathe of blocked processes, and application that’s hanging and users complaining left, right and centre.

Handily, SQL Server has procedures and DMVs that show us which process is blocking any given process. Unfortunately, in a big block chain, this means you have to piece all this together to find the root cause of the block chain and then make a decision on how to deal with the problem.

In order to do this, I use the script at the foot of the page; I keep it as a snippet in my Redgate SQL Prompt settings, but you could easily create a stored procedure for this (if so, please reference this post with a link in the comments!).

The code is comprised of a couple of CTEs (common table expressions), the first collects the data we need and the second is a recursive CTE to work out the block chain.

The query for the first CTE cteCurrentSessions is a standalone query and can be run on its own:-

SELECT des.session_id
, des.login_name
, des.program_name
, s.name AS AgentJobName
, der.cpu_time
, der.total_elapsed_time
, der.wait_resource
, sqlt.text LastSQL
, sqlt2.text CurrentSQL
, der.blocking_session_id
FROM sys.dm_exec_sessions AS des
INNER JOIN sys.dm_exec_connections dec
	ON des.session_id = dec.session_id
LEFT OUTER JOIN sys.dm_exec_requests AS der
	ON des.session_id = der.session_id
OUTER APPLY sys.dm_exec_sql_text (most_recent_sql_handle) AS sqlt
OUTER APPLY sys.dm_exec_sql_text (sql_handle) AS sqlt2
LEFT OUTER JOIN msdb.dbo.sysjobs AS s
	ON CASE WHEN program_name LIKE 'SQLAgent - TSQL JobStep%' 
	THEN CAST(CONVERT(binary(16),SUBSTRING(program_name, CHARINDEX('(Job ',program_name) + 5, 34),1) AS uniqueidentifier) 
	ELSE NULL END = s.job_id

Here, we’re returning the list of all the sessions and connections currently open on our server using the DMVs dm_exec_sessions and dm_exec_connections. The DMV dm_exec_sessions returns the pertinent information we need to identify the details of that session; the session_id (the spid), the login, the program the session was called by. However, we can’t just use dm_exec_sessions because a key part of this query is the column most_recent_sql_handle which only exists in dm_exec_connections. This is important because quite often, a block chain can be caused when someone ran some code and left a transaction open, but their code has finished. If we were to look at the dm_exec_requests view, there would be no entry for that session, and yet we need to be able to establish what code that session was running to identify where the block lies.

Next up is a series of outer joins and outer applies to get our execution data; they are all outer joins/applies due to the fact there may not be any active request running. Dm_exec_requests gives us most of what we want; the cpu_time, total_elapsed_time (in ms), the wait_resource (usually the object we want to acces that is locked by something else) and the blocking_session_id: the session that is causing this one to be blocked. We also use the sql_handle column to get the details of the text of the call from the dm_exec_sql_text function; as this is a function we have to use APPLY rather than joins. We use the same function to get the most_recent_sql mentioned previously.

Lastly, if the session is the result of a SQL Server Agent job, we have a few hoops to jump through to get the job name. The program_name column for a job step displays along the following lines:-

SQLAgent – TSQL JobStep (Job 0x79E16FD54FFD954EA77BCB2FAD037E0E : Step 1)

If you’re like me, at first glance this is pretty indecipherable in terms of establishing which job this actually refers to! However, the binary displayed is actually the binary(16) of the uniqueidentifier that relates to the job’s job_id in msdb.dbo.sysjobs. So, first we need to pull this bit of the string out. We know that it will always be a 34 character string, and we know that it always comes after the portion of the string “(Job “, so we can use CHARINDEX and SUBSTRING to get purely that bit of the string:-

SUBSTRING(program_name, CHARINDEX('(Job ',program_name) + 5, 34)

Once we have it, we can then convert the string to be a binary(16) datatype, and then convert that value to get the uniqueidentifier and the ability to join to the sysjobs table. This last conversion isn’t strictly necessary as SQL Server will implicitly convert the binary, but I find that it helps having it in when explaining the code and what is actually going on:-

CAST(CONVERT(binary(16),SUBSTRING(program_name, CHARINDEX('(Job ',program_name) + 5, 34),1) AS uniqueidentifier)

Lastly, because we can’t guarantee that we’ll get something that correctly converts to a uniqueidentifier, we wrap the conversion statement in a CASE statement in order to pass out NULL if we’re not expecting to get an agent job out, and use this in our LEFT OUTER JOIN to the sysjobs table:-

LEFT OUTER JOIN msdb.dbo.sysjobs AS s
	ON CASE WHEN program_name LIKE 'SQLAgent - TSQL JobStep%' 
		THEN CAST(CONVERT(binary(16),SUBSTRING(program_name, CHARINDEX('(Job ',program_name) + 5, 34),1) AS uniqueidentifier) 
		ELSE NULL END = s.job_id

Ok, so that’s the meat of the data we’re pulling covered. We’re now going to use a recursive CTE to navigate from the start of each process which isn’t being blocked, to work out whether it’s blocking anything else and if they in turn are blocking other processes.

If you’re not familiar with recursive CTEs, the BOL page has everything you need to get to grip with the basics.

The joy of the recursive CTE is that we can return the “root” of each block chain on every row, and also calculate a “level”; i.e. the degree of seperation of the session we’re looking at and the original block.

Essentially, all we’re doing is getting the rows which might be the start of the block chain, then joining that set to any sessions that are being blocked by them. Once this is done, we check that set for any sessions that are being blocked by the second level; and continue this until we get no more blocks.

The final output of this is simply to exclude any sessions that aren’t causing any blocks, and then order the output in each block chain for ease of analysis.

The full code is below:-

WITH cteCurrentSessions AS (SELECT des.session_id
						, des.login_name
						, des.program_name
						, s.name AS AgentJobName
						, der.cpu_time
						, der.total_elapsed_time
						, der.wait_resource
						, sqlt.text LastSQL
						, sqlt2.text CurrentSQL
						, der.blocking_session_id
						FROM sys.dm_exec_sessions AS des
						INNER JOIN sys.dm_exec_connections dec
							ON des.session_id = dec.session_id
						LEFT OUTER JOIN sys.dm_exec_requests AS der
							ON des.session_id = der.session_id
						OUTER APPLY sys.dm_exec_sql_text (most_recent_sql_handle) AS sqlt
						OUTER APPLY sys.dm_exec_sql_text (sql_handle) AS sqlt2
						LEFT OUTER JOIN msdb.dbo.sysjobs AS s
							ON CASE WHEN program_name LIKE 'SQLAgent - TSQL JobStep%' 
									THEN CAST(CONVERT(binary(16),SUBSTRING(program_name, CHARINDEX('(Job ',program_name) + 5, 34),1) AS uniqueidentifier) 
									ELSE NULL END = s.job_id)
, BlockChain AS (SELECT session_id
				, login_name
				, program_name
				, AgentJobName
				, cpu_time
				, total_elapsed_time
				, wait_resource
				, LastSQL
				, CurrentSQL
				, blocking_session_id
				, session_id AS RootBlockingSession
				, 1 AS BlockLevel
				FROM cteCurrentSessions
				WHERE blocking_session_id = 0
				OR blocking_session_id IS NULL

				UNION ALL

				SELECT cs.session_id
				, cs.login_name
				, cs.program_name
				, cs.AgentJobName
				, cs.cpu_time
				, cs.total_elapsed_time
				, cs.wait_resource
				, cs.LastSQL
				, cs.CurrentSQL
				, cs.blocking_session_id
				, bc.RootBlockingSession
				, BlockLevel + 1
				FROM cteCurrentSessions cs
				INNER JOIN BlockChain bc
				ON cs.blocking_session_id = bc.session_id)

SELECT * 
FROM BlockChain
WHERE RootBlockingSession IN (SELECT RootBlockingSession
							 FROM BlockChain
							 WHERE BlockLevel > 1)
ORDER BY RootBlockingSession
, BlockLevel
, blocking_session_id
, session_id
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s