SQL Server – Understanding the TOP operator and implications for batch processing

I was debating using a picture of Kylie Minogue in this blog post, from her video for the song “Spinning Around”.  Because it is relevant, and not just because it’s a picture of Kylie….

kylie-289808
Kylie likes loops

Anyway.  Today, we had a problem where “all of a sudden” CPU was maxing out on a server, and the disk I/O was going through the roof.  Uh-oh, I thought, today is going to be rubbish.  Rolling up my sleeves, I opened up SQL Sentry Performance Advisor to have a look what was going on.

The output looked like this:-

SQLSentryOutput

Right away I thought:  there’s some nasty table scans going on here, as the reads for each call of this procedure are exactly the same each time.  Then, I started to wonder why it was getting called so frequently.  First, I got up and shouted at some developers, as is my wont, asking them if they’d set a service or something to run every minute.

Whilst they were looking at that, I had a look at the query plan which is below, after being anonymised by SQL Sentry’s nice “anonymise my sql plan” button (I promise, I don’t work for SQL Sentry, nor am endorsed by them; they just make a great monitoring tool):-

Query Plan

I’ve highlighted the bits that got me concerned – a section in the bottom right where a merge join is being performed on two “not too huge” tables, and, what bothered me far more, a TOP operator after this reducing the rows to 50.

Looking at the procedure call itself, this is what was being repeatedly called:

ProcedureCall.jpg

@intMaximumBatchSize=50.  Now where have we seen the number 50 before?

Sure enough, having a look at the procedure, this statement was an insert from a SELECT statement using TOP (@intMaximumBatchSixe).  However, because the statement in question involved several joins between a few tables (including some horrible, horrible CASE based join statements, but that’s for another time), there was no way that the TOP operator could possibly be applied until the full result set of the SELECT had been produced.  So, despite only returning 50 rows, we’ve effectively done the workload for the entire result set.

After these 50 rows had then been gathered, some further processing went off and produced some data for some pdfs.  Then, because there was still work to do, it started up again and did the entire workload again, for another 50 pdfs.  And again…. spinning round and round and round.  Hence the Kylie reference.

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s