I will be honest here, at one point I had become a little blind to seeing CXPACKET waits on my SQL instances. As a DBA having to work across lots of different instances, you will see this quite a bit (or at least I have).
Countless times had I heard phrases such as “Yeah nothing to worry about there” and “that’s just indicating parallelism is happening”.
But, dear reader, that’s not exactly true. It took me far too long to realise the following concept and so I thought I would write a post about it.
Parallel queries are good, right?
As with most questions around SQL Server, the answer here is ‘it depends’. I’m going to try and demonstrate how it’s not always the best thing you want to see within your execution plans, in certain scenarios.
For this demo, we will be using an instance with 4 Cores. The database we’re going to use will be the StackOverFlow database (the 140Gb one).
First, for the purposes of this demo, we will set the defaults for SQL Server regarding the MAXDOP and Cost Threshold For Parallelism. Out of the box, SQL Server will set these as:
- MAXDOP = 0
- Cost Threshold For Parallelism = 5
EXEC sys.sp_configure N'show advanced options', N'1' RECONFIGURE WITH OVERRIDE EXEC sys.sp_configure N'cost threshold for parallelism', N'5' EXEC sys.sp_configure N'max degree of parallelism', N'0' GO
Next, we will set up a supporting index for our demo on the Users table
CREATE INDEX Location ON dbo.Users (Location); GO
So that’s the setup sorted. Now we’re going to run a simple query to find which users reside in a specific location, from the Users table
SELECT u.Id, u.DisplayName FROM dbo.Users u WHERE u.Location = N'Netherlands'
The plan for this query looks as you would expect. We are seeking into our Location index to find the matched locations first. After that, we are performing a key lookup to go and find the DisplayNames associated with each user found. Nothing surprising here, yet.
As you will note, we have our parallel operator indicating that everything to the right of said operator went parallel. This is surely a good thing right?
Let’s take a look just how much this query went parallel. Digging into the properties for our index seek on Location, we will click into the Actual I/O Statistics > Actual Logical Reads:
Ok, so we went parallel great, but hold on a second. Only one thread actually read any rows (ignore Thread 0, that’s our coordinating thread to manage all the others). Thread 2 was the only one doing any work here, the rest of our threads went and took a nap.
If we check the key lookup operator, what do you know, the same thread did all the work here too:
You may be thinking that it’s not that much of a big deal and within the context of this one tiny query, I would agree. This query still finishes in less than a second.
However, we have to mindful when seeing that parallel operator that all may not appear as it first seems. Sure this query tells us that it went parallel across 4 threads, but only one of those threads read any rows, the rest simply had nothing to do and took a nap. Remember that for the next part.
Using the same query were going to add an operator which will require a memory grant, a Sort.
SELECT u.Id, u.DisplayName FROM dbo.Users u WHERE u.Location = N'Netherlands' ORDER BY u.DisplayName;
As you would expect, our plan looks pretty similar, apart from the fact we now have a Sort operator in the mix
However, you may notice something else too. We also have that horrid yellow bang on the sort operator. Looking at the tooltip for the sort operator, we see this
This query had to spill to disk so that it could complete the operation. What? Why? This query returns just 8952 rows, why did we have to spill to disk?
The More You Know
The total memory grant for this entire query was 4.1Mb.
We know that everything to the right of our parallelism operator went ‘parallel’ and that parallelism was grossly uneven (waves at Mr.CXPACKET)
Lets look at the memory allocation for the sort operator
So the memory grants were all perfectly distributed across all four threads. Yes, the memory was perfectly distributed across ‘all’ threads. That works out at just 936Kb per thread allocated. Remember our diligent thread number 2 from earlier? That poor guy got just 936 to handle ALL of the work and we wasted 2.8Mb of memory on the workers that were taking a nap (Threads 1, 3 and 4). This is why we had to spill to disk.
Rows read per thread is not linked to the amount of memory each thread gets. Memory was shared out across all threads evenly despite the fact that only thread 2 did all the work.
A Bad Work-Around
For the purposes of this demo, we can ‘correct’ this by doing the following, but who would do this, seriously. Disclaimer: Seriously, don’t do this as a solution
SELECT u.Id, u.DisplayName FROM dbo.Users u WHERE u.Location = N'Netherlands' ORDER BY u.DisplayName OPTION (MAXDOP 1);
Using the hint MAXDOP 1 within our query forces the plan to go serial. Let’s take another look at the thread distribution for rows read on the index seek
As expected, just one thread does all the work, but the difference is this time that SQL Server knows this plan did not run parallel because we told it not to. Let’s take a look at that memory distribution now on the Sort operator
Perfect, so now all the memory was allocated to the one thread, the only thread. Because of this, our Sort operator no longer spills to disk
Best Practice Is Normally For The Best
One of the first things I remember being taught as a DBA was to ensure you change SQL Server’s defaults regarding MAXDOP and Cost Threshold For Parallelism. At the time, I wasn’t quite sure why, however.
I hope this post illustrates one of the many reasons why you should change these settings from their defaults to suit your workloads accordingly.
Parallel queries are not always for the best and we can see how having queries go parallel can sometimes bring about its own issues.
So, the next time you see that CXPACKET wait type, don’t just disregard it like I did so many times, it just might be hiding a nasty little surprise.
Thanks for reading.