MSBI (SSIS/SSRS/SSAS) Online Training

Monday, July 22, 2013

SQL Server stored procedure Performance Tuning (best practices)

When developing stored procedures, there seems to be a lot of emphasis on "get it done fast," which means typing everything in lower case, paying little attention to formatting, and sometimes throwing best practices out the window. Personally, I would rather front-load my development time; I think that the cost I pay in initial development is far outweighed by what I would otherwise pay in maintenance down the road. Making readable and maintainable code that also performs well and is delivered in a timely manner is something that a lot of us strive for, but we don't always have the luxury. Still, I have found that it is very easy to fall into good development habits.
A popular adage is, "you can have it fast, cheap, or good. Pick two." I contend that if you develop habits like these and use them in all of your database programming, the time difference between following those methods and doing it the "lazy" way will be negligible at most; and so, fast and good go hand in hand, rather than trade off for one another.
Once in a while this "disorder" slows me down. I come across code that someone else wrote (almost exclusively it is someone I no longer work with), and I can't even bear to look at it without first re-writing it. Here is a fake but realistic example of the kinds of procedures I see:
create proc foo(@i int,@bar int=null,@hr int output,@xd datetime) as
declare
@c varchar
declare
@s nchar(2)
declare @x int
set
@c='Beverly'
set @s='MA'
set @x=5
select customers.customerid,firstname,lastname,orderdate from customers join orders on
customers.customerid=orders.customerid where status=@i or status<=@bar and orderdate<=@xd
set @hr = @@rowcount
select customers.customerid,count(*) from customers left join orders on
customers.customerid
=orders.customerid where customers.city=@c and customers.state=@s
group by
customers.customerid having count(*)>=@x
return (@@rowcount)
This kind of feels like the 5th grade all over again, but when I get handed code like this, I start immediately visualizing one of those "find all of the things wrong with this picture" exercises, and feel compelled to fix them all. So, what is wrong with the above sample, you may ask? Well, let me go through my own personal (and quite subjective) subconscious checklist of best practices when I write my own stored procedures. I have never tried to list these all at once, so I may be all over the place, but hopefully I will justify why I choose to have these items on my checklist in the first place.
======================
Upper casing T-SQL keywords and built-in functions
I always use CREATE PROCEDURE and not create procedure or Create Procedure. Same goes for all of the code throughout my objects... you will always see SELECT, FROM, WHERE and not select, from, where. I just find it much more readable when all of the keywords are capitalized. It's not that hard for me to hold down the shift key while typing these words, and there are even IDEs that will do this kind of replacement for you (for example, Apex SQLEdit has a handy "mis-spelled keyword replacement" feature that I think could be used for this purpose also). This is probably one of the few areas where Celko and I actually agree. :-)
======================
Using a proper and consistent naming scheme
Obviously "foo" is a horribly ridiculous name for a procedure, but I have come across many that were equally nondescript. I like to name my objects using {target}_{verb}. So for example, if I have a Customers table, I would have procedures such as:
dbo.Customer_Create
dbo.Customer_Update
dbo.Customer_Delete
dbo.Customer_GetList
dbo.Customer_GetDetails
This allows them to sort nicely in Object Explorer / Object Explorer Details, and also narrows down my search quickly in an IntelliSense (or SQLPrompt) auto-complete list. If I have stored procedures named in the style dbo.GetCustomerList, they get mixed up in the list with dbo.GetClientList and dbo.GetCreditList. You could argue that maybe these should be organized by schema, but in spite of all the buzz, I have not developed a need or desire to use schemas in this way. For most of the applications I develop, ownership/schema is pretty simple and doesn't need to be made more complex.
Of course I NEVER name stored procedures using the sp_ prefix. See Brian Moran's article in SQL Server Magazine back in 2001. Or just ask anybody. :-) I also avoid other identifying object prefixes (like usp_). I don't know that I've ever been in a situation where I couldn't tell that some object was a procedure, or a function, or a table, and where the name really would have helped me all that much. This is especially true for the silly (but common) "tbl" prefix on tables. I don't want to get into that here, but I've always scratched my head at that one. Views may be the only place where I think this is justified, but then it should be a v or View_ prefix on the views only; no need to also identify tables... if it doesn't have a v or View_ prefix, it's a table!
More important than coming up with a proper naming scheme (because that is mostly subjective) is applying your naming scheme consistently. Nobody wants to see procedures named dbo.Customer_Create, dbo.Update_Customer and dbo.GetCustomerDetails.
======================
Using the schema prefix
I always specify the schema prefix when creating stored procedures. This way I know that it will be dbo.procedure_name no matter who I am logged in as when I create it. Similarly, my code always has the schema prefix on all object references. This prevents the database engine from checking for an object under my schema first, and also avoids the issue where multiple plans are cached for the exact same statement/batch just because they were executed by users with different default schemas.
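As a rough sketch of what that looks like in practice (the procedure, table and column names here are hypothetical, chosen only to match the naming scheme above):

```sql
-- Hypothetical objects, for illustration only.
-- Schema-qualify the procedure name at creation time...
CREATE PROCEDURE dbo.Customer_GetList
AS
BEGIN
    SET NOCOUNT ON;

    -- ...and every object reference inside it.
    SELECT
        c.CustomerID,
        c.FirstName,
        c.LastName
    FROM
        dbo.Customers AS c;
END
GO

-- Callers use the schema prefix as well.
EXEC dbo.Customer_GetList;
```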
======================
Using parentheses around parameter list
I am not a big fan of using parentheses around the parameter list. I can't really explain it, as I am a proponent of consistency, and this is the syntax required when creating user-defined functions. But I wanted to mention it because you will not see any of my stored procedures using this syntax. I'm open to change if you can suggest a good enough reason for me to do so.
======================
Lining up parameter names, data types, and default values
I find this much easier to read:
CREATE PROCEDURE dbo.User_Update
    @CustomerID     INT,
    @FirstName      VARCHAR(32)     = NULL,
    @LastName       VARCHAR(32)     = NULL,
    @Password       VARCHAR(16)     = NULL,
    @EmailAddress   VARCHAR(320)    = NULL,
    @Active         BIT             = 1,
    @LastLogin      SMALLDATETIME   = NULL
AS
BEGIN
...
...than this:
CREATE PROCEDURE dbo.User_Update
@CustomerID INT,
@FirstName VARCHAR(32) = NULL,
@LastName VARCHAR(32) = NULL,
@Password VARCHAR(16) = NULL,
@EmailAddress VARCHAR(320) = NULL,
@Active BIT = 1,
@LastLogin SMALLDATETIME = NULL
AS
BEGIN
...
======================
Using spaces and line breaks liberally
This is a simple one, but in all comparison operators I like to see spaces between column/variable and operator. So instead of @foo int=null or where @foo>1 I would rather see @foo INT = NULL or WHERE @foo > 1.
I also tend to place at least a carriage return between individual statements, especially in stored procedures where many statements spill over multiple lines.
Both of these are just about readability, nothing more. While in some interpreted languages like JavaScript, size is king, and compressing / obfuscating code to make it as small as possible does provide some benefit, in T-SQL you would be hard-pressed to find a case where this comes into play. So, I lean to the side of readability.
======================
Avoiding data type / function prefixes on column / parameter names
I often see prefixes like @iCustomerID, @prmInputParameter, @varLocalVariable, @strStringVariable. I realize why people do it, I just think it muddies things up. It also makes it much harder to change the data type of a column when not only do you have to change all the variable/parameter declarations but you also have to change @iVarName to @bigintVarName, etc. Otherwise the purpose of the prefixed variable name loses most of its benefit. So, just name the variable for what it is. If you have a column EmailAddress VARCHAR(320), then make your variable/parameter declaration @EmailAddress VARCHAR(320). No need to use @strEmailAddress ... if you need to find out the data type, just go to the declaration line!
======================
Using lengths on parameters, even when optional
I occasionally see people define parameters and local variables as char or varchar, without specifying a length. This is very dangerous: a variable or parameter declared without a length holds just 1 character, while CAST or CONVERT to a type without a length silently truncates at 30 characters. This can mean data loss, which is not very good at all. I have asked that this silent truncation at least become consistent throughout the product (see Connect #267605), but nothing has happened yet. Fellow MVP Erland Sommarskog has gone so far as to ask for the length declaration to become mandatory (see Connect #244395) and, failing that, feels that this should be something that raises a warning when using his proposed SET STRICT_CHECKS ON setting (see http://www.sommarskog.se/strict_checks.html#nodefaultlength).
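A quick way to see the truncation for yourself is a sketch like the following. The variable names are made up, and the inline initialization in DECLARE assumes SQL Server 2008 or later:

```sql
-- No length on a variable declaration: the variable holds a single character.
DECLARE @NoLength VARCHAR = 'Beverly';

SELECT NoLength = @NoLength;  -- returns 'B', with no error or warning

-- No length in CAST / CONVERT: the result is cut off at 30 characters.
DECLARE @LongString VARCHAR(64) = REPLICATE('x', 40);

SELECT ConvertedLength = LEN(CONVERT(VARCHAR, @LongString));  -- returns 30

-- Declaring the length explicitly avoids both surprises.
DECLARE @City VARCHAR(32) = 'Beverly';

SELECT City = @City;  -- returns 'Beverly'
```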
======================
Listing output parameters last
My habit is to list OUTPUT parameters last. I am not sure why that is exactly, except that it is the order that I conceptually think about the parameters... in then out, not the other way around.
======================
Using BEGIN / END liberally
I have seen many people write stuff like this:
CREATE PROCEDURE dbo.ProcedureA
AS
   SELECT * FROM foo;
GO
   SELECT * FROM bar;
GO
They create the procedure, maybe don't notice the extra resultset from bar (or shrug it off), and then wonder why they only get results from foo when they run the procedure. If they had done this:
CREATE PROCEDURE dbo.ProcedureA
AS
BEGIN
   SELECT * FROM foo;
GO
   SELECT * FROM bar;
END
GO
Because GO is not a T-SQL keyword but rather a batch separator for tools like Query Analyzer and SSMS, they would have received these error messages, one from each batch:
Msg 102, Level 15, State 1, Procedure ProcedureA, Line 4
Incorrect syntax near ';'.
Msg 102, Level 15, State 1, Line 2
Incorrect syntax near 'END'.
Yes, errors are bad, and all that, but I would rather have this brought to my face when I try to compile the procedure than later on when the first user tries to call it.
======================
Using statement terminators
I have quickly adapted to the habit of ending all statements with proper statement terminators (;). This was always a habit in languages like JavaScript (where it is optional) and C# (where it is not). But as T-SQL gets more and more extensions (e.g. CTEs) that require it, I see it becoming a requirement eventually. Maybe I won't even be working with SQL Server by the time that happens, but if I am, I'll be ready. It's one extra keystroke and guarantees that my code will be forward-compatible.
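To illustrate where the terminator is already mandatory, here is a minimal sketch using a CTE. The dbo.Orders table, its columns and the variable name are assumptions for the sake of the example:

```sql
-- Without the semicolon on this line, the WITH below is a syntax error.
DECLARE @MinStatus INT = 1;

WITH RecentOrders AS
(
    SELECT
        o.CustomerID,
        o.OrderDate
    FROM
        dbo.Orders AS o
    WHERE
        o.OrderStatus >= @MinStatus
)
SELECT
    r.CustomerID,
    r.OrderDate
FROM
    RecentOrders AS r;
```

If every statement already ends with a semicolon, a CTE can be dropped in anywhere without having to go back and patch the statement before it.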
======================
Using SET NOCOUNT ON
I always add SET NOCOUNT ON; as the very first line of the procedure (after BEGIN of course). This prevents DONE_IN_PROC messages from needlessly being sent back to the client after every row-affecting statement; those messages increase network traffic and in many cases can fool applications into believing there is an additional recordset available for consumption.
NOTE
I do not advocate blindly throwing SET NOCOUNT ON into all of your existing stored procedures. If you have existing applications, they might actually already be working around the "extra recordset" problem, or there may be .NET applications that rely on the row counts it suppresses. If you code with SET NOCOUNT ON from the start, and keep track of rows affected in output parameters when necessary, this should never be an issue. Roy Ashbrook got beat up about this topic at a Tampa code camp last summer, and wrote about it afterward.
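Keeping track of rows affected in an output parameter might look something like the sketch below. The procedure name, the Active column and the customer ID are hypothetical:

```sql
-- Hypothetical procedure: dbo.Customer_Deactivate and the Active column
-- are illustrative names only.
CREATE PROCEDURE dbo.Customer_Deactivate
    @CustomerID     INT,
    @RowsAffected   INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE dbo.Customers
        SET Active = 0
        WHERE CustomerID = @CustomerID;

    -- @@ROWCOUNT is still populated when NOCOUNT is ON;
    -- capture it immediately, before another statement resets it.
    SET @RowsAffected = @@ROWCOUNT;
END
GO

DECLARE @Rows INT;

EXEC dbo.Customer_Deactivate
    @CustomerID   = 42,
    @RowsAffected = @Rows OUTPUT;

SELECT RowsAffected = @Rows;
```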
======================
Using local variables
When possible, I always use a single DECLARE statement to initialize all of my local variables. Similarly, I try to use a single SELECT to apply values to those variables that are being used like local constants. I see code like this:
declare @foo int
declare
@bar int
declare
@x int
set
@foo = 5
set @bar = 6
set @x = -1
And then some more declare and set statements later on in the code. I find it much harder to track down variables in longer and more complex procedures when the declaration and/or assignments can happen anywhere... I would much rather have as much of this as possible occurring in the beginning of the code. So for the above I would rather see:
DECLARE
    @foo    INT,
    @bar    INT,
    @x      INT;

SELECT
    @foo    = 5,
    @bar    = 6,
    @x      = -1;
As a bonus, in SQL Server 2008, the syntax now supports changing the above into a single statement:
DECLARE
    @foo    INT = 5,
    @bar    INT = 6,
    @x      INT = -1;
So much nicer. However, it still leaves a lot to be desired: I also always use meaningful variable names, rather than @i, @x, etc.
Also, some people like listing the commas at the beginning of each new line, e.g.:
DECLARE
    @foo    INT = 5
    ,@bar   INT = 6
    ,@x     INT = -1;
Not just in variable declarations, but also in parameter lists, column lists, etc. While I will agree that this makes it easier to comment out individual lines in single steps, I find the readability suffers greatly.
======================
Using table aliases
I use aliases a lot. Nobody wants to read (never mind type) this, even though I have seen *many* examples of it posted to the public SQL Server newsgroups:
SELECT
    dbo.table_X_with_long_name.column1,
    dbo.table_X_with_long_name.column2,
    dbo.table_X_with_long_name.column3,
    dbo.table_X_with_long_name.column4,
    dbo.table_X_with_long_name.column5,
    dbo.table_H_with_long_name.column1,
    dbo.table_H_with_long_name.column2,
    dbo.table_H_with_long_name.column3,
    dbo.table_H_with_long_name.column4
FROM
    dbo.table_X_with_long_name
INNER JOIN
    dbo.table_H_with_long_name
ON
    dbo.table_X_with_long_name.column1 = dbo.table_H_with_long_name.column1
    OR dbo.table_X_with_long_name.column2 = dbo.table_H_with_long_name.column2
    OR dbo.table_X_with_long_name.column3 = dbo.table_H_with_long_name.column3
WHERE
    dbo.table_X_with_long_name.column1 >= 5
    AND dbo.table_X_with_long_name.column1 < 10;
But as long as you alias sensibly, you can make this a much more readable query:
SELECT
    X.column1,
    X.column2,
    X.column3,
    X.column4,
    X.column5,
    H.column1,
    H.column2,
    H.column3,
    H.column4
FROM
    dbo.table_X_with_long_name AS X
INNER JOIN
    dbo.table_H_with_long_name AS H
ON
    X.column1 = H.column1
    OR X.column2 = H.column2
    OR X.column3 = H.column3
WHERE
    X.column1 >= 5
    AND X.column1 < 10;
The "AS" when aliasing tables is optional; I have been trying very hard to make myself use it (only because the standard defines it that way). When writing multi-table queries, I don't give tables meaningless shorthand like a, b, c or t1, t2, t3. This might fly for simple queries, but if the query becomes more complex, you will regret it when you have to go back and edit it.
======================
Using column aliases
I buck against the trend here. A lot of people prefer to alias expressions / columns using this syntax:
SELECT [column expression] AS alias
I much prefer:
SELECT alias = [column expression]
The reason is that all of my column names are listed down the left hand side of the column list, instead of being at the end. It is much easier to scan column names when they are vertically aligned.
In addition, I always use column aliases for expressions, even if right now I don't need to reference the column by an alias. This prevents me from having to deal with multiple errors should I ever need to move the query into a subquery, or CTE, or derived table, etc.
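Here is a small sketch of the kind of error this habit avoids. It assumes a dbo.Orders table with a CustomerID column; the alias names are made up:

```sql
-- Fine as a standalone query, even though the COUNT(*) column has no name:
SELECT
    o.CustomerID,
    COUNT(*)
FROM
    dbo.Orders AS o
GROUP BY
    o.CustomerID;

-- The same query fails once it is wrapped in a derived table, because
-- every column of a derived table must have a name:
--
-- SELECT x.CustomerID
-- FROM
-- (
--     SELECT o.CustomerID, COUNT(*)
--     FROM dbo.Orders AS o
--     GROUP BY o.CustomerID
-- ) AS x;

-- Aliasing the expression up front makes the move painless:
SELECT
    x.CustomerID,
    x.OrderCount
FROM
(
    SELECT
        o.CustomerID,
        OrderCount = COUNT(*)
    FROM
        dbo.Orders AS o
    GROUP BY
        o.CustomerID
) AS x;
```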
======================
Using consistent formatting
I am very fussy (some co-workers use a different word) about formatting. I like my queries to be consistently readable and laid out in a predictable way. So for a join that includes a CTE and a subquery, this is how it would look:
WITH cte AS
(
    SELECT
        t.col1,
        t.col2,
        t.col3
    FROM
        dbo.sometable AS t
)
SELECT
    cte.col1,
    cte.col2,
    cte.col3,
    c.col4
FROM
    cte
INNER JOIN
    dbo.Customers AS c
    ON c.CustomerID = cte.col1
WHERE EXISTS
(
    SELECT 1
        FROM dbo.Orders o
        WHERE o.CustomerID = c.CustomerID
)
AND c.Status = 'LIVE';
This keeps all of the columns in a nice vertical line, and visually separates each table in the join and each where clause. Inside a subquery or derived table, I am less strict about the visual separation, though I still put each fundamental portion on its own line. And I always use SELECT 1 in this type of EXISTS() clause, instead of SELECT * or SELECT COUNT(*), to make it immediately clear to others that the query inside does NOT retrieve data.
======================
Matching case of underlying objects / columns
I always try to match the case of the underlying object, as I can never be too certain that my application will always be on a case-insensitive collation. Going back and correcting the case throughout all of my modules would be a royal pain, at best. Matching case is much easier if you are using SQL Server 2008 Management Studio against a SQL Server 2008 instance, or have invested in Red-Gate's SQL Prompt, as you will automatically get the correct case when selecting from the auto-complete list.
======================
Qualifying column names with table/alias prefix
I always qualify column names when there is more than one table in the query. Heck, sometimes I even use aliases when there is only one table in the query, to ease my maintenance later should the query become more complex. I won't harp on this too much, as fellow MVP Alex Kuznetsov treated this subject a few days ago.
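One well-known trap that qualifying columns protects against can be sketched as follows. The dbo.Blacklist table and its BlockedCustomerID column are hypothetical, invented purely for this example:

```sql
-- Assumption: dbo.Customers has a CustomerID column, but the hypothetical
-- table dbo.Blacklist stores its key as BlockedCustomerID.

-- Unqualified: this compiles and runs. CustomerID inside the subquery
-- quietly binds to dbo.Customers, so every customer with a non-NULL
-- CustomerID is returned as long as dbo.Blacklist has at least one row.
SELECT
    CustomerID
FROM
    dbo.Customers
WHERE
    CustomerID IN (SELECT CustomerID FROM dbo.Blacklist);

-- Qualified: the same mistake is caught at compile time with an
-- "Invalid column name" error, because b.CustomerID does not exist.
--
-- SELECT
--     c.CustomerID
-- FROM
--     dbo.Customers AS c
-- WHERE
--     c.CustomerID IN (SELECT b.CustomerID FROM dbo.Blacklist AS b);

-- Qualified and correct:
SELECT
    c.CustomerID
FROM
    dbo.Customers AS c
WHERE
    c.CustomerID IN (SELECT b.BlockedCustomerID FROM dbo.Blacklist AS b);
```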
======================
Using RETURN and OUTPUT appropriately
I never use RETURN to provide any data back to the client (e.g. the SCOPE_IDENTITY() value or @@ROWCOUNT). This should be used exclusively for returning stored procedure status, such as ERROR_NUMBER() / @@ERROR. If you need to return data to the caller, use a resultset or an OUTPUT parameter.
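A minimal sketch of that division of labor is below. The procedure name, the columns and the sample values are hypothetical, and the sketch assumes dbo.Customers has an IDENTITY column:

```sql
-- Hypothetical procedure and column names, for illustration only.
CREATE PROCEDURE dbo.Customer_Create
    @FirstName      VARCHAR(32),
    @LastName       VARCHAR(32),
    @CustomerID     INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @Error INT;

    INSERT dbo.Customers (FirstName, LastName)
        VALUES (@FirstName, @LastName);

    -- Capture the status of the INSERT and the new key in one statement,
    -- before anything else can reset @@ERROR.
    SELECT
        @Error      = @@ERROR,
        @CustomerID = SCOPE_IDENTITY();

    -- Data went back through the OUTPUT parameter;
    -- RETURN carries only the status (0 = success).
    RETURN @Error;
END
GO

DECLARE
    @Status         INT,
    @NewCustomerID  INT;

EXEC @Status = dbo.Customer_Create
    @FirstName  = 'Jane',
    @LastName   = 'Doe',
    @CustomerID = @NewCustomerID OUTPUT;

SELECT
    ReturnStatus    = @Status,
    NewCustomerID   = @NewCustomerID;
```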
======================
Avoiding keyword shorthands
I always use full keywords as opposed to their shorthand equivalents. "BEGIN TRAN" and "CREATE PROC" might save me a few keystrokes, and I'm sure the shorthand equivalents are here to stay, but something just doesn't feel right about it. Same with the parameters for built-in functions like DATEDIFF(), DATEADD() and DATEPART(). Why use WK or DW when you can use WEEK or WEEKDAY? (I also never understood why WEEKDAY became DW in shorthand, instead of WD, which is not supported. DW likely means DAYOFWEEK but that is an ODBC function and not supported directly in T-SQL at all. That in and of itself convinced me that it is better to take the expensive hit of typing five extra characters to be explicit and clear.) Finally, I always explicitly say "INNER JOIN" or "LEFT OUTER JOIN"... never just "join" or "left join." Again, no real good reason behind that, just habit.
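For a side-by-side feel, here is a short sketch. The variable name and date are arbitrary, and the inline DECLARE initialization assumes SQL Server 2008 or later:

```sql
DECLARE @OrderDate DATETIME = '20130722';

-- Shorthand: legal, but the reader has to remember what each abbreviation
-- means (y, for example, is DAYOFYEAR, not YEAR).
SELECT
    NextWeek        = DATEADD(wk, 1, @OrderDate),
    WeekdayNumber   = DATEPART(dw, @OrderDate),
    DayNumber       = DATEPART(y, @OrderDate);

-- Spelled out: the same results, with nothing to decode.
SELECT
    NextWeek        = DATEADD(WEEK, 1, @OrderDate),
    WeekdayNumber   = DATEPART(WEEKDAY, @OrderDate),
    DayNumber       = DATEPART(DAYOFYEAR, @OrderDate);
```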
======================
Using parentheses liberally around AND / OR blocks
I always group my clauses when mixing AND and OR. Relying on operator precedence (AND is evaluated before OR) to determine what "x = 5 AND y = 4 OR b = 3" really means is not my cup of tea. I wrote a very short article about this a few years ago.
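The difference is easy to demonstrate with a few variables standing in for columns. This is only a sketch, with made-up values, and the inline DECLARE initialization assumes SQL Server 2008 or later:

```sql
DECLARE
    @x INT = 1,
    @y INT = 4,
    @b INT = 3;

-- AND binds more tightly than OR, so this is read as
-- (@x = 5 AND @y = 4) OR @b = 3, and a row comes back
-- even though @x is not 5.
SELECT Result = 'match'
WHERE @x = 5 AND @y = 4 OR @b = 3;

-- Spelling out that grouping changes nothing, but removes the guesswork:
SELECT Result = 'match'
WHERE (@x = 5 AND @y = 4) OR @b = 3;

-- If the other grouping was intended, the parentheses are required,
-- and this version returns no rows:
SELECT Result = 'match'
WHERE @x = 5 AND (@y = 4 OR @b = 3);
```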
======================
So, after all of that, given the procedure I listed at the start of the article, what would I end up with? Assuming I am using SQL Server 2008, and that I can update the calling application to use the right procedure name, to use sensible input parameter names, and to stop using return values instead of output parameters:
CREATE PROCEDURE dbo.Customer_GetOlderOrders
    @OrderStatus        INT,
    @MaxOrderStatus     INT = NULL,
    @MaxOrderDate       SMALLDATETIME,
    @RC1                INT OUTPUT,
    @RC2                INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE
        @City           VARCHAR(32) = 'Beverly',
        @State          CHAR(2)     = 'MA',
        @MinOrderCount  INT         = 5;

    SELECT
        c.CustomerID,
        c.FirstName,
        c.LastName,
        o.OrderDate
    FROM
        dbo.Customers c
    INNER JOIN
        dbo.Orders o
        ON c.CustomerID = o.CustomerID
    WHERE
        (
            o.OrderStatus       = @OrderStatus
            OR o.OrderStatus    <= @MaxOrderStatus
        )
        AND o.OrderDate         <= @MaxOrderDate;

    SET @RC1 = @@ROWCOUNT;

    SELECT
        c.CustomerID,
        OrderCount = COUNT(*)
    FROM
        dbo.Customers c
    LEFT OUTER JOIN
        dbo.Orders o
        ON c.CustomerID = o.CustomerID
    WHERE
        c.City = @City
        AND c.State = @State
    GROUP BY
        c.CustomerID
    HAVING
        COUNT(*) >= @MinOrderCount;

    SET @RC2 = @@ROWCOUNT;

    RETURN;
END
GO
Okay, so it LOOKS like a lot more code, because the layout is more vertical. But you tell me. Copy both procedures to SSMS or Query Analyzer, and which one is easier to read / understand? And is it worth the three minutes it took me to convert the original query? It took me a few hours to convert this list from my subconscious to you, so hopefully I have helped you pick up at least one good habit. And if you think any of these are BAD habits, please drop a line and let me know why!
