The SQL Dude!

On-Disk & In-Memory Pages that teach you the SQL way of life!

Archive for the ‘Query Optimization’ Category

Seek & Destroy Duplicate Indexes in your SQL Database

Posted by Sudarshan Narasimhan on May 20, 2016


This post is Part 1 of my series of upcomingposts on Index Maintenance for a DBA. So here goes.

There are all sorts of indexing tuning & optimizations that can be done on a SQL database. One of the no-brainer things a DBA must do is “seek & destroy” (pun intended) duplicate indexes. Duplicate indexes are indexes exactly similar in structure, including key columns and their base tables. These indexes serve no useful purpose and in fact contribute to slowness for WRITE database operations as the WRITER (INSERT/UPDATE/DELETE) must do more work to update the index as well as this directly contributes to the slowness of a write operation. If AlwaysON or other HA, then it has to do this on all the replicas as well, adding additional unnecessary network overhead.

You’d be surprised at the results. Given this script a run on any of your databases especially Vendor application databases. If you do see any duplicates, time to talk to your vendor or DB developer.

Here is query that I use to identify duplicate indexes. This query considers 2 or more indexes to be Exact Duplicates when all the key columns & included columns (if any) match another index on the same table.

USE [tempdb]  /* Change to your DB Name */
GO
DECLARE @i int=1, @count int=0, @tblname nvarchar(250), @indxname nvarchar(250), @cmd nvarchar(4000)
DECLARE @DuplicateIndexes TABLE(id int IDENTITY(1,1), table_name nvarchar(250), index_name nvarchar(250), dup_index_name nvarchar(250))
DECLARE @IndexInfo TABLE(id int IDENTITY(1,1), table_name nvarchar(250), index_name nvarchar(250), index_description nvarchar(250),index_keys nvarchar(250), included_columns nvarchar(250) null)

-- exact duplicates
;WITH indexcols AS
(
    select object_id as id, index_id as indid, name,
    (select case keyno when 0 then NULL else colid end as [data()]
    from sys.sysindexkeys as k
    where k.id = i.object_id
    and k.indid = i.index_id
    order by keyno, colid
    for xml path('')) as cols,
    (select case keyno when 0 then colid else NULL end as [data()]
    from sys.sysindexkeys as k
    where k.id = i.object_id
    and k.indid = i.index_id
    order by colid
    for xml path('')) as inc
    from sys.indexes as i
),
DuplicateIndexes AS
(
    SELECT
    object_schema_name(c1.id) + '.' + object_name(c1.id) as [TableName],
    c1.name as [IndexName],
    c2.name as [DuplicateIndexName]
    FROM indexcols as c1
    join indexcols as c2
    ON c1.id = c2.id
    and c1.indid < c2.indid
    and c1.cols = c2.cols
    and c1.inc = c2.inc
)
INSERT INTO @DuplicateIndexes
SELECT * FROM DuplicateIndexes

SELECT @count=count(*) FROM @DuplicateIndexes
WHILE (@i <= @count)
BEGIN
    SELECT @tblname = [table_name], @indxname = [index_name] FROM @DuplicateIndexes WHERE [id] = @i
    DECLARE @tmptbl TABLE (index_name nvarchar(250), index_description nvarchar(250),index_keys nvarchar(250))
    DELETE FROM @tmptbl
    SET @cmd = 'EXEC sp_helpindex ''' +@tblname+ ''''

    INSERT INTO @tmptbl (index_name, index_description, index_keys)
        EXEC (@cmd)

    INSERT INTO @IndexInfo (table_name, index_name, index_description, index_keys)
    SELECT @tblname, index_name, index_description, index_keys FROM @tmptbl

    DECLARE @incld_cols sysname
    SET @cmd = '
    SELECT @tmpstr=b.IncludedColumns
    FROM (
    SELECT LEFT(a.indx_cols_included, LEN(a.indx_cols_included)-1) as [IncludedColumns]
    FROM (
    SELECT STUFF (( SELECT C.Name + '','' AS [text()] 
    FROM sys.indexes I 
     INNER JOIN sys.index_columns IC 
      ON  I.object_id = IC.object_id AND I.index_id = IC.index_id 
     INNER JOIN sys.columns C 
      ON IC.object_id = C.object_id and IC.column_id = C.column_id 
    WHERE IC.object_id=object_id('''+@tblname+''') and I.name = '''+@indxname+''' and IC.is_included_column=1
    FOR XML PATH('''')
    ), 1, 0, '''') AS indx_cols_included) as a) as b'
    
    EXEC sp_executesql @cmd, N'@tmpstr sysname OUTPUT', @incld_cols OUTPUT
    UPDATE @IndexInfo SET included_columns = @incld_cols WHERE [id]=@i

SET @i = @i + 1
END

--DUPLICATE INDEX INFORMATION SUMMARY
SELECT DB_NAME() AS [DB Name], d.table_name as [TableName], d.index_name as [IndexName], d.dup_index_name as [DuplicateIndexName], i.index_keys as [IndexKeyColumns], i.included_columns as [IncludedColumns]
FROM @DuplicateIndexes d
JOIN @IndexInfo i
    ON d.table_name = i.table_name
    AND d.index_name = i.index_name
GO

 

Script Result
Here is a sample output from my test system where I’ve created 2 indexes having the same structure
DuplicateIndexes

 

 

 

Note: The original source script was from HERE. I’ve modified & added to this to include index key column information as well as included columns to present a more clear output on the duplicate indexes.

 

Stay tuned for more index related scripts in this series.

-Sudarshan (TheSQLDude)

Advertisement

Posted in DB Maintenance, Indexing, Query Optimization, T-SQL | 4 Comments »

A scenario on Query Tuning–Included Columns are a good thing!

Posted by Sudarshan Narasimhan on December 20, 2012


Well, this post has been due for quite some time. I had opened up a survey/opinion poll on https://thesqldude.com/2012/07/10/user-opinion-poll-what-topics-would-you-like-to-see-posted-on-thesqldude-weblog/ sometime back and as expected most of you out there want to see more on performance tuning and query optimization.

image

  Figure 1: Opinion Poll Results

Without much ado and by popular demand here goes a simple scenario on query tuning.

Tuning


One of my engineers came to me with the following scenario…

Problem Statement

We have this Update query that is taking much longer to run compared to before. No schema changes have been made though its quite possible data changes have been happening. The update query goes like this,

update msgboard set mb_status =10, mb_changed_by ='SATCOM'  
from msgboard 
where mb_type = 210 

Table Schema

I was easily able to reproduce this issue on my instance, here is how my table setup looks.

/****** Object:  Table [dbo].[msgboard]    Script Date: 12/20/2012 01:52:07 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[msgboard](
    [mb_id] [int] NOT NULL,
    [mb_type] [int] NOT NULL,
    [mb_status] [int] NOT NULL,
    [mb_changed_by] [varchar](50) NULL,
    [mb_changed_dt]  AS (getdate())
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
CREATE UNIQUE CLUSTERED INDEX [pk_msgboard] ON [dbo].[msgboard] 
(
    [mb_id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 100) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_msgboard_type] ON [dbo].[msgboard] 
(
    [mb_type] ASC,
    [mb_status] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

I populated this table with some sample data of 1000 rows using the following script.

declare @counter int
set @counter=0
declare @mbid int
set @mbid=1
declare @mbtype int
set @mbtype=200
declare @mbstatus int
set @mbstatus=10
while(@counter<1000)
begin
INSERT INTO [dbo].[msgboard]
           ([mb_id]
           ,[mb_type]
           ,[mb_status]
           ,[mb_changed_by])
     VALUES
           (@mbid,@mbtype,@mbstatus,'THESQLDUDE')
set @mbid+=1
set @counter+=1
end
GO
INSERT INTO [test].[dbo].[msgboard]
           ([mb_id]
           ,[mb_type]
           ,[mb_status]
           ,[mb_changed_by])
     VALUES
           (1001,210,10,'MANWITHNONAME')

So, I have 2 indexes on this table:-

1) A unique Clustered Index pk_msgboard on the mb_id column

2) A composite non-clustered index IX_msgboard_type on the columns mb_type and mb_status (in this order).

Query Execution Details

Here is what I observed when I ran the following update statement.

update msgboard set mb_status =10, mb_changed_by ='SATCOM'  
from msgboard 
where mb_type = 210 

To actual execution plan looks like this,

image

Total Cost: 0.03124485

As you can see, this query is using the Clustered Index pk_msgboard and its performing a Scan. At this point, two questions should pop into your head

1. Why is the query using the clustered index?

2. Why is it performing a Scan operation?

Answer to question #1

Let’s look at the update query carefully. The query is updating the columns mb_status and mb_changed_by, and there is a filter on column mb_type. You might think, well, I have an index IX_msgboard_type on the mb_type column, why is SQL Server not using this non-clustered index?

To prove I was smarter than SQL Server optimizer, I forced it to use the non-clustered index by adding an index hint. The query looks like this,

update msgboard set mb_status =10, mb_changed_by ='SATCOM'  
from msgboard WITH(INDEX(IX_msgboard_type))
where mb_type = 210 

image

Total Cost: 0.03651106

  • Notice that the Cost for the query with index hint is higher than the one without! If the table had more rows, we would have seen a bigger difference in cost Smile.
  • Additionally, a Table Spool operator is now part of the execution plan. The table spool operator caches the row to be updated (In tempdb) and then feeds it to the Clustered Index update operator.
  • This proves that SQL Server optimizer was actually choosing the cheapest and best plan it could come up with.

The reason SQL Server picked the clustered index is because the query is updating the columns mb_status =10 and mb_changed_by. This column mb_status is not the leading column in the NC index IX_msgboard_type, but the 2nd column. Why does this matter? It does because, the statistics for this NC index would be built on the leading column which is mb_type. Here is how the histogram looks.

dbcc show_statistics ('dbo.msgboard','IX_msgboard_type')

image

To avoid a bookmark lookup (Key Lookup operator from SQL 2005 onwards), SQL Server decided to use the clustered index since it covers all the columns in the table. If SQL Server were to use the NC index, it would be do a lookup or cache the results (ahem, Table Spool operator!) and then find the matching row in the clustered index to actually update the 2 columns. This is an additional cost, and this is why SQL decided not to go with this plan.

Coming to the 2nd question of why a Scan? It might surprise some of you, but a Seek does not imply good performance always. My favourite answer – “It depends”. In this case, the table only had 1001 rows and of which only 1 row (mb_id=1001) qualified for the update. When using the CI Scan, SQL Server was applying the filter on mb_type=210 and that returned 1 row back. A Clustered Index Seek in this case would not make any positive difference in query execution time.

OBJECT:([test].[dbo].[msgboard].[pk_msgboard]), WHERE:([test].[dbo].[msgboard].[mb_type]=(210)) ORDERED

To test this, I added the FORCESEEK table hint to this update query and as expected the query did not run faster.

update msgboard set mb_status =10, mb_changed_by ='SATCOM'  
from msgboard WITH (FORCESEEK) where mb_type = 210 

In fact it has the same cost as the query with index hint –> 0.03651106
 

Solution

Instead of trying to second guess the query optimizer and trying to tweak the execution plan it is producing, this issue needs a different approach –> back to basics. Based on the query design, if the column being updated mb_status was “covered” then it would help. In this case that was also also take care of, since IX_msgboard_type also includes the column mb_status, though as a non-key column. This is an important thing and this was why the “included columns” feature was introduced.

So, I modified the index IX_msgboard_type to remove the column mb_status. Instead, I added the column back as an included column

DROP INDEX [IX_msgboard_type] ON [dbo].[msgboard] WITH ( ONLINE = OFF )
GO
CREATE NONCLUSTERED INDEX [IX_msgboard_type] ON [dbo].[msgboard] 
(
    [mb_type] ASC
)
INCLUDE ( [mb_status]) WITH (STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

The biggest advantage of having an included column in a non-clustered index is that, the query optimizer can locate all the column values within the index. The base table or clustered index data is not accessed resulting in fewer disk I/O operations and hence faster query execution. Also, the index size is lesser since non-key columns are only added to the leaf level. Many advantages indeed.

Here is how the update query plan looked after added mb_status as an included column.

image

My oh my, this plan looks so simple, clean and yeah faster. The actual cost is now 0.0132843, which is a big improvement over the original plan of 0.03651106

So we have achieved our original goals of

a) Getting SQL Server to use the Non-clustered index IX_msgboard_type, instead of the clustered index.

b) Getting an Seek operation instead of a Scan.

c) Query cost is cheaper.



So, hope everyone understood how beneficial included columns are on non-clustered indexes. For more information on guidelines and restriction with included columns refer this MSDN article.

That’s all for now folks. Keep tuned to this site for more.

-TheSQLDude (Sudarshan)

Posted in Performance, Query Optimization | Tagged: , , , , , , , , , , | 6 Comments »