The SQL Dude!

On-Disk & In-Memory Pages that teach you the SQL way of life!

Posts Tagged ‘indexes’

Seek & Evaluate Unused Indexes in your SQL database

Posted by Sudarshan Narasimhan on June 16, 2016


This post is Part 3 of my series of posts on Index Maintenance for a DBA. So here goes.

As a regular part of your DBA work, it is very important to keep evaluating the need for indexes in your database environment. Any DBA worth his salt, will always look to “maintain” the indexes by ensuring the fragmentation is kept low. Most of you probably have maintenance tasks/scripts to take care of this. An often under-looked aspect in index maintenance is to also consider whether some indexes are an over-kill a.k.a unnecessary and potentially drop them if it’s not really utilized. This is especially important when dealing with 3rd party vendor databases, where the DBA does not have full control on what comes in and what goes out in terms of index changes.

I like to think of the database as my “house” and take care of it the same way one would do to your own home. You always consider who/what comes in to your house. The same care needs to be applied to the databases you maintain. As gatekeepers to the database, even if you don’t have complete control on what comes in, it is good to be aware & have the stats handy so that you can work with your application teams or vendors, to keep the house tidy.

If you link this data back to my previous posts Part 1 – Seek & Destroy Duplicate Indexes in your SQL Database and Part 2 – Seek & Understand Top Indexes by Size in your SQL database , you will be on your way to hopefully free up some space and realize space cost and performance gains.

Here is a handy script that I use to:-

1) Lists the indexes on each table in the database, that are completely unused.

This scripts lists those indexes where not even a single Read operation has used these indexes. Keep in mind, write operations still need to update this index as data is being updated/inserted contributing to overall time of the Write operations. When no-one is using the index for Read it becomes a system overhead.

2) Lists the indexes on each table in the database, that are having more Writes compared to Reads

The maintenance cost of having these indexes from insert, update, or delete operations on the underlying table are more compared to the benefits for Read operations. These indexes needs to be investigated to see if they are still required. A cost benefit analysis on these indexes needs to be performed.
Note: I’m not advocating that you remove these indexes, but just that you carefully consider the benefits vs the cost, before making any decision.



Script #1 – TOTALLY UNUSED INDEXES

USE [DBNAME] /* Replace with your Database Name */
GO
--TOTALLY UN-USED INDEXES
SELECT DB_NAME(s.database_id) as [DB Name], OBJECT_NAME(s.[object_id]) AS [Table Name], i.name AS [Index Name], i.index_id,
    i.is_disabled, i.is_hypothetical, i.has_filter, i.fill_factor, i.is_unique,
    s.user_updates AS [Total Writes],
    (s.user_seeks + s.user_scans + s.user_lookups) AS [Total Reads],
    s.user_updates - (s.user_seeks + s.user_scans + s.user_lookups) AS [Difference],
    (partstats.used_page_count / 128.0) AS [IndexSizeinMB]
FROM sys.dm_db_index_usage_stats AS s WITH (NOLOCK)
INNER JOIN sys.indexes AS i WITH (NOLOCK)
    ON s.[object_id] = i.[object_id]
    AND s.index_id = i.index_id
    AND s.database_id = DB_ID()
INNER JOIN sys.dm_db_partition_stats AS partstats
    ON i.object_id = partstats.object_id AND i.index_id = partstats.index_id
WHERE OBJECTPROPERTY(s.[object_id],'IsUserTable') = 1
    AND user_updates > (user_seeks + user_scans + user_lookups)
    AND (s.user_lookups=0 AND s.user_scans=0 AND s.user_seeks=0)
    AND i.index_id > 1
ORDER BY [Difference] DESC, [Total Writes] DESC, [Total Reads] ASC OPTION (RECOMPILE);
GO

Script #1 Result
Here is a sample output from my test system showing all unused indexes on the MSDB database.

UnusedIndexes

 

 

 

 

 

 



Script #2 – INDEXES WITH MORE WRITES THAN READS

USE [DBNAME] /* Replace with your Database Name */
GO
--INDEXES WITH WRITES > READS
SELECT DB_NAME(s.database_id) as [DB Name], OBJECT_NAME(s.[object_id]) AS [Table Name], i.name AS [Index Name], i.index_id,
    i.is_disabled, i.is_hypothetical, i.has_filter, i.fill_factor, i.is_unique,
    s.user_updates AS [Total Writes],
    (s.user_seeks + s.user_scans + s.user_lookups) AS [Total Reads],
    s.user_updates - (s.user_seeks + s.user_scans + s.user_lookups) AS [Difference],
    (partstats.used_page_count / 128.0) AS [IndexSizeinMB]
FROM sys.dm_db_index_usage_stats AS s WITH (NOLOCK)
INNER JOIN sys.indexes AS i WITH (NOLOCK)
    ON s.[object_id] = i.[object_id]
    AND s.index_id = i.index_id
    AND s.database_id = DB_ID()
INNER JOIN sys.dm_db_partition_stats AS partstats
    ON i.object_id = partstats.object_id AND i.index_id = partstats.index_id
WHERE OBJECTPROPERTY(s.[object_id],'IsUserTable') = 1
    AND (s.user_lookups<>0 OR s.user_scans<>0 OR s.user_seeks<>0)
    AND s.user_updates > (s.user_seeks + s.user_scans + s.user_lookups)
    AND i.index_id > 1
ORDER BY [Difference] DESC, [Total Writes] DESC, [Total Reads] ASC OPTION (RECOMPILE);
GO

Script #2 Result
Here is a sample output from my test system showing all indexes having more Writes compared to Reads on the MSDB database.

WritesMoreThanReadsIndexes

 

 

 

 

 

Well, that’s all folks.

-Sudarshan (TheSQLDude)

Advertisements

Posted in DB Maintenance, Indexing, T-SQL | Tagged: , , , , , , | 1 Comment »

Seek & Understand Top Indexes by Size in your SQL database

Posted by Sudarshan Narasimhan on June 3, 2016


This post is Part 2 of my series of posts on Index Maintenance for a DBA. So here goes.

Often, when you are performing capacity planning or upgrades or working with 3rd party vendor on database schema upgrades, you need to look into the existing indexes on the database. One of the key things you need to consider is disk space. Aside from the actual “data” in the tables, indexes also occupy space depending on the type of index. Here is a handy script that I use to quickly understand how many indexes are present for each table and the size of each index, in any given SQL database.

If you link this data back to my post “Seek & Destroy Duplicate Indexes in your SQL Database“, you will be on your way to hopefully free up some space and realize space cost savings.

This query only reports on user objects (object_id>99) and filters out HEAP tables. There are some exceptions to the filter on user objects (like service broker tables, CDC tables etc) which will still be considered as “user objects” though it supports a system functionality.

USE [msdb]  /*Replace with your Database Name */
GO
SELECT 
    DB_NAME() DatabaseName,
    OBJECT_SCHEMA_NAME(i.OBJECT_ID) SchemaName,
    OBJECT_NAME(i.OBJECT_ID ) TableName,
    i.name IndexName,
    i.type_desc as [IndexType],
    i.is_primary_key as [PrimaryKeyIndex],
    i.is_unique as [UniqueIndex],
    i.is_disabled as [Disabled],
    SUM(s.used_page_count) / 128.0 IndexSizeinMB,
    CAST((SUM(s.used_page_count)/128.0)/1024.0 as decimal(10,6))  IndexSizeinGB
FROM sys.indexes AS i
INNER JOIN sys.dm_db_partition_stats AS S
    ON i.OBJECT_ID = S.OBJECT_ID AND I.index_id = S.index_id
WHERE i.object_id>99 and i.type_desc  'HEAP'
GROUP BY i.OBJECT_ID, i.name, s.used_page_count, i.type_desc, i.is_primary_key, i.is_unique, i.is_disabled
ORDER BY s.used_page_count DESC
GO

 

Script Result
Here is a sample output from my test system showing all the indexes on MSDB database.

Index Size

 

Stay tuned for more index related scripts in this series.

-Sudarshan (TheSQLDude)

Posted in DB Maintenance, Indexing, T-SQL | Tagged: , , , | 1 Comment »

T-SQL Script to find the size of every index in the database

Posted by Sudarshan Narasimhan on December 15, 2012


I wrote this query to find the size of all indexes in the database for one of my customers. Thought I’d share this script with everyone. Cheers.


CREATE PROCEDURE uspIndexSizeinDB
AS 
-- Author: Sudarshan (TheSQLDude.com)
-- Data Created: Dec 15, 2012
-- Description: Calculate the SQL Server index size for all indexes in the current database
  BEGIN 
    DECLARE  @IndexSize BIGINT, 
             @IndexID   INT,
             @IndexName nvarchar(200),
             @IndexType nvarchar(50),
             @ObjectID  INT
     
    SET @IndexSize = 0
    
    create table #tmpresults (ObjName nvarchar(100), IndexID int, IndexName nvarchar(200),[IndexSize(MB)] int, IndexType nvarchar(50))
    
    DECLARE curIndex CURSOR  FOR 
    SELECT sysind.object_id, sysind.index_id, sysind.name, sysind.type_desc 
    FROM sys.indexes sysind 
    join sys.sysobjects sysobj
    on sysind.object_id = sysobj.id
    where sysind.type>0 and sysobj.type not in ('S','IT')
    order by sysobj.id asc
   
    OPEN curIndex 
     
    FETCH NEXT FROM curIndex 
    INTO @ObjectID, @IndexID, @IndexName, @IndexType
     
    WHILE (@@FETCH_STATUS = 0) 
      BEGIN 
        SELECT @IndexSize = sum(avg_record_size_in_bytes * record_count) 
        FROM   sys.dm_db_index_physical_stats(DB_ID(),@ObjectID, @IndexID, NULL, 'detailED') 
         
        insert into #tmpresults (ObjName, IndexID, IndexName, [IndexSize(MB)], IndexType)
        SELECT TableName = OBJECT_NAME(@ObjectID), 
               IndexID = @IndexID, 
               IndexName = @IndexName,
               [IndexSize(MB)] = CONVERT(DECIMAL(16,1),(@IndexSize / (1024.0 * 1024))), IndexType = @IndexType
         
        FETCH NEXT FROM curIndex 
        INTO @ObjectID, @IndexID, @IndexName, @IndexType
      END 
    CLOSE curIndex 
    DEALLOCATE curIndex 
    
    select UPPER(ObjName) as ObjectName, IndexID, IndexName, IndexType, [IndexSize(MB)]  from #tmpresults order by [IndexSize(MB)] desc
  END 
GO

Posted in DB Maintenance, T-SQL | Tagged: , , , , , , | 2 Comments »

IO Cost in an execution plan–What it actually means?

Posted by Sudarshan Narasimhan on May 11, 2012


I/O is probably one the slowest of the resources. The information below gives a quick meaning of what these operators mean when you look at them in an execution plan. This specific concentrates on the “Estimated I/O Cost” value in an execution plan. I’ll post some other day in detail on how to decipher a SQL Server query execution plan.

Thanks to Niraj Mehta for putting together this content.

Table Scan

  • The total number of data pages in the table

Clustered Index Scan

  • The number of levels in the index plus the number of data pages to scan (data pages = #rows / #rows per page)

Non-Clustered Index Seek on a Heap (Bookmark Lookup)

  • The number of levels in the index plus the number of leaf pages to read for qualifying rows plus the number of qualifying rows (1 I/O for each row on the heap)

Non-Clustered Index (I1) Seek on a Clustered index (I2) (Bookmark Lookup)

  • The number of levels in the I1 index plus the number of leaf pages to read for qualifying rows plus the number of qualifying rows times the cost of searching for a clustered index (I2) key

Covering Non-Clustered index

  • The number of levels in the index plus the number of leaf index pages to read for qualifying rows (#qualifying rows / # rows per leaf page).

Posted in Performance | Tagged: , , , , , , , , , , | Leave a Comment »

Shedding light on SQL Server Statistics and Auto Update Statistics

Posted by Sudarshan Narasimhan on May 4, 2012


SQL Server collects statistical information about indexes and column data in the database. These statistics play a very important role in coming up with an efficient query plan. Statistical information is used by the Query Optimizer when it decides to choose Seek vs. Scan or using Index A vs. Index B, etc. So it’s important as a DBA to pay careful attention to statistics and making sure they are updated and maintained well.

Just like Chandler Bing’s job is “Statistical analysis and data reconfiguration”, its a good DBA’s job to take care of maintaining the statistics Smile.


SQL Server 2005 statistics features allow you to:

  • Implicitly create and update statistics with the default sampling rate (in the SELECT, INSERT, DELETE, and UPDATE commands, use of a column in a query condition such as a WHERE or JOIN clause causes statistics to be created or updated if necessary when automatic statistics creation and update is enabled)
  • Manually create and update statistics with any desired sampling rate, and drop statistics (CREATE STATISTICS, UPDATE STATISTICS, DROP STATISTICS, CREATE INDEX, DROP INDEX)
  • Manually create statistics in bulk for all columns of all tables in a database (sp_createstats)
  • Manually update all existing statistics in the database (sp_updatestats)
  • List statistics objects that exist for a table or database (sp_helpstats, catalog views sys.stats, sys.stats_columns)
  • Display descriptive information about statisticsobjects (DBCC SHOW_STATISTICS)
  • Enable and disable automatic creation and update of statistics database-wide or for a specific table or statistics object (ALTER DATABASE options AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS, sp_autostats, and the NORECOMPUTE option on CREATE STATISTICS and UPDATE STATISTICS)
  • Enable and disable asynchronous automatic update ofstatistics (ALTER DATABASE option AUTO_UPDATE_STATISTICS_ASYNC

Behaviour change from SQL 2005 onwards

In SQL Server 2000, statistics update was determined by the number of row changes. Now, changes are tracked at the column level, and auto update of statistics can now be avoided on columns which have not changed enough to warrant statistics update.  This is a behaviour change between SQL 2000 and SQL 2005 onwards when it comes to auto update statistics. SQL Server 2005 and onwards determines whether to update statistics based on the colmodctrs (column modification counters) value.

A statistics object is considered out of date in the following cases:

1. The table size has gone from 0 to > 0 rows.
2. The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then.
3. The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered.
4. If the statistics object is defined on a temporary table, it is out of date as above, except that there is an additional threshold Recomputation after insertion of 6 rows.


Q: How is the COLMODCTR maintained by SQL Server?
Ans: The colmodctr values that SQL Server keeps track of are continually modified as the data in the table changes. Depending on the amount of INSERT/UPDATE/DELETE/TRUNCATE/BULK OPS that are occurring on the table, this counter value keeps changing. The below table, describes what the effect on colmodctr value are for each operation.

Cc293623.table_C05621961_2(en-us,TechNet.10).jpg
Reference: http://technet.microsoft.com/en-us/library/cc293623.aspx

 

Controlling STATISTICS Behaviour

You have options like AUTO_UPDATE_STATISTICS, AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS_ASYNC etc, which are all database level options. So you need to configure this per each database. You can find if your user databases have these options by using sp_helpdb and looking at the status column or by using a query like below.

select name as [DB_NAME], is_auto_create_stats_on, is_auto_update_stats_on, is_auto_update_stats_async_on 
from sys.databases

How to find out which indexes or statistics needs to be updates?

You can use the following query on any SQL 2005+ instance to find out the % of rows modified and based on this decide if any indexes need to be rebuilt or statistics on the indexes need to be updated.

select
schemas.name as table_schema,
tbls.name as Object_name,
i.id as Object_id,
i.name as index_name,
i.indid as index_id,
i.rowmodctr as modifiedRows,
(select max(rowcnt) from sysindexes i2 where i.id = i2.id and i2.indid < 2) as rowcnt,
convert(DECIMAL(18,8), convert(DECIMAL(18,8),i.rowmodctr) / convert(DECIMAL(18,8),(select max(rowcnt) from sysindexes i2 where i.id = i2.id and i2.indid < 2))) as ModifiedPercent,
stats_date( i.id, i.indid ) as lastStatsUpdateTime
from sysindexes i
inner join sysobjects tbls on i.id = tbls.id
inner join sysusers schemas on tbls.uid = schemas.uid
inner join information_schema.tables tl
on tbls.name = tl.table_name
and schemas.name = tl.table_schema
and tl.table_type='BASE TABLE'
where 0 < i.indid and i.indid < 255
and table_schema <> 'sys'
and i.rowmodctr <> 0
and i.status not in (8388704,8388672)
and (select max(rowcnt) from sysindexes i2 where i.id = i2.id and i2.indid < 2) > 0
order by modifiedRows desc

Updating Statistics on all the table in any given database

I often get this often as to how you update all the tables in a database. You can use a script like below to achieve this.

Disclaimer: Do not run this unless you know its implications on a production server. Statistics update on all the tables will use CPU resources and depending on the size of the table take its own time.

use <YOUR_DB_NAME>
go
select identity(int,1,1) as rownum,table_name into table_count from information_schema.tables where table_type='base table'
declare @count int,@stmt varchar (255),@maxcount int,@tblname varchar(50)
set @count=1
select @maxcount=count(*) from table_count
while @count < @maxcount+1
begin 
      select @tblname=table_name from table_count where rownum=@count
      set @stmt = 'UPDATE STATISTICS '+ '[' +@tblname+ ']' + ' WITH FULLSCAN'
      PRINT ('Updating statistics for table :'+@tblname)
      EXEC(@stmt)
      PRINT ('Finished Updating statistics for table :'+@tblname)
      print ''
      set @count=@count+1
      set @stmt=''
End
drop table table_count

Some tips on Statistics

1. Table variables do not have statistics at all.

Table variables are meant for operations on a small number of rows, a few thousand rows at max. This is a good scenario where you need to think about temporary tables (#tbl), because unlike table variables, temp tables can have indexes created on them, which means they can have statistics.

2. Multi-Statement Table Value Functions (TVF’s) also do not have any statistics

So if you have a complex query logic implemented in a function in SQL Server, think again! This function does not have any statistical information present, so the SQL optimizer must guess the size of the results returned. The reason for this is a multi-statement TVF returns you a TABLE as an output and table does not have any statistics on it.

3. You can find out from the Execution Plan aka SET STATISTICS PROFILE statement if any statistics would help a particular query

When you enable STATISTICS PROFILE ON and execute any query/batch it displays the execution plan. In this output look for the column called “Warnings”. During the course of compiling the plan, if the SQL Server optimizer felt that some statistics on column A would have helped the query, it displays this warning in the execution plan as “NO STATS”. If you see any such warning, consider creating some column statistics or indexes on the particular object in the row.

4. Avoid creating indexes on very frequently updated columns as the statistics also will have to keep up with the amount of data modifications.

5. Viewing Statistics

You can use DBCC SHOW_STATISICS (‘tablename’ , ‘index name’) to view the statistics on any given index’/column stats along with the histogram. The system DMV sys.stats stores information on each statistics available in a particular database.

Any statistics having the name prefixed as _WA_Sys_ is a auto-created statistics, which means SQL Server itself created them. User created statistics will have a given name or have the index name, e.g. PK_TBL1

Posted in Performance | Tagged: , , , , , , , , | Leave a Comment »

T-SQL Script for SQL Server Statistics Last Updated Time & Percent Rows modified

Posted by Sudarshan Narasimhan on August 4, 2011


I wrote a post on the same topic @ Amit’s Blog. I made some changes to the script to make it easier for DBA’s to read the output. I am re-posting this here for your reference. This can be quite handy if you are managing multiple SQL instances and you have lot of tables or have large databases where you only want to update statistics on some tables based on row modifications (data changes e.g. OLTP). You can always use DBCC SHOW_STATISTICS or SP_AUTOSTATS to find out the same information for a specific table, index or column stats.

select

schemas.name as table_schema,

tbls.name as Object_name,

i.id as Object_id,

i.name as index_name,

i.indid as index_id,

i.rowmodctr as modifiedRows,

(select max(rowcnt) from sysindexes i2 where i.id = i2.id and i2.indid < 2) as rowcnt,

convert(DECIMAL(18,8), convert(DECIMAL(18,8),i.rowmodctr) / convert(DECIMAL(18,8),(select max(rowcnt) from sysindexes i2 where i.id = i2.id and i2.indid < 2))) as ModifiedPercent,

stats_date( i.id, i.indid ) as lastStatsUpdateTime

from sysindexes i

inner join sysobjects tbls on i.id = tbls.id

inner join sysusers schemas on tbls.uid = schemas.uid

inner join information_schema.tables tl

on tbls.name = tl.table_name

and schemas.name = tl.table_schema

and tl.table_type=‘BASE TABLE’

where 0 < i.indid and i.indid < 255

and table_schema <> ‘sys’

and i.rowmodctr <> 0

and i.status not in (8388704,8388672)

and (select max(rowcnt) from sysindexes i2 where i.id = i2.id and i2.indid < 2) > 0

order by modifiedRows desc


Keep in mind that Rowmodctr values displayed in above output are not entirely reliable starting with SQL 2005 onwards. But, you can rely on last updated time to make a decision if the statistics need to be updated. Read here for more info on this.

 

 

You can use this script to generate the UPDATE STATISTICS command for each table in your table. I am using FULL SCAN by default, which you can of course modify to use specific SAMPLE. You have to un-comment the comments lines, to actually perform the update.

 

select identity(int,1,1) as rownum,table_name into table_count from information_schema.tables where table_type=‘base table’

declare @count int,@stmt varchar (255),@maxcount int,@tblname varchar(50)

set @count=1

select @maxcount=count(*) from table_count

while @count < @maxcount+1

begin

      select @tblname=table_name from table_count where rownum=@count

      set @stmt = ‘UPDATE STATISTICS ‘+ ‘[‘ +@tblname+ ‘]’ + ‘ WITH FULLSCAN’

      print @stmt

      –PRINT (‘Updating statistics for table :’+@tblname)

      –EXEC(@stmt)

      –PRINT (‘Finished Updating statistics for table :’+@tblname)

      print

      set @count=@count+1

      set @stmt=

End

drop table table_count


Have fun with your database maintenanceSmile. As always stay tuned for more cool SQL stuff.

Posted in DB Maintenance | Tagged: , , , , , | 4 Comments »