The SQL Dude!

On-Disk & In-Memory Pages that teach you the SQL way of life!

Archive for May, 2012

Monitoring tempdb space usage and scripts for finding queries which are using excessive tempdb space

Posted by Sudarshan Narasimhan on May 15, 2012


Many times during the life of a DBA, you might notice the tempdb database growing excessively even though no changes have been made recently. This is often because, as data volumes grow, the application's T-SQL queries are not written to scale, and they end up performing excessive sort/hash operations that consume space in tempdb. Here are some T-SQL scripts that you can use to monitor who/what is consuming tempdb space, so you can plan accordingly.

Before we get into identifying queries that use tempdb, it is very important to understand which activities in SQL Server (both internal and user activities) use the tempdb database. Broadly, you can classify these into 3 categories:-

  1. Internal Objects
  2. Version Stores
  3. User Objects

From a feature perspective, here are the features in SQL Server that use space from tempdb.

  1. Query
  2. Triggers
  3. Snapshot isolation and read committed snapshot (RCSI)
  4. MARS
  5. Online index creation
  6. Temporary tables, table variables, and table-valued functions
  7. DBCC CHECK
  8. LOB parameters
  9. Cursors
  10. Service Broker and event notification
  11. XML and LOB variables
  12. Query notifications
  13. Database mail
  14. Index creation
  15. User-defined functions

From a query performance standpoint, here are some operators that use tempdb space.

1. Sort Operator: The sort operator needs tempdb space to sort the full rowset of incoming rows. This is usually seen when the query uses ORDER BY, and also for DISTINCT with ORDER BY.


2. Hash Match Operator: Depending on the size of the rows and the memory available, the hash table this operator builds may use tempdb.


3. Spool Operator: This operator is used to save the intermediate set of rows for re-use and uses the tempdb database to save the query result set.


Tempdb out of space error

Error: 1105, Severity 17, State 2
Could not allocate space for object dbo.TBL1  in database ‘tempdb’ because the ‘PRIMARY’ filegroup is full.


Identify which type of tempdb objects is consuming space

The following query helps you understand whether user objects, the version store, or internal objects are the ones using the space in tempdb. Based on this output, you can focus on the relevant section below.

SELECT
SUM (user_object_reserved_page_count)*8 as user_obj_kb,
SUM (internal_object_reserved_page_count)*8 as internal_obj_kb,
SUM (version_store_reserved_page_count)*8  as version_store_kb,
SUM (unallocated_extent_page_count)*8 as freespace_kb,
SUM (mixed_extent_page_count)*8 as mixedextent_kb
FROM sys.dm_db_file_space_usage
If user_obj_kb is the highest consumer, then you know that objects are being created by user queries, such as local or global temp tables or table variables. Also, don't forget to check if there are any permanent tables created in tempdb. Very rare, but I've seen it happen.

If version_store_kb is the highest consumer, then it means that the version store is growing faster than the cleanup. Most likely there are long-running transactions or open transactions (in a sleeping state) which are preventing the cleanup, and hence tempdb space is not being released.



Query that identifies the currently active T-SQL query, its text, and the application that is consuming a lot of tempdb space

SELECT es.host_name, es.login_name, es.program_name,
       st.dbid AS QueryExecContextDBID, DB_NAME(st.dbid) AS QueryExecContextDBNAME, st.objectid AS ModuleObjectId,
       SUBSTRING(st.text, er.statement_start_offset/2 + 1,
                 (CASE WHEN er.statement_end_offset = -1
                       THEN LEN(CONVERT(nvarchar(max), st.text)) * 2
                       ELSE er.statement_end_offset
                  END - er.statement_start_offset)/2) AS Query_Text,
       tsu.session_id, tsu.request_id, tsu.exec_context_id,
       (tsu.user_objects_alloc_page_count - tsu.user_objects_dealloc_page_count) AS OutStanding_user_objects_page_counts,
       (tsu.internal_objects_alloc_page_count - tsu.internal_objects_dealloc_page_count) AS OutStanding_internal_objects_page_counts,
       er.start_time, er.command, er.open_transaction_count, er.percent_complete, er.estimated_completion_time,
       er.cpu_time, er.total_elapsed_time, er.reads, er.writes, er.logical_reads, er.granted_query_memory
FROM sys.dm_db_task_space_usage tsu
INNER JOIN sys.dm_exec_requests er
        ON tsu.session_id = er.session_id AND tsu.request_id = er.request_id
INNER JOIN sys.dm_exec_sessions es
        ON tsu.session_id = es.session_id
CROSS APPLY sys.dm_exec_sql_text(er.sql_handle) st
WHERE (tsu.internal_objects_alloc_page_count + tsu.user_objects_alloc_page_count) > 0
ORDER BY (tsu.user_objects_alloc_page_count - tsu.user_objects_dealloc_page_count)
       + (tsu.internal_objects_alloc_page_count - tsu.internal_objects_dealloc_page_count) DESC



Tempdb and the Version Store

The version store (SQL 2005 onwards) is a collection of objects that are used when Snapshot Isolation, Read Committed Snapshot Isolation (RCSI), online index rebuild, etc. are used in a database.

The version store contains the previously committed versions of rows, which is how a SELECT operation does not get blocked when another UPDATE/DELETE is operating on the same row: the SELECT reads the row from the version store instead of the actual base table. When you enable this, the row versions have to be stored somewhere, and tempdb happens to be the place. A row is maintained in the version store while there are transactions operating on that row in question. When the transaction is committed, the row is cleaned up from the version store tables.

You can check the version store using the DMV sys.dm_tran_version_store.
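As a quick sketch, you can aggregate the output of this DMV to see roughly how much version store space each database is holding. Note that sys.dm_tran_version_store returns one row per versioned record, so this can be expensive to run on a busy server.

-- Approximate version store usage per database, based on the record lengths
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*) AS version_records,
       SUM(CONVERT(bigint, record_length_first_part_in_bytes) + record_length_second_part_in_bytes) / 1024 AS approx_version_store_kb
FROM sys.dm_tran_version_store
GROUP BY database_id
ORDER BY approx_version_store_kb DESC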

At times, when there are long running transactions or orphaned transactions, you might notice tempdb growth due to the version store.

You can use the following query to find the oldest transactions that are active and using row versioning.

SELECT top 5 a.session_id, a.transaction_id, a.transaction_sequence_num, a.elapsed_time_seconds,
b.program_name, b.open_tran, b.status
FROM sys.dm_tran_active_snapshot_database_transactions a
join sys.sysprocesses b
on a.session_id = b.spid
ORDER BY elapsed_time_seconds DESC

Trace Flag 1118

This trace flag has been available since SQL 2000 SP3 to reduce tempdb contention by forcing uniform extent allocations as opposed to mixed extent allocations. This trace flag should only be used if you are seeing contention (check your wait stats) on the PFS/GAM pages, like 2:1:1, etc. More internal details on this trace flag are available in Paul Randal's blog post here.

Enabling the trace flag alone is not enough; you also need to create multiple tempdb data files. The old recommendation was one tempdb data file per logical processor, so if you have 4 CPUs you would create 4 tempdb data files. Now, what if you have 16 or 32 processors, do you still need to create that many tempdb files?

The answer is NO, you don’t have to. The above recommendation has been stated in many KB articles like http://support.microsoft.com/default.aspx?scid=kb;EN-US;328551

If the number of logical processors on your server is greater than or equal to 8, then use 8 data files for tempdb. If the number of logical processors is less than 8, then use as many data files as your processor count.
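For completeness, here is a minimal sketch of both pieces of the recommendation, enabling the trace flag and adding a tempdb data file. The file name, path and sizes below are placeholders, not recommendations; size the files to your environment.

-- Enable TF 1118 globally for the current instance run.
-- Use -T1118 as a SQL Server startup parameter to persist it across restarts.
DBCC TRACEON (1118, -1)

-- Add one more tempdb data file (placeholder name, path and sizes)
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2,
          FILENAME = 'T:\tempdb\tempdev2.ndf',
          SIZE = 1024MB,
          FILEGROWTH = 256MB)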

You can use the following query against any of the SQL Servers you manage to find out if any change is required in the tempdb data files to reduce contention and improve general performance.

Declare @tempdbfilecount as int;
select @tempdbfilecount = (select count(*) from sys.master_files where database_id=2 and type=0);
WITH Processor_CTE ([cpu_count], [hyperthread_ratio])
AS
(
      SELECT  cpu_count, hyperthread_ratio
      FROM sys.dm_os_sys_info sysinfo
)
select Processor_CTE.cpu_count as [# of Logical Processors], @tempdbfilecount as [Current_Tempdb_DataFileCount], 
(case 
      when (cpu_count<8 and @tempdbfilecount=cpu_count)  then 'No' 
      when (cpu_count<8 and @tempdbfilecount<>cpu_count and @tempdbfilecount<cpu_count) then 'Yes' 
      when (cpu_count<8 and @tempdbfilecount<>cpu_count and @tempdbfilecount>cpu_count) then 'No'
      when (cpu_count>=8 and @tempdbfilecount=cpu_count)  then 'No (Depends on continued Contention)' 
      when (cpu_count>=8 and @tempdbfilecount<>cpu_count and @tempdbfilecount<cpu_count) then 'Yes'
      when (cpu_count>=8 and @tempdbfilecount<>cpu_count and @tempdbfilecount>cpu_count) then 'No (Depends on continued Contention)'
end) AS [TempDB_DataFileCount_ChangeRequired]
from Processor_CTE;


As a sample: on my server, this query showed 8 logical processors and only 1 tempdb data file, so I need to add 7 more files.



One last point before I wrap up this post. Once your tempdb database or log file is full, you have these options:-

1. Roll back any transactions consuming tempdb space, or kill those transactions (not a good idea).

2. Create additional tempdb files on other drives which have free space, while you dig around to find the culprit that is growing tempdb.

3. Restart your SQL Server service.

Have fun working with tempdb. Here are some good references:

Working with tempdb – http://technet.microsoft.com/hi-in/library/cc966545(en-us).aspx

Storage Engine Blog – http://blogs.msdn.com/b/sqlserverstorageengine/archive/2009/01/12/tempdb-monitoring-and-troubleshooting-out-of-space.aspx

Posted in Performance, T-SQL | 18 Comments »

www.TheSQLDude.com is now officially released. Rush to your nearest web browser to catch the action!

Posted by Sudarshan Narasimhan on May 12, 2012


Folks,

TheSQLDude.com is now live and all the older blog posts have been migrated to the new domain name. Existing/older links will be redirected to the new site. As always, I leave you with another of Dilbert's very best…

[Dilbert comic]

Posted in SQL News | Leave a Comment »

IO Cost in an execution plan – what does it actually mean?

Posted by Sudarshan Narasimhan on May 11, 2012


I/O is probably one of the slowest of the resources. The information below gives a quick explanation of what these operators mean when you look at them in an execution plan. This post specifically concentrates on the “Estimated I/O Cost” value in an execution plan. I’ll post in detail some other day on how to decipher a SQL Server query execution plan.

Thanks to Niraj Mehta for putting together this content.

Table Scan

  • The total number of data pages in the table

Clustered Index Scan

  • The number of levels in the index plus the number of data pages to scan (data pages = #rows / #rows per page)

Non-Clustered Index Seek on a Heap (Bookmark Lookup)

  • The number of levels in the index plus the number of leaf pages to read for qualifying rows plus the number of qualifying rows (1 I/O for each row on the heap)

Non-Clustered Index (I1) Seek on a Clustered index (I2) (Bookmark Lookup)

  • The number of levels in the I1 index plus the number of leaf pages to read for qualifying rows plus the number of qualifying rows times the cost of searching for a clustered index (I2) key

Covering Non-Clustered index

  • The number of levels in the index plus the number of leaf index pages to read for qualifying rows (#qualifying rows / # rows per leaf page).
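To make the arithmetic concrete, here is a small worked sketch. All the numbers are assumed purely for illustration: 1,000,000 rows, 100 rows per data page, a 3-level index, and a seek that qualifies 50 rows.

-- Worked example of the formulas above, with assumed numbers
DECLARE @rows bigint, @rows_per_page int, @index_levels int, @qualifying_rows int
SELECT @rows = 1000000, @rows_per_page = 100, @index_levels = 3, @qualifying_rows = 50

SELECT @rows / @rows_per_page AS table_scan_pages,                     -- 10,000: all data pages
       @index_levels + @rows / @rows_per_page AS clustered_scan_pages, -- 10,003: levels + data pages
       @index_levels + 1 + @qualifying_rows AS heap_bookmark_lookup    -- 54: levels + 1 leaf page + 1 I/O per qualifying row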

Posted in Performance | Leave a Comment »

Shedding light on SQL Server Statistics and Auto Update Statistics

Posted by Sudarshan Narasimhan on May 4, 2012


SQL Server collects statistical information about indexes and column data in the database. These statistics play a very important role in coming up with an efficient query plan. Statistical information is used by the Query Optimizer when it decides to choose Seek vs. Scan, or to use Index A vs. Index B, etc. So it’s important as a DBA to pay careful attention to statistics and to make sure they are updated and maintained well.

Just like Chandler Bing’s job is “Statistical analysis and data reconfiguration”, it’s a good DBA’s job to take care of maintaining the statistics. :)


SQL Server 2005 statistics features allow you to:

  • Implicitly create and update statistics with the default sampling rate (in the SELECT, INSERT, DELETE, and UPDATE commands, use of a column in a query condition such as a WHERE or JOIN clause causes statistics to be created or updated if necessary when automatic statistics creation and update is enabled)
  • Manually create and update statistics with any desired sampling rate, and drop statistics (CREATE STATISTICS, UPDATE STATISTICS, DROP STATISTICS, CREATE INDEX, DROP INDEX)
  • Manually create statistics in bulk for all columns of all tables in a database (sp_createstats)
  • Manually update all existing statistics in the database (sp_updatestats)
  • List statistics objects that exist for a table or database (sp_helpstats, catalog views sys.stats, sys.stats_columns)
  • Display descriptive information about statistics objects (DBCC SHOW_STATISTICS)
  • Enable and disable automatic creation and update of statistics database-wide or for a specific table or statistics object (ALTER DATABASE options AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS, sp_autostats, and the NORECOMPUTE option on CREATE STATISTICS and UPDATE STATISTICS)
  • Enable and disable asynchronous automatic update of statistics (ALTER DATABASE option AUTO_UPDATE_STATISTICS_ASYNC)
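To make that list concrete, here is a quick sketch of a few of those commands. The table dbo.Person and the column LastName are made-up names for illustration.

CREATE STATISTICS stats_LastName ON dbo.Person (LastName) WITH SAMPLE 50 PERCENT  -- manual create with a chosen sampling rate
UPDATE STATISTICS dbo.Person (stats_LastName) WITH FULLSCAN                       -- manual update with a full scan
DBCC SHOW_STATISTICS ('dbo.Person', stats_LastName)                               -- header, density vector and histogram
DROP STATISTICS dbo.Person.stats_LastName                                         -- drop the statistics object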

Behaviour change from SQL 2005 onwards

In SQL Server 2000, statistics update was determined by the number of row changes. Now, changes are tracked at the column level, and auto update of statistics can be avoided on columns which have not changed enough to warrant a statistics update. This is a behaviour change between SQL 2000 and SQL 2005 onwards when it comes to auto update statistics. SQL Server 2005 onwards determines whether to update statistics based on the colmodctr (column modification counter) values.

A statistics object is considered out of date in the following cases:

1. The table size has gone from 0 to > 0 rows.
2. The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then.
3. The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered. (For example, a table that had 10,000 rows at the last statistics update needs 500 + 2,000 = 2,500 modifications of the leading column to trigger an update.)
4. If the statistics object is defined on a temporary table, it is out of date as above, except that there is an additional threshold: recomputation after the insertion of 6 rows.


Q: How is the COLMODCTR maintained by SQL Server?
Ans: The colmodctr values that SQL Server keeps track of are continually modified as the data in the table changes. Depending on the number of INSERT/UPDATE/DELETE/TRUNCATE/bulk operations occurring on the table, this counter value keeps changing. The TechNet article referenced below includes a table describing the effect of each operation on the colmodctr value.

Reference: http://technet.microsoft.com/en-us/library/cc293623.aspx


Controlling STATISTICS Behaviour

You have options like AUTO_UPDATE_STATISTICS, AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS_ASYNC, which are all database-level options, so you need to configure them for each database. You can find out if your user databases have these options set by using sp_helpdb and looking at the status column, or by using a query like the one below.

select name as [DB_NAME], is_auto_create_stats_on, is_auto_update_stats_on, is_auto_update_stats_async_on 
from sys.databases
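If you need to change these settings, here is a minimal sketch; [YourDB] is a placeholder for your database name.

ALTER DATABASE [YourDB] SET AUTO_CREATE_STATISTICS ON
ALTER DATABASE [YourDB] SET AUTO_UPDATE_STATISTICS ON
ALTER DATABASE [YourDB] SET AUTO_UPDATE_STATISTICS_ASYNC ON  -- auto updates happen on a background thread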

How to find out which indexes or statistics need to be updated?

You can use the following query on any SQL 2005+ instance to find out the % of rows modified and, based on this, decide if any indexes need to be rebuilt or statistics on the indexes need to be updated.

select
schemas.name as table_schema,
tbls.name as Object_name,
i.id as Object_id,
i.name as index_name,
i.indid as index_id,
i.rowmodctr as modifiedRows,
(select max(rowcnt) from sysindexes i2 where i.id = i2.id and i2.indid < 2) as rowcnt,
convert(DECIMAL(18,8), convert(DECIMAL(18,8),i.rowmodctr) / convert(DECIMAL(18,8),(select max(rowcnt) from sysindexes i2 where i.id = i2.id and i2.indid < 2))) as ModifiedPercent,
stats_date( i.id, i.indid ) as lastStatsUpdateTime
from sysindexes i
inner join sysobjects tbls on i.id = tbls.id
inner join sysusers schemas on tbls.uid = schemas.uid
inner join information_schema.tables tl
on tbls.name = tl.table_name
and schemas.name = tl.table_schema
and tl.table_type='BASE TABLE'
where 0 < i.indid and i.indid < 255
and table_schema <> 'sys'
and i.rowmodctr <> 0
and i.status not in (8388704,8388672)
and (select max(rowcnt) from sysindexes i2 where i.id = i2.id and i2.indid < 2) > 0
order by modifiedRows desc

Updating statistics on all the tables in any given database

I often get asked how to update statistics on all the tables in a database. You can use a script like the one below to achieve this.

Disclaimer: Do not run this unless you know its implications on a production server. Updating statistics on all tables will use CPU resources and, depending on the size of the tables, will take its own time.

use <YOUR_DB_NAME>
go
select identity(int,1,1) as rownum, table_schema, table_name into table_count
from information_schema.tables where table_type='BASE TABLE'
declare @count int, @stmt nvarchar(512), @maxcount int, @schname sysname, @tblname sysname
set @count=1
select @maxcount=count(*) from table_count
while @count < @maxcount+1
begin
      select @schname=table_schema, @tblname=table_name from table_count where rownum=@count
      -- quotename() brackets the names, and including the schema avoids failures on non-dbo tables
      set @stmt = 'UPDATE STATISTICS ' + quotename(@schname) + '.' + quotename(@tblname) + ' WITH FULLSCAN'
      PRINT ('Updating statistics for table: ' + @schname + '.' + @tblname)
      EXEC(@stmt)
      PRINT ('Finished updating statistics for table: ' + @schname + '.' + @tblname)
      print ''
      set @count=@count+1
      set @stmt=''
end
drop table table_count

Some tips on Statistics

1. Table variables do not have statistics at all.

Table variables are meant for operations on a small number of rows, a few thousand rows at max. This is a good scenario where you need to think about temporary tables (#tbl), because unlike table variables, temp tables can have indexes created on them, which means they can have statistics.
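Here is a minimal illustration of the difference; the #Orders table and @Orders variable are made-up examples.

-- A temp table can have additional indexes, and those indexes carry statistics
CREATE TABLE #Orders (OrderID int PRIMARY KEY, CustomerID int)
CREATE INDEX IX_Orders_CustomerID ON #Orders (CustomerID)

-- A table variable allows constraints, but the optimizer gets no statistics from it
DECLARE @Orders TABLE (OrderID int PRIMARY KEY, CustomerID int)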

2. Multi-statement table-valued functions (TVFs) also do not have any statistics

So if you have complex query logic implemented in a function in SQL Server, think again! The function does not have any statistical information present, so the SQL optimizer must guess the size of the results returned. The reason is that a multi-statement TVF returns a TABLE as its output, and that table does not have any statistics on it.

3. You can find out from the execution plan (via the SET STATISTICS PROFILE statement) whether any statistics would help a particular query

When you enable SET STATISTICS PROFILE ON and execute any query/batch, it displays the execution plan. In this output, look for the column called “Warnings”. During the course of compiling the plan, if the SQL Server optimizer felt that statistics on some column would have helped the query, it displays the warning “NO STATS” in the execution plan. If you see any such warning, consider creating column statistics or indexes on the particular object in that row.
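A quick usage sketch, with a made-up table and filter:

SET STATISTICS PROFILE ON
SELECT * FROM dbo.Person WHERE LastName = 'Smith'  -- illustrative query
SET STATISTICS PROFILE OFF
-- In the plan rows returned, scan the "Warnings" column for the text "NO STATS"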

4. Avoid creating indexes on very frequently updated columns, as the statistics will also have to keep up with the volume of data modifications.

5. Viewing Statistics

You can use DBCC SHOW_STATISTICS (‘tablename’, ‘index name’) to view the statistics on any given index or column statistic, along with the histogram. The catalog view sys.stats stores information on each statistics object available in a particular database.

Any statistics object having a name prefixed with _WA_Sys_ is an auto-created statistic, which means SQL Server itself created it. User-created statistics will have a given name or the index name, e.g. PK_TBL1.
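For example, this sketch lists the auto-created statistics in the current database using sys.stats:

SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name AS stats_name
FROM sys.stats s
WHERE s.auto_created = 1
ORDER BY table_name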

Posted in Performance | Leave a Comment »