The SQL Dude!

On-Disk & In-Memory Pages that teach you the SQL way of life!

Posts Tagged ‘slow’

Database takes long time to Recover during startup/Restore – List of Known issues & Fixes

Posted by Sudarshan Narasimhan on October 25, 2012


Have you ever wondered why a database takes a long time to recover during start-up or when you performed a restore operation? Recovery is a regular part of Database start-up as SQL server has to go through various phases like Analysis, Redo and Undo to bring the database online in a consistent state. But, sometimes you might notice that recovery is taking an awfully long time, which is preventing the DB from coming online and is inaccessible to your users. From SSMS you will notice the DB is showing up as “In Recovery” or “Recovering”. If anyone tries to use the database, you will get this error.

Msg 922, Level 14, State 1, Line 1
Database ‘MyDB1’ is being recovered. Waiting until recovery is finished.

Once a DB is in recovery, there is pretty much nothing you can do to make it go faster. Read the troubleshooting section below if your database is already in the recovery phase.

The SQL Errorlogs will tell you the following information:-

  1. The current phase of the DB Recovery process.
  2. The % completed in the current phase.
  3. Approximate time remaining before it completes this phase.
  4. Starting with SQL Server 2008 R2 SP2 and SQL Server 2012 RTM, you will also see the following message if the DB recovery is slow due to large number of Virtual Log Files (VLF’s) present in the transaction log of the database.

Database MyDB1 has more than 1000 virtual log files which is excessive. Too many virtual log files can cause long startup and backup times. Consider shrinking the log and using a different growth increment to reduce the number of virtual log files.

When does a database go into a recovery phase?

  1. During SQL Server start/restart and when the database initializes
  2. When you restore a database from a backup.
  3. When you attach a database.
  4. When you have the AUTO_CLOSE property turned ON and the database was shutdown because no users were connected. The next time a connection to this database comes in it will enter the recovery phase (usually this should be very fast, since it would have been cleanly shutdown and recovery shouldn’t have any work to do).

There are many known issues with slow database recovery. If you are facing a slow database start-up or recovery issue, first check the following table to ensure your SQL Server build is equal to or greater than the builds mentioned below (depending on the version of your SQL Server instance).


Known Issues Section

SQL Server 2005

KB Article

Description

Fix Build

2455009

FIX: Slow performance when you recover a database if there are many VLFs inside the transaction log in SQL Server 2005, in SQL Server 2008 or in SQL Server 2008 R2

Cumulative update package 13 for SQL Server 2005 Service Pack 3 (9.00.4315)

Cumulative update package 1 for SQL Server 2005 Service Pack 4 (9.00.5254)

979042

FIX: The principal database is not recovered if the database has a large number of virtual log files in SQL Server 2005 or in SQL Server 2008

Cumulative update package 8 for SQL Server 2005 Service Pack 3 (9.00.4285)

974777

FIX: Database restore operation may fail during the recovery phase when the database uses query notification in SQL Server 2005 or in SQL Server 2008

Cumulative update package 6 for SQL Server 2005 Service Pack 3  (9.00.4266)

 

975089

FIX: The restore operation takes a long time when you restore a database that has query notification enabled in SQL Server 2005 or in SQL Server 2008

Cumulative update package 6 for SQL Server 2005 Service Pack 3  (9.00.4266)

 

SQL Server 2008

KB Article

Description

Fix Build

2455009

FIX: Slow performance when you recover a database if there are many VLFs inside the transaction log in SQL Server 2005, in SQL Server 2008 or in SQL Server 2008 R2

Cumulative update package 12 for SQL Server 2008 Service Pack 1    (10.00.2808)

Cumulative update package 2 for SQL Server 2008 Service Pack 2

             (10.00.4272)

2524743

FIX: Recovery takes longer than expected for a database in a SQL Server 2008 or in a SQL Server 2008 R2 environment

Cumulative update package 15 for SQL Server 2008 Service Pack 1    (10.00.2847)

Cumulative update package 5 for SQL Server 2008 Service Pack 2    (10.00.4316)

Cumulative update package 1 for SQL Server 2008 Service Pack 3    (10.00.5766)

2653893

FIX: It takes a long time to restore a database in SQL Server 2008 R2 or in SQL Server 2008

Cumulative update package 8 for SQL Server 2008 Service Pack 2    (10.00.4326)

Cumulative update package 3 for SQL Server 2008 Service Pack 3    (10.00.5770)

979042

FIX: The principal database is not recovered if the database has a large number of virtual log files in SQL Server 2005 or in SQL Server 2008

Cumulative update package 10 for SQL Server 2008  

             (10.00.1835)

Cumulative update package 7 for SQL Server 2008 Service Pack 1    (10.00.2766)

974777

FIX: Database restore operation may fail during the recovery phase when the database uses query notification in SQL Server 2005 or in SQL Server 2008

Cumulative update package 8 for SQL Server 2008

            (10.00.1823)

 

Cumulative update package 5 for SQL Server 2008 Service Pack 1   (10.00.2746)

975089

FIX: The restore operation takes a long time when you restore a database that has query notification enabled in SQL Server 2005 or in SQL Server 2008

Cumulative update package 8 for SQL Server 2008
            (10.00.1823)

 

Cumulative update package 5 for SQL Server 2008 Service Pack 1   (10.00.2746)

 

SQL Server 2008 R2

KB Article

Description

Fix Build

2455009

FIX: Slow performance when you recover a database if there are many VLFs inside the transaction log in SQL Server 2005, in SQL Server 2008 or in SQL Server 2008 R2

Cumulative Update package 6 for SQL Server 2008 R2
            (10.50.1765)

2524743

FIX: Recovery takes longer than expected for a database in a SQL Server 2008 or in a SQL Server 2008 R2 environment

Cumulative Update package 9 for SQL Server 2008 R2
            (10.50.1804)

Cumulative Update package 2 for SQL Server 2008 R2 Service Pack 1
           
(10.50.2772)

2653893

FIX: It takes a long time to restore a database in SQL Server 2008 R2 or in SQL Server 2008

Cumulative update package 11 for SQL Server 2008 R2
            (10.50.1809)

 

Cumulative update package 4 for SQL Server 2008 R2 SP1

           (10.50.2796)

979042

FIX: The principal database is not recovered if the database has a large number of virtual log files in SQL Server 2005 or in SQL Server 2008

Cumulative Update package 1 for SQL Server 2008 R2
           (10.50.1702)

 

Sample information in the error log about DB recovery

Pre-Recovery
The last message you will see for that database (while in recovery) in the log:
2012-06-26 10:29:20.48 spid58 Starting up database ‘MyDB1’.

Once the pre-recovery has completed, you will see the following message. In this example, it took almost 9 minutes before the following message appeared.
2012-06-26 10:38:23.25 spid58s Analysis of database ‘MyDB1’ (7) is 37% complete (approximately 0 seconds remain). This is an informational message only. No user action is required.


Phase 1: Analysis
2012-06-26 10:58:15.84 spid58s Analysis of database ‘MyDB1’ (7) is 0% complete (approximately 26933 seconds remain). This is an informational message only. No user action is required.
2010-06-26 17:58:10.70 spid58s Analysis of database ‘MyDB1’ (7) is 100% complete (approximately 0 seconds remain). This is an informational message only. No user action is required.

Phase 2: Redo
2012-06-26 17:59:40.16 spid58s Recovery of database ‘MyDB1’ (7) is 1% complete (approximately 1508718 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2012-06-26 18:00:40.17 spid58s Recovery of database ‘MyDB1’ (7) is 1% complete (approximately 1508698 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required. 

Phase 3: Undo
2012-06-26 21:33:41.12 spid58s Recovery of database ‘MyDB1’ (7) is 7% complete

Starting with SQL Server 2008, the error log also prints summary info about time spent in each phase of the recovery.
2012-06-27 12:21:48.29 spid7s Recovery completed for database MYDB1 (database ID 7) in 1 second(s) (analysis 460 ms, redo 0 ms, undo 591 ms.) This is an informational message only. No user action is required.

Troubleshooting Information

I’m not going to re-invent the wheel since this topic has been covered already by folks in the SQL community. You can refer to the following blog posts that explain how to troubleshoot  this issue using DMV’s as well as steps to reduce the VLF’s by shrinking the TLOG file.

  1. http://www.sqlskills.com/BLOGS/KIMBERLY/post/Transaction-Log-VLFs-too-many-or-too-few.aspx
  2. http://blogs.msdn.com/b/psssql/archive/2010/12/29/tracking-database-recovery-progress-using-information-from-dmv.aspx
  3. http://blogs.msdn.com/grahamk/archive/2008/05/09/1413-error-when-starting-database-mirroring-how-many-virtual-log-files-is-too-many.aspx
  4. http://blogs.msdn.com/psssql/archive/2009/05/21/how-a-log-file-structure-can-affect-database-recovery-time.aspx
  5. http://blogs.msdn.com/psssql/archive/2008/09/12/sql-server-2000-2005-2008-recovery-rollback-taking-longer-than-expected.aspx

Well, that’s all folks! Staying tuned as always for more SQL’lly stuff.

– TheSQLDude

Advertisement

Posted in Startup & Recovery, Storage Engine | Tagged: , , , , , , , , , | Leave a Comment »

Some known issues with SQL Server Management Studio (SSMS)

Posted by Sudarshan Narasimhan on September 13, 2012


We all use SSMS everyday and it is the main work tool for a DBA/Developer. This post contains info on known issues (bugs) with SQL Server Management Studio. Sharing this in case you are running into any of these known issues, the fix information is given below.


Issue 1

IntelliSense feature stops working in SQL Server 2008/R2 Management Studio

FIX: The IntelliSense feature in SSMS 2008 R2 may stop working after you install Visual Studio 2010 SP1
http://support.microsoft.com/kb/2531482

  • Install cumulative update 7 for SQL Server 2008 R2 to fix this issue.
  • Issue happens if you have installed Visual Studio 2010 SP1 alongside SSMS 2008 R2.

Issue 2

SSMS Hangs on a computer with more than 32 processors. When you click on Connect to Database Engine, the application just hangs and does not connect.

FIX: An application that is based on the .NET Framework 2.0 stops responding on a computer that has more than 32 processors, and the .NET Framework 2.0 Service Pack 2 or the .NET Framework 3.5 Service Pack 1 is installed
http://support.microsoft.com/kb/2276255

  • This is a known issue with .NET 2.0 or .NET 3.5 SP1
  • The above KB has fixes for Windows Vista/Server 2008 and above. If your computer is running Windows Server 2003, then you can apply the below hotfix to resolve this issue. This will patch your .NET 2.0 to a higher build than above KB and fix the issue.

KB –> http://support.microsoft.com/kb/2461007
Download –> http://hotfixv4.microsoft.com/.NET%20Framework%202.0%20-%20Windows%20%202000,%20Windows%202003,%20Windows%20XP%20(MSI)/sp2/DevDiv937463/50727.5066/free/425615_intl_x64_zip.exe

 

Issue 3

SQL Server Management Studio take a long time to open

When running "SQL Server Management Studio" the application may load slowly.
http://support.microsoft.com/kb/555686

  • This is usually observed if the server/computer is not directly connected to the internet or it could be due to 32-bit SSMS running on 64-bit OS (Refer http://support.microsoft.com/kb/906892 )
  • The reason for the slow start up of SSMS is during start up, the .NET Runtime tries to contact crl.microsoft.com to ensure that the certificate is valid.
  • If the machine isn’t connected to the internet, the connection will eventually time-out (hence the delay)
  • The workaround is to disable “Check publisher’s certificate revocation” under Internet Options.

There are a few other useful tips given in this blog to speed up SSMS –> http://eniackb.blogspot.com/2009/06/sql-server-management-studio-slow.html

If you know of any other issues, leave a comment behind with the info and I’ll add it to the blog post. Cheers!

Posted in Tools | Tagged: , , , , , , , , , | Leave a Comment »

IO Cost in an execution plan–What it actually means?

Posted by Sudarshan Narasimhan on May 11, 2012


I/O is probably one the slowest of the resources. The information below gives a quick meaning of what these operators mean when you look at them in an execution plan. This specific concentrates on the “Estimated I/O Cost” value in an execution plan. I’ll post some other day in detail on how to decipher a SQL Server query execution plan.

Thanks to Niraj Mehta for putting together this content.

Table Scan

  • The total number of data pages in the table

Clustered Index Scan

  • The number of levels in the index plus the number of data pages to scan (data pages = #rows / #rows per page)

Non-Clustered Index Seek on a Heap (Bookmark Lookup)

  • The number of levels in the index plus the number of leaf pages to read for qualifying rows plus the number of qualifying rows (1 I/O for each row on the heap)

Non-Clustered Index (I1) Seek on a Clustered index (I2) (Bookmark Lookup)

  • The number of levels in the I1 index plus the number of leaf pages to read for qualifying rows plus the number of qualifying rows times the cost of searching for a clustered index (I2) key

Covering Non-Clustered index

  • The number of levels in the index plus the number of leaf index pages to read for qualifying rows (#qualifying rows / # rows per leaf page).

Posted in Performance | Tagged: , , , , , , , , , , | Leave a Comment »

Shedding light on SQL Server Statistics and Auto Update Statistics

Posted by Sudarshan Narasimhan on May 4, 2012


SQL Server collects statistical information about indexes and column data in the database. These statistics play a very important role in coming up with an efficient query plan. Statistical information is used by the Query Optimizer when it decides to choose Seek vs. Scan or using Index A vs. Index B, etc. So it’s important as a DBA to pay careful attention to statistics and making sure they are updated and maintained well.

Just like Chandler Bing’s job is “Statistical analysis and data reconfiguration”, its a good DBA’s job to take care of maintaining the statistics Smile.


SQL Server 2005 statistics features allow you to:

  • Implicitly create and update statistics with the default sampling rate (in the SELECT, INSERT, DELETE, and UPDATE commands, use of a column in a query condition such as a WHERE or JOIN clause causes statistics to be created or updated if necessary when automatic statistics creation and update is enabled)
  • Manually create and update statistics with any desired sampling rate, and drop statistics (CREATE STATISTICS, UPDATE STATISTICS, DROP STATISTICS, CREATE INDEX, DROP INDEX)
  • Manually create statistics in bulk for all columns of all tables in a database (sp_createstats)
  • Manually update all existing statistics in the database (sp_updatestats)
  • List statistics objects that exist for a table or database (sp_helpstats, catalog views sys.stats, sys.stats_columns)
  • Display descriptive information about statisticsobjects (DBCC SHOW_STATISTICS)
  • Enable and disable automatic creation and update of statistics database-wide or for a specific table or statistics object (ALTER DATABASE options AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS, sp_autostats, and the NORECOMPUTE option on CREATE STATISTICS and UPDATE STATISTICS)
  • Enable and disable asynchronous automatic update ofstatistics (ALTER DATABASE option AUTO_UPDATE_STATISTICS_ASYNC

Behaviour change from SQL 2005 onwards

In SQL Server 2000, statistics update was determined by the number of row changes. Now, changes are tracked at the column level, and auto update of statistics can now be avoided on columns which have not changed enough to warrant statistics update.  This is a behaviour change between SQL 2000 and SQL 2005 onwards when it comes to auto update statistics. SQL Server 2005 and onwards determines whether to update statistics based on the colmodctrs (column modification counters) value.

A statistics object is considered out of date in the following cases:

1. The table size has gone from 0 to > 0 rows.
2. The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then.
3. The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered.
4. If the statistics object is defined on a temporary table, it is out of date as above, except that there is an additional threshold Recomputation after insertion of 6 rows.


Q: How is the COLMODCTR maintained by SQL Server?
Ans: The colmodctr values that SQL Server keeps track of are continually modified as the data in the table changes. Depending on the amount of INSERT/UPDATE/DELETE/TRUNCATE/BULK OPS that are occurring on the table, this counter value keeps changing. The below table, describes what the effect on colmodctr value are for each operation.

Cc293623.table_C05621961_2(en-us,TechNet.10).jpg
Reference: http://technet.microsoft.com/en-us/library/cc293623.aspx

 

Controlling STATISTICS Behaviour

You have options like AUTO_UPDATE_STATISTICS, AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS_ASYNC etc, which are all database level options. So you need to configure this per each database. You can find if your user databases have these options by using sp_helpdb and looking at the status column or by using a query like below.

select name as [DB_NAME], is_auto_create_stats_on, is_auto_update_stats_on, is_auto_update_stats_async_on 
from sys.databases

How to find out which indexes or statistics needs to be updates?

You can use the following query on any SQL 2005+ instance to find out the % of rows modified and based on this decide if any indexes need to be rebuilt or statistics on the indexes need to be updated.

select
schemas.name as table_schema,
tbls.name as Object_name,
i.id as Object_id,
i.name as index_name,
i.indid as index_id,
i.rowmodctr as modifiedRows,
(select max(rowcnt) from sysindexes i2 where i.id = i2.id and i2.indid < 2) as rowcnt,
convert(DECIMAL(18,8), convert(DECIMAL(18,8),i.rowmodctr) / convert(DECIMAL(18,8),(select max(rowcnt) from sysindexes i2 where i.id = i2.id and i2.indid < 2))) as ModifiedPercent,
stats_date( i.id, i.indid ) as lastStatsUpdateTime
from sysindexes i
inner join sysobjects tbls on i.id = tbls.id
inner join sysusers schemas on tbls.uid = schemas.uid
inner join information_schema.tables tl
on tbls.name = tl.table_name
and schemas.name = tl.table_schema
and tl.table_type='BASE TABLE'
where 0 < i.indid and i.indid < 255
and table_schema <> 'sys'
and i.rowmodctr <> 0
and i.status not in (8388704,8388672)
and (select max(rowcnt) from sysindexes i2 where i.id = i2.id and i2.indid < 2) > 0
order by modifiedRows desc

Updating Statistics on all the table in any given database

I often get this often as to how you update all the tables in a database. You can use a script like below to achieve this.

Disclaimer: Do not run this unless you know its implications on a production server. Statistics update on all the tables will use CPU resources and depending on the size of the table take its own time.

use <YOUR_DB_NAME>
go
select identity(int,1,1) as rownum,table_name into table_count from information_schema.tables where table_type='base table'
declare @count int,@stmt varchar (255),@maxcount int,@tblname varchar(50)
set @count=1
select @maxcount=count(*) from table_count
while @count < @maxcount+1
begin 
      select @tblname=table_name from table_count where rownum=@count
      set @stmt = 'UPDATE STATISTICS '+ '[' +@tblname+ ']' + ' WITH FULLSCAN'
      PRINT ('Updating statistics for table :'+@tblname)
      EXEC(@stmt)
      PRINT ('Finished Updating statistics for table :'+@tblname)
      print ''
      set @count=@count+1
      set @stmt=''
End
drop table table_count

Some tips on Statistics

1. Table variables do not have statistics at all.

Table variables are meant for operations on a small number of rows, a few thousand rows at max. This is a good scenario where you need to think about temporary tables (#tbl), because unlike table variables, temp tables can have indexes created on them, which means they can have statistics.

2. Multi-Statement Table Value Functions (TVF’s) also do not have any statistics

So if you have a complex query logic implemented in a function in SQL Server, think again! This function does not have any statistical information present, so the SQL optimizer must guess the size of the results returned. The reason for this is a multi-statement TVF returns you a TABLE as an output and table does not have any statistics on it.

3. You can find out from the Execution Plan aka SET STATISTICS PROFILE statement if any statistics would help a particular query

When you enable STATISTICS PROFILE ON and execute any query/batch it displays the execution plan. In this output look for the column called “Warnings”. During the course of compiling the plan, if the SQL Server optimizer felt that some statistics on column A would have helped the query, it displays this warning in the execution plan as “NO STATS”. If you see any such warning, consider creating some column statistics or indexes on the particular object in the row.

4. Avoid creating indexes on very frequently updated columns as the statistics also will have to keep up with the amount of data modifications.

5. Viewing Statistics

You can use DBCC SHOW_STATISICS (‘tablename’ , ‘index name’) to view the statistics on any given index’/column stats along with the histogram. The system DMV sys.stats stores information on each statistics available in a particular database.

Any statistics having the name prefixed as _WA_Sys_ is a auto-created statistics, which means SQL Server itself created them. User created statistics will have a given name or have the index name, e.g. PK_TBL1

Posted in Performance | Tagged: , , , , , , , , | Leave a Comment »

Update to the MDW (Data Collector) Post

Posted by Sudarshan Narasimhan on January 13, 2012


Back in August 2011, I had posted quite a lengthy blog post on Management Data Warehouse (Data Collector). I made an update to that post today to include a new issue and the fix for that issue. This has also been updated in the official KB article given below. For those of you out there using Data Collector to monitor your SQL Server’s, please read this to avoid any future issues.

FIX: Data Collector job takes a long time to clear data from a MDW database in SQL Server 2008 R2
http://support.microsoft.com/kb/2584903

To break it down (or) putting it simply, here is what you need to do as a DBA IF you are using MDW or Data Collector

1. For SQL Server 2008 R2, please apply Service Pack 1 immediately.
2. Follow KB 2584903 (or) my earlier post to modify the stored procedure sp_purge_orphaned_notable_query_text.
3. Run the purge procedure ad-hoc or let it run on schedule and this will actually do the clean-up and reduce the size of the MDW database.

The reason for step#2 even after applying SP1, is because a new bug in the T-SQL code was found. i.e. the delete statement incorrectly references the snapshots.notable_query_plan table after you apply Service Pack 1, and we need to change this to correctly reference the snapshots.notable_query_text table.

Have fun!

Posted in Data Collector | Tagged: , , , , , , , , | Leave a Comment »

Data Collector (Management Data Warehouse) – Issues, Tips and Solutions

Posted by Sudarshan Narasimhan on August 3, 2011


Somehow I ended up working on data collector issues quite a bit in the last 6 months or so. It’s certainly a useful feature but not an easy one to work with and definitely not an easy one to troubleshoot for you DBA’s. I’m listing down some issues that I faced and solutions for these issues. If you come up with or face any new issues in data collector, please drop a mail to sudarn.

1. Data Collector Upload Job Timeouts

The Data Collector jobs were getting timeout errors on for data upload job ONLY AT THE TIME WHEN PURGE JOB IS RUNNING. You notice that once the purge job completes, the data upload job also starts succeeding, but till that time it simply keeps failing. Here is what you would see in job history for these upload jobs.

02/08/2011 05:25:00,collection_set_3_upload,Error,0,SERVERXYZ\INSTANCEXYZ,collection_set_3_upload,(Job outcome),,The job failed. 
The Job was invoked by Schedule 2 (CollectorSchedule_Every_5min).  The last step to run was step 2 (collection_set_3_upload_upload).,01:00:01,0,0,,,,0

02/08/2011 05:25:00,collection_set_3_upload,Error,2,SERVERXYZ\INSTANCEXYZ,collection_set_3_upload,collection_set_3_upload_upload,,Executed as user: STARWARS\Yoda
The thread "ExecMasterPackage" has timed out 3600 seconds after being signaled to stop.  Process Exit Code 259.  The step failed.,01:00:01,0,0,,,,0

Since we know that the purge job running was the only time when these uploads were failing, we have a simple solution

Solution:
Schedule the purge and upload to run at different schedules. You can use the SSMS UI to define a new schedule for the collection set, just make sure it doesn’t fall under the schedule of the purge job.

 

2. Data Collector Upload Job Deadlocks intermittently

The collection set upload job is running into deadlocks now and then (aka intermittent). This is again related to the purge jobs. Why?

There have been multiple reports of this issue on Connect & MSDN Forums and I’ve had the “pleasure” of talking to customers about this issue. Here are some,

Deadlock in MDW Upload Purge Logs Job Step
http://connect.microsoft.com/SQLServer/feedback/details/539547/deadlock-in-mdw-upload-purge-logs-job-step

Management Data Warehouse Data Collector upload job deadlocks
http://connect.microsoft.com/SQLServer/feedback/details/504557/management-data-warehouse-data-collector-upload-job-deadlocks

Here is a sample output of a failed Upload job that reported the deadlock.

Log Job History (collection_set_3_upload)
Step ID 1
Job Name collection_set_3_upload
Step Name collection_set_3_upload_purge_logs

Message
Executed as user: STARWARS\Yoda. Transaction (Process ID 457) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction. [SQLSTATE 40001] (Error 1205). The step failed.

Solution:

Stagger the two collector jobs. E.g. collection_set_2_upload Server Activity and collection_set_3_upload Query Statistics to run a couple of minutes apart. Now, here is a catch! Don’t change the schedule for the SQL Server Agent job, you need to change the schedule using the collector “pick schedule” option in SSMS.

To do this, right click on the collector agent (not the SQL Agent job) select properties and then uploads in the left hand pane. Click on "New" to create a new schedule. If you do it this way you can create two new schedules that are independent of each other.

3. Unable to change/define Schedules for MDW Collection Sets

I did run into another quirky issue when attempting to define a new schedule for the collection sets. i.e I was not able to define a new schedule for this and kept getting this error.

clip_image002[4]

These are the collection sets that are present by default,

  1. Disk Usage
  2. Server Activity
  3. Query Statistics
  4. Utility Information

Of these, the Utility Information is disabled and does not have a schedule defined by default. The reason for the above error was the Collection set “Query Statistics” mentioned above, did not have a valid schedule UID stored. You can confirm this by doing these queries.

1. In the context of MSDB database, run the following query and note down the schedule_uid value for Query Statistics collection set

select * from dbo.syscollector_collection_sets where collection_set_id=3

e.g output. A575FFD0-98A0-4D0E-B43C-B63482FB5B00

2. Again in the MSDB context, run the following,

SELECT schedule_id from sysschedules_localserver_view WHERE ‘XYZ’ = schedule_uid

                — XYZ is the value obtained from step#1. You will see that there is NO value returned for step#2

 

3. As I mentioned, Utility Info has no schedule by default, so we need to use this to get out of this situation. So I used the Utility Information collection set and defined a new schedule for it for every 10 minutes.

4. Next, I queried the schedule_id for this in dbo.syscollector_collection_sets and used this schedule_id to map to the collection set that was failing.

declare @schedule_uid uniqueidentifier

select @schedule_uid = schedule_uid from dbo.syscollector_collection_sets where collection_set_id = 5 – whichever is the Utility Info collection set ID

exec dbo.sp_syscollector_update_collection_set @collection_set_id = 3, @schedule_uid = @schedule_uid — whichever is the Query Statistics collection set ID (change according to the one failing at your end)

 

5. This fixed the issue with Query Stats and I was able to change/define a schedule for that. But, since we created a schedule for Utility Info and you don’t want that to run, I tried to disable it by setting it to “On Demand”. But this failed. Oops!

6. So I enabled the Utility Info collection set and only then did it create a valid job_id for it, but I got another error when trying to remove the schedule.

7. I stopped the collection and then deleted the job manually. To get things back to old state, I updated the metadata using like this,

UPDATE [msdb].[dbo].[syscollector_collection_sets_internal]

SET [collection_job_id] = NULL, [upload_job_id] = NULL

WHERE collection_set_id = 7 – whichever is the Utility Info collection set ID

8. Now, you can stop the Utility Info collection set and also get the other collection sets schedule changed to fix issue #2 mentioned aboveSmile.

 

4. Data Collector Purge Job (Clean-up job) takes a long time to complete

This is actually the root cause of issue #1 and #2 listed above. The purge procedure is complicated and is responsible for cleaning up the metadata tables of old entries. This work is done by the core.sp_purge_data stored procedure. As a troubleshooting step, I captured the execution plan of the procedure and noticed a missing index recommendation in the XML Showplan.

<MissingIndexes>
  <MissingIndexGroup Impact="92.2913">
    <MissingIndex Database="[MDW]" Schema="[snapshots]" Table="[query_stats]">
      <ColumnGroup Usage="EQUALITY">
        <Column Name="[sql_handle]" ColumnId="1" />
      </ColumnGroup>
    </MissingIndex>

If you were to translate this into a CREATE INDEX statement this is how it would look,

CREATE NONCLUSTERED INDEX [Ix_query_stats_sql_handle]

ON [snapshots].[query_stats] ([sql_handle] ASC)

WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,

DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)

ON [PRIMARY]

 

Now, creating this requires modifying the MDW system table and of course this isn’t supported! This same thing is mentioned in this blog as well and is this blog post by the SQL Server Development Team. Don’t do it!  Patience, I will explain.

 

Here are some facts:-

1. Purge job can get slow on large MDW databases (40+ GB).

2. The DELETE TOP statement on snapshots.notable_query_plan is the one where most execution time is spent.

 

 

Solution:

Like I mentioned earlier, don’t modify system Stored Procedure code unless guided by Microsoft Support. Luckily, for the slow purge procedure a fix has been released in SQL Server 2008 R2 Service Pack 1, which can be downloaded here.

 

This fix updates the Purge procedure TSQL code and the purge has been broken down and re-written in an optimized way. The runtime will come down drastically once you update to SP1. The new procedures doing the purge is called “[core].[sp_purge_orphaned_notable_query_plan]” and “[core].[sp_purge_orphaned_notable_query_text]”

 

Hang on, it’s not over yet!

 

 

AFTER you apply SP1, you will need to modify the Stored Procedure sp_purge_orphaned_notable_query_text as shown below. These changes are required to correct the text of the sp_purge_orphaned_notable_query_text stored procedure because the delete statement incorrectly references the snapshots.notable_query_plan table after you apply Service Pack 1.

 

 

SP1 Code

        — Deleting TOP N orphaned rows in query plan table by joining info from temp table variable

        — This is done to speed up delete query.

        DELETE TOP (@delete_batch_size) snapshots.notable_query_plan

        FROM snapshots.notable_query_plan AS qp , #tmp_notable_query_plan AS tmp

        WHERE tmp.[sql_handle] = qp.[sql_handle]

 

Change this to following once you apply SP1

— Deleting TOP N orphaned rows in query text table by joining info from temp table
       
— This is done to speed up delete query.

       
DELETE TOP (@delete_batch_size) snapshots.notable_query_text 
        
FROM snapshots.notable_query_text AS qt, #tmp_notable_query_text AS
tmp
       
WHERE tmp.[sql_handle] = qt.[sql_handle]

 

Hopefully, this code change will be included in a future cumulative update post-SP1, so that you don’t have to manually change the code. The same applies for SQL Server 2008 as well, where I am hopeful these changes will be included in a future Service Pack. This fix mentioned above is at present once valid for SQL Server 2008 R2 (as of Aug 2, 2011 when I wrote this). With these, the slow purge issues should be put to bed, once and for all!


UPDATE (August 3rd, 2011)

After working with our KB team, we have published an official KB article that talks about this issue. For all those running into slow purge issues, please follow the resolution given in this KB article,

FIX: Data Collector job takes a long time to clear data from a MDW database in SQL Server 2008 R2
http://support.microsoft.com/kb/2584903

 

Other Useful Links

FIX: The Management Data Warehouse database grows very large after you enable the Data Collector feature in SQL Server 2008
http://support.microsoft.com/kb/970014

Data Collector’s sp_purge_data may run for hours if there are many orphaned plans to purge
http://blogs.msdn.com/b/sqlagent/archive/2011/04/12/data-collector-s-sp-purge-data-may-run-for-hours-if-there-are-many-orphaned-plans-to-purge.aspx

Posted in Data Collector | Tagged: , , , , , , | 8 Comments »