The SQL Dude!

On-Disk & In-Memory Pages that teach you the SQL way of life!

Archive for August, 2011

Trust No One and always ask “WHY”!

Posted by Sudarshan Narasimhan on August 13, 2011


Another good post by Paul Randal. It’s worth a read and I agree with his thoughts on Forums filled with “so-called” experts. The WWW can be a dangerous place for those looking for quick solutions. Don’t be misled into implementing something you will regret later or end up being someone else’s worst nightmare. Always ask yourself “WHY” instead of just focusing on the “WHAT TO DO” when looking for suggestions/solutions. This is a crucial part of logical troubleshooting which most people ignore. So do yourself a favour and ask “WHY” when someone asks you do to make this/that change. If they can’t explain “why”, well rest assured it is not 100% guaranteed to solve your issue. Even if it did, you still don’t know “why” it did.

Like the warrior in 300 says “Remember us, Remember WHY we died


Do yourself a favor… Trust No One
http://www.sqlskills.com/BLOGS/PAUL/post/Do-yourself-a-favor-Trust-No-One.aspx

Posted in General Thoughts | Tagged: , , , , | Leave a Comment »

Demystifying SQL Server Servicing (Patching, Security Updates and Service Packs)

Posted by Sudarshan Narasimhan on August 11, 2011


Today I am going to talk about the different patching options that are available with SQL Server and how they work. Often people get confused and assume that there is only one single sequence of updates for SQL Server, starting from a lower build number to higher build. This is not true, because there are different branches for patching and different types of patches released. Let me try and clarify a few of those common assumptions/misconceptions today.

SQL Server is a large product (well, it’s a Database server after all) and so requires a different patching mechanism to other products like Windows.


Types of SQL Server Patches (Product Updates)

1. Service Packs (SP):

Consider this as a consolidated list of different updates bundled into one large package. Typically, there are 2-3 Service Packs for the lifetime of a product. A typical product is in mainstream support phase for roughly about 5-6 years after which it goes into extended support phase. So if a service pack is released once every 2 years, that makes it 2-3 SP’s for the lifetime of a product.

There are exceptions to this. E.g. SQL Server 2005. This has 4 Service Packs released and probably because it was so popular that people were comfortable using it even in extended support phase instead of upgrading to the next version. Hence, this required SP4 to be released to cater to the community base that was still using SQL 2005. Makes business sense, doesn’t it? 🙂

I don’t want to get into the details of the different support phases. But you read the simplified explanation of this in Bob Ward’s blog post here.

2. Cumulative Updates (CU):

To put it simply, Cumulative updates are released once every 2 months for each branch of the product that is in mainstream support [I’ll explain about “branches” shortly]. The SQL Server release team shifted to this “Incremental Servicing Model (ISM)” some years back to make it easier for administrators to predict and plan for updates. This fits in nicely with change control and downtime planning, to apply these cumulative updates.

Branching

This is where most people get confused. I said earlier that CU’s are released, but I wasn’t totally honest. Cumulative Update packages are release for a Service Pack. Still don’t see the big picture? Let me explain. After each Service Pack, a CU is released every 2 months and this CU is only applicable for that particular service pack branch. E.g., CU2 for SP1 != CU2 for SP2. You may ask, why is this stuff so complicated?

My answer: Think Linear, cause Time flows in only 1 direction. Smile

Let me explain this with a practical example. After Service pack 1 is released, lets says 10 cumulative updates were released which makes it roughly 2×10=20 months. Now, its time for a new Service Pack. Once Service pack 2 comes out, we have SQL Servers that are in RTM build, SP1 build and others somewhere between SP1 CU1 to SP1 CU10. Can all of these be upgraded to SP2 immediately? Obviously not! People do take time to test in QA/Dev/Test etc.. before a service pack is applied (all of us go through this as part of our jobs, don’t we). Now as time goes on and 2 more months are past, the next CU for SP2 comes out. And this is exactly what Branching is all about. We now have SQL Servers that are on the base RTM, some on SP1 branch and some on SP2 branch.

You might have noticed that once a service pack is released, cumulative updates continue to be released for older service packs. Why? Think back to the mainstream vs. extended support phase. Till a service pack goes into extended support phase, Microsoft has a legal obligation to continue to service that branch of the product. This tends to be the confusion for most people, but if you understood Branching, then you will understand why CU’s are released even when SP’s are also being released.

Want me to complicate things further? The strong minded may read on …

Quirky Scenario:

You have a SQL Server instance on SP1 CU5 build. You hit a known issue (Bug) which you find out is fixed in CU6. Great! You test CU6 on your test system and the fix does solve your issue, so you implement CU6 on your production server. Life is good!

A month later, a new service pack (SP2) is released. Your boss wants you to immediately apply this SP2 on Production (due to company policy, compliance blah blah blah…). The dutiful employee that you are on, you successfully apply SP2 on Production and find out that the old issue is back again. WTF!

Think about this carefully and you will see the answer.

Service Pack 2 released just a month after CU6 was released. I mentioned that a CU is released every 2 months, which means your CU6 must have been developed/tested 2 months + 30 days = 3 months back by Microsoft. A Service Pack is considered a major upgrade, so obviously the development/testing time would be more for a SP vs. a CU. Let me point you to a piece of documentation most of you would have missed.

image(Sample taken from KB 2285068)

So, the fix for your issue which was part of SP1 CU6 is not part of SP2 because of the testing time required for a service pack. What happens in such a scenario is, the next cumulative update for SP2 (which is CU1) will contain the fixes there were present in CU6. This is called a post service pack roll-up fix. As a DBA, you need to keep in mind that if you are following the regular CU schedule of patching your SQL Servers every 2 months and then applying a Service Pack, you must immediately apply CU1 for the newly released Service Pack. CU1 for Service Packs are released pretty soon after the SP is released (usually in a week’s time). Even this minute detail is documented.

image(Sample taken from KB 2289254)

 

3. General Distribution Release (GDR):

These are special patches that are released mainly for security issues or issues having a broad customer impact. I simply like to call them “security fixes”. These are not a scheduled fix and are released by Microsoft as appropriate. The important thing about these updates are that it impact every SQL Server and is applicable for most SQL Server versions. That is why, whenever a GDR is released it is available for public download in download centre, available through Windows Updates as a High/Critical fix and in all likeliness you IT team will push them to your machines for automatic update Smile.

The other thing to keep in mind with GDR releases it that even if your product is in extended support phase, you will still get these critical security fixes. For each GDR update, a security bulletin is posted which lists the severity rating, the affected software as well as the affected versions of the software.

If you have ever read the KB article for any SQL Server security bulletin, you may have noticed two different download options.

a. GDR Software Updates
b. QFE Software Updates

You need to choose the correct version based on the build number of your SQL Server instance. GDR updates are typically if you are on the base service pack build or earlier CU’s, whereas QFE updates are for those who have service pack + some cumulative update applied. In the latter, case the same GDR security fix is available for you with the existing CU fixes. In the former, you have the service pack + only the security fix applied.

True or False?

1. You need to apply each CU in sequence to be able to apply the latest CU: False

As the name says these are cumulative fixes, meaning the fixes from the previous CU’s are carried over to the next CU. You only need to apply the latest CU to have the fixes from the previous updates.


2. I have to apply each Service Pack to be able to apply the latest Service Pack: False

Same logic as CU. Each Service Pack is also cumulative, so only need the latest Service Pack to have all the fixes (The only exception is the Quirky Situation mentioned above).

3. I need a downtime to apply a CU or Service Pack: True

As the setup process goes on, it will automatically stop your SQL Server instance to replace the binaries (exe/DLL/other files), to apply upgrade scripts and start it back again. So you certainly need a business downtime pre-planned before running a SQL Server update. (This does not apply if you only patching the Tools or Setup Support Files).

4. I heard that SQL Express Edition comes with the Service Pack bundled?: True

SQL Express is unique in a way that for each Service Pack release, the SQL Express media itself can be used for a new install. This is called “Slipstreaming”. Slipstreaming is integrating a SQL Server update and the original installation media so that the original media and the update are installed at the same time. Starting with SQL Server 2008 this can be done manually even for other editions, but SQL Express has been this way for a long time and MS does this for you and puts the slipstreamed media in the download site. So if you install SQL Express SP4, you get a SQL Server instance that is at SP4 build right from the word Go.

5. It is not possible to uninstall a Service Pack!: False

Starting with SQL Server 2008, it is possible to uninstall Service Packs and cumulative updates. You can see each SQL update present in Add/Remove programs (or “View Installed Updates” in Vista+) and these can be uninstalled. Refer screenshot below.

image


For more information on this topic, please read the following MSDN article.

Overview of SQL Server Servicing Installation
http://msdn.microsoft.com/en-us/library/dd638062(v=sql.100).aspx



Folks, I know this was a long post, but this was something that I wanted to talk about for a long time. If you have any questions leave a comment behind or drop me an email and I will definitely try and get back to you. Cheers!

.

Posted in Setup | Tagged: , , , , , , , , , , | 2 Comments »

Next version of SQL Server Code-Named “Denali” CTP3: Product Guide available for download

Posted by Sudarshan Narasimhan on August 11, 2011


Good news for all SQL enthusiasts today! The next version of SQL Server [codenamed “Denali”] reached the next milestone, i.e. CTP3 (Community Technology Preview 3) about a month back, but today the Product Team has released a complete product guide (with samples & videos) for public use. You can find the official confirmation from the SQL Server Product Team here,

http://blogs.technet.com/b/dataplatforminsider/archive/2011/08/10/sql-server-code-name-denali-ctp3-product-guide-is-here.aspx

If you haven’t downloaded the installation media for SQL Server Code Name "Denali" CTP3, you can find the link below,

https://www.microsoft.com/betaexperience/pd/SQLDCTP3CTA/enus/

I highly recommend you to download the comprehensive product guide which contains whitepapers, presentation, demos and sample videos to help you learn the new release. This will help you get started on the next version and stay one step ahead of the rest of the SQL community. The Product Guide is a ~450MB download.

http://www.microsoft.com/download/en/details.aspx?id=27069

Have fun playing with the new version. Remember, Information is Wealth Open-mouthed smile.

Posted in SQL News | Tagged: , , , , , , , , , | Leave a Comment »

PRB: Unable to remove secondary filegroup due to system service broker queues

Posted by Sudarshan Narasimhan on August 10, 2011


Delving a little into SQL Storage engine and Service Broker today. I had faced this problem sometime back and it presented me with an opportunity to see how the SSB queues/tables are represented internally by SQL Server.


Scenario
My customer wanted to remove the secondary File Group “FG2_fg" from a database. But was unable to remove the filegroup since these service broker tables were present in filegroup FG2_fg.

queue_messages_223180947
queue_messages_225181061
queue_messages_327181175

All of the above objects are internal system tables which are placeholders for the default queues that are present for every database starting with SQL Server 2005.

QueryNotificationErrorsQueue
EventNotificationErrorsQueue
ServiceBrokerQueue

Whenever we tried to remove the file/filegroup we got this error,

Alter database test3 remove file fil2

Msg 5031, Level 16, State 1, Line 1

Cannot remove the file ‘fil2’ because it is the only file in the DEFAULT filegroup.

We used the following queries to identify the objects present in the secondary filegroup:-

1) Find out the objects present in the secondary filegroup

select name, type, type_desc, is_ms_shipped from sys.objects where object_id in (select object_id from sys.indexes where data_space_id in (2))

— Here data_spaceid=2 represents the filegroup ID 2

2)

SELECT o.[name], o.[type], i.[name], i.[index_id], f.[name]

FROM sys.indexes i

INNER JOIN sys.filegroups f

ON i.data_space_id = f.data_space_id

INNER JOIN sys.all_objects o

ON i.[object_id] = o.[object_id]

WHERE i.data_space_id = 2 — Filegroup ID

Background Info
1. SQL Server uses a proportional fill strategy as it writes to each file within the filegroup and NOT across filegroups. Read more about this in here.
2. The only exception to this could have been IF someone had changed the default filgroup to the secondary filegroup. If so, then any new objects created afterwards would have gone to the secondary instead of the primary.

Repro on SQL 2005
As any good engineer does, so did I – TESTING. To test my little theory I did the following repro on SQL 2005 :-

1) Created a new database with secondary filegroup and 1 file in that.
2) Made secondary FG as default FG for database "test".

*Note: This needs to be done as part of create database and not added later on.

3) What I found was that you DON’T NEED service broker to be enabled for the queue_messages tables to get created.
4) I found that on my test database I had 3 Internal queue_message tables which where for the following parent object

QueryNotificationErrorsQueue
EventNotificationErrorsQueue
ServiceBrokerQueue

4) All of the 3 were were of type SERVICE_QUEUE. You can find this out from sys.objects view
5) Even a new database has the same object numbers and looking at my customer’s object numbers, they are higher, which means they must have had Service Broker implemented at some point.

queue_messages_254180947
queue_messages_286181061
queue_messages_318181175

 

6) I tested this by creating a service queue when my secondary filegroup was the default filegroup.

use test

GO

CREATE QUEUE TestQueue

WITH

STATUS = OFF,

RETENTION = ON

 

7) This created the table queue_messages_2105058535 on FG 2 of type "QUEUE_MESSAGES" indicating its a SSB queue.

 

8)  To remove the internal table I did a DROP QUEUE.

drop queue TestQueue

 

9) This removed the associated internal table and the indexes. You can use the following query to identify internal tables which are for service broker queues.

select a.name as [ChildName], a.object_id as [ChildObjectID], a.type_desc as [ChildType], a.internal_type_desc as [ChildTypeDesc],

a.parent_id as [Parent_ObjectID], b.name as [Parent Name], b.type_desc as [Parent_Type]

from sys.internal_tables a

inner join sys.objects b

on a.parent_id = b.object_id

 

So the issue still does NOT reproduce on SQL 2005 (same for SQL 2008). I then did the following repro on a SQL Server 2000 instance.

Repro on SQL 2000
1. Created a database on SQL 2000
2. Added File-group (FG2) and made it as default.
3. Took a backup of the database in SQL 2000.
4. Restored this to SQL 2005/2008.
5. System Services/Queues got created on FG2.
*Note: We cannot modify them because they are system objects.

6) The remove File command gives us this error,

Alter database test remove file fil2

Msg 5031, Level 16, State 1, Line 1

Cannot remove the file ‘fil2’ because it is the only file in the DEFAULT filegroup.

 

7)  Next up, I ran these commands.

 

dbcc shrinkfile(3, EMPTYFILE)

go

alter database test remove file fil2

 

This worked so far and in sys.database_files I see the status of file2 as OFFLINE. But unfortunately I still cannot remove the filegroup. It still says that it is not empty. Even though the emptyfile commands worked, the objects (SSB queues) still exist on FG2.

I followed same steps for a database created on SQL 2008 and took a backup and restored it again on SQL 2008 and issue did not reproduce. i.e. objects were created on Primary and none of them went to the Secondary FG. I did the same test on SQL 2005 Database backed up and restored to SQL 2008. This also did NOT reproduce the issue.

So, what did I learn out of all my testing (and precious time). Read on…

My Theory
This database had to have been restored from SQL 2000 to a higher version or been upgraded in-place from SQL 2000 to higher version? Tracking down the source of the original database would help us figure this out.

If yes, we now know the answer and that this behaviour is ByDesign and understood, i.e. how the system tables went to the secondary filegroup.  This is because the non-primary filegroup FG2_fg was set as DEFAULT prior to backup from SQL 2000 or in-place upgrade. Once the upgrade was done, since SSB was newly introduced starting with SQL 2005, during the upgrade the upgrade scripts created these objects on the secondary filegroup since it was set as the default filegroup.


Verifying My Theory

To confirm if an in-place upgrade was done, we can look in sys.database_files DMV and look at the column file_GUID. The file_guid column will be NULL if the database was upgraded from an earlier version of Microsoft SQL Server. (Upgraded only, not for a Restore).

I have to thank the SQL Storage Engine folks for thinking about this kind of a scenario and capturing such detailed info in the catalog views. Thanks Folks!

I went back to my customer’s database and looked at the sys.databases output and I noticed this,

file_id

file_guid

type

type_desc

data_space_id

name

physical_name

1

08C43589-1462-4492-8778-D4BCA128ED66

0

ROWS

1

test_Data

C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\test_Data.mdf

2

A7ADDAF4-0425-4BB8-988C-FE260A41331C

1

LOG

0

test_Log

C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\test_Log.ldf

5

3542AAD9-4AC4-499A-90B8-D342D0CBFFE6

0

ROWS

1

test_Data2

C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\test_Data2.ndf

6

NULL

0

ROWS

2

fil2

C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\fil2.ndf

7

12D108E4-DEEE-4302-A61A-AD6FE21B5EF3

0

ROWS

1

test_Data3

C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\test_Data3.ndf

 

Notice that for data_space_id=2 which is for the secondary filegroup FG2_fg, the file_guid column has a value of NULL !!!
EUREKA!!!

This confirms that this instance/database was indeed upgraded from an earlier version of SQL 2000/7.0 to SQL (2005/2008). This behaviour is by design and we now understand how the system SSB objects were created on secondary filegroup.


Conclusion

Q: Why cannot we remove the file and filegroup?
Ans: Because of the above mentioned queue_messages objects which are present in the secondary filegroup. If there were user-created service broker queues/services then they can be Dropped or Altered to move them to the primary filegroup. Since these came as part of the database during DB Creation, they cannot be modified/dropped/moved. Hence, we cannot remove the filegroup.

Q: How did these "system" objects get to the secondary file group?
Ans: These system objects became part of the database starting with SQL Server 2005 onwards. So, If you had a database in SQL 2000 and you upgraded that to SQL 2005/2008 these system objects will get created. But, in SQL 2000 if the secondary file-group FG2_fg was set as the DEFAULT file-group, then any new objects created without an explicit MOVE TO will go the default filegroup. So when you upgrade the database to SQL 2008 these automatically got created on the secondary file-group.

Q: What data do I have to prove that the theory above is true in your case?
Ans: We store the information about each database file in sys.database_files DMV. There is a column called file_guid which when NULL indicates that the database/file was upgraded from an earlier version on SQL Server. So when I looked at sys.database_files in the database test, I saw the value for the secondary file as NULL.

Reference – http://technet.microsoft.com/en-us/library/ms174397(SQL.100).aspx

 

I spent some serious time to arrive at this conclusion and I am sharing this with the SQL community, so that it saves you some time. As always, stay tuned to theSQLDude for more…

Posted in Storage Engine | Tagged: , , , , , , | Leave a Comment »

Guide to Troubleshooting SQL Server Service startup issues

Posted by Sudarshan Narasimhan on August 9, 2011


I did post on this same topic on MSDN SQLServerFAQ and I am re-posting this here for easy reference. This is a handy guide for every SQL DBA, as they would have come across this issue at one point or another in their DBA life. Read on…


This is one of those challenging situations when customers call us at times. I can understand the gravity of the situation – your boss is behind your back, your business is down, your application is down, your website cannot accept orders or your phone is ringing off the hook with calls from the helpdesk. Don’t worry; I will guide you through some basic steps & actions that you can take to make some sense amidst all this madness.

clip_image002[4] Warning:

This blog is only a basic guide and by no means covers all possible troubleshooting techniques and meant only as a starting point.

 

Nothing in life is to be feared. It is only to be understood. – Marie Curie

clip_image004[4]

Startup Options
Have you ever called SQL CSS for a startup issue and seen the engineer use some cool trace flags or parameters to get SQL started. Well, it’s all documented (well, most of it that’s worth knowing)

Using the SQL Server Service Startup Options
http://msdn.microsoft.com/en-us/library/ms190737.aspx

The most common ones by far that I use on SQL startup issues are:-

S.no

Parameter

Description

1.

-c

Start as a console application, not as a service.

2.

-m

Tries to start the SQL service in single user mode, i.e. only a single user can connect. This single user connection can be either  a sysadmin or a regular user connection

3.

-f

Tries to start the SQL service in Minimal configuration mode. This implicitly puts SQL Server in single-user mode and this also means only the system databases master, model, tempdb & mssqlsystemresource are recovered and started.

4.

-T XXXX

Tries to start the SQL Server should be started with the specified trace flag which follows after –T. Again this is case sensitive.

5.

-g

Specifies the number of megabytes (MB) of memory that SQL Server leaves available for memory allocations within the SQL Server process, but outside the SQL Server buffer pool. The default value for this is 256MB.

6.

-m”ClientApp Name”

You can limit the connections to the specified client application. For example, -m"SQLCMD" limits connections to a single connection and that connection must identify itself as the SQLCMD client program. You can use this option when you are starting SQL Server in single-user mode and an unknown client application is taking the only available connection. Cool option J

7.

-k  123

Limits the number of checkpoint I/O requests per second to the value specified e.g. 123 MB/sec. Refer http://support.microsoft.com/kb/929240 for more info.

 

Trace Flags
For more information some common trace flags see here
http://msdn.microsoft.com/en-us/library/ms188396.aspx. Few of the common ones that come handy during startup issues are:

1.       3608 Starts SQL and recovers only the master database.

2.       4010 Allows only shared memory connections to the SQL Server. Meaning, you will only be able to connect from the server machine itself. Client connections over TCP/IP or named pipes will not happen.

3.       4606 Disables password policy check during server startup.

4.       4022 Skips launching automatic stored procedures when SQL Server starts. This is equivalent to setting "scan for startup procs" configuration option to 0.

 

Here are the steps to start SQL Server from command prompt:-

1.      Right-click on the SQL Server service name in services.msc

 

clip_image006[4]

2.  Copy the path that you see here.

3.  Open a Command Prompt window and navigate to the path from step2.

E:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn>sqlservr.exe -c -m -sSQL2005

 

 

clip_image008[8] Note:

If you are using Windows Server 2008 or above, you need to open an elevated command prompt by launching cmd.exe with “Run as Administrator” option.

A common mistake that people sometime make is to try and start SQL using sqlservr.exe and you get this error,

clip_image010[4]

Text: “Your SQL Server installation is either corrupt or has been tampered with. Please uninstall then re-run setup to correct this problem”

Okay, you see this message and you are thinking oh no, there goes my dinner plans!
Timeout!
The above message can be misleading at times. You will get the above error if you try to start SQL server from command prompt for a named instance and do not specify an instance name. Without the -s switch the default instance will try to start, but using the sqlservr.exe file from another instance binn folder. You will also get this error if you specify an invalid parameter to sqlservr.exe. So, the installation is not corrupt after all (Dinner plans back on?)

Here’s how to start a named instance,
sqlservr.exe -c -s INSTNAME

Here’s how to start a default instance,
sqlservr.exe -c

Here’s how to start a named instance recovering only master database
sqlservr.exe -c -sMATRIX –T3608

clip_image008[9] Note:

Only specify the instance name and not the servername\InstanceName. E.g. If your machine name is YODA and your instance is called MATRIX, you only specify MATRIX. Also the “-s” is case sensitive. It does not matter if there is/isn’t a blank space after the “-s” parameter.


Some Common Scenarios for Startup Failures

Scenario 1

Windows could not start the SQL Server (MSSQLSERVER) on Local Computer. For more information, review the System Event Log.

OR

“The service failed to respond in a timely fashion”.

OR

“The service failed to start”.

1.      Locate the SQL Errorlog as that is the 1st place to look if a SQL service is not starting.

2.   This can be done by looking up the -e startup parameter value for the SQL Server instance in question.


clip_image012[4]

 

3.  Verify that the files pointed to by -d (master data file location),  -e (Errorlog location) and  -l (master log file location) actually exist and have the correct file paths and that the SQL service startup account has proper permissions. Alternately you can also get this from the registry by looking at the values SQLArg0, SQLArg1 and SQLArg2 in the following locations.

SQL 2000
Default: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSQLServer\MSSQLServer\Parameters
Named: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\INSTANCENAME\MSSQLServer\Parameters

SQL 2005

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer\Parameters

SQL 2008/R2
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQLServer\Parameters

 

4.  Once you have located the errorlog open it in notepad and look for any errors. An example would be like this,

2010-11-20 07:50:58.250 Server Error: 17113, Severity: 16, State: 1.
2010-11-20 07:50:58.250 Server Error 2(The system cannot find the file specified.) occurred while opening file ‘C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\master.mdf’ to obtain configuration information at startup. An invalid startup option might have caused the error. Verify your startup options, and correct or remove them if necessary.

The above message clearly indicates that the master.mdf was not found in the location it was expecting it to be found.

 

Scenario 2

You can also look into the Windows Application log for any events related to SQL Server.

initerrlog: Could not open error log file ‘C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG1\ERRORLOG’. Operating system error = 3(error not found).

The above error indicates that SQL could not find the errorlog file in the above location. What does this mean? This means that since SQL could not find the errorlog it has no place to log messages, hence it did not start.

When you are faced with this error, then if you look into the errorlog location, you will notice log files whose date/time stamp is not current. Do not make the mistake of opening these files and troubleshooting based on messages that are not for the current startup failure. This is where looking into the application log gives you the current entries.

I hope cleared up any confusion you might have had regarding this rule, the warning status and the simple logic behind the rule. As always stay tuned for more SQL tips…

Scenario 3

2011-04-16 07:52:49.32 spid5s      Error: 17204, Severity: 16, State: 1.
2011-04-16 07:52:49.32 spid5s      FCB::Open failed: Could not open file C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\mastlog.ldf for file number 2.  OS error: 32(error not found).

OS Error 32 translates to “The process cannot access the file because it is being used by another process.” How did I know this? Simple. You can find the text for any windows error by running this from a command prompt à net helpmsg XXX, where XXX is the error number

In the above scenario, some other process/application has a lock on master.mdf and this isn’t allowing SQL to take access, hence the service fails to start. To find out which process it is you can use either Process Explorer or Handle.exe from sysinternals. The usual culprits are Anti-Virus or Anti-Malware services. Once you identify the process, after stopping that, attempt to start SQL again. If it fails, verify from the errorlog that the error code is still 32 and has not changed. Most people miss this trick and assume the error is the same after making some changes. Don’t take it for granted, since the error might have changed but the service still doesn’t start. This indicates that the changes done have been successful.

 

Some Common Causes for SQL Service startup failures

1.                   Access denied on the folders for the SQL Server Service Account, specifically on the DATA folder which contains the SQL Server system databases.

2.                   Due to some other processes like anti-virus holding a lock on the SQL database files.

3.                   Insufficient permissions granted to the SQL Server Service account.

clip_image008[10] Important:

Always make SQL service account changes via the SQL Configuration manager since only this tool sets the proper permissions or ACL’s to the new service account. Services.msc does not set ACL’s.

 

4.                   Server side protocols for SQL not set correctly or disabled.

5.                   Critical system database files not found due to either accidental deletion of files or disk failures.

6.                   System database files having inconsistencies preventing SQL Server startup.

7.                   Password of the service account was changed but not updated on the server that has the SQL instance installed.

8.                   Startup parameters have incorrect file path locations.

9.                   The SQL Server service is disabled in Service Control Manager.



clip_image014[4] Critical:

If you are not a SQL expert and have followed the steps above but still stuck, my personal advice would be to call SQL Support rather than doing something you will regret later. Better safe, than sorry, eh?

Additional Resources

Error Messages of SQL Server 2005 Start Up Failure
http://blogs.msdn.com/b/sql_protocols/archive/2006/04/28/585835.aspx

An error occurred during encryption
http://blogs.msdn.com/b/blakhani/archive/2009/11/24/sql-server-2005-express-setup-failure-with-error-an-error-occurred-during-encryption.aspx

 

Posted in Startup | Tagged: , , , , , , | Leave a Comment »

How to find the Installation Date for an Evaluation Edition of SQL Server

Posted by Sudarshan Narasimhan on August 9, 2011


Well, this topic seems pretty simple, but many people forget that we can have the SQL Engine, Tools and other components like Reporting Services, Analysis Services in different versions or installed on different dates, so the expiration date may change accordingly. For e.g., you may have installed the Engine first and then decided to install Management Tools later on.

If you are using an Evaluation edition of SQL Server 2005/2008 then you must already know that Evaluation Edition works for 180 days from the date of installation. So, it’s very important for you to know:-

1. When was your SQL Server Instance Installed
2. When it is going to stop working. Once the 180 days are up, the service will not start.

Read on if you don’t want to see this message once the trial period is up Smile.

Expired

So I thought I’d post the various different way a DBA can go about determining the install date for SQL Server Evaluation Edition, thus calculating when the 180 day trial period expires. Here are some ways to go about finding this information:-

1. Run the following query on the evaluation instance (assuming the 180 days are still NOT over and SQL service is still running),

SELECT create_date as ‘SQL Server Install Date’,

DATEADD(dd,180,create_date) as ‘Instance will Stop Working ON’

FROM sys.server_principals WHERE name=‘NT AUTHORITY\SYSTEM’

 

-> ‘NT AUTHORITY\SYSTEM’ is SQL Server principal which gets created during the installation. So we are just checking the creation date of that security principal.

 

2. Look at the setup logs located in "%ProgramFiles%\Microsoft SQL Server\100\Setup Bootstrap\Log\". Sort by Date Modified ascending and you should get the install date. This method doesn’t work if the log directory was cleaned up or moved or deleted.

 

3. Open the Management Studio on the SQL Server, click on the “Help” of Menu Bar and then on “About”. In the new window you will see some thing like this:
Microsoft SQL Server Management Studio (expires in 104 days)
Note: This is assuming that you installed SSMS along with Engine as part of the same original install at the same time.

image

4. You can check the windows installer for the SQL Product code to pull out the install date. In registry look under,
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\UserData\S-1-5-18\Products\<GUID>\InstallProperties

The "InstallDate" key will have the date stored in YYYYMMDD format.

For SQL 2008 you can search the registry for "Microsoft SQL Server 2008 Database Engine Services" to get the correct GUID.
For SQL 2005 you can search the registry for "Microsoft SQL Server 2005" only to get the correct GUID.

Note: Keep in mind the instance name if you have more than 1 instance installed on the same machine

 

While on this topic, keep in mind there is an issue when you perform an in-place edition upgrade of an evaluation edition to a licensed edition. You will still get the error for SSMS or any tools. This is a known issue. Please follow the following KB to fix this issue.

‘Microsoft SQL Server management studio’ and ‘SQL Server Profiler’ report ‘Evaluation period has expired’ message even after upgrading to a licensed edition

http://support.microsoft.com/kb/971268

Posted in Setup | Tagged: , , , , , | 2 Comments »

T-SQL Script for SQL Server Statistics Last Updated Time & Percent Rows modified

Posted by Sudarshan Narasimhan on August 4, 2011


I wrote a post on the same topic @ Amit’s Blog. I made some changes to the script to make it easier for DBA’s to read the output. I am re-posting this here for your reference. This can be quite handy if you are managing multiple SQL instances and you have lot of tables or have large databases where you only want to update statistics on some tables based on row modifications (data changes e.g. OLTP). You can always use DBCC SHOW_STATISTICS or SP_AUTOSTATS to find out the same information for a specific table, index or column stats.

select

schemas.name as table_schema,

tbls.name as Object_name,

i.id as Object_id,

i.name as index_name,

i.indid as index_id,

i.rowmodctr as modifiedRows,

(select max(rowcnt) from sysindexes i2 where i.id = i2.id and i2.indid < 2) as rowcnt,

convert(DECIMAL(18,8), convert(DECIMAL(18,8),i.rowmodctr) / convert(DECIMAL(18,8),(select max(rowcnt) from sysindexes i2 where i.id = i2.id and i2.indid < 2))) as ModifiedPercent,

stats_date( i.id, i.indid ) as lastStatsUpdateTime

from sysindexes i

inner join sysobjects tbls on i.id = tbls.id

inner join sysusers schemas on tbls.uid = schemas.uid

inner join information_schema.tables tl

on tbls.name = tl.table_name

and schemas.name = tl.table_schema

and tl.table_type=‘BASE TABLE’

where 0 < i.indid and i.indid < 255

and table_schema <> ‘sys’

and i.rowmodctr <> 0

and i.status not in (8388704,8388672)

and (select max(rowcnt) from sysindexes i2 where i.id = i2.id and i2.indid < 2) > 0

order by modifiedRows desc


Keep in mind that Rowmodctr values displayed in above output are not entirely reliable starting with SQL 2005 onwards. But, you can rely on last updated time to make a decision if the statistics need to be updated. Read here for more info on this.

 

 

You can use this script to generate the UPDATE STATISTICS command for each table in your table. I am using FULL SCAN by default, which you can of course modify to use specific SAMPLE. You have to un-comment the comments lines, to actually perform the update.

 

select identity(int,1,1) as rownum,table_name into table_count from information_schema.tables where table_type=‘base table’

declare @count int,@stmt varchar (255),@maxcount int,@tblname varchar(50)

set @count=1

select @maxcount=count(*) from table_count

while @count < @maxcount+1

begin

      select @tblname=table_name from table_count where rownum=@count

      set @stmt = ‘UPDATE STATISTICS ‘+ ‘[‘ +@tblname+ ‘]’ + ‘ WITH FULLSCAN’

      print @stmt

      –PRINT (‘Updating statistics for table :’+@tblname)

      –EXEC(@stmt)

      –PRINT (‘Finished Updating statistics for table :’+@tblname)

      print

      set @count=@count+1

      set @stmt=

End

drop table table_count


Have fun with your database maintenanceSmile. As always stay tuned for more cool SQL stuff.

Posted in DB Maintenance | Tagged: , , , , , | 4 Comments »

SQL Server service does not start after enabling SSL encryption

Posted by Sudarshan Narasimhan on August 3, 2011


I recently had a customer who came up to me with a SQL Service start-up issue. This issue started after they had enabled SSL encryption using a personal certificate. Now, before I begin let me confirm a couple of things :-

  1. The correct SSL certificate has been chosen in SQL Configuration Manager.
  2. The ForceEncryption flag is set to Yes.

There are some forum posts on this issue and I don’t entirely agree with the solutions proposed there, because they did not actually solve the real problem but worked around it without understanding how/why the problem occurred, and how to get a complete solution.

http://social.msdn.microsoft.com/Forums/en-US/sqlexpress/thread/b308d9c8-6aba-4b3b-9b32-f919816d1be2/

http://social.msdn.microsoft.com/Forums/en-US/sqldatabaseengine/thread/63c15cb5-0d12-4ea8-bf84-e3ea8a42a866/

I am going to be bold and say this; please do not follow the above posts since they are workarounds. Most people in enterprise environments cannot run SQL Service under the Local System account, which is what the above posts are asking you do to, yikes!!!

As always, the first place to check when a SQL Service does not start is the SQL ERRORLOG (yes, ERRORLOG and NOT the EVENTLOGS – I said 1st place).

For a complete step-by-step guide to approach SQL Service start-up issues, you can read the blog post I wrote on MSDN SQL Server FAQ –>
HERE

Here is what I found in the ERRORLOG,

2011-06-08 12:35:30.56 Server Unable to load user-specified certificate [Cert Hash(sha1) “XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX“]. The server will not accept a connection. You should verify that the certificate is correctly installed. See “Configuring Certificate for Use by SSL” in Books Online.

2011-06-08 12:35:30.59 Server Error: 17182, Severity: 16, State: 1.
2010-06-08 12:35:30.59 Server TDSSNIClient initialization failed with error 0x80092004, status code 0x80. Reason: Unable to initialize SSL support. Cannot find object or property.

2011-06-08 12:35:30.62 Server Error: 17182, Severity: 16, State: 1.
2011-06-08 12:35:30.62 Server TDSSNIClient initialization failed with error 0x80092004, status code 0x1. Reason: Initialization failed with an infrastructure error. Check for previous errors. Cannot find object or property.

So, it appears the certificate chosen for SSL encryption has some issues. Here are the things I checked next:-
1. The certificate itself is valid and has not expired (using certmgr.msc).
2. The certificate hash matched that of the certificate associated with the instance but it wasn’t loading.

The certificate appears to be fine. Well, what next?

The way SQL Server loads the certificate happens through the credentials of the SQL Server service account defined for this instance. So, I started to check what the SQLSvc account was and whether it had permissions to access the certificate. At this point I cam across this forum post for a similar issue

http://social.msdn.microsoft.com/Forums/en/sqlexpress/thread/db68b3bf-33a8-42d6-8dac-f7229a2f9cde

The solution listed there was to ensure that the SQL service account had a valid user profile on the server machine. This profile can sometimes be missing if the service account was changed to a new account, which doesn’t yet have a user profile created. You can verify the user profile on any machine by going to
My Computer –> Properties –> Advanced –> Settings (under User Profiles section).

image

Solution #1 (as applicable)

Identify the SQL service account and log on to the server machine that is running SQL Server, with the service account. During logon, Windows will create a new user profile for the service account.

Unfortunately, this wasn’t the scenario in my case. Another thing I have to caution people is regarding service account change. When changing the SQL Svc account ALWAYS ALWAYS use the SQL Server Configuration Manager, and NEVER do it through services.msc. Here are some reasons why:-

1. Permissions & ACL’s of SQL Server registry keys are done only by SSCM
2. SQL Server User Groups are updated with the new service account only by SSCM.
3. WMI namespace permissions are only set through SSCM.

So, if you changed the service account through services.msc, it is possible that the new service account does not have permissions to access the certificate. The certificate hash is stored in the following registry key. If the new service account cannot access it, then you “can” get the error above, since SQL cannot load the certificate due to insufficient privileges.

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.SQL2008\MSSQLServer\SuperSocketNetLib\Certificate

I started to check if this was happening in my case and found that the SQL Service account was not even present in the local group SQLServerMSSQLUser$[Computer_Name]$[Instance_Name]. This confirms that the service account change wasn’t done through SQL Configuration Manager.

This means the SQL Service account does not have permissions to open the private key for the certificate. i.e. SQL Server failed to load this specific certificate due to insufficient permissions. Once we give the service account permission to read the private key, the SQL Server should start up without a problem.

Here is what I did to resolve it,

Solution #2

1. Identify the SQL service account (could be a domain account or any of the built-in accounts like LocalSystem/LocalService/NetworkService).

2. If the service account was set properly using SQL Configuration Manager, then it should have been added to the SQL Security Groups. Look for this on the local machine groups. e.g. SQLServerMSSQLUser$[Computer_Name]$[Instance_Name]

3. Open the Microsoft Management Console (MMC) by click Start -> Run, entering mmc and pressing Enter.

4. Add the Certificates snap-in by clicking File -> Add/Remove Snap-in… and double clicking the Certificates item.

Note:
Select computer account and Local computer in the two pages on the wizard that appears.
Click OK.

5. Expand Certificates (Local Computer) -> Personal -> Certificates and find the SSL certificate you imported or created.

6. Right click on the imported certificate (the one you selected in the SQL Server Configuration Manager) and click All Tasks -> Manage Private Keys.

Note:
You will only see this option for SSL certificates imported from a certifying authority. E.g. you will not see this for EFS, RDP certificates

7. Click the Add button under the Group or user names list box.

8. Enter the SQL service security group that you identified in step #2 and click OK.

You can also directly add the service account itself here, but for any future changes you need to repeat these steps to add that individual account. Instead its better to add the security group, since the service account will always be present here if account change was done using SSCM.

Now, when I started the SQL service you can double-check that the certificate is being loaded and used by looking for this entry in the errorlog. In my case, it started fine and SSL encryption was working.

Open-mouthed smile

2011-06-08 12:48:47.88 Server The certificate [Cert Hash(sha1) “XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX“] was successfully loaded for encryption.


Problem Solved!

Now, to check if the connection is actually encrypted you can use Netmon or Wireshark to check the network packet. Refer to the screenshot posted in this blog on how to do that. An easier way for DBA’s to verify this is to use the DMV sys.dm_exec_connections, look at the encrypt_option column. This DMV also tells you if the connection is using Kerberos/NTLM/SQL Authentication -> auth_scheme column has this value.

Hope this saves you some time.

Posted in Startup | Tagged: , , , , , , , | 19 Comments »

Data Collector (Management Data Warehouse) – Issues, Tips and Solutions

Posted by Sudarshan Narasimhan on August 3, 2011


Somehow I ended up working on data collector issues quite a bit in the last 6 months or so. It’s certainly a useful feature but not an easy one to work with and definitely not an easy one to troubleshoot for you DBA’s. I’m listing down some issues that I faced and solutions for these issues. If you come up with or face any new issues in data collector, please drop a mail to sudarn.

1. Data Collector Upload Job Timeouts

The Data Collector jobs were getting timeout errors on for data upload job ONLY AT THE TIME WHEN PURGE JOB IS RUNNING. You notice that once the purge job completes, the data upload job also starts succeeding, but till that time it simply keeps failing. Here is what you would see in job history for these upload jobs.

02/08/2011 05:25:00,collection_set_3_upload,Error,0,SERVERXYZ\INSTANCEXYZ,collection_set_3_upload,(Job outcome),,The job failed. 
The Job was invoked by Schedule 2 (CollectorSchedule_Every_5min).  The last step to run was step 2 (collection_set_3_upload_upload).,01:00:01,0,0,,,,0

02/08/2011 05:25:00,collection_set_3_upload,Error,2,SERVERXYZ\INSTANCEXYZ,collection_set_3_upload,collection_set_3_upload_upload,,Executed as user: STARWARS\Yoda
The thread "ExecMasterPackage" has timed out 3600 seconds after being signaled to stop.  Process Exit Code 259.  The step failed.,01:00:01,0,0,,,,0

Since we know that the purge job running was the only time when these uploads were failing, we have a simple solution

Solution:
Schedule the purge and upload to run at different schedules. You can use the SSMS UI to define a new schedule for the collection set, just make sure it doesn’t fall under the schedule of the purge job.

 

2. Data Collector Upload Job Deadlocks intermittently

The collection set upload job is running into deadlocks now and then (aka intermittent). This is again related to the purge jobs. Why?

There have been multiple reports of this issue on Connect & MSDN Forums and I’ve had the “pleasure” of talking to customers about this issue. Here are some,

Deadlock in MDW Upload Purge Logs Job Step
http://connect.microsoft.com/SQLServer/feedback/details/539547/deadlock-in-mdw-upload-purge-logs-job-step

Management Data Warehouse Data Collector upload job deadlocks
http://connect.microsoft.com/SQLServer/feedback/details/504557/management-data-warehouse-data-collector-upload-job-deadlocks

Here is a sample output of a failed Upload job that reported the deadlock.

Log Job History (collection_set_3_upload)
Step ID 1
Job Name collection_set_3_upload
Step Name collection_set_3_upload_purge_logs

Message
Executed as user: STARWARS\Yoda. Transaction (Process ID 457) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction. [SQLSTATE 40001] (Error 1205). The step failed.

Solution:

Stagger the two collector jobs. E.g. collection_set_2_upload Server Activity and collection_set_3_upload Query Statistics to run a couple of minutes apart. Now, here is a catch! Don’t change the schedule for the SQL Server Agent job, you need to change the schedule using the collector “pick schedule” option in SSMS.

To do this, right click on the collector agent (not the SQL Agent job) select properties and then uploads in the left hand pane. Click on "New" to create a new schedule. If you do it this way you can create two new schedules that are independent of each other.

3. Unable to change/define Schedules for MDW Collection Sets

I did run into another quirky issue when attempting to define a new schedule for the collection sets. i.e I was not able to define a new schedule for this and kept getting this error.

clip_image002[4]

These are the collection sets that are present by default,

  1. Disk Usage
  2. Server Activity
  3. Query Statistics
  4. Utility Information

Of these, the Utility Information is disabled and does not have a schedule defined by default. The reason for the above error was the Collection set “Query Statistics” mentioned above, did not have a valid schedule UID stored. You can confirm this by doing these queries.

1. In the context of MSDB database, run the following query and note down the schedule_uid value for Query Statistics collection set

select * from dbo.syscollector_collection_sets where collection_set_id=3

e.g output. A575FFD0-98A0-4D0E-B43C-B63482FB5B00

2. Again in the MSDB context, run the following,

SELECT schedule_id from sysschedules_localserver_view WHERE ‘XYZ’ = schedule_uid

                — XYZ is the value obtained from step#1. You will see that there is NO value returned for step#2

 

3. As I mentioned, Utility Info has no schedule by default, so we need to use this to get out of this situation. So I used the Utility Information collection set and defined a new schedule for it for every 10 minutes.

4. Next, I queried the schedule_id for this in dbo.syscollector_collection_sets and used this schedule_id to map to the collection set that was failing.

declare @schedule_uid uniqueidentifier

select @schedule_uid = schedule_uid from dbo.syscollector_collection_sets where collection_set_id = 5 – whichever is the Utility Info collection set ID

exec dbo.sp_syscollector_update_collection_set @collection_set_id = 3, @schedule_uid = @schedule_uid — whichever is the Query Statistics collection set ID (change according to the one failing at your end)

 

5. This fixed the issue with Query Stats and I was able to change/define a schedule for that. But, since we created a schedule for Utility Info and you don’t want that to run, I tried to disable it by setting it to “On Demand”. But this failed. Oops!

6. So I enabled the Utility Info collection set and only then did it create a valid job_id for it, but I got another error when trying to remove the schedule.

7. I stopped the collection and then deleted the job manually. To get things back to old state, I updated the metadata using like this,

UPDATE [msdb].[dbo].[syscollector_collection_sets_internal]

SET [collection_job_id] = NULL, [upload_job_id] = NULL

WHERE collection_set_id = 7 – whichever is the Utility Info collection set ID

8. Now, you can stop the Utility Info collection set and also get the other collection sets schedule changed to fix issue #2 mentioned aboveSmile.

 

4. Data Collector Purge Job (Clean-up job) takes a long time to complete

This is actually the root cause of issue #1 and #2 listed above. The purge procedure is complicated and is responsible for cleaning up the metadata tables of old entries. This work is done by the core.sp_purge_data stored procedure. As a troubleshooting step, I captured the execution plan of the procedure and noticed a missing index recommendation in the XML Showplan.

<MissingIndexes>
  <MissingIndexGroup Impact="92.2913">
    <MissingIndex Database="[MDW]" Schema="[snapshots]" Table="[query_stats]">
      <ColumnGroup Usage="EQUALITY">
        <Column Name="[sql_handle]" ColumnId="1" />
      </ColumnGroup>
    </MissingIndex>

If you were to translate this into a CREATE INDEX statement this is how it would look,

CREATE NONCLUSTERED INDEX [Ix_query_stats_sql_handle]

ON [snapshots].[query_stats] ([sql_handle] ASC)

WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,

DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)

ON [PRIMARY]

 

Now, creating this requires modifying the MDW system table and of course this isn’t supported! This same thing is mentioned in this blog as well and is this blog post by the SQL Server Development Team. Don’t do it!  Patience, I will explain.

 

Here are some facts:-

1. Purge job can get slow on large MDW databases (40+ GB).

2. The DELETE TOP statement on snapshots.notable_query_plan is the one where most execution time is spent.

 

 

Solution:

Like I mentioned earlier, don’t modify system Stored Procedure code unless guided by Microsoft Support. Luckily, for the slow purge procedure a fix has been released in SQL Server 2008 R2 Service Pack 1, which can be downloaded here.

 

This fix updates the Purge procedure TSQL code and the purge has been broken down and re-written in an optimized way. The runtime will come down drastically once you update to SP1. The new procedures doing the purge is called “[core].[sp_purge_orphaned_notable_query_plan]” and “[core].[sp_purge_orphaned_notable_query_text]”

 

Hang on, it’s not over yet!

 

 

AFTER you apply SP1, you will need to modify the Stored Procedure sp_purge_orphaned_notable_query_text as shown below. These changes are required to correct the text of the sp_purge_orphaned_notable_query_text stored procedure because the delete statement incorrectly references the snapshots.notable_query_plan table after you apply Service Pack 1.

 

 

SP1 Code

        — Deleting TOP N orphaned rows in query plan table by joining info from temp table variable

        — This is done to speed up delete query.

        DELETE TOP (@delete_batch_size) snapshots.notable_query_plan

        FROM snapshots.notable_query_plan AS qp , #tmp_notable_query_plan AS tmp

        WHERE tmp.[sql_handle] = qp.[sql_handle]

 

Change this to following once you apply SP1

— Deleting TOP N orphaned rows in query text table by joining info from temp table
       
— This is done to speed up delete query.

       
DELETE TOP (@delete_batch_size) snapshots.notable_query_text 
        
FROM snapshots.notable_query_text AS qt, #tmp_notable_query_text AS
tmp
       
WHERE tmp.[sql_handle] = qt.[sql_handle]

 

Hopefully, this code change will be included in a future cumulative update post-SP1, so that you don’t have to manually change the code. The same applies for SQL Server 2008 as well, where I am hopeful these changes will be included in a future Service Pack. This fix mentioned above is at present once valid for SQL Server 2008 R2 (as of Aug 2, 2011 when I wrote this). With these, the slow purge issues should be put to bed, once and for all!


UPDATE (August 3rd, 2011)

After working with our KB team, we have published an official KB article that talks about this issue. For all those running into slow purge issues, please follow the resolution given in this KB article,

FIX: Data Collector job takes a long time to clear data from a MDW database in SQL Server 2008 R2
http://support.microsoft.com/kb/2584903

 

Other Useful Links

FIX: The Management Data Warehouse database grows very large after you enable the Data Collector feature in SQL Server 2008
http://support.microsoft.com/kb/970014

Data Collector’s sp_purge_data may run for hours if there are many orphaned plans to purge
http://blogs.msdn.com/b/sqlagent/archive/2011/04/12/data-collector-s-sp-purge-data-may-run-for-hours-if-there-are-many-orphaned-plans-to-purge.aspx

Posted in Data Collector | Tagged: , , , , , , | 8 Comments »