The SQL Dude!

On-Disk & In-Memory Pages that teach you the SQL way of life!

Archive for the ‘Storage Engine’ Category

Database takes long time to Recover during startup/Restore – List of Known issues & Fixes

Posted by Sudarshan Narasimhan on October 25, 2012


Have you ever wondered why a database takes a long time to recover during start-up or when you performed a restore operation? Recovery is a regular part of Database start-up as SQL server has to go through various phases like Analysis, Redo and Undo to bring the database online in a consistent state. But, sometimes you might notice that recovery is taking an awfully long time, which is preventing the DB from coming online and is inaccessible to your users. From SSMS you will notice the DB is showing up as “In Recovery” or “Recovering”. If anyone tries to use the database, you will get this error.

Msg 922, Level 14, State 1, Line 1
Database ‘MyDB1’ is being recovered. Waiting until recovery is finished.

Once a DB is in recovery, there is pretty much nothing you can do to make it go faster. Read the troubleshooting section below if your database is already in the recovery phase.

The SQL Errorlogs will tell you the following information:-

  1. The current phase of the DB Recovery process.
  2. The % completed in the current phase.
  3. Approximate time remaining before it completes this phase.
  4. Starting with SQL Server 2008 R2 SP2 and SQL Server 2012 RTM, you will also see the following message if the DB recovery is slow due to large number of Virtual Log Files (VLF’s) present in the transaction log of the database.

Database MyDB1 has more than 1000 virtual log files which is excessive. Too many virtual log files can cause long startup and backup times. Consider shrinking the log and using a different growth increment to reduce the number of virtual log files.

When does a database go into a recovery phase?

  1. During SQL Server start/restart and when the database initializes
  2. When you restore a database from a backup.
  3. When you attach a database.
  4. When you have the AUTO_CLOSE property turned ON and the database was shutdown because no users were connected. The next time a connection to this database comes in it will enter the recovery phase (usually this should be very fast, since it would have been cleanly shutdown and recovery shouldn’t have any work to do).

There are many known issues with slow database recovery. If you are facing a slow database start-up or recovery issue, first check the following table to ensure your SQL Server build is equal to or greater than the builds mentioned below (depending on the version of your SQL Server instance).


Known Issues Section

SQL Server 2005

KB Article

Description

Fix Build

2455009

FIX: Slow performance when you recover a database if there are many VLFs inside the transaction log in SQL Server 2005, in SQL Server 2008 or in SQL Server 2008 R2

Cumulative update package 13 for SQL Server 2005 Service Pack 3 (9.00.4315)

Cumulative update package 1 for SQL Server 2005 Service Pack 4 (9.00.5254)

979042

FIX: The principal database is not recovered if the database has a large number of virtual log files in SQL Server 2005 or in SQL Server 2008

Cumulative update package 8 for SQL Server 2005 Service Pack 3 (9.00.4285)

974777

FIX: Database restore operation may fail during the recovery phase when the database uses query notification in SQL Server 2005 or in SQL Server 2008

Cumulative update package 6 for SQL Server 2005 Service Pack 3  (9.00.4266)

 

975089

FIX: The restore operation takes a long time when you restore a database that has query notification enabled in SQL Server 2005 or in SQL Server 2008

Cumulative update package 6 for SQL Server 2005 Service Pack 3  (9.00.4266)

 

SQL Server 2008

KB Article

Description

Fix Build

2455009

FIX: Slow performance when you recover a database if there are many VLFs inside the transaction log in SQL Server 2005, in SQL Server 2008 or in SQL Server 2008 R2

Cumulative update package 12 for SQL Server 2008 Service Pack 1    (10.00.2808)

Cumulative update package 2 for SQL Server 2008 Service Pack 2

             (10.00.4272)

2524743

FIX: Recovery takes longer than expected for a database in a SQL Server 2008 or in a SQL Server 2008 R2 environment

Cumulative update package 15 for SQL Server 2008 Service Pack 1    (10.00.2847)

Cumulative update package 5 for SQL Server 2008 Service Pack 2    (10.00.4316)

Cumulative update package 1 for SQL Server 2008 Service Pack 3    (10.00.5766)

2653893

FIX: It takes a long time to restore a database in SQL Server 2008 R2 or in SQL Server 2008

Cumulative update package 8 for SQL Server 2008 Service Pack 2    (10.00.4326)

Cumulative update package 3 for SQL Server 2008 Service Pack 3    (10.00.5770)

979042

FIX: The principal database is not recovered if the database has a large number of virtual log files in SQL Server 2005 or in SQL Server 2008

Cumulative update package 10 for SQL Server 2008  

             (10.00.1835)

Cumulative update package 7 for SQL Server 2008 Service Pack 1    (10.00.2766)

974777

FIX: Database restore operation may fail during the recovery phase when the database uses query notification in SQL Server 2005 or in SQL Server 2008

Cumulative update package 8 for SQL Server 2008

            (10.00.1823)

 

Cumulative update package 5 for SQL Server 2008 Service Pack 1   (10.00.2746)

975089

FIX: The restore operation takes a long time when you restore a database that has query notification enabled in SQL Server 2005 or in SQL Server 2008

Cumulative update package 8 for SQL Server 2008
            (10.00.1823)

 

Cumulative update package 5 for SQL Server 2008 Service Pack 1   (10.00.2746)

 

SQL Server 2008 R2

KB Article

Description

Fix Build

2455009

FIX: Slow performance when you recover a database if there are many VLFs inside the transaction log in SQL Server 2005, in SQL Server 2008 or in SQL Server 2008 R2

Cumulative Update package 6 for SQL Server 2008 R2
            (10.50.1765)

2524743

FIX: Recovery takes longer than expected for a database in a SQL Server 2008 or in a SQL Server 2008 R2 environment

Cumulative Update package 9 for SQL Server 2008 R2
            (10.50.1804)

Cumulative Update package 2 for SQL Server 2008 R2 Service Pack 1
           
(10.50.2772)

2653893

FIX: It takes a long time to restore a database in SQL Server 2008 R2 or in SQL Server 2008

Cumulative update package 11 for SQL Server 2008 R2
            (10.50.1809)

 

Cumulative update package 4 for SQL Server 2008 R2 SP1

           (10.50.2796)

979042

FIX: The principal database is not recovered if the database has a large number of virtual log files in SQL Server 2005 or in SQL Server 2008

Cumulative Update package 1 for SQL Server 2008 R2
           (10.50.1702)

 

Sample information in the error log about DB recovery

Pre-Recovery
The last message you will see for that database (while in recovery) in the log:
2012-06-26 10:29:20.48 spid58 Starting up database ‘MyDB1’.

Once the pre-recovery has completed, you will see the following message. In this example, it took almost 9 minutes before the following message appeared.
2012-06-26 10:38:23.25 spid58s Analysis of database ‘MyDB1’ (7) is 37% complete (approximately 0 seconds remain). This is an informational message only. No user action is required.


Phase 1: Analysis
2012-06-26 10:58:15.84 spid58s Analysis of database ‘MyDB1’ (7) is 0% complete (approximately 26933 seconds remain). This is an informational message only. No user action is required.
2010-06-26 17:58:10.70 spid58s Analysis of database ‘MyDB1’ (7) is 100% complete (approximately 0 seconds remain). This is an informational message only. No user action is required.

Phase 2: Redo
2012-06-26 17:59:40.16 spid58s Recovery of database ‘MyDB1’ (7) is 1% complete (approximately 1508718 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2012-06-26 18:00:40.17 spid58s Recovery of database ‘MyDB1’ (7) is 1% complete (approximately 1508698 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required. 

Phase 3: Undo
2012-06-26 21:33:41.12 spid58s Recovery of database ‘MyDB1’ (7) is 7% complete

Starting with SQL Server 2008, the error log also prints summary info about time spent in each phase of the recovery.
2012-06-27 12:21:48.29 spid7s Recovery completed for database MYDB1 (database ID 7) in 1 second(s) (analysis 460 ms, redo 0 ms, undo 591 ms.) This is an informational message only. No user action is required.

Troubleshooting Information

I’m not going to re-invent the wheel since this topic has been covered already by folks in the SQL community. You can refer to the following blog posts that explain how to troubleshoot  this issue using DMV’s as well as steps to reduce the VLF’s by shrinking the TLOG file.

  1. http://www.sqlskills.com/BLOGS/KIMBERLY/post/Transaction-Log-VLFs-too-many-or-too-few.aspx
  2. http://blogs.msdn.com/b/psssql/archive/2010/12/29/tracking-database-recovery-progress-using-information-from-dmv.aspx
  3. http://blogs.msdn.com/grahamk/archive/2008/05/09/1413-error-when-starting-database-mirroring-how-many-virtual-log-files-is-too-many.aspx
  4. http://blogs.msdn.com/psssql/archive/2009/05/21/how-a-log-file-structure-can-affect-database-recovery-time.aspx
  5. http://blogs.msdn.com/psssql/archive/2008/09/12/sql-server-2000-2005-2008-recovery-rollback-taking-longer-than-expected.aspx

Well, that’s all folks! Staying tuned as always for more SQL’lly stuff.

– TheSQLDude

Advertisements

Posted in Startup & Recovery, Storage Engine | Tagged: , , , , , , , , , | Leave a Comment »

PRB: Unable to remove secondary filegroup due to system service broker queues

Posted by Sudarshan Narasimhan on August 10, 2011


Delving a little into SQL Storage engine and Service Broker today. I had faced this problem sometime back and it presented me with an opportunity to see how the SSB queues/tables are represented internally by SQL Server.


Scenario
My customer wanted to remove the secondary File Group “FG2_fg" from a database. But was unable to remove the filegroup since these service broker tables were present in filegroup FG2_fg.

queue_messages_223180947
queue_messages_225181061
queue_messages_327181175

All of the above objects are internal system tables which are placeholders for the default queues that are present for every database starting with SQL Server 2005.

QueryNotificationErrorsQueue
EventNotificationErrorsQueue
ServiceBrokerQueue

Whenever we tried to remove the file/filegroup we got this error,

Alter database test3 remove file fil2

Msg 5031, Level 16, State 1, Line 1

Cannot remove the file ‘fil2’ because it is the only file in the DEFAULT filegroup.

We used the following queries to identify the objects present in the secondary filegroup:-

1) Find out the objects present in the secondary filegroup

select name, type, type_desc, is_ms_shipped from sys.objects where object_id in (select object_id from sys.indexes where data_space_id in (2))

— Here data_spaceid=2 represents the filegroup ID 2

2)

SELECT o.[name], o.[type], i.[name], i.[index_id], f.[name]

FROM sys.indexes i

INNER JOIN sys.filegroups f

ON i.data_space_id = f.data_space_id

INNER JOIN sys.all_objects o

ON i.[object_id] = o.[object_id]

WHERE i.data_space_id = 2 — Filegroup ID

Background Info
1. SQL Server uses a proportional fill strategy as it writes to each file within the filegroup and NOT across filegroups. Read more about this in here.
2. The only exception to this could have been IF someone had changed the default filgroup to the secondary filegroup. If so, then any new objects created afterwards would have gone to the secondary instead of the primary.

Repro on SQL 2005
As any good engineer does, so did I – TESTING. To test my little theory I did the following repro on SQL 2005 :-

1) Created a new database with secondary filegroup and 1 file in that.
2) Made secondary FG as default FG for database "test".

*Note: This needs to be done as part of create database and not added later on.

3) What I found was that you DON’T NEED service broker to be enabled for the queue_messages tables to get created.
4) I found that on my test database I had 3 Internal queue_message tables which where for the following parent object

QueryNotificationErrorsQueue
EventNotificationErrorsQueue
ServiceBrokerQueue

4) All of the 3 were were of type SERVICE_QUEUE. You can find this out from sys.objects view
5) Even a new database has the same object numbers and looking at my customer’s object numbers, they are higher, which means they must have had Service Broker implemented at some point.

queue_messages_254180947
queue_messages_286181061
queue_messages_318181175

 

6) I tested this by creating a service queue when my secondary filegroup was the default filegroup.

use test

GO

CREATE QUEUE TestQueue

WITH

STATUS = OFF,

RETENTION = ON

 

7) This created the table queue_messages_2105058535 on FG 2 of type "QUEUE_MESSAGES" indicating its a SSB queue.

 

8)  To remove the internal table I did a DROP QUEUE.

drop queue TestQueue

 

9) This removed the associated internal table and the indexes. You can use the following query to identify internal tables which are for service broker queues.

select a.name as [ChildName], a.object_id as [ChildObjectID], a.type_desc as [ChildType], a.internal_type_desc as [ChildTypeDesc],

a.parent_id as [Parent_ObjectID], b.name as [Parent Name], b.type_desc as [Parent_Type]

from sys.internal_tables a

inner join sys.objects b

on a.parent_id = b.object_id

 

So the issue still does NOT reproduce on SQL 2005 (same for SQL 2008). I then did the following repro on a SQL Server 2000 instance.

Repro on SQL 2000
1. Created a database on SQL 2000
2. Added File-group (FG2) and made it as default.
3. Took a backup of the database in SQL 2000.
4. Restored this to SQL 2005/2008.
5. System Services/Queues got created on FG2.
*Note: We cannot modify them because they are system objects.

6) The remove File command gives us this error,

Alter database test remove file fil2

Msg 5031, Level 16, State 1, Line 1

Cannot remove the file ‘fil2’ because it is the only file in the DEFAULT filegroup.

 

7)  Next up, I ran these commands.

 

dbcc shrinkfile(3, EMPTYFILE)

go

alter database test remove file fil2

 

This worked so far and in sys.database_files I see the status of file2 as OFFLINE. But unfortunately I still cannot remove the filegroup. It still says that it is not empty. Even though the emptyfile commands worked, the objects (SSB queues) still exist on FG2.

I followed same steps for a database created on SQL 2008 and took a backup and restored it again on SQL 2008 and issue did not reproduce. i.e. objects were created on Primary and none of them went to the Secondary FG. I did the same test on SQL 2005 Database backed up and restored to SQL 2008. This also did NOT reproduce the issue.

So, what did I learn out of all my testing (and precious time). Read on…

My Theory
This database had to have been restored from SQL 2000 to a higher version or been upgraded in-place from SQL 2000 to higher version? Tracking down the source of the original database would help us figure this out.

If yes, we now know the answer and that this behaviour is ByDesign and understood, i.e. how the system tables went to the secondary filegroup.  This is because the non-primary filegroup FG2_fg was set as DEFAULT prior to backup from SQL 2000 or in-place upgrade. Once the upgrade was done, since SSB was newly introduced starting with SQL 2005, during the upgrade the upgrade scripts created these objects on the secondary filegroup since it was set as the default filegroup.


Verifying My Theory

To confirm if an in-place upgrade was done, we can look in sys.database_files DMV and look at the column file_GUID. The file_guid column will be NULL if the database was upgraded from an earlier version of Microsoft SQL Server. (Upgraded only, not for a Restore).

I have to thank the SQL Storage Engine folks for thinking about this kind of a scenario and capturing such detailed info in the catalog views. Thanks Folks!

I went back to my customer’s database and looked at the sys.databases output and I noticed this,

file_id

file_guid

type

type_desc

data_space_id

name

physical_name

1

08C43589-1462-4492-8778-D4BCA128ED66

0

ROWS

1

test_Data

C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\test_Data.mdf

2

A7ADDAF4-0425-4BB8-988C-FE260A41331C

1

LOG

0

test_Log

C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\test_Log.ldf

5

3542AAD9-4AC4-499A-90B8-D342D0CBFFE6

0

ROWS

1

test_Data2

C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\test_Data2.ndf

6

NULL

0

ROWS

2

fil2

C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\fil2.ndf

7

12D108E4-DEEE-4302-A61A-AD6FE21B5EF3

0

ROWS

1

test_Data3

C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\test_Data3.ndf

 

Notice that for data_space_id=2 which is for the secondary filegroup FG2_fg, the file_guid column has a value of NULL !!!
EUREKA!!!

This confirms that this instance/database was indeed upgraded from an earlier version of SQL 2000/7.0 to SQL (2005/2008). This behaviour is by design and we now understand how the system SSB objects were created on secondary filegroup.


Conclusion

Q: Why cannot we remove the file and filegroup?
Ans: Because of the above mentioned queue_messages objects which are present in the secondary filegroup. If there were user-created service broker queues/services then they can be Dropped or Altered to move them to the primary filegroup. Since these came as part of the database during DB Creation, they cannot be modified/dropped/moved. Hence, we cannot remove the filegroup.

Q: How did these "system" objects get to the secondary file group?
Ans: These system objects became part of the database starting with SQL Server 2005 onwards. So, If you had a database in SQL 2000 and you upgraded that to SQL 2005/2008 these system objects will get created. But, in SQL 2000 if the secondary file-group FG2_fg was set as the DEFAULT file-group, then any new objects created without an explicit MOVE TO will go the default filegroup. So when you upgrade the database to SQL 2008 these automatically got created on the secondary file-group.

Q: What data do I have to prove that the theory above is true in your case?
Ans: We store the information about each database file in sys.database_files DMV. There is a column called file_guid which when NULL indicates that the database/file was upgraded from an earlier version on SQL Server. So when I looked at sys.database_files in the database test, I saw the value for the secondary file as NULL.

Reference – http://technet.microsoft.com/en-us/library/ms174397(SQL.100).aspx

 

I spent some serious time to arrive at this conclusion and I am sharing this with the SQL community, so that it saves you some time. As always, stay tuned to theSQLDude for more…

Posted in Storage Engine | Tagged: , , , , , , | Leave a Comment »