The SQL Dude!

On-Disk & In-Memory Pages that teach you the SQL way of life!

Posts Tagged ‘SQL Server’

User Opinion Poll – What topic(s) would you like to see posted on TheSQLDude weblog?

Posted by Sudarshan Narasimhan on July 10, 2012


WordPress Readers, Friends, Fans, SQLDudes, Romans and Countrymen,

Time for an opinion poll @theSQLDude. It’s been a year since I have had my weblog active and running. While the feedback and comments have been great, it’s time to take this to the next level. This fiscal year, I’d like to hear from you (my readers) out there and listen to your comments on what topics/features you would like to read and learn about in this weblog. Please feel free to share this poll with your colleagues & friends. It will take a few seconds to finish the poll, so go ahead and exercise your grey cells.

Regards,
Sudarshan a.k.a TheSQLDude

.

Advertisements

Posted in Opinion Poll | Tagged: , , , , , , | 4 Comments »

How to identify the row or document that failed to index during a fulltext index population

Posted by Sudarshan Narasimhan on July 6, 2012


I wanted to write this post for a long time. If any of you out there are using full-text indexes, especially on those tables what contains documents like Office Documents (doc/docx/xls/xlsx) or Adobe PDF files, then you might have come across scenarios when some of the documents (or) rows do not get indexed. Usually as an end-user or developer or DBA, you typically come across these situations when you do not find the document you are searching for in your website or application. A good example of this would be a Job Database like Naukri or Monster, where you search for candidate resume/profile based on a set of keywords containing skillset or languages known.

For E.g. Lets say, my search string is “TSQL 4 years experience”. When you know there are documents that match this and they don’t show up in the search results, that’s when you start to look into the full-text index and its population.

A thumb rule in FTS; A row in table is called as a document (DocID) in a full-text index. Any properties for the full-text index in SSMS or anywhere else, always refers to rows as documents. Like Documents processed, Documents indexed, Documents failed etc. Just keep this in mind. Even if you have just text data (varchar/char etc), the row is still called a document. The role of this DocumentID/DocID is to map a full-text key value in the full-text index to the corresponding row in the base table. A full-text key value is in varbinary(900) format. So for us to find the corresponding row, we need to convert the hex string to the native format of your full-text key.

Sample Scenario

1. Assume I have a table called dbo.tblCandidateProfiles that has a primary key on the candidate_id column which is of INT data-type.

2. I have a full-text index created on this table to search columns like candidate_name, candidate_resume etc.

3. Now, when you create a full-text index, you will get an option in SSMS GUI to choose the full-text Key Column (TableFulltextKeyColumn). Many people don’t pay close attention to this column, because the GUI usually picks the PK Index or any Unique Index present on the base table.

The reason this is important is because, SQL Server uses the KEY INDEX (of which the KEY COLUMN is a part) to uniquely identify a row in the base table, for full-text index operations. So if any document/row succeeds or fails to get indexed, it can be uniquely identified using the full-text index key value (which is based on the Key Index chosen during the creation of the FTIndex).

Lets look at an example. Note: This test was done on SQL Server 2008 R2 SP1


Table Definition

CREATE TABLE [dbo].[tblCandidateProfiles](
    [Candidate_ID] [int] IDENTITY(1,1) NOT NULL,
    [Name] [varchar](100) NOT NULL,
    [Resume] [varbinary](max) NOT NULL,
    [DocType] [nvarchar](20) NOT NULL,
    [DocName] [nvarchar](100) NOT NULL,
 CONSTRAINT [PK_tblCandidateProfiles] PRIMARY KEY CLUSTERED 
(
    [Candidate_ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

My table above has a PK on the candidate_ID column which is an IDENTITY column, so it is of type INT. I create the following full-text catalog and full-text index using the PK_CandidateProfiles as my KEY INDEX.

Full Text Index/Catalog Definition

-- Full Text Catalog Creation DDL
CREATE FULLTEXT CATALOG [Catalog_Resume_Search]WITH ACCENT_SENSITIVITY = OFF
AUTHORIZATION [dbo]
go

-- Full Text Index Creation DDL 
CREATE FULLTEXT INDEX ON [dbo].[tblCandidateProfiles]( 
[DocName] LANGUAGE [English], 
[DocType] LANGUAGE [English], 
[Name] LANGUAGE [English], 
[Resume] TYPE COLUMN [DocType] LANGUAGE [English]) 
KEY INDEX [PK_tblCandidateProfiles]ON ([Catalog_Resume_Search], FILEGROUP [PRIMARY]) 
WITH (CHANGE_TRACKING = AUTO, STOPLIST = SYSTEM) 
GO 

Okay, now that things are setup we can go ahead and insert a single document into this table.

Inserting 1 Sample Row/Document into the base table

SELECT 'TheSQLDude' AS Name, '.docx' AS doctype, 'TheSQLDude_Resume1.docx' AS docname,       
      * FROM OPENROWSET(BULK N'C:\MyResume.docx', SINGLE_BLOB) AS Resume

I used the OPENROWSET function to directly insert the word document as BLOB into the base table, since the datatype of the Resume column is varbinary(max) meaning it’s for storing BLOB data. This is possible starting with SQL Server 2005 onwards. If things go well, then you should be able to search for keywords based off this document. Like this for example,

select candidate_id, Name, DocName, Resume from
dbo.tblCandidateProfiles where contains (Resume, 'T-SQL')

image

As you can see in the above output, the document was successfully indexed and hence we are getting search results. If you happen to look into the SQLFT logs, which by the way are in the format SQLFT<dbid><catalog_id>.log, so in my case I looked into SQLFT0000800009.LOG, here is what I found for my 1 row that I inserted recently.

2012-07-06 01:21:26.14 spid38s     Informational: Full-text Auto population completed for table or indexed view ‘[test].[dbo].[tblCandidateProfiles]’ (table or indexed view ID ‘475148738’, database ID ‘8’). Number of documents processed: 1. Number of documents failed: 0. Number of documents that will be retried: 0.

So far things are good. But what happens if certain document types (like .PDF) or certain documents don’t return you search results, because they failed to get indexed. How do you find which document failed, if your table contained thousands of documents. Fear not, this is exactly where the full-text key values comes into the picture.

Identifying documents that failed to get indexed

Let me insert a junk document that I know will fail, into the same table mentioned above. I am taking a sample JPG Image file to show how this works, since there is no point in full-text indexing an image, because you can’t search inside an image. Duh!

INSERT INTO dbo.tblCandidateProfiles(Name, DocType, DocName, Resume) 
   SELECT 'JunkImage' AS Name, '.jpg' AS doctype, 'Heaven.jpg' AS docname,       
      * FROM OPENROWSET(BULK N'C:\heaven.jpg', SINGLE_BLOB) AS Resume

Once the row is inserted, lets open the same full-text log file SQLFT0000800009.LOG (DBID=9, Catalog_ID=9)

2012-07-06 01:20:13.12 spid38s     Warning: No appropriate filter was found during full-text index population for table or indexed view ‘[test].[dbo].[tblCandidateProfiles]’ (table or indexed view ID ‘475148738’, database ID ‘8’), full-text key value ‘2’. Some columns of the row were not indexed.

As you can see from the highlighted snippet above, that the 2nd row I inserted (junk JPG image) did NOT get indexed, since there was no Filter present for the JPG type. In the SQL FT Logs it will only give you the full-text key value which failed. In the above case, the key value = 2.

Let’s look back to the original KEY INDEX that I had defined. Since it was an INT data type, the key value is also displayed as an INT value. To find the row/document, you just have to look-up the row with candidate_id=2 and you got the document that did not get indexed.

Things get a little more complicated when you have some char/varchar or composite index defined as the PK or Unique KEY index. Let’s look at that scenario

Identifying documents that failed to get indexed (KEY INDEX is on column with CHARACTER data-type).

I altered the table with 1 change. Dropped the identity column and changed the data-type for the Candidate_ID to varchar(50). This still remains the column on which the Primary Key index exists. Then, I re-created the full-text index on this table.

[Candidate_ID] [varchar](50) NOT NULL,

I inserted 1 document and here is what I see in the full-text log

INSERT INTO dbo.tblCandidateProfiles(Candidate_ID, Name, DocType, DocName, Resume) 
   SELECT 'FBD367D1-642F-47CF-B79B-9BE48FB34007' AS Candidate_ID, 'TheSQLDude' AS Name, '.jpg' AS doctype, 'JunkDocument' AS docname,       
      * FROM OPENROWSET(BULK N'C:\heaven.jpg', SINGLE_BLOB) AS Resume

2012-07-06 01:51:21.36 spid33s     Warning: No appropriate filter was found during full-text index population for table or indexed view ‘[test].[dbo].[tblCandidateProfiles]’ (table or indexed view ID ‘875150163’, database ID ‘8’), full-text key value ‘FBD367D1-642F-47CF-B79B-9BE48FB34007’. Some columns of the row were not indexed.

Notice the random GUID value reported as failed. That is the value of my candidate_ID and since this was my KEY INDEX it is also reporting the same value in the FTLOG as having failed. So I just need to lookup the base table for rows that have the value of candidate_id = FBD367D1-642F-47CF-B79B-9BE48FB34007, and I will get the document that failed to get indexed. Life is still good with character columns, since SQL Server reports the same value in the log.

Another example of how easy this is with SQL 2008 R2

2012-07-06 02:05:03.66 spid33s     Error ‘0x8004170c: The document format is not recognized by the filter.’ occurred during full-text index population for table or indexed view ‘[test].[dbo].[fts_table2]’ (table or indexed view ID ‘1083150904’, database ID ‘8’), full-text key value ‘33377ver3.ppt’. Failed to index the row.


2012-07-06 02:05:04.66 spid33s     Informational: Full-text Auto population completed for table or indexed view ‘[test].[dbo].[fts_table2]’ (table or indexed view ID ‘1083150904’, database ID ‘8’). Number of documents processed: 1. Number of documents failed: 1. Number of documents that will be retried: 0.

Identifying documents that failed to get indexed in SQL Server 2005

Prior to SQL Server 2008, the full-text key value being reported in the SQLFT* logs would be in varbinary(900) format, so one you have to convert the value to make sense out of it. We still have to do this on SQL Server 2005. Here is how you would do it in SQL Server 2005.


2012-07-06 02:33:29.95 spid22s     Error ‘0x8004170c: The document format is not recognized by the filter.’ occurred during full-text index population for table or indexed view ‘[test].[dbo].[fts_table]’ (table or indexed view ID ‘2073058421’, database ID ‘5’), full-text key value 0x506F776572706F696E7446696C652E707074. Attempt will be made to reindex it.

2012-07-06 02:33:29.95 spid22s     The component ‘offfilt.dll’ reported error while indexing. Component path ‘C:\WINDOWS\system32\offfilt.dll’.

2012-07-06 02:33:49.92 spid22s     Warning: No appropriate filter was found during full-text index population for table or indexed view ‘[test].[dbo].[fts_table]’ (table or indexed view ID ‘5’, database ID ‘2073058421’), full-text key value 0x496D61676546696C652E6A7067. Some columns of the row were not indexed.

2012-07-06 02:35:53.81 spid22s     Error ‘0x8004170c: The document format is not recognized by the filter.’ occurred during full-text index population for table or indexed view ‘[test].[dbo].[fts_table]’ (table or indexed view ID ‘2073058421’, database ID ‘5’), full-text key value 0x446F63756D656E7446696C652E646F63. Attempt will be made to reindex it.

2012-07-06 02:35:53.81 spid22s     The component ‘offfilt.dll’ reported error while indexing. Component path ‘C:\WINDOWS\system32\offfilt.dll’.

In the above example, how would you identify the rows in the base table based on these values in the SQLFT logs (0x506F776572706F696E7446696C652E707074, 0x496D61676546696C652E6A7067, 0x446F63756D656E7446696C652E646F63) ???

I mentioned earlier that the full-text key values are in varbinary(900) format, so lets convert these values depending on what the data-type of the key column is (which is part of the Key index).

1. Identify the full-text key column on your table and its data-type using a query like given below

select a.name as [KeyColumn], b.name as [DataType], a.prec  as [Precision] from sys.syscolumns a
join sys.types b
on a.xtype = b.system_type_id
where colid=objectproperty(object_id('AllDocStreams'),'TableFulltextKeyColumn')
and id=object_id('AllDocStreams')

image

2. As you can see in the above output, the key column in on the column [docname] and it’s of type varchar(50). Before I show you the conversion part, here is the data in my table fts_table.
image
3. The Conversion Part.
  1. Full-Text Key Value is in Varbinary(900) format
  2. The Table Key Column is in Varchar(50) format
  3. Converting using the CONVERT() function, we see that 0x506F776572706F696E7446696C652E707074 translates to “PowerpointFile.ppt”. This is the document that failed to get indexed
declare @varbinary varbinary(900)
set  @varbinary =0x506F776572706F696E7446696C652E707074 
select convert(varchar(50),@varbinary)

 

image

4. If you have a lot of these types of indexing errors, make a list of the full-text key values reported in the SQLFT logs and you can directly query the base table using a query like given below to identify ONLY THOSE ROWS that failed to get indexed. You can add anything combination of unique columns you want to the SELECT list, but the where clause must be filtered on the full-text Key column identified in Step #1, which is “docname” in this example.

select docname, convert(varbinary(900),docname) as [Converted-FullTextKeyValue]
from fts_table where docname 
in (convert(varbinary(900),0x506F776572706F696E7446696C652E707074),
convert(varbinary(900),0x496D61676546696C652E6A7067),
convert(varbinary(900),0x446F63756D656E7446696C652E646F63) )

image

What Next, once you have identified the rows that failed to get full-text indexed?

Once you have identified a bunch of rows that failed to index, you can start to find out if there are any common patterns among these documents. The things to look out for are:-

  1. Do all the documents that failed have a common file-type. e.g. Did all .PDF or .DOCX files fail?
  2. Do these documents have the appropriate IFilters installed? e.g. Do we have a filter for .JPG present on the machine?
  3. Are all these failed documents from a single source/application?
  4. Is the source of all these documents a single computer/document library?

Life is even more easier with SQL Server 2008 onwards, since the entire procedure described above is done by a system stored procedure called sp_fulltext_keymappings

DECLARE @table_id int = OBJECT_ID(N'dbo.tblCandidateProfiles');
EXEC sp_fulltext_keymappings @table_id;

The [key] column returned by this system stored procedure is the Converted full-text key value which you will see in the SQLFT* log.

So now you know how to identify the rows/documents that failed to get indexed. In my next post on full-text, I will cover some of the common reasons for indexing failures and how to go about troubleshooting those. TTFN.

 

Posted in FTS (Full-Text Search) | Tagged: , , , , , , , , , | 4 Comments »

Data Collector (MDW) slow purge issue also affects SQL Server 2008

Posted by Sudarshan Narasimhan on June 5, 2012


I had posted on this topic before in my previous blog. SQL Server 2008 is also affected by slow purge procedure core.sp_purge_data as well as deadlocks in the purge procedure. This issue was fixed in SQL Server 2008 R2 SP1. For SQL Server 2008, the fix to the purge procedure has been done in Service Pack 3. Those of you using MDW to collect performance data, should apply SP3 as soon as possible. This will greatly reduce the purge procedure run time and your jobs will run faster.

The KB 2584903 has been updated to include SQL 2008 as an affected product. I’ve posted this on the Connect Bug as well here –> https://connect.microsoft.com/SQLServer/feedback/details/657148/management-data-warehouse-mdw-purge-job-proc-performance

Just applying SP1 for R2 or SP3 for SQL 2008 will not be enough to get the MDW purge procedures updated. Once the service pack has been successfully installed, you will have to run through the Configure Management Data Warehouse Wizard from SSMS to get the purge procedures updated with the new T-SQL code.

image

Posted in Data Collector | Tagged: , , , , , , | Leave a Comment »

Auditing T-SQL Script to find out who made changes to data in a table

Posted by Sudarshan Narasimhan on June 1, 2012


Having worked with multiple customers and different application where at times the data in the table has gotten changed and we need to find out when/where/who made the changes, I used the below T-SQL code to setup a trigger on the table and track the changes into an auditing table. There are obviously other ways to do this like SQL Profiler, XEvents etc., but I find using T-SQL more simpler and lesser overhead. So, sharing the script for others to use.

-- Note: This trigger tracks Updates and Deletes happening on a table. 
-- Please delete this trigger once the source has been identified and corrective actions have been taken.

-- 1. Creating the audit table to store information on Update/Delete
CREATE TABLE AuditTable
(
AuditID [int] IDENTITY(1,1) NOT NULL,
Timestamp datetime not null CONSTRAINT AuditTable_Timestamp DEFAULT (getdate()),
OperationType char(1),
OperationDate datetime DEFAULT (GetDate()), 
PrimaryKeyValue varchar(1000), 
OldColValue varchar(200), 
NewColValue varchar(200), 
UserName varchar(128),
AppName varchar(128),
ClientName varchar(128)
)
go

--2. Creating  the audit trigger
-- Replace PrimaryKeyValue with the PK Column Name
-- Replace NewColValue with the column name in the IF BLOCK
-- Replace OldColValue with the column name in the final SELECT statement
-- Replace TBLNAME with the name of your table which you want to track the changes for.

Create trigger TBLNAME_Audit on TBLNAME for update, delete
AS
declare @OperationType char(1),
@OperationDate datetime,
@NewColValue varchar(200),
@OldColValue varchar(200),
@UserName varchar(128),
@AppName varchar(128),
@ClientName varchar(128)

select @UserName = system_user
select @OperationDate = CURRENT_TIMESTAMP
select @ClientName = HOST_NAME()
select @AppName = APP_NAME()

if exists (select * from deleted)
      if exists (select * from inserted)
      begin
            select @OperationType = 'U'
            select @NewColValue = NewColValue from inserted
      end
      else
      begin
            select @OperationType = 'D'
            select @NewColValue = null
      end
      
Insert AuditTable (OperationType, OperationDate, PrimaryKeyValue, OldColValue, NewColValue, UserName, AppName, ClientName)
select @OperationType, @OperationDate, PrimaryKeyValue, OldColValue, @NewColValue, @UserName, @AppName, @ClientName
from deleted
go

--3. Query the audit table once the values in the base table has changed
select * from AuditTable

Posted in Auditing, Security, T-SQL | Tagged: , , , , , , | 5 Comments »

Monitoring tempdb space usage and scripts for finding queries which are using excessive tempdb space

Posted by Sudarshan Narasimhan on May 15, 2012


Many times during the life of a DBA, you might notice the tempdb database growing excessively, though no changes have recently been done. It’s often the case that due to data increase, the application T-SQL queries are not written to scale up, hence end up doing excessive sorting/hashing operations which consume space from your tempdb database. Here are some T-SQL scripts that you can use to monitor who/what is consuming space from tempdb and plan accordingly.

Before we get into identifying queries that use tempdb, it is very important to understand what all activities in SQL Server (both internal & user activities), which use the tempdb database. Broadly you can classify these into 3 categories:-

  1. Internal Objects
  2. Version Stores
  3. User Objects

From a feature perspective, here are the features in SQL Server that use space from tempdb.

  1. Query
  2. Triggers
  3. Snapshot isolation and read committed snapshot (RCSI)
  4. MARS
  5. Online index creation
  6. Temporary tables, table variables, and table-valued functions
  7. DBCC CHECK
  8. LOB parameters
  9. Cursors
  10. Service Broker and event notification
  11. XML and LOB variables
  12. Query notifications
  13. Database mail
  14. Index creation
  15. User-defined functions

From a query performance standpoint, here are some operators that use tempdb space.

1. Sort Operator : The sort operator needs tempdb space to sort the full rowset of incoming rows. This is usually send when user ODER BY and also for DISTINCT ORDER BY

        image

2. Hash Match Operator: Depending on the size of row, a hash table could use tempdb

      image

3. Spool Operator: This operator is used to save the intermediate set of rows for re-use and uses the tempdb database to save the query result set.

         image

Tempdb out of space error

Error: 1105, Severity 17, State 2
Could not allocate space for object dbo.TBL1  in database ‘tempdb’ because the ‘PRIMARY’ filegroup is full.

 

Identify which type of tempdb objects are consuming  space

The following query helps you understand if user objects or version store or internal objects are the ones using the space in tempdb. According to this output, you can focus on the below sections.

SELECT
SUM (user_object_reserved_page_count)*8 as user_obj_kb,
SUM (internal_object_reserved_page_count)*8 as internal_obj_kb,
SUM (version_store_reserved_page_count)*8  as version_store_kb,
SUM (unallocated_extent_page_count)*8 as freespace_kb,
SUM (mixed_extent_page_count)*8 as mixedextent_kb
FROM sys.dm_db_file_space_usage
If user_obj_kb is the highest consumer, then you that objects are being created by user queries like local or global temp tables or table variables. Also don’t forget to check if there are any permanent 
tables created in TempDB. Very rare, but I’ve seen this happening.
If version_store_kb is the highest consumer, then it means that the version store is growing faster than the clean up. Most likely there are long running transactions or open transaction (Sleeping state), 
which are
preventing the cleanup and hence not release tempdb space back.



Query that identifies the currently active T-SQL query, it’s text and the Application that is consuming a lot of tempdb space

SELECT es.host_name , es.login_name , es.program_name,
st.dbid as QueryExecContextDBID, DB_NAME(st.dbid) as QueryExecContextDBNAME, st.objectid as ModuleObjectId,
SUBSTRING(st.text, er.statement_start_offset/2 + 1,(CASE WHEN er.statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max),st.text)) * 2 ELSE er.statement_end_offset 
END - er.statement_start_offset)/2) as Query_Text
, tsu.session_id ,tsu.request_id, tsu.exec_context_id, (tsu.user_objects_alloc_page_count - tsu.user_objects_dealloc_page_count) as OutStanding_user_objects_page_counts, (tsu.internal_objects_alloc_page_count - tsu.internal_objects_dealloc_page_count) as OutStanding_internal_objects_page_counts, er.start_time, er.command, er.open_transaction_count, er.percent_complete, er.estimated_completion_time, er.cpu_time, er.total_elapsed_time, er.reads,er.writes,
er.logical_reads, er.granted_query_memory FROM sys.dm_db_task_space_usage tsu inner join sys.dm_exec_requests er ON ( tsu.session_id = er.session_id and tsu.request_id = er.request_id
) inner join sys.dm_exec_sessions es ON ( tsu.session_id = es.session_id ) CROSS APPLY sys.dm_exec_sql_text(er.sql_handle) st WHERE (tsu.internal_objects_alloc_page_count+tsu.user_objects_alloc_page_count) > 0 ORDER BY (tsu.user_objects_alloc_page_count - tsu.user_objects_dealloc_page_count)+(tsu.internal_objects_alloc_page_count - tsu.internal_objects_dealloc_page_count)
DESC



Tempdb and the Version Store

The version stored (SQL 2005 onwards) is a collection of objects that are used when Snapshot Isolation or Read-Committed Snapshot Isolation (RCSI) or online index rebuild etc. are used in a database.

Version store contains the committed rows which is how a SELECT operation does not get blocked when another UPDATE/DELETE is operating on the same row, because the SELECT reads the row from the version store, instead of the actual base table. When you enable this, the row has to be stored somewhere and tempdb happens to be the place. A row is maintained in the version store when there are transactions operating on that row in questions. When the transaction is committed, the row is cleaned up from the version store tables.

You can check the version store using the DMV sys.dm_tran_version_store

At times, when there are long running transactions or orphaned transactions, you might notice tempdb growth due to the version store.

You can use the following query to find the oldest transactions that are active and using row versioning.

SELECT top 5 a.session_id, a.transaction_id, a.transaction_sequence_num, a.elapsed_time_seconds,
b.program_name, b.open_tran, b.status
FROM sys.dm_tran_active_snapshot_database_transactions a
join sys.sysprocesses b
on a.session_id = b.spid
ORDER BY elapsed_time_seconds DESC

Trace Flag 1118

This trace flag is available starting with SQL 2000 SP3 to reduce tempdb contention by forcing uniform extent allocations as opposed to mixed extent allocations. This trace flag is only to be used if you seeing contention (wait_Stats) on the PFS/GAM pages like 2:1:1 etc.. More internal details on this trace flag is available in Paul Randal’s blog post here.

Not only does enabling the trace flag help but you need to create multiple tempdb files equal to the number of logical processors. So if you have 4 CPU’s you will create 4 tempdb data files. Now, what if you have 16 or 32 processors, do you still need to create that many tempdb files?

The answer is NO, you don’t have to. The above recommendation has been stated in many KB articles like http://support.microsoft.com/default.aspx?scid=kb;EN-US;328551

If the number of logical processors on your server is greater than or equal to 8, then use 8 data files for tempdb. If the number of logical processors is less than 8, then use as many data files as your processor count.

You can use the following against any of the SQL Servers you manage to find out if any change is required in the tempdb data files to reduce contention and improve general performance.

Declare @tempdbfilecount as int;
select @tempdbfilecount = (select count(*) from sys.master_files where database_id=2 and type=0);
WITH Processor_CTE ([cpu_count], [hyperthread_ratio])
AS
(
      SELECT  cpu_count, hyperthread_ratio
      FROM sys.dm_os_sys_info sysinfo
)
select Processor_CTE.cpu_count as [# of Logical Processors], @tempdbfilecount as [Current_Tempdb_DataFileCount], 
(case 
      when (cpu_count<8 and @tempdbfilecount=cpu_count)  then 'No' 
      when (cpu_count<8 and @tempdbfilecount<>cpu_count and @tempdbfilecount<cpu_count) then 'Yes' 
      when (cpu_count<8 and @tempdbfilecount<>cpu_count and @tempdbfilecount>cpu_count) then 'No'
      when (cpu_count>=8 and @tempdbfilecount=cpu_count)  then 'No (Depends on continued Contention)' 
      when (cpu_count>=8 and @tempdbfilecount<>cpu_count and @tempdbfilecount<cpu_count) then 'Yes'
      when (cpu_count>=8 and @tempdbfilecount<>cpu_count and @tempdbfilecount>cpu_count) then 'No (Depends on continued Contention)'
end) AS [TempDB_DataFileCount_ChangeRequired]
from Processor_CTE;

 

Here is a sample output. As you can see I have 8 processors and only 1 tempdb file. So I need to add 7 more files.

clip_image002


Last point before I wrap up this post. Once your tempdb database or log file is full, you have these options:-

1. Either you have to rollback any transactions consuming tempdb space or kill the transactions (not a good idea).

2. Create additional tempdb files in other drives which have free space, while you dig around to find the culprit who is growing tempdb.

3. Restart your SQL Server service.

Have fun working with tempdb. Here are some good references

Working with tempdb – http://technet.microsoft.com/hi-in/library/cc966545(en-us).aspx

Storage Engine Blog – http://blogs.msdn.com/b/sqlserverstorageengine/archive/2009/01/12/tempdb-monitoring-and-troubleshooting-out-of-space.aspx

Posted in Performance, T-SQL | Tagged: , , , , , , , , , | 18 Comments »

IO Cost in an execution plan–What it actually means?

Posted by Sudarshan Narasimhan on May 11, 2012


I/O is probably one the slowest of the resources. The information below gives a quick meaning of what these operators mean when you look at them in an execution plan. This specific concentrates on the “Estimated I/O Cost” value in an execution plan. I’ll post some other day in detail on how to decipher a SQL Server query execution plan.

Thanks to Niraj Mehta for putting together this content.

Table Scan

  • The total number of data pages in the table

Clustered Index Scan

  • The number of levels in the index plus the number of data pages to scan (data pages = #rows / #rows per page)

Non-Clustered Index Seek on a Heap (Bookmark Lookup)

  • The number of levels in the index plus the number of leaf pages to read for qualifying rows plus the number of qualifying rows (1 I/O for each row on the heap)

Non-Clustered Index (I1) Seek on a Clustered index (I2) (Bookmark Lookup)

  • The number of levels in the I1 index plus the number of leaf pages to read for qualifying rows plus the number of qualifying rows times the cost of searching for a clustered index (I2) key

Covering Non-Clustered index

  • The number of levels in the index plus the number of leaf index pages to read for qualifying rows (#qualifying rows / # rows per leaf page).

Posted in Performance | Tagged: , , , , , , , , , , | Leave a Comment »

Shedding light on SQL Server Statistics and Auto Update Statistics

Posted by Sudarshan Narasimhan on May 4, 2012


SQL Server collects statistical information about indexes and column data in the database. These statistics play a very important role in coming up with an efficient query plan. Statistical information is used by the Query Optimizer when it decides to choose Seek vs. Scan or using Index A vs. Index B, etc. So it’s important as a DBA to pay careful attention to statistics and making sure they are updated and maintained well.

Just like Chandler Bing’s job is “Statistical analysis and data reconfiguration”, its a good DBA’s job to take care of maintaining the statistics Smile.


SQL Server 2005 statistics features allow you to:

  • Implicitly create and update statistics with the default sampling rate (in the SELECT, INSERT, DELETE, and UPDATE commands, use of a column in a query condition such as a WHERE or JOIN clause causes statistics to be created or updated if necessary when automatic statistics creation and update is enabled)
  • Manually create and update statistics with any desired sampling rate, and drop statistics (CREATE STATISTICS, UPDATE STATISTICS, DROP STATISTICS, CREATE INDEX, DROP INDEX)
  • Manually create statistics in bulk for all columns of all tables in a database (sp_createstats)
  • Manually update all existing statistics in the database (sp_updatestats)
  • List statistics objects that exist for a table or database (sp_helpstats, catalog views sys.stats, sys.stats_columns)
  • Display descriptive information about statisticsobjects (DBCC SHOW_STATISTICS)
  • Enable and disable automatic creation and update of statistics database-wide or for a specific table or statistics object (ALTER DATABASE options AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS, sp_autostats, and the NORECOMPUTE option on CREATE STATISTICS and UPDATE STATISTICS)
  • Enable and disable asynchronous automatic update ofstatistics (ALTER DATABASE option AUTO_UPDATE_STATISTICS_ASYNC

Behaviour change from SQL 2005 onwards

In SQL Server 2000, statistics update was determined by the number of row changes. Now, changes are tracked at the column level, and auto update of statistics can now be avoided on columns which have not changed enough to warrant statistics update.  This is a behaviour change between SQL 2000 and SQL 2005 onwards when it comes to auto update statistics. SQL Server 2005 and onwards determines whether to update statistics based on the colmodctrs (column modification counters) value.

A statistics object is considered out of date in the following cases:

1. The table size has gone from 0 to > 0 rows.
2. The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then.
3. The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered.
4. If the statistics object is defined on a temporary table, it is out of date as above, except that there is an additional threshold Recomputation after insertion of 6 rows.


Q: How is the COLMODCTR maintained by SQL Server?
Ans: The colmodctr values that SQL Server keeps track of are continually modified as the data in the table changes. Depending on the amount of INSERT/UPDATE/DELETE/TRUNCATE/BULK OPS that are occurring on the table, this counter value keeps changing. The below table, describes what the effect on colmodctr value are for each operation.

Cc293623.table_C05621961_2(en-us,TechNet.10).jpg
Reference: http://technet.microsoft.com/en-us/library/cc293623.aspx

 

Controlling STATISTICS Behaviour

You have options like AUTO_UPDATE_STATISTICS, AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS_ASYNC etc, which are all database level options. So you need to configure this per each database. You can find if your user databases have these options by using sp_helpdb and looking at the status column or by using a query like below.

select name as [DB_NAME], is_auto_create_stats_on, is_auto_update_stats_on, is_auto_update_stats_async_on 
from sys.databases

How to find out which indexes or statistics needs to be updates?

You can use the following query on any SQL 2005+ instance to find out the % of rows modified and based on this decide if any indexes need to be rebuilt or statistics on the indexes need to be updated.

select
schemas.name as table_schema,
tbls.name as Object_name,
i.id as Object_id,
i.name as index_name,
i.indid as index_id,
i.rowmodctr as modifiedRows,
(select max(rowcnt) from sysindexes i2 where i.id = i2.id and i2.indid < 2) as rowcnt,
convert(DECIMAL(18,8), convert(DECIMAL(18,8),i.rowmodctr) / convert(DECIMAL(18,8),(select max(rowcnt) from sysindexes i2 where i.id = i2.id and i2.indid < 2))) as ModifiedPercent,
stats_date( i.id, i.indid ) as lastStatsUpdateTime
from sysindexes i
inner join sysobjects tbls on i.id = tbls.id
inner join sysusers schemas on tbls.uid = schemas.uid
inner join information_schema.tables tl
on tbls.name = tl.table_name
and schemas.name = tl.table_schema
and tl.table_type='BASE TABLE'
where 0 < i.indid and i.indid < 255
and table_schema <> 'sys'
and i.rowmodctr <> 0
and i.status not in (8388704,8388672)
and (select max(rowcnt) from sysindexes i2 where i.id = i2.id and i2.indid < 2) > 0
order by modifiedRows desc

Updating Statistics on all the table in any given database

I often get this often as to how you update all the tables in a database. You can use a script like below to achieve this.

Disclaimer: Do not run this unless you know its implications on a production server. Statistics update on all the tables will use CPU resources and depending on the size of the table take its own time.

use <YOUR_DB_NAME>
go
select identity(int,1,1) as rownum,table_name into table_count from information_schema.tables where table_type='base table'
declare @count int,@stmt varchar (255),@maxcount int,@tblname varchar(50)
set @count=1
select @maxcount=count(*) from table_count
while @count < @maxcount+1
begin 
      select @tblname=table_name from table_count where rownum=@count
      set @stmt = 'UPDATE STATISTICS '+ '[' +@tblname+ ']' + ' WITH FULLSCAN'
      PRINT ('Updating statistics for table :'+@tblname)
      EXEC(@stmt)
      PRINT ('Finished Updating statistics for table :'+@tblname)
      print ''
      set @count=@count+1
      set @stmt=''
End
drop table table_count

Some tips on Statistics

1. Table variables do not have statistics at all.

Table variables are meant for operations on a small number of rows, a few thousand rows at max. This is a good scenario where you need to think about temporary tables (#tbl), because unlike table variables, temp tables can have indexes created on them, which means they can have statistics.

2. Multi-Statement Table Value Functions (TVF’s) also do not have any statistics

So if you have a complex query logic implemented in a function in SQL Server, think again! This function does not have any statistical information present, so the SQL optimizer must guess the size of the results returned. The reason for this is a multi-statement TVF returns you a TABLE as an output and table does not have any statistics on it.

3. You can find out from the Execution Plan aka SET STATISTICS PROFILE statement if any statistics would help a particular query

When you enable STATISTICS PROFILE ON and execute any query/batch it displays the execution plan. In this output look for the column called “Warnings”. During the course of compiling the plan, if the SQL Server optimizer felt that some statistics on column A would have helped the query, it displays this warning in the execution plan as “NO STATS”. If you see any such warning, consider creating some column statistics or indexes on the particular object in the row.

4. Avoid creating indexes on very frequently updated columns as the statistics also will have to keep up with the amount of data modifications.

5. Viewing Statistics

You can use DBCC SHOW_STATISICS (‘tablename’ , ‘index name’) to view the statistics on any given index’/column stats along with the histogram. The system DMV sys.stats stores information on each statistics available in a particular database.

Any statistics having the name prefixed as _WA_Sys_ is a auto-created statistics, which means SQL Server itself created them. User created statistics will have a given name or have the index name, e.g. PK_TBL1

Posted in Performance | Tagged: , , , , , , , , | Leave a Comment »

How to prevent users from accessing SQL Server from any application or any login expect your main application & its login

Posted by Sudarshan Narasimhan on April 7, 2012


I heard of these interesting requirements when working with one of my customers. If I were to break it down, there are 2 types of scenarios where we want to restrict access to SQL Server.

  1. Only the main application which uses a SQL Login should be able to connect to the SQL Server. None of the Windows Logins or any other login should be able to connect the SQL database. In order words, the SQL database should be accessed only by the Application Login and no one else.
  2. Access to SQL Server and the database should be only through a custom application and no other application including SQL Server Management Studio, SQLCMD etc. should allow users to access the database.

Solution for Scenario #1

The easiest and secure way of preventing other logins from accessing SQL Server is through Logon Triggers. You might wonder why this is an issue, when only required logins are added in SQL Server. Think of a scenario where the database is deployed on client machines and obviously the client machine means the user will be an Administrator on his own machine. A perfect example of this is SQL Server Express databases.

You can use Logon Triggers to prevent Windows logins or other logins from accessing SQL Server. Logon triggers are fired every time a new connection is established to SQL Server. Just like regular triggers we can perform a ROLLBACK which will roll back the connection if the login is not your application login.

For this example assume that your application has a login called “MyApplicationUser” and this is the only login that should have access (even the local Administrator must not be able to access SQL Server).

Here is the T-SQL code to implement such a requirement

--Step1: Create Audit Table
USE master
GO
CREATE TABLE dbo.loginAuditTable (
    id INT IDENTITY PRIMARY KEY,
    data XML,
    program_name nvarchar(128)
)
GO

--Step2: Create VIEW To read XML Audit Data in user-friendly format
use master
go
CREATE VIEW dbo.loginAuditView
AS
SELECT id
      ,data.value('(/EVENT_INSTANCE/EventType)[1]', 'sysname') AS EventType
      ,data.value('(/EVENT_INSTANCE/PostTime)[1]', 'datetime') AS PostTime
      ,data.value('(/EVENT_INSTANCE/SPID)[1]', 'int') AS SPID
      ,data.value('(/EVENT_INSTANCE/ServerName)[1]', 'nvarchar(257)') AS ServerName
      ,data.value('(/EVENT_INSTANCE/LoginName)[1]', 'sysname') AS LoginName
      ,data.value('(/EVENT_INSTANCE/LoginType)[1]', 'sysname') AS LoginType
      ,data.value('(/EVENT_INSTANCE/ClientHost)[1]', 'sysname') AS ClientHostName
      ,data.value('(/EVENT_INSTANCE/IsPooled)[1]', 'bit') AS IsPooled
      ,program_name
      ,data.value('(/EVENT_INSTANCE/SID)[1]', 'nvarchar(85)') AS SID
FROM master.dbo.loginAuditTable
GO

--Step3: Create Logon Trigger to block all users expect SA & MyApplicationUser
IF EXISTS(
    SELECT * FROM master.sys.server_triggers
    WHERE parent_class_desc = 'SERVER' AND name = N'Allow_only_Application_Login_Trigger')
DROP TRIGGER [Allow_only_Application_Login_Trigger] ON ALL SERVER
GO
 
CREATE TRIGGER Allow_only_Application_Login_Trigger
ON ALL SERVER WITH EXECUTE AS 'sa'
FOR LOGON
AS
BEGIN

DECLARE @data XML
SET @data = EVENTDATA()
 
DECLARE @AppName sysname
       ,@LoginName sysname
       ,@LoginType sysname
       ,@LoginDomain sysname
       ,@HostName sysname
SELECT @AppName = [program_name]
FROM sys.dm_exec_sessions
WHERE session_id = @data.value('(/EVENT_INSTANCE/SPID)[1]', 'int')

SELECT @LoginName = @data.value('(/EVENT_INSTANCE/LoginName)[1]', 'sysname')
      --,@LoginDomain = @data.value('(/EVENT_INSTANCE/LoginDomain)[1]', 'sysname')
      ,@LoginType = @data.value('(/EVENT_INSTANCE/LoginType)[1]', 'sysname')
      ,@HostName = @data.value('(/EVENT_INSTANCE/ClientHost)[1]', 'sysname')

IF @LoginName not in ('MyApplicationUser','sa') 
    BEGIN
        ROLLBACK; --Disconnect the session
        --Log the exception to our Auditing table
        INSERT INTO master.dbo.loginAuditTable(data, program_name)
        VALUES(@data, @AppName)
    END 
END;

 

When a user who isn’t in the logon trigger exception list above tries to connect, they will get this error.

TITLE: Connect to Database Engine

——————————

Cannot connect to SERVERNAME.

——————————

ADDITIONAL INFORMATION:

Logon failed for login ‘Domain\TestUser1’ due to trigger execution.

Changed database context to ‘master’.

Changed language setting to us_english.

(Microsoft SQL Server, Error: 17892)

Every time a user who isn’t SA or MyApplicationUser tries to connect to SQL Server and failed with error given above, you will see the following information in the audit table including the login name, client machine name, application name etc.

image


Note:

Be careful when creating logon triggers as any mistake/bug in your code logic, will block you and then you wont be able to connect back in again. If you are in such a situation, then you can use the Dedicated Admin Connection (DAC) to make a connection and then disable/drop the logon trigger. Connections via the DAC don’t fire login triggers and there can be at max 1 DAC connection for an entire instance. DAC can enabled from sp_configure.

EXEC sp_configure 'remote admin connections', 1;
GO
RECONFIGURE;

As a best practice, make sure you have DAC enabled on all your SQL instances, because it is a life-saver in situations like this Smile. To make a DAC connection use the -A switch, sqlcmd -A -D master or Admin:ServerName from SSMS.

Solution for Scenario #2

Scenario #2 is a little more interesting and more complicated. Basically, the ask is to prevent all database access except from 1 specific client application. So even tools like SSMS, SQLCMD, Profiler etc. should not be able to connect to SQL Server.

We can still implement a reasonable amount of security using Logon Triggers, but it is not 100% fool-proof. With the same logon trigger code given above, we need to use the program_name from sys.dm_exec_sessions to check if this program is allowed to establish a connection to SQL Server.

--Step3: Create Logon Trigger to block all users expect SA & MyApplicationUser
IF EXISTS(
    SELECT * FROM master.sys.server_triggers
    WHERE parent_class_desc = 'SERVER' AND name = N'Allow_only_Application_Login_Trigger')
DROP TRIGGER [Allow_only_Application_Login_Trigger] ON ALL SERVER
GO
 
CREATE TRIGGER Allow_only_Application_Login_Trigger
ON ALL SERVER WITH EXECUTE AS 'sa'
FOR LOGON
AS
BEGIN

DECLARE @data XML
SET @data = EVENTDATA()
 
DECLARE @AppName sysname
       ,@LoginName sysname
       ,@LoginType sysname
       ,@HostName sysname

SELECT @AppName = [program_name]
FROM sys.dm_exec_sessions
WHERE session_id = @data.value('(/EVENT_INSTANCE/SPID)[1]', 'int')

SELECT @LoginName = @data.value('(/EVENT_INSTANCE/LoginName)[1]', 'sysname')
      ,@LoginType = @data.value('(/EVENT_INSTANCE/LoginType)[1]', 'sysname')
      ,@HostName = @data.value('(/EVENT_INSTANCE/ClientHost)[1]', 'sysname')

IF @AppName not in ('MyProductionApplicationName1', 'MyReportingApplicationName2', 'Microsoft SQL Server Management Studio', 'Microsoft SQL Server Management Studio - Query') 
    BEGIN
        ROLLBACK; --Disconnect the session
        
        --Log the exception to our Auditing table
        INSERT INTO master.dbo.loginAuditTable(data, program_name)
        VALUES(@data, @AppName)
    END 
END;

 

This is the output from the Auditing table to track the applications and users who tried to connect to SQL Server.

image


The catch with this method is the the program_name or application name is not really a fool-proof method, since it is very much possible for someone to provide the correct application name in the connection string and get connected to SQL Server. There is a property called ApplicationName which can be set in the connection string. So if the application name is known, anyone can write a program to provide that name in the connection string and gain access to SQL Server.



E.g.

OLEDB Connection String

Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=master;Data Source=ProdServer1;Application Name=MyProductionApplicationName1

SQL Native Client Connection String

Integrated Security=SSPI;Persist Security Info=False;User ID="";Initial Catalog=master;Data Source=ProdServer1;Application Name=MyReportingApplicationName2

If we use the [Application Name] property in the connection string, then even if we have logon triggers, it will allow the user to connect to SQL Server, since we are just matching the application name which is specified in the connection string.

A more secure approach you could use for this is called "Application Roles". When connecting from an application you assume a particular role and only that role is granted privileges required in the database. So all apps connect via this mechanism and don’t give out SQL or NT logins for any unauthorised use. You will have to call sp_setapprole in your application with a password, once the connection is established. Once this call succeeds then the connection gets the privileges of the application role and loses privileges of the actual user, which is what we want. So if someone tried to connect to the database from SSMS or SQLCMD, they will access the DB using their credentials, which won’t have the required permissions on the tables, since only the application role has the rights on the tables. This is more secure & reliable approach, but one that requires application code change and as a DBA you will have to create the application role in SQL Server.

The steps to create application roles are given in Books Online.

Posted in Security | Tagged: , , , , , , , , | 12 Comments »

SQL Server 2012 – Released and available for public download

Posted by Sudarshan Narasimhan on April 3, 2012


SQL Server 2012 codenamed “Denali” is now ready for download and available for public consumption. Please check out the following blog post that talks about the general release.

http://blogs.technet.com/b/dataplatforminsider/archive/2012/04/02/sql-server-2012-is-generally-available.aspx

SQL 2012 is specifically targeted at Big Data and High Availability, including a powerful set of feature-rich BI tools like PowerView. Some of the new engine features are :-

  1. SQL Server AlwaysON
  2. Contained Databases
  3. Column Store Indexes
  4. User-Defined Server Roles
  5. Big Data (Hadoop)


Again, Books Online is your best resource to read up on the new release and to understand the new set of features.

You can download the evaluation/trial version of SQL 2012 here,

http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx

http://www.microsoft.com/download/en/details.aspx?id=29066

Posted in SQL News | Tagged: , , , , , , , | Leave a Comment »