Last Wednesday night was the 3rd meeting of the new MagicPASS SQL Server group. MagicPASS is the south Orlando chapter of PASS that is currently being run by Kendal vanDyke. They meet the third Wednesday of each month at Stetson University starting at about 5 PM.
Why start so early? So they can have two full sessions in one night. The first session this time was by Andy Warren who gave a very detailed and technical presentation on Indexes. Some of the more interesting things he covered included:
There are really 4 types of indexes in SQL: Clustered, Non-clustered, XML, and Spatial. Ok, that was sort of a trick question because most of us do not deal with Spatial data or even XML.
A table can only 1 clustered index because it defines the actual order of the table rows. However SQL 2005 supports up to 249 non-clustered indexes and SQL 2008 supports up to 999.
However, what you might not know is that each of the non-clustered indexes has an additional hidden column that points back to the clustered index value. Thus each clustered index field is repeated in each of the non-clustered indexes. This has several implications.
First, the combination of the non-clustered fields and the clustered fields could form a covering index for some of your queries.
Second, the larger (in bytes) you clustered index, the more space all of your non-clustered indexes will use because the values have to be repeated. This is one argument for using a unique identifier field rather than a long character or multi-column field as the clustered index key.
If your table does not have a clustered index, SQL Server generates a 8-byte RID which combines a reference to the file, page, and slot on the page where the data can be found. However, since a unique identifier can be 4-bytes, using it instead can save you space if you have several non-clustered indexes.
Why is the size of the non-clustered index important? Or any index for that matter? It is a trade-off between reading in as many unique index values as possible from each page read vs creating covered indexes and even the question of modifying the page fill factor so that inserts and even updates do not require splitting of pages too often.
Finally on the topic of index fragmentation, fragmentation can be bad if your TSQL execution plans rely on Index Scans to find data in the index rather than Index Seeks. Having too high of a fill factor will require more frequent page splits to accommodate new data inserts. Having too low of a fill factor will result in more page reads and larger indexes.
What is the best answer? It depends on how your plan to use your tables. This is why it is good to get familiar with reading execution plans and using Profiler to see how users are working with the databases before you begin to fine-tune your performance.
The second presentation was by Brian Mitchell of Microsoft on Making Business Intelligence with Data Warehousing Easier by using Microsoft Appliances and Fast Track Architectures.
He basically covered three topics beginning with the HP Business Decision Appliance. The advantage of this option is that with one SKU, a user can purchase both the hardware and software needed to create a BI solution for their company. The hardware comes with PowerPivot, SharePoint, and SQL Server pre-loaded and configured on a HP Proliant G7 Server. Once delivered, you can have your first BI solution up and running within a few days.
The second option is the SQL Server 2008 Fast Track Data Warehouse. This is more of a how-to guide telling you what pieces you need and how to put it together and configure the various parts to create your BI solution. This solution reduces the ‘pain’ of having to research what you need to buy and how to configure it.
The last option was the SQL Server 2008 R2 Parallel Data Warehouse. This is a solution for the high end users and ships like an appliance. However, the cost is about $2 million. The advantage of course is that loads and operations are highly parallel optimized and therefore execute significantly faster increasing the scale and reducing the execution time of large database and BI solutions.
Anyway, that was this month. Next month’s speaker is Jeff Garbus who will also dive deep into working with indexes and will dispel some of the myths and rules-of-thumb you may have been using.