Archive for the ‘TSQL’ Category

h1

SQL Server Indexed Views Barney Style

June 26, 2012

During my first few weeks at a new client I was asked to do some DBA tasks that I don’t normally do.  They were in the process of developing a new application and as a result a legacy DB was being carried over inside a new DB as a series of views instead of physical tables.  While the views rendered the data correctly they query performance went from a matter of less than 10 seconds to more than 10 minutes, and this was just on development data, not a full production cycle.  So with the performance blown out of the water they asked me to do a some research and see what was causing the slow down.

The first thing I did was looked at the tables in the legacy DB to see if there were indexes on the corresponding tables, and as I’m sure you’ve guessed, there were.  As I’m sure you have also deduced the indexes weren’t created in the new DB on the views.

At this point the only thing I really knew about creating indexes on views was that you could.  So I did a little digging and found out what was required to create and indexed view.  Of course I didn’t discover all of this at once, but instead after every hurdle I cleared I was presented another one.  So for your benefit and mine, I’m going to share with you what I’ve learned about indexed views.  As you can tell by the title, this is not an all inclusive dissertation on SQL Server indexed views.  This will be very much an introduction so I hope you get what you need from it.

So the first thing that I tried that didn’t work was scripting out the index from the old table and try adding it to the view.  This brought up the first requirement I found.  I got an error saying the view didn’t have Schema Binding on it.  I had no idea what this meant but I found the MSDN article that listed what Schema Binding was:

SCHEMABINDING

Binds the view to the schema of the underlying table or tables. When SCHEMABINDING is specified, the base table or tables cannot be modified in a way that would affect the view definition. The view definition itself must first be modified or dropped to remove dependencies on the table that is to be modified. When you use SCHEMABINDING, the select_statement must include the two-part names (schema.object) of tables, views, or user-defined functions that are referenced. All referenced objects must be in the same database.

Views or tables that participate in a view created with the SCHEMABINDING clause cannot be dropped unless that view is dropped or changed so that it no longer has schema binding. Otherwise, the Database Engine raises an error. Also, executing ALTER TABLE statements on tables that participate in views that have schema binding fail when these statements affect the view definition.

So I did what was suggested and dropped and created the view with schema binding.  Great the view was created so I was ready to create the index again.  That was of course when the next error popped up.  The underlying tables where in different schemas which isn’t a big deal except, on top of that they had different owners for each schema, and that my friends is not allowed.

At this point my brain actually started to work properly, and in a stroke of, well not genius but somewhere above a drunken stupor and genius, I decided to search out how to create an indexed view instead of figuring out each error as it popped up.  That lead me to this MSDN article.

A view must meet the following requirements before you can create a clustered index on it:

  • The ANSI_NULLS and QUOTED_IDENTIFIER options must have been set to ON when the CREATE VIEW statement was executed. The OBJECTPROPERTY function reports this for views through the ExecIsAnsiNullsOn or ExecIsQuotedIdentOn properties.

  • The ANSI_NULLS option must have been set to ON for the execution of all CREATE TABLE statements that create tables referenced by the view.

  • The view must not reference any other views, only base tables.

  • All base tables referenced by the view must be in the same database as the view and have the same owner as the view.

  • The view must be created with the SCHEMABINDING option. Schema binding binds the view to the schema of the underlying base tables.

  • User-defined functions referenced in the view must have been created with the SCHEMABINDING option.

  • Tables and user-defined functions must be referenced by two-part names in the view. One-part, three-part, and four-part names are not allowed.

  • All functions referenced by expressions in the view must be deterministic. The IsDeterministic property of the OBJECTPROPERTY function reports whether a user-defined function is deterministic….

The list goes on from there, but you get the point.  There is a lot of prep work to do before creating an indexed view.  You might find it hard to go back and create an index on a view if you hadn’t planned it out well in advance.  with that being said here is the basic syntax for creating an indexed view, assuming all other conditions are met.


SET NUMERIC_ROUNDABORT OFF;
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT,
QUOTED_IDENTIFIER, ANSI_NULLS ON;
GO

IF OBJECT_ID ('dbo.vTest', 'view') IS NOT NULL
DROP VIEW dbo.vTest ;
GO
CREATE VIEW dbo.vTest
WITH SCHEMABINDING
AS
SELECT SUM(ColA*ColB) AS Revenue,
ColC, ColD
FROM dbo.TableA AS a, dbo.TableB AS b
WHERE a.ColE = b.ColE
GROUP BY ColC, ColD;
GO

CREATE UNIQUE CLUSTERED INDEX IDX_V1
ON dbo.vTest (ColC, ColD);

I hoped this helped get you on your way to creating indexed view.  For more information please check out these MDSN Articles 1, 2  and 3

Advertisements
h1

Using HASHBYTES() to compare columns

October 15, 2011

Recently, while at a client engagement, I was building some SSIS packages an issue came up where they didn’t want to use the CHECKSUM() function in TSQL to do column comparisons because the results can be inaccurate on some rare occasions.  I personally have never come across this but others here at Pragmatic Works have.  So we have two options freely available to work around this issue.  The first is the third party component plugin that you can get free at codeplex called Multiple Hash.  The client wasn’t comfortable with having to install this component on multiple servers throughout the environment so that option wasn’t available to me.  Instead I had to use the HASHBYTES() function in TSQL

HASHBYTES() is much more reliable than checksum when it comes to producing accurate results, but it comes at a slight cost. 

The first thing to note is how to construct the HASHBYTES() function.  In the first part you tell the function which algorithm you are going to use.  I’m using SHA1, but be aware that they single tics ‘ ‘ are required followed by a comma.  Then you must concatenate the columns you wish to use together as seen below.

 

image

There you can see its already a bit more arduous than using CHECKSUM(), but not that big of a deal to concatenate a bunch of columns.  Lets look at the results.

image

Uh-Oh here is our first problem.  HASHBYTES() doesn’t work with NULL values inside any columns.  So we’ve got to handle the Nulls in our query using the ISNULL() function.

image

Now the results look like this:

image

Then next thing that you have to look at is how HASHBYTES() handles(or more accurately doesn’t handle) Data types.  Here the ID column is an INT data type, but the same holds true for any non-string data type.

image

We get an error saying that the data type is wrong for the HASHBYTES() function

image

So now we have to CAST every column that is a non-string data type.

image

Now after this fix our results look better.

image

So as you can see already there will be a good deal more T-SQL coding involved with using HASHBYTES then with CHECKSUM(). But this isn’t all.  The last little gotcha isn’t quite as obvious as the first two.  Lets go back to our Null handling query.

image

I’ve gone and edited the data some for this example. Please also note that HASHBYTES() is case sensitive meaning that if you have the same spelling but different casing at an individual character level the hash value returned will be different.

image

The rows are different from one another but when concatenated together for the HASBYTES() function they produce the same exact value. So to handle this we have to update our code again.  We are going to add a rarely used character to the concatenation so ensure that they results will return correctly.

image

Here I chose to use a pipe to basically delimit the columns thereby making them different from each other. The result is much better.

image

If we take a quick look at results of the two concatenations we can see why we get the different results

image

The delimited column is obviously different from one row to the next, and the Non-Delimited column is exactly the same for each row.

While HASHBYTES() is definitely more accurate and reliable than using CHECKSUM(), there are some hoops to jump through to insure you get the results you need.  I hope this helps you guys out.