sunnuntai 23. elokuuta 2015

Let’s SELECT errors...

Data problems are often hard to notice when operating with SQL. With other languages like C, C#, Java etc. information is typically investigated and modified item-by-item. In SQL all the operations are set based, meaning that a single statement often deals with several rows. For example a simple SELECT statement investigates all the rows in involved tables, one way or another, and returns the desired results, hopefully...

Syntactical problems are easy to catch, you get an error. But what if your conditions are false, you don’t get an error but the results aren’t what were expected. When creating a program or modifying it, part of the testing is to ensure that the returned results are correct so this way we can deal with the initial situation. However, things get more complicated when the world around changes. Let’s take an example.

The following script creates an OrderItem table. It’s not a gem what comes to database design but sufficient to pinpoint the problem we’re discussing of:

CREATE TABLE OrderItem (
   OrderNumber int,
   Product     varchar(100),
   RowType     int CONSTRAINT ck_orderitem_rowtype 
                   CHECK (RowType IN (1,2)),   
                   -- 1=Included, 2=Excluded
   Amount      int,
   ListPrice   decimal
);

Product is the name of the item to be sold, Amount defines the quantity of items to be purchased and ListPrice is the price of a single item. RowType defines if this item is included in the order (value is one) or was it selected but then removed from the order (value of two).

Now let’s add some data

INSERT INTO OrderItem 
(OrderNumber, Product, RowType, Amount, ListPrice)
VALUES (1, 'Squared tyre', 1, 4, 100.0);

INSERT INTO OrderItem 
(OrderNumber, Product, RowType, Amount, ListPrice)
VALUES (1, 'Oil scent, 100 oz', 2, 1, 50.0);

INSERT INTO OrderItem 
(OrderNumber, Product, RowType, Amount, ListPrice)
VALUES (1, 'Haircut', 1, 1, 20.0);

Now let’s imagine that the application fetches a list of items in the order. In order to show the prices correctly, the SELECT statement needs to take the RowType into account, something like the following:

SELECT CASE oi.RowType 
          WHEN 1 THEN 'Included into order'
          WHEN 2 THEN 'Excluded from order'
       END As Status,
       oi.Product, 
       oi.Amount,
       CASE oi.RowType 
          WHEN 1 THEN oi.Amount * oi.ListPrice 
                      -- Calculate the price
          WHEN 2 THEN 0  -- Excluded so no price
       END As Price
FROM OrderItem oi
WHERE oi.OrderNumber = 1
ORDER BY oi.RowType,
         oi.Product;

The results would be

Status                Product            Amount   Price
-------------------   -----------------  ------   -----
Included into order   Haircut            1        20
Included into order   Squared tyre       4        400
Excluded from order   Oil scent, 100 oz  1        0

Also the application could fetch the items to be delivered, with the following query:

SELECT oi.Product, 
       oi.Amount,
       oi.Amount * oi.ListPrice  AS Price
FROM  OrderItem oi
WHERE oi.OrderNumber = 1
AND   oi.RowType = 1
ORDER BY oi.RowType,
         oi.Product;

And the results would be

Product        Amount   Price
------------   ------   -----
Haircut        1        20
Squared tyre   4        400

So everything is perfect, until…
One day an extra row type is defined and added into the database. It was decided that customers buying lot of items were rewarded with a gift included in the purchase, naturally free of charge. The gift is a bunch of flowers which is also sold separately. Because of this a new RowType 3 was introduced to distinguish free gift items from other items.
The following change was made to the database

ALTER TABLE OrderItem 
DROP CONSTRAINT ck_orderitem_rowtype;

ALTER TABLE OrderItem 
ADD CONSTRAINT ck_orderitem_rowtype 
CHECK (RowType IN (1,2,3));   
-- 1=Included, 2=Excluded, 3=Gift

The order 1 we used as an example had such a big value that it would get a gift so let’s add it

INSERT INTO OrderItem 
(OrderNumber, Product, RowType, Amount, ListPrice)
VALUES (1, 'Bunch of flowers', 3, 1, 5.0);

Now let’s see what happens with our original queries. The first one would return

Status                Product             Amount   Price
-------------------   -----------------   ------   -----
Included into order   Haircut             1        20
Included into order   Squared tyre        4        400
Excluded from order   Oil scent, 100 oz   1        0
NULL                  Bunch of flowers    1        NULL

And the second one

Product        Amount   Price
------------   ------   -----
Haircut        1        20
Squared tyre   4        400

The first one is more obvious you get NULL values in the result set in places where they are not expected so if someone looks carefully this may be noticed.

The second one is more dangerous, since the WHERE clause restricted the rows only to known RowTypes the gift isn’t included in the returned set of rows. Just by looking at the data nothing looks suspicious so you need to know the data behind in order to spot the problem.

If you think about other languages they have different kinds of mechanisms to handle known and unknown values. For example a switch structure has a default block that is fired for all values not listed in case statements. This same idea can be applied to some extent to queries. However, in order to ensure that false data doesn’t pass through we won’t do any default actions but deliberately generate errors.

Consider the following statement. It has a small modification in the CASE structure. RowTypes one and two are handled correctly but if an unknown RowType is spotted the ELSE branch is executed and a division by zero error is thrown.

SELECT CASE oi.RowType 
          WHEN 1 THEN 'Included into order'
          WHEN 2 THEN 'Excluded from order'
       END As Status,
       oi.Product, 
       oi.Amount,
       CASE oi.RowType 
          WHEN 1 THEN oi.Amount * oi.ListPrice
          WHEN 2 THEN 0
          ELSE  0/0
       END As Price
FROM OrderItem oi
WHERE oi.OrderNumber = 1
ORDER BY oi.RowType,
         oi.Product;

When the statement is run you get a message like following

Msg 8134, Level 16, State 1, Line 86
Divide by zero error encountered.

That would prevent a situation where an unknown RowType could cause false data to be returned. But what about the second one, it didn’t have any CASE structure in the SELECT. And even if a similar CASE would be added  to the statement it wouldn’t work since the WHERE clause takes care that no other values but 1 is returned.

Well, why not add the CASE into the WHERE clause:

SELECT oi.Product, 
       oi.Amount,
       oi.Amount * oi.ListPrice  AS Price
FROM OrderItem oi
WHERE oi.OrderNumber = 1
AND   1 = CASE oi.RowType 
             WHEN 1 THEN 1
             WHEN 2 THEN 0
             ELSE  0/0
          END
ORDER BY oi.RowType,
         oi.Product;

When the statement above is run, again, a division by zero error is returned.

So after adding the new RowType and fixing the statements the final versions could look something like these:

SELECT CASE oi.RowType 
          WHEN 1 THEN 'Included into order'
          WHEN 2 THEN 'Excluded from order'
          WHEN 3 THEN 'Gift'
       END As Status,
       oi.Product, 
       oi.Amount,
       CASE oi.RowType 
          WHEN 1 THEN oi.Amount * oi.ListPrice
          WHEN 2 THEN 0
          WHEN 3 THEN 0
          ELSE  0/0
       END As Price
FROM OrderItem oi
WHERE oi.OrderNumber = 1
ORDER BY oi.RowType,
         oi.Product;

And

SELECT oi.Product, 
       oi.Amount,
       CASE oi.RowType
          WHEN 1 THEN oi.Amount * oi.ListPrice  
          WHEN 3 THEN 0
          ELSE 0/0
       END AS Price
FROM OrderItem oi
WHERE oi.OrderNumber = 1
AND   1 = CASE oi.RowType 
             WHEN 1 THEN 1
             WHEN 2 THEN 0
             WHEN 3 THEN 1
             ELSE  0/0
          END
ORDER BY oi.RowType,
         oi.Product;

Next time a new RowType is added both of the statements will react by causing and error and will hopefully get caught already in testing.


So is this the way to always write SELECT statements? No, it isn’t. The modification in the SELECT clause is quite safe and can be used more widely but the second one, a CASE in the WHERE clause is something to be careful with. Complex structures in conditions may lead into a situation where perfectly valid indexes aren’t used because of an expression. So it’s important to always check how the statement is behaving by investigating the execution plan.

Also it’s not feasible to use this kind of structure in every place. But in critical statements it may be justified to add deliberate errors in case of unknown values.

lauantai 6. lokakuuta 2012

Struggling with System.IO.Packaging when creating a CLR procedure

The problem

I started to create a small CLR procedure into Sql  Server 2012 Express Edition which would utilize System.IO.Packaging namespace. In order to use this namespace, WindowsBase must be referenced. After adding the required reference everything seemed to be fine when compiling the assembly.

Now when trying to register the newly created assembly to Sql Server, I got an error message stating:

Msg 10301, Level 16, State 1, Line 1

Assembly 'Test' references assembly 'system.xaml, version=4.0.0.0, culture=neutral, publickeytoken=b77a5c561934e089.', which is not present in the current database. SQL Server attempted to locate and automatically load the referenced assembly from the same location where referring assembly came from, but that operation has failed (reason: 2(The system cannot find the file specified.)). Please load the referenced assembly into the current database and retry your request.

Now, this was surprising, but understandable. So I added System.Xaml to the project references; After all I had no use for the visual interface in my project.

After compiling the project I re-tried to register the assembly. This time I received a different error:

Warning: The Microsoft .NET Framework assembly 'system.xaml, version=4.0.0.0, culture=neutral, publickeytoken=b77a5c561934e089.' you are registering is not fully tested in the SQL Server hosted environment and is not supported. In the future, if you upgrade or service this assembly or the .NET Framework, your CLR integration routine may stop working. Please refer SQL Server Books Online for more details.
Msg 6218, Level 16, State 2, Line 1
CREATE ASSEMBLY for assembly 'Test' failed because assembly 'System.Xaml' failed verification. Check if the referenced assemblies are up-to-date and trusted (for external_access or unsafe) to execute in the database. CLR Verifier error messages if any will follow this message
[ : System.Windows.Markup.ValueSerializer::CanConvertToString][mdToken=0x6000002][offset 0x00000000] Code size is zero.
[ : System.Windows.Markup.ValueSerializer::CanConvertFromString][mdToken=0x6000003][offset 0x00000000] Code size is zero.
[ : System.Windows.Markup.ValueSerializer::ConvertToString][mdToken=0x6000004][offset 0x00000000] Code size is zero.
[ : System.Windows.Markup.ValueSerializer::ConvertFromString][mdToken=0x6000005][offset 0x00000000] Code size is zero.
[ : System.Windows.Markup.ValueSerializer::TypeReferences][mdToken=0x6000006][offset 0x00000000] Code size is zero.
[ : System.Windows.Markup.ValueSerializer::.ctor][mdToken=0x6000001][offset 0x00000000] Code size is zero.
[ : System.Windows.Markup.ValueSerializer::.cctor][mdToken=0x600000d][offset 0x00000000] Code size is zero.
[ : System.Windows.Markup.ValueSerializer::GetSerializerFor][mdToken=0x6000007][offset 0x00000000] Code size is zero.
[ : System.Windows.Markup.ValueSerializer::GetSerializerFor][mdToken=0x6000008][offset 0x00000000] Code size is zero.
[ : System.Windows.Markup.ValueSerializer::GetSerializerFor][mdToken=0x6000009][offset 0x00000000] Code size is zero.
[ : System.Windows.Markup.ValueSerializer::GetSerializerFor][mdToken=0x600000a][offset 0x00000000] Code size is zero.
[ : System.Windows.Markup.ValueSerializer::GetConvertToException][mdToken=0x600000b][offset 0x00000000] Code size is zero.
[ : System.Windows.Markup.ValueSerializer::GetConvertFromException][mdToken=0x600000c][offset 0x00000000] Code size is zero.
[ : System.Windows.Markup.MemberDefinition::.ctor][mdToken=0x6000010][offset 0x00000000] Code size is zero.
[ : System.Windows.Markup.PropertyDefinition::get_Name][mdToken=0x6000011][offset 0x00000000] Code size is...

So after all I wasn't able to register the assembly to Sql Server at all.

What about other versions

Well I tried the same with Sql  Server 2008 R2 with no success. The error message was different and I needed to reference accessibility assembly which wasn't verified against Sql Server.

I also tried this with Sql  Server 2005 and it was quite a big surprise that Sql  Server 2005 seems to accept my assembly.

Conclusions

So it looks like newer versions of Sql Server don't accept WindowsBase reference. While I do understand this since lot's of the classes in this assembly are used in creating an user interface I don't quite understand why some basic functionality such as System.IO.Packaging is included in the same assembly.

Another potential problem is the System.Security.RightsManagement namespace which is included in WindowsBase