Sunday, December 21, 2008

Reducing the Size of your Database in SQL Server 2005/SP2

An exciting new feature in SQL Server 2005/SP2 is Vardecimal Storage Format. This storage format lets you reduce the size of your table significantly if the table has one of more columns of type decimal or numeric without requiring any changes to your application.


Up until now, the decimal and numeric types are stored as fixed length data in SQL Server. Both of these types are functionally equivalent and have a format of (p, s) where p is the precision (number of decimal digits) and s is the scale representing number of digits after the decimal. Depending on the precision (it can be declared in the range from 1 to 38), the decimal value can take anywhere from 5 bytes to 17 bytes. This can bloat the size of the table, especially when you have small decimal values for a column declared with high precision requirement. This issue is similar to char (17) vs. varchar(17). In this case, if most of your character data is 1 or 2 characters long but the max value is 17 characters long, you can reduce the size of the table by declaring the column to be of type varchar(17) instead of char(17).


The new vardecimal storage format stores the decimal/numeric values in a variable length storage format. It provides efficient storage of decimal/numeric data by eliminating the leading/trailing zeros and only storing the minimum required bytes. Using this format, you can get significant space savings (depending on your data distribution) in the space required to store decimal/numeric data. You can enable vardecimal storage format at a table level.


In our in-house testing, we have seen significant reduction in the size of the FACT table(s) that has large number of decimal columns. FACT tables are typically the largest table in a Data Warehouse. Here are some the numbers from our testing.



Best case reduction in the size of the table

57%

69%

51%


Estimating the space savings with vardecimal storage format

Before enabling Vardecimal storage format, you may want to know the potential reduction in the size of the table. Clearly, if the table has no decimal/numeric columns, there will be no savings. Note, that even if you have a table with decimal/numeric column types, there is no guarantee that you will be able to reduce the size of the table by enabling Vardecimal storage format. Again, this issue is similar to VARCHAR (17) vs. CHAR(17). If all the values in the column type has 17 characters, then average row length will be larger with VARCHAR(17) because it will be stored in the variable portion of the record structure. Recall, you need 2 bytes to store the offset of the variable length column. Also, if VARCAHR(17) is the only variable length column in the table, there is another overhead of 2 bytes to store number of variable length columns in the row. So in this case, the worst case, declaring column type as VARCAHR(17) may cost you 4 bytes more for each row than CHAR(17).



SQL Server 2005/SP2 provides you a tool, a stored procedure, to estimate the ‘reduction in row size’ with Vardecimal storage format. The following example illustrates the reduction in row size for two tables that have same scheme but different data, t_decimal being the best case and t_decimal2 being the worst case (where each decimal value has max 38 digits as allowed by the declared precision)



create table t_decimal (c1 int, c2 decimal(10,2), c3 decimal (38,2), c4 varchar(10))

go



create table t_decimal2 (c1 int, c2 decimal(10,2), c3 decimal (38,2), c4 varchar(10))

go



-- insert rows into these tables.

declare @i int

select @i = 0

while (@i < 1000)

begin

insert into t_decimal values (1, 0.0,0.0, 'hello')

insert into t_decimal2 values

(1,12345678.99,123456789012345678901234567890123499.99, 'hello')

set @i = @i + 1

end



-- Now let us find the potential space savings for each of these tables

-- This is the best case

exec sys.sp_estimated_rowsize_reduction_for_vardecimal 't_decimal'



Here is the output. Note, in this case, you can reduce the size of the row by almost 50%. Also, if you have more decimal/numeric columns, the savings will be proportionally larger.



avg_rowlen_fixed_format avg_rowlen_vardecimal_format row_count

--------------------------------------- --------------------------

46.00 24.00 1000







-- this is worst case. Note in this case, the average row length actually increases

-- with Vardecimal storage format.

--

exec sys.sp_estimated_rowsize_reduction_for_vardecimal 't_decimal2'



avg_rowlen_fixed_format avg_rowlen_vardecimal_format row_count

------------------------- ---------- ---------------- ------
46.00 48.00 1000

No comments: