It’s well known that indexes on columns used in where clause and for joins is a good thing in SQL, but what about other places. How about on aggregates?
Consider a simple table with an amount and a customerID. It’s a common requirement to calculate the total amount that each customer has paid. No conditions are enforced, so this would seem like a place where an index won’t help. Well, let’s see. (sample code at end)
The clustered index (and hence the physical order of the rows) is on the identity column.Take the following query.
SELECT CustomerID, SUM(Amount) FROM Payments group by customerID
Without any additional indexes, SQL will execute that as a hash match (aggregate) which comprises 63% of the query’s cost. That is because the data is not ordered by the grouped column, and hence cannot be simply be summed.
Now add an index on CustomerID, Amount and run the same query. Now, the aggregate can be satisfied just with the non clustred index and, more importantly, the data is in order of customerID, so just a stream aggregate is required and the data does not have to be resorted or run through a hashing algorithm. Moreover, the stream aggregate is only 15% of the query’s cost
CREATE TABLE Payments (
PaymentKey INT IDENTITY PRIMARY KEY,
;WITH DataPopulate (RowNo, CustomerID ,InvoiceID ,ADate, Amount) AS (
SELECT 1 AS RowNo, FLOOR(RAND()*40) as CustomerID, FLOOR(RAND()*200) as InvoiceID,
DATEADD(dd, FLOOR(RAND()*75454),'1753/01/01'), rand()*856542 AS Amount
SELECT rowNo+1, FLOOR(RAND(RowNo*85455)*40) AS CustomerID,
FLOOR(RAND(RowNo*85455)*500) AS InvoiceID,
rand(RowNo*25411)*86542 AS Amount
FROM DataPopulate WHERE RowNo<10000
INSERT INTO Payments (CustomerId, InvoiceID, PaymentDate, Amount)
SELECT CustomerID ,InvoiceID ,ADate, Amount FROM DataPopulate
OPTION (MAXRECURSION 10000)
Things borrower must know about Pre Closing Home loan - Most people tend to take a home loan for 15 to 20 years as this usually offers the lowest home loan interest rates. Now to prepay the loan, you have to pa...