Stephen Balkum

Agile Enthusiast, Leader, and Developer

So You Want to Learn SQL… Session 4

no comment

Homework Answers

Generate a list of products that have a bill of material, stating each subcomponent and the quantity used. You should use the Production.BillOfMaterials table and create a human readable phrase. You might find the STR function useful.

SELECT 'A ' + p.Name + ' contains' + STR(bom.PerAssemblyQty) + ' ' + c.Name
	FROM Production.Product p
	JOIN Production.BillOfMaterials bom ON bom.ProductAssemblyID = p.ProductID
	JOIN Production.Product c ON c.ProductID = bom.ComponentID
	ORDER BY 1

Date Functions

GETDATE()

This first function is self-explanatory. It gets the time right now.

SELECT GETDATE()

DATEDIFF(datepart, startdate, enddate)

This function returns the difference between two dates. Returned value is an integer. Here are some useful dateparts are:
datepart year quarter month day week hour minute second
Abbreviations yy, yyyy qq, q mm, m dd, d wk, ww hh mi, n ss, s

SELECT	DATEDIFF(dd, '8/1/2013', '8/10/2013') AS 'days',
	DATEDIFF(dd, '8/10/2013', '8/1/2013') AS 'negative difference',
	DATEDIFF(hh, '8/1/2013 3:00', '8/2/2013 1:00') AS 'hours',
	DATEDIFF(m, 0, '8/1/2013') AS 'since 1/1/1900'

DATEADD(datepart, number, date)

This function adds the specified number of dateparts to the given date and returns the new date. The dateparts are the same as DATEDIFF.

SELECT	DATEADD(dd, 1, '8/1/2013') AS '1 day forward',
	DATEADD(dd, -1, '3/1/2012') AS 'Leap year known',
	DATEADD(yyyy, 1, '8/12/2013 13:30') AS 'keeps time',
	DATEADD(mm, 1363, 0) AS 'since 1/1/1900'

Combine Functions

Frequently, we want to bucket dates. For example, to count orders per day:

SELECT DATEADD(DD, DATEDIFF(DD, 0, o.OrderDate), 0) AS 'Day', COUNT(*) AS 'Count'
	FROM Sales.SalesOrderHeader o
	WHERE o.OrderDate > '7/1/2008'
	GROUP BY DATEADD(DD, DATEDIFF(DD, 0, o.OrderDate), 0)
	ORDER BY Day

Notice, you still have to GROUP BY any column not involved in an aggregate function, regardless of its complexity.

UNION ALL

To produce a single result set that is a combination of two queries, use UNION ALL. All of the columns of the two queries must have exactly the same data types. Here is a single list of all the address types and phone number types we have in our database.

SELECT a.AddressTypeID AS ID, a.Name, 'Address' AS Type
	FROM Person.AddressType a
UNION ALL
SELECT p.PhoneNumberTypeID, p.Name, 'Phone'
	FROM Person.PhoneNumberType p
ORDER BY Name

Subqueries

A subquery, a query within a query, can be used almost anywhere to access data in a more useful way. Useful can be a different structure for joining or it could be for performance purposes.
Let’s take a look at this year’s high value orders product 863. A high value order containing product 863 is >=$40,000. So, we start with our a query using what we already know how to do:

SELECT o.SalesOrderID, o.TotalDue, p.FirstName + ' ' + p.LastName AS CustomerName
	FROM Sales.SalesOrderHeader o
	JOIN Sales.Customer c on c.CustomerID = o.CustomerID
	JOIN Person.Person p ON p.BusinessEntityID = c.PersonID
	JOIN Sales.SalesOrderDetail od ON od.SalesOrderID = o.SalesOrderID
	WHERE o.TotalDue > 40000
		AND od.ProductID = 863
	ORDER BY o.OrderDate

The problem is filtering to product 863 requires the SalesOrderDetail table. If we join to SalesOrderDetail, our rows may expand for any order with multiple line items of that product. We don’t need to display any data from OrderItems, but the data we need to filter on is there.

WHERE EXISTS (subquery)

So, we will add a special WHERE clause that just looks for the existence of data in a query:

SELECT o.SalesOrderID, o.TotalDue, p.FirstName + ' ' + p.LastName AS CustomerName
	FROM Sales.SalesOrderHeader o
	JOIN Sales.Customer c on c.CustomerID = o.CustomerID
	JOIN Person.Person p ON p.BusinessEntityID = c.PersonID
	WHERE o.TotalDue > 40000
		AND EXISTS (SELECT *
				FROM Sales.SalesOrderDetail od
				WHERE od.SalesOrderID = o.SalesOrderID
					AND od.ProductID = 863)
	ORDER BY o.OrderDate

There are two keys takeaways with this query:
1) We didn’t have to do a join to OrderItems and get needless row duplication.
2) The EXISTS check is very fast since it doesn’t really return data, just the existence of rows. Once it finds a single row, it moves on and does not have to find all of the rows.
Now, we can expand our query to show two products: 863 and 854. However, high value orders for product 854 are >= $80,000. We can UNION two queries:

SELECT o.SalesOrderID, o.TotalDue, p.FirstName + ' ' + p.LastName AS CustomerName, 863 AS ProductID
	FROM Sales.SalesOrderHeader o
	JOIN Sales.Customer c on c.CustomerID = o.CustomerID
	JOIN Person.Person p ON p.BusinessEntityID = c.PersonID
	WHERE o.TotalDue > 40000
		AND EXISTS (SELECT *
				FROM Sales.SalesOrderDetail od
				WHERE od.SalesOrderID = o.SalesOrderID
					AND od.ProductID = 863)
UNION ALL
SELECT o.SalesOrderID, o.TotalDue, p.FirstName + ' ' + p.LastName AS CustomerName, 854
	FROM Sales.SalesOrderHeader o
	JOIN Sales.Customer c on c.CustomerID = o.CustomerID
	JOIN Person.Person p ON p.BusinessEntityID = c.PersonID
	WHERE o.TotalDue > 80000
		AND EXISTS (SELECT *
				FROM Sales.SalesOrderDetail od
				WHERE od.SalesOrderID = o.SalesOrderID
					AND od.ProductID = 854)
ORDER BY SalesOrderID

JOINing to a Subquery

With everything we have seen in this session, we can now create a psuedo table and join to it. Our request is for a two-week rolling average for the past four weeks. Since our sample database is all historical data, we will use August 1, 2008, rather than GETDATE().
We will start by creating our time ranges. Here is the current two-week date range:

SELECT	DATEADD(dd, -1, DATEADD(ww, DATEDIFF(ww, 0, '8/1/2008'), 0)) as 'StartDate',
	DATEADD(dd, 13, DATEADD(ww, DATEDIFF(ww, 0, '8/1/2008'), 0)) as 'EndDate'

Add in the previous 4 weeks:

SELECT	DATEADD(dd, -1, DATEADD(ww, DATEDIFF(ww, 0, '8/1/2008'), 0)) as 'StartDate',
	DATEADD(dd, 13, DATEADD(ww, DATEDIFF(ww, 0, '8/1/2008'), 0)) as 'EndDate'
UNION ALL
SELECT	DATEADD(dd, -8, DATEADD(ww, DATEDIFF(ww, 0, '8/1/2008'), 0)),
	DATEADD(dd, 6, DATEADD(ww, DATEDIFF(ww, 0, '8/1/2008'), 0))
UNION ALL
SELECT	DATEADD(dd, -15, DATEADD(ww, DATEDIFF(ww, 0, '8/1/2008'), 0)),
	DATEADD(dd, -1, DATEADD(ww, DATEDIFF(ww, 0, '8/1/2008'), 0))
UNION ALL
SELECT	DATEADD(dd, -22, DATEADD(ww, DATEDIFF(ww, 0, '8/1/2008'), 0)),
	DATEADD(dd, -8, DATEADD(ww, DATEDIFF(ww, 0, '8/1/2008'), 0))

Our base query for AOV and order count:

SELECT	AVG(o.TotalDue) AS AOV, COUNT(*) AS Count
	FROM Sales.SalesOrderHeader o

We definitely don’t want to run this query since it looks at all orders. However, we can join to our query of date ranges to remove unwanted data:

SELECT w.startdate, AVG(o.TotalDue) AS AOV, COUNT(*) AS Count
	FROM Sales.SalesOrderHeader o
	JOIN (
		SELECT	DATEADD(dd, -1, DATEADD(ww, DATEDIFF(ww, 0, '8/1/2008'), 0)) as 'StartDate',
			DATEADD(dd, 13, DATEADD(ww, DATEDIFF(ww, 0, '8/1/2008'), 0)) as 'EndDate'
		UNION ALL
		SELECT	DATEADD(dd, -8, DATEADD(ww, DATEDIFF(ww, 0, '8/1/2008'), 0)),
			DATEADD(dd, 6, DATEADD(ww, DATEDIFF(ww, 0, '8/1/2008'), 0))
		UNION ALL
		SELECT	DATEADD(dd, -15, DATEADD(ww, DATEDIFF(ww, 0, '8/1/2008'), 0)),
			DATEADD(dd, -1, DATEADD(ww, DATEDIFF(ww, 0, '8/1/2008'), 0))
		UNION ALL
		SELECT	DATEADD(dd, -22, DATEADD(ww, DATEDIFF(ww, 0, '8/1/2008'), 0)),
			DATEADD(dd, -8, DATEADD(ww, DATEDIFF(ww, 0, '8/1/2008'), 0))
) W ON W.StartDate <= o.OrderDate AND w.EndDate > o.OrderDate
GROUP BY w.startdate

Notice that a row in Orders will join to multiple rows in the subquery if it belongs to multiple date ranges. This allows the order to be counted in two different two-week rolling averages.

Homework

1) List the two-week rolling total quantity by territory for the current period and 3 prior periods by week. The result set should have three columns: Period Start Date, Territory Name, and Total Quantity.
2) List the total sales for each day of the week by territory for July 2008. The columns in the result set should be: Territory, Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday. Do not display the territory if it had zero sales all week.

Next Session will be just the homework answers as I have exhausted what I consider necessary for the goal of this series.

Comments are closed.