Stephen Balkum

Agile Enthusiast, Leader, and Developer

So You Want to Learn SQL… Session 2

no comment

Review

Homework Answer

For orders in July of 2008, list the territory IDs, average total sale and number of sales per territory ID, sorted by best to worst sales

SELECT TerritoryID AS ID, AVG(TotalDue) AS AOV, COUNT(*) AS NumOrders
	FROM sales.SalesOrderHeader
	WHERE --OrderDate > '7/1/2008' AND OrderDate < '2008-8-1'
		OrderDate BETWEEN '7/1/2008' AND '8/1/2008'
	GROUP BY TerritoryID
	ORDER BY AOV DESC

COUNTing

COUNT(*) is very efficient since it has no restrictions and counts every qualifying row.  COUNT(column) does not count NULL values, but otherwise counts every row and pretty fast.  COUNT(DISTINCT column) requires the database engine to track the values it has already seen.  It also does not count NULL values.  This is the least efficient choice, so use it only when needed.

Take a look at the data from this query:

SELECT SalesOrderID, CreditCardID 
	FROM Sales.SalesOrderHeader
	WHERE OrderDate > '7/1/2008'

Now, observe the differences between the variations of COUNT:

SELECT COUNT(*), COUNT(distinct CreditCardID), COUNT(CreditCardID)
	FROM Sales.SalesOrderHeader
	WHERE OrderDate > '7/1/2008'

HAVING

The HAVING clause is the equivalent of WHERE for aggregate functions.  Pesky aggregate functions.

Example: How many orders by shipping method since 8/7/2013:

SELECT TerritoryID, COUNT(*) as NumOrders
	FROM Sales.SalesOrderHeader
	WHERE OrderDate > '7/1/2008'
	GROUP BY TerritoryID

But let’s only look at methods that have a statistically relevant number of orders:

SELECT TerritoryID, COUNT(*) as NumOrders
	FROM Sales.SalesOrderHeader
	WHERE OrderDate > '7/1/2008'
	GROUP BY TerritoryID
	HAVING COUNT(*) > 10

Normalization

Without getting technical, one of the biggest goals in transactional database design is to reduce data duplication.  This means a data value, or collection of values like an address, do not appear more than once.  The amount of data duplication in a database schema is known as Normalization.  This is why tables get split out to more tables and we have dependencies from one table to another.

When we design a table like Sales.SalesOrderHeader, we might put the shipping address in the table.  However, once a customer orders a second time from us we find the same shipping address appearing more than once in our data.  So, we pull the shipping address columns out to a separate table and leave behind a reference ID.

Foreign Keys

This reference ID is called a Foreign Key. It has the same value as the Primary Key in the table it references. A table can have a compound Primary Key which is multiple columns that together form a unique ID for the table. Referencing a table with a compound Primary Key requires all of columns involved in the key, thus a compound Foreign Key.

Take a look at the columns in the Orders table in the Object Explorer and you will see several Foreign Keys (FK).

[INNER] JOIN … ON … [AND]

We join tables together because there is desired related data for our query.  For example, to get the name of the customer for an order, we will have to join from the orders table to the customers table.

  • There multiple types of Joins with the most common, and default, being the Inner Join.
  • Draw the columns of another table into the query by specifying the relationship between the query and the incoming table:
    JOIN JoiningTable ON JoiningTable.PrimaryKey = ReferencingTable.ForeignKey
  • For compound keys, specify all of the relationships separated by AND:
    JOIN JoiningTable ON JoiningTable.PrimaryKey1 = ReferencingTable.ForeignKey1 AND JoiningTable.PrimaryKey2 = ReferencingTable.ForeignKey2
  • Any rows that cannot be matched across the join will be dropped from the result set. This usually happens when the FK column contains NULL.
  • Make a habit now of using table aliases to save typing. Table aliases are required for future complex join structures.

Here is a list of the number of orders by territory:

SELECT t.Name, COUNT(*)
	FROM Sales.SalesOrderHeader o
	JOIN Sales.SalesTerritory t ON t.TerritoryID = o.TerritoryID
	WHERE o.OrderDate > '7/1/2008'
	GROUP BY t.Name

Count of orders by shipping method and territory which requires joining from SalesOrderHeader to SalesTerritory and SalesOrderHeader to ShippingMethod:

SELECT sm.Name, t.Name, COUNT(*)
	FROM Sales.SalesOrderHeader o
	JOIN Sales.SalesTerritory t ON t.TerritoryID = o.TerritoryID
	JOIN Purchasing.ShipMethod sm ON sm.ShipMethodID = o.ShipMethodID
	WHERE o.OrderDate > '7/1/2008'
	GROUP BY t.Name, sm.Name

Count of orders by shipping state which requires joining from SalesOrderHeader to Address and then from Address to StateProvice:

SELECT sp.Name, COUNT(*)
	FROM Sales.SalesOrderHeader o
	JOIN Person.Address a ON a.AddressID = o.ShipToAddressID
	JOIN Person.StateProvince sp ON sp.StateProvinceID = a.StateProvinceID
	WHERE o.OrderDate > '7/1/2008'
	GROUP BY sp.Name

Homework Exercise

List AOV per credit card type and country region code for July, 2008. Drop rows with statistically irrelevant results, i.e. less than 10. Sort by card type, AOV descending, then country.

Comments are closed.