Here is a summary of today’s session.

The Goal

To be able to write 90% of your own queries
Not to be a DBA
All of our topics apply well to MS SQL Server (TSQL), MySQL, Oracle PSQL, Access, DB2, PostgreSQL
I will reference the AdventureWorks2012 database on MS SQL Server in this series of blog posts.

Brief History

http://en.wikipedia.org/wiki/SQL
- Developed at IBM in early 70s
- SEQUEL became SQL because of trademark conflict with an aircraft company
- Oracle sold first commercial implementation in 1979
- Similar across vendors, but each has their quirks
- Standardized in 1986
The language has not changed significantly
Database designs are good at reading, writing, and reporting, but not all at once. Most databases are part of a transactional system, like an e-commerce site, and are not optimized for reporting. We have to deal with that and will highlight techniques to help.

Think About Sets

Think about sets of data. Do not think about discreet data entries.
SQL is very good at operating on sets of data and terrible at row-by-row operations.
Set Theory from your school days
- Union – joining two sets of data to make a bigger result set
- Intersection – just the common items of two sets of data
- Complement – subtract a subset from a larger set

Using Management Studio

View Databases, tables, columns in the Object Explorer on the left

New Query window
- Click “New Query” in toolbar. You can have many.
- Select database via dropdown in toolbar
- Hover over Query window tab to verify its database in the tooltip that appears
- F5 to Execute
  - If text selected, will only execute selected text
  - All text will be executed if none selected
  - Fastest developers rarely use a mouse
- Parse – “check” button on toolbar
- Stop
  - If your query is running longer than a couple of seconds, make sure you know what you are doing.
  - If your query is taking longer than 10 seconds, really make sure you know what you are doing.
  - If your query is taking 30 seconds or more, stop it and seek help.
- Statistics in lower right corner: execution time and number of rows
Don’t be afraid to explore

Tables, Columns, and Data

Table is like your favorite spreadsheet
- rows and columns
- similar types of data per column
- usually a unique item in leftmost column
Common column data types
- int – a number with no decimals
- char(#)
  - Specifies max length
  - Reserves all space
- nvarchar(#) and varchar(#)
  - Specifies max length
  - Simply put, only uses needed space
  - nvarchar allows multi-byte characters (oriential characters)
- datetime – pretty obvious
- money – number with decimals, not affected by floating point precision issues (repeating 9s)
- uniqueidentifier – an ugly string of “random” characters that is guaranteed unique upon creation
- float – number with decimals, lots of digits, might show “repeating 9s”
- bit – true or false
Vast majority will and should have a Primary Key (PK), but not required
- yellow key icon in Object Explorer
- usually an int (nicer for humans) or uniqueidentifier
NULL values, nullable columns
- The absence of value
- Not appropriate to give it meaning
- Frequently debated topic on how to use within a database design

SELECT … FROM

*
- returns all columns
- ```
SELECT * FROM sales.currency
```
- ```
SELECT * FROM sales.salesorderheader
```
  (Ouch!) Don’t do this unless you know the table is not too big. The orders table could have a lot of rows and you are just locking things up by returning tons of data you can’t process as a human.
COUNT(*) – an aggregate function
- Use COUNT to find out how many rows are in a table before doing something as wild as SELECT *
- ```
SELECT COUNT(*) FROM sales.salesorderheader
```
TOP #
- This returns a specified number of rows. Great way to look at some data without getting all of the data.
- Safest to assume the server is randomly choosing which rows unless you specify more in your query.
- ```
SELECT top 10 *
	FROM sales.salesorderheader
```
Specifying columns
- column1, column2, *
- AS – column alias, rename a column, used for convenience and can be very useful with more complex queries. Using the AS keyword while specifying a column alias is optional. In fact Oracle does not support it.
- Use single quotes or square brackets around your alias if spaces are desired.
- Pro tip: When exploring data, you may want to see all columns using the *, but you want to move particular columns to the left, specify those columns and add the * also.
- ```
SELECT TOP 10 salesorderid AS id, orderdate AS date, status, territoryid, totaldue, *
	FROM sales.salesorderheader
```
DISTINCT
- Returns unique values. This is a good tool to see what are the possible values for a column, like a status column.
- ```
SELECT DISTINCT status, territoryid
	FROM sales.salesorderheader
```

The 3 C’s

I attribute this to a friend and long time developer, Jim Mischel. He would coach budding developers on the 3 C’s’:

Conciseness – do what is needed to get the job done and no more
Completeness – make sure it fully works
Clarity – make it easy to understand later

I coach developers heavily on Clarity. We are writing code here. The #1 audience for any code you write is the next developer that has to read your code and make sense of it to fix a bug or add a feature. You will save yourself and your fellow developers a lot of time if you format your query for readability. Make it a habit and complain when it is not clear. Follow the conventions I use and your SQL will look like most everybody else’s. All but the very simplest of queries will span multiple indented lines.

Comments

/* … */
– – (to end of line)

SELECT TOP 100 *
	/* this will be ignored */
	FROM sales.salesorderheader

SELECT TOP 100 *
	-- this will also be ignored
	FROM sales.salesorderheader

SELECT TOP 100 * -- and this, too, will be ignored
	FROM sales.salesorderheader

ORDER BY

Column name
Column number
ASC for ascending order, this is the default
DESC for descending order

SELECT top 100 *
	FROM sales.salesorderheader
	ORDER BY salesorderid DESC

SELECT countryregioncode, name
	FROM sales.salesterritory
	ORDER BY countryregioncode DESC, name

SELECT countryregioncode, name
	FROM sales.salesterritory
	ORDER BY 1 DESC, 2

WHERE

=, >, <, >=, <=, <>, !=

<> and != mean the same: “not equal”

SELECT *
	FROM sales.salesorderheader
	WHERE orderdate > '7/1/2008'

SELECT *
	FROM sales.salesorderheader
	WHERE salesorderid > 74185

Note: ‘7/1/2008’ implies a time of midnight

AND

SELECT *
	FROM sales.salesorderheader
	WHERE salesorderid > 74185
		AND salesorderid != 74188

BETWEEN … AND (inclusive)

SELECT *
	FROM sales.salesorderheader
	WHERE orderdate BETWEEN '7/1/2008' AND '7/2/2008'

GROUP BY and Aggregate Functions

AVG/SUM/MIN/MAX/COUNT and others
Do a web search similar “msdn sql avg” to get quick access to reference documentation

SELECT status, AVG(totaldue)
	FROM sales.salesorderheader
	WHERE orderdate BETWEEN '7/1/2008' AND '7/3/2008'
	GROUP BY status

Homework

For orders in July of 2008, list the territory IDs, average total sale and number of sales per territory ID, sorted by best to worst sales

So You Want to Learn SQL… Session 1