Here is a summary of today’s session.
The Goal
- To be able to write 90% of your own queries
- Not to be a DBA
- All of our topics apply well to MS SQL Server (TSQL), MySQL, Oracle PSQL, Access, DB2, PostgreSQL
- I will reference the AdventureWorks2012 database on MS SQL Server in this series of blog posts.
Brief History
- http://en.wikipedia.org/wiki/SQL
- Developed at IBM in early 70s
- SEQUEL became SQL because of trademark conflict with an aircraft company
- Oracle sold first commercial implementation in 1979
- Similar across vendors, but each has their quirks
- Standardized in 1986
- The language has not changed significantly
- Database designs are good at reading, writing, and reporting, but not all at once. Most databases are part of a transactional system, like an e-commerce site, and are not optimized for reporting. We have to deal with that and will highlight techniques to help.
Think About Sets
- Think about sets of data. Do not think about discreet data entries.
- SQL is very good at operating on sets of data and terrible at row-by-row operations.
- Set Theory from your school days
- Union – joining two sets of data to make a bigger result set
- Intersection – just the common items of two sets of data
- Complement – subtract a subset from a larger set
Using Management Studio
- View Databases, tables, columns in the Object Explorer on the left
- New Query window
- Click “New Query” in toolbar. You can have many.
- Select database via dropdown in toolbar
- Hover over Query window tab to verify its database in the tooltip that appears
- F5 to Execute
- If text selected, will only execute selected text
- All text will be executed if none selected
- Fastest developers rarely use a mouse
- Parse – “check” button on toolbar
- Stop
- If your query is running longer than a couple of seconds, make sure you know what you are doing.
- If your query is taking longer than 10 seconds, really make sure you know what you are doing.
- If your query is taking 30 seconds or more, stop it and seek help.
- Statistics in lower right corner: execution time and number of rows
- Don’t be afraid to explore
Tables, Columns, and Data
- Table is like your favorite spreadsheet
- rows and columns
- similar types of data per column
- usually a unique item in leftmost column
- Common column data types
- int – a number with no decimals
- char(#)
- Specifies max length
- Reserves all space
- nvarchar(#) and varchar(#)
- Specifies max length
- Simply put, only uses needed space
- nvarchar allows multi-byte characters (oriential characters)
- datetime – pretty obvious
- money – number with decimals, not affected by floating point precision issues (repeating 9s)
- uniqueidentifier – an ugly string of “random” characters that is guaranteed unique upon creation
- float – number with decimals, lots of digits, might show “repeating 9s”
- bit – true or false
- Vast majority will and should have a Primary Key (PK), but not required
- yellow key icon in Object Explorer
- usually an int (nicer for humans) or uniqueidentifier
- NULL values, nullable columns
- The absence of value
- Not appropriate to give it meaning
- Frequently debated topic on how to use within a database design
SELECT … FROM
- *
- returns all columns
-
SELECT * FROM sales.currency
-
SELECT * FROM sales.salesorderheader
(Ouch!) Don’t do this unless you know the table is not too big. The orders table could have a lot of rows and you are just locking things up by returning tons of data you can’t process as a human.
- COUNT(*) – an aggregate function
- Use COUNT to find out how many rows are in a table before doing something as wild as SELECT *
-
SELECT COUNT(*) FROM sales.salesorderheader
- TOP #
- This returns a specified number of rows. Great way to look at some data without getting all of the data.
- Safest to assume the server is randomly choosing which rows unless you specify more in your query.
-
SELECT top 10 * FROM sales.salesorderheader
- Specifying columns
- column1, column2, *
- AS – column alias, rename a column, used for convenience and can be very useful with more complex queries. Using the AS keyword while specifying a column alias is optional. In fact Oracle does not support it.
- Use single quotes or square brackets around your alias if spaces are desired.
- Pro tip: When exploring data, you may want to see all columns using the *, but you want to move particular columns to the left, specify those columns and add the * also.
-
SELECT TOP 10 salesorderid AS id, orderdate AS date, status, territoryid, totaldue, * FROM sales.salesorderheader
- DISTINCT
- Returns unique values. This is a good tool to see what are the possible values for a column, like a status column.
-
SELECT DISTINCT status, territoryid FROM sales.salesorderheader
The 3 C’s
I attribute this to a friend and long time developer, Jim Mischel. He would coach budding developers on the 3 C’s’:
- Conciseness – do what is needed to get the job done and no more
- Completeness – make sure it fully works
- Clarity – make it easy to understand later
I coach developers heavily on Clarity. We are writing code here. The #1 audience for any code you write is the next developer that has to read your code and make sense of it to fix a bug or add a feature. You will save yourself and your fellow developers a lot of time if you format your query for readability. Make it a habit and complain when it is not clear. Follow the conventions I use and your SQL will look like most everybody else’s. All but the very simplest of queries will span multiple indented lines.
Comments
- /* … */
- – – (to end of line)
-
SELECT TOP 100 * /* this will be ignored */ FROM sales.salesorderheader
-
SELECT TOP 100 * -- this will also be ignored FROM sales.salesorderheader
-
SELECT TOP 100 * -- and this, too, will be ignored FROM sales.salesorderheader
ORDER BY
- Column name
- Column number
- ASC for ascending order, this is the default
- DESC for descending order
-
SELECT top 100 * FROM sales.salesorderheader ORDER BY salesorderid DESC
-
SELECT countryregioncode, name FROM sales.salesterritory ORDER BY countryregioncode DESC, name
-
SELECT countryregioncode, name FROM sales.salesterritory ORDER BY 1 DESC, 2
WHERE
- =, >, <, >=, <=, <>, !=
- <> and != mean the same: “not equal”
-
SELECT * FROM sales.salesorderheader WHERE orderdate > '7/1/2008'
-
SELECT * FROM sales.salesorderheader WHERE salesorderid > 74185
- Note: ‘7/1/2008’ implies a time of midnight
- AND
-
SELECT * FROM sales.salesorderheader WHERE salesorderid > 74185 AND salesorderid != 74188
-
- BETWEEN … AND (inclusive)
-
SELECT * FROM sales.salesorderheader WHERE orderdate BETWEEN '7/1/2008' AND '7/2/2008'
-
GROUP BY and Aggregate Functions
- AVG/SUM/MIN/MAX/COUNT and others
- Do a web search similar “msdn sql avg” to get quick access to reference documentation
-
SELECT status, AVG(totaldue) FROM sales.salesorderheader WHERE orderdate BETWEEN '7/1/2008' AND '7/3/2008' GROUP BY status
Homework
- For orders in July of 2008, list the territory IDs, average total sale and number of sales per territory ID, sorted by best to worst sales