Stephen Balkum

Agile Enthusiast, Leader, and Developer

So You Want to Learn SQL… Session 1

no comment

Here is a summary of today’s session.

The Goal

  • To be able to write 90% of your own queries
  • Not to be a DBA
  • All of our topics apply well to MS SQL Server (TSQL), MySQL, Oracle PSQL, Access, DB2, PostgreSQL
  • I will reference the AdventureWorks2012 database on MS SQL Server in this series of blog posts.

Brief History

  • http://en.wikipedia.org/wiki/SQL
    • Developed at IBM in early 70s
    • SEQUEL became SQL because of trademark conflict with an aircraft company
    • Oracle sold first commercial implementation in 1979
    • Similar across vendors, but each has their quirks
    • Standardized in 1986
  • The language has not changed significantly
  • Database designs are good at reading, writing, and reporting, but not all at once.  Most databases are part of a transactional system, like an e-commerce site, and are not optimized for reporting.  We have to deal with that and will highlight techniques to help.

Think About Sets

  • Think about sets of data.  Do not think about discreet data entries.
  • SQL is very good at operating on sets of data and terrible at row-by-row operations.
  • Set Theory from your school days
    • Union – joining two sets of data to make a bigger result set
    • Intersection – just the common items of two sets of data
    • Complement – subtract a subset from a larger set

Using Management Studio

  • View Databases, tables, columns in the Object Explorer on the left

  • New Query window
    • Click “New Query” in toolbar.  You can have many.
    • Select database via dropdown in toolbar
    • Hover over Query window tab to verify its database in the tooltip that appears
    • F5 to Execute
      • If text selected, will only execute selected text
      • All text will be executed if none selected
      • Fastest developers rarely use a mouse
    • Parse – “check” button on toolbar
    • Stop
      • If your query is running longer than a couple of seconds, make sure you know what you are doing.
      • If your query is taking longer than 10 seconds, really make sure you know what you are doing.
      • If your query is taking 30 seconds or more, stop it and seek help.
    • Statistics in lower right corner: execution time and number of rows
  • Don’t be afraid to explore

Tables, Columns, and Data

  • Table is like your favorite spreadsheet
    • rows and columns
    • similar types of data per column
    • usually a unique item in leftmost column
  • Common column data types
    • int – a number with no decimals
    • char(#)
      • Specifies max length
      • Reserves all space
    • nvarchar(#) and varchar(#)
      • Specifies max length
      • Simply put, only uses needed space
      • nvarchar allows multi-byte characters (oriential characters)
    • datetime – pretty obvious
    • money – number with decimals, not affected by floating point precision issues (repeating 9s)
    • uniqueidentifier – an ugly string of “random” characters that is guaranteed unique upon creation
    • float – number with decimals, lots of digits, might show “repeating 9s”
    • bit – true or false
  • Vast majority will and should have a Primary Key (PK), but not required
    • yellow key icon in Object Explorer
    • usually an int (nicer for humans) or uniqueidentifier
  • NULL values, nullable columns
    • The absence of value
    • Not appropriate to give it meaning
    • Frequently debated topic on how to use within a database design

SELECT … FROM

  • *
    • returns all columns
    • SELECT * FROM sales.currency
    • SELECT * FROM sales.salesorderheader

      (Ouch!)  Don’t do this unless you know the table is not too big.  The orders table could have a lot of rows and you are just locking things up by returning tons of data you can’t process as a human.

  • COUNT(*) – an aggregate function
    • Use COUNT to find out how many rows are in a table before doing something as wild as SELECT *
    • SELECT COUNT(*) FROM sales.salesorderheader
  • TOP #
    • This returns a specified number of rows. Great way to look at some data without getting all of the data.
    • Safest to assume the server is randomly choosing which rows unless you specify more in your query.
    • SELECT top 10 *
      	FROM sales.salesorderheader
  • Specifying columns
    • column1, column2, *
    • AS – column alias, rename a column, used for convenience and can be very useful with more complex queries. Using the AS keyword while specifying a column alias is optional. In fact Oracle does not support it.
    • Use single quotes or square brackets around your alias if spaces are desired.
    • Pro tip: When exploring data, you may want to see all columns using the *, but you want to move particular columns to the left, specify those columns and add the * also.
    • SELECT TOP 10 salesorderid AS id, orderdate AS date, status, territoryid, totaldue, *
      	FROM sales.salesorderheader
  • DISTINCT
    • Returns unique values. This is a good tool to see what are the possible values for a column, like a status column.
    • SELECT DISTINCT status, territoryid
      	FROM sales.salesorderheader

The 3 C’s

I attribute this to a friend and long time developer, Jim Mischel.  He would coach budding developers on the 3 C’s’:

  • Conciseness – do what is needed to get the job done and no more
  • Completeness – make sure it fully works
  • Clarity – make it easy to understand later

I coach developers heavily on Clarity.  We are writing code here.  The #1 audience for any code you write is the next developer that has to read your code and make sense of it to fix a bug or add a feature.  You will save yourself and your fellow developers a lot of time if you format your query for readability.  Make it a habit and complain when it is not clear.  Follow the conventions I use and your SQL will look like most everybody else’s.  All but the very simplest of queries will span multiple indented lines.

Comments

  • /* … */
  • – – (to end of line)
  • SELECT TOP 100 *
    	/* this will be ignored */
    	FROM sales.salesorderheader
  • SELECT TOP 100 *
    	-- this will also be ignored
    	FROM sales.salesorderheader
  • SELECT TOP 100 * -- and this, too, will be ignored
    	FROM sales.salesorderheader

ORDER BY

  • Column name
  • Column number
  • ASC for ascending order, this is the default
  • DESC for descending order
  • SELECT top 100 *
    	FROM sales.salesorderheader
    	ORDER BY salesorderid DESC
  • SELECT countryregioncode, name
    	FROM sales.salesterritory
    	ORDER BY countryregioncode DESC, name
  • SELECT countryregioncode, name
    	FROM sales.salesterritory
    	ORDER BY 1 DESC, 2

WHERE

  • =, >, <, >=, <=, <>, !=
    • <> and != mean the same: “not equal”
    • SELECT *
      	FROM sales.salesorderheader
      	WHERE orderdate > '7/1/2008'
    • SELECT *
      	FROM sales.salesorderheader
      	WHERE salesorderid > 74185
    • Note: ‘7/1/2008’ implies a time of midnight
  • AND
    • SELECT *
      	FROM sales.salesorderheader
      	WHERE salesorderid > 74185
      		AND salesorderid != 74188
  • BETWEEN … AND (inclusive)
    • SELECT *
      	FROM sales.salesorderheader
      	WHERE orderdate BETWEEN '7/1/2008' AND '7/2/2008'

GROUP BY and Aggregate Functions

  • AVG/SUM/MIN/MAX/COUNT and others
  • Do a web search similar “msdn sql avg” to get quick access to reference documentation
  • SELECT status, AVG(totaldue)
    	FROM sales.salesorderheader
    	WHERE orderdate BETWEEN '7/1/2008' AND '7/3/2008'
    	GROUP BY status

Homework

  • For orders in July of 2008, list the territory IDs, average total sale and number of sales per territory ID, sorted by best to worst sales

Comments are closed.