Stephen Balkum

Agile Enthusiast, Leader, and Developer

So You Want to Learn SQL… Session 1

no comment

Here is a summary of today’s session.

The Goal

  • To be able to write 90% of your own queries
  • Not to be a DBA
  • All of our topics apply well to MS SQL Server (TSQL), MySQL, Oracle PSQL, Access, DB2, PostgreSQL
  • I will reference the AdventureWorks2012 database on MS SQL Server in this series of blog posts.

Brief History

  • http://en.wikipedia.org/wiki/SQL
    • Developed at IBM in early 70s
    • SEQUEL became SQL because of trademark conflict with an aircraft company
    • Oracle sold first commercial implementation in 1979
    • Similar across vendors, but each has their quirks
    • Standardized in 1986
  • The language has not changed significantly
  • Database designs are good at reading, writing, and reporting, but not all at once.  Most databases are part of a transactional system, like an e-commerce site, and are not optimized for reporting.  We have to deal with that and will highlight techniques to help.

Think About Sets

  • Think about sets of data.  Do not think about discreet data entries.
  • SQL is very good at operating on sets of data and terrible at row-by-row operations.
  • Set Theory from your school days
    • Union – joining two sets of data to make a bigger result set
    • Intersection – just the common items of two sets of data
    • Complement – subtract a subset from a larger set

Using Management Studio

  • View Databases, tables, columns in the Object Explorer on the left

  • New Query window
    • Click “New Query” in toolbar.  You can have many.
    • Select database via dropdown in toolbar
    • Hover over Query window tab to verify its database in the tooltip that appears
    • F5 to Execute
      • If text selected, will only execute selected text
      • All text will be executed if none selected
      • Fastest developers rarely use a mouse
    • Parse – “check” button on toolbar
    • Stop
      • If your query is running longer than a couple of seconds, make sure you know what you are doing.
      • If your query is taking longer than 10 seconds, really make sure you know what you are doing.
      • If your query is taking 30 seconds or more, stop it and seek help.
    • Statistics in lower right corner: execution time and number of rows
  • Don’t be afraid to explore

Tables, Columns, and Data

  • Table is like your favorite spreadsheet
    • rows and columns
    • similar types of data per column
    • usually a unique item in leftmost column
  • Common column data types
    • int – a number with no decimals
    • char(#)
      • Specifies max length
      • Reserves all space
    • nvarchar(#) and varchar(#)
      • Specifies max length
      • Simply put, only uses needed space
      • nvarchar allows multi-byte characters (oriential characters)
    • datetime – pretty obvious
    • money – number with decimals, not affected by floating point precision issues (repeating 9s)
    • uniqueidentifier – an ugly string of “random” characters that is guaranteed unique upon creation
    • float – number with decimals, lots of digits, might show “repeating 9s”
    • bit – true or false
  • Vast majority will and should have a Primary Key (PK), but not required
    • yellow key icon in Object Explorer
    • usually an int (nicer for humans) or uniqueidentifier
  • NULL values, nullable columns
    • The absence of value
    • Not appropriate to give it meaning
    • Frequently debated topic on how to use within a database design

SELECT … FROM

  • *
    • returns all columns
    • SELECT * FROM sales.currency
    • SELECT * FROM sales.salesorderheader

      (Ouch!)  Don’t do this unless you know the table is not too big.  The orders table could have a lot of rows and you are just locking things up by returning tons of data you can’t process as a human.

  • COUNT(*) – an aggregate function
    • Use COUNT to find out how many rows are in a table before doing something as wild as SELECT *
    • SELECT COUNT(*) FROM sales.salesorderheader
  • TOP #
    • This returns a specified number of rows. Great way to look at some data without getting all of the data.
    • Safest to assume the server is randomly choosing which rows unless you specify more in your query.
    • SELECT top 10 *
      	FROM sales.salesorderheader
  • Specifying columns
    • column1, column2, *
    • AS – column alias, rename a column, used for convenience and can be very useful with more complex queries. Using the AS keyword while specifying a column alias is optional. In fact Oracle does not support it.
    • Use single quotes or square brackets around your alias if spaces are desired.
    • Pro tip: When exploring data, you may want to see all columns using the *, but you want to move particular columns to the left, specify those columns and add the * also.
    • SELECT TOP 10 salesorderid AS id, orderdate AS date, status, territoryid, totaldue, *
      	FROM sales.salesorderheader
  • DISTINCT
    • Returns unique values. This is a good tool to see what are the possible values for a column, like a status column.
    • SELECT DISTINCT status, territoryid
      	FROM sales.salesorderheader

The 3 C’s

I attribute this to a friend and long time developer, Jim Mischel.  He would coach budding developers on the 3 C’s’:

  • Conciseness – do what is needed to get the job done and no more
  • Completeness – make sure it fully works
  • Clarity – make it easy to understand later

I coach developers heavily on Clarity.  We are writing code here.  The #1 audience for any code you write is the next developer that has to read your code and make sense of it to fix a bug or add a feature.  You will save yourself and your fellow developers a lot of time if you format your query for readability.  Make it a habit and complain when it is not clear.  Follow the conventions I use and your SQL will look like most everybody else’s.  All but the very simplest of queries will span multiple indented lines.

Comments

  • /* … */
  • – – (to end of line)
  • SELECT TOP 100 *
    	/* this will be ignored */
    	FROM sales.salesorderheader
  • SELECT TOP 100 *
    	-- this will also be ignored
    	FROM sales.salesorderheader
  • SELECT TOP 100 * -- and this, too, will be ignored
    	FROM sales.salesorderheader

ORDER BY

  • Column name
  • Column number
  • ASC for ascending order, this is the default
  • DESC for descending order
  • SELECT top 100 *
    	FROM sales.salesorderheader
    	ORDER BY salesorderid DESC
  • SELECT countryregioncode, name
    	FROM sales.salesterritory
    	ORDER BY countryregioncode DESC, name
  • SELECT countryregioncode, name
    	FROM sales.salesterritory
    	ORDER BY 1 DESC, 2

WHERE

  • =, >, <, >=, <=, <>, !=
    • <> and != mean the same: “not equal”
    • SELECT *
      	FROM sales.salesorderheader
      	WHERE orderdate > '7/1/2008'
    • SELECT *
      	FROM sales.salesorderheader
      	WHERE salesorderid > 74185
    • Note: ‘7/1/2008’ implies a time of midnight
  • AND
    • SELECT *
      	FROM sales.salesorderheader
      	WHERE salesorderid > 74185
      		AND salesorderid != 74188
  • BETWEEN … AND (inclusive)
    • SELECT *
      	FROM sales.salesorderheader
      	WHERE orderdate BETWEEN '7/1/2008' AND '7/2/2008'

GROUP BY and Aggregate Functions

  • AVG/SUM/MIN/MAX/COUNT and others
  • Do a web search similar “msdn sql avg” to get quick access to reference documentation
  • SELECT status, AVG(totaldue)
    	FROM sales.salesorderheader
    	WHERE orderdate BETWEEN '7/1/2008' AND '7/3/2008'
    	GROUP BY status

Homework

  • For orders in July of 2008, list the territory IDs, average total sale and number of sales per territory ID, sorted by best to worst sales

So You Want to Learn SQL…

no comment

I had a request from a new QA at the office to mentor him in the fine art of SQL.  Since most query requests come from marketing folks, I decided to open up the session to anyone interested and quickly had 16 students.  I guess I should be more than a little prepared.

The overall goal will be for folks to be capable of writing 90% of their own queries.  Once we reach that goal, we will pick a new goal.  The session is 25 minutes every Thursday morning and I expect it to take a few weeks.  I will document everything here so anyone can follow along.

Creating TDD monsters

no comment

I get solicited for advice frequently by budding software engineers.  “How do I start?”  “Can you recommend a book?” “What language should I learn?” As a homeschool family, I had a large impact on the computer related topics my children studied in high school.  This summer has produced an interesting twist: multiple members of the family want to write an app.  I’ll focus on my son’s app here.  He is a national chess master and has identified a gap in the chess app market for handheld devices.  He is currently a junior in college studying computational and applied math.  He has done a lot of analytical progragramming, but not with a general purpose programming language.

I have been focused on all things web for many years now and haven’t done anything significant on a desktop in a long time.  Since I predominantly code in C#, writing something for the surging Windows store could be fun.  With practically zero knowledge of the app store framework, I had the fledgling developers concentrate entirely on application logic rather than UI.  That means the only way to exercise the code is through tests. Most new developers begin their journey with console apps.  This turns out to be a a big plus for introducing TDD since unit tests feel a lot like running a simple console app.

Chess is not a simple game, so it is no surprise that my son’s app is loaded with complex logic.  He jumped in with both feet and quickly saw the power of unit tests.  In a very short time, he had over 100 tests validating checkmate logic.  As he got deeper in the development, he discovered multiple times a need to fundamentally change his architecture.  With each rework, he had an ever growing set of unit tests to fall back on. He absolutely gets it.

I’ve created a TDD monster!

Creating WordPress users in bulk

no comment

On one of the WordPress sites I maintain, I found myself needing to create about 50 users. With a quick search, I found the needed functions are wp_create_user and wp_new_user_notification.  However, the sample scripts did not work.  Either the required WordPress bootstrap file was incorrect, or needed a path, or the wrong arguments were passed.  So, for everyone’s benefit, here is my very simple script:

<?php
require('/full/path/to/site/wp-blog-header.php');

// repeat the following two lines for as many more users as needed.
$user_id = wp_create_user('username', 'ClearTextPassword', 'user@emailaddress.net');
wp_new_user_notification($user_id, 'ClearTextPassword');
?>

The user will receive the standard announcement email. I used Excel to generate the needed lines and pasted them into the script.  Finally, I placed the script, named makenewusers.php, at the root of the site and pointed a browser at http://mysite.com/makenewusers.php.

Restricting downloads in WordPress on Lighttpd

no comment

Here’s the scenario: you have a WordPress site with a collection of files that need to be protected to only authenticated users.  The site is not super high security, but you want a basic obstacle.

All of your searches will talk about .htaccess file assuming you are running Apache.  But, you have chosen a leaner server running Lighttpd which does not support .htaccess files.

I installed the Media File Manager plugin to easily create child folders in the uploads folder and be able to move files around.  My goal is for non-techies to maintain this site.  Using this plugin, I created the members-only folder inside uploads.  Inside this folder is this php file:

<?php
if( !empty( $_GET['name'] ) )
{
  // check if user is logged
  if( is_logged_in() )
  {
    $clean_name = str_replace("..", "", $_GET['name']);
    $clean_name = str_replace("/", "", $clean_name);
    $filename = "{$_SERVER['DOCUMENT_ROOT']}/wp-content/uploads/members-only/{$clean_name}";
    if( file_exists( $filename ) )
    {
      header( 'Cache-Control: public' );
      header( "Content-Disposition: attachment; filename={$clean_name}" );
      if(!empty($_GET['type']))
        header( "Content-Type: {$_GET['type']}");
      readfile( $filename );
      exit;
    }
  }
}
die( "ERROR: invalid file or you don't have permissions to download it." );

function is_logged_in() {
  foreach ($_COOKIE as $cookie_name => $cookie_value) {
    if(startswith($cookie_name, "wordpress_logged_in_"))
      return true;
  }
}

function startswith($haystack, $needle) {
  return substr($haystack, 0, strlen($needle)) === $needle;
}
?>

To block a url direct to the file, I added a new rewrite rule to the top of the list:

url.rewrite-final = (
  "^/wp-content/uploads/members-only/(.*)" => "/wp-content/uploads/members-only/download.php?name=$1",
  # Exclude some directories from rewriting
  "^/(wp-admin|wp-includes|wp-content|gallery2)/(.*)" => "$0",
  # Exclude .php files at root from rewriting
  "^/(.*.php)" => "$0",
  # Handle permalinks and feeds
  "^/(.*)$" => "/index.php/$1"

At this point, I can put links in my posts and have a modicum of security from users passing around the links to unauthorized users.  I can even specify the content-type:

http://www.mysite.com/wp-content/uploads/members-only/newsletter.pdf&type=application/pdf

Hope this helps.