Page cover image

SQL

A relational database is a database that organizes information into one or more tables.

1. Basic SQL Commands

Basic Commands

  • CREATE TABLE creates a new table.

  • INSERT INTO adds a new row to a table.

  • SELECT queries data from a table.

  • ALTER TABLE changes an existing table.

  • UPDATE edits a row in a table.

  • DELETE FROM deletes rows from a table.

Create Table

Data types include INTEGER, TEXT, DATE and REAL (decimal value).

SELECT * FROM table_name;

CREATE TABLE table_name (
   column_1 data_type, 
   column_2 data_type, 
   column_3 data_type
);

CREATE TABLE celebs (
   id INTEGER, 
   name TEXT, 
   age INTEGER
); 

INSERT INTO celebs (id, name, age) 
VALUES (1, 'Justin Bieber', 22);         

  • Clauses are written in capital letters - perform specific tasks (They are commands)

  • parameters are in parenthesis, in this case, a list of column names and data

  • CREATE TABLE - creates from a scratch

Inserting Rows of Data

INSERT INTO table_name - adds specified row

(id,name, age) - parameter identifying columns the data added to

VALUES - data inserted

2. Queries

  • SELECT is the clause we use every time we want to query information from a database.

  • AS renames a column or table.

  • DISTINCT return unique values.

  • WHERE is a popular command that lets you filter the results of the query based on conditions that you specify.

  • LIKE and BETWEEN are special operators.

  • AND and OR combines multiple conditions.

  • ORDER BY sorts the result.

ORDER BY

  • can ORDER BY RIGHT (columnname, numberofchar) : from the right, pick 3 characters then order by these

  • ORDER BY 1stpriority, second priority : etc ORDER BY RIGHT(row1, 3), row2 ASC

  • LIMIT specifies the maximum number of rows that the query will return.

  • CASE creates different outputs.

### COMPREHENSIVE REVIEW CHEATSHEET
SELECT * FROM movies WHERE year BETWEEN 1990 AND 1999;

SELECT name FROM movies WHERE name BETWEEN 'A' AND 'J';

SELECT * 
FROM movies
WHERE year BETWEEN 1990 AND 1999
   AND genre = 'romance';

SELECT *
FROM movies
WHERE year BETWEEN 1970 AND 1979
  AND imdb_rating > 8;
  
SELECT *
FROM movies
WHERE year > 2014
   OR genre = 'action';
   
### SORTING by alphabet or numerical
SELECT *
FROM movies
ORDER BY name; #auto sort by A-Z

SELECT *
FROM movies
WHERE imdb_rating > 8
ORDER BY year DESC; # DESC is a key word for descending order
# ORDER BY year ASC; #ascending for ASC
# ORDER BY always goes after WHERE (if WHERE is present).

## ORDER BY LAST 3 CHARACTERS IN ASC ID
SELECT NAME FROM STUDENTS WHERE Marks > 75 ORDER BY RIGHT(NAME, 3), ID ASC;

### QUESTION: Write a new query that retrieves the name, year, and 
# imdb_rating columns of all the movies, ordered highest to lowest by their ratings.     
SELECT name, year, imdb_rating FROM movies ORDER BY imdb_rating DESC;

### LIMIT - specify max no. of rows in result set
SELECT * FROM movies where name BETWEEN 'A' and 'J' BY ORDER DESC LIMIT 20
# selecting top 3 imdbratings by limit 3
SELECT * FROM movies
ORDER BY imdb_rating DESC
LIMIT 3;


### CASE (if - then logic)
   #Task: split movies by imdbrating, above 8 is fantastic, above 6, poorly recieved, else, avoid at all costs    
SELECT name,
 CASE
  WHEN imdb_rating > 8 THEN 'Fantastic'  # WHEN tests condition, THEN returns if true  
  WHEN imdb_rating > 6 THEN 'Poorly Received'
  ELSE 'Avoid at All Costs'
 END # you can rename column name to 'Review' by 'END AS 'Review'
FROM movies;

Select to Fetch Data

1. Select from

SELECT name FROM celebs;
SELECT * FROM celebs;

name specifies column to select from, the * specifies data from all columns in the table. This command always return a new table, called result set.

  • " * " is a wildcard, select every column

SELECT name , genre , year FROM celebs;
//also works

2. Select as from

Allows you to set an alias for your column name in the results

SELECT name AS 'Title - Alias' FROM celebs;
# allows you to set name as an alias 
# this does not rename in the table, but only show in results

3. distinct (add before column name)

used to return unique values, filters out duplicate values in the column

SELECT DISTINCT tools 
FROM inventory;

4. where

WHERE clause filters the result set to only include rows where the following condition is true.

SELECT * FROM movies WHERE imdb_rating>8;

5. Like

SELECT * FROM movies WHERE name LIKE 'se_en';
  • LIKE is a special operator used with the WHERE clause to search for a specific pattern in a column.

  • name LIKE 'Se_en' is a condition evaluating the name column for a specific pattern.

  • Se_en represents a pattern with a wildcard character.

The _ means you can substitute any individual character here without breaking the pattern. The names Seven and Se7en both match this pattern.

The percentage sign % is another wildcard character that can be used with LIKE.

This statement below filters the result set to only include movies with names that begin with the letter ‘A’:

SELECT * 
FROM movies
WHERE name LIKE 'A%';

SELECT * FROM movies WHERE name Like '%man%' # any name with man inside

use WHERE <column> IS NULL or IS NOT NULL : for missing data or non empty data

6. Between

SELECT * FROM movies WHERE year BETWEEN 1990 AND 1999;

SELECT name FROM movies WHERE name BETWEEN 'A' AND 'J';

SELECT * 
FROM movies
WHERE year BETWEEN 1990 AND 1999
   AND genre = 'romance';

SELECT *
FROM movies
WHERE year BETWEEN 1970 AND 1979
  AND imdb_rating > 8;
  
SELECT *
FROM movies
WHERE year > 2014
   OR genre = 'action';

When the values are text, BETWEEN filters the result set for within the alphabetical range.

Alter Table to add new column

ALTER TABLE celebs;
ADD COLUMN twitter_handle 
TEXT;

NULL is a special value in SQL that represents missing or unknown data. Here, the rows that existed before the column was added have NULL (∅) values for twitter_handle.

UPDATE FROM table to edit existing rows or columns

UPDATE celebs
SET twitter_handle #indicates column to edit
= '@taylorswift13' # new value
WHERE id = 4; # indicates which row to apply to, here the row     
# with 4 nin id column

DELETE FROM to delete records

The statement below deletes all records in the celeb table with no twitter_handle

DELETE FROM celebs 
WHERE twitter_handle IS NULL;    
#where lets you know which rows    
#IS NULL - condition to delete  

Constraints Set for table

Constraints that add information about how a column can be used are invoked after specifying the data type for a column. They can be used to tell the database to reject inserted data that does not adhere to a certain restriction. The statement below sets constraints on the celebs table.

CREATE TABLE celebs (
   id INTEGER PRIMARY KEY, 
   name TEXT UNIQUE,
   date_of_birth TEXT NOT NULL  ,
   date_of_death TEXT DEFAULT 'Not Applicable'       
);

1. PRIMARY KEY columns can be used to uniquely identify the row. Attempts to insert a row with an identical value to a row already in the table will result in a constraint violation which will not allow you to insert the new row.

2. UNIQUE columns have a different value for every row. This is similar to PRIMARY KEY except a table can have many different UNIQUE columns.

3. NOT NULL columns must have a value. Attempts to insert a row without a value for a NOT NULL column will result in a constraint violation and the new row will not be inserted.

4. DEFAULT columns take an additional argument that will be the assumed value for an inserted row if the new row does not specify a value for that column.

3. Aggregate Functions

Calculations performed on multiple rows of a table are called aggregates.

  • COUNT(): count the number of rows

  • SUM(): the sum of the values in a column

  • MAX()/MIN(): the largest/smallest value

  • AVG(): the average of the values in a column

  • ROUND(): round the values in the column

By default, SQL tries to be as precise as possible without rounding. We can make the result table easier to read using the ROUND() function.

ROUND() function takes two arguments inside the parenthesis:

  1. a column name

  2. an integer

It rounds the values in the column to the number of decimal places specified by the integer.

  • GROUP BY : used with SELECT to arrange identical data to groups, The GROUP BY statement comes after any WHERE statements, but before ORDER BY or LIMIT.

  • HAVING : similar to WHERE (which filters rows) but it filters GROUPs, HAVING statement always comes after GROUP BY, but before ORDER BY and LIMIT.

    1. When we want to limit the results of a query based on values of the individual rows, use WHERE.

    2. When we want to limit the results of a query based on an aggregate property, use HAVING.

### 1. COUNT rows
SELECT COUNT(*) FROM table_name;
  # with condition
SELECT COUNT(*) 
FROM fake_apps
WHERE price = 0;

### 2. SUM values in column_name
SELECT SUM(column_name)
FROM table_name;

### 3. MAX MIN value in a column_name
SELECT MAX(column_name)
FROM table_name;

### 4. AVG average
SELECT AVG(column_name) FROM table_name;

### 5. ROUND (parameter1 = column_name, parameter2 = decimal places)
SELECT name, ROUND(price, 0)
FROM fake_apps;
    # Combining aggregators
    SELECT ROUND(AVG(price), 2)
    FROM fake_apps;
    
### 6. GROUP BY along with SELECT
SELECT year,
   AVG(imdb_rating) #avg of column
FROM movies #table_name
GROUP BY year
ORDER BY year;

SELECT price, COUNT(*) 
FROM fake_apps
WHERE downloads > 20000
GROUP BY price;

### 7. HAVING for filtering groups
SELECT year,
   genre,
   COUNT(name)
FROM movies
GROUP BY 1, 2
HAVING COUNT(name) > 10;

SELECT name, category FROM startups GROUP BY 2 HAVING COUNT(*)>3
GROUP BY I

Group By 2

SELECT ROUND(imdb_rating),
   COUNT(name)
FROM movies
GROUP BY ROUND(imdb_rating)
ORDER BY ROUND(imdb_rating);

### The above and below is the same query
SELECT ROUND(imdb_rating),
   COUNT(name)
FROM movies
GROUP BY 1 # this number references the first column u mentioned, ROUND(imdb_rating)      
ORDER BY 1;
GROUP BY II References

Count example 1:

SELECT COUNT(CITY) - COUNT(DISTINCT CITY) FROM STATION; 
#total minus distinct results

Example project : Forum moderating to catch rickroller users

SELECT user,
   COUNT(*)
FROM hacker_news
WHERE url LIKE '%watch?v=dQw4w9WgXcQ%' # this url is the subdoamin of youtube rickroll vid
GROUP BY 1
ORDER BY 2 DESC;

#or similarly
SELECT user,
   COUNT(*)
FROM hacker_news
WHERE url LIKE '%watch?v=dQw4w9WgXcQ%'
GROUP BY user
ORDER BY COUNT(*) DESC;

# To find source of news from forum hacker_news
SELECT CASE 
  WHEN url LIKE "%github.com%" THEN 'Github'
  WHEN url LIKE '%medium.com%' THEN 'Medium'
  WHEN url LIKE '%newyorktimes.com%' THEN 'New York Times'
  ELSE 'Other'
  END AS 'Source'
FROM hacker_news;
### to count the occurences
SELECT CASE 
  WHEN url LIKE "%github.com%" THEN 'Github'
  WHEN url LIKE '%medium.com%' THEN 'Medium'
  WHEN url LIKE '%newyorktimes.com%' THEN 'New York Times'
  ELSE 'Other'
  END AS 'Source',
  COUNT(*)
FROM hacker_news
GROUP BY 1;
Hacker_news count results

4. SQLite

SQLite is a database engine. It is software that allows users to interact with a relational database. In SQLite, a database is stored in a single file — a trait that distinguishes it from other database engines. This fact allows for a great deal of accessibility: copying a database is no more complicated than copying the file that stores the data, sharing a database can mean sending an email attachment.

  • does not validate data types

  • only one user can write to the file at a time

  • used where it makes sense for the database to be on the same disk as the application code.

SQLite creates schemas, which constrain the type of data in each column, but it does not enforce them. (will not reject wrong type)

#Just downloaded sqlite, access via following commands
sqlite <name of database.sqlite> 
#use .exit 
#use .open FILENAME 

strftime() - format date

takes 2 arguments : strftime(format, column)

For strftime(__, timestamp):

  • %Y returns the year (YYYY)

  • %m returns the month (01-12)

  • %d returns the day of the month (1-31)

  • %H returns 24-hour clock (00-23)

  • %M returns the minute (00-59)

  • %S returns the seconds (00-59)

if timestamp format is YYYY-MM-DD HH:MM:SS.

SELECT timestamp,
   strftime('%H', timestamp)
FROM hacker_news
GROUP BY 1
LIMIT 20;
Results of the above
Grouping by hour, we find the most popular hour to post to get the highest scores on the forum
Next we use round and renaming columns using 'AS' (forgot to add WHERE hour IS NOT NULL)

5. Multiple Tables

Sometimes there's too much information to put into a single table, here we learn some SQL commands to help us work with multiple tables.

COMMANDS:

  • JOIN will combine rows from different tables if the join condition is true.

  • LEFT JOIN will return every row in the left table, and if the join condition is not met, NULL values are used to fill in the columns from the right table.

  • Primary key is a column that serves a unique identifier for the rows in the table.

  • Foreign key is a column that contains the primary key to another table.

  • CROSS JOIN lets us combine all rows of one table with all rows of another table.

  • UNION stacks one dataset on top of another.

  • WITH allows us to define one or more temporary tables that can be used in the final query.

Example case : A Magazine company's subscription services database

3 total tables

JOIN tables

Combining tables manually is time-consuming. Luckily, SQL gives us an easy sequence for this: it’s called a JOIN.

If we want to combine orders and customers, we would type:

SELECT *
FROM orders
JOIN customers
  ON orders.customer_id = customers.customer_id; # table_name.column_name
  # given 3 tables, orders, subscription, customers

Let’s break down this command:

  1. The first line selects all columns from our combined table. If we only want to select certain columns, we can specify which ones we want.

  2. The second line specifies the first table that we want to look in, orders

  3. The third line uses JOIN to say that we want to combine information from orders with customers.

  4. The fourth line tells us how to combine the two tables. We want to match orders table’s customer_id column with customers table’s customer_id column.

Because column names are often repeated across multiple tables, we use the syntax table_name.column_name to be sure that our requests for columns are unambiguous. In our example, we use this syntax in the ON statement, but we will also use it in the SELECT or any other statement where we refer to column names.

For example: Instead of selecting all the columns using *, if we only wanted to select orders table’s order_id column and customers table’s customer_name column, we could use the following query:

SELECT orders.order_id,
   customers.customer_name
FROM orders
JOIN customers
  ON orders.customer_id = customers.customer_id;
  
  ### EXAMPLE - filter (WHERE is after JOIN ON)
SELECT * FROM orders
JOIN subscriptions
ON orders.subscription_id = subscriptions.subscription_id
WHERE subscriptions.description = 'Fashion Magazine';

Inner Joins

Example 1 - multiple queries

SELECT COUNT(*) FROM newspaper;
SELECT COUNT(*) FROM online;
SELECT COUNT(*)
FROM newspaper
JOIN online
ON newspaper.id = online.id;

Left Joins

What if we want to combine two tables and keep some of the un-matched rows?

SQL lets us do this through a command called LEFT JOIN. A left join will keep all rows from the first table, regardless of whether there is a matching row in the second table.

SELECT *
FROM table1
LEFT JOIN table2 # joins regardless of presence of matching rows
  ON table1.c2 = table2.c2;
  1. The first line selects all columns from both tables.

  2. The second line selects table1 (the “left” table).

  3. The third line performs a LEFT JOIN on table2 (the “right” table).

  4. The fourth line tells SQL how to perform the join (by looking for matching values in column c2).

SELECT * 
FROM newspaper
LEFT JOIN online
  ON newspaper.id = online.id;

### Want to find those that subscribe to newspaper but NOT ONLINE
SELECT * 
FROM newspaper
LEFT JOIN online
  ON newspaper.id = online.id # the column that u r joining is ON
WHERE online.id IS NULL; # WHERE is empty in the online id row

CROSS JOIN

Join all rows of one table to another regardless of matching information

SELECT shirts.shirt_color,
   pants.pants_color
FROM shirts
CROSS JOIN pants; # dont need ON as u r not joingin on any column BUT EVERYTHING
  • The first two lines select the columns shirt_color and pants_color.

  • The third line pulls data from the table shirts.

  • The fourth line performs a CROSS JOIN with pants.

UNION - stacking tables

Sometimes we just want to stack one dataset on top of the other.

SELECT *
FROM table1
UNION
SELECT *
FROM table2;

SQL has strict rules for appending data:

  • Tables must have the same number of columns.

  • The columns must have the same data types in the same order as the first table.

WITH

Often times, we want to combine two tables, but one of the tables is the result of another calculation.

WITH previous_results AS (
   SELECT ...
   ...
   ...
   ...
)
SELECT *
FROM previous_results
JOIN customers
  ON _____ = _____;
  • The WITH statement allows us to perform a separate query (such as aggregating customer’s subscriptions)

  • previous_results is the alias that we will use to reference any columns from the query inside of the WITH clause

  • We can then go on to do whatever we want with this temporary table (such as join the temporary table with another table)

Essentially, we are putting a whole first query inside the parentheses () and giving it a name. After that, we can use this name as if it’s a table and write a new query using the first query.

EXAMPLE : Let’s return to our magazine order example. Our marketing department might want to know a bit more about our customers. For instance, they might want to know how many magazines each customer subscribes to. We can easily calculate this using our orders table, but a customer_id isn’t terribly useful for our marketing department, they probably want to know the customer’s name.

We want to be able to join the results of this query with our customers table, which will tell us the name of each customer. We can do this by using a WITH clause.

WITH previous_query AS (
  SELECT customer_id,
   COUNT(subscription_id) AS 'subscriptions'
FROM orders
GROUP BY customer_id
)
SELECT customers.customer_name, previous_query.subscriptions 
FROM previous_query
JOIN customers
  ON customers.customer_id = previous_query.customer_id;
Results of above

HackerRank Qns

  1. Query for even id

HackerRank Basic SQL

2. Basic Join with filter

Given the CITY and COUNTRY tables, query the sum of the populations of all cities where the CONTINENT is 'Asia'.

Note: CITY.CountryCode and COUNTRY.Code are matching key columns.

SELECT SUM(CITY.POPULATION)
FROM CITY
LEFT JOIN COUNTRY
    ON CITY.CountryCode = COUNTRY.Code
    WHERE COUNTRY.CONTINENT = 'Asia';

3. ORDER BY last 3 characters in ascending ID

Query the Name of any student in STUDENTS who scored higher than Marks. Order your output by the last three characters of each name. If two or more students both have names ending in the same last three characters (i.e.: Bobby, Robby, etc.), secondary sort them by ascending ID.

SELECT Name FROM STUDENTS WHERE Marks >75
ORDER BY RIGHT(Name,3), ID ASC

4. HackerRank Basic SQL Certification

# QUESTION 1. STUDENT ADVISOR
SELECT student_information.roll_number, student_information.name 
FROM student_information
JOIN faculty_information
ON student_information.advisor = faculty_information.employee_ID
WHERE (faculty_information.gender = 'M' AND faculty_information.salary >15000) 
OR (faculty_information.gender = 'F'AND faculty_information.salary >20000);    

# QUESTION 2: PROFITABLE STOCKS
# condition is that given list of stocks
SELECT price_today.stock_code 
FROM price_today JOIN price_tomorrow 
ON price_today.stock_code = price_tomorrow.stock_code 
WHERE price_tomorrow.price > price_today.price # condition
ORDER BY stock_code ASC;

Last updated