diff --git a/2_intro-sql-2.md b/2_intro-sql-2.md index 5cf1f6a..39ec148 100644 --- a/2_intro-sql-2.md +++ b/2_intro-sql-2.md @@ -288,6 +288,11 @@ Just like in Python, we use the `IS` keyword to check if a value is missing. A subquery, also known as a nested query, is a query within another SQL query. It's like a query inside a query! Subqueries are used to perform operations that require multiple steps, such as filtering data based on a complex condition or aggregating data before using it in the main query. In other words, instead of creating multiple new tables as intermediate steps, you can define these steps within the scope of a larger query. +### Types of Subquery + +A subquery can return a single value (one row and one column), an entire column of values, or a table of values. +Each of these can be used wherever a static value, column, or table would otherwise appear in a query. + ### Using Subqueries in DuckDB Let's start by looking at our previously example query to understand how subqueries work in DuckDB. 
diff --git a/notebooks/2_intro-sql-2.ipynb b/notebooks/2_intro-sql-2.ipynb index e8bd112..9909134 100644 --- a/notebooks/2_intro-sql-2.ipynb +++ b/notebooks/2_intro-sql-2.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "2452983b", + "id": "7b06ba2e", "metadata": {}, "source": [ "\n", @@ -17,7 +17,7 @@ { "cell_type": "code", "execution_count": null, - "id": "0c42e861", + "id": "a52e95d7", "metadata": {}, "outputs": [], "source": [ @@ -27,7 +27,7 @@ }, { "cell_type": "markdown", - "id": "63a3efdd", + "id": "f1ffc1bf", "metadata": {}, "source": [ "## Example Tables\n", @@ -42,7 +42,7 @@ { "cell_type": "code", "execution_count": null, - "id": "988fb387", + "id": "da64aa1b", "metadata": {}, "outputs": [], "source": [ @@ -54,7 +54,7 @@ }, { "cell_type": "markdown", - "id": "d92cf0af", + "id": "38f76505", "metadata": {}, "source": [ "To create the tables in your database, run:" @@ -63,7 +63,7 @@ { "cell_type": "code", "execution_count": null, - "id": "963bd2c6", + "id": "f7475c9d", "metadata": {}, "outputs": [], "source": [ @@ -74,7 +74,7 @@ }, { "cell_type": "markdown", - "id": "a11b1ea7", + "id": "1594ed31", "metadata": {}, "source": [ "To begin understanding the data contained in these tables, you can run:" @@ -83,7 +83,7 @@ { "cell_type": "code", "execution_count": null, - "id": "4500f5f2", + "id": "e73efb1d", "metadata": {}, "outputs": [], "source": [ @@ -94,7 +94,7 @@ { "cell_type": "code", "execution_count": null, - "id": "8153f709", + "id": "42863ed0", "metadata": {}, "outputs": [], "source": [ @@ -104,7 +104,7 @@ }, { "cell_type": "markdown", - "id": "9058dbd5", + "id": "969d1940", "metadata": { "cell_type": "markdown" }, @@ -117,7 +117,7 @@ { "cell_type": "code", "execution_count": null, - "id": "f9a95fe5", + "id": "a8ab5f3d", "metadata": {}, "outputs": [], "source": [ @@ -127,7 +127,7 @@ }, { "cell_type": "markdown", - "id": "cfa7b042", + "id": "4f098dea", "metadata": { "cell_type": "markdown" }, @@ -140,7 +140,7 @@ { "cell_type": "code", 
"execution_count": null, - "id": "669b6603", + "id": "f47ada05", "metadata": {}, "outputs": [], "source": [ @@ -150,7 +150,7 @@ }, { "cell_type": "markdown", - "id": "4ad7c6e4", + "id": "5950e22a", "metadata": {}, "source": [ "## 1. Aggregate Functions\n", @@ -164,7 +164,7 @@ { "cell_type": "code", "execution_count": null, - "id": "f7b4ed9c", + "id": "73f9064d", "metadata": {}, "outputs": [], "source": [ @@ -177,7 +177,7 @@ }, { "cell_type": "markdown", - "id": "4c45d537", + "id": "3ee4eadf", "metadata": {}, "source": [ "However, aggregating an entire table all the way up to just a single row isn't always what we are looking for. \n", @@ -194,7 +194,7 @@ { "cell_type": "code", "execution_count": null, - "id": "b2406bf9", + "id": "f0b515e2", "metadata": {}, "outputs": [], "source": [ @@ -210,7 +210,7 @@ }, { "cell_type": "markdown", - "id": "375a7d49", + "id": "d2e76919", "metadata": {}, "source": [ "This command groups the rows by the `Species_Common_Name` column and calculates the average `Beak_Width`, `Beak_Depth` and `Beak_Length_Culmen` for the individuals in each bird species group.\n", @@ -223,7 +223,7 @@ { "cell_type": "code", "execution_count": null, - "id": "3507f3fe", + "id": "7492bc0d", "metadata": {}, "outputs": [], "source": [ @@ -240,7 +240,7 @@ }, { "cell_type": "markdown", - "id": "cf888768", + "id": "2bccfbdd", "metadata": { "cell_type": "markdown" }, @@ -254,7 +254,7 @@ { "cell_type": "code", "execution_count": null, - "id": "0d0cf2ed", + "id": "78fde95e", "metadata": {}, "outputs": [], "source": [ @@ -264,7 +264,7 @@ }, { "cell_type": "markdown", - "id": "451fa7cf", + "id": "7c95178b", "metadata": { "cell_type": "markdown" }, @@ -277,7 +277,7 @@ { "cell_type": "code", "execution_count": null, - "id": "752a2e89", + "id": "64667e73", "metadata": {}, "outputs": [], "source": [ @@ -287,7 +287,7 @@ }, { "cell_type": "markdown", - "id": "ed500e4f", + "id": "dfda73eb", "metadata": {}, "source": [ "### Getting the 95th percentile of a column value\n", @@ 
-298,7 +298,7 @@ { "cell_type": "code", "execution_count": null, - "id": "61bc256f", + "id": "ed36feab", "metadata": {}, "outputs": [], "source": [ @@ -310,7 +310,7 @@ }, { "cell_type": "markdown", - "id": "c4b7d53c", + "id": "1c3cfd43", "metadata": { "cell_type": "markdown" }, @@ -323,7 +323,7 @@ { "cell_type": "code", "execution_count": null, - "id": "4f19a290", + "id": "387babac", "metadata": {}, "outputs": [], "source": [ @@ -333,7 +333,7 @@ }, { "cell_type": "markdown", - "id": "cd177d0a", + "id": "d466479a", "metadata": { "cell_type": "markdown" }, @@ -346,7 +346,7 @@ { "cell_type": "code", "execution_count": null, - "id": "335b4b16", + "id": "b1f626e6", "metadata": {}, "outputs": [], "source": [ @@ -356,7 +356,7 @@ }, { "cell_type": "markdown", - "id": "9073d2c6", + "id": "854e5b77", "metadata": {}, "source": [ "## 3. Understanding SQL Joins\n", @@ -370,7 +370,7 @@ { "cell_type": "code", "execution_count": null, - "id": "c57b086b", + "id": "cb5cf87f", "metadata": {}, "outputs": [], "source": [ @@ -386,7 +386,7 @@ }, { "cell_type": "markdown", - "id": "2e1a8814", + "id": "e6af116b", "metadata": {}, "source": [ "### Step-by-Step Explanation\n", @@ -422,7 +422,7 @@ }, { "cell_type": "markdown", - "id": "b9093a86", + "id": "78d70c6a", "metadata": { "cell_type": "markdown" }, @@ -435,7 +435,7 @@ { "cell_type": "code", "execution_count": null, - "id": "2fe6d05e", + "id": "2328417c", "metadata": {}, "outputs": [], "source": [ @@ -445,7 +445,7 @@ }, { "cell_type": "markdown", - "id": "f6fd9ad9", + "id": "642b4374", "metadata": { "cell_type": "markdown" }, @@ -458,7 +458,7 @@ { "cell_type": "code", "execution_count": null, - "id": "8a4672e8", + "id": "f1cdd3b0", "metadata": {}, "outputs": [], "source": [ @@ -468,7 +468,7 @@ }, { "cell_type": "markdown", - "id": "68e5995b", + "id": "7b5264e5", "metadata": {}, "source": [ "### LEFT OUTER JOIN (LEFT JOIN)\n", @@ -484,7 +484,7 @@ { "cell_type": "code", "execution_count": null, - "id": "1f615b9d", + "id": "a55d2598", 
"metadata": {}, "outputs": [], "source": [ @@ -500,7 +500,7 @@ }, { "cell_type": "markdown", - "id": "468b63d9", + "id": "7e0ae1ad", "metadata": {}, "source": [ "Notice how the `LEFT JOIN` query has 90371 rows in the result (the same number of rows as the `birds` table), and the `INNER JOIN` query only had 662 rows. \n", @@ -517,7 +517,7 @@ }, { "cell_type": "markdown", - "id": "f7b94012", + "id": "39e6e729", "metadata": { "cell_type": "markdown" }, @@ -533,7 +533,7 @@ { "cell_type": "code", "execution_count": null, - "id": "4c32771c", + "id": "89ef520f", "metadata": {}, "outputs": [], "source": [ @@ -543,7 +543,7 @@ }, { "cell_type": "markdown", - "id": "9df13bed", + "id": "77075653", "metadata": {}, "source": [ "## 3. Subqueries\n", @@ -552,6 +552,11 @@ "\n", "A subquery, also known as a nested query, is a query within another SQL query. It's like a query inside a query! Subqueries are used to perform operations that require multiple steps, such as filtering data based on a complex condition or aggregating data before using it in the main query. In other words, instead of creating multiple new tables as intermediate steps, you can define these steps within the scope of a larger query.\n", "\n", + "### Types of Subquery\n", + "\n", + "A Subquery can return a single value (one row and one column), an entire column of values, or a table of values. 
\n", + "Each of these can be used wherever a static value, column, or table would otherwise appear in a query.\n", + "\n", "### Using Subqueries in DuckDB\n", "\n", "Let's start by looking at our previously example query to understand how subqueries work in DuckDB.\n", @@ -564,7 +569,7 @@ { "cell_type": "code", "execution_count": null, - "id": "a26a5995", + "id": "5b94013a", "metadata": {}, "outputs": [], "source": [ @@ -585,7 +590,7 @@ }, { "cell_type": "markdown", - "id": "ad2e8f1f", + "id": "9cbb32ce", "metadata": {}, "source": [ "In this example, the subquery (`SELECT QUANTILE_CONT(birds.Beak_Length_Culmen, 0.99) FROM birds INNER JOIN ducks ON birds.Species_Common_Name = ducks.name`) calculates the 99th percentile of beak length for all birds that are ducks. The main query then selects the names and beak measurements of individual ducks who have a beak length above this value." @@ -593,7 +598,7 @@ }, { "cell_type": "markdown", - "id": "5d440437", + "id": "36c2b19d", "metadata": { "cell_type": "markdown" }, @@ -607,7 +612,7 @@ { "cell_type": "code", "execution_count": null, - "id": "4bca6ab8", + "id": "1fc38014", "metadata": {}, "outputs": [], "source": [ @@ -617,7 +622,7 @@ }, { "cell_type": "markdown", - "id": "75b6da1b", + "id": "ecf9428b", "metadata": { "cell_type": "markdown" }, @@ -630,7 +635,7 @@ { "cell_type": "code", "execution_count": null, - "id": "13ac71cf", + "id": "e455b37e", "metadata": {}, "outputs": [], "source": [ @@ -640,7 +645,7 @@ }, { "cell_type": "markdown", - "id": "37961679", + "id": "1aa414e2", "metadata": { "cell_type": "markdown" }, @@ -655,7 +660,7 @@ { "cell_type": "code", "execution_count": null, - "id": "f62b530a", + "id": "559380fe", "metadata": {}, "outputs": [], "source": [ @@ -665,7 +670,7 @@ }, { "cell_type": "markdown", - "id": "b20798b7", + "id": "4db82b61", "metadata": {}, "source": [ "#### Using the WITH Clause\n", @@ -679,7 +684,7 @@ { "cell_type": "code", "execution_count": null, - "id": "d3d59579", + "id": 
"05f737bc", "metadata": {}, "outputs": [], "source": [ @@ -710,7 +715,7 @@ }, { "cell_type": "markdown", - "id": "ee64da13", + "id": "2c0fcc97", "metadata": {}, "source": [ "In this example, the `WITH` clause creates two temporary result sets called `duck_beaks` and `pc99_beak_len`. The main query then selects the names and beak measurements of ducks with `Beak_Length_Culmen` above the top 99th percentile beak length." @@ -718,7 +723,7 @@ }, { "cell_type": "markdown", - "id": "bd50217d", + "id": "b888dfc5", "metadata": { "cell_type": "markdown" }, @@ -731,7 +736,7 @@ { "cell_type": "code", "execution_count": null, - "id": "1b8ce006", + "id": "84a980ce", "metadata": {}, "outputs": [], "source": [ @@ -741,7 +746,7 @@ }, { "cell_type": "markdown", - "id": "8cc4b927", + "id": "5b58fcdd", "metadata": { "cell_type": "markdown" }, @@ -754,7 +759,7 @@ { "cell_type": "code", "execution_count": null, - "id": "fea80443", + "id": "5f0b4858", "metadata": {}, "outputs": [], "source": [