{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3ab200a8",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Sorting\n",
    "\n",
    "This section compares practical sorting approaches from baseline to efficient methods.\n",
    "\n",
    "**Learning Goals**\n",
    "- Use Python built-in sorting effectively\n",
    "- Implement insertion, merge, bubble, and quicksort\n",
    "- Compare runtime trends across input patterns\n",
    "- Explain stability and memory trade-offs\n",
    "\n",
    "**Section Roadmap**\n",
    "- Start with Python's built-in tools — the right choice for real code.\n",
    "- Build and verify four classical algorithms from simplest to most efficient.\n",
    "- Use the comparison table and benchmark to connect implementation to complexity.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "080ccad7",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:16.198498Z",
     "iopub.status.busy": "2026-04-12T21:25:16.198414Z",
     "iopub.status.idle": "2026-04-12T21:25:16.201544Z",
     "shell.execute_reply": "2026-04-12T21:25:16.201028Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "from time import perf_counter\n",
    "import random"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6dff214e",
   "metadata": {},
   "source": [
    "## Design Paradigms\n",
    "\n",
    "The four algorithms included in this section illustrate three distinct approaches:\n",
    "- **Brute force**: scan and compare every candidate pair; simple but $O(n^2)$ (e.g., bubble sort, insertion sort).\n",
    "- **Divide-and-conquer**: split the problem into smaller sub-problems, sort each, then combine; achieves $O(n \\log n)$ (e.g., merge sort, quicksort).\n",
    "- **In-place vs. out-of-place**: \n",
    "  - An in-place sort rearranges elements around a pivot without allocating extra memory proportional to the input (e.g., bubble sort, insertion sort). \n",
    "  - An out-of-place sort returns a new list and leaves the original untouched, which is useful when you need both the original and the sorted version (merge sort, `sorted()`, and quicksort)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3bb3c20d",
   "metadata": {},
   "source": [
    "## Python Built-in Sorting\n",
    "\n",
    "Python provides two built-in sorting tools for sequences:\n",
    "\n",
    "| | `sorted(iterable)` | `list.sort()` |\n",
    "|---|---|---|\n",
    "| Returns | A **new** sorted list | `None` |\n",
    "| Original modified? | No: original is untouched | Yes: sorts **in-place** |\n",
    "| Works on | Any iterable | Lists only |\n",
    "\n",
    "Both use **Timsort** — a hybrid merge/insertion sorting algorithm that is stable (if two items are equal, their original order stays the same) and $O(n \\log n)$ in the worst case. It is highly optimized for real workloads, including partially sorted data.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61e2dd1f",
   "metadata": {},
   "source": [
    "### Key parameters\n",
    "- `reverse=True` — sort in descending order\n",
    "- `key=callable` — sort by a derived value rather than the element itself\n",
    "\n",
    "The **default advice** is: Prefer Python’s built-in sorting tools (`sorted()` or `.sort()`) in production code. Hand-written sorting implementations are usually for learning, analysis, or specialized needs, not everyday use.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a12e3bda",
   "metadata": {},
   "source": [
    "Now let us take a look at the `sorted()` function and the `list.sort()` method in terms of:\n",
    "- in-place vs out-of-place\n",
    "- the `key` parameter\n",
    "- `key` with `lambda` "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "81f7b832",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=== sorted() vs list.sort() ===\n",
      "sorted(nums):   [1, 2, 5, 5, 6, 9]      sorted\n",
      "print(nums):    [5, 2, 9, 1, 5, 6]      unchanged\n",
      "nums.sort():    [1, 2, 5, 5, 6, 9]      in-place\n",
      "\n",
      "=== Sorting with a key function ===\n",
      "sorted(words, key=len):                 ['fig', 'apple', 'banana', 'cherry']\n",
      "sorted(words, key=len, reverse=True):   ['banana', 'cherry', 'apple', 'fig']\n",
      "sorted(words, key=str.lower):           ['apple', 'banana', 'cherry', 'fig']\n",
      "\n",
      "=== key= with a lambda ===\n",
      "sort by last letter:    ['banana', 'apple', 'fig', 'cherry']\n",
      "sort by 2nd character:  ['banana', 'cherry', 'fig', 'apple']\n"
     ]
    }
   ],
   "source": [
    "print(\"=== sorted() vs list.sort() ===\")\n",
    "nums = [5, 2, 9, 1, 5, 6]\n",
    "print(f\"sorted(nums):   {sorted(nums)}      sorted\")    # returns a new sorted list — nums is unchanged\n",
    "print(f\"print(nums):    {nums}      unchanged\")         # original list is untouched\n",
    "\n",
    "nums.sort()         # sorts in-place — modifies nums directly, returns None\n",
    "print(f\"nums.sort():    {nums}      in-place\")                   # nums is now sorted\n",
    "\n",
    "print(\"\\n=== Sorting with a key function ===\")\n",
    "# key=callable: sort by a derived value instead of the element itself\n",
    "words = [\"banana\", \"apple\", \"fig\", \"cherry\"]\n",
    "print(f\"sorted(words, key=len):                 {sorted(words, key=len)}\")               # sort by word length\n",
    "print(f\"sorted(words, key=len, reverse=True):   {sorted(words, key=len, reverse=True)}\") # longest first\n",
    "print(f\"sorted(words, key=str.lower):           {sorted(words, key=str.lower)}\")         # sort case-insensitively\n",
    "\n",
    "print(\"\\n=== key= with a lambda ===\")\n",
    "# lambda is useful when the sort criterion is not a named function\n",
    "print(f\"sort by last letter:    {sorted(words, key=lambda w: w[-1])}\")   # ordered by final character (a, e, g, y)\n",
    "print(f\"sort by 2nd character:  {sorted(words, key=lambda w: w[1])}\")    # sort by the second character of each word\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b31346a1",
   "metadata": {},
   "source": [
    ":::{admonition} Lambda\n",
    "\n",
    "```python\n",
    "key=lambda w: w[-1]\n",
    "#          ^  ^--- w is now the word being evaluated\n",
    "#          |\n",
    "#          w is defined here — you could name it anything: word, x, item\n",
    "```\n",
    "\n",
    "A lambda function is equivalent to a named function:\n",
    "\n",
    "```python\n",
    "def last_letter(w):\n",
    "    letter = w[-1]      ### expression\n",
    "    return letter\n",
    "\n",
    "sorted(words, key=last_letter)  ### calling last_letter()\n",
    "```\n",
    "::: "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ac8da2b",
   "metadata": {},
   "source": [
    "### Stability and `key=`\n",
    "\n",
    "**Stability** in sorting means that when two elements compare as equal, their original relative order is preserved in the output. Python's `sorted()` and `.sort()` are both stable.\n",
    "\n",
    "Stability matters when records share the same key and you want original order preserved. This happens when you sort data in multiple passes. If you sort a list of students by score, a stable sort guarantees that students with the same score stay in whatever order they were before — letting you layer sorts predictably."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "252b730b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Ava     90\n",
      "Ben     90\n",
      "Cara    85\n",
      "Dana    85\n"
     ]
    }
   ],
   "source": [
    "students = [\n",
    "    {\"name\": \"Ava\",   \"score\": 90},\n",
    "    {\"name\": \"Ben\",   \"score\": 90},  # same score as Ava — tied key\n",
    "    {\"name\": \"Cara\",  \"score\": 85},\n",
    "    {\"name\": \"Dana\",  \"score\": 85},  # same score as Cara — second tied pair\n",
    "]\n",
    "\n",
    "result = sorted(students, key=lambda s: s[\"score\"], reverse=True)   ### sort by score, highest first\n",
    "for s in result:\n",
    "    print(f\"{s['name']:6}  {s['score']}\")       ### :6 means to left-align the name in a 6-character field\n",
    "\n",
    "# Stable sort guarantees tied elements keep their original relative order:\n",
    "#   Ava before Ben  (both 90, Ava was first in input)\n",
    "#   Cara before Dana (both 85, Cara was first in input)\n",
    "# An unstable sort might swap them — useful when layering multiple sorts\n",
    "# (e.g., sort by score, then by grade level) without losing prior ordering.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "68176812",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### Exercise: Sort by Word Length\n",
    "#   1. Given the list `words` below, use `sorted()` with `key=` to sort\n",
    "#      by word length, shortest to longest.\n",
    "#   2. Print the sorted list.\n",
    "#   3. Two words have the same length. Does their relative order change?\n",
    "#      What does this confirm about stability?\n",
    "### Your code starts here.\n",
    "words = [\"banana\", \"apple\", \"cherry\", \"date\", \"elderberry\", \"fig\"]\n",
    "\n",
    "\n",
    "\n",
    "### Your code ends here.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a99711fd",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['fig', 'date', 'apple', 'banana', 'cherry', 'elderberry']\n"
     ]
    }
   ],
   "source": [
    "### Solution\n",
    "words = [\"banana\", \"apple\", \"cherry\", \"date\", \"elderberry\", \"fig\"]\n",
    "result = sorted(words, key=len)\n",
    "print(result)\n",
    "# \"banana\" and \"cherry\" are both length 6 — their relative order is preserved.\n",
    "# Different-length words are ordered by key value (length), as expected.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sorting-concepts-intuition",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Sorting Concepts & Intuition\n",
    "\n",
    "This section introduces core design paradigms and visual intuition before implementation details.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "223363fe",
   "metadata": {},
   "source": [
    "### Algorithm Intuition\n",
    "\n",
    "The animations below show each algorithm working on the same unsorted input. Watch where the sorted region grows, how many passes are needed, and how much movement each approach requires. These visuals build intuition for the complexity differences before you read the code."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "225f9ab6",
   "metadata": {},
   "source": [
    "<div style=\"display:flex; align-items:flex-start; gap:20px; margin-bottom:28px;\">\n",
    "  <img src=\"../../_static/images/insertion_sort.gif\" style=\"width:160px; flex-shrink:0;\" alt=\"Insertion Sort\">\n",
    "  <div>\n",
    "    <strong>Insertion Sort</strong> (<a href=\"https://en.wikipedia.org/wiki/Insertion_sort\">wikipedia</a>)<br>\n",
    "    <ul>\n",
    "      <li> <b>Brute Force</b>*: Builds a sorted region from the left, one element at a time. Each new element slides left past larger elements until it reaches its correct position &mdash; like inserting a playing card into a hand. (*also classified as a Decrease-and-Conquer algorithm.)\n",
    "      <li> <strong>In-place</strong> \n",
    "      <li> <strong>stable</strong><br><br>\n",
    "      <li> Best case $O(n)$ when the input is already sorted (only comparisons, no shifts). Average and worst case $O(n^2)$. Python's Timsort uses insertion sort internally for small sub-arrays.\n",
    "    </ul>\n",
    "  </div>\n",
    "</div>\n",
    "\n",
    "<div style=\"display:flex; align-items:flex-start; gap:20px; margin-bottom:28px;\">\n",
    "  <img src=\"../../_static/images/bubble_sort.gif\" style=\"width:160px; flex-shrink:0;\" alt=\"Bubble Sort\">\n",
    "  <div>\n",
    "    <strong>Bubble Sort</strong> (<a href=\"https://en.wikipedia.org/wiki/Bubble_sort\">wikipedia</a>)<br>\n",
    "    <ul>\n",
    "      <li><strong>Brute Force</strong>: Makes repeated passes through the list, swapping any adjacent pair that is out of order. After each pass, the largest unsorted element has &ldquo;bubbled up&rdquo; to its final position. An early-exit flag stops the loop as soon as a pass completes with no swaps. \n",
    "      <li> <strong>In-place</strong> \n",
    "      <li> <strong>stable</strong> <br><br>\n",
    "      <li> $O(n^2)$ average and worst case. Best case $O(n)$ on already-sorted input (one pass, no swaps). Included for teaching and conceptual contrast only.\n",
    "    </ul>\n",
    "  </div>\n",
    "</div>\n",
    "\n",
    "<div style=\"display:flex; align-items:flex-start; gap:20px; margin-bottom:28px;\">\n",
    "  <img src=\"../../_static/images/merge_sort.gif\" style=\"width:160px; flex-shrink:0;\" alt=\"Merge Sort\">\n",
    "  <div>\n",
    "    <strong>Merge Sort</strong> (<a href=\"https://en.wikipedia.org/wiki/Merge_sort\">wikipedia</a>)<br>\n",
    "    <ul>\n",
    "      <li> <b>Divide-and-Conquer</b>: recursively split the list in half until each sub-list has one element, then merge sorted pairs back together. The merge step uses two pointers to interleave the halves in order. \n",
    "      <li> <strong>Out-of-place</strong> ($O(n)$ extra memory)\n",
    "      <li> <strong>stable</strong><br><br> \n",
    "      <li> $O(n \\log n)$ best, average, and worst case &mdash; guaranteed regardless of input order. The consistent performance makes it a good choice when stability is required.\n",
    "    </ul>\n",
    "  </div>\n",
    "</div>\n",
    "\n",
    "<div style=\"display:flex; align-items:flex-start; gap:20px;\">\n",
    "  <img src=\"../../_static/images/quick_sort.gif\" style=\"width:160px; flex-shrink:0;\" alt=\"Quick Sort\">\n",
    "  <div>\n",
    "    <strong>Quicksort</strong> (<a href=\"https://en.wikipedia.org/wiki/Quicksort\">wikipedia</a>)<br>\n",
    "    <ul>\n",
    "      <li> <b>Divide-and-Conquer by value</b>: pick a pivot, partition all other elements into &ldquo;less&rdquo;, &ldquo;equal&rdquo;, and &ldquo;greater&rdquo; groups, then recursively sort the outer two. No merge step needed &mdash; the three pieces concatenate directly. \n",
    "      <li> <strong>Not stable</strong> <br><br>\n",
    "      <li> $O(\\log n)$ stack space on average. $O(n \\log n)$ average with a randomly chosen pivot. Worst case $O(n^2)$ when the pivot consistently lands at one extreme (e.g., always the minimum on a sorted list). Typically the fastest general-purpose sort in practice.\n",
    "    </ul>\n",
    "  </div>\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f62c8cc7",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Core Sorting Algorithms\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9eb1070b",
   "metadata": {},
   "source": [
    "The following implementations are included for conceptual comparison and practice of sorting algorithms."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3b483daa",
   "metadata": {
    "tags": []
   },
   "source": [
    "### Insertion Sort\n",
    "\n",
    "Insertion sort builds a sorted section at the front of the list one element at a time. Think of sorting playing cards: you pick up each new card and slide it left past any cards that are larger, until it reaches its correct position. At every step, everything to the left of the current position is sorted; the right side is the unsorted remainder yet to be processed.\n",
    "\n",
    "- Time complexity: $O(n^2)$ average/worst; $O(n)$ best (already sorted; only comparisons, no shifts)\n",
    "- Space complexity: $O(1)$ extra; fully in-place\n",
    "- Useful on small or nearly sorted inputs; Python's Timsort uses insertion sort internally for small sub-arrays\n",
    "- Algorithm:\n",
    "  - Start with the second element as the first element is assumed to be sorted.\n",
    "  - Compare the second element with the first if the second is smaller then swap them.\n",
    "  - Move to the third element, compare it with the first two, and put it in its correct position\n",
    "  - Repeat until the entire array is sorted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "578e18a2",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:16.220523Z",
     "iopub.status.busy": "2026-04-12T21:25:16.220445Z",
     "iopub.status.idle": "2026-04-12T21:25:16.222969Z",
     "shell.execute_reply": "2026-04-12T21:25:16.222604Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "insertion_sort checks passed.\n"
     ]
    }
   ],
   "source": [
    "def insertion_sort(arr):                  # sorts in-place; mutates the list passed in\n",
    "    for i in range(1, len(arr)):          # start at 1 since the first element is already \"sorted\"\n",
    "        current_value = arr[i]                      # the \"key\" is the element we want to insert into the sorted portion of the list\n",
    "        j = i - 1                           # j is the index of the last element in the sorted portion of the list  \n",
    "\n",
    "                                            # if the first one is greater the the second one, \n",
    "        while j >= 0 and arr[j] > current_value:    # while there are still elements in the sorted portion and the current element is greater than the key\n",
    "            arr[j + 1] = arr[j]         # shift the current element one position to the right to make room for the key\n",
    "            j -= 1                          # move left in the sorted portion to compare the next element\n",
    "\n",
    "        arr[j + 1] = current_value                  # insert the key into its correct position in the sorted portion of the list\n",
    "    return arr                            # returns same (now-sorted) list for chaining\n",
    "\n",
    "assert insertion_sort([]) == []\n",
    "assert insertion_sort([1]) == [1]\n",
    "assert insertion_sort([3, 1, 2]) == [1, 2, 3]\n",
    "assert insertion_sort([1, 2, 3]) == [1, 2, 3]   # already sorted\n",
    "assert insertion_sort([3, 2, 1]) == [1, 2, 3]   # reverse sorted\n",
    "assert insertion_sort([2, 2, 1]) == [1, 2, 2]   # duplicates    ### edge case: duplicate values should be sorted correctly, and their relative order should be preserved (stable sort)\n",
    "print('insertion_sort checks passed.')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "a0519cc9",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:16.224390Z",
     "iopub.status.busy": "2026-04-12T21:25:16.224308Z",
     "iopub.status.idle": "2026-04-12T21:25:16.226560Z",
     "shell.execute_reply": "2026-04-12T21:25:16.226112Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  start:             [3, 1, 4, 1]\n",
      "  after placing 1 at index 0: [1, 3, 4, 1]\n",
      "  after placing 4 at index 2: [1, 3, 4, 1]\n",
      "  after placing 1 at index 1: [1, 1, 3, 4]\n"
     ]
    }
   ],
   "source": [
    "# Step-by-step trace: watch insertion sort work on [3, 1, 4, 1].\n",
    "# Each pass picks the next element (key) and slides it left to its correct position.\n",
    "trace = [3, 1, 4, 1]\n",
    "print(f\"  start:             {trace}\")\n",
    "for i in range(1, len(trace)):\n",
    "    key = trace[i]\n",
    "    j = i - 1\n",
    "    while j >= 0 and trace[j] > key:\n",
    "        trace[j + 1] = trace[j]\n",
    "        j -= 1\n",
    "    trace[j + 1] = key\n",
    "    print(f\"  after placing {key} at index {j+1}: {trace}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "912eee03",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:16.227593Z",
     "iopub.status.busy": "2026-04-12T21:25:16.227530Z",
     "iopub.status.idle": "2026-04-12T21:25:16.229016Z",
     "shell.execute_reply": "2026-04-12T21:25:16.228665Z"
    },
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### Exercise: Count Comparisons\n",
    "#   1. Write `insertion_sort_count(items)` that sorts a copy of items\n",
    "#      using insertion sort logic and counts every comparison made.\n",
    "#   2. Return a tuple: (sorted_list, comparison_count).\n",
    "#   3. Test with [5, 2, 4, 6, 1, 3] and with [1, 2, 3, 4, 5, 6] (already sorted).\n",
    "#   4. How do the comparison counts differ? What does this confirm about\n",
    "#      best-case vs. average-case complexity for insertion sort?\n",
    "### Your code starts here.\n",
    "\n",
    "\n",
    "\n",
    "### Your code ends here.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "0011844d",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:16.230042Z",
     "iopub.status.busy": "2026-04-12T21:25:16.229971Z",
     "iopub.status.idle": "2026-04-12T21:25:16.232334Z",
     "shell.execute_reply": "2026-04-12T21:25:16.231861Z"
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "unsorted input: [1, 2, 3, 4, 5, 6], comparisons: 12\n",
      "sorted input  : [1, 2, 3, 4, 5, 6], comparisons: 5\n"
     ]
    }
   ],
   "source": [
    "### Solution\n",
    "\n",
    "def insertion_sort_count(items):\n",
    "    items = items[:]  # Work on a copy; do not mutate input.\n",
    "    comparisons = 0\n",
    "    for i in range(1, len(items)):\n",
    "        key = items[i]\n",
    "        j = i - 1\n",
    "        while j >= 0 and (comparisons := comparisons + 1) and items[j] > key:\n",
    "            items[j + 1] = items[j]\n",
    "            j -= 1\n",
    "        items[j + 1] = key\n",
    "    return items, comparisons\n",
    "\n",
    "sorted_list, count = insertion_sort_count([5, 2, 4, 6, 1, 3])\n",
    "print(f\"unsorted input: {sorted_list}, comparisons: {count}\")\n",
    "\n",
    "sorted_list, count = insertion_sort_count([1, 2, 3, 4, 5, 6])\n",
    "print(f\"sorted input  : {sorted_list}, comparisons: {count}\")\n",
    "# Already-sorted input: each key requires only 1 comparison (with its predecessor).\n",
    "# That is O(n) total — the best case.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11025236",
   "metadata": {
    "tags": []
   },
   "source": [
    "### Bubble Sort\n",
    "\n",
    "Bubble sort makes repeated passes through the list, comparing each adjacent pair and swapping them if they are out of order. After each full pass, the largest unsorted element has \"bubbled up\" to its final position at the end of the unsorted region. An early-exit flag (`swapped`) lets the loop terminate as soon as a full pass completes with no swaps, which means the list is already sorted.\n",
    "\n",
    "Despite the optimization, bubble sort is $O(n^2)$ on average and worst-case inputs. It is included here for conceptual contrast and complexity discussion, not for practical use.\n",
    "\n",
    "Algorithm:\n",
    "- Sorts the array using multiple passes. After the first pass, the maximum goes to end (its correct position). Same way, after second pass, the second largest goes to second last position and so on.\n",
    "- In every pass, process only those that have already not moved to correct position. After k passes, the largest k must have been moved to the last k positions.\n",
    "- In a pass, we consider remaining elements and compare all adjacent and swap if larger element is before a smaller element. If we keep doing this, we get the largest (among the remaining elements) at its correct position."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "358b3477",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:16.233455Z",
     "iopub.status.busy": "2026-04-12T21:25:16.233383Z",
     "iopub.status.idle": "2026-04-12T21:25:16.235732Z",
     "shell.execute_reply": "2026-04-12T21:25:16.235340Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "bubble_sort checks passed.\n"
     ]
    }
   ],
   "source": [
    "def bubble_sort(items):\n",
    "    n = len(items)\n",
    "    for i in range(n):\n",
    "        swapped = False\n",
    "        for j in range(0, n - i - 1):\n",
    "            if items[j] > items[j + 1]:\n",
    "                items[j], items[j + 1] = items[j + 1], items[j]\n",
    "                swapped = True\n",
    "        if not swapped:\n",
    "            break\n",
    "    return items\n",
    "\n",
    "assert bubble_sort([]) == []\n",
    "assert bubble_sort([1]) == [1]\n",
    "assert bubble_sort([3, 1, 2]) == [1, 2, 3]\n",
    "assert bubble_sort([1, 2, 3]) == [1, 2, 3]   # already sorted — early-exit flag should trigger\n",
    "assert bubble_sort([3, 2, 1]) == [1, 2, 3]   # reverse sorted\n",
    "assert bubble_sort([2, 2, 1]) == [1, 2, 2]   # duplicates\n",
    "print('bubble_sort checks passed.')\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cff6fd57",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:16.236755Z",
     "iopub.status.busy": "2026-04-12T21:25:16.236687Z",
     "iopub.status.idle": "2026-04-12T21:25:16.238225Z",
     "shell.execute_reply": "2026-04-12T21:25:16.237902Z"
    },
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### Exercise: Count Swaps\n",
    "#   1. Write `bubble_sort_count(items)` that sorts a copy of items\n",
    "#      using bubble sort and counts every swap made.\n",
    "#   2. Return a tuple: (sorted_list, swap_count).\n",
    "#   3. Test with [3, 1, 2], [1, 2, 3] (already sorted), and [3, 2, 1] (reverse sorted).\n",
    "#   4. Which input produces the most swaps? Does this match bubble sort's worst-case input?\n",
    "### Your code starts here.\n",
    "\n",
    "\n",
    "\n",
    "### uncomment to test your function:\n",
    "# for data in [[3, 1, 2], [1, 2, 3], [3, 2, 1]]:\n",
    "#     result, count = bubble_sort_count(data)\n",
    "#     print(f\"input {data}: sorted {result}, swaps: {count}\")\n",
    "### Your code ends here.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "d43367be",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:16.239291Z",
     "iopub.status.busy": "2026-04-12T21:25:16.239210Z",
     "iopub.status.idle": "2026-04-12T21:25:16.241600Z",
     "shell.execute_reply": "2026-04-12T21:25:16.241229Z"
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "input [3, 1, 2]: sorted [1, 2, 3], swaps: 2\n",
      "input [1, 2, 3]: sorted [1, 2, 3], swaps: 0\n",
      "input [3, 2, 1]: sorted [1, 2, 3], swaps: 3\n"
     ]
    }
   ],
   "source": [
    "### Solution\n",
    "\n",
    "def bubble_sort_count(items):\n",
    "    items = items[:]  # Work on a copy; do not mutate input.\n",
    "    n = len(items)\n",
    "    swaps = 0\n",
    "    for i in range(n):\n",
    "        swapped = False\n",
    "        for j in range(0, n - i - 1):\n",
    "            if items[j] > items[j + 1]:\n",
    "                items[j], items[j + 1] = items[j + 1], items[j]\n",
    "                swaps += 1\n",
    "                swapped = True\n",
    "        if not swapped:\n",
    "            break\n",
    "    return items, swaps\n",
    "\n",
    "for data in [[3, 1, 2], [1, 2, 3], [3, 2, 1]]:\n",
    "    result, count = bubble_sort_count(data)\n",
    "    print(f\"input {data}: sorted {result}, swaps: {count}\")\n",
    "# Reverse-sorted input has the most swaps — this is the O(n^2) worst case.\n",
    "# Already-sorted input exits after one pass with zero swaps (early-exit flag).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7f164696",
   "metadata": {
    "tags": []
   },
   "source": [
    "### Merge Sort\n",
    "\n",
    "Merge sort follows a **divide-and-conquer** strategy: recursively split the list in half until every sub-list contains one element (a single element is trivially sorted), then repeatedly **merge** pairs of sorted sub-lists back into a single sorted list.\n",
    "\n",
    "The merge step is where the work happens: two pointers scan the left and right halves simultaneously, always appending the smaller front element to the result. Because each element is visited once per level of recursion, each level costs $O(n)$. The recursion depth is $O(\\log n)$ (halving $n$ until size 1), so the total is $O(n \\log n)$ — and this holds for every input, regardless of order.\n",
    "\n",
    "- Time complexity: $O(n \\log n)$ best / average / worst\n",
    "- Space complexity: $O(n)$ extra — each merge allocates a new list\n",
    "- Stable: equal elements maintain their original relative order\n",
    "- Algorithm:\n",
    "  - Divide: Divide the list or array recursively into two halves until it can no more be divided.\n",
    "  - Conquer: Each subarray is sorted individually using the merge sort algorithm.\n",
    "  - Merge: The sorted subarrays are merged back together in sorted order. The process continues until all elements from both subarrays have been merged."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "8737f4c2",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:16.242847Z",
     "iopub.status.busy": "2026-04-12T21:25:16.242765Z",
     "iopub.status.idle": "2026-04-12T21:25:16.244978Z",
     "shell.execute_reply": "2026-04-12T21:25:16.244626Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def merge(left, right):\n",
    "    merged = []\n",
    "    i = j = 0\n",
    "\n",
    "    while i < len(left) and j < len(right):\n",
    "        if left[i] <= right[j]:\n",
    "            merged.append(left[i])\n",
    "            i += 1\n",
    "        else:\n",
    "            merged.append(right[j])\n",
    "            j += 1\n",
    "\n",
    "    merged.extend(left[i:])\n",
    "    merged.extend(right[j:])\n",
    "    return merged\n",
    "\n",
    "def merge_sort(items):\n",
    "    if len(items) <= 1:\n",
    "        return items[:]\n",
    "\n",
    "    mid = len(items) // 2\n",
    "    left = merge_sort(items[:mid])\n",
    "    right = merge_sort(items[mid:])\n",
    "    return merge(left, right)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "4ab5a8d3",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:16.246309Z",
     "iopub.status.busy": "2026-04-12T21:25:16.246226Z",
     "iopub.status.idle": "2026-04-12T21:25:16.247787Z",
     "shell.execute_reply": "2026-04-12T21:25:16.247492Z"
    },
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### Exercise: Merge Two Sorted Lists\n",
    "#   1. Write `merge_sorted(a, b)` that combines two already-sorted lists\n",
    "#      into one sorted list.\n",
    "#   2. Do not call sorted() or any sort function — use only the two-pointer\n",
    "#      merge logic from the merge() function above.\n",
    "#   3. Test with:\n",
    "#      a = [1, 3, 5], b = [2, 4, 6]  → expected [1, 2, 3, 4, 5, 6]\n",
    "#      a = [],        b = [1, 2, 3]  → expected [1, 2, 3]\n",
    "#      a = [1, 1],    b = [1, 2]     → expected [1, 1, 1, 2]\n",
    "### Your code starts here.\n",
    "\n",
    "\n",
    "\n",
    "### Your code ends here.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "662efa02",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:16.248926Z",
     "iopub.status.busy": "2026-04-12T21:25:16.248860Z",
     "iopub.status.idle": "2026-04-12T21:25:16.251522Z",
     "shell.execute_reply": "2026-04-12T21:25:16.251132Z"
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1, 2, 3, 4, 5, 6]\n"
     ]
    }
   ],
   "source": [
    "### Solution\n",
    "\n",
    "def merge_sorted(a, b):\n",
    "    result = []\n",
    "    i = j = 0\n",
    "    while i < len(a) and j < len(b):\n",
    "        if a[i] <= b[j]:\n",
    "            result.append(a[i])\n",
    "            i += 1\n",
    "        else:\n",
    "            result.append(b[j])\n",
    "            j += 1\n",
    "    result.extend(a[i:])\n",
    "    result.extend(b[j:])\n",
    "    return result\n",
    "\n",
    "assert merge_sorted([1, 3, 5], [2, 4, 6]) == [1, 2, 3, 4, 5, 6]\n",
    "assert merge_sorted([], [1, 2, 3]) == [1, 2, 3]\n",
    "assert merge_sorted([1, 1], [1, 2]) == [1, 1, 1, 2]\n",
    "print(merge_sorted([1, 3, 5], [2, 4, 6]))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cdbdbb9c",
   "metadata": {
    "tags": []
   },
   "source": [
    "### Quicksort\n",
    "\n",
    "Quicksort is also divide-and-conquer, but partitions by value rather than by position. It selects a **pivot** element and rearranges all other values into three groups: elements less than the pivot, elements equal to the pivot, and elements greater. It then recursively sorts the less-than and greater-than groups. No merge step is needed — the three already-sorted pieces concatenate directly into a sorted list.\n",
    "\n",
    "**Pivot choice determines performance.** With a well-chosen pivot that splits the data roughly in half, recursion depth is $O(\\log n)$ and total work is $O(n \\log n)$. With a consistently bad pivot — for example, always picking the minimum value on an already-sorted list — one partition is always empty and the other has $n-1$ elements, producing $O(n^2)$ behaviour. **Random pivoting** avoids this: a random element is unlikely to be the extreme value on every level, so expected performance remains $O(n \\log n)$ across typical inputs.\n",
    "\n",
    "- Average: $O(n \\log n)$\n",
    "- Worst-case: $O(n^2)$ — degenerate pivot at every level\n",
    "- Space: $O(\\log n)$ average stack frames from recursion; $O(n)$ worst-case\n",
    "- Algorithm:\n",
    "  - Choose a Pivot: Select an element from the array as the pivot. The choice of pivot can vary (e.g., first element, last element, random element, or median).\n",
    "  - Partition the Array: Re arrange the array around the pivot. After partitioning, all elements smaller than the pivot will be on its left, and all elements greater than the pivot will be on its right.\n",
    "  - Recursively Call: Recursively apply the same process to the two partitioned sub-arrays.\n",
    "  - Base Case: The recursion stops when there is only one element left in the sub-array, as a single element is already sorted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "b2dc182a",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:16.252628Z",
     "iopub.status.busy": "2026-04-12T21:25:16.252567Z",
     "iopub.status.idle": "2026-04-12T21:25:16.254486Z",
     "shell.execute_reply": "2026-04-12T21:25:16.254132Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def quicksort(items):\n",
    "    if len(items) <= 1:\n",
    "        return items[:]\n",
    "\n",
    "    pivot = items[random.randint(0, len(items) - 1)]\n",
    "    less = [x for x in items if x < pivot]\n",
    "    equal = [x for x in items if x == pivot]\n",
    "    greater = [x for x in items if x > pivot]\n",
    "\n",
    "    return quicksort(less) + equal + quicksort(greater)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "8f26b81e",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:16.255475Z",
     "iopub.status.busy": "2026-04-12T21:25:16.255405Z",
     "iopub.status.idle": "2026-04-12T21:25:16.256999Z",
     "shell.execute_reply": "2026-04-12T21:25:16.256650Z"
    },
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### Exercise: Three-Way Partition\n",
    "#   Quicksort works by partitioning around a pivot. Practice the logic directly:\n",
    "#   1. Write `partition(items, pivot)` that returns a tuple (less, equal, greater):\n",
    "#      - less:    all elements < pivot\n",
    "#      - equal:   all elements == pivot\n",
    "#      - greater: all elements > pivot\n",
    "#   2. Do not sort; just partition.\n",
    "#   3. Test:\n",
    "#      partition([3, 1, 4, 1, 5, 9, 2, 6, 5], 5)\n",
    "#      expected: ([3, 1, 4, 1, 2], [5, 5], [9, 6])\n",
    "### Your code starts here.\n",
    "\n",
    "\n",
    "\n",
    "### Your code ends here.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "5f49cff5",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:16.258056Z",
     "iopub.status.busy": "2026-04-12T21:25:16.257996Z",
     "iopub.status.idle": "2026-04-12T21:25:16.260191Z",
     "shell.execute_reply": "2026-04-12T21:25:16.259797Z"
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "less:    [3, 1, 4, 1, 2]\n",
      "equal:   [5, 5]\n",
      "greater: [9, 6]\n"
     ]
    }
   ],
   "source": [
    "### Solution\n",
    "\n",
    "def partition(items, pivot):\n",
    "    less    = [x for x in items if x < pivot]\n",
    "    equal   = [x for x in items if x == pivot]\n",
    "    greater = [x for x in items if x > pivot]\n",
    "    return less, equal, greater\n",
    "\n",
    "less, equal, greater = partition([3, 1, 4, 1, 5, 9, 2, 6, 5], 5)\n",
    "print(f\"less:    {less}\")\n",
    "print(f\"equal:   {equal}\")\n",
    "print(f\"greater: {greater}\")\n",
    "# Notice this is exactly the logic inside quicksort() — three list comprehensions\n",
    "# over the input, using the pivot to assign each element to its group.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sorting-apis-complexity",
   "metadata": {
    "tags": []
   },
   "source": [
    "## APIs & Complexity\n",
    "\n",
    "This section compares sorting API contracts (in-place mutation with returned list vs returned new list), then connects those behaviors to complexity growth.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "193cbf43",
   "metadata": {
    "tags": []
   },
   "source": [
    "### API Contracts\n",
    "\n",
    "A function's **contract** is its behavior guarantee: what input it expects, what it returns, and whether it mutates its arguments. In other words, a contract is an agreement between a function and its caller: what the caller must provide, and what the function guarantees in return. All four implementations below should produce the same result as Python's built-in `sorted()`. But equally important is *how* each algorithm interacts with its input, which is a distinction that matters every time you choose a sorting function in real code.\n",
    "\n",
    "\n",
    "**Two different API contracts:**\n",
    "\n",
    "- `insertion_sort` and `bubble_sort` sort **in-place**: they rearrange the original list and also return the same list object for convenience. After the call, the variable you passed in has been modified.\n",
    "- `merge_sort` and `quicksort` (as implemented here) return a **new sorted list** and leave the original untouched — the same contract as Python's built-in `sorted()`; by contrast, `.sort()` sorts in place and returns `None`.\n",
    "\n",
    "The first cell below makes the in-place side-effect visible. The second cell runs all four algorithms on the same demo list; `demo[:]` (a shallow copy) is passed to the in-place functions so that `demo` survives unchanged for the next call.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "1f0f130a",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:16.261304Z",
     "iopub.status.busy": "2026-04-12T21:25:16.261241Z",
     "iopub.status.idle": "2026-04-12T21:25:16.263664Z",
     "shell.execute_reply": "2026-04-12T21:25:16.263290Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "return value : [1, 2, 3, 4, 5, 6]\n",
      "victim after : [1, 2, 3, 4, 5, 6]\n",
      "\n",
      "insertion: [1, 2, 3, 4, 5, 6]\n",
      "merge    : [1, 2, 3, 4, 5, 6]\n",
      "bubble   : [1, 2, 3, 4, 5, 6]\n",
      "quick    : [1, 2, 3, 4, 5, 6]\n",
      "builtin  : [1, 2, 3, 4, 5, 6]\n",
      "\n",
      "demo unchanged: [5, 2, 4, 6, 1, 3]\n"
     ]
    }
   ],
   "source": [
    "# --- In-place side-effect ---\n",
    "# insertion_sort (and bubble_sort) modify the list in place and return the same list.\n",
    "# calling them without a copy permanently changes the original list.\n",
    "victim = [5, 2, 4, 6, 1, 3]\n",
    "result = insertion_sort(victim)\n",
    "print('return value :', result)   # sorted list (same object as victim)\n",
    "print('victim after :', victim)   # [1, 2, 3, 4, 5, 6] — original is gone!\n",
    "\n",
    "print()\n",
    "\n",
    "# --- Consistent demo using copies ---\n",
    "# because insertion_sort and bubble_sort mutate their argument, we pass demo[:]\n",
    "# (a copy) so that demo stays intact for each subsequent call.\n",
    "demo = [5, 2, 4, 6, 1, 3]\n",
    "print('insertion:', insertion_sort(demo[:]))  # copy; in-place mutates input\n",
    "print('merge    :', merge_sort(demo))          # new list; demo unchanged\n",
    "print('bubble   :', bubble_sort(demo[:]))      # copy; in-place mutates input\n",
    "print('quick    :', quicksort(demo))           # new list; demo unchanged\n",
    "print('builtin  :', sorted(demo))              # new list; demo unchanged\n",
    "print()\n",
    "print('demo unchanged:', demo)                 # still [5, 2, 4, 6, 1, 3]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d95881b5",
   "metadata": {
    "tags": []
   },
   "source": [
    "### Complexity Growth\n",
    "\n",
    "The comparison table above names the Big O of each algorithm. The table below makes those classes concrete: for large $n$, the gap between $O(n^2)$ and $O(n \\log n)$ is not a small constant factor — it is the difference between feasible and impractical.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "751e0bbb",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:16.264725Z",
     "iopub.status.busy": "2026-04-12T21:25:16.264656Z",
     "iopub.status.idle": "2026-04-12T21:25:16.266681Z",
     "shell.execute_reply": "2026-04-12T21:25:16.266331Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       n    log2(n)     n (O(n))      n*log2(n)            n^2\n",
      "      10       3.32           10          33.22            100\n",
      "     100       6.64          100         664.39          10000\n",
      "    1000       9.97         1000        9965.78        1000000\n",
      "   10000      13.29        10000      132877.12      100000000\n",
      "  100000      16.61       100000     1660964.05    10000000000\n",
      " 1000000      19.93      1000000    19931568.57  1000000000000\n"
     ]
    }
   ],
   "source": [
    "import math\n",
    "\n",
    "sizes = [10, 100, 1_000, 10_000, 100_000, 1_000_000]\n",
    "print(f\"{'n':>8} {'log2(n)':>10} {'n (O(n))':>12} {'n*log2(n)':>14} {'n^2':>14}\")\n",
    "for n in sizes:\n",
    "    print(f\"{n:>8} {math.log2(n):>10.2f} {n:>12} {n*math.log2(n):>14.2f} {n**2:>14}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9c4e536",
   "metadata": {},
   "source": [
    "### Sorting Cheat Sheet\n",
    "\n",
    "The table below summarizes time complexity, space complexity, and practical use cases. It includes both algorithms implemented here and a few additional reference algorithms for broader comparison.\n",
    "\n",
    "| Algorithm             | Category | Time (best) | Time (avg)   | Time (worst) | Space       | In-place? | Stable? | Recommended for                          |\n",
    "|-----------------------|----------|-------------|--------------|--------------|-------------|-----------|---------|------------------------------------------|\n",
    "| **Insertion sort**    | Naive    | O(n)        | O(n²)        | O(n²)        | O(1)        | Yes       | Yes     | Small or nearly-sorted data              |\n",
    "| **Bubble sort**       | Naive    | O(n)        | O(n²)        | O(n²)        | O(1)        | Yes       | Yes     | Teaching only                            |\n",
    "| Selection sort        | Naive    | O(n²)       | O(n²)        | O(n²)        | O(1)        | Yes       | No      | Teaching only                            |\n",
    "| Shell sort            | Hybrid   | O(n log n)† | Varies by gap sequence | O(n²) | O(1)        | Yes       | No      | Moderate data; memory-constrained cases  |\n",
    "| Smooth sort           | Hybrid   | O(n)        | O(n log n)   | O(n log n)   | O(1)        | Yes       | No      | Nearly-sorted data; rare in practice     |\n",
    "| **Merge sort**        | Efficient| O(n log n)  | O(n log n)   | O(n log n)   | O(n)        | No        | Yes     | Stable sort required; linked lists       |\n",
    "| **Quicksort**         | Efficient| O(n log n)  | O(n log n)   | O(n²)        | O(log n)*   | No (here) | No      | General-purpose; typically fastest       |\n",
    "| Heapsort              | Efficient| O(n log n)  | O(n log n)   | O(n log n)   | O(1)        | Yes       | No      | Guaranteed O(n log n); no extra memory   |\n",
    "| Timsort               | Efficient| O(n)        | O(n log n)   | O(n log n)   | O(n)        | No        | Yes     | Python's built-in; best real-world choice|\n",
    "| Python `sorted()`/`.sort()` | Built-in | O(n) | O(n log n)   | O(n log n)   | O(n)*       | `.sort()` mutates | Yes | Default choice for all real code    |\n",
    "\n",
    "\\* Quicksort space O(log n) is the call stack depth for average case; worst case stack is O(n). For this notebook's out-of-place implementation, additional list allocation is also used.  \n",
    "† Shell-sort bounds depend on the chosen gap sequence; values shown are common teaching approximations.\n",
    "\n",
    "**Default advice**: Prefer Python's built-in `sorted()` or `.sort()` in production code. Hand-written sorting implementations are usually for learning, analysis, or specialized needs, not everyday use. Study algorithms to understand trade-offs, Big $O$ reasoning, and algorithm design — knowledge that transfers beyond Python.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "evaluation-advanced-topics",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Evaluation\n",
    "\n",
    "This section moves from correctness checks to performance benchmarking and then to non-comparison sorting approaches.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2fd55d94",
   "metadata": {
    "tags": []
   },
   "source": [
    "### Validating Correctness\n",
    "\n",
    "All four implementations must return the same result as Python's built-in `sorted()` for any valid input. Randomized testing is used rather than hand-picked cases because it exercises orderings and value distributions that manual test design tends to miss — including duplicates, already-sorted inputs, and reverse-sorted inputs across many sizes. Always run correctness checks before timing: a subtle bug that only surfaces on certain inputs could silently corrupt the benchmark results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "e3c9df56",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:16.267921Z",
     "iopub.status.busy": "2026-04-12T21:25:16.267851Z",
     "iopub.status.idle": "2026-04-12T21:25:16.282698Z",
     "shell.execute_reply": "2026-04-12T21:25:16.282413Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "All randomized checks passed.\n"
     ]
    }
   ],
   "source": [
    "for _ in range(50):                                         ### a loop of 50 random checks to catch edge cases\n",
    "    data = [random.randint(0, 1000) for _ in range(60)]     ### generate a list of 60 random integers between 0 and 1000\n",
    "    expected = sorted(data)                                 ### sort list using Python's built-in sorted()      \n",
    "\n",
    "    assert insertion_sort(data[:]) == expected\n",
    "    assert merge_sort(data) == expected\n",
    "    assert bubble_sort(data[:]) == expected\n",
    "    assert quicksort(data) == expected\n",
    "\n",
    "print('All randomized checks passed.')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cb1fe4ba",
   "metadata": {
    "tags": []
   },
   "source": [
    "### Input Pattern\n",
    "\n",
    "Pattern matters. Compare random, sorted, reverse, and duplicate-heavy data.\n",
    "\n",
    "Timing continuity:\n",
    "- `time.time()` works for basic measurement.\n",
    "- `time.perf_counter()` is preferred in these short benchmarks for better precision.\n",
    "\n",
    "Connection to Chapter 08:\n",
    "- As before, use timing to support Big O intuition rather than focusing on tiny decimal differences.\n",
    "\n",
    "*(See also `1002-searching.ipynb` for the 30-repeat timing utility and a discussion of preprocessing cost.)*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d13f95b",
   "metadata": {
    "tags": []
   },
   "source": [
    "> **Predict before running:** For each algorithm, which input pattern do you expect to be fastest — random, sorted, reverse, or duplicates? Write your predictions before executing the cell.\n",
    ">\n",
    "> Specific questions to consider:\n",
    "> - Insertion sort is $O(n)$ on already-sorted data. Which column should confirm this?\n",
    "> - Bubble sort has an early-exit flag. Should sorted input be faster than reverse-sorted?\n",
    "> - Quicksort with a random pivot should not degrade on any particular pattern — do the numbers agree?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "c5ae5921",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:16.284077Z",
     "iopub.status.busy": "2026-04-12T21:25:16.284002Z",
     "iopub.status.idle": "2026-04-12T21:25:18.133451Z",
     "shell.execute_reply": "2026-04-12T21:25:18.132982Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "     pattern     bubble    insertion      merge      quick    builtin\n",
      "      random   0.039754     0.021758   0.001653   0.001205   0.000040\n",
      "      sorted   0.000041     0.000096   0.000885   0.001174   0.000003\n",
      "     reverse   0.055290     0.039717   0.000906   0.001155   0.000004\n",
      "  duplicates   0.037121     0.016647   0.001191   0.000152   0.000028\n"
     ]
    }
   ],
   "source": [
    "# Shared timing utility — see also 1002-searching.ipynb for the 30-repeat version.\n",
    "def average_runtime(fn, *args, repeats=8):\n",
    "    total = 0.0\n",
    "    for _ in range(repeats):\n",
    "        start = perf_counter()\n",
    "        fn(*args)\n",
    "        total += perf_counter() - start\n",
    "    return total / repeats\n",
    "\n",
    "n = 1200\n",
    "random_data = [random.randint(0, 10_000) for _ in range(n)]\n",
    "sorted_data = sorted(random_data)\n",
    "reverse_data = sorted_data[::-1]\n",
    "duplicate_data = [random.choice([1, 2, 3, 4, 5]) for _ in range(n)]\n",
    "\n",
    "datasets = [\n",
    "    ('random', random_data),\n",
    "    ('sorted', sorted_data),\n",
    "    ('reverse', reverse_data),\n",
    "    ('duplicates', duplicate_data),\n",
    "]\n",
    "\n",
    "print(f\"{'pattern':>12} {'bubble':>10} {'insertion':>12} {'merge':>10} {'quick':>10} {'builtin':>10}\")\n",
    "for label, data in datasets:\n",
    "    t_bubble = average_runtime(lambda d: bubble_sort(d[:]), data)\n",
    "    t_insert = average_runtime(lambda d: insertion_sort(d[:]), data)\n",
    "    t_merge = average_runtime(merge_sort, data)\n",
    "    t_quick = average_runtime(quicksort, data)\n",
    "    t_builtin = average_runtime(sorted, data)\n",
    "    print(f\"{label:>12} {t_bubble:>10.6f} {t_insert:>12.6f} {t_merge:>10.6f} {t_quick:>10.6f} {t_builtin:>10.6f}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "beyond-comparison-sorting",
   "metadata": {},
   "source": [
    "### Beyond Comparison\n",
    "\n",
    "From the timing table above, even the fastest comparison-based methods still scale around $O(n \\log n)$ as input grows.\n",
    "\n",
    "All four algorithms above sort by *comparing* pairs of elements. Comparison sorting has a proven theoretical lower bound of $O(n \\log n)$ in the general case.\n",
    "\n",
    "**Counting sort** sidesteps this limit entirely. Instead of comparing elements, it uses each element's *value* as an array index, tallying how many times each value appears, then reconstructing the sorted output. This runs in $O(n + k)$ where $k$ is the range of input values.\n",
    "\n",
    "The trade-off: counting sort only works on integers (or values mappable to a bounded integer range). It cannot sort arbitrary objects, floats, or strings directly. When $k$ is small relative to $n$ (e.g., sorting exam scores 0–100 for thousands of students), it is significantly faster than any comparison sort."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "55eec699",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:18.135001Z",
     "iopub.status.busy": "2026-04-12T21:25:18.134891Z",
     "iopub.status.idle": "2026-04-12T21:25:18.136916Z",
     "shell.execute_reply": "2026-04-12T21:25:18.136261Z"
    },
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### Exercise: Counting Sort\n",
    "#   Unlike comparison-based sorts, counting sort runs in O(n + k) where k is the value range.\n",
    "#   Implement `counting_sort_nonnegative(items)`:\n",
    "#   1. Accept only nonnegative integers; raise `ValueError` otherwise.\n",
    "#   2. Return a new sorted list (do not sort in place).\n",
    "#   3. As a comment, explain when counting sort is preferable to merge sort.\n",
    "### Your code starts here.\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "7e643c01",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-12T21:25:18.138265Z",
     "iopub.status.busy": "2026-04-12T21:25:18.138161Z",
     "iopub.status.idle": "2026-04-12T21:25:18.141113Z",
     "shell.execute_reply": "2026-04-12T21:25:18.140739Z"
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0, 1, 1, 2, 2, 3, 3]\n",
      "counting_sort_nonnegative checks passed.\n"
     ]
    }
   ],
   "source": [
    "### Solution\n",
    "\n",
    "def counting_sort_nonnegative(items):\n",
    "    if not items:\n",
    "        return []\n",
    "\n",
    "    if any(x < 0 for x in items):  # Guard clause: validate before any allocation.\n",
    "        raise ValueError('counting_sort_nonnegative only supports nonnegative integers')\n",
    "\n",
    "    max_value = max(items)\n",
    "    counts = [0] * (max_value + 1)\n",
    "\n",
    "    for value in items:\n",
    "        counts[value] += 1\n",
    "\n",
    "    result = []\n",
    "    for value, freq in enumerate(counts):\n",
    "        result.extend([value] * freq)\n",
    "    return result\n",
    "\n",
    "exercise_data = [3, 1, 2, 1, 0, 3, 2]\n",
    "print(counting_sort_nonnegative(exercise_data))\n",
    "\n",
    "# Verification\n",
    "assert counting_sort_nonnegative([]) == []\n",
    "assert counting_sort_nonnegative([1]) == [1]\n",
    "assert counting_sort_nonnegative([3, 1, 2, 1, 0]) == [0, 1, 1, 2, 3]\n",
    "try:\n",
    "    counting_sort_nonnegative([-1, 3])\n",
    "    assert False, 'Should have raised ValueError'\n",
    "except ValueError:\n",
    "    pass\n",
    "print('counting_sort_nonnegative checks passed.')\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}