{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9d97603b",
   "metadata": {},
   "source": [
    "# Strings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "03de58b0",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "from pathlib import Path\n",
    "\n",
    "current = Path.cwd()\n",
    "for parent in [current, *current.parents]:\n",
    "    if (parent / '_config.yml').exists():\n",
    "        project_root = parent  # ← Add project root, not chapters\n",
    "        break\n",
    "else:\n",
    "    project_root = Path.cwd().parent.parent\n",
    "\n",
    "sys.path.insert(0, str(project_root))\n",
    "\n",
    "from shared import thinkpython, diagram, jupyturtle\n",
    "from shared.download import download\n",
    "\n",
    "# Register as top-level modules so direct imports work in subsequent cells\n",
    "sys.modules['thinkpython'] = thinkpython\n",
    "sys.modules['diagram'] = diagram\n",
    "sys.modules['jupyturtle'] = jupyturtle"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "789ee19d",
   "metadata": {},
   "source": [
    "A **string** is a sequence of characters enclosed in quotes. A **character** can be a letter (in almost any alphabet), a digit, a punctuation mark, or whitespace.\n",
    "\n",
    "Strings are **immutable**: once a string is created, it cannot be modified in place.\n",
    "\n",
    "Strings are one of the most commonly used data types in Python, and Python provides a rich set of built-in operations and methods for working with them.\n",
    "\n",
    "In this section we cover:\n",
    "\n",
    "- Creating strings\n",
    "- Indexing and slicing\n",
    "- Concatenation and repetition\n",
    "- Case methods\n",
    "- Searching and testing\n",
    "- Cleaning\n",
    "- Splitting and joining\n",
    "- String formatting\n",
    "- Type-checking methods\n",
    "- String comparison\n",
    "- Docstrings\n",
    "- Application"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f5162a52",
   "metadata": {},
   "source": [
    "## String Creation and Accessing\n",
    "\n",
    "A string literal is written with **single**, **double**, or **triple** quotes: `'hello'`, `\"hello\"`, `\"\"\"hello\"\"\"`. Assigning the literal to a variable gives the string a name.\n",
    "\n",
    "The built-in function `len()` works with strings as well; it returns the number of characters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "711a942f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'str'>\n",
      "34\n"
     ]
    }
   ],
   "source": [
    "s = 'supercalifragilisticexpialidocious'\n",
    "\n",
    "print(type(s))\n",
    "\n",
    "n = len(s)\n",
    "print(n)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e59d375d",
   "metadata": {},
   "source": [
    "Single and double quotes are interchangeable for single-line strings. Triple quotes are used for multi-line strings or strings that contain both single and double quotes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "128fc518",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello, world!\n",
      "Hello, world!\n",
      "This is\n",
      "a multi-line\n",
      "string.\n"
     ]
    }
   ],
   "source": [
    "s1 = 'Hello, world!'\n",
    "s2 = \"Hello, world!\"\n",
    "s3 = \"\"\"This is\n",
    "a multi-line\n",
    "string.\"\"\"\n",
    "\n",
    "print(s1)\n",
    "print(s2)\n",
    "print(s3)"
   ]
  },
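  {
   "cell_type": "markdown",
   "id": "a7c3e9f1",
   "metadata": {},
   "source": [
    "As a small illustrative sketch of choosing a delimiter by the quotes the text contains (the variable names `apostrophe`, `spoken`, and `both` are made up for this example):\n",
    "\n",
    "```python\n",
    "# An apostrophe in the text: use double quotes as the delimiter\n",
    "apostrophe = \"It's fine.\"\n",
    "\n",
    "# Double quotes in the text: use single quotes as the delimiter\n",
    "spoken = 'She said, \"fine.\"'\n",
    "\n",
    "# Both kinds of quote in the text: triple quotes handle it\n",
    "both = '''She said, \"It's fine.\"'''\n",
    "\n",
    "print(apostrophe)\n",
    "print(spoken)\n",
    "print(both)\n",
    "```\n",
    "\n",
    "In each case the delimiter is not part of the string; only the characters between the delimiters are stored."
   ]
  },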
  {
   "cell_type": "markdown",
   "id": "7d176b3e",
   "metadata": {},
   "source": [
    "### Escape Sequences and Raw Strings\n",
    "\n",
    "Inside a string, a backslash `\\` introduces an **escape sequence**, a combination of characters (two, for each one listed here) that represents a single special character:\n",
    "\n",
    "| Escape Sequence | Meaning | Example output |\n",
    "|---|---|---|\n",
    "| `\\n` | Newline | moves to the next line |\n",
    "| `\\t` | Tab | inserts a horizontal tab |\n",
    "| `\\\\` | Backslash | a literal `\\` |\n",
    "| `\\\"` | Double quote | `\"` inside a double-quoted string |\n",
    "| `\\'` | Single quote | `'` inside a single-quoted string |\n",
    "\n",
    "A **raw string** is prefixed with `r` (or `R`) and treats backslashes as literal characters — no escape sequences are processed. Raw strings are especially useful for regular expression patterns and file paths."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "5e03e05a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "line1\n",
      "line2\n",
      "col1\tcol2\n",
      "C:\\Users\\ty\n",
      "C:\\Users\\ty\n"
     ]
    }
   ],
   "source": [
    "print(\"line1\\nline2\")       # newline\n",
    "print(\"col1\\tcol2\")         # tab\n",
    "print(\"C:\\\\Users\\\\ty\")      # literal backslashes\n",
    "print(r\"C:\\Users\\ty\")       # raw string — same result, easier to read\n"
   ]
  },
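  {
   "cell_type": "markdown",
   "id": "b2d48e06",
   "metadata": {},
   "source": [
    "The cell above does not exercise the quote escapes `\\\"` and `\\'` from the table. The following is a minimal sketch of both, with illustrative variable names (`double`, `single`); it also uses `len()` to show that an escape sequence is typed as two characters but stored as one.\n",
    "\n",
    "```python\n",
    "double = \"He said \\\"hi\\\"\"   # \\\" keeps the double-quoted string open\n",
    "single = 'It\\'s here'       # \\' keeps the single-quoted string open\n",
    "print(double)\n",
    "print(single)\n",
    "\n",
    "# An escape sequence is written with two characters but stored as one\n",
    "print(len(\"\\n\"))     # 1\n",
    "print(len(r\"\\n\"))    # 2: a backslash and the letter n\n",
    "```\n",
    "\n",
    "The last line shows the effect of the `r` prefix: in a raw string the backslash stays a separate character."
   ]
  },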
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "0e92a884",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### EXERCISE: Escape Sequences and Raw Strings\n",
    "# Difficulty: Basic\n",
    "# 1. Print two words on two lines using \\n\n",
    "# 2. Print two values separated by a tab using \\t\n",
    "# 3. Print a Windows path using a raw string\n",
    "### Your code starts here:\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "6431ebc9",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "apple\n",
      "banana\n",
      "score\t95\n",
      "C:\\Users\\alice\\data\n"
     ]
    }
   ],
   "source": [
    "# Solution\n",
    "print(\"apple\\nbanana\")\n",
    "print(\"score\\t95\")\n",
    "print(r\"C:\\Users\\alice\\data\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "83eebc6e",
   "metadata": {},
   "source": [
    "### Indexing and Slicing\n",
    "\n",
    "Strings are **sequences**, meaning each character has a numbered position called an **index**. Python uses zero-based indexing: the first character is at index `0`, the second at index `1`, and so on. Negative indices count from the end of the string."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19267d6c",
   "metadata": {},
   "source": [
    "#### String Indexing\n",
    "\n",
    "The expression in brackets is called an **index** because it *indicates* which character in the sequence to select. String indexing is `0`-based."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "0a1e4a1c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "b\n"
     ]
    }
   ],
   "source": [
    "fruit = \"banana\"\n",
    "print(fruit[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0016829",
   "metadata": {},
   "source": [
    "You can select a character from a string with the **bracket operator**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "133edb06",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "P\n",
      "y\n",
      "n\n",
      "o\n"
     ]
    }
   ],
   "source": [
    "s = 'Python'\n",
    "\n",
    "print(s[0])    # 'P'  — first character\n",
    "print(s[1])    # 'y'  — second character\n",
    "print(s[-1])   # 'n'  — last character\n",
    "print(s[-2])   # 'o'  — second to last"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ddc6f0c4",
   "metadata": {},
   "source": [
    "As a reminder, the index of the last character is the length of the string minus 1. If you use the value returned by `len()` as an index, you get an `IndexError` (**`string index out of range`**) because, with **0-based indexing**, there is no character at that position.\n",
    "\n",
    "So, to get the last character, subtract `1` from the length `n` and use the index `n-1`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "9df74cbe",
   "metadata": {},
   "outputs": [],
   "source": [
    "fruit = 'banana'\n",
    "n = len(fruit)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "53ddfe0f",
   "metadata": {},
   "outputs": [
    {
     "ename": "IndexError",
     "evalue": "string index out of range",
     "output_type": "error",
     "traceback": [
      "\u001b[31mIndexError\u001b[39m\u001b[31m:\u001b[39m string index out of range\n"
     ]
    }
   ],
   "source": [
    "%%expect IndexError\n",
    "\n",
    "fruit[n]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "2ac4ca15",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'a'"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fruit[n-1]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "83c2f759",
   "metadata": {},
   "source": [
    "There is an easier way to access the last element of a sequence, and it is often forgotten: negative indexing, which counts backward from the end. The index `-1` selects the last letter, `-2` selects the second to last, and so on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "2858ddb1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'a'"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fruit[-1]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eab58074",
   "metadata": {},
   "source": [
    "The index in brackets can be a variable or an expression that contains variables and operators."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "8e8d6a17",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a\n",
      "n\n",
      "b a n a n a "
     ]
    }
   ],
   "source": [
    "i = 1\n",
    "print(fruit[i])\n",
    "print(fruit[i+1])\n",
    "\n",
    "for i in range(len(fruit)):\n",
    "    print(fruit[i], end=' ')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f805abb1",
   "metadata": {},
   "source": [
    "As with lists and tuples, the **index** must be an **integer**; otherwise you get a `TypeError`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "3c4fe527",
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "string indices must be integers, not 'float'",
     "output_type": "error",
     "traceback": [
      "\u001b[31mTypeError\u001b[39m\u001b[31m:\u001b[39m string indices must be integers, not 'float'\n"
     ]
    }
   ],
   "source": [
    "%%expect TypeError\n",
    "\n",
    "fruit[1.5]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ee14893",
   "metadata": {},
   "source": [
    "It is tempting to use the `[]` operator on the left side of an\n",
    "assignment, with the intention of changing a character in a string. \n",
    "\n",
    "The result is a `TypeError`. In the error message, the object is the string and the \"item\" is the character we tried to assign. \n",
    "\n",
    "The reason for this error is that strings are **immutable**, which means you can't change an existing string. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "3d5bd98a",
   "metadata": {},
   "outputs": [],
   "source": [
    "greeting = 'hello, world'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "85172f02",
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "'str' object does not support item assignment",
     "output_type": "error",
     "traceback": [
      "\u001b[31mTypeError\u001b[39m\u001b[31m:\u001b[39m 'str' object does not support item assignment\n"
     ]
    }
   ],
   "source": [
    "%%expect TypeError\n",
    "\n",
    "greeting[0] = 'J'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bb53ed01",
   "metadata": {},
   "source": [
    "The best you can do is create a new string that is a variation of the original."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "ad1d7467",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Jello, world'"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_greeting = 'J' + greeting[1:]     # \"+\" concatenates the two strings\n",
    "new_greeting"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c3185ecf",
   "metadata": {},
   "source": [
    "This example concatenates a new first letter onto a slice of `greeting`.\n",
    "It has no effect on the original string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "14e46b03",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'hello, world'"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "greeting"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "130aa560",
   "metadata": {},
   "source": [
    "#### Slicing Strings\n",
    "\n",
    "As with lists and tuples, a segment of a string is called a **slice**. Selecting a slice is similar to selecting a character. The general syntax of slicing is the same as for lists and tuples:\n",
    "\n",
    "```python \n",
    "sequence[start:stop:step]\n",
    "``` \n",
    "The range is start-inclusive and stop-exclusive:\n",
    "\n",
    "- `start`: index to begin at (inclusive, default `0`)\n",
    "- `stop`: index to end at (exclusive, default end of string)\n",
    "- `step`: distance between selected indices (default `1`); a step of `2` takes every other character, and a negative step walks backward"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "b1b70ae4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'ban'"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fruit = 'banana'\n",
    "fruit[0:3]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7825a1c3",
   "metadata": {},
   "source": [
    "The operator `[m:n]` returns the part of the string from the `m`th character to the `n`th character, including the first but **excluding** the second. This behavior is counterintuitive, but it might help to imagine the indices pointing *between* the characters, as in this figure:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "7df47fa3",
   "metadata": {
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "from diagram import make_binding, Element, Value\n",
    "\n",
    "binding = make_binding(\"fruit\", ' b a n a n a ')\n",
    "elements = [Element(Value(i), None) for i in range(7)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "3a07c99e",
   "metadata": {
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAJsAAABKCAYAAACsAyYGAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjcsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvTLEjVAAAAAlwSFlzAAAPYQAAD2EBqD+naQAABvBJREFUeJztnGlIFVEUx49mVrSo7RFhZYQtVmhFC2QFUQSlBGF9iYiIPlSfgigiEyHsc32wBZQWTSgkCrIkzAoirVzK9n2hLCkzwxJ9N86B95hyZpzXjDed/j8Q7vjOnJl57z/n3pm5/4lQSikCQAOROjYCAMQGtILKBrQBsQFtQGxAGxAb0AbEBrQBsQFtQGxAGxAb0AbEBrQBsQFtQGxAGxAb0AbEBrQBsQFtQGxAGxAb0AbEBrqn2M6ePUuTJk2iGTNm0N27dx2vd+vWLcrIyJB2Y2Mj5eTkUE9j7Nixpv/Pz8+n9PR0+h/Jz8+nvXv3do3YcnNzac+ePVRdXU1JSUmh/7e1tdmuN3PmTCoqKurRYgPucSy2bdu20bVr12jXrl00b948ioiIoMzMTJo1axbt3Lmzwxl+/vx5WrhwobSvXLki1ZDZvHkzffv2TZZZhP+S79+/06FDh+jq1av048cP29hhw4ZZftbU1EQrV66kyZMn04IFC+jly5emcdu3b5fvi4+d4x49emQax9WCe4IVK1ZIzsWLF9Pnz5+7Xc5+/frRgAEDyDEqDFJTU1VxcbG0edWsrKzQZ3l5eSotLS20fO7cOYlnysrK1PTp06X94sULFRMTo7oD7e3tqqSkRGVnZ6ucnBxVXl6uWlpawsrBxx0dHa3u378vy/v371dLliwxjf348WOoXVhYqJYuXWoal5mZqeLj41VDQ4MsZ2RkqH379nW7nOESRS7YsGEDdQWPHz+m27dvky7i4+PlLOcKXF5eLmf9/PnzHa/PlZ7HssymTZto9+7d1N7eTr169fotrrS0lA4cOCCVPRAIWFYWZtmyZTRkyBBpz50713KM/K9zhoMrsRlLaFRUlHzBQTrrlroTOnzar1+/pi1btlBlZSUlJCRQbW2tdFFW9O3bN9Rm0ZqNi7tDTm1iMzJhwgTZsZaWFurduzcVFBSYxg0aNEhiWltbKTo62jRm4sSJ8tfV8Fl76dIluVrmfeYx5uzZs3/7UZxw48YNevjwISUmJtLRo0dp0aJFHara169fZRujRo0ScR88eND1/veUnJ6Lbc6cObR8+XKaOnWq7Ch3Qzdv3uwQN3jwYFq3bh1NmzZNKiP/0P8KFv2rV6/kzP0bkRm70R07dtDTp0+lmzp27FiHmKSkJFqzZg1NmTJFYry4XdJTcgaJ4IGbZ9kAsAFPEIA2IDYX/Pz509M4P+Y0ArG5gJ+keBnnx5xGIDYXjB8/3tM4P+Y0ArG5oKGhwdM4P+Y0ArG5gO9HeRnnx5xGIDYX8INoL+P8mNMIxOaC+vp6T+P8mNMIbuq6gB9UDxw40LM4P+Y0gsrmgqqqKk/j/JjTCCob0AYqmwt4hq+XcX7MaQSVzQU8H4zn8XkV58ecRlDZXGA2hcpNnB9zGkFlcwFPl+b5eV7F+TGnEVQ2F7x7987TOD/m7JKZuv8TbAh5+/YtxcbGii3uzyngRh/G8ePH6dOnTzRixAgaPny47Y9XUlJCkZGRYvjhGbJmeZubm8WDy5/duXOHVq1aZXu/i00tnNfo8/0T9vIeOXJE7IpsNFq9ejX179/fMp6tinzrg7fPM5yDZp9O8cSj9R/x/v17debMGWmz9a+2ttYytq2tTTU3N4v9sb6+3jZvU1OTam1tlXZpaamqq6uztB8GAgFpV1VVyT5YwbGnTp1Subm5ttv+8uWLKioqUk7gfSwoKJBjCxd0o2Hy5s0bcR0FTT68bAVXH7sKYYSrU/DhNq/HJnAzuPIFP+MJjHbm6Xv37knltcplhI8jLy+PLl++bOs244rOV6GF
hYVSYbnSOgViCxPuGvv06SNtNsiwacZLGhsb6fnz57busg8fPoiLi+12bC6yco7V1dWJAakz2Hi0detWWr9+vbwl4MGDB5axLC6+OFi7di0lJyeL19YpEFuYsMCCU6JZeH8z+8EKzltcXExpaWmW40Bm5MiRtHHjRrEMXr9+3TSGbZXskHJS1bhSsa2SY3n8ZfeQnY9/zJgxsn/jxo2T8ahTILYw4S+aKw/z7NkzWfaCQCBAp0+fptTUVBo6dKhlnNEIzhXWal4Zi6CmpoZOnDghlejChQuO/ARsbbS7pTF69GiZOMldLVfYuLg4cgrus/0FbGzmq8eYmJhOq9DJkyflR4mNjaWUlJTQC3b+hIVx8eLF0BUrv3THrAvk7fL2eezGFYlfaNPZ7IvDhw/LayGsePLkCZWVlYlweT/5mDi/FRUVFdJFcyXk7Tu93waxAW2gGwXagNiANiA2oA2IDWgDYgPagNiANiA2oA2IDWgDYgPagNiANiA2oA2IDWgDYgPagNiANiA2oA2IDWgDYgPagNiANiA2oA2IDWgDYgOki1/yJaJTtmqVoQAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 135x54 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "from diagram import diagram, adjust\n",
    "from matplotlib.transforms import Bbox\n",
    "\n",
    "width, height, x, y = [1.35, 0.54, 0.23, 0.39]\n",
    "\n",
    "ax = diagram(width, height)\n",
    "bbox = binding.draw(ax, x, y)\n",
    "bboxes = [bbox]\n",
    "\n",
    "def draw_elts(x, y, elements):\n",
    "    for elt in elements:\n",
    "        bbox = elt.draw(ax, x, y, draw_value=False)\n",
    "        bboxes.append(bbox)\n",
    "\n",
    "        x1 = (bbox.xmin + bbox.xmax) / 2\n",
    "        y1 = bbox.ymax + 0.02\n",
    "        y2 = y1 + 0.14\n",
    "        handle = plt.plot([x1, x1], [y1, y2], ':', lw=0.5, color='gray')\n",
    "        x += 0.105\n",
    "    \n",
    "draw_elts(x + 0.48, y - 0.25, elements)\n",
    "bbox = Bbox.union(bboxes)\n",
    "# adjust(x, y, bbox)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec248f7b",
   "metadata": {},
   "source": [
    "For example, the slice `fruit[3:6]` selects the letters `ana`. This means that `6` is **legal** as the end of a **slice**, even though it is **not legal** as an **index**.\n",
    "\n",
    "Also:\n",
    "- if you omit the first index, the slice starts at the beginning of the string.\n",
    "- if you omit the second index, the slice goes to the end of the string:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "42887b0a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello\n",
      "world!\n",
      "Hello\n",
      "Hlo ol!\n",
      "!dlrow ,olleH\n"
     ]
    }
   ],
   "source": [
    "s = 'Hello, world!'\n",
    "\n",
    "print(s[0:5])    # 'Hello'   — characters 0 through 4\n",
    "print(s[7:])     # 'world!'  — from index 7 to end\n",
    "print(s[:5])     # 'Hello'   — from start to index 4\n",
    "print(s[::2])    # 'Hlo ol!' (every other character)\n",
    "print(s[::-1])   # '!dlrow ,olleH' (the string reversed)"
   ]
  },
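  {
   "cell_type": "markdown",
   "id": "c91f5a3d",
   "metadata": {},
   "source": [
    "Slice bounds are more forgiving than indices. The following minimal sketch reuses the `fruit` string; the name `clipped` is only for illustration. A stop value past the end of the string is clamped to the length of the string, whereas the same number used as an index raises an `IndexError`.\n",
    "\n",
    "```python\n",
    "fruit = 'banana'\n",
    "n = len(fruit)            # 6\n",
    "\n",
    "print(fruit[3:n])         # 'ana': the stop may equal the length\n",
    "clipped = fruit[3:100]    # a stop past the end is clamped, no error\n",
    "print(clipped)            # 'ana'\n",
    "\n",
    "try:\n",
    "    fruit[n]              # as an index, n is out of range\n",
    "except IndexError as err:\n",
    "    print('IndexError:', err)\n",
    "```"
   ]
  },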
  {
   "cell_type": "markdown",
   "id": "63920578",
   "metadata": {},
   "source": [
    "If the first index is greater than or equal to the second, the result is an **empty string**, represented by two quotation marks. An empty string contains no characters and has length 0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "f6a0be45",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "len(fruit[3:3]): 0\n",
      "Type of fruit[3:3]: <class 'str'>\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "''"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(f\"len(fruit[3:3]): {len(fruit[3:3])}\")\n",
    "print(f\"Type of fruit[3:3]: {type(fruit[3:3])}\")\n",
    "fruit[3:3]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "030ac726",
   "metadata": {},
   "source": [
    "Continuing this example, what do you think `fruit[:]` means? Try it and\n",
    "see."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "2f33ded6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'banana'"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fruit[:]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9cfa64b8",
   "metadata": {},
   "source": [
    "To practice slicing, predict the result of each expression below for the string `'banana'` before running the cell; it may be harder than you think."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "9f8b44f2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'bnn'"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
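    "# Only the value of the last expression is displayed.\n",
    "# Uncomment the print calls below to check all three answers.\n",
    "fruit[0:-1]\n",
    "fruit[-2:]\n",
    "fruit[0:-1:2]\n",
    "\n",
    "# print(fruit[0:-1])              # all but the last letter: banan\n",
    "# print(fruit[-2:])               # the last two letters: na\n",
    "# print(fruit[0:-1:2])            # step is 2, so you get bnn"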
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "10ba446a",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### EXERCISE: Indexing and Slicing\n",
    "# Difficulty: Basic\n",
    "text = 'superpython'\n",
    "# 1. Print the first character\n",
    "# 2. Print the last character using negative indexing\n",
    "# 3. Print every second character\n",
    "# 4. Print the reversed string\n",
    "### Your code starts here:\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "3670ac95",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "s\n",
      "n\n",
      "spryhn\n",
      "nohtyprepus\n"
     ]
    }
   ],
   "source": [
    "# Solution\n",
    "text = 'superpython'\n",
    "print(text[0])\n",
    "print(text[-1])\n",
    "print(text[::2])\n",
    "print(text[::-1])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "33668dc2",
   "metadata": {},
   "source": [
    "### Concatenation and Repetition\n",
    "\n",
    "The `+` operator joins two strings together (**concatenation**). The `*` operator repeats a string a given number of times (**repetition**)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "07e14e8a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello, World!\n",
      "* * * * * * * * * * \n",
      "hahaha\n"
     ]
    }
   ],
   "source": [
    "first = 'Hello'\n",
    "last  = 'World'\n",
    "\n",
    "# Concatenation\n",
    "greeting = first + ', ' + last + '!'\n",
    "print(greeting)        # 'Hello, World!'\n",
    "\n",
    "# Repetition\n",
    "line = '* ' * 10\n",
    "print(line)            # '* * * * * * * * * * '\n",
    "\n",
    "print('ha' * 3)        # 'hahaha'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "eede96da",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### EXERCISE: Concatenation and Repetition\n",
    "# Difficulty: Basic\n",
    "first = 'Alice'\n",
    "last = 'Bob'\n",
    "# 1. Build and print: \"Alice & Bob\" using concatenation\n",
    "# 2. Print \"ha\" repeated 4 times\n",
    "# 3. Create a divider of 20 dashes and print it\n",
    "### Your code starts here:\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "a1bd97c1",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Alice & Bob\n",
      "hahahaha\n",
      "--------------------\n"
     ]
    }
   ],
   "source": [
    "# Solution\n",
    "first = 'Alice'\n",
    "last = 'Bob'\n",
    "print(first + \" & \" + last)\n",
    "print(\"ha\" * 4)\n",
    "print(\"-\" * 20)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39fee41e",
   "metadata": {},
   "source": [
    "## String Methods"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4337712d",
   "metadata": {},
   "source": [
    "Python provides string methods that perform a variety of useful operations. A method is similar to a function: it usually takes arguments and returns a value. The syntax, however, is different. A method belongs to an object, so it is called on that object with a `.` (**dot notation**). For example, the method `upper()`, which returns a new all-uppercase string, is written `'banana'.upper()` and returns `'BANANA'`, rather than the function-style `upper('banana')`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "dbddc05b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'BANANA'"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "word = 'banana'\n",
    "new_word = word.upper()\n",
    "new_word"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80eb36d3",
   "metadata": {},
   "source": [
    "This use of the dot operator specifies the name of the method, `upper`, and the name of the string to apply the method to, `word`.\n",
    "The empty parentheses indicate that this method takes no arguments.\n",
    "\n",
    "A method call is called an **invocation**; in this case, we would say that we are invoking `upper` on `word`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "3d1aea57",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "47\n",
      "['capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']\n"
     ]
    }
   ],
   "source": [
    "methods = [m for m in dir(str) if not m.startswith('_')]\n",
    "num_str_methods = len(methods)\n",
    "print(num_str_methods)  # 47 here; the count can vary by Python version\n",
    "print(methods)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "ed9271a3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "47"
      ]
     },
     "metadata": {
      "scrapbook": {
       "mime_prefix": "",
       "name": "num_str_methods"
      }
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "from myst_nb import glue\n",
    "glue(\"num_str_methods\", num_str_methods)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f3919a5",
   "metadata": {},
   "source": [
    "The `str` type has {glue:}`num_str_methods` public methods. The table below lists some of the most commonly used ones.\n",
    "\n",
    "| Category | Method | Description |\n",
    "|---|---|---|\n",
    "| Case | `.upper()` | All uppercase |\n",
    "| Case | `.lower()` | All lowercase |\n",
    "| Search | `.find(x)` | Index of first match, `-1` if missing |\n",
    "| Search | `.index(x)` | Index of first match, raises `ValueError` if missing |\n",
    "| Search | `.count(x)` | Count non-overlapping occurrences |\n",
    "| Whitespace | `.strip()` | Remove leading/trailing whitespace |\n",
    "| Split | `.split(x)` | Split on delimiter `x` (on whitespace if omitted) |\n",
    "| Join | `.join(lst)` | Join an iterable of strings, using the string as separator |\n",
    "| Replace | `.replace(a, b)` | Replace all occurrences of `a` with `b` |\n",
    "| Check | `.isspace()` | `True` if all characters are whitespace (and the string is not empty) |\n",
    "| Check | `.isupper()` | `True` if all cased characters are uppercase (and there is at least one) |\n",
    "| Check | `.islower()` | `True` if all cased characters are lowercase (and there is at least one) |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "3c6a212d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--- Case ---\n",
      "  HELLO, WORLD!  \n",
      "  hello, world!  \n",
      "\n",
      "--- Search ---\n",
      "4\n",
      "16\n",
      "2\n",
      "\n",
      "--- Whitespace ---\n",
      "'Hello, World!'\n",
      "\n",
      "--- Split ---\n",
      "['the', 'quick', 'brown', 'fox']\n",
      "\n",
      "--- Join ---\n",
      "apple, banana, cherry\n",
      "\n",
      "--- Replace ---\n",
      "the quick brown cat\n",
      "\n",
      "--- Check ---\n",
      "True\n",
      "True\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "s = \"  Hello, World!  \"\n",
    "words = \"the quick brown fox\"\n",
    "\n",
    "# Case\n",
    "print(\"--- Case ---\")\n",
    "print(s.upper())\n",
    "print(s.lower())\n",
    "\n",
    "# Search\n",
    "print(\"\\n--- Search ---\")\n",
    "print(words.find(\"quick\"))\n",
    "print(words.index(\"fox\"))\n",
    "print(words.count(\"o\"))\n",
    "\n",
    "# Whitespace\n",
    "print(\"\\n--- Whitespace ---\")\n",
    "print(repr(s.strip()))\n",
    "\n",
    "# Split\n",
    "print(\"\\n--- Split ---\")\n",
    "print(words.split(\" \"))\n",
    "\n",
    "# Join\n",
    "print(\"\\n--- Join ---\")\n",
    "print(\", \".join([\"apple\", \"banana\", \"cherry\"]))\n",
    "\n",
    "# Replace\n",
    "print(\"\\n--- Replace ---\")\n",
    "print(words.replace(\"fox\", \"cat\"))\n",
    "\n",
    "# Check\n",
    "print(\"\\n--- Check ---\")\n",
    "print(\"   \".isspace())\n",
    "print(\"HELLO\".isupper())\n",
    "print(\"hello\".islower())\n"
   ]
  },
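  {
   "cell_type": "markdown",
   "id": "d0e67b24",
   "metadata": {},
   "source": [
    "The cell above only searches for substrings that are present. As a minimal sketch of the difference between `.find()` and `.index()` described in the table, here is what happens when the substring is missing (the search term `\"cat\"` is chosen only for illustration):\n",
    "\n",
    "```python\n",
    "words = \"the quick brown fox\"\n",
    "\n",
    "print(words.find(\"cat\"))      # -1: no match, no exception\n",
    "\n",
    "try:\n",
    "    words.index(\"cat\")        # same search, but a miss raises ValueError\n",
    "except ValueError as err:\n",
    "    print(\"ValueError:\", err)\n",
    "```\n",
    "\n",
    "Use `.find()` when a missing substring is an ordinary case to test for, and `.index()` when a miss should be treated as an error."
   ]
  },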
  {
   "cell_type": "markdown",
   "id": "4a4172ce",
   "metadata": {},
   "source": [
    "### Case Methods\n",
    "\n",
    "Python provides several methods for changing the case of a string. These are useful for normalizing text before comparison or display."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "2aa5cab8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "HELLO, WORLD!\n",
      "hello, world!\n",
      "Hello, World!\n",
      "Hello, world!\n",
      "HELLO, WORLD!\n"
     ]
    }
   ],
   "source": [
    "s = 'hello, world!'\n",
    "\n",
    "print(s.upper())        # 'HELLO, WORLD!'  — all uppercase\n",
    "print(s.lower())        # 'hello, world!'  — all lowercase\n",
    "print(s.title())        # 'Hello, World!'  — first letter of each word capitalized\n",
    "print(s.capitalize())   # 'Hello, world!'  — first letter of string capitalized\n",
    "print(s.swapcase())     # 'HELLO, WORLD!'  — swap upper and lower"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2b622a25",
   "metadata": {},
   "source": [
    "Case methods are often used to make comparisons case-insensitive. For example, you might want to turn a username or email address all uppercase in the case of user login. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "10dd7834",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "False\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "user_input = 'Alice'\n",
    "username    = 'alice'\n",
    "\n",
    "print(user_input == username)                    # False\n",
    "print(user_input.lower() == username.lower())    # True"
   ]
  },
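  {
   "cell_type": "markdown",
   "id": "c5f01d01",
   "metadata": {},
   "source": [
    "For text beyond plain English letters, `str.casefold()` is a more aggressive alternative to `lower()` that is intended for caseless matching. The sketch below is only an illustration: the two spellings of the German word for street and the variable names are assumptions chosen for the example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5f01d02",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative values (not taken from the text above)\n",
    "a = 'Straße'\n",
    "b = 'STRASSE'\n",
    "\n",
    "lower_match = a.lower() == b.lower()         # compares 'straße' with 'strasse'\n",
    "fold_match = a.casefold() == b.casefold()    # compares 'strasse' with 'strasse'\n",
    "\n",
    "print(lower_match)   # False\n",
    "print(fold_match)    # True"
   ]
  },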
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "22d2eefa",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### EXERCISE: Case Methods\n",
    "# Difficulty: Basic\n",
    "user_input = 'PyThOn'\n",
    "target = 'python'\n",
    "# 1. Print user_input in upper, lower, and title case\n",
    "# 2. Print whether user_input matches target case-insensitively\n",
    "### Your code starts here:\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "1389ad4e",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PYTHON\n",
      "python\n",
      "Python\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "# Solution\n",
    "user_input = 'PyThOn'\n",
    "target = 'python'\n",
    "print(user_input.upper())\n",
    "print(user_input.lower())\n",
    "print(user_input.title())\n",
    "print(user_input.casefold() == target.casefold())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9383f003",
   "metadata": {},
   "source": [
    "### Searching and Testing\n",
    "\n",
    "#### Finding a Substring\n",
    "\n",
    "`find(sub)` returns the index of the first occurrence of `sub`, or `-1` if not found. `index(sub)` works the same way but raises a `ValueError` if the substring is not found."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "e465895e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\n",
      "17\n",
      "-1\n",
      "17\n"
     ]
    }
   ],
   "source": [
    "s = 'data science and data engineering'\n",
    "\n",
    "print(s.find('data'))     # 0  — first occurrence\n",
    "print(s.find('data', 5))  # 17 — search starting at index 5\n",
    "print(s.find('math'))     # -1 — not found\n",
    "\n",
    "print(s.rfind('data'))    # 17 — last occurrence"
   ]
  },
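  {
   "cell_type": "markdown",
   "id": "c5f01d03",
   "metadata": {},
   "source": [
    "The difference between `find()` and `index()` only shows up when the substring is missing. A minimal sketch, reusing the same sentence, with `first` and `raised` as illustrative variable names:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5f01d04",
   "metadata": {},
   "outputs": [],
   "source": [
    "s = 'data science and data engineering'\n",
    "\n",
    "first = s.index('data')\n",
    "print(first)              # 0, the same result find() gives\n",
    "\n",
    "raised = False\n",
    "try:\n",
    "    s.index('math')       # 'math' is not in s\n",
    "except ValueError as err:\n",
    "    raised = True\n",
    "    print('index() raised ValueError:', err)"
   ]
  },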
  {
   "cell_type": "markdown",
   "id": "bece8efc",
   "metadata": {},
   "source": [
    "#### Counting Occurrences\n",
    "\n",
    "`count(sub)` returns the number of non-overlapping occurrences of a substring."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "e363d5c3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3\n",
      "2\n"
     ]
    }
   ],
   "source": [
    "s = 'banana'\n",
    "print(s.count('a'))    # 3\n",
    "print(s.count('an'))   # 2"
   ]
  },
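  {
   "cell_type": "markdown",
   "id": "c5f01d05",
   "metadata": {},
   "source": [
    "\"Non-overlapping\" matters when matches can share characters. The sketch below contrasts `count()` with a manual scan; `count_overlapping` is a hypothetical helper written for this illustration, not a built-in method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5f01d06",
   "metadata": {},
   "outputs": [],
   "source": [
    "s = 'aaaa'\n",
    "print(s.count('aa'))    # 2: only the matches at index 0 and index 2\n",
    "\n",
    "def count_overlapping(text, sub):\n",
    "    # Hypothetical helper: restart the search one character after each match\n",
    "    total = 0\n",
    "    start = text.find(sub)\n",
    "    while start != -1:\n",
    "        total += 1\n",
    "        start = text.find(sub, start + 1)\n",
    "    return total\n",
    "\n",
    "print(count_overlapping(s, 'aa'))    # 3: matches at index 0, 1, and 2"
   ]
  },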
  {
   "cell_type": "markdown",
   "id": "4099eafd",
   "metadata": {},
   "source": [
    "#### Starts and Ends With\n",
    "\n",
    "`startswith(prefix)` and `endswith(suffix)` test whether a string begins or ends with a given substring. Both return `True` or `False`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "ec42a9a9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "True\n",
      "False\n"
     ]
    }
   ],
   "source": [
    "filename = 'report_2025.csv'\n",
    "\n",
    "print(filename.startswith('report'))   # True\n",
    "print(filename.endswith('.csv'))       # True\n",
    "print(filename.endswith('.xlsx'))      # False"
   ]
  },
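  {
   "cell_type": "markdown",
   "id": "c5f01d07",
   "metadata": {},
   "source": [
    "Both methods also accept a tuple of candidates and return `True` if any one of them matches. A short sketch, assuming a made-up list of file names:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5f01d08",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative file names (assumed for this example)\n",
    "filenames = ['report_2025.csv', 'notes.txt', 'data.tsv']\n",
    "\n",
    "tabular = []\n",
    "for name in filenames:\n",
    "    # The argument is a tuple, so either suffix counts as a match\n",
    "    if name.endswith(('.csv', '.tsv')):\n",
    "        tabular.append(name)\n",
    "\n",
    "print(tabular)    # ['report_2025.csv', 'data.tsv']"
   ]
  },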
  {
   "cell_type": "markdown",
   "id": "4b95c6a5",
   "metadata": {},
   "source": [
    "#### The `in` Operator\n",
    "\n",
    "The `in` operator tests whether a substring appears anywhere in a string. It is the most readable way to check for membership."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "af0c59e2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "False\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "s = 'machine learning'\n",
    "\n",
    "print('learning' in s)    # True\n",
    "print('deep' in s)        # False\n",
    "print('deep' not in s)    # True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "aea3cc37",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### EXERCISE: Searching and Testing\n",
    "# Difficulty: Intermediate\n",
    "sentence = 'data science uses data pipelines'\n",
    "# 1. Find the index of the first \"data\"\n",
    "# 2. Find the index of \"data\" starting from position 5\n",
    "# 3. Count how many times \"data\" appears\n",
    "# 4. Check if \"science\" is in sentence\n",
    "### Your code starts here:\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "671c764d",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\n",
      "18\n",
      "2\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "# Solution\n",
    "sentence = 'data science uses data pipelines'\n",
    "print(sentence.find('data'))\n",
    "print(sentence.find('data', 5))\n",
    "print(sentence.count('data'))\n",
    "print('science' in sentence)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "407bbda6",
   "metadata": {},
   "source": [
    "### Cleaning\n",
    "\n",
    "Real-world text data often contains extra whitespace or unwanted characters. Python provides several methods for cleaning strings.\n",
    "\n",
    "#### Stripping Whitespace\n",
    "\n",
    "- `strip()` removes leading and trailing whitespace. \n",
    "- `lstrip()` (left strip) removes only leading whitespace.\n",
    "- `rstrip()` (right strip) removes only trailing whitespace."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "id": "300fa0a3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "'hello, world!'\n",
      "'hello, world!   '\n",
      "'   hello, world!'\n"
     ]
    }
   ],
   "source": [
    "s = '   hello, world!   '\n",
    "\n",
    "print(repr(s.strip()))    # 'hello, world!'\n",
    "print(repr(s.lstrip()))   # 'hello, world!   '\n",
    "print(repr(s.rstrip()))   # '   hello, world!'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f7c8686",
   "metadata": {},
   "source": [
    "You can also pass a character to strip. For example, `s.strip('.')` removes leading and trailing periods."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "id": "c47fcaef",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hello\n"
     ]
    }
   ],
   "source": [
    "s = '...hello...'\n",
    "print(s.strip('.'))    # 'hello'"
   ]
  },
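  {
   "cell_type": "markdown",
   "id": "c5f01d09",
   "metadata": {},
   "source": [
    "The argument to `strip()` is treated as a set of characters, not as a prefix or suffix. The sketch below shows the difference using an example URL chosen for illustration; `removeprefix()` is assumed to be available, which requires Python 3.9 or later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5f01d0a",
   "metadata": {},
   "outputs": [],
   "source": [
    "url = 'www.example.com'\n",
    "\n",
    "# Removes any of the characters c, m, o, w, z, and . from both ends\n",
    "stripped = url.strip('cmowz.')\n",
    "print(stripped)          # 'example'\n",
    "\n",
    "# Removes the exact prefix 'www.' once (Python 3.9+)\n",
    "without_prefix = url.removeprefix('www.')\n",
    "print(without_prefix)    # 'example.com'"
   ]
  },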
  {
   "cell_type": "markdown",
   "id": "c59d0674",
   "metadata": {},
   "source": [
    "#### Replacing Substrings\n",
    "\n",
    "`replace(old, new)` returns a new string with all occurrences of `old` replaced by `new`. An optional third argument limits the number of replacements."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "b5c7470e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "I like dogs. Cats are great.\n",
      "I like dogs. Cats are great.\n",
      "hello world\n"
     ]
    }
   ],
   "source": [
    "s = 'I like cats. Cats are great.'\n",
    "\n",
    "print(s.replace('cats', 'dogs'))        # replace all\n",
    "print(s.replace('cats', 'dogs', 1))     # replace first occurrence only\n",
    "\n",
    "# Useful for removing characters\n",
    "s2 = 'hello, world!'\n",
    "print(s2.replace(',', '').replace('!', ''))   # 'hello world'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "b6383085",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### EXERCISE: Cleaning Strings\n",
    "# Difficulty: Intermediate\n",
    "raw = '...  Hello, Python!  ...'\n",
    "# 1. Strip leading/trailing dots\n",
    "# 2. Strip leading/trailing whitespace from the result\n",
    "# 3. Replace \"Python\" with \"Data Science\"\n",
    "# 4. Print the cleaned string\n",
    "### Your code starts here:\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "dadf077e",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello, Data Science!\n"
     ]
    }
   ],
   "source": [
    "# Solution\n",
    "raw = '...  Hello, Python!  ...'\n",
    "clean = raw.strip('.').strip().replace('Python', 'Data Science')\n",
    "print(clean)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b576468c",
   "metadata": {},
   "source": [
    "### Splitting and Joining\n",
    "\n",
    "#### Splitting\n",
    "\n",
    "`split(sep)` breaks a string into a list of substrings at each occurrence of the separator `sep`. If no separator is given, it splits on any whitespace and removes empty strings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "cf66c128",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['Python', 'R', 'SQL', 'Julia']\n",
      "['one', 'two', 'three']\n",
      "['a', '', 'b', '', 'c']\n",
      "['2025', '08-26']\n"
     ]
    }
   ],
   "source": [
    "s = 'Python,R,SQL,Julia'\n",
    "print(s.split(','))           # ['Python', 'R', 'SQL', 'Julia']\n",
    "\n",
    "s2 = 'one two   three'\n",
    "print(s2.split())             # ['one', 'two', 'three']\n",
    "\n",
    "# Split on a specific delimiter, keeping empty strings\n",
    "s3 = 'a,,b,,c'\n",
    "print(s3.split(','))          # ['a', '', 'b', '', 'c']\n",
    "\n",
    "# Limit the number of splits\n",
    "s4 = '2025-08-26'\n",
    "print(s4.split('-', 1))       # ['2025', '08-26']"
   ]
  },
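  {
   "cell_type": "markdown",
   "id": "c5f01d0b",
   "metadata": {},
   "source": [
    "Calling `split()` with no argument is not the same as calling `split(' ')`. A small sketch of the difference, with illustrative variable names:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5f01d0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "s = 'one two   three'\n",
    "\n",
    "default_split = s.split()       # runs of whitespace act as one separator\n",
    "space_split = s.split(' ')      # every single space is a separator\n",
    "\n",
    "print(default_split)    # ['one', 'two', 'three']\n",
    "print(space_split)      # ['one', 'two', '', '', 'three']"
   ]
  },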
  {
   "cell_type": "markdown",
   "id": "f77438ca",
   "metadata": {},
   "source": [
    "#### Joining\n",
    "\n",
    "`join(iterable)` is the inverse of `split()`. It concatenates a list of strings into one string, inserting the separator between each element."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "id": "1864509f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Python is fun\n",
      "Python-is-fun\n",
      "Pythonisfun\n",
      "too many spaces\n"
     ]
    }
   ],
   "source": [
    "words = ['Python', 'is', 'fun']\n",
    "\n",
    "print(' '.join(words))     # 'Python is fun'\n",
    "print('-'.join(words))     # 'Python-is-fun'\n",
    "print(''.join(words))      # 'Pythonisfun'\n",
    "\n",
    "# Practical: reassemble a cleaned sentence\n",
    "sentence = '  too   many   spaces  '\n",
    "cleaned  = ' '.join(sentence.split())\n",
    "print(cleaned)             # 'too many spaces'"
   ]
  },
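  {
   "cell_type": "markdown",
   "id": "c5f01d0d",
   "metadata": {},
   "source": [
    "`join()` only accepts strings, so other values have to be converted first. A minimal sketch, assuming a short list of integers made up for the example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5f01d0e",
   "metadata": {},
   "outputs": [],
   "source": [
    "numbers = [1, 2, 3]    # illustrative data\n",
    "\n",
    "failed = False\n",
    "try:\n",
    "    ', '.join(numbers)             # the items are ints, not strings\n",
    "except TypeError:\n",
    "    failed = True\n",
    "    print('join() needs strings')\n",
    "\n",
    "# Convert each item with str() before joining\n",
    "joined = ', '.join(str(n) for n in numbers)\n",
    "print(joined)    # 1, 2, 3"
   ]
  },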
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "19f6b87a",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### EXERCISE: Splitting and Joining\n",
    "# Difficulty: Intermediate\n",
    "record = 'alice,bob,charlie'\n",
    "# 1. Split the record into a list of names\n",
    "# 2. Join names with \" - \"\n",
    "# 3. Print both the list and joined string\n",
    "### Your code starts here:\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "fbbb9ca1",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['alice', 'bob', 'charlie']\n",
      "alice - bob - charlie\n"
     ]
    }
   ],
   "source": [
    "# Solution\n",
    "record = 'alice,bob,charlie'\n",
    "names = record.split(',')\n",
    "joined = ' - '.join(names)\n",
    "print(names)\n",
    "print(joined)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9941bad",
   "metadata": {},
   "source": [
    "### String Formatting\n",
    "\n",
    "String formatting inserts values into a string template. Python offers three approaches: f-strings (modern, recommended), `str.format()`, and `%` formatting (legacy).\n",
    "\n",
    "#### f-Strings\n",
    "\n",
    "An **f-string** is prefixed with `f` and uses `{}` to embed expressions directly inside the string. F-strings are the most readable and most commonly used approach."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "6fddace4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Student: Alice\n",
      "Score: 95.68\n",
      "Score:      95.68\n",
      "ALICE\n",
      "Double score: 191.356\n"
     ]
    }
   ],
   "source": [
    "name  = 'Alice'\n",
    "score = 95.678\n",
    "\n",
    "print(f'Student: {name}')\n",
    "print(f'Score: {score:.2f}')        # 2 decimal places\n",
    "print(f'Score: {score:>10.2f}')     # right-aligned, width 10\n",
    "print(f'{name.upper()}')              # apply conversion (capitalize)\n",
    "print(f'Double score: {score * 2}') # expressions work inside {}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f748acd0",
   "metadata": {},
   "source": [
    "#### Format Specification Mini-Language\n",
    "\n",
    "Inside `{}`, a colon `:` introduces a **format spec** that controls how the value is displayed.\n",
    "\n",
    "| Spec | Meaning | Example |\n",
    "|------|---------|---------|\n",
    "| `.2f` | 2 decimal places (float) | `3.14` |\n",
    "| `d` | integer | `42` |\n",
    "| `e` | scientific notation | `3.14e+00` |\n",
    "| `%` | percentage | `75.00%` |\n",
    "| `>10` | right-align, width 10 | `      3.14` |\n",
    "| `<10` | left-align, width 10 | `3.14      ` |\n",
    "| `^10` | center, width 10 | `   3.14   ` |\n",
    "| `,` | thousands separator | `1,000,000` |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "id": "bf25eaea",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3.1416\n",
      "3.141593e+00\n",
      "1,000,000\n",
      "75.6%\n",
      "   3.14   \n"
     ]
    }
   ],
   "source": [
    "pi = 3.14159265\n",
    "n  = 1000000\n",
    "r  = 0.756\n",
    "\n",
    "print(f'{pi:.4f}')      # '3.1416'\n",
    "print(f'{pi:e}')        # '3.141593e+00'\n",
    "print(f'{n:,}')         # '1,000,000'\n",
    "print(f'{r:.1%}')       # '75.6%'\n",
    "print(f'{pi:^10.2f}')   # '   3.14   '"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "89cbe6c4",
   "metadata": {},
   "source": [
    "#### `str.format()`\n",
    "\n",
    "`str.format()` is an older but still widely used formatting approach. Values are passed as arguments and inserted into `{}` placeholders."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "id": "a7dc9b5b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Name: Bob, Grade: 88.5\n",
      "Name: Bob, Grade: 88.5\n",
      "Name: Bob, Grade: 88.5\n"
     ]
    }
   ],
   "source": [
    "name  = 'Bob'\n",
    "grade = 88.5\n",
    "\n",
    "print('Name: {}, Grade: {:.1f}'.format(name, grade))\n",
    "print('Name: {0}, Grade: {1:.1f}'.format(name, grade))   # positional\n",
    "print('Name: {n}, Grade: {g:.1f}'.format(n=name, g=grade))  # keyword"
   ]
  },
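  {
   "cell_type": "markdown",
   "id": "c5f01d0f",
   "metadata": {},
   "source": [
    "For completeness, here is a minimal sketch of the legacy `%` style, which you may meet in older code. It reuses the same values as the `str.format()` example; `legacy` is an illustrative variable name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5f01d10",
   "metadata": {},
   "outputs": [],
   "source": [
    "name = 'Bob'\n",
    "grade = 88.5\n",
    "\n",
    "# %s inserts a string, %.1f a float with one decimal place\n",
    "legacy = 'Name: %s, Grade: %.1f' % (name, grade)\n",
    "print(legacy)    # Name: Bob, Grade: 88.5"
   ]
  },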
  {
   "cell_type": "code",
   "execution_count": 59,
   "id": "dc2bda0e",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### EXERCISE: String Formatting\n",
    "# Difficulty: Intermediate\n",
    "name = 'Alice'\n",
    "score = 92.456\n",
    "# 1. Print name and score with score rounded to 1 decimal place using f-string\n",
    "# 2. Print score as a percentage with 1 decimal place (assume score/100)\n",
    "# 3. Print name right-aligned in width 10\n",
    "### Your code starts here:\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "id": "81baaead",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Name: Alice, Score: 92.5\n",
      "Percent: 92.5%\n",
      "     Alice\n"
     ]
    }
   ],
   "source": [
    "# Solution\n",
    "name = 'Alice'\n",
    "score = 92.456\n",
    "print(f'Name: {name}, Score: {score:.1f}')\n",
    "print(f'Percent: {score/100:.1%}')\n",
    "print(f'{name:>10}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1ca40a76",
   "metadata": {},
   "source": [
    "### Type-Checking Methods\n",
    "\n",
    "Python strings have a family of `is*()` methods that test the character composition of a string. Each returns `True` or `False`.\n",
    "\n",
    "| Method | Returns `True` if... |\n",
    "|--------|----------------------|\n",
    "| `isdigit()` | all characters are digits (0–9) |\n",
    "| `isalpha()` | all characters are letters |\n",
    "| `isalnum()` | all characters are letters or digits |\n",
    "| `isspace()` | all characters are whitespace |\n",
    "| `isupper()` | all cased characters are uppercase |\n",
    "| `islower()` | all cased characters are lowercase |\n",
    "| `istitle()` | string is in title case |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "id": "9c2bdc48",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "False\n",
      "False\n"
     ]
    }
   ],
   "source": [
    "print('12345'.isdigit())     # True\n",
    "print('abc'.isalpha())       # True\n",
    "print('abc123'.isalnum())    # True\n",
    "print('   '.isspace())       # True\n",
    "print('HELLO'.isupper())     # True\n",
    "print('hello'.islower())     # True\n",
    "print('Hello World'.istitle()) # True\n",
    "\n",
    "# Mixed cases return False\n",
    "print('abc123!'.isalnum())   # False — '!' is not alphanumeric\n",
    "print(''.isdigit())          # False — empty string"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f57a48a5",
   "metadata": {},
   "source": [
    "These methods are useful for input validation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "id": "86f545b8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Valid year: 2025\n"
     ]
    }
   ],
   "source": [
    "user_input = '2025'\n",
    "\n",
    "if user_input.isdigit():\n",
    "    year = int(user_input)\n",
    "    print(f'Valid year: {year}')\n",
    "else:\n",
    "    print('Please enter a number.')"
   ]
  },
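  {
   "cell_type": "markdown",
   "id": "c5f01d11",
   "metadata": {},
   "source": [
    "`isdigit()` is a quick check, but it is not the same as asking whether `int()` will succeed: it rejects a leading minus sign and accepts some characters, such as superscript digits, that `int()` cannot convert. One possible alternative is to attempt the conversion and handle the error. The sketch below assumes a hypothetical helper named `parse_int` and made-up sample inputs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5f01d12",
   "metadata": {},
   "outputs": [],
   "source": [
    "def parse_int(text):\n",
    "    # Hypothetical helper: return an int, or None if text is not an integer\n",
    "    try:\n",
    "        return int(text)\n",
    "    except ValueError:\n",
    "        return None\n",
    "\n",
    "# Illustrative inputs: a year, a negative number, a float, a superscript two\n",
    "for sample in ['2025', '-5', '3.14', '²']:\n",
    "    print(repr(sample), sample.isdigit(), parse_int(sample))"
   ]
  },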
  {
   "cell_type": "code",
   "execution_count": 63,
   "id": "34986e8c",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### EXERCISE: Type-Checking Methods\n",
    "# Difficulty: Intermediate\n",
    "samples = ['123', 'abc', 'abc123', '   ', 'Hello World']\n",
    "# 1. For each sample, print isdigit, isalpha, and isalnum results\n",
    "# 2. For \"Hello World\", print istitle result\n",
    "### Your code starts here:\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "id": "68640dab",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "123 True False True\n",
      "abc False True True\n",
      "abc123 False False True\n",
      "    False False False\n",
      "Hello World False False False\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "# Solution\n",
    "samples = ['123', 'abc', 'abc123', '   ', 'Hello World']\n",
    "for s in samples:\n",
    "    print(s, s.isdigit(), s.isalpha(), s.isalnum())\n",
    "print('Hello World'.istitle())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60b7f8f0",
   "metadata": {},
   "source": [
    "### Methods Reference\n",
    "\n",
    "Python provides a number of function and methods for string operations. The commonly used methods are: \n",
    "\n",
    "| Operation | Syntax | Description |\n",
    "|-----------|--------|-------------|\n",
    "| Length | `len(s)` | Number of characters |\n",
    "| Indexing | `s[i]` | Character at position `i` |\n",
    "| Slicing | `s[start:stop:step]` | Extract substring |\n",
    "| Concatenation | `s1 + s2` | Join two strings |\n",
    "| Repetition | `s * n` | Repeat string `n` times |\n",
    "| Uppercase | `s.upper()` | All uppercase |\n",
    "| Lowercase | `s.lower()` | All lowercase |\n",
    "| Title case | `s.title()` | Capitalize each word |\n",
    "| Find | `s.find(sub)` | Index of first match, or `-1` |\n",
    "| Count | `s.count(sub)` | Number of occurrences |\n",
    "| Membership | `sub in s` | Test if substring present |\n",
    "| Strip | `s.strip()` | Remove leading/trailing whitespace |\n",
    "| Replace | `s.replace(old, new)` | Substitute substring |\n",
    "| Split | `s.split(sep)` | String → list |\n",
    "| Join | `sep.join(list)` | List → string |\n",
    "| f-string | `f'{var:.2f}'` | Formatted string literal |\n",
    "| Type check | `s.isdigit()`, etc. | Test character composition |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "id": "e3dbb7c4",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### EXERCISE: Methods Reference Practice\n",
    "# Difficulty: Intermediate\n",
    "s = '  banana split  '\n",
    "# Use at least 4 methods from this section to:\n",
    "# 1. Remove outer spaces\n",
    "# 2. Replace \"split\" with \"bread\"\n",
    "# 3. Convert to uppercase\n",
    "# 4. Check whether \"BANANA\" is in the final string\n",
    "### Your code starts here:\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "id": "41b70f67",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "BANANA BREAD\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "# Solution\n",
    "s = '  banana split  '\n",
    "t = s.strip().replace('split', 'bread').upper()\n",
    "print(t)\n",
    "print('BANANA' in t)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "49e4da57",
   "metadata": {},
   "source": [
    "## String Comparison"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2fa1d3e8",
   "metadata": {},
   "source": [
    "Observe the following operations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "id": "fb653441",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "True\n",
      "False\n",
      "False\n"
     ]
    }
   ],
   "source": [
    "### check out the comparisons here:\n",
    "\n",
    "print(\"A\" < 'a')\n",
    "print(\"a\" < 'banana')\n",
    "print('Pineapple' > 'pineapple')\n",
    "print('Pineapple' > 'banana')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7fae2a8",
   "metadata": {},
   "source": [
    "The relational operators work on strings as seen above. String comparisons are based on the ASCII code table (this one is easier to read than the one presented in an earlier chapter). As you can see in the table below, each character has a decimal number that string comparison uses to compare strings. Note that:\n",
    "\n",
    "- `0` is `48`\n",
    "- `A` is `65`\n",
    "- `a` is `97`"
   ]
  },
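  {
   "cell_type": "markdown",
   "id": "c5f01d13",
   "metadata": {},
   "source": [
    "The built-in `ord()` function returns the code of a character, and `chr()` goes the other way, so the values listed above can be checked directly. The sketch below also shows how the codes affect sorting; the list of fruit names is made up for the illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5f01d14",
   "metadata": {},
   "outputs": [],
   "source": [
    "for ch in ['0', 'A', 'a']:\n",
    "    print(ch, ord(ch))       # 0 48, A 65, a 97\n",
    "\n",
    "print(chr(65))               # A\n",
    "\n",
    "fruits = ['banana', 'Pineapple', 'apple']    # illustrative data\n",
    "\n",
    "default_order = sorted(fruits)                    # uppercase sorts first\n",
    "caseless_order = sorted(fruits, key=str.lower)    # compare lowercase copies\n",
    "\n",
    "print(default_order)     # ['Pineapple', 'apple', 'banana']\n",
    "print(caseless_order)    # ['apple', 'banana', 'Pineapple']"
   ]
  },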
  {
   "cell_type": "markdown",
   "id": "8547eec2",
   "metadata": {},
   "source": [
    "<table border=\"1\">\n",
    "\n",
    "<thead>\n",
    "<tr>\n",
    "<th style=\"max-width:30px;\">Dec</th>\n",
    "<th>Chr</th>\n",
    "<th></th>\n",
    "<th style=\"max-width:30px;\">Dec</th>\n",
    "<th>Chr</th>\n",
    "<th></th>\n",
    "<th style=\"max-width:30px;\">Dec</th>\n",
    "<th>Chr</th>\n",
    "<th></th>\n",
    "<th style=\"max-width:30px;\">Dec</th>\n",
    "<th>Chr</th>\n",
    "<th></th>\n",
    "<th style=\"max-width:30px;\">Dec</th>\n",
    "<th>Chr</th>\n",
    "</tr>\n",
    "</thead>\n",
    "\n",
    "<tbody>\n",
    "\n",
    "<tr>\n",
    "<td>0</td>\n",
    "<td class=\"greycell\">NUL</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>26</td>\n",
    "<td class=\"greycell\">SUB</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>52</td>\n",
    "<td class=\"boldred\">4</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>78</td>\n",
    "<td class=\"boldred\">N</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>104</td>\n",
    "<td class=\"boldred\">h</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>1</td>\n",
    "<td class=\"greycell\">SOH</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>27</td>\n",
    "<td class=\"greycell\">ESC</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>53</td>\n",
    "<td class=\"boldred\">5</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>79</td>\n",
    "<td class=\"boldred\">O</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>105</td>\n",
    "<td class=\"boldred\">i</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>2</td>\n",
    "<td class=\"greycell\">STX</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>28</td>\n",
    "<td class=\"greycell\">FS</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>54</td>\n",
    "<td class=\"boldred\">6</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>80</td>\n",
    "<td class=\"boldred\">P</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>106</td>\n",
    "<td class=\"boldred\">j</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>3</td>\n",
    "<td class=\"greycell\">ETX</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>29</td>\n",
    "<td class=\"greycell\">GS</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>55</td>\n",
    "<td class=\"boldred\">7</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>81</td>\n",
    "<td class=\"boldred\">Q</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>107</td>\n",
    "<td class=\"boldred\">k</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>4</td>\n",
    "<td class=\"greycell\">EOT</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>30</td>\n",
    "<td class=\"greycell\">RS</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>56</td>\n",
    "<td class=\"boldred\">8</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>82</td>\n",
    "<td class=\"boldred\">R</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>108</td>\n",
    "<td class=\"boldred\">l</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>5</td>\n",
    "<td class=\"greycell\">ENQ</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>31</td>\n",
    "<td class=\"greycell\">US</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>57</td>\n",
    "<td class=\"boldred\">9</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>83</td>\n",
    "<td class=\"boldred\">S</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>109</td>\n",
    "<td class=\"boldred\">m</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>6</td>\n",
    "<td class=\"greycell\">ACK</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>32</td>\n",
    "<td class=\"boldred\"></td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>58</td>\n",
    "<td class=\"boldred\">:</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>84</td>\n",
    "<td class=\"boldred\">T</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>110</td>\n",
    "<td class=\"boldred\">n</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>7</td>\n",
    "<td class=\"greycell\">BEL</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>33</td>\n",
    "<td class=\"boldred\">!</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>59</td>\n",
    "<td class=\"boldred\">;</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>85</td>\n",
    "<td class=\"boldred\">U</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>111</td>\n",
    "<td class=\"boldred\">o</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>8</td>\n",
    "<td class=\"greycell\">BS</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>34</td>\n",
    "<td class=\"boldred\">\"</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>60</td>\n",
    "<td class=\"boldred\">&lt;</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>86</td>\n",
    "<td class=\"boldred\">V</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>112</td>\n",
    "<td class=\"boldred\">p</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>9</td>\n",
    "<td class=\"greycell\">HT</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>35</td>\n",
    "<td class=\"boldred\">#</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>61</td>\n",
    "<td class=\"boldred\">=</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>87</td>\n",
    "<td class=\"boldred\">W</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>113</td>\n",
    "<td class=\"boldred\">q</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>10</td>\n",
    "<td class=\"greycell\">LF</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>36</td>\n",
    "<td class=\"boldred\">$</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>62</td>\n",
    "<td class=\"boldred\">&gt;</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>88</td>\n",
    "<td class=\"boldred\">X</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>114</td>\n",
    "<td class=\"boldred\">r</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>11</td>\n",
    "<td class=\"greycell\">VT</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>37</td>\n",
    "<td class=\"boldred\">%</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>63</td>\n",
    "<td class=\"boldred\">?</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>89</td>\n",
    "<td class=\"boldred\">Y</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>115</td>\n",
    "<td class=\"boldred\">s</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>12</td>\n",
    "<td class=\"greycell\">FF</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>38</td>\n",
    "<td class=\"boldred\">&amp;</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>64</td>\n",
    "<td class=\"boldred\">@</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>90</td>\n",
    "<td class=\"boldred\">Z</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>116</td>\n",
    "<td class=\"boldred\">t</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>13</td>\n",
    "<td class=\"greycell\">CR</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>39</td>\n",
    "<td class=\"boldred\">'</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>65</td>\n",
    "<td class=\"boldred\">A</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>91</td>\n",
    "<td class=\"boldred\">[</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>117</td>\n",
    "<td class=\"boldred\">u</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>14</td>\n",
    "<td class=\"greycell\">SO</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>40</td>\n",
    "<td class=\"boldred\">(</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>66</td>\n",
    "<td class=\"boldred\">B</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>92</td>\n",
    "<td class=\"boldred\">\\</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>118</td>\n",
    "<td class=\"boldred\">v</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>15</td>\n",
    "<td class=\"greycell\">SI</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>41</td>\n",
    "<td class=\"boldred\">)</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>67</td>\n",
    "<td class=\"boldred\">C</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>93</td>\n",
    "<td class=\"boldred\">]</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>119</td>\n",
    "<td class=\"boldred\">w</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>16</td>\n",
    "<td class=\"greycell\">DLE</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>42</td>\n",
    "<td class=\"boldred\">*</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>68</td>\n",
    "<td class=\"boldred\">D</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>94</td>\n",
    "<td class=\"boldred\">^</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>120</td>\n",
    "<td class=\"boldred\">x</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>17</td>\n",
    "<td class=\"greycell\">DC1</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>43</td>\n",
    "<td class=\"boldred\">+</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>69</td>\n",
    "<td class=\"boldred\">E</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>95</td>\n",
    "<td class=\"boldred\">_</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>121</td>\n",
    "<td class=\"boldred\">y</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>18</td>\n",
    "<td class=\"greycell\">DC2</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>44</td>\n",
    "<td class=\"boldred\">,</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>70</td>\n",
    "<td class=\"boldred\">F</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>96</td>\n",
    "<td class=\"boldred\">`</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>122</td>\n",
    "<td class=\"boldred\">z</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>19</td>\n",
    "<td class=\"greycell\">DC3</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>45</td>\n",
    "<td class=\"boldred\">-</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>71</td>\n",
    "<td class=\"boldred\">G</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>97</td>\n",
    "<td class=\"boldred\">a</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>123</td>\n",
    "<td class=\"boldred\">{</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>20</td>\n",
    "<td class=\"greycell\">DC4</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>46</td>\n",
    "<td class=\"boldred\">.</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>72</td>\n",
    "<td class=\"boldred\">H</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>98</td>\n",
    "<td class=\"boldred\">b</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>124</td>\n",
    "<td class=\"boldred\">|</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>21</td>\n",
    "<td class=\"greycell\">NAK</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>47</td>\n",
    "<td class=\"boldred\">/</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>73</td>\n",
    "<td class=\"boldred\">I</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>99</td>\n",
    "<td class=\"boldred\">c</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>125</td>\n",
    "<td class=\"boldred\">}</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>22</td>\n",
    "<td class=\"greycell\">SYN</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>48</td>\n",
    "<td class=\"boldred\">0</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>74</td>\n",
    "<td class=\"boldred\">J</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>100</td>\n",
    "<td class=\"boldred\">d</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>126</td>\n",
    "<td class=\"boldred\">~</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>23</td>\n",
    "<td class=\"greycell\">ETB</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>49</td>\n",
    "<td class=\"boldred\">1</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>75</td>\n",
    "<td class=\"boldred\">K</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>101</td>\n",
    "<td class=\"boldred\">e</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>127</td>\n",
    "<td class=\"boldred\">DEL</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>24</td>\n",
    "<td class=\"greycell\">CAN</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>50</td>\n",
    "<td class=\"boldred\">2</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>76</td>\n",
    "<td class=\"boldred\">L</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>102</td>\n",
    "<td class=\"boldred\">f</td>\n",
    "<td></td>\n",
    "<td></td>\n",
    "<td></td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "<td>25</td>\n",
    "<td class=\"greycell\">EM</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>51</td>\n",
    "<td class=\"boldred\">3</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>77</td>\n",
    "<td class=\"boldred\">M</td>\n",
    "<td class=\"borderright\"></td>\n",
    "<td>103</td>\n",
    "<td class=\"boldred\">g</td>\n",
    "<td></td>\n",
    "<td></td>\n",
    "<td></td>\n",
    "</tr>\n",
    "\n",
    "</tbody>\n",
    "\n",
    "\n",
    "</table>"
   ]
  },
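  {
   "cell_type": "markdown",
   "id": "0a2cd659",
   "metadata": {},
   "source": [
    "Python compares strings character by character using these character codes, so we can use the ASCII table to predict how strings compare. The simplest comparison is a test for equality with `==`:"
   ]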
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "id": "b754d462",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "All right, banana.\n"
     ]
    }
   ],
   "source": [
    "word = 'banana'\n",
    "\n",
    "if word == 'banana':\n",
    "    print('All right, banana.')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e9be6097",
   "metadata": {},
   "source": [
    "Other relational operations are useful for putting words in alphabetical\n",
    "order:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "id": "44374eb8",
   "metadata": {},
   "outputs": [],
   "source": [
    "def compare_word(word):\n",
    "    if word < 'banana':\n",
    "        print(word, 'comes before banana.')\n",
    "    elif word > 'banana':\n",
    "        print(word, 'comes after banana.')\n",
    "    else:\n",
    "        print('All right, banana.')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "id": "a46f7035",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "apple comes before banana.\n"
     ]
    }
   ],
   "source": [
    "compare_word('apple')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b66f449a",
   "metadata": {},
   "source": [
    "Python does not handle uppercase and lowercase letters the same way\n",
    "people do. All the uppercase letters come before all the lowercase\n",
    "letters, so:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "id": "a691f9e2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Pineapple comes before banana.\n"
     ]
    }
   ],
   "source": [
    "compare_word('Pineapple')"
   ]
  },
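  {
   "cell_type": "markdown",
   "id": "c3a81f2e",
   "metadata": {},
   "source": [
    "To see where this ordering comes from, we can look at the character codes directly. As a minimal sketch, the built-in `ord` function returns the code of a character and `chr` converts a code back to a character; the values match the ASCII table above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5d7e90b4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Uppercase 'P' has a smaller code than lowercase 'b',\n",
    "# which is why 'Pineapple' sorts before 'banana'.\n",
    "print(ord('P'), ord('b'))    # 80 98\n",
    "print(ord('P') < ord('b'))   # True\n",
    "\n",
    "# chr goes the other way: from a code to a character\n",
    "print(chr(80), chr(98))      # P b"
   ]
  },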
  {
   "cell_type": "markdown",
   "id": "f9b916c9",
   "metadata": {},
   "source": [
    "This ordering can be surprising when sorting mixed-case words. A common way to address it is to convert strings to a standard format, such as all lowercase or all uppercase, before comparing them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "id": "d4355b93",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "pineapple comes after banana.\n"
     ]
    }
   ],
   "source": [
    "compare_word('Pineapple'.lower())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "id": "27ca1148",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### EXERCISE: String Comparison\n",
    "# Difficulty: Challenge\n",
    "words = ['Apple', 'apple', 'banana', 'Banana', 'cherry']\n",
    "# 1. Print the list sorted as-is\n",
    "# 2. Print the list sorted case-insensitively\n",
    "# 3. Build and print a list of tuples: (word, word.casefold())\n",
    "# 4. Print whether \"Apple\" and \"apple\" are equal under casefold\n",
    "### Your code starts here:\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "id": "7af7b824",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['Apple', 'Banana', 'apple', 'banana', 'cherry']\n",
      "['Apple', 'apple', 'banana', 'Banana', 'cherry']\n",
      "[('Apple', 'apple'), ('apple', 'apple'), ('banana', 'banana'), ('Banana', 'banana'), ('cherry', 'cherry')]\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "# Solution\n",
    "words = ['Apple', 'apple', 'banana', 'Banana', 'cherry']\n",
    "print(sorted(words))\n",
    "print(sorted(words, key=str.casefold))\n",
    "pairs = [(w, w.casefold()) for w in words]\n",
    "print(pairs)\n",
    "print('Apple'.casefold() == 'apple'.casefold())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30b43901",
   "metadata": {},
   "source": [
    "## Looping and Sorting"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86b09411",
   "metadata": {},
   "source": [
    "### Looping Through String Lists\n",
    "\n",
    "You can use a `for` statement to loop through the elements of a list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "id": "a0145b1d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "apple\n",
      "banana\n",
      "cherry\n"
     ]
    }
   ],
   "source": [
    "fruits = ['apple', 'banana', 'cherry']\n",
    "\n",
    "for fruit in fruits:\n",
    "    print(fruit)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32a3b5d4",
   "metadata": {},
   "source": [
    "Because `.split()` returns a list of words, we can use `for` to loop through them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "id": "b0d0d37b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "We\n",
      "are\n",
      "programmed\n",
      "to\n",
      "receive\n"
     ]
    }
   ],
   "source": [
    "s = 'We are programmed to receive'  ### lyric from the Eagles' 1976 hit song \"Hotel California\".\n",
    "\n",
    "for word in s.split():\n",
    "    print(word)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07423363",
   "metadata": {},
   "source": [
    "Not that it's useful, but a `for` loop over an empty list never runs the indented statements."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "id": "ff8dd4bf",
   "metadata": {},
   "outputs": [],
   "source": [
    "for x in []:\n",
    "    print('This never happens.')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ad54a44",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### EXERCISE: Looping Through String Lists\n",
    "# Difficulty: Basic\n",
    "words = ['apple', 'Banana', 'cherry', 'Date', 'elderberry']\n",
    "# 1. Loop through the words and print each word in lowercase\n",
    "# 2. Create a new list containing only words that start with a vowel (a, e, i, o, u)\n",
    "### Your code starts here:\n",
    "\n",
    "\n",
    "\n",
    "### Your code ends here.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "id": "1e0a8da5",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "Words in lowercase:\n",
       "apple\n",
       "banana\n",
       "cherry\n",
       "date\n",
       "elderberry\n",
       "\n",
       "Words starting with a vowel: ['apple', 'elderberry']\n"
     ]
    }
   ],
   "source": [
    "# Solution\n",
    "words = ['apple', 'Banana', 'cherry', 'Date', 'elderberry']\n",
    "\n",
    "print(\"Words in lowercase:\")\n",
    "for word in words:\n",
    "    print(word.lower())\n",
    "\n",
    "vowels = ['a', 'e', 'i', 'o', 'u']\n",
    "starts_with_vowel = []\n",
    "for word in words:\n",
    "    if word[0].lower() in vowels:\n",
    "        starts_with_vowel.append(word)\n",
    "\n",
    "print(f\"\\nWords starting with a vowel: {starts_with_vowel}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24c73570",
   "metadata": {},
   "source": [
    "### Sorting String Lists\n",
    "\n",
    "Python provides a built-in function called `sorted` that returns a new list with the elements in sorted order. Lists also have a `.sort()` method, which sorts the list in place.\n",
    "\n",
    "This section uses:\n",
    "\n",
    "- `sorted()`\n",
    "- `.join()`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "id": "39d8106e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['a', 'b', 'c']"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "scramble = ['c', 'a', 'b']\n",
    "sorted(scramble)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c11a1493",
   "metadata": {},
   "source": [
    "The original list is unchanged."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "id": "c5d3412c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['c', 'a', 'b']"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "scramble"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "909b3603",
   "metadata": {},
   "source": [
    "`sorted` works with any kind of sequence, not just lists. So we can sort the letters in a string like this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "id": "ac2bce0d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['e', 'e', 'l', 'r', 's', 't', 't']"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sorted('letters')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40f50da5",
   "metadata": {},
   "source": [
    "The result is a **list**. To convert the list to a string, we can use `join`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "id": "61192a2c",
   "metadata": {},
   "outputs": [],
   "source": [
    "letters = ''.join(sorted('letters'))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ee43856",
   "metadata": {},
   "source": [
    "With an empty string as the delimiter, the elements of the list are joined with nothing between them."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91b9571a",
   "metadata": {},
   "source": [
    "Lists also have a `.sort()` method, but strings do not, so calling it on a string raises an `AttributeError`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "id": "be42f9cd",
   "metadata": {},
   "outputs": [
    {
     "ename": "AttributeError",
     "evalue": "'str' object has no attribute 'sort'",
     "output_type": "error",
     "traceback": [
      "\u001b[31mAttributeError\u001b[39m\u001b[31m:\u001b[39m 'str' object has no attribute 'sort'\n"
     ]
    }
   ],
   "source": [
    "%%expect AttributeError\n",
    "\n",
    "letters.sort()"
   ]
  },
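  {
   "cell_type": "markdown",
   "id": "e84b16a9",
   "metadata": {},
   "source": [
    "For comparison, here is a minimal sketch of `.sort()` on a list. Unlike `sorted`, it changes the list in place and returns `None`. The variable name `scramble_copy` is made up for this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f90ad67",
   "metadata": {},
   "outputs": [],
   "source": [
    "scramble_copy = ['c', 'a', 'b']   # hypothetical list for this sketch\n",
    "result = scramble_copy.sort()     # sorts the list in place\n",
    "\n",
    "print(scramble_copy)              # the list itself is now sorted\n",
    "print(result)                     # .sort() returns None"
   ]
  },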
  {
   "cell_type": "code",
   "execution_count": 87,
   "id": "6b6cea8d",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### EXERCISE: Sorting Lists\n",
    "# Difficulty: Intermediate\n",
    "scores = [85, 92, 78, 90, 88]\n",
    "names = ['Charlie', 'Alice', 'Bob']\n",
    "# 1. Sort the scores in descending order (highest first)\n",
    "# 2. Sort the names alphabetically and join them with commas\n",
    "### Your code starts here:\n",
    "\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "id": "93c692ed",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Scores (descending): [92, 90, 88, 85, 78]\n",
      "Names (alphabetically): Alice, Bob, Charlie\n"
     ]
    }
   ],
   "source": [
    "# Solution\n",
    "scores = [85, 92, 78, 90, 88]\n",
    "names = ['Charlie', 'Alice', 'Bob']\n",
    "\n",
    "sorted_scores = sorted(scores, reverse=True)\n",
    "sorted_names = sorted(names)\n",
    "names_joined = \", \".join(sorted_names)\n",
    "\n",
    "print(f\"Scores (descending): {sorted_scores}\")\n",
    "print(f\"Names (alphabetically): {names_joined}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e796b6b",
   "metadata": {},
   "source": [
    "(section_docstring)=\n",
    "## Docstrings\n",
    "\n",
    "A **docstring** is a string at the beginning of a function that explains the interface (\"doc\" is short for \"documentation\").\n",
    "Here is an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "id": "24c7ccd7",
   "metadata": {},
   "outputs": [],
   "source": [
    "def polyline(n, length, angle):\n",
    "    \"\"\"Draws line segments with the given length and angle between them.\n",
    "    \n",
    "    n: integer number of line segments\n",
    "    length: length of the line segments\n",
    "    angle: angle between segments (in degrees)\n",
    "    \"\"\"    \n",
    "    for i in range(n):\n",
    "        forward(length)\n",
    "        left(angle)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9608df4d",
   "metadata": {},
   "source": [
    "By convention, docstrings are triple-quoted strings, also known as **multiline strings** because the triple quotes allow the string to span more than one line.\n",
    "\n",
    "A docstring should:\n",
    "\n",
    "* Explain concisely what the function does, without getting into the details of how it works,\n",
    "\n",
    "* Explain what effect each parameter has on the behavior of the function, and\n",
    "\n",
    "* Indicate what type each parameter should be, if it is not obvious.\n",
    "\n",
    "Writing this kind of documentation is an important part of interface design.\n",
    "A well-designed interface should be simple to explain; if you have a hard time explaining one of your functions, maybe the interface could be improved."
   ]
  },
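  {
   "cell_type": "markdown",
   "id": "b7d2c4f1",
   "metadata": {},
   "source": [
    "Python stores a function's docstring in its `__doc__` attribute, and the built-in `help` function displays it. The sketch below illustrates this with a made-up function, `double`, which is not part of this chapter's code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a36e0c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "def double(x):\n",
    "    \"\"\"Return x multiplied by 2.\n",
    "\n",
    "    x: a number\n",
    "    \"\"\"\n",
    "    return x * 2\n",
    "\n",
    "# The docstring is available as an attribute of the function...\n",
    "print(double.__doc__)\n",
    "\n",
    "# ...and help() formats it together with the function signature.\n",
    "help(double)"
   ]
  },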
  {
   "cell_type": "code",
   "execution_count": 92,
   "id": "e437d5df",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### EXERCISE: Writing a Docstring\n",
    "# Difficulty: Challenge\n",
    "# Write a function called area_rectangle(width, height).\n",
    "# 1. Add type hints for parameters and return value.\n",
    "# 2. Include a docstring describing parameters, return value, and one raised error.\n",
    "# 3. Raise ValueError if width or height is negative.\n",
    "# 4. Call the function with (3, 4) and print the result.\n",
    "### Your code starts here:\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "id": "5f4cfedb",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "12\n"
     ]
    }
   ],
   "source": [
    "# Solution\n",
    "def area_rectangle(width: float, height: float) -> float:\n",
    "    \"\"\"Return the area of a rectangle.\n",
    "\n",
    "    width: non-negative numeric width of the rectangle\n",
    "    height: non-negative numeric height of the rectangle\n",
    "    returns: numeric area\n",
    "    raises: ValueError if width or height is negative\n",
    "    \"\"\"\n",
    "    if width < 0 or height < 0:\n",
    "        raise ValueError('width and height must be non-negative')\n",
    "    return width * height\n",
    "\n",
    "print(area_rectangle(3, 4))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ede0460",
   "metadata": {},
   "source": [
    "\n",
    "## Application: Word List\n",
    "\n",
    "Let's apply what we've learned to a real-world task: building and searching a word list.\n",
    "\n",
    "In the previous chapter, we read the file `words.txt` and searched for words with certain properties, like using the letter `e`.\n",
    "But we read the entire file many times, which is not efficient.\n",
    "It is better to read the file once and put the words in a list.\n",
    "The following cells download the file if necessary and then build the list with a loop."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "id": "afb8c3bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "words_file = project_root / 'data' / 'words.txt'\n",
    "if not words_file.exists():\n",
    "    download('https://raw.githubusercontent.com/AllenDowney/ThinkPython/v3/words.txt', words_file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "id": "ec2e7239",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "113783"
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "word_list = []\n",
    "\n",
    "# Open the file in a with block so it is closed when the loop finishes\n",
    "with open(words_file, encoding='utf-8') as f:\n",
    "    for line in f:\n",
    "        word = line.strip()\n",
    "        word_list.append(word)\n",
    "\n",
    "len(word_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "id": "01fe5d61",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['aa',\n",
       " 'aah',\n",
       " 'aahed',\n",
       " 'aahing',\n",
       " 'aahs',\n",
       " 'aal',\n",
       " 'aalii',\n",
       " 'aaliis',\n",
       " 'aals',\n",
       " 'aardvark']"
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "word_list[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18ee706a",
   "metadata": {},
   "source": [
    "Before the loop, `word_list` is initialized with an empty list.\n",
    "Each time through the loop, the `append` method adds a word to the end.\n",
    "When the loop is done, there are more than 113,000 words in the list.\n",
    "\n",
    "Another way to do the same thing is to use the `read_text` method of the `Path` object to read the entire file into a string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "id": "d62cf70f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1016511"
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string = words_file.read_text(encoding='utf-8')\n",
    "len(string)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46a8329f",
   "metadata": {},
   "source": [
    "The result is a single string with more than a million characters.\n",
    "We can use the `split` method to split it into a list of words."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "id": "8b06681f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "113783"
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "word_list = string.split()\n",
    "len(word_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e8cb6ae5",
   "metadata": {},
   "source": [
    "Evaluating `word_list` in a Jupyter notebook displays the whole list, which is very long, so let us use a `for` loop to look at the first 5 elements:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "id": "7013b629",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "aa\n",
      "aah\n",
      "aahed\n",
      "aahing\n",
      "aahs\n"
     ]
    }
   ],
   "source": [
    "for i in range(5):\n",
    "    print(word_list[i])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1cd1c24c",
   "metadata": {},
   "source": [
    "Or just use slicing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "id": "6278b792",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['aa', 'aah', 'aahed', 'aahing', 'aahs']"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "word_list[:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7fbcb9ca",
   "metadata": {},
   "source": [
    "It is also a good habit to check the type of our data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "id": "9e8b69c3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'list'>\n"
     ]
    }
   ],
   "source": [
    "print(type(word_list))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d20c3fdb",
   "metadata": {},
   "source": [
    "Now, to check whether a string appears in the list, we can use the `in` operator.\n",
    "For example, `'demotic'` is in the list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "id": "b67f325f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'demotic' in word_list"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c82017fb",
   "metadata": {},
   "source": [
    "But `'contrafibularities'` is not."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "id": "6334664a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'contrafibularities' in word_list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "id": "3d62f066",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 104,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"supercalifragilisticexpialidocious\" in word_list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "id": "c45d528b",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### EXERCISE: Word List Application\n",
    "# Difficulty: Challenge\n",
    "# Using word_list from this section:\n",
    "# 1. Print the first 3 words\n",
    "# 2. Count how many words start with \"a\"\n",
    "# 3. Print the average word length (rounded to 2 decimals)\n",
    "# 4. Find and print the longest word among the first 5000 words\n",
    "### Your code starts here:\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "id": "8f67fc7d",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['aa', 'aah', 'aahed']\n",
      "6557\n",
      "7.93\n",
      "anticonservationist\n"
     ]
    }
   ],
   "source": [
    "# Solution\n",
    "print(word_list[:3])\n",
    "count_a = sum(1 for w in word_list if w.startswith('a'))\n",
    "print(count_a)\n",
    "avg_len = sum(len(w) for w in word_list) / len(word_list)\n",
    "print(round(avg_len, 2))\n",
    "longest_5k = max(word_list[:5000], key=len)\n",
    "print(longest_5k)"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
