{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "83f5f63e",
   "metadata": {},
   "source": [
    "# Sets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "1b3db3f7",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "from pathlib import Path\n",
    "\n",
    "current = Path.cwd()\n",
    "for parent in [current, *current.parents]:\n",
    "    if (parent / '_config.yml').exists():\n",
    "        project_root = parent  # ← Add project root, not chapters\n",
    "        break\n",
    "else:\n",
    "    project_root = Path.cwd().parent.parent\n",
    "\n",
    "sys.path.insert(0, str(project_root))\n",
    "\n",
    "from shared import thinkpython, diagram, jupyturtle\n",
    "from shared.download import download\n",
    "\n",
    "# Register as top-level modules so direct imports work in subsequent cells\n",
    "sys.modules['thinkpython'] = thinkpython\n",
    "sys.modules['diagram'] = diagram\n",
    "sys.modules['jupyturtle'] = jupyturtle\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0eafbe4e",
   "metadata": {},
   "source": [
    "## What is a set\n",
    "\n",
    "A set is an **unordered** collection of unique values. Unlike lists or tuples, sets do not allow **duplicates**, and the elements have no guaranteed order. \n",
    "\n",
    "Here below is an example of a set. Note that it uses curly braces just like dictionaries. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "ccf867f4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{1, 2, 3}\n"
     ]
    }
   ],
   "source": [
    "s = {1, 2, 3, 3, 2}\n",
    "print(s)                ### {1, 2, 3}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aac530ad",
   "metadata": {},
   "source": [
    "Sets are useful when you care about **membership**: whether something is **in** a collection; rather than order or position.\n",
    "\n",
    "The most common use cases are:\n",
    "\n",
    "- Removing duplicates from a list\n",
    "- Checking membership quickly\n",
    "- Comparing two collections (what's shared, what's different)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "352d5069",
   "metadata": {},
   "source": [
    "For sets, the four key properties that make sets different from other data structures:\n",
    "\n",
    "1. **Unordered**: Elements have no guaranteed position. There is no first or last element.\n",
    "2. **Unique**: Duplicates are automatically removed.\n",
    "3. **Mutable**: You can add and remove elements.\n",
    "4. **Hashable elements only**: Every element must be hashable. In practice this means immutable types: `int`, `float`, `str`, `tuple`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "abf76b98",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "### 1. Sets are unordered, so the order of elements is not guaranteed.\n",
    "{3, 1, 2} == {1, 2, 3}  # True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "0120dc41",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{1, 2, 3}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "### 2. Sets do not allow duplicate elements, so duplicates are automatically removed.\n",
    "{1, 2, 2, 3, 3}         # {1, 2, 3}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "c6ebf262",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{1, 3, 4}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "### 3. Sets are mutable, so you can add or remove elements after the set is created.\n",
    "s = {1, 2, 3}\n",
    "s.add(4)                # works\n",
    "s.remove(2)             # works\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "0d3cb961",
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "unhashable type: 'list'",
     "output_type": "error",
     "traceback": [
      "\u001b[31mTypeError\u001b[39m\u001b[31m:\u001b[39m unhashable type: 'list'\n"
     ]
    }
   ],
   "source": [
    "%%expect TypeError\n",
    "\n",
    "### 4. Sets can contain only hashable (immutable) elements, so you cannot include lists or other sets as elements.\n",
    "{1, \"hello\", (1, 2)}    # valid\n",
    "{[1, 2], [3, 4]}        # TypeError"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0fd5117",
   "metadata": {},
   "source": [
    "### Choosing the Right Structure\n",
    "\n",
    "| Need                                   | Use         |\n",
    "|----------------------------------------|-------------|\n",
    "| Ordered collection, duplicates allowed | `list`      |\n",
    "| Fixed data, multiple return values     | `tuple`     |\n",
    "| Key-value lookup                       | `dict`      |\n",
    "| Unique items, membership testing       | `set`       |\n",
    "| Hashable set (dict key, nested set)    | `frozenset` |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a35f0db4",
   "metadata": {},
   "source": [
    "### Hashable vs mutable\n",
    "\n",
    "We have seen that some data types are mutable and some are hashable. It's now time to learn a little more about the. \n",
    "\n",
    "- **Hashable** means an object has a hash value that stays stable during its lifetime.\n",
    "- **Mutable** objects usually are **not hashable** (for example: `list`, `dict`, `set`).\n",
    "- **Immutable** objects are often hashable (for example: `int`, `float`, `str`, `tuple`), but for tuples, all nested elements must also be hashable.\n",
    "- For hash-table types: `dict` requires **hashable keys**, and `set` requires **hashable elements**.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40b211f3",
   "metadata": {},
   "source": [
    "#### Mutability"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb38a8c3",
   "metadata": {},
   "source": [
    "For the property of mutability, let's have a look over the general data structures. For in-place change methods for the common data structures, here is a comparison table. Tuple and strings are immutable, so there's no methods available for the purpose.\n",
    "\n",
    "| Method                              | List | Dict | Set | Tuple | String |\n",
    "|-------------------------------------|------|------|-----|-------|--------|\n",
    "| `.append(x)`                        | ✓    |      |     |       |        |\n",
    "| `.insert(i, x)`                     | ✓    |      |     |       |        |\n",
    "| `.extend(iter)`                     | ✓    |      |     |       |        |\n",
    "| `.update(other)`                    |      | ✓    | ✓   |       |        |\n",
    "| `.remove(x)`                        | ✓    |      | ✓   |       |        |\n",
    "| `.pop()`                            | ✓    | ✓    | ✓   |       |        |\n",
    "| `.popitem()`                        |      | ✓    |     |       |        |\n",
    "| `.add(x)`                           |      |      | ✓   |       |        |\n",
    "| `.discard(x)`                       |      |      | ✓   |       |        |\n",
    "| `.sort()`                           | ✓    |      |     |       |        |\n",
    "| `.reverse()`                        | ✓    |      |     |       |        |\n",
    "| `.setdefault(k, v)`                 |      | ✓    |     |       |        |\n",
    "| `.clear()`                          | ✓    | ✓    | ✓   |       |        |\n",
    "| `.intersection_update(t)`           |      |      | ✓   |       |        |\n",
    "| `.difference_update(t)`             |      |      | ✓   |       |        |\n",
    "| `.symmetric_difference_update(t)`   |      |      | ✓   |       |        |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "030ea21c",
   "metadata": {},
   "source": [
    "### In-place set operations\n",
    "\n",
    "In Python, mutable methods that modify in-place typically return `None`. The general rule in Python is:\n",
    "\n",
    "- Mutates in-place → returns `None`\n",
    "- Returns a new object → original unchanged\n",
    "  \n",
    "These methods modify the original set instead of creating a new one:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "86f3ec74",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "update: {1, 2, 3, 4, 5}\n",
      "intersection_update: {3}\n",
      "difference_update: {1, 2}\n",
      "symmetric_difference_update: {1, 2, 4, 5}\n"
     ]
    }
   ],
   "source": [
    "x = {1, 2, 3}\n",
    "y = {3, 4, 5}\n",
    "\n",
    "x.update(y)                        # union in-place\n",
    "print(\"update:\", x)\n",
    "\n",
    "x = {1, 2, 3}\n",
    "x.intersection_update(y)\n",
    "print(\"intersection_update:\", x)\n",
    "\n",
    "x = {1, 2, 3}\n",
    "x.difference_update(y)\n",
    "print(\"difference_update:\", x)\n",
    "\n",
    "x = {1, 2, 3}\n",
    "x.symmetric_difference_update(y)\n",
    "print(\"symmetric_difference_update:\", x)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7e49497",
   "metadata": {},
   "source": [
    "#### Hashability\n",
    "\n",
    "Let us compare **set**, **list**, and **dict** and this time focus on hashability. \n",
    "\n",
    "\n",
    "| Feature      | Set                    | List          | Dict                          |\n",
    "|--------------|------------------------|---------------|-------------------------------|\n",
    "| Ordered      | **No**                     | Yes           | Yes (insertion order)         |\n",
    "| Duplicates   | **No**                     | Yes           | Keys: No, Values: Yes         |\n",
    "| Mutable      | **Yes**                    | Yes           | Yes                           |\n",
    "| Allows only **hashable** elements | Yes   | No            | Keys only                     |\n",
    "| Lookup speed | Avg O(1)               | O(n)          | Avg O(1)                      |\n",
    "| Use case     | Unique items, membership | Ordered items | Key-value pairs               |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17099ee3",
   "metadata": {},
   "source": [
    "In computer science, hashable means that an object can be converted into a fixed-size integer (called a hash value or hash code) via a hash function. This hash value is used to quickly locate the object in data structures like hash tables, dictionaries, and sets. For something to be hashable, it generally needs to satisfy two properties:\n",
    "\n",
    "- **Consistency**: the hash value must not change during the object's lifetime. This is why mutable objects (like Python lists) are typically not hashable because if their contents change, their hash would need to change too, which breaks lookups.\n",
    "- **Equality consistency**: if two objects are considered equal (`a == b`), they must have the same hash value. The reverse isn't required; two different objects can share a hash (called a collision), but equal objects must hash identically.\n",
    "\n",
    "Hash-based data structures like hash maps and hash sets rely on hash values to place and find items in O(1) average time. If you want to use an object as a dictionary key or store it in a set, it needs to be hashable.\n",
    "\n",
    "In summary, **hashable** means you can be consistently fingerprinted with a number, which is the prerequisite for being usable as a key in fast lookup structures."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "630069ed",
   "metadata": {},
   "source": [
    "```text\n",
    "key     →  hash function    →  index  →  bucket in array\n",
    "\"cat\"   →  hash(\"cat\")      →  42     →  array[42]\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "0f675460",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "8096599184176479363\n",
      "6680279693667265536\n",
      "8096599184176479363\n"
     ]
    }
   ],
   "source": [
    "print(hash(\"cat\"))           ### -9223372036854775808\n",
    "print(hash(\"dog\"))           ### -867955804616931492\n",
    "print(hash(\"cat\"))           ### -9223372036854775808"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "98e2455a",
   "metadata": {},
   "source": [
    "## Creating sets\n",
    "\n",
    "A set is created using curly braces `{}` with comma-separated values, or by using the `set()` constructor. \n",
    "\n",
    "Unlike lists, sets are unordered and do not allow duplicates - if you add the same value twice, it only appears once. This makes sets useful for storing **unique** items. One important gotcha: an empty {} creates a dict, not a set - you must use set() for an empty set.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "41d69110",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{1, 2, 3}\n",
      "{1, 2, 3}\n"
     ]
    }
   ],
   "source": [
    "s = {1, 2, 3, 3}       # duplicate 3 is ignored\n",
    "print(s)               # {1, 2, 3}\n",
    "\n",
    "s = set([1, 2, 2, 3])  # from a list\n",
    "print(s)               # {1, 2, 3}\n",
    "\n",
    "s = set()              # empty set\n",
    "d = {}                 # this is a dict, not a set!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54e54710",
   "metadata": {},
   "source": [
    "`set` elements must be **hashable**. Since sets use **hashing** internally to enforce uniqueness and enable fast lookup, all elements must be hashable. This means you can store integers, floats, strings, and tuples in a set, but not lists or dicts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "f488cd39",
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "unhashable type: 'list'",
     "output_type": "error",
     "traceback": [
      "\u001b[31mTypeError\u001b[39m\u001b[31m:\u001b[39m unhashable type: 'list'\n"
     ]
    }
   ],
   "source": [
    "%%expect TypeError\n",
    "\n",
    "# Valid\n",
    "s = {1, \"hello\", (1, 2)}\n",
    "\n",
    "# Invalid\n",
    "s = {[1, 2], [3, 4]}    # TypeError: unhashable type: 'list'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17f4f5e2",
   "metadata": {},
   "source": [
    "Similarly, a `tuple` is hashable only if **all** of its elements are hashable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "ab98e372",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-3550055125485641917\n",
      "7267574591690527098\n",
      "unhashable type: 'list'\n"
     ]
    }
   ],
   "source": [
    "print(hash((1, 2)))          # ok\n",
    "print(hash((1, (2, 3))))     # ok\n",
    "\n",
    "try:\n",
    "    hash((1, [2, 3]))        # list inside tuple => unhashable\n",
    "except TypeError as e:\n",
    "    print(e)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b005023",
   "metadata": {},
   "source": [
    "### Accessing Elements\n",
    "\n",
    "Sets have no indexing or positional access. Because sets are unordered, there's no concept of \"first\" or \"second\" element, so index access makes no sense. \n",
    "\n",
    "Compared with list and dictionary, `dict` is the only one with a built-in safe getter. A `list` requires a **guard clause**. \n",
    "\n",
    "| | List | Dict | Set |\n",
    "|---|---|---|---|\n",
    "| **Access** | by index `l[i]` | by key `d[key]` | no direct access |\n",
    "| **Safe get** | `l[i] if i < len(l) else default` | `d.get(key, default)` | `val if val in s else default` |\n",
    "| **Built-in safe method** | none | `.get()` | none |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "cf593cd9",
   "metadata": {},
   "outputs": [
    {
     "ename": "IndexError",
     "evalue": "list index out of range",
     "output_type": "error",
     "traceback": [
      "\u001b[31mIndexError\u001b[39m\u001b[31m:\u001b[39m list index out of range\n"
     ]
    }
   ],
   "source": [
    "%%expect IndexError\n",
    "\n",
    "d = {'x': 1, 'y': 2}\n",
    "d.get('z', 'not found')                 # 'not found' — no error\n",
    "\n",
    "l = [10, 20, 30]\n",
    "l[5]                                  # IndexError!   ### uncomment to see the error"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "78699f53",
   "metadata": {},
   "source": [
    "For `list`, a safe way to handle a possible out-of-range error is to use a guard clause. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "id": "4fa18963",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "None\n"
     ]
    }
   ],
   "source": [
    "l = [10, 20, 30]\n",
    "\n",
    "result = l[5] if 5 < len(l) else None   # None — safe\n",
    "print(result)                           # None"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eae58e00",
   "metadata": {},
   "source": [
    "To work with elements in a set, you can:\n",
    "\n",
    "| Method | Example | Notes |\n",
    "|---|---|---|\n",
    "| **Membership check** | `3 in s` | True/False only, no error |\n",
    "| **Iterate** | `for x in s:` | no guaranteed order |\n",
    "| **Convert to list** | `list(s)[0]` | order not guaranteed |\n",
    "| **`s.pop()`** | `s.pop()` | returns an arbitrary element |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "dfd96d28",
   "metadata": {},
   "outputs": [],
   "source": [
    "### check membership\n",
    "\n",
    "if \"apple\" in s:\n",
    "    print(\"found\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "4f9c90f9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "frozenset({3, 4})\n",
      "frozenset({1, 2})\n"
     ]
    }
   ],
   "source": [
    "### iterate over a set\n",
    "for element in s:\n",
    "    print(element)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "cb1e1a08",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[frozenset({3, 4}), frozenset({1, 2})]\n"
     ]
    }
   ],
   "source": [
    "### conversion to list\n",
    "l = list(s)\n",
    "print(l)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "id": "288f4af5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "### s.pop() removes and returns an arbitrary element from the set. If the set is empty, it raises a KeyError.\n",
    "s = {1, 2, 3}\n",
    "s.pop()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22161c76",
   "metadata": {},
   "source": [
    "## Set Operations \n",
    "\n",
    "Sets support mathematical operations that make it easy to compare two collections. Each operation has both an operator shorthand and a method form which produce the same result:\n",
    "\n",
    "| Operation            | Operator | Method                       | Description                                      |\n",
    "|----------------------|----------|------------------------------|--------------------------------------------------|\n",
    "| Union                | `a \\| b`  | `a.union(b)`                 | All elements from both sets                      |\n",
    "| Intersection         | `a & b`  | `a.intersection(b)`          | Only elements common to both sets                |\n",
    "| Difference           | `a - b`  | `a.difference(b)`            | Elements in `a` but not in `b`                   |\n",
    "| Symmetric Difference | `a ^ b`  | `a.symmetric_difference(b)`  | Elements in either set, but not in both          |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b9dae9b",
   "metadata": {},
   "outputs": [],
   "source": [
    "a = {1, 2, 3}\n",
    "b = {3, 4, 5}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d7ff8e79",
   "metadata": {},
   "source": [
    "Note that, **operator** forms (`|`, `&`, `-`, `^`) require **set operands**. Method forms are more flexible and can take any iterable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "id": "b74eae75",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{1, 2, 3, 4, 5}\n",
      "unsupported operand type(s) for |: 'set' and 'list'\n"
     ]
    }
   ],
   "source": [
    "print(a.union([3, 4, 5]))    # works: method accepts iterable\n",
    "\n",
    "try:\n",
    "    a | [3, 4, 5]            # TypeError: right side is a list\n",
    "except TypeError as e:\n",
    "    print(e)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27d1077e",
   "metadata": {},
   "source": [
    "### Union \n",
    "\n",
    "All elements from both sets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "id": "aeafeb88",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{1, 2, 3, 4, 5}"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a | b        # {1, 2, 3, 4, 5}\n",
    "a.union(b)   # same"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "083b51d6",
   "metadata": {},
   "source": [
    "### Intersection\n",
    "\n",
    "Only elements in both sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "id": "aef3bac1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{3}"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a & b              # {3}\n",
    "a.intersection(b)  # same"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ebb441c",
   "metadata": {},
   "source": [
    "### Difference\n",
    "\n",
    "Elements in a but not in b\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "id": "74da1325",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{1, 2}"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a - b              # {1, 2}\n",
    "a.difference(b)    # same"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f538100",
   "metadata": {},
   "source": [
    "### Symmetric Difference\n",
    "\n",
    "Elements in either set, but not both"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "id": "64af9751",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{1, 2, 4, 5}"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a ^ b                       # {1, 2, 4, 5}\n",
    "a.symmetric_difference(b)   # same"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e2a1f2d8",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### Exercise: Basic Set Operations\n",
    "# 1. Create A = {1, 2, 3, 4} and B = {3, 4, 5, 6}.\n",
    "# 2. Print A | B, A & B, A - B, and A ^ B.\n",
    "### Your code starts here.\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f17f5d63",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "A = {1, 2, 3, 4}\n",
    "B = {3, 4, 5, 6}\n",
    "\n",
    "print(\"Union:\", A | B)\n",
    "print(\"Intersection:\", A & B)\n",
    "print(\"Difference A-B:\", A - B)\n",
    "print(\"Symmetric Difference:\", A ^ B)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19982422",
   "metadata": {},
   "source": [
    "## Set Methods\n",
    "\n",
    "Python sets have methods for adding, removing, and checking elements.\n",
    "\n",
    "| Method             | Description                                   |\n",
    "|--------------------|-----------------------------------------------|\n",
    "| `.add(x)`          | Adds a single element                         |\n",
    "| `.update(iterable)`| Adds multiple elements                        |\n",
    "| `.copy()`       | Returns a shallow copy of the set  |\n",
    "| `.remove(x)`       | Removes element, raises `KeyError` if missing |\n",
    "| `.discard(x)`      | Removes element, silent if missing            |\n",
    "| `.pop()`           | Removes and returns an arbitrary element      |\n",
    "| `.clear()`         | Empties the set                               |\n",
    "| `.issubset(s)`     | Returns `True` if all elements are in `s`     |\n",
    "| `.issuperset(s)`   | Returns `True` if set contains all of `s`     |\n",
    "| `.isdisjoint(s)`   | Returns `True` if no elements in common       |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02f8779f",
   "metadata": {},
   "source": [
    "### Adding elements:\n",
    "\n",
    "- `add()`\n",
    "- `update()`: adds multiple elements, similar to `dict.update()`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "id": "5c7aafe4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{1, 2, 3, 4, 5, 6}"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = {1, 2, 3}\n",
    "s.add(4)            # {1, 2, 3, 4}\n",
    "s.update([5, 6])    # {1, 2, 3, 4, 5, 6}\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4898cd7e",
   "metadata": {},
   "source": [
    "### Copy vs aliasing\n",
    "\n",
    "Sets are mutable, so assignment shares the same object. Use `.copy()` when you need an independent set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "id": "64de2dc2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a: {1, 2, 3}\n",
      "b: {1, 2, 3}\n",
      "c: {1, 2}\n"
     ]
    }
   ],
   "source": [
    "a = {1, 2}\n",
    "b = a               # alias (same object)\n",
    "c = a.copy()        # separate object\n",
    "\n",
    "a.add(3)\n",
    "print(\"a:\", a)\n",
    "print(\"b:\", b)\n",
    "print(\"c:\", c)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a7e4012",
   "metadata": {},
   "source": [
    "### Removing elements:\n",
    "\n",
    "- `remove()`\n",
    "- `discard()`\n",
    "- `pop()`\n",
    "- `clear()`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "id": "32830bae",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{1, 2}\n",
      "{1, 2}\n",
      "{2}\n",
      "set()\n"
     ]
    }
   ],
   "source": [
    "s = {1, 2, 3}\n",
    "\n",
    "s.remove(3)     # raises KeyError if not found\n",
    "print(s)        # {1, 2}\n",
    "\n",
    "s.discard(3)    # silent if not found — safer\n",
    "print(s)        # {1, 2}\n",
    "\n",
    "s.pop()         # removes and returns an arbitrary element\n",
    "print(s)        # which item is removed is arbitrary\n",
    "\n",
    "s.clear()       # empties the set\n",
    "print(s)        # set()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43cdb10f",
   "metadata": {},
   "source": [
    "If you use `remove()` in a `for` loop for a set, you may run into an error because Python creates an internal iterator that tracks position inside the set. The moment you call `s.remove()`, the set's structure changes and the iterator breaks and raises an error.\n",
    "\n",
    "The solution to this problem is to iterate over a copy or just use a `while` loop."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "id": "8712a534",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing 1\n"
     ]
    },
    {
     "ename": "RuntimeError",
     "evalue": "Set changed size during iteration",
     "output_type": "error",
     "traceback": [
      "\u001b[31mRuntimeError\u001b[39m\u001b[31m:\u001b[39m Set changed size during iteration\n"
     ]
    }
   ],
   "source": [
    "%%expect RuntimeError\n",
    "\n",
    "s = {1, 2, 3,}\n",
    "for item in s:\n",
    "    s.remove(item)\n",
    "    print(f\"Processing {item}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "id": "074240f6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing 2\n",
      "Processing 3\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "set()"
      ]
     },
     "execution_count": 119,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "### iterate through copy of the set while modifying the original\n",
    "\n",
    "for item in s.copy():   # iterate over copy\n",
    "    s.remove(item)      # modify original safely\n",
    "    print(f\"Processing {item}\")\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "id": "ce404de4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing 1\n",
      "Processing 2\n",
      "Processing 3\n"
     ]
    }
   ],
   "source": [
    "# process all items, consuming the set\n",
    "\n",
    "s = {1, 2, 3}\n",
    "while s:\n",
    "    item = s.pop()\n",
    "    print(f\"Processing {item}\")\n",
    "# s is now empty"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8929f271",
   "metadata": {},
   "source": [
    "`s.pop()` is useful when you want to process and consume a set one element at a time, and you don't care which element you get. This behavior is different from `pop()` in `list` and `dict`.\n",
    "\n",
    "| | `list.pop()` | `dict.pop()` | `set.pop()` |\n",
    "|---|---|---|---|\n",
    "| **Removes** | by index | by key | arbitrary element |\n",
    "| **Returns** | removed value | removed value | removed value |\n",
    "| **Default arg** | index (default `-1`) | key + optional default | none |\n",
    "| **If not found** | `IndexError` | `KeyError` (or default) | `KeyError` |\n",
    "| **Predictable?** | yes (by index) | yes (by key) | no (arbitrary item) |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e5e935c7",
   "metadata": {},
   "source": [
    "### Checking membership:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "d26aa7e7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "False\n"
     ]
    }
   ],
   "source": [
    "s = {1, 2, 3}\n",
    "\n",
    "print(3 in s)         # True\n",
    "print(3 not in s)     # False"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f97d7ed3",
   "metadata": {},
   "source": [
    "### Subset & Superset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "id": "fa4c9d3d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 121,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = {1, 2}\n",
    "b = {1, 2, 3}\n",
    "\n",
    "a.issubset(b)    # True — all of a is in b\n",
    "b.issuperset(a)  # True — b contains all of a\n",
    "a.isdisjoint(b)  # False — they share elements"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a8576fdd",
   "metadata": {},
   "source": [
    "Operator equivalents for subset/superset are often used in real code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "id": "1165718f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "False\n"
     ]
    }
   ],
   "source": [
    "a = {1, 2}\n",
    "b = {1, 2, 3}\n",
    "\n",
    "print(a < b)    # proper subset\n",
    "print(a <= b)   # subset\n",
    "print(b > a)    # proper superset\n",
    "print(b >= a)   # superset\n",
    "print(a <= a)   # True (same set is subset of itself)\n",
    "print(a < a)    # False (not a proper subset)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6619fd2a",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### Exercise: Update and Membership Check\n",
    "# 1. Start with fruits = {\"apple\", \"banana\"}.\n",
    "# 2. Add \"cherry\" and update with [\"banana\", \"mango\"].\n",
    "# 3. Remove \"apple\" safely.\n",
    "# 4. Print whether \"mango\" is in fruits.\n",
    "# 5. Print whether {\"banana\", \"cherry\"} is a subset of fruits.\n",
    "### Your code starts here.\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4b8d7159",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "fruits = {\"apple\", \"banana\"}\n",
    "fruits.add(\"cherry\")\n",
    "fruits.update([\"banana\", \"mango\"])\n",
    "fruits.discard(\"apple\")\n",
    "\n",
    "print(\"mango\" in fruits)\n",
    "print({\"banana\", \"cherry\"}.issubset(fruits))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c22d640",
   "metadata": {},
   "source": [
    "## Frozensets\n",
    "\n",
    "A frozenset is an immutable version of a set. Once created, elements cannot be added or removed. This is useful when, for example, you need to protect data from modification, to use set as dict key or set element, or for graph edges and combinations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "id": "8e7afb6d",
   "metadata": {},
   "outputs": [
    {
     "ename": "AttributeError",
     "evalue": "'frozenset' object has no attribute 'add'",
     "output_type": "error",
     "traceback": [
      "\u001b[31mAttributeError\u001b[39m\u001b[31m:\u001b[39m 'frozenset' object has no attribute 'add'\n"
     ]
    }
   ],
   "source": [
    "%%expect AttributeError\n",
    "\n",
    "fs = frozenset({1, 2, 3})\n",
    "fs.add(4)     # AttributeError — frozensets are immutable"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "56fa6e87",
   "metadata": {},
   "source": [
    "### Creating a frozenset\n",
    "\n",
    "The syntax for creating a frozenset is:\n",
    "\n",
    "```python\n",
    "frozenset()            # empty frozenset\n",
    "frozenset(iterable)    # from any iterable (list, set, string, etc.)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "id": "9e545a44",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "frozenset({1, 2, 3})\n",
      "frozenset({'o', 'h', 'e', 'l'})\n"
     ]
    }
   ],
   "source": [
    "fs = frozenset([1, 2, 3])\n",
    "print(fs)                   # frozenset({1, 2, 3})\n",
    "fs = frozenset(\"hello\")\n",
    "print(fs)                   # element order may vary"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "765dd1ec",
   "metadata": {},
   "source": [
    "### Why use a frozenset?\n",
    "\n",
    "Because it is immutable, a frozenset is hashable, which means it can be: \n",
    "\n",
    "- used as a dictionary key, or \n",
    "- stored inside another set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "id": "c1f80041",
   "metadata": {},
   "outputs": [],
   "source": [
    "# As a dict key\n",
    "d = {frozenset({1, 2}): \"pair\"}\n",
    "\n",
    "# Inside a set\n",
    "s = {frozenset({1, 2}), frozenset({3, 4})}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f91f361e",
   "metadata": {},
   "source": [
    "| Feature          | set               | frozenset              |\n",
    "|------------------|-------------------|------------------------|\n",
    "| Mutable          | Yes               | No                     |\n",
    "| Hashable         | No                | Yes                    |\n",
    "| Can be dict key  | No                | Yes                    |\n",
    "| Set operations   | All               | Non-mutating only      |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ebc4e8a",
   "metadata": {},
   "source": [
    "## Set Performance"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0350b947",
   "metadata": {},
   "source": [
    "When you store a key in a dictionary or an element in a set, Python calls hash() on it to compute an integer, then uses that integer to determine where to store the value in memory. This is called a **hash table**."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1b918d6d",
   "metadata": {},
   "source": [
    "The benefit is that lookups are usually O(1) on average: Python jumps directly to a bucket via hash instead of scanning the whole collection. In the worst case (many collisions), lookup can degrade toward O(n)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "30103e4c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# O(n) — scans the entire list\n",
    "\"apple\" in [\"apple\", \"banana\", \"cherry\"]\n",
    "\n",
    "# Average O(1) — jumps directly via hash bucket\n",
    "\"apple\" in {\"apple\", \"banana\", \"cherry\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eeb39265",
   "metadata": {},
   "source": [
    "(set-bigo)=\n",
    "**O(n)** and **O(1)** come from **Big O notation**, which describes how the **speed** of an operation scales with the **size** of the data. The full treatment of Big O comes in the Algorithms chapter, where sorting and searching will be covered.\n",
    "\n",
    "For now, just know that:\n",
    "- **Average O(1)**: hash-table lookups in `dict`/`set` are constant time on average, because Python uses the hash to jump to a bucket.\n",
    "- **Worst-case O(n)**: if many keys collide into the same bucket, Python may need to scan multiple entries.\n",
    "- **O(n)**: **linear time**. The operation gets slower as the collection grows. A list search is O(n) because Python checks each element one by one until it finds a match:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "856d0605",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d = {\"apple\": 1, \"banana\": 2, \"cherry\": 3}\n",
    "d[\"apple\"]   # same speed whether dict has 3 or 3 million items"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "7baa8ed3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"apple\" in [\"apple\", \"banana\", \"cherry\"]  # checks up to n items"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec48aca5",
   "metadata": {},
   "source": [
    "In short, \n",
    "\n",
    "- **O(1)** is like looking up a word in a dictionary: you go directly to the page\n",
    "- **O(n)** is like finding a word in a novel: you read until you find it"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e512f70",
   "metadata": {},
   "source": [
    "You already saw this in the **dictionary** section, and the same idea applies here:\n",
    "\n",
    "- **O(1)** means near-constant lookup time on average (`key in dict`, `item in set`).\n",
    "- **O(n)** means linear time, where Python may scan through many elements (`item in list`).\n",
    "\n",
    "So for membership checks, sets are usually much faster than lists as data grows."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4b8792d1",
   "metadata": {},
   "source": [
    "### Performance Demo\n",
    "\n",
    "This benchmark mirrors the dictionary example: same data size, worst-case target (`n - 1`), and timing for membership checks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "id": "9a5bb8e6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "List lookup (O(n)): 0.004426 s\n",
      "Set lookup (avg O(1)): 0.000021 s\n",
      "Set was ~206x faster\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "\n",
    "n = 1_000_000\n",
    "big_list = list(range(n))\n",
    "big_set = set(range(n))\n",
    "target = n - 1                  # worst case for linear list search\n",
    "\n",
    "# O(n): list search scans until it finds target\n",
    "start = time.perf_counter()     # time.perf_counter() is better for benchmarking code\n",
    "result = target in big_list\n",
    "list_time = time.perf_counter() - start\n",
    "\n",
    "# Average O(1): set lookup jumps to hash bucket\n",
    "start = time.perf_counter()\n",
    "result = target in big_set\n",
    "set_time = time.perf_counter() - start\n",
    "\n",
    "print(f\"List lookup (O(n)): {list_time:.6f} s\")\n",
    "print(f\"Set lookup (avg O(1)): {set_time:.6f} s\")\n",
    "print(f\"Set was ~{list_time / set_time:.0f}x faster\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df369e30",
   "metadata": {},
   "source": [
    "## Set Comprehensions\n",
    "\n",
    "A **set comprehension** builds a set from an expression, using the same syntax as a list comprehension but with curly braces `{}`. Because the result is a set, duplicates are automatically discarded.\n",
    "\n",
    "The syntax for creating sets is:\n",
    "\n",
    "```python\n",
    "{expression for item in iterable}\n",
    "{expression for item in iterable if condition}\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "d2c4b705",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Square set: {0, 9, 4, 1}\n",
      "Unique word lengths: {3, 4, 5}\n",
      "Vowels found: {'o', 'e'}\n"
     ]
    }
   ],
   "source": [
    "# Create a set of squares\n",
    "square_set = {x**2 for x in range(-3, 4)}\n",
    "print(\"Square set:\", square_set)\n",
    "\n",
    "# Remove duplicates and transform\n",
    "sentence = \"the quick brown fox jumps over the lazy dog\"\n",
    "unique_lengths = {len(word) for word in sentence.split()}\n",
    "print(\"Unique word lengths:\", unique_lengths)\n",
    "\n",
    "# Filter vowels\n",
    "vowels = {char.lower() for char in \"Hello World\" if char.lower() in 'aeiou'}\n",
    "print(\"Vowels found:\", vowels)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb832ca4",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### Exercise: Set Comprehension Practice\n",
    "# 1. From nums = [1, 2, 2, 3, 4, 4, 5], build a set of squared values.\n",
    "# 2. From text = \"apple banana apple cherry\", build a set of unique word lengths.\n",
    "# 3. Print both sets.\n",
    "### Your code starts here.\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8f4a54d3",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "nums = [1, 2, 2, 3, 4, 4, 5]\n",
    "squares = {n**2 for n in nums}\n",
    "\n",
    "text = \"apple banana apple cherry\"\n",
    "lengths = {len(word) for word in text.split()}\n",
    "\n",
    "print(\"Squares:\", squares)\n",
    "print(\"Unique lengths:\", lengths)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14fc1ba9",
   "metadata": {},
   "source": [
    "### Comparing with `dict` Comprehension"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "26bae0c1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Set comprehension — curly braces, no colon\n",
    "{x**2 for x in range(5)}\n",
    "\n",
    "# Dict comprehension — curly braces WITH colon\n",
    "{x: x**2 for x in range(5)}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7fc75f84",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}