13.2. Searching#
This section develops search methods from foundational to efficient, then evaluates them with correctness checks and timing evidence.
Learning Goals
Implement linear and binary search correctly
Explain when sorted order changes what is possible
Use hash-based lookup for repeated membership checks
Compare asymptotic cost with measured runtime
Select a search strategy from data and workload constraints
Section Roadmap
Establish correctness and edge-case behavior first.
Compare scaling behavior as input size and lookup count increase.
Use the decision guide to choose an approach for concrete scenarios.
from time import perf_counter # High-resolution timer for benchmark cells.
import random # Randomized sample generation for lookup demos.
13.2.1. Linear Search#
Linear search is the baseline strategy: scan from left to right until the target is found or the input is exhausted.
Because it does not assume any order, it works on both sorted and unsorted data. This universality is its key strength.
| Case | Complexity | When it occurs |
|---|---|---|
| Best | \(O(1)\) | Target is the first element |
| Average | \(O(n)\) | Target is in the middle region |
| Worst | \(O(n)\) | Target is the last element or absent |
Key properties
No preprocessing is required.
Runtime is proportional to the number of inspected elements.
It serves as a reference point when comparing faster methods.
Note that the function returns the index of the match (not the value itself), or -1 when the target is absent.
def linear_search(items, target):
"""Return index of target in items, or -1 if not found."""
for i, value in enumerate(items): # scan left-to-right until a match is found.
if value == target:
return i
return -1 # -1 signals that target is absent.
sample = [7, 4, 9, 1, 3] # sanity checks: one present and one absent target.
print(linear_search(sample, 9)) # 2
print(linear_search(sample, 8)) # -1
2
-1
How assert Works
assert <condition> raises an AssertionError immediately when <condition> is False, and does nothing when it is True.
It is the standard lightweight way to write self-checking tests directly inside a notebook.
assert expression # passes silently when True
assert expression, message # optional message is shown if it fails
Use assert after you implement a function to verify that specific inputs produce the expected output.
A passing assert confirms that case is correct; a failing one tells you exactly which expectation was violated.
# assert examples
assert 2 + 2 == 4 # passes silently
assert linear_search([7, 4, 9, 1, 3], 9) == 2 # found at index 2 -> passes
assert linear_search([7, 4, 9, 1, 3], 8) == -1 # absent -> passes
# The line below would raise AssertionError because the correct index is 2, not 0:
# assert linear_search([7, 4, 9, 1, 3], 9) == 0
print("All assertions passed.")
All assertions passed.
### Exercise: Find Minimum Value
# 1. Write `min_value(nums)` that scans through the list in a single pass.
# 2. Return `None` for an empty list.
# 3. Return the smallest value otherwise (do not use the built-in `min`).
### Your code starts here.
# Suggested structure:
# - guard clause for empty input
# - initialize current minimum from first element
# - update while scanning remaining elements
### Your code ends here.
### Verification tests for min_value
### uncomment the following lines to run the checks after you implement min_value.
# assert min_value([]) is None
# assert min_value([5]) == 5
# assert min_value([3, 1, 4, 1, 5]) == 1
# assert min_value([7, 7, 7]) == 7
# assert min_value([-3, -1, -4]) == -4
# print("min_value checks passed.")
min_value checks passed.
Next: If data can be kept sorted and repeated lookups are needed, binary search is usually the better choice.
13.2.2. Binary Search#
Binary search uses sorted order to remove half of the remaining candidates at each step.
The algorithm maintains an interval [left, right]. At each iteration, it checks the midpoint mid:
if `items[mid] == target`, the search is complete
if `items[mid] < target`, discard the left half (`left = mid + 1`)
if `items[mid] > target`, discard the right half (`right = mid - 1`)
This repeated halving yields \(O(\log n)\) time. For large inputs, this is dramatically more efficient than a linear scan.
Why sorted input is required: midpoint comparisons (halvings) are only informative when elements are in nondecreasing order.
Index semantics: binary_search() returns an index into the sorted input passed to it, not into the original unsorted sequence.
| Case | Complexity |
|---|---|
| Best | \(O(1)\) — target is midpoint on first check |
| Worst / Average | \(O(\log n)\) (as developed in 10.1) |
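The halving argument can be checked numerically: each comparison halves the remaining interval, so the worst case needs \(\lfloor \log_2 n \rfloor + 1\) comparisons. The helper below is a small sketch introduced here, not part of the chapter's code:

```python
import math

def worst_case_comparisons(n):
    """Maximum midpoint comparisons binary search makes on n elements."""
    return math.floor(math.log2(n)) + 1 if n > 0 else 0

for n in (7, 100, 10_000, 1_000_000):
    print(f"n={n:>9}: at most {worst_case_comparisons(n)} comparisons")
```

For n = 7 this gives 3 comparisons, which is what the trace in the next subsection shows.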
13.2.2.1. Iterative Implementation#
Read the code below as a direct implementation of the interval idea:
Start with the full search interval (`left = 0`, `right = n-1`).
Recompute the midpoint each iteration.
Keep only the half that can still contain the target.
Stop when the interval becomes empty (`left > right`).
What to observe while reading the code
The interval shrinks every iteration, guaranteeing termination.
Every update preserves correctness: no viable target position is discarded.
Returning `-1` means the algorithm has proved absence, not just failed to guess.
def binary_search(sorted_items, target):
"""Return index of target in sorted_items, or -1 if not found.
Note: the returned index is a position in *sorted_items*, not the original
unsorted list. Use linear_search when you need the original-list position.
"""
left, right = 0, len(sorted_items) - 1
while left <= right:
mid = (left + right) // 2 # midpoint partitions remaining interval.
guess = sorted_items[mid]
if guess == target:
return mid
if guess < target:
left = mid + 1 # target can only be in the right half.
else:
right = mid - 1 # target can only be in the left half.
return -1 # Interval exhausted: target is absent.
sorted_sample = [1, 3, 4, 7, 9, 14, 20]
print(binary_search(sorted_sample, 14)) # 5
print(binary_search(sorted_sample, 2)) # -1
5
-1
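A quick negative test makes the sorted-input requirement concrete: on unsorted data, the midpoint comparison can discard the half that actually contains the target. The cell below redefines `binary_search` so it runs standalone; the unsorted example is an illustration, not part of the chapter's test suite:

```python
def binary_search(sorted_items, target):
    """Return index of target in sorted_items, or -1 if not found."""
    left, right = 0, len(sorted_items) - 1
    while left <= right:
        mid = (left + right) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1

unsorted = [7, 4, 9, 1, 3]                 # 1 is present at index 3
print(binary_search(unsorted, 1))          # -1: the wrong half was discarded
print(binary_search(sorted(unsorted), 1))  # 0: correct once the input is sorted
```

The first call sees 9 at the midpoint, concludes 1 must be to its left, and never looks at index 3. The algorithm is not wrong; its precondition was violated.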
13.2.2.2. Tracing Binary Search#
Trace binary_search([1, 3, 4, 7, 9, 14, 20], 9) step by step. The list has 7 elements (indices 0–6).
| Iteration | left | right | mid | items[mid] | Action |
|---|---|---|---|---|---|
| 1 | 0 | 6 | 3 | 7 | 7 < 9 → left = 4 |
| 2 | 4 | 6 | 5 | 14 | 14 > 9 → right = 4 |
| 3 | 4 | 4 | 4 | 9 | match → return 4 |
Three comparisons to locate 9 in a 7-element list. An exhaustive scan would require up to 7.
Exercise: Trace binary_search([1, 3, 4, 7, 9, 14, 20], 2) by hand. Fill in the table:
| Iteration | left | right | mid | items[mid] | Action |
|---|---|---|---|---|---|
| 1 | 0 | 6 | ? | ? | ? |
| 2 | ? | ? | ? | ? | ? |
| 3 | ? | ? | ? | ? | ? |
What does the function return, and why?
(Check your answer: 2 is absent, so the interval eventually empties and the function returns -1.)
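To check a hand trace, an instrumented variant that prints the interval at each step can be used (`binary_search_traced` is a name introduced here for illustration, not part of the chapter's API):

```python
def binary_search_traced(sorted_items, target):
    """Binary search that prints left, right, mid, and the inspected value."""
    left, right = 0, len(sorted_items) - 1
    step = 1
    while left <= right:
        mid = (left + right) // 2
        guess = sorted_items[mid]
        print(f"step {step}: left={left} right={right} mid={mid} items[mid]={guess}")
        if guess == target:
            return mid
        if guess < target:
            left = mid + 1
        else:
            right = mid - 1
        step += 1
    return -1

print(binary_search_traced([1, 3, 4, 7, 9, 14, 20], 2))  # absent target -> -1
```

For the absent target 2, the printed steps match the table above: the interval shrinks to [0, 0], the final comparison sends `left` past `right`, and the function returns -1.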
13.2.2.3. Recursive Binary Search#
(optional topic)
A recursive implementation expresses the same halving logic with function calls instead of a loop.
Time complexity remains \(O(\log n)\), the same as the iterative version.
Extra space differs: recursion uses \(O(\log n)\) call-stack space, while iteration uses \(O(1)\).
In Python, iteration is usually preferred in production because recursive calls add overhead and are limited by recursion depth. This extension is primarily for conceptual comparison.
def binary_search_recursive(sorted_items, target, left=0, right=None):
if right is None:
right = len(sorted_items) - 1 # Initialize right boundary on first call.
if left > right:
return -1 # Base case: empty interval means target is absent.
mid = (left + right) // 2
guess = sorted_items[mid]
if guess == target:
return mid
if guess < target:
return binary_search_recursive(sorted_items, target, mid + 1, right) # Recur right.
return binary_search_recursive(sorted_items, target, left, mid - 1) # Recur left.
sample_sorted = [1, 3, 4, 7, 9, 14, 20]
print('iterative 14:', binary_search(sample_sorted, 14))
print('recursive 14:', binary_search_recursive(sample_sorted, 14))
print('iterative 2 :', binary_search(sample_sorted, 2))
print('recursive 2 :', binary_search_recursive(sample_sorted, 2))
iterative 14: 5
recursive 14: 5
iterative 2 : -1
recursive 2 : -1
How Recursive Functions Work
A recursive function calls itself with a smaller version of the same problem until it reaches a base case — a condition simple enough to answer directly without another call.
Every recursive function needs two parts:
| Part | Purpose | Example above |
|---|---|---|
| Base case | Stops the recursion | `left > right` → return `-1` (or a found target → return `mid`) |
| Recursive case | Shrinks the problem and calls itself | recurse on the half-interval with `mid + 1` or `mid - 1` |
Each call is pushed onto the call stack. When the base case is reached, the stack unwinds and each call returns its result to the caller above it.
Tracing a simple example — countdown(3):
countdown(3) → calls countdown(2)
countdown(2) → calls countdown(1)
countdown(1) → calls countdown(0)
countdown(0) → base case: return ← stack unwinds from here
The depth of the stack equals the number of recursive calls made before the base case is hit. For binary search that depth is at most \(O(\log n)\).
# Simple recursive example: countdown
def countdown(n):
if n <= 0: # base case — stop here
print("Go!")
return
print(n)
countdown(n - 1) # recursive case — same problem, smaller n
countdown(5)
# ----- factorial: returns a value from the recursive call -----
def factorial(n):
if n == 0: # base case
return 1
return n * factorial(n - 1) # recursive case
print(factorial(5)) # 120 (5 * 4 * 3 * 2 * 1)
# ----- relate back to binary search -----
# binary_search_recursive(sorted_items, 9, left=0, right=6)
# mid=3, guess=7 < 9 → call with left=4, right=6
# mid=5, guess=14 > 9 → call with left=4, right=4
# mid=4, guess=9 == 9 → return 4 (base case: found!)
print(binary_search_recursive([1, 3, 4, 7, 9, 14, 20], 9)) # 4
5
4
3
2
1
Go!
120
4
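The \(O(\log n)\) stack-depth claim can be verified empirically by threading a depth counter through the recursion. The variant below is a sketch introduced for this check, not part of the chapter's API:

```python
def binary_search_depth(sorted_items, target, left=0, right=None, depth=1):
    """Like binary_search_recursive, but also returns the call depth reached."""
    if right is None:
        right = len(sorted_items) - 1
    if left > right:
        return -1, depth                  # base case counts as the deepest call
    mid = (left + right) // 2
    guess = sorted_items[mid]
    if guess == target:
        return mid, depth
    if guess < target:
        return binary_search_depth(sorted_items, target, mid + 1, right, depth + 1)
    return binary_search_depth(sorted_items, target, left, mid - 1, depth + 1)

big = list(range(1_000_000))
idx, depth = binary_search_depth(big, -1)  # absent target: a worst-case path
print(idx, depth)                          # depth stays near log2(1_000_000), about 20
```

Even for a million elements the recursion stays around twenty calls deep, far below Python's default recursion limit of 1000; the depth limit only becomes a concern for recursions that shrink the problem by a constant amount rather than by half.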
### Exercise: Searching a Sorted List
# sorted_nums = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
# 1. Use binary_search() to find 23. Print the index.
# 2. Search for 100. What does it return and why?
# 3. The list has 10 elements. Print the maximum number of comparisons
# binary search would need. (Hint: import math; math.ceil(math.log2(n)))
### Your code starts here.
### Your code ends here.
5
-1
4
13.2.3. The bisect Module#
The bisect module, a Python implementation of binary search, provides production-ready tools for searching and insertion in sorted lists. The term “bisect” means “cut in half,” which is conceptually the same as “binary search.”
| Function | What it does |
|---|---|
| `bisect_left(a, x)` | Leftmost index where `x` can be inserted to keep `a` sorted |
| `bisect_right(a, x)` | Rightmost insertion index (also available as `bisect.bisect`) |
| `insort(a, x)` | Inserts `x` into `a`, keeping it sorted |
import bisect
a = [1, 3, 5, 7, 9]
print(bisect.bisect_left(a, 5))   # 2 — index of the existing 5 (leftmost slot where 5 fits)
print(bisect.bisect_right(a, 5))  # 3 — one past the existing 5 (rightmost slot where 5 fits)
2
3
13.2.3.1. Why bisect#
The hand-written binary_search function from the previous section is a clean illustration, but bisect is almost always the better choice in practice for three reasons:
Correctness. Off-by-one errors in binary search are notoriously easy to introduce. The three most common mistakes are: using `left < right` instead of `left <= right` (exits one step early), `left = mid` instead of `left = mid + 1` (infinite loop), and `right = mid` instead of `right = mid - 1` (same). Each fails only on certain inputs, making the bug easy to miss. The `bisect` module is part of the Python standard library and has been validated across many Python versions and platforms.
Insertion point. `binary_search` returns only an index or `-1`. `bisect_left` and `bisect_right` always return a valid insertion point, which enables tasks beyond simple lookup: finding the nearest value, counting elements in a range, or inserting while preserving order. A `-1` sentinel cannot express these results. If the target is larger than all elements, `bisect_left` returns `len(a)`.
Built-in insertion support. `bisect.insort` keeps a list sorted as elements arrive, combining the position search and the insert in one call. Replicating this correctly with a hand-written implementation requires both a correct binary search and careful use of `list.insert`.
When bisect is the right choice:
The data is already sorted (or can be kept sorted).
Lookups, insertions, or range queries on sorted data are needed.
When a custom implementation may be warranted:
The search logic differs from a simple equality check (e.g., searching a rotated array).
The data structure is not a plain list (e.g., a sorted linked list or a binary search tree).
The most important practical pattern is: locate with bisect_left, then verify equality. This prevents false positives when the target is absent and avoids re-implementing binary search logic.
13.2.3.2. Insertion with insort#
bisect.insort(a, x) keeps a list sorted as new elements arrive. Internally it calls bisect_right to find the correct position, then calls list.insert() to place the element there.
The two-step cost matters in practice:
Finding the position takes \(O(\log n)\) — binary search locates the slot.
Shifting elements to make room takes \(O(n)\) — every item after the insertion point moves one index to the right.
The overall cost per insort call is therefore \(O(n)\), dominated by the shift. This is acceptable when insertions are infrequent and the list must remain sorted at all times (an online scenario: items arrive one at a time and future items are unknown). It becomes a bottleneck when adding many items at once: appending all items and calling list.sort() once costs \(O(n \log n)\) total, versus \(O(n^2)\) for \(n\) individual insort calls.
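This \(O(n^2)\) versus \(O(n \log n)\) contrast is easy to observe directly. The sketch below builds the same sorted list both ways and times each; the input size is an arbitrary illustration value, and absolute times vary by machine:

```python
import bisect
import random
from time import perf_counter

random.seed(0)
items = [random.random() for _ in range(10_000)]

start = perf_counter()
by_insort = []
for x in items:
    bisect.insort(by_insort, x)   # O(log n) search + O(n) shift per call
t_insort = perf_counter() - start

start = perf_counter()
by_sort = list(items)
by_sort.sort()                    # one O(n log n) sort
t_sort = perf_counter() - start

assert by_insort == by_sort       # identical result, very different cost
print(f"insort loop: {t_insort:.4f}s   sort once: {t_sort:.4f}s")
```

Both paths produce the identical sorted list; only the cost differs. Use insort when items must be queryable in sorted order as they arrive, and sort-once when they can be batched.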
The code below illustrates all three ideas: the absent/present distinction between bisect_left and bisect_right, the insort operation, and the bisect_search idiom.
13.2.3.3. The bisect_search() Idiom#
The bisect_search function is the idiom to use whenever you need to check existence or find an index. It includes the bounds check and equality check. The bounds check must come first due to Python’s short-circuit evaluation: if the index is out of bounds, the second condition is never evaluated, avoiding an IndexError.
import bisect
data = [1, 3, 4, 7, 9, 14, 20]
print(f"bisect_left(5) → index {bisect.bisect_left(data, 5)}") # 3 — absent: insert at index 3 (between 4 and 7)
print(f"bisect_right(5) → index {bisect.bisect_right(data, 5)}") # 3 — absent: same index as bisect_left
print(f"bisect_left(7) → index {bisect.bisect_left(data, 7)}") # 3 — present: index of existing 7
print(f"bisect_right(7) → index {bisect.bisect_right(data, 7)}") # 4 — present: one past existing 7
bisect.insort(data, 5)
print(f"after insort(5): {data}") # 5 inserted at index 3
def bisect_search(sorted_items, target):
"""Return index of target in sorted_items, or -1 if not found."""
i = bisect.bisect_left(sorted_items, target)
if i < len(sorted_items) and sorted_items[i] == target: # sorted_items[i] is the first item >= target; check if it's a match
return i
return -1 # target not found
nums = [1, 3, 4, 7, 9, 14, 20]
print(bisect_search(nums, 9)) # 4
print(bisect_search(nums, 6)) # -1
# Edge case: target larger than all elements — bisect_left returns len(nums)
print(f"bisect_left(999) → index {bisect.bisect_left(nums, 999)}") # 7 — equals len(nums); out of bounds, which is why bisect_search checks i < len(sorted_items)
bisect_left(5) → index 3
bisect_right(5) → index 3
bisect_left(7) → index 3
bisect_right(7) → index 4
after insort(5): [1, 3, 4, 5, 7, 9, 14, 20]
4
-1
bisect_left(999) → index 7
When searching sorted data, several related questions commonly arise, and bisect handles them all:
Does the target exist? — bisect_search idiom handles this
Where is the first occurrence? — bisect_left + verify
Where is the last occurrence? — bisect_right - 1 + verify
How many occurrences? — bisect_right(a, x) - bisect_left(a, x)
What is the range of indices? — combines the above two
This is a powerful pattern because bisect handles all of these without any custom loop.
import bisect
a = [1, 2, 2, 2, 3]
left = bisect.bisect_left(a, 2) # 1
right = bisect.bisect_right(a, 2) # 4
count = right - left # 3
exists = left < len(a) and a[left] == 2 # True
print(f"2 appears {count} times in {a}, exists: {exists}")
2 appears 3 times in [1, 2, 2, 2, 3], exists: True
### Exercise: Using the bisect Module
# scores = [72, 85, 88, 90, 95]
# 1. Use bisect.insort() to insert 87. Print the updated list.
# 2. Use bisect.bisect_left() and bisect.bisect_right() to count
# how many times 88 appears in the updated list.
# 3. Use the bisect_search idiom to check whether 90 exists.
### Your code starts here.
### Your code ends here.
[72, 85, 87, 88, 90, 95]
1
True
13.2.4. Hash-Based Lookup#
The previous methods assume sorted data. If order doesn’t matter and you only need membership testing, a hash-based approach is often faster and simpler.
When the task is membership testing, a set or dict is often the best tool available. Hash tables use key-based addressing rather than scanning: a hash function maps each key to a memory slot directly, so a lookup does not need to examine other elements. The result is \(O(1)\) average-case lookup time, independent of collection size.
Common misconception: The `in` operator is not uniformly fast. `target in my_list` is \(O(n)\) — it performs a linear scan, equivalent to `linear_search`. Only `target in my_set` and `target in my_dict` are \(O(1)\) average case. The syntax looks identical; the data structure determines the cost.
When this approach fits
You need many membership checks on the same collection.
Sorted access or index-based operations are not required.
Extra memory to hold the hash table is acceptable.
Key trade-off: building the hash table has an upfront \(O(n)\) cost, but repeated lookups are fast.
random.seed(42) # Fixed seed keeps the demonstration reproducible across runs.
numbers = [random.randint(1, 20_000) for _ in range(15_000)]
lookup_set = set(numbers)
target = numbers[len(numbers) // 2] # Use one present value for contrast.
missing = 999_999 # Use one absent value for contrast.
print('target in list:', target in numbers)
print('target in set :', target in lookup_set)
print('missing in list:', missing in numbers)
print('missing in set :', missing in lookup_set)
target in list: True
target in set : True
missing in list: False
missing in set : False
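A rough micro-benchmark of the same membership checks makes the cost difference visible (illustrative only; absolute numbers depend on the machine):

```python
from time import perf_counter
import random

random.seed(42)
numbers = [random.randint(1, 20_000) for _ in range(15_000)]
lookup_set = set(numbers)
missing = 999_999                 # absent value: worst case for the list scan

def time_membership(container, value, repeats=1_000):
    """Total time for `repeats` membership checks against container."""
    start = perf_counter()
    for _ in range(repeats):
        _ = value in container    # same syntax; cost depends on container type
    return perf_counter() - start

t_list = time_membership(numbers, missing)     # O(n) scan each time
t_set = time_membership(lookup_set, missing)   # O(1) average hash probe
print(f"list: {t_list:.4f}s   set: {t_set:.4f}s")
```

The checks return identical booleans; only the container type changes, and the list version is orders of magnitude slower for the absent value.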
### Exercise: Hash-Based vs. List Lookup
# data = list(range(0, 10_000, 3)) # every third integer from 0 to 9999
# 1. Build data_set = set(data).
# 2. For each target in [9, 100, 999, 1000, 9999], print whether it is
# in data (list) and in data_set (set) — one line per target.
# 3. Are results the same? Add a comment explaining which is faster and why.
### Your code starts here.
### Your code ends here.
9 True True
100 False False
999 True True
1000 False False
9999 True True
13.2.5. Correctness Checks#
Before benchmarking, verify correctness on edge cases that often reveal hidden bugs:
empty input
one-element input (present and absent targets)
duplicates (index semantics matter)
a target near the end (worst case for linear search)
Here, assert turns each expectation into an executable correctness check: if any assumption is violated (the condition is false), execution stops immediately with an AssertionError that identifies the failing case.
tests = [ # Each pair is (input_sequence, target).
([], 5),
([10], 10),
([10], 5),
([1, 2, 2, 2, 3], 2),
(list(range(100)), 99),
]
for arr, target in tests: ### unpack tuple into arr and target for each test case
lin_idx = linear_search(arr, target)
arr_sorted = sorted(arr)
bin_idx = binary_search(arr_sorted, target)
assert (target in set(arr)) == (target in arr) # Set/list membership should agree as booleans.
if target in arr:
assert lin_idx != -1
assert arr[lin_idx] == target # Found case: index must point to a matching value.
assert bin_idx != -1
assert arr_sorted[bin_idx] == target
else:
assert lin_idx == -1 # Absent case: both searches should report -1.
assert bin_idx == -1
print('All checks passed.')
All checks passed.
13.2.5.1. Finding First Duplicate#
Standard binary search returns a matching index, but not necessarily the leftmost one. With duplicates — for example, [1, 2, 2, 2, 4, 4, 5] — binary_search may land on any of the 2s. Many tasks need the first (leftmost) occurrence: finding the start of a range, counting elements, or inserting before all copies of a value.
The fix: when a match is found, record the index as a candidate and keep searching left (right = mid - 1). The loop terminates with the smallest index that matches.
Contract: binary_search_first() returns the index of the leftmost occurrence of target, or -1 if absent.
def binary_search_first(sorted_items, target):
left, right = 0, len(sorted_items) - 1
result = -1
while left <= right:
mid = (left + right) // 2
guess = sorted_items[mid]
if guess == target:
result = mid # Record candidate and keep searching left for first occurrence.
right = mid - 1
elif guess < target:
left = mid + 1
else:
right = mid - 1
return result
dup_data = [1, 2, 2, 2, 4, 4, 5]
assert binary_search_first([], 2) == -1 # Empty input check.
assert binary_search_first([2, 2, 2], 2) == 0 # All-duplicate input check.
assert binary_search_first(dup_data, 2) == 1
assert binary_search_first(dup_data, 4) == 4
assert binary_search_first(dup_data, 3) == -1 # Absent target check.
print(binary_search_first(dup_data, 2)) # expected: 1
print(binary_search_first(dup_data, 4)) # expected: 4
print(binary_search_first(dup_data, 3)) # expected: -1
print('binary_search_first checks passed.')
1
4
-1
binary_search_first checks passed.
### Exercise: First Occurrence with Duplicates
# sorted_items = [1, 2, 2, 2, 4, 4, 5]
# 1. Use binary_search_first() to find the first index of 2 and of 4.
# 2. Use binary_search() for the same targets.
# 3. Are the results always the same? Add a comment explaining the difference.
### Your code starts here.
### Your code ends here.
1
4
3
5
13.2.6. Timing Comparison#
This section evaluates a practical question: When is preprocessing worth it?
Two workloads are benchmarked to represent common usage patterns:
One lookup: measure the full workflow cost, including any preprocessing (sorting or building a
set).Many repeated lookups: perform preprocessing once, then reuse the resulting structure across many queries.
This side-by-side design separates two ideas that are often conflated:
Asymptotic lookup cost (for example, \(O(\log n)\) or \(O(1)\)), and
Total workflow cost (setup + lookup).
Interpretation guide:
If only one lookup is needed, preprocessing may dominate total time.
If many lookups are needed, preprocessing is often amortized, so faster lookup methods become more beneficial.
Why wrapper functions are used. linear_search and binary_search are passed directly to average_runtime because their signatures match fn(data, target). Set membership (target in lookup_set) is an expression, not a callable with that signature, so it is wrapped in named functions. This keeps the benchmark interface consistent and makes each timing represent a comparable workflow.
We use time.perf_counter() for higher-resolution timing. The coarser time.time() from earlier chapters is acceptable for rough timing, but less reliable for short operations.
Predict before running: For
n = 100,000with a single lookup, which method do you expect to be fastest — linear search, sort-then-binary, or build-set-then-lookup? Write your prediction and reasoning before executing the cell below.(Hint: think about the full workflow cost — preprocessing included — not just the per-lookup complexity.)
def average_runtime(fn, *args, repeats=30):
total = 0.0
for _ in range(repeats):
start = perf_counter()
fn(*args)
total += perf_counter() - start
return total / repeats # Average repeated measurements to reduce timing noise.
def sort_then_binary_once(data, target):
sorted_data = sorted(data)
return binary_search(sorted_data, target) # One-time workflow: sort, then one binary lookup.
def build_set_once(data, target):
lookup_set = set(data)
return target in lookup_set # One-time workflow: build hash table, then one membership test.
def linear_many(data, targets):
return sum(linear_search(data, target) != -1 for target in targets) # Repeated lookups, no preprocessing.
def sort_then_binary_many(data, targets):
sorted_data = sorted(data)
return sum(binary_search(sorted_data, target) != -1 for target in targets) # Preprocess once, reuse sorted data.
def build_set_many(data, targets):
lookup_set = set(data)
return sum(target in lookup_set for target in targets) # Preprocess once, reuse hash table.
sizes = [10_000, 50_000, 100_000, 200_000]
print('One lookup: includes preprocessing cost')
print(f"{'n':>10} {'linear total':>15} {'sort+binary':>15} {'build+set':>15}")
for n in sizes:
data = list(range(n))
target = n - 1
t_linear_one = average_runtime(linear_search, data, target)
t_binary_one = average_runtime(sort_then_binary_once, data, target)
t_set_one = average_runtime(build_set_once, data, target)
print(f"{n:>10} {t_linear_one:>15.6f} {t_binary_one:>15.6f} {t_set_one:>15.6f}")
print()
print('Many lookups: preprocessing is paid once and reused')
print(f"{'n':>10} {'linear total':>15} {'sort+binary':>15} {'build+set':>15}")
for n in sizes:
data = list(range(n))
targets = [0, n // 4, n // 2, (3 * n) // 4, n - 1] * 20 # Mix positions to avoid one-case bias.
t_linear_many = average_runtime(linear_many, data, targets)
t_binary_many = average_runtime(sort_then_binary_many, data, targets)
t_set_many = average_runtime(build_set_many, data, targets)
print(f"{n:>10} {t_linear_many:>15.6f} {t_binary_many:>15.6f} {t_set_many:>15.6f}")
One lookup: includes preprocessing cost
n linear total sort+binary build+set
10000 0.000155 0.000032 0.000055
50000 0.000782 0.000143 0.000246
100000 0.001531 0.000322 0.000551
200000 0.003064 0.000667 0.001144
Many lookups: preprocessing is paid once and reused
n linear total sort+binary build+set
10000 0.007790 0.000111 0.000051
50000 0.038162 0.000247 0.000239
100000 0.076312 0.000424 0.000523
200000 0.163739 0.001260 0.001941
13.2.6.1. Interpreting the Results#
One-lookup results: Linear search wins. Sorting \(n\) elements costs \(O(n \log n)\), asymptotically more than a single \(O(n)\) linear scan, and building a hash table costs \(O(n)\) with a larger constant factor than a bare scan. Neither preprocessing cost is recovered when there is only one query.
Many-lookups results: The picture reverses. Preprocessing is paid once and amortized across all queries. As \(n\) grows, linear search scales poorly (\(O(n)\) per lookup), while binary search (\(O(\log n)\)) and set lookup (\(O(1)\) average) stay comparatively fast. The set typically wins on raw lookup speed; binary search is competitive when memory is constrained or the data is already sorted.
Takeaway: Total workflow cost = preprocessing cost + (number of lookups × per-lookup cost). The asymptotically fastest isolated lookup is not always the fastest workflow.
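That formula can be turned into a toy cost model. The operation counts below are abstract (each unit is roughly "one comparison", ignoring constant factors), so the crossover points are illustrative rather than measured:

```python
import math

def total_costs(n, k):
    """Abstract total cost (setup + k lookups) for each strategy."""
    linear = k * n                                      # no setup, O(n) per lookup
    sort_binary = n * math.log2(n) + k * math.log2(n)   # sort once, O(log n) lookups
    build_set = n + k                                   # build once, O(1) lookups
    return linear, sort_binary, build_set

n = 100_000
for k in (1, 10, 1_000):
    lin, srt, st = total_costs(n, k)
    best = min((lin, "linear"), (srt, "sort+binary"), (st, "set"))[1]
    print(f"k={k:>5}: linear={lin:,.0f}  sort+binary={srt:,.0f}  set={st:,.0f}  -> {best}")
```

Even this crude model reproduces the benchmark's qualitative result: linear search wins for a single lookup, while preprocessing pays for itself once lookups repeat.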
13.2.7. Search Strategy Summary#
No single search method is optimal in every context. Selection should be based on workload, data properties, and total cost.
| Strategy | Time | Needs sorted data? | Best for |
|---|---|---|---|
| Linear search | \(O(n)\) | No | One-off lookups; small or unsorted input |
| Binary search | \(O(\log n)\) | Yes | Repeated lookups on sorted data |
| `bisect` module | \(O(\log n)\) | Yes | Production search/insertion on sorted lists |
| `set` / `dict` | \(O(1)\) average | No | High-volume membership checks |
Cost of sorting: Binary search and bisect require sorted input. Since sorting costs \(O(n \log n)\), it may not pay off for only one or two lookups.
Preprocessing trade-off: Building a set costs \(O(n)\) time and \(O(n)\) space, but each later membership check is typically \(O(1)\).
Decision checklist
Is the data already sorted?
How many lookups will be performed?
Is insertion while preserving order required?
Is extra memory acceptable?
In practice, choose the method with the lowest total workflow cost (setup + lookup), not just the lowest isolated lookup complexity.
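The checklist can be sketched as a small helper that encodes these rules (`choose_strategy` is a hypothetical name introduced here, and the rules are illustrative simplifications, not prescriptive thresholds):

```python
def choose_strategy(is_sorted, num_lookups, needs_order, memory_ok):
    """Suggest a search strategy from the checklist answers.

    Illustrative rules only; real decisions should weigh measured costs.
    """
    if needs_order:
        return "bisect"    # sorted insertion / range queries need a sorted structure
    if num_lookups <= 1:
        return "linear"    # one-off lookup: preprocessing never pays off
    if memory_ok:
        return "set"       # many membership checks: O(1) average lookups
    if is_sorted:
        return "binary"    # repeated lookups, no extra memory, already sorted
    return "linear"        # unsorted and memory-constrained fallback

print(choose_strategy(is_sorted=False, num_lookups=1, needs_order=False, memory_ok=True))    # linear
print(choose_strategy(is_sorted=True, num_lookups=500, needs_order=False, memory_ok=False))  # binary
```

The rule order matters: ordering requirements dominate (only a sorted structure can satisfy them), then lookup count, then memory, mirroring the decision checklist above.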