8.1. Strings#


import sys
from pathlib import Path

current = Path.cwd()
for parent in [current, *current.parents]:
    if (parent / '_config.yml').exists():
        project_root = parent  # ← Add project root, not chapters
        break
else:
    project_root = Path.cwd().parent.parent

sys.path.insert(0, str(project_root))

from shared import thinkpython, diagram, jupyturtle
from shared.download import download

# Register as top-level modules so direct imports work in subsequent cells
sys.modules['thinkpython'] = thinkpython
sys.modules['diagram'] = diagram
sys.modules['jupyturtle'] = jupyturtle

A string is a sequence of characters enclosed in quotes. A character can be a letter (in almost any alphabet), a digit, a punctuation mark, or white space.

Strings are immutable: once a string is created, it cannot be modified in place.

Strings are one of the most commonly used data types in Python, and Python provides a rich set of built-in operations and methods for working with them.

In this section we cover:

  • Creating strings

  • Indexing and slicing

  • Concatenation and repetition

  • Case methods

  • Searching and testing

  • Cleaning

  • Splitting and joining

  • String formatting

  • Type-checking methods

  • String comparison

  • Docstrings

  • Application

8.1.1. String Creation and Accessing#

String literals are written with single, double, or triple quotes: 'hello', "hello", """hello""". An assignment statement binds the string to a variable.

The built-in function len() returns the number of characters in a string.

s = 'supercalifragilisticexpialidocious'

print(type(s))

n = len(s)
print(n)
<class 'str'>
34

Single and double quotes are interchangeable for single-line strings. Triple quotes are used for multi-line strings or strings that contain both single and double quotes.

s1 = 'Hello, world!'
s2 = "Hello, world!"
s3 = """This is
a multi-line
string."""

print(s1)
print(s2)
print(s3)
Hello, world!
Hello, world!
This is
a multi-line
string.
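
As a small illustrative sketch (the sentence and variable names are made up for this example), triple quotes also let a string contain both kinds of quote characters without any escaping:

```python
# Hypothetical example text: triple quotes avoid escaping ' and "
quote = '''She said, "It's fine."'''
escaped = "She said, \"It's fine.\""

print(quote)
print(quote == escaped)   # both spellings produce the same string
```

The second assignment builds the same string with escaped double quotes, which is harder to read.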

8.1.1.1. Escape Sequences and Raw Strings#

Inside a string, a backslash \ introduces an escape sequence — a two-character combination that represents a special character:

| Escape sequence | Meaning | Example output |
| --- | --- | --- |
| \n | Newline | moves to the next line |
| \t | Tab | inserts a horizontal tab |
| \\ | Backslash | a literal \ |
| \" | Double quote | " inside a double-quoted string |
| \' | Single quote | ' inside a single-quoted string |

A raw string is prefixed with r (or R) and treats backslashes as literal characters — no escape sequences are processed. Raw strings are especially useful for regular expression patterns and file paths.

print("line1\nline2")       # newline
print("col1\tcol2")         # tab
print("C:\\Users\\ty")      # literal backslashes
print(r"C:\Users\ty")       # raw string — same result, easier to read
line1
line2
col1	col2
C:\Users\ty
C:\Users\ty
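
To make the difference concrete, the following sketch compares the length of an escaped and a raw string, and then uses a raw string as a regular expression pattern. The pattern and the sample text are assumptions chosen for illustration.

```python
import re

# Escape processing: "\n" is one character, r"\n" is two (backslash and n)
print(len("\n"))     # 1
print(len(r"\n"))    # 2

# Hypothetical pattern: \d+ matches one or more digits
pattern = r"\d+"
numbers = re.findall(pattern, "room 12, floor 3")
print(numbers)       # ['12', '3']
```

Writing the pattern as a raw string keeps the backslash in \d from being interpreted by Python before the re module sees it.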
### EXERCISE: Escape Sequences and Raw Strings
# Difficulty: Basic
# 1. Print two words on two lines using \n
# 2. Print two values separated by a tab using \t
# 3. Print a Windows path using a raw string
### Your code starts here:


### Your code ends here.


# Solution
print("apple\nbanana")
print("score\t95")
print(r"C:\Users\alice\data")
apple
banana
score	95
C:\Users\alice\data

8.1.1.2. Indexing and Slicing#

Strings are sequences, meaning each character has a numbered position called an index. Python uses zero-based indexing: the first character is at index 0, the second at index 1, and so on. Negative indices count from the end of the string.

8.1.1.2.1. String Indexing#

The expression in brackets is called an index because it indicates which character in the sequence to select. String indexing is 0-based.

fruit = "banana"
print(fruit[0])
b

You can select a character from a string with the bracket operator.

s = 'Python'

print(s[0])    # 'P'  — first character
print(s[1])    # 'y'  — second character
print(s[-1])   # 'n'  — last character
print(s[-2])   # 'o'  — second to last
P
y
n
o

As a reminder, the index of the last letter of a string is the length of the string minus 1. Because indexing is 0-based, using len() itself as the index raises an IndexError (string index out of range): there is no character at that position.

To get the last character, subtract 1 from n and use the index n-1.

fruit = 'banana'
n = len(fruit)
%%expect IndexError

fruit[n]
IndexError: string index out of range
fruit[n-1]
'a'

An easier and often overlooked way to access the last element of a sequence is negative indexing, which counts backward from the end. The index -1 selects the last letter, -2 selects the second to last, and so on.

fruit[-1]
'a'

The index in brackets can be a variable or an expression that contains variables and operators.

i = 1
print(fruit[i])
print(fruit[i+1])

for i in range(len(fruit)):
    print(fruit[i], end=' ')
a
n
b a n a n a 
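
When the index itself is not needed, a for loop can visit the characters directly. The sketch below also uses the built-in enumerate() to pair each character with its index; the variable names are chosen for illustration.

```python
fruit = 'banana'

# Iterate over the characters directly, without an index
for letter in fruit:
    print(letter, end=' ')
print()

# enumerate() yields (index, character) pairs
positions = [(i, letter) for i, letter in enumerate(fruit)]
print(positions[:3])    # [(0, 'b'), (1, 'a'), (2, 'n')]
```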

Just like lists and tuples, the value of the index has to be an integer – otherwise you get a TypeError.

%%expect TypeError

fruit[1.5]
TypeError: string indices must be integers, not 'float'

It is tempting to use the [] operator on the left side of an assignment, with the intention of changing a character in a string.

The result is a TypeError. In the error message, the object is the string and the “item” is the character we tried to assign.

The reason for this error is that strings are immutable, which means you can’t change an existing string.

greeting = 'hello, world'
%%expect TypeError

greeting[0] = 'J'
TypeError: 'str' object does not support item assignment

The best you can do is create a new string that is a variation of the original.

new_greeting = 'J' + greeting[1:]     # + concatenates the two strings
new_greeting
'Jello, world'

This example concatenates a new first letter onto a slice of greeting. It has no effect on the original string.

greeting
'hello, world'

8.1.1.2.2. Slicing Strings#

Just like lists and tuples, a segment of a string is called a slice. Selecting a slice is similar to selecting a character. The general syntax of slicing is the same as for lists and tuples:

sequence[start:stop:step]

Also, the parameters are start-inclusive and stop-exclusive.

  • start: index to begin at (inclusive, default 0)

  • stop: index to end at (exclusive, default end of string)

  • step: distance between selected indices (default 1, which selects every character)

fruit = 'banana'
fruit[0:3]
'ban'

The operator [m:n] returns the part of the string from the mth character to the nth character, including the first but excluding the second. This behavior is counterintuitive, but it might help to imagine the indices pointing between the characters, as in this figure:

[Figure: slice indices pointing between the characters of 'banana']

For example, the slice [3:6] selects the letters ana, which means that 6 is legal as part of a slice, but not legal as an index.
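
A short sketch of that difference between slice bounds and indices, using the same string. The out-of-range stop value 100 is an arbitrary choice for illustration.

```python
fruit = 'banana'

print(fruit[3:6])      # 'ana'
print(fruit[3:100])    # 'ana' as well: the stop value is clipped to the length

try:
    fruit[6]           # an index, unlike a slice bound, must be in range
except IndexError as err:
    message = str(err)
    print('IndexError:', message)
```

Slices never raise an IndexError for out-of-range bounds; they simply return whatever part of the string falls inside the range.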

Also,

  • if you omit the first index, the slice starts at the beginning of the string.

  • if you omit the second index, the slice goes to the end of the string:

s = 'Hello, world!'

print(s[0:5])    # 'Hello'   — characters 0 through 4
print(s[7:])     # 'world!'  — from index 7 to end
print(s[:5])     # 'Hello'   — from start to index 4
print(s[::2])    # 'Hlo ol!' (every other character)
print(s[::-1])   # '!dlrow ,olleH' (the string reversed)
Hello
world!
Hello
Hlo ol!
!dlrow ,olleH

If the first index is greater than or equal to the second, the result is an empty string, represented by two quotation marks. An empty string contains no characters and has length 0.

print(f"len(fruit[3:3]): {len(fruit[3:3])}")
print(f"Type of fruit[3:3]: {type(fruit[3:3])}")
fruit[3:3]
len(fruit[3:3]): 0
Type of fruit[3:3]: <class 'str'>
''

Continuing this example, what do you think fruit[:] means? Try it and see.

fruit[:]
'banana'

To practice slicing, predict the result of each expression below for the string 'banana' before running it. It may be harder than you think.

fruit[0:-1]                     
fruit[-2:]                      
fruit[0:-1:2]                   

# print(fruit[0:-1])              ### all but the last letter: banan
# print(fruit[-2:])               ### the last two letters: na
# print(fruit[0:-1:2])            ### step is 2, so you get bnn
'bnn'
### EXERCISE: Indexing and Slicing
# Difficulty: Basic
text = 'superpython'
# 1. Print the first character
# 2. Print the last character using negative indexing
# 3. Print every second character
# 4. Print the reversed string
### Your code starts here:


### Your code ends here.


# Solution
text = 'superpython'
print(text[0])
print(text[-1])
print(text[::2])
print(text[::-1])
s
n
spryhn
nohtyprepus

8.1.1.3. Concatenation and Repetition#

The + operator joins two strings together (concatenation). The * operator repeats a string a given number of times (repetition).

first = 'Hello'
last  = 'World'

# Concatenation
greeting = first + ', ' + last + '!'
print(greeting)        # 'Hello, World!'

# Repetition
line = '* ' * 10
print(line)            # '* * * * * * * * * * '

print('ha' * 3)        # 'hahaha'
Hello, World!
* * * * * * * * * * 
hahaha
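
One detail worth sketching: + only concatenates strings with other strings. In this illustrative example (the variable names are made up), an integer has to be converted with str() first.

```python
count = 3

try:
    label = 'Items: ' + count          # str + int is not allowed
except TypeError as err:
    print('TypeError:', err)

label = 'Items: ' + str(count)         # convert explicitly first
print(label)                           # Items: 3
```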
### EXERCISE: Concatenation and Repetition
# Difficulty: Basic
first = 'Alice'
last = 'Bob'
# 1. Build and print: "Alice & Bob" using concatenation
# 2. Print "ha" repeated 4 times
# 3. Create a divider of 20 dashes and print it
### Your code starts here:


### Your code ends here.


# Solution
first = 'Alice'
last = 'Bob'
print(first + " & " + last)
print("ha" * 4)
print("-" * 20)
Alice & Bob
hahahaha
--------------------

8.1.2. String Methods#

Python provides string methods that perform a variety of useful operations. A method is similar to a function: it usually takes arguments and returns a value. The syntax is different, however. A method belongs to an object, so it is written after the object with a dot (dot notation). For example, upper() returns a new all-uppercase string, and the call is written 'banana'.upper(), which returns 'BANANA', rather than upper('banana') as a function call would be.

word = 'banana'
new_word = word.upper()
new_word
'BANANA'

This use of the dot operator specifies the name of the method, upper, and the name of the string to apply the method to, word. The empty parentheses indicate that this method takes no arguments.

A method call is called an invocation; in this case, we would say that we are invoking upper on word.
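
Because strings are immutable, invoking a method never changes the original string; it returns a new one. The following sketch shows this, and also shows that calls can be chained. The names shout and tidy are chosen for illustration.

```python
word = 'banana'
shout = word.upper()

print(shout)    # BANANA
print(word)     # banana (unchanged)

# Methods can be chained: each call works on the result of the previous one
tidy = '  Banana  '.strip().lower()
print(tidy)     # banana
```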

methods = [m for m in dir(str) if not m.startswith('_')]
num_str_methods = len(methods)
print(num_str_methods)  # 47
print(methods)
47
['capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
from myst_nb import glue
glue("num_str_methods", num_str_methods)
47

This version of Python has 47 public string methods (the count can vary between versions). The table below collects some of the most commonly used ones.

| Category | Method | Description |
| --- | --- | --- |
| Case | .upper() | All uppercase |
| Case | .lower() | All lowercase |
| Search | .find(x) | Index of first match, -1 if missing |
| Search | .index(x) | Index of first match, raises error if missing |
| Search | .count(x) | Count occurrences |
| Whitespace | .strip() | Remove leading/trailing whitespace |
| Split | .split(x) | Split on delimiter |
| Join | .join(lst) | Join list into string |
| Replace | .replace(a, b) | Replace all occurrences |
| Check | .isspace() | All whitespace |
| Check | .isupper() | All uppercase |
| Check | .islower() | All lowercase |

s = "  Hello, World!  "
words = "the quick brown fox"

# Case
print("--- Case ---")
print(s.upper())
print(s.lower())

# Search
print("\n--- Search ---")
print(words.find("quick"))
print(words.index("fox"))
print(words.count("o"))

# Whitespace
print("\n--- Whitespace ---")
print(repr(s.strip()))

# Split
print("\n--- Split ---")
print(words.split(" "))

# Join
print("\n--- Join ---")
print(", ".join(["apple", "banana", "cherry"]))

# Replace
print("\n--- Replace ---")
print(words.replace("fox", "cat"))

# Check
print("\n--- Check ---")
print("   ".isspace())
print("HELLO".isupper())
print("hello".islower())
--- Case ---
  HELLO, WORLD!  
  hello, world!  

--- Search ---
4
16
2

--- Whitespace ---
'Hello, World!'

--- Split ---
['the', 'quick', 'brown', 'fox']

--- Join ---
apple, banana, cherry

--- Replace ---
the quick brown cat

--- Check ---
True
True
True

8.1.2.1. Case Methods#

Python provides several methods for changing the case of a string. These are useful for normalizing text before comparison or display.

s = 'hello, world!'

print(s.upper())        # 'HELLO, WORLD!'  — all uppercase
print(s.lower())        # 'hello, world!'  — all lowercase
print(s.title())        # 'Hello, World!'  — first letter of each word capitalized
print(s.capitalize())   # 'Hello, world!'  — first letter of string capitalized
print(s.swapcase())     # 'HELLO, WORLD!'  — swap upper and lower
HELLO, WORLD!
hello, world!
Hello, World!
Hello, world!
HELLO, WORLD!

Case methods are often used to make comparisons case-insensitive. For example, at login you might convert both the entered username and the stored one to lowercase before comparing them.

user_input = 'Alice'
username    = 'alice'

print(user_input == username)                    # False
print(user_input.lower() == username.lower())    # True
False
True
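
For text that may contain non-English letters, casefold() is a more aggressive alternative to lower() for case-insensitive matching. A minimal sketch, using a German word as an assumed example:

```python
a = 'Straße'     # German for "street"
b = 'STRASSE'

print(a.lower() == b.lower())          # False: 'straße' vs 'strasse'
print(a.casefold() == b.casefold())    # True: both become 'strasse'
```

For plain ASCII text, lower() and casefold() give the same result.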
### EXERCISE: Case Methods
# Difficulty: Basic
user_input = 'PyThOn'
target = 'python'
# 1. Print user_input in upper, lower, and title case
# 2. Print whether user_input matches target case-insensitively
### Your code starts here:


### Your code ends here.


# Solution
user_input = 'PyThOn'
target = 'python'
print(user_input.upper())
print(user_input.lower())
print(user_input.title())
print(user_input.casefold() == target.casefold())
PYTHON
python
Python
True

8.1.2.2. Searching and Testing#

8.1.2.2.1. Finding a Substring#

find(sub) returns the index of the first occurrence of sub, or -1 if not found. index(sub) works the same way but raises a ValueError if the substring is not found.

s = 'data science and data engineering'

print(s.find('data'))     # 0  — first occurrence
print(s.find('data', 5))  # 17 — search starting at index 5
print(s.find('math'))     # -1 — not found

print(s.rfind('data'))    # 17 — last occurrence
0
17
-1
17
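
The difference between find and index only shows up when the substring is missing. The sketch below contrasts the two on the same string; which one to prefer depends on whether a missing substring is an expected case or an error.

```python
s = 'data science and data engineering'

position = s.find('math')
print(position)              # -1

try:
    s.index('math')          # same search, but a miss raises an exception
except ValueError as err:
    error_text = str(err)
    print('ValueError:', error_text)
```

With find, remember to test the result against -1 before using it as an index, because -1 is itself a valid index (the last character).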

8.1.2.2.2. Counting Occurrences#

count(sub) returns the number of non-overlapping occurrences of a substring.

s = 'banana'
print(s.count('a'))    # 3
print(s.count('an'))   # 2
3
2
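
"Non-overlapping" matters when matches could share characters. In this sketch, the second computation is one possible way to count overlapping matches (not a built-in feature): it checks every start position with startswith.

```python
s = 'aaaa'
print(s.count('aa'))     # 2, not 3

# A sketch of counting overlapping matches by checking every start position
overlapping = sum(1 for i in range(len(s)) if s.startswith('aa', i))
print(overlapping)       # 3
```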

8.1.2.2.3. Starts and Ends With#

startswith(prefix) and endswith(suffix) test whether a string begins or ends with a given substring. Both return True or False.

filename = 'report_2025.csv'

print(filename.startswith('report'))   # True
print(filename.endswith('.csv'))       # True
print(filename.endswith('.xlsx'))      # False
True
True
False
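
Both methods also accept a tuple of candidates and return True if any of them matches. A small sketch with made-up filenames:

```python
filenames = ['report_2025.csv', 'notes.txt', 'scores.tsv']

# Keep only the names that end with one of the listed extensions
tabular = [name for name in filenames if name.endswith(('.csv', '.tsv'))]
print(tabular)    # ['report_2025.csv', 'scores.tsv']
```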

8.1.2.2.4. The in Operator#

The in operator tests whether a substring appears anywhere in a string. It is the most readable way to check for membership.

s = 'machine learning'

print('learning' in s)    # True
print('deep' in s)        # False
print('deep' not in s)    # True
True
False
True
### EXERCISE: Searching and Testing
# Difficulty: Intermediate
sentence = 'data science uses data pipelines'
# 1. Find the index of the first "data"
# 2. Find the index of "data" starting from position 5
# 3. Count how many times "data" appears
# 4. Check if "science" is in sentence
### Your code starts here:


### Your code ends here.


# Solution
sentence = 'data science uses data pipelines'
print(sentence.find('data'))
print(sentence.find('data', 5))
print(sentence.count('data'))
print('science' in sentence)
0
18
2
True

8.1.2.3. Cleaning#

Real-world text data often contains extra whitespace or unwanted characters. Python provides several methods for cleaning strings.

8.1.2.3.1. Stripping Whitespace#

  • strip() removes leading and trailing whitespace.

  • lstrip() (left strip) removes only leading whitespace.

  • rstrip() (right strip) removes only trailing whitespace.

s = '   hello, world!   '

print(repr(s.strip()))    # 'hello, world!'
print(repr(s.lstrip()))   # 'hello, world!   '
print(repr(s.rstrip()))   # '   hello, world!'
'hello, world!'
'hello, world!   '
'   hello, world!'

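You can also pass a string of characters to strip. The argument is treated as a set of characters, and any of them are removed from both ends. For example, s.strip('.') removes leading and trailing periods.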

s = '...hello...'
print(s.strip('.'))    # 'hello'
hello
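
Because the argument is a set of characters rather than a prefix or suffix, stripping can remove more than intended. The sketch below uses a made-up filename to show the pitfall, along with removesuffix(), which is available in Python 3.9 and later.

```python
print('xyxhelloyx'.strip('xy'))         # 'hello': any mix of x and y is removed

filename = 'report.txt'
print(filename.rstrip('.txt'))          # 'repor': the final 't' of 'report' goes too
print(filename.removesuffix('.txt'))    # 'report' (requires Python 3.9+)
```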

8.1.2.3.2. Replacing Substrings#

replace(old, new) returns a new string with all occurrences of old replaced by new. An optional third argument limits the number of replacements.

s = 'I like cats. Cats are great.'

print(s.replace('cats', 'dogs'))        # replaces every 'cats'; 'Cats' stays (case-sensitive)
print(s.replace('cats', 'dogs', 1))     # at most one replacement; same result here

# Useful for removing characters
s2 = 'hello, world!'
print(s2.replace(',', '').replace('!', ''))   # 'hello world'
I like dogs. Cats are great.
I like dogs. Cats are great.
hello world
### EXERCISE: Cleaning Strings
# Difficulty: Intermediate
raw = '...  Hello, Python!  ...'
# 1. Strip leading/trailing dots
# 2. Strip leading/trailing whitespace from the result
# 3. Replace "Python" with "Data Science"
# 4. Print the cleaned string
### Your code starts here:


### Your code ends here.


# Solution
raw = '...  Hello, Python!  ...'
clean = raw.strip('.').strip().replace('Python', 'Data Science')
print(clean)
Hello, Data Science!

8.1.2.4. Splitting and Joining#

8.1.2.4.1. Splitting#

split(sep) breaks a string into a list of substrings at each occurrence of the separator sep. If no separator is given, it splits on runs of whitespace and ignores leading and trailing whitespace, so the result contains no empty strings.

s = 'Python,R,SQL,Julia'
print(s.split(','))           # ['Python', 'R', 'SQL', 'Julia']

s2 = 'one two   three'
print(s2.split())             # ['one', 'two', 'three']

# Split on a specific delimiter, keeping empty strings
s3 = 'a,,b,,c'
print(s3.split(','))          # ['a', '', 'b', '', 'c']

# Limit the number of splits
s4 = '2025-08-26'
print(s4.split('-', 1))       # ['2025', '08-26']
['Python', 'R', 'SQL', 'Julia']
['one', 'two', 'three']
['a', '', 'b', '', 'c']
['2025', '08-26']
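
Calling split() with no argument is not the same as calling split(' '). A minimal sketch of the difference, using the same string with three spaces between the last two words:

```python
s = 'one two   three'

print(s.split())       # ['one', 'two', 'three']
print(s.split(' '))    # ['one', 'two', '', '', 'three']
```

With an explicit separator, every single occurrence is a split point, so consecutive separators produce empty strings.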

8.1.2.4.2. Joining#

join(iterable) is the inverse of split(). It is called on the separator string and concatenates the strings in the iterable into one string, inserting the separator between consecutive elements.

words = ['Python', 'is', 'fun']

print(' '.join(words))     # 'Python is fun'
print('-'.join(words))     # 'Python-is-fun'
print(''.join(words))      # 'Pythonisfun'

# Practical: reassemble a cleaned sentence
sentence = '  too   many   spaces  '
cleaned  = ' '.join(sentence.split())
print(cleaned)             # 'too many spaces'
Python is fun
Python-is-fun
Pythonisfun
too many spaces
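
join only accepts strings. The following sketch, with a made-up list of scores, shows the error raised for a list of integers and one common way around it: converting each item with str() first.

```python
scores = [85, 92, 78]

try:
    ', '.join(scores)                         # the items are ints, not strings
except TypeError as err:
    print('TypeError:', err)

line = ', '.join(str(n) for n in scores)      # convert each item first
print(line)                                   # 85, 92, 78
```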
### EXERCISE: Splitting and Joining
# Difficulty: Intermediate
record = 'alice,bob,charlie'
# 1. Split the record into a list of names
# 2. Join names with " - "
# 3. Print both the list and joined string
### Your code starts here:


### Your code ends here.


# Solution
record = 'alice,bob,charlie'
names = record.split(',')
joined = ' - '.join(names)
print(names)
print(joined)
['alice', 'bob', 'charlie']
alice - bob - charlie

8.1.2.5. String Formatting#

String formatting inserts values into a string template. Python offers three approaches: f-strings (modern, recommended), str.format(), and % formatting (legacy).

8.1.2.5.1. f-Strings#

An f-string is prefixed with f and uses {} to embed expressions directly inside the string. F-strings are the most readable and most commonly used approach.

name  = 'Alice'
score = 95.678

print(f'Student: {name}')
print(f'Score: {score:.2f}')        # 2 decimal places
print(f'Score: {score:>10.2f}')     # right-aligned, width 10
print(f'{name.upper()}')            # method calls work inside {} (uppercase)
print(f'Double score: {score * 2}') # expressions work inside {}
Student: Alice
Score: 95.68
Score:      95.68
ALICE
Double score: 191.356

8.1.2.5.2. Format Specification Mini-Language#

Inside {}, a colon : introduces a format spec that controls how the value is displayed.

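| Spec | Meaning | Example |
| --- | --- | --- |
| .2f | fixed-point, 2 decimal places | 3.14 |
| d | integer | 42 |
| .2e | scientific notation, 2 decimal places | 3.14e+00 |
| .2% | percentage, 2 decimal places | 75.00% |
| >10 | right-align, width 10 | '      3.14' |
| <10 | left-align, width 10 | '3.14      ' |
| ^10 | center, width 10 | '   3.14   ' |
| , | thousands separator | 1,000,000 |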

pi = 3.14159265
n  = 1000000
r  = 0.756

print(f'{pi:.4f}')      # '3.1416'
print(f'{pi:e}')        # '3.141593e+00'
print(f'{n:,}')         # '1,000,000'
print(f'{r:.1%}')       # '75.6%'
print(f'{pi:^10.2f}')   # '   3.14   '
3.1416
3.141593e+00
1,000,000
75.6%
   3.14   
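
Alignment and precision specs are often combined to print tidy columns. The sketch below uses made-up rows and arbitrary column widths (10 and 8) to illustrate the idea.

```python
rows = [('Alice', 95.678), ('Bob', 88.5), ('Charlie', 72.25)]   # made-up data

# Name left-aligned in 10 columns, score right-aligned in 8 with 2 decimals
lines = [f'{name:<10}{score:>8.2f}' for name, score in rows]
for line in lines:
    print(line)
```

Every line comes out 18 characters wide, so the scores line up in a column regardless of the name length.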

8.1.2.5.3. str.format()#

str.format() is an older but still widely used formatting approach. Values are passed as arguments and inserted into {} placeholders.

name  = 'Bob'
grade = 88.5

print('Name: {}, Grade: {:.1f}'.format(name, grade))
print('Name: {0}, Grade: {1:.1f}'.format(name, grade))   # positional
print('Name: {n}, Grade: {g:.1f}'.format(n=name, g=grade))  # keyword
Name: Bob, Grade: 88.5
Name: Bob, Grade: 88.5
Name: Bob, Grade: 88.5
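
For completeness, here is a minimal sketch of the legacy % formatting mentioned at the start of this section, producing the same output as the str.format() calls. It is mainly worth recognizing when reading older code.

```python
name  = 'Bob'
grade = 88.5

# %s inserts a string, %.1f a float with 1 decimal place
old_style = 'Name: %s, Grade: %.1f' % (name, grade)
print(old_style)    # Name: Bob, Grade: 88.5
```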
### EXERCISE: String Formatting
# Difficulty: Intermediate
name = 'Alice'
score = 92.456
# 1. Print name and score with score rounded to 1 decimal place using f-string
# 2. Print score as a percentage with 1 decimal place (assume score/100)
# 3. Print name right-aligned in width 10
### Your code starts here:


### Your code ends here.


# Solution
name = 'Alice'
score = 92.456
print(f'Name: {name}, Score: {score:.1f}')
print(f'Percent: {score/100:.1%}')
print(f'{name:>10}')
Name: Alice, Score: 92.5
Percent: 92.5%
     Alice

8.1.2.6. Type-Checking Methods#

Python strings have a family of is*() methods that test the character composition of a string. Each returns True or False.

| Method | Returns True if… |
| --- | --- |
| isdigit() | all characters are digits (0-9, plus other Unicode digit characters) |
| isalpha() | all characters are letters |
| isalnum() | all characters are letters or digits |
| isspace() | all characters are whitespace |
| isupper() | all cased characters are uppercase |
| islower() | all cased characters are lowercase |
| istitle() | string is in title case |

Each of these methods returns False for an empty string.

print('12345'.isdigit())     # True
print('abc'.isalpha())       # True
print('abc123'.isalnum())    # True
print('   '.isspace())       # True
print('HELLO'.isupper())     # True
print('hello'.islower())     # True
print('Hello World'.istitle()) # True

# Mixed cases return False
print('abc123!'.isalnum())   # False — '!' is not alphanumeric
print(''.isdigit())          # False — empty string
True
True
True
True
True
True
True
False
False

These methods are useful for input validation:

user_input = '2025'

if user_input.isdigit():
    year = int(user_input)
    print(f'Valid year: {year}')
else:
    print('Please enter a number.')
Valid year: 2025
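
Note that isdigit() rejects strings such as '-5' and '3.14', because the sign and the decimal point are not digits. When the goal is simply "can this be converted to an integer?", one alternative is to attempt the conversion and catch the error. The helper parse_int below is a hypothetical name used only for this sketch.

```python
def parse_int(text):
    """Return text converted to int, or None if it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None

print('-5'.isdigit())      # False, although -5 is a valid integer
print(parse_int('-5'))     # -5
print(parse_int('3.14'))   # None
print(parse_int('2025'))   # 2025
```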
### EXERCISE: Type-Checking Methods
# Difficulty: Intermediate
samples = ['123', 'abc', 'abc123', '   ', 'Hello World']
# 1. For each sample, print isdigit, isalpha, and isalnum results
# 2. For "Hello World", print istitle result
### Your code starts here:


### Your code ends here.


# Solution
samples = ['123', 'abc', 'abc123', '   ', 'Hello World']
for s in samples:
    print(s, s.isdigit(), s.isalpha(), s.isalnum())
print('Hello World'.istitle())
123 True False True
abc False True True
abc123 False False True
    False False False
Hello World False False False
True

8.1.2.7. Methods Reference#

Python provides a number of functions and methods for string operations. The most commonly used are:

| Operation | Syntax | Description |
| --- | --- | --- |
| Length | len(s) | Number of characters |
| Indexing | s[i] | Character at position i |
| Slicing | s[start:stop:step] | Extract substring |
| Concatenation | s1 + s2 | Join two strings |
| Repetition | s * n | Repeat string n times |
| Uppercase | s.upper() | All uppercase |
| Lowercase | s.lower() | All lowercase |
| Title case | s.title() | Capitalize each word |
| Find | s.find(sub) | Index of first match, or -1 |
| Count | s.count(sub) | Number of occurrences |
| Membership | sub in s | Test if substring present |
| Strip | s.strip() | Remove leading/trailing whitespace |
| Replace | s.replace(old, new) | Substitute substring |
| Split | s.split(sep) | String → list |
| Join | sep.join(list) | List → string |
| f-string | f'{var:.2f}' | Formatted string literal |
| Type check | s.isdigit(), etc. | Test character composition |

### EXERCISE: Methods Reference Practice
# Difficulty: Intermediate
s = '  banana split  '
# Use at least 4 methods from this section to:
# 1. Remove outer spaces
# 2. Replace "split" with "bread"
# 3. Convert to uppercase
# 4. Check whether "BANANA" is in the final string
### Your code starts here:


### Your code ends here.


# Solution
s = '  banana split  '
t = s.strip().replace('split', 'bread').upper()
print(t)
print('BANANA' in t)
BANANA BREAD
True

8.1.3. String Comparison#

Observe the following operations.

### check out the comparisons here:

print("A" < 'a')
print("a" < 'banana')
print('Pineapple' > 'pineapple')
print('Pineapple' > 'banana')
True
True
False
False

The relational operators work on strings, as seen above. Python compares strings character by character using each character's Unicode code point; for the characters in the table below, the code points are the same as the ASCII codes (this table is easier to read than the one presented in an earlier chapter). Each character has a decimal number that string comparison uses. Note that:

  • 0 is 48

  • A is 65

  • a is 97

Dec Chr Dec Chr Dec Chr Dec Chr Dec Chr
0 NUL 26 SUB 52 4 78 N 104 h
1 SOH 27 ESC 53 5 79 O 105 i
2 STX 28 FS 54 6 80 P 106 j
3 ETX 29 GS 55 7 81 Q 107 k
4 EOT 30 RS 56 8 82 R 108 l
5 ENQ 31 US 57 9 83 S 109 m
6 ACK 32 SP 58 : 84 T 110 n
7 BEL 33 ! 59 ; 85 U 111 o
8 BS 34 " 60 < 86 V 112 p
9 HT 35 # 61 = 87 W 113 q
10 LF 36 $ 62 > 88 X 114 r
11 VT 37 % 63 ? 89 Y 115 s
12 FF 38 & 64 @ 90 Z 116 t
13 CR 39 ' 65 A 91 [ 117 u
14 SO 40 ( 66 B 92 \ 118 v
15 SI 41 ) 67 C 93 ] 119 w
16 DLE 42 * 68 D 94 ^ 120 x
17 DC1 43 + 69 E 95 _ 121 y
18 DC2 44 , 70 F 96 ` 122 z
19 DC3 45 - 71 G 97 a 123 {
20 DC4 46 . 72 H 98 b 124 |
21 NAK 47 / 73 I 99 c 125 }
22 SYN 48 0 74 J 100 d 126 ~
23 ETB 49 1 75 K 101 e 127 DEL
24 CAN 50 2 76 L 102 f
25 EM 51 3 77 M 103 g
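
The numbers in the table can be checked with the built-in functions ord(), which returns the code of a character, and chr(), which goes the other way. A minimal sketch:

```python
print(ord('0'), ord('A'), ord('a'))    # 48 65 97
print(chr(98))                         # b

# 'A' < 'a' is True because 65 < 97
print(ord('A') < ord('a'))             # True
```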

The == operator tests whether two strings contain exactly the same characters:

word = 'banana'

if word == 'banana':
    print('All right, banana.')
All right, banana.

Other relational operations are useful for putting words in alphabetical order:

def compare_word(word):
    if word < 'banana':
        print(word, 'comes before banana.')
    elif word > 'banana':
        print(word, 'comes after banana.')
    else:
        print('All right, banana.')
compare_word('apple')
apple comes before banana.
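
The comparison proceeds one character at a time and stops at the first difference. The sketch below uses a few assumed examples, including one that shows why digit strings do not compare like numbers.

```python
print('apple' < 'apricot')   # True: first difference is 'p' < 'r' at index 2
print('app' < 'apple')       # True: a prefix sorts before the longer string
print('10' < '9')            # True: compared as text, '1' < '9'
```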

Python does not handle uppercase and lowercase letters the same way people do. All the uppercase letters come before all the lowercase letters, so:

compare_word('Pineapple')
Pineapple comes before banana.

This can be problematic sometimes. To solve this problem, we can convert strings to a standard format, such as all lowercase or all uppercase, before performing the comparison.

compare_word('Pineapple'.lower())
pineapple comes after banana.
### EXERCISE: String Comparison
# Difficulty: Challenge
words = ['Apple', 'apple', 'banana', 'Banana', 'cherry']
# 1. Print the list sorted as-is
# 2. Print the list sorted case-insensitively
# 3. Build and print a list of tuples: (word, word.casefold())
# 4. Print whether "Apple" and "apple" are equal under casefold
### Your code starts here:


### Your code ends here.


# Solution
words = ['Apple', 'apple', 'banana', 'Banana', 'cherry']
print(sorted(words))
print(sorted(words, key=str.casefold))
pairs = [(w, w.casefold()) for w in words]
print(pairs)
print('Apple'.casefold() == 'apple'.casefold())
['Apple', 'Banana', 'apple', 'banana', 'cherry']
['Apple', 'apple', 'banana', 'Banana', 'cherry']
[('Apple', 'apple'), ('apple', 'apple'), ('banana', 'banana'), ('Banana', 'banana'), ('cherry', 'cherry')]
True

8.1.4. Looping and Sorting#

8.1.4.1. Looping Through String Lists#

You can use a for statement to loop through the elements of a list.

fruits = ['apple', 'banana', 'cherry']

for fruit in fruits:
    print(fruit)
apple
banana
cherry

.split() returns a list of words, so we can use a for loop to go through them.

s = 'We are programmed to receive'  ### lyric from the Eagles' 1976 hit song "Hotel California".

for word in s.split():
    print(word)
We
are
programmed
to
receive

Not that it’s useful, but a for loop over an empty list never runs the indented statements.

for x in []:
    print('This never happens.')
### EXERCISE: Looping Through String Lists
# Difficulty: Basic
words = ['apple', 'Banana', 'cherry', 'Date', 'elderberry']
# 1. Loop through the words and print each word in lowercase
# 2. Create a new list containing only words that start with a vowel (a, e, i, o, u)
### Your code starts here:



### Your code ends here.


# Solution
words = ['apple', 'Banana', 'cherry', 'Date', 'elderberry']

print("Words in lowercase:")
for word in words:
    print(word.lower())

vowels = ['a', 'e', 'i', 'o', 'u']
starts_with_vowel = []
for word in words:
    if word[0].lower() in vowels:
        starts_with_vowel.append(word)

print(f"\nWords starting with a vowel: {starts_with_vowel}")
Words in lowercase:
apple
banana
cherry
date
elderberry

Words starting with a vowel: ['apple', 'elderberry']

8.1.4.2. Sorting String Lists#

Python provides a built-in function called sorted that returns a new sorted list, and lists have a .sort() method that sorts the list in place. This section uses two tools:

  • sorted()

  • .join()

scramble = ['c', 'a', 'b']
sorted(scramble)
['a', 'b', 'c']

The original list is unchanged.

scramble
['c', 'a', 'b']
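
By contrast, the list method .sort() modifies the list itself and returns None. A minimal sketch using the same list:

```python
scramble = ['c', 'a', 'b']

result = scramble.sort()   # sorts the list in place
print(result)              # None
print(scramble)            # ['a', 'b', 'c']
```

A common mistake is to write scramble = scramble.sort(), which replaces the list with None.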

sorted works with any iterable, not just lists. So we can sort the letters in a string like this.

sorted('letters')
['e', 'e', 'l', 'r', 's', 't', 't']

The result is a list. To convert the list to a string, we can use join.

letters = ''.join(sorted('letters'))

With an empty string as the delimiter, the elements of the list are joined with nothing between them.
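
Sorting the letters of a word gives a simple way to test whether two words are anagrams. The function is_anagram below is a hypothetical helper written for this sketch, and it assumes both words use the same case.

```python
def is_anagram(first, second):
    """Return True if the two words use exactly the same letters."""
    return sorted(first) == sorted(second)

print(''.join(sorted('letters')))      # eelrstt
print(is_anagram('listen', 'silent'))  # True
print(is_anagram('listen', 'lists'))   # False
```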

Lists have a .sort() method that sorts in place. Strings do not have this method because they are immutable.

%%expect AttributeError

letters.sort()
AttributeError: 'str' object has no attribute 'sort'
### EXERCISE: Sorting Lists
# Difficulty: Intermediate
scores = [85, 92, 78, 90, 88]
names = ['Charlie', 'Alice', 'Bob']
# 1. Sort the scores in descending order (highest first)
# 2. Sort the names alphabetically and join them with commas
### Your code starts here:



### Your code ends here.


# Solution
scores = [85, 92, 78, 90, 88]
names = ['Charlie', 'Alice', 'Bob']

sorted_scores = sorted(scores, reverse=True)
sorted_names = sorted(names)
names_joined = ", ".join(sorted_names)

print(f"Scores (descending): {sorted_scores}")
print(f"Names (alphabetically): {names_joined}")
Scores (descending): [92, 90, 88, 85, 78]
Names (alphabetically): Alice, Bob, Charlie

8.1.5. Docstrings#

A docstring is a string at the beginning of a function that explains the interface (“doc” is short for “documentation”). Here is an example:

def polyline(n, length, angle):
    """Draws line segments with the given length and angle between them.
    
    n: integer number of line segments
    length: length of the line segments
    angle: angle between segments (in degrees)
    """    
    for i in range(n):
        forward(length)
        left(angle)

By convention, docstrings are triple-quoted strings, also known as multiline strings because the triple quotes allow the string to span more than one line.

A docstring should:

  • Explain concisely what the function does, without getting into the details of how it works,

  • Explain what effect each parameter has on the behavior of the function, and

  • Indicate what type each parameter should be, if it is not obvious.

Writing this kind of documentation is an important part of interface design. A well-designed interface should be simple to explain; if you have a hard time explaining one of your functions, maybe the interface could be improved.
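A docstring is not just a comment: Python stores it on the function object, so tools can display it. The sketch below uses a hypothetical function, repeat, written only for this illustration, and reads its docstring back with the __doc__ attribute and inspect.getdoc.

```python
import inspect

def repeat(word, n):
    """Return word repeated n times, separated by spaces.

    word: string to repeat
    n: non-negative integer number of repetitions
    """
    return ' '.join([word] * n)

# The docstring is stored on the function object as __doc__
print(repeat.__doc__)

# inspect.getdoc returns the same text with the indentation cleaned up
doc = inspect.getdoc(repeat)
print(doc)
```

In a notebook, help(repeat) displays the same text along with the function signature.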

### EXERCISE: Writing a Docstring
# Difficulty: Challenge
# Write a function called area_rectangle(width, height).
# 1. Add type hints for parameters and return value.
# 2. Include a docstring describing parameters, return value, and one raised error.
# 3. Raise ValueError if width or height is negative.
# 4. Call the function with (3, 4) and print the result.
### Your code starts here:


### Your code ends here.

# Solution
def area_rectangle(width: float, height: float) -> float:
    """Return the area of a rectangle.

    width: non-negative numeric width of the rectangle
    height: non-negative numeric height of the rectangle
    returns: numeric area
    raises: ValueError if width or height is negative
    """
    if width < 0 or height < 0:
        raise ValueError('width and height must be non-negative')
    return width * height

print(area_rectangle(3, 4))
12

8.1.6. Application: Word List#

Let’s apply what we’ve learned to a real-world task: building and searching a word list.

In the previous chapter, we read the file words.txt and searched for words with certain properties, like using the letter e. But we read the entire file many times, which is not efficient. It is better to read the file once and put the words in a list. The following loop shows how.

from pathlib import Path
words_file = project_root / 'data' / 'words.txt'
if not words_file.exists():
    download('https://raw.githubusercontent.com/AllenDowney/ThinkPython/v3/words.txt', words_file)
word_list = []

# The with statement closes the file when the loop is done
with open(words_file, encoding='utf-8') as f:
    for line in f:
        word = line.strip()
        word_list.append(word)
    
len(word_list)
113783
word_list[:10]
['aa',
 'aah',
 'aahed',
 'aahing',
 'aahs',
 'aal',
 'aalii',
 'aaliis',
 'aals',
 'aardvark']

Before the loop, word_list is initialized with an empty list. Each time through the loop, the append method adds a word to the end. When the loop is done, there are more than 113,000 words in the list.
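To see why the loop calls strip, here is a minimal sketch that needs no data on disk. It assumes io.StringIO as a stand-in for the open file, and the three words are a made-up sample rather than the contents of words.txt.

```python
import io

# io.StringIO behaves like an open text file for the purposes of this loop
fake_file = io.StringIO('aa\naah\naahed\n')

raw_lines = []
words = []
for line in fake_file:
    raw_lines.append(line)        # each line still ends with a newline
    words.append(line.strip())    # strip removes the surrounding whitespace

print(raw_lines)
print(words)
```

Without strip, every word in the list would end with a newline character, and membership tests such as 'aa' in words would fail.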

Another way to do the same thing is to use the read_text method of the Path object to read the entire file into a string.

string = words_file.read_text(encoding='utf-8')
len(string)
1016511

The result is a single string with more than a million characters. We can use the split method to split it into a list of words.

word_list = string.split()
len(word_list)
113783
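Calling split with no argument works here because it treats any run of whitespace, including newlines, as a single separator. The following sketch uses a short made-up string, text, to compare it with splitlines, which breaks only at line boundaries.

```python
text = 'aa\naah\n  aahed \n\naahing\n'

# With no argument, split treats any run of whitespace as one separator
by_whitespace = text.split()
print(by_whitespace)

# splitlines breaks only at line boundaries, so spaces and empty lines survive
by_line = text.splitlines()
print(by_line)
```

For a file with one word per line, split() gives clean words even if some lines have stray spaces or are empty.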

Evaluating word_list in a Jupyter notebook displays the whole list, which is very long, so let's use a for loop to look at the first 5 elements:

for i in range(5):
    print(word_list[i])
aa
aah
aahed
aahing
aahs

Or just use slicing.

word_list[:5]
['aa', 'aah', 'aahed', 'aahing', 'aahs']

It is also worth confirming the type of the result:

print(type(word_list))
<class 'list'>

Now, to check whether a string appears in the list, we can use the in operator. For example, 'demotic' is in the list.

'demotic' in word_list
True

But 'contrafibularities' is not, and neither is 'supercalifragilisticexpialidocious'.

'contrafibularities' in word_list
False
"supercalifragilisticexpialidocious" in word_list
False
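One point worth illustrating: on a list, in looks for an element equal to the whole string, whereas on a string it looks for a substring. The sketch below uses a small made-up list, small_list, rather than the full word list.

```python
small_list = ['demotic', 'demote', 'emotion']

# On a list, in looks for an element equal to the whole string
whole_word = 'demotic' in small_list
partial_word = 'demo' in small_list
print(whole_word, partial_word)

# On a string, in looks for a substring
in_string = 'demo' in 'demotic'
print(in_string)

# To find list elements that contain a substring, test each element
matches = [w for w in small_list if 'demo' in w]
print(matches)
```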
### EXERCISE: Word List Application
# Difficulty: Challenge
# Using word_list from this section:
# 1. Print the first 3 words
# 2. Count how many words start with "a"
# 3. Print the average word length (rounded to 2 decimals)
# 4. Find and print the longest word among the first 5000 words
### Your code starts here:


### Your code ends here.

# Solution
print(word_list[:3])
count_a = sum(1 for w in word_list if w.startswith('a'))
print(count_a)
avg_len = sum(len(w) for w in word_list) / len(word_list)
print(round(avg_len, 2))
longest_5k = max(word_list[:5000], key=len)
print(longest_5k)
['aa', 'aah', 'aahed']
6557
7.93
anticonservationist