Formatting, indexing and slicing strings.¶
Placing data into strings¶
Often we want to display results of our work at several points in a program: in the middle of calculations, to check the code's progress and/or to see that intermediate results are reasonable; and at the end, to display the final results. Additionally, we might want to make labels for plotting, or save data to an output file for further use. To do this, we have to fully understand what we are outputting (the type, length/shape of the object, etc.). Then decisions have to be made: how many decimal places to display, how to include text describing what each number is, how to make aligned columns of results, etc. This all comes under the category of string formatting: inserting the quantities of interest into strings and specifying what display properties they should have.
Up until this point, we have been printing strings and other types very simply to display data (as described here), separating each item with commas:
print("Finished.")
print("x =", x)
print("Avec =", A, " and Bvec =", B)
etc. We now look at more interesting ways to insert data into strings and format the results.
There are different methods and styles for performing this kind of operation in Python, but we will primarily use the modern "string format method".
The help file for strings contains a .format() method with both positional and keyword arguments. Therefore, for a given string S, we can apply .format() as follows to obtain a new string:
format(...)
    S.format(*args, **kwargs) -> str

    Return a formatted version of S, using substitutions from args and kwargs.
    The substitutions are identified by braces ('{' and '}').
The basic approach is to write a string with place-holders for values, specified using curly brackets { }, and then provide the values themselves as arguments. The values can be either variables or expressions to be evaluated. For example:
import numpy as np

x = 5
print("x = {}".format(x))
y = -15.5
print("if y = {}, then 5-y = {}".format(y, 5-y))
Avec = np.arange(3)
Bvec = np.ones(2, dtype=bool)
print("Avec = {} and Bvec = {}".format(Avec, Bvec))
produces:
x = 5
if y = -15.5, then 5-y = 20.5
Avec = [0 1 2] and Bvec = [ True  True]
In general, if we have N values to insert, we will reserve N spaces in the string with curly brackets { }.
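The help text above mentions keyword arguments as well as positional ones. As a minimal sketch of how they might be used, the place-holders below are given names that are matched against keyword arguments; the names label, value and total are arbitrary choices made for this illustration.

```python
# Keyword arguments let each place-holder be referred to by name.
# "label", "value" and "total" are illustrative names, not special keywords.
x = 5
named = "{label} = {value}".format(label="x", value=x)
print(named)

# Positional and keyword arguments can be mixed in a single call.
mixed = "{} + {} = {total}".format(2, 3, total=2 + 3)
print(mixed)
```

Named place-holders can make a long format string easier to read, since each value is identified where it is used rather than by its position in the argument list.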
Formatting data in strings¶
Above, we have specified how to place values directly into a string. We now discuss how to format them using various contents of the curly brackets { }, controlling things like spacing, alignment, number of decimal places and even ordering.
Ordering of variables¶
By default, the values inserted into the string are placed by order of position. If we want to, it is possible to specify indices of the argument positions inside the curly brackets, in order to change around the order of placement in the string or even to repeat values. Consider:
xval, yval = 45.80000, -99
print("first = {0}, last = {1}".format(xval, yval))
print("first = {1}, last = {0}".format(xval, yval))
print("first = {0}, again = {0}, more (??) = {0}, last = {1}".format(xval, yval))
which produces
first = 45.8, last = -99
first = -99, last = 45.8
first = 45.8, again = 45.8, more (??) = 45.8, last = -99
Notice how the order is specified in each case, and the resulting output. We can also see that even though xval was written with 5 decimal places, the Python interpreter has displayed only one. The next section shows how to control that.
Control characters¶
We can control several aspects of spacing and decimal values using control characters. These are also specified in the curly brackets, but follow a colon (:). Consider:
import numpy as np
print("PI is approx: {}".format(np.pi))
print("PI is approx: {:.3f}".format(np.pi))
print("PI is approx: {:.7f}".format(np.pi))
print("PI is approx: {:.25f}".format(np.pi))
which produces
PI is approx: 3.141592653589793
PI is approx: 3.142
PI is approx: 3.1415927
PI is approx: 3.1415926535897931159979635
Thus, the :f specifies that the value is to be treated as a float, and one can also specify the number of decimal places, such as 7 with :0.7f or :.7f. Note that the output is rounded to that number of places, not just truncated. As further examples, consider:
xval, yval = 45.80000, -99
print("first = {0}, last = {1}".format(xval, yval))
print("first = {0:f}, last = {1:f}".format(xval, yval))
print("first = {0:0.8f}, last = {1:f}".format(xval, yval))
print("first = {0:0.8f}, last = {1:0.8f}".format(xval, yval))
print("first = {0:e}, last = {1:e}".format(xval, yval))
print("first = {0:0.8e}, last = {1:0.8e}".format(xval, yval))
which produces
first = 45.8, last = -99
first = 45.800000, last = -99.000000
first = 45.80000000, last = -99.000000
first = 45.80000000, last = -99.00000000
first = 4.580000e+01, last = -9.900000e+01
first = 4.58000000e+01, last = -9.90000000e+01
Here the argument index (0 or 1) comes before the colon and the control characters come after it. Without a specified number of decimal places, :f displays 6 of them; with :0.8f or :.8f it displays 8. Note that the integer yval is also displayed as a float when :f is used.
The :e specifies exponential (scientific) notation, and also takes an argument for the number of decimal places to include.
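The number to the left of the decimal point in the control characters specifies the minimum total width of the displayed value, counting the sign, the digits and the decimal point itself. Shorter values are padded with spaces on the left. Because the number of decimal places is fixed, one can use this to align numbers at the decimal point. For example, consider the two outputs in this case with/without using this: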
C = np.array([-18.5, 300.1234, 0.1, 99.9999999])
N = len(C)
print("Without 'left' spacing:")
for i in range(N):
    print("val [{0}] --> {1}".format(i, C[i]))
print("\nWith 'left' spacing")
for i in range(N):
    print("val [{0}] --> {1:15.8f}".format(i, C[i]))
which produces
Without 'left' spacing:
val [0] --> -18.5
val [1] --> 300.1234
val [2] --> 0.1
val [3] --> 99.9999999

With 'left' spacing
val [0] -->    -18.50000000
val [1] -->    300.12340000
val [2] -->      0.10000000
val [3] -->     99.99999990
Control character    Description
f                    floating point number
e                    scientific notation
d                    integer
s                    string
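The control characters d and s listed above are not demonstrated in the text, so here is a minimal sketch of how they might be combined with a field width to produce aligned columns. The data values are made up purely for illustration.

```python
# Hypothetical sample data, used only to illustrate the control characters.
names = ["alpha", "beta", "gamma"]
counts = [3, 12, 150]
means = [0.5, 12.25, 3.14159]

rows = []
for name, count, mean in zip(names, counts, means):
    # {:8s}   -> string in a field of width 8 (strings are left-aligned by default)
    # {:5d}   -> integer in a field of width 5 (numbers are right-aligned by default)
    # {:9.3f} -> float with 3 decimal places in a field of width 9
    rows.append("{:8s}{:5d}{:9.3f}".format(name, count, mean))

for row in rows:
    print(row)
```

Every row has the same total length (8 + 5 + 9 = 22 characters), which is what makes the columns line up when printed.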
Whitespace and escape characters¶
Spacing can be controlled in several ways. The following are all examples of white space:
print("Whitespace example with all spaces inserted")
print("Whitespace example with 2 \t\t tabs inserted")
print("Whitespace example with a \n newline char inserted")
which produces
Whitespace example with all spaces inserted
Whitespace example with 2                tabs inserted
Whitespace example with a 
 newline char inserted
Note that \t and \n are each treated as a single character. You can see this by checking the length of a string:
print(len("abc d"))
print(len("abc\td"))
print(len("abc\nd"))
which is 5 in each case. The backslash \ in this (and most) contexts is an escape character that alters the typical interpretation of the character following it. Thus, abc\td has a different interpretation than abctd; we say that \t is an escape sequence (typically just the escape character and the character following it).
Sometimes the escape character \ is used to make a "normal" character signify something else (such as \t), and sometimes it is used to escape the behavior of a "special" character. As an example of the latter, consider what the following prints:
print("The backslash looks like: \")
It actually leads to a syntax error, because Python interprets the backslash as escaping whatever follows it. The closing quotation mark " is therefore escaped, and no longer closes the string. One can use the escape character itself to escape the escape character's escaping behavior:
print("The backslash looks like: \\")
Escapade successful.
Whitespace character    Description
' '                     space
'\t'                    tab character
'\n'                    newline character
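One way to see which escape sequences a string contains is to display it with the built-in repr() function, which shows the string as it would be typed in source code. The following is a small sketch of this idea; the variable names are arbitrary.

```python
s = "abc\td"

print(s)        # the tab is rendered as whitespace
print(repr(s))  # the tab is shown as the escape sequence \t, inside quotes
print(len(s))   # the escape sequence still counts as one character

# A doubled backslash in source code produces a single backslash character.
backslash = "\\"
print(backslash)
print(len(backslash))
```

This can be handy when a string read from a file does not print as expected, since repr() makes tabs and newlines visible.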
Indexing and slicing strings¶
As we have seen, a string is a sequence made up of one or more characters which can be letters, numbers, symbols, and white spaces. In this section we discuss how to access individual characters in a string and how we can extract a substring from a given string.
String indexing¶
Just like for arrays, string characters can be indexed and each character corresponds to an index number starting from 0. For the string Next Einstein, the possible index numbers are 0 for the first letter N through to 12 for the last letter n.
Character    N    e    x    t    ␣    E    i    n    s    t    e    i    n
Index        0    1    2    3    4    5    6    7    8    9   10   11   12
(␣ marks the space character)
This particular string has 13 characters (including the white space character), which is the length of the string. The len() function can be used to determine the length of a string:
len('Next Einstein')
The answer is 13.
Given a string S, the command S[i] will return the character at index position i. For instance:
print('Next Einstein'[0])
print('Next Einstein'[9])
print('Next Einstein'[12])
will return:
N
t
n
Characters can also be accessed by negative index starting from -1 as follows.
Character    N    e    x    t    ␣    E    i    n    s    t    e    i    n
Index      -13  -12  -11  -10   -9   -8   -7   -6   -5   -4   -3   -2   -1
(␣ marks the space character)
By using a negative index, we can extract the last character of a string S with the command S[-1], which is handier than the command S[len(S) - 1], which does the same job.
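The relationship between negative and non-negative indices can be checked directly. The sketch below assumes only the example string used in this section; the variable names are chosen for illustration.

```python
S = "Next Einstein"
n = len(S)

last_negative = S[-1]     # last character via a negative index
last_positive = S[n - 1]  # the same character via a non-negative index
print(last_negative, last_positive)

# More generally, S[-k] and S[n - k] refer to the same character for k = 1, ..., n.
all_match = all(S[-k] == S[n - k] for k in range(1, n + 1))
print(all_match)
```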
String slicing¶
Sometimes, we want to extract a substring or a range of characters from a string S. For instance, extracting the first 4 characters of the string Next Einstein will return Next:
Character    N    e    x    t    ␣    E    i    n    s    t    e    i    n
Index        0    1    2    3
and extracting every second character, starting at the fourth character (index 3) and stopping before the eleventh (index 10), will yield tEnt:
Character    N    e    x    t    ␣    E    i    n    s    t    e    i    n
Index                       3         5         7         9
Translating into Python commands, we get:
'Next Einstein'[:4]
'Next Einstein'[3:10:2]
which return Next and tEnt, respectively.
In general the Python string slicing syntax is given by:
string_to_slice[start_pos:end_pos:step]
The slicing begins at the start_pos index (included) and stops at the end_pos index (excluded). The step parameter specifies the stride to take from the start to the end index. If the step is not specified, the default step of 1 is applied. Hence the command:
string_to_slice[start_pos:end_pos]
will return the substring between the start_pos index (included) and the end_pos index (excluded). For instance:
'Next Einstein'[6:11]
will yield inste
If the start_pos index is not specified, the default start_pos index 0 is applied. For instance:
'Next Einstein'[ :10:2]
will yield Nx is. That is, it takes every second character (step = 2) from the beginning of the string (the default start_pos index is 0) up to end_pos 10 (10 excluded).
On the other hand, if the end_pos index is not specified, the default is the end of the string. For instance:
'Next Einstein'[5 : : 3]
will yield Esi. That is, it takes every third character (step = 3) starting at start_pos index 5 (included) through to the end of the string (the default end_pos is the end of the string).
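One way to think about a slice with a positive step is that it collects the characters at the indices range(start_pos, end_pos, step). The sketch below makes that explicit with a hypothetical helper, slice_by_range, which is written only for this illustration and is meant for non-negative indices within the string.

```python
S = "Next Einstein"

def slice_by_range(text, start, end, step=1):
    """Rebuild text[start:end:step] one index at a time.

    Illustrative helper: assumes a positive step and indices within the string.
    """
    return "".join(text[i] for i in range(start, end, step))

# Each built-in slice from the text is paired with its index-by-index equivalent.
pairs = [
    (S[3:10:2], slice_by_range(S, 3, 10, 2)),
    (S[6:11], slice_by_range(S, 6, 11)),
    (S[:10:2], slice_by_range(S, 0, 10, 2)),      # omitted start_pos -> 0
    (S[5::3], slice_by_range(S, 5, len(S), 3)),   # omitted end_pos -> len(S)
]
for built_in, rebuilt in pairs:
    print(repr(built_in), repr(rebuilt))
```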
Exercise: What will be the output of the following commands?
'Next Einstein'[ : ]
'Next Einstein'[ : : ]
Try and explain your answer.
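A negative step means that the string is traversed from right to left. When start_pos and end_pos are omitted, the slice then starts at the end of the string and runs back to the beginning. For instance, we can reverse a string using slicing by providing the step value -1: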
'Next Einstein'[ : : -1 ]
The output is nietsniE txeN
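A negative step can also be combined with explicit start and end positions, in which case start_pos should lie to the right of end_pos. The following sketch shows a few such slices on the example string; the variable names are arbitrary.

```python
S = "Next Einstein"

reversed_S = S[::-1]          # whole string, right to left
print(reversed_S)

# Start at the last character and stop before index -5:
# this visits indices 12, 11, 10, 9.
last_four_backwards = S[-1:-5:-1]
print(last_four_backwards)

# Start at index 10 and stop before index 2, moving two places at a time:
# this visits indices 10, 8, 6, 4 (index 4 is the space character).
every_other_backwards = S[10:2:-2]
print(repr(every_other_backwards))
```

As with a positive step, the start_pos index is included and the end_pos index is excluded.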
Practice¶
Write a program that inputs an email address and returns the username and the domain name.
What is the output of the following lines of code:
print('Days of the week'[4:12:3])
print('Days of the week'[9])
print('Days of the week'[-7])
print('I love python programming'[:8])
print('I love python programming'[4:])
print('you love python programming'[::3])
print('You love python programming'[-8:-2])
print('I love python programming'[:-5])
Give the command to extract the word python from the string I love python programming.
Give the command to extract the substring I love python from the string I love python programming.
Give the command to extract every fifth character from the string I love python programming, starting at the third character.