:orphan:
:tocdepth: 2

.. _format_str:

Formatting, indexing and slicing strings.
===================================================

.. contents::
   :local:

.. highlight:: python

Placing data into strings
---------------------------------------------------------------------------

Often we want to display results of our work at several points in a
program: in the middle of calculations, to check the code's progress
and to confirm that intermediate results look reasonable; and at the
end, to display the final results. Additionally, we might want to make
labels for plotting, or save data to an output file for further use.

To do this, we have to fully understand *what* we are outputting (the
type, length/shape of object, etc.). Then decisions have to be made,
such as how many decimal places to display, how to include text
describing what each number is, how to make aligned columns of
results, etc. This all comes under the category of **string
formatting**: inserting the quantities of interest into strings and
specifying what display properties they should have.

Up until this point, we have been printing strings and other types
very simply to display data (as described :ref:`here `), separating
each item with commas, for example:

.. code-block:: python

   print("Finished.")
   print("x =", x)
   print("Avec =", A, " and Bvec =", B)

We now look at more flexible ways to insert data into strings and
format the results. There are different methods and styles for
performing this kind of operation in Python, but we will primarily use
the modern "string format method". The help for strings describes a
``.format()`` method that accepts both positional and keyword
arguments. For a given string ``S``, applying ``.format()`` returns a
new string:

.. code-block:: none

   format(...)
       S.format(*args, **kwargs) -> str

       Return a formatted version of S, using substitutions from args and kwargs.
       The substitutions are identified by braces ('{' and '}').
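
To make the signature above concrete, here is a minimal sketch that
calls ``.format()`` with one positional and one keyword argument. The
names ``label``, ``n`` and ``count`` are hypothetical, chosen only for
illustration:

.. code-block:: python

   # Hypothetical values, chosen only for illustration.
   label = "trial"
   n = 3

   # The positional argument fills {0}; the keyword argument fills {count}.
   S = "{0} number {count}"
   T = S.format(label, count=n)

   print(T)    # the new, formatted string
   print(S)    # the original string S is left unchanged

Because ``.format()`` returns a new string rather than printing
anything itself, the result can be stored in a variable and reused,
for instance as a plot label.
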
The basic approach for this is to write a string with place-holders
for values, specified using curly brackets ``{ }``, and then to
provide the values themselves as arguments. The values can be either
variables or expressions to be evaluated. For example:

.. code-block:: python

   import numpy as np

   x = 5
   print("x = {}".format(x))

   y = -15.5
   print("if y = {}, then 5-y = {}".format(y, 5-y))

   Avec = np.arange(3)
   Bvec = np.ones(2, dtype=bool)
   print("Avec = {} and Bvec = {}".format(Avec, Bvec))

produces:

.. code-block:: none

   x = 5
   if y = -15.5, then 5-y = 20.5
   Avec = [0 1 2] and Bvec = [ True  True]

In general, if we have *N* values to insert, we reserve *N* places in
the string with curly brackets ``{ }``.

Formatting data in strings
---------------------------------------------------------------------------

Above, we have specified how to place values directly into a string.
We now discuss how to format them using the contents of the curly
brackets ``{ }``, controlling things like spacing, alignment, number
of decimal places and even ordering.

Ordering of variables
##########################################################################

By default, the values inserted into the string are placed by order of
position. If we want to, it is possible to specify indices of the
argument positions inside the curly brackets, in order to change the
order of placement in the string or even to repeat values. Consider:

.. code-block:: python

   xval, yval = 45.80000, -99
   print("first = {0}, last = {1}".format(xval, yval))
   print("first = {1}, last = {0}".format(xval, yval))
   print("first = {0}, again = {0}, more (??) = {0}, last = {1}".format(xval, yval))

which produces

.. code-block:: none

   first = 45.8, last = -99
   first = -99, last = 45.8
   first = 45.8, again = 45.8, more (??) = 45.8, last = -99

Notice how the order is specified in each case and the resulting
output. We can also see that even though ``xval`` is a float written
with 5 decimal places, the Python interpreter has only displayed one.
The next section shows how to control that.

Control characters
##########################################################################

We can control several aspects of spacing and decimal values using
**control** characters. These are also specified in the curly
brackets, but follow a colon ``:``. Consider:

.. code-block:: python

   import numpy as np

   print("PI is approx: {}".format(np.pi))
   print("PI is approx: {:.3f}".format(np.pi))
   print("PI is approx: {:.7f}".format(np.pi))
   print("PI is approx: {:.25f}".format(np.pi))

which produces

.. code-block:: none

   PI is approx: 3.141592653589793
   PI is approx: 3.142
   PI is approx: 3.1415927
   PI is approx: 3.1415926535897931159979635

Thus, the ``:f`` specifies that the value is to be treated as a
``float``, and one can also specify the number of decimal places, such
as 7 with ``:0.7f`` or ``:.7f``. Note that the output is *rounded* to
that number of places, not just truncated.

As further examples, consider:

.. code-block:: python

   xval, yval = 45.80000, -99
   print("first = {0}, last = {1}".format(xval, yval))
   print("first = {0:f}, last = {1:f}".format(xval, yval))
   print("first = {0:0.8f}, last = {1:f}".format(xval, yval))
   print("first = {0:0.8f}, last = {1:0.8f}".format(xval, yval))
   print("first = {0:e}, last = {1:e}".format(xval, yval))
   print("first = {0:0.8e}, last = {1:0.8e}".format(xval, yval))

which produces

.. code-block:: none

   first = 45.8, last = -99
   first = 45.800000, last = -99.000000
   first = 45.80000000, last = -99.000000
   first = 45.80000000, last = -99.00000000
   first = 4.580000e+01, last = -9.900000e+01
   first = 4.58000000e+01, last = -9.90000000e+01

Here, ``:0.8f`` (or equivalently ``:.8f``) requests 8 decimal places,
while a bare ``:f`` falls back to the default of 6. Note that ``:f``
also converts the integer ``yval`` to a float for display. The ``:e``
specifies exponential (scientific) notation, and also takes an
argument for the number of decimal places to include.
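
As a small sketch of the points above (the variable names are
hypothetical), the formatted result is an ordinary string, so it can
be stored and compared as well as printed. The values are chosen to
avoid exact halfway cases when rounding:

.. code-block:: python

   two_thirds = 2.0 / 3.0
   big = 12345.678

   a = "{:.3f}".format(two_thirds)   # rounded, not truncated: '0.667'
   b = "{:.2e}".format(big)          # two decimals in the mantissa: '1.23e+04'
   c = "{:.0f}".format(big)          # no decimal places: '12346'

   print(a, b, c)
   print(type(a).__name__)           # each result is a str
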
The number to the left of the decimal point in the format
specification sets the minimum total width of the field, in
characters. Numbers shorter than this width are padded with spaces on
the left. Combined with a fixed number of decimal places, this can be
used to align numbers at the decimal point. For example, consider the
two outputs in this case with/without using this:

.. code-block:: python

   import numpy as np

   C = np.array([-18.5, 300.1234, 0.1, 99.9999999])
   N = len(C)

   print("Without 'left' spacing:")
   for i in range(N):
       print("val [{0}] --> {1}".format(i, C[i]))

   print("\nWith 'left' spacing")
   for i in range(N):
       print("val [{0}] --> {1:15.8f}".format(i, C[i]))

which produces

.. code-block:: none

   Without 'left' spacing:
   val [0] --> -18.5
   val [1] --> 300.1234
   val [2] --> 0.1
   val [3] --> 99.9999999

   With 'left' spacing
   val [0] -->    -18.50000000
   val [1] -->    300.12340000
   val [2] -->      0.10000000
   val [3] -->     99.99999990

| **table to be filled in**

.. list-table::
   :header-rows: 1
   :widths: 10 50

   * - Control character
     - description
   * - ``f``
     - floating point number
   * - ``e``
     - scientific notation
   * - ``d``
     - integer
   * - ``s``
     - string

.. NTS:
   + note that all this applies to strings, we are just printing; also can save these things as strings to use, e.g., in labels
   + also have repr() as well as str(), useful for the string *computational* representation itself, not the translation of the string

Whitespace and escape characters
##########################################################################

.. NTS: move this section earlier

Spacing can be controlled in several ways. The following are all
examples of white space:

.. code-block:: python

   print("Whitespace example with all spaces inserted")
   print("Whitespace example with 2 \t\t tabs inserted")
   print("Whitespace example with a \n newline char inserted")

.. code-block:: none

   Whitespace example with all spaces inserted
   Whitespace example with 2                tabs inserted
   Whitespace example with a
    newline char inserted

Note that ``\t`` and ``\n`` are each treated as a *single* character.
You can see this by checking the length of a string:

.. code-block:: python

   print(len("abc d"))
   print(len("abc\td"))
   print(len("abc\nd"))

which is 5 in each case.

The backslash ``\`` in this (and most) contexts is an **escape
character** that alters the typical interpretation of the character
following it. Thus, ``abc\td`` has a different interpretation than
``abctd``; we say that ``\t`` is an **escape sequence** (typically
just the escape and the character following it).

Sometimes the escape character ``\`` is used to make a "normal"
character signify something else (such as ``\t``), and sometimes it is
used to escape the behavior of a "special" character. As an example of
the latter, consider what the following prints:

.. code-block:: python

   print("The backslash looks like: \")

It actually leads to a syntax error, because Python interprets the
backslash as escaping whatever follows it; the second quotation mark
``"`` is therefore escaped, and no longer closes the string. One can
use the escape character itself to escape the escape character's
escaping behavior:

.. code-block:: python

   print("The backslash looks like: \\")

Escapade successful.

.. list-table::
   :header-rows: 1
   :widths: 10 50

   * - Whitespace character
     - description
   * - ``' '``
     - space
   * - ``'\t'``
     - tab character
   * - ``'\n'``
     - newline character

.. ex:

   .. code-block:: python

      # Ex. module: my_module.py
      # Version : 1.0
      # Date : Feb. 2, 2018

      xval = 10

      def f_hello():
          print("Hello!")

   :download:`prog_file_00.py `

   .. list-table::
      :header-rows: 1
      :widths: 100

      * - File **prog_file_00.py**:
      * - .. literalinclude:: media/prog_file_00.py
             :linenos:

Indexing and slicing strings
----------------------------------------

As we have seen, a string is a sequence of characters, which can be
letters, numbers, symbols, and white spaces. In this section we
discuss how to access individual characters in a string and how we can
extract a substring from a given string.
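
The idea that a string is a sequence can be checked directly. The
following is a minimal sketch (variable names are hypothetical) using
only built-in operations:

.. code-block:: python

   s = "Next Einstein"

   n_chars = len(s)          # number of characters, the space included
   chars = list("abc")       # a string can be unpacked into its characters
   has_space = " " in s      # membership test, as for other sequences

   print(n_chars, chars, has_space)

   # Iterating over a string yields one-character strings, in order.
   for ch in "abc":
       print(repr(ch), len(ch))
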
String indexing
##########################

Just like for arrays, string characters can be indexed, and each
character corresponds to an index number starting from 0. For the
string **Next Einstein**, the possible index numbers are 0 for the
first letter **N** through to 12 for the last letter **n**.

.. list-table::
   :header-rows: 1
   :widths: 6 6 6 6 6 6 6 6 6 6 6 6 6

   * - N
     - e
     - x
     - t
     -
     - E
     - i
     - n
     - s
     - t
     - e
     - i
     - n
   * - 0
     - 1
     - 2
     - 3
     - 4
     - 5
     - 6
     - 7
     - 8
     - 9
     - 10
     - 11
     - 12

This particular string has 13 characters (including the white space
character), which is the length of the string. The **len()** function
can be used to determine the length of a string::

    len('Next Einstein')

The answer is 13.

Given a string **S**, the command **S[i]** will return the character
at index position **i**. For instance::

    print('Next Einstein'[0])
    print('Next Einstein'[9])
    print('Next Einstein'[12])

will print::

    N
    t
    n

Characters can also be accessed by negative index, starting from -1
for the last character, as follows.

.. list-table::
   :header-rows: 1
   :widths: 6 6 6 6 6 6 6 6 6 6 6 6 6

   * - N
     - e
     - x
     - t
     -
     - E
     - i
     - n
     - s
     - t
     - e
     - i
     - n
   * - -13
     - -12
     - -11
     - -10
     - -9
     - -8
     - -7
     - -6
     - -5
     - -4
     - -3
     - -2
     - -1

By using a negative index, we can extract the last character of a
string **S** with the command **S[-1]**, which is handier than the
command **S[len(S) - 1]**, which does the same job.

String slicing
###############

Sometimes, we want to extract a substring or a range of characters
from a string **S**. For instance, extracting the first 4 characters
of the string **Next Einstein** will return **Next**

.. list-table::
   :header-rows: 1
   :widths: 6 6 6 6 6 6 6 6 6 6 6 6 6

   * - N
     - e
     - x
     - t
     -
     - E
     - i
     - n
     - s
     - t
     - e
     - i
     - n
   * - 0
     - 1
     - 2
     - 3
     -
     -
     -
     -
     -
     -
     -
     -
     -

and extracting every second character, from the fourth character up to
(but not including) the eleventh, will yield **tEnt**

.. list-table::
   :header-rows: 1
   :widths: 6 6 6 6 6 6 6 6 6 6 6 6 6

   * - N
     - e
     - x
     - t
     -
     - E
     - i
     - n
     - s
     - t
     - e
     - i
     - n
   * -
     -
     -
     - 3
     -
     - 5
     -
     - 7
     -
     - 9
     -
     -
     -

Translating into Python commands, we get::

    'Next Einstein'[:4]
    'Next Einstein'[3:10:2]

which return **Next** and **tEnt**.

In general, the Python string slicing syntax is given by::

    string_to_slice[start_pos:end_pos:step]

The slicing begins at the start_pos index (included) and stops at the
end_pos index (excluded). The step parameter specifies the stride to
take from the start to the end index.

If the *step* is not specified, the default step of 1 is applied.
Hence the command::

    string_to_slice[start_pos:end_pos]

will return the substring between the start_pos index (included) and
the end_pos index (excluded). For instance::

    'Next Einstein'[6:11]

will yield **inste**

If the start_pos index is not specified, the default start_pos index 0
is applied. For instance::

    'Next Einstein'[ :10:2]

will yield **Nx is**. That is, taking every second character (step =
2) from the beginning of the string (default start_pos index is 0) up
to end_pos 10 (10 excluded).

On the other hand, if the end_pos index is not specified, the default
is the end of the string. For instance::

    'Next Einstein'[5 : : 3]

will yield **Esi**. That is, taking every third character (step = 3)
starting at start_pos index 5 (included) up to the end of the string
(default end_pos index is the end of the string).

Exercise: What will be the output of the following commands::

    'Next Einstein'[ : ]
    'Next Einstein'[ : : ]

Try them and explain your answer.

A negative step means that the string is traversed from right to left,
so by default the slice starts at the end of the string. For instance,
we can reverse a string using slicing by providing the step value as
-1::

    'Next Einstein'[ : : -1 ]

The output is **nietsniE txeN**

Practice
-----------

#. Write a program that inputs an email address and returns the
   username and the domain name.

#. What is the output of the following lines of code::

       print('Days of the week'[4:12:3])
       print('Days of the week'[9])
       print('Days of the week'[-7])
       print('I love python programming'[:8])
       print('I love python programming'[4:])
       print('you love python programming'[::3])
       print('You love python programming'[-8:-2])
       print('I love python programming'[:-5])

#. Give the command to extract the word **python** from the string
   **I love python programming**.

#. Give the command to extract the substring **I love python** from
   the string **I love python programming**.

#. Give the command to extract every fifth character from the string
   **I love python programming** starting at the third character.
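
When checking answers to exercises like those above, it can help to
print an index table for the string in question, similar to the tables
shown earlier. The following is only a sketch; the helper name
``index_rows`` is hypothetical and not part of Python:

.. code-block:: python

   def index_rows(s):
       """Return (positive index, negative index, character) for each character of s."""
       n = len(s)
       # i counts from the left (0 .. n-1); i - n counts from the right (-n .. -1)
       return [(i, i - n, ch) for i, ch in enumerate(s)]

   # Print one aligned row per character, using field widths as described above.
   for pos, neg, ch in index_rows("Next"):
       print("{0:3d} {1:4d}   {2}".format(pos, neg, ch))
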