:tocdepth: 2

.. _str_and_format:
 
****************************
Strings, II, and formatting
****************************

.. contents:: :local:

.. highlight:: python

We introduced strings very briefly :ref:`earlier on
<comm_str_print_str>`.  We now take a deeper look at the type/class,
its methods, and how to combine them more usefully with data for
display/visualization.

String type
=============

The string type is another ordered collection in Python (we have
already seen :ref:`arrays <arrays>`).  The length ``N = len(STRING)``
of a given string will be important to consider, stating how many
elements are in it.  Each element of a string is of a single type,
commonly called a **character** (or **char**), but it is also really
just a string itself with :math:`N=1` elements.

From the above, we should be unsurprised to see the following displays
of string length, type and values::

  print("len of S  :", len(S))
  print("type of S :", type(S).__name__)
  print("val of elements 0 and 3  :", S[0], S[3])
  print("type of elements 0 and 3 :", type(S[0]).__name__, type(S[3]).__name__)

\.\.\. produce:

.. code-block:: none

   len of S  : 13
   type of S : str
   val of elements 0 and 3  : W k
   type of elements 0 and 3 : str str


.. NTS: move this somewhere else: 

    .. note:: When simply displaying type with ``type('abc')``, one
              obtains the simple ``str`` output.  However, putting the
              same expression *inside* a print function to display it
              leads to a lot of other text being shown. To avoid this, we
              have used the ``__name__`` method to just display the name
              of the type; thus compare the output of::

                print(type('abc'))
                print(type('abc').__name__)

              \.\.\. which is:

              .. code-block:: none

                <class 'str'>
                str

We have already explored :ref:`indexing <indices>` and :ref:`slicing
<index_slice>` rules in strings.  (And these syntaxes also apply to
arrays and other ordered collections.)

Capitalization matters in strings: uppercase and lowercase characters
are not equal.  Evaluating::

  'A' == 'a' 

\.\.\. produces ``False``.  And Python knows the difference between a
string ``'5'`` and an int ``5``, so that::

  '5' == 5

\.\.\. produces ``False``, as well.  

There is a special kind of string called the **null string** (or
**empty string**), which has the property of having no characters,
just an "open" quote followed immediately by the "close" one.  As
such, its length::

  i_am_a_null_str = ''
  print(len(i_am_a_null_str))

\.\.\. is ``0``.  This is an interesting case because ``bool('')``
evaluates to ``False``, as if it were the string equivalent of 0 in
the int type (hence the name "null string").  Any other string within
the ``bool`` type conversion, such as ``bool('Hi')``, will evaluate to
``True``.
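
As a quick check of these statements, here is a minimal sketch (the
variable names are only illustrative).  Note that a string holding a
single space is *not* empty, since a space is still a character:

.. code-block:: python

   empty    = ''
   blank    = ' '     # a single space still counts as a character
   greeting = 'Hi'

   print("len:", len(empty),    " bool:", bool(empty))      # len: 0  bool: False
   print("len:", len(blank),    " bool:", bool(blank))      # len: 1  bool: True
   print("len:", len(greeting), " bool:", bool(greeting))   # len: 2  bool: True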

.. container:: qpractice

   **Q:** How many lines are printed in the following?

   .. code-block:: Python
      :linenos:

      s1 = '''Abc'''
      if len(s1) < 5 :
          print("Am I printed?")

   .. hidden-code-block:: Python
      :label: + show/hide response

      # One line is printed.  The open/close quotes are not included in the 
      # length of the string, so ``s1`` has only 3 chars.

   **Q:** How many lines are printed in the following?

   .. code-block:: Python
      :linenos:

      s2 = ''
      if s2 :
          print("Am I printed?")

   .. hidden-code-block:: Python
      :label: + show/hide response

      # No lines are printed.  Recall that the statement in an if-condition is
      # evaluated as if there were a ``bool(...)`` type conversion surrounding it.
      # Since we just learned that ``bool('')`` is False, then nothing within
      # that "if" branch would be evaluated.  

      # This is actually a somewhat commonly used property of strings, to check 
      # if they have any chars. An alternative would be to check if 
      # ``len(STRING) > 0``, but the above is briefer.


However, while we can use indices to select string elements, we cannot
simply reassign string elements, like we could with arrays.  Thus,
trying the following::

  S[0] = 'T' 

\.\.\. leads to an error::

   <ipython-input-349-d9a85c67a48e> in <module>
   ----> 1 S[0] = 'T'

   TypeError: 'str' object does not support item assignment

In Python, strings are an **immutable** ordered collection, which
literally means their elements cannot be reassigned while the rest of
the object is unchanged.  This is a different situation than with
arrays, which are a *mutable* kind of collection.  However, we will
see :ref:`below <str_and_format_meth>` that we *can* effectively
change string elements using some of the class's methods.
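
In the meantime, one simple way to get a "changed" string is to build
a *new* one from pieces of the old one, using slicing and ``+``.  The
following is only a sketch, with a hypothetical example string
``word``:

.. code-block:: python

   word = 'Wiki'                 # hypothetical example string

   # word[0] = 'T' would raise a TypeError, so build a new string instead
   new_word = 'T' + word[1:]     # new first char + rest of the original

   print(word)                   # Wiki  (the original is unchanged)
   print(new_word)               # Tiki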

.. _str_and_format_wspace:

Whitespace and special characters
=================================

As seen above when looking at the string ``S``, the space :samp:`\ `
|nbsp| character is just a normal string element. But it is also a
member of a particular group of characters called **whitespace**,
which also includes tabulation (tabs) and newlines.  As the name
implies, these characters are generally used to provide spaces or
breaks between other characters.  These breaks are useful for people
visualizing text, data output, etc.; they provide spaces between
words, vertical alignment and more.  Here is a list of whitespace
characters in Python:

   .. list-table::
      :header-rows: 1
      :widths: 15 20 65

      * - Whitespace character 
        - Name
        - Description
      * - :samp:`\ ` |nbsp|
        - space
        - move cursor one space to the right
      * - ``\t``
        - tab 
        - move cursor to the next predefined "tab stop"; these
          stops are evenly spaced across a line
      * - ``\n``
        - newline 
        - move cursor to start of next line

.. NTS: there doesn't actually appear to be a vertical tab anymore, at
   least not in Py 3.7 in the jup notebooks.  But there is in
   iPython...  investigate!

      * - ``\v``
        - vertical tab
        - move cursor to next line and advance to same horizontal
          position that was just left

You might notice that multiple whitespace characters are written as
``\`` with a following letter.  This is a syntax in programming, where
the ``\`` is called an **escape character**: it leads Python to alter
the interpretation of one or more characters following it.  Thus,
``abc\td`` has a different interpretation than ``abctd``; we say that
``\t`` is an **escape sequence**, which is typically just the escape
character and the one character following it (though some escape
sequences are longer).  Having escape sequences basically expands the kinds of
things that can be put into a string beyond just letters and numbers
(most programming languages have such a syntax, and ``\`` is often the
escape character).  Here are various whitespace examples in action:

.. code-block::  python

   print("Whitespace example with only spaces inserted")
   print("Whitespace example with 2\t\ttabs inserted")
   print("Whitespace example with a\nnewline char inserted")

\.\.\. which evaluate to:

.. code-block::  none

   Whitespace example with only spaces inserted
   Whitespace example with 2		tabs inserted
   Whitespace example with a
   newline char inserted

Note that ``\t`` and ``\n`` are each treated as a *single* character,
and also that these don't need to be separated by spaces or anything
to be recognized.  You can see this by checking the length of the
following strings:

.. code-block::  python

   print(len("abc d"))
   print(len("abc\td"))
   print(len("abc\nd"))

which is ``5`` in each case.  Also note that tabulation does not
insert a set amount of whitespace.  Each text editor has evenly spaced
locations across a page, and the tabulation makes the cursor "jump" to
the next one that appears to the right.  This tab spacing varies
across both text editor programs and computers.  Try the following
example on your system::

  print("abc\tdefghijklm")
  print("abcde\tfghijklm")
  print("abcdefg\thijklm")
  print("abcdefghi\tjklm")
  print("abcdefghijk\tlm")

On our computer, this looks like:

.. code-block:: none

   abc     defghijklm
   abcde   fghijklm
   abcdefg hijklm
   abcdefghi       jklm
   abcdefghijk     lm

.. note:: Using tabulation can be a bit unstable if you run code on
          different systems, because the size of tabs can vary.  Tabs
          can be a quick way to insert vertically-aligned space, but
          we will see some more broadly stable approaches, below, with
          string formatting.
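
To see how the tab stop spacing changes the result without switching
editors, one option is the built-in string method ``expandtabs()``,
which returns a new string with each tab replaced by spaces up to the
next tab stop.  This is only a sketch: the tab sizes of 4 and 8 are
assumptions chosen for illustration, and your own system may use
something else:

.. code-block:: python

   line_a = "abc\tdefghijklm"
   line_b = "abcdefghi\tjklm"

   # tab stops every 8 columns
   wide_a = line_a.expandtabs(8)
   wide_b = line_b.expandtabs(8)
   print(wide_a)
   print(wide_b)

   # tab stops every 4 columns: same strings, different spacing
   narrow_a = line_a.expandtabs(4)
   narrow_b = line_b.expandtabs(4)
   print(narrow_a)
   print(narrow_b)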

Above, the escape character ``\`` is used to make a "normal" character
signify something else, such as ``\t`` to mean "tab" instead of the
letter "t".  However, the same syntax is also used to escape the
behavior of a "special" character in order to make it "normal."  One
example of this being useful is to include a ``"`` character inside a
string whose open/close quotes are also ``"``, without needing to
change the open/close quotes themselves::

  print("The \" is my favorite character.")

\.\.\. which successfully prints::

  The " is my favorite character.

.. container:: qpractice

   **Q:** What is another way to successfully print ``"`` within the
   above string?

   .. hidden-code-block:: python
      :linenos:
      :label: + show/hide code

      # Use one of the other available open/close quote pairs (mentioned in
      # the first section describing strings).  Any of the following work:
      print('The " is my favorite character.')
      print('''The " is my favorite character.''')
      print("""The " is my favorite character.""")


However, we might also accidentally escape a character's behavior that
we did not want to modify.  Consider the following, trying to display
the ``\`` itself:

.. code-block::  python

   print("The backslash looks like: \")

This produces a syntax error:

.. code-block::  none

     File "<ipython-input-350-536484bd8797>", line 1
       print("The backslash looks like: \")
                                           ^
   SyntaxError: EOL while scanning string literal

\.\.\. where the Python interpreter points to a spot *after* the
actual problem (the acronym **EOL** is for "end of line").  This
happens because by default Python interprets the backslash as escaping
whatever follows it, which in this case is the second quotation
mark ``"``: Python no longer recognizes that quote as closing the
string, and keeps searching further in that line.  When the EOL is
reached before a close of the string, Python complains.  To resolve
this, one can actually use the escape character itself in order to
escape the escape character's escaping behavior (read that twice!):

.. code-block::  python

   print("The backslash looks like: \\")

\.\.\. which produces the appropriate output::

  The backslash looks like: \

Escape successful!  Note that writing ``\\`` leads to only one
backslash being printed, because the first one is on escaping duty.

This ``\\`` is one example of a non-whitespace escape sequence.  Some
other potentially useful ones are:

.. list-table::
   :header-rows: 1
   :widths: 15 20 65

   * - Escape sequence
     - Name
     - Description
   * - ``\\``
     - backslash
     - display single backslash char (escape the escaper)
   * - ``\'``, ``\"``
     - single (or double) quote
     - display single (or double) quote, and do not interpret it as
       a string start or finish
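   * - ``{{``, ``}}``
     - left (or right) curly bracket 
     - display a literal left (or right) curly bracket in a string
       being formatted; this one is written by doubling the bracket
       rather than with a backslash (we will see below why ``{ }``
       is special in "string formatting")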
   * - ``\b`` 
     - backspace
     - delete preceding char (putting several in a row will
       not remove other backspace chars, but will remove
       that number of other preceding chars)
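
A short sketch of the backslash and quote escapes in use (the strings
here are hypothetical examples).  Checking the lengths shows that
each escape sequence is stored as a single character:

.. code-block:: python

   path  = "dir\\name"     # one real backslash; no newline is created
   quote = 'It\'s'         # single quote inside single-quoted string

   print(path, len(path))       # dir\name 8
   print(quote, len(quote))     # It's 4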

.. container:: qpractice

   | **Q:** How would you print the following string? 
   | ``The tab character looks like: \t.``

   .. hidden-code-block:: python
      :linenos:
      :label: + show/hide code

      # Have to escape the slash:
      print("The tab character looks like: \\t.")

.. _str_and_format_ops:

String operators
=================

The behavior of ``+`` and ``*`` with strings was discussed
:ref:`earlier <comm_str_print_str_ops>`.  

Briefly, ``+`` concatenates two strings, and several can be chained in
a single expression::

  print("Hello" + "And" + "Goodbye")

\.\.\. outputs:  ``HelloAndGoodbye``.

And ``*`` can operate on a string together with an int, producing a
new string that is that integer number of copies of the original
string concatenated::

  print("Xyz" * 3) 
  
\.\.\. outputs: ``XyzXyzXyz``.  

.. container:: qpractice

   | **Q:** These operators can also be combined in expressions.  How
     might one compactly make the following string?
   | ``Bafana Bafana and Banyana Banyana each drew their matches.``

   .. hidden-code-block:: python
      
      s = 'Bafana '*2 + 'and ' + 'Banyana '*2 + 'each drew their matches.'
      print(s)
   

.. _str_and_format_meth:

String methods (examples)
==========================

As we have seen in Python, each type generally comes with a host of
useful methods and attributes.  Here we just touch on the variety of
string-specific functionalities that are built-in to Python.  Each of
the string methods will output new objects, rather than operating
in-place (because the string type is immutable), but it is still a
good habit to double-check the help documentation when first becoming
acquainted with a method.

Note that Python itself typically doesn't distinguish between
individual characters, whether letter, number, whitespace or another
symbol.  We can see this in some of the string methods.  However,
because *humans* often do, there are several string methods that can
help with identifying words and other "lexical" properties.

You can find out more comprehensively about the available string
methods with ``help(str)``, of which we just highlight some useful
ones here:

**capitalize**

.. code-block:: none

   |  capitalize(self, /)
   |      Return a capitalized version of the string.
   |      
   |      More specifically, make the first character have upper case and the rest lower
   |      case.
   |  

This method takes no arguments (:ref:`recall <methods_use>`: the
``self`` means that the object it is attached to is essentially an
input, but we don't name it again when *using* the method).  It
capitalizes the first character in a string and makes all of the
other characters lowercase:

.. code-block:: python

  print('antananarivo'.capitalize())
  print("MADAGASCAR".capitalize())
  print("tsingy de BEMARAHA".capitalize())
  print(" zebu".capitalize())

\.\.\. produces:

.. code-block:: none

   Antananarivo
   Madagascar
   Tsingy de bemaraha
    zebu

You can see that these results match the description: Python doesn't
know about whitespace-separated words within the string to capitalize.
And if the first character is whitespace, it doesn't try to find the
first non-whitespace character.
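
Because strings are immutable, the method hands back a *new* string
and leaves the original alone.  A minimal sketch (variable names are
illustrative) of saving the result:

.. code-block:: python

   city     = 'antananarivo'
   city_cap = city.capitalize()   # new string; ``city`` is untouched

   print(city)        # antananarivo
   print(city_cap)    # Antananarivo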

**count**

.. code-block:: none

   |  count(...)
   |      S.count(sub[, start[, end]]) -> int
   |      
   |      Return the number of non-overlapping occurrences of substring sub in
   |      string S[start:end].  Optional arguments start and end are
   |      interpreted as in slice notation.
   |  

This method takes one required argument, ``sub``, which is the
"sub"string to search for within the main string.  It can then take up
to two optional args, to control the starting and ending points of the
search::

  place = '''Ouagadougou, Burkina Faso'''
  place.count('ou')

\.\.\. outputs: ``2`` (noting again that Python distinguishes between
uppercase and lowercase).  It is easy to forget the string-quotes on
``ou``, which would lead to the following error:

.. code-block:: none

   ----> 2 place.count(ou)

   NameError: name 'ou' is not defined

\.\.\. as Python tried to interpret ``ou`` as a variable name (which
would be fine *if* you had defined it as one previously).  

Using the options, one has::

  print(place.count('ou', 7))     # search from index 7 to the end
  print(place.count('ou', 7, 8))  # search in index range [7, 8)

\.\.\. which produces ``1`` and ``0`` respectively (the second case
defines a preeetttty narrow subset of the initial string to search
within).
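
The docstring's phrase "non-overlapping occurrences" is worth a quick
look.  In this sketch (with made-up example strings), once a match is
counted, its characters are not reused for the next match:

.. code-block:: python

   n_aa  = 'aaaa'.count('aa')       # matches at [0:2] and [2:4] only
   n_ana = 'banana'.count('ana')    # second 'ana' would overlap the first

   print(n_aa)     # 2
   print(n_ana)    # 1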

**find**

.. code-block:: none

   |  find(...)
   |      S.find(sub[, start[, end]]) -> int
   |      
   |      Return the lowest index in S where substring sub is found,
   |      such that sub is contained within S[start:end].  Optional
   |      arguments start and end are interpreted as in slice notation.
   |      
   |      Return -1 on failure.
   |  

This method works quite similarly to the ``count()`` method, above,
but instead of returning how many times ``sub`` appears, it returns
the "lowest index in S where substring sub is found": that is, the
first place in the string that ``sub`` is found.

We note that one has to be a little careful about how we interpret the
output, because Python recognizes ``-1`` as a valid index. Consider
using our variable from above; we could use ``find()`` and print the
character at the returned index as follows, to verify that the method
works::

  ii = place.find('B')
  print(place[ii])

And this outputs ``B``, so that is fine.  But what happens if we try
searching for a different letter, which isn't there?  Well, the output
of the following::

  ii = place.find('Z') 
  print(place[ii])

\.\.\. is ``o``, so it appears that Python is wrong (!).  However, if
we print the index, as well, we can see what is happening::

  ii = place.find('Z') 
  print(ii)
  print(place[ii])

\.\.\. because now we recognize that the index is ``-1``, which the
docstring tells us to interpret as "not found", and we see that the
``o`` appears because ``place[-1]`` picks out the last character.
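
One way to guard against this pitfall is to test the returned index
before using it.  The following is a sketch of that pattern, reusing
the same string; it also shows the ``in`` operator, which checks for a
substring and evaluates to ``True`` or ``False`` directly:

.. code-block:: python

   place = '''Ouagadougou, Burkina Faso'''

   for letter in ('B', 'Z'):
       ii = place.find(letter)
       if ii == -1:                      # -1 means "not found"
           print(letter, "was not found")
       else:
           print(letter, "found at index", ii)

   has_z = 'Z' in place                  # False: no index involved
   print(has_z)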

.. _str_and_format_meth_split:

**split**

.. code-block:: none

   |  split(self, /, sep=None, maxsplit=-1)
   |      Return a list of the words in the string, using sep as the delimiter string.
   |      
   |      sep
   |        The delimiter according which to split the string.
   |        None (the default value) means split according to any whitespace,
   |        and discard empty strings from the result.
   |      maxsplit
   |        Maximum number of splits to do.
   |        -1 (the default value) means no limit.
   |  

This is a really useful method for helping us to identify words in a
string that contains a line of text.  This method will break up a
string into a list (another ordered collection we will discuss
:ref:`shortly <list_comp>`) of strings, with the split happening at a
particular "separator" ``sep``; default sep(arator) is any whitespace.
So, consider the following::

  sentence = '''Yamoussoukro:\t the political capital of Cote d'Ivoire.'''
  print(sentence)
  print(sentence.split())

\.\.\. whose output is::

  Yamoussoukro:	 the political capital of Cote d'Ivoire.
  ['Yamoussoukro:', 'the', 'political', 'capital', 'of', 'Cote', "d'Ivoire."]

Note that in each new string in the list, there is no whitespace.  We
could choose to split the string at, say, only *spaces* :samp:`\ `
|nbsp| or, say, at ``'it'``::

  print(sentence.split(' '))         # specify sep as arg by position
  print(sentence.split(sep='it'))    # specify sep as kwarg

\.\.\. which outputs::

  ['Yamoussoukro:\t', 'the', 'political', 'capital', 'of', 'Cote', "d'Ivoire."]
  ['Yamoussoukro:\t the pol', 'ical cap', "al of Cote d'Ivoire."]
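
The ``maxsplit`` option from the docstring limits how many splits are
made, with the remainder of the string kept whole as the last list
element.  A brief sketch, using the same sentence:

.. code-block:: python

   sentence = '''Yamoussoukro:\t the political capital of Cote d'Ivoire.'''

   first_rest = sentence.split(maxsplit=1)   # split once, at any whitespace
   three_part = sentence.split(' ', 2)       # split twice, at spaces only

   print(first_rest)
   print(three_part)

Here, ``first_rest`` has two elements (the first word, and then
everything after the whitespace following it), and ``three_part`` has
three elements.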

.. note:: There are *many* more methods for strings, and it is worth
          exploring them.  The above gives a sampling, and some
          demonstrations of translating the docstring sections for
          each.  The Practice Problems below include more, as well.

.. _str_and_format_formatting:

String formatting: Place data into strings
===========================================

Often we want to display results of our work at several points in a
program: in the middle of calculations, to check the code's progress
and/or intermediate results; and at the end, to display the final
results. Additionally, we might want to make labels for plotting, or
save data to an output file for further use. To do this, we have to
fully understand *what* we are outputting (the type, length/shape of
object, etc.).  Then decisions have to be made, such as how many
decimal places to display, how to include text describing what each
number is, how to make aligned columns of results, etc.  This all
comes under the category of **string formatting**: inserting the
quantities of interest into strings and specifying what display
properties they should have.

Basic format method
---------------------------

Up until this point, we have used ``print()`` to display strings and
other types very simply (as described :ref:`here
<comm_str_print_print>`), by separating each item with commas:

  .. code-block::  python

     print("Finished.")
     print("x =", x)
     print("Avec =", A, "and Bvec =", B)

etc.  We now look at more interesting ways to insert data into strings
and format the results.

There are different methods and styles for performing this kind of
operation in Python, but we will primarily use the modern, built-in
string ``format()`` method. From scrolling down/searching the
``help(str)`` docstring, we see that this method takes both positional
and keyword arguments:

.. code-block:: none

   ...
   format(...)
     S.format(*args, **kwargs) -> str

     Return a formatted version of S, using substitutions from args and kwargs.
     The substitutions are identified by braces ('{' and '}').
   ...

The basic approach for this is to write a string with ``{}``
positioned as a placeholder each time we will want to insert a value
somewhere, and then we provide the values themselves as arguments.
The values can be either variables or expressions to be evaluated.
For example:

.. code-block:: python
   :linenos:

   import numpy as np     # needed for the array examples below

   x = 5
   print("x = {}".format(x))

   y = -15.5
   print("If y = {}, then five minus it = {}".format(y, 5-y))

   z = "Xhosa"
   print("The first two letters of '{}' are {}.".format(z, z[:2]))

   Avec = np.arange(-2, 2)
   Bvec = np.ones(2, dtype=bool)
   print("\nAvec = {} and Bvec = {}".format(Avec, Bvec))

produces:

.. code-block::  none

   x = 5
   If y = -15.5, then five minus it = 20.5
   The first two letters of 'Xhosa' are Xh.

   Avec = [-2 -1  0  1] and Bvec = [ True  True]

In this usage, if we have *N* values to insert, we will reserve *N*
spaces in the string with curly brackets ``{}`` and also have *N*
comma-separated values within the ``format(...)`` method.  Note from
the example of printing arrays that the above works well for short
arrays, but for longer ones, we might want to loop over the elements
and display each one with its index:

.. code-block:: python
   :linenos:
      
   for idx in range( len(Avec) ):
       print("[{}]th value is : {}".format(idx, Avec[idx]))

\.\.\. produces:

.. code-block:: none

  [0]th value is : -2
  [1]th value is : -1
  [2]th value is : 0
  [3]th value is : 1

This provides a nicer way to view large arrays, lists or other ordered
collections.  In the next section we will see more options to format
this even more precisely, such as controlling vertical alignment,
spacing and more.

.. note:: Here and in discussion below, we generally print the strings
          that are being formatted. This is just because we want to
          quickly display the results.  However, in each case, we
          could alternatively save the results to a variable with
          assignment, such as::

            z   = "Xhosa"
            var = "The first two letters of '{}' are {}.".format(z, z[:2])

          \.\.\. and use it further in some way.  We mention this so
          there is no misapprehension that string formatting only
          occurs inside ``print()``-- instead, it is quite general and
          useful.
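
The docstring above mentions both positional and keyword arguments.
As a sketch of what that allows (the names ``name`` and ``n`` are
just illustrative choices), a placeholder can contain a position
number or a keyword name to select which argument is inserted, and
doubled brackets ``{{`` and ``}}`` produce literal curly brackets:

.. code-block:: python

   s_pos = "{0} times {0} is {1}".format(3, 3*3)
   s_key = "{name} has {n} letters".format(name="Xhosa", n=len("Xhosa"))
   s_lit = "Use {{}} as a placeholder; here is a value: {}".format(5)

   print(s_pos)    # 3 times 3 is 9
   print(s_key)    # Xhosa has 5 letters
   print(s_lit)    # Use {} as a placeholder; here is a value: 5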

.. _str_and_format_formspec_str:

Format specifiers: string type
-------------------------------

Beyond just displaying a variable, we can control several aspects of
spacing, alignment and decimal output (where appropriate) using
**format specifiers**. These are placed within each pair of curly
brackets whose contents you want to format.  The specifier starts with
a colon ``:``, can then contain some options, and typically ends with
the type.  We first look at options for the string type, which is
denoted by ending the format specifier with an ``s``.

One of the most common formatting options for strings is to create a
window of a certain width (in terms of number of characters), into
which the string value is inserted, and the next characters appear
after that window.  If the string length is greater than the window's
width, then it is effectively as if the window weren't specified.
Additionally, one can specify whether to use left (``<``), right
(``>``) or center (``^``) **justification** for the string within that
window.  Consider the following::
  
  ss = 'abc'
  print("0123456789|")            # just making a reference "count" of spaces
  print("{:10s}|".format(ss))     # window = 10 spaces, str = def loc (left)
  print("{:<10s}|".format(ss))    # window = 10 spaces, str = left loc
  print("{:^10s}|".format(ss))    # window = 10 spaces, str = center loc
  print("{:>10s}|".format(ss))    # window = 10 spaces, str = right loc

\.\.\. which produces:

.. code-block:: none

   0123456789|
   abc       |
   abc       |
      abc    |
          abc|

The vertical ``|`` shows where the edge of the specified 10 char
window is in each case.  Note how the center justification is "as
central as possible", depending on the relative even/oddness of the
window and inserted string.

When specifying justification, you can also choose to fill the spaces
in the window with a particular character, such as in the following::

  print("{:.<10s}|".format(ss))    
  print("{:-^10s}|".format(ss))    
  print("{:z>10s}|".format(ss))    

\.\.\. which produces:

.. code-block:: none

   abc.......|
   ---abc----|
   zzzzzzzabc|
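
To see what happens when a string does not fit in its window, here is
a small sketch with three made-up strings of length 3, 10 and 15,
each placed in a 10-character window:

.. code-block:: python

   print("0123456789|")
   rows = []
   for word in ('abc', 'abcdefghij', 'abcdefghijklmno'):
       rows.append("{:<10s}|".format(word))
       print(rows[-1])

The first two rows end at the same column, but the third string is
displayed in full and simply pushes the ``|`` further to the right;
nothing is cut off.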


.. _str_and_format_formspec_num:

Format specifiers: numerical type
-----------------------------------

Python has a lot of numerical types, so perhaps we shouldn't be
surprised that there are also several numerical specifier types (and
here we only consider scalar/non-collection ones).  Some of the more
common ones include:

  .. list-table::
     :header-rows: 1
     :widths: 10 50

     * - Type 
       - Description
     * - ``b``
       - binary
     * - ``d``
       - integer
     * - ``e``, ``E``
       - scientific notation float, with "e" or "E", respectively
     * - ``f``
       - "fixed point" float (i.e., standard decimal notation)
     * - ``g``, ``G``
       - "general" format: if the value is neither very large nor
         very small in magnitude, display as "fixed point"; else, use
         scientific notation
     * - ``%``
       - convert to percent (multiplies by 100) and display with ``%``
         at end

.. note:: Mixing specifier and variable types between string and
          numerical types (e.g., trying to put an int into a ``:s``
          specifier, or a str into a ``:f`` specifier) produces an
          error.

          Within numerical types, trying to put a float into ``:d``
          fails, but putting an int into ``:f`` is allowed.

So::

  dd = 45 
  print("{:d}".format(dd))  
  print("{:e}".format(dd))  
  print("{:E}".format(dd))  
  print("{:f}".format(dd))  
  print("{:%}".format(dd))

\.\.\. produces:

.. code-block:: none

   45
   4.500000e+01
   4.500000E+01
   45.000000
   4500.000000%

Python appears to default to a precision of 6 decimal places across
all relevant specifier types. This can be controlled by using
``.NUMBER`` in the specifier as follows::

  print("PI is approx: {}".format(np.pi))         # def with no specifier
  print("PI is approx: {:f}".format(np.pi))       # def for fixed point spec
  print("PI is approx: {:.2f}".format(np.pi))     # 2 decimals
  print("PI is approx: {:.25f}".format(np.pi))    # 25 decimals

Note that the displayed output is *rounded*, not just truncated:

.. code-block:: none

   PI is approx: 3.141592653589793
   PI is approx: 3.141593
   PI is approx: 3.14
   PI is approx: 3.1415926535897931159979635

The window and justification syntax from the :ref:`above string
formatting <str_and_format_formspec_str>` applies equivalently to
numerical types, and it can be combined with the precision
specification, too::

  ff = 0.123 
  print("{:10f}|".format(ff))    
  print("{:<10f}|".format(ff))   
  print("{:^10.3f}|".format(ff))  
  print("{:>10.0f}|".format(ff))  

\.\.\. produces:

.. code-block:: none

     0.123000|
   0.123000  |
     0.123   |
            0|

Note, though, that for numerical types, values are displayed with
*right* justification by default (instead of *left*, as for strings).
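
A few of the other specifier types from the table can be combined
with precision and windows in the same way.  This is just a sketch
with arbitrary example values:

.. code-block:: python

   dd   = 45
   frac = 0.256
   big  = 12345.678

   s_bin = "{:b}".format(dd)           # binary
   s_pct = "{:.1%}".format(frac)       # percent, 1 decimal place
   s_sci = "{:.2e}".format(big)        # scientific, 2 decimal places
   s_win = "{:>12.2e}|".format(big)    # same, right-justified in 12 chars

   print(s_bin)    # 101101
   print(s_pct)    # 25.6%
   print(s_sci)    # 1.23e+04
   print(s_win)    #     1.23e+04|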

.. _str_and_format_formspec_prac_ex:

Format specifiers: practical example
-------------------------------------

The above is fun, sure, but why would we need to spend all this time
thinking about windows for variables and numbers of decimal places?  

Well, when we want to display data, vertical alignment and horizontal
displacement can make a big difference in being able to quickly (and
accurately) assess output.  Consider the following example, where we
display an array:

.. code-block::  python

   C = np.array([-18.5, 300.1234, 0.1, 99.9999999, 25, 16, 91, 0.032, -14.4, 0, 1, 56])
   N = len(C)
   for i in range(N):
       print("val [{}] --> {}".format(i, C[i]))

This produces:

.. code-block::  none

   val [0] --> -18.5
   val [1] --> 300.1234
   val [2] --> 0.1
   val [3] --> 99.9999999
   val [4] --> 25.0
   val [5] --> 16.0
   val [6] --> 91.0
   val [7] --> 0.032
   val [8] --> -14.4
   val [9] --> 0.0
   val [10] --> 1.0
   val [11] --> 56.0

Because the array elements have no vertical alignment, it's hard to
tell where the largest or smallest value might be: just because a
number has a lot of decimal places doesn't mean it's large.
Additionally, since there are more than 10 elements, the indices
change from being single to double digit, which also makes it
difficult to check the numbers.  So, let's try to fix this with format
specifiers.  There is a lot of freedom here, but consider the
following:

.. code-block::  python

   for i in range(N):
       print("val [{:2}] --> {:15.8f}".format(i, C[i]))

\.\.\. which produces:

.. code-block::  none

   val [ 0] -->    -18.50000000
   val [ 1] -->    300.12340000
   val [ 2] -->      0.10000000
   val [ 3] -->     99.99999990
   val [ 4] -->     25.00000000
   val [ 5] -->     16.00000000
   val [ 6] -->     91.00000000
   val [ 7] -->      0.03200000
   val [ 8] -->    -14.40000000
   val [ 9] -->      0.00000000
   val [10] -->      1.00000000
   val [11] -->     56.00000000

This looks much easier to assess!  Again, details will matter a lot:
we kind of have to know the largest index value to know how many
spaces it might take up; we have to know the type and relevant number
of decimal places in the array; etc.  However, once we know those
things, we can really make nice visual displays with just a couple
parameters.

Format specifiers: nesting (bonus)
-------------------------------------

As a small, extra (but fun!) point about format specifiers: the
specifier options *themselves* can be inserted, via **nested
specifiers**.  That is, consider the following::

  print("The values are:  |{:{}.{}f}|  |{:{}.{}f}|".format(1.234567, 15, 9, 9.87, 10, 5))


\.\.\. which produces:

.. code-block:: none

   The values are:  |    1.234567000|  |   9.87000|

What is happening here?  Counting the pairs of curly brackets, there
are 6 values to insert.  How do we know the order in which values will
be placed?  Read from left to right: the first "outer" pair of curly
brackets gets the first value.  We then check for any inner curly
brackets within it and, if there are some, fill them left to right
with the next values.  When done, continue through the string to the
next "outer" curly brackets, and repeat.  Thus, the above is
equivalent to::

  print("The values are:  |{:15.9f}|  |{:10.5f}|".format(1.234567, 9.87))

(You can verify this more precisely by assigning the string currently
inside each ``print()`` to a variable, and using ``==`` to check strict
equality.)
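
As a short sketch of that check (the variable names ``nested``,
``direct``, ``width`` and ``row`` are chosen here just for
illustration), we can build both strings and compare them.  The last
lines show one practical use of nesting, where the field width comes
from a variable instead of being typed into the string:

.. code-block:: python

   # The same text built two ways: with nested specifiers and with
   # the width and precision written out directly
   nested = "The values are:  |{:{}.{}f}|  |{:{}.{}f}|".format(1.234567, 15, 9, 9.87, 10, 5)
   direct = "The values are:  |{:15.9f}|  |{:10.5f}|".format(1.234567, 9.87)

   print(nested == direct)    # True if the two strings match exactly

   # Nesting lets a variable control the width: here, the width
   # needed for the largest index (11) of a 12-element array
   width = len(str(11))
   row = "val [{:{}}]".format(7, width)
   print(row)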


Practice 
==========

#. | Print the following string:
   | ``I don't really care for ", ', ''' or """.  But \ is OK.``

#. Use the str method **replace** to:

   i. put your own name in the place of ``who?`` in this string: ``My
      name is who?``.

   #. swap ``old`` and ``new`` in the following string: ``What's old
      is now new!``.

#. Use the str method **strip** to:

   i. trim whitespace from around (but not within) the sentence in
      this str: ``\tHere is the sentence, by itself.  \n``

   #. remove the dashes from ``------this string---------``

   #. leave only the word ``center`` in: ``nnnnncenternnnnn``

   #. What is the name of the related method to strip whitespace (or
      other characters) from only the right side of the string?


#. Write a for-loop to see how many orders of magnitude away from 1 a
   number has to be before the ``g`` format specifier changes from
   displaying a fixed-point float to using scientific notation.
   Investigate using powers of 10, with both negative and positive
   exponents.

#. Can you draw the following picture using only one ``print()`` call?
   
   .. figure:: media/triangle.png
      :figwidth: 60%
      :align: center
      :alt: draw a triangle


.. NTS: 

   prob not include here

   #. *Bonus*.  Above in :ref:`the practical formatting example
      <str_and_format_formspec_prac_ex>` we mentioned that displaying
      array indices with uniform spacing in general required knowing how
      many indices there were.  That isn't quite true.  Can you use
      nested looping to try to "perfectly" display the indices (that
      is, make the window just wide enough for the largest index), for an
      array of arbitrary length?


.. |nbsp| unicode:: 0xA0 
   :trim:


.. NTS:  make a question for someone to explore with this method:

   
.. NTS: I think ordering of string variables can come later, as well
   as using variable names to define order, and dictionaries/lists.


.. from previous


   Formatting data in strings
   ---------------------------------------------------------------------------

   Above, we have specified how to place values directly into a string.
   We now discuss how to format it with various contents of the curly
   brackets ``{ }``, controlling things like spacing, alignment, number
   of decimal places and even ordering.

   Ordering of variables
   ##########################################################################

   By default, the values inserted into the string are placed by order of
   position.  If we want to, it is possible to specify indices of the
   argument positions inside the curly brackets, in order to change
   around the order of placement in the string or even to repeat values.
   Consider:

     .. code-block::  python

        xval, yval = 45.80000, -99
        print("first = {0}, last = {1}".format(xval, yval))
        print("first = {1}, last = {0}".format(xval, yval))
        print("first = {0}, again = {0}, more (??) = {0}, last = {1}".format(xval, yval))

   which produces

     .. code-block::  none

        first = 45.8, last = -99
        first = -99, last = 45.8
        first = 45.8, again = 45.8, more (??) = 45.8, last = -99

   Notice how the order is specified in each case and the output.  We can
   also see that even though ``xval`` is a float specified to 5 decimal
   places, the Python interpreter has only specified one place.  The next
   section shows how to control that.


.. NTS:  mention this somewhere, yes

      + also have repr() as well as str(), useful for the string
        *computational* representation itself, not the translation of the
        string