6. Strings, comments and print: Help humans program

6.1. Strings

Another type that is very useful to have in programming is a non-mathematical one: string, which is often referred to by its abbreviation str. A string is a sequence of characters (keystrokes), and these are enclosed between a pair of either single ' or triple quotation marks (quotes) '''; one can also use pairs of " or """ to mark the start and end of the string. Each of the following is a string (and the print function is discussed below):

s1 = 'Bonjour.'
s2 = "Yes, I'm a string!"
s3 = '''It may seem funny, but I am also a string.'''
print(s1)
print(s2)
print(s3)

Note that when each of those is displayed, the single/double/triple quotes that envelop the string are not displayed as part of the string (they are only part of the definition syntax):

Bonjour.
Yes, I'm a string!
It may seem funny, but I am also a string.

This is because the opening/closing quotes are not considered part of the string, merely its boundary markers.

Note

If you display a string or string variable just by typing its name in the environment without using print, you will see boundary quotes, and maybe not even the specific ones you chose. For example, just entering the variable s3 from above, one sees:

'It may seem funny, but I am also a string.'

Python is displaying the full data representation (boundaries included) and doing so in the way it has stored it internally (simplifying boundaries, if possible).
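As a small sketch of the difference described in this note, the lines below compare print() with the built-in repr() function. Note that repr() is not introduced in the text above; it is used here on the assumption that it returns the same representation the environment shows when a name is entered on its own. The variable name shown is just for illustration.

```python
s3 = '''It may seem funny, but I am also a string.'''

# print() displays only the characters of the string.
print(s3)

# repr() returns the full data representation as a new string,
# including the boundary quotes that Python chooses to show.
shown = repr(s3)
print(shown)
```

The second print shows the string wrapped in single quotes, even though s3 was defined with triple quotes.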

You cannot mix-and-match the opening and closing quote style---whichever is used to open must also be used to close. This is because once Python detects the quotation opening a string, it will keep interpreting everything as part of that string until it finds its closing partner... and be very unhappy if it doesn't find it.

However, one can include quotation mark(s) and apostrophes as characters inside of a string, as we can see in the s2 example above. These non-boundary quotes are simply normal characters, and displayed as such. That is actually one reason why there are so many ways to signify the boundaries of a string: one can denote the boundary using a different type of quote than any interior one(s), and thus Python can correctly interpret what is a character (part of the string) and what is a boundary marker.

If we don't keep the boundary and internal quotes distinct, then we will confuse Python with "unintended" closing quotes. Note that the following will lead to an error:

s4 = 'I'm going to cause trouble!'

Python sees the first, single ' and starts interpreting what follows as characters in a string. When it gets to the apostrophe between I and m, it will view it as the closing boundary of the string (even if we did not intend it that way). What happens when Python tries to interpret the rest of the line, which has been shut out from being part of the string? Well:

  File "<ipython-input-298-f426678cc555>", line 1
    s4 = 'I'm going to cause trouble!'
            ^
SyntaxError: invalid syntax

... Python is simply unhappy. The interpreter points at the m as a point of confusion, because it appears immediately after it has closed the string, with no operator or anything in between---hence the "invalid syntax." To fix this, we could use one of the other string enclosers shown above (like we did in defining s2) at the start and stop of this string. Try rerunning the above code with any of the possible alternatives, and print the result, to verify this.
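One possible set of fixes is sketched here. The variable names s4_double, s4_triple_double and s4_triple_single are made up for this illustration; each uses a boundary marker that differs from the apostrophe inside the string.

```python
# The apostrophe in I'm is now an ordinary character, because the
# boundary markers are a different kind of quote.
s4_double = "I'm going to cause trouble!"
s4_triple_double = """I'm going to cause trouble!"""
s4_triple_single = '''I'm going to cause trouble!'''

print(s4_double)
print(s4_triple_double)
print(s4_triple_single)
```

All three lines print the same text, since the boundary markers are not part of the string.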

Note

Opening and closing quotes must be used when we write a string, otherwise Python will try to interpret the characters as variable names or operators. This will typically lead to syntax error(s), since the characters were not intended to be evaluated as an expression.

The output of type("some string") or type(s1) from above is str. Again, there is no difference if we change the bounding quotes; type('''some string''') is also just str.

The type conversion function for strings is str(). Thus, str(4) evaluates to '4'. Note the quotes signifying that it is a string of characters.
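A minimal sketch of this conversion follows; the names n and n_as_str are chosen only for this example.

```python
n = 4
n_as_str = str(n)       # convert the int 4 to the string '4'

print(type(n))          # the type of the original number
print(type(n_as_str))   # the type after conversion
print(n_as_str)         # print shows the character 4, without quotes
```

The value of n itself is unchanged by str(n); the conversion produces a new string object.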

A useful string property to note is its length. This is the number of elements in the string, which is the number of characters in it, and there is a built-in function called len() to calculate this. For example:

len('''Hello, madame!''')

... is 14, which shows a couple of important points to note:

  • the bounding quotes do not count toward the string length; they are not elements or characters in the string itself: they just define the start and finish.

  • the space does count toward the string length. Even though in writing we often use it to separate words, Python does not make this distinction---the space is a string element like any other. So, we should start viewing a space as just a regular character, too. (We will see other kinds of spaces in programming, later.)

When thinking about the length of strings, one might reasonably ask, What is the shortest a string can be? Can we have one of length zero? And, indeed, we can have a zero-length string, which is just written: ''. This is called the null or empty string, which just contains no characters between the opening and closing quotes. Note that it could also be written with any format of opening/closing quotes.

Finally, we note that strings are case-sensitive, meaning that 'ABC' is different from 'Abc', 'abc', etc.
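The points about length and case can be checked with a short sketch. It assumes the == operator, which has not been introduced in the text above; here it simply tests whether two values are equal. The variable names are illustrative.

```python
empty = ''
one_space = ' '

print(len(empty))       # the empty string has no characters
print(len(one_space))   # a single space still counts as one character

# Case matters when comparing strings.
same_case = ('ABC' == 'ABC')
mixed_case = ('ABC' == 'Abc')
print(same_case)
print(mixed_case)
```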

Q: How does one write the null string using either of the triple-quote string boundaries? Verify that the length of what you wrote is actually zero.

Q: What is the length of each of s2 and s3 above?

Q: Store the following string in a variable and print it:
I think I'll read "Zur Elektrodynamik bewegter Koerper" today.

Q: What is displayed by the following?:

s5 = 'apple banana cabbage'
print('s5')

6.1.1. Operate on strings: + and *

We first met the + and * operators in a previous section looking at numerical types, where these each performed standard mathematical roles. We now meet them again here, and will explore how the same symbols can also do work as binary operators with strings. This may look a little odd at first, but it is a good reminder that we must always consider the type of the objects we are using in expressions, along with the operators.

Note

Operators (or functions) that change behavior when operands (or inputs) have different types are called overloaded.

There aren't an infinite number of symbols, so it is convenient to reuse some. Hopefully this is done in ways where the choice of symbol being recycled makes some logical sense at a human level. We will have to judge for ourselves!

When + is placed between two strings, as in <string 1> + <string 2>, this expression evaluates to the two strings stuck together in order: <string 1><string 2>, forming a single, new string. This operation is called concatenation (coming from the Latin word catena for "chain", so you can picture this process as joining successive links into a single, sequential chain). For instance if we have the following:

S1 = 'Good day,'
S2 = 'my dear.'
S1 + S2

... then the output will be a single string: 'Good day,my dear.'

Note that the final string does not have any space between day, and my. We might see the result as a grammatical mistake, but this is no error by Python. Instead, this is just how the string concatenation happens: the two operands get stuck together exactly as they appear, with nothing extra added. To put a space between those words, you could change one of the strings to include a white space, e.g., S1 = 'Good day, '.

We can have an expression with multiple concatenations. The result of the operation can be assigned to a variable, just like any math or other expression. Thus, we can have:

first_name  = 'Nelson'
family_name = 'Mandela'
full_name   = first_name + ' ' + family_name
print(full_name)

The * can operate on one string and one int, such as <int> * <str> or <str> * <int>. What does multiplying a string by a number mean? Let's take the following example (and also take a guess of what might be a reasonable outcome of running it):

"Bafana" * 2

It evaluates to a new string: 'BafanaBafana'. That is, we get a new string that is <int> copies of <str> concatenated together (again, with no space inserted); and the order of operands doesn't matter.
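A short sketch of string repetition is given below, with illustrative variable names. It also builds a 50-character ruler from a 5-character pattern, as one example of where repetition can be handy.

```python
word = "Bafana"

# The int can go on either side of the * operator.
left = 2 * word
right = word * 2
print(left)
print(right)

# Repeating a 5-character pattern 10 times gives 50 characters.
ruler = '----|' * 10
print(ruler)
print(len(ruler))
```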

Note that trying to multiply a string by a float will not work:

<ipython-input-14-678d67b56f76> in <module>
----> 1 "Bafana" * 2.5
      2

TypeError: can't multiply sequence by non-int of type 'float'

No fractal strings allowed here!

And as far as we know, no other basic mathematical symbols operate on strings. For example, "Bafana" / 2 produces an error, rather than cutting the string in half. But + and * are useful to keep in mind.

Q: How would you make a string of all As that has the same number of characters as the string s3, from above? Hint: you should not need to count all the characters yourself.

Q: Make a string of all apostrophes, one for each character in the concatenation of S1 and S2.

Q: What are the value and type of the following?

str(4.5) + str(3.5)
4.5 + str(3.5)
str(4) * str(3)
str(4) * 3

6.2. Print

While programming, we will often want to display intermediate and/or final results, so that we know what is happening or what the main output is. To do this, we use the print() function in Python, displaying number values and strings (and more) directly. Multiple items can be printed at once, by separating each with a comma (here and in other functions, we refer to each comma-separated item as an argument, or input):

x = 21.0
print(x)
print("something")
print("something", x)
print('The value of x is:', x, ", and that is great.")

Looking at the output of those four lines:

21.0
something
something 21.0
The value of x is: 21.0 , and that is great.

A few things to note:

  • commas do separate items, but we can also include them within a string, as in the fourth print; Python will not be confused, because it knows what is a character inside a string vs an item separator outside a string.

  • Python also knows the difference between a variable name inside a string (the name is shown) and one outside a string (the value is shown), such as for x in the fourth print.

  • the print function automatically introduces a space between the comma-separated items (unlike string concatenation, above); this is actually just default behavior we will learn to control later.
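The contrast between comma-separated arguments and concatenation can be sketched as follows. The variable name message is made up for this example.

```python
x = 21.0

# Comma-separated arguments: print inserts the space for us,
# and x can stay a float.
print("x is", x)

# Concatenation: we must convert x with str() and supply the
# space ourselves, because + adds nothing extra.
message = "x is " + str(x)
print(message)
```

Both print calls display the same line, but only the second one builds a single string that could also be stored or reused.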

Just printing a value is fine for short programs, but as they get longer, it is often useful to have a short string describing what is being output. This reduces the ambiguity of what quantity is being displayed. Consider:

u = (12 * 34**5 - 678) / 9
v = (98 % 7 ) * 65 + 43 // 21
print("v =", v)
print("u =", u)

... which produces:

v = 2
u = 60580490.0

Note

There are a lot of fancier functionalities with printing and strings, which we will cover later on. For now, we just want to be able to print basic information pretty straightforwardly.

The arguments to print can be expressions that contain operations. Each expression is first evaluated and then just the result is printed. For example, this:

a = 10
b = 35
print("The difference b-a is:", b-a)

... outputs:

The difference b-a is: 25

Finally, we note that in moving from Python 2 to Python 3, the allowed syntax of printing changed. Python 2 did not require the parentheses in the print line (it was a print statement, not a print function). Since Python 3 enforces print as a function, the parentheses are now required.

Q: Let p = 6. Use a single print function to display its square, cube and square-root.

6.3. Comments

When programming, we are continually writing commands in Pythonese for the computer to interpret. While we might have a clear idea of what we want to calculate when we write the expression, we might forget what exactly that brilliant idea was later on. And while individual calculations might be fairly small, a program can have a lot of them combined in an infinite number of ways, and potentially become very complicated. This all makes it difficult to re-evaluate and check code, which is bad news for expanding functionality or fixing bugs.

Therefore, it is very useful to include comments in coding projects. Comments are text notes that are not evaluated by Python, but instead they exist purely to be read by humans. On any line, one starts a comment with the hash (or "pound") symbol #, and then any text to its right is concealed within the comment. Comments may start at any point within a line; any text to the left will still be evaluated.

For example, you can copy and paste the following to see that the text after # is not evaluated (which is good---try removing a # symbol and copy+pasting again, and you will surely get errors as the text gets interpreted as code):

  • # This is an example of using a comment.
    a = 10
    b = 35  # and I can put a comment here, too!
    print("The difference 'b-a' is:", b-a)
    
  • # print("I am invisible! (It's because I'm inside a comment.)")
    print("I am *not* inside a comment.  (I will be interpreted!)")
    
  • # This is a multiline comment, because I have ever-so-many things
    # to say about... what again?
    X = 10
    print("X^2 = ", X**2)
    
  • GRAVITY = 9.8   # units of m/s**2
    
    print("The numerical value of the acceleration")
    print("  due to gravity on the Earth's surface")
    print("  (in SI units) is approximately:", GRAVITY)
    

Q: What happens if a # appears inside of a string? Will everything to its right be commented out?

In summary:

  • Even though we write a block of code to calculate something specific today, we might forget what it was supposed to do tomorrow.

  • Comments help us remember those thoughts within the code.

  • Print functions can help guide our understanding of what is happening during code runtime. We can also use them to inform/warn of potential problems.

  • Strings are quite flexible, and particularly useful types for printing (and dealing with file input/output, as we will see later).

6.4. Practice

  1. To what type does each expression evaluate in the following lines:

    3+27
    3+27.5
    str(3+27.5)
    '''3+27.5'''
    
  2. In how many ways can you mark the start and end of a string? List them.

  3. Assign the following text to a variable and print it: Youssou N'Dour says, "Rokku Mi Rokka," doesn't he?

  4. What is the output of the following:

    1. # print(5*"hello")
      
    2. print(5*"#hello")
      
  5. If you copy+paste the following code block and evaluate it, how many values would be displayed?

    25 * 17.3
    x = 45 + 3i
    200 - (95 % 17) / 0.0001
    '''sawubona'''
    

    On the off chance that not every line is displayed, how could you display the results of each evaluated expression?

  6. If x = 5, what would the output of each of the following lines of code be, as they try to express the same statement?:

    print("I am x years older than my immediate junior brother")
    print("I am" + str(x) + "years older than my immediate junior brother")
    print("I am" + x + "years older than my immediate junior brother")
    

    None of these cases probably expresses the desired output, either for programmatic or grammatical reasons. Write a print() function to express this sentence in the most readable/logical way.

  7. If s = 'hat', what would the output of each of the following lines of code be?:

    print(s+s+s+s+s)
    print(5*s)
    print(s*5)
    print(2*s+s*3)
    
  8. Let W represent a line width (number of allowed characters), and S be a particular string. If we assume that S has fewer characters than the width W, how could you display S so that it was horizontally centered in the line (it could be off by 1 character, say, depending on the even/oddness of S and W)? That is, how would you include the correct number of spaces for a general S and W, so S is approximately in the middle of the line?

    For example, consider:

    W = 50
    S = "I am in the middle"
    

    Printing S in the middle of a 50 space line would look like (with a top row of '----|'*10 added to count 50 spaces and show subdivisions):

    ----|----|----|----|----|----|----|----|----|----|
                    I am in the middle