{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "# Intro to sequences: strings, lists, and tuples" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "A *sequence type* in Python is an *ordered* collection of objects. The term 'ordered' here means that we can retrieve the first object in the sequence, and the second object, and so on.\n", "\n", "There are three main sequence types in Python: strings, lists, and tuples.\n", "\n", "- A *string* is an immutable sequence of characters\n", "- A *list* is a mutable sequence of objects (of any type)\n", "- A *tuple* is an immutable sequence of objects (of any type)\n", "\n", "The term *immutable* means that a value in the sequence **cannot** be changed directly; while *mutable* means that a value can be changed. We will see examples of this below.\n", "\n", "## Finding the length of a sequence\n", "The *len* function can be used to find the length of any sequence." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "word = 'hello'\n", "len(word)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "## Sequence indexing\n", "\n", "Each item in a sequence has a numbered index, which begins at 0. For example, the string \"hello\" has the following indices:\n", "\n", "\n", "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
| Index | 0 | 1 | 2 | 3 | 4 |\n", "| --- | --- | --- | --- | --- | --- |\n", "| Character | h | e | l | l | o |
\n", "\n", "\n", "You can access the item at index $i$ by typing `sequence[i]`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "word[0] # returns the element at index 0 (i.e., the first element)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "word[1] # returns the element at index 1 (i.e., the second element)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "A negative index, with the value $-i$, corresponds to the $i^{th}$ element from the end." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "word[-1] # returns the element at index -1 (i.e., the last character)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "Using an invalid index will result in an error." 
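, "

The valid indices for a sequence of length n run from -n to n - 1. As a minimal sketch (the variable name `greeting` is a hypothetical example, not part of this notebook), the code below checks both ends of that range and catches the `IndexError` that an out-of-range index raises:

```python
greeting = 'hello'  # hypothetical example string
n = len(greeting)

first = greeting[0]       # same element as greeting[-n]
last = greeting[n - 1]    # same element as greeting[-1]
print(first, last)

try:
    greeting[n]           # n is one past the last valid index
except IndexError as err:
    print('IndexError:', err)
```
"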
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "word[10]" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "## Referencing consecutive elements of a sequence using *slicing*\n", "*Slicing* can be used to get consecutive elements (a slice) of a sequence.\n", "\n", "Slices are specified through the code\n", "``` Python\n", "sequence[start:stop:step]\n", "```\n", "\n", "where\n", "- *start* is the index where the slice begins (defaults to 0, the first element of the sequence)\n", "- *stop* is the index where the slice ends; the element at *stop* itself is excluded, so with a step of one the last element returned is at index *stop - 1* (defaults to `len(sequence)`, which is the end of the sequence)\n", "- *step* determines the step size (or stride) between indices (defaults to one)\n", "\n", "In other words, `sequence[a:b]` returns all elements from index _a_ up to but not including index *b*.\n", "\n", "It may seem strange that elements up to but *not including* index _b_ are returned, but with this convention the length of the slice is simply *b - a* (as long as _a_ and _b_ are valid indices with _a_ no greater than _b_)."
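, "

A small sketch of the slicing rules above, using a hypothetical string `letters` that is not part of this notebook. It shows the default values, a step other than one, and the length property:

```python
letters = 'abcdefgh'  # hypothetical example string

print(letters[2:5])    # 'cde'  (indices 2, 3, 4)
print(letters[:3])     # 'abc'  (start defaults to 0)
print(letters[5:])     # 'fgh'  (stop defaults to len(letters))
print(letters[::2])    # 'aceg' (every second character)
print(letters[1:7:3])  # 'be'   (indices 1 and 4)

a, b = 2, 5
print(len(letters[a:b]) == b - a)  # True
```
"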
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "word[0:2] # get the first 2 characters (from index 0 up to but not including index 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the default value of the starting index is 0, we can also specify the following:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "word[:2] # get the first 2 characters (from default index 0 up to but not including index 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want to get the 3 characters beginning with the 2nd character, we can use:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "i = 1\n", "word[i:i+3]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "scrolled": true }, "outputs": [], "source": [ "# we can use negative index values, for example to get the last 2 characters\n", "word[-2:]" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "### Exercise\n", "\n", "In the string below, use slicing with the appropriate indices to extract the word 'is'." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "sentence = 'Today is a good day'\n", "sentence" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "The index of the 'd' in *day* is 16. Use a slice with this index to extract the word *day*. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sentence[16]" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "Now use a slice with a negative index to extract the word day." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lists ##\n", "A *list* in Python is a sequence of objects (technically, it is a sequence of references to each element -- more on this below). We have already seen how to create lists, by including a comma separated list of elements in square brackets. Because lists are sequences, the same concepts regarding their length, indices, and slicing that apply for strings also applies." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "numbers = [7,10,13,21]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "# how many numbers are in the list?\n", "len(numbers)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "# what are the first 2 elements?\n", "numbers[:2]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "# what is the last number?\n", "numbers[-1]" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "In addition, there are various *methods* that can be used on lists. 
For a list *l*,\n", "- `l.append(x)` adds the element *x* to the end of the list\n", "- `l.pop(i)` removes and returns the element at index *i*; if _i_ is not specified, the last element will be removed and returned\n", "- `l.remove(x)` removes the first element equal to *x*\n", "\n", "If *list1* and _list2_ are both lists, then `list1 + list2` returns a new list containing the elements of list1 followed by the elements of list2; neither original list is modified.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "x = [1,2,3]\n", "x.append(4)\n", "x" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "## Strings are immutable, while lists are not\n", "If a sequence is *immutable* then you cannot (directly) change any of its elements. Strings are immutable; trying to change an element will result in an error." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "s = 'hello'\n", "s[0] = 'H'" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "run_control": { "frozen": true } }, "source": [ "Lists are mutable, so individual elements can be changed." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "# create a list and then change the first element\n", "l = [1,2,3,4]\n", "l[0] = 7\n", "l" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "Although strings are immutable, you may create a new string and assign (bind) it to a previously used variable. While this may appear to change the value of a string, it technically creates a new object (which is stored in a different location in memory)."
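, "

Because a string cannot be modified in place, a changed version is usually built as a new string from slices of the old one. A minimal sketch (the names `original` and `capitalized` are hypothetical, chosen only for illustration):

```python
original = 'hello'
capitalized = 'H' + original[1:]  # builds a new string; original is untouched

print(original)                 # hello
print(capitalized)              # Hello
print(original is capitalized)  # False: two distinct objects
```
"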
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "# assign the value 'string' to 's'\n", "s = 'string'\n", "print(\"The id of the string 's' is:\", id(s))\n", "\n", "# assign the value 'strings' to 's'; this does not change the string, but rather creates a new object\n", "s = 'strings'\n", "print(\"The id of the string 's' is:\", id(s))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "run_control": { "frozen": true } }, "source": [ "## What is a list (technical answer)\n", "\n", "What happens when you have a list and assign it to another variable?\n", "\n", "```python\n", "list1 = [1,2,3,4]\n", "list2 = list1\n", "```\n", "Because the value of a list is a sequence of references to each of its elements, assignment of the form `list2 = list1` does not create a new list; it makes `list2` refer to the same list object as `list1`. In other words, both names refer to the same list in memory! This can have unintended consequences, as seen in the code below. We will also visualize this code using the Python Tutor at http://www.pythontutor.com/." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "list1 = [1,2,3,4]\n", "list2 = list1\n", "\n", "print('list1 = ', list1)\n", "print('list2 = ', list2)\n", "print()\n", "print('changing the first element of list1 changes the first element of list2!')\n", "\n", "list1[0] = 99\n", "print('list1 = ', list1)\n", "print('list2 = ', list2)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "run_control": { "frozen": true } }, "source": [ "## How to copy a list\n", "\n", "Because of the issue explained above, we must not use simple assignment when we want to copy the elements of one list to another. 
Instead we use the *list.copy* method:\n", "\n", "```python\n", "list2 = list1.copy()\n", "```\n", "\n", "or equivalently,\n", "```python\n", "list2 = list1[:]\n", "```\n", "\n", "Technically, the operations above make *shallow* copies, where a new list object is created and then populated with references to the same element objects as the original list. Shallow copies will be sufficient for our purposes. However, note that if one of the list elements is another list (or any other mutable object), then both lists will still refer to that same object. If this is a concern, then *deep* copies, which recursively copy each element of the list, should be used (https://www.geeksforgeeks.org/copy-python-deep-copy-shallow-copy/).\n", "\n", "The code below illustrates how to assign a shallow copy of list1 to list2." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "list1 = [1,2,3,4]\n", "list2 = list1[:]\n", "\n", "print('list1 = ', list1)\n", "print('list2 = ', list2)\n", "print()\n", "print('following a shallow copy, changing the first element of list1 does not change the first element of list2!')\n", "\n", "list1[0] = 99\n", "print('list1 = ', list1)\n", "print('list2 = ', list2)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "run_control": { "frozen": true } }, "source": [ "## Split and join methods\n", "\n", "If *s* is a string, then\n", "\n", "```python\n", "s.split(sep)\n", "```\n", "\n", "splits *s* into multiple strings at each occurrence of the delimiter _sep_, and returns a _list_ of the resulting strings."
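, "

As a brief sketch of how the choice of delimiter affects the result (the strings `record` and `spaced` are hypothetical examples, not part of this notebook), compare splitting on an explicit separator with calling `split` without arguments:

```python
record = 'Ann,Lee,42'
fields = record.split(',')   # split at each comma
print(fields)                # ['Ann', 'Lee', '42']

spaced = 'how  are you'      # note the two spaces after 'how'
print(spaced.split(' '))     # ['how', '', 'are', 'you']
print(spaced.split())        # ['how', 'are', 'you']
```

With an explicit single-space separator, consecutive spaces produce an empty string in the result; with no argument, a run of whitespace is treated as one delimiter.
"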
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "sentence = 'how are you today'\n", "sentence.split(' are ') # returns strings before and after ' are '" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "words = sentence.split() # if the separator is not specified, then the default delimiter is any run of whitespace characters\n", "words" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "run_control": { "frozen": true } }, "source": [ "The string *join* method can be used to combine the elements of a list of strings *l* into a single string, where consecutive list elements are separated by _s_:\n", "\n", "```python\n", "s.join(l)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "' '.join(words) # create a string where each word in the 'words' list is separated by a space" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "'-'.join(words) # create a string where each word in the 'words' list is separated by a dash" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "**Exercise:** Use Python to output the last word of the sentence." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "sentence = 'how are you today'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "## Tuples are like lists but are immutable ##\n", "A *tuple* is a sequence that is similar to a list but is immutable. 
A tuple is specified by including a comma-separated list of elements in parentheses. The above notes regarding length, indices, and slicing also apply. In general, *lists* are usually used to store similar values where either the number of values or individual values might change; *tuples* are used to store structured data where the order of values has meaning, but different values may represent different things." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "# example of a tuple storing (x,y) values\n", "p = (1,2)\n", "p" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "# get the 'y' value (i.e., the second element with index 1)\n", "p[1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "# tuples are immutable, so we get an error if we try to change an element\n", "p[0] = 3" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "# another example: using a tuple to store (name, age) values\n", "person = ('Bob', 20)\n", "print(person[0], 'is', person[1], 'years old.')" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "## Named tuples\n", "\n", "A *named tuple* is a special tuple whose elements (attributes) can be referred to by name. 
You first define the named tuple type using the following syntax, where the attribute names are given as strings:\n", "\n", "```python\n", "TupleObject = namedtuple('typename', ['attribute1', 'attribute2', ...])\n", "```\n", "Then create a named tuple by passing a value for each attribute:\n", "\n", "```python\n", "t = TupleObject(value1, value2, ...)\n", "```\n", "\n", "We can now access an attribute using the syntax\n", "\n", "```python\n", "t.attribute1\n", "```\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "from collections import namedtuple\n", "Person = namedtuple('person', ['first','last','age']) # Define the named tuple type\n", "\n", "fred = Person('Fred', 'Jones', 53) # Use the named tuple to describe a person\n", "\n", "# print out the named tuple\n", "print('fred = ', fred)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "# use the dot ('.') operator to access attributes of the tuple\n", "print(fred.first, fred.last, 'is', fred.age, 'years old.')" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "## Getting help in Python\n", "\n", "Python has built-in *help* that documents how to use functions or methods. The *help* function has the form `help(function)` or `help(object.method)`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "help(print)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "# get help on string 'split' method. 
Note since 'split' must be called from a string object (which has type 'str'), \n", "# we use 'str.split' in the 'help' function call\n", "help(str.split)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "# alternatively, if a string exists we can use that string rather than the generic 'str'\n", "s = 'how are you?'\n", "help(s.split)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 2 }