Introduction To Computer Science PDF
Document Details
Uploaded by FervidDune
ETH Zurich
2024
Manuela Fischer and Felix Friedrich
Tags
Summary
This document introduces the concept of dynamic memory and pointers in computer science. It uses a train example for illustration. It focuses on C++ code examples and discusses structs, pointers, and references.
Full Transcript
INTRODUCTION TO COMPUTER SCIENCE 252-0032, 252-0047, 252-0058 Document authors: Manuela Fischer and Felix Friedrich Department of Computer Science, ETH Zurich Fall 2024 1 Pointers and Dynamic Memory...
INTRODUCTION TO COMPUTER SCIENCE 252-0032, 252-0047, 252-0058 Document authors: Manuela Fischer and Felix Friedrich Department of Computer Science, ETH Zurich Fall 2024 1 Pointers and Dynamic Memory 176 Section 15 Pointers and Dynamic Memory In this section we introduce the concept of dynamic memory and pointers with a relatively simple toy example that might seem a bit artificial at first. Its use will become clear a bit later. Imagine you want to model a train comprising wagons. The goal is to link together an arbitrary number of wagons and model this in software. So, with what we have learned up this point. What can we do? Of course, we could use a vector of objects that model the wagons. However, we want to be a bit closer to reality and model the linking of the wagons a bit more literally. Moreover, it will turn out that vectors require dynamic memory behind the scenes and the point of this chapter is to learn about dynamic memory quite explicitly. So, let us follow a different attempt. How about modeling an object of type wag that stores the next wagon. Example 15.1 ((Non-working) first attempt) struct wagon { int id; wagon next; }; Trying to compile above example leads to the following error message field 'next' has incomplete type 'wagon' During compilation of struct wagon the compiler needs to have information about the mem- ory layout of all fields including that of wagon that is about to be defined. ??? id next Ultimately, a struct cannot have member variable of the same type (infinite recursion). A wagon would contain a wagon and that would contain a wagon and that would contain a wagon and that... So, how about using references instead? Knowing that usually store the addresses of the target object, we can try this: Pointers and Dynamic Memory 177 Example 15.2 (Another (not yet working) attempt) struct wagon { int id; wagon& next; wagon(int id) : id(id) {} }; Ok, this is better (accepted by the compiler) and leads to this kind of memory layout: id next However, when trying to create a wagon wagon w = wagon(1); we again get an error message uninitialized reference member in 'struct wagon&' References need to be initialized when declared. We should have known. The fact that references cannot be changed once they have been created renders them unusuable for what we wanted to do. We wanted to create a train and then append a wagon later. Pointers will come to our rescue. Subsection 15.1 Pointers Pointers, to be introduced next, are very similar to references. But, in constrast, they are allowed to point nowhere. wagon&: reference to a wagon wagon*: pointer to a wagon Pointers, moreover, can be modified after their creation! Given a type T, the construct T* defines a pointer type. An expression of type T* is called pointer to T. Pointers and Dynamic Memory 178 Example 15.4 (Declaring pointers) int* p =...; // pointer to an int std::string* q =...; // pointer to a std::string A pointer to T, T*, must actually point to a T. Example 15.5 (Pointer compatibility) int* p =...; std::string* q = p; // compiler error! Ok, now we know how to write a pointer type. What does it contain, what is the value of a pointer type? Her is the answer: The value of a pointer to T is the address of an object of type T. Example 15.6 (Value of a pointer) int (e.g. 5) addr addr p (e.g. 0x7ffd89d5f7cc) int* p =...; std::cout *p is of type T Note, furthermore, that T *a, *b is used in order to declare two pointers a and b to T The Null pointer A null-pointer is a special pointer value that signals that no object is pointed to. It is represented by the literal nullptr and is convertible to any pointer type T*. int* p = nullptr; The null-pointer cannot be dereferenced (undefined behavior!) Practically, dereferencing a null-pointer often leads to a runtime error. A null-pointer can be used to avoid undefined behaviour. int* p; // Accessing p is undefined behavior int* q = nullptr; // q explicitly points nowhere if (q == nullptr) { // which can be checked std::cout n notation very briefly introduced in Section 13.3: When calling r.numerator() on an object r of type rational, a pointer to rational, the address of r, is passed to the numerator function as pointer. this->n provide access to the field n behind the pointer this. Continuing the Wagon Example with Pointers Now we can continue with the wagon example and indeed chain wagons together as initially intended 1 2 3 We replace the references in our initial attempt by pointers to wagon. Example 15.10 (Wagons with pointers) struct wagon { int id; wagon* next; // initially no wagon is appended wagon(int id) : id(id), next(nullptr) {} }; Using the wagons with pointers, we can now indeed chain together a train by creating wagons first and then set next pointers as required: Pointers and Dynamic Memory 182 Example 15.11 (Producing a train) wagon w1 = wagon(1); wagon w2 = wagon(2); wagon w3 = wagon(3); w1.next = &w2; w2.next = &w3; std::cout next->id; // outputs 3 Functions with or without effects Like references, pointers can be used for functions with effect. Example 15.12 (Function with effect) void set_next(wagon* w1, wagon* w2) { w1->next = w2; } Many functions do not modify but only read the data. Again, we can use const for this. Example 15.13 (Read only function) void print(const wagon* w) { std::cout > n; assert(n >= 1); wagon first = wagon(1); wagon* last = &first; for (int i = 2; i next = new_wagon; last = new_wagon; } std::cout id; // outputs 2 std::cout next->id; // outputs 3 At this point, take your time and draw what happens here for, say, n = 4 in order to fully understand the significance of the new concept. In fact, pointers and dynamic memory lay the ground for being able to construct arbitrary data structures in memory, during runtime. 60 Every new must have a matching delete that releases the memory. We will cover this aspect a bit later. Dynamic Data Structures 185 Section 16 Dynamic Data Structures In this section we will discuss a first practically useful data structure, the linked list. As it will turn out now, the train example of Section 15 was designed to be very close in structure and therefore we are almost there already. Subsection 16.1 Linked Lists As a motivation for this chapter, consider the following problem: You have built a wall of lego bricks, like this: and now you want to get rid of the red brick. Otherwise you want to keep everyhting in order. So, the goal is this: The problem is obvious: you need to move all other bricks in order to close the gap. You have a similar situation, if you now want to insert a new black brick after the, say, purple brick. Again, you had to shift all bricks to the right. Now imagine the same kind of problem with a wall of thousands or millions of bricks. What a desaster. All that would have been so much simpler with a chain, where removing and inserting an element is easily possible without having to move too many things around. In Lego, I can build a chain as well: if I have a brick with a connector, Dynamic Data Structures 186 I can easily build a wall comprising such elements equipped with connectors, and, furthermore, I can easily open a wall at any position, and click in my connector element: A linked list is a data structure that supports such operations and is quite different from a vector. It has the following properties: It does not require contiguous area of memory. It features a simple and efficient deletion and insertion of elements. It does, however, not offer (computationally efficient) random access. Linked lists are used for the implementation of std::list, a container that features the just mentioned properties. A linked list is implemented in terms of elements that point to (are linked to) their successors, just like the train example from the previous section. 1 5 6 3 8 8 9 pointer Let us implement a linked-list. The main ingredient is a node for the list, we call it lnode. An lnode contains a slot for the user data (what we called element type for vectors) and a slot for the pointer to the next element: element (type struct lnode) 1 5 6 value (type int) next (type lnode*) We implement this using a struct as follows. struct lnode { int value; // user data Dynamic Data Structures 187 lnode* next; // next element pointer // Constructor accepting a value and a pointer to the next element lnode(int v, lnode* n): value(v), next(n) {} }; A linked list, basically, consists of a pointer to the first element. element (type struct lnode) 1 5 6 value (type int) next (type lnode*) We use a different class for representing the list, i.e. for wrapping the pointer to the first element and to offer all possible operations on a list. class our_list { lnode* head; // pointer to the first element of the list public: our_list(int size); void print()const; int size() const; void push_front(int element); void push_back(int element);... }; Note the encapsulation of the data into the class: the user does not need to know about the details of the implementation but only what it offers (the interface). We will see how to construct a list (add elements to the list) a bit later. Let us assume a list has been constructed already. How can we iterate over the list and output the value of each element? Output Here is a possible implementation of the print member function. // post: output element values of the linked list void our_list::print() const { for (lnode* n = head; n != nullptr; n = n->next){ std::cout value next; } return n->value; } What happens if i is too large? The for-loop does not terminate before n is the nullptr, yielding undefined behavior. As an exercise, you may want to change this function such that an assertion fails in such cases. Adding elements Ok, let us now build lists. Because we have a pointer to the first element of the list, it is particularly easy to add an element at the front of a list: void our_list::push_front(int e) { head = new lnode(e, head); } Time to see what exactly happens here. Assume that list.push_front(4) is called on the following list. head 1 5 6 The new expression first allocates memory for a new lnode. Then the lnode constructor is called, setting the value field of the node to 4 and the next field of the node to the content of the head-pointer, up to this point in time pointing to node with value 1. The result of this new expression is a pointer to the new element that points to the first element of the list. The result of new replaces the value of the head, yielding the following state. head 4 1 5 6 Dynamic Data Structures 189 Remark 16.1 (Caveat). Attention: if the new lnode were not allocated dynamically (with new), then it would be deleted as soon as push_front terminates resulting in undefined behavior. Constructor The constructor of our list can be implemented using push_front comfort- ably. our_list::our_list(int size) { head = nullptr; for (int i = 0; i < size; ++i){ push_front(0); } } This is how we could use our list: Example 16.1 (Using the list) our_list v = our_list(3); v.print(std::cout); // 0 0 0 Push-back The implementation of a function for appending elements at the end of a list is slightly more complicated and shows a few pitfalls that you should learn to avoid when implementing dynamic data structures. Let us start with a (wrong) implementation. Example 16.2 (A buggy implementation of push_back) void our_list::push_back(int e) { lnode* n = head; // list start for (; n->next != nullptr; n = n->next); // run to the end n->next = new lnode(e, nullptr); // append the new element } What is wrong about this implementation? (Hint: think about the case where the list is empty.) If the list is empty, then head == nullptr and therefore already the first access of n->next will fail (undefined behavior!). This is very typical for the implementation of dynamic data structures: take all possible cases into account when implementing. Think about invariants that hold (or do not hold). Here is a solution. Dynamic Data Structures 190 Example 16.3 (Correct implementation of push_back) void our_list::push_back(int e) { if (head == nullptr) { // Case 1: empty list head = new lnode(e, nullptr); } else { // Case 2: non-empty list (code as before) lnode* n = head; for (; n->next != nullptr; n = n->next); n->next = new lnode(e, nullptr); } } For those who are in love with pointers: here is an even shorted version that works with double-pointers. It is shorter but also less explicit and thus harder to understand. You can understand it, but you do not have to! Example 16.4 (Alternative compact, correct implementation of push_back) void our_list::push_back(int e) { lnode** target = &(head); for (; *target != nullptr; target = &((*target)->next)); *target = new lnode(e, nullptr); } As a short digression, let us discuss the following question: what is better, code of Exam- ple 16.3 or Example 16.4? You might argue that the more compact example is better code because it is more fancy and looks amazing. While we agree that the compact example is intriguing, we would also like to give a warning: never write code that others (or the later you) are unlikely to comprehend, unless you have very good reasons. Writing good programs is also an art of communication between developers. Trying to be clever is very often particularly unclever in programming. Example 16.4 is shown here for educational reasons and because it provides an interesting showcase of double pointers but it is not an example of good code. A more efficient version of push_back If you have a list of thousands or millions of elements, imagine what happens when push_back is executed a large number of times. Even without knowledge about software efficiency and how to measure it61 , you should be able to see that this is an operation that is expensive (in terms of number of instructions to be executed). There is a more efficient but slightly more complex approach: keep a second pointer tail in the list data structure pointing to the last element. This pointer can be used to append at the end of a list directly. 61 Something that will be covered in courses on data structures and algorithms Dynamic Data Structures 191 1 5 6 head tail Several corner cases, such as the list with only one element or the empty list need to be considered in some algorithms. Here is another example of code that can be improved by changing the data structure. Consider the following function to return the size of the list by traversing it completely: int our_list::size() const { int c = 0; for (lnode* n = head; n != nullptr; n = n->next){ ++c; } return c; } A more efficient implementation of this function can be implemented by maintaining the size of the list as another member variable (e.g. count) of our_list. Then, count must be updated each time an operations such as push_front affects the vector’s size. Depending on the use cases, different adaptations of the data structures might be beneficial. Variations of linked lists, like double linked lists or lists with sentinels address different weaknesses (and might have their own drawbacks). Subsection 16.2 List Iterator With the implementation of a list in the previous paragraphs, we have implemented a new kind of container. But in order to use a container in the flexible way we have learned in Section 14.2, we need to offer an iterator. Let us implement an iterator for our_list. We need an our_list-specific iterator with at least the following functionality: access the current element with operator*, advance iterator the iterator with operator++ and check that the end of iteration is reached with operator!= or operator==. Moreover, we need member functions of our_list, namely begin() and end(), to get an iterator to the beginning and past the end, respectively. Because the iterator we want to offer belongs to our list, we can embed its declaration into our class our_list. So, we define an iterator class as inner class of our_list.62 Instances of our iterator are now of type our_list::iterator. 62 Embedding the class definition of the iterator is a design choice we deliberately made to indicate that this iterator cannot live without its container. But, technically, it is not necessary to do it like this. Dynamic Data Structures 192 Example 16.6 (iterator as inner class of our_list) class our_list { public:... // definition of the iterator as (public) inner class class iterator {... };... // member functions to get iterators to the beginnig and end of data iterator begin(); iterator end();... } As described above, here is what the iterator itself needs to offer: Example 16.7 (our_list::iterator functionality) class iterator { lnode* node; // state: the currently pointed node public: iterator(lnode* n); // constructor iterator& operator++(); // advance int& operator*() const; // element access (dereference) bool operator!=(const iterator& other) const; // compare }; Let us now implement the functionality we promised. The constructor initializes the iterator to point to a node of the linked list. Example 16.8 (Constructor) our_list::iterator::iterator(lnode* n): node(n) {} The (pre-)increment operator advances the iterator to the next element (if available) Example 16.9 (Increment) // Pre-increment // pre: node != nullptr // post: iterator points to the next element our_list::iterator& our_list::iterator::operator++() { assert(node != nullptr); node = node->next; // advance Dynamic Data Structures 193 return *this; // behavior of pre-increment } The dereference operator allows access to the element behind the iterator. Note that in this case it indeed implies a dereferencing of a pointer (plus an access of the field value). Example 16.10 (Dereferencing) int& our_list::iterator::operator*() const { return node->value; } Comparing two iterators means comparing their targets (where they point to): Example 16.11 (Comparison) int& our_list::iterator::operator!=(const our_list::iterator& other) const { return node != other.node; } What is missing, finally, is the begin and end member functions of the list to issue iterators referring to the begin and past the end of the list data structure. We implement these two functions now: Example 16.12 (begin) our_list::iterator our_list::begin() { // iterator pointing to the first element of the list: return our_list::iterator(head); } Example 16.13 (end) our_list::iterator our_list::begin() { // iterator pointing past the last element of the list return our_list::iterator(nullptr); } As an example how to apply the new iterator, we implement a print function for our list: Example 16.14 (Print) void print(our_list l){ for (our_list::iterator it = l.begin(); it != l.end(); ++it){ std::cout