Abstract Data Types - IDATA 2302 - Algorithms and Data Structures PDF
Document Details
Uploaded by ElitePrudence9662
NTNU
2024
Franck C
Tags
Summary
This document is a lecture on abstract data types, part of IDATA 2302 at NTNU. It discusses various aspects of data types, including symbolic representation and programming interfaces, and different types like primitive and compound types. It explores how these concepts are used in algorithms.
Full Transcript
12/5/24, 2:06 PM 1. Abstract Data Types — IDATA 2302 — Algorithms and Data Structures Abstract Data Types Contents 1.1. Data Types 1.2. Abstract Data Types 1.3. Conclusi...
12/5/24, 2:06 PM 1. Abstract Data Types — IDATA 2302 — Algorithms and Data Structures Abstract Data Types Contents 1.1. Data Types 1.2. Abstract Data Types 1.3. Conclusion Version: 0.4.24 Lecture: Lecture 2.1 (slides) Objectives: Understand what is an abstract data type Concepts: Data, Computation, Computational Problem, Algorithm, Data Structure So far we have used only numbers, but in real-life, we need a little more. We want text, images, colors, sounds, dates, 2D points, users records, etc. In this lecture, we will look at data types, what they are and how they can help use design and reason on algorithms. Say we are task with converting an color image to grayscale. This sounds pretty remote from what our RAM can do? How can we write an algorithm that does that? How to represent more than just numbers with machine’s symbols? How to move beyond int, float, characters and define our own custom composite data types How to leverage data types in algorithms 1.1. Data Types So far, we have manipulated only integers, because our RAM use Arabic digits. What if we need the machine to manipulate other type of data, say an image for instance? Well, from the machine standpoint, there is nothing to do: Machines only crunch symbols. At the programming language level however, we need new data types. As shown on:numref:sequences/adt/data_type, a data type defines two things: A symbolic representation and a programming interface. The symbolic representation decides how we use the machine’s symbols to represent the data of interest (the image in our previous example). The programming interface includes all the procedures we use to manipulate this new data type. For images these could be resizing, cropping, blurring, switching to gray scales, etc. Important A data type is a set of sequences of machine symbols that adhere to a specific representation and which we manipulate using a specific programming interface. Data types come up handy for programmers because they make programs easier to understand. Consider the following Java program that decides whether the user answered “yes” or “no”. The keywords “boolean” and “String” gives hints about the intention. boolean isYes(String anwser) { return Character.toUpperCase(answer.charAt(0)) == 'Y'; } Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/sequences/adt.html 1/6 12/5/24, 2:06 PM 1. Abstract Data Types — IDATA 2302 — Algorithms and Data Structures C, Pascal, Java and other statically-typed languages require programmers to mark variables and parameters with the appropriate data type. This way, the compiler can detect earlier invalid expressions such as var x = 25 / 'a' , where we attempt to divide a number by a character. Typing can however become a hassle when they are less obvious, and that is why dynamically-typed languages like Python, Ruby or JavaScript do not enforce type-correctness and let typing mistakes surface as runtime errors. 1.1.1. Symbolic Data Representation The first step towards a new data type is to find a way to represent data with the machine’s symbols. Once we agreed on a representation format, we can extend the I/O device to translate back and forth between data and symbols. Fig. 1.5 shows how pthe representation format decides both encoding and decoding. 1.1.1.1. Encoding Data into Symbols Encodings schemes define how we represent a given piece of data using the machine’s symbols (see Fig. 1.5). There are many encodings you may have heard of, such as ASCII or Unicode for characters, sign-and-magnitude or 2s-complement for integers, IEEE 754 for floating point values, PNG for images, etc. Take ASCII for example, the American standard for information interchange. ASCII was developed during the 60ies to represent text for computers and telecommunications. With ASCII, each character occupies 7 bits, so the ASCII format only accounts for 128 = 2 7 character symbols. That is enough anyway to capture the Latin alphabet, common punctuation symbols, a few math symbols and more. The character ’A’ (uppercase) is represented by the number 65 or (or 41 in hexadecimal), ’B’ by 66, ’C’ by 67, etc. Lowercase letters come a bit further down with ’a’ encoded with 97, ’b’ with 98, etc. 1.1.1.2. Decoding Data from Symbols The exact opposite of encoding—when we read data out of symbols—is known as decoding. Returning to ASCII, decoding the four bytes 4A-6F-68-6E (i.e., 74, 111, 104, 110 as decimal numbers) would be decoded as the text “John”, as shown on Fig. 1.5. The key point about decoding is that we need to know the underlying representation in advance. Given a sequence of symbols, one cannot say what encoding it comes from. Take again the four bytes 4A-6F-68-6E or example, they can represent: The natural number as a 32-bit integer value ; The real number as a 32-bit floating point value ; The text “John” in ASCII ; Some sort of greenish color as an RGBA color ; etc. Data Types in assembly code The machine itself remains completely oblivious of such encoding/decoding: It only transforms symbols. In our simplified RAM architecture, encoding and decoding would take place in the I/O device: It would convert specific representations and present data accordingly to the user. Important A data structure is the representation (i.e., the memory layout) chosen for a particular data type. Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/sequences/adt.html 2/6 12/5/24, 2:06 PM 1. Abstract Data Types — IDATA 2302 — Algorithms and Data Structures 1.1.2. Programming Interface The second thing that characterizes a data type is its programming interface: Procedures to manipulate the data. Consider for the example the integer type, which comes predefined in most programming languages (e.g., int in Java or C/C++). The integer type comes with predefined operations that mirror the arithmetic and logical operations that exists in mathematics (addition, subtraction, division, modulo, comparisons, etc.), as well as conversions to other data types such as floating point numbers or string. Some of these operations are directly supported by the underlying machine, such as the addition in our RAM, others require dedicated procedures. Character is another example. In Java, the associated programming interface includes procedures such as isLetter , isDigit , isWhiteSpace , isUpperCase , isLowerCase , toLowerCase , and many more. These procedures defines what one can do with a character, and in turn, what a character is. As shown on Fig. 1.4, the programming interface is all that matters to programmers, as one could possibly change internal representation as long as the interface remains the same. Think of programs that manipulate simple characters: The actual representation (ASCII, Unicode, etc.) may be irrelevant. Programming interface is what matters when it comes to algorithms and for this course in particular. 1.1.3. Primitive Types High-level programming languages such as C, Java, Python natively supports most common data types. The compilers (or interpreter) hides the underlying representations and expose their programming interface through keywords, operators or standard libraries. These primitive data types include Boolean values (true / false), which come along with conjunction (and), disjunction (or), and negation (not) operators. Integer values, which support both arithmetic operations as well as comparisons. Floating point values, which also support both arithmetic operations and comparisons Characters, often encoded either in ASCII or in Unicode. Bytes (8 bits), correspond to a raw sequence of symbols 1.1.4. Compound Types Primitive data types are programming languages give us to play with, but we very often need to compose them in order to build new “compound” data types that capture domain concepts, such as color, 2D point, dates, time, user record, etc. Programming languages provides three main ways to compose data types 1, namely structures, arrays, and variants. 1.1.4.1. Records (also known as structures or tuples) includes multiple entries called fields, each with its own data type. Records pp resemble tuples in mathematics which results from the Cartesian product over sets, such as (x, y) ∈ T1 × T2. The key point of records is to access fields using their name. In Pascal for example, one could describe a player in game using the following record type type Date = record name: string; score: integer; isCPU: boolean; end; Fig. 1.6 illustrate how the compiler may lay out the record fields in memory. The details vary from compiler to compiler, but the principle remains the same: Provided the record starts at the base address b, the address of the k-th field is given by: Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/sequences/adt.html 3/6 12/5/24, 2:06 PM 1. Abstract Data Types — IDATA 2302 — Algorithms and Data Structures k address(f k ) = b + ∑ size(T i ) (1.1) i=1 The compiler takes case of this and provides us with direct access to each fields by name in constant time: Offset to all fields are precomputed at compile time. base address Size of Size of Size of Fig. 1.6 A possible memory layout of a record of type T 1 × T2 × T3 1.1.4.2. Arrays represents a sequence of items, all from the same data type. Formally, we can think of arrays as function a that maps integer to specific item such a : N → T. The key point of arrays it to access items using their position in the sequence. Lecture 3.2 will dive into arrays. In C for example, one could represent calendar dates using an array: typedef int date; Fig. 1.7 shows how an array containing 3 items of a type T could be laid out in memory. The array is allocated from a base address (b). Since arrays contains items of a single data type T (whose size is known), we can deduce the address of k-th item using: address(k) = b + k ∗ size(T ) (1.2) This is actually the same as Equation (1.1), but for a single data type. Here as well, the compiler takes care of this and provides us access by index in constant time. This makes array the go-to data structure for random access: When little is known on which item will be accessed most often. base address X X Y Y Z Z Size of allocated capacity Fig. 1.7 A common memory layout for an array of type T 6 1.1.4.3. Variants (also known as unions) represent a single field which can belong to multiple data types: It can be decoded using different Skip to main format. Formally, a variant captures the union of multiple data content types T 1 ∪ T2. We access data through the name we give to https://franckc.folk.ntnu.no/idata2302/2024/sequences/adt.html 4/6 12/5/24, 2:06 PM 1. Abstract Data Types — IDATA 2302 — Algorithms and Data Structures specific interpretation. For example, in C, we could write: union Number { int asInteger; float asFloat; }; Now, we can build whatever data type we please and we can combine them using arrays, records or variants. 1.2. Abstract Data Types Data types are what we need when we create a new programming language, but from a pure algorithmic perspective, this is not what we need. The internal representation is irrelevant. Ideally, we do not want to write algorithms that only apply to ASCII characters. We want algorithms that apply to characters in general, whether they are encoded in ASCII, in Unicode. From an algorithmic perspective, the representation is irrelevant, and what we should focus on is the programming interface. Unfortunately, the programming interface in itself is not enough to define what any character data type ought to offer. The procedures relates to one another in very specific ways. For instance, if I convert a character to upper case and then convert it back to lower case, I would get back to the same original character. So to be valid, a character data type has to offer a set procedures that behave properly. An abstract data type (ADT) 2 defines the programmer’s expectations over a programming interface, irrespective of its representation. Representation is an implementation detail, and by hiding it from our algorithms, they become more generally applicable. 1.2.1. Defining an ADT An abstract data type defines three elements: a set of domains, a set of operations and a set of axioms. Formally, an ADT defines an algebra over the possible values of the data types. We will illustrate each on a Boolean ADT. 1.2.1.1. Domains The domains describe what the ADT is about, including all the data types manipulated by the programming interface. The term “domain” comes from mathematics, where it stands for the set of values for which a given function is defined. For our Boolean ADT, the domain boil down to the set of boolean values B. 1.2.1.2. Operations The operations are the procedures that we can use to manipulate our ADT. They are often categorized into constructor, queries and commands. Constructors instantiate the ADT from something else, queries convert our ADT into something else, and commands modify it. The operations supported by the Boolean ADT includes: Constructors true : ∅ → B creates T f alse : ∅ → B creates F Queries: None Commands: and : B × B → B represents the conjunction of two Boolean values. represents the disjunction of two Boolean values. Skip to main content or : B × B → B https://franckc.folk.ntnu.no/idata2302/2024/sequences/adt.html 5/6 12/5/24, 2:06 PM 1. Abstract Data Types — IDATA 2302 — Algorithms and Data Structures not : B → B represents the negation. 1.2.1.3. Axioms The axioms are the relationships between the procedures that the ADT supports. In the case of the Boolean ADT, the axioms are: 1. ∀x ∈ B, and(x, f alse()) = f alse() 2. ∀x ∈ B, and(x, true()) = x 3. not(true()) = f alse() 4. not(f alse()) = true() 5. ∀x, y 2 ∈ B , or(x, y) = not(and(not(x), not(y))) Important An abstract data type (ADT) captures the inherent relationships between the procedures that form the programming interface. It includes a domain, a set of operations, and a set of axioms that constrain the behavior of the procedures. 1.2.2. ADT and Correctness We already touched upon correctness of algorithms in Lecture 1.3 and ADT is a very convenient tool for that. Since ADT for the specification of a data type, we can use them in two situations: Thinking about the correctness of an algorithm that uses our ADT. For example, if we have a program that contains the following boolean expression or(true(), x), we can use the axioms of our Boolean ADT as follows: ∀x ∈ B, or(true(), x) = not(and(not(true()), not(x)) (Axiom 5) = not(and(f alse(), not(x)) (Axiom 3) = not(f alse()) (Axiom 1) = true() (Axiom 4) ADTs make explicit what we assume to be true when we design algorithms and thus greatly simplify reasoning about correctness. Thinking about the correctness of an algorithm that implements the programming interface of our ADT. In this case, our axioms must be established: either proven or tested. Thinking in terms of ADT is one possible way to identify relevant test cases. Thinking about the correctness of an implementation of our ADT. 1.3. Conclusion We saw what is a data type, and how we can define new data types either primitive data types that encode specific type of data into machine symbols, or composite data types that recombines existing ones using records, arrays or variants. We also saw how to abstract away the underlying machine representation in order to define abstract data types. In the rest of this course, we will explore well-known ADTs for sequence, sets, trees, etc. We will look at alternative “representations” and see how and why they yield different efficiency trade-offs. https://franckc.folk.ntnu.no/idata2302/2024/sequences/adt.html 6/6 12/5/24, 2:05 PM 1. Computations — IDATA 2302 — Algorithms and Data Structures Computations Contents 1.1. Computation 1.2. Algorithms 1.3. Data Structures 1.4. How to Describe an Algorithm? 1.5. Conclusions Version: 0.4.24 Lecture: Lecture 1.1 (slides) Objectives: Understand core Computer Science concepts Concepts: Data, Computation, Computational Problem, Algorithm, Data Structure Welcome! I am very happy to welcome you in this course about Algorithms & Data Structures. I prepared this course as well as I could with the time, resources and energy I had, and I do hope you will find it relevant. Many thanks to the students who took this course in the past: Their feedback has been very valuable. Let me know what you think! Despite its official title, this course is first and foremost about solving “programming” problems. This is the most valuable tool a programmer has because it applies across technologies, languages and frameworks. The things I learned 15 years ago still apply to my daily engineering work. This is rock-solid. But it is also a competence that is hard to get. I would like to kick-off this course by presenting three core concepts: Computation, algorithms, and data structures. 1.1. Computation Computer Science (CS) is all about computers—as the name says. The term “computer” stands however for anything that “computes”, be it a person, a machine, or something else. What CS is really about is computation. But what is that? A computation is a transformation of data: It consumes some data and produces some data. That leads us directly to next Skip to main content question: What is data? https://franckc.folk.ntnu.no/idata2302/2024/foundations/computation.html#computation 1/8 12/5/24, 2:05 PM 1. Computations — IDATA 2302 — Algorithms and Data Structures Data is an overloaded term. Intuitively, it stands for more or less anything: Texts, numbers, pictures, audio recording, etc. In CS however, data is anything that we can write down using a finite set of symbols. For example, the word “computation” is a sequence of letters, each drawn from the Latin alphabet. Numbers shows up as combinations of Arabic digits. Your smart phone computes using “binary” data, encoded using binary alphabet (0 or 1) …These are various forms of data. Important Data represents anything that can be encoded using a finite number of symbols 1.1.1. Symbols vs. Concepts Symbols are things that stands for something else, often a more “abtsract” thing. For example, in the Western civilization, the symbol ☠ stands for danger or death. Numbers are another example of “concepts” that we can write using a variety of symbols. We can write the concept of “two tens” as a combination of Arabic digits “20”, as the combination of Roman digits “XX”, or even with “14” in hexadecimal. We can also use Latin alphabet like in “twenty”, etc. All those represent the same number. CS is only concerned with symbols and not with their interpretation, which is peculiar to human cognition. Important A computation transforms of a sequence of symbols into another. 1.1.2. Computational Problems We use computations to solve problems, but only some problems can be solved by a computation! These are so-called computational problems, and are the focus of this course. A computational problem defines what are the valid outputs for any given input given. Such a problem sets the requirements for a computation. Here is an example: add : N × N ↦ N add (x, y) ↦ x + y The problem specified by Equation [eq:addition] is to find the number that is the sum of the two given inputs x and y. Note that the ’+’ symbol I used above does not tell us how to Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/foundations/computation.html#computation 2/8 12/5/24, 2:05 PM 1. Computations — IDATA 2302 — Algorithms and Data Structures actually carry out an addition. It simply denotes the concept of “addition” and I assume that everyone recognizes it and know what it means (associativity, neutral element, etc.). I bet you already know a procedure to find the sum of two natural numbers. In general, problems describe relations between concepts: natural numbers, addition, etc. They say nothing about the symbols to use. There are many types of computational problems. In decision problems for example, the output is a Boolean value (yes or no), such as testing whether a given number is a prime number. We will also meet search problems, where we have to find a particular value in a larger set, such as finding all the pairs of numbers whose sum is 10. Another type of problem is counting, where we try to figure out the number of solutions to a another problem. Finally there is also optimization problems, where we search for the solution that maximizes a specific criterion. All these problems can be described as mathematical functions. Important A computational problem maps input data to output data. Mathematical functions clearly capture how these inputs map to the appropriate output. 1.2. Algorithms How do we solve a computational problem? We need a procedure, that is a “recipe” that we can follow blindly—just like a machine—to get to the result. These recipes or procedures are algorithms: Sequences of instructions that solve a computational problem. 4 columns carry 1 1 1 x= 4 1 7 9 y= 9 6 7 result = 5 1 4 6 Fig. 1.1 Adding two natural numbers Returning to the addition of two natural numbers, I have learned in primary school an algorithm to do that. Fig. 1.1 shows the setup I would use to add to 967. Here are the steps I would follow: Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/foundations/computation.html#computation 3/8 12/5/24, 2:05 PM 1. Computations — IDATA 2302 — Algorithms and Data Structures 1. Write down the two given numbers in a grid and align their digits by significance: The least significant digit on the rightmost column. Keep a free row on top for possible carry-overs and another row below for the result. Keep an extra column on the left for a possible final carry-over. 2. Put your finger under the rightmost column. 3. If there is no digit to read, stop here. 4. Otherwise, read the digits in this column. 5. Add these digits to get their sum and the associated carry-over. 6. Write down this sum into the bottom cell of the current column. 7. Move your finger to the next column on the left. 8. If there is a carry-over, write it down in first cell. 9. Return to Step 3. This is our first algorithm: A recipe to add natural numbers! The notion of algorithm is however not so well defined. I am not aware of a single formal definition, upon which everyone agrees. In this course, I will reuse the definition given by D. Knuth in TAOCP 1 where he specifies the four following properties: An algorithm has inputs and outputs. It consumes some data and produces some results. Our addition takes two natural numbers and outputs their sum. An algorithm is finite: It must terminate at some point and cannot have an infinite number of steps. Our addition of two numbers stops when we have added all pairs of digits. An algorithm is well-defined, and each step is non-ambiguous. In our addition, each step is about reading, adding, comparing or writing numbers. Children do not need to know how to add, they can use an addition table that gives both the result digit and the carry over as shown on Fig. 1.2. An algorithm is effective and can be carried out by either a machine or human with pen and paper in a finite amount of time. Each step must be feasible. As for the addition, children add numbers this way on a daily basis, in a few minutes. Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/foundations/computation.html#computation 4/8 12/5/24, 2:05 PM 1. Computations — IDATA 2302 — Algorithms and Data Structures 0 1 2... 9 0 00 01 02... 09 1 01 02 03... 10 2 02 03 04... 11............... 9 09 10 11... 18 Fig. 1.2 The addition table: Each cell contains two digits: the carry over and the sum. Do not confuse algorithm and computation. As for the addition, the algorithm is the list of steps to follow whereas the computation is what happen when a computer (a machine or a child) goes through a particular addition. 1.3. Data Structures An algorithm is a sequence of steps that manipulates data to solve a problem. It necessarily produces and transforms data and needs a place to store it—a sort of “scratch pad” if you will. This scratch pad is the data structure: How we organize the data our algorithm manipulates. In our addition example (see Fig. 1.1) we use a “grid” that stores all the data, including the two given numbers (the inputs), the result (output) and the carryovers (intermediate results). This grid has four rows and one more column than the longest given numbers has digits. Many data structure are possible for a given algorithm, and an appropriate data structure enables efficient algorithms. We will discuss various schemes such as lists, trees, graphs, etc. Each has its strengths and its weaknesses. As for our addition, we could have written it down as a list of symbols: 4179 + 967 = 5146 But that would have been harder. Primary school teachers use this grid because it makes things easier for children: They proceed—mechanically—by columns. Only when we become more fluent do we get rid of the grid. The very same applies to algorithms: Appropriate data structure is the key to their efficiency. Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/foundations/computation.html#computation 5/8 12/5/24, 2:05 PM 1. Computations — IDATA 2302 — Algorithms and Data Structures Important An algorithm is a finite sequence of non-ambiguous instructions, which processes its inputs to produce the solution of a computational problem. To work efficiently, algorithms store their data into dedicated data structures. 1.4. How to Describe an Algorithm? Once we have solved a computational problem, we have to communicate our solution: Explain it to our colleagues or simply to “program” a machine to do it. So, what is the best way to describe an algorithm? If the computer is a person our bullet list of plain English instructions may very well do the job. If the computer is a machine however, there we will have a hard time to get the machine understand all the nuances of our natural languages. Let us review a few commonly used approaches: Natural languages, flowcharts, pseudo-code and programs. 1.4.1. Using Natural Language The simplest way to describe an algorithm is to use plain English, though this often lead to ambiguous text, which rules out the use machines. This is what we used in the previous section for our addition algorithm. 1.4.2. Using Flowcharts Another human-friendly way is to use a flowchart as shown on Fig. 1.3. In a flowchart, the steps of an algorithm are shown as boxes connected by arrows. The flowchart syntax distinguishes between various type of steps such as processes, document, decisions, references, etc using different shapes. Figure 1 only use “processes” (shown as rectangles) and decisions (shown as diamond). A flowchart makes the control structure (loops and decisions) very explicit, but the graphical syntax is quite space consuming and may not scale to complex algorithms. Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/foundations/computation.html#computation 6/8 12/5/24, 2:05 PM 1. Computations — IDATA 2302 — Algorithms and Data Structures start Setup the grid Place finger under the rightmost column Move finger to [no] Write the carry Return the last the next column Digit in column? end in the first cell row on the left [yes] Write the sum Add the three Read the digits in the last row digits Fig. 1.3 The grade-school addition algorithm, portrayed as a flowchart. 1.4.3. Using Pseudo-code A third human-friendly solution is to use pseudo-code. The idea is to combine control structures common in most imperative programming languages (loops and conditional) with plain English or mathematical notation in order to express succinctly the main idea of an algorithm. For our addition algorithm we could write something like: Input x: Sequence of digits Input: y: Sequence of digits Output: z: Sequence of digits such as z = x + y 1. Setup x and y into a grid ; 2. Place your finger under the right most column. 3. while there are digits in the column do: a. Read these digits; b. Add them up to get the sum and the carry; c. Write down this sum in the bottom cell; d. Move your finger to the next column on the left; e. Write down this carry in the top cell; 3. return the last row of the grid; Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/foundations/computation.html#computation 7/8 12/5/24, 2:05 PM 1. Computations — IDATA 2302 — Algorithms and Data Structures 1.4.4. Using a Program All these representations help communicate algorithms with people, so they all rely on natural language, which may be ambiguous. We will see in the next Lecture how we can convert pseudo code into machine code, that code that machine can understand. Important There is an direct relationship between the actions we stipulate in an algorithm and the capabilities of the computer we use to execute it. 1.5. Conclusions Hopefully, you have a better grasp at what this is about. Let us get started! There is a lot or ground to cover. Please reach out if you have any questions or if you find mistakes in the slides, the lectures notes or the lab sessions. https://franckc.folk.ntnu.no/idata2302/2024/foundations/computation.html#computation 8/8 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures Graph Theory Contents 1.1. Graphs in Real-life 1.2. Graphs Species 1.3. The Graph Jargon 1.4. Alternative Representations 1.5. Graph Problems 1.6. Graphs as Data Structures 1.7. Appendix Version: 0.4.24 Lecture: Lecture 6.1 (slides) Objectives: Understand what a graph is, what data structures exists to represent it, and when to choose these data structures. Concepts: Graph, path, circuit, edge list, adjacency list, adjacency matrix, depth-first traversal, breadth-first traversal, We started by looking at sequences, implemented either using arrays or linked lists. In a sequence, each item has at most one predecessor and one successor. We then moved on to trees, where each item can have many successors, but without forming cycles. Graphs generalize these structures: Items can have many successors, many predecessors, cycles, etc. There is no constraint. Fig. 1.28 below illustrates this evolution. Sequences Trees Graphs Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 1/20 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures Fig. 1.28 From sequences to graphs, growing data structures in complexity Intuitively, a graph is just a collection of “things” linked together in a specific way. In Computer Science (and in Discrete Mathematics), we talk about vertices (singular vertex) connected by edges. Let see what we do with these. See also There is no need to become an expert in graph theory. Knowing the basic terminology has however been very useful for me. Here is a reference I used when I need to check something: 📕 Wilson, R. J. (1996). Introduction to graph theory. 4th edition, Addison-Wesley. Chapters 1 and 2 are already more than what you need to know for this course. 1.1. Graphs in Real-life Graph is such a versatile concept. Whatever application domain you work on, there is a graph laying low. The most obvious example—I believe—comes from transportations. I am sure you have seen buses or subways maps that show how the different lines connect the different stations. The distances are very often inaccurate, but the ordering of stations is enough to find our way. See for instance the Oslo subway map aside on Fig. 1.29. A common problem is thus to find a route between two stations. Graphs also plays a key role in social media such as Oslo Subway Map Facebook, LinkedIn, Snapchat and the likes. The underlying data model is a simple graph where The subway map of Oslo city is vertices represent persons, and edges indicate people a nice example of graph! As know one another. The main problems is social shown on Fig. 1.29, although the network are how many people will my “posts” reach. distance are inaccurate, what Who has the most followers, etc? matters for the traveler is only the order of stations. One Internet is another example of “giant” graph. Web common problem that arises is sites expose documents that point to other “route planning”: What is the documents using hyperlinks. Here the vertices are the best route to go from A to B. documents and the edges these hyperlinks. Search Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 2/20 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures engines classify web sites according to their visibility, an idea that relates to the probability of a user reaching a web site out of luck. PageRank is one famous such algorithm developed by Google. Most scientific disciplines use graphs. Electric circuits, which electrical engineers use are also graphs in disguise. Vertices represents electric components such as resistors, lamps, relays, etc., whereas edges Fig. 1.29 The subway map of represent the wiring. Chemists also rely on graphs to Oslo. Vertices are subway describe chemical compounds: Vertices represents stations and edges are possible atoms, and edges, bonds. In Biology, graphs describe connections. food-webs, where vertices are animal species and edges denote prey-predator relationship. Public health experts uses graphs to understand how virus propagates through our societies. Vertices are people and edges show the contagion. The same is used by social scientists who study the propagation of ideas. The list goes on and on… Computer Science is not left out. The flowcharts we have used to visualize algorithms are graphs: Nodes (of various shapes) represent blocks of instructions whereas edges represents the flow of control. In object-oriented analysis, class diagrams (say in UML) are also graphs, where nodes represents classes and edges captures their relationships. Entity-relationships models we use in database are also graphs. You get the idea. 1.2. Graphs Species A graph is a very abstract concept: A set of vertices connected by edges. In practice, there are many variations around this theme. Its is important to choose the one that best fit the problem at hand. Let me present the two most common ones, which we will use: Directed and weighted graphs. 1.2.1. Simple Graphs The basic form of graph, the simple graph, is a set of vertices and a set of associations between them. Formally, a simple graph G is a structure G = (V , E) where V is the set of vertices and E is a set of unordered pairs of vertices,Skip suchtothat mainEcontent ⊆ { {x, y} | x ∈ V ∧ y ∈ V}. https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 3/20 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures Consider the “friendship” graph shown below by Fig. 1.30, which portrays a fragment of social network. The persons connected by lines know each other. e1 John Lisa e2 e3 e7 Mary Frank Thomas e4 e5 e6 Peter Olive Denis e8 Erik Fig. 1.30 A graph connecting people. Edges show who knows who. Following the definitions above, we get: The set of vertices contains all the persons, that is V = {D, E, F , J , L, M , O, P , T }. The set of edges includes 8 unordered pairs E = {{D, F }, {D, O}, {E, O}, {F , L}, {F , T }, {J , L}, {J , M }, {M , P }, {M , O}}. 1.2.2. Directed Graphs Sometimes, the relationship between vertices is not symmetrical. In the “friendship” graph above, the “knows” relationship is symmetrical, because we assume that if John knows Lisa, then Lisa knows John too. Contrast this with the food web shown below on Fig. 1.31, where the relationship “preying on” is directed (i.e., not symmetrical). Hawks preys on rabbits, but rabbits do not eat hawks. Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 4/20 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures Hawk Snake Rabit Mouse Flower Grass Fig. 1.31 A simple “food web” where vertices represents species and arrows represents who eats what. We call such graphs directed graphs, by opposition to simple graphs (aka undirected graphs). Formally, a directed graph G closely resembles a simple graph: It is also a structure G = (V , E) , but E is here a set of ordered pairs such that E ⊆ { (x, y) | (x, y) ∈ V 2 }. Returning to the food web example from Fig. 1.31, the vertices would be V = {F , G, H , M , R, S} and the edges would be: E = {(H , M ), (H , R), (H , S), (S, R), (S, M ), (M , F ), (M , G), (R, F ), (R, G)} 1.2.3. Weighted Graphs Sometimes it is necessary to capture a little more about edges and a straightforward approach is to equip them with a number, which represents a distance, a cost, a likelihood, a capacity, or any other quantity relevant for the problem at hand. This yields a weighted graph. For example, transportation systems often care about the “distance” between two locations (i.e., vertices), as shown on Fig. 1.32 below. Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 5/20 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures Trondheim 220 km 154 km 82 km Røros Molde Ålesund 260 km 425 km 290 km Bergen Oppdahl Hamar 290 km 400 km Oslo Fig. 1.32 A few Norwegian cities and the road distances that separate them Formally, a weighted graph is a structure G = (V , E, ϕ) where V is the set of vertices, E, the set of edges (directed or not), and ϕ a function that maps every edge to its weight, such that ϕ : E → R. Given this definition, the vertex set for the Norwegian cities (see Fig. 1.32) is V = {B, H , M , Op, Os, T , Å}, whereas the weight function ϕ would be: ϕ = {{T , M } ↦ 220, {T , R} ↦ 260, {M , Å} ↦ 82, { Å, B} ↦ 425, { Å, Op} ↦ 290, {R, H } ↦ 260, {Op, H } ↦ 290, {Op, Os} ↦ 400} 1.2.4. Other Graph Species There are a few other species of graph, which we won’t use in this course. I list them below for the sake of completeness. Multi-graphs are graphs where to vertices can be connected by more than a single edge. The above definition G = (V , E) falls short in that case, and other mathematical structures are needed, such asSkip theto main content incidence matrix for instance. https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 6/20 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures Hyper-graphs are graphs where an edge can connect more than two vertices. Warning The algorithms and data structures we will study in this course should be adjusted for such more complex species of graph. 1.3. The Graph Jargon Besides vertex and edges, there are various terms that carry a specific meaning when it comes to graph theory. Let see the main ones so we can understand what we are talking about.. walk A sequence of edges path A walk where every vertex appears only once. In Fig. 1.30 the sequence “Denis, Olive, Mary” is a path. cycle A walk that starts where its ends. In Fig. 1.30 the sequence “Denis, Olive, Mary, John, Lisa, Frank, Denis” is a cycle loop An edge whose two members are the same vertex. There is no such a loop on Fig. 1.30. adjacent vertices Two vertices are adjacent if there exists an edge that connects them. Denis is only adjacent to Frank and Olive. incident edge (to a vertex) An edge is incident to a vertex it connects to or from this vertex. degree (of a vertex v) The number of edges that are incident to Vertex v. A vertex of degree zero is called an isolated vertex, and a vertex of degree 1 is an end-vertex. For directed graphs, the in- degree and out-degree are the numbers of edges to and from, respectively. connected graph Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 7/20 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures A graph where every two vertices are connected by a path. subgraph A subgraph of a graph G = (V , E) , is another graph G whose vertices and edges are ′ all included in V and E, respectively. The vertex set of G has to contain the subset of ′ vertices incident to the selected edges. component A “maximum” connected subgraph, that is connected subgraph, to which one cannot add any vertex without loosing the “connectivity” property complete graph A graph where every pair of vertex is adjacent regular graph A graph where every vertex has the same degree tree An undirected graph that has no cycle and where every pair of vertex is connected by exactly one path. forest An undirected graph that has no cycle and where every pair of vertex is connected by at most one path. In other words, a forest is a graph where every component is a tree. 1.4. Alternative Representations So far we have defined a graph by its list of edges, that is a structure G = (V , E) , where V is a set of vertices and E is the set of edges. At times, other “representations” come up handy, namely the adjacency matrix and the incidence matrix. 1.4.1. Adjacency Matrix The adjacency matrix A is an |V | × |V | matrix where the cell c contains the number of ij edges between vertices i and j. The adjacency matrix is, by definition, a square matrix. Consider again the friendship graph shown above on Fig. 1.30. Provided we number the vertices V = {D, E, F , J , L, M , O, P , T } in alphabetical order, the adjacency matrix would be: Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 8/20 12/5/24, 2:09 PM A = ⎢⎥ ⎡ ⎣ 0 0 0 1 0 0 0 1 0 1.4.2. Incidence Matrix cell c i,j I = ⎡ ⎣ 1.5. Graph Problems 0 −1 1 0 0 0 0 0 0 0 0 1 0 0 −1 0 0 1 0 Here is a few well known examples: Hamiltonian Path 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0 −1 0 0 1 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures 0 0 1 1 0 0 0 0 0 1 0 0 −1 0 0 0 0 0 1 0 0 1 1 0 1 1 0 0 0 1 0 0 0 0 1 0 −1 0 0 0 0 0 0 0 1 0 0 0 i 0 0 0 1 0 −1 0 0 1 0 0 0 0 0 0 ⎤ ⎦ When edges have their own identity, we can use an incidence matrix, which maps every vertex onto its incident edges. The incidence matrix is therefore an |V | × |E| matrix I , such that every cell c i,j is 1 if an only if Vertex v is incident to Edge e. If the graph is directed, the This is one way to capture multi-graphs and/ hyper-graphs for instance. Look again at Fig. 1.31, the dummy food web. Provided that vertices and edges are ordered alphabetically, the incidence matrix I would be: 0 0 0 0 0 0 0 1 −1 1 0 0 0 −1 0 There are many graph problems that arises from the graph structure. Many of them are 0 1 0 0 −1 0 ⎤ ⎦ j can contains -1 if the vertex is the source, 1 if the vertex is the target and 0 otherwise. “difficult”, that is, for we only know “inefficient algorithms” (i.e., of exponential complexity). Find a Hamiltonian path in a graph, that is a path that visits every vertex only once. https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html Skip to main content 9/20 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures Shortest path Find the shortest path between two vertices in a weighted graphs. We will look at three algorithms to tackle this, namely Dijkstra, Bellman-Ford, and Floyd-Warshall. Traveling Salesperson Problem (TSP) Given a weighted graph, find the shortest cycle that visits all the vertices. Graph Coloring Find a mapping from the set of vertices to a given set of colors, such that there is no two adjacent vertex that map to the same color. Minimum Spanning Tree (MST) Given a weighted graph, find the tree that covers all the vertices and has the minimum total weight. Cycle Detection Find cycles in a given graph. Vertex cover Given a graph G=(V,E), find a subset of vertices V’ such that every edges is incident to an vertex in V’. Independent set Given a graph G=(V,E), find a subset of vertices V’ such that no vertex in V’ is adjacent to a vertex in V. 1.6. Graphs as Data Structures Besides being a mathematical concept, a graph is also a data structure. Intuitively, the queries one runs against a graph include: 1.6.1. Graph ADT There is no agreement on what the graph abstract data type should be. There are indeed many operations that one can perform on a graph, finding cycles, components, unreachable vertices, etc. Graphs are highly application dependent. I put below what seems to me like a common minimal set of operations: empty Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 10/20 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures Create a new empty graph, that is an graph with no vertex or edge. add vertex Add a new vertex to the given graph. What identifies the vertex, as well as the information its carries depend on the problem and or the selected internal representation. remove vertex Delete a specific vertex from the given graph. Either this raises an error if the selected vertex still has incoming or outgoing edges, or it removes these edges as well. add edge Create a new edge between the two given vertices. What identifies this edge as well as the information it carries depend on the problem and the chosen internal representation. remove edge Remove the selected edge. edges from Returns the list of edges whose source is the given vertex edges to Returns the list of edges whose target is the given vertex is_adjacent True if an only if there exist an edge between the two given vertices 1.6.2. Graph Representations There are many ways to implement the graph ADT above, from a simple list of edges to more complicated structures using hashing. As always, the one to use depends on the problem at hand. Let see the most common ones. In the general case, there is some information attached to either the vertex, the edges, or to both. The internal representation we choose impact however the implementation (and the efficiency) of the operations. Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 11/20 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures Operations Edge List Adjacency List Adjacency Matrix add vertex O(1) O(1) remove vertex O(V) O(V) add edge O(1) O(1) remove edge O(E) O(V+E) edges from O(E) O(deg(v)) edges to O(E) O(deg(v)) is adjacent O(E) O(1) 1.6.2.1. Edge List The simplest solution is simply to store a sequence of edges, where each edge includes a source vertex, a target vertex and a label. With this design a vertex is identified by its index in the underlying list of vertices, and an edge by its index in the underlying list of edges. e Edges (sequence of records) source target payload Edge e v2 v1 (record) v1 v2 Vertices (sequence of records) Vertex vk whatever the (record) problem requires Fig. 1.33 Overview of an edge list implementation. Vertices and edges are records, stored into Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 12/20 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures separate sequences. While simple, the main downside of this scheme is that most of operations on graphs rely on linear search to find either the vertex or the edge of interest. Listing 1.12 illustrates this issue on the edgesFrom method, where we have to iterate through all the edges to find those whose source vertex match the given one. Listing 1.12 Skeleton to implement a graph using an edge list in Java public class EdgeList { class Edge { private int source; private int target; private E label; } private List vertices; private List edges; public List edgesFrom(int source) { final var result = new ArrayList(); for(var edge: this.edges) { if (edge.source == source) { result.add(edge); } } return result; } // Etc. } 1.6.2.2. Adjacency List Another possible representation is the adjacency list: Each vertex gets its own list of incident edges. Often, store a hash table to retrieve specific vertex faster. When edges have no label/payload, each vertex is associated with the list of adjacent vertices (so the name “adjacency list”). The benefit of the this adjacency list is that is speed up the retrieval of incoming and outgoing edges. These are often very useful operation, for example, to traverse the graph (see Graph Traversals) Listing 1.13 Skeleton to implement a graph using an adjacency list in Java Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 13/20 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures public class AdjacencyList { class Edge { int source; int target; E payload; // etc. } class Vertex { V payload; List incidentEdges; // etc. } private List vertices; private List edges; public List edgesFrom(int source) { final var result = new ArrayList(); for(var edge: this.vertices[source].incidentEdges) { if (edge.source == source) { result.add(edge); } } return result; } // etc. } 1.6.2.3. Adjacency Matrix Note Graphs can quickly become very large. Take social media for instance with millions of users, or package dependencies in Python or JavaScript. Such large graphs requires dedicated storage solution, more like databases. Relational databases such as MySQL are however inappropriate and dedicated solution exists such as such as Neo4j for instance. 1.6.3. Graph Traversals As we have seen for trees, one may need to navigate through the structure of the graph, for instance to find whether there is a path between two vertices. Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 14/20 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures We can adapt the two traversal strategies we have studied for trees, namely, the breadth-first and the depth-first traversal. The only difference is that graphs may contain cycles, and that we remember which vertices we have already “visited” to avoid “looping” forever in a cycle. As we progress through a graph we need to keep track of two things: The vertices that we have discovered but not yet processed. We will name these pending vertices The vertices that we have already visited (i.e., processed). We will name these the visited vertices. 1.6.3.1. Depth-first Traversal Intuitively, a depth-first traversal never turns back, except once it reaches a dead-end, either because the vertex it has reached has no outgoing edge, or because it has already “visited” all the surrounding vertices. The depth-first traversal would proceed as follows: 1. Add the entry vertex to the sequence of pending vertices 2. While there are pending vertices, 1. Take out the last pending vertex added 2. If that vertex is not in the set of visited ones 1. Add it to the set of visited vertices 2. Add all vertices that can be reached from there Listing 1.14 below shows a Java implementation of this algorithm. Note the use of a stack (see Line 3) which forces the extraction of the last inserted pending vertex. Listing 1.14 Java implementation of a depth-first traversal 1 public void depthFirstTraversal(Graph graph, Vertex entry) { 2 final var visited = new HashSet(); 3 final var pending = new Stack(); 4 pending.push(entry); 5 while (!pending.isEmpty()) { 6 var current = pending.pop(); 7 if (!visited.contains(current)) { 8 System.out.println(current); 9 visited.add(current); 10 for (var eachEdge: graph.edgesFrom(current)) { 11 pending.push(eachEdge.target) 12 } 13 } 14 } 15 } Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 15/20 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures Consider for example Fig. 1.34 below, which shows a “friendship” graph. Vertex represent persons, whereas an edge between two persons indicates they know each other. Edges are bidirectional and can be navigated both ways. Starting from “Denis”, this depth-first traversal will reach the vertices in the following order: Denis, Frank, Lisa, John, Mary, Olive, Erik, Peter, and Thomas. Fig. 1.34 Traversing a graph “depth-first”, starting from the vertex “Denis” (assuming neighbors are visited in alphabetical order). This yields the following order: Denis, Frank, Lisa, John, Mary, Olive, Erik, Peter, Thomas. Let see how the visited and pending variables evolve in this example. Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 16/20 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures Step Current Pending visited See lines 0 D () {} Line 6 1 D () {D} Line 9 2 D (O, F ) {D} Line 10-12 3 F (O) {D} Line 6 4 F (O) {D, F } Line 9 5 F (O, T , L, D) {D, F } Line 10-12 6 D (O, T , L) {D, F } Line 6 7 L (O, T ) {D, F } Line 6 8 L (O, T ) {D, F , L} Line 9 9 L (O, T , J , F ) {D, F , L} Line 10-12 10 F (O, T , J ) {D, F , L} Line 6 11 J (O, T ) {D, F , L} Line 6 12 J (O, T ) {D, F , J , L} Line 9 13 J (O, T , M , L) {D, F , J , L} Line 10-12 14 L (O, T , M ) {D, F , J , L} Line 6 15 M (O, T ) {D, F , J , L} Line 6 16 M (O, T ) {D, F , J , L, M } Line 9 17 M (O, T , P , O, J ) {D, F , J , L, M } Line 10-12 18 J (O, T , P , O) {D, F , J , L, M } Line 6 19 O (O, T , P ) {D, F , J , L, M } Line 6 20 O (O, T , P ) {D, F , J , L, M , O} Line 9 21 O (O, T , P , M , E, D) {D, F , J , L, M , O} Line 10-12 22 D (O, T , P , M , E) Skip to main content {D, F , J , L, M , O} Line 6 https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 17/20 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures Step Current Pending visited See lines 23 E (O, T , P , M ) {D, F , J , L, M , O} Line 6 24 E (O, T , P , M ) {D, E, F , J , L, M , O} Line 9 25 E (O, T , P , M , O) {D, E, F , J , L, M , O} Line 10-12 26 O (O, T , P , M ) {D, E, F , J , L, M , O} Line 6 27 M (O, T , P ) {D, E, F , J , L, M , O} Line 6 28 P (O, T ) {D, E, F , J , L, M , O} Line 6 29 P (O, T ) {D, E, F , J , L, M , O, P } Line 9 30 P (O, T , M ) {D, E, F , J , L, M , O, P } Line 10-12 31 M (O, T ) {D, E, F , J , L, M , O, P } Line 6 32 T (O) {D, E, F , J , L, M , O, P } Line 6 33 T (O) {D, E, F , J , L, M , O, P , T } Line 9 34 T (O, F ) {D, E, F , J , L, M , O, P , T } Line 10-12 35 F (O) {D, E, F , J , L, M , O, P , T } Line 6 36 O () {D, E, F , J , L, M , O, P , T } Line 6 1.6.3.2. Breadth-first Traversal The breadth-first traversal closely resembles the depth-first traversal, but instead of pushing on forward until a dead-end, it systematically explores all children, levels by levels. First, the adjacent vertices, then the vertices two edges away, then those three edges away, etc. It could be summarized as follows: 1. Add the entry vertex to the pending vertices 2. While there are pending vertices, 1. Pick the first pending vertex (by order of insertion) 2. If that vertex has not already been visited 1. Mark it as visited Skip to 2. Add to pending vertices, all main content vertices that can be reached from that vertex https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 18/20 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures The only difference with the depth-first traversal is the element we pick from the list of pending vertices: The breadth-first traversal uses the first inserted, whereas the depth-first traversal uses the last inserted. Listing 1.15 below illustrates how a breadth-first traversal could look like in Java. We use a Queue to ensure we always pick the first pending vertex, by insertion order. Listing 1.15 Java implementation of a breadth-first traversal 1 public void breadthFirstTraversal(Graph graph, Vertex entry) { 2 final var visited = new HashSet(); 3 final var pending = new Queue(); 4 pending.push(entry); 5 while (!pending.isEmpty()) { 6 var current = pending.remove(0); 7 if (!visited.contains(current)) { 8 System.out.println(current); 9 visited.add(current); 10 for (var eachEdge: graph.edgesFrom(current)) { 11 pending.push(eachEdge.target) 12 } 13 } 14 } 15 } Consider again the “friendship” graph introduced in above in Fig. 1.34 and reproduced below on Fig. 1.35. A breadth-first traversal starting from “Denis” proceeds by “levels”. First it looks at all the adjacent vertices, namely Frank and Olive. Then, it looks at vertices that it can reach from those, that is Lisa, Thomas, Mary, and Olive. And then it continues exploring nodes one more edge away, namely John and Peter. Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 19/20 12/5/24, 2:09 PM 1. Graph Theory — IDATA 2302 — Algorithms and Data Structures 8 John Lisa 4 5 Frank Thomas Mary 6 10 2 Olive Denis 1 Peter 3 7 Start Erik Fig. 1.35 Traversing a graph “breadth-first”. It enumerates vertices by “level”, as follows: Denis, Frank, Olive, Lisa, Thomas, Erik, Mary, John, Peter. 1.7. Appendix 1.7.1. Edge List Pattern in Ruby 1.7.2. Adjacency List in JavaScript 1.7.3. Adjacency Matrix in C https://franckc.folk.ntnu.no/idata2302/2024/graphs/intro/index.html 20/20 12/5/24, 2:07 PM 1. Procedure Calls — IDATA 2302 — Algorithms and Data Structures Procedure Calls Contents 1.1. Why Procedure Calls? 1.2. How Does “Calling” Work? 1.3. The Call Stack 1.4. How Much Does “Calling” Cost? Version: 0.4.24 Lectures: Lecture 3.1 (slides) Objectives: Understand the mechanisms of procedure calls and their runtime and memory cost Concepts: call-stack, call-frame, static and dynamic memory allocation So far we have used procedures here and there informally, to break down larger algorithms/programs into more manageable pieces. Now it is time to look more carefully to understand what happens at the machine-level and how much resources it takes. 1.1. Why Procedure Calls? Procedure call is an early construct that helps us progress from assembly code towards higher-level languages. They bring numerous benefits, including: Better code reuse and duplication removal. If the same fragment of code appears in multiple place, we can extract it into a separate procedure and get rid of the duplicated code. That helps with bug fixing in particular and avoid having to roll the same fix in multiple places. Better Code readability. Provided the procedures’ name are meaningful, the code reads better, and the intention (the algorithm) becomes more visible. Better Separation of concerns. Procedures are handy to separate different concerns, parameter validation, error handling, etc. That makes the things clearer. Remember for instance the selection sort, which I reproduce below in Listing 1.1. Skip to main content https://franckc.folk.ntnu.no/idata2302/2024/recursion/procedure_calls.html 1/8 12/5/24, 2:07 PM 1. Procedure Calls — IDATA 2302 — Algorithms and Data Structures Listing 1.1 A Ruby implementation of the selection sort, where find_minimum and swap are separate procedures 1 def selection_sort(sequence) 2 for index in 0.. sequence.length-2 3 minimum = find_minimum(sequence, index) 4 swap(sequence, index, minimum) 5 end 6 end 7 8 def find_minimum(sequence, start) 9 minimum = start 10 for index in start+1.. sequence.length-1 11 if sequence[index] < sequence[minimum] 12 minimum = index 13 end 14 end 15 return minimum 16 end 17 18 def swap(sequence, left, right) 19 tmp = sequence[right] 20 sequence[right] = sequence[left] 21 sequence[left] = tmp 22 end We extracted two procedures, namely find_minimum and swap , which makes the selection_sort algorithm much easier to understand—at least for me. But what about efficiency? Do p