Python code for Artificial Intelligence: Foundations of Computational Agents

David L. Poole and Alan K. Mackworth

Version 0.9.13 of November 3, 2024.

https://aipython.org    https://artint.info

©David L Poole and Alan K Mackworth 2017-2024. All code is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. See: https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en

This document and all the code can be downloaded from https://artint.info/AIPython/ or from https://aipython.org

The authors and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or the documentation contained in this book. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.

Contents

1 Python for Artificial Intelligence
  1.1 Why Python?
  1.2 Getting Python
  1.3 Running Python
  1.4 Pitfalls
  1.5 Features of Python
    1.5.1 f-strings
    1.5.2 Lists, Tuples, Sets, Dictionaries and Comprehensions
    1.5.3 Functions as first-class objects
    1.5.4 Generators
  1.6 Useful Libraries
    1.6.1 Timing Code
    1.6.2 Plotting: Matplotlib
  1.7 Utilities
    1.7.1 Display
    1.7.2 Argmax
    1.7.3 Probability
  1.8 Testing Code

2 Agent Architectures and Hierarchical Control
  2.1 Representing Agents and Environments
  2.2 Paper buying agent and environment
    2.2.1 The Environment
    2.2.2 The Agent
    2.2.3 Plotting
  2.3 Hierarchical Controller
    2.3.1 Environment
    2.3.2 Body
    2.3.3 Middle Layer
    2.3.4 Top Layer
    2.3.5 Plotting

3 Searching for Solutions
  3.1 Representing Search Problems
    3.1.1 Explicit Representation of Search Graph
    3.1.2 Paths
    3.1.3 Example Search Problems
  3.2 Generic Searcher and Variants
    3.2.1 Searcher
    3.2.2 GUI for Tracing Search
    3.2.3 Frontier as a Priority Queue
    3.2.4 A∗ Search
    3.2.5 Multiple Path Pruning
  3.3 Branch-and-bound Search

4 Reasoning with Constraints
  4.1 Constraint Satisfaction Problems
    4.1.1 Variables
    4.1.2 Constraints
    4.1.3 CSPs
    4.1.4 Examples
  4.2 A Simple Depth-first Solver
  4.3 Converting CSPs to Search Problems
  4.4 Consistency Algorithms
    4.4.1 Direct Implementation of Domain Splitting
    4.4.2 Consistency GUI
    4.4.3 Domain Splitting as an interface to graph searching
  4.5 Solving CSPs using Stochastic Local Search
    4.5.1 Any-conflict
    4.5.2 Two-Stage Choice
    4.5.3 Updatable Priority Queues
    4.5.4 Plotting Run-Time Distributions
    4.5.5 Testing
  4.6 Discrete Optimization
    4.6.1 Branch-and-bound Search

5 Propositions and Inference
  5.1 Representing Knowledge Bases
  5.2 Bottom-up Proofs (with askables)
  5.3 Top-down Proofs (with askables)
  5.4 Debugging and Explanation
  5.5 Assumables
  5.6 Negation-as-failure

6 Deterministic Planning
  6.1 Representing Actions and Planning Problems
    6.1.1 Robot Delivery Domain
    6.1.2 Blocks World
  6.2 Forward Planning
    6.2.1 Defining Heuristics for a Planner
  6.3 Regression Planning
    6.3.1 Defining Heuristics for a Regression Planner
  6.4 Planning as a CSP
  6.5 Partial-Order Planning

7 Supervised Machine Learning
  7.1 Representations of Data and Predictions
    7.1.1 Creating Boolean Conditions from Features
    7.1.2 Evaluating Predictions
    7.1.3 Creating Test and Training Sets
    7.1.4 Importing Data From File
    7.1.5 Augmented Features
  7.2 Generic Learner Interface
  7.3 Learning With No Input Features
    7.3.1 Evaluation
  7.4 Decision Tree Learning
  7.5 Cross Validation and Parameter Tuning
  7.6 Linear Regression and Classification
  7.7 Boosting
    7.7.1 Gradient Tree Boosting

8 Neural Networks and Deep Learning
  8.1 Layers
    8.1.1 Linear Layer
    8.1.2 ReLU Layer
    8.1.3 Sigmoid Layer
  8.2 Feedforward Networks
  8.3 Improved Optimization
    8.3.1 Momentum
    8.3.2 RMS-Prop
  8.4 Dropout
  8.5 Examples

9 Reasoning with Uncertainty
  9.1 Representing Probabilistic Models
  9.2 Representing Factors
  9.3 Conditional Probability Distributions
    9.3.1 Logistic Regression
    9.3.2 Noisy-or
    9.3.3 Tabular Factors and Prob
    9.3.4 Decision Tree Representations of Factors
  9.4 Graphical Models
    9.4.1 Showing Belief Networks
    9.4.2 Example Belief Networks
  9.5 Inference Methods
    9.5.1 Showing Posterior Distributions
  9.6 Naive Search
  9.7 Recursive Conditioning
  9.8 Variable Elimination
  9.9 Stochastic Simulation
    9.9.1 Sampling from a discrete distribution
    9.9.2 Sampling Methods for Belief Network Inference
    9.9.3 Rejection Sampling
    9.9.4 Likelihood Weighting
    9.9.5 Particle Filtering
    9.9.6 Examples
    9.9.7 Gibbs Sampling
    9.9.8 Plotting Behavior of Stochastic Simulators
  9.10 Hidden Markov Models
    9.10.1 Exact Filtering for HMMs
    9.10.2 Localization
    9.10.3 Particle Filtering for HMMs
    9.10.4 Generating Examples
  9.11 Dynamic Belief Networks
    9.11.1 Representing Dynamic Belief Networks
    9.11.2 Unrolling DBNs
    9.11.3 DBN Filtering

10 Learning with Uncertainty
  10.1 Bayesian Learning
  10.2 K-means
  10.3 EM

11 Causality
  11.1 Do Questions
  11.2 Counterfactual Reasoning
    11.2.1 Choosing Deterministic System
    11.2.2 Firing Squad Example

12 Planning with Uncertainty
  12.1 Decision Networks
    12.1.1 Example Decision Networks
    12.1.2 Decision Functions
    12.1.3 Recursive Conditioning for decision networks
    12.1.4 Variable elimination for decision networks
  12.2 Markov Decision Processes
    12.2.1 Problem Domains
    12.2.2 Value Iteration
    12.2.3 Value Iteration GUI for Grid Domains
    12.2.4 Asynchronous Value Iteration

13 Reinforcement Learning
  13.1 Representing Agents and Environments
    13.1.1 Environments
    13.1.2 Agents
    13.1.3 Simulating an Environment-Agent Interaction
    13.1.4 Party Environment
    13.1.5 Environment from a Problem Domain
    13.1.6 Monster Game Environment
  13.2 Q Learning
    13.2.1 Exploration Strategies
    13.2.2 Testing Q-learning
  13.3 Q-learning with Experience Replay
  13.4 Stochastic Policy Learning Agent
  13.5 Model-based Reinforcement Learner
  13.6 Reinforcement Learning with Features
    13.6.1 Representing Features
    13.6.2 Feature-based RL learner
  13.7 GUI for RL

14 Multiagent Systems
  14.1 Minimax
    14.1.1 Creating a two-player game
    14.1.2 Minimax and α-β Pruning
  14.2 Multiagent Learning
    14.2.1 Simulating Multiagent Interaction with an Environment
    14.2.2 Example Games
    14.2.3 Testing Games and Environments

15 Individuals and Relations
  15.1 Representing Datalog and Logic Programs
  15.2 Unification
  15.3 Knowledge Bases
  15.4 Top-down Proof Procedure
  15.5 Logic Program Example

16 Knowledge Graphs and Ontologies
  16.1 Triple Store
  16.2 Integrating Datalog and Triple Store

17 Relational Learning
  17.1 Collaborative Filtering
    17.1.1 Plotting
    17.1.2 Loading Rating Sets from Files and Websites
    17.1.3 Ratings of top items and users
  17.2 Relational Probabilistic Models

18 Version History

Bibliography

Index

Chapter 1

Python for Artificial Intelligence

AIPython contains runnable code for the book Artificial Intelligence: Foundations of Computational Agents, 3rd Edition [Poole and Mackworth, 2023]. It has the following design goals:

• Readability is more important than efficiency, although the asymptotic complexity is not compromised. AIPython is not a replacement for well-designed libraries or optimized tools. Think of it like a model of an engine made of glass: you can see the inner workings; don't expect it to power a big truck, but it lets you see how a metal engine can power a truck.

• It uses as few libraries as possible. A reader only needs to understand Python. Libraries hide details that we make explicit. The only library used is matplotlib, for plotting and drawing.

1.1 Why Python?

We use Python because Python programs can be close to pseudo-code. It is designed for humans to read.

Python is reasonably efficient. Efficiency is usually not a problem for small examples. If your Python code is not efficient enough, a general procedure to improve it is to find out what is taking most of the time, and implement just that part more efficiently in some lower-level language. Most of these lower-level languages interoperate with Python nicely. This will result in much less programming and more efficient code (because you will have more time to optimize) than writing everything in a low-level language. You will not have to do that for the code here if you are using it for larger projects.

1.2 Getting Python

You need Python 3.9 or later (https://python.org/) and a compatible version of matplotlib (https://matplotlib.org/). This code is not compatible with Python 2 (e.g., with Python 2.7).

Download and install the latest Python 3 release from https://python.org/ or https://www.anaconda.com/download. This should also install pip3. You can install matplotlib using

pip3 install matplotlib

in a terminal shell (not in Python). That should "just work". If not, try using pip instead of pip3.

The command python or python3 should then start the interactive Python shell. You can quit Python with a control-D or with quit().
To upgrade matplotlib to the latest version (which you should do if you install a new version of Python) do:

pip3 install --upgrade matplotlib

We recommend using the enhanced interactive Python shell ipython (https://ipython.org/) [Pérez and Granger, 2007]. To install ipython after you have installed Python do:

pip3 install ipython

1.3 Running Python

We assume that everything is done with an interactive Python shell. You can either do this with an IDE, such as IDLE that comes with standard Python distributions, or just run ipython3 or python3 (or perhaps just ipython or python) from a shell.

Here we describe the simplest version, which uses no IDE. If you download the zip file, and cd to the "aipython" folder where the .py files are, you should be able to do the following, with user input shown after the prompts. The first python command is in the operating system shell; the -i is important to enter interactive mode.

python -i searchGeneric.py
Testing problem 1:
7 paths have been expanded and 4 paths remain in the frontier
Path found: A --> C --> B --> D --> G
Passed unit test
>>> searcher2 = AStarSearcher(searchProblem.acyclic_delivery_problem) #A*
>>> searcher2.search()  # find first path
16 paths have been expanded and 5 paths remain in the frontier
o103 --> o109 --> o119 --> o123 --> r123
>>> searcher2.search()  # find next path
21 paths have been expanded and 6 paths remain in the frontier
o103 --> b3 --> b4 --> o109 --> o119 --> o123 --> r123
>>> searcher2.search()  # find next path
28 paths have been expanded and 5 paths remain in the frontier
o103 --> b3 --> b1 --> b2 --> b4 --> o109 --> o119 --> o123 --> r123
>>> searcher2.search()  # find next path
No (more) solutions. Total of 33 paths expanded.
>>>

You can then interact at the last prompt.

There are many textbooks for Python. The best source of information about Python is https://www.python.org/. The documentation is at https://docs.python.org/3/.

The rest of this chapter is about what is special about the code for AI tools. We will only use the standard Python library and matplotlib. All of the exercises can be done (and should be done) without using other libraries; the aim is for you to spend your time thinking about how to solve the problem rather than searching for pre-existing solutions.

1.4 Pitfalls

It is important to know when side effects occur. Often AI programs consider what would/might happen given certain conditions. In many such cases, we don't want side effects. When an agent acts in the world, side effects are appropriate.

In Python, you need to be careful to understand side effects. For example, the inexpensive function to add an element to a list, namely append, changes the list. In a functional language like Haskell or Lisp, adding a new element to a list, without changing the original list, is a cheap operation. For example, if x is a list containing n elements, adding an extra element to the list in Python (using append) is fast, but it has the side effect of changing the list x. To construct a new list that contains the elements of x plus a new element, without changing the value of x, entails copying the list, or using a different representation for lists. In the searching code, we will use a different representation for lists for this reason.
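A minimal illustration of the difference (hypothetical values, not from the AIPython code):

>>> x = [1, 2, 3]
>>> y = x
>>> x.append(4)    # mutates the list that both x and y refer to
>>> y
[1, 2, 3, 4]
>>> z = x + [5]    # builds a new list; x is unchanged
>>> x, z
([1, 2, 3, 4], [1, 2, 3, 4, 5])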
1.5 Features of Python

1.5.1 f-strings

Python can use matching ', ", ''' or """, the latter two respecting line breaks in the string. We use the convention that when the string denotes a unique symbol, we use single quotes, and when it is designed to be for printing, we use double quotes.

We make extensive use of f-strings (https://docs.python.org/3/tutorial/inputoutput.html). In its simplest form, f"str1{e1}str2{e2}str3", where e1 and e2 are expressions, is an abbreviation for

"str1"+str(e1)+"str2"+str(e2)+"str3"

where + is string concatenation, and str is the function that returns a string representation of its expression argument.
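As a small illustration (hypothetical values), the following f-string and its expanded equivalent produce the same string:

>>> x = 3
>>> f"x={x}, x squared is {x*x}"
'x=3, x squared is 9'
>>> "x=" + str(x) + ", x squared is " + str(x*x)
'x=3, x squared is 9'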
1.5.2 Lists, Tuples, Sets, Dictionaries and Comprehensions

We make extensive use of lists, tuples, sets and dictionaries (dicts). See https://docs.python.org/3/library/stdtypes.html

One of the nice features of Python is the use of generator expressions (and also list, tuple, set and dictionary comprehensions; see https://docs.python.org/3/reference/expressions.html#displays-for-lists-sets-and-dictionaries). A generator expression of the form

(fe for e in iter if cond)

enumerates the values fe for each e in iter for which cond is true. The "if cond" part is optional, but the "for" and "in" are not optional. Here e is a variable (or a pattern that can be on the left side of =), iter is an iterator, which can generate a stream of data, such as a list, a set, a range object (to enumerate integers between ranges) or a file. cond is an expression that evaluates to either True or False for each e, and fe is an expression that will be evaluated for each value of e for which cond returns True.

The result can go in a list, be used in another iteration, or be called directly using next. The procedure next takes an iterator and returns the next element (advancing the iterator); it raises a StopIteration exception if there is no next element. The following shows a simple example, where user input follows >>>

>>> [e*e for e in range(20) if e%2==0]
[0, 4, 16, 36, 64, 100, 144, 196, 256, 324]
>>> a = (e*e for e in range(20) if e%2==0)
>>> next(a)
0
>>> next(a)
4
>>> next(a)
16
>>> list(a)
[36, 64, 100, 144, 196, 256, 324]
>>> next(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Notice how list(a) continued on the enumeration, and got to the end of it.

Comprehensions can also be used for dictionaries. The following code creates an index for list a:

>>> a = ["a","f","bar","b","a","aaaaa"]
>>> ind = {a[i]:i for i in range(len(a))}
>>> ind
{'a': 4, 'f': 1, 'bar': 2, 'b': 3, 'aaaaa': 5}
>>> ind['b']
3

which means that 'b' is the 3rd element of the list. The assignment of ind could have also been written as:

>>> ind = {val:i for (i,val) in enumerate(a)}

where enumerate is a built-in function that, given an iterable, returns an iterator of (index, value) pairs.

1.5.3 Functions as first-class objects

Python can create lists and other data structures that contain functions. There is an issue that tricks many newcomers to Python. For a local variable in a function, the function uses the last value of the variable when the function is called, not the value of the variable when the function was defined (this is called "late binding"). This means that if you want to use the value a variable has when the function is created, you need to save the current value of that variable. Whereas Python uses "late binding" by default, the alternative that newcomers often expect is "early binding", where a function uses the value a variable had when the function was defined. The following examples show how early binding can be implemented.

Consider the following programs designed to create a list of 5 functions, where the ith function in the list is meant to add i to its argument. (Numbered lines are Python code available in the code directory, aipython. The name of the file is given in the gray text above the listing. The numbers correspond to the line numbers in that file.)

pythonDemo.py — Some tricky examples
11 fun_list1 = []
12 for i in range(5):
13     def fun1(e):
14         return e+i
15     fun_list1.append(fun1)
16
17 fun_list2 = []
18 for i in range(5):
19     def fun2(e,iv=i):
20         return e+iv
21     fun_list2.append(fun2)
22
23 fun_list3 = [lambda e: e+i for i in range(5)]
24
25 fun_list4 = [lambda e,iv=i: e+iv for i in range(5)]
26
27 i=56

Try to predict, and then test, the output of the following calls, remembering that the function uses the latest value of any variable that is not bound in the function call:

pythonDemo.py — (continued)
29 # in Shell do
30 ## ipython -i pythonDemo.py
31 # Try these (copy text after the comment symbol and paste in the Python prompt):
32 # print([f(10) for f in fun_list1])
33 # print([f(10) for f in fun_list2])
34 # print([f(10) for f in fun_list3])
35 # print([f(10) for f in fun_list4])

In the first for-loop, the function fun1 uses i, whose value is the last value it was assigned. In the second loop, the function fun2 uses iv. There is a separate iv variable for each function, and its value is the value of i when the function was defined. Thus fun1 uses late binding, and fun2 uses early binding. fun_list3 and fun_list4 are equivalent to the first two (except fun_list4 uses a different i variable).

One of the advantages of using the embedded definitions (as in fun1 and fun2 above) over the lambda is that it is possible to add a __doc__ string, which is the standard for documenting functions in Python, to the embedded definitions.
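For reference, here is what the calls above produce in Python 3, if they are evaluated after the whole file has been loaded. Note that a list comprehension has its own scope, so the i used by fun_list3 is the comprehension's i, whose final value is 4, while fun_list1 uses the module-level i, which line 27 set to 56:

>>> [f(10) for f in fun_list1]
[66, 66, 66, 66, 66]
>>> [f(10) for f in fun_list2]
[10, 11, 12, 13, 14]
>>> [f(10) for f in fun_list3]
[14, 14, 14, 14, 14]
>>> [f(10) for f in fun_list4]
[10, 11, 12, 13, 14]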
1.5.4 Generators

Python has generators which can be used for a form of lazy evaluation: only computing values when needed.

The yield command returns a value that is obtained with next. It is typically used to enumerate the values for a for loop or in generators. (The yield command can also be used for coroutines, but AIPython only uses it for generators.)

A version of the built-in range, with 2 or 3 arguments (and positive steps), can be implemented as:

pythonDemo.py — (continued)
37 def myrange(start, stop, step=1):
38     """enumerates the values from start in steps of size step that are
39     less than stop.
40     """
41     assert step>0, f"only positive steps implemented in myrange: {step}"
42     i = start
43     while i < stop:
44         yield i
45         i += step
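For example (assuming the definition above; myrange is a generator, so it produces its values lazily):

>>> list(myrange(2, 30, 3))
[2, 5, 8, 11, 14, 17, 20, 23, 26, 29]
>>> g = myrange(0, 3)
>>> next(g), next(g), next(g)
(0, 1, 2)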
[Figure 3.6: SearcherGUI(Searcher, simp_delivery_graph).go() — the search graph shown with node colors red: selected, blue: neighbors, green: frontier, yellow: goal, and buttons step, fine step, auto search, quit]

searchGeneric.py — (continued)
61 # Depth-first search for problem1; do the following:
62 # searcher1 = Searcher(searchExample.problem1)
63 # searcher1.search()  # find first solution
64 # searcher1.search()  # find next solution (repeat until no solutions)
65 # searcher_sdg = Searcher(searchExample.simp_delivery_graph)
66 # searcher_sdg.search()  # find first or next solution

Exercise 3.1 Implement breadth-first search. Only add_to_frontier and/or pop need to be modified to implement a first-in first-out queue.
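One way to approach this exercise (a sketch, not AIPython's solution; it assumes, as in the distributed searchGeneric.py, which is not part of this excerpt, that Searcher keeps the frontier in the list self.frontier, adds new paths with add_to_frontier, and removes the next path to expand with self.frontier.pop()):

class BreadthFirstSearcher(Searcher):   # hypothetical name
    """Searcher whose frontier acts as a first-in first-out queue."""
    def add_to_frontier(self, path):
        # insert new paths at the front; pop() then removes the oldest path
        self.frontier.insert(0, path)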
3.2.2 GUI for Tracing Search

This GUI implements most of the functionality of the AISpace.org search app. Figure 3.6 shows the GUI used to step through various algorithms. Here the path A → B is being expanded, and its neighbors are E and F. The other nodes at the end of paths on the frontier are C and D. Thus the frontier, which used to also contain A → B, now contains the paths to C and D together with A → B → E and A → B → F.

SearcherGUI takes a search class and a problem, and lets one explore the search space after calling go(). A GUI can only be used for one search; at the end of the search the loop ends and the buttons no longer work.

This is implemented by redefining display. The search algorithms don't need to be modified. If you modify them (or create your own), you just have to be careful to use the appropriate number for the display. The first argument to display has the following meanings:

1. a solution has been found
2. what is shown for a "step" on a GUI; here it is assumed to be the path, the neighbors of the end of the path, and the other nodes at the end of paths on the frontier
3. (shown with "fine step" but not with "step") the frontier and the path selected
4. (shown with "fine step" but not with "step") the frontier.

It is also useful to look at the Python console, as the display information is printed there.

searchGUI.py — GUI for search
11 import matplotlib.pyplot as plt
12 from matplotlib.widgets import Button
13 import time
14
15 class SearcherGUI(object):
16     def __init__(self, SearchClass, problem, fontsize=10,
17                  colors = {'selected':'red', 'neighbors':'blue', 'frontier':'green', 'goal':'yellow'}):
18         self.problem = problem
19         self.searcher = SearchClass(problem)
20         self.problem.fontsize = fontsize
21         self.colors = colors
22         #self.go()
23
24     def go(self):
25         fig, self.ax = plt.subplots()
26         plt.ion()   # interactive
27         self.ax.set_axis_off()
28         plt.subplots_adjust(bottom=0.15)
29         step_butt = Button(plt.axes([0.05,0.02,0.15,0.05]), "step")
30         step_butt.on_clicked(self.step)
31         fine_butt = Button(plt.axes([0.25,0.02,0.15,0.05]), "fine step")
32         fine_butt.on_clicked(self.finestep)
33         auto_butt = Button(plt.axes([0.45,0.02,0.15,0.05]), "auto search")
34         auto_butt.on_clicked(self.auto)
35         quit_butt = Button(plt.axes([0.65,0.02,0.15,0.05]), "quit")
36         quit_butt.on_clicked(self.quit)
37         self.ax.text(0.85,0, '\n'.join(self.colors[a]+": "+a for a in self.colors))
38         self.problem.show_graph(self.ax, node_color='white')
39         self.problem.show_node(self.ax, self.problem.start, self.colors['frontier'])
40         for node in self.problem.nodes:
41             if self.problem.is_goal(node):
42                 self.problem.show_node(self.ax, node, self.colors['goal'])
43         plt.show()
44         self.click = 7   # bigger than any display!
45         #while self.click == 0:
46         #    plt.pause(0.1)
47         self.searcher.display = self.display
48         try:
49             while self.searcher.frontier:
50                 path = self.searcher.search()
51         except ExitToPython:
52             print("Exited")
53         else:
54             print("No more solutions")
55
56     def display(self, level, *args, **nargs):

[Figure 4.2: csp2.show()]

cspExamples.py — (continued)
45 csp1 = CSP("csp1", {A, B, C},
46            [C0, C1, C2])
47
48 csp1s = CSP("csp1s", {A, B, C},
49            [C0, C2]) # A=

learnProblem.py — (continued)
225             return 1 if all(prev_val >= v for v in prediction.values()) else 0
226         if isinstance(prediction, list):
227             prev_val = prediction[actual]
228             return 1 if all(prev_val >= v for v in prediction) else 0
229         else:
230             return 1 if abs(actual-prediction) <= 0.5 else 0

learnProblem.py — (continued)
275         if num_train is not None:
276             # training set is divided into training then test examples
277             # the file is only read once, and the data is placed in appropriate list
278             train = []
279             for i in range(num_train):   # will give an error if insufficient examples
280                 train.append(next(data_tuples))
281             test = list(data_tuples)
282             Data_set.__init__(self, train, test=test, target_index=target_index, header=header)
283         else:   # randomly assign training and test examples
284             Data_set.__init__(self, data_tuples, test=None, prob_test=prob_test,
285                 target_index=target_index, header=header, seed=seed, target_type=target_type, one_hot=one_hot)

The following class is used for datasets where the training and test examples are in different files:

learnProblem.py — (continued)
287 class Data_from_files(Data_set):
288     def __init__(self, train_file_name, test_file_name, separator=',',
289                  has_header=False, target_index=0, one_hot=False,
290                  categorical=[], target_type= None, include_only=None):
291         """create a dataset from separate training and test files
292         separator is the character that separates the attributes
293         num_train is a number specifying the first num_train tuples are training, or None
294         prob_test is the probability an example should be in the test set (if num_train is None)
295         has_header is True if the first line of file is a header
296         target_index specifies which feature is the target
297         one_hot specifies whether categorical features should be encoded as one-hot
298         categorical is a set (or list) of features that should be treated as categorical
299         target_type is either None for automatic detection of target type
300             or one of "numeric", "boolean", "categorical"
301         include_only is a list or set of indexes of columns to include
302         """
303         with open(train_file_name,'r',newline='') as train_file:
304             with open(test_file_name,'r',newline='') as test_file:
305                 # data_all = csv.reader(csvfile,delimiter=separator)   # for more complicated CSV files
306                 train_data = (line.strip().split(separator) for line in train_file)
307                 test_data = (line.strip().split(separator) for line in test_file)
308                 if include_only is not None:
309                     train_data = ([v for (i,v) in enumerate(line) if i in include_only]
310                                   for line in train_data)
311                     test_data = ([v for (i,v) in enumerate(line) if i in include_only]
312                                  for line in test_data)
313                 if has_header:   # this assumes the training file has a header and the test file doesn't
314                     header = next(train_data)
315                 else:
316                     header = None
317                 train_tuples = [interpret_elements(d) for d in train_data if len(d)>1]
318                 test_tuples = [interpret_elements(d) for d in test_data if len(d)>1]
319                 Data_set.__init__(self, train_tuples, test_tuples,
320                     target_index=target_index, header=header, one_hot=one_hot)

When reading from a file, all of the values are strings. The following function tries to convert each value into a number (an int or a float) or a Boolean, where possible.

learnProblem.py — (continued)
322 def interpret_elements(str_list):
323     """make the elements of string list str_list numeric if possible.
324     Otherwise remove initial and trailing spaces.
325     """
326     res = []
327     for e in str_list:
328         try:
329             res.append(int(e))
330         except ValueError:
331             try:
332                 res.append(float(e))
333             except ValueError:
334                 se = e.strip()
335                 if se in ["True","true","TRUE"]:
336                     res.append(True)
337                 elif se in ["False","false","FALSE"]:
338                     res.append(False)
339                 else:
340                     res.append(e.strip())
341     return res
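For example, assuming the definition above:

>>> interpret_elements(["3", "4.5", "TRUE", " cat "])
[3, 4.5, True, 'cat']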
7.1.5 Augmented Features

Sometimes we want to augment the features with new features computed from the old features (e.g., the product of features). Here we allow the creation of a new dataset from an old dataset but with new features. Note that special cases of these are kernels: mapping the original feature space into a new space, which allows a neat way to do learning in the augmented space for many mappings (the "kernel trick"). This is beyond the scope of AIPython; those interested should read about support vector machines.

A feature is a function of examples. A unary feature constructor takes a feature and returns a new feature. A binary feature combiner takes two features and returns a new feature.

learnProblem.py — (continued)
343 class Data_set_augmented(Data_set):
344     def __init__(self, dataset, unary_functions=[], binary_functions=[], include_orig=True):
345         """creates a dataset like dataset but with new features
346         unary_functions is a list of unary feature constructors
347         binary_functions is a list of binary feature combiners.
348         include_orig specifies whether the original features should be included
349         """
350         self.orig_dataset = dataset
351         self.unary_functions = unary_functions
352         self.binary_functions = binary_functions
353         self.include_orig = include_orig
354         self.target = dataset.target
355         Data_set.__init__(self, dataset.train, test=dataset.test,
356             target_index = dataset.target_index)
357
358     def create_features(self):
359         if self.include_orig:
360             self.input_features = self.orig_dataset.input_features.copy()
361         else:
362             self.input_features = []
363         for u in self.unary_functions:
364             for f in self.orig_dataset.input_features:
365                 self.input_features.append(u(f))
366         for b in self.binary_functions:
367             for f1 in self.orig_dataset.input_features:
368                 for f2 in self.orig_dataset.input_features:
369                     if f1 != f2:
370                         self.input_features.append(b(f1,f2))

The following are useful unary feature constructors and binary feature combiners.

learnProblem.py — (continued)
372 def square(f):
373     """a unary feature constructor to construct the square of a feature
374     """
375     def sq(e):
376         return f(e)**2
377     sq.__doc__ = f.__doc__+"**2"
378     return sq
379
380 def power_feat(n):
381     """given n returns a unary feature constructor to construct the nth power of a feature.
382     e.g., power_feat(2) is the same as square, defined above
383     """
384     def fn(f,n=n):
385         def pow(e,n=n):
386             return f(e)**n
387         pow.__doc__ = f.__doc__+"**"+str(n)
388         return pow
389     return fn
390
391 def prod_feat(f1,f2):
392     """a new feature that is the product of features f1 and f2
393     """
394     def feat(e):
395         return f1(e)*f2(e)
396     feat.__doc__ = f1.__doc__+"*"+f2.__doc__
397     return feat
398
399 def eq_feat(f1,f2):
400     """a new feature that is 1 if f1 and f2 give same value
401     """
402     def feat(e):
403         return 1 if f1(e)==f2(e) else 0
404     feat.__doc__ = f1.__doc__+"=="+f2.__doc__
405     return feat
406
407 def neq_feat(f1,f2):
408     """a new feature that is 1 if f1 and f2 give different values
409     """
410     def feat(e):
411         return 1 if f1(e)!=f2(e) else 0
412     feat.__doc__ = f1.__doc__+"!="+f2.__doc__
413     return feat

Example:

learnProblem.py — (continued)
415 # from learnProblem import Data_set_augmented, prod_feat
416 # data = Data_from_file('data/holiday.csv', has_header=True, num_train=19, target_index=-1)
417 # data = Data_from_file('data/iris.data', prob_test=1/3, target_index=-1)
418 ## Data = Data_from_file('data/SPECT.csv', prob_test=0.5, target_index=0)
419 # dataplus = Data_set_augmented(data,[],[prod_feat])
420 # dataplus = Data_set_augmented(data,[],[prod_feat,neq_feat])

Exercise 7.3 For symmetric properties, such as product, we don't need both f1 ∗ f2 as well as f2 ∗ f1 as extra features. Allow the user to declare feature constructors as symmetric (by associating a Boolean feature with them). Change create_features so that it does not create both versions for symmetric combiners.

7.2 Generic Learner Interface

A learner takes a dataset (and possibly other arguments specific to the method). To get it to learn, we call the learn() method. This implements Displayable so that we can display traces at multiple levels of detail (perhaps with a GUI).

learnProblem.py — (continued)
421 from display import Displayable
422
423 class Learner(Displayable):
424     def __init__(self, dataset):
425         raise NotImplementedError("Learner.__init__")   # abstract method
426
427     def learn(self):
428         """returns a predictor, a function from a tuple to a value for the target feature
429         """
430         raise NotImplementedError("learn")   # abstract method
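To make the interface concrete, here is a minimal (hypothetical) learner, not part of AIPython; it assumes, as elsewhere in learnProblem.py, that dataset.train is a list of examples and dataset.target is a feature that can be applied to an example:

import statistics

class MeanLearner(Learner):
    """a trivial learner that always predicts the mean of the target over the training set"""
    def __init__(self, dataset):
        self.dataset = dataset
    def learn(self):
        """returns a predictor that ignores its input"""
        mean = statistics.mean(self.dataset.target(e) for e in self.dataset.train)
        return lambda e: mean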
7.3 Learning With No Input Features

If we make the same prediction for each example, what prediction should we make? This can be used as a naive baseline; if a more sophisticated method does not do better than this, it is not useful. This also provides the base case for some methods, such as decision-tree learning.

To run the demo comparing different prediction methods on various evaluation criteria, in folder "aipython", load "learnNoInputs.py", using e.g., ipython -i learnNoInputs.py, and it prints some test results.

There are a few alternatives as to what could be allowed in a prediction:

• a point prediction, where we are only allowed to predict one of the values of the feature. For example, if the values of the feature are {0, 1}, we are only allowed to predict 0 or 1; if the values are ratings in {1, 2, 3, 4, 5}, we can only predict one of these integers.

• a point prediction, where we are allowed to predict any value. For example, if the values of the feature are {0, 1}, we may be allowed to predict 0.3, 1, or even 1.7. For all of the criteria we can imagine, there is no point in predicting a value greater than 1 or less than zero (but that doesn't mean we can't), but it is often useful to predict a value between 0 and 1. If the values are ratings in {1, 2, 3, 4, 5}, we may want to predict 3.4.

• a probability distribution over the values of the feature. For each value v, we predict a non-negative number pv, such that the sum over all predictions is 1.

For regression, we do the second of these. For classification, we do the third. The third can be implemented by having multiple indicator functions for the target.

Here are some prediction functions that take an enumeration of values and a domain, and return a value or a dictionary of {value : prediction}. Note that cmedian returns one of the middle values when there are an even number of examples, whereas median gives the average of them (and so cmedian is applicable for ordinals that cannot be considered cardinal values). Similarly, cmode picks one of the values when more than one value has the maximum number of elements.
learnNoInputs.py — Learning ignoring all input features
11 from learnProblem import Evaluate
12 import math, random, collections, statistics
13 import utilities   # argmax for (element,value) pairs
14
15 class Predict(object):
16     """The class of prediction methods for a list of values.
17     Please make the doc strings the same length, because they are used in tables.
18     Note that the methods do not need a self argument, as we are not creating Predict objects;
19     to use, call Predict.laplace(data) etc."""
20
21     ### The following return a distribution over values (for classification)
22     def empirical(data, domain=[0,1], icount=0):
23         "empirical dist   "
24         # returns a distribution over values
25         counts = {v:icount for v in domain}
26         for e in data:
27             counts[e] += 1
28         s = sum(counts.values())
29         return {k:v/s for (k,v) in counts.items()}
30
31     def bounded_empirical(data, domain=[0,1], bound=0.01):
32         "bounded empirical"
33         return {k:min(max(v,bound),1-bound) for (k,v) in Predict.empirical(data, domain).items()}
34
35     def laplace(data, domain=[0,1]):
36         "Laplace          "   # for categorical data
37         return Predict.empirical(data, domain, icount=1)
38
39     def cmode(data, domain=[0,1]):
40         "mode             "   # for categorical data
41         md = statistics.mode(data)
42         return {v: 1 if v==md else 0 for v in domain}
43
44     def cmedian(data, domain=[0,1]):
45         "median           "   # for categorical data
46         md = statistics.median_low(data)   # always return one of the values
47         return {v: 1 if v==md else 0 for v in domain}
48
49     ### The following return a single prediction (for regression). domain is ignored.
50
51     def mean(data, domain=[0,1]):
52         "mean             "
53         # returns a real number
54         return statistics.mean(data)
55
56     def rmean(data, domain=[0,1], mean0=0, pseudo_count=1):
57         "regularized mean "
58         # returns a real number.
59         # mean0 is the mean to be used for 0 data points
60         # With mean0=0.5, pseudo_count=2, same as laplace for [0,1] data
61         # this works for enumerations as well as lists
62         sum = mean0 * pseudo_count
63         count = pseudo_count
64         for e in data:
65             sum += e
66             count += 1
67         return sum/count
68
69     def mode(data, domain=[0,1]):
70         "mode             "
71         return statistics.mode(data)
72
73     def median(data, domain=[0,1]):
74         "median           "
75         return statistics.median(data)
76
77     all = [empirical, mean, rmean, bounded_empirical, laplace, cmode, mode, median, cmedian]
78
79     # The following suggests appropriate predictions as a function of the target type
80     select = {"boolean": [empirical, bounded_empirical, laplace, cmode, cmedian],
81               "categorical": [empirical, bounded_empirical, laplace, cmode, cmedian],
82               "numeric": [mean, rmean, mode, median]}
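For example, assuming the definitions above:

>>> Predict.empirical([1,1,0], domain=[0,1])
{0: 0.3333333333333333, 1: 0.6666666666666666}
>>> Predict.laplace([1,1,0], domain=[0,1])
{0: 0.4, 1: 0.6}
>>> Predict.mean([1,1,0])
0.6666666666666666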
7.3.1 Evaluation

To evaluate a point prediction, we first generate some data from a simple (Bernoulli) distribution, where there are two possible values, 0 and 1, for the target feature. Given prob, a number in the range [0, 1], this generates some training and test data where prob is the probability of each example being 1. To generate a 1 with probability prob, we generate a random number in the range [0, 1] and return 1 if that number is less than prob. A prediction is computed by applying the predictor to the training data, and is evaluated on the test set. This is repeated num_samples times.

Let's evaluate the predictions of the possible selections according to the different evaluation criteria, for various training sizes.

learnNoInputs.py — (continued)
83 def test_no_inputs(error_measures = Evaluate.all_criteria, num_samples=10000,
84                    test_size=10, training_sizes= [1,2,3,4,5,10,20,100,1000]):
85     for train_size in training_sizes:
86         results = {predictor: {error_measure: 0 for error_measure in error_measures}
87                    for predictor in Predict.all}
88         for sample in range(num_samples):
89             prob = random.random()
90             training = [1 if random.random() < prob else 0 for i in range(train_size)]

learnNN.py
149         if validation_num > 0:
150             random.shuffle(self.dataset.train)
151             self.validation_set = self.dataset.train[-validation_num:]
152             self.training_set = self.dataset.train[:-validation_num]
153         else:
154             self.validation_set = []
155             self.training_set = self.dataset.train
156         self.layers = []
157         self.bn = 0   # number of batches run
158
159     def add_layer(self,layer):
160         """add a layer to the network.
161         Each layer gets number of inputs from the previous layers outputs.
162         """
163         self.layers.append(layer)
164         self.num_outputs = layer.num_outputs
165
166     def predictor(self,ex):
167         """Predicts the value of the first output for example ex.
168         """
169         values = [f(ex) for f in self.input_features]
170         for layer in self.layers:
171             values = layer.output_values(values)
172         return sigmoid(values[0]) if self.output_type =="boolean" \
173             else softmax(values, self.dataset.target.frange) if self.output_type == "categorical" \
174             else values[0]
175
176     def predictor_string(self):
177         return "not implemented"

The learn method learns the parameters of a network.

learnNN.py — (continued)
179     def learn(self, epochs=5, batch_size=32, num_iter = None, report_each=10):
180         """Learns parameters for a neural network using stochastic gradient descent.
181         epochs is the number of times through the data (on average)
182         batch_size is the maximum size of each batch
183         num_iter is the number of iterations over the batches
184             - overrides epochs if provided (allows for fractions of epochs)
185         report_each means give the errors after each multiple of that many iterations
186         """
187         self.batch_size = min(batch_size, len(self.training_set))   # don't have batches bigger than training size
188         if num_iter is None:
189             num_iter = (epochs * len(self.training_set)) // self.batch_size
190         #self.display(0,"Batch\t","\t".join(criterion.__doc__ for criterion in Evaluate.all_criteria))
191         for i in range(num_iter):
192             batch = random.sample(self.training_set, self.batch_size)
193             for e in batch:
194                 # compute all outputs
195                 values = [f(e) for f in self.input_features]
196                 for layer in self.layers:
197                     values = layer.output_values(values, training=True)
198                 # backpropagate
199                 predicted = [sigmoid(v) for v in values] if self.output_type == "boolean"\
200                     else softmax(values) if self.output_type == "categorical"\
201                     else values
202                 actuals = indicator(self.dataset.target(e), self.dataset.target.frange) \
203                     if self.output_type == "categorical"\
204                     else [self.dataset.target(e)]
205                 errors = [pred-obsd for (obsd,pred) in zip(actuals,predicted)]
206                 for layer in reversed(self.layers):
207                     errors = layer.backprop(errors)
208             # Update all parameters in batch
209             for layer in self.layers:
210                 layer.update()
211             self.bn+=1
212             if (i+1)%report_each==0:
213                 self.display(0,self.bn,"\t",
214                     "\t\t".join("{:.4f}".format(
215                         self.dataset.evaluate_dataset(self.validation_set, self.predictor, criterion))
216                         for criterion in Evaluate.all_criteria), sep="")
8.3 Improved Optimization

8.3.1 Momentum

learnNN.py — (continued)
218 class Linear_complete_layer_momentum(Linear_complete_layer):
219     """a completely connected layer"""
220     def __init__(self, nn, num_outputs, limit=None, alpha=0.9, epsilon = 1e-07, vel0=0):
221         """A completely connected linear layer.
222         nn is a neural network that the inputs come from
223         num_outputs is the number of outputs
224         limit is the maximum value for random initialization of parameters
225         vel0 is the initial velocity for each parameter
226         """
227         Linear_complete_layer.__init__(self, nn, num_outputs, limit=limit)
228         # self.weights[o][i] is the weight between input i and output o
229         self.velocity = [[vel0 for inf in range(self.num_inputs+1)]
230                          for outf in range(self.num_outputs)]
231         self.alpha = alpha
232         self.epsilon = epsilon
233
234     def update(self):
235         """updates parameters after a batch"""
236         batch_step_size = self.nn.learning_rate / self.nn.batch_size
237         for out in range(self.num_outputs):
238             for inp in range(self.num_inputs+1):
239                 self.velocity[out][inp] = self.alpha*self.velocity[out][inp] - batch_step_size * self.delta[out][inp]
240                 self.weights[out][inp] += self.velocity[out][inp]
241                 self.delta[out][inp] = 0

8.3.2 RMS-Prop

learnNN.py — (continued)
243 class Linear_complete_layer_RMS_Prop(Linear_complete_layer):
244     """a completely connected layer"""
245     def __init__(self, nn, num_outputs, limit=None, rho=0.9, epsilon = 1e-07):
246         """A completely connected linear layer.
247         nn is a neural network that the inputs come from
248         num_outputs is the number of outputs
249         limit is the maximum value for random initialization of parameters
250         """
251         Linear_complete_layer.__init__(self, nn, num_outputs, limit=limit)
252         # self.weights[o][i] is the weight between input i and output o
253         self.ms = [[0 for inf in range(self.num_inputs+1)]
254                    for outf in range(self.num_outputs)]
255         self.rho = rho
256         self.epsilon = epsilon
257
258     def update(self):
259         """updates parameters after a batch"""
260         for out in range(self.num_outputs):
261             for inp in range(self.num_inputs+1):
262                 gradient = self.delta[out][inp] / self.nn.batch_size
263                 self.ms[out][inp] = self.rho*self.ms[out][inp]+ (1-self.rho) * gradient**2
264                 self.weights[out][inp] -= self.nn.learning_rate/(self.ms[out][inp]+self.epsilon)**0.5 * gradient
265                 self.delta[out][inp] = 0
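In symbols, writing η for nn.learning_rate, b for nn.batch_size, and δ for the gradient accumulated in delta[out][inp] over a batch, the momentum update above is

v ← α v − (η/b) δ        w ← w + v

and the RMS-Prop update is, with g = δ/b,

m ← ρ m + (1 − ρ) g²        w ← w − η g / √(m + ε)

Note that in this implementation ε is added inside the square root, which is what Exercise 8.1 below asks you to examine.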
Exercise 8.1 Implement Adam [see Section 8.2.3 of Poole and Mackworth, 2023]. The implementation is slightly more complex than RMS-Prop. Try it first with the parameter settings of Keras, as reported by Poole and Mackworth. Does it matter if epsilon is inside or outside the square root? How sensitive is the performance to the parameter settings?

8.4 Dropout

Dropout is implemented as a layer.

learnNN.py — (continued)
267 from utilities import flip
268 class Dropout_layer(Layer):
269     """Dropout layer
270     """
271
272     def __init__(self, nn, rate=0):
273         """
274         rate is fraction of the input units to drop. 0 <= rate < 1
275         """
276         self.rate = rate
277         Layer.__init__(self, nn)
278
279     def output_values(self, input_values, training=False):
280         """Returns the outputs for the input values.
281         It remembers the input values for the backprop.
282         """
283         if training:
284             scaling = 1/(1-self.rate)
285             self.mask = [0 if flip(self.rate) else 1
286                          for _ in input_values]
287             return [x*y*scaling for (x,y) in zip(input_values, self.mask)]
288         else:
289             return input_values
290
291     def backprop(self,errors):
292         """Returns the derivative of the errors"""
293         return [x*y for (x,y) in zip(errors, self.mask)]

8.5 Examples

The following constructs some neural networks (most with one hidden layer). The output is assumed to be Boolean or real. If it is categorical, the final layer should have the same number of outputs as the number of categories (so it can use a softmax).

learnNN.py — (continued)
295 #data = Data_from_file('data/mail_reading.csv', target_index=-1)
296 #data = Data_from_file('data/mail_reading_consis.csv', target_index=-1)
297 data = Data_from_file('data/SPECT.csv', prob_test=0.3, target_index=0, seed=12345)
298 #data = Data_from_file('data/iris.data', prob_test=0.2, target_index=-1) # 150 examples approx 120 train + 30 test
299 #data = Data_from_file('data/if_x_then_y_else_z.csv', num_train=8, target_index=-1) # not linearly sep
300 #data = Data_from_file('data/holiday.csv', target_index=-1) #, num_train=19)
301 #data = Data_from_file('data/processed.cleveland.data', target_index=-1)
302 #random.seed(None)
303
304 # nn3 has a single hidden layer of width 3
305 nn3 = NN(data, validation_proportion = 0)
306 nn3.add_layer(Linear_complete_layer(nn3,3))
307 #nn3.add_layer(Sigmoid_layer(nn3))
308 nn3.add_layer(ReLU_layer(nn3))
309 nn3.add_layer(Linear_complete_layer(nn3,1)) # when using output_type="boolean"
310 #nn3.learn(epochs = 100)
311
312 # nn3do is like nn3 but with dropout on the hidden layer
313 nn3do = NN(data, validation_proportion = 0)
314 nn3do.add_layer(Linear_complete_layer(nn3do,3))
315 #nn3.add_layer(Sigmoid_layer(nn3)) # comment this or the next
316 nn3do.add_layer(ReLU_layer(nn3do))
317 nn3do.add_layer(Dropout_layer(nn3do, rate=0.5))
318 nn3do.add_layer(Linear_complete_layer(nn3do,1))
319 #nn3do.learn(epochs = 100)
320
321 # nn3_rmsp is like nn3 but uses RMS prop
322 nn3_rmsp = NN(data, validation_proportion = 0)
323 nn3_rmsp.add_layer(Linear_complete_layer_RMS_Prop(nn3_rmsp,3))
324 #nn3_rmsp.add_layer(Sigmoid_layer(nn3_rmsp)) # comment this or the next
325 nn3_rmsp.add_layer(ReLU_layer(nn3_rmsp))
326 nn3_rmsp.add_layer(Linear_complete_layer_RMS_Prop(nn3_rmsp,1))
327 #nn3_rmsp.learn(epochs = 100)
328
329 # nn3_m is like nn3 but uses momentum
330 mm1_m = NN(data, validation_proportion = 0)
331 mm1_m.add_layer(Linear_complete_layer_momentum(mm1_m,3))
332 #mm1_m.add_layer(Sigmoid_layer(mm1_m)) # comment this or the next
333 mm1_m.add_layer(ReLU_layer(mm1_m))
334 mm1_m.add_layer(Linear_complete_layer_momentum(mm1_m,1))
335 #mm1_m.learn(epochs = 100)
336
337 # nn2 has a single hidden layer of width 2
338 nn2 = NN(data, validation_proportion = 0)
339 nn2.add_layer(Linear_complete_layer_RMS_Prop(nn2,2))
340 nn2.add_layer(ReLU_layer(nn2))
341 nn2.add_layer(Linear_complete_layer_RMS_Prop(nn2,1))
342
343 # nn5 has a single hidden layer of width 5
344 nn5 = NN(data, validation_proportion = 0)
345 nn5.add_layer(Linear_complete_layer_RMS_Prop(nn5,5))
346 nn5.add_layer(ReLU_layer(nn5))
347 nn5.add_layer(Linear_complete_layer_RMS_Prop(nn5,1))
348
349 # nn0 has no hidden layers, and so is just logistic regression:
350 nn0 = NN(data, validation_proportion = 0) #learning_rate=0.05)
351 nn0.add_layer(Linear_complete_layer(nn0,1))
352 # Or try this for RMS-Prop:
353 #nn0.add_layer(Linear_complete_layer_RMS_Prop(nn0,1))

Figure 8.1 shows the training and test performance on the SPECT dataset for the architectures above. Note that the nn5 test had infinite log loss on the test set after about 45,000 steps. The noisiness of the predictions might indicate that the step size is too big.
This was produced by the code below:

learnNN.py — (continued)
355 from learnLinear import plot_steps
356 from learnProblem import Evaluate
357
358 # To show plots first choose a criterion to use
359 # crit = Evaluate.log_loss
360 # crit = Evaluate.accuracy
361 # plot_steps(learner = nn0, data = data, criterion=crit, num_steps=10000, log_scale=False, legend_label="nn0")
362 # plot_steps(learner = nn2, data = data, criterion=crit, num_steps=10000, log_scale=False, legend_label="nn2")
363 # plot_steps(learner = nn3, data = data, criterion=crit, num_steps=10000, log_scale=False, legend_label="nn3")
364 # plot_steps(learner = nn5, data = data, criterion=crit, num_steps=10000, log_scale=False, legend_label="nn5")
365
366 # for (nn,nname) in [(nn0,"nn0"),(nn2,"nn2"),(nn3,"nn3"),(nn5,"nn5")]: plot_steps(learner = nn, data = data, criterion=crit, num_steps=100000, log_scale=False, legend_label=nname)
367
368 # Print some training examples
369 #for eg in random.sample(data.train,10): print(eg,nn3.predictor(eg))
370
371 # Print some test examples
372 #for eg in random.sample(data.test,10): print(eg,nn3.predictor(eg))
373
374 # To see the weights learned in linear layers
375 # nn3.layers[0].weights
376 # nn3.layers[2].weights
377
378 # Print test:
379 # for e in data.train: print(e,nn0.predictor(e))

[Figure 8.1: Plotting train and test log loss (bits) as a function of step, for nn0, nn2, nn3 and nn5 (training and test) on the SPECT dataset]

learnNN.py — (continued)
381 def test(data, hidden_widths = [5], epochs=100,
382          optimizers = [Linear_complete_layer,
383                        Linear_complete_layer_momentum, Linear_complete_layer_RMS_Prop]):
384     data.display(0,"Batch\t","\t".join(criterion.__doc__ for criterion in Evaluate.all_criteria))
385     for optimizer in optimizers:
386         nn = NN(data)
387         for width in hidden_widths:
388             nn.add_layer(optimizer(nn,width))
389             nn.add_layer(ReLU_layer(nn))
390         if data.target.ftype == "boolean":
391             nn.add_layer(optimizer(nn,1))
392         else:
393             error(f"Not implemented: {data.output_type}")
394         nn.learn(epochs)
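For example (a hypothetical call, assuming the SPECT data loaded at the start of Section 8.5, whose target is Boolean):

# test(data)   # compares the plain, momentum, and RMS-Prop optimizers with the default architecture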
The following tests are on the MNIST digit dataset. The original files are from http://yann.lecun.com/exdb/mnist/. This code assumes you use the csv files from Joseph Redmon (https://pjreddie.com/projects/mnist-in-csv/ or https://github.com/pjreddie/mnist-csv-png or https://www.kaggle.com/datasets/oddrationale/mnist-in-csv) and put them in the directory ../MNIST/. Note that this is very inefficient; you would be better to use Keras or Pytorch. There are 28 ∗ 28 = 784 input units and 512 hidden units, which makes 401,408 parameters for the lowest linear layer. So don't be surprised if it takes many hours in AIPython (even if it only takes a few seconds in Keras).

learnNN.py — (continued)
396 # Simplified version: (6000 training instances)
397 # data_mnist = Data_from_file('../MNIST/mnist_train.csv', prob_test=0.9, target_index=0, boolean_features=False, target_type="categorical")
398
399 # Full version:
400 # data_mnist = Data_from_files('../MNIST/mnist_train.csv', '../MNIST/mnist_test.csv', target_index=0, boolean_features=False, target_type="categorical")
401
402 # nn_mnist = NN(data_mnist, validation_proportion = 0.02, learning_rate=0.001) # validation set = 1200
403 # nn_mnist.add_layer(Linear_complete_layer_RMS_Prop(nn_mnist,512)); nn_mnist.add_layer(ReLU_layer(nn_mnist)); nn_mnist.add_layer(Linear_complete_layer_RMS_Prop(nn_mnist,10))
404 # start_time = time.perf_counter();nn_mnist.learn(epochs=1, batch_size=128);end_time = time.perf_counter();print("Time:", end_time - start_time,"seconds") #1 epoch
405 # determine test error:
406 # data_mnist.evaluate_dataset(data_mnist.test, nn_mnist.predictor, Evaluate.accuracy)
407 # Print some random predictions:
408 # for eg in random.sample(data_mnist.test,10):
409 #     print(data_mnist.target(eg), nn_mnist.predictor(eg), nn_mnist.predictor(eg)[data_mnist.target(eg)])

Exercise 8.2 In the definition of nn3 above, for each of the following, first hypothesize what will happen, then test your hypothesis, then explain whether your testing confirms your hypothesis or not. Test it for more than one dataset, and use more than one run for each dataset.

(a) Which fits the data better, having a sigmoid layer or a ReLU layer after the first linear layer?
(b) Which is faster to learn, having a sigmoid layer or a ReLU layer after the first linear layer? (Hint: plot error as a function of steps.)
(c) What happens if you have both the sigmoid layer and then a ReLU layer after the first linear layer and before the second linear layer?
(d) What happens if you have a ReLU layer then a sigmoid layer after the first linear layer and before the second linear layer?
(e) What happens if you have neither the sigmoid layer nor a ReLU layer after the first linear layer?

Exercise 8.3 Using some of the test set as a validation set, and stopping when the validation error gets worse, which of the settings is the best? Suggest another algorithm (either changing the architecture or the hyperparameters) which you conjecture would be better. Is it better?

Chapter 9

Reasoning with Uncertainty

9.1 Representing Probabilistic Models

A probabilistic model uses the same definition of a variable as a CSP (Section 4.1.1). A variable consists of a name, a domain and an optional (x,y) position (for displaying). The domain of a variable is a list or a tuple, as the ordering matters for some representations of factors.

9.2 Representing Factors

A factor is, mathematically, a function from variables into a number; that is, given a value for each of its variables, it gives a number. Factors are used for conditional probabilities, utilities in the next chapter, and are explicitly constructed by some algorithms (in particular, variable elimination).

A variable assignment, or just an assignment, is represented as a {variable : value} dictionary. A factor can be evaluated when all of its variables are assigned. This is implemented in the can_evaluate method, which can be overridden for representations that don't require all variables to be assigned (such as decision trees). The method get_value evaluates the factor for an assignment.
probFactors.py — Factors for graphical models

11 from display import Displayable
12 import math
13
14 class Factor(Displayable):
15     nextid = 0  # each factor has a unique identifier; for printing
16
17     def __init__(self, variables, name=None):
18         self.variables = variables  # list of variables
19         if name:
20             self.name = name
21         else:
22             self.name = f"f{Factor.nextid}"
23             Factor.nextid += 1
24
25     def can_evaluate(self, assignment):
26         """True when the factor can be evaluated in the assignment
27         assignment is a {variable:value} dict
28         """
29         return all(v in assignment for v in self.variables)
30
31     def get_value(self, assignment):
32         """Returns the value of the factor given the assignment of values to variables.
33         Needs to be defined for each subclass.
34         """
35         assert self.can_evaluate(assignment)
36         raise NotImplementedError("get_value")  # abstract method

The method __str__ returns a brief definition (like "f7(X,Y,Z)"). The method to_table returns a string representation of a table showing all of the assignments of values to variables, and the corresponding value.

probFactors.py — (continued)

38     def __str__(self):
39         """returns a string representing a summary of the factor"""
40         return f"{self.name}({','.join(str(var) for var in self.variables)})"
41
42     def to_table(self, variables=None, given={}):
43         """returns a string representation of the factor.
44         Allows for an arbitrary variable ordering.
45         variables is a list of the variables in the factor
46         (can contain other variables)"""
47         if variables == None:
48             variables = [v for v in self.variables if v not in given]
49         else:  # enforce ordering and allow for extra variables in ordering
50             variables = [v for v in variables if v in self.variables and v not in given]
51         head = "\t".join(str(v) for v in variables)+"\t"+self.name
52         return head+"\n"+self.ass_to_str(variables, given, variables)
53
54     def ass_to_str(self, vars, asst, allvars):
55         #print(f"ass_to_str({vars}, {asst}, {allvars})")
56         if vars:
57             return "\n".join(self.ass_to_str(vars[1:], {**asst, vars[0]: val}, allvars)
58                              for val in vars[0].domain)
59         else:
60             val = self.get_value(asst)
61             val_st = "{:.6f}".format(val) if isinstance(val, float) else str(val)
62             return ("\t".join(str(asst[var]) for var in allvars)
63                     + "\t"+val_st)
64
65     __repr__ = __str__

9.3 Conditional Probability Distributions

A conditional probability distribution (CPD) is a factor that represents a conditional probability. A CPD representing P(X | Y1 ... Yk) is a factor that, given values for X and for each Yi, returns a number.

probFactors.py — (continued)

67 class CPD(Factor):
68     def __init__(self, child, parents):
69         """represents P(variable | parents)
70         """
71         self.parents = parents
72         self.child = child
73         Factor.__init__(self, parents+[child], name=f"Probability")
74
75     def __str__(self):
76         """A brief description of a factor used in tracing"""
77         if self.parents:
78             return f"P({self.child}|{','.join(str(p) for p in self.parents)})"
79         else:
80             return f"P({self.child})"
81
82     __repr__ = __str__
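As a quick illustration of the Factor interface, the following sketch defines a factor by an arbitrary Python function of an assignment. FactorFn and the variables X and Y are made up for this example, and it assumes the position argument of Variable is optional, as stated in Section 9.1:

from variable import Variable

X = Variable('X', [False, True])
Y = Variable('Y', [False, True])

class FactorFn(Factor):
    """a factor defined by a Python function of an assignment (illustration only)"""
    def __init__(self, variables, fn):
        Factor.__init__(self, variables)
        self.fn = fn
    def get_value(self, assignment):
        assert self.can_evaluate(assignment)
        return self.fn(assignment)

f_xor = FactorFn([X, Y], lambda a: 1 if a[X] != a[Y] else 0)
print(f_xor)                                 # e.g., f0(X,Y)
print(f_xor.get_value({X: True, Y: False}))  # 1
print(f_xor.to_table())                      # all four assignments and their values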
A constant CPD has no parents, and has probability 1 when the variable has the value specified, and 0 when the variable has a different value.

probFactors.py — (continued)

84 class ConstantCPD(CPD):
85     def __init__(self, variable, value):
86         CPD.__init__(self, variable, [])
87         self.value = value
88     def get_value(self, assignment):
89         return 1 if self.value == assignment[self.child] else 0

9.3.1 Logistic Regression

A logistic regression CPD, for Boolean variable X, represents P(X=True | Y1 ... Yk) using k + 1 real-valued weights, so that

P(X=True | Y1 ... Yk) = sigmoid(w0 + ∑i wi Yi)

where, for Boolean Yi, True is represented as 1 and False as 0.

probFactors.py — (continued)

91 from learnLinear import sigmoid, logit
92
93 class LogisticRegression(CPD):
94     def __init__(self, child, parents, weights):
95         """A logistic regression representation of a conditional probability.
96         child is the Boolean (or 0/1) variable whose CPD is being defined
97         parents is the list of parents
98         weights is a list of parameters, such that weights[i+1] is the weight for parents[i]
99         and weights[0] is the bias.
100        """
101        assert len(weights) == 1+len(parents)
102        CPD.__init__(self, child, parents)
103        self.weights = weights
104
105     def get_value(self, assignment):
106         assert self.can_evaluate(assignment)
107         prob = sigmoid(self.weights[0]
108                        + sum(self.weights[i+1]*assignment[self.parents[i]]
109                              for i in range(len(self.parents))))
110         if assignment[self.child]:  # child is true
111             return prob
112         else:
113             return (1-prob)

9.3.2 Noisy-or

A noisy-or, for Boolean variable X with Boolean parents Y1 ... Yk, is parametrized by k + 1 parameters p0, p1, ..., pk, where each 0 ≤ pi ≤ 1. The semantics is defined as though there are k + 1 hidden variables Z0, Z1, ..., Zk, where P(Z0) = p0 and P(Zi | Yi) = pi for i ≥ 1, and where X is true if and only if Z0 ∨ Z1 ∨ · · · ∨ Zk (where ∨ is "or"). Thus X is false if all of the Zi are false, so

P(X=False | Y1 ... Yk) = (1 − p0) ∏{i : Yi is True} (1 − pi)

Intuitively, p0 is the probability of X when all Yi are false, each Zi is a noisy (probabilistic) measure that Yi makes X true, and X only needs one Zi to be true to make X true.

probFactors.py — (continued)

115 class NoisyOR(CPD):
116     def __init__(self, child, parents, weights):
117         """A noisy-or representation of a conditional probability.
118         child is the Boolean (or 0/1) child variable whose CPD is being defined
119         parents is the list of Boolean (or 0/1) parents
120         weights is a list of parameters, such that weights[i+1] is the weight for parents[i] and weights[0] is p0
121         """
122         assert len(weights) == 1+len(parents)
123         CPD.__init__(self, child, parents)
124         self.weights = weights
125
126     def get_value(self, assignment):
127         assert self.can_evaluate(assignment)
128         probfalse = (1-self.weights[0])*math.prod(1-self.weights[i+1]
129                                                   for i in range(len(self.parents))
130                                                   if assignment[self.parents[i]])
131         if assignment[self.child]:  # child is assigned True in assignment
132             return 1-probfalse
133         else:
134             return probfalse
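The following sketch checks the noisy-or semantics, and shows a logistic regression CPD on the same structure. The variables Smoke, Fire, and Alarm, and all of the parameter values, are invented for this example:

from variable import Variable

Smoke = Variable('Smoke', [False, True])
Fire = Variable('Fire', [False, True])
Alarm = Variable('Alarm', [False, True])

# P(Alarm | Smoke, Fire) as a noisy-or with leak probability p0 = 0.01
p_alarm = NoisyOR(Alarm, [Smoke, Fire], [0.01, 0.5, 0.9])
p_alarm.get_value({Smoke: True, Fire: True, Alarm: True})
# = 1 - (1-0.01)*(1-0.5)*(1-0.9) = 0.9505

# the same structure as a logistic regression CPD:
p_alarm_lr = LogisticRegression(Alarm, [Smoke, Fire], [-4.6, 4.0, 6.0])
p_alarm_lr.get_value({Smoke: False, Fire: False, Alarm: True})
# = sigmoid(-4.6), approximately 0.01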
9.3.3 Tabular Factors and Prob

A tabular factor is a factor that represents each assignment of values to variables separately. It is represented by a Python array (or a Python dict). If the variables are V1, V2, ..., Vk, the value of f(V1=v1, V2=v2, ..., Vk=vk) is stored in f[v1][v2]...[vk]. If the domain of Vi is [0, ..., ni−1], it can be represented as an array; otherwise it can use a dictionary. Python is nice in that it doesn't care whether an array or a dict is used, except when enumerating: enumerating a dict gives its keys, whereas enumerating an array gives its values. So we had to be careful not to enumerate the values.

probFactors.py — (continued)

136 class TabFactor(Factor):
137
138     def __init__(self, variables, values, name=None):
139         Factor.__init__(self, variables, name=name)
140         self.values = values
141
142     def get_value(self, assignment):
143         return self.get_val_rec(self.values, self.variables, assignment)
144
145     def get_val_rec(self, value, variables, assignment):
146         if variables == []:
147             return value
148         else:
149             return self.get_val_rec(value[assignment[variables[0]]],
150                                     variables[1:], assignment)

Prob is a factor that represents a conditional probability by enumerating all of the values.

probFactors.py — (continued)

152 class Prob(CPD, TabFactor):
153     """A factor defined by a conditional probability table"""
154     def __init__(self, var, pars, cpt, name=None):
155         """Creates a factor from a conditional probability table, cpt
156         The cpt values are assumed to be for the ordering pars+[var]
157         """
158         TabFactor.__init__(self, pars+[var], cpt, name)
159         self.child = var
160         self.parents = pars
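For example, here is P(B | A) as a conditional probability table (a sketch; the variables A and B and the numbers are made up). Because the cpt ordering is pars+[var], the table is indexed by the parent's value and then the child's value:

from variable import Variable

A = Variable('A', [False, True])
B = Variable('B', [False, True])

# cpt[a][b] = P(B=b | A=a), using the ordering [A, B]
p_B = Prob(B, [A], [[0.9, 0.1], [0.3, 0.7]])
p_B.get_value({A: True, B: True})  # 0.7
print(p_B.to_table())              # tabulates all four assignments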
9.3.4 Decision Tree Representations of Factors

A decision tree representation of a conditional probability of a child variable is one of:

• IFeq(var, val, true_cond, false_cond), where true_cond and false_cond are decision trees; true_cond is used if variable var has value val in an assignment, and false_cond is used if var has a different value
• a deterministic function that has probability 1 if a parent has the same value as the child (using SameAs(parent))
• a distribution over the child variable (using Dist(dict)).

Note that not all parents need to be assigned to evaluate the decision tree; it only needs a branch down the tree that gives the distribution.

probFactors.py — (continued)

162 class ProbDT(CPD):
163     def __init__(self, child, parents, dt):
164         CPD.__init__(self, child, parents)
165         self.dt = dt
166
167     def get_value(self, assignment):
168         return self.dt.get_value(assignment, self.child)
169
170     def can_evaluate(self, assignment):
171         return self.child in assignment and self.dt.can_evaluate(assignment)

Decision trees are made up of conditions; here, equality of a variable and a value:

probFactors.py — (continued)

173 class IFeq:
174     def __init__(self, var, val, true_cond, false_cond):
175         self.var = var
176         self.val = val
177         self.true_cond = true_cond
178         self.false_cond = false_cond
179
180     def get_value(self, assignment, child):
181         """ IFeq(var, val, true_cond, false_cond)
182         value of true_cond is used if var has value val in assignment,
183         value of false_cond is used if var has a different value
184         """
185         if assignment[self.var] == self.val:
186             return self.true_cond.get_value(assignment, child)
187         else:
188             return self.false_cond.get_value(assignment, child)
189
190     def can_evaluate(self, assignment):
191         if self.var not in assignment:
192             return False
193         elif assignment[self.var] == self.val:
194             return self.true_cond.can_evaluate(assignment)
195         else:
196             return self.false_cond.can_evaluate(assignment)

The following is a deterministic function that is true if the parent has the same value as the child. This is used for deterministic conditional probabilities (as is common for causal models, as described in Chapter 11).

probFactors.py — (continued)

198 class SameAs:
199     def __init__(self, parent):
200         """1 when child has same value as parent, otherwise 0"""
201         self.parent = parent
202
203     def get_value(self, assignment, child):
204         return 1 if assignment[child] == assignment[self.parent] else 0
205
206     def can_evaluate(self, assignment):
207         return self.parent in assignment

At the leaves are distributions over the child variable.

probFactors.py — (continued)

209 class Dist:
210     def __init__(self, dist):
211         """dist is an array or dictionary indexed by the value of the current child"""
212         self.dist = dist
213
214     def get_value(self, assignment, child):
215         return self.dist[assignment[child]]
216
217     def can_evaluate(self, assignment):
218         return True

The following shows a decision tree representation of Example 9.18 of Poole and Mackworth. When the Action is to go out, the probability of Wet is a function of Rain; otherwise it is a function of Cup Full.

probFactors.py — (continued)

220 ##### A decision tree representation of Example 9.18 of AIFCA 3e
221 from variable import Variable
222
223 boolean = [False, True]
224
225 action = Variable('Action', ['go_out', 'get_coffee'], position=(0.5,0.8))
226 rain = Variable('Rain', boolean, position=(0.2,0.8))
227 full = Variable('Cup Full', boolean, position=(0.8,0.8))
228
229 wet = Variable('Wet', boolean, position=(0.5,0.2))
230 p_wet = ProbDT(wet, [action, rain, full],
231                IFeq(action, 'go_out',
232                     IFeq(rain, True, Dist([0.2,0.8]), Dist([0.9,0.1])),
233                     IFeq(full, True, Dist([0.4,0.6]), Dist([0.7,0.3]))))
234
235 # See probRC for wetBN, which expands this example to a complete network
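As a small check of the decision tree evaluation (the values follow from the Dist leaves above):

# probability of getting wet when going out in the rain:
p_wet.get_value({action: 'go_out', rain: True, full: True, wet: True})  # 0.8

# full does not need to be assigned when the action is 'go_out':
p_wet.can_evaluate({action: 'go_out', rain: True, wet: True})           # True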
9.4 Graphical Models

A graphical model consists of a set of variables and a set of factors.

probGraphicalModels.py — Graphical Models and Belief Networks

11 from display import Displayable
12 from variable import Variable
13 from probFactors import CPD, Prob
14 import matplotlib.pyplot as plt
15
16 class GraphicalModel(Displayable):
17     """The class of graphical models.
18     A graphical model consists of a title, a set of variables and a set of factors.
19
20     vars is a set of variables
21     factors is a set of factors
22     """
23     def __init__(self, title, variables=None, factors=None):
24         self.title = title
25         self.variables = variables
26         self.factors = factors

A belief network (also known as a Bayesian network) is a graphical model where all of the factors are conditional probabilities, and every variable has a conditional probability of it given its parents. The following checks the first condition (that all factors are conditional probabilities), and builds some useful data structures.

probGraphicalModels.py — (continued)

28 class BeliefNetwork(GraphicalModel):
29     """The class of belief networks."""
30
31     def __init__(self, title, variables, factors):
32         """vars is a set of variables
33         factors is a set of factors. All of the factors are instances of CPD (e.g., Prob).
34         """
35         GraphicalModel.__init__(self, title, variables, factors)
36         assert all(isinstance(f, CPD) for f in factors), factors
37         self.var2cpt = {f.child: f for f in factors}
38         self.var2parents = {f.child: f.parents for f in factors}
39         self.children = {n: [] for n in self.variables}
40         for v in self.var2parents:
41             for par in self.var2parents[v]:
42                 self.children[par].append(v)
43         self.topological_sort_saved = None

The following creates a topological sort of the nodes, where the parents of a node come before the node in the resulting order. This is based on Kahn's algorithm from 1962.

probGraphicalModels.py — (continued)

45     def topological_sort(self):
46         """creates a topological ordering of variables such that the parents of
47         a node are before the node.
48         """
49         if self.topological_sort_saved:
50             return self.topological_sort_saved
51         next_vars = {n for n in self.var2parents if not self.var2parents[n]}
52         self.display(3, 'topological_sort: next_vars', next_vars)
53         top_order = []
54         while next_vars:
55             var = next_vars.pop()
56             self.display(3, 'select variable', var)
57             top_order.append(var)
58             next_vars |= {ch for ch in self.children[var]
59                           if all(p in top_order for p in self.var2parents[ch])}
60             self.display(3, 'var_with_no_parents_left', next_vars)
61         self.display(3, "top_order", top_order)
62         assert set(top_order) == set(self.var2parents), (top_order, self.var2parents)
63         self.topological_sort_saved = top_order
64         return top_order

[Figure 9.1: bn_4ch.show(): the 4-chain belief network with nodes A, B, C, and D]

9.4.1 Showing Belief Networks

The show method uses matplotlib to show the graphical structure of a belief network.

probGraphicalModels.py — (continued)

66     def show(self, fontsize=10, facecolor='orange'):
67         plt.ion()  # interactive
68         ax = plt.figure().gca()
69         ax.set_axis_off()
70         plt.title(self.title, fontsize=fontsize)
71         bbox = dict(boxstyle="round4,pad=1.0,rounding_size=0.5", facecolor=facecolor)
72         for var in self.variables:  #reversed(self.topological_sort()):
73             for par in self.var2parents[var]:
74                 ax.annotate(var.name, par.position, xytext=var.position,
75                             arrowprops={'arrowstyle':'<-'}, bbox=bbox, ha='center', va='center', fontsize=fontsize)

probRC.py — (continued)

106         if len(comp) > 1:
107             # there are disconnected components
108             self.display(3, "splitting into connected components", comp, "in context", context)
109             return(math.prod(self.prob_search(context, f, eo) for (f, eo) in comp))
110         else:
111             assert split_order, "split_order should not be empty to get here"
112             total = 0
113             var = split_order[0]
114             self.display(3, "rc branching on", var)
115             for val in var.domain:
116                 total += self.prob_search({var: val}|context, factors, split_order[1:])
117             self.cache[ce] = total
118             self.display(2, "rc branching on", var, "returning", total)
119             return total

connected_components returns a list of connected components, where a connected component is a set of factors and a set of variables, such that the graph that connects variables and the factors that involve them is connected. The connected components are built one at a time, with a current connected component. At all times fa