Dynamical Systems PDF
Document Details
Uploaded by WellWishersRomanArt3553
2024
Tags
Summary
This document provides an introduction to dynamical systems, focusing on concepts like deterministic systems, chaos, stability/instability of fixed points, and bifurcations. It includes discussions on conservative and dissipative systems, and the application of these ideas in various arenas, including human health and geology.
Full Transcript
03.10 | dynamical systems Created @October 3, 2024 1:17 PM Tags 1. Introduction to Dynamical Systems Deterministic Systems: In dynamical systems, deterministic equations represent the time...
03.10 | dynamical systems Created @October 3, 2024 1:17 PM Tags 1. Introduction to Dynamical Systems Deterministic Systems: In dynamical systems, deterministic equations represent the time evolution of a system without any randomness. The system evolves according to a set rule, meaning no randomness is involved. Despite being deterministic, dynamical systems can exhibit unpredictable behavior over time, particularly when systems exhibit chaotic dynamics. Multidimensional Systems: In real-world applications, systems are multidimensional, meaning they have many interacting parameters (e.g., the temperature of the skin, pH of blood, etc.). These parameters influence each other and evolve over time. The equation dtdx(t)describes the variation of one dimension over time, modeling how a particular state evolves. dx(t)dt\frac{dx(t)}{dt} A trajectory or orbit describes the path a system takes in its phase space, a space representing all possible states of the system. 2. Continuous vs. Discrete Dynamical Systems Continuous Dynamical Systems:dtdx(t)=f(x) The system evolves continuously over time, without discrete steps. The behavior is modeled by a continuous function, often represented as: dx(t)dt=f(x)\frac{dx(t)}{dt} = f(x) Continuous systems often have solutions that are smooth and evolve gradually. 03.10 | dynamical systems 1 Discrete Dynamical Systems (Maps):xn+1=M(xn) In discrete systems, the system evolves in steps, where the time variable n is discrete (e.g., n=0,1,2,…), and the state changes in discrete time intervals. nn n=0,1,2,…n = 0, 1, 2, \dots A discrete dynamical system is expressed as: xn+1=M(xn)x_{n+1} = M(x_n) Intersection points (i.e., specific moments in time) are considered, reducing a continuous system to a discrete one. 3. Chaos and Sensitive Dependence on Initial Conditions Chaos in Dynamical Systems: Even in deterministic systems, chaos can arise, where small changes in initial conditions lead to vastly different outcomes over time, making long- term prediction impossible. Lyapunov Exponent: A system exhibits chaotic behavior if its Lyapunov exponent is positive, indicating exponential divergence between two initially close points. This means that even minute differences in initial conditions can lead to wildly divergent trajectories as time progresses. Example: Logistic Map:xn+1=rxn(1−xn) A simple chaotic system is the logistic map: xn+1=rxn(1−xn)x_{n+1} = r x_n (1 - x_n) This function produces chaos for certain values of the parameter r ∈ (typically r [2.5,4]). rr r ∈[2.5,4]r \in [2.5, 4] 03.10 | dynamical systems 2 Bifurcations occur as r increases, where the system moves from a stable point to periodic oscillations and eventually to chaotic behavior. rr 4. Bifurcations and Attractors Bifurcations: As the parameter r in a system like the logistic map changes, the system may experience bifurcations, where the number of attractors (stable states) doubles, causing the system to transition between different types of behavior (e.g., from stable fixed points to periodic orbits). rr Period Doubling: At certain values of r, the system exhibits a period doubling bifurcation, where the number of periodic attractors doubles (e.g., 1 → 2 → 4 → 8...). rr Types of Attractors: Fixed Point: The system settles at a single state. Limit Cycle: The system oscillates in a periodic cycle between two or more states. Strange Attractors: In chaotic systems, attractors have a fractal structure and no repeating patterns. They are non-periodic and can appear in systems like the Lorenz system or the Henon map. 5. Stability of Fixed Points Fixed Points: A fixed point is a state where the system does not evolve anymore, i.e., dtdx=0. dxdt=0\frac{dx}{dt} = 0 The stability of these points depends on the derivative of the system at the fixed point. 03.10 | dynamical systems 3 If the derivative at the fixed point is less than 1, the point is stable (attracts nearby points). If the derivative is greater than 1, the point is unstable (repels nearby points). If the derivative equals 1, the fixed point is meta-stable, meaning small perturbations may lead to instability. 6. Conservative vs. Dissipative Systems Conservative Systems: A system is conservative if it preserves the total "volume" in phase space over time (i.e., it does not lose energy or change the system's total state). Example: A harmonic oscillator without damping is conservative. Dissipative Systems: A system is dissipative if it loses "volume" in phase space over time (i.e., energy is dissipated). Example: A damped harmonic oscillator loses energy due to friction and is dissipative. Dissipative systems can have attractors where all orbits eventually converge to a single state or periodic behavior. 7. Chaotic Systems and Fractals Chaotic Systems exhibit sensitive dependence on initial conditions. For example, small differences in the initial state of a chaotic system can lead to exponentially diverging behaviors. Strange Attractors in chaotic systems have a fractal dimension and exhibit complex, irregular patterns. These attractors exhibit self-similarity at different scales, a key property of fractals. 8. Symbolic Dynamics and Phase Space Partitioning 03.10 | dynamical systems 4 Symbolic Dynamics: Symbolic dynamics involves dividing the phase space of a system into regions, each of which is assigned a symbol from an alphabet. The system's trajectory can then be described as a symbolic sequence where each symbol corresponds to a region of the phase space that the system visits. Phase Space: The phase space represents all possible states of the system. In chaotic systems, the phase space can be partitioned into symbolic regions, and the trajectory can be traced as a sequence of symbols. 9. Real-World Applications Human Health: In complex biological systems (e.g., heart rhythms or brain dynamics), the characteristic time could represent a time frame relevant to the system’s life cycle (e.g., the human lifespan). Geological Systems: In geology, systems evolve over much longer time scales, like millions of years, and have different dynamics compared to human-scale systems. Living Systems: Living organisms are inherently multidimensional systems with many interacting parameters, such as temperature, pressure, and biochemical states. 10. Summary and Reflections Deterministic Chaos: Chaos theory demonstrates that deterministic systems can exhibit unpredictable and complex behavior, especially when the system has many interacting parameters. Bifurcations and Attractors: The transition from stable states to periodic orbits and chaos is a key feature of dynamical systems. Strange attractors and sensitive dependence on initial conditions are hallmarks of chaotic systems. 03.10 | dynamical systems 5 Fractality in Chaos: Fractal dimensions describe the complexity of chaotic systems, where attractors exhibit self-similar patterns at different scales. Practical Implications: Chaos theory has wide applications in fields like biology, physics, economics, and engineering, where it helps understand complex, non-linear behaviors that evolve over time. Through the study of dynamical systems and chaos, we gain insight into the unpredictability of nature and patterns of complexity that emerge from seemingly simple deterministic rules. 03.10 | dynamical systems 6 10.10 | dynamical systems II (fractal dimensions) Created @October 10, 2024 1:16 PM Tags Comprehensive Synthesis: Fractal Dimensions, Dynamical Systems, and Iterated Function Systems (IFS) 1. Introduction to Dynamical Systems and Fractal Dimensions Dynamical Systems involve the study of systems that evolve over time according to specific rules or equations. These systems can exhibit complex behaviors, including periodicity, chaos, and fractality. Fractal Dimensions are used to describe the complexity of fractals, capturing their self-similarity at different scales. 2. The Logistic Map and Iterated Function Systems (IFS) Logistic Function: The logistic map is a simple, nonlinear function that models population dynamics, commonly expressed as: xn+1=f(xn)=r⋅xn(1−xn) xn+1=f(xn)=r⋅xn(1−xn)x_{n+1} = f(x_n) = r \cdot x_n (1 - x_n) Where xnis the value at iteration n, and r is a control parameter (usually between 2 and 4). xnx_n nn rr 10.10 | dynamical systems II (fractal dimensions) 1 Iterating the function generates a series of values that evolve over time, leading to different behaviors depending on r. rr Iterated Function Systems (IFS): An Iterated Function System (IFS) is a collection of functions applied iteratively to generate complex geometric patterns or fractals. The concept is central to the study of chaotic systems and fractal structures. Bifurcations: As the parameter r increases in the logistic map, the system experiences bifurcations, where the number of attractors (stable states) doubles, leading to chaotic behavior. rr Feigenbaum Constants: Delta (Δ\DeltaΔ): A constant describing the rate of appearance of bifurcations in iterative maps, approximately 4.66. Alpha (α\alphaα): A scaling constant that describes the geometric change between bifurcations, found to be approximately -2.5. 3. Lyapunov Exponent and Sensitivity to Initial Conditions The Lyapunov exponent quantifies the sensitivity to initial conditions in a dynamical system. Positive Lyapunov exponent indicates chaotic behavior, meaning small changes in initial conditions lead to exponentially diverging outcomes. Saddle Points in dynamical systems refer to fixed points that are both attractors and repellers. This concept helps visualize the behavior of systems near equilibrium. The basin of attraction is the region around an attractor in which initial conditions will eventually lead to that attractor. 10.10 | dynamical systems II (fractal dimensions) 2 4. Entropy, Complexity, and Information Entropy measures randomness in a system, whereas complexity refers to the amount of information required to describe the system’s behavior. Periodic systems are deterministic and exhibit low entropy, but chaotic systems with high entropy are harder to predict over time. Complexity and Randomness: Randomness implies all possibilities are equally likely, resulting in maximal entropy. Complexity arises in systems that exhibit non-periodic, deterministic behaviors, such as those found in chaotic or fractal systems. 5. Fractals and Methods for Calculating Fractal Dimensions Fractals: A fractal is a self-similar geometric object whose structure appears similar at different scales. Fractal dimension quantifies how complex a fractal is, typically using several methods such as self-similarity, geometric methods, and box counting. Methods for Calculating Fractal Dimensions: 1. Self-Similarity:N=eDD=logelogN A fractal can be decomposed into smaller parts, each a reduced-scale copy of the whole. The fractal dimension D is determined by how the number of parts N increases with the scale e. The formula is: DD NN ee N=eDN = e^D 10.10 | dynamical systems II (fractal dimensions) 3 Logarithmic relationships are used to estimate the fractal dimension: D=logNlogeD = \frac{\log N}{\log e} 2. Geometric Method:logL(s)=(1−D)logs+b Measures the length or area of the fractal at different scales using a ruler or box of size s. The relationship between the length L(s) and the size s follows a power law: ss L(s)L(s) ss logL(s)=(1−D)logs+b\log L(s) = (1 - D) \log s + b Example: The fractal dimension of the coastline of Great Britain is computed by fitting this logarithmic model. 3. Box Counting Method: The space occupied by a fractal is divided into boxes of size s, and the number of boxes N(s) required to cover the fractal is counted. ss N(s)N(s) As s decreases, the number of boxes increases, and the fractal dimension can be estimated by the slope of the log-log plot of N(s) vs. s. ss N(s)N(s) ss Examples of Fractals: Peano Curve: A continuous curve that fills a 2D space, with fractal dimension D=2. D=2D = 2 Koch Curve: A self-replicating fractal that starts with a triangle and forms a snowflake shape with fractal dimension of approximately 1.26. 10.10 | dynamical systems II (fractal dimensions) 4 Cantor Set: A fractal where the interval is repeatedly divided into smaller intervals, with dimension D=log(2)/log(3). D=log(2)/log(3)D = \log(2)/\log(3) Sierpinski Triangle: A triangular fractal with dimension D=log(3)/log(2). D=log(3)/log(2)D = \log(3)/\log(2) 6. L-System (Lindenmayer System) Lindenmayer Systems (L-systems), introduced by Aristid Lindenmayer in 1968, are used to model the development of living organisms and their growth patterns. An L-system consists of: 1. An alphabet of symbols representing parts of the system (e.g., F,+,−). F,+,−F, +, - 2. An axiom or initial string that defines the starting point. 3. Production rules that describe how symbols are replaced iteratively. 4. A stopping condition that limits the number of iterations. Operations in L-Systems: Turtle Graphics: Used to visualize L-systems through the interpretation of symbols as movements in space (e.g., forward steps or rotations). Example: Koch Curve (Snowflake Fractal) Axiom: F FF Production Rules: F→F+F−F−F+FF \rightarrow F + F - F - F + FF→F+F−F−F+F Angle: δ=60∘ δ=60∘\delta = 60^\circ 10.10 | dynamical systems II (fractal dimensions) 5 7. Applications of Fractals and Dynamical Systems Fractals are widely used in: Nature: Modeling coastlines, mountain ranges, and clouds. Computer Graphics: Generating realistic landscapes and textures. Medicine: Analyzing structures like blood vessels, lungs, and the brain. Physics: Modeling systems with complex, irregular behaviors, like turbulence or diffusion. Dynamical Systems in modeling: Population dynamics in ecology. Perception-action systems such as walking and motor control (limit cycles). Social systems: Modeling behaviors in groups or crowds. 8. Conclusion: Integrating Dynamical Systems and Fractals Dynamical Systems and Fractals are powerful tools for modeling complex systems across various domains. Fractals provide a way to understand and quantify irregular structures and self-similar behaviors across scales. Dynamical Systems help analyze systems that evolve over time, including both periodic and chaotic behaviors. By combining these concepts, we can model phenomena from the microscopic (cellular processes, molecular biology) to the macroscopic (weather patterns, ecological systems), providing insights into how complex systems behave and evolve over time. 10.10 | dynamical systems II (fractal dimensions) 6 philosophy in practice Vida Artificial / Artificial Life (ALife) (MEI) & Modelos de Computação (MCCog) Luı́s Correia DI, Lasige – FCUL philosophy in practice in fact it should read Computational Artificial Life since we do not cover “wet” ALife what is ALife? philosophy in practice Artificial life is a field of study devoted to under- standing life by attempting to abstract the fundamental principles underlying biological phenomena, and recre- ating these dynamics in other media – such as com- puters – making them accessible to new kinds of ex- perimental manipulation and testing Langton (1991) Artificial Life 2, Preface life on Earth: life as we know it artificial life: life as it could be ALife in short philosophy in practice Condensed description Studies synthetic behaviour similar to biologic organisms, in computers, mobile robots and more ALife two families of goals philosophy in practice using artificial life techniques... modelling what can we learn about the living? engineering how can we obtain better solutions to known problems? in this course, we are going to cover some of the techniques, algorithms and models that are used to study questions like these an illustrative example philosophy in practice morphology and neural control under evolution some beings also have Karl Sims ’blokies’ (1994) sensory inputs [external video] evolutionary pressure to move to capture problem in development phase: ’self-propelling beings’ RQs big questions! Artificial life - RQs philosophy in practice evolution of complexity – is there a reason for the arrow of complexity? origins of life – understanding pre-biotic systems (template replicators, etc.) ETI* major transitions in evolution prokaryotes to eukaryotes, solitary to eusocial colony living, single to multi-cellular life * evolutionary transitions in individuality how does cultural evolution influence biological evolution, and vice versa? what are the fundamental requirements for intelligence? Bedau et al (2000) Artificial Life 6:363 course logistics I philosophy in practice Luı́s Correia (T, TP11, TP12) [email protected] course email: [email protected] course logistics II philosophy 2h T + 1.5h TP [TP11, TP12] we need to balance in practice students in the 2 TPs assessment project [70%] groups of 2 2 pg. proposals by 24/Oct (by email); pres TPs 27/Oct complete project by 23/Jan, presentation 26/Jan flash test [20%], 20 questions, 20 mins in last T class reports from TP exercises, 4pg max [10%] highlighting most interesting aspects, from students point of view course logistics III philosophy in practice project free theme on any of course topics, or related Vida Artificial (MEI, MI, MBBC, MEBB,...) implementation project Modelos de Computação (MCCog) implementation project or synthesis report your turn philosophy background in practice study and research interests what you want to learn complementary skills course overview philosophy in practice historical notes dynamical systems cellular automata evolutionary algorithms neural networks swarm models self-organisation mobile robots (as artificial beings) references I introduction to ALife philosophy Christoph Adami, “Introduction to Artificial Life”, in practice Springer Telos, 1998 Cristopher Langton, “Introduction”, pp. 1-47, in Cristopher Langton, ed., Artificial Life, Proceedings of the 1987 Workshop, Los Alamos, Addison-Wesley, 1989 Cristopher Langton, “Introduction”, pp. 3-23, in Langton, Taylor, Farmer, Rasmussen, eds., Artificial Life II, Addison-Wesley, 1992 Christopher Langton ed., Artificial Life: An Overview, 1997 Artificial Life Online – https://alife.org/ references II introduction to ALife Steven Levy, Artificial Life: The Quest for a New Creation, philosophy in practice Panteon Books, 1992 Claus Emmeche, The Garden in the Machine, Princeton University Press, 1991 Erwin Schrödinger, O que é a Vida?, Fragmentos, 1989 (original de 1943) Heins Pagels, Os Sonhos da Razão, Gradiva, Gradiva, 90 (original de 88) Edgar Morin, Para o Pensamento Complexo, in Ciência com Consciência, Publicações Europa- América, 1994 (original de 1990) John Horgan, From Complexity to Perplexity, Scientific American, Jun. 1995, pp. 74 - 79. ALife context philosophy in practice characteristics living systems (& ALife) philosophy in practice autonomy multiple similar components self-organisation local control reproduction evolution emergent properties ALife context philosophy in practice Life forms vs. environments LIFE examples natural artificial natural Bio-organisms Artificial beings Cellular Automata WORLD artificial Virtual reality Virus Sofbots Parallels philosophy in practice Some say As Artificial Intelligence is to intelligence, so Artificial Life is to life... ALife Pre-history... philosophy in practice Vaucanson’s duck (∼1730) Near ancestors philosophy in practice Artificial neuron model (∼1940) Turing’s morphogenesis (∼1950) von Neumann’s cellular automata (∼1950) Grey Walter turtles (∼1950) The turtles by Grey Walter philosophy in practice Machina Speculatrix - Elmer e Elsie Very simple behaviour If collision with an obstacle go away Approach light sources, except if light source too strong go away however, resulting interaction with environment is complex Turtles in action philosophy in practice © Burden Neurological Institute The beginning of ALife philosophy in practice I Workshop on ALIfe in 1987 To join researchers in several scientific domains, apparently with common interest in self-organised systems Organised in Los Alamos, by Christopher Langton (Santa-Fe Institute)... and continuation philosophy in practice Currently Conferences A-Life Conference ECAL - European Conference on Artificial Life SAB - Simulation of Adaptive Behavior Journals Artificial Life Adaptive Behavior Evolutionary Computation Transactions on Systems Man & Cybernetics Transactions on Evolutionary Computation Autonomous Robots ALife vs. AI philosophy in practice appears as opposing ALife ←→ AI behaviour ←→ solution dynamic evolution ←→ final state essentially parallel ←→ essentially sequential action ←→ reasoning Epistemology I/III philosophy in practice what is life? living organism has: Reproduction Evolution Phylogenetic - species Ontogenetic - integration of interactions with environment (learning) - nervous, immune and endocrine systems Epigenetic - individual growing, morphogenesis etc. Epistemology II/III philosophy in practice Is life a property of form or a property of mater? Von Neumann’s self-reproduction model preceded discovery of biological reproduction It is difficult (impossible?) to draw a clear line between life and non-life, but... epistemology III/III simplifying philosophy Pier Luigi Luisi: in practice All that is alive is constituted by cells with properties of: self-maintenance self-generation by internal activity of the organism, consuming: Energy/Nutrients... under life philosophy Self-organised systems in practice Composed of: simple components with emergent properties showing at the system level, but... nonexistent at the component level wider than ALife Ex: Temperature, pressure, turbulence, meteorologic phenomena, economy, society,... Self-organisation defining principles philosophy in practice 1 No external control in nature all SO is embodied ⇒ lessons to take to artificial SO Self-organisation defining principles philosophy in practice 1 No external control 2 Increase in order in nature all SO is embodied ⇒ lessons to take to artificial SO Self-organisation defining principles philosophy in practice 1 No external control 2 Increase in order 3 Adaptability in nature all SO is embodied ⇒ lessons to take to artificial SO Self-organisation defining principles philosophy in practice 1 No external control 2 Increase in order 3 Adaptability 4 Interaction in nature all SO is embodied ⇒ lessons to take to artificial SO Self-organisation defining principles philosophy in practice 1 No external control 2 Increase in order 3 Adaptability 4 Interaction 5 Asynchronism in nature all SO is embodied ⇒ lessons to take to artificial SO by the way of emergence the observer philosophy in practice in most cases: what emerges is in the eye of the beholder situation of syntactic emergence [Cariani] the system in itself did not acquire new capabilities the model of the system does not change to encompass the observed behaviour by the way of emergence the system philosophy in practice What emerges is in the system does it exist in computational systems? situation of semantic emergence [Cariani] the system acquires new capabilities we must change the system’s model to explain the observed behaviour by the way of emergence the system philosophy in practice What emerges is in the system does it exist in computational systems? situation of semantic emergence [Cariani] the system acquires new capabilities we must change the system’s model to explain the observed behaviour In a computational system we do not have more than a Turing machine = finite state machine ⇒ non emergent! ways to approach ALife philosophy in practice WEAK position ALife as a family of models to solve different types of problems interest in knowing what is done in other areas and to apply those ideas STRONG position ALife as a unifying paradigm underlying different life forms interest in finding fundamental parameters (models), common to all forms The other side of the mirror... critics philosophy critic to Unifying Theory or Unifying Principle of ALife and in practice self-organised systems Horgan, 95 ALife is the philosophical heir of AI. As such, it generated more portentous rhetoric than tangible results Most important finding of ALife: It is very difficult to do science in complex systems Compared Granularity life, natural vs. artificial philosophy in practice Nature Artificial sub-atomic particles - atoms e molecules - carbon technology silicon technology - materials technology physico-chemical interactions electric interaction organic molecules integrated circuits cells (neurons) software/hardware organs software integrated organism electronics + mechanics important relations philosophy in practice Evolutionary Algorithms Physics Complex Artificial Dynamic S. Immune S. Life Synthetic Cellular Chemistry Automata Ethology applications philosophy in practice modelling of biological systems modelling of socio–economic systems synthesis of artificial organisms in simulated environments (Tierra, iterated prisoner’s dilemma) and in real environments (mobile robots) solving of complex problmes (alike grid computing)... wet ALife is a wide different(?) area... Neural Nets - Backpropagation Aprendizagem Automática Avançada Luı́s Correia Ciências - ULisboa backprop calc MLP example forward; complete connection Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 1 / 62 backprop calc the additive model of artificial neuron wk0 = bk x0 = 1 wk0 m P yk = ϕ wkixi x1 wk1 i=0 x2 wk2 vk... Σ ϕ(·) yk xm wkm Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 2 / 62 backprop calc MLP multi layer perceptron (MLP) general designation of feedforward Artificial Neural Nets (ANN) nets with several (> 1) layers of neurons (without feedback) signals propagate layer by layer from input to output most common learning model: error backpropagation algorithm, or backpropagation, or backprop errors propagate layer by layer from output to input backprop is a generalisation of LMS Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 3 / 62 backprop calc backprop in two words two steps: forward step - input vector presented and output is computed by forward propagation with constant weights backward step - weights are adjusted from an error signal error = difference between real output and desired output error is propagated backwards adjusting weights layer by layer Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 4 / 62 backprop calc backprop in MLP requirements non linear and differentiable activation function typically sigmoid, such as the logistic function 1 yj = with a>0 1 + exp(−avj ) where vj is the induced local field (weighted sum of all inputs) of neuron j and yj its output or the hyperbolic tangent function yj = a tanh(bvj ) with a, b > 0 Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 5 / 62 backprop calc notes on backprop in MLP if activation function was linear, a MLP would reduce to a single neuron(!) hidden layers (neither input nor output layers) allow to increase complexity of processing/classification... and also increase difficulty of analysis backprop is a computationally efficient learning algorithm Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 6 / 62 backprop calc notation to formalise backprop P E (n) error energy (1/2 j ej (n)) Pv j (n) induced local field ( i wji xi ) ej (n) error of neuron j in iteration n xji (n) i-th input of neuron j yj (n) real output ok (n) k-th output of the network dj (n) desired output η learning rate wji (n) synapse weight: L network depth; l = 0, 1,... , L layer output of i to input of j ml number of neurons in layer l ∆wji (n) learning correction m0 input size; mL output size; ϕ(·) activation function usually mL = M Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 7 / 62 backprop calc learning goal in MLP supposing a neuron j in the output layer ej (n) = dj (n) − yj (n) error energy: 1X 2 E (n) = ej (n) 2 j∈C with C as the set of neurons in the output layer designating N as the number of examples in the training set, the average error energy is: N 1 X Eav = E (n) N n=1 learning: minimise Eav by adjusting weights Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 8 / 62 backprop calc backprop algorithm introduction weights adjusted for each input vector as a function of the error repeat for all examples of the training set: 1 epoch repeat for several epochs until stopping criteria the average individual weight change is an estimate of the change needed to minimise the cost function over the training set, Eav Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 9 / 62 backprop calc backprop detailed part I/III the induced local field of output neuron j is m X vj (n) = wji (n)yi (n) i=0 its output: yj (n) = ϕj (vj (n)) alike LMS, backprop applies a correction ∆wji (n) to the weights, proportional to the partial derivative ∂E (n)/∂wji (n) applying the chain rule: ∂E (n) ∂E (n) ∂ej (n) ∂yj (n) ∂vj (n) = ∂wji (n) ∂ej (n) ∂yj (n) ∂vj (n) ∂wji (n) Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 10 / 62 backprop calc backprop detailed part II/III these hold: ∂E (n) ∂yj (n) = ej (n) = ϕ0j (vj (n)) ∂ej (n) ∂vj (n) ∂ej (n) ∂vj (n) = −1 = yi (n) ∂yj (n) ∂wji (n) replacing in the equation of ∂E (n)/∂wji (n), we obtain: ∂E (n) = −ej (n)ϕ0j (vj (n))yi (n) ∂wji (n) Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 11 / 62 backprop calc backprop detailed part III/III the weight correction is (delta rule): ∂E (n) ∆wji (n) = −η ∂wji (n) or, by replacing the error derivative equation: ∆wji (n) = ηδj (n)yi (n) in which δj (n) is the local gradient defined by: ∂E (n) ∂E (n) ∂ej (n) ∂yj (n) δj (n) = − = ∂vj (n) ∂ej (n) ∂yj (n) ∂vj (n) = ej (n)ϕ0j (vj (n)) Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 12 / 62 backprop calc backprop comments output layer local gradient provides sign and magnitude of corrections to do in synaptic weights corrections only depend on local error and activation function derivative usually activation function is identical in all neurons of the network output error computation is immediate Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 13 / 62 backprop calc backprop hidden layers introduction error computation in hidden neuron is harder than in output layer its influence in any single output neuron is shared with other hidden neurons to determine how to change weights of a hidden neuron according to its share of influence in the result is a credit-assignment problem Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 14 / 62 backprop calc backprop hidden layers local gradient calculus I/V given k an output neuron j an hidden layer (just before output) neuron the local gradient may be rewritten: ∂E (n) ∂yj (n) δj (n) = − ∂yj (n) ∂vj (n) ∂E (n) 0 = − ϕ (vj (n)) ∂yj (n) j Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 15 / 62 backprop calc backprop hidden layers local gradient calculus II/V since: 1X 2 E (n) = ek (n) (k is an output neuron) 2 k∈C we obtain: ∂E (n) X ∂ek (n) = ek ∂yj (n) ∂yj (n) k notice indexes j and k Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 16 / 62 backprop calc backprop hidden layers hidden layer neuron error contributions to obtain the error derivative of a hidden layer neuron we need to take into account the contributions of all the output neurons to which it is connected Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 17 / 62 backprop calc backprop hidden layers local gradient calculus III/V by the chain rule: ∂E (n) X ∂ek (n) ∂vk (n) = ek ∂yj (n) ∂vk (n) ∂yj (n) k given that: ek (n) = dk (n) − yk (n) = dk (n) − ϕk (vk (n)) we obtain: ∂ek (n) = −ϕ0k (vk (n)) ∂vk (n) Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 18 / 62 backprop calc backprop hidden layers local gradient calculus IV/V noting that: m X vk (n) = wkj (n)yj (n) with m as #inputs of neuron k j=0 the derivative in order to yj (n) becomes: ∂vk (n) = wkj (n) ∂yj (n) Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 19 / 62 backprop calc backprop hidden layers local gradient calculus V/V replacing in ∂E (n)/∂yj (n) we get: ∂E (n) X = − ek (n)ϕ0k (vk (n))wkj (n) ∂yj (n) k X = − δk (n)wkj (n) k where: δk (n) = ek (n)ϕ0k (vk (n)) finally, replacing this in the rewritten δj (n) equation: X δj (n) = ϕ0j (vj (n)) δk (n)wkj (n) k Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 20 / 62 backprop calc backprop hidden layers further hidden... the factor X δk (n)wkj (n) k can be considered as the error of a hidden layer neuron it is easy to see that this analysis can be recursively applied to neurons of previous hidden layers... Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 21 / 62 backprop calc enters the activation function logistic I/III 1 yj (n) = ϕj (vj (n)) = 1 + exp(−avj (n)) with a > 0 and − ∞ < vj (n) < ∞ resulting in 0 ≤ yj ≤ 1 the derivative in order to vj (n) is: a exp(−avj (n)) ϕ0j (vj (n)) = [1 + exp(−avj (n))]2 since yj (n) = ϕj (vj (n)) the derivative can be written as ϕ0j (vj (n)) = ayj (n)[1 − yj (n)] Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 22 / 62 backprop calc enters the activation function logistic II/III analysing ϕ0j (vj (n)) = ayj (n)[1 − yj (n)] maximum when yj (n) = 0, 5 minimum when yj (n) = 0 or yj (n) = 1 since ∆w ∝ ϕ0j (vj (n)) max variation in the intermediate zone of the sigmoid least variation with a saturated neuron this feature provides stability to the backprop learning algorithm Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 23 / 62 backprop calc enters the activation function logistic III/III - simplifications for an output neuron, yk (n) = ok (n), therefore the local gradient is: δk (n) = ϕ0k (vk (n))ek (n) = aok (n)[1 − ok (n)][dk (n) − ok (n)] for a hidden layer neuron: X δj (n) = ϕ0j (vj (n)) δk (n)wkj (n) k X = ayj (n)[1 − yj (n)] δk (n)wkj (n) k Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 24 / 62 backprop calc enters the activation function hyperbolic tangent I/II general form: ϕj (vj (n)) = a tanh(bvj (n)) with a, b > 0 constants similar to the logistic function, with a different scale and a bias its derivative is: ϕ0j (vj (n)) = ab sech2 (bvj (n)) = ab(1 − tanh2 (bvj (n)) b = [a − yj (n)][a + yj (n)] a Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 25 / 62 backprop calc enters the activation function hyperbolic tangent II/II for an output neuron: δk (n) = ek (n)ϕ0k (vk (n)) b = [dk (n) − ok (n)][a − ok (n)][a + ok (n)] a for a hidden layer neuron: X δj (n) = ϕ0j (vj (n)) δk (n)wkj (n) k b X = [a − yj (n)][a + yj (n)] δk (n)wkj (n) a k Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 26 / 62 backprop calc enters the activation function rectified linear unit (ReLU) I/II general form: ( 0, if vj (n) < 0 ϕj (vj (n)) = vj (n), if vj (n) ≥ 0 very simple piecewise linear function its derivative is: ( 0, if vj (n) < 0 ϕ0j (vj (n)) = 1, if vj (n) ≥ 0 Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 27 / 62 backprop calc enters the activation function ReLU II/II for an output neuron: δk (n) = ek (n)ϕ0k (vk (n)) ( 0, if vk (n) < 0 = dk (n) − ok (n), if vk (n) ≥ 0 for a hidden layer neuron: X δj (n) = ϕ0j (vj (n)) δk (n)wkj (n) k ( 0, if vj (n) < 0 = P k δk (n)wkj (n), if vj (n) ≥ 0 Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 28 / 62 backprop calc backpropagation happy ending with both logistic, hyperbolic tangent and ReLU: backprop does not need to compute the activation function nor its derivative! Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 29 / 62 backprop config learning config I/IV η controls learning rate η > quick learning, but unstable (possible oscillations) workaround with a momentum parameter α: ∆wji (n) = α∆wji (n − 1) + ηδj (n)yi (n) generalised delta rule - delta rule as a particular case when α = 0 Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 30 / 62 backprop config learning config II/IV generalised delta rule alternative formulation: n X ∆wji (n) = η αn−t δj (t)yi (t) t=0 but δj (t)yi (t) = −∂E (t)/∂wji (t) therefore: n X ∂E (t) ∆wji (n) = −η αn−t ∂wji (t) t=0 Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 31 / 62 backprop config learning config III/IV ∆wji (n) is the sum of a series to converge we need 0 ≤ |α| < 1 (usually α > 0) momentum tends to accelerate descend in a downward segment ∂E (n)/∂wji (n) with identical signs in consecutive iterations makes ∆wji (n) grow (i.e. wji (n) adjustment grows) momentum tends to stabilise trajectories with signal changes ∂E (n)/∂wji (n) with opposed signs in successive iterations produces small ∆wji (n) (i.e. small wji (n) adjustments) Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 32 / 62 backprop config learning config IV/IV momentum may contribute to better learning and to avoid local minima η may not be constant and we may have as many instances ηji as synapses with ηji = 0 there is no learning in that synapse Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 33 / 62 backprop config improved momentum adaptive momentum estimation (ADAM) computes exponentially decaying average of past gradients similar to a heavy ball with friction instead just heavy (momentum) mt = β1 mt−1 + (1 − β1 )∂E (t)/∂wji (t) vt = β2 vt−1 + (1 − β2 )(∂E (t)/∂wji (t))2 mt : estimate of 1st moment (mean), corrected for bias: m̂t = mt /(1 − β1 ) vt : estimate of second moment (variance), corrected: v̂t = vt /(1 − β2 ) η ∆wji (n) = − √ m̂t v̂t + with suggested values of β1 = 0.9, β2 = 0.999, = 10−8 Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 34 / 62 backprop config sequential training an epoch is the presentation of all the set of training examples (x(1), d(1)),... , (x(N ), d(N )) 1 first example x(1) presented at the input and a learning cycle is performed: forward signal propagation and given d(1), the error backpropagation 2 the process is repeated to complete the epoch (x(N ), d(N )) if error not low enough the epoch is repeated in a different (random) order of examples randomisation order makes learning stochastic and avoids limit cycles (around local minima) Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 35 / 62 backprop config batch training all weights are updated only once given epoch examples cost function for an epoch as the average square error 1 X X 2 Eav = N ej (n) 2N n=1 i∈C adjustment of synaptic weights according to delta rule: N ∂Eav η X ∂ej (n) ∆wji = −η =− ej (n) ∂wji N ∂wji n=1 with ∂ej (n)/∂wji as before ∆wji adjustment is done once incorporating all the individual errors of the epoch Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 36 / 62 backprop config sequential vs. batch sequential training batch training + simple implementation + easy to determine convergence + being stochastic may avoid local conditions minima + easy to parallelise + in general good results - doesn’t take advantage of - hard to model theoretically redundant examples Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 37 / 62 backprop config stopping criteria backprop has converged when the absolute variation of the average squared error per epoch is small enough “small” maybe from 1% downto 0,1%, or even 0,01%... better alternative: test generalisation capability and stop when it peaks Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 38 / 62 backprop config backprop heuristics I/IX 1 sequential vs. batch sequential simpler and more robust 2 maximise the information content of each example in the training set: use an example producing a large error use an example radically different from others possible problems: distorted example distribution learning of extreme cases may worsen backprop behaviour Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 39 / 62 backprop config backprop heuristics II/IX 3 activation function learning is faster with antisymmetric (odd) activation function: ϕ(−v) = −ϕ(v) logistic is not, but hyperbolic tangent ϕ(v) = a tanh(bv) yes! suggested parameter values: a = 1, 7159 b = 32 which result in: ϕ(1) = 1, ϕ(−1) = −1 ϕ0 (0) = ab = 1, 1424 ' 1 and max ϕ00 (v) at v = 1 Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 40 / 62 backprop config backprop heuristics III/IX 4 desired outputs (dk ) should fall within and at some distance () of the limiting values of ϕ for the case of the hyperbolic tangent in the positive value +a: dk = a − in the negative value −a: dk = −a + if a = 1, 7159 then with = 0, 7159 we obtain dj as ±1 well in the linear zone of ϕ Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 41 / 62 backprop config backprop heuristics IV/IX 5 input normalisation each input should have average null or close to it supposing all inputs positive, all weights in the input layer would tend to vary together input variables should not be correlated PCA may be useful for this scale variables so that their covariances (σXi Xj )† are similar ⇒ learning speed of synaptic weights is similar † σXi Xj = E[(Xi − E[Xi ])(Xj − E[Xj ])] Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 42 / 62 backprop config backprop heuristics V/IX 6 initialisation initial weight values not very large (saturation) nor very small (saddle point in antisymmetric ϕ) assuming null average inputs and unit variance: 2 µX = E[Xi ] = 0 ∧ σX = E[(Xi − µXi )2 ] = E[Xi2 ] = 1 ∀i an supposing non correlated inputs: 1 for i=j E[Xi Xj ] = 0 for i 6= j Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 43 / 62 backprop config backprop heuristics VI/IX initialisation (cont.) supposing initial random uniform synaptic weights with null average µw = E[wji ] = 0 for all (j, i) and variance 2 = E (wji − µw )2 = E[wji 2 σw ] the induced local field average is "m # m X X µv = E[vj ] = E wji xi = E[wji ]E[xi ] = 0 i=1 i=1 Pm (assuming null bias vj = i=1 wji xi ) Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 44 / 62 backprop config backprop heuristics VII/IX initialisation (cont.) and the induced local field variance is σv2 = E (Vj − µv )2 = E[Vj2 ] "m m # XX = E wji wjk xi xk i=1 k=1 m XX m = E[wji wjk ]E[xi xk ] i=1 k=1 Xm 2 = E[wji ] i=1 2 = mσw Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 45 / 62 backprop config backprop heuristics VIII/IX for ϕ = tanh a good strategy for initialisation of the synaptic weights is σv = 1 so that, with a and b as previously defined weights will fall along the linear zone of the sigmoid and therefore σw = m−1/2 weight variance reciprocal of the number of synapses of a neuron Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 46 / 62 backprop config backprop heuristics IX/IX 7 learning clues use information about the function to learn to accelerate backprop ex: invariance, symmetry properties 8 learning rate for learning to occur at the same rhythm in all neurons, η should be smaller in the last layers (usually with higher local gradients) neurons with more inputs should have smaller η (inversely proportional to the square root of the nr. of synapses) Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 47 / 62 backprop config backprop summary 1 initialisation if no information available, choose synaptic weights with random uniform distribution, null average and adequate variance to be in the linear zone of the sigmoid 2 training present an epoch while performing the learning steps for each example 3 iterations repeat epoch presentation with a different random order until stopping criterium adjust α and η (decreasing their values) with the number of iterations Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 48 / 62 backprop config pros & cons of backprop I/II connectionism (local computation) + metaphor for biological networks + (potential) graceful degradation + easy to parallelise - unrealistic in face of natural neurons + universal approximator of functions + computationally efficient (linear with W ) Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 49 / 62 backprop config pros & cons of backprop II/II + robustness - small perturbations only produce small estimate variations - slow convergence stochastic algorithm - local gradient may not point to the minimum of the error surface possible overshoot with high gradient in a single weight, or small adjustments in flat error surfaces - local minima - backprop can get stuck - scaling - time may grow exponentially with nr. inputs try to simplify connections (and avoid fully connected MLP in complex problems) Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 50 / 62 backprop config final comments to backprop advantages stem from: local learning method efficient in computing local error derivatives sequential (stochastic) mode vastly more used for simplicity non-linear neurons each one establishes a separation hyperplane the combination of all hyperplanes is iteratively adjusted to separate example patterns minimising classification error on average Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 51 / 62 backprop config references Haykin, S. S. (2009). Neural networks and learning machines. Pearson Education. Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 52 / 62 backprop config heuristics to adjust η to accelerate convergence 1 one η for each parameter 2 each η should be allowed to change in each iteration 3 when error derivative in order to one parameter maintains sign in consecutive iterations, increase respective η 4 when error derivative in order to one parameter alternates sign in consecutive iterations, decrease respective η... meta-learning... Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 53 / 62 backprop config generalisation I/III an ANN generalises if, for an input vector not used in training the result is correct (or nearly so... ) Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 54 / 62 backprop config generalisation II/III 3 main factors influence generalisation capability: size and representativity of the training set network architecture problem complexity the last is not controllable! about the others: define the architecture and try to obtain a good training set define the training set and try to obtain a good architecture Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 55 / 62 backprop config generalisation III/III maintaining the architecture... rule of thumb: W N =O where N size of the training set W total of free parameters admissible error to determine the architecture we need to go deeper... Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 56 / 62 backprop config function approximation I/IV universal approximation theorem (UAP) let ϕ(·) be a non constant, bounded and monotonous-increasing function. Let Im0 be the m0 -dimensional unit hypercube [0, 1]m0. The space of the continuous functions in Im0 is denoted by C(Im0 ). Then, given any function f ∈ C(Im0 ) and > 0, exists an integer m1 and sets of real constants αi , βi and wij , where i = 1,... , m1 and j = 1,... , m0 , such that we may define m1 X Xm0 F (x1 ,... , xm0 ) = αi ϕ wij xj + bi i=1 j=1 as an approximation of function f (·); meaning, |F (x1 ,... , xm0 ) − f (x1 ,... , xm0 )| < for all x1 , x2 ,... , xm0 within the input space Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 57 / 62 backprop config function approximation II/IV the universal approximation theorem says that a single hidden layer is sufficient for a MLP to compute a approximation to a given training set x1 , x2 ,... , xm0 and the corresponding desired output f (x1 ,... , xm0 ) however, it does not guarantee optimality of training time, or generalisation Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 58 / 62 backprop config function approximation III/IV an error risk bound of using a MLP with m0 input nodes and m1 hidden layer neurons is [Barron, 1992]: ! Cf2 m m 0 1 R≤O +O log N m1 N where Cf ≈ smoothness of function to learn f expresses a tradeoff between accuracy of best approximation (1st term) - m1 must be large (see also UAP) accuracy of empirical fit to the approximation (2nd term) - ratio m1 /N must be small (for N constant m1 should be small) Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 59 / 62 backprop config function approximation IV/IV with 2 hidden layers learning is more manageable: the first hidden layer extracts local features - partition of input and learning of local features to each one the second hidden layer extracts global features - learns global features combining outputs of neurons in the first hidden layer for a particular region of the output space Back Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 60 / 62 backprop config cross-validation I/II statistics technique very used in machine learning data set divided into: training sets training set used to fit the model validation set used to estimate generalisation, or prediction error of the model to avoid overfit test set used to assess the generalisation error of the obtained model test set is not used for training! Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 61 / 62 backprop config cross-validation II/II early stop method 1 perform a few epochs of training 2 with parameters fixed, present the validation set and measure the error for each of its examples 3 repeat from 1, until validation error increases rule of thumb examples partition: 80% for training set + 20% for validation set (see also k−fold cross validation) Back Luı́s Correia (Ciências - ULisboa) Neural Nets - Backpropagation 62 / 62 Hopfield nets Mills, Correia RNN Hopfield Artificial Life / Vida Artificial & networks Computational methodologies / Metodologias da Computação: Discrete H.N. Hopfield Networks Rob Mills Luı́s Correia Informatics FCUL November 2014 Outline Hopfield nets Mills, Correia RNN Hopfield networks 1 Recurrent Neural Networks Discrete H.N. 2 Hopfield networks 3 Discrete H.N. Outline Hopfield nets Mills, Correia RNN Hopfield networks 1 Recurrent Neural Networks Discrete H.N. 2 Hopfield networks 3 Discrete H.N. Recurrent Neural networks Hopfield nets Mills, Correia Feed forward: Recurrent: RNN transform I → O; without with feedback loops; responds Hopfield memory to input, current state networks Discrete H.N. Recurrent Neural networks Hopfield nets Mills, Correia RNN Hopfield Networks with feedback: networks Local: at the level of the neuron Discrete H.N. Global: between parts of the overall network Representing time (implicitly) Recurrent Neural networks Hopfield nets Mills, Correia RNN Hopfield Networks with feedback: networks Local: at the level of the neuron Discrete H.N. Global: between parts of the overall network Representing time (implicitly) There are problems with the stability of a recurrent network We desire stable networks (in the sense of Lyapunov stability) — non-linear dynamical systems theory Motivation for Associative Memories Pattern storage/recall Hopfield nets Explicit training (Hebb’s rule) Run network to ‘find Mills, Correia out what it knows’ RNN Training Hopfield networks Discrete H.N. Nativ 2007, Trappenberg 2002, Hopfield 1982, Hebb 1949 Motivation for Associative Memories Pattern storage/recall Hopfield nets Explicit training (Hebb’s rule) Run network to ‘find Mills, Correia out what it knows’ RNN Training Hopfield networks Discrete H.N. Recall Nativ 2007, Trappenberg 2002, Hopfield 1982, Hebb 1949 Motivation for Associative Memories Pattern storage/recall Hopfield nets Mills, Correia RNN Hopfield networks Discrete H.N. Watson et al 2011 Dynamical systems Representation Hopfield nets Mills, Correia RNN For an order-N system, we write the state vector as Hopfield networks x(t) = x1 (t), x2 (t),... , xN (t), (1) Discrete H.N. where variables describe the states of a system, as a function of time. In general, first-order differential equations of the form d x(t) = F(x(t)), (2) dt where F is a vector of N non-linear functions. Dynamical systems Orbits Hopfield nets Mills, Correia Flux lines (trajectories from different initial conditions) in a 2D RNN phase space: Hopfield networks Discrete H.N. The velocity is given by the tangent vector, dx/dt Dynamical systems Attractors Hopfield nets Dissipative systems have contractive trajectories (lower dim) Mills, Correia point Chaotic RNN Hopfield limit cycle hyperbolic networks Discrete H.N. Each attractor has a basin of attraction Equilibrium is not necessarily static! Limit cycles are a state of equilibrium Haykin 1999 Neurodynamics = DS + neural networks Characteristics Hopfield nets Mills, Correia RNN Large number of degrees of freedom – Hopfield networks many neurons, with collective dynamics Discrete H.N. Non-linearity – this is essential to provide universal computation Dissipation – Convergence to an attractor over time Noise – neural networks characteristically exhibit noise (but not treated here) Neurodynamics Manipulating attractors Hopfield nets Mills, Correia RNN We desire the dynamical Hopfield attractors to correspond networks to input patterns Discrete H.N. somehow control the positions of the attractors Exploit the energy-minimisation dynamics to perform useful computation Watson et al 2011 Neurodynamics Manipulating attractors Hopfield nets Mills, Correia RNN Hopfield Ideals of an associative memory: networks Let us store a set of patterns {tp } = tip somehow, Discrete H.N. such that when presented with a new pattern s, the network produces the (stored) pattern that most closely resembles s. How can we induce the attractors of a dynamical system that corresponds to such a computational machine? Outline Hopfield nets Mills, Correia RNN Hopfield networks 1 Recurrent Neural Networks Discrete H.N. 2 Hopfield networks 3 Discrete H.N. Hopfield network Hopfield nets Mills, Correia Single layer network all neurons with feedback, and unit delays every neuron RNN Hopfield no self-connections networks Discrete H.N. Hopfield 1982 Hopfield network model definition Hopfield nets Mills, Correia RNN The additive model: Hopfield networks N Discrete H.N. dvj (t) vj (t) X Cj =− + wji ψ(vi (t)) + Ij (3) dt Rj i=1 Matrix of weights is symmetric (wij = wji ∀ i, j) Every neuron has a non-linear function, ψ(.) The non-linear function is invertible, i.e.,, vi = ψ −1 (xi ) Hopfield network model definition Hopfield nets Using a hyperbolic tangent function as the squashing function: Mills, Correia a v 1 − exp(−a v ) i i i i RNN xi = ψi (vi ) = tanh =. (4) Hopfield 2 1 + exp(−ai vi ) networks Discrete H.N. Then the inverse is 1 1−x vi = ψi−1 (xi ) = − log. (5) ai 1+x It is also possible to write as: −1 1−x ψ (xi ) = − log , (6) 1+x 1 −1 ∴ ψi−1 (xi ) = − ψ (xi ). (7) ai Hopfield network energy function Hopfield nets Mills, Correia The energy function can be defined as RNN Hopfield 1 XX X Z xj 1 X networks E =− wji xi xj + ψ −1 (x)dx − Ij xj. (8) Discrete H.N. 2 0 Rj i j j j The dynamics of the network minimise this energy function. Some useful resulting properties: ψj−1 (xj ) monotonically increases with xj ; 2 dxj dt ≥ 0 ∀xj , thus, dE dt ≤ 0. Hopfield network energy function Hopfield nets Mills, Correia RNN Since the energy function monotonically decreases with time, Hopfield networks the trajectory given by that energy function will reach fixed Discrete H.N. points its the state space/energy landscape. (n.b. the H.N. is asymptotically stable.) Key result the fixed-point attractors are minima in the energy function, and vice versa Outline Hopfield nets Mills, Correia RNN Hopfield networks 1 Recurrent Neural Networks Discrete H.N. 2 Hopfield networks 3 Discrete H.N. Discrete Hopfield network Definition Hopfield nets Mills, Correia RNN We use the additive model as per the McCulloch-Pitts network Hopfield networks The output of a neuron at asymptotic values Discrete H.N. ( +1 if vj = ∞ xj = (9) −1 if vj = −∞ and the midpoint of the activation function is ψ(0) = 0 We avoid self-connections, i.e., wj,j = 0 ∀ j We set the bias term Ij = 0 ∀ j Discrete Hopfield network Definition Hopfield nets The energy function can be defined as Mills, Correia 1 XX X Z xj 1 RNN E =− wji xi xj + ψj−1 (x)dx. (10) Hopfield 2 0 Rj i j j networks Discrete H.N. Or we can use the result from (5) to write it as 1 XX X Z xj 1 E =− wji xi xj + ψ −1 (x)dx, (11) 2 0 a R j j i j j and then consider the extreme case where aj = ∞, which then provides the simplified form 1 XX E =− wji xi xj. (12) 2 i j Discrete Hopfield network Activation functions Hopfield nets Mills, Correia RNN With the McCulloch-Pitts activation function, we have Hopfield networks ( Discrete H.N. +1 if vj > 0, xj = (13) −1 if vj < 0, (which is equivalent to the sign function). States can be updated either synchronously or asynchronously. Synch can get into a 2-cycle (assumptions violated) ⇒ asynch ofen more desirable The storage phase in Hopfield networks Hopfield nets Mills, Correia Hebb’s rule: neurons that fire together wire together. RNN wji (t + 1) = wji (t) + αxj (t)xi (t). Hopfield networks Discrete H.N. If we want to store M patterns, each of N binary dimensions, ξm , then according to a generalised Hebb’s rule, the weights should be M 1 X wji = ξmj ξmi , (14) N m where ξmi is the ith component of pattern m. The weight matrix is symmetric Fully connected network, except no self-feedback Hebb 1949 The retrieval phase in Hopfield networks Hopfield nets Mills, Correia What does the network “know”? RNN Hopfield networks Discrete H.N. The retrieval phase in Hopfield networks Hopfield nets Mills, Correia What does the network “know”? RNN Hopfield networks To find out, we assert a state on all neurons xi = ±1: Discrete H.N. xj (0) = ξtest,j , ∀ j. (15) The retrieval phase in Hopfield networks Hopfield nets Mills, Correia What does the network “know”? RNN Hopfield networks To find out, we assert a state on all neurons xi = ±1: Discrete H.N. xj (0) = ξtest,j , ∀ j. (15) Then run the state dynamics for each neuron, N ! X xj (n + 1) = sgn wji xi (n) + bj , (16) i asynchronously, until the network stabilises. → recalled a memory. Discrete Hopfield network Remarks Hopfield nets Mills, Correia RNN How many patterns can a network store? Hopfield — Error rates increase with more patterns, spurious networks attractors arise,... Discrete H.N. — approximate capacity = ∝ N. symmetry in weights is important for stability Desirable characteristics include: associative memory partial recall (/completion) noise removal Applications to comb optimisation... Self-organising Maps Mills, Correia Self-organising Maps Artificial Life / Vida Artificial & The 3 Computational methodologies / Metodologias da Computação: component mechanisms Self-organising Maps Properties example Rob Mills Luı́s Correia Informatics FCUL November 2014 Outline Self-organising Maps Mills, Correia Self-organising Maps 1 Self-