Python for Finance PDF
Thammasat University
Watthanasak Jeamwatthanachai
Summary
This document introduces financial engineering, blending Python programming with advanced finance and AI applications. It delves into the historical context of finance and the role of technology, emphasizing the increasing importance of quantitative analysis and computational methods in the field. The document highlights how Python is becoming a crucial language in finance.
Full Transcript
Financial Engineering - Building Bridges Between Finance and Technology
Dive into Financial Engineering: Blend Python proficiency, advanced finance, and AI applications for mastery in modern finance.

Watthanasak Jeamwatthanachai, BEng MSc PhD EMBA C|CISO
CTO, Forti5 Technologies (UK)
CTO, ArokaGO (Thailand)
Researcher, National Electronics and Computer Technology Center

Introduction to Financial Engineering
Navigate the Complexity of Modern Finance
In today's intricate and fast-paced financial landscape, the ability to innovate, analyze, and manage risks effectively is paramount. Financial engineering emerges as the cornerstone discipline that blends the rigors of quantitative analysis with the intricacies of financial markets to address these demands.
Financial engineering can be envisioned as the marriage between finance and mathematics, where complex financial problems are approached with analytical rigor and computational precision. It encompasses a diverse array of techniques, tools, and methodologies aimed at optimizing financial decision-making, managing risks, and designing innovative financial products.

A Brief History of Finance
To better understand the current state of finance and the financial industry, it is helpful to look at how they have developed over time. The history of finance as a scientific field can be divided roughly into three periods according to Rubinstein (2006) (periods 1 to 3 below), followed by two more recent periods.
1. The ancient period (pre-1950)
A period mainly characterized by informal reasoning, rules of thumb, and the experience of market practitioners.
2. The classical period (1950–1980)
A period characterized by the introduction of formal reasoning and mathematics to the field. Specialized models (for example, Black and Scholes's (1973) option pricing model) as well as general frameworks (for example, Harrison and Kreps's (1979) risk-neutral pricing approach) were developed during this period.
3. The modern period (1980–2000)
This period generated many advances in specific subfields of finance (for example, computational finance) and tackled, among others, important empirical phenomena in the financial markets, such as stochastic interest rates (for example, Cox, Ingersoll, and Ross (1985)) or stochastic volatility (for example, Heston (1993)).
4. The computational period (2000–2020)
This period saw a shift from a theoretical focus in finance to a computational one, driven by advances in both hardware and software used in finance. The paper by Longstaff and Schwartz (2001), which provides an efficient numerical algorithm to value American options by Monte Carlo simulation, illustrates this paradigm shift quite well. Their algorithm is computationally demanding in that hundreds of thousands of simulations and multiple ordinary least-squares regressions are required in general to value only a single option.
5. The artificial intelligence period (post-2020)
Advances in artificial intelligence (AI) and related success stories have spurred interest in using the capabilities of AI in the financial domain. While there are already successful applications of AI in finance, it can be assumed that from 2020 onward there will be a systematic paradigm shift toward AI-first finance. AI-first finance describes the shift from simple, in general linear, models in finance to the use of advanced models and algorithms from AI (such as deep neural networks or reinforcement learning) to capture, describe, and explain financial phenomena.

Major Trends in Finance
Like many fields, finance has evolved into a formalized scientific discipline, driven by the integration of formal mathematics, advanced technology, increased data availability, and improved algorithms, including AI. These advancements can be summarized by four major trends shaping the evolution of finance.

Mathematics: Since the 1950s, finance has evolved into a formalized discipline, systematically using fields like linear algebra and stochastic calculus. Markowitz's mean-variance portfolio theory (1952) marked a major breakthrough in quantitative finance, transitioning from the ancient period's informal reasoning to a more structured, mathematical approach.

Data: In earlier periods, financial data was primarily sourced from printed publications. With the advent of the modern period, electronic financial data sets became prevalent. The computational period, however, has seen a massive surge in the availability of high-frequency intraday data, which has replaced end-of-day prices as the primary basis for research. This increase in data availability, including open or free data sets, has significantly lowered barriers to entry for computational finance, algorithmic trading, and financial econometrics.

Artificial Intelligence: The abundance of financial data, or "big financial data," necessitates the use of AI algorithms from machine learning, deep learning, and reinforcement learning. Traditional statistical methods often fall short in dealing with the complexities of modern financial markets. AI-based algorithms are increasingly essential for uncovering relevant patterns, generating insights, and enhancing prediction capabilities in nonlinear, multidimensional, and ever-changing financial environments.

Technology: The widespread use of personal computers, workstations, and servers since the late 1980s has significantly impacted finance. Computers were initially limited in power, but today's advanced technology allows us to tackle complex financial problems with brute force, often making specialized models less necessary. The modern approach emphasizes scaling hardware and using contemporary software with appropriate numerical methods.
Furthermore, the powerful hardware available in everyday settings, like dorm rooms and living rooms, now supports high-performance techniques like parallel processing, greatly lowering the barriers to entry for computational and AI-driven finance.

A Four-Languages World
Against this background, financial engineering has become a world of four languages:
Natural language: Today, the English language is the only relevant language in the field when it comes to published research, books, articles, or news.
Financial language: Like every other field, finance has technical terms, notions, and expressions that describe certain phenomena or ideas that are usually not relevant in other domains.
Mathematical language: Mathematics is the tool and language of choice when it comes to formalizing the notions and concepts of finance.
Programming language: Python has become the language of choice in many corners of the financial industry.

Broader domain of finance
The field of financial engineering is characterized by its interdisciplinary nature, drawing insights from mathematics, economics, statistics, and computer science. This interdisciplinary approach enables financial engineers to tackle multifaceted challenges in finance, leveraging insights from various disciplines to develop holistic solutions.
Mathematics + Programming → Computation, Numerical Analysis
Science + Mathematics + Programming → Scientific Computing
Finance + Mathematics → Mathematical Finance
Finance + Mathematics + Programming → Computational Finance
Finance + Mathematics + Programming + Data → Quantitative Finance

Technology in Finance
With these "rough ideas" of what Python is all about, it makes sense to step back a bit and to briefly contemplate the role of technology in finance. This will put one in a position to better judge the role Python already plays and, even more importantly, will probably play in the financial industry of the future.
In a sense, technology per se is nothing special to financial institutions (as compared, for instance, to biotechnology companies) or to the finance function (as compared to other corporate functions, like logistics). However, in recent years, spurred by innovation and regulation, banks and other financial institutions like hedge funds have evolved more and more into technology companies instead of being just financial intermediaries. Technology has become a major asset for almost any financial institution around the globe, having the potential to lead to competitive advantages as well as disadvantages. Some background information can shed light on the reasons for this development.

Technology Spending
Banks and financial institutions together form the industry that spends the most on technology on an annual basis. The following statement therefore shows not only that technology is important for the financial industry, but that the financial industry is also important to the technology sector:
FRAMINGHAM, Mass., June 14, 2018 – Worldwide spending on information technology (IT) by financial services firms will be nearly $500 billion in 2021, growing from $440 billion in 2018, according to new data from a series of Financial Services IT Spending Guides from International Data Corporation (IDC).
Banks and other financial institutions are engaging in a race to make their business and operating models digital:
Bank spending on new technologies was predicted to amount to 19.9 billion U.S. dollars in 2017 in North America. The banks develop current systems and work on new technological solutions to increase their competitiveness on the global market and to attract clients interested in new online and mobile technologies. It is a big opportunity for global fintech companies which provide new ideas and software solutions for the banking industry.
Source: Statista

Technology as Enabler
The technological development has also contributed to innovations and efficiency improvements in the financial sector. Typically, projects in this area run under the umbrella of digitalization.
The financial services industry has seen drastic technology-led changes over the past few years. Many executives look to their IT departments to improve efficiency and facilitate game-changing innovation, while somehow also lowering costs and continuing to support legacy systems. Meanwhile, FinTech start-ups are encroaching upon established markets, leading with customer-friendly solutions developed from the ground up and unencumbered by legacy systems.
- PwC 19th Annual Global CEO Survey 2016
As a side effect of the increasing efficiency, competitive advantages must often be looked for in ever more complex products or transactions. This in turn inherently increases risks and makes risk management as well as oversight and regulation more and more difficult.
The financial crisis of 2007 and 2008 tells the story of potential dangers resulting from such developments. In a similar vein, "algorithms and computers gone wild" represent a potential risk to the financial markets; this materialized dramatically in the so-called flash crash of May 2010, where automated selling led to large intraday drops in certain stocks and stock indices. We will cover topics related to the algorithmic trading of financial instruments.

Technology and Talent as Barriers to Entry
On the one hand, technology advances reduce cost over time. On the other hand, financial institutions continue to invest heavily in technology to both gain market share and defend their current positions.
To be active today in certain areas in finance often brings with it the need for large-scale investments in both technology and skilled staff.
Not only is it costly and time-consuming to build a full-fledged piece of financial software, such as a derivatives analytics library, but you also need to have enough experts to do so. And these experts have to have the right tools and technologies available to accomplish their tasks. With the development of the Python ecosystem, such efforts have become more efficient, and budgets in this regard can be reduced significantly today compared to, say, 14 years ago.
Meriwether spent $20 million on a state-of-the-art computer system and hired a crack team of financial engineers to run the show at LTCM, which set up shop in Greenwich, Connecticut. It was risk management on an industrial level.
- Patterson (2010)
Now is different!
The same computing power that Meriwether had to buy for millions of dollars is today probably available for thousands or can be rented from a cloud provider based on a flexible fee plan. The budgets for such a professional infrastructure start at a few USD per month. On the other hand, trading, pricing, and risk management have become so complex for larger financial institutions that today they need to deploy IT infrastructures with tens of thousands of computing cores.

Ever-Increasing Speeds, Frequencies, and Data Volumes
The one dimension of the finance industry that has been influenced most by technological advances is the speed and frequency with which financial transactions are decided and executed. Lewis (2014) describes so-called flash trading, i.e., trading at the highest speeds possible. Today this is known as "high-frequency trading."
On the one hand, increasing data availability on ever-smaller time scales makes it necessary to react in real time. On the other hand, the increasing speed and frequency of trading makes the data volumes increase further. This leads to processes that reinforce each other and push the average time scale for financial transactions systematically down. This is a trend that had already started a decade ago:
Renaissance's Medallion fund gained an astonishing 80 percent in 2008, capitalizing on the market's extreme volatility with its lightning-fast computers. Jim Simons was the hedge fund world's top earner for the year, pocketing a cool $2.5 billion.
- Patterson (2010)
Thirty years' worth of daily stock price data for a single stock represents roughly 7,500 closing quotes (30 years at roughly 250 trading days per year). In comparison, on a typical trading day during a single trading hour the stock price of Apple Inc. (AAPL) may be quoted around 15,000 times, roughly twice the number of available end-of-day closing quotes over 30 years.
This brings with it a number of challenges:
Data processing: it does not suffice to consider and process end-of-day quotes for stocks or other financial instruments; "too much" happens during the day, and for some instruments during 24 hours for 7 days a week.
Analytics speed: decisions often have to be made in milliseconds or even faster, making it necessary to build the respective analytics capabilities and to analyze large amounts of data in real time.
Theoretical foundations: although traditional finance theories and concepts are far from being perfect, they have been well tested (and sometimes well rejected) over time; for the millisecond and microsecond scales important as of today, consistent financial concepts and theories in the traditional sense that have proven to be somewhat robust over time are still missing.

The Rise of Real-Time Analytics
There is one discipline that has seen a strong increase in importance in the finance industry: financial and data analytics. This phenomenon has a close relationship to the insight that speeds, frequencies, and data volumes increase at a rapid pace in the industry. In fact, real-time analytics can be considered the industry's answer to this trend.
Roughly speaking, "financial and data analytics" refers to the discipline of applying software and technology in combination with (possibly advanced) algorithms and methods to gather, process, and analyze data in order to gain insights, to make decisions, or to fulfill regulatory requirements, for instance.
Examples might include the estimation of sales impacts induced by a change in the pricing structure for a financial product in the retail branch of a bank, or the large-scale overnight calculation of credit valuation adjustments (CVA) for complex portfolios of derivatives trades of an investment bank.
There are two major challenges that financial institutions face in this context:
Big data: Banks and other financial institutions had to deal with massive amounts of data even before the term "big data" was coined; however, the amount of data that has to be processed during single analytics tasks has increased tremendously over time, demanding both increased computing power and ever-larger memory and storage capacities.
Real-time economy: In the past, decision makers could rely on structured, regular planning as well as decision and (risk) management processes, whereas today they face the need to take care of these functions in real time; several tasks that were taken care of in the past via overnight batch runs in the back office have now been moved to the front office and are executed in real time.
One can observe an interplay between advances in technology and financial/business practice. On the one hand, there is the need to constantly improve analytics approaches in terms of speed and capability by applying modern technologies.
On the other hand, advances on the technology side allow new analytics approaches that were considered impossible (or infeasible due to budget constraints) a couple of years or even months ago.

The Rise of Real-Time Analytics – Tech Trend
One major trend in the analytics space has been the utilization of parallel architectures on the central processing unit (CPU) side and massively parallel architectures on the general-purpose graphics processing unit (GPGPU) side. Current GPGPUs have computing cores in the thousands, which sometimes requires a radical rethinking of what parallelism might mean for different algorithms. What is still an obstacle in this regard is that users generally have to learn new programming paradigms and techniques to harness the power of such hardware.

Assessment Policy
This program is split into 2 assessments
15% Midterm Examination
15% Final Examination
40% Group Coursework (a group of 3-4 people)
  Time Series Analysis 15%
  Portfolio Optimization 10%
  Option Pricing (tentative) 15%
30% Algorithmic Trading Competition (a group of 3-4 people)
  You will do your own research, analysis, and design of a trading strategy.
  You will write a technical report.
  You will deliver a presentation about your algorithms (strategies).

Plagiarism
All submissions need to be the student's own work unless mentioned otherwise. You are allowed to use ideas and strategies reported in academic papers, case studies, and so on, as long as you acknowledge the papers in your report.
This is important as any violations, deliberate or otherwise, will be automatically reported to the Academic Integrity Officer.

Late Penalties
Work submitted up to 5 days after the deadline should be marked as usual, including moderation or second marking, and feedback prepared and given to the student. The final agreed mark is then reduced by the factors

Resource
Python For Finance - Yves Hilpisch

Getting Started with Python
One of the benefits of Python is that it is an open source language, which holds true for the absolute majority of important packages as well. This allows for easy installation of the language and required packages on all major operating systems, such as macOS, Windows, and Linux. There are only a few major packages that are required for the code of this book and finance in general in addition to a basic Python interpreter:
NumPy: This package allows the efficient handling of large, n-dimensional numerical data sets.
SciPy: This package is a collection of subpackages and functions implementing important standard functionality often needed in science or finance, for example, to solve typical optimization problems; one also finds functions for cubic spline interpolation as well as for numerical integration.
matplotlib: This package is the standard package in Python for visualization. It allows you to generate and customize different types of plots, such as line plots, bar charts, and histograms.
pandas: This package is primarily for the efficient handling of tabular data sets, such as financial time series data. Although not required for the purposes of this book, pandas has become one of the most popular Python packages in finance.
scikit-learn: scikit-learn is a popular machine learning (ML) package that provides a unified application programming interface (API) for many different ML algorithms, such as for estimation, classification, or clustering.
PyTables: PyTables is a popular wrapper for the HDF5 data storage package; it is a package to implement optimized, disk-based I/O operations based on a hierarchical database/file format.

Recommended Development Environments for Python
Jupyter Notebook/Lab
Recommendation: For better computation and a comprehensive development environment, it is highly recommended to use Jupyter Notebook or JupyterLab. These platforms offer interactive computing features that enhance productivity, especially for data analysis and visualization tasks. However, note that setting up Jupyter Notebook/Lab requires several steps in the installation process. For installation, you can look at Chapter 2 of the given book.
Google Colab
Alternative: If you prefer an online and free solution, Google Colab is an excellent choice. It allows you to write and execute Python code in your browser, and it comes pre-installed with many popular data science libraries. The main advantage is that it requires no installation and is ready to use. However, keep in mind that it may be a bit slower compared to running computations on your local machine.

Familiarize Yourself
Let's familiarize ourselves with what we will be doing over these 8 weeks.
Instructions:
1. Download the Jupyter notebooks for practice: https://drive.google.com/drive/folders/1-DT7uw0nI-zq2730JMdw0ryBjyq8Npgp?usp=share_link
2. Open each notebook one by one on Google Colab.
3. Run the commands and learn from their results.
Note that these notebooks require additional libraries; see Notebook #2, which performs pip install xxx.
That is all for the introduction. Don't forget to familiarize yourself with Python.

Introduction to Financial Engineering
Mathematical Tools in Modern Finance
Mathematicians are often regarded as the modern-day priests, as Bill Gaede suggests. Since the influx of "Rocket Scientists" on Wall Street in the 1980s and 1990s, finance has transformed into a discipline deeply rooted in applied mathematics. While early research papers in finance contained extensive textual explanations and few mathematical expressions, contemporary papers are primarily composed of mathematical equations and expressions, with some explanatory text.
This session introduces several essential mathematical tools for finance, focusing on their practical application using Python rather than providing a detailed theoretical background. The following topics are covered:

Mathematical Tools
Approximation: Regression and interpolation are among the most frequently used numerical techniques in finance.
Convex Optimization: Many financial disciplines require tools for convex optimization, such as derivatives analytics for model calibration.
Integration: The valuation of financial (derivative) assets often involves the evaluation of integrals.
Symbolic Computation: Python offers SymPy, a powerful package for symbolic mathematics, which can be used to solve equations and systems of equations.

Mathematical Tools: Approximation
Approximation methods are essential in finance for modeling, forecasting, and analyzing data. Two of the most used numerical techniques are regression and interpolation.

Regression analysis involves identifying the relationship between a dependent variable and one or more independent variables. The goal is to model this relationship to make predictions or understand underlying patterns.
1. Linear Regression: Models the relationship between two variables by fitting a linear equation to the observed data.
2. Multiple Regression: Extends linear regression by using multiple independent variables.
3. Logistic Regression: Used when the dependent variable is binary.
Applications in Finance
1. Risk Management: Predicting the likelihood of default based on financial ratios and other indicators.
2. Portfolio Management: Estimating the expected returns based on historical data and economic indicators.
3. Valuation: Estimating the value of assets or companies using financial metrics and market data.

Interpolation is the process of estimating unknown values that fall between known values. It is often used to construct new data points within the range of a discrete set of known data points.
1. Linear Interpolation: Connects two known points with a straight line.
2. Polynomial Interpolation: Uses a polynomial to fit through the known data points. Example: Lagrange polynomial interpolation.
3. Spline Interpolation: Uses piecewise polynomials to fit data points. Often used for smoother curves than polynomial interpolation.
Applications in Finance
1. Yield Curve Construction: Estimating interest rates for maturities that are not directly observable.
2. Option Pricing: Estimating volatility surfaces or other parameters that vary continuously.
3. Time Series Analysis: Filling in missing data points or smoothing data.

Mathematical Tools: Convex Optimization
Convex optimization is a fundamental tool in various financial disciplines. It involves finding the minimum of a convex function (or, equivalently, the maximum of a concave function) over a convex set. This tool is particularly important in finance for tasks such as portfolio optimization, risk management, and derivatives pricing.
Applications in Finance
1. Portfolio Optimization:
Objective: Maximize return or minimize risk for a given level of return.
Common model: Mean-Variance Optimization by Harry Markowitz.
2. Risk Management:
Objective: Minimize potential losses subject to constraints.
Example: Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR) optimization.
3. Derivatives Pricing:
Objective: Calibrate models to market data.
Example: Finding the optimal parameters for the Black-Scholes model.
Techniques and Methods
Linear Programming (LP): Used for optimization problems where both the objective function and the constraints are linear.
Quadratic Programming (QP): Involves a quadratic objective function and linear constraints.

Mathematical Tools: Convex Optimization
In finance and economics, convex optimization plays an important role. Examples are the calibration of option pricing models to market data or the optimization of an agent's utility function.
[Figure: the function plotted over the defined intervals for x and y.] Visual inspection already reveals that this function has multiple local minima. The existence of a global minimum cannot really be confirmed by this graphical representation, but it seems to exist.
The key is to find the global minimum, which is similar in spirit to the minimization problems that arise in deep learning. This can be done in two stages: a global minimization approach followed by a local one. The global minimization is a coarse search over the whole region, while the local minimization fine-tunes the result.
This, however, is unconstrained optimization. In real-world scenarios constraints are almost always present: large classes of economic or financial optimization problems are constrained by one or multiple constraints.
As a simple example, consider the utility maximization problem of an (expected utility maximizing) investor who can invest in two risky securities. Both securities cost q_a = q_b = 10 USD today. After one year, they have a payoff of 15 USD and 5 USD, respectively, in state u, and of 5 USD and 12 USD, respectively, in state d. Both states are equally likely. Denote the vector payoffs for the two securities by r_a and r_b, respectively.
The investor has a budget of w_0 = 100 USD to invest and derives utility from future wealth according to the utility function u(w) = w^{1/2}, where w is the wealth (USD amount) available.

Mathematical Tools: Integration
Integration is a crucial mathematical tool in finance, particularly for the valuation of financial (derivative) assets. It involves calculating the area under a curve, which can represent various financial quantities, such as cumulative returns, probability distributions, or pricing kernels.
Definite Integrals: A definite integral calculates the area under a curve between two points, a and b.
Applications: Calculating total returns over a period, finding probabilities in a distribution.
Indefinite Integrals: An indefinite integral represents the antiderivative of a function.
Applications: Finding the original function from its rate of change, used in differential equations in finance.
Methods of Numerical Integration
Trapezoidal Rule: Approximates the area under a curve by dividing it into trapezoids.
Application: Useful for simple, quick approximations.
Simpson's Rule: Uses parabolic arcs instead of straight lines to approximate the area.
Application: More accurate than the trapezoidal rule for many functions.
Monte Carlo Integration: Uses random sampling to approximate the integral.
Application: Particularly useful for high-dimensional integrals, such as those in financial simulations.
Applications in Finance: derivative pricing (pricing options), risk measures such as Value-at-Risk (VaR), and portfolio optimization.

Mathematical Tools: Symbolic Computation
Symbolic computation involves manipulating mathematical expressions in a symbolic form rather than as numerical values.
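As a small illustration of symbolic manipulation, the following sketch uses SymPy to derive the minimum-variance weight of a two-asset portfolio in closed form. The symbol names (w, sigma1, sigma2, rho) and the numerical inputs used for the final check are assumptions made for this example; they do not come from the slides.

```python
import sympy as sp

# Hypothetical symbols: weight of asset 1, the two volatilities, the correlation
w, sigma1, sigma2, rho = sp.symbols("w sigma1 sigma2 rho", real=True)

# Variance of a two-asset portfolio with weights w and (1 - w)
variance = (w**2 * sigma1**2
            + (1 - w)**2 * sigma2**2
            + 2 * w * (1 - w) * rho * sigma1 * sigma2)

# First-order condition: set d(variance)/dw to zero and solve for w symbolically
first_order = sp.diff(variance, w)
w_min = sp.simplify(sp.solve(first_order, w)[0])
print("Minimum-variance weight:", w_min)

# Substitute illustrative numbers (assumed values) to obtain a numerical weight
example = {sigma1: 0.2, sigma2: 0.3, rho: 0.1}
w_min_value = float(w_min.subs(example))
print("Numerical weight for the assumed inputs:", round(w_min_value, 4))
```

The result stays an exact expression in the symbols until numbers are substituted, which is the main difference from a purely numerical optimizer.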
This approach is particularly useful in finance for solving equations, performing algebraic manipulations, and deriving analytical solutions to problems.
Symbolic Mathematics
Symbols: Represent variables and constants symbolically.
Expressions: Combinations of symbols and operations.
Equations: Mathematical statements that assert the equality of two expressions.
Applications in Finance
1. Solving Equations: Finding analytical solutions to financial models. Example: Solving for the yield to maturity of a bond.
2. Deriving Formulas: Deriving pricing formulas for derivatives. Example: Black-Scholes formula for option pricing.
3. Algebraic Manipulation: Simplifying complex expressions. Example: Simplifying the expression for portfolio variance.

Stochastics
Predictability is not how things will go, but how they can go.
- Raheel Farooq
Stochastics has become a crucial mathematical and numerical discipline in finance. During the 1970s and 1980s, the primary focus of financial research was to develop closed-form solutions for problems such as option pricing within specific financial models. However, the demands of the financial markets have evolved significantly. Today, it is not only important to accurately value individual financial instruments, but also to consistently value entire portfolios of derivatives. Moreover, consistent risk measures across a financial institution, such as value-at-risk (VaR) and credit valuation adjustments (CVA), require consideration of the institution's entire book and its counterparties. These complex tasks necessitate the use of flexible and efficient numerical methods. As a result, stochastics and Monte Carlo simulation have become prominent in the financial field.
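To make the role of Monte Carlo simulation concrete, here is a minimal sketch that values a European call option by simulating terminal prices under geometric Brownian motion and compares the estimate with the Black-Scholes-Merton closed-form value. All parameter values (S0, K, T, r, sigma), the number of paths, the seed, and the function names are assumptions chosen for illustration.

```python
import math
import numpy as np

def norm_cdf(x):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bsm_call_value(S0, K, T, r, sigma):
    """Closed-form Black-Scholes-Merton value of a European call."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def mc_call_value(S0, K, T, r, sigma, n_paths=200_000, seed=42):
    """Monte Carlo estimate: discounted average payoff over simulated terminal prices."""
    rng = np.random.default_rng(seed)          # pseudo-random number generator
    z = rng.standard_normal(n_paths)           # standard normal draws
    # Terminal price under risk-neutral geometric Brownian motion
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
    payoff = np.maximum(ST - K, 0.0)           # call payoff at maturity
    return math.exp(-r * T) * float(payoff.mean())

# Assumed example inputs, not taken from the slides
params = dict(S0=100.0, K=105.0, T=1.0, r=0.05, sigma=0.2)
exact_value = bsm_call_value(**params)
mc_value = mc_call_value(**params)
print(f"closed form: {exact_value:.4f}, Monte Carlo: {mc_value:.4f}")
```

For this simple payoff the closed form is available, so the simulation only serves as a check; the same simulation pattern carries over to payoffs and portfolios for which no closed form exists.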
Key Topics in Stochastics from a Python Perspective: Random Numbers, Simulation, Valuation, and Risk Measures

Stochastics from a Python Perspective

Random Numbers
The foundation of all simulation efforts lies in pseudo-random numbers. Despite the growing popularity of quasi-random numbers (e.g., based on Sobol sequences) in finance, pseudo-random numbers remain the benchmark.

Simulation
In finance, two primary simulation tasks are of particular importance:
Simulation of Random Variables: This involves generating random variables to model various financial phenomena.
Simulation of Stochastic Processes: This involves simulating paths of stochastic processes to model the evolution of financial variables over time.

Valuation
Valuation in finance primarily involves two main disciplines: the valuation of derivatives with European exercise and those with American exercise. European options can only be exercised at a specific date, while American options can be exercised at any time within a specific interval. Additionally, there are instruments with Bermudan exercise, which allows for exercise at a finite set of specific dates.

Risk Measures
Simulation techniques are particularly effective for calculating various risk measures, including value-at-risk (VaR), credit value-at-risk (CVaR), and credit valuation adjustments (CVA). These simulations help in assessing potential losses and managing financial risks comprehensively.

Understanding and implementing these concepts in Python allows for the development of robust financial models and simulations, essential for modern financial analysis and risk management.

Random Numbers: Distributions

No single random number generator fits every problem. Consider which type of distribution suits your data and environment.

Pick the right Distribution

Beta Distribution
Models random variables limited to the interval [0, 1], often used to represent probabilities.
Use Case: Modeling probabilities. Example: Probability of success in an A/B test.

Binomial Distribution
Represents the number of successes in a fixed number of independent trials with a constant probability of success.
Use Case: Counting successes in trials. Example: Number of defective items in a batch.

Chi-Square Distribution
Used in hypothesis testing, especially for tests of independence and goodness-of-fit.
Use Case: Hypothesis testing. Example: Testing independence in a contingency table.

Dirichlet Distribution
A multivariate generalization of the beta distribution used as a prior in Bayesian inference.
Use Case: Bayesian inference. Example: Modeling proportions of different categories.

Exponential Distribution
Models the time between events in a Poisson process.
Use Case: Time between events. Example: Time between arrivals at a service point.

F Distribution
Used primarily in ANOVA for comparing variances.
Use Case: Analysis of variance. Example: Comparing variances of two populations.

Gamma Distribution
Models waiting times for multiple events.
Use Case: Modeling waiting times. Example: Time until a certain number of events occur.

Geometric Distribution
Models the number of trials needed for the first success.
Use Case: Number of trials to first success. Example: Inspections to find the first defective product.

Gumbel Distribution
Used in extreme value theory to model the distribution of the maximum (or minimum) of a sample.
Use Case: Extreme value theory. Example: Maximum daily rainfall in a year.

Hypergeometric Distribution
Models the number of successes in draws without replacement from a finite population.
Use Case: Sampling without replacement. Example: Probability of drawing a certain number of successes in a lottery.

Laplace Distribution
The Laplace distribution, also known as the double exponential distribution, is used to model data with heavier tails than the normal distribution.
Use Case: Financial modeling, signal processing. Example: Modeling the returns of an asset with heavy-tailed distributions.

Logistic Distribution
The logistic distribution is similar to the normal distribution but has heavier tails.
Use Case: Logistic regression, growth models. Example: Modeling the growth of a population.

Lognormal Distribution
The lognormal distribution is used to model a variable whose logarithm is normally distributed.
Use Case: Stock prices, biological data. Example: Modeling the distribution of stock prices, which cannot be negative.

Logseries Distribution
The logseries distribution is a discrete probability distribution.
Use Case: Species abundance, linguistics. Example: Modeling the number of species in ecological studies.

Multinomial Distribution
The multinomial distribution generalizes the binomial distribution to more than two outcomes.
Use Case: Categorical data analysis. Example: Modeling the outcome of rolling a die multiple times.

Multivariate Normal Distribution
The multivariate normal distribution generalizes the normal distribution to multiple dimensions.
Use Case: Multivariate data analysis. Example: Modeling the joint distribution of multiple financial returns.

Negative Binomial Distribution
The negative binomial distribution models the number of failures before a specified number of successes occurs.
Use Case: Overdispersed count data. Example: Modeling the number of insurance claims.

Noncentral Chi-Square Distribution
The noncentral chi-square distribution is used in hypothesis testing for non-centrality parameters.
Use Case: Power analysis in statistics. Example: Power calculations in clinical trials.

Noncentral F Distribution
The noncentral F distribution is used in the analysis of variance with noncentrality parameters.
Use Case: Hypothesis testing. Example: Comparing variances of two populations with a known noncentral parameter.

Normal Distribution
The normal distribution, or Gaussian distribution, is the most commonly used continuous distribution in statistics.
Use Case: Many natural phenomena. Example: Modeling heights of people.

Pareto Distribution
The Pareto distribution is a power-law probability distribution used in various fields.
Use Case: Economics, insurance. Example: Modeling the distribution of wealth.

Poisson Distribution
The Poisson distribution models the number of events occurring within a fixed interval of time or space.
Use Case: Count data. Example: Modeling the number of emails received per hour.

Power Distribution
The power distribution is used to model random variables with a power-law distribution.
Use Case: Statistical physics. Example: Modeling the distribution of city sizes.

Rayleigh Distribution
The Rayleigh distribution is used to model the magnitude of a vector with normally distributed components.
Use Case: Signal processing. Example: Modeling the distribution of wind speeds.

Standard Cauchy Distribution
The standard Cauchy distribution has heavier tails than the normal distribution and undefined mean and variance.
Use Case: Robust statistics. Example: Modeling data with outliers.

Standard T Distribution
The t distribution is used in hypothesis testing when the sample size is small and the population standard deviation is unknown.
Use Case: Small sample statistics. Example: Calculating confidence intervals for the mean.

Standard Exponential Distribution
The standard exponential distribution models the time between events in a Poisson process with unit rate.
Use Case: Reliability analysis. Example: Modeling the time between failures of a machine.

Triangular Distribution
The triangular distribution is a continuous probability distribution with lower, upper, and mode parameters.
Use Case: Risk analysis. Example: Modeling uncertain quantities with a known range and most likely value.
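As a rough illustration of why the choice of distribution matters, the sketch below draws samples from a few of the distributions catalogued so far and prints simple summary statistics. It uses only the standard library `random` module so that it runs without extra dependencies; in practice a numerical library would be the more usual choice. All parameter values are arbitrary assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(7)  # seeded generator for reproducible draws
n = 50_000

# Each entry maps a label to n pseudo-random draws from that distribution.
samples = {
    "beta(2, 5)": [rng.betavariate(2, 5) for _ in range(n)],
    "exponential(rate=2)": [rng.expovariate(2.0) for _ in range(n)],
    "lognormal(mu=0, sigma=0.25)": [rng.lognormvariate(0.0, 0.25) for _ in range(n)],
    "normal(mean=0.05, sd=0.2)": [rng.gauss(0.05, 0.2) for _ in range(n)],
    "triangular(0, 10, mode=3)": [rng.triangular(0, 10, 3) for _ in range(n)],
}

for name, draws in samples.items():
    print(f"{name:30s} mean={statistics.fmean(draws):8.4f} "
          f"min={min(draws):8.4f} max={max(draws):8.4f}")
```

The printed minimum values show the practical difference: the normal draws can be negative, while the lognormal draws cannot, which is why the lognormal is the natural candidate for prices.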
Standard Gamma Distribution
The gamma distribution is used to model waiting times and is parameterized by shape and scale; the standard form fixes the scale at 1.
Use Case: Queuing theory. Example: Modeling the time until the next bus arrives.

Uniform Distribution
The uniform distribution models a random variable that has equal probability in any interval of the same length within its range.
Use Case: Simulations. Example: Modeling a perfectly random choice between several options.

Standard Normal Distribution
The standard normal distribution is a special case of the normal distribution with a mean of 0 and standard deviation of 1.
Use Case: Standardized test scores. Example: Modeling the z-scores of test results.

Von Mises Distribution
The von Mises distribution is a continuous probability distribution on the circle.
Use Case: Circular statistics. Example: Modeling the direction of wind.

Wald Distribution
The Wald distribution, or inverse Gaussian distribution, is used to model positively skewed data.
Use Case: Survival analysis. Example: Modeling the time until a stock reaches a certain price.

Weibull Distribution
The Weibull distribution is used in reliability analysis and survival studies.
Use Case: Life data analysis. Example: Modeling the life of a mechanical component.

Zipf Distribution
The Zipf distribution is used to model the frequency of occurrences of items.
Use Case: Linguistics, web traffic. Example: Modeling word frequency in natural languages.

These distributions cover a wide range of applications and are essential tools in various fields of study. Each distribution has its unique properties and use cases, making them suitable for specific types of data analysis and modeling.

Simulation

Monte Carlo simulation (MCS) is one of the most important and widely used numerical techniques in finance. Its popularity arises because it is an extremely flexible method for evaluating mathematical expressions, such as integrals, and is particularly useful for valuing financial derivatives like options and futures.

Key Benefits
Flexibility: MCS can be applied to a wide range of problems and can accommodate complex financial models.
Accuracy: By increasing the number of simulations, we can improve the accuracy of the estimates; the standard error of the estimate shrinks in proportion to one over the square root of the number of simulations.
Versatility: It can be used for various types of financial instruments and risk management purposes.

What is Monte Carlo Simulation?
Monte Carlo simulation is a method that uses random sampling to obtain numerical results. It involves generating a large number of random samples to explore the possible outcomes of a mathematical model or system. By analyzing these outcomes, we can make predictions and estimates about the behavior of the system.

Why is it Important in Finance?
In finance, Monte Carlo simulation is essential because it can handle complex models and a variety of financial instruments. Traditional analytical methods might not be able to easily or accurately value these instruments due to their complexity or the need to consider a wide range of possible future scenarios. Monte Carlo simulation, on the other hand, can simulate a multitude of possible future outcomes and provide a robust estimate.

Computational Burden
The flexibility of Monte Carlo simulation comes at a cost. It requires a significant amount of computational power because it involves performing hundreds of thousands or even millions of calculations to generate a reliable estimate. Each simulation involves a complex computation, and performing these numerous times can be time-consuming and resource-intensive.

Example in Finance
Imagine you want to estimate the future price of a stock option. The future price depends on many unpredictable factors like changes in stock prices, interest rates, and volatility. By using Monte Carlo simulation, you can create a model that simulates the stock price's path over time based on random inputs for these factors. Running the simulation thousands of times will give you a distribution of possible future prices, from which you can estimate the option's value.

Financial Data Science

Statistics is just a tool

I can prove anything by statistics except the truth.
- George Canning

Statistics is an extensive field, and its tools and results have become indispensable for finance. This explains the popularity of domain-specific languages like R in the finance industry. As statistical models become more elaborate and complex, the need for easy-to-use and high-performing computational solutions grows.

Given the richness and depth of statistics, it is essential to focus on selected topics that are important or provide a good starting point for using Python for specific tasks. This discussion has four focal points: Normality Tests, Portfolio Optimization, Bayesian Statistics, and Machine Learning.

Why Statistics

Statistics might seem like a complex field at first, but it plays a significant role in our daily lives, often in ways we don't even realize. Let's start by understanding how we use basic statistical thinking in everyday decisions.

Example: Weather Patterns and Carrying an Umbrella
Imagine you're deciding whether to carry an umbrella when you leave the house. You might check the weather forecast, which is based on statistical analysis of past weather patterns.
Past Weather Data: Meteorologists use historical weather data to predict future conditions. For example, if it rained on 8 out of the last 10 days when the forecast showed a 70% chance of rain, you might expect similar conditions today.
Decision Making: Based on this information, you decide to carry an umbrella if there is a high probability of rain. This is a simple use of statistics where past data informs your future decision.

Other Everyday Examples of Statistics: Shopping and Discounts
Scenario: Stores often offer discounts at certain times of the year. If you notice that items are usually discounted by 20% around the holidays, you might decide to delay your purchase until then.
Statistical Thinking: You are using past data (previous discounts) to predict future events (upcoming sales) to make a decision that benefits you.

Key Points
Pattern Recognition: Statistics helps us recognize patterns in data. By analyzing historical data, we can identify trends and make predictions.
Probability: Statistics involves understanding probabilities. In the umbrella example, a 70% chance of rain means there is a high probability that it will rain, influencing your decision to carry an umbrella.
Decision Making: By using statistical thinking, we can make informed decisions that are more likely to lead to favorable outcomes.

In finance, statistics is used to analyze historical market data, assess risks, and make investment decisions. Just like carrying an umbrella based on weather forecasts, investors make decisions based on the analysis of financial data.

Importance in Finance
Decision Making: Statistics helps financial analysts and investors make informed decisions based on data. For example, by analyzing historical price data of a stock, one can estimate future price movements.
Risk Management: It allows for the assessment and management of financial risks. For instance, statistical models can estimate the likelihood of a financial loss under different scenarios.
Model Validation: In finance, various models are used to predict market behavior and asset prices. Statistics helps validate these models to ensure they are accurate and reliable.
Complex Models: As financial markets and products become more complex, the models used to understand and predict them also become more sophisticated. These models often require advanced statistical techniques and high-performing computational solutions to handle large amounts of data and complex calculations. For example, modern trading algorithms may analyze thousands of data points per second to make buy or sell decisions. Without robust statistical tools and efficient computation, managing this volume of data would be impossible.

4 Focal Points in Statistics

Normality Tests
Many important financial models, such as modern portfolio theory (MPT) and the capital asset pricing model (CAPM), rely on the assumption that securities' returns are normally distributed. In today's section, we will explore various approaches to test a given time series for the normality of returns.
Significance: Many financial models, like Modern Portfolio Theory (MPT) and the Capital Asset Pricing Model (CAPM), assume that returns are normally distributed.
Application: This section covers methods to test if a time series of returns follows a normal distribution.

Portfolio Theory
Modern Portfolio Theory (MPT) represents one of the greatest achievements of statistics in finance. Originating in the early 1950s with the pioneering work of Harry Markowitz, this theory began to replace reliance on judgment and experience with rigorous mathematical and statistical methods for investing in financial markets. It is often considered the first quantitative model and approach in finance.
Modern Portfolio Theory (MPT): Developed by Harry Markowitz in the 1950s, MPT uses mean and variance of returns to determine optimal portfolio allocation.
Objective: Minimize risk (variance) for a given level of expected return or maximize return for a given level of risk.

Bayesian Statistics
Bayesian statistics introduces the concept of agent beliefs and the updating of these beliefs to the field of statistics.
In linear regression, for example, this might involve having a statistical distribution for regression parameters instead of single point estimates, such as for the intercept and slope of the regression line. Bayesian methods are widely used in finance today, and this section illustrates Bayesian techniques with relevant examples.
Concept: Introduces the idea of updating beliefs with new information.
Applications: Bayesian methods are used in finance for more flexible and dynamic modeling, including Bayesian regression.

Machine Learning
Machine learning, also known as statistical learning, is a subdiscipline of artificial intelligence (AI) based on advanced statistical methods. It offers a rich set of approaches and models to learn from datasets and make predictions. Different algorithms are used for various types of learning, such as supervised and unsupervised learning, and they address different problems, such as estimation or classification. The examples in this section focus on supervised learning for classification.
Overview: Machine learning, a subset of AI, employs advanced statistical methods for data analysis and prediction.
Types: Covers supervised learning (e.g., classification) and unsupervised learning (e.g., clustering).
Financial Application: Examples include feature transformation and model validation using Python libraries.

4 Focal Points in Statistics

Normality Tests

Importance of Normality in Finance:
Many financial models assume that the returns (profits or losses) from investments follow a normal distribution.
This assumption simplifies the analysis and helps in making predictions about future returns.
Models like Modern Portfolio Theory (MPT) and the Capital Asset Pricing Model (CAPM) rely heavily on this assumption.

What is a Normal Distribution? - Characteristics:
A normal distribution is a bell-shaped curve where most of the data points are clustered around the mean (average), with fewer points as you move away.
It is symmetric, meaning the left and right sides of the curve are mirror images.
Defined by two parameters: the mean (µ) and the standard deviation (σ), which determine the center and the spread of the distribution, respectively.

Why Test for Normality? - Model Accuracy:
If the returns of an asset do not follow a normal distribution, using models that assume normality can lead to incorrect conclusions and poor investment decisions.
Testing for normality helps in validating the assumptions of the model, ensuring that the results are reliable.

Methods to Test for Normality:

Visual Inspection:
Histogram: Plotting the returns on a histogram to see if the shape approximates a bell curve.
Q-Q Plot (Quantile-Quantile Plot): Plots the quantiles of the sample data against the quantiles of a normal distribution. If the points lie approximately on a straight line, the data is consistent with a normal distribution.

Statistical Tests:
Shapiro-Wilk Test: Tests the null hypothesis that the data was drawn from a normal distribution. If the p-value is low (for example, below 0.05), the null hypothesis is rejected, indicating the data is unlikely to be normally distributed.
Kolmogorov-Smirnov Test: Compares the sample distribution with the normal distribution. A significant result indicates a deviation from normality.
Anderson-Darling Test: Gives more weight to the tails than the Kolmogorov-Smirnov Test, making it more sensitive to deviations from normality in the tails.

Normality Tests In Finance

In finance, the normal distribution is a key concept and one of the main statistical foundations. Many important financial theories are built on the idea that the returns (profits or losses) of a financial instrument (like stocks) follow a normal distribution.

Note: Another important assumption is linearity. Financial models often assume a linear relationship between the quantity demanded (for example, of shares of a stock) and the total price paid. In other words, markets are assumed to be perfectly liquid, so changes in demand do not affect the unit price of a financial instrument.

Here are some key theories that rely on the normality assumption:

Portfolio Theory
When stock returns follow a normal distribution, creating an optimal investment portfolio becomes easier. In this context, the most important factors are the expected average return, the volatility (how much returns vary), and how different stocks' returns move together. This helps in deciding the best mix of investments.

Capital Asset Pricing Model (CAPM)
This model explains how individual stock prices are related to a broad market index, assuming returns are normally distributed. It uses a measure called beta (β) to describe how much a stock's return moves in relation to the market.

Efficient Markets Hypothesis
This theory states that stock prices reflect all available information, meaning prices change randomly and returns are normally distributed. An efficient market means it's hard to predict stock prices because they already include all known information.

Option Pricing Theory
Brownian motion is used to model random price movements of financial instruments. The famous Black-Scholes-Merton option pricing formula relies on this concept, leading to stock prices that follow a log-normal distribution and log returns that follow a normal distribution.

Portfolio Theory

Portfolio Theory, also known as Modern Portfolio Theory (MPT), was developed by Harry Markowitz in the early 1950s. It is a mathematical framework for constructing a portfolio of assets in such a way that the expected return is maximized for a given level of risk, or equivalently, the risk is minimized for a given level of expected return.

Key Concepts

1. Expected Return: The anticipated return on an investment, calculated as the weighted average of the possible returns, with the weights being the probabilities of occurrence.
2. Risk (Volatility): Often measured by the standard deviation of returns, risk represents the uncertainty or variability of returns. In MPT, risk is associated with the likelihood of different outcomes deviating from the expected return.
3. Diversification: The process of spreading investments across various assets to reduce risk. The idea is that the performance of different assets is not perfectly correlated, so when some assets perform poorly, others may perform well, thus reducing the overall risk of the portfolio.
4. Covariance and Correlation: Covariance measures how two assets move together. Positive covariance means they tend to move in the same direction, while negative covariance means they move in opposite directions. Correlation is a standardized measure of covariance that ranges between -1 and 1.
5. Efficient Frontier: A graphical representation of optimal portfolios that offer the highest expected return for a defined level of risk. Portfolios that lie on the efficient frontier are considered well-balanced in terms of risk and return.
6. Risk-Free Rate: The return on an investment with zero risk, typically represented by government bonds. It serves as a benchmark for evaluating other investments.
7. Sharpe Ratio: A measure of risk-adjusted return, calculated as the difference between the portfolio return and the risk-free rate, divided by the portfolio's standard deviation. A higher Sharpe ratio indicates better risk-adjusted performance.

We'll cover this later in Portfolio Optimization.

Bayesian Statistics

Bayesian statistics is a branch of statistics that uses Bayes' theorem to update the probability of a hypothesis as more evidence or information becomes available.
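A minimal sketch of such an update: the posterior probability of each hypothesis is proportional to its prior probability times the likelihood of the newly observed data. The two market regimes, their assumed probabilities of an up day, and the observed sequence below are hypothetical values chosen only for illustration.

```python
def bayes_update(prior, likelihood):
    """Return the posterior P(H | data) for each hypothesis H.

    prior:      dict mapping hypothesis -> P(H)
    likelihood: dict mapping hypothesis -> P(data | H)
    """
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(data), summed over all hypotheses
    return {h: value / evidence for h, value in unnormalized.items()}


# Hypothetical regimes and their assumed probability of an "up" day.
p_up = {"bull": 0.6, "bear": 0.4}
belief = {"bull": 0.5, "bear": 0.5}  # prior belief before seeing any data

observed_days = ["up", "up", "down", "up", "up"]
for day in observed_days:
    # Likelihood of today's observation under each regime.
    likelihood = {h: (p if day == "up" else 1.0 - p) for h, p in p_up.items()}
    belief = bayes_update(belief, likelihood)  # yesterday's posterior is today's prior
    print(day, {h: round(b, 3) for h, b in belief.items()})
```

Each observed up day shifts belief toward the bull regime and each down day shifts it back, which is the step-by-step updating of beliefs that Bayesian statistics formalizes.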
Unlike traditional frequentist statistics, which treats probabilities as fixed long-run frequencies, Bayesian statistics allows for a more dynamic approach to probability, incorporating prior knowledge and new data.

Bayes' Formula

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

Where:
P(A∣B) is the posterior probability, the probability of hypothesis A given the data B.
P(B∣A) is the likelihood, the probability of the data B given that hypothesis A is true.
P(A) is the prior probability, the initial belief about the probability of A before seeing the data.
P(B) is the marginal likelihood, the total probability of the data under all possible hypotheses.

In finance, the most common interpretation of Bayes' formula is the diachronic interpretation. This means that over time, as new information is obtained, one updates their beliefs about certain variables or parameters of interest, such as the mean return of a time series.

Applications in Finance

Bayesian methods are particularly useful in finance for several reasons:

1. Updating Beliefs: Investors can update their beliefs about expected returns, volatility, and other financial metrics as new data becomes available. This continuous updating process helps in making more informed decisions.
2. Risk Management: Bayesian statistics can improve risk assessment by incorporating prior knowledge about market conditions and updating risk estimates as new information is received.
3. Portfolio Optimization: Bayesian methods allow for more flexible and adaptive portfolio optimization. Instead of relying solely on historical data, investors can incorporate their prior beliefs and adjust their portfolios as new data emerges.
4. Asset Pricing: Bayesian models can be used to estimate asset prices by updating the probabilities of different economic scenarios as new information becomes available. This approach can lead to more accurate pricing models.
5. Predictive Modeling: Bayesian statistics is valuable for predictive modeling in finance. For example, Bayesian inference can be used to predict future stock prices, interest rates, or economic indicators by continuously updating the model with new data.

Practical Example: Bayesian Linear Regression

In Bayesian linear regression, instead of estimating a single set of regression coefficients, we estimate a distribution over the possible values of these coefficients. This provides a measure of uncertainty around the estimates, which is valuable for risk management and decision-making.

For instance, suppose we want to predict the future return of a stock based on past returns. Using Bayesian linear regression, we can incorporate prior beliefs about the stock's performance and update these beliefs as new return data is observed. This results in a posterior distribution for the regression coefficients, reflecting our updated beliefs.

Machine Learning

In today's world, machine learning (ML) is a key tool used in finance and many other fields. It involves using algorithms to learn patterns, relationships, and insights from raw data. Here's an easier-to-understand overview of machine learning in finance.

What is Machine Learning?
Machine learning consists of different types of algorithms that can learn and identify patterns from raw data without being explicitly programmed. This means that instead of giving the computer step-by-step instructions, we give it data and let it figure out the relationships on its own.

Why is Machine Learning Important?
Machine learning is crucial because it allows for more accurate predictions and smarter decision-making in finance. For example, it can help predict stock prices, detect fraudulent transactions, or automate trading strategies.

Machine learning algorithms can be broadly categorized into two types: supervised learning and unsupervised learning.

Supervised Learning:
In supervised learning, the algorithm is trained on a labeled dataset, which means that each data point has an input and an output label. The goal is to learn a function that maps inputs to outputs.
Example: Predicting stock prices based on historical data. The algorithm learns from past prices (input) and predicts future prices (output).

Unsupervised Learning:
In unsupervised learning, the algorithm works with data that has no labels. The goal is to find hidden patterns or groupings within the data.
Example: Grouping similar customers together based on their spending behavior. The algorithm identifies clusters of customers with similar traits.

Practical Applications in Finance
Fraud Detection: Identifying unusual patterns that may indicate fraudulent activity.
Credit Scoring: Predicting the likelihood of a borrower defaulting on a loan.
Algorithmic Trading: Developing automated trading strategies based on historical data and patterns.

Unsupervised Learning Algorithms discover insights from raw data without any further guidance. Two common unsupervised learning algorithms are:
K-Means Clustering: This algorithm groups data into a chosen number of clusters. Each data point is assigned to a cluster based on its features.
Example: Segmenting customers into different groups based on their purchasing behavior.
Gaussian Mixture: This algorithm assumes that the data is composed of several Gaussian distributions (bell curves) and identifies these distributions.
Example: Finding patterns in financial returns that follow different distributions.

Let's Do It

Instructions:
1. Download the Jupyter notebook for practice. https://drive.google.com/drive/folders/13eZ6OqZGb18CcEwKtYP5F57Yq7Hd8a2R?usp=share_link
2. Open each notebook one by one on Google Colab.
3. Run the commands and learn from their results.

That is all for today

Questions?