Math Deep: Algebra, Topology, Differential Calculus, and Optimization for Computer Science
University of Pennsylvania
2024
Jean Gallier and Jocelyn Quaintance
Summary
This textbook details the core concepts of algebra, topology, differential calculus, and optimization theory, focusing on their application in computer science and machine learning. Written by Jean Gallier and Jocelyn Quaintance, it is a valuable resource for undergraduate students.
Algebra, Topology, Differential Calculus, and Optimization Theory
For Computer Science and Machine Learning

Jean Gallier and Jocelyn Quaintance
Department of Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104, USA
e-mail: [email protected]
© Jean Gallier
February 9, 2024

Contents

1 Introduction
2 Groups, Rings, and Fields
  2.1 Groups, Subgroups, Cosets
  2.2 Cyclic Groups
  2.3 Rings and Fields

I Linear Algebra

3 Vector Spaces, Bases, Linear Maps
  3.1 Motivations: Linear Combinations, Linear Independence, Rank
  3.2 Vector Spaces
  3.3 Indexed Families; the Sum Notation ∑_{i∈I} a_i
  3.4 Linear Independence, Subspaces
  3.5 Bases of a Vector Space
  3.6 Matrices
  3.7 Linear Maps
  3.8 Quotient Spaces
  3.9 Linear Forms and the Dual Space
  3.10 Summary
  3.11 Problems
4 Matrices and Linear Maps
  4.1 Representation of Linear Maps by Matrices
  4.2 Composition of Linear Maps and Matrix Multiplication
  4.3 Change of Basis Matrix
  4.4 The Effect of a Change of Bases on Matrices
  4.5 Summary
  4.6 Problems
5 Haar Bases, Haar Wavelets, Hadamard Matrices
  5.1 Introduction to Signal Compression Using Haar Wavelets
  5.2 Haar Matrices, Scaling Properties of Haar Wavelets
  5.3 Kronecker Product Construction of Haar Matrices
  5.4 Multiresolution Signal Analysis with Haar Bases
  5.5 Haar Transform for Digital Images
  5.6 Hadamard Matrices
  5.7 Summary
  5.8 Problems
6 Direct Sums
  6.1 Sums, Direct Sums, Direct Products
  6.2 Matrices of Linear Maps and Multiplication by Blocks
  6.3 The Rank-Nullity Theorem; Grassmann's Relation
  6.4 Summary
  6.5 Problems
7 Determinants
  7.1 Permutations, Signature of a Permutation
  7.2 Alternating Multilinear Maps
  7.3 Definition of a Determinant
  7.4 Inverse Matrices and Determinants
  7.5 Systems of Linear Equations and Determinants
  7.6 Determinant of a Linear Map
  7.7 The Cayley–Hamilton Theorem
  7.8 Permanents
  7.9 Summary
  7.10 Further Readings
  7.11 Problems
8 Gaussian Elimination, LU, Cholesky, Echelon Form
  8.1 Motivating Example: Curve Interpolation
  8.2 Gaussian Elimination
  8.3 Elementary Matrices and Row Operations
  8.4 LU-Factorization
  8.5 PA = LU Factorization
  8.6 Proof of Theorem 8.5 ~
  8.7 Dealing with Roundoff Errors; Pivoting Strategies
  8.8 Gaussian Elimination of Tridiagonal Matrices
  8.9 SPD Matrices and the Cholesky Decomposition
  8.10 Reduced Row Echelon Form
  8.11 RREF, Free Variables, Homogeneous Systems
  8.12 Uniqueness of RREF
  8.13 Solving Linear Systems Using RREF
  8.14 Elementary Matrices and Column Operations
  8.15 Transvections and Dilatations ~
  8.16 Summary
  8.17 Problems
9 Vector Norms and Matrix Norms
  9.1 Normed Vector Spaces
  9.2 Matrix Norms
  9.3 Subordinate Norms
  9.4 Inequalities Involving Subordinate Norms
  9.5 Condition Numbers of Matrices
  9.6 An Application of Norms: Inconsistent Linear Systems
  9.7 Limits of Sequences and Series
  9.8 The Matrix Exponential
  9.9 Summary
  9.10 Problems
10 Iterative Methods for Solving Linear Systems
  10.1 Convergence of Sequences of Vectors and Matrices
  10.2 Convergence of Iterative Methods
  10.3 Methods of Jacobi, Gauss–Seidel, and Relaxation
  10.4 Convergence of the Methods
  10.5 Convergence Methods for Tridiagonal Matrices
  10.6 Summary
  10.7 Problems
11 The Dual Space and Duality
  11.1 The Dual Space E∗ and Linear Forms
  11.2 Pairing and Duality Between E and E∗
  11.3 The Duality Theorem and Some Consequences
  11.4 The Bidual and Canonical Pairings
  11.5 Hyperplanes and Linear Forms
  11.6 Transpose of a Linear Map and of a Matrix
  11.7 Properties of the Double Transpose
  11.8 The Four Fundamental Subspaces
  11.9 Summary
  11.10 Problems
12 Euclidean Spaces
  12.1 Inner Products, Euclidean Spaces
  12.2 Orthogonality and Duality in Euclidean Spaces
  12.3 Adjoint of a Linear Map
  12.4 Existence and Construction of Orthonormal Bases
  12.5 Linear Isometries (Orthogonal Transformations)
  12.6 The Orthogonal Group, Orthogonal Matrices
  12.7 The Rodrigues Formula
  12.8 QR-Decomposition for Invertible Matrices
  12.9 Some Applications of Euclidean Geometry
  12.10 Summary
  12.11 Problems
13 QR-Decomposition for Arbitrary Matrices
  13.1 Orthogonal Reflections
  13.2 QR-Decomposition Using Householder Matrices
  13.3 Summary
  13.4 Problems
14 Hermitian Spaces
  14.1 Hermitian Spaces, Pre-Hilbert Spaces
  14.2 Orthogonality, Duality, Adjoint of a Linear Map
  14.3 Linear Isometries (Also Called Unitary Transformations)
  14.4 The Unitary Group, Unitary Matrices
  14.5 Hermitian Reflections and QR-Decomposition
  14.6 Orthogonal Projections and Involutions
  14.7 Dual Norms
  14.8 Summary
  14.9 Problems
15 Eigenvectors and Eigenvalues
  15.1 Eigenvectors and Eigenvalues of a Linear Map
  15.2 Reduction to Upper Triangular Form
  15.3 Location of Eigenvalues
  15.4 Conditioning of Eigenvalue Problems
  15.5 Eigenvalues of the Matrix Exponential
  15.6 Summary
  15.7 Problems
16 Unit Quaternions and Rotations in SO(3)
  16.1 The Group SU(2) and the Skew Field H of Quaternions
  16.2 Representation of Rotations in SO(3) by Quaternions in SU(2)
  16.3 Matrix Representation of the Rotation r_q
  16.4 An Algorithm to Find a Quaternion Representing a Rotation
  16.5 The Exponential Map exp : su(2) → SU(2)
  16.6 Quaternion Interpolation ~
  16.7 Nonexistence of a "Nice" Section from SO(3) to SU(2)
  16.8 Summary
  16.9 Problems
17 Spectral Theorems
  17.1 Introduction
  17.2 Normal Linear Maps: Eigenvalues and Eigenvectors
  17.3 Spectral Theorem for Normal Linear Maps
  17.4 Self-Adjoint and Other Special Linear Maps
  17.5 Normal and Other Special Matrices
  17.6 Rayleigh–Ritz Theorems and Eigenvalue Interlacing
  17.7 The Courant–Fischer Theorem; Perturbation Results
  17.8 Summary
  17.9 Problems
18 Computing Eigenvalues and Eigenvectors
  18.1 The Basic QR Algorithm
  18.2 Hessenberg Matrices
  18.3 Making the QR Method More Efficient Using Shifts
  18.4 Krylov Subspaces; Arnoldi Iteration
  18.5 GMRES
  18.6 The Hermitian Case; Lanczos Iteration
  18.7 Power Methods
  18.8 Summary
  18.9 Problems
19 Introduction to The Finite Elements Method
  19.1 A One-Dimensional Problem: Bending of a Beam
  19.2 A Two-Dimensional Problem: An Elastic Membrane
  19.3 Time-Dependent Boundary Problems
20 Graphs and Graph Laplacians; Basic Facts
  20.1 Directed Graphs, Undirected Graphs, Weighted Graphs
  20.2 Laplacian Matrices of Graphs
  20.3 Normalized Laplacian Matrices of Graphs
  20.4 Graph Clustering Using Normalized Cuts
  20.5 Summary
  20.6 Problems
21 Spectral Graph Drawing
  21.1 Graph Drawing and Energy Minimization
  21.2 Examples of Graph Drawings
  21.3 Summary
22 Singular Value Decomposition and Polar Form
  22.1 Properties of f∗ ◦ f
  22.2 Singular Value Decomposition for Square Matrices
  22.3 Polar Form for Square Matrices
  22.4 Singular Value Decomposition for Rectangular Matrices
  22.5 Ky Fan Norms and Schatten Norms
  22.6 Summary
  22.7 Problems
23 Applications of SVD and Pseudo-Inverses
  23.1 Least Squares Problems and the Pseudo-Inverse
  23.2 Properties of the Pseudo-Inverse
  23.3 Data Compression and SVD
  23.4 Principal Components Analysis (PCA)
  23.5 Best Affine Approximation
  23.6 Summary
  23.7 Problems

II Affine and Projective Geometry

24 Basics of Affine Geometry
  24.1 Affine Spaces
  24.2 Examples of Affine Spaces
  24.3 Chasles's Identity
  24.4 Affine Combinations, Barycenters
  24.5 Affine Subspaces
  24.6 Affine Independence and Affine Frames
  24.7 Affine Maps
  24.8 Affine Groups
  24.9 Affine Geometry: A Glimpse
  24.10 Affine Hyperplanes
  24.11 Intersection of Affine Spaces
25 Embedding an Affine Space in a Vector Space
  25.1 The "Hat Construction," or Homogenizing
  25.2 Affine Frames of E and Bases of Ê
  25.3 Another Construction of Ê
  25.4 Extending Affine Maps to Linear Maps
26 Basics of Projective Geometry
  26.1 Why Projective Spaces?
  26.2 Projective Spaces
  26.3 Projective Subspaces
  26.4 Projective Frames
  26.5 Projective Maps
  26.6 Finding a Homography Between Two Projective Frames
  26.7 Affine Patches
  26.8 Projective Completion of an Affine Space
  26.9 Making Good Use of Hyperplanes at Infinity
  26.10 The Cross-Ratio
  26.11 Fixed Points of Homographies and Homologies
  26.12 Duality in Projective Geometry
  26.13 Cross-Ratios of Hyperplanes
  26.14 Complexification of a Real Projective Space
  26.15 Similarity Structures on a Projective Space
  26.16 Some Applications of Projective Geometry

III The Geometry of Bilinear Forms

27 The Cartan–Dieudonné Theorem
  27.1 The Cartan–Dieudonné Theorem for Linear Isometries
  27.2 Affine Isometries (Rigid Motions)
  27.3 Fixed Points of Affine Maps
  27.4 Affine Isometries and Fixed Points
  27.5 The Cartan–Dieudonné Theorem for Affine Isometries
28 Isometries of Hermitian Spaces
  28.1 The Cartan–Dieudonné Theorem, Hermitian Case
  28.2 Affine Isometries (Rigid Motions)
29 The Geometry of Bilinear Forms; Witt's Theorem
  29.1 Bilinear Forms
  29.2 Sesquilinear Forms
  29.3 Orthogonality
  29.4 Adjoint of a Linear Map
  29.5 Isometries Associated with Sesquilinear Forms
  29.6 Totally Isotropic Subspaces
  29.7 Witt Decomposition
  29.8 Symplectic Groups
  29.9 Orthogonal Groups and the Cartan–Dieudonné Theorem
  29.10 Witt's Theorem

IV Algebra: PID's, UFD's, Noetherian Rings, Tensors, Modules over a PID, Normal Forms

30 Polynomials, Ideals and PID's
  30.1 Multisets
  30.2 Polynomials
  30.3 Euclidean Division of Polynomials
  30.4 Ideals, PID's, and Greatest Common Divisors
  30.5 Factorization and Irreducible Factors in K[X]
  30.6 Roots of Polynomials
  30.7 Polynomial Interpolation (Lagrange, Newton, Hermite)
31 Annihilating Polynomials; Primary Decomposition
  31.1 Annihilating Polynomials and the Minimal Polynomial
  31.2 Minimal Polynomials of Diagonalizable Linear Maps
  31.3 Commuting Families of Linear Maps
  31.4 The Primary Decomposition Theorem
  31.5 Jordan Decomposition
  31.6 Nilpotent Linear Maps and Jordan Form
  31.7 Summary
  31.8 Problems
32 UFD's, Noetherian Rings, Hilbert's Basis Theorem
  32.1 Unique Factorization Domains (Factorial Rings)
  32.2 The Chinese Remainder Theorem
  32.3 Noetherian Rings and Hilbert's Basis Theorem
  32.4 Further Readings
33 Tensor Algebras
  33.1 Linear Algebra Preliminaries: Dual Spaces and Pairings
  33.2 Tensor Products
  33.3 Bases of Tensor Products
  33.4 Some Useful Isomorphisms for Tensor Products
  33.5 Duality for Tensor Products
  33.6 Tensor Algebras
  33.7 Symmetric Tensor Powers
  33.8 Bases of Symmetric Powers
  33.9 Some Useful Isomorphisms for Symmetric Powers
  33.10 Duality for Symmetric Powers
  33.11 Symmetric Algebras
  33.12 Problems
34 Exterior Tensor Powers and Exterior Algebras
  34.1 Exterior Tensor Powers
  34.2 Bases of Exterior Powers
  34.3 Some Useful Isomorphisms for Exterior Powers
  34.4 Duality for Exterior Powers
  34.5 Exterior Algebras
  34.6 The Hodge ∗-Operator
  34.7 Left and Right Hooks ~
  34.8 Testing Decomposability ~
  34.9 The Grassmann-Plücker's Equations and Grassmannians ~
  34.10 Vector-Valued Alternating Forms
  34.11 Problems
35 Introduction to Modules; Modules over a PID
  35.1 Modules over a Commutative Ring
  35.2 Finite Presentations of Modules
  35.3 Tensor Products of Modules over a Commutative Ring
  35.4 Torsion Modules over a PID; Primary Decomposition
  35.5 Finitely Generated Modules over a PID
  35.6 Extension of the Ring of Scalars
36 Normal Forms; The Rational Canonical Form
  36.1 The Torsion Module Associated With An Endomorphism
  36.2 The Rational Canonical Form
  36.3 The Rational Canonical Form, Second Version
  36.4 The Jordan Form Revisited
  36.5 The Smith Normal Form

V Topology, Differential Calculus

37 Topology
  37.1 Metric Spaces and Normed Vector Spaces
  37.2 Topological Spaces
  37.3 Continuous Functions, Limits
  37.4 Connected Sets
  37.5 Compact Sets and Locally Compact Spaces
  37.6 Second-Countable and Separable Spaces
  37.7 Sequential Compactness
  37.8 Complete Metric Spaces and Compactness
  37.9 Completion of a Metric Space
  37.10 The Contraction Mapping Theorem
  37.11 Continuous Linear and Multilinear Maps
  37.12 Completion of a Normed Vector Space
  37.13 Normed Affine Spaces
  37.14 Further Readings
38 A Detour On Fractals
  38.1 Iterated Function Systems and Fractals
39 Differential Calculus
  39.1 Directional Derivatives, Total Derivatives
  39.2 Properties of Derivatives
  39.3 Jacobian Matrices
  39.4 The Implicit and The Inverse Function Theorems
  39.5 Tangent Spaces and Differentials
  39.6 Second-Order and Higher-Order Derivatives
  39.7 Taylor's Formula, Faà di Bruno's Formula
  39.8 Vector Fields, Covariant Derivatives, Lie Brackets
  39.9 Further Readings
  39.10 Summary
  39.11 Problems

VI Preliminaries for Optimization Theory

40 Extrema of Real-Valued Functions
  40.1 Local Extrema and Lagrange Multipliers
  40.2 Using Second Derivatives to Find Extrema
  40.3 Using Convexity to Find Extrema
  40.4 Summary
  40.5 Problems
41 Newton's Method and Its Generalizations
  41.1 Newton's Method for Real Functions of a Real Argument
  41.2 Generalizations of Newton's Method
  41.3 Summary
  41.4 Problems
42 Quadratic Optimization Problems
  42.1 Quadratic Optimization: The Positive Definite Case
  42.2 Quadratic Optimization: The General Case
  42.3 Maximizing a Quadratic Function on the Unit Sphere
  42.4 Summary
  42.5 Problems
43 Schur Complements and Applications
  43.1 Schur Complements
  43.2 SPD Matrices and Schur Complements
  43.3 SP Semidefinite Matrices and Schur Complements
  43.4 Summary
  43.5 Problems

VII Linear Optimization

44 Convex Sets, Cones, H-Polyhedra
  44.1 What is Linear Programming?
  44.2 Affine Subsets, Convex Sets, Hyperplanes, Half-Spaces
  44.3 Cones, Polyhedral Cones, and H-Polyhedra
  44.4 Summary
  44.5 Problems
45 Linear Programs
  45.1 Linear Programs, Feasible Solutions, Optimal Solutions
  45.2 Basic Feasible Solutions and Vertices
  45.3 Summary
  45.4 Problems
46 The Simplex Algorithm
  46.1 The Idea Behind the Simplex Algorithm
  46.2 The Simplex Algorithm in General
  46.3 How to Perform a Pivoting Step Efficiently
  46.4 The Simplex Algorithm Using Tableaux
  46.5 Computational Efficiency of the Simplex Method
  46.6 Summary
  46.7 Problems
47 Linear Programming and Duality
  47.1 Variants of the Farkas Lemma
  47.2 The Duality Theorem in Linear Programming
  47.3 Complementary Slackness Conditions
  47.4 Duality for Linear Programs in Standard Form
  47.5 The Dual Simplex Algorithm
  47.6 The Primal-Dual Algorithm
  47.7 Summary
  47.8 Problems

VIII Nonlinear Optimization

48 Basics of Hilbert Spaces
  48.1 The Projection Lemma
  48.2 Duality and the Riesz Representation Theorem
  48.3 Farkas–Minkowski Lemma in Hilbert Spaces
  48.4 Summary
  48.5 Problems
49 General Results of Optimization Theory
  49.1 Optimization Problems; Basic Terminology
  49.2 Existence of Solutions of an Optimization Problem
  49.3 Minima of Quadratic Functionals
  49.4 Elliptic Functionals
  49.5 Iterative Methods for Unconstrained Problems
  49.6 Gradient Descent Methods for Unconstrained Problems
  49.7 Convergence of Gradient Descent with Variable Stepsize
  49.8 Steepest Descent for an Arbitrary Norm
  49.9 Newton's Method For Finding a Minimum
  49.10 Conjugate Gradient Methods; Unconstrained Problems
  49.11 Gradient Projection for Constrained Optimization
  49.12 Penalty Methods for Constrained Optimization
  49.13 Summary
  49.14 Problems
50 Introduction to Nonlinear Optimization
  50.1 The Cone of Feasible Directions
  50.2 Active Constraints and Qualified Constraints
  50.3 The Karush–Kuhn–Tucker Conditions
  50.4 Equality Constrained Minimization
  50.5 Hard Margin Support Vector Machine; Version I
  50.6 Hard Margin Support Vector Machine; Version II
  50.7 Lagrangian Duality and Saddle Points
  50.8 Weak and Strong Duality
  50.9 Handling Equality Constraints Explicitly
  50.10 Dual of the Hard Margin Support Vector Machine
  50.11 Conjugate Function and Legendre Dual Function
  50.12 Some Techniques to Obtain a More Useful Dual Program
  50.13 Uzawa's Method
  50.14 Summary
  50.15 Problems
51 Subgradients and Subdifferentials ~
  51.1 Extended Real-Valued Convex Functions
  51.2 Subgradients and Subdifferentials
  51.3 Basic Properties of Subgradients and Subdifferentials
  51.4 Additional Properties of Subdifferentials
  51.5 The Minimum of a Proper Convex Function
  51.6 Generalization of the Lagrangian Framework
  51.7 Summary
  51.8 Problems
52 Dual Ascent Methods; ADMM
  52.1 Dual Ascent
  52.2 Augmented Lagrangians and the Method of Multipliers
  52.3 ADMM: Alternating Direction Method of Multipliers
  52.4 Convergence of ADMM ~
  52.5 Stopping Criteria
  52.6 Some Applications of ADMM
  52.7 Solving Hard Margin (SVMh2) Using ADMM
  52.8 Applications of ADMM to ℓ1-Norm Problems
  52.9 Summary
  52.10 Problems

IX Applications to Machine Learning

53 Positive Definite Kernels
  53.1 Feature Maps and Kernel Functions
  53.2 Basic Properties of Positive Definite Kernels
  53.3 Hilbert Space Representation of a Positive Kernel
  53.4 Kernel PCA
  53.5 Summary
  53.6 Problems
54 Soft Margin Support Vector Machines
  54.1 Soft Margin Support Vector Machines; (SVMs1)
  54.2 Solving SVM (SVMs1) Using ADMM
  54.3 Soft Margin Support Vector Machines; (SVMs2)
  54.4 Solving SVM (SVMs2) Using ADMM
  54.5 Soft Margin Support Vector Machines; (SVMs2′)
  54.6 Classification of the Data Points in Terms of ν (SVMs2′)
  54.7 Existence of Support Vectors for (SVMs2′)
  54.8 Solving SVM (SVMs2′) Using ADMM
  54.9 Soft Margin Support Vector Machines; (SVMs3)
  54.10 Classification of the Data Points in Terms of ν (SVMs3)
  54.11 Existence of Support Vectors for (SVMs3)
  54.12 Solving SVM (SVMs3) Using ADMM
  54.13 Soft Margin SVM; (SVMs4)
  54.14 Solving SVM (SVMs4) Using ADMM
  54.15 Soft Margin SVM; (SVMs5)
  54.16 Solving SVM (SVMs5) Using ADMM
  54.17 Summary and Comparison of the SVM Methods
  54.18 Problems
55 Ridge Regression, Lasso, Elastic Net
  55.1 Ridge Regression
  55.2 Ridge Regression; Learning an Affine Function
  55.3 Kernel Ridge Regression
  55.4 Lasso Regression (ℓ1-Regularized Regression)
  55.5 Lasso Regression; Learning an Affine Function
  55.6 Elastic Net Regression
  55.7 Summary
  55.8 Problems
56 ν-SV Regression
  56.1 ν-SV Regression; Derivation of the Dual
  56.2 Existence of Support Vectors
  56.3 Solving ν-Regression Using ADMM
  56.4 Kernel ν-SV Regression
  56.5 ν-Regression Version 2; Penalizing b
  56.6 Summary
  56.7 Problems

X Appendices

A Total Orthogonal Families in Hilbert Spaces
  A.1 Total Orthogonal Families, Fourier Coefficients
  A.2 The Hilbert Space ℓ²(K) and the Riesz–Fischer Theorem
  A.3 Summary
  A.4 Problems
B Matlab Programs
  B.1 Hard Margin (SVMh2)
2129
B.2 Soft Margin SVM (SVMs2′ )........................... 2133
B.3 Soft Margin SVM (SVMs3 )........................... 2141
B.4 ν-SV Regression................................. 2146

C Zorn’s Lemma; Some Applications 2153
C.1 Statement of Zorn’s Lemma........................... 2153
C.2 Proof of the Existence of a Basis in a Vector Space.............. 2154
C.3 Existence of Maximal Proper Ideals...................... 2155

Bibliography 2157

Index 2169

Chapter 1

Introduction

Chapter 2

Groups, Rings, and Fields

In the following four chapters, the basic algebraic structures (groups, rings, fields, vector spaces) are reviewed, with a major emphasis on vector spaces. Basic notions of linear algebra such as vector spaces, subspaces, linear combinations, linear independence, bases, quotient spaces, linear maps, matrices, change of bases, direct sums, linear forms, dual spaces, hyperplanes, and the transpose of a linear map are reviewed.

2.1 Groups, Subgroups, Cosets

The set R of real numbers has two operations + : R × R → R (addition) and ∗ : R × R → R (multiplication) satisfying properties that make R into an abelian group under +, and R − {0} = R∗ into an abelian group under ∗. Recall the definition of a group.

Definition 2.1. A group is a set G equipped with a binary operation · : G × G → G that associates an element a · b ∈ G to every pair of elements a, b ∈ G, and having the following properties: · is associative, has an identity element e ∈ G, and every element in G is invertible (w.r.t. ·). More explicitly, this means that the following equations hold for all a, b, c ∈ G:

(G1) a · (b · c) = (a · b) · c (associativity);
(G2) a · e = e · a = a (identity);
(G3) For every a ∈ G, there is some a⁻¹ ∈ G such that a · a⁻¹ = a⁻¹ · a = e (inverse).

A group G is abelian (or commutative) if a · b = b · a for all a, b ∈ G.
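Definition 2.1 lends itself to a brute-force check on small finite sets. The following Python sketch (ours, not from the text; `is_group` and its closure test are illustrative helpers) verifies (G1)–(G3) for Z/5Z under addition modulo 5, and shows that capped addition on {0, ..., 4} fails to be a group:

```python
from itertools import product

def is_group(elems, op):
    """Brute-force check of the group axioms (G1)-(G3) of Definition 2.1."""
    elems = list(elems)
    # closure: op must map G x G into G
    if any(op(a, b) not in elems for a, b in product(elems, repeat=2)):
        return False
    # (G1) associativity
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elems, repeat=3)):
        return False
    # (G2) identity; by Proposition 2.1 it is unique when it exists
    ids = [e for e in elems
           if all(op(e, a) == a and op(a, e) == a for a in elems)]
    if not ids:
        return False
    e = ids[0]
    # (G3) every element has an inverse
    return all(any(op(a, b) == e and op(b, a) == e for b in elems)
               for a in elems)

print(is_group(range(5), lambda a, b: (a + b) % 5))    # True: Z/5Z under +
print(is_group(range(5), lambda a, b: min(a + b, 4)))  # False: 1 has no inverse
```

The second operation is associative and has identity 0, so {0, ..., 4} with capped addition is a monoid but not a group.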
A set M together with an operation · : M × M → M and an element e satisfying only Conditions (G1) and (G2) is called a monoid. For example, the set N = {0, 1, ..., n, ...} of natural numbers is a (commutative) monoid under addition. However, it is not a group. Some examples of groups are given below.

Example 2.1.
1. The set Z = {..., −n, ..., −1, 0, 1, ..., n, ...} of integers is an abelian group under addition, with identity element 0. However, Z∗ = Z − {0} is not a group under multiplication.
2. The set Q of rational numbers (fractions p/q with p, q ∈ Z and q ≠ 0) is an abelian group under addition, with identity element 0. The set Q∗ = Q − {0} is also an abelian group under multiplication, with identity element 1.
3. Given any nonempty set S, the set of bijections f : S → S, also called permutations of S, is a group under function composition (i.e., the multiplication of f and g is the composition g ◦ f), with identity element the identity function idS. This group is not abelian as soon as S has more than two elements. The permutation group of the set S = {1, ..., n} is often denoted Sn and called the symmetric group on n elements.
4. For any positive integer p ∈ N, define a relation on Z, denoted m ≡ n (mod p), as follows: m ≡ n (mod p) iff m − n = kp for some k ∈ Z. The reader will easily check that this is an equivalence relation, and moreover, that it is compatible with respect to addition and multiplication, which means that if m1 ≡ n1 (mod p) and m2 ≡ n2 (mod p), then m1 + m2 ≡ n1 + n2 (mod p) and m1 m2 ≡ n1 n2 (mod p). Consequently, we can define an addition operation and a multiplication operation on the set of equivalence classes (mod p):
[m] + [n] = [m + n] and [m] · [n] = [mn].
The reader will easily check that addition of residue classes (mod p) induces an abelian group structure with [0] as zero. This group is denoted Z/pZ.
5.
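The compatibility of ≡ (mod p) with addition and multiplication is exactly what makes the operations on residue classes well defined. A small randomized check (our sketch, not part of the text) with p = 7:

```python
import random

# Sketch: the operations [m] + [n] = [m + n] and [m] . [n] = [mn] on residue
# classes (mod p) do not depend on which representatives of [m], [n] we pick.
p = 7
random.seed(0)
for _ in range(1000):
    m = random.randrange(-100, 100)
    n = random.randrange(-100, 100)
    m2 = m + p * random.randrange(-5, 6)   # another representative of [m]
    n2 = n + p * random.randrange(-5, 6)   # another representative of [n]
    assert (m + n) % p == (m2 + n2) % p    # [m] + [n] = [m + n] is well defined
    assert (m * n) % p == (m2 * n2) % p    # [m] . [n] = [mn] is well defined
print("operations on Z/7Z are well defined")
```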
The set of n × n invertible matrices with real (or complex) coefficients is a group under matrix multiplication, with identity element the identity matrix In. This group is called the general linear group and is usually denoted by GL(n, R) (or GL(n, C)).
6. The set of n × n invertible matrices A with real (or complex) coefficients such that det(A) = 1 is a group under matrix multiplication, with identity element the identity matrix In. This group is called the special linear group and is usually denoted by SL(n, R) (or SL(n, C)).
7. The set of n × n matrices Q with real coefficients such that
QQᵀ = QᵀQ = In
is a group under matrix multiplication, with identity element the identity matrix In; we have Q⁻¹ = Qᵀ. This group is called the orthogonal group and is usually denoted by O(n).
8. The set of n × n invertible matrices Q with real coefficients such that
QQᵀ = QᵀQ = In and det(Q) = 1
is a group under matrix multiplication, with identity element the identity matrix In; as in (7), we have Q⁻¹ = Qᵀ. This group is called the special orthogonal group or rotation group and is usually denoted by SO(n).

The groups in (5)–(8) are nonabelian for n ≥ 2, except for SO(2), which is abelian (but O(2) is not abelian).

It is customary to denote the operation of an abelian group G by +, in which case the inverse a⁻¹ of an element a ∈ G is denoted by −a.

The identity element of a group is unique. In fact, we can prove a more general fact:

Proposition 2.1. For any binary operation · : M × M → M, if e′ ∈ M is a left identity and if e″ ∈ M is a right identity, which means that
e′ · a = a for all a ∈ M (G2l)
and
a · e″ = a for all a ∈ M, (G2r)
then e′ = e″.

Proof. If we let a = e″ in Equation (G2l), we get
e′ · e″ = e″,
and if we let a = e′ in Equation (G2r), we get
e′ · e″ = e′,
and thus
e′ = e′ · e″ = e″,
as claimed.
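The defining property of O(n), QQᵀ = QᵀQ = In, is easy to confirm numerically. The sketch below (ours; plain tuples stand in for matrices, and the 1e-12 tolerance is an assumption forced by floating point) checks it for a 2 × 2 rotation, which also has determinant 1 and so lies in SO(2):

```python
import math

def rot(theta):
    """2x2 rotation matrix, stored as a tuple of rows."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def transpose(A):
    return tuple(tuple(A[j][i] for j in range(2)) for i in range(2))

Q = rot(0.7)
QQt = matmul(Q, transpose(Q))
# Q Q^T = I, so Q^{-1} = Q^T and Q lies in O(2)
assert all(abs(QQt[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(2) for j in range(2))
# det(Q) = 1, so in fact Q lies in SO(2)
det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
assert abs(det - 1.0) < 1e-12
print("Q^{-1} = Q^T and det(Q) = 1")
```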
Proposition 2.1 implies that the identity element of a monoid is unique, and since every group is a monoid, the identity element of a group is unique. Furthermore, every element in a group has a unique inverse. This is a consequence of a slightly more general fact:

Proposition 2.2. In a monoid M with identity element e, if some element a ∈ M has some left inverse a′ ∈ M and some right inverse a″ ∈ M, which means that
a′ · a = e (G3l)
and
a · a″ = e, (G3r)
then a′ = a″.

Proof. Using (G3l) and the fact that e is an identity element, we have
(a′ · a) · a″ = e · a″ = a″.
Similarly, using (G3r) and the fact that e is an identity element, we have
a′ · (a · a″) = a′ · e = a′.
However, since M is a monoid, the operation · is associative, so
a′ = a′ · (a · a″) = (a′ · a) · a″ = a″,
as claimed.

Remark: Axioms (G2) and (G3) can be weakened a bit by requiring only (G2r) (the existence of a right identity) and (G3r) (the existence of a right inverse for every element) (or (G2l) and (G3l)). It is a good exercise to prove that the group axioms (G2) and (G3) follow from (G2r) and (G3r).

Another important property of inverse elements in monoids is stated below.

Proposition 2.3. In a monoid M with identity element e, if a and b are invertible elements of M, where a⁻¹ is the inverse of a and b⁻¹ is the inverse of b, then ab is invertible and its inverse is given by
(ab)⁻¹ = b⁻¹a⁻¹.

Proof. Using associativity and the fact that e is the identity element, we have
(ab)(b⁻¹a⁻¹) = a(b(b⁻¹a⁻¹)) associativity
= a((bb⁻¹)a⁻¹) associativity
= a(ea⁻¹) b⁻¹ is the inverse of b
= aa⁻¹ e is the identity element
= e. a⁻¹ is the inverse of a
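Proposition 2.3 can be observed concretely in the symmetric group S4 of Example 2.1(3). In the sketch below (ours, not from the text; permutations are tuples with p[i] the image of i), note the order reversal in the inverse of a composition:

```python
# Verify (ab)^{-1} = b^{-1} a^{-1} (Proposition 2.3) in S_4.
def compose(p, q):
    """(p o q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

a = (1, 2, 0, 3)   # the 3-cycle 0 -> 1 -> 2 -> 0
b = (0, 1, 3, 2)   # the transposition swapping 2 and 3
assert inverse(compose(a, b)) == compose(inverse(b), inverse(a))
# the order swap matters: a^{-1} b^{-1} is generally NOT the inverse of ab
assert inverse(compose(a, b)) != compose(inverse(a), inverse(b))
print("(ab)^{-1} = b^{-1} a^{-1} holds in S_4")
```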
Observe that the inverse of ba is a−1 b−1. Proposition 2.3 implies that the set of invertible elements of a monoid M is a group, also with identity element e. Definition 2.2. If a group G has a finite number n of elements, we say that G is a group of order n. If G is infinite, we say that G has infinite order. The order of a group is usually denoted by |G| (if G is finite). Given a group G, for any two subsets R, S ⊆ G, we let RS = {r · s | r ∈ R, s ∈ S}. In particular, for any g ∈ G, if R = {g}, we write gS = {g · s | s ∈ S}, and similarly, if S = {g}, we write Rg = {r · g | r ∈ R}. From now on, we will drop the multiplication sign and write g1 g2 for g1 · g2. Definition 2.3. Let G be a group. For any g ∈ G, define Lg , the left translation by g, by Lg (a) = ga, for all a ∈ G, and Rg , the right translation by g, by Rg (a) = ag, for all a ∈ G. The following simple fact is often used. Proposition 2.4. Given a group G, the translations Lg and Rg are bijections. Proof. We show this for Lg , the proof for Rg being similar. If Lg (a) = Lg (b), then ga = gb, and multiplying on the left by g −1 , we get a = b, so Lg injective. For any b ∈ G, we have Lg (g −1 b) = gg −1 b = b, so Lg is surjective. Therefore, Lg is bijective. Definition 2.4. Given a group G, a subset H of G is a subgroup of G iff (1) The identity element e of G also belongs to H (e ∈ H); 26 CHAPTER 2. GROUPS, RINGS, AND FIELDS (2) For all h1 , h2 ∈ H, we have h1 h2 ∈ H; (3) For all h ∈ H, we have h−1 ∈ H. The proof of the following proposition is left as an exercise. Proposition 2.5. Given a group G, a subset H ⊆ G is a subgroup of G iff H is nonempty and whenever h1 , h2 ∈ H, then h1 h−1 2 ∈ H. If the group G is finite, then the following criterion can be used. Proposition 2.6. Given a finite group G, a subset H ⊆ G is a subgroup of G iff (1) e ∈ H; (2) H is closed under multiplication. Proof. We just have to prove that Condition (3) of Definition 2.4 holds. 
For any a ∈ H, since the left translation La is bijective, its restriction to H is injective, and since H is finite, it is also bijective. Since e ∈ H, there is a unique b ∈ H such that La(b) = ab = e. However, if a⁻¹ is the inverse of a in G, we also have La(a⁻¹) = aa⁻¹ = e, and by injectivity of La, we have a⁻¹ = b ∈ H.

Example 2.2.
1. For any integer n ∈ Z, the set nZ = {nk | k ∈ Z} is a subgroup of the group Z.
2. The set of matrices
GL⁺(n, R) = {A ∈ GL(n, R) | det(A) > 0}
is a subgroup of the group GL(n, R).
3. The group SL(n, R) is a subgroup of the group GL(n, R).
4. The group O(n) is a subgroup of the group GL(n, R).
5. The group SO(n) is a subgroup of the group O(n), and a subgroup of the group SL(n, R).
6. It is not hard to show that every 2 × 2 rotation matrix R ∈ SO(2) can be written as
R = [cos θ  −sin θ]
    [sin θ   cos θ],  with 0 ≤ θ < 2π.
Then SO(2) can be considered as a subgroup of SO(3) by viewing the matrix R as the matrix
Q = [cos θ  −sin θ  0]
    [sin θ   cos θ  0]
    [  0       0    1].
7. The set of 2 × 2 upper-triangular matrices of the form
[a  b]
[0  c],  a, b, c ∈ R, a, c ≠ 0,
is a subgroup of the group GL(2, R).
8. The set V consisting of the four matrices
[±1   0]
[ 0  ±1]
is a subgroup of the group GL(2, R) called the Klein four-group.

Definition 2.5. If H is a subgroup of G and g ∈ G is any element, the sets of the form gH are called left cosets of H in G, and the sets of the form Hg are called right cosets of H in G. The left cosets (resp. right cosets) of H induce an equivalence relation ∼ defined as follows: for all g1, g2 ∈ G,
g1 ∼ g2 iff g1 H = g2 H
(resp. g1 ∼ g2 iff Hg1 = Hg2). Obviously, ∼ is an equivalence relation.

Now, we claim the following fact:

Proposition 2.7. Given a group G and any subgroup H of G, we have g1 H = g2 H iff g2⁻¹g1 H = H iff g2⁻¹g1 ∈ H, for all g1, g2 ∈ G.

Proof.
If we apply the bijection Lg2−1 to both g1 H and g2 H we get Lg2−1 (g1 H) = g2−1 g1 H and Lg2−1 (g2 H) = H, so g1 H = g2 H iff g2−1 g1 H = H. If g2−1 g1 H = H, since 1 ∈ H, we get g2−1 g1 ∈ H. Conversely, if g2−1 g1 ∈ H, since H is a group, the left translation Lg2−1 g1 is a bijection of H, so g2−1 g1 H = H. Thus, g2−1 g1 H = H iff g2−1 g1 ∈ H. 28 CHAPTER 2. GROUPS, RINGS, AND FIELDS It follows that the equivalence class of an element g ∈ G is the coset gH (resp. Hg). Since Lg is a bijection between H and gH, the cosets gH all have the same cardinality. The map Lg−1 ◦ Rg is a bijection between the left coset gH and the right coset Hg, so they also have the same cardinality. Since the distinct cosets gH form a partition of G, we obtain the following fact: Proposition 2.8. (Lagrange) For any finite group G and any subgroup H of G, the order h of H divides the order n of G. Definition 2.6. Given a finite group G and a subgroup H of G, if n = |G| and h = |H|, then the ratio n/h is denoted by (G : H) and is called the index of H in G. The index (G : H) is the number of left (and right) cosets of H in G. Proposition 2.8 can be stated as |G| = (G : H)|H|. The set of left cosets of H in G (which, in general, is not a group) is denoted G/H. The “points” of G/H are obtained by “collapsing” all the elements in a coset into a single element. Example 2.3. 1. Let n be any positive integer, and consider the subgroup nZ of Z (under addition). The coset of 0 is the set {0}, and the coset of any nonzero integer m ∈ Z is m + nZ = {m + nk | k ∈ Z}. By dividing m by n, we have m = nq + r for some unique r such that 0 ≤ r ≤ n − 1. But then we see that r is the smallest positive element of the coset m + nZ. This implies that there is a bijection betwen the cosets of the subgroup nZ of Z and the set of residues {0, 1,... , n − 1} modulo n, or equivalently a bijection with Z/nZ. 2. The cosets of SL(n, R) in GL(n, R) are the sets of matrices A SL(n, R) = {AB | B ∈ SL(n, R)}, A ∈ GL(n, R). 
Since A is invertible, det(A) 6= 0, and we can write A = (det(A))1/n ((det(A))−1/n A) if det(A) > 0 and A = (− det(A))1/n ((− det(A))−1/n A) if det(A) < 0. But we have (det(A))−1/n A ∈ SL(n, R) if det(A) > 0 and −(− det(A))−1/n A ∈ SL(n, R) if det(A) < 0, so the coset A SL(n, R) contains the matrix (det(A))1/n In if det(A) > 0, −(− det(A))1/n In if det(A) < 0. It follows that there is a bijection between the cosets of SL(n, R) in GL(n, R) and R. 2.1. GROUPS, SUBGROUPS, COSETS 29 3. The cosets of SO(n) in GL+ (n, R) are the sets of matrices A SO(n) = {AQ | Q ∈ SO(n)}, A ∈ GL+ (n, R). It can be shown (using the polar form for matrices) that there is a bijection between the cosets of SO(n) in GL+ (n, R) and the set of n × n symmetric, positive, definite matrices; these are the symmetric matrices whose eigenvalues are strictly positive. 4. The cosets of SO(2) in SO(3) are the sets of matrices Q SO(2) = {QR | R ∈ SO(2)}, Q ∈ SO(3). The group SO(3) moves the points on the sphere S 2 in R3 , namely for any x ∈ S 2 , x 7→ Qx for any rotation Q ∈ SO(3). Here, S 2 = {(x, y, z) ∈ R3 | x2 + y 2 + z 2 = 1}. Let N = (0, 0, 1) be the north pole on the sphere S 2. Then it is not hard to show that SO(2) is precisely the subgroup of SO(3) that leaves N fixed. As a consequence, all rotations QR in the coset Q SO(2) map N to the same point QN ∈ S 2 , and it can be shown that there is a bijection between the cosets of SO(2) in SO(3) and the points on S 2. The surjectivity of this map has to do with the fact that the action of SO(3) on S 2 is transitive, which means that for any point x ∈ S 2 , there is some rotation Q ∈ SO(3) such that QN = x. It is tempting to define a multiplication operation on left cosets (or right cosets) by setting (g1 H)(g2 H) = (g1 g2 )H, but this operation is not well defined in general, unless the subgroup H possesses a special property. In Example 2.3, it is possible to define multiplication of cosets in (1), but it is not possible in (2) and (3). 
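The failure of coset multiplication to be well defined can be seen in the smallest nonabelian example. The sketch below (ours, not from the text) takes the non-normal subgroup H = {id, (0 1)} of S3 and exhibits two representatives of the same left coset whose products with a fixed g2 land in different cosets:

```python
# In S_3, with H generated by the transposition swapping 0 and 1, the rule
# (g1 H)(g2 H) = (g1 g2) H depends on the chosen representatives.
def compose(p, q):
    return tuple(p[q[i]] for i in range(3))

H = [(0, 1, 2), (1, 0, 2)]          # identity and the transposition (0 1)

def left_coset(g):
    return frozenset(compose(g, h) for h in H)

g1 = (0, 2, 1)                      # the transposition (1 2)
g2 = (2, 1, 0)                      # the transposition (0 2)
# pick a different representative of the same left coset g1 H
g1b = next(iter(left_coset(g1) - {g1}))
assert left_coset(g1) == left_coset(g1b)
# yet the two products land in DIFFERENT cosets:
print(left_coset(compose(g1, g2)) == left_coset(compose(g1b, g2)))  # False
```

For a normal subgroup (Definition 2.10 below) this cannot happen, which is what makes the quotient group construction work.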
The property of the subgroup H that allows defining a multiplication operation on left cosets is typical of the kernels of group homomorphisms, so we are led to the following definition. Definition 2.7. Given any two groups G and G0 , a function ϕ : G → G0 is a homomorphism iff ϕ(g1 g2 ) = ϕ(g1 )ϕ(g2 ), for all g1 , g2 ∈ G. Taking g1 = g2 = e (in G), we see that ϕ(e) = e0 , and taking g1 = g and g2 = g −1 , we see that ϕ(g −1 ) = (ϕ(g))−1. 30 CHAPTER 2. GROUPS, RINGS, AND FIELDS Example 2.4. 1. The map ϕ : Z → Z/nZ given by ϕ(m) = m mod n for all m ∈ Z is a homomorphism. 2. The map det : GL(n, R) → R is a homomorphism because det(AB) = det(A) det(B) for any two matrices A, B. Similarly, the map det : O(n) → R is a homomorphism. If ϕ : G → G0 and ψ : G0 → G00 are group homomorphisms, then ψ ◦ ϕ : G → G00 is also a homomorphism. If ϕ : G → G0 is a homomorphism of groups, and if H ⊆ G, H 0 ⊆ G0 are two subgroups, then it is easily checked that Im ϕ = ϕ(H) = {ϕ(g) | g ∈ H} is a subgroup of G0 and ϕ−1 (H 0 ) = {g ∈ G | ϕ(g) ∈ H 0 } is a subgroup of G. In particular, when H 0 = {e0 }, we obtain the kernel , Ker ϕ, of ϕ. Definition 2.8. If ϕ : G → G0 is a homomorphism of groups, and if H ⊆ G is a subgroup of G, then the subgroup of G0 , Im ϕ = ϕ(H) = {ϕ(g) | g ∈ H}, is called the image of H by ϕ, and the subgroup of G, Ker ϕ = {g ∈ G | ϕ(g) = e0 }, is called the kernel of ϕ. Example 2.5. 1. The kernel of the homomorphism ϕ : Z → Z/nZ is nZ. 2. The kernel of the homomorphism det : GL(n, R) → R is SL(n, R). Similarly, the kernel of the homomorphism det : O(n) → R is SO(n). The following characterization of the injectivity of a group homomorphism is used all the time. Proposition 2.9. If ϕ : G → G0 is a homomorphism of groups, then ϕ : G → G0 is injective iff Ker ϕ = {e}. (We also write Ker ϕ = (0).) Proof. Assume ϕ is injective. Since ϕ(e) = e0 , if ϕ(g) = e0 , then ϕ(g) = ϕ(e), and by injectivity of ϕ we must have g = e, so Ker ϕ = {e}. Conversely, assume that Ker ϕ = {e}. 
If ϕ(g1 ) = ϕ(g2 ), then by multiplication on the left by (ϕ(g1 ))−1 we get e0 = (ϕ(g1 ))−1 ϕ(g1 ) = (ϕ(g1 ))−1 ϕ(g2 ), 2.1. GROUPS, SUBGROUPS, COSETS 31 and since ϕ is a homomorphism (ϕ(g1 ))−1 = ϕ(g1−1 ), so e0 = (ϕ(g1 ))−1 ϕ(g2 ) = ϕ(g1−1 )ϕ(g2 ) = ϕ(g1−1 g2 ). This shows that g1−1 g2 ∈ Ker ϕ, but since Ker ϕ = {e} we have g1−1 g2 = e, and thus g2 = g1 , proving that ϕ is injective. Definition 2.9. We say that a group homomorphism ϕ : G → G0 is an isomorphism if there is a homomorphism ψ : G0 → G, so that ψ ◦ ϕ = idG and ϕ ◦ ψ = idG0. (†) If ϕ is an isomorphism we say that the groups G and G0 are isomorphic. When G0 = G, a group isomorphism is called an automorphism. The reasoning used in the proof of Proposition 2.2 shows that if a a group homomorphism ϕ : G → G0 is an isomorphism, then the homomorphism ψ : G0 → G satisfying Condition (†) is unique. This homomorphism is denoted ϕ−1. The left translations Lg and the right translations Rg are automorphisms of G. Suppose ϕ : G → G0 is a bijective homomorphism, and let ϕ−1 be the inverse of ϕ (as a function). Then for all a, b ∈ G, we have ϕ(ϕ−1 (a)ϕ−1 (b)) = ϕ(ϕ−1 (a))ϕ(ϕ−1 (b)) = ab, and so ϕ−1 (ab) = ϕ−1 (a)ϕ−1 (b), which proves that ϕ−1 is a homomorphism. Therefore, we proved the following fact. Proposition 2.10. A bijective group homomorphism ϕ : G → G0 is an isomorphism. Observe that the property gH = Hg, for all g ∈ G. (∗) is equivalent by multiplication on the right by g −1 to gHg −1 = H, for all g ∈ G, and the above is equivalent to gHg −1 ⊆ H, for all g ∈ G. (∗∗) This is because gHg −1 ⊆ H implies H ⊆ g −1 Hg, and this for all g ∈ G. Proposition 2.11. Let ϕ : G → G0 be a group homomorphism. Then H = Ker ϕ satisfies Property (∗∗), and thus Property (∗). 32 CHAPTER 2. GROUPS, RINGS, AND FIELDS Proof. We have ϕ(ghg −1 ) = ϕ(g)ϕ(h)ϕ(g −1 ) = ϕ(g)e0 ϕ(g)−1 = ϕ(g)ϕ(g)−1 = e0 , for all h ∈ H = Ker ϕ and all g ∈ G. Thus, by definition of H = Ker ϕ, we have gHg −1 ⊆ H. Definition 2.10. 
For any group G, a subgroup N of G is a normal subgroup of G iff
gN g⁻¹ = N, for all g ∈ G.
This is denoted by N ◁ G.

Proposition 2.11 shows that the kernel Ker ϕ of a homomorphism ϕ : G → G′ is a normal subgroup of G. Observe that if G is abelian, then every subgroup of G is normal.

Consider Example 2.2. Let R ∈ SO(2) and A ∈ SL(2, R) be the matrices
R = [0  −1]      A = [1  1]
    [1   0],         [0  1].
Then
A⁻¹ = [1  −1]
      [0   1],
and we have
ARA⁻¹ = [1  −1][1  −1] = [1  −2]
        [1   0][0   1]   [1  −1],
and clearly ARA⁻¹ ∉ SO(2). Therefore SO(2) is not a normal subgroup of SL(2, R). The same counter-example shows that O(2) is not a normal subgroup of GL(2, R).

Let R ∈ SO(2) and Q ∈ SO(3) be the matrices
R = [0  −1  0]      Q = [1  0   0]
    [1   0  0],         [0  0  −1]
    [0   0  1]          [0  1   0].
Then
Q⁻¹ = Qᵀ = [1   0  0]
           [0   0  1]
           [0  −1  0],
and we have
QRQ⁻¹ = [0  0  −1]
        [0  1   0]
        [1  0   0].
Observe that QRQ⁻¹ ∉ SO(2), so SO(2) is not a normal subgroup of SO(3).

Let T and A ∈ GL(2, R) be the following matrices
T = [1  1]      A = [0  1]
    [0  1],         [1  0].
We have A⁻¹ = A, and
ATA⁻¹ = [0  1][0  1] = [1  0]
        [1  1][1  0]   [1  1].
The matrix T is upper triangular, but ATA⁻¹ is not, so the group of 2 × 2 upper triangular matrices is not a normal subgroup of GL(2, R).

Let Q ∈ V and A ∈ GL(2, R) be the following matrices
Q = [1   0]      A = [1  1]
    [0  −1],         [0  1].
We have
A⁻¹ = [1  −1]
      [0   1],
and
AQA⁻¹ = [1  −1][1  −1] = [1  −2]
        [0  −1][0   1]   [0  −1].
Clearly AQA⁻¹ ∉ V, which shows that the Klein four-group is not a normal subgroup of GL(2, R).

The reader should check that the subgroups nZ, GL⁺(n, R), SL(n, R), and SO(n) (as a subgroup of O(n)) are normal subgroups.

If N is a normal subgroup of G, the equivalence relation ∼ induced by left cosets (see Definition 2.5) is the same as the equivalence relation induced by right cosets.
Furthermore, this equivalence relation is a congruence, which means that: For all g1 , g2 , g10 , g20 ∈ G, 34 CHAPTER 2. GROUPS, RINGS, AND FIELDS (1) If g1 N = g10 N and g2 N = g20 N , then g1 g2 N = g10 g20 N , and (2) If g1 N = g2 N , then g1−1 N = g2−1 N. As a consequence, we can define a group structure on the set G/ ∼ of equivalence classes modulo ∼, by setting (g1 N )(g2 N ) = (g1 g2 )N. Definition 2.11. Let G be a group and N be a normal subgroup of G. The group obtained by defining the multiplication of (left) cosets by (g1 N )(g2 N ) = (g1 g2 )N, g1 , g2 ∈ G is denoted G/N , and called the quotient of G by N. The equivalence class gN of an element g ∈ G is also denoted g (or [g]). The map π : G → G/N given by π(g) = g = gN is a group homomorphism called the canonical projection. Since the kernel of a homomorphism is a normal subgroup, we obtain the following very useful result. Proposition 2.12. Given a homomorphism of groups ϕ : G → G0 , the groups G/Ker ϕ and Im ϕ = ϕ(G) are isomorphic. Proof. Since ϕ is surjective onto its image, we may assume that ϕ is surjective, so that G0 = Im ϕ. We define a map ϕ : G/Ker ϕ → G0 as follows: ϕ(g) = ϕ(g), g ∈ G. We need to check that the definition of this map does not depend on the representative chosen in the coset g = g Ker ϕ, and that it is a homomorphism. If g 0 is another element in the coset g Ker ϕ, which means that g 0 = gh for some h ∈ Ker ϕ, then ϕ(g 0 ) = ϕ(gh) = ϕ(g)ϕ(h) = ϕ(g)e0 = ϕ(g), since ϕ(h) = e0 as h ∈ Ker ϕ. This shows that ϕ(g 0 ) = ϕ(g 0 ) = ϕ(g) = ϕ(g), so the map ϕ is well defined. It is a homomorphism because ϕ(gg 0 ) = ϕ(gg 0 ) = ϕ(gg 0 ) = ϕ(g)ϕ(g 0 ) = ϕ(g)ϕ(g 0 ). The map ϕ is injective because ϕ(g) = e0 iff ϕ(g) = e0 iff g ∈ Ker ϕ, iff g = e. The map ϕ is surjective because ϕ is surjective. Therefore ϕ is a bijective homomorphism, and thus an isomorphism, as claimed. 2.2. CYCLIC GROUPS 35 Proposition 2.12 is called the first isomorphism theorem. 
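The first isomorphism theorem can be illustrated on a small example (ours, not from the text): for the homomorphism ϕ(x) = 3x mod 12 from Z/12Z to itself, the cosets of Ker ϕ correspond bijectively to Im ϕ, precisely because ϕ is constant on each coset:

```python
# Sketch of Proposition 2.12 for phi(x) = 3x mod 12 on G = Z/12Z.
G = list(range(12))

def phi(x):
    return (3 * x) % 12

ker = [g for g in G if phi(g) == 0]          # Ker phi = {0, 4, 8}
im = sorted(set(phi(g) for g in G))          # Im phi = {0, 3, 6, 9}
cosets = set(frozenset((g + k) % 12 for k in ker) for g in G)

# G/Ker phi and Im phi have the same number of elements ...
assert len(cosets) == len(im)
# ... and the induced map [g] -> phi(g) is well defined: phi is constant on each coset
for c in cosets:
    assert len(set(phi(g) for g in c)) == 1
print(sorted(ker), im, len(cosets))
```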
A useful way to construct groups is the direct product construction.

Definition 2.12. Given two groups G and H, we let G × H be the Cartesian product of the sets G and H, with the multiplication operation · given by
(g1, h1) · (g2, h2) = (g1 g2, h1 h2).
It is immediately verified that G × H is a group called the direct product of G and H.

Similarly, given any n groups G1, ..., Gn, we can define the direct product G1 × · · · × Gn in a similar way.

If G is an abelian group and H1, ..., Hn are subgroups of G, the situation is simpler. Consider the map
a : H1 × · · · × Hn → G
given by
a(h1, ..., hn) = h1 + · · · + hn,
using + for the operation of the group G. It is easy to verify that a is a group homomorphism, so its image is a subgroup of G denoted by H1 + · · · + Hn, and called the sum of the groups Hi. The following proposition will be needed.

Proposition 2.13. Given an abelian group G, if H1 and H2 are any subgroups of G such that H1 ∩ H2 = {0}, then the map a is an isomorphism a : H1 × H2 → H1 + H2.

Proof. The map is surjective by definition, so we just have to check that it is injective. For this, we show that Ker a = {(0, 0)}. We have a(a1, a2) = 0 iff a1 + a2 = 0 iff a1 = −a2. Since a2 ∈ H2 and H2 is a subgroup, we also have a1 = −a2 ∈ H2, so a1 ∈ H1 ∩ H2 = {0}; similarly a2 ∈ H1 ∩ H2 = {0}. Thus a1 = a2 = 0, which proves that Ker a = {(0, 0)}.

Under the conditions of Proposition 2.13, namely H1 ∩ H2 = {0}, the group H1 + H2 is called the direct sum of H1 and H2; it is denoted by H1 ⊕ H2, and we have an isomorphism H1 × H2 ≅ H1 ⊕ H2.

2.2 Cyclic Groups

Given a group G with unit element 1, for any element g ∈ G and for any natural number n ∈ N, we define g^n as follows:
g^0 = 1
g^{n+1} = g · g^n.
For any negative integer n ∈ Z, we define g^n by
g^n = (g⁻¹)^{−n} (n < 0).
The following properties are easily verified:
g^i · g^j = g^{i+j}
(g^i)⁻¹ = g^{−i}
g^i · g^j = g^j · g^i,
for all i, j ∈ Z. Define the subset ⟨g⟩ of G by
⟨g⟩ = {g^n | n ∈ Z}.
The following proposition is left as an exercise.

Proposition 2.14. Given a group G, for any element g ∈ G, the set ⟨g⟩ is the smallest abelian subgroup of G containing g.

Definition 2.13. A group G is cyclic iff there is some element g ∈ G such that G = ⟨g⟩. An element g ∈ G with this property is called a generator of G.

The Klein four-group V of Example 2.2 is abelian, but not cyclic. This is because V has four elements, but all the elements different from the identity have order 2.

Cyclic groups are quotients of Z. For this, we use a basic property of Z. Recall that for any n ∈ Z, we let nZ denote the set of multiples of n,
nZ = {nk | k ∈ Z}.

Proposition 2.15. Every subgroup H of Z is of the form H = nZ for some n ∈ N.

Proof. If H is the trivial group {0}, then let n = 0. If H is nontrivial, for any nonzero element m ∈ H, we also have −m ∈ H, and either m or −m is positive, so let n be the smallest positive integer in H. By Proposition 2.14, nZ is the smallest subgroup of H containing n. For any m ∈ H with m ≠ 0, we can write
m = nq + r, with 0 ≤ r < n.
Since nZ ⊆ H, we have nq ∈ H, and since m ∈ H, we get r = m − nq ∈ H. If r > 0, this contradicts the minimality of n; therefore r = 0, and H = nZ.

Given any cyclic group G, for any generator g of G, we can define a mapping ϕ : Z → G by ϕ(m) = g^m. Since g generates G, this mapping is surjective. The mapping ϕ is clearly a group homomorphism, so let H = Ker ϕ be its kernel. By a previous observation, H = nZ for some n ∈ N, so by the first isomorphism theorem (Proposition 2.12), we obtain an isomorphism
ϕ : Z/nZ −→ G
from the quotient group Z/nZ onto G. Obviously, if G has finite order, then |G| = n. In summary, we have the following result.

Proposition 2.16. Every cyclic group G is either isomorphic to Z, or to Z/nZ for some natural number n > 0. In the first case, we say that G is an infinite cyclic group, and in the second case, we say that G is a cyclic group of order n.
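Cyclic groups are easy to explore computationally in Z/nZ (written additively). In the sketch below (ours, not from the text), ⟨g⟩ is computed by repeated addition, and g turns out to generate Z/12Z exactly when gcd(g, 12) = 1:

```python
from math import gcd

def generated(g, n):
    """The cyclic subgroup <g> of Z/nZ, computed by repeated addition of g."""
    seen, x = set(), 0
    while x not in seen:
        seen.add(x)
        x = (x + g) % n
    return seen

n = 12
gens = [g for g in range(n) if len(generated(g, n)) == n]
# the generators of Z/12Z are exactly the residues coprime to 12
assert gens == [g for g in range(n) if gcd(g, n) == 1]   # [1, 5, 7, 11]
# <8> = {0, 4, 8}: a cyclic subgroup of order 12 / gcd(8, 12) = 3
assert generated(8, 12) == {0, 4, 8}
print("generators of Z/12Z:", gens)
```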
The quotient group Z/nZ consists of the cosets m + nZ = {m + nk | k ∈ Z}, with m ∈ Z, that is, of the equivalence classes of Z under the equivalence relation ≡ defined such that
x ≡ y iff x − y ∈ nZ, i.e., iff x ≡ y (mod n).
We also denote the equivalence class x + nZ of x by [x], or if we want to be more precise, by [x]n. The group operation is given by
[x] + [y] = [x + y].
For every x ∈ Z, there is a unique representative, x mod n (the nonnegative remainder of the division of x by n), in the class [x] of x, such that 0 ≤ x mod n ≤ n − 1. For this reason, we often identify Z/nZ with the set {0, ..., n − 1}. To be more rigorous, we can give {0, ..., n − 1} a group structure by defining +n such that
x +n y = (x + y) mod n.
Then it is easy to see that {0, ..., n − 1} with the operation +n is a group with identity element 0, isomorphic to Z/nZ.

We can also define a multiplication operation · on Z/nZ as follows:
[a] · [b] = [ab] = [ab mod n].
Then it is easy to check that · is commutative, associative, that [1] is an identity element for ·, and that · is distributive on the left and on the right with respect to addition. This makes Z/nZ into a commutative ring. We usually suppress the dot and write [a][b] instead of [a] · [b].

Proposition 2.17. Given any integer n ≥ 1, for any a ∈ Z, the residue class [a] ∈ Z/nZ is invertible with respect to multiplication iff gcd(a, n) = 1.

Proof. If [a] has inverse [b] in Z/nZ, then [a][b] = [1], which means that
ab ≡ 1 (mod n),
that is, ab = 1 + nk for some k ∈ Z, which is the Bezout identity
ab − nk = 1
and implies that gcd(a, n) = 1. Conversely, if gcd(a, n) = 1, then by Bezout's identity there exist u, v ∈ Z such that
au + nv = 1,
so au = 1 − nv, that is,
au ≡ 1 (mod n),
which means that [a][u] = [1], so [a] is invertible in Z/nZ.

Definition 2.14. The group (under multiplication) of invertible elements of the ring Z/nZ is denoted by (Z/nZ)∗. Note that this group is abelian and only defined if n ≥ 2.
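Proposition 2.17 is effective: the extended Euclidean algorithm computes the Bezout coefficients u, v with au + nv = gcd(a, n), and hence the inverse of [a] whenever gcd(a, n) = 1. A standard sketch (ours; the function names are illustrative):

```python
def ext_gcd(a, b):
    """Return (g, u, v) with g = gcd(a, b) and a*u + b*v = g (Bezout)."""
    if b == 0:
        return a, 1, 0
    g, u, v = ext_gcd(b, a % b)
    return g, v, u - (a // b) * v

def inverse_mod(a, n):
    """The inverse of the residue class [a] in (Z/nZ)*, per Proposition 2.17."""
    g, u, _ = ext_gcd(a % n, n)
    if g != 1:
        raise ValueError(f"{a} is not invertible mod {n} (gcd = {g})")
    return u % n

assert inverse_mod(7, 40) == 23          # 7 * 23 = 161 = 4 * 40 + 1
for a in (1, 3, 7, 9):                   # the units of Z/10Z
    assert (a * inverse_mod(a, 10)) % 10 == 1
print("inverses computed via Bezout coefficients")
```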
The Euler ϕ-function plays an important role in the theory of the groups (Z/nZ)∗. Definition 2.15. Given any positive integer n ≥ 1, the Euler ϕ-function (or Euler totient function) is defined such that ϕ(n) is the number of integers a, with 1 ≤ a ≤ n, which are relatively prime to n; that is, with gcd(a, n) = 1.1 Then, by Proposition 2.17, we see that the group (Z/nZ)∗ has order ϕ(n). For n = 2, (Z/2Z)∗ = {1}, the trivial group. For n = 3, (Z/3Z)∗ = {1, 2}, and for n = 4, we have (Z/4Z)∗ = {1, 3}. Both groups are isomorphic to the group {−1, 1}. Since gcd(a, n) = 1 for every a ∈ {1,... , n − 1} iff n is prime, by Proposition 2.17 we see that (Z/nZ)∗ = Z/nZ − {0} iff n is prime. 2.3 Rings and Fields The groups Z, Q, R, C, Z/nZ, and Mn (R) are more than abelian groups, they are also commutative rings. Furthermore, Q, R, and C are fields. We now introduce rings and fields. Definition 2.16. A ring is a set A equipped with two operations + : A × A → A (called addition) and ∗ : A × A → A (called multiplication) having the following properties: (R1) A is an abelian group w.r.t. +; (R2) ∗ is associative and has an identity element 1 ∈ A; 1 We allow a = n to accomodate the special case n = 1. 2.3. RINGS AND FIELDS 39 (R3) ∗ is distributive w.r.t. +. The identity element for addition is denoted 0, and the additive inverse of a ∈ A is denoted by −a. More explicitly, the axioms of a ring are the following equations which hold for all a, b, c ∈ A: a + (b + c) = (a + b) + c (associativity of +) (2.1) a+b=b+a (commutativity of +) (2.2) a+0=0+a=a (zero) (2.3) a + (−a) = (−a) + a = 0 (additive inverse) (2.4) a ∗ (b ∗ c) = (a ∗ b) ∗ c (associativity of ∗) (2.5) a∗1=1∗a=a (identity for ∗) (2.6) (a + b) ∗ c = (a ∗ c) + (b ∗ c) (distributivity) (2.7) a ∗ (b + c) = (a ∗ b) + (a ∗ c) (distributivity) (2.8) The ring A is commutative if a ∗ b = b ∗ a for all a, b ∈ A. From (2.7) and (2.8), we easily obtain a∗0 = 0∗a=0 (2.9) a ∗ (−b) = (−a) ∗ b = −(a ∗ b). 
(2.10)
Note that (2.9) implies that if 1 = 0, then a = 0 for all a ∈ A, and thus A = {0}. The ring A = {0} is called the trivial ring. A ring for which 1 ≠ 0 is called nontrivial. The multiplication a ∗ b of two elements a, b ∈ A is often denoted by ab.

Example 2.6.
1. The additive groups Z, Q, R, C are commutative rings (under their usual multiplications).
2. For any positive integer n ∈ N, the group Z/nZ is a group under addition. We can also define a multiplication operation by
[a] · [b] = [ab] = [ab mod n],
for all a, b ∈ Z. The reader will easily check that the ring axioms are satisfied, with [0] as zero and [1] as multiplicative unit. The resulting ring is denoted by Z/nZ. (The notation Zn is sometimes used instead of Z/nZ, but it clashes with the notation for the n-adic integers, so we prefer not to use it.)
3. The group R[X] of polynomials in one variable with real coefficients is a ring under multiplication of polynomials. It is a commutative ring.
4. Let d be any positive integer. If d is not divisible by any integer of the form m², with m ∈ N and m ≥ 2, then we say that d is square-free. For example, d = 1, 2, 3, 5, 6, 7, 10 are square-free, but 4, 8, 9, 12 are not square-free. If d is any square-free integer and if d ≥ 2, then the set of real numbers
Z[√d] = {a + b√d ∈ R | a, b ∈ Z}
is a commutative ring. If z = a + b√d ∈ Z[√d], we write z̄ = a − b√d. Note that zz̄ = a² − db².
5. Similarly, if d ≥ 1 is a positive square-free integer, then the set of complex numbers
Z[√−d] = {a + ib√d ∈ C | a, b ∈ Z}
is a commutative ring. If z = a + ib√d ∈ Z[√−d], we write z̄ = a − ib√d. Note that zz̄ = a² + db². The case where d = 1 is a famous example that was investigated by Gauss, and Z[√−1], also denoted Z[i], is called the ring of Gaussian integers.
6. The group of n × n matrices Mn(R) is a ring under matrix multiplication. However, it is not a commutative ring.
7.
The group C(a, b) of continuous functions f : (a, b) → R is a ring under the operation f · g defined such that (f · g)(x) = f(x)g(x) for all x ∈ (a, b).

Definition 2.17. Given a ring A, for any element a ∈ A, if there is some element b ∈ A such that b ≠ 0 and ab = 0, then we say that a is a zero divisor. A ring A is an integral domain (or an entire ring) if 0 ≠ 1, A is commutative, and ab = 0 implies that a = 0 or b = 0, for all a, b ∈ A. In other words, an integral domain is a nontrivial commutative ring with no zero divisors besides 0.

Example 2.7.

1. The rings Z, Q, R, C are integral domains.

2. The ring R[X] of polynomials in one variable with real coefficients is an integral domain.

3. For any positive integer n ∈ N, we have the ring Z/nZ. Observe that if n is composite, then this ring has zero divisors. For example, if n = 4, then we have

2 · 2 ≡ 0 (mod 4).

The reader should prove that Z/nZ is an integral domain iff n is prime (use Proposition 2.17).

4. If d is a square-free positive integer and if d ≥ 2, the ring Z[√d] is an integral domain. Similarly, if d ≥ 1 is a square-free positive integer, the ring Z[√−d] is an integral domain. Finding the invertible elements of these rings is a very interesting problem.

5. The ring of n × n matrices Mn(R) has zero divisors.

A homomorphism between rings is a mapping preserving addition and multiplication (and 0 and 1).

Definition 2.18. Given two rings A and B, a homomorphism between A and B is a function h : A → B satisfying the following conditions for all x, y ∈ A:

h(x + y) = h(x) + h(y)
h(xy) = h(x)h(y)
h(0) = 0
h(1) = 1.

Actually, because B is a group under addition, h(0) = 0 follows from h(x + y) = h(x) + h(y).

Example 2.8.

1. If A is a ring, for any integer n ∈ Z, for any a ∈ A, we define n · a by

n · a = a + · · · + a (n summands)

if n ≥ 0 (with 0 · a = 0), and n · a = −(−n) · a if n < 0.
Then, the map h : Z → A given by h(n) = n · 1A is a ring homomorphism (where 1A is the multiplicative identity of A).

2. Given any real λ ∈ R, the evaluation map ηλ : R[X] → R defined by

ηλ(f(X)) = f(λ)

for every polynomial f(X) ∈ R[X] is a ring homomorphism.

Definition 2.19. A ring homomorphism h : A → B is an isomorphism iff there is a ring homomorphism g : B → A such that g ◦ h = idA and h ◦ g = idB. An isomorphism from a ring to itself is called an automorphism.

As in the case of a group isomorphism, the homomorphism g is unique and denoted by h⁻¹, and it is easy to show that a bijective ring homomorphism h : A → B is an isomorphism.

Definition 2.20. Given a ring A, a subset A′ of A is a subring of A if A′ is a subgroup of A (under addition), is closed under multiplication, and contains 1.

For example, we have the following sequence in which every ring on the left of an inclusion sign is a subring of the ring on the right of the inclusion sign:

Z ⊆ Q ⊆ R ⊆ C.

The ring Z is a subring of both Z[√d] and Z[√−d], the ring Z[√d] is a subring of R, and the ring Z[√−d] is a subring of C.

If h : A → B is a homomorphism of rings, then it is easy to show that for any subring A′, the image h(A′) is a subring of B, and for any subring B′ of B, the inverse image h⁻¹(B′) is a subring of A.

As for groups, the kernel of a ring homomorphism h : A → B is defined by

Ker h = {a ∈ A | h(a) = 0}.

Just as in the case of groups, we have the following criterion for the injectivity of a ring homomorphism. The proof is identical to the proof for groups.

Proposition 2.18. If h : A → B is a homomorphism of rings, then h : A → B is injective iff Ker h = {0}. (We also write Ker h = (0).)

The kernel of a ring homomorphism is an abelian subgroup of the additive group A, but in general it is not a subring of A, because it may not contain the multiplicative identity element 1.
However, it satisfies the following closure property under multiplication:

ab ∈ Ker h and ba ∈ Ker h for all a ∈ Ker h and all b ∈ A.

This is because if h(a) = 0, then for all b ∈ A we have

h(ab) = h(a)h(b) = 0h(b) = 0 and h(ba) = h(b)h(a) = h(b)0 = 0.

Definition 2.21. Given a ring A, an additive subgroup I of A satisfying the property below

ab ∈ I and ba ∈ I for all a ∈ I and all b ∈ A (∗ideal)

is called a two-sided ideal. If A is a commutative ring, we simply say an ideal.

It turns out that for any ring A and any two-sided ideal I, the set A/I of additive cosets a + I (with a ∈ A) is a ring called a quotient ring. Then we have the following analog of Proposition 2.12, also called the first isomorphism theorem.

Proposition 2.19. Given a homomorphism of rings h : A → B, the rings A/Ker h and Im h = h(A) are isomorphic.

A field is a commutative ring K for which K − {0} is a group under multiplication.

Definition 2.22. A set K is a field if it is a ring and the following properties hold:

(F1) 0 ≠ 1;
(F2) For every a ∈ K, if a ≠ 0, then a has an inverse w.r.t. ∗;
(F3) ∗ is commutative.

Let K∗ = K − {0}. Observe that (F1) and (F2) are equivalent to the fact that K∗ is a group w.r.t. ∗ with identity element 1. If ∗ is not commutative but (F1) and (F2) hold, we say that we have a skew field (or noncommutative field).

Note that we are assuming that the operation ∗ of a field is commutative. This convention is not universally adopted, but since ∗ will be commutative for most fields we will encounter, we may as well include this condition in the definition.

Example 2.9.

1. The rings Q, R, and C are fields.

2. The set of (formal) fractions f(X)/g(X) of polynomials f(X), g(X) ∈ R[X], where g(X) is not the null polynomial, is a field.

3. The ring C(a, b) of continuous functions f : (a, b) → R such that f(x) ≠ 0 for all x ∈ (a, b) is a field.

4. Using Proposition 2.17, it is easy to see that the ring Z/pZ is a field iff p is prime.

5.
If d is a square-free positive integer and if d ≥ 2, the set

Q(√d) = {a + b√d ∈ R | a, b ∈ Q}

is a field. If z = a + b√d ∈ Q(√d) and z̄ = a − b√d, then it is easy to check that if z ≠ 0, then z⁻¹ = z̄/(zz̄).

6. Similarly, if d ≥ 1 is a square-free positive integer, the set of complex numbers

Q(√−d) = {a + ib√d ∈ C | a, b ∈ Q}

is a field. If z = a + ib√d ∈ Q(√−d) and z̄ = a − ib√d, then it is easy to check that if z ≠ 0, then z⁻¹ = z̄/(zz̄).

Definition 2.23. A homomorphism h : K1 → K2 between two fields K1 and K2 is just a homomorphism between the rings K1 and K2.

However, because K1∗ and K2∗ are groups under multiplication, a homomorphism of fields must be injective.

Proof. First, observe that for any x ≠ 0,

1 = h(1) = h(xx⁻¹) = h(x)h(x⁻¹) and 1 = h(1) = h(x⁻¹x) = h(x⁻¹)h(x),

so h(x) ≠ 0 and h(x⁻¹) = h(x)⁻¹. But then, if h(x) = 0, we must have x = 0. Consequently, h is injective.

Definition 2.24. A field homomorphism h : K1 → K2 is an isomorphism iff there is a homomorphism g : K2 → K1 such that g ◦ h = idK1 and h ◦ g = idK2. An isomorphism from a field to itself is called an automorphism.

Then, just as in the case of rings, g is unique and denoted by h⁻¹, and a bijective field homomorphism h : K1 → K2 is an isomorphism.

Definition 2.25. Since every homomorphism h : K1 → K2 between two fields is injective, the image h(K1) of K1 is a subfield of K2. We say that K2 is an extension of K1.

For example, R is an extension of Q and C is an extension of R. The fields Q(√d) and Q(√−d) are extensions of Q, the field R is an extension of Q(√d), and the field C is an extension of Q(√−d).

Definition 2.26. A field K is said to be algebraically closed if every nonconstant polynomial p(X) with coefficients in K has some root in K; that is, there is some a ∈ K such that p(a) = 0.

It can be shown that every field K has some minimal extension Ω which is algebraically closed, called an algebraic closure of K.
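The inverse formula z⁻¹ = z̄/(zz̄) in the examples above is easy to verify computationally. The sketch below is only an illustration, not part of the text: the representation of a + b√d as a pair of rationals, the helper names mul, conj, inv, and the choice d = 2 are all ours.

```python
from fractions import Fraction

# A minimal sketch of exact arithmetic in Q(sqrt(d)) for square-free d >= 2,
# representing a + b*sqrt(d) as the pair (a, b) with a, b rational.
# Illustrative only; names and the choice D = 2 are not from the text.

D = 2  # assume d = 2 (square-free, d >= 2)

def mul(z, w):
    # (a + b*sqrt(d))(c + e*sqrt(d)) = (ac + d*be) + (ae + bc)*sqrt(d)
    a, b = z
    c, e = w
    return (a * c + D * b * e, a * e + b * c)

def conj(z):
    # conjugation: a + b*sqrt(d) |-> a - b*sqrt(d)
    a, b = z
    return (a, -b)

def inv(z):
    # z^{-1} = conj(z)/(z*conj(z)), where z*conj(z) = a^2 - d*b^2 is in Q
    a, b = z
    n = a * a - D * b * b  # nonzero when z != 0, since d is not a square
    return (a / n, -b / n)

z = (Fraction(3), Fraction(5))  # z = 3 + 5*sqrt(2)
print(mul(z, inv(z)))           # -> (Fraction(1, 1), Fraction(0, 1)), i.e. z * z^{-1} = 1
```

The same code works for any square-free d ≥ 2; only the constant D changes.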
For example, C is the algebraic closure of R. The algebraic closure of Q is called the field of algebraic numbers. This field consists of all complex numbers that are zeros of a polynomial with coefficients in Q.

Definition 2.27. Given a field K and an automorphism h : K → K of K, it is easy to check that the set

Fix(h) = {a ∈ K | h(a) = a}

of elements of K fixed by h is a subfield of K called the field fixed by h.

For example, if d ≥ 2 is square-free, then the map c : Q(√d) → Q(√d) given by

c(a + b√d) = a − b√d

is an automorphism of Q(√d), and Fix(c) = Q.

If K is a field, we have the ring homomorphism h : Z → K given by h(n) = n · 1. If h is injective, then K contains a copy of Z, and since it is a field, it contains a copy of Q. In this case, we say that K has characteristic 0.

If h is not injective, then h(Z) is a subring of K, and thus an integral domain; the kernel of h is a subgroup of Z, which by Proposition 2.15 must be of the form pZ for some p ≥ 1. By the first isomorphism theorem, h(Z) is isomorphic to Z/pZ for some p ≥ 1. But then p must be prime, since Z/pZ is an integral domain iff it is a field iff p is prime. The prime p is called the characteristic of K, and we also say that K is of finite characteristic.

Definition 2.28. If K is a field, then either

(1) n · 1 ≠ 0 for all integers n ≥ 1, in which case we say that K has characteristic 0, or
(2) there is some smallest prime number p such that p · 1 = 0, called the characteristic of K, and we say that K is of finite characteristic.

A field K of characteristic 0 contains a copy of Q, and thus is infinite. As we will see in Section 8.10, a finite field has nonzero characteristic p. However, there are infinite fields of nonzero characteristic.

Part I

Linear Algebra

Chapter 3

Vector Spaces, Bases, Linear Maps

3.1 Motivations: Linear Combinations, Linear Independence and Rank

In linear optimization problems, we often encounter systems of linear equations. For example, consider the problem of solving the following system of three linear equations in the three variables x1, x2, x3 ∈ R:

x1 + 2x2 − x3 = 1
2x1 + x2 + x3 = 2
x1 − 2x2 − 2x3 = 3.

One way to approach this problem is to introduce the "vectors" u, v, w, and b, given by the columns of the system:

u = (1, 2, 1)  v = (2, 1, −2)  w = (−1, 1, −2)  b = (1, 2, 3)

(written as column vectors).
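As a computational aside, the system above can be solved mechanically by Gaussian elimination. The sketch below is our own minimal illustration, not the book's method: the function solve3 and its pivoting scheme are assumptions for this example. Exact elimination over Q using Python's Fraction type gives x1 = 7/5, x2 = −2/5, x3 = −2/5.

```python
from fractions import Fraction

def solve3(A, b):
    """Solve A x = b exactly over Q by naive Gauss-Jordan elimination.

    Illustrative sketch for small systems with a unique solution;
    the function name and structure are ours, not the text's.
    """
    n = len(A)
    # augmented matrix [A | b] with exact rational entries
    M = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(A, b)]
    for col in range(n):
        # find a row with a nonzero pivot in this column and swap it up
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        # clear this column in every other row
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

A = [[1,  2, -1],
     [2,  1,  1],
     [1, -2, -2]]
b = [1, 2, 3]
print(solve3(A, b))  # -> [Fraction(7, 5), Fraction(-2, 5), Fraction(-2, 5)]
```

Substituting this solution back into each equation confirms it; the vector formulation that follows recasts the same computation as expressing b as a linear combination of u, v, and w.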