Mastering Python PDF - 2nd Edition

Document Details


2022

Rick van Hattem


Summary

This book, Mastering Python, Second Edition, by Rick van Hattem, teaches the reader how to write powerful and efficient Python code. It covers a wide range of Python's capabilities with practical examples and focuses on real-world applications of Python programming.

Full Transcript


Mastering Python, Second Edition
Write powerful and efficient code using the full range of Python's capabilities
Rick van Hattem
BIRMINGHAM—MUMBAI

"Python" and the Python Logo are trademarks of the Python Software Foundation.

Mastering Python, Second Edition. Copyright © 2022 Packt Publishing. All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Senior Publishing Product Manager: Dr. Shailesh Jain
Contracting Acquisition Editor: Ben Renow-Clarke
Acquisition Editor – Peer Reviews: Suresh Jain
Project Editor: Janice Gonsalves
Content Development Editors: Lucy Wan, Joanne Lovell
Copy Editor: Safis Editing
Technical Editor: Aditya Sawant
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Presentation Designer: Ganesh Bhadwalkar

First published: April 2016
Second edition: May 2022
Production reference: 1120522

Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.
ISBN 978-1-80020-772-1
www.packt.com

Contributors

About the author

Rick van Hattem is an entrepreneur who has founded and sold several successful start-ups.
His expertise is in designing and scaling system architectures to many millions of users and/or large amounts of data that need to be accessed in real time. He's been programming for well over 25 years and has over 20 years of experience with Python. Rick has done consulting for many companies, including (Y-Combinator) start-ups, banks, and airports. One of the start-ups he founded, Fashiolista.com, was one of the largest social networks for fashion in the world, featuring millions of users. He also wrote Mastering Python, First Edition, and he was one of the technical reviewers for PostgreSQL Server Programming, Second Edition.

For my wife, who is always there for me. For my sister, who always goes above and beyond to help. For my mother, who raised me to be inquisitive. And for my sweet children, who pique my curiosity and allow me to learn every day.

About the reviewer

Alexander Afanasyev is a software engineer with about 15 years of experience across a variety of domains and roles. Currently, Alexander is an independent contractor pursuing ideas in the space of computer vision, NLP, and building advanced data collection systems in the cyber and physical threat intelligence domains. Outside of daily work, he is an active contributor to Stack Overflow and GitHub. Previously, Alexander helped review the Selenium Testing Cookbook and Advanced Natural Language Processing with Transformers books by Packt Publishing.

I would like to thank the author of the book for the incredibly hard work and comprehensive content; the wonderful team of editors and coordinators with excellent communication skills; and my family, who were and always are supportive of my ideas and my work.
Join our community on Discord

Join our community's Discord space for discussions with the author and other readers: https://discord.gg/QMzJenHuJf

Table of Contents

Preface

Chapter 1: Getting Started – One Environment per Project
  Virtual environments; Why virtual environments are a good idea; Using venv and virtualenv; Creating a venv; Activating a venv/virtualenv; Installing packages; Using pyenv; Using Anaconda; Getting started with Anaconda Navigator; Getting started with conda; Managing dependencies; Using pip and a requirements.txt file; Version specifiers; Installing through source control repositories; Additional dependencies using extras; Conditional dependencies using environment markers; Automatic project management using poetry; Creating a new poetry project; Adding dependencies; Upgrading dependencies; Running commands; Automatic dependency tracking using pipenv; Updating your packages; Deploying to production; Running cron commands; Exercises; Reading the Python Enhancement Proposals (PEPs); Combining pyenv and poetry or pipenv; Converting an existing project to a poetry project; Summary

Chapter 2: Interactive Python Interpreters
  The Python interpreter; Modifying the interpreter; Enabling and enhancing autocompletion; Alternative interpreters; bpython; Rewinding your session; Reloading modules; ptpython; IPython and Jupyter; Basic interpreter usage; Saving and loading sessions; Regular Python prompt/doctest mode; Introspection and help; Autocompletion; Jupyter; Installing Jupyter; IPython summary; Exercises; Summary

Chapter 3: Pythonic Syntax and Common Pitfalls
  A brief history of Python; Code style – What is Pythonic code?; Whitespace instead of braces; Formatting strings – printf, str.format, or f-strings?; Simple formatting; Named variables; Arbitrary expressions; PEP 20, the Zen of Python; Beautiful is better than ugly; Explicit is better than implicit; Simple is better than complex; Flat is better than nested; Sparse is better than dense; Readability counts; Practicality beats purity; Errors should never pass silently; In the face of ambiguity, refuse the temptation to guess; One obvious way to do it; Hard to explain, easy to explain; Namespaces are one honking great idea; Explaining PEP 8; Duck typing; Differences between value and identity comparisons; Loops; Maximum line length; Verifying code quality, pep8, pyflakes, and more; Recent additions to the Python syntax; PEP 572: Assignment expressions/the walrus operator; PEP 634: Structural pattern matching, the switch statement; Common pitfalls; Scope matters!; Global variables; Mutable function default arguments; Class properties; Overwriting and/or creating extra built-ins; Modifying while iterating; Catching and storing exceptions; Late binding and closures; Circular imports; Import collisions; Summary

Chapter 4: Pythonic Design Patterns
  Time complexity – The big O notation; Core collections; list – A mutable list of items; dict – A map of items; set – Like a dict without values; tuple – The immutable list; Pythonic patterns using advanced collections; Smart data storage with type hinting using dataclasses; Combining multiple scopes with ChainMap; Default dictionary values using defaultdict; enum – A group of constants; Sorting collections using heapq; Searching through sorted collections using bisect; Global instances using Borg or Singleton patterns; No need for getters and setters with properties; Dict union operators; Exercises; Summary

Chapter 5: Functional Programming – Readability Versus Brevity
  Functional programming; Purely functional; Functional programming and Python; Advantages of functional programming; list, set, and dict comprehensions; Basic list comprehensions; set comprehensions; dict comprehensions; Comprehension pitfalls; lambda functions; The Y combinator; functools; partial – Prefill function arguments; reduce – Combining pairs into a single result; Implementing a factorial function; Processing trees; Reducing in the other direction; itertools; accumulate – reduce with intermediate results; chain – Combining multiple results; compress – Selecting items using a list of Booleans; dropwhile/takewhile – Selecting items using a function; count – Infinite range with decimal steps; groupby – Grouping your sorted iterable; Exercises; Summary

Chapter 6: Decorators – Enabling Code Reuse by Decorating
  Decorating functions; Generic function decorators; The importance of functools.wraps; Chaining or nesting decorators; Registering functions using decorators; Memoization using decorators; Decorators with (optional) arguments; Creating decorators using classes; Decorating class functions; Skipping the instance – classmethod and staticmethod; Properties – Smart descriptor usage; Decorating classes; Singletons – Classes with a single instance; Total ordering – Making classes sortable; Useful decorators; Single dispatch – Polymorphism in Python; contextmanager – with statements made easy; Validation, type checks, and conversions; Useless warnings – How to ignore them safely; Exercises; Summary

Chapter 7: Generators and Coroutines – Infinity, One Step at a Time
  Generators; Creating generators; Creating infinite generators; Generators wrapping iterables; Generator comprehensions; Class-based generators and iterators; Generator examples; Breaking an iterable up into chunks/groups; itertools.islice – Slicing iterables; itertools.chain – Concatenating multiple iterables; itertools.tee – Using an output multiple times; contextlib.contextmanager – Creating context managers; Coroutines; A basic example; Priming; Closing and throwing exceptions; Mixing generators and coroutines; Using the state; Exercises; Summary

Chapter 8: Metaclasses – Making Classes (Not Instances) Smarter
  Dynamically creating classes; A basic metaclass; Arguments to metaclasses; Accessing metaclass attributes through classes; Abstract classes using collections.abc; Internal workings of the abstract classes; Custom type checks; Automatically registering plugin systems; Importing plugins on-demand; Importing plugins through configuration; Importing plugins through the filesystem; Dataclasses; Order of operations when instantiating classes; Finding the metaclass; Preparing the namespace; Executing the class body; Creating the class object (not instance); Executing the class decorators; Creating the class instance; Example; Storing class attributes in definition order; The classic solution without metaclasses; Using metaclasses to get a sorted namespace; Exercises; Summary

Chapter 9: Documentation – How to Use Sphinx and reStructuredText
  Type hinting; Basic example; Custom types; Generics; Type checking; Python type interface files; Type hinting conclusion; reStructuredText and Markdown; Getting started with reStructuredText; Getting started with Markdown; Inline markup; Headers; Headers with reStructuredText; Headers with Markdown; Lists; Enumerated lists; Bulleted lists; Option lists; Definition lists (reST only); Nested lists; Links, references, and labels; Images; Images with reStructuredText; Images with Markdown; Substitutions; Blocks, code, math, comments, and quotes; Conclusion; The Sphinx documentation generator; Getting started with Sphinx; Using sphinx-quickstart; Using sphinx-apidoc; Sphinx directives; Sphinx roles; Documenting code; Documenting a class with the Sphinx style; Documenting a class with the Google style; Documenting a class with the NumPy style; Which style to choose; Exercises; Summary

Chapter 10: Testing and Logging – Preparing for Bugs
  Using documentation as tests with doctest; A simple doctest example; Writing doctests; Testing with documentation; The doctest flags; True and False versus 1 and 0; Normalizing whitespace; Ellipsis; Doctest quirks; Testing dictionaries; Testing floating-point numbers; Times and durations; Testing with py.test; The difference between the unittest and py.test output; The difference between unittest and py.test tests; Simplifying assertions; Parameterizing tests; Automatic arguments using fixtures; Print statements and logging; Plugins; Mock objects; Using unittest.mock; Using py.test monkeypatch; Testing multiple environments with tox; Getting started with tox; The tox.ini config file; Running tox; Logging; Configuration; Basic logging configuration; Dictionary configuration; JSON configuration; ini file configuration; The network configuration; Logger; Usage; Formatting; Modern formatting using f-strings and str.format; Logging pitfalls; Debugging loggers; Exercises; Summary

Chapter 11: Debugging – Solving the Bugs
  Non-interactive debugging; Inspecting your script using trace; Debugging using logging; Showing the call stack without exceptions; Handling crashes using faulthandler; Interactive debugging; Console on demand; Debugging using Python debugger (pdb); Breakpoints; Catching exceptions; Aliases; commands; Debugging with IPython; Debugging with Jupyter; Other debuggers; Debugging services; Exercises; Summary

Chapter 12: Performance – Tracking and Reducing Your Memory and CPU Usage
  What is performance?; Measuring CPU performance and execution time; Timeit – comparing code snippet performance; cProfile – Finding the slowest components; First profiling run; Calibrating your profiler; Selective profiling using decorators; Using profile statistics; Line profiler – Tracking performance per line; Improving execution time; Using the right algorithm; Global interpreter lock; try versus if; Lists versus generators; String concatenation; Addition versus generators; Map versus generators and list comprehensions; Caching; Lazy imports; Using slots; Using optimized libraries; Just-in-time compiling; Converting parts of your code to C; Memory usage; tracemalloc; Memory Profiler; Memory leaks; Circular references; Analyzing memory usage using the garbage collector; Weak references; Weakref limitations and pitfalls; Reducing memory usage; Generators versus lists; Recreating collections versus removing items; Using slots; Performance monitoring; Exercises; Summary

Chapter 13: asyncio – Multithreading without Threads
  Introduction to asyncio; Backward compatibility and async/await statements; Python 3.4; Python 3.5; Python 3.7; A basic example of parallel execution; asyncio concepts; Coroutines, Futures, and Tasks; Event loops; Executors; Asynchronous examples; Processes; Interactive processes; Echo client and server; Asynchronous file operations; Creating async generators to support async for; Asynchronous constructors and destructors; Debugging asyncio; Forgetting to await a coroutine; Slow blocking functions; Forgetting to check the results or exiting early; Exiting before all tasks are done; Exercises; Summary

Chapter 14: Multiprocessing – When a Single CPU Core Is Not Enough
  The Global Interpreter Lock (GIL); The use of multiple threads; Why do we need the GIL?; Why do we still have the GIL?; Multiple threads and processes; Basic examples; concurrent.futures; threading; multiprocessing; Cleanly exiting long-running threads and processes; Batch processing using concurrent.futures; Batch processing using multiprocessing; Sharing data between threads and processes; Shared memory between processes; Thread safety; Deadlocks; Thread-local variables; Processes, threads, or a single thread?; threading versus concurrent.futures; multiprocessing versus concurrent.futures; Hyper-threading versus physical CPU cores; Remote processes; Distributed processing using multiprocessing; Distributed processing using Dask; Installing Dask; Basic example; Running a single thread; Distributed execution across multiple machines; Distributed processing using ipyparallel; ipython_config.py; ipython_kernel_config.py; ipcontroller_config.py; ipengine_config.py; ipcluster_config.py; Summary

Chapter 15: Scientific Python and Plotting
  Installing the packages; Arrays and matrices; NumPy – Fast arrays and matrices; Numba – Faster Python on CPU or GPU; SciPy – Mathematical algorithms and NumPy utilities; Sparse matrices; Pandas – Real-world data analysis; Input and output options; Pivoting and grouping; Merging; Rolling or expanding windows; Statsmodels – Statistical models on top of Pandas; xarray – Labeled arrays and datasets; STUMPY – Finding patterns in time series; Mathematics and precise calculations; gmpy2 – Fast and precise calculations; Sage – An alternative to Mathematica/Maple/MATLAB; mpmath – Convenient, precise calculations; SymPy – Symbolic mathematics; Patsy – Describing statistical models; Plotting, graphing, and charting; Matplotlib; Seaborn; Yellowbrick; Plotly; Bokeh; Datashader; Exercises; Summary

Chapter 16: Artificial Intelligence
  Introduction to artificial intelligence; Types of AI; Installing the packages; Image processing; scikit-image; Installing scikit-image; Edge detection; Face detection; scikit-image overview; OpenCV; Installing OpenCV for Python; Edge detection; Object detection; OpenCV versus scikit-image; Natural language processing; NLTK – Natural Language Toolkit; spaCy – Natural language processing with Cython; Gensim – Topic modeling for humans; Machine learning; Types of machine learning; Supervised learning; Reinforcement learning; Unsupervised learning; Combinations of learning methods; Deep learning; Artificial neural networks and deep learning; Tensors; PyTorch – Fast (deep) neural networks; PyTorch Lightning and PyTorch Ignite – High-level PyTorch APIs; Skorch – Mixing PyTorch and scikit-learn; TensorFlow/Keras – Fast (deep) neural networks; TensorFlow versus PyTorch; Evolutionary algorithms; Support-vector machines; Bayesian networks; Versatile AI libraries and utilities; scikit-learn – Machine learning in Python; Supervised learning; Unsupervised learning; auto-sklearn – Automatic scikit-learn; mlxtend – Machine learning extensions; scikit-lego – scikit-learn utilities; XGBoost – eXtreme Gradient Boosting; Featuretools – Feature detection and prediction; Snorkel – Improving your ML data automatically; TPOT – Optimizing ML models using genetic programming; Exercises; Summary

Chapter 17: Extensions in C/C++, System Calls, and C/C++ Libraries
  Setting up tooling; Do you need C/C++ modules?; Windows; OS X; Linux/Unix; Calling C/C++ with ctypes; Platform-specific libraries; Windows; Linux/Unix; OS X; Making it easy; Calling functions and native types; Complex data structures; Arrays; Gotchas with memory management; CFFI; Complex data structures; Arrays; ABI or API?; CFFI or ctypes?; Native C/C++ extensions; A basic example; C is not Python – Size matters; The example explained; static; PyObject*; Parsing arguments; C is not Python – Errors are silent or lethal; Calling Python from C – Handling complex types; Exercises; Summary

Chapter 18: Packaging – Creating Your Own Libraries or Applications
  Introduction to packages; Types of packages; Wheels – The new eggs; Source packages; Package tools; Package versioning; Building packages; Packaging using pyproject.toml; Creating a basic package; Installing packages for development; Adding code and data; Adding executable commands; Managing dependencies; Building the package; Building C/C++ extensions; Packaging using setuptools with setup.py or setup.cfg; Creating a basic package; Installing the package for development; Adding packages; Adding package data; Managing dependencies; Adding executable commands; Building the package; Publishing packages; Adding URLs; PyPI trove classifiers; Uploading to PyPI; C/C++ extensions; Regular C/C++ extensions; Cython extensions; Testing; unittest; py.test; Exercises; Summary

Other Books You May Enjoy

Index

Preface

Python is a language that is easy to learn and anyone can get started with a "Hello, World!" script within minutes. Mastering Python, however, is a completely different question.
Every programming problem has multiple possible solutions and choosing the Pythonic (idiomatic Python) solution is not always obvious; it can also change with time. This book will not only illustrate a range of different and new techniques but also explain where and when a method should be applied.

To quote The Zen of Python by Tim Peters: "There should be one—and preferably only one—obvious way to do it. Although that way may not be obvious at first unless you're Dutch." Even though it does not always help, the author of this book is actually Dutch.

This book is not a beginner's guide to Python. It is a book that can teach you about the more advanced techniques possible within Python, such as asyncio. It even includes Python 3.10 features, such as structural pattern matching (Python's switch statement), in great detail.

As a Python programmer with many years of experience, I will attempt to rationalize the choices made in this book with relevant background information. These rationalizations are in no way strict guidelines, however, as several of these cases boil down to personal style in the end. Just know that they stem from experience and are, in many cases, the solutions recommended by the Python community.

Some of the references in this book might not be obvious to you if you are not a fan of Monty Python. This book regularly uses spam and eggs instead of foo and bar in code samples because the Python programming language was named after Monty Python. To provide some background information about spam and eggs, I would recommend you watch the Spam sketch from Monty Python. It is positively silly.

Who this book is for

This book is meant for programmers who are already experienced in Python and want to learn more about the advanced features that Python offers. With the depth of this book, I can guarantee that almost anyone can learn something new here if they wish.

If you only know the basics of Python, however, don't worry.
The book starts off relatively slow and builds to the more advanced subjects, so you should be fine.

What this book covers

Chapter 1, Getting Started – One Environment per Project, demonstrates several options for managing Python versions, virtual environments, and package dependencies.

Chapter 2, Interactive Python Interpreters, explores Python interpreter options. Python's default interpreter is perfectly functional, but better alternatives are available. With a few modifications or a replacement, you can get auto-completion, syntax highlighting, and graphical output.

Chapter 3, Pythonic Syntax and Common Pitfalls, discusses Pythonic coding, which is the art of writing beautiful and readable Python code. This chapter is not the holy grail, but it is filled with tips and best practices to achieve something along those lines.

Chapter 4, Pythonic Design Patterns, continues on the theme of Chapter 3. Writing Pythonic code is not just about code style, but also about using the right design patterns and data structures. This chapter tells you about the data structures available and their performance characteristics.

Chapter 5, Functional Programming – Readability Versus Brevity, covers functional programming. Functional programming is considered a bit of a black art by some, but when applied correctly it can be a really powerful tool that makes code reuse trivial. It is probably as close to the underlying mathematics as you can get within programming.

Chapter 6, Decorators – Enabling Code Reuse by Decorating, discusses decorators, an amazing tool for reusing a method. With decorators, you can wrap functions and classes with some other function to modify their parameters and return values – an extremely useful tool.

Chapter 7, Generators and Coroutines – Infinity, One Step at a Time, discusses generators.
Lists and tuples are fantastic if you already know that you are going to use every element, but the faster alternative is to only calculate the elements you actually need. That is what a generator does for you: generate items on demand.

Chapter 8, Metaclasses – Making Classes (not Instances) Smarter, explores metaclasses, the classes that make other classes. It is a magic you rarely need, but it does have practical use cases, such as plugin systems.

Chapter 9, Documentation – How to Use Sphinx and reStructuredText, gives you some documentation-related tips. Writing documentation might not be the favorite activity for most programmers, but it is useful. This chapter shows you how to make that easier by using Sphinx and reStructuredText to generate large portions automatically.

Chapter 10, Testing and Logging – Preparing for Bugs, covers how to implement tests and logging to prevent and detect bugs. Bugs are inevitable and by using logging, we can trace the cause. Often, bugs can be prevented by using tests.

Chapter 11, Debugging – Solving the Bugs, builds on Chapter 10. The previous chapter helped us find the bugs; now we need to solve them. Debuggers can be a huge help when hunting down difficult bugs, and this chapter shows you several debugging options.

Chapter 12, Performance – Tracking and Reducing Your Memory and CPU Usage, discusses the performance of your code. A common problem programmers have is trying to optimize code that does not need it, a fun but generally futile exercise. This chapter helps you find the code that needs to be optimized.

Chapter 13, asyncio – Multithreading without Threads, covers asyncio. Waiting for external resources such as network resources is the most common bottleneck for applications. With asyncio, we can stop waiting for those bottlenecks and switch to another task instead.

Chapter 14, Multiprocessing – When a Single CPU Core Is Not Enough, discusses performance from a different perspective. With multiprocessing, we can use multiple processors (even remotely) in parallel. When your processor is the bottleneck, this can help a lot.

Chapter 15, Scientific Python and Plotting, covers the most important libraries for scientific computing. Python has become the language of choice for scientific purposes.

Chapter 16, Artificial Intelligence, shows many AI algorithms and the libraries available for implementing them. In addition to being the language of choice for scientific purposes, most AI libraries are currently being built using Python as well.

Chapter 17, Extensions in C/C++, System Calls, and C/C++ Libraries, shows you how to use existing C/C++ libraries from Python, which not only allows reuse but can also speed up execution greatly. Python is a wonderful language, but it is often not the fastest solution.

Chapter 18, Packaging – Creating Your Own Libraries or Applications, will help you package your code into a fully functioning Python package that others can use. After building your wonderful new library, you might want to share it with the world.

To get the most out of this book

Depending on your level of experience you should start reading from the beginning, or gloss over the chapters to skip to sections that are interesting for you. This book is suitable for intermediate to expert level Python programmers, but not all sections will be equally useful for everyone. As an example, the first two chapters are about setting up your environment and Python interpreter and seem like chapters you can skip entirely as an advanced or expert Python programmer, but I would advise against fully skipping them, as a few useful utilities and libraries are covered which you might not be familiar with.

The chapters of this book do build on each other to a certain degree, but there is no strict reading order and you can easily cherry-pick the parts you wish to read. If there is a reference to an earlier chapter, it is clearly indicated.
The most up-to-date version of the code samples can always be found at https://github.com/mastering-python/code_2. The code in this repository is automatically tested and, if you have any suggestions, pull requests are always welcome.

Most chapters of this book also include exercises at the end that will allow you to test what you have learned. Since there are always multiple solutions to problems, you, and every other reader of this book, can submit and compare your solutions on GitHub: https://github.com/mastering-python/exercises. You are encouraged to create a pull request with your solution to the problems. And you can learn from others here as well, of course.

Download the example code files

The code bundle for the book is hosted on GitHub at https://github.com/mastering-python/code_2 and pull requests with improvements are welcome. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781800207721_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book. While this book largely adheres to the PEP8 styling conventions, there are a few concessions made due to the space limitations of a book format. Simply put, code samples that span multiple pages are hard to read, so some parts use less whitespace than you would usually expect. The full version of the code is available on GitHub and is automatically tested to be PEP8-compliant.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example: "The itertools.chain() generator is one of the simplest yet one of the most useful generators in the Python library."

A block of code is set as follows:

from . import base


class A(base.Plugin):
    pass

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

:show-inheritance:
:private-members:
:special-members:
:inherited-members:

Any command-line input or output is written as follows:

$ pip3 install -U mypy

Bold: Indicates a new term, an important word, or words that you see on the screen, for example, in menus or dialog boxes. For example: "Sometimes interactive interpreters are referred to as REPL. This stands for Read-Eval-Print-Loop."

Warnings or important notes appear like this.

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected], and mention the book's title in the subject of your message. If you have questions about any aspect of this book, please email us at questions@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com.

Share your thoughts

Once you've read Mastering Python, Second Edition, we'd love to hear your thoughts!
Please click here to go straight to the Amazon review page for this book and share your feedback. Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.

1 Getting Started – One Environment per Project

In this chapter, you'll learn about the different ways of setting up Python environments for your projects and how to use multiple Python versions on a single system outside of what your package manager offers. After the environment is set up, we will continue with the installation of packages using both the Python Package Index (PyPI) and conda-forge, the package index that is coupled with Anaconda. Lastly, we will look at several methods of keeping track of project dependencies.

To summarize, the following topics will be covered:

Creating environments using venv, pipenv, poetry, pyenv, and anaconda
Package installation through pip, poetry, pipenv, and conda
Managing dependencies using requirements.txt, poetry, and pipenv

Virtual environments

The Python ecosystem offers many methods of installing and managing packages. You can simply download and extract code to your project directory, use the package manager from your operating system, or use a tool such as pip to install a package. To make sure your packages don't collide, it is recommended that you use a virtual environment. A virtual environment is a lightweight Python installation with its own package directories and a Python binary copied (or linked) from the binary used to create the environment.

Why virtual environments are a good idea

It might seem like a hassle to create a virtual environment for every Python project, but it offers enough advantages to do so.
More importantly, there are several reasons why installing packages globally using pip is a really bad idea:

Installing packages globally usually requires elevated privileges (such as sudo, root, or administrator), which is a huge security risk. When executing pip install <package>, the setup.py of that package is executed as the user that executed the pip install command. That means that if the package contains malware, it now has superuser privileges to do whatever it wants. Don't forget that anyone can upload a package to PyPI (pypi.org) without any vetting. As you will see later in this book, it only takes a couple of minutes for anyone to create and upload a package.

Depending on how you installed Python, it can mess with the existing packages that are installed by your package manager. On an Ubuntu Linux system, that means you could break pip or even apt itself, because a pip install -U <package> installs and updates both the package and all of its dependencies.

It can break your other projects. Many projects try their best to remain backward compatible, but every pip install could pull in new/updated dependencies that could break compatibility with other packages and projects. The Django Web Framework, for example, changes enough between versions that many projects using Django will need several changes after an upgrade to the latest release. So, when you're upgrading Django on your system to the latest version and have a project that was written for a previous version, your project will most likely be broken.

It pollutes your list of packages, making it hard to keep track of your project's dependencies.

In addition to alleviating the issues above, there is a major advantage as well. You can specify the Python version (assuming you have it installed) when creating the virtual environment. This allows you to test and debug your projects in multiple Python versions easily while otherwise keeping the exact same package versions.
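Since every environment can be built on a different interpreter, it helps to be able to confirm from within Python which interpreter is actually running. A minimal check (not one of the book's examples), using only the standard library:

```python
import sys

# The version of the interpreter running this script, and the path to
# its binary; in an activated virtual environment, sys.executable
# points inside the environment directory.
print(sys.version_info[:3])
print(sys.executable)
```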
Using venv and virtualenv

You are probably already familiar with virtualenv, a library used to create a virtual environment for your Python installation. What you might not know is the venv command, which has been included with Python since version 3.3 and can be used as a drop-in replacement for virtualenv in most cases.

To keep things simple, I recommend creating a directory where you keep all of your environments. Some people opt for an env, .venv, or venv directory within the project, but I advise against that for several reasons:

Your project files are important, so you probably want to back them up as often as possible. By keeping the bulky environment with all of the installed packages outside of your backups, your backups become faster and lighter.

Your project directory stays portable. You can even keep it on a remote drive or flash drive without having to worry that the virtual environment will only work on a single system.

It prevents you from accidentally adding the virtual environment files to your source control system.

If you do decide to keep your virtual environment inside your project directory, make sure that you add that directory to your .gitignore file (or similar) for your version control system. And if you want to keep your backups faster and lighter, exclude it from the backups. With correct dependency tracking, the virtual environment should be easy enough to rebuild.

Creating a venv

Creating a venv is a reasonably simple process, but it varies slightly according to the operating system being used. The following examples use the venv and virtualenv modules directly, but for ease I recommend using poetry instead, which is covered later in this chapter. That tool will automatically create a virtual environment for you when you first use it. Before you make the step up to poetry, however, it is important to understand how virtual environments work.

Since Python 3.6, the pyvenv command has been deprecated in favor of python -m venv.
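The same kind of environment can also be created programmatically with the standard library's venv module, which is what python -m venv invokes. A minimal sketch (with_pip=False keeps it fast here; pass with_pip=True to have pip bootstrapped into the environment as well):

```python
import tempfile
import venv
from pathlib import Path

# Create a throwaway virtual environment in a temporary directory.
env_dir = Path(tempfile.mkdtemp()) / "your_env"
venv.create(env_dir, with_pip=False)

# Every environment gets a pyvenv.cfg describing its base interpreter.
print((env_dir / "pyvenv.cfg").exists())  # → True
```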
In the case of Ubuntu, the python3-venv package has to be installed through apt because the Ubuntu developers have mutilated the default Python installation by not including ensurepip.

For Linux/Unix/OS X, using zsh or bash as a shell, it is:

$ python3 -m venv envs/your_env
$ source envs/your_env/bin/activate
(your_env) $

And for Windows cmd.exe (assuming python.exe is in your PATH), it is:

C:\Users\wolph>python.exe -m venv envs\your_env
C:\Users\wolph>envs\your_env\Scripts\activate.bat
(your_env) C:\Users\wolph>

PowerShell is also supported and can be used in a similar fashion:

PS C:\Users\wolph> python.exe -m venv envs\your_env
PS C:\Users\wolph> envs\your_env\Scripts\Activate.ps1
(your_env) PS C:\Users\wolph>

The first command creates the environment and the second activates the environment. After activating the environment, commands such as python and pip use the environment-specific versions, so pip install only installs within your virtual environment. A useful side effect of activating the environment is the prefix with the name of your environment, which is (your_env) in this case.

Note that we are not using sudo or other methods of elevating privileges. Elevating privileges is both unnecessary and a potential security risk, as explained in the Why virtual environments are a good idea section.
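You can also check from within Python whether the currently running interpreter belongs to a virtual environment. A small sketch based on the sys module (the function name here is my own, not part of any standard API):

```python
import sys

def in_virtualenv() -> bool:
    # Inside a venv/virtualenv, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix

print(in_virtualenv())
```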
Using virtualenv instead of venv is as simple as replacing the following command:

$ python3 -m venv envs/your_env

with this one:

$ virtualenv envs/your_env

An additional advantage of using virtualenv instead of venv, in that case, is that you can specify the Python interpreter:

$ virtualenv -p python3.8 envs/your_env

The venv command, on the other hand, uses the currently running Python installation, so you need to change it through the following invocation:

$ python3.8 -m venv envs/your_env

Activating a venv/virtualenv

Every time you get back to your project after closing the shell, you need to reactivate the environment. The activation of a virtual environment consists of:

Modifying your PATH environment variable to use envs\your_env\Scripts or envs/your_env/bin for Windows or Linux/Unix, respectively

Modifying your prompt so that instead of $, you see (your_env) $, indicating that you are working in a virtual environment

In the case of poetry, you can use the poetry shell command to create a new shell with the activated environment.

While you can easily modify those manually, an easier method is to run the activate script that was generated when creating the virtual environment.

For Linux/Unix with zsh or bash as the shell, it is:

$ source envs/your_env/bin/activate
(your_env) $

For Windows using cmd.exe, it is:

C:\Users\wolph>envs\your_env\Scripts\activate.bat
(your_env) C:\Users\wolph>

For Windows using PowerShell, it is:

PS C:\Users\wolph> envs\your_env\Scripts\Activate.ps1
(your_env) PS C:\Users\wolph>

By default, the PowerShell permissions might be too restrictive to allow this.
You can change this policy for the current PowerShell session by executing:

Set-ExecutionPolicy Unrestricted -Scope Process

If you wish to permanently change it for every PowerShell session for the current user, execute:

Set-ExecutionPolicy Unrestricted -Scope CurrentUser

Different shells, such as fish and csh, are also supported by using the activate.fish and activate.csh scripts, respectively.

When not using an interactive shell (with a cron job, for example), you can still use the environment by using the Python interpreter in the bin or Scripts directory for Linux/Unix or Windows, respectively. Instead of running python script.py or /usr/bin/python script.py, you can use:

/home/wolph/envs/your_env/bin/python script.py

Note that commands installed through pip (and pip itself) can be run in a similar fashion:

/home/wolph/envs/your_env/bin/pip

Installing packages

Installing packages within your virtual environment can be done using pip as normal:

$ pip3 install <package>

The great advantage comes when looking at the list of installed packages:

$ pip3 freeze

Because our environment is isolated from the system, we only see the packages and dependencies that we have explicitly installed.

Fully isolating the virtual environment from the system Python packages can be a downside in some cases. It takes up more disk space and the package might not be in sync with the C/C++ libraries on the system. The PostgreSQL database server, for example, is often used together with the psycopg2 package. While binaries are available for most platforms and building the package from source is fairly easy, it can sometimes be more convenient to use the package that is bundled with your system. That way, you are certain that the package is compatible with both the installed Python and PostgreSQL versions.
To mix your virtual environment with system packages, you can use the --system-site-packages flag when creating the environment:

$ python3 -m venv --system-site-packages envs/your_env

When enabling this flag, the environment will have the system Python environment sys.path appended to your virtual environment's sys.path, effectively providing the system packages as a fallback when an import from the virtual environment fails. Explicitly installing or updating a package within your virtual environment will effectively hide the system package from within your virtual environment. Uninstalling the package from your virtual environment will make it reappear.

As you might suspect, this also affects the results of pip freeze. Luckily, pip freeze can be told to only list the packages local to the virtual environment, which excludes the system packages:

$ pip3 freeze --local

Later in this chapter, we will discuss pipenv, which transparently handles the creation of the virtual environment for you.

Using pyenv

The pyenv library makes it really easy to quickly install and switch between multiple Python versions. A common issue with many Linux and Unix systems is that the package managers opt for stability over recency. In most cases, this is definitely an advantage, but if you are running a project that requires the latest and greatest Python version, or a really old version, it requires you to compile and install it manually. The pyenv package makes this process really easy for you but does still require the compiler to be installed.

A nice addition to pyenv for testing purposes is the tox library. This library allows you to run your tests on a whole list of Python versions simultaneously. The usage of tox is covered in Chapter 10, Testing and Logging – Preparing for Bugs.

To install pyenv, I recommend visiting the pyenv project page, since it depends highly on your operating system and operating system version.
For Linux/Unix, you can use the regular pyenv installation manual or the pyenv-installer (https://github.com/pyenv/pyenv-installer) one-liner, if you deem it safe enough:

$ curl https://pyenv.run | bash

Make sure that you follow the instructions given by the installer. To ensure pyenv works properly, you will need to modify your .zshrc or .bashrc.

Windows does not support pyenv natively (outside of Windows Subsystem for Linux) but has a pyenv fork available: https://github.com/pyenv-win/pyenv-win#installation

After installing pyenv, you can view the list of supported Python versions using:

$ pyenv install --list

The list is rather long, but can be shortened with grep on Linux/Unix:

$ pyenv install --list | grep 3.10
  3.10.0
  3.10-dev
...

Once you've found the version you like, you can install it through the install command:

$ pyenv install 3.10-dev
Cloning https://github.com/python/cpython...
Installing Python-3.10-dev...
Installed Python-3.10-dev to /home/wolph/.pyenv/versions/3.10-dev

The pyenv install command takes an optional --debug parameter, which builds a debug version of Python that makes debugging C/C++ extensions possible using a debugger such as gdb.

Once the Python version has been built, you can activate it globally, but you can also use the pyenv-virtualenv plugin (https://github.com/pyenv/pyenv-virtualenv) to create a virtualenv for your newly created Python environment:

$ pyenv virtualenv 3.10-dev your_pyenv

As you can see in the preceding example, as opposed to the venv and virtualenv commands, pyenv virtualenv automatically creates the environment in the ~/.pyenv/versions/<version>/envs/ directory, so you're not allowed to fully specify your own path. You can change the base path (~/.pyenv/) through the PYENV_ROOT environment variable, however.
Activating the environment using the activate script in the environment directory is still possible, but more complicated than it needs to be, since there's an easy shortcut:

$ pyenv activate your_pyenv

Now that the environment is activated, you can run environment-specific commands, such as pip, and they will only modify your environment.

Using Anaconda

Anaconda is a distribution that supports both the Python and R programming languages. It is much more than simply a virtual environment manager, though; it's a whole different Python distribution with its own virtual environment system and even a completely different package system. In addition to supporting PyPI, it also supports conda-forge, which features a very impressive number of packages focused on scientific computing.

For the end user, the most important difference is that packages are installed through the conda command instead of pip. This brings a much more advanced dependency check when installing packages. Whereas pip will simply install a package and all of its dependencies without regard for other installed packages, conda will look at all of the installed packages and make sure it won't install a version that is not supported by the installed packages.

The conda package manager is not alone in smart dependency checking. The pipenv package manager (discussed later in this chapter) does something similar.

Getting started with Anaconda Navigator

Installing Anaconda is quite easy on all common platforms. For Windows, OS X, and Linux, you can go to the Anaconda site and download the (graphical) installer: https://www.anaconda.com/products/distribution#Downloads

Once it's installed, the easiest way to continue is by launching Anaconda Navigator, which should look something like this:

Figure 1.1: Anaconda Navigator – Home

Creating an environment and installing packages is pretty straightforward as well:

1. Click on the Environments button on the left.
2. Click on the Create button below.
3. Enter a name and the Python version.
4. Click on Create to create your environment and wait a bit until Anaconda is done:

Figure 1.2: Anaconda Navigator – Creating an environment

Once Anaconda has finished creating your environment, you should see a list of installed packages. Installing packages can be done by changing the filter of the package list from Installed to All, marking the checkbox near the packages you want to install, and applying the changes.

While creating an environment, Anaconda Navigator shows you where the environment will be created.

Getting started with conda

While Anaconda Navigator is a really nice tool to use to get an overview, being able to run your code from the command line can be convenient too. With the conda command, that is luckily very easy.

First, you need to open the conda shell. You can do this from Anaconda Navigator if you wish, but you can also run it straightaway. On Windows, you can open Anaconda Prompt or Anaconda PowerShell Prompt from the start menu. On Linux and OS X, the most convenient method is to initialize the shell integration. For zsh, you can use:

$ conda init zsh

For other shells, the process is similar. Note that this process modifies your shell configuration to automatically activate the base environment every time you open a shell.
This can be disabled with a simple configuration option:

$ conda config --set auto_activate_base false

If automatic activation is not enabled, you will need to run the activate command to get back into the conda base environment:

$ conda activate
(base) $

If, instead of the conda base environment, you wish to activate the environment you created earlier, you need to specify the name:

$ conda activate conda_env
(conda_env) $

If you have not created the environment yet, you can do so using the command line as well:

$ conda create --name conda_env
Collecting package metadata (current_repodata.json): done
Solving environment: done
...
Proceed ([y]/n)? y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
...

To list the available environments, you can use the conda info command:

$ conda info --envs
# conda environments
#
base          *  /usr/local/anaconda3
conda_env        /usr/local/anaconda3/envs/conda_env

Installing conda packages

Now it's time to install a package. For conda packages, you can simply use the conda install command. For example, to install the progressbar2 package that I maintain, use:

(conda_env) $ conda install progressbar2
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /usr/local/anaconda3/envs/conda_env

  added / updated specs:
    - progressbar2

The following packages will be downloaded:
...
The following NEW packages will be INSTALLED:
...
Proceed ([y]/n)? y

Downloading and Extracting Packages
...

Now you can run Python and see that the package has been installed and is working properly:

(conda_env) $ python
Python 3.8.0 (default, Nov  6 2019, 15:49:01)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import progressbar
>>> for _ in progressbar.progressbar(range(5)):
...     pass
...
100% (5 of 5) |##############################| Elapsed Time: 0:00:00 Time: 0:00:00

Another way to verify whether the package has been installed is by running the conda list command, which lists the installed packages similarly to pip list:

(conda_env) $ conda list
# packages in environment at /usr/local/anaconda3/envs/conda_env:
#
# Name    Version    Build    Channel
...

Installing PyPI packages

With PyPI packages, we have two options within the Anaconda distribution. The most obvious is using pip, but this has the downside of partially circumventing the conda dependency checker. While conda install will take the packages installed through PyPI into consideration, the pip command might upgrade packages undesirably. This behavior can be improved by enabling the conda/pip interoperability setting, but this seriously impacts the performance of conda commands:

$ conda config --set pip_interop_enabled True

Depending on how important fixed versions or conda performance are for you, you can also opt for converting the package to a conda package:

(conda_env) $ conda skeleton pypi progressbar2
Warning, the following versions were found for progressbar2
...
Use --version to specify a different version.
...
## Package Plan ##
...
The following NEW packages will be INSTALLED:
...
INFO:conda_build.config:--dirty flag and --keep-old-work not specified. Removing build/test folder after successful build/test.

Now that we have a package, we can modify the files if needed, but using the automatically generated files works most of the time. All that is left now is to build and install the package:

(conda_env) $ conda build progressbar2
...
(conda_env) $ conda install --use-local progressbar2
Collecting package metadata (current_repodata.json): done
Solving environment: done
...

And now we are done! The package has been installed through conda instead of pip.
Sharing your environment

When collaborating with others, it is essential to have environments that are as similar as possible to avoid debugging local issues. With pip, we can simply create a requirements file by using pip freeze, but that will not include the conda packages. With conda, there's actually an even better solution, which stores not only the dependencies and versions but also the installation channels, environment name, and environment location:

(conda_env) $ conda env export --file environment.yml
(conda_env) $ cat environment.yml
name: conda_env
channels:
  - defaults
dependencies:
...
prefix: /usr/local/anaconda3/envs/conda_env

Installing the packages from that environment file can be done while creating the environment:

$ conda env create --name conda_env --file environment.yml

Or they can be added to an existing environment:

(conda_env) $ conda env update --file environment.yml
Collecting package metadata (repodata.json): done
...

Managing dependencies

The simplest way of managing dependencies is storing them in a requirements.txt file. In its simplest form, this is a list of package names and nothing else. This file can be extended with version requirements and can even support environment-specific installations.

A fancier method of installing and managing your dependencies is by using a tool such as poetry or pipenv. Internally, these use the regular pip installation method, but they build a full dependency graph of all the packages. This makes sure that all package versions are compatible with each other and allows the parallel installation of non-dependent packages.

Using pip and a requirements.txt file

The requirements.txt format allows you to list all of the dependencies of your project as broadly or as specifically as you feel is necessary. You can easily create this file yourself, but you can also tell pip to generate it for you, or even to generate a new file based on a previous requirements.txt file so you can view the changes.
I recommend using pip freeze to generate an initial file and cherry-picking the dependencies (and versions) you want. For example, assuming that we run pip freeze in our virtual environment from before:

(your_env) $ pip3 freeze
pkg-resources==0.0.0

If we store that output in a requirements.txt file, install a package, and look at the difference, we get this result:

(your_env) $ pip3 freeze > requirements.txt
(your_env) $ pip3 install progressbar2
Collecting progressbar2
...
Installing collected packages: six, python-utils, progressbar2
Successfully installed progressbar2-3.47.0 python-utils-2.3.0 six-1.13.0
(your_env) $ pip3 freeze -r requirements.txt
pkg-resources==0.0.0
## The following requirements were added by pip freeze:
progressbar2==3.47.0
python-utils==2.3.0
six==1.13.0

As you can see, the pip freeze command automatically detected the addition of the six, progressbar2, and python-utils packages, and it immediately pinned those versions to the currently installed ones. The lines in the requirements.txt file are understood by pip on the command line as well, so to install a specific version, you can run:

$ pip3 install 'progressbar2==3.47.0'

Version specifiers

Often, pinning a version as strictly as that is not desirable, so let's change the requirements file to only contain what we actually care about:

# We want a progressbar that is at least version 3.47.0 since we've
# tested that. But newer versions are ok as well.
progressbar2>=3.47.0

If someone else wants to install all of the requirements in this file, they can simply tell pip to include that requirements file:

(your_env) $ pip3 install -r requirements.txt
Requirement already satisfied: progressbar2>=3.47.0 in your_env/lib/python3.9/site-packages (from -r requirements.txt (line 1))
Requirement already satisfied: python-utils>=2.3.0 in your_env/lib/python3.9/site-packages (from progressbar2>=3.47.0->-r requirements.txt (line 1))
Requirement already satisfied: six in your_env/lib/python3.9/site-packages (from progressbar2>=3.47.0->-r requirements.txt (line 1))

In this case, pip checks to see whether all packages are installed and will install or update them if needed. The -r requirements.txt option works recursively, allowing you to include multiple requirements files.

Now let's assume we've encountered a bug in the latest version and we wish to skip it. We can assume that only this specific version is affected, so we will only blacklist that version:

# Progressbar 2 version 3.47.0 has a silly bug, but anything beyond
# 3.46.0 still works with our code
progressbar2>=3.46,!=3.47.0

Lastly, we should talk about wildcards. One of the most common scenarios is needing a specific release series but still wanting the latest security updates and bug fixes. There are a few ways to specify these:

# Basic wildcard:
progressbar2 ==3.47.*
# Compatible release:
progressbar2 ~=3.47.1
# The compatible release above is identical to:
progressbar2 >=3.47.1, ==3.47.*

With the compatible release pattern (~=), you can select the newest version within the same release series (3.47.* in this example) that is at least the specified version.
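To make that expansion concrete, here is a rough pure-Python sketch of the compatible release check for simple X.Y.Z versions (real installers implement the full PEP 440 rules, including pre-releases and more, for example via the third-party packaging library; this toy version is for illustration only):

```python
def parse(version: str) -> tuple:
    # Naive parse: only handles plain numeric X.Y.Z versions.
    return tuple(int(part) for part in version.split("."))

def compatible(candidate: str, minimum: str) -> bool:
    # "~=3.47.1" means: >=3.47.1 and ==3.47.* — newer patch
    # releases are fine, but 3.48 and later are not.
    cand, mini = parse(candidate), parse(minimum)
    return cand >= mini and cand[:len(mini) - 1] == mini[:-1]

print(compatible("3.47.2", "3.47.1"))  # → True
print(compatible("3.48.0", "3.47.1"))  # → False
```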
The version identification and dependency specification standard is described thoroughly in PEP 440: https://peps.python.org/pep-0440/

Installing through source control repositories

Now let's say that we're really unlucky and there is no working release of the package yet, but it has been fixed in the develop branch of the Git repository. We can install that either through pip or through a requirements.txt file, like this:

(your_env) $ pip3 install --editable 'git+https://github.com/wolph/python-progressbar@develop#egg=progressbar2'
Obtaining progressbar2 from git+https://github.com/wolph/python-progressbar@develop#egg=progressbar2
  Updating your_env/src/progressbar2 clone (to develop)
Requirement already satisfied: python-utils>=2.3.0 in your_env/lib/python3.9/site-packages (from progressbar2)
Requirement already satisfied: six in your_env/lib/python3.9/site-packages (from progressbar2)
Installing collected packages: progressbar2
  Found existing installation: progressbar2 3.47.0
    Uninstalling progressbar2-3.47.0:
      Successfully uninstalled progressbar2-3.47.0
  Running setup.py develop for progressbar2
Successfully installed progressbar2

You may notice that pip not only installed the package but actually did a git clone to your_env/src/progressbar2. This is an optional step caused by the --editable (short option: -e) flag, which has the additional advantage that every time you re-run the command, the git clone will be updated. It also makes it rather easy to go to that directory, modify the code, and create a pull request with a fix.

In addition to Git, other source control systems such as Bazaar, Mercurial, and Subversion are also supported.

Additional dependencies using extras

Many packages offer optional dependencies for specific use cases. In the case of the progressbar2 library, I have added tests and docs extras that install the dependencies needed to run the tests or build the documentation for the package.
Extras can be specified using square brackets, separated by commas:

# Install the documentation and test extras in addition to the progressbar
progressbar2[docs,tests]
# A popular example is the installation of encryption libraries when
# using the requests library:
requests[security]

Conditional dependencies using environment markers

If your project needs to run on multiple systems, you will most likely encounter dependencies that are not required on all systems. One example of this is libraries that are required on some operating systems but not on others. An example of this is the portalocker package I maintain; on Linux/Unix systems, the locking mechanisms needed are supported out of the box. On Windows, however, they require the pywin32 package to work. The install_requires part of the package (which uses the same syntax as requirements.txt) contains this line:

pywin32!=226; platform_system == "Windows"

This specifies that on Windows, the pywin32 package is required, with version 226 blacklisted due to a bug. In addition to platform_system, there are several more markers, such as python_version and platform_machine (which contains the architecture, x86_64 for example). The full list of markers can be found in PEP 508: https://peps.python.org/pep-0508/.

One other useful example of this is the dataclasses library. This library has been included with Python since version 3.7, so we only need to install the backport for older Python versions:

dataclasses; python_version < '3.7'

Automatic project management using poetry

The poetry tool provides a really easy-to-use solution for creating, updating, and sharing your Python projects. It's also very fast, which makes it a fantastic starting point for a project.

Creating a new poetry project

Starting a new project is very easy, as poetry automatically handles virtual environments, dependencies, and other project-related tasks for you.
To start, we will use the poetry init wizard:

$ poetry init

This command will guide you through creating your pyproject.toml config.

Package name [t_00_poetry]:
Version [0.1.0]:
Description []:
Author [Rick van Hattem, n to skip]:
License []:
Compatible Python versions [^3.10]:

Would you like to define your main dependencies interactively? (yes/no) [yes] no
Would you like to define your development dependencies interactively? (yes/no) [yes] no
...
Do you confirm generation? (yes/no) [yes]

Following these few questions, it automatically creates a pyproject.toml file for us that contains all the data we entered and some automatically generated data. As you may have noticed, it automatically prefilled several values for us:

The project name. This is based on the current directory name.

The version. This is fixed to 0.1.0.

The author field. This looks at your git user information, which can be set using:

$ git config --global user.name "Rick van Hattem"
$ git config --global user.email "[email protected]"

The Python version. This is based on the Python version you are running poetry with, but it can be customized using poetry init --python=...

Looking at the generated pyproject.toml, we can see the following:

[tool.poetry]
name = "t_00_poetry"
version = "0.1.0"
description = ""
authors = ["Rick van Hattem"]

[tool.poetry.dependencies]
python = "^3.10"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

Adding dependencies

Once we have the project up and running, we can now add dependencies:

$ poetry add progressbar2
Using version ^3.55.0 for progressbar2
...
Writing lock file
...
Installing progressbar2 (3.55.0)

This automatically installs the package, adds it to the pyproject.toml file, and adds the specific version to the poetry.lock file.
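Since pyproject.toml is plain TOML, you can also inspect it from Python. A short sketch (not from the book's examples) using the standard library's tomllib, available since Python 3.11; on older versions, the third-party tomli package offers the same API:

```python
try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

# A pyproject.toml fragment like the one poetry generates.
pyproject = """
[tool.poetry]
name = "t_00_poetry"
version = "0.1.0"

[tool.poetry.dependencies]
python = "^3.10"
progressbar2 = "^3.55.0"
"""

data = tomllib.loads(pyproject)
print(data["tool"]["poetry"]["dependencies"]["progressbar2"])  # → ^3.55.0
```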
After this command, the pyproject.toml file has a new line added to the tool.poetry.dependencies section: [tool.poetry.dependencies] python = "^3.10" progressbar2 = "^3.55.0" The poetry.lock file is a bit more specific. Whereas the progressbar2 dependency could have a wild- card version, the poetry.lock file stores the exact version, the file hashes, and all the dependencies that were installed: [[package]] name = "progressbar2" version = "3.55.0"... [package.dependencies] python-utils = ">=2.3.0"... [package.extras] docs = ["sphinx (>=1.7.4)"]... Chapter 1 19 [metadata] lock-version = "1.1" python-versions = "^3.10" content-hash = "c4235fba0428ce7877f5a94075e19731e5d45caa73ff2e0345e5dd269332bff0" [metadata.files] progressbar2 = [ {file = "progressbar2-3.55.0-py2.py3-none-any.whl", hash = "sha256:..."}, {file = "progressbar2-3.55.0.tar.gz", hash = "sha256:..."}, ]... By having all this data, we can build or rebuild a virtual environment for a poetry-based project on another system exactly as it was created on the original system. To install, upgrade, and/or downgrade the packages exactly as specified in the poetry.lock file, we need a single command: $ poetry install Installing dependencies from lock file... This is very similar to how the npm and yarn commands work if you are familiar with those. Upgrading dependencies In the previous examples, we simply added a dependency without specifying an explicit version. Of- ten this is a safe approach, as the default version requirement will allow for any version within that major version. If the project uses normal Python versioning or semantic versioning (more about that in Chapter 18, Packaging - Creating Your Own Libraries or Applications), that should be perfect. At the very least, all of my projects (such as progressbar2) are generally both backward and largely forward compatible, so simply fixing the major version is enough. 
In this case, poetry defaulted to version ^3.55.0, which means that any version newer than or equal to 3.55.0, up to (but not including) 4.0.0, is valid.

Due to the poetry.lock file, however, a poetry install will result in those exact versions being installed instead of the newer versions. So how can we upgrade the dependencies? For this purpose, we will start by installing an older version of the progressbar2 library:

$ poetry add 'progressbar2==3.1.0'

Now we will relax the version in the pyproject.toml file to ^3.1.0:

[tool.poetry.dependencies]
progressbar2 = "^3.1.0"

Once we have done this, a poetry install will still keep the 3.1.0 version, but we can make poetry update the dependencies for us:

$ poetry update
...
Updating progressbar2 (3.1.0 -> 3.55.0)

Now, poetry has nicely updated the dependencies in our project while still adhering to the requirements we set in the pyproject.toml file. If you set the version requirements of all packages to *, it will always update everything to the latest available versions that are compatible with each other.

Running commands

To run a single command using the poetry environment, you can use poetry run:

$ poetry run pip

For an entire development session, however, I would suggest using the shell command:

$ poetry shell

After this, you can run all Python commands as normal, but they will now be running from the activated virtual environment.

For cron jobs this is similar, but you will need to make sure that you change directories first:

0 3 * * * cd /home/wolph/workspace/poetry_project/ && poetry run python script.py

This command runs every day at 03:00 (24-hour clock, so A.M.). Note that cron might not be able to find the poetry command due to having a different environment.
In that case, I would recommend using the absolute path to the poetry command, which can be found using which:

$ which poetry
/usr/local/bin/poetry

Automatic dependency tracking using pipenv

For large projects, your dependencies can change often, which makes the manual manipulation of the requirements.txt file rather tedious. Additionally, having to create a virtual environment before you can install your packages is also a pretty repetitive task if you work on many projects. The pipenv tool aims to transparently solve these issues for you, while also making sure that all of your dependencies are compatible and updated. And as a final bonus, it combines the strict and loose dependency versions, so you can make sure your production environment uses the exact same versions you tested.

Initial usage is simple; go to your project directory and install a package. Let's give it a try:

$ pipenv install progressbar2
Creating a virtualenv for this project...
...
Using /usr/local/bin/python3 (3.10.4) to create virtualenv...
...
✔ Successfully created virtual environment!
...
Creating a Pipfile for this project...
Installing progressbar2...
Adding progressbar2 to Pipfile's [packages]...
✔ Installation Succeeded
Pipfile.lock not found, creating...
...
✔ Success! Updated Pipfile.lock (996b11)!
Installing dependencies from Pipfile.lock (996b11)...
🐍 ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 0/0 — 00:00:0

That's quite a bit of output, even when abbreviated. But let's look at what happened:

- A virtual environment was created.
- A Pipfile was created, which contains the dependency as you specified it. If you specify a specific version, that will be added to the Pipfile; otherwise, it will be a wildcard requirement, meaning that any version will be accepted as long as there are no conflicts with other packages.
- A Pipfile.lock was created containing the exact list of packages and versions as installed.
This allows an identical install on a different machine with the exact same versions.

The generated Pipfile contains the following:

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]

[packages]
progressbar2 = "*"

[requires]
python_version = "3.10"

And the Pipfile.lock is a bit larger, but immediately shows another advantage of this method:

{
    ...
    "default": {
        "progressbar2": {
            "hashes": [
                "sha256:14d3165a1781d053...",
                "sha256:2562ba3e554433f0..."
            ],
            "index": "pypi",
            "version": "==4.0.0"
        },
        "python-utils": {
            "hashes": [
                "sha256:4dace6420c5f50d6...",
                "sha256:93d9cdc8b8580669..."
            ],
            "markers": "python_version >= '3.7'",
            "version": "==3.1.0"
        },
        ...
    },
    "develop": {}
}

As you can see, in addition to the exact package versions, the Pipfile.lock contains the hashes of the packages as well. In this case, the package provides both a .tar.gz (source) and a .whl (wheel) file, which is why there are two hashes. Additionally, the Pipfile.lock contains all packages installed by pipenv, including all dependencies. Using these hashes, you can be certain that during a deployment, you will receive the exact same file and not some corrupt or even malicious file.

Because the versions are completely fixed, you can also be certain that anyone deploying your project using the Pipfile.lock will get the exact same package versions. This is very useful when working together with other developers.

To install all the necessary packages as specified in the Pipfile (even for the initial install), you can simply run:

$ pipenv install
Installing dependencies from Pipfile.lock (5c99e1)...
🐍 ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 3/3 — 00:00:00

To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.

Any time you run pipenv install package, the Pipfile will be automatically modified with your changes and checked for incompatible packages.
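The hash checking mentioned above is straightforward to sketch with the standard library. This is a simplified illustration of the idea, not pipenv's actual implementation; the 'sha256:<hexdigest>' notation matches the lock file, but the helper and the payload are made up:

```python
import hashlib


def verify_artifact(data, lock_hash):
    # lock_hash uses the Pipfile.lock notation: 'sha256:<hexdigest>'
    algorithm, _, expected = lock_hash.partition(':')
    # Recompute the digest of the downloaded bytes and compare
    return hashlib.new(algorithm, data).hexdigest() == expected


# A made-up package payload and its matching lock entry
payload = b'fake wheel contents'
lock_hash = 'sha256:' + hashlib.sha256(payload).hexdigest()

print(verify_artifact(payload, lock_hash))               # True
print(verify_artifact(b'tampered contents', lock_hash))  # False
```

Because the expected digest is pinned in the lock file, a mirror or man-in-the-middle serving a different file would fail this comparison, which is exactly the guarantee described above.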
The big downside is that pipenv can become terribly slow for large projects. I have encountered multiple projects where a no-op pipenv install would take several minutes due to the fetching and checking of the entire dependency graph. In most cases it's still worth it, however; the added functionality can save you a lot of headaches.

Don't forget to run your regular Python commands with the pipenv run prefix or from pipenv shell.

Updating your packages

Because of the dependency graph, you can easily update your packages without having to worry about dependency conflicts. With one command, you're done:

$ pipenv update

Should you still encounter issues with the versions because some packages haven't been checked against each other, you can fix that by specifying the versions of the package you do or do not want:

$ pipenv install 'progressbar2!=3.47.0'
Installing progressbar2!=3.47.0...
Adding progressbar2 to Pipfile's [packages]...
✔ Installation Succeeded
Pipfile.lock (c9327e) out of date, updating to (5c99e1)...
✔ Success! Updated Pipfile.lock (c9327e)!
Installing dependencies from Pipfile.lock (c9327e)...
🐍 ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 3/3 — 00:00:00

By running that command, the packages section of the Pipfile changes to:

[packages]
progressbar2 = "!=3.47.0"

Deploying to production

Getting the exact same versions on all of your production servers is absolutely essential to prevent hard-to-trace bugs. For this very purpose, you can tell pipenv to install everything as specified in the Pipfile.lock file while still checking whether Pipfile.lock is out of date. With one command, you have a fully functioning production virtual environment with all packages installed.

Let's create a new directory and see if it all works out:

$ mkdir ../pipenv_production
$ cp Pipfile Pipfile.lock ../pipenv_production/
$ cd ../pipenv_production/
$ pipenv install --deploy
Creating a virtualenv for this project...
Pipfile: /home/wolph/workspace/pipenv_production/Pipfile
Using /usr/bin/python3 (3.10.4) to create virtualenv...
...
✔ Successfully created virtual environment!
...
Installing dependencies from Pipfile.lock (996b11)...
🐍 ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 2/2 — 00:00:01

$ pipenv shell
Launching subshell in virtual environment...
(pipenv_production) $ pip3 freeze
progressbar2==4.0.0
python-utils==3.1.0

All of the versions are exactly as expected and ready for use.

Running cron commands

To run your Python commands outside of the pipenv shell, you can use the pipenv run prefix. Instead of python, you would run pipenv run python. In normal usage, this is a lot less practical than activating the pipenv shell, but for non-interactive sessions, such as cron jobs, this is an essential feature. For example, a cron job that runs at 03:00 (24-hour clock, so A.M.) every day would look something like this:

0 3 * * * cd /home/wolph/workspace/pipenv_project/ && pipenv run python script.py

Exercises

Many of the topics discussed in this chapter already gave full examples, leaving little room for exercises. There are additional resources to discover, however.

Reading the Python Enhancement Proposals (PEPs)

A good way to learn more about the topics discussed in this chapter (and all the following chapters) is to read the PEP pages. These proposals were written before the changes were accepted into the Python core. Note that not all of the PEPs on the Python site have been accepted, but they will remain on the Python site:

- PEP 440 – Version Identification and Dependency Specification: https://peps.python.org/pep-0440/
- PEP 496 – Environment Markers: https://peps.python.org/pep-0496/

Combining pyenv and poetry or pipenv

Even though the chapter did not cover it, there is nothing stopping you from telling poetry or pipenv to use a pyenv-based Python interpreter. Give it a try!
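As a companion to the PEP reading above: environment markers such as platform_system and python_version correspond to values Python can report about itself. The mapping below is a rough sketch of that correspondence only; the real marker grammar and evaluation live in the PEPs and in pip, not in code like this:

```python
import platform
import sys

# How a few common environment markers map to runtime values
# (simplified; see the PEPs for the full marker grammar)
MARKERS = {
    'platform_system': platform.system(),     # e.g. 'Linux' or 'Windows'
    'platform_machine': platform.machine(),   # e.g. 'x86_64'
    'python_version': '{}.{}'.format(*sys.version_info[:2]),
}

# Equivalent of: pywin32!=226; platform_system == "Windows"
needs_pywin32 = MARKERS['platform_system'] == 'Windows'

# Equivalent of: dataclasses; python_version < '3.7'
# (compare as a tuple; comparing '3.10' < '3.7' as strings would be wrong)
needs_dataclasses = sys.version_info[:2] < (3, 7)

print(needs_pywin32, needs_dataclasses)
```

Running this on different systems shows why the markers are evaluated at install time: the same requirements file yields a different set of packages on Windows than on Linux, or on Python 3.6 than on 3.10.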
Converting an existing project to a poetry project

Part of this exercise should be to either create a brand new pyproject.toml or to convert an existing requirements.txt file to a pyproject.toml.

Summary

In this chapter, you learned why virtual environments are useful and you discovered several implementations of them and their advantages. We explored how to create virtual environments and how to install multiple different Python versions. Finally, we covered how to manage the dependencies for your Python projects.

Since Python is an interpreted language, it is easily possible to run code from the interpreter directly instead of through a Python file. The default Python interpreter already features command history and, depending on your install, basic autocompletion. But with alternative interpreters we can have many more features, such as syntax highlighting and smart autocompletion that includes documentation. The next chapter will show us several alternative interpreters and their advantages.

Join our community on Discord

Join our community's Discord space for discussions with the author and other readers:
https://discord.gg/QMzJenHuJf

2
Interactive Python Interpreters

Now that we have a working Python installation, we need to run some code. The most obvious way is to create a Python file and execute it. Often, however, it can be faster to develop code interactively from an interactive Python interpreter. While the standard Python interpreter is already quite powerful, many enhancements and alternatives are available.
The alternative interpreters/shells offer features such as:

- Smart autocompletion
- Syntax highlighting
- Saving and loading sessions
- Automatic indenting
- Graphing/charting output

In this chapter, you will learn about:

- Alternative interpreters:
  - bpython
  - ptpython
  - ipython
  - jupyter
- How to enhance interpreters

The Python interpreter

The standard Python interpreter is already fairly powerful, but more options are available through customization. First, let's start with a 'Hello world!'. Because the interpreter is a REPL, all output will be automatically printed and we can simply create a string.

Sometimes interactive interpreters are referred to as REPLs. This stands for Read-Eval-Print-Loop, which effectively means that all of your statements will be executed and printed to your screen immediately.

First, we need to start the interpreter; after that, we can type our commands:

$ python3
Python 3.9.0
[GCC 7.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 'Hello world!'
'Hello world!'

That was easy enough. And note that we didn't have to use print('Hello world!') to show the output.

Many interpreters have only limited support for Windows. While they all work to some degree, your experience will be better with Linux or OS X systems. I recommend trying them from a (virtual) Linux/Unix machine at least once to experience the full range of features.

Modifying the interpreter

As our first enhancement, we will add a few convenient shortcuts to the scope of the interpreter. Instead of having to type import pprint; pprint.pprint(...) to pretty-print our output, it would be useful to be able to use pp(...) without having to run an import statement every time we start our interpreter. To do this, we will create a Python file that will be executed every time we run Python.
On Linux and OS X systems, I would recommend ~/.config/python/init.py; on Windows, something like C:\Users\rick\AppData\Local\Python\init.py might be more suitable. Within this file, we can add regular Python code that will be executed. Python won't find the file automatically; you need to tell Python where to look for it by using the PYTHONSTARTUP environment variable. On Linux and OS X, you can change the ~/.zshrc or ~/.bashrc file, or whatever your shell uses, and add:

$ export PYTHONSTARTUP=~/.config/python/init.py

This file is automatically executed every time you open a new shell session, so once you open a new shell session, you are done. If you want to activate this for your current shell, you can also run the export line above in your current shell. On Windows, you need to find the Advanced System Settings and change the environment variables on that screen.

Now we can add these lines to the file to make pretty print (pprint/pp) and pretty format (pformat/pf) available by default:

from pprint import pprint as pp
from pprint import pformat as pf

When we run the Python interpreter, we will now have pp and pf available in our scope:

>>> pp(dict(spam=0xA, eggs=0xB))
{'eggs': 11, 'spam': 10}
>>> pf(dict(spam=0xA, eggs=0xB))
"{'eggs': 11, 'spam': 10}"

With a few of these minor changes, you can make your life a lot easier. You could modify your sys.path to include a directory with custom libraries, for example. And you can also change your prompt using the sys.ps1 and sys.ps2 variables. To illustrate, we'll look at the interpreter before our changes:

# Modifying prompt
>>> if True:
...     print('Hello!')
...
Hello!

And now we will modify sys.ps1 and sys.ps2 and run the exact same code again:

>>> import sys
>>> sys.ps1 = '> '
>>> sys.ps2 = '. '

# With modified prompt
> if True:
.     print('Hello!')
.
Hello!

The configuration above shows that you can easily change the interpreter to a slightly different output if you wish.
For consistency purposes, however, it might be better to keep it the same.

Enabling and enhancing autocompletion

One of the most useful additions to the interpreter is the rlcompleter module. This module enables tab-activated autocompletion in your interpreter and is automatically activated if the readline module is available.

The rlcompleter module depends on the availability of the readline module, which is not bundled with Python on Windows systems. Luckily, an alternative can be installed easily:

$ pip3 install pyreadline

It would be very useful to add some extra options to the autocompletion. First, look at the default output:

>>> sandwich = dict(spam=2, eggs=1, sausage=1)
>>> sandwich.
sandwich.clear(      sandwich.fromkeys(   sandwich.items(      sandwich.pop(        sandwich.setdefault( sandwich.values(
sandwich.copy(       sandwich.get(        sandwich.keys(       sandwich.popitem(    sandwich.update(
>>> sandwich[

As you can see, the tab completion for "." works perfectly, but the tab completion for "[" does nothing. It would be useful to know the available items, so now we will work on adding that feature. It should be noted that this example uses a few techniques that are explained in later chapters, but that shouldn't matter for now:

import __main__
import re
import atexit
import readline
import rlcompleter


class Completer(rlcompleter.Completer):
    ITEM_RE = re.compile(r'(?P<expression>.+?)\[(?P<key>[^\[]*)')

    def complete(self, text, state):
        # Init namespace. From 'rlcompleter.Completer.complete'
        if self.use_main_ns:
            self.namespace = __main__.__dict__

        # If we find a [, try and return the keys
        if '[' in text:
            # At state 0 we need to prefetch the matches, after
            # that we use the cached results
            if state == 0:
                self.matches = list(self.item_matches(text))

            # Try and return the match if it exists
            try:
                return self.matches[state]
            except IndexError:
                pass
        else:
            # Fall back to the normal completion
            return super().complete(text, state)

    def item_matches(self, text):
        # Look for the pattern expression[key
        match = self.ITEM_RE.match(text)
        if match:
            search_key = match.group('key').lstrip()
            expression = match.group('expression')

            # Strip quotes from the key
            if search_key and search_key[0] in {"'", '"'}:
                search_key = search_key.strip(search_key[0])

            # Fetch the object from the namespace
            object_ = eval(expression, self.namespace)

            # Duck typing, check if we have a 'keys()' attribute
            if hasattr(object_, 'keys'):
                # Fetch the keys by executing the 'keys()' method
                # Can you guess where the bug is?
                keys = object_.keys()
                for i, key in enumerate(keys):
                    # Limit to 25 items for safety, could be infinite
                    if i >= 25:
                        break

                    # Only return matching results
                    if key.startswith(search_key):
                        yield f'{expression}[{key!r}]'


# By default readline doesn't call the autocompleter for [ because
# it's considered a delimiter. With a little bit of work we can
# fix this however :)
delims = readline.get_completer_delims()
# Remove [, ' and "
