Full Transcript


MACHINE LEARNING TOOLS AND TECHNIQUES
MOHSEN GHODRAT

Introduction

Mohsen Ghodrat

Teaching Experience

Work Experience
▪ Senior Data Scientist @Servus Credit Union
▪ Senior Machine Learning Engineer @Silvacom
▪ Machine Learning Engineer @Serious Labs

Education
▪ Postdoctoral Researcher at University of Alberta
▪ PhD, Electrical and Computer Engineering, University of Alberta
▪ MSc, BSc Mechanical Engineering, Shiraz University

Course Description

Students explore the world and models of machine learning and how to use best practices with data to help the learning algorithm find patterns to map the target attributes. Students consider different patterns in outputs to discover if the machine learning model can predict new data sets of potential new targets.

Student Performance Assessment

Assignment | Description | %
Class participation + activities | Students to contribute to group discussions, actively participate in the class, and IBM Badge. | 10
Assignment #1 | Students will receive a dataset to make data modeling using classification or clustering algorithms | 10
Assignment #2 | Students will receive a dataset to make data modeling using artificial neural network | 10
Assignment #3 | Students will apply and evaluate the concepts related model evaluation, early model stop, and model simulation. | 10
Quizzes | 3 Quizzes x 5% | 15
Mid-Term Exam | Coverage: Material Weeks 1-7 | 25
Project + Presentation | Instructor will assign a case that students should analyze, develop AI solution, and create a comprehensive Business Report | 20

Course Syllabus

Textbook: Mathematics for Machine Learning, Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong · Cambridge University Press, 2020

▪ Machine Learning Fundamentals
▪ Supervised Machine Learning
▪ Unsupervised Machine Learning
▪ Neural Networks
▪ Deep Learning
▪ Convolutional Neural Networks
▪ Performance evaluation of Machine Learning Models
▪ Cloud Computing

Academic Writing

▪ Please review this link for examples of academic misconduct (for example, plagiarism)
  o Similarity score!
▪ Make sure you follow the rules of the Publication Manual of the American Psychological Association (APA)
▪ Example: APA requirement for citation (link):
  o Any material that is not your own words or thoughts has to be cited
  o You must provide citation whenever using someone else’s exact words, quotations or phrases; and any images, figures, diagrams or charts even if you are presenting them in your own words
  o You do not need to provide a citation for your own opinion, interpretation or analysis, or for things considered to be common knowledge, like the fact that water freezes at 0 degrees Celsius.
▪ You can learn more about APA Style from UCW’s Library, which has created a variety of videos, handouts, presentations and other resources to help familiarize students with APA Style guidelines.

Google Colab Initial Setup

Step 1: Download the attached files from Week 1 > Python Materials:
▪ Google Colab - Basics (pdf document)
▪ data (csv file)
▪ Basic - Data Upload (python notebook file)
▪ Basic Data Upload (pdf document)
▪ Instruction for Initial Setup (pdf document)

Step 2: Go to Google Colab: https://colab.research.google.com/ Sign in using your Google account and upload Basic - Data Upload.ipynb.
Note: Normally the upload window pops up automatically the first time you go to the Google Colab site. If it does not, go to "File" and choose "Upload notebook" to open it.
Step 3: Go to your Google Drive. There should be a new folder called "Colab Notebooks" (yellow color). Make sure you can see all three uploaded files from Step 2 in this folder.
Important Note: From now on, whenever you want to upload a Python file, you can upload it directly to this folder (in your Google Drive); there is NO NEED to go to the Google Colab site.

Step 4: Create a folder called "BUSI 651" inside your "Colab Notebooks" folder and move the file there.

Step 5: Create another folder called "files" inside your "Colab Notebooks" folder to store the data.csv file. In this folder, upload the dataset file that you downloaded from the course shell: data.csv.
Note 1: Make sure that these files are in “csv” format and not “xlsx”. If this is not the case, open them in Excel and save them in “csv” format.
Note 2: Once you download the dataset, open the files in Excel to make sure they are NOT read-only. If you see a warning saying the files are read-only, save them from Excel using "Save As" (choose a name for saving). This saves the files in a way that they are not read-only. Now upload these new files to your "Colab Notebooks" and save them under the "files" folder.

At the end, the content of the "Colab Notebooks" folder should look like:

Google Colab for Machine Learning and Deep Learning

What is Google Colab?

Google Colaboratory is a free online cloud-based Jupyter notebook environment that allows us to train our machine learning and deep learning models on CPUs, GPUs, and TPUs. It does not matter which computer you have, what its configuration is, or how ancient it might be. You can still use Google Colab! All you need is a Google account and a web browser. And here’s the cherry on top: you get access to GPUs like Tesla K80 and even a TPU, for free!

What is a Notebook in Google Colab?

In Google Colab, a notebook is a web-based environment for creating and running code.
Notebooks are similar to scripts or code files in other programming environments but offer some unique advantages. Notebooks allow you to write and execute code in a web browser, displaying the output in real time. This makes it easy to iterate on your code and visualize the results as you go. Colab notebooks also support markdown, allowing you to include formatted text, equations, and images alongside your code. You can also add comments and notes to your code, which makes it easier to understand and collaborate with others. Overall, notebooks are a powerful tool for data scientists and machine learning practitioners, providing a flexible and interactive environment for writing and testing code.

Google Colab Features

1. Colab provides users free access to GPUs and TPUs, which can significantly speed up the training and inference of machine learning and deep learning models.
2. Colab’s interface is web-based, so installing any software on your local machine is unnecessary. The interface is also intuitive and user-friendly, making it easy to get started with coding.
3. Colab allows multiple users to work on the same notebook simultaneously, making collaborating with team members easy. Colab also integrates with other Google services, such as Google Drive and GitHub, making it easy to share your work.
4. Colab notebooks support markdown, which allows you to include formatted text, equations, and images alongside your code. This makes it easier to document your work and communicate your ideas.
5. Colab comes pre-installed with many popular libraries and tools for machine learning and deep learning, such as TensorFlow and PyTorch. This saves time and eliminates the need to manually install and configure these tools.

GPUs and TPUs on Google Colab

Ask anyone who uses Colab why they love it. The answer is unanimous: the availability of free GPUs and TPUs. Training models, especially deep learning ones, takes numerous hours on a CPU.
We’ve all faced this issue on our local machines. GPUs and TPUs, on the other hand, can train these models in a matter of minutes or seconds. If you still need a reason to work with GPUs, check out this post: Why are GPUs necessary for training Deep Learning models?

Colab gives you a decent GPU for free, which you can run continuously for 12 hours. For most data science folks, this is sufficient to meet their computation needs. If you are a beginner especially, I would highly recommend you start using Google Colab.

Google Colab gives us three types of runtime for our notebooks:
▪ CPUs
▪ GPUs
▪ TPUs

As mentioned, Colab gives us 12 hours of continuous execution time. After that, the whole virtual machine is cleared and we have to start again. We can run multiple CPU, GPU, and TPU instances simultaneously, but our resources are shared between these instances.

Let’s take a look at the specifications of different runtimes offered by Google Colab:

It will cost you A LOT to buy a GPU or TPU from the market. Why not save that money and use Google Colab from the comfort of your own machine?

How to Use Google Colab?

You can go to Google Colab using this link: https://colab.research.google.com/

This is the screen you’ll get when you open Colab (and sign in):

Click on the NEW NOTEBOOK button to create a new Colab notebook. Upload your local notebook to Colab by clicking the upload button:

You can also import your notebook from Google Drive or GitHub, but they require an authentication process.

You can rename your notebook by clicking on the notebook name and changing it to anything you want. I usually name them according to the project I’m working on.

Google Colab Runtimes: Choosing the GPU or TPU Option

The ability to choose different types of runtimes is what makes Colab so popular and powerful.
Here are the steps to change the runtime of your notebook:

Step 1: Click ‘Runtime’ on the top menu and select ‘Change Runtime Type’:
Step 2: Here you can change the runtime according to your need:

A wise man once said, “With great power comes great responsibility.” I implore you to shut down your notebook after you have completed your work so that others can use these resources, because various users share them. You can terminate your notebook like this:

Using Terminal Commands on Google Colab

You can use a Colab cell for running terminal commands. Most of the popular libraries come installed by default on Google Colab. Yes, Python libraries like Pandas, NumPy, and scikit-learn are all pre-installed. If you want to use a Python library that is not pre-installed, you can always install it inside your Colab notebook like this:

    !pip install library_name

Pretty easy, right? Everything is similar to how it works in a regular terminal. You just have to put an exclamation mark (!) before each command, like:

    !ls

or:

    !pwd

Uploading Files and Datasets

Here’s a must-know aspect for any data scientist. The ability to import your dataset into Colab is the first step in your data analysis journey.

The most basic approach is to upload your dataset to Colab directly:

Use this approach only if your dataset or file is very small, because the upload speed in this method is quite low.

Another approach that I recommend is to upload your dataset to Google Drive and mount your drive on Colab. You can do this in just one click of your mouse:

You can also upload your dataset to any other platform and access it using its link. I tend to go with the second approach more often than not (when feasible).

Saving Your Notebook

All the notebooks on Colab are stored on your Google Drive. The best thing about Colab is that your notebook is automatically saved after a certain time period and you don’t lose your progress.
If you want, you can export and save your notebook in both *.py and *.ipynb formats:

Sharing Your Notebook

Google Colab also gives us an easy way of sharing our work with others. This is one of the best things about Colab:

Just click the Share button, and it gives us the option of creating a shareable link that we can share through any platform. You can also invite others using their email IDs. It’s exactly the same as sharing a Google Doc or Google Sheet. The intricacies and simplicity of Google’s ecosystem are astounding!

Reference: https://www.analyticsvidhya.com/blog/2020/03/google-colab-machine-learning-deep-learning/

Basic Data Upload

How to Deal With Files in Google Colab: Everything You Need to Know

What is Google Colaboratory (Google Colab):

Google Colaboratory is a free Jupyter notebook environment that runs on Google’s cloud servers, letting the user leverage backend hardware like GPUs and TPUs. This lets you do everything you can in a Jupyter notebook hosted on your local machine, without requiring the installations and setup for hosting a notebook on your local machine.

Colab comes with (almost) all the setup you need to start coding, but what it doesn’t have out of the box is your datasets! How do you access your data from within Colab?

Method 1: Using Google Drive

Accessing Google Drive from Google Colab

You can use the drive module from google.colab to mount your entire Google Drive to Colab:

1. Execute the mount code (a call to drive.mount), which provides you with an authentication link.
2. The link redirects you to authenticate and verify the connection between your Google Drive and Colab.
3. Finally, click the “refresh” option visible on top of the directory to see the mounted drive "gdrive".

Once you see "gdrive" in the directory list, the mount is complete. Now you can interact with your Google Drive as if it were a folder in your Colab environment. Any changes to this folder will reflect directly in your Google Drive.
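
The mounting step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the exact cell from the course notebook: the mount point /content/gdrive is an assumption, chosen so that the mounted folder appears as "gdrive" in the directory list, and the helper name mount_drive is hypothetical. The google.colab package exists only inside a Colab runtime, so the helper returns None when it is run anywhere else.

```python
def mount_drive(mount_point="/content/gdrive"):
    """Mount Google Drive in Colab; return the mount point, or None outside Colab."""
    try:
        # google.colab can only be imported inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        return None
    # Starts the authentication flow, then exposes Drive under mount_point.
    drive.mount(mount_point)
    return mount_point


mounted_at = mount_drive()
print("Drive mounted at:", mounted_at)
```

Wrapping the import in a helper keeps the same notebook cell usable on a local machine, where it simply reports that nothing was mounted.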
You can read the files in your Google Drive as any other file.

How to copy the path of the files

From the file-explorer pane (on the left), find your desired file (data.csv) and copy the file path by clicking on the three dots visible when you hover over the file name.

Important Note: The path below points to where data.csv is stored in my Google Drive; yours will be different. Make sure to replace the following path with yours.
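
Reading the file once you have its path might look like the sketch below. The path shown is hypothetical: it assumes a mount point of /content/gdrive, the "MyDrive" folder that Colab shows under the mount point, and the "Colab Notebooks/files" layout from the setup steps, so replace it with the path you copied. The helper name read_csv_rows and the two-row fallback file are also made up for illustration, so that the sketch still runs outside Colab.

```python
import csv
import tempfile
from pathlib import Path


def read_csv_rows(path):
    """Return the rows of a CSV file as a list of dicts keyed by the header row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


# Hypothetical location; replace it with the path copied from the file-explorer pane.
DATA_PATH = Path("/content/gdrive/MyDrive/Colab Notebooks/files/data.csv")

if not DATA_PATH.exists():
    # Outside Colab (or before mounting), fall back to a tiny made-up file.
    DATA_PATH = Path(tempfile.mkdtemp()) / "data.csv"
    DATA_PATH.write_text("id,value\n1,10\n2,20\n")

rows = read_csv_rows(DATA_PATH)
print(len(rows), "rows; columns:", list(rows[0].keys()) if rows else [])
```

In a notebook you would more commonly pass the same path to pandas.read_csv to get a DataFrame; the standard-library csv module is used here only to keep the sketch free of third-party dependencies.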
