Deep Learning Session 2 - transcript-full.pdf

Full Transcript

00:15 - 00:44 Speaker 1: Good evening professor. Good evening. 07:06 - 07:25 Dr Anand Jayaraman: Hello everyone, this is Professor Anand Jayaraman joining the session again. We are going to be doing the deep learning and experience module. Today is session 2. I see 07:30 - 07:30 Speaker 1: a whole lot of people joining. 07:30 - 07:35 Dr Anand Jayaraman: Let's see how many participants we have right now. 2 07:43 - 07:44 Speaker 1: minutes. 07:52 - 07:57 Dr Anand Jayaraman: Let's give people another 5 minutes. Then we can get started. 08:00 - 08:14 Speaker 3: Good evening, Professor. Last week you mentioned that you shared the syllabus with UpGrad and they would send it to us, but they haven't sent it to us till now. So just wanted to let you know that. 08:15 - 08:32 Dr Anand Jayaraman: Okay. I will send out an email. I'll remind them to do that. I'm sharing my screen which has the schedule. 08:35 - 08:38 Speaker 4: So even the slides for last week are not received. 08:40 - 08:45 Dr Anand Jayaraman: Okay. So does it come through in email? How is it? 08:45 - 08:47 Speaker 4: Yeah, normally the body will send it across, yeah. 08:48 - 08:55 Dr Anand Jayaraman: I see, I see. I will make sure that that happens. I'll send a message 08:55 - 08:56 Speaker 4: okay, thanks, yeah 08:56 - 08:58 Dr Anand Jayaraman: to people and make sure that it happens. 09:15 - 09:19 Speaker 3: So, professor, are there 11 sessions, or is there also a 12th after that? 09:22 - 09:50 Dr Anand Jayaraman: So, there are all 12 sessions. I mean, there are 15 sessions altogether. But of the 15, 3 of them are going to be hands-on sessions, right? So there are supposed to be 12 lecture sessions, but actually 11 of them are lecture sessions. The very last session is an end term evaluation module. 09:50 - 10:23 Dr Anand Jayaraman: I will talk a little bit about it in some time. Okay. When we get started. We'll give people a minute and then we can get started. 
10:30 - 10:37 Speaker 5: Hello, Professor Sachin here, Are we going to talk about the as well in this deep learning session? 10:37 - 10:40 Dr Anand Jayaraman: That's right. The very 10:45 - 10:45 Speaker 1: last 10:51 - 10:51 Speaker 4: session 10:57 - 11:00 Speaker 1: is on okay. 11:56 - 12:25 Dr Anand Jayaraman: Okay, let's get started. Okay, let's get started. Welcome back everyone. We, I see that our 31 learners have logged in so wonderful. Just to remind you again, my name is, I'm Professor Anand Jayaraman and I, last week we started our journey into understanding deep learning. 12:26 - 12:56 Dr Anand Jayaraman: In last week's session we gave an introduction to artificial neural networks. So we just introduced artificial perceptron. That's where we were last class. I'll remind you again exactly what we talked about last week. But before we move further into the lesson today, let me here is the overall session plan. 12:57 - 13:29 Dr Anand Jayaraman: We have this is what happened last week. We were introduced neural networks. We'll continue our journey in understanding neural networks today as well. I believe we will have a lot more to say on neural networks tomorrow as well before we move on to deep learning. We will perhaps start deep learning, really addressing deep learning sometime tomorrow, halfway through, I guess. 13:30 - 14:08 Dr Anand Jayaraman: And then the next week, we are going to be talking mostly about different kinds of issues that come up in deep learning. And in all of that, we will be using structured data as a reference point and we will discuss these topics. By session 6, I hope to be able to get to autoencoders and we will, this is again 1 type of deep learning architecture. So we'll talk about that. And then we are going to have a lab session on February 4. 14:09 - 15:04 Dr Anand Jayaraman: February 4, the lab session is going to be implemented using, they're going to show you how to implement neural networks using Microsoft Azure. 
And starting in week 5, we will talk about embeddings, again, a very important idea in neural networks, and it has connections to language models which you will learn later on. And then we will talk about recurrent neural networks and LSTMs. Week 6, we'll have 1 session of lab and a second session where we will start to introduce CNNs or convolutional neural networks. So we'll have at least 2 sessions on CNN. 15:05 - 15:45 Dr Anand Jayaraman: There will be a lab on CNN. And finally, we'll have 1 session on GANs, on generative adversarial networks. The very last session is meant for the end term evaluation module. So the way the course is going to be graded is there are going to be 2 hands-on assignments and 1 final exam, right? The hands-on, the 2 hands-on assignments together will carry 70% of the weight of the whole grade. 15:46 - 16:13 Dr Anand Jayaraman: These will be assigned after you do your hands-on sessions. So there are going to be 2 of them. And the final evaluation is going to be a 30-point, multiple-choice-questions kind of evaluation. So you have a mixture of hands-on work and also an exam. Questions? 16:15 - 16:21 Speaker 4: So is the hands-on going to be sort of a proctored 1, or I mean we do it and send across, right, the assignment? 16:21 - 16:23 Dr Anand Jayaraman: yes that's all yeah yeah 16:23 - 16:26 Speaker 4: okay and the other 1 will be proctored 16:27 - 16:29 Dr Anand Jayaraman: this 1 will be the last final exam yeah 16:29 - 16:30 Speaker 4: the last 1 yeah 16:31 - 16:44 Speaker 3: yeah so professor 1 question, the exam, will it be like we have to log in at the same time? I'm just asking because I have a travel on March 3rd, right, so will we get a chance to do it separately or do we have to log in in the class? 
16:45 - 17:19 Dr Anand Jayaraman: It's usually where you log in and do that but there is we recognize that you know not everyone might be able to log in at that time and so there will be another slot given as well I'll provide you the details of that little bit closer to the time. Once I have, this is again the first time I'm doing it with UpGrad. So I don't understand the exact mechanics of how it's done. 17:19 - 17:20 Speaker 4: So for the last DBA, I 17:20 - 17:22 Dr Anand Jayaraman: do understand there is a way. 17:23 - 17:31 Speaker 4: Yeah we had a last DBA right because I've been doing the earlier 1 so they had a window around 3 days yeah so within that anytime we can log in yeah. 17:33 - 17:45 Speaker 6: Professor Gopal here, for the hands-on session would it be something like a real time use case project? Or is it going to be an assignment? What is it 17:45 - 17:45 Speaker 7: going to be? 17:46 - 18:19 Dr Anand Jayaraman: It's going to be an assignment. We are not going to, clearly, these are going to be no code, no coding is going to be involved. So we will, ultimately, I want you to be able to use some data and actually try to build something using Azure ML Studio. So these hands-on sessions are mostly the sessions that are done live, for example, 1 on February 4. This is mostly a show on television, right? 18:19 - 18:45 Dr Anand Jayaraman: So somebody else is going to be showing and you're just going to be observing it, right? But I really do want you to get your hands dirty. I want you to get the data and actually try building something. You will clearly, you know, that adds a lot. I know that you will never actually be you'll have other people who are going to be working under you who are going to be executing these projects. 
18:45 - 19:08 Dr Anand Jayaraman: But once in a while you need to be familiar with how exactly to do it, right, and the thinking process. When somebody else is showing, it always seems easier; when you start doing it, that's when you'll have to make a lot of these decisions, and sometimes it's not very clear. So it's only when you start doing it that you get confidence. 19:11 - 19:14 Speaker 4: This will be 70 percent right? The 2 assignments. 19:24 - 19:27 Dr Anand Jayaraman: So any other questions before we get started? 19:28 - 19:30 Speaker 4: And the pass rate, how much will it be? 19:35 - 19:37 Dr Anand Jayaraman: You mean what will be the cutoff? 19:37 - 19:38 Speaker 4: Yeah, the pass rate, yeah, the cutoff. 19:38 - 20:22 Dr Anand Jayaraman: Yeah, I will talk to the person who's actually going to be doing the lab, we'll evaluate, we'll figure out how exactly to do that. So clearly, so the way 1 has to evaluate these hands-on sessions is that you will find that 2 people would have followed exactly the same architecture and still get slightly different answers. So, and we'll talk about why and all of that. So the grading is not necessarily going to be based on just a final answer. So we will, the intention is that, is the architecture correct? 20:22 - 20:47 Dr Anand Jayaraman: Are the steps that you're doing correct? And is your final conclusion about what results you're seeing, is that correct? And as long as those are all right, then you get full points for it, right? And the only way you lose points is when you forget some key steps. But I'll definitely provide a lot more details once we start doing. 20:47 - 21:01 Dr Anand Jayaraman: Professor, there is somebody's mic, excuse me, professor, there's somebody's mic, which is not on mute. And it's very disturbing. Can you please tell them all to mute the mics while you're speaking please? Sure. Okay. 21:01 - 21:02 Dr Anand Jayaraman: Yeah. Okay. 
Now, 21:04 - 21:15 Speaker 6: so would there be an option to explain our intent behind the architecture or whatever decision we take in during this to, you know, during this assignments? 21:15 - 21:30 Dr Anand Jayaraman: Yes, you will have to submit a document along with the assignments. Wonderful. Thank you. That's so that's basically the thing, right? Honestly, the assignment itself is not going to be challenging. 21:32 - 22:06 Dr Anand Jayaraman: Mostly because, you know, these platforms have done an excellent job of guiding you through the process, right? So that's, you know, trying to, it's not like you're writing code, right? When you're writing code, there's like lots of decisions to be made and so on. These platforms help you a lot. But then still there are gonna be some decisions that are gonna be made and which is why we'll be asking you to submit along with the document, talking about what was your thinking process, what was your learnings and what's your conclusion. 22:08 - 22:18 Dr Anand Jayaraman: So here's the thing, I'm not sure, has Dr. Dakshanamurthy had any sessions with you all? 22:20 - 22:24 Speaker 4: Yeah, 1 session he had taken. Yeah. So, 22:24 - 22:26 Dr Anand Jayaraman: yeah. What which session was that? 22:27 - 22:33 Speaker 4: I think that was a second session when Dr. Sreedhamamurthy couldn't attend. Yeah. So then he took 1 session. Introduction. 22:33 - 22:34 Speaker 4: Yeah. So the 22:37 - 23:12 Dr Anand Jayaraman: thing is, in a way, what we all are trying to do is, you are not going to be coding, right? You're going to be, your senior positions, you're really not going to be doing the coding part, right? So what are we doing with this program? What we are doing with this program is you to be able to take a business use case, right? Understand how does that business use case gets translated to a machine learning problem, right? 
23:14 - 23:49 Dr Anand Jayaraman: And there's going to be a bunch of different steps that go there, some thinking process that goes in there. And then when the machine learning algorithm throws an output, you need to be able to interpret that output. And your skill set is going to be deciding which machine learning algorithm, and then knowing what platforms execute the machine learning algorithm, right? And then you're going to get some answers and you're going to have to evaluate the answers and explain it in business terms, right? That is what this process will attempt to guide you through. 23:49 - 24:04 Dr Anand Jayaraman: And so the document that you are going to be submitting along with those hands-on assignments is going to be important. We will talk more about this closer to when we start coming to the hands-on sessions. 24:04 - 24:10 Speaker 4: So this time Professor Debro will be the VIP because last time we escaped in the module 1. Yeah, 24:12 - 24:48 Dr Anand Jayaraman: Debro is the 1 who is going to be doing the demo and he will help you with this. For me, I place a lot of value in the real world connection, business connection. I'm a practitioner. So I think that is a key step. Taking the business problem, converting it into the ML problem, and then looking at the numbers that are coming out of ML and making the business connection. 24:48 - 25:16 Dr Anand Jayaraman: For me, that is key. And I think this entire program is to enable you to be able to do that. So that's how I will be evaluating these assignments. Okay, let's get started. So last week, we talked about, we introduced this perceptron, right? 25:16 - 26:17 Dr Anand Jayaraman: What we said was, we started out as if we are going to sorry, I need to open a different, yeah, okay. So, we talked about the perceptron. We talked about the intention of trying to write a mathematical model of a basic neuron. 
This was a neural cell and it had a bunch of different inputs over there and it has 1 output. These bunch of inputs that are coming in x0, x1, xn, these different inputs that are coming in, they get modulated by appropriate weights, w1, w2, so on. 26:17 - 26:42 Dr Anand Jayaraman: And these currents are flowing inside. So what you have inside is summation of wi xi plus some base level of chemicals that's already there and the output of the neuronal cell is some non-linear function of whatever is this input. The non-linear function that we chose was a sigmoid function. 26:43 - 26:48 Speaker 6: Professor, if you write on the top right hand side, we are not able to see it properly because of your video. 26:49 - 27:01 Dr Anand Jayaraman: I see. If I turn off the video, would that help? Yeah, now it's clear. Thank you. Now it's clear. 27:01 - 27:14 Dr Anand Jayaraman: Okay. So let me first figure out where, what is the line of visibility? Okay. What's the so are you able to see this line? 27:16 - 27:17 Speaker 4: Yeah, it's okay. 27:17 - 27:28 Speaker 5: Yes. Okay. 
So, we can see the non-linear function also okay after that if you write we will not be able to see but below that we will able to see 27:28 - 27:29 Dr Anand Jayaraman: so this is good 27:29 - 27:30 Speaker 5: this is 27:30 - 27:32 Speaker 4: good yeah after that okay 27:32 - 27:49 Dr Anand Jayaraman: that is fantastic okay let me have that line as a reference and how about on this side everything is okay yeah I prefer keeping the video on I know it takes bandwidth but sometimes I'm waving my hands and 27:49 - 27:50 Speaker 4: I think 27:51 - 27:52 Dr Anand Jayaraman: I feel I think that's so 27:52 - 27:58 Speaker 4: I think groups is having a small laptop I think yeah screen that's why 27:58 - 28:03 Dr Anand Jayaraman: it's good to have a video on we know you're watching so thank you yeah yeah good 28:03 - 28:07 Speaker 6: thanks yeah mine is a 13 inch laptop probably that's enough 28:07 - 28:32 Dr Anand Jayaraman: I see I see but it's good I I thanks for bringing that to my notice. I'll make sure that I don't go all the way to the bottom. Okay. So this was the perceptron, right? And what we discovered last time is that this perceptron is nothing more than a simple logistic regression, right? 28:32 - 29:05 Dr Anand Jayaraman: Logistic regression is a classification, basic classification model that we had discussed during the machine learning course, the introductory machine learning course. What does this do? This handles logistic regression, handles a set of problems where I have my training data set. Let us say I have 4 variables, right? These are different features, X naught, X1, X2, X3. 29:05 - 30:15 Dr Anand Jayaraman: These are different features and I have an output y. So for example you might be thinking about this as a loan bank loan problem. So here is a loan amount is in this column, salary is in this column, number of years of experience is in that column, and credit card balance is in the last column. 
And you're talking about, these are all different loan candidates 1, 2, 3, whatever, you have 1000 loan candidates, and you have the output y, which is whether this particular person paid back or did not pay back. When I have a data set like this and I'm interested in building a model which will tell me for a new client who's come here asking for some amount of loan amount with some salary and some number of years of experience with some credit card balance, I want to be able to understand whether I should give the loan to them or not based on this past data. 30:15 - 31:06 Dr Anand Jayaraman: Now, what logistic regression allows you to do is it builds a model which will enable you to make these kinds of predictions. So how does this logistic regression build a model? Essentially what it does, it talks about the probability of returning the loan. It builds a model for the probability of returning the loan and it gives us the probability as a sigmoid function. And the way it evaluates this is w1 times, sorry, w0 times x0 plus w1 times x1, and so on, up to w3 times x3, plus b, right? 31:06 - 31:47 Dr Anand Jayaraman: It says, It looks at this data and learns these coefficients w0, w1, w2, w3 and b in such a way that no matter which loan candidate you offer it, it will be able to tell you the probability of them actually returning the loan. So the sigmoid function is actually giving you the probability, right? As you are further and further away on this axis, you have a greater probability of getting 1. Otherwise, you have a lower probability of getting 1, right? So that is what this logistic regression actually does. 31:47 - 32:03 Dr Anand Jayaraman: I hope this is all you know sort of like a review for you and I hope it's not all completely new. Is that true? Is that correct? I'm hoping that's correct. 32:05 - 32:13 Speaker 7: Professor, the weightage was clear. What is the B value, professor? W is the weightage, and B is what, professor, if you can help. 
32:13 - 33:17 Dr Anand Jayaraman: That is just some intercept. So the thing is, in the interpretation, this is exactly what a perceptron also does. In the perceptron, when you're looking at it, for the perceptron that represents the base level of chemicals in the neuron itself before anything even started. Yeah. But essentially, it's another parameter. The algorithm, essentially you need to determine these parameters in such a way that you are able to predict with great accuracy whether somebody is likely to return the loan or not from the training data set. So generally speaking, in logistic regression, whenever you have 4 variables, so here you have 4 variables, the number of parameters that you need to determine is: there is 1 slope coefficient for each 1 of the variables and then there is 1 intercept. 33:18 - 34:03 Dr Anand Jayaraman: So, you will have 5 parameters to be determined, ok. When there are 4 variables, you need to determine 5 parameters and that is the number of parameters you determine. And you determine these parameters in such a way that for all of these thousand candidates there is as small a cumulative error as possible, right, and there is an algorithm that tries out different values of these parameters and finally figures out what is the best value of the parameters. The process of figuring out these parameters is what we call as learning. The process of understanding the pattern, understanding the pattern in the data, is what we call as learning. 34:03 - 34:23 Dr Anand Jayaraman: So in ML parlance, we call it as training. When I'm taking this data and I'm training the algorithm, what's happening under the hood is these parameters are being learned so that the error is as small as possible. Yes, Srikant, please go 34:27 - 34:27 Speaker 8: ahead. Hello. 34:28 - 34:29 Dr Anand Jayaraman: Yes, Srikant, please go ahead. 34:29 - 34:34 Speaker 8: Okay. So the B is bias, right? 
We call B as bias. 34:34 - 34:58 Dr Anand Jayaraman: So different terms, you're absolutely right. So B is, they call it as bias. In neural network parlance, they call this as bias. In statistics, for the same model, the same B, they call it as intercept. Different terminology. 35:01 - 35:28 Dr Anand Jayaraman: I personally dislike the term bias because in machine learning, there is something else that we call as bias. There's another thing also we call as bias. So sometimes it gets confusing for people. So I personally like to stick with the term intercept. Yeah. 35:28 - 36:10 Dr Anand Jayaraman: Any other questions? Okay, so this is logistic regression, right? And what we did last time was we showed an example of logistic regression with 1 simple problem. The problem that we talked about was there is a car that has a 120 horsepower engine and a weight of 2.8 tons and we want to find out what is the probability of it being fitted with manual transmission, right? That was the problem, and essentially what we have done here is, so this problem has only 2 variables, right, horsepower and the weight of the car. 36:10 - 37:05 Dr Anand Jayaraman: So here is my matrix, weight and horsepower, these are the 2 variables, and then there is whether it's got automatic or manual, I have a binary value 0 or 1, right, so this is the way the data set looks. What I have done is I plotted this data set over here with 1 variable on 1 axis, the other variable on this other axis. This way where we are plotting all the points, each of these points represents 1 of the cars, each of the data points, each of the dots represents 1 car. So because each car is characterized by a weight and the horsepower, right? So that means that each car is a coordinate over here. This kind of plot is called as feature space plot, right? 
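The computation described above, a weighted sum of the inputs plus an intercept passed through a sigmoid, is small enough to sketch directly. This is a minimal illustration; the feature values and weights below are made-up numbers, not anything from the session:

```python
import numpy as np

def sigmoid(z):
    # Non-linear activation: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b):
    # Weighted sum of inputs plus intercept, passed through the sigmoid.
    # As noted in the session, this is exactly the logistic regression model.
    return sigmoid(np.dot(w, x) + b)

# 4 hypothetical features (think: loan amount, salary, years of experience,
# credit card balance), so 4 weights plus 1 intercept -- 5 parameters in all.
x = np.array([0.5, 1.2, -0.3, 0.8])   # hypothetical, scaled feature values
w = np.array([0.4, -0.6, 0.9, 0.1])   # hypothetical learned weights
b = 0.2                                # intercept (the "bias" term)

p = perceptron(x, w, b)                # a probability strictly between 0 and 1
```

Training, as discussed above, is the process of choosing `w` and `b` so that this output matches the observed 0/1 labels with as small a cumulative error as possible.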
37:05 - 37:36 Dr Anand Jayaraman: Because I'm plotting, I'm characterizing every point by its features, every point by its features, right? This feature space plot tells us a lot of things. 1 of the things it's talking about is the distribution of these points. In this case, I have colored the points based on automatic or manual. And what we are noticing is that these red data points are all nicely to 1 side, the blue data points are all nicely to the other side. 37:37 - 38:31 Dr Anand Jayaraman: And if I want to figure out the logic which separates the blue points and the red points, essentially, the act of figuring out the logic is equivalent to drawing a separating boundary between the 2 colors. This is what all classification problems do. All classification problems are essentially trying to draw a boundary separating 2 different classes, that is what all classification problems do, right. Logistic regression in particular, you know, what it's doing is it's trying to separate, yeah, here I'm showing the 3D plot, right? The previous 1 I showed was a 2D plot. Now the third axis that I introduced is basically whether it is automatic or manual. 38:31 - 39:13 Dr Anand Jayaraman: That's the axis I've put. So all the points which are manual are all marked as 1, all the points which are automatic have all been marked as 0. So those points are on the floor, these points are on the ceiling. And the act of building a model is essentially equivalent to fitting a sheet that tries to go through the green points and the red points, the sigmoid-shaped sheet that we are fitting. Now, the cars are either automatic or manual, but the sheet is giving us values that are numerical in nature. 39:13 - 40:01 Dr Anand Jayaraman: It gives values from 0 to 1. So what exactly do the values on the sheet represent? 
The values on the sheet represent the probability that a car with that particular set of features, let's say I have a car of 1.8 tons of weight and 110 horsepower. For this car I want to know whether it is likely to be fitted with manual transmission or automatic transmission. What I will do is I will find out what is the value of the sheet at this point. The value of the sheet at this point is probably around 60 percent or something; this is on the top side of the sheet. 40:01 - 40:27 Dr Anand Jayaraman: So, the value is somewhere around 60 percent or 0.6, which represents 60 percent. So there is a 60 percent probability that this car will be a manual transmission car. And that's the way I interpret it for any new car. This is exactly what logistic regression does. Now, in the previous graph, I drew a line. 40:28 - 41:17 Dr Anand Jayaraman: Now, what is the connection between that line and this 3D picture I showed? In this 3D picture, basically, I can find the set of points which correspond exactly to 50% probability, right? 50% probability, right? And that level set, when I mark it on the floor over here, that is exactly what this line is talking about; this line represents the boundary points, the points which have exactly 50 percent probability of being either this side or that side, right. So, what does logistic regression give you? 41:17 - 41:39 Dr Anand Jayaraman: Logistic regression gives you the equation of this line in the feature space. And what does logistic regression do? What kind of line does logistic regression draw? Logistic regression draws a straight line boundary. The separating boundary is a straight line. 41:39 - 42:13 Dr Anand Jayaraman: That is what logistic regression does. Now, in 2D space, in 2D feature space, I have only 2 features, weight and horsepower; in 2D feature space, the separating boundary is a straight line. 
In 3D space, let us say I have 3 different variables, right, weight, horsepower and also, for example, qsec. Qsec is the time taken to travel a quarter mile. That's another standard way they characterize the power of a car. 42:13 - 42:40 Dr Anand Jayaraman: How long does it take to travel 1 quarter mile? So let's say I have 3 variables and 3 pieces of information are given and then they're asking you what is the probability it's a manual transmission. Then I would do the exact same problem with 3 variables. In that case, my feature space itself will be a three-dimensional space. And there are points that are split in this three-dimensional space. 42:40 - 43:26 Dr Anand Jayaraman: And if I want to separate out the red points and blue points in that space, I will need to have a plane that is cutting through the three-dimensional space, right? And logistic regression will give a linear plane which separates out the blue points and red points. So what does logistic regression do? Logistic regression draws a linear separating boundary in the feature space, that is what logistic regression does. And the act of drawing a linear separating boundary allows you to do binary classifications: either this person will return the loan or not return the loan. 43:26 - 43:44 Dr Anand Jayaraman: This car has automatic transmission or not automatic transmission. This transaction is fraudulent or not fraudulent. This stock is a buy or a not buy, right? All of these are binary classification problems. And for binary classification problems, logistic regression is a good algorithm to try and do that. 43:44 - 43:59 Dr Anand Jayaraman: But what this algorithm does is it draws necessarily a linear separating boundary between 2 classes. Questions? I hear a sound. I am assuming someone has raised their hand. So please go ahead. 44:00 - 44:01 Speaker 7: Yourself. So, quick question here. 44:02 - 44:03 Speaker 3: Yes. 
For the same 44:03 - 44:11 Speaker 7: problem, we can do by deep learning also. You started the session with the same thing, right? Logistic regression and deep learning are similar. So, 44:11 - 44:37 Dr Anand Jayaraman: no, so not deep learning, perceptron. Yeah, not, we haven't come there. Perceptron is exactly equal to logistic regression. Okay. Perceptron is exactly equal to logistic regression no difference at all between that thank you yeah We haven't come to deep learning at all. 44:37 - 44:40 Dr Anand Jayaraman: We won't come until tomorrow's half day. 44:41 - 44:41 Speaker 1: But we 44:41 - 44:42 Dr Anand Jayaraman: are getting there. 44:45 - 45:03 Speaker 1: Professor, I always thought that the line that you are showing on this on your screen now that separates the blue and red was always the threshold that we set which yeah but may not always be 0.5 right can be 0.6 point where it is for in 45:03 - 45:12 Dr Anand Jayaraman: in a business case it's a threshold that you can set but no matter what threshold you will set it will always be a linear line that is what logistic 45:12 - 45:15 Speaker 1: but it's not always 0.5 right like 45:15 - 45:53 Dr Anand Jayaraman: you said regression. So, when you run the default logistic regression, it will give you some coefficients as what I'm showing you is an output of logistic regression in Python, right? When you can run logistic regression, even in Azure ML Studio, and even there, it will give you this kind of output. You might not have looked at those numbers but if you look at those numbers that it prints there, You see these numbers 18.86 minus 8 and then 0.03. What is the meaning of that number? 45:54 - 46:13 Dr Anand Jayaraman: The meaning of that number is if you write those numbers as an equation 18.863 minus 8.08 times weight plus 0.03 times horsepower and you set that equal to 0 that actually gives you the separating boundary with 0.5 threshold. 
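The boundary equation just read out can be checked numerically. One caveat on this sketch: the 0.03 read from the printout is likely a truncated 0.036 (these look like the coefficients of the classic mtcars fit of transmission against weight and horsepower); with the fuller value, the 2.8-ton, 120-horsepower car from earlier lands near the 60 percent quoted above:

```python
import math

# Coefficients as read out in the session (intercept, weight, horsepower).
# The hp coefficient appears truncated in the transcript; the value from the
# classic car data set this resembles is approximately 0.036 (an assumption).
B0, B_WT, B_HP = 18.863, -8.08, 0.036

def p_manual(wt, hp):
    # sigmoid of the linear combination = probability of manual transmission
    z = B0 + B_WT * wt + B_HP * hp
    return 1.0 / (1.0 + math.exp(-z))

# The car from the example: 2.8 tons, 120 horsepower
p = p_manual(2.8, 120)

# The separating boundary is where z = 0. Since sigmoid(0) = 0.5, setting
# 18.863 - 8.08 * wt + 0.036 * hp = 0 gives exactly the 0.5-threshold line.
```

Here `p` comes out a little above 0.6, consistent with the "around 60 percent" reading of the sheet earlier in the session.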
46:15 - 46:17 Speaker 1: That's not necessarily the best cutoff. 46:19 - 46:24 Dr Anand Jayaraman: It's giving you the 0.5 threshold. See, it doesn't know what your business use case is, right. 46:25 - 46:26 Speaker 1: okay okay got it 46:26 - 46:47 Dr Anand Jayaraman: okay, unless you want to, yeah, so like generally speaking, you know, what is the classification problem? The classification problem is you're separating between zeros and ones. Right? Anything above 0.5, you will round it up to 1. Anything below 0.5, you will round it to 0. 46:48 - 47:06 Dr Anand Jayaraman: So it's using 0.5 as a threshold, which is a natural threshold. Now there are business cases where you might want to choose a different threshold, and that's okay, that's absolutely possible for you to do, but by default it's going to give you the 0.5 threshold. 47:07 - 47:09 Speaker 3: So why should setting it equal to 0 be equal to 0.5? 47:10 - 47:25 Dr Anand Jayaraman: No, no, no. So forget about the details of why that is, because then we'll actually have to go to the math behind what is actually the mathematical equation for the sigmoid, and from there we can figure it out. 47:27 - 48:18 Speaker 8: So I have a question, but I think it's too early to ask, but I'm trying to ask anyway. The features that we currently take, right, the weight and horsepower, there could be another feature that could be more effective than weight and horsepower, right? How do we know which 1 to take when we design, or when we come up with, like, what features are really needed for us to get the boundary or the logistic regression to work for us? And I'm sorry, if you also take a large number of features in the equation, will that be closer to the right answer or will that go beyond the right answer? Like, how do we know where to stop? I know this is beyond the class but I'm just trying to ask now. 48:19 - 48:36 Dr Anand Jayaraman: Very very very good question. Right. 
So these are all general topics. The question that you're asking is this: how do I know what set of features is needed to model this problem? 48:41 - 48:44 Speaker 8: Right, I mean, what is the effective set of features that I need to take? Yeah, 48:44 - 49:23 Dr Anand Jayaraman: correct. Unfortunately, there is no way to answer that. Data science cannot tell you what feature you might be missing out, because data science deals only with the data that is in front of you. Now, let us say, for example, in the car dataset, I have this other set of features in the data, and this one is automatic or manual. How many? 49:23 - 50:00 Dr Anand Jayaraman: 1, 2, 3, 4, 5, 6, 7, 8. There are 8 features and then this is my target value. Now data science can certainly help you in identifying, among these 8 features, the 3 most important ones. That it can answer for you. Okay. However, it won't tell you whether there is some other feature that might have improved things but is not there, because it doesn't know what is not 50:00 - 50:04 Speaker 8: there. Yeah, definitely. Yeah. But if I see it as a data problem, then? 50:04 - 51:01 Dr Anand Jayaraman: Correct. So we won't know if there is a better feature out there which is not already included. But among the variables that you have, there are 2 algorithms, linear regression and logistic regression, both of which give you a way to identify whether a variable is useful or not. Both linear regression and logistic regression, and these are essentially the only 2 algorithms that give you this (well, and decision trees, sorry). When you ran your algorithms on Azure ML Studio, ML Studio does give you these coefficients, exactly like the Python code. It does give you these coefficients.
51:01 - 51:45 Dr Anand Jayaraman: And there is this bunch of numbers over here, right, all of these numbers. I already told you what the coefficients are: the coefficients give you the equation of the line, I told you about that. Then there is this other bunch of numbers. What do those numbers mean? Those numbers try to tell you: how likely is it that a particular variable is not useful? How likely is it that a particular variable is useless? That is what this number is telling you. 51:48 - 51:52 Speaker 8: But we should know beforehand, right? We should not be trying to fit the line and then try to 51:52 - 52:32 Dr Anand Jayaraman: know. So you run the regression, because you need to provide the data to it for it to evaluate whether a variable is useless or not, right? So when it evaluates, it's going to give you these numbers. These are actually probability numbers. It is saying: here is this number, 0.04, and the way I read it is, there is a 4% chance that horsepower is useless. There is a 0.8% chance that weight is useless. And for this constant term, there is a 1% chance it is useless. 52:33 - 52:46 Dr Anand Jayaraman: Now these are all small numbers, the probability that they are useless is very small, and so in turn it means that all of these are very useful variables. That's the way you read that. 52:47 - 52:55 Speaker 8: But if we actually change the combination of these variables, will that change the probability numbers for the coefficients too? 52:56 - 53:03 Dr Anand Jayaraman: Absolutely it will change. Okay. Correct. Correct. Absolutely it will change, because it's now trying to evaluate each variable in the presence of the others. 53:04 - 54:05 Dr Anand Jayaraman: Yeah, because when you're in the presence of this guy... I mean, you already have, you know, MS Dhoni in the team.
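The "probability that a variable is useless" numbers the professor reads off are p-values. As a rough, stdlib-only sketch of where such a number comes from, here is the standard Wald test on one coefficient (the coefficient and standard error below are hypothetical, picked so the result lands near the 0.04 figure from class):

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def wald_p_value(coef, std_err):
    """Two-sided p-value for H0: coefficient = 0 (normal approximation)."""
    z = coef / std_err
    return 2.0 * (1.0 - normal_cdf(abs(z)))

# Hypothetical coefficient and standard error, chosen to land near the
# "4% chance horsepower is useless" reading from the lecture.
p = wald_p_value(0.03, 0.0146)
print(round(p, 2))  # ~0.04
```

A tiny coefficient with a large standard error would give a p-value near 1, i.e. "probably useless", which is exactly how the printed table is read.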
And now you are adding Sanju Samson, who can also be a wicket keeper and can be a stroke player. But you know, if MS Dhoni is already in the team, then Sanju Samson might not be that useful; but if MS Dhoni is not there, then Sanju Samson might be very useful. Right? So the usefulness or uselessness of a variable is definitely relative to what other variables are there. And that's what these probability numbers are giving you. I didn't mean to talk about those probability numbers in today's class at all. If all of that seemed very complicated, don't worry about it. 54:05 - 54:11 Dr Anand Jayaraman: That's not the main topic at all. But there was an opportunity to present that information. So I'm presenting it. That's all. 54:12 - 54:13 Speaker 8: Thank you so much. 54:14 - 55:00 Dr Anand Jayaraman: OK. So all I've done so far is talk about a sigmoid neuron, or logistic regression. Now, going back to the neuron body: I said that if the non-linear function, which is called the activation function, is chosen to be a sigmoid, then what you are doing is you have just rediscovered logistic regression. Now, however, if instead of a sigmoid you put in a linear function as the activation function, do you know what you have rediscovered? 55:03 - 55:04 Speaker 3: Regression. Sigmoid. Regression. 55:05 - 55:27 Dr Anand Jayaraman: You rediscovered linear regression. Exactly. You have rediscovered linear regression models. Now, there are different kinds of activation functions people use, right? The most common one is sigmoid, and there is also linear, and we will talk about why some of these other ones are sometimes used. We will touch upon that a little later. 55:27 - 55:43 Dr Anand Jayaraman: But depending on the activation function chosen, the action of the neuron might be slightly different. We will come back and talk about when to use which activation function before too long.
55:43 - 55:47 Speaker 3: Sorry, Professor, just a quick question, you know, on the previous slide. 55:47 - 55:48 Dr Anand Jayaraman: So 55:48 - 55:55 Speaker 3: the activation function, would that be different, my understanding, for, let us say, logistic regression versus linear regression? 55:56 - 56:42 Dr Anand Jayaraman: Yeah. So for each activation function, I will put down the mathematical model it corresponds to and the outcome. When the activation function is sigmoid (sigmoid looks like this, it goes between 0 and 1), the model is actually logistic regression, and the outcome is values between 0 and 1; those are the only values you get. This is useful for classification problems. 56:43 - 56:46 Speaker 3: Is that because it works only on categorical variables? 56:47 - 57:04 Dr Anand Jayaraman: No, no. It's because of the sigmoid function: the output is a sigmoid function, the sigmoid function looks like this, its values don't go beyond 0 or 1. No matter what value you feed in, its output lies between 0 and 1. It's one of those functions. Okay. Right? 57:05 - 58:05 Dr Anand Jayaraman: So that is useful for classification problems, because you can say that if it is closer to 0, then it belongs to class 0, and if it is closer to 1, it belongs to class 1. On the other hand, if the activation function is a linear function, you have essentially rediscovered linear regression, and its values can go from minus infinity to plus infinity, and this kind of activation function is useful for regression problems. In regression problems, you are trying to predict a numerical value as an output, not just a class variable as an output.
When you have a numerical value as an output, for example, what is the revenue of Apple next quarter, it's not between 0 and 1, right? Unless of course we are talking in trillions of dollars or whatever. So you want the output to span the whole range: the possible values can be anywhere between minus infinity and plus infinity. 58:06 - 58:25 Dr Anand Jayaraman: So, if you are expecting an output which should span the entire range, you need the activation function to be a linear activation function. We will talk about it a bit more; just give me half an hour, and by the second half of the class we will talk about these activation functions in detail. 58:26 - 58:43 Speaker 6: Professor, Gopal here, 1 question I have. So you mentioned that the perceptron is a logistic regression, right? So would it remain a logistic regression cell even if the activation functions are different, like linear or piecewise linear or whatever? 58:43 - 58:59 Dr Anand Jayaraman: No, it remains a logistic regression only when the activation function is a sigmoid activation function. A sigmoid activation function gives a logistic regression. If I change the activation function, it would not be logistic regression. 59:00 - 59:15 Speaker 6: Got it. So what you were trying to convey earlier was that the perceptron started out, was discovered, as a logistic regression model, but went on with other activation functions and became different other models, correct? 59:15 - 59:30 Dr Anand Jayaraman: Yeah, you can put in other activation functions and use it for other things as well. But hold that thought, we're getting there. Okay. Let's forget about linear for now. Let's just keep in our mind only the sigmoid part. 59:30 - 59:55 Dr Anand Jayaraman: And then let's move on to the next one. Now I want to talk a little bit about the limitations of a perceptron. Now here is the data set that I have. This data set has x1, x2.
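The sigmoid-versus-linear distinction above can be seen in a tiny sketch (all names and numbers here are illustrative, not from the lecture code): the same weighted sum, passed through different activations, gives a bounded value suited to classification or an unbounded one suited to regression.

```python
import math

def neuron(inputs, weights, bias, activation):
    """One neuron: weighted sum of inputs plus bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))  # output always in (0, 1): classification
linear = lambda z: z                            # output unbounded: regression

x = [2.0, -3.0]
w = [5.0, 4.0]
b = 1.0
print(neuron(x, w, b, sigmoid))  # squashed into (0, 1)
print(neuron(x, w, b, linear))   # raw weighted sum: -1.0
```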
So it's a 2-dimensional data set, and here are these points. In this case I have only 4 points, and here are the y values, 0 or 1. 59:55 - 01:00:00 Dr Anand Jayaraman: When I plot that, over here is x1, over here is x2, 01:00:00 - 01:00:29 Dr Anand Jayaraman: this is the feature space. These 2 points belong to class 1 and these 2 points belong to class 0. Can I train a perceptron model here, a sigmoid perceptron model, or equivalently a logistic regression model? This is a classification problem. What do we do in a classification problem? 01:00:29 - 01:00:52 Dr Anand Jayaraman: Draw a separating boundary to divide the classes. That is what we are trying to do. Do you think it will work here? Yeah, it should work, it is a binary classification. Except, when you are trying to do a sigmoid perceptron, or equivalently logistic regression, what does logistic regression do? 01:00:54 - 01:01:14 Dr Anand Jayaraman: It draws a linear separating boundary. Can I draw a line which separates out these 2 classes perfectly? No. No matter where you draw the line, you will have errors. No matter where you draw the line. 01:01:17 - 01:01:52 Dr Anand Jayaraman: This is a data set with only 4 points. And even with these 4 points, the sigmoid perceptron does not work. These neural network scientists, they first claimed a lot of credit. They said, we are going to figure out how humans think, and here is a human neuronal cell, and I'm going to write a mathematical model of it, and they named it grandly a perceptron. But unfortunately the model cannot even correctly classify this problem, which has only 4 data points. Right? 01:01:53 - 01:02:20 Dr Anand Jayaraman: There is no straight line you can draw which can correctly classify this. This is XOR, right? Exactly. This is that.
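The claim that no straight line separates these 4 points can be checked by brute force. A minimal sketch (the grid of candidate lines is arbitrary): try many lines and see that none classifies all 4 XOR points correctly.

```python
# The XOR data set from the lecture: 4 points, labels 0/1.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

def linear_classifier(w1, w2, b):
    """Predict 1 on one side of the line w1*x1 + w2*x2 + b = 0, else 0."""
    return lambda x1, x2: 1 if w1 * x1 + w2 * x2 + b > 0 else 0

def best_accuracy():
    """Search a coarse grid of lines; report the best fraction classified."""
    best = 0.0
    grid = [i / 2.0 for i in range(-10, 11)]  # weights and bias in [-5, 5]
    for w1 in grid:
        for w2 in grid:
            for b in grid:
                clf = linear_classifier(w1, w2, b)
                acc = sum(clf(*p) == y for p, y in zip(points, labels)) / 4.0
                best = max(best, acc)
    return best

print(best_accuracy())  # 0.75: no single line gets all 4 points right
```

The best any line manages is 3 out of 4 (e.g. the OR-like line x1 + x2 > 0.5), which is exactly the limitation being described.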
Now, when it was discovered that there are such simple data sets which a perceptron cannot classify, it really led to a huge disappointment in the neural network community, right. 01:02:23 - 01:02:57 Dr Anand Jayaraman: Now what do we do, right? We thought we were trying to build an intelligent algorithm which can learn emotions and which can recognize people and all of that. But this silly thing is not able to classify this particular simple problem with 4 data points. The thing is this: we started out by asking, you know, can we start off with a big brain, the human brain, and we said that the human brain is too big for us to understand, we can't model it. Then we asked, can a small brain have intelligence? 01:02:57 - 01:03:15 Dr Anand Jayaraman: Then we agreed a small brain can have intelligence. Then from there we immediately pushed all the way to a brain with only a single neuron. All I'm telling you now is that a brain with a single neuron cannot solve this problem. That's all I'm telling you. How does the brain do nonlinearity? 01:03:16 - 01:03:37 Dr Anand Jayaraman: How is the brain able to do complex problems? Because there are multiple neurons connected with each other. Perhaps if we also are able to connect multiple neurons, then we might also be able to solve nonlinear problems. How do you connect multiple neurons? So imagine this, right? 01:03:37 - 01:04:04 Dr Anand Jayaraman: Let's go back to this dataset. This dataset has only 2 dimensions. So here is this input. Each input is going to enter here and go inside the neuronal body. One input x1 is coming and another input x2 is coming, and this output came. 01:04:04 - 01:04:31 Dr Anand Jayaraman: This is a single neuron. We agree that the single neuron will be unsuccessful in classifying these kinds of problems. We agree to that.
Now, I have suggested, and hopefully convinced you of, this idea that the brain is not a single neuron; it's multiple neurons connected. So we need to somehow model that idea here. 01:04:32 - 01:04:51 Dr Anand Jayaraman: So I want to be able to add 1 more neuron, right, multiple neurons together. So let's just make an extension and put 2 neurons, right? So here is another neuron, okay? To this neuron also I'm going to feed in the same inputs x1 and x2, the same inputs. Now that will also have an output. 01:04:51 - 01:05:24 Dr Anand Jayaraman: Now this one will have 1 set of parameters, w1, w2. This one might have a different set of parameters, w3, w4. But the problem is, this neuron has got 1 output and this one has another output; which output should I listen to? We ultimately want only a single output, a single value as output. So what I'm going to do is, I'm going to take the output of this neuron and the output of this neuron and send them to a third neuron, which will look at this output and that output and decide what is the correct output to send out. 01:05:26 - 01:06:04 Dr Anand Jayaraman: Now, suddenly it starts to look like a circuit diagram, right? I am taking 1 neuron, here is another neuron; each of these neurons produces an output, and you are sending those outputs to the third neuron. There it is. So, we are now talking about a neural network. 01:06:05 - 01:06:29 Dr Anand Jayaraman: We are connecting a network of neurons, right? Now, why do I think having a neural network can actually help solve the problem? We'll talk about that. But let me first show you how people draw this. Professor, 01:06:32 - 01:06:35 Speaker 2: I have a question, probably a very stupid question. 01:06:35 - 01:06:36 Dr Anand Jayaraman: Can you 01:06:36 - 01:06:46 Speaker 2: just go back to the previous slide please? In that, I mean, there are 2, no, no.
Yeah, so let's say here, there are 2 neuron cells. 01:06:47 - 01:07:30 Dr Anand Jayaraman: And why do the weights change? You know, the w1, w2: isn't the weight based on the input parameters? Absolutely, absolutely, and that's where I'm going with the explanation. Okay, so what I'm actually trying to do is this. When I add 2 neurons, see, what do these weights mean, w1, w2? Let us go back to here. These weights are what determines where exactly the line is being drawn. The weights are determining the slope of the line. 01:07:31 - 01:07:58 Dr Anand Jayaraman: And the intercept determines the positioning of the line. This number 18.8: instead of 18.8, if it was 17, then the line would be over here. Instead of 18, if this number is 25, then the line would be here, with exactly the same slope. As you change the slope, the line becomes steeper or shallower. That's what these slope numbers are controlling. 01:08:00 - 01:08:36 Dr Anand Jayaraman: These values, these weights, determine the exact position of the line which will divide the data. Right. Now, let's go back to this question: why do I believe connecting 2 neurons might help solve the problem? I believe it might help because when I put this neuron there, neuron A, neuron A can be responsible for drawing this line. And neuron B can be responsible for drawing a completely different line. 01:08:36 - 01:09:11 Dr Anand Jayaraman: And this neuron C will tell you how to collate the information from this neuron and that neuron and decide on the correct class. Having these 2 lines will enable you to correctly classify the problem. Is that clear? We agreed that it is impossible to draw a single line which correctly classifies the problem.
But now, by adding an extra neuron, I am effectively using 2 lines to classify the problem. 01:09:11 - 01:09:12 Dr Anand Jayaraman: That is possible, right? 01:09:14 - 01:09:24 Speaker 2: Agreed, I see the 2 lines. But how do you determine the different slopes, which come by virtue of the different constants? 01:09:25 - 01:09:40 Dr Anand Jayaraman: How does the neuron know what slope to put? We will come to that and answer it. All I am saying is that it is possible to find values of w1, w2, w3, w4 so that you have correct classification of the problem. That is all 01:09:40 - 01:09:42 Speaker 2: I am saying. How does 01:09:42 - 01:09:46 Dr Anand Jayaraman: the neuron determine that? We will talk about that. That is coming later. 01:09:46 - 01:10:07 Speaker 2: And the question, I mean, what I'm slightly not clear about is this: w1 and w2 are computed by virtue of whatever is optimal to get a better output. So why do w1, w2 differ from w3, w4? Good, good question. 01:10:07 - 01:10:09 Dr Anand Jayaraman: Sorry, who's speaking? I'm sorry. 01:10:10 - 01:10:13 Speaker 2: This is Bhaskar. Bhaskar. Okay, good. 01:10:13 - 01:10:32 Dr Anand Jayaraman: Very, very nice question, Bhaskar. So in your mind, you are imagining I'm training 1 neuron separately and another neuron separately. I'm not going to do that. I'm going to train this whole network together. Right. 01:10:32 - 01:11:10 Dr Anand Jayaraman: I'm saying, you figure out what set of weights to put on this neuron and this neuron, and these outputs are going to come in to this one, which is also a neuron. So it will also have its own set of weights. So this one will have 1 constant c1, this one another constant c2, then w5 here, w6 here and c3 here. And altogether, for this particular problem, I am now allowing for 1, 2, 3, 4, 5, 6, 7, 8, 9.
01:11:10 - 01:11:33 Dr Anand Jayaraman: I am allowing for 9 parameters to be determined, and with these 9 parameters I am effectively able to do this classification. That's what I'm saying. So why do you think I'm able to do the classification? Because now I have more parameters to try and fit the data. When I have more parameters to fit the data, it's not a big surprise that I'm able to correctly classify. 01:11:34 - 01:11:58 Dr Anand Jayaraman: That's another way to look at it. Okay, I'm not training the neurons separately. I will take this entire network and train the network together. Figure out the set of weights in such a way that you get correct answers always, or most of the time. How do you train the network? 01:11:58 - 01:12:06 Dr Anand Jayaraman: We'll talk about that. We'll talk about that later. But right now, I'm just talking about the architecture. I'm trying to architect the solution. That's all, right? 01:12:09 - 01:12:24 Speaker 3: Questions? Thanks professor. Thanks professor. That is a lot of silence there. 01:12:25 - 01:12:55 Dr Anand Jayaraman: Now, just think: essentially what I am doing is this. Multiple neurons are connected in our body. Now, in a sense, when we are teaching someone how to do mathematics, we are not training each of those neurons separately. This entire structure is trained together. Each neuron takes up 1 responsibility. The entire network as a whole is being taught. 01:12:55 - 01:13:27 Dr Anand Jayaraman: And the weights are getting adjusted for all of those neurons together. And that is what we are doing in a neural network. Right. And so, the neural network, when I have 2 neurons, this is the way they are going to start representing it. I want you to now move away from the biological model of a neuron; I want you to start thinking about an electrical model instead, right.
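To make the 9-parameter idea concrete, here is a minimal sketch showing that values for those 9 parameters do exist: two hidden sigmoid neurons (the two lines A and B) plus one output neuron C correctly classify XOR. The weights below are hand-picked for illustration, not trained; how training finds such weights is covered later in the lecture.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_network(x1, x2):
    """2 hidden neurons + 1 output neuron: 9 parameters in total
    (w1..w6 plus 3 bias constants). Weights are hand-picked, not trained."""
    # Neuron A: fires when x1 + x2 > 0.5 (an OR-like line)
    a = sigmoid(20 * x1 + 20 * x2 - 10)
    # Neuron B: fires when x1 + x2 > 1.5 (an AND-like line)
    b = sigmoid(20 * x1 + 20 * x2 - 30)
    # Neuron C collates the two lines: "A but not B" is exactly XOR
    return sigmoid(20 * a - 20 * b - 10)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xor_network(x1, x2)))  # 0, 1, 1, 0
```

The steep slopes (20) just make each sigmoid behave almost like a hard threshold, so the two "lines" are easy to see.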
01:13:28 - 01:14:04 Dr Anand Jayaraman: Now I am sure all of you in your high school, at some point, when you did your physics, would have connected resistors and, perhaps not op amps, but capacitors and a battery on a circuit board. You have all done this in high school physics. Now I don't know how many of you have heard this term as well: breadboard. Is that familiar to you? Yeah. 01:14:05 - 01:14:23 Dr Anand Jayaraman: A breadboard is where you assemble the circuit. Yeah. Right? It's a board with lots of holes in it, tiny tiny holes, where you can plug in your resistor or plug in your capacitor and so on, right. 01:14:23 - 01:14:40 Dr Anand Jayaraman: That's what they call a breadboard. Now imagine this, right. I have a breadboard, and I am looking at the breadboard from the top. What I'm going to do is, I'm going to put 2 pins over here. 01:14:41 - 01:15:09 Dr Anand Jayaraman: To this pin I'm going to plug in the signal that is coming from x1, right. This is my data point, right, x1, x2. The x1 signal is going to come in to this pin, and over here the x2 signal is being sent, and this is a pin, right? Now here I have 1 neuron. So think of that 1 neuron as some electrical component. 01:15:10 - 01:15:37 Dr Anand Jayaraman: So this neuron is going to get connected to both the x1 and x2 data. The current is flowing into it and it is going to have some output. And here is the second neuron, and to the second neuron too I am going to send in the same input, right. Each neuron will learn 1 functionality: this neuron will learn perhaps this line, and that neuron will learn perhaps this other line. 01:15:38 - 01:15:58 Dr Anand Jayaraman: And the output from both of them is connected to the third neuron, and this is the output I will monitor. This is the output that is going to be monitored.
I will, and this output needs to be 0 or 1 to determine whether a loan should be given to the person or not, or stock should be 01:15:58 - 01:15:59 Speaker 3: bought or not, or whatever. 01:16:00 - 01:16:25 Dr Anand Jayaraman: I want you to imagine this network. This is the neural network. Now, general neural networks, I will talk about this in a second. So, this is how a general network will look. This is a neural network. 01:16:26 - 01:16:52 Dr Anand Jayaraman: This is how a general neural network will look. In this network... here I am showing the previous network we talked about. These are my connection points. This was where x1, x2 came in. Both inputs were sent to this neuron, and both were sent to this neuron; the outputs came, and this is an output neuron, right. 01:16:52 - 01:17:08 Dr Anand Jayaraman: This is my previous neural network. In this network, the computation was done by these 2 neurons. There were 2 neurons over there. Over here, I'm showing you 5 neurons. 1, 2, 3, 4, 5. 01:17:09 - 01:17:10 Speaker 3: Right. 01:17:12 - 01:17:18 Dr Anand Jayaraman: These points that I'm showing here, what are these? What are these points? 01:17:19 - 01:17:20 Speaker 4: Inputs. 01:17:21 - 01:17:41 Dr Anand Jayaraman: That is basically an input pin, where from this common pin I am going to send this electrical current to this one, and from this common pin I am going to send electrical current to that. This is my common pin. There is no computation happening here. No computation happening here. It is just a common multi-pin plug that is there. 01:17:41 - 01:18:17 Dr Anand Jayaraman: Computation is happening here and here. Now, this portion where this multi-pin plug is, they call it the input layer. In the input layer, no computation is happening. The place where the output is being collected from, they call the output layer.
Professor, 1 question: in the input layer also, once we apply the given activation function, won't there be some computation? 01:18:18 - 01:18:33 Dr Anand Jayaraman: There is no activation function in the input layer. The input layer is strictly a multi-pin plug. Okay. x1 is coming in; it is just a holding point from where all of these wires are connected. That's all. 01:18:34 - 01:18:53 Dr Anand Jayaraman: But no computation actually happens in the input layer. Okay. Okay. Now this layer, they call it the hidden layer. Hidden layer, because we don't actually observe what output is coming from here at all. 01:18:53 - 01:19:07 Dr Anand Jayaraman: We are not going to interact with this layer at all. There are going to be weights that are determined, w1, w2, w3; weights are going to be determined. Okay, but we are not going to observe them. It's not our concern. Our only concern is: I'm going to be providing some input here. 01:19:07 - 01:19:30 Dr Anand Jayaraman: And I'm going to be observing the output. I'll keep training this model until the output is correct, until I get, say, 90% accuracy for all my thousand loan candidates, right? I'll keep training the model until I do that. So what is going to happen inside the hidden layer, I will not actually observe. 01:19:30 - 01:19:33 Dr Anand Jayaraman: And that is why the term hidden layer. Sorry, there was 01:19:33 - 01:19:38 Speaker 3: a question. Please go ahead. Professor. Yes. 01:19:41 - 01:20:09 Speaker 2: Sorry, I'm slightly confused and hence I'm asking the same question again. We have these 4 inputs, 1, 2, 3, 4, and the hidden layer has got nodes, right, 1, 2, 3, 4, 5, and each of these nodes is connected to the same inputs 1, 2, 3, 4. My question is: why does node 1, 2, 3, 4, 5 at the hidden layer compute different weights for the same inputs? I didn't understand that when you explained it.
01:20:09 - 01:20:17 Speaker 2: I can understand if the inputs were different, yes, it would compute different weights, but it is exactly the same inputs. Why would it compute different weights? 01:20:17 - 01:20:50 Dr Anand Jayaraman: So I want you to think about it like this, okay. I want you to think about a business problem being given to a team of 5 people, okay. Now, 1 person is very good at analyzing that problem at a surface, business level, but they don't do math. 01:20:51 - 01:21:12 Dr Anand Jayaraman: The second person is a very mathematically oriented person, but they don't think about business. The third person does a different function. The fourth person does a different function. Each person makes a contribution; they all get the same input. They each make a contribution and provide it to the output one. 01:21:12 - 01:21:34 Dr Anand Jayaraman: This is your program manager, who gets all of the outputs and gives you the final output. What we will find is, see, all these 5 are human beings. They are all technically capable of doing everything. But we are specialists. This neuron will be a specialist which specializes in doing 1 thing. 01:21:34 - 01:21:54 Dr Anand Jayaraman: This is another specialist which specializes in doing another thing, and so on and so forth. When the neural network is trained, it will automatically decide that 1 neuron is going to specialize in this, the next one specializes in that, and so on and so forth. How does this actually happen? Do not worry about it yet. We will come there, we will come to the training part. 01:21:55 - 01:22:01 Dr Anand Jayaraman: But the final trained neural network has this bunch of different specialists. 01:22:03 - 01:22:05 Speaker 2: Yes, understood it, sir. Thank you. 01:22:05 - 01:22:09 Dr Anand Jayaraman: Right. And the way you specialize is by having a different set of weights.
01:22:13 - 01:22:17 Speaker 5: So can there also be a different activation function at each and every layer? 01:22:17 - 01:22:37 Dr Anand Jayaraman: Potentially you can have a neural network where each one has a different activation function, but that is really not done. All of these neurons are exactly the same neurons, and in this case they're all going to be sigmoid neurons. The only thing that's different is the weights. The weights are going to be different. 01:22:39 - 01:22:53 Speaker 6: Professor, the other way we can think of it, I'm just, since you were saying, so I was just thinking about an example. So, 7 blind men and an elephant, right? So each one of them is touching and saying it is a tail, it is this, and then finally we get the elephant. 01:22:53 - 01:23:12 Dr Anand Jayaraman: Absolutely, yeah. There you are trying to understand it, right. Here, what you're doing is, imagine building a house, right. These 5 people are building different portions of the house, right? Each one is a specialist in a different activity. 01:23:12 - 01:23:33 Dr Anand Jayaraman: 1 person is a bricklayer, another one does the cement, the other guy is a specialist in doing something else, right? There are 5 different specialists. All their contributions are going to be needed. Yeah. Now, this is a neural network. 01:23:33 - 01:24:04 Dr Anand Jayaraman: Can I also... It's important just to make 1 point and then move on, right. This is the way in which all textbooks represent a neural network, and I hate it, absolutely hate it, because they put these circles for neurons, right? This is a neuron, this is a neuron, right? But it's absolutely idiotic that they use the same size circles here, except these are not neurons. 01:24:05 - 01:24:06 Dr Anand Jayaraman: What are these? 01:24:07 - 01:24:08 Speaker 6: Input values. 01:24:09 - 01:24:24 Dr Anand Jayaraman: Yeah, this is just an input node. This is a multi-pin plug. Right, so in my diagram I chose to put a pin there, a square one there.
In this one, there is no computation at all. There is no activation function, nothing. 01:24:25 - 01:24:38 Dr Anand Jayaraman: It's just a multi-pin plug. It's a node from which all connections can be drawn. That's all. Okay. So just remember that when you see the diagram: this is the input layer, no computation is happening. 01:24:38 - 01:24:56 Dr Anand Jayaraman: The first computation happens here, in the hidden layer. This is where the first computation happens. And then each one is going to compute and give some output. This one collates all of the outputs and uses a set of weights to determine what the final output should be, based on all of them. 01:24:59 - 01:25:09 Speaker 7: Professor, 1 question here. Which input then gets triggered? I mean, we are talking about no computation, right? But there are multiple inputs that we see, input 1, 2, 01:25:09 - 01:25:10 Speaker 3: 3, 4. 01:25:10 - 01:25:20 Speaker 7: So what determines which input we start with? Do we start with input 1 or input 2? There must be some determination there, right? I mean, something must be triggering. 01:25:20 - 01:25:35 Dr Anand Jayaraman: Absolutely. All of these values are sent to all of them. Right? Imagine this to be, you know, a 5-person interview panel. Right? 01:25:36 - 01:25:56 Dr Anand Jayaraman: Now, you are the candidate; all 4 inputs represent the candidate. Now what are these inputs? 1 of them is someone's dressing sense when they come in. Okay. Another feature they're evaluating you on is your language capability. 01:25:56 - 01:26:18 Dr Anand Jayaraman: Another one is your analytical capability. Another one is your stress handling capability. Ultimately, all 4 inputs are coming from this 1 data point, 1 person. There are 4 features; each one of us has different features. I have strength in 1, I have weaknesses in another. These are all different features, but all of them come from 1 point.
01:26:18 - 01:26:37 Dr Anand Jayaraman: Now, all of those are being absorbed, are being sent to this evaluator. Now, this evaluator might place more value on your dressing sense and analytic capability. This 1 might use a different combination. This 1 might use yet another combination. Each person will send their vote through. 01:26:37 - 01:26:43 Dr Anand Jayaraman: This person collates it and then gets your final decision. Is that a better way of thinking? 01:26:44 - 01:26:46 Speaker 7: Yes, Absolutely, yeah. 01:26:46 - 01:26:48 Dr Anand Jayaraman: But all of these are features. 01:26:52 - 01:26:52 Speaker 3: Yeah. 01:26:53 - 01:27:06 Speaker 5: So can the hidden layer be trained differently with the same input? Like, for example, the number of iterations for hidden layer 1 could be many, and number 2 could be less than that, so that we get different values for weight. 01:27:08 - 01:27:13 Dr Anand Jayaraman: So how exactly do we train? We will come there. Right? All I'm saying is it can 01:27:13 - 01:27:14 Speaker 5: be trained. 01:27:17 - 01:27:24 Dr Anand Jayaraman: Right? It can be trained and I'll come to it. I still haven't delivered the punchline yet. This is a neural network. Right. 01:27:24 - 01:27:46 Dr Anand Jayaraman: Now, how do we code this in practice? Right. It's a complicated neural network, looks complex. How do we code it in practice? You know what I told you, right: although you don't need to learn coding for this 1, I want you to be able to see code and understand this code. 01:27:47 - 01:28:15 Dr Anand Jayaraman: Right. First thing is this. Let me introduce some terms. Okay. This is a feed-forward, densely connected neural network. 01:28:16 - 01:28:36 Dr Anand Jayaraman: Okay. This is a feed forward densely connected neural network. Feed forward: information is flowing from this layer to that layer, from that layer to that layer.
Excuse me: information is necessarily flowing in only 1 direction. And so we call this a feed forward network. 01:28:36 - 01:28:58 Dr Anand Jayaraman: There are networks where, instead of information feeding in only 1 direction, feedback is also sent back to the previous layer. There are neural networks like that. Okay, we are not talking about that yet. But in this neural network there is no such feedback. So we are just calling it a feed forward: from 1 layer to another layer information is flowing. 01:28:58 - 01:29:18 Dr Anand Jayaraman: So it's a feed forward neural network. Right. Next term, densely connected. What does densely connected mean? Densely connected means a neuron in a given layer is connected to all the neurons in the previous layer. 01:29:19 - 01:29:50 Dr Anand Jayaraman: Like this neuron, right? There are 5 neurons in the previous layer, it is connected to all 5 neurons, right? That kind of neural networks, that kind of neurons, are called densely connected neurons, right. If a given neuron is connected to all the neurons in the previous layer, that is called a densely connected neuron, and then we will see that there are situations where such a thing is not true, right. So, there are other types of neurons as well. 01:29:51 - 01:30:19 Dr Anand Jayaraman: So, that is where this term comes from: feed forward densely connected neural network. Now, let us go ahead and look at this code. How do we define this in Python? Okay, you know, this complicated creature, this complicated messy diagram, actually does not take much to write in Python. The way you write it is just those 3 lines. 01:30:20 - 01:30:52 Dr Anand Jayaraman: Let me remove all my annotations and discard and then present it again. There, those 3 lines are all that is needed to define this neural network. How do I define this? I start off by telling, I am going to define a neural network model in a sequential way.
I am going to start describing the model 1 layer at a time. 01:30:53 - 01:31:14 Dr Anand Jayaraman: Okay, that's what this command means: I'm going to define it in a sequential way. Right now I'm declaring that I'm starting a model that I will define sequentially. So to this model that I just defined in the previous line, I want to add 1 layer. What layer am I adding? A layer of dense neurons. 01:31:15 - 01:31:28 Dr Anand Jayaraman: Do we still remember what are dense neurons? Yes. Neurons that are fully connected to the previous layer. Yes. And how many neurons do I want to add? 01:31:28 - 01:31:41 Dr Anand Jayaraman: 5 neurons: 1, 2, 3, 4, 5. So, I am adding 5 neurons, I am creating a layer with 5 neurons. Now, this neuron has inputs. How many inputs does it need? 4. 01:31:41 - 01:31:58 Dr Anand Jayaraman: 4 inputs. So it's got 4 inputs. And for all of these neurons, I want to have an activation function of sigmoid. Clear? This 1 line nicely describes this particular layer. 01:31:59 - 01:32:24 Dr Anand Jayaraman: Okay. Now, next, I am saying, create the next layer, also of dense neurons, but I need only 1 of them, and activation is sigmoid. Notice, I do not necessarily tell how many inputs are needed for this neuron. Why? Why do I not need to tell that? 01:32:27 - 01:32:28 Dr Anand Jayaraman: Because I'm adding it sequentially. 01:32:28 - 01:32:29 Speaker 3: So 01:32:29 - 01:32:38 Dr Anand Jayaraman: it knows that the next 1 is that, and I've already said it's a dense neuron. So it knows that all of the previous 1s will connect to it. Is that clear? 01:32:38 - 01:32:40 Speaker 3: Okay. The 01:32:40 - 01:32:43 Speaker 8: machine will understand, that's what you're trying to say, sir. 01:32:44 - 01:33:03 Dr Anand Jayaraman: It will automatically. Yeah. You don't need to specify the input. Because once you describe this next 1, we know it's going to connect to all of the previous 1s, because it's a dense neuron. Now, and this is the way we define neural networks.
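As described, the whole diagram comes down to 3 lines of Keras-style code. Below is a hedged reconstruction of those lines (names like `Sequential`, `Dense`, and `input_dim` are my assumption of the API being shown, not a copy of the slide), followed by a plain-NumPy sketch of the same forward pass with random, untrained weights:

```python
import numpy as np

# The 3 Keras-style lines described in the session would read roughly:
#   model = Sequential()
#   model.add(Dense(5, input_dim=4, activation='sigmoid'))
#   model.add(Dense(1, activation='sigmoid'))
# Below, the same forward pass in plain NumPy: 4 input pins, a hidden layer
# of 5 densely connected sigmoid neurons, and 1 sigmoid output neuron.

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 5))   # parameters: input -> hidden weights
b1 = rng.normal(size=5)        # hidden-layer intercepts
W2 = rng.normal(size=(5, 1))   # parameters: hidden -> output weights
b2 = rng.normal(size=1)

x = np.array([0.2, 0.7, 0.1, 0.9])   # 1 candidate, 4 features
hidden = sigmoid(x @ W1 + b1)        # first computation happens here
output = sigmoid(hidden @ W2 + b2)   # output neuron collates the 5 votes
print(output.item())                 # a single value between 0 and 1
```

The second dense layer needs no input size: because the model is built sequentially, its inputs are exactly the 5 outputs of the previous layer, which is what the matrix shapes above encode.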
01:33:05 - 01:33:30 Dr Anand Jayaraman: Now, why is there so much excitement about neural networks? There is so much excitement about neural networks because no matter how complicated a dataset you have. Here is a complicated dataset. Now, how do I draw a separating boundary between those? Clearly I can't draw 1 line which separates the classes. 01:33:32 - 01:33:56 Dr Anand Jayaraman: Right? There is a theorem which tells you that no matter how complicated a data set you have, you can classify to the desired level of accuracy using a 1 hidden layer neural network. Just 1 hidden layer is needed. Okay. How many neurons should be there in that hidden layer? 01:33:57 - 01:34:17 Dr Anand Jayaraman: We don't know yet. You can try it with 5. If that doesn't work, then try it with 8. If that doesn't work, try it with 10 neurons. But there is a finite number of neurons which will be able to solve the problem to the desired level of accuracy, no matter how complex it is. 01:34:19 - 01:34:52 Dr Anand Jayaraman: Extremely powerful theorem, extremely powerful theorem. Why is that even reasonable? It is reasonable because what does each neuron do? Each neuron effectively is responsible for drawing a line: this neuron perhaps might draw a straight line, another neuron might draw this line, yet another neuron might draw this line, and so on and so forth. It can completely separate out the filled dots from the hollow dots. Right? 01:34:52 - 01:35:17 Dr Anand Jayaraman: Underneath, that's what is happening. But you don't even need to be able to visualize the data. You are assured, there is a mathematical theorem that tells you, no matter how complicated a problem, a neural network, a feed-forward neural network with a single hidden layer, can solve that. Amazing right, how powerful a theorem, right. No matter how complicated a problem, right.
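A minimal sketch of the "each neuron draws a line" intuition (all weights hand-picked for illustration, not trained): three steep sigmoid neurons each vote for one side of a line, and the output neuron fires only when all three agree, carving out a triangular region that no single straight line could separate.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Each row is one hidden neuron's "line" in the 2-D feature space.
lines = np.array([[ 10.0,   0.0],    # fires on the x > 0.2 side of a vertical line
                  [  0.0,  10.0],    # fires on the y > 0.2 side of a horizontal line
                  [-10.0, -10.0]])   # fires on the x + y < 1.6 side of a diagonal line
intercepts = np.array([-2.0, -2.0, 16.0])

def inside(x, y):
    h = sigmoid(lines @ np.array([x, y]) + intercepts)  # one vote per line
    return sigmoid(20 * h.sum() - 50) > 0.5             # all 3 must fire

print(inside(0.5, 0.5))   # center of the triangle -> True
print(inside(1.5, 1.5))   # outside the triangle -> False
```

Training would discover such lines automatically; here they are fixed by hand just to show why stacking line-drawing neurons can cut out an arbitrarily shaped region.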
01:35:19 - 01:35:47 Dr Anand Jayaraman: I want to model the New York Stock Exchange, the performance of all the stocks, and see whether I can predict the next 6 months' performance or not based on a bunch of features. Can I solve that problem using a neural network? Absolutely. There is a finite number of neurons which will solve that problem. Can we predict for Tesla and the airline stocks right away please? 01:35:49 - 01:35:57 Dr Anand Jayaraman: Absolutely, absolutely we can do that. My Google Pay number is this. Once I get the money, we can totally do that. 01:35:59 - 01:36:11 Speaker 2: So Professor, the number of nodes, or the nodes, how much it should be, whether it should be 5, 8, 10, that kind of thing, is it like arbitrarily taken and then optimized, or is it. Correct, 01:36:12 - 01:36:34 Dr Anand Jayaraman: absolutely right. It is arbitrarily decided. This is called a hyperparameter. You know, we know each other now for 2 classes, we are friends. So I'm very comfortable using that word: arbitrarily decided. 01:36:35 - 01:36:49 Dr Anand Jayaraman: You randomly try out different things. But you don't sound intelligent when you say it's arbitrarily decided. So, like neural network, machine learning engineers, they have come up with a fancy term. We call it, it's a hyperparameter. What is a hyperparameter? 01:36:50 - 01:36:57 Dr Anand Jayaraman: It's a value you try: you try multiple values and figure out how much is needed for this business case. It's a hyperparameter. 01:37:01 - 01:37:04 Speaker 8: So the hyperparameter is nothing but the dense value that you are giving. 01:37:04 - 01:37:21 Dr Anand Jayaraman: The number of neurons is a hyperparameter. The number of neurons is a hyperparameter. So, you try trial and error and figure it out. Why is it called a hyperparameter and not just a parameter? There is something called a parameter first.
01:37:21 - 01:37:51 Dr Anand Jayaraman: The parameters are the values of the weights that get determined during training. Those you are not randomly trying to figure out. There is a training algorithm; you issue the train command, and it will automatically figure out what set of weights should be there for each. Parameters are things that are determined during training, determined automatically during training. In fact, training is the process of determining parameters. 01:37:53 - 01:38:09 Dr Anand Jayaraman: But a hyperparameter is a design decision that's decided before, when the network is being created. Training hasn't happened. I'm going to create a network with 3 neurons. That's a hyperparameter. Then you can train it, and you determine the parameters. 01:38:09 - 01:38:20 Dr Anand Jayaraman: Next, I'm going to design a neural network with 7 neurons. That's a hyperparameter. The design decision is before training. So those are all hyperparameters. 01:38:22 - 01:38:40 Speaker 4: Sorry, this is Tarek speaking. Yeah, I have a question. Let's say that you decided to design the neural network and how many layers you need. And then the training did not give good results. At that time, you have to figure it out again and go back to the design, right? 01:38:40 - 01:38:42 Speaker 3: Correct. OK. 01:38:42 - 01:38:59 Dr Anand Jayaraman: Exactly. But you are assured that there is a finite number; it will never be a situation that you need an infinite number of neurons. Knowing that is reassuring. So you can slowly increase the complexity of the network and figure it out. And we will do all of this next class. 01:38:59 - 01:39:05 Dr Anand Jayaraman: In next class, we will play with the neurons and figure out what should be the level of complexity needed. 01:39:05 - 01:39:20 Speaker 8: Professor, 1 question here on the biological 1, sorry, the biological neuron: right after the axon, it connects to 1 more neuron where it takes the decision, right?
So, in the, I forgot the name, synaptic or something you said 01:39:20 - 01:39:29 Dr Anand Jayaraman: yeah, the connection is called the synaptic connection; the lines are all synaptic connections 01:39:31 - 01:39:32 Speaker 8: it's not the axon, professor? 01:39:33 - 01:39:54 Dr Anand Jayaraman: The axon, what we are representing here is only synaptic connections. The entire cell I'm representing as a circle. And I'm just saying the connection between this neuron and that neuron; I'm just saying this neuron is connected with that neuron, and that line will represent the strength of the connection. 01:39:58 - 01:40:13 Speaker 5: The number of hidden layer neurons that we are talking about, right? We're talking about 1, like you said 5, 8 and all. So is it a Fibonacci that we have to go with, like, you know, 1, 3, 5, 8? We go in that way, or there is any other way that we? 01:40:13 - 01:40:18 Dr Anand Jayaraman: It can be any number. It can be odd, it can be even, it needs to be an integer. That's all. 01:40:18 - 01:40:22 Speaker 5: OK, cool. OK, I was thinking it is only like some pattern that we need to follow. 01:40:22 - 01:40:31 Dr Anand Jayaraman: No, no, no. There is absolutely no pattern. It's basically, depending on the complexity of the problem, we might need more or we might need less. 01:40:32 - 01:40:32 Speaker 2: Professor, I 01:40:32 - 01:40:33 Speaker 3: think it's 01:40:33 - 01:40:34 Speaker 2: Opal here. If you 01:40:34 - 01:40:40 Speaker 3: are talking about hyperparameters, you are also talking about things like number of layers can change, activation function, there are 01:40:40 - 01:40:51 Dr Anand Jayaraman: many kinds. Correct. All of those are hyperparameters. I don't want to confuse you yet by mentioning all of that. So right now I'm just introducing 1 hyperparameter, which is the number of neurons. 01:40:51 - 01:40:57 Dr Anand Jayaraman: But you are absolutely right, there can be more than 1 hyperparameter.
In fact, in neural networks, there are lots of hyperparameters. 01:41:00 - 01:41:18 Speaker 9: Professor, Opal here, I think my question is similar to John's. So it could also differ by different classes, correct? It could be by network structure, learning patterns, regularization effects, and a lot of parameters can come into the context of hyperparameters, correct? 01:41:19 - 01:41:36 Dr Anand Jayaraman: Absolutely. There are lots of hyperparameters. Now, we'll come to that. We'll come to them slowly. And this is to be thought of as a gentle introduction to neural networks, where you're going at a slow, leisurely pace. 01:41:36 - 01:41:39 Dr Anand Jayaraman: And then suddenly the rush of all of those things. 01:41:39 - 01:41:40 Speaker 9: Wait for that tsunami. 01:41:41 - 01:41:48 Dr Anand Jayaraman: Okay, thank you. Yeah. Yeah. But this is perhaps a time for break, but yes. Yeah, that's right. 01:41:48 - 01:41:54 Dr Anand Jayaraman: That's right. It's a break time, right? So I just wanted to remind you. Yeah, yeah. Are there any questions? 01:41:55 - 01:42:03 Dr Anand Jayaraman: I love this. It's wonderful. This is 1 of those discussion points. We usually spend a lot of time. So I love all the questions that are coming in. 01:42:03 - 01:42:14 Dr Anand Jayaraman: I don't want to stop for a break if there's a question. If there's a question, let's address that and then we'll go for a break. But I won't do any new slides now. Any questions? Professor, can we have your email please? 01:42:16 - 01:42:30 Dr Anand Jayaraman: Absolutely. I think it should be there in the first slide of the deck. You should get the deck soon. I apologize for the delay. But you will have it. 01:42:30 - 01:42:47 Dr Anand Jayaraman: And the first slide will have my email address. Thank you. OK, let's take a 10-minute break. So just please look at your watch. It's, for me, 8.05 India time, so 8.15. 01:42:47 - 01:42:51 Dr Anand Jayaraman: And John, please adjust that to your time zone.
01:42:53 - 01:43:15 Speaker 2: Professor? Yes. Hi, this is Bhaskar here. I have a question which is not necessarily related to the current 1, this 1, but there is a business problem at some of my work, and I wanted to check whether deep learning is the right 1 to use it or something different. So can I send an email to you with respect to that and maybe take your guidance on that? 01:43:15 - 01:43:16 Speaker 2: Would that be okay? 01:43:17 - 01:43:44 Dr Anand Jayaraman: Yeah, we can we can chat about that. Let's towards the end of the class, you know, just describe in 1 or 2 sentences and then I'll tell you whether deep learning is needed or not. Actually, I can tell you right away, if the data is something that is there in an Excel sheet, you don't need deep learning. If it's structured data, you don't need deep learning at all. 01:43:47 - 01:43:58 Speaker 2: Okay, Yeah, it is unstructured data and something related to the IoT sensors coming in the data coming in from there. And that is the reason I was checking. Sure, we can talk about that at the end of the session, 01:43:58 - 01:44:03 Speaker 3: end of the class. Yeah. Thanks. Yeah, thank you. 10 minute break. 01:44:03 - 01:51:15 Speaker 3: Transcribed by https://otter.ai 01:51:48 - 01:52:18 Dr Anand Jayaraman: Hello everyone. Hope you got a chance to get a cup of coffee or something. We have 1 more hour and We'll wrap up the neural networks session by that. Any questions? Are people back or people are still enjoying their coffee. 01:52:22 - 01:52:24 Speaker 8: Professor we are back. 01:52:30 - 01:53:14 Dr Anand Jayaraman: I forgot to put my headset. I wasn't sure which way I put it if you are speaking earlier. Okay, so what we have done is we have introduced a neural network right and there are and this is just 1 type of architecture for a neural network right.
As I mentioned before, right, the dense neuron is where there are connections coming from all the neurons in the previous layer. This is a dense neuron. 01:53:14 - 01:54:09 Dr Anand Jayaraman: But then there are other neurons where, you know, you don't have connections from all of the neurons in the previous layer but only from some selected neurons, right. Instead of having these 5 connections it might have only 3, and there might be another neuron which might have connections to the other 2, right. Those kinds of neurons also exist, and those kinds of localized connections are where we will talk about CNNs, or convolutional neural networks. Over here are feed-forward neurons, right, where information from 1 layer is moving to the forward layer strictly without any feedback. But there are also neurons where you have feedback coming in. 01:54:12 - 01:54:53 Dr Anand Jayaraman: And those kinds of neural networks are called recurrent neural networks. And those kinds of neurons are useful in time series type of problems or language problems, where my net's output is dependent on what was the previous output. The feedback is needed. The way I'm talking, I pause at 1 word, usually because, you know, I've already said a bunch of words. Now the next word needs to be grammatically correct and also the meaning should be correct. 01:54:54 - 01:55:20 Dr Anand Jayaraman: And so the next word, the output, depends on what was said before. And that kind of thing can be modeled with recurrent neural networks. But for a while now, for the next 5, 6 classes, we are going to be working only with dense neural networks, dense neurons, neurons which are connected to all the neurons in the previous layer. Any questions? 01:55:23 - 01:55:25 Speaker 8: You mean feed forward, right? 01:55:25 - 01:55:31 Dr Anand Jayaraman: Feed forward, yeah, feed forward densely connected neural network. 01:55:31 - 01:55:32 Speaker 3: Okay.
01:55:32 - 01:55:34 Dr Anand Jayaraman: Yeah. How do we know whether 01:55:34 - 01:55:36 Speaker 5: the problem is linear or non-linear? 01:55:39 - 01:55:39 Speaker 8: So generally 01:55:41 - 01:55:59 Dr Anand Jayaraman: we do not know. So you try it. You basically try different. If a hidden layer is not even needed, start trying with only a single neuron. And if it's able to get good accuracy, then you know that it's a linear problem. 01:56:01 - 01:56:20 Dr Anand Jayaraman: Right? Unfortunately, in order for us to know whether it's linear or not, we need to be able to visualize the data. And we can, our ability to visualize stops the moment we have more than 2 features. At 2 features, I can visualize. I can visualize the feature space. 01:56:21 - 01:56:46 Dr Anand Jayaraman: But more than 2 features I cannot visualize the feature space. So generally I don't know whether it's a linear problem or not. The only way I know is by trying using a linear model, if I'm able to correctly classify, get high accuracy, then I know that it's a linear problem. If accuracy is very poor, then I know that I'm dealing with a nonlinear problem and I need to have a hidden layer with some number of units. 01:56:47 - 01:56:51 Speaker 3: Well, sometimes they use the QQ plot and things like that. 01:56:57 - 01:57:16 Dr Anand Jayaraman: So there are ways to plot higher dimensional data. Right. But those are all necessarily, it's a projection. Right. It's a projection. 01:57:17 - 01:57:41 Dr Anand Jayaraman: And depending on the view that is chosen, depending on the parameters that are chosen, you always have an incomplete view of the data. So it's not a guaranteed way of telling whether the problem is linear or not. The easiest way is just fit a linear model. If you get good accuracy, it is linear. That's it. 01:57:45 - 01:58:11 Dr Anand Jayaraman: And in pretty much all problems, we start off with linear problem. You want to try the simplest model possible. Okay. 
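The "fit a linear model first" logic can be sketched on XOR, the textbook example of a nonlinear dataset: no single neuron (a linear separator) can split its classes, but 1 hidden layer can. The weights below are hand-picked for illustration, not trained.

```python
import numpy as np

def neuron(w, b, x):
    """A sigmoid neuron: squashes w.x + b into (0, 1)."""
    return 1 / (1 + np.exp(-(np.dot(w, x) + b)))

# XOR: class 1 for (0,1) and (1,0), class 0 for (0,0) and (1,1).
# A single neuron can only draw 1 line, so it fails; 1 hidden layer
# with hand-picked (illustrative, untrained) weights succeeds.
def xor_net(x):
    h1 = neuron([20, 20], -10, x)            # fires if x1 OR x2
    h2 = neuron([20, 20], -30, x)            # fires if x1 AND x2
    return neuron([20, -20], -10, [h1, h2])  # h1 AND NOT h2 == XOR

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(float(xor_net(x))))       # prints 0, 1, 1, 0
```

This mirrors the workflow above: a linear model on XOR would plateau at poor accuracy, which is exactly the signal that a hidden layer is needed.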
Now, here is a general structure of a neural network, right? Generally, you might have 1 or more hidden layers in a neural network, like the architecture I'm showing. 01:58:14 - 01:58:57 Dr Anand Jayaraman: For now, I forbid you from thinking about more than 1 hidden layer. There is a mathematical theorem which tells you that no matter how complicated a problem, you need only 1 hidden layer, right. So 90% of the problems you will be able to satisfactorily address with just 1 hidden layer. There are some very small set of problems where you might want to go to 2 hidden layers, right. But almost never do you want to do more than that for any of these standard business problems that you face, right. 01:58:57 - 01:59:37 Dr Anand Jayaraman: So let's for now stick with only 1 hidden layer. But in terms of architecture, this is the way the architecture would work. Now again, there's no computation in the input layer; this is just a simple pin that's used to connect to other neurons. Each 1 of these layers is a computational unit, and those computational units are all sigmoid neurons, right. The activation function: you do not bother changing the activation function, we just keep all of them as is, right. 01:59:37 - 01:59:59 Dr Anand Jayaraman: Now the only place where you might want to consider anything other than a sigmoid neuron is in the output neuron, right. Depending on the choice you make on the output neuron, the output could essentially be restricted between 0 and 1, which is a sigmoid. 02:00:02 - 02:00:30 Dr Anand Jayaraman: Or, there are situations when you want outputs of some numerical values that exceed this range of 0 to 1, right. How many solar storms do we anticipate in the next year, right. It's a number that can be greater than 1, right? And that is a regression problem. You're trying to get the numerical value estimate out.
02:00:30 - 02:00:59 Dr Anand Jayaraman: And for that, sigmoid will not be enough. So you would use a linear activation function. When you use a linear activation function, it will happily give you any output between minus infinity to plus infinity. So the output layer: you choose the output layer based on what kind of problem you're solving. If you're solving a classification problem, the output activation function will be a sigmoid activation function. 02:01:00 - 02:01:16 Dr Anand Jayaraman: But if you are solving a regression problem, for the output layer you use a linear activation function. That is pretty much the only change, right? But all the hidden layers, you will just leave as a sigmoid activation function. Is that clear? 02:01:17 - 02:01:18 Speaker 2: But the question… I think 02:01:18 - 02:01:19 Speaker 3: we are not into here. 02:01:19 - 02:01:21 Speaker 4: Binary classification, right? 02:01:23 - 02:01:30 Dr Anand Jayaraman: Absolutely right. It is binary classification. We'll talk about complexities in a bit. 02:01:31 - 02:01:50 Speaker 3: Bhartendu, you had a question. Just coming back to the brain analogy, you're saying most, almost all problems can be solved with 1 hidden layer. How many hidden layers effectively does the processing in our brain use? Because with billions of neurons, I'm assuming there would be thousands of hidden layers. 02:01:50 - 02:01:51 Dr Anand Jayaraman: Exactly. And 02:01:51 - 02:01:55 Speaker 3: why does it need so many hidden layers? What is it? 02:01:55 - 02:02:18 Dr Anand Jayaraman: We'll talk about that. Again, that comes later; we've just introduced our hero. So now we are talking about more complex plot details: I don't know, the hero having an affair, and there's somebody else taking photographs, and all kinds of complexities are there. We'll get there in a bit. Right now we'll just introduce the hero.
02:02:19 - 02:02:35 Speaker 5: So Professor, when it is linear in the final neuron, you are saying the output can be from minus infinity to infinity, right? Correct. Then if all the inputs are 0 and 1, then how would the output be a kind of a continuous thing? 02:02:35 - 02:03:14 Dr Anand Jayaraman: Good question, because the weights can be anything. So, let us say the output of this neuron is o1, the output of this is o2, and o3. Then this particular neuron will have w1 o1 plus w2 o2 plus w3 o3 plus a constant, an intercept. This is what is going to be inside that. Now, for a linear function, it's linear, right? 02:03:14 - 02:03:46 Dr Anand Jayaraman: So, the output is just this; that's y, the output, right? Depending on what the weights are, you can have a strong negative number or a large positive number, right. So, there is no restriction in that sense, but yeah, good question, good question. Can you give us a practical example to understand this? We are coming there, we are coming there. 02:03:47 - 02:03:56 Dr Anand Jayaraman: Soon I am going to start showing you some examples of how exactly we design a neural network to solve 1 practical problem. 02:03:57 - 02:03:58 Speaker 5: We're getting there, right? 02:03:58 - 02:04:14 Dr Anand Jayaraman: In tomorrow's class, I will actually solve a business problem. Today, we are just going to introduce the architecture; we are still focused only on architecture. In tomorrow's class, we'll actually solve a business problem. 02:04:14 - 02:04:35 Speaker 6: Professor, to understand better, you said weights, right? Weights, constants, intercepts, right? I'm hoping that in tomorrow's session, if you can give what the weight means for a given problem statement, that would really help, because probably I'm nonmathematical.
I'm not able to visualize what is weight, what is intercept; just by graph I'm understanding. Any business problem would really help. 02:04:36 - 02:05:22 Dr Anand Jayaraman: Fine, so here is, let's do a quick 1 then. I want you to understand it for linear regression, right. We'll understand it for linear regression, and then it's true for all other problems, right. When you do linear regression, let me see if I can find the mtcars data set. Yeah, I'm sure you have done that mtcars data set with linear regression, right? All right. So there, the slope coefficients that you are getting: we called it a slope when you were considering linear regression. In a neural network, those coefficients we call weights, that is all. 02:05:23 - 02:05:29 Dr Anand Jayaraman: It is basically numbers that are multiplying your predictors. 02:05:31 - 02:05:43 Speaker 6: So, in exploratory data analysis, when we load the data, we say the data frame, or if I call dot coefficient, it gives me the coefficient of every feature, right. Is that what you are referring to here, professor? 02:05:44 - 02:06:18 Dr Anand Jayaraman: Exploratory data analysis is different. Now, exploratory data, let us make sure that we understand these terms correctly, that your understanding and my understanding are the same. I'm not saying you have to say this, I'm just saying, because different people use it differently. I am very, very sorry guys; unfortunately today's class will exceed the 9-10 time, we will go beyond that. I want to make sure that these points are addressed. 02:06:19 - 02:06:25 Dr Anand Jayaraman: Right now here is a data set. Right. When I am trying to do 02:06:27 - 02:06:27 Speaker 6: you know 02:06:28 - 02:07:06 Dr Anand Jayaraman: a summary of this, descriptive statistics for example, I click and I am trying to understand what this data is. Labels in the first row, new worksheet and summary statistics.
Now it is telling me that I am understanding the mileage per gallon: the mean value is this, and the minimum is this, and the maximum is this. So, how many data points are there? This is my exploratory data analysis. 02:07:06 - 02:07:35 Dr Anand Jayaraman: I am trying to understand what is there in the data. Right. Now model building is different. In model building, what I'm trying to do is, I'm asking the question: here is this data, right, and I'm interested in predicting, for example, mileage per gallon. I want to understand how the weight, horsepower and qsec are related to mileage per gallon. 02:07:35 - 02:08:11 Dr Anand Jayaraman: How will it enable me to predict mileage per gallon? That is model building. Essentially I'm asking, given that, let's say I have a new car that is being designed, right. Well, this new car is a 4 ton car, and its horsepower is 150, and the time it takes to go 1 quarter mile is 15 seconds. Tell me, what is the mileage per gallon of that car? 02:08:12 - 02:08:28 Dr Anand Jayaraman: Right, I'm giving this information and I'm asking you to predict that. So in order to be able to do that, what I'm going to do is, based on all of this past data, I'm going to understand the relationship between weight, horsepower and qsec. Right? That is model building. Right? 02:08:28 - 02:08:53 Dr Anand Jayaraman: That's model building. And the way you do model building, as I'm sure you have done before when you did your linear regression, is: in Excel I go over here, click on this regression option and I click OK, and I say that y value, this is the set of numbers that I'm interested in predicting, okay. And what is the x value? These are the values using which I want to predict it. So, these are the set of values with which 02:08:53 - 02:08:54 Speaker 2: I am predicting it. 02:08:54 - 02:09:07 Dr Anand Jayaraman: And I am saying that the first row has labels. So, there are labels, and put the output in a new worksheet.
I click that. So, it gives me this particular output. And this is the way I do linear regression. 02:09:07 - 02:09:27 Dr Anand Jayaraman: And the linear regression now, it is saying, you see these numbers, it gave these coefficients, right. Let us go back and color them with some other color. It has these coefficients. I want you to pay attention to those coefficients. So the coefficients are now, it's saying that mileage per gallon is equal to 27.6. 02:09:28 - 02:10:01 Dr Anand Jayaraman: You see that number, the first number, 27.6, minus 4.35 times the weight of the car, minus 0.01 times the horsepower of the car, and then plus 0.51 times the qsec of the car. This is what is going to give me this formula that I have here. 02:10:01 - 02:10:28 Dr Anand Jayaraman: This is what is going to give me the mileage per gallon. Right, I figured it out now. So for this new car that I wrote here, I can just plug these values into this formula and I'll be able to predict the mileage per gallon. Now, these are coefficients. This is what in neural networks we call weights. 02:10:29 - 02:10:34 Dr Anand Jayaraman: What does this coefficient, negative coefficient mean? Can someone tell me? Minus 4.35, what does that mean? 02:10:36 - 02:10:37 Speaker 5: It's a reverse effect. 02:10:39 - 02:11:17 Dr Anand Jayaraman: Excellent. Negative means it's a reverse effect, or it is inverse correlation, but you can quantify it even more. For every 1 ton increase in weight, the mileage per gallon will come down by 4.35. So what might previously have been a 25 miles per gallon car, when you increase its weight by 1 ton, will now come down by 4.35. So the weights, or in this case what we call a slope coefficient: that's exactly what weights are in neural networks.
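The fitted equation read off the regression output can be used directly as a prediction function (coefficients exactly as quoted in the session; a fresh mtcars fit would differ in the later decimals):

```python
# mpg = 27.6 - 4.35*wt - 0.01*hp + 0.51*qsec   (coefficients as read out above)
def predict_mpg(wt, hp, qsec):
    return 27.6 - 4.35 * wt - 0.01 * hp + 0.51 * qsec

# The hypothetical new car from the discussion: weight 4, 150 hp,
# quarter mile in 15 seconds.
print(predict_mpg(wt=4, hp=150, qsec=15))
```

Plugging in the new car gives about 16.35 miles per gallon, and increasing `wt` by 1 drops the prediction by exactly 4.35, which is the "reverse effect" reading of the negative coefficient discussed above.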
02:11:17 - 02:11:24 Speaker 7: Yeah, can you say, Professor, that the lighter the car is, the less the consumption is? 02:11:24 - 02:11:41 Dr Anand Jayaraman: Correct, exactly. That's the understanding we have. See, this does not even know whether we are talking about cars or not. It does not know any physics at all. It looked at the data and automatically figured out that the heavier the car, the lower the mileage per gallon is going to be. 02:11:42 - 02:11:58 Dr Anand Jayaraman: Right? It's reverse engineering; it's figuring out the physics from the data, which is the reason why machine learning is very powerful. You don't even need to understand the physics. Look at the data and you can figure out those relationships back from the data. Right. 02:11:58 - 02:12:16 Dr Anand Jayaraman: And so this is what coefficients are, and that's what we mean over here when we are talking about weights. During training, the machine learning algorithm will automatically figure out what the weights of the neurons are, right. Essentially they are just coefficients, slope coefficients. 02:12:17 - 02:12:23 Speaker 7: I think, Professor, the big problem we have is that till now there is no explanation of how it figures them out, right? 02:12:24 - 02:12:34 Dr Anand Jayaraman: Correct. That's fine. All I'm saying for now is that during training, they get figured out. That's all. As for how it figures them out, 02:12:34 - 02:12:44 Dr Anand Jayaraman: we'll talk about it. We're getting there. Right now we're just introducing the architecture. Right. I'm just saying that there will be an algorithm which will figure it out for me. 02:12:44 - 02:12:57 Dr Anand Jayaraman: But we'll come to it. Thank you, Professor. That's a useful question. Yeah. So the next thing we have to do is talk about the output layer some more.
02:12:58 - 02:13:09 Dr Anand Jayaraman: The activation function for the output layer is going to be a linear activation function for regression problems. If I want to do classification problems, what should I use as the activation function? 02:13:12 - 02:13:13 Speaker 4: For binary classification? 02:13:15 - 02:13:39 Dr Anand Jayaraman: Sigmoid. Sigmoid is for binary classification. The thing is, neural networks are also capable of doing multi-class classification problems. Now how does it do multi-class classification? It comes down to how you set up the output classes. So for example, let's talk about a clear-cut business example, right. 02:13:41 - 02
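To make the output-layer choices concrete, here is a small sketch of the activations mentioned: linear (identity) for regression, sigmoid for binary classification, and softmax, which is where the multi-class discussion is heading. This is a minimal illustration, not a full network.

```python
import numpy as np

def linear(z):
    """Identity activation: output-layer choice for regression problems."""
    return z

def sigmoid(z):
    """Squashes a score into (0, 1): a probability for binary classification."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Turns a vector of scores into class probabilities for multi-class output."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

print(linear(2.5))                          # regression: the score passes through unchanged
print(sigmoid(0.0))                         # 0.5: a maximally uncertain binary prediction
print(softmax(np.array([2.0, 1.0, 0.1])))   # three class probabilities summing to 1
```

The design point: the hidden layers can stay the same; only the output activation (and the number of output neurons) changes with the problem type.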
