CNN Full transcript.pdf
Full Transcript
00:00 - 01:39 Dr Anand Jayaraman: Hello everyone. So I see that there are already 25 learners, so let's just get started. Let me remind you what we were discussing in the last class. We were talking about entity embeddings. The idea of entity embeddings is that when you have a categorical variable, and we already talked about this, neural networks can handle only numerical variables. So, actually I should just write this: with any neural network, the input is numbers and the output is also numbers. That's all it can do. It can't handle non-numerical values. It 01:39 - 02:32 Dr Anand Jayaraman: can't handle categorical variables. So when we have categorical variables, we need to first convert them into numerical variables and then use that. When we have categorical variables, we talked about one-hot encoding; you discussed one-hot encoding much earlier, when you were discussing other ML algorithms. I introduced you to target encoding. That's another way of doing it, which is useful when you have a high-cardinality categorical variable. And yet another option, which I think is more elegant, is this idea of entity embedding. What it does, essentially, is it takes every entity 02:32 - 03:25 Dr Anand Jayaraman: in that particular category, and to each of these entities it assigns a numerical value. So, let us say that we are talking about movies; this is the example that we did last time. I'm going to pick a movie, Avatar, and this movie Avatar is now going to be characterized in, let's say, a few different values: in terms of whether it's an adventure movie or not, whether it's sci-fi or not, whether it's a comedy or not, whether it's a romantic movie or not, a drama or 03:25 - 04:12 Dr Anand Jayaraman: not, or a musical or not. And one way we can do this is we can say Avatar is, I don't know, 90% adventure; on sci-fi it's, you know, 95% sci-fi; on the comedy front it is probably like 10% comedy; and romantic, it's like 20%, and so on and so forth. And we can say that this set of numbers is therefore a nice way to characterize Avatar. Now, this way of doing it is how we'd do it if an expert is doing it, someone who has seen the movie and is able to 04:13 - 04:58 Dr Anand Jayaraman: characterize this movie this way. But there are situations where you don't have an expert who's able to manually do this, when you have a huge set of movies; the number of movies in your data set could be as large as 30,000 different movies. What the neural network does is it allows you to look at the data, and it's automatically able to determine a set of values. All you need to do is say: I want to embed each of these movies in, you know, 10 dimensions. So, what it 04:58 - 05:42 Dr Anand Jayaraman: does is that for every movie it will come up with a set of 10 numbers, in such a way that it's actually able to use this embedding to correctly predict whether a particular user will like the movie or not. So what is an embedding? An embedding is this: every single entity in a categorical variable gets assigned some set of numbers by the neural network during the process of training. And it assigns these values to accomplish the goal of the neural network.
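For concreteness, the expert's hand-assigned characterization above can be written as a plain lookup table; the genre scores below are the illustrative ones from the lecture, not real data. An entity embedding is exactly this kind of table, except that the network learns the numbers during training instead of an expert assigning them.

```python
# A hand-crafted "embedding": each movie maps to a small set of numbers
# that characterize it. Scores are the illustrative ones from the lecture.
movie_vectors = {
    #          adventure  sci-fi  comedy  romance
    "Avatar": [0.90,      0.95,   0.10,   0.20],
}

print(movie_vectors["Avatar"])  # the set of numbers that characterizes Avatar
```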
In the example that we discussed last time, which was 05:42 - 06:27 Dr Anand Jayaraman: a recommendation engine example, we wanted to see, for this particular user, which movies to recommend. So what we did was: we had a huge data set where we had the list of users, the movie that each user watched, and the ratings. And we had these numbers in this data set. So what we did is we embedded the user. Each of the users can also be similarly characterized with an embedding, exactly the way the movies were embedded. So each of the users was, I believe we used 10 dimensions for the user; for the movie also 06:27 - 07:04 Dr Anand Jayaraman: we used 10 dimensions. And using those 2 embeddings, we sent them through a neural network and we were trying to predict the rating of the movie. So let me see, where did I? So here is a picture of what we actually did in that class. We were able to get an index, an embedding for… Sorry, do we have anybody from UpGrad here? Because 1 of 07:04 - 07:06 Speaker 2: our team members, Ajay, he has 07:06 - 07:07 Speaker 3: dropped out. 07:09 - 07:12 Speaker 2: Apologies for interrupting, but he's been trying for a while. 07:12 - 07:18 Dr Anand Jayaraman: Sorry, the UpGrad person dropped out. Sorry, can you say it again? 07:20 - 07:31 Speaker 2: Yeah, the issue is 1 of our team members, Ajay, is not able to connect to this call. He's not added to the meeting invite, it looks like. I see. 07:33 - 09:37 Dr Anand Jayaraman: Let me see what I can do. Just give me a minute please. Thank you. Sorry, someone had a hand raised. I'm still trying to get that number. 09:41 - 09:46 Speaker 2: Professor, if you are available I can ask a question. Since you are talking on the phone, I will wait, sir. 09:47 - 09:48 Dr Anand Jayaraman: Please, please go ahead. 09:48 - 10:27 Speaker 2: Yes, so the question here is about the image which you showed, the movie index and entity index; I am trying to visualize how it looks, because you said it will be embedded. I'll give an example. In my mind I have an example where the user has name, age, gender, nationality and language. Assume there are 5 dimensions here. And the movie similarly has another 5 dimensions: movie name, director, genre, actor and language. Now, age is a numerical value; gender is a categorical value, where we can do one-hot encoding. Nationality, probably it 10:27 - 10:49 Speaker 2: is a categorical value; we need to change it to an embedding now, right? Again, it is a finite list, so we can do either; maybe an entity encoding also we can do, or something else. Language is also an embedding requirement. Now, this collection: how do I translate it to an entity encoding, professor? That's the question. 10:51 - 11:53 Dr Anand Jayaraman: So the numbers that you're getting... 1 second, I'm just calling Prateek and then I'll... Sure, sure, sure, I'll hold on. These guys from UpGrad are not supposed to be there with us all along the class? Prateek has joined in now. I am 11:53 - 11:55 Speaker 2: here, I am here. Please tell me, please tell me. Hello? 11:58 - 12:26 Speaker 4: Prateek, 1 request and suggestion, both together. I would request 1 of you guys to be available during the session, at least for the first 15 minutes to half an hour, because people are joining and they may face some issues in joining. So I would suggest you to at least be there for the theory classes for 15-20 minutes, and for the lab classes for the entire session. Thank you.
12:37 - 12:55 Speaker 2: So he has to log in on the Learn platform and then he can see it. He has not been added as a participant to this session, so maybe for this particular session he hasn't been added. I'll have to check, I mean. 12:56 - 13:00 Dr Anand Jayaraman: Yeah, you should have his number; if you can, call him directly. 13:00 - 13:04 Speaker 2: Well, I do not, but I'll have a quick check. Wait, let me check it out. 13:04 - 13:07 Dr Anand Jayaraman: Yeah, wait, I mean someone has him on WhatsApp. 13:10 - 13:25 Speaker 2: Professor, on the tabular sheet which you have put up, if you can add the 5 dimensions quickly, professor, not too much, just 1 example: a user entity with only 5 attributes, name, age, gender, nationality, language; same for movie, plus the rating only. 13:26 - 13:28 Dr Anand Jayaraman: This is 5 numbers. 13:30 - 14:01 Speaker 2: Exactly, and same with the movie: I have 5 numbers there, 5 entities are there. Yeah, so taking my class as an example, I have 30 participants here. So we have age, gender, nationality, and the languages we speak and watch. So I might watch Tamil, Kannada, English and Hindi; same way for everybody. On the movie side, I have a list of movies in Tamil, a list of movies in Kannada, and a list of movies in Hindi and English. Now, I want to build a neural network. So now I know, based on my learnings, that nationality is a categorical value, 14:01 - 14:16 Speaker 2: language is categorical in the user entity; in the movie entity, movie name is categorical, director is categorical, genre is categorical, actor is categorical, language, all are categorical. So, how do I get this problem solved using your algorithm? 14:16 - 14:32 Dr Anand Jayaraman: No, no. So, in this case, right, all I'm trying to do is come up with a recommendation using user and movie, right? Yes. That's what, and I'm saying that every user and every movie has 5 values. 14:33 - 14:33 Speaker 2: Yes. 14:34 - 14:49 Dr Anand Jayaraman: And we are not going to assign these values. Just like the neural network determined these weights, right, when it was training, the neural network will also determine these values while training. 14:51 - 15:16 Speaker 2: Okay. Can I ask 1 more question, professor? Please. Yeah. So the question here is: weights I understood. Now, this is basically a neural network, as you said, right? It will support only numbers, no categoricals. In this case, my data is all categorical and only 2 fields are numerical. So in that case, how does it work, professor? If you can give 1 simple elaboration, that will be helpful. 15:16 - 16:03 Dr Anand Jayaraman: No, no, so this is what, so last time I actually showed you the code. All you say is: here is the list of movies, embed this in dimension equal to 5. We just say that. And then similarly: here is the list of users, embed this in dimension equal to 5. And then you say: concatenate it, concatenate these 5 dimensions with those 5 dimensions. So, this is list 1, so 16:03 - 16:42 Dr Anand Jayaraman: this is list 1, I am sorry, and after embedding, the numerical values, this is list 2; so you concatenate list 1 and list 2. So now this will be 10 numerical values, 5 of them here, 5 of them there, and then these numerical values will go in there.
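A minimal Keras sketch of what is being described here, assuming illustrative vocabulary sizes and hidden-layer width (the lecture only fixes the embedding dimension of 5 for each side): two embedding layers, concatenated into 10 numerical inputs, feeding a dense network that predicts the rating.

```python
import tensorflow as tf

n_users, n_movies, dim = 1000, 5000, 5   # illustrative sizes; dim = 5 as in the lecture

user_in = tf.keras.Input(shape=(1,), dtype="int32")
movie_in = tf.keras.Input(shape=(1,), dtype="int32")

# "Embed this in dimension equal to 5" for each side.
user_vec = tf.keras.layers.Flatten()(tf.keras.layers.Embedding(n_users, dim)(user_in))
movie_vec = tf.keras.layers.Flatten()(tf.keras.layers.Embedding(n_movies, dim)(movie_in))

# Concatenate list 1 and list 2: 10 numerical values go into the network.
x = tf.keras.layers.Concatenate()([user_vec, movie_vec])
x = tf.keras.layers.Dense(32, activation="relu")(x)   # hidden layer (width is a hyperparameter)
rating = tf.keras.layers.Dense(1)(x)                  # predicted rating

model = tf.keras.Model([user_in, movie_in], rating)
model.compile(optimizer="adam", loss="mse")
```

The embedding rows here are ordinary trainable weights, which is exactly the point made next.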
Initially, what the neural network does is: for every movie, it will give some random 5 numbers, and for every user it will give some random 5 numbers, just like for every weight here it gives random numbers. All of these weights will start off 16:42 - 16:55 Dr Anand Jayaraman: random, and then back propagation, the training process, will go ahead and refine them into the correct set of weights and the correct set of embeddings, for each 1 of them. 16:55 - 16:59 Speaker 2: So in this case, professor, the input is 10 neurons? 17:00 - 17:11 Dr Anand Jayaraman: The input layer will be of size 10, but this is only the input layer; in the input layer there are no neurons. This is the hidden layer; this is where the neurons are, and the number of neurons is a hyperparameter. 17:13 - 17:14 Speaker 2: Thank you professor, got it. 17:16 - 17:23 Dr Anand Jayaraman: So for this embedding, your only responsibility is to specify to the neural network: embed it as 5 numbers. 17:24 - 17:24 Speaker 2: Okay, got it. 17:24 - 18:02 Dr Anand Jayaraman: It will not know the language, it will not know any of them, but it will figure it out on its own. Okay. Thank you professor. Yeah. So, this is the movie embedding. I want to quickly show you some examples of other real-world use cases where these kinds of things are used. Although I will not show the code, I'm linking you to some papers. So here is 1 particular paper. Are you able to see my screen in projection mode? 18:02 - 18:04 Speaker 2: Yes, Professor. 18:04 - 18:59 Dr Anand Jayaraman: So this 1 is basically about a Kaggle competition that was organized on the Rossmann store data. This is a German company, and they had over a thousand different stores. And they had, if I'm not mistaken, something like 2 and a half or 3 years' worth of data from all of these thousand stores. The data also said whether there were any promotions happening or not, and, for each of these stores, the location of the store, 19:00 - 19:46 Dr Anand Jayaraman: whether it's in a city, which city it was located in, or outside the city, whatever. And what was asked of the participants was this: you need to predict the store sales. So they gave 2 and a half years' worth of store sales data, and you were asked to predict ahead; about 6 months ahead, though I do not think it was 6 months, maybe 3 months ahead. This is what they were asked to predict, and these are all paid competitions, so people who win actually win money. Now, what happened is 19:46 - 20:34 Dr Anand Jayaraman: that these guys published a paper out of it, and they actually did not win the competition; they came third. But the first prize and the second prize went to people who were actually experts in the retail industry. The retail industry experts looked at: okay, what does Rossmann sell? What are the products that they sell? And these products have seasonality in them. So: I look at this data and extract some features which indicate the seasonality of particular products. 20:34 - 21:18 Dr Anand Jayaraman: I can use the information that in this particular state, in this particular district, there is likely a horse race that's going to happen.
And so there are those kinds of products that are going to be sold more in that particular area. Details about that particular industry were used by these experts to create new features, and they were able to make predictions, and those are the guys who won first and second place. Which again tells you that having expertise allows you to do something of value; your human expertise is definitely of value. But 21:18 - 22:04 Dr Anand Jayaraman: what these guys did was different: they had no expertise at all in the retail industry. What they did was something completely, I would say, dumb, but completely mechanical. They took every store, the store information; there were 1,115 different stores. And the information that was given in the feature set was the day of the week (the date was given, from which you can derive the day of the week, day, month, year and so on), whether a promotion was there or not, and what state the store was located in. The 22:04 - 22:46 Dr Anand Jayaraman: state itself had 12 values; this is Germany that we are talking about. Now, what these guys did is they took these values and decided to embed them. So they took the store values and embedded them in 10 dimensions. For every store, they asked the neural network to determine 10 numbers that characterize that particular store. Similarly, the day of the week they embedded in 6 dimensions, the day of the month they embedded in 10 dimensions, and the month itself they embedded in 6 dimensions. Each 1 of them, they embedded in a particular 22:46 - 23:29 Dr Anand Jayaraman: number of dimensions. And then they took this data and blindly trained a neural network on it. So here is this embedding; this is quite literally the nature of the code that they used. This is the input data, this is the store data: I want to embed these 1,115 values into 10 dimensions. This is the day of the week; there are 7 days of the week, and I want you to embed it in 6 dimensions, and so on and so forth. This is exactly what these guys did. And then they trained the network. And you know what happens, 23:29 - 24:10 Dr Anand Jayaraman: right? The information from the store is getting embedded, the information about the day is getting embedded, and so on. You concatenate everything and then you send it through dense layers; they sent it through 3 dense layers, and finally the predicted output comes out as a numerical value here. So information is flowing in this direction. And based on this, they were able to predict the store sales, and these guys came third. They had no idea about retail; they had absolutely no domain knowledge at all about retail. The neural network automatically was able to 24:10 - 24:53 Dr Anand Jayaraman: figure out the appropriate embedding for the store, the appropriate embedding for the date and all that, because we know that dates also have patterns. On Sundays, perhaps, a larger amount of sales happens. The actual date also has value: for example, around the 15th and the last day of the month, the 30th or 31st, people typically get paid bi-weekly. And so on the 15th and the 30th, when they get their salary, they'll come to the store, and they are likely to buy a greater amount of things. So there is an embedding even for the date.
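A hedged sketch of the kind of code being described, using the embedding dimensions quoted in the talk (store: 1,115 values into 10 dimensions; day of week: 7 into 6; day of month: 31 into 10; month: 12 into 6); the dense-layer widths are assumptions for illustration, not from the paper.

```python
import tensorflow as tf

def embedded_input(n_categories, dim):
    """One categorical input embedded into `dim` dimensions."""
    inp = tf.keras.Input(shape=(1,), dtype="int32")
    vec = tf.keras.layers.Flatten()(tf.keras.layers.Embedding(n_categories, dim)(inp))
    return inp, vec

store_in, store_vec = embedded_input(1115, 10)  # 1,115 stores -> 10 numbers each
dow_in, dow_vec     = embedded_input(7, 6)      # day of week  -> 6 numbers
day_in, day_vec     = embedded_input(31, 10)    # day of month -> 10 numbers
month_in, month_vec = embedded_input(12, 6)     # month        -> 6 numbers

# Concatenate every embedding and send it through 3 dense layers.
x = tf.keras.layers.Concatenate()([store_vec, dow_vec, day_vec, month_vec])
for units in (1000, 500, 100):                  # widths are illustrative
    x = tf.keras.layers.Dense(units, activation="relu")(x)
sales = tf.keras.layers.Dense(1)(x)             # predicted store sales

model = tf.keras.Model([store_in, dow_in, day_in, month_in], sales)
model.compile(optimizer="adam", loss="mse")
```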
And 24:53 - 25:40 Dr Anand Jayaraman: in fact, if you look at the embedding for the 15th and the embedding for the 30th of the month, those 2 will look similar in terms of their embeddings. These are the kinds of things they were able to derive without putting in any domain understanding at all. And you know what was really cool about this, right? When they plotted the embeddings, this is the plot of the state embeddings, because we know that each store belonged to 1 particular state. What they found was that the numerical values of Hamburg and these 25:40 - 26:36 Dr Anand Jayaraman: states were actually close, the embeddings were close, and in fact, the states were also actually geographically close. The embedding is looking only at the data, right? Data on how much sales happened in the past 2 and a half years. From the data, we are finding that the numerical embeddings of the states reflect geographical properties similar to reality, the physical locations of the states, which was interesting. And you will see this exact same point when you start to learn word embeddings in the next module, when 26:36 - 26:43 Dr Anand Jayaraman: you start doing languages and so on. Yeah, there was a question. Yes, please. 26:43 - 27:12 Speaker 2: Sorry, professor, again, a question. I'll take your biological neuron reference here. As a human, just going by Hamburg and the other places, right, I'm able to understand that they are geographically close, part of 1 state and everything, right? Because as a human, my biological neurons are able to tell all those things based on some information. Now, without passing that to the machine, how is the machine able to identify that Hamburg is very close, and the other states are very close to each other, 27:12 - 27:59 Dr Anand Jayaraman: and they all… Absolutely, it has no information about geography. We are also able to identify them only because we know something about geography. I should admit that I know nothing about German geography; I would have failed in trying to spot where each 1 of them is, which is why I didn't even try to pronounce any of them. But looking at the data, the proximity seems to be hidden in there. Perhaps it's indicating that people of a region have a certain commonality in their shopping behavior. If you think about it, in 27:59 - 28:27 Dr Anand Jayaraman: India, where I completely understand this, in the south of India there are some festivals that are shared, and in the north of India there's a different set of festivals that are shared. So if you look at the shopping patterns, you will find that the embeddings of those states are similar, because the days on which shopping increases are similar for all of them. So the neural network is picking up only that. 28:28 - 28:30 Speaker 2: Got it, professor. 28:30 - 28:35 Dr Anand Jayaraman: It doesn't know anything at all about geographical proximity. There are other reasons why they are coming out to be similar. 28:35 - 28:42 Speaker 2: Got it professor. Thank you. Yeah, I got it. I mean, I can explain it; I came from retail. Now I'm able to visualize what it is trying to do. 28:44 - 28:45 Dr Anand Jayaraman: Yeah, please. 28:45 - 29:14 Speaker 3: I have 1 question.
If you go back to the slide where we showed the detailed diagram, this 1, yes. So initially, when we get the embedding, it is just an initialization, right? It does not have any other relationship. But based on the target, what we are doing is learning on the go. So if I have a similar kind of business problem, then this embedding makes sense, because it has a similar relationship. For a different problem, I have to fine-tune again with the different targets. 29:15 - 30:12 Dr Anand Jayaraman: Absolutely right. Absolutely right. Normally, the embedding is technically tied only to this specific problem. But what people have found is that it has some generality to it. Like the next example that I'm going to talk about, which is basically YouTube, how YouTube makes its recommendations. And this is a paper which talks about YouTube recommendations, published by Google. So what YouTube does is they actually use 2 different neural networks for making recommendations for every single user. The first 30:13 - 31:02 Dr Anand Jayaraman: neural network sifts through millions of videos, and the user history is fed to it. And this 1 outputs about hundreds of videos that might likely be useful for that particular user. And the second, more finely tuned ranking system again takes this user context; it takes this subset of hundreds and uses other, more detailed information to now start proposing which set of videos to show at this particular time. The final ranking uses a bunch of information. It uses not just the video features; it might 31:02 - 31:55 Dr Anand Jayaraman: also use the time of day as well, because as a user, in the daytime you might be interested in watching 1 set of videos, and at night you might be watching a different set of videos. I mean, I know it's a, not unconnected, related topic: I've been ill for the past couple of days. Today, finally, I'm starting to recover. I've been sleeping most of the day. And because of that, last night I woke up at 3 o'clock and I was watching Instagram. And I didn't know what to do, 31:55 - 32:32 Dr Anand Jayaraman: right? I was looking at Instagram, and for some bizarre reason, Instagram was showing me videos on how to get over a breakup. You've just broken up with someone you loved a lot, and how do you get over it? And for me, it was bizarre, right? Why the hell is it showing me this stuff, right? Is it predicting I'm going to break up with my wife or something? What is it trying to do? 32:32 - 32:35 Speaker 3: Maybe in your area, many people have this. I 32:37 - 32:38 Dr Anand Jayaraman: think it's 32:38 - 32:40 Speaker 4: the time of the day. 32:40 - 33:21 Dr Anand Jayaraman: Right. It's the time of the day, right? People have gone through emotional stress, right? And Valentine's Day had just passed recently, and it's starting to show me this. So the time of day plays an important role in what kind of videos it recommends. And so this is what they do: the previous videos that you watched, they're all embedded, right? And the previous searches that you have done, those are also embedded. Your geographical location is embedded; your age and gender, they're all embedded.
All of this is sent into your neural network and from there it's coming 33:21 - 33:44 Dr Anand Jayaraman: up with a ranking system. Arguably this is the largest recommendation system in existence, the YouTube recommendation system. So anyway, that brings us to the end of the topic of entity embeddings. We are behind; I was hoping to finish it last time, but I thought we 33:44 - 34:25 Speaker 4: have 1 question. So now, if we look at different types of methods, on the linear side we have PCA, and here we have embeddings, right? So my question is: how do we understand, or how do we measure, which 1 of these models is better and which 1 we should go ahead with? Is there a metric to measure these things? Like for example, in your case it was showing a video which was completely unrelated, but then there was some ring to it. There was some, you know, reason 34:25 - 34:40 Speaker 4: why it was showing it to you, right? But then it was not good, because it had faltered somewhere; it was not required for you, it was probably recommended badly. So how do we understand, how do we measure all these things? 34:40 - 35:27 Dr Anand Jayaraman: Yeah, so the way you measure it is just this: they fix these algorithms based on user feedback. What happened is, after about 10, 15 minutes, I realized what had happened. For me again, because I do this for a living, it's a fascinating problem as well. When it makes a mistake, I'm trying to think about what action of mine might have caused it to make this mistake. After 10-15 minutes, I stopped Instagram and I switched to YouTube, and I started watching my usual long-form videos on YouTube. Now YouTube, here, what the system 35:27 - 36:12 Dr Anand Jayaraman: tries to predict is the number of minutes that a user will watch, and that is what it's actually using to go ahead and correct these embeddings and finally correct its ranking. When you stop watching, you are giving it feedback, and it eventually starts learning. I honestly don't know how often these embeddings are retrained, but I've got to imagine that they are retrained at least once or twice a month, and your individual embeddings also change with time, with different usage patterns. Now, there was a question 36:12 - 36:55 Dr Anand Jayaraman: earlier saying that the embedding is very specific to the particular problem where it's being used. That's absolutely right. An embedding is very specific to the particular problem. But what Google claims is that the embeddings that they get from YouTube recommendations, they use in other applications as well; in their normal search and other applications, right. So, they do find that the knowledge you are able to extract with these embeddings has some validity, some generality, to user behavior, much more than what you would expect purely from the fact 36:55 - 37:22 Dr Anand Jayaraman: that it was obtained from solving this 1 very specific problem. So 1 thing is this: basically, companies all around us are using it. I know a friend of mine is leading this in Target, the data analytics team in Target; they are using it, and I'm sure, you name a company which is selling products, they are using it. 37:28 - 38:01 Speaker 5: Professor, I have a question.
So is it possible that, for example, YouTube or any other application can show the video that you have seen at 3 o'clock in the morning based on your community? For example, you have close friends or something like that, and they are undergoing, let's say, a bad situation, and it's kind of a warning to you: hey, maybe 1 of your community had a breakup or something like that. Is it possible? 38:02 - 39:00 Dr Anand Jayaraman: Yeah. So it's, I mean, definitely possible, but it may not be the community per se. There are other, scarily accurate ways in which you can predict whether someone's going through a tough time in a relationship or not. Facebook talked about this as well, where, based on the pictures that you post, there are very, very simple metrics through which you can actually tell whether someone is having a happy married life, an unhappy married life, or a complete disaster of a married life, right. All of that is possible. 39:01 - 39:32 Speaker 2: Professor, I can give an example for what Taru asked. So I work in programmatic advertising. So on your first question, about the YouTube ads which we run, right, we do digital ads through a platform, a Google platform. There's something called attention rate. Attention rate is the 1 we measure, seeing how many minutes or seconds the user has watched the ad. Based on that only, we retarget. That's what we do. I mean, it's not necessarily a neural network; we train other models to say what the target should be. With respect to community, 39:32 - 40:11 Speaker 2: I can give an example. During summer 2022, you know our India, we have power cuts across India, right? So we worked with 1 of the clients in UP, where power cuts happened. We got the data from the electricity board of Uttar Pradesh, and we targeted inverter ads. We knew exactly which postcode loses power at what point of time, and then we started showing video ads, on YouTube or Facebook or any channel, OTT or anything, we started streaming an inverter ad, like Sachin Tendulkar coming and saying, okay, you buy a Luminous 40:11 - 40:36 Speaker 2: inverter. And it is very, very targeted at a specific point of time. So you said at 3:00 at night you woke up and you saw a bizarre ad, right? So when the power goes off, anybody who turns on their mobile will start seeing, in that geography, based on postcode, the inverter ad, right? So it's absolutely possible in digital marketing, or programmatic advertising. And yeah, neural networks really work there. Just wanted to share that info. 40:36 - 40:43 Dr Anand Jayaraman: Right, right. Thanks, Siddhi. Interesting. Shreyas, you had a comment. 40:45 - 41:03 Speaker 6: Professor, 1 question, since we have been talking about these weights and the embeddings. So, these are still not randomized, right? They follow a pattern. If I remember correctly, you mentioned it follows a logarithmic graph for changing the weights and testing the model while training? 41:03 - 41:42 Dr Anand Jayaraman: So, yeah, it starts off as random, and then, with training, the values settle down to the correct values, right? The back propagation, so the whole process of training, is finally figuring out the correct set of weights, right? And when I'm saying weights now, I'm including these weights as well as the embeddings. All of them together are found in the back propagation step. 41:42 - 41:48 Speaker 6:
But back propagation would be helped if we start with the correct weights, right, Professor? 41:49 - 42:22 Dr Anand Jayaraman: No, so yeah, the thing is, it would help if you already knew the answer; of course, back propagation would end quickly. But even if it didn't, and that's the whole power of this, as we will start seeing soon, right, in the material that we're going to start in today's lecture: we are going to start working with neural networks where we have to get something like 158 million weights. There is no way we are going to start off with correct weights. We are going to start off with a random set of 42:22 - 42:55 Dr Anand Jayaraman: numbers, and it's going to take a long time to figure out those correct values. And even then, even after you find them, you don't know whether they are absolutely correct or not. What we find is that maybe it reached only a local minimum; that's okay, it's still performing quite well. And so we don't care that it hasn't reached the global minimum. It's still performing well, so we accept it and we move on. Yeah, thank you professor. 42:57 - 43:15 Speaker 3: Professor, Sachin here. I just have 1 question on the recommendation system, right. When you started this topic, you said entity embedding is 1 of the ways; are there other ways as well, right? So, any comment on matrix factorization, where people work on matrix factorization to do the recommendations? 43:16 - 43:25 Dr Anand Jayaraman: Yeah, yeah. Unfortunately, I don't want to talk about that now. The QR factorization and all that is too much to get into now. 43:26 - 43:30 Speaker 3: Is that the old paradigm, and the current paradigm is entity embedding? That's what I'd like to know. 43:30 - 43:32 Dr Anand Jayaraman: Right. So people 43:32 - 43:34 Speaker 3: were trying matrix factorization and they moved to this? 43:34 - 44:28 Dr Anand Jayaraman: No, no. So the thing is: what is absolutely true is that the best recommendation engines are all ensemble algos. When I say ensemble algos, essentially it means that they are using multiple systems to try and predict your rankings. So they use all the different methods; each 1 has some strengths and some weaknesses, and they combine them to make a final prediction. So, yeah, ensembling is really the way to go. 1 of the methods is of course this. This method has some appeal in that once you are able to figure out this embedding, those 44:28 - 45:24 Dr Anand Jayaraman: embeddings, an organization like Google is able to use in other situations as well, and that's where the attraction is. Makes sense. So let's move on to today's topic. We are behind, but fortunately for us, today's topic has 2 sessions devoted to it. So we are okay; we will be able to catch up. Embeddings, I wanted to cover because I know they have a lot of use cases in the retail industry, and from a business user's perspective it's an interesting topic to cover; so that's why I spent time 45:25 - 46:08 Dr Anand Jayaraman: on it. But convolutional neural networks are what we're going to be discussing, and I have 2 days assigned for it. Sorry, I know there's a question, but before that, I need to apologize. Previously, after the embeddings, I was planning on covering RNN and LSTM first and then move on to convolutional neural networks.
But that 1 day of class that we canceled required me to re-evaluate the order, and I'm more comfortable with this order right now. Tomorrow's lab session will also discuss convolutional neural networks, 46:09 - 46:15 Dr Anand Jayaraman: right? And then after that, in the last lab session, we'll discuss LSTM. Sorry, someone had a question. 46:17 - 46:19 Speaker 2: I had the exact question, so you answered it. 46:20 - 47:22 Dr Anand Jayaraman: OK. Yeah. So essentially, otherwise there would have been a lab session today, and they wouldn't have had anything to cover. So this change in topics was required because of that change in schedule, but you won't lose any material in the long run. So let us talk about convolutional neural networks. Before I talk about what this bizarre name, convolutional neural network, means, I want to start talking about identifying and recognizing images. Now, we have discussed this before: the way images are stored, images are stored as pixels. 47:22 - 48:25 Dr Anand Jayaraman: So here is an image where every single pixel has a different shade of gray, and those different shades of gray are what is internally stored as numbers. So, a darker pixel would have, sorry, did I say, no, yeah, I'm completely inverting it: this is the level of brightness. 255 is the extreme brightness and 0 would be the extreme darkness, and the values in between represent the different levels of gray. This is the way an image is typically stored. Now, if 48:26 - 49:15 Dr Anand Jayaraman: I want to recognize an image, essentially I need to supply these pixel intensities into a neural network. Now think about a picture which is a 1024 by 1024 image; 1024 by 1024 is roughly about a million pixels. Now, a normal picture that we take is a color picture, right? That's actually got 3 channels: there is a red channel, a green channel and a blue channel, 3 different channels. Which means that the amount of information you are sending in from a picture into a neural network is about 3 million input 49:15 - 50:07 Dr Anand Jayaraman: data points, right, 3 million input data points. And so the matrix that you are looking at has essentially 3 million features. And here is my target, finally: the first image was Bob, the second image was Lisa, and so on. Now, to be able to train this algorithm with the 3 million features, you know that my input layer needs to have 3 million input nodes. And then there are multiple dense layers. And finally, this is a multi-class classification, so I have a last output layer with as many neurons as 50:07 - 50:53 Dr Anand Jayaraman: the number of classes. This is going to be an extremely large neural network with a huge number of weights that need to be trained. This is a big problem. But still, even though it's a big problem, let me just show you how you go about handling it. So here, for example, I'm going to talk about a data set of images of different items of clothing. This is 1 of those standard data sets people use to train neural networks. This data set is called the Fashion 50:54 - 51:34 Dr Anand Jayaraman: MNIST data set.
So essentially, images of different clothing items are given, and you can train your neural network so that, given a new image, the neural network should be able to recognize what type of clothing item that particular image shows. So here is my entire data set, or not the entire data set, a sample of the data set. These are the kinds of images that are given. They are grayscale images. Each of these images is a tiny 28 by 28 pixel sized image. This is just to test 51:34 - 52:21 Dr Anand Jayaraman: out algorithms, so they just gave smaller, resized images. In this, there are altogether 10 different classes and altogether 70,000 images. So approximately, you know, 7,000 different shirts, 7,000 different pants, 7,000 handbags and 7,000 shoes; different items are there. 10 classes, but 70,000 different images. If I want to build this using a neural network, how would I do it? Can someone tell me how I would do this? How would you design this neural network? What do you need? 52:22 - 52:26 Speaker 4: Input would be 28 into 28, the number of pixels. 52:27 - 53:17 Dr Anand Jayaraman: So the input is 28 times 28 input nodes. Nice. And the output would be 10 classes, 10 of them, and the activation is softmax. And then a hidden layer. Fantastic. So your code will look something like this, right: you have an input, your 28 by 28 pixels; you flatten it out into 1 vector, and then you send it to a dense layer with 128 neurons, this is the number of hidden-layer neurons, and the output is 10, with softmax as the activation function, right? This is the way 53:17 - 54:00 Dr Anand Jayaraman: you do this classification. Now in this case, it actually works reasonably well. The problem is, number 1, it requires all of the images to be 28 by 28, which is not a problem; you can always scale them. But it also requires the images to all be centered, because that's the type of images that are given in the data set: centered images. So the shoe needs to be in the center of the frame, then it recognizes it. And the shirt needs to be in the center of the frame, and then it recognizes it. Now, a 54:00 - 54:40 Dr Anand Jayaraman: much better way... I mean, again, we are able to recognize shoes and shirts in a picture even when they are not located in the center. It would be nice if I were able to extract, like, how do we recognize a shirt, right? For example, if I want to recognize a picture of a shirt, I would describe it as something like, you know, a shirt has maybe 2 sleeves and a body, right? A picture of a shoe should have a sole and a covered top. And a handbag should 54:40 - 55:41 Dr Anand Jayaraman: have something like a rectangular container and a structure, a line, connecting the 2 ends of it. Some set of features like that is the way I would describe a handbag or a shirt or a shoe. Now, people actually tried recognizing images this way. Prior to the current revolution in deep learning, what people were doing is they used a domain expert to identify features in a particular image, then sent those features as the input, and then ran it 55:41 - 56:35 Dr Anand Jayaraman: through an ML algorithm to classify.
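For reference, here is a minimal, runnable version of the flatten-and-dense classifier described a moment ago, using the standard Fashion-MNIST loader in tf.keras; the optimizer, loss, and epoch count are typical choices, not specified in the lecture.

```python
import tensorflow as tf

# 70,000 grayscale 28x28 images in 10 classes.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixel values to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # 784 inputs, one per pixel
    tf.keras.layers.Dense(128, activation="relu"),    # hidden layer of 128 neurons
    tf.keras.layers.Dense(10, activation="softmax"),  # one output neuron per class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```

As the lecture goes on to argue, this works reasonably well here only because the images are small and centered.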
So instead of sending in all of these 3 million pixels directly, we were sending in information about the features, because sending in 3 million pixels was just a waste of computational effort. Another reason why we do not want to send all 3 million of them is because of how our neural networks work. Where do I have a neural network picture... here. So I put in 28 by 28, which is, what, 784 or something like that; that is the total number of pixels. Now, in a traditional neural network, let 56:35 - 57:26 Dr Anand Jayaraman: us say I have these features; let us work with the simple example: the weight of the car, the horsepower of the car, automatic or manual, and then the rear axle ratio, and we are trying to predict mileage per gallon. So 4 features are there. So this is a network which would have 4 input nodes, and here is the output. This is my neural network for this. Now, from this data matrix, what I would do is take the first data point and send it 57:26 - 58:01 Dr Anand Jayaraman: here, right? Weight here, horsepower here, automatic or manual here, the rear axle ratio over here, and then the neural network would go ahead and learn everything. This is the way a regular neural network works. Now, does the order in which I specify the features actually matter? Suppose somebody else got this data set and arranged it differently: automatic or manual first, then horsepower, then rear axle ratio and then weight; they just arranged it in alphabetical order. 58:01 - 58:02 Speaker 4: It does not matter. 58:04 - 58:49 Dr Anand Jayaraman: You train the network with that arrangement, and it will go ahead and do that. So, in all of your neural networks, the order of your features is given no importance at all. Now here, however, in this 28 by 28 grid of pixels, these 2 pixels are near each other, these are all near each other, and this pixel is very far from that pixel. There is a meaning to the location of those pixels. But when I flatten it out like this, I am throwing away all that meaning. 58:51 - 59:26 Dr Anand Jayaraman: Agree? I'm throwing away all the meaning. I'm keeping only the individual location of each pixel; I am not keeping the relationship between 1 pixel and another. Whereas in our identification of an image, that is extremely crucial. The relationship between 1 pixel and another is extremely crucial. If I just showed you this 1 pixel value alone, you wouldn't have been able to guess that it was part of an eye. But when you see the relationship with respect to the other ones, now you know that this is an eye, and this 59:26 - 59:59 Dr Anand Jayaraman: is the eye of a person. The relationship between the pixels matters. Whereas, when I flatten it out, I am just putting this array of pixels next to that array of pixels, and so on, and putting it into a neural network like this. I have completely thrown away the location information about the pixels, and I'm asking the algorithm to go ahead and recognize which category it belongs to without using the location information.
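A tiny numpy illustration of this point: shift the same object 2 pixels to the right, and its pixels land on entirely different positions of the flattened vector, so a dense network, which ties a separate weight to each position, effectively sees a different input. The image content here is made up for the demonstration.

```python
import numpy as np

img = np.zeros((28, 28))
img[10:18, 10:18] = 1.0                    # a square "object" in the frame

shifted = np.roll(img, shift=2, axis=1)    # the same object, moved 2 pixels right

v1, v2 = img.flatten(), shifted.flatten()  # what a dense network actually receives
print(np.flatnonzero(v1)[:4])  # flattened indices of the object's pixels...
print(np.flatnonzero(v2)[:4])  # ...are different indices after the shift
```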
That is what I meant when I am saying, see when I am looking at absolute location of the pixel, and it matches this, you'd recognize this image, it might recognize as a person's face. But it will say that this might be, I don't know, what? 500 and, sorry, 01:00:46 - 01:01:36 Dr Anand Jayaraman: 453rd pixel. And this 1 might be 475th pixel. It might recognize that 453rd pixel and 475th pixel are eyes, but that will work only if the image is centered. If the image is moved, shifted even by 2 pixels to the side, the algorithm will no longer recognize it because it is looking for 453rd and 475th pixel to represent the eyes. Whereas it is only the relative location that matters a lot. Are you able to understand my point? If it is a relative reference, Then we need a reference point. Exactly. So, which is why I'm saying this 01:01:36 - 01:02:23 Dr Anand Jayaraman: way of recognizing this, even though it works well for this test set, it works only because they are all centered, right? A better way of recognizing these images is if there is a way to extract features about that particular image that we are seeing. Features like it has 2 hands or this thing, or there is a square region and a circular region that is holding it for a handbag and things like that. If we are able to deal with features instead of pixels directly, then that machine learning algorithm is likely to do a better job. Does 01:02:23 - 01:03:16 Dr Anand Jayaraman: that sound like a reasonable hypothesis, reasonable starting point for us? Can you repeat that professor? Sorry, I missed that. So I'm saying till now, we were doing machine learning in a way that is completely agnostic about the relative position of each of these inputs. We looked at each of these columns as completely independent of each other. Right? So changing the order of these columns will have, should have no relevance to the outcome, because we are looking at structured data. But in the unstructured data that we are looking at now, which is images, There is a great 01:03:16 - 01:04:18 Dr Anand Jayaraman: relevance to the each of the pixel information, the location information is in fact key. And if we Provide the neural network, provide this pixel information directly to the neural network, we are deliberately ignoring the relative location information. So what I'm suggesting is this, going forward, it will be nice if to the neural network, if you don't give the pixel information directly. Instead, we will identify from the pixel, we will identify some important features, some important features about the image. And that important features is what is going to be provided. So this is sort of handcrafted important 01:04:18 - 01:05:07 Dr Anand Jayaraman: features about the image. Does the image have a face or not? Does the image have a body or not? Or different kinds of handcrafted image features we will create and then we will provide this into a neural network and the neural network can be that. This must be reduce the dimensions as well. It will definitely reduce the dimensions. It will definitely reduce the dimensions. The problem is this is extremely hard to do and it requires a domain expert for solving each and individual problem. But this is what the entire field of computer vision was doing. They 01:05:07 - 01:06:01 Dr Anand Jayaraman: were looking at images and trying to extract, they designed different filters, which extracted features from an image. If you use Photoshop, Photoshop has edge detection feature, which detects edges in the image. 
That is a feature: how many edges are there? Not the border of the image; it looks only at the places which have the highest contrast in the image. Those are the edge detection algorithms that it runs. And basically, mathematicians designed different filters which extracted some useful features from an image. And then they provided these features as input to the algorithm and tried 01:06:01 - 01:06:51 Dr Anand Jayaraman: to do a recognition of whether it's a cat or a dog or whatever. So a filter extracts the presence or absence of a feature at a location. And then from this they created second-level features, and these features were sent to an ML algorithm; this is the way they were building vision algorithms. The problem was that the accuracies they were getting were not really encouraging. Computer vision: I mean, for the longest time we were talking about creating self-driving cars. For a long, long time we were talking about creating self-driving cars, and 1 of the key aspects 01:06:51 - 01:07:45 Dr Anand Jayaraman: of self-driving cars is figuring out the problem of computer vision. But the state of the art was so bad, so bad, that even detecting whether a picture is of a cat or a dog was a challenge. The kind of accuracies they were getting using the methods available until then was actually quite sad. Now, how does the eye detect anything? How does our visual cortex detect images? Again, a lot of our neural network progress has happened because of our trying to understand how nature has 01:07:45 - 01:08:40 Dr Anand Jayaraman: created vision. And so there were experiments done in the 1950s and 1960s where they tried to understand how our visual cortex recognizes images. What they found was that in the visual cortex there are cells that typically fired when they saw a signal at 1 particular angle: light at this particular angle will trigger 1 set of neurons, light at that particular angle will trigger a different set of neurons, and so on. They found that there were simple cells which recognized basic patterns 01:08:41 - 01:09:32 Dr Anand Jayaraman: in the images, and those fed on to the next level of cells, which recognized more complicated patterns, and the final ones recognized even larger, more complicated patterns, and so on and so forth. I already mentioned this hierarchical feature development; it was noticed in our study of the visual cortex. This is what we used as an inspiration when we introduced deep learning. Again, the order in which topics are being covered is not exactly the order in which they were developed historically; I did mention the vision example even then. But now we 01:09:32 - 01:10:23 Dr Anand Jayaraman: are going to handle vision separately, in more detail, in this class and the next class. Now, what we are going to do is this: we are going to build deep neural networks with local connectivity. Local connectivity: what do I mean by local connectivity? Here is the image that we had. I want the deep neural network to recognize that all of these pixels are neighbors, and all of these pixels are neighbors, and all of these pixels are neighbors. I want a way to input these pixels into my 01:10:23 - 01:11:02 Dr Anand Jayaraman: neural network in a way that respects this local arrangement of the pixels.
We don't want to throw away that locality information. That's my first goal. The second goal, again, is that I want it to do this automatic hierarchical feature recognition. And we saw, even in structured data, that this was happening automatically; once you set up the problem correctly, all of this happens automatically. And then the most important thing is that I want to make sure that I have a more manageable number of weights. I don't want a really large number of weights; I want to have a more 01:11:02 - 01:11:55 Dr Anand Jayaraman: manageable number of weights. This is our objective, and we will talk shortly about how we need to modify our neural network, the way we are approaching the problem, to accomplish all of this. It is 7:45 India time by my watch; can we take a 10-minute break now and then pick up when we come back? A 10-minute break seems appropriate. Then I will start talking about the history of what happened, how people arrived here, the different innovations that happened, and then I'll talk about what exactly a convolutional neuron is, and so on. 00:01 - 00:43 Dr Anand Jayaraman: It was in the 2000s, right at the beginning of the century, that it was identified, recognized, that computer vision is an important problem to solve. And they knew that the current state of affairs was actually quite sad; we were not actually able to do it. Now, 1 of the things people did to try to improve the situation was to say: the way it can be done is if we have an open competition with prize money, right? So there was a particular challenge set up; this is called the ImageNet 00:44 - 01:28 Dr Anand Jayaraman: Large Scale Visual Recognition Challenge. What they did is they got a huge number of images, a really large set of categories of images, really millions of images downloaded from the web, and these were labeled by humans. And it was not just labeled as a particular picture saying this is a bird; that's not what they did. They actually talked about what type of bird it was, right? A human actually labeled all of this. So instead of just calling it a cat, they were talking about: is it an Egyptian cat, is it a Persian 01:28 - 02:18 Dr Anand Jayaraman: cat, a tabby cat or a lynx, or whatever it is. So they had this image data, which is an extremely rich set of data. There are a lot of interesting stories about how this was actually manually labeled as well, but, you know, we don't want to go there; when you Google, you should be able to find these fascinating stories. But the thing was, up to 2010, while this competition was going on full-fledged, the performance of computer algorithms was actually quite sad. The state of the art as of 2011 was that 02:19 - 02:27 Dr Anand Jayaraman: the machine, an algorithm... wait, am I audible? Am I sharing the right screen? 02:29 - 02:29 Speaker 2: Yes, sir. 02:30 - 03:14 Dr Anand Jayaraman: Yes. Because there was no sound since I started after the break, and I didn't want to finish the entire lecture sharing the wrong screen or without turning my mic on. So the state of the art up to 2011 was this: a machine, when it was trying to recognize these images, was getting around 30% error. Now, what was actually asked of the machine was this, right? It should not only identify it as a bird; it should identify what type of bird it is.
And the way it was scored was this: the machine was allowed 03:14 - 04:04 Dr Anand Jayaraman: 6 guesses for every image, and if the correct category was in any of those top guesses, then it is considered as having gotten it correct. And even with that kind of generous scoring mechanism, the error rate it was getting was close to 30%. How does this stand against human performance? A normal human being like us would have around 10% error. An expert would make around 5% error. That was the state of the art. So the machine was way behind even a less skilled human being in being able to recognize these images. 04:07 - 05:11 Dr Anand Jayaraman: Now, what happened was that suddenly, in 2012, a group from Toronto introduced an algorithm which dramatically improved the performance. Until then, what was happening was that every year the performance was improving by just a couple of percent, 1 or 2 percent: from 32 percent it fell to 30 percent; prior to that, from 34 percent to 32 percent, and so on. Just gradual, incremental improvement was happening. But suddenly, in 2012, this group from Toronto cut the error in half, to 15%. You should know this name: this was Geoffrey Hinton's group. You 05:11 - 05:51 Dr Anand Jayaraman: remember this name, the guy who's made huge contributions to deep learning. So it was his group, and 1 of his students, a PhD student, Alex Krizhevsky, was the main author of that particular paper. And in fact, after his name, this particular network is referred to as AlexNet. What happened after that was, in fact, initially, when the paper was published, there were doubts as to whether they had had access to the test data, whether they were cheating or anything. But then enough work was done later on, and people realized that they were not 05:51 - 06:34 Dr Anand Jayaraman: cheating; it was really an incredible accomplishment, what they had done. And Google in fact went out and hired the entire team; Geoffrey Hinton's entire group moved to Google. Now, this was a published paper; part of the competition was that everything needs to be open, you can't use proprietary algorithms. This was the state of affairs in 2012: a remarkable change had happened, a sudden, big inflection point. Now, the next year, 2013, what happened? Other people took this AlexNet; there were people 06:34 - 07:22 Dr Anand Jayaraman: from New York University, and they started using a more powerful network, more powerful computers; they just scaled it up and reduced the error even more. Nothing earth-shattering; I mean, they did make some innovations, but nothing earth-shattering. From 15% error, the error moved to 11% error. The following year, 2014, Google (remember, many of the people from Toronto had moved to Google by then) released its own algorithm, called Inception. I don't know whether you remember; the reason it was called Inception Net was the 07:22 - 08:05 Dr Anand Jayaraman: movie Inception. It's a movie about a dream within a dream within a dream, and there's something about this network design which is reminiscent of that as well. So they called this Inception Net, and they reduced the error to 6.7 percent. Now you've already beaten the less skilled human being, and you're starting to approach expert human level.
Interestingly, that particular year, GoogLeNet did not win everything. There was another competitor, VGGNet, which also did very well. There are two tasks in the challenge: one 08:05 - 08:46 Dr Anand Jayaraman: is identification and the other is localization, right? And these two networks shared the honors: one of them won identification, the other won localization. We'll talk about the details of this in a bit. I know there is a question there; we'll come to it in just a second. This was 2014. In 2015, what happened was Microsoft released its own algorithm, called ResNet, which basically crushed everything else. Now you're at a level of 3.6% error. So basically the machine is able to do much, much better than any expert human being. Human beings 08:46 - 09:40 Dr Anand Jayaraman: are essentially crushed; the machine has taken over the crown. What is remarkable is that all of these algorithms are right now publicly available. You can log into a Jupyter notebook on Google Colab, download the trained ResNet, and start using it for building your own applications, which is what many people do right now. What we will talk about is how all of this happened: what were the core innovations that made all of this possible?
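What "download the trained ResNet and start using it" looks like in practice: a minimal sketch, assuming PyTorch and torchvision are available in the notebook (the transcript does not name a library; torchvision's pretrained ResNet-50 is used here as one common choice, and "my_photo.jpg" is a hypothetical file name).

```python
import torch
from PIL import Image
from torchvision import models

# Load a ResNet-50 with weights pretrained on ImageNet, the same kind of
# trained network whose competition results were just described.
weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights)
model.eval()

# Preprocess an image the way the network expects, then classify it.
# "my_photo.jpg" is a hypothetical file name.
preprocess = weights.transforms()
img = preprocess(Image.open("my_photo.jpg")).unsqueeze(0)
with torch.no_grad():
    probs = model(img).softmax(dim=1)

top5 = probs.topk(5)  # the "five guesses" scoring used in the challenge
print([weights.meta["categories"][int(i)] for i in top5.indices[0]])
```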
Tesla was doing their own independent development and their accuracy was much higher, right? 09:42 - 10:01 Dr Anand Jayaraman: So the thing is, I don't know whether Tesla was competing in ImageNet. These are the ImageNet results. It's possible that Tesla had developed something which was not part of the listing. But the names that I'm giving were the recognized winners of that particular competition. 10:02 - 10:07 Speaker 2: Tesla's achievements are not available to the public, and they were not competing. 10:07 - 10:18 Speaker 3: And actually, Tesla was trying to create a whole system; they were not competing in this particular benchmarking of computer vision algorithms. I think the professor is correct here. 10:18 - 10:24 Speaker 2: And the major discovery made public from Tesla is the GPU. 10:28 - 10:45 Speaker 3: Yeah, about why it was named Inception: I think Google had introduced a new concept, Deep Dream, and they tried to connect the Deep Dream concept with Inception; they named a later version Inception V4. You remember the Deep Dream architecture. 10:46 - 11:40 Dr Anand Jayaraman: Yeah, so we will talk about that in the next class, right; many of these standard architectures we will discuss in the next class. Today, all I'm hoping to accomplish in the next hour or so is to introduce to you what convolution is, and to show what it actually achieves. So let's now just step back from neural networks itself and talk about this process called convolution. Convolution is a mathematical operation which is applied on matrices. People were using convolution to take an image and make modifications of it. You 11:40 - 12:23 Dr Anand Jayaraman: have Instagram; you take a picture and you apply a filter, and that is a convolution. Some manipulation of the pixels is happening, and the way it is done is essentially through some matrix multiplications. That type of matrix operation is called a convolution. Let me show you what a convolution is. So let us see: imagine that there is this particular image. To make things easy, I am just going to put ones and zeros in the image, instead of pixel values going all the way from 0 to 255. I'm just making a simple image. 12:24 - 13:07 Dr Anand Jayaraman: So this is an image which is 1, 2, 3, 4, 5 by 1, 2, 3, 4, 5: a 5 by 5 pixel image. To this image, I am going to apply a convolution. For that, you need a second matrix, a filter, which defines what type of operation you want to apply onto that particular image. For now, the filter that we are going to choose is a 3 by 3 filter. And here are the values of that 3 by 3 filter. So 13:07 - 13:54 Dr Anand Jayaraman: let me first define for you what a convolution is. I am going to take this particular image, and I am going to apply the 3 by 3 filter onto it. You see these numbers: 1 0 1, 0 1 0, 1 0 1. Those are being applied to the first 3 by 3 region of the image. Now, what you do is a cell by cell multiplication: 1 times 1, 1 times 0, 1 times 1, 0 times 0, and so on and so forth. You do a cell by cell 13:54 - 14:17 Dr Anand Jayaraman: multiplication and then add all of them. Now, this number becomes 0, this is 0, this is 0, this is 0, this is 0, this is 0, and what is left is this 1, this 1, this 1 and that 1. When you add all of them, I get the number 4. Are you comfortable with what I did just now? 14:17 - 14:19 Speaker 4: This definition is very simple. How is the filter selected? Just a random filter? 14:19 - 14:55 Dr Anand Jayaraman: Yeah, right now, let us just say that I have chosen some filter. How do I select it? We will come there. Right now, I am defining a mathematical operation. In this part I just did this multiplication, added it up, and got 4. Next, I am going to take this filter matrix and move it one step over. And then again, I am going to repeat the same operation. Now, when I add it, I get 3. And I am going to 14:55 - 15:13 Dr Anand Jayaraman: repeat it again. Now, when I add it, I get 4. And I keep moving it to the next row and repeating it, next row, repeating it, and now what I have is the output of my convolution. 15:15 - 15:20 Speaker 2: So the size of the convolved feature is the same as the filter itself? 15:21 - 15:28 Dr Anand Jayaraman: No, no. The size of the convolved feature will be smaller than the original matrix. 15:29 - 15:31 Speaker 2: No, it is the same as the filter size, correct? 15:31 - 15:48 Dr Anand Jayaraman: It will not always be the same as the filter size. It will be smaller than the original image. Imagine this: if this original matrix had been bigger by one more row and column... 15:48 - 15:50 Speaker 3: Right, right, got it. 15:50 - 15:52 Dr Anand Jayaraman: This output would have also been larger by one more. 15:52 - 15:53 Speaker 3: Got it, Professor. 15:54 - 16:38 Dr Anand Jayaraman: So it is not the same as the filter size; the filter size is one factor determining it. It is an easy mathematical formula, but right now I do not want to confuse you with it. The convolved feature is going to be smaller than the original image. This is called a convolution. Professor, is this the dot product? It is not the dot product. This is specifically called convolution. The mathematical formula is this one here. We did not study this in high school.
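In code, the sliding sum-of-products just described looks like this: a minimal NumPy sketch. The exact image values are an assumption (the transcript only says it is a 5 by 5 image of ones and zeros), chosen so that the outputs match the 4, 3, 4 computed in class.

```python
import numpy as np

# A 5x5 binary image (values assumed; the lecture only specifies ones and zeros).
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])

# The 3x3 filter from the lecture: 1 0 1 / 0 1 0 / 1 0 1.
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

def convolve2d(img, k):
    """Slide the filter over the image (stride 1), multiplying cell by cell
    and summing. Output size = N - F + 1, so a 5x5 image and a 3x3 filter
    give a 3x3 output."""
    n, f = img.shape[0], k.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + f, j:j + f] * k)
    return out

print(convolve2d(image, kernel))
# First row is [4. 3. 4.], matching the values computed in class.
```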
It is this kind of operation, looking at one portion of the picture at a time. What you are 16:38 - 16:55 Dr Anand Jayaraman: doing, exactly, is taking one region of the image and doing some kind of weighted averaging of those pixels, then moving to a different region and doing a weighted averaging of those pixels, and so on and so forth. That is what you're doing. 16:59 - 17:01 Speaker 2: What is a typical use case of this? 17:01 - 17:42 Dr Anand Jayaraman: We will see, we will see. You will see the use case in just a few seconds, right? Now you see how this convolution kernel is moved over the image. This is my original image, this is my convolution filter; I move it, you know, one pixel at a time, and then I get my final convolved image. Effectively, you can think of it as a local operation on the pixels of a particular region. That is what it is. Now, once you understand the 17:42 - 18:33 Dr Anand Jayaraman: mathematical operation of convolution, then we can come to the use case of it. Here, for example, is an image. To this image I am going to apply a convolution with this filter: 1 1 1, 0 0 0, minus 1 minus 1 minus 1. Look at this: there are ones on the top row, minus ones on the bottom row, and the middle row is 0. You know what happens? You get this image as output. Essentially, it detects horizontal 18:33 - 19:06 Dr Anand Jayaraman: edges. It took this image, and now it gives an output that lights up wherever there are horizontal edges. On the other hand, if I rotate the filter so the ones run down the left column and the minus ones down the right column (each row reads 1 0 minus 1), this one detects vertical edges. It lights up whenever there is a vertical edge in the image. Do you see that? If not, let me actually take you to this web page. 19:09 - 19:10 Speaker 4: Yes, Ravi sir. 19:12 - 20:05 Dr Anand Jayaraman: Here, for each action there is a particular filter. So if you want to sharpen an image, this is the convolution filter you apply. If you want to blur an image, this is the convolution you apply. This is the left Sobel, for left edges, and this one for right edges, right? And if you want to get an outline of an image, this one. You see, these are all what used to be Photoshop filters. You remember: not the current, more sophisticated version of Photoshop, the old Photoshop. These were the filters that were available. What 20:05 - 20:23 Dr Anand Jayaraman: was happening was that the way it accomplished this was through a convolutional filter. This convolution operation was applied to an image, and that's how that particular image got that appearance. Is the idea clear, what a convolution does? 20:26 - 20:27 Speaker 4: Yes, sir. 20:27 - 20:39 Dr Anand Jayaraman: Based on the numbers that are there, it will do some kind of weighted averaging. Yes, go ahead. 20:39 - 20:44 Speaker 2: So, basically, this is an operation that already existed in computer vision? 20:45 - 21:15 Dr Anand Jayaraman: Correct. Yeah, this was one of the most powerful operations in the computer vision toolkit. A lot of design engineers wanted to come up with their own proprietary convolution matrix which would accomplish some particular task, detecting cross edges or vertical edges or something. They used to play around with these numbers to accomplish something.
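The edge-detection use case in code: a minimal NumPy sketch applying the horizontal-edge filter from the lecture to an invented toy image (the image values are assumptions, chosen to contain exactly one horizontal edge).

```python
import numpy as np

# Horizontal-edge filter from the lecture: ones on top, minus ones on the bottom.
horizontal = np.array([
    [ 1,  1,  1],
    [ 0,  0,  0],
    [-1, -1, -1],
])
vertical = horizontal.T  # rotated version: detects vertical edges

# A toy 6x6 image: bright region on top, dark below, so one horizontal edge.
img = np.zeros((6, 6))
img[:3, :] = 255

def convolve2d(img, k):
    """The same stride-1 sliding sum-of-products as before."""
    f = k.shape[0]
    rows, cols = img.shape[0] - f + 1, img.shape[1] - f + 1
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.sum(img[i:i + f, j:j + f] * k)
    return out

print(convolve2d(img, horizontal))  # rows straddling the edge light up (765)
print(convolve2d(img, vertical))    # all zeros: no vertical edges in this image
```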
So, 21:19 - 21:27 Speaker 3: this is also part of the EDA that we do, correct? Because when we do exploratory data analysis on images, we use these same functions. 21:28 - 21:35 Dr Anand Jayaraman: So they used to do that before. Now you don't need to do this; I'm coming to that in a second. 21:35 - 21:50 Speaker 4: Professor, one quick question here. You explained autoencoding to us earlier, right: the neural network where you give it, say, a noisy image and you want it to auto-correct it. How does that relate here? Because this all seems connected. 21:50 - 22:26 Dr Anand Jayaraman: There, I pulled the wool over your eyes. I started talking about images without telling you what type of neural network was used to process images. In that one, the noise-correcting autoencoders and all that, convolutional neural networks are what is used, not your regular neural network. If you go back to the slides, you will see CNN mentioned there. At that time I just, you know, waved my hands and moved past it. 22:27 - 22:35 Speaker 4: Okay. So the moment you explained this, I was thinking of the autoencoder you showed, where I can build a neural network that corrects the noise. 22:35 - 22:50 Dr Anand Jayaraman: But I did not tell you what it was built from. After we finish this, you will see that it was an autoencoder which used CNNs, not your regular, dense neural networks. 22:51 - 22:52 Speaker 4: Thank you. 22:52 - 23:35 Dr Anand Jayaraman: Very good. Any other questions? So this is a convolution. Now, what are we doing with this convolution? Let me just come to the final point, and then I will go through the next set of steps. These convolutions are essentially extracting information about the image: how many vertical edges there are, how many horizontal edges there are, and so on. Each type of convolution filter need not always be 23:35 - 24:13 Dr Anand Jayaraman: 3 by 3; it can be 5 by 5 or 4 by 4 or any size. But each of those filters, each convolution matrix, is extracting some feature of the image. Right? So, you remember I started talking about the MNIST problem, where I'm trying to recognize whether something is a shirt, something is a shoe, or whatever. If you are able to recognize in the image which edges are likely to be hands and which edges are likely to be the body, directly or indirectly, 24:13 - 24:52 Dr Anand Jayaraman: then it will make the job of the neural network easy. So what we will do is this: instead of sending the original image into the dense neural network, we will send in the output of the filters, and we find that when you do this, it recognizes the images in a better way, even when the object is not centered. That is the finding. These outputs are from different filters, right? Each of them is going to be from a different filter. I will take an image, 24:52 - 24:57 Dr Anand Jayaraman: apply different filters on it, and send the output of the filters into a dense neural network. 24:57 - 25:09 Speaker 4: So professor, you started the conversation talking about the biological neural network, right, the visual cortex. Is the visual cortex principle applied here?
25:09 - 25:19 Dr Anand Jayaraman: So now, I will make that connection, I will make the biological connection. I am just telling you the final point; I have not yet told you how I am going to get there. There are a lot of details in between. 25:20 - 25:41 Speaker 4: Yeah, so my question is, before we jump ahead and I forget it: how does it build this automatically? See, the first step is about synchronization; the second is composing everything, right? So how can a neural network understand, hey, this is a shoe, when I broke it into multiple pieces, and then create an outcome? That's what I'm trying to get to. I'll wait for it. 25:41 - 26:09 Dr Anand Jayaraman: There is no answer to that, or rather, the answer won't be satisfactory for you, because you're using words like "how does it understand"; you're giving it a personality. My answer would be: it's following gradient descent. This thing just follows some mathematical rules, like laws of the universe. We'll get there a little bit later. 26:09 - 26:10 Speaker 4: I'll wait, then. Yes, sure. 26:10 - 26:54 Dr Anand Jayaraman: We'll get there. We'll get there. Right, this is where we are going. Okay. Now let's get back to convolution. I talked about convolution in the case of a single black-and-white image. But in reality, a single image that you have, a color image, will actually have 3 different layers. It is not a flat matrix but a volume: 3 layers, one red layer, one green layer, and one blue layer, if you are using the RGB representation. There are also other types of representation; let 26:54 - 27:43 Dr Anand Jayaraman: us not go there. Whatever the representation, the input image is not 2-dimensional. Instead, it is actually 3-dimensional in terms of storage. So to convolve it, you actually need a 3D kernel. So what will the output be? Even the convolution works across 3 layers. This is my original image, and this is the filter. This filter layer applies to this image layer, this layer applies to that one, and this one applies to that one. And then you add 27:43 - 28:37 Dr Anand Jayaraman: all of these, add all of these, add all of these; together you add everything, and you get a single number. So even a 3D convolution filter gives you a 2D output. Okay, I will let that sink in, and then I will show you an animation. So here, for example, is one 3D object with multiple pixels, and here is a filter, which is also 3D. This 3D filter is being moved one step at a time, and it's getting summed up, producing one pixel at a time. And this 28:37 - 29:19 Dr Anand Jayaraman: is the output of that particular filter. Then this is the next filter you take: the same image, you apply the next filter, and that will be the output of that particular filter. And this is the next filter. Typically, you take an image, you apply multiple filters on it, you collect the outputs of all of those filters, and you create a new filtered structure. You took some image; this filter might be edge detection, this might be blurring, this might be something else. Different filters are there. You just store the outputs of all 29:19 - 29:27 Dr Anand Jayaraman: of those filters separately in this new structure. That is what is done here.
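A minimal NumPy sketch of what the animation shows: a 3D filter sliding over a multi-layer image yields a 2D output, and stacking the outputs of several filters builds a new volume. The pixel and filter values here are random, purely to make the shapes concrete.

```python
import numpy as np

# A color image stored as a volume: height x width x 3 (red, green, blue layers).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(5, 5, 3))

def convolve3d(img, k):
    """Each filter layer multiplies its own image layer; everything is summed
    into one number per position, so a 3D filter still gives a 2D output."""
    f = k.shape[0]
    n = img.shape[0] - f + 1
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = np.sum(img[i:i + f, j:j + f, :] * k)
    return out

one_filter = rng.standard_normal((3, 3, 3))   # a 3D kernel: 3 x 3 x 3
print(convolve3d(image, one_filter).shape)    # (3, 3): a 2D output

# Apply 12 different filters and stack their 2D outputs into a new volume.
filters = [rng.standard_normal((3, 3, 3)) for _ in range(12)]
stacked = np.stack([convolve3d(image, flt) for flt in filters], axis=-1)
print(stacked.shape)                          # (3, 3, 12): one layer per filter
```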
29:28 - 29:36 Speaker 2: Professor, Gopesh here, one question I have. Is it mandatory that all 3 layers of the filter have similar values? 29:38 - 30:17 Dr Anand Jayaraman: Similar values in the filter? So let us go back to this one. In this filter, the numbers on one layer can be very different from the numbers on another layer. The filter values applied on the red layer can be different from those applied on the green layer or the blue layer. Right? Those numbers can all be different. Okay. We have defined only the convolution operation, the multiplication. The actual numbers are yours to determine, and you, as an artistic creator, can create any type of filter you want. If you remember, one of 30:17 - 31:06 Dr Anand Jayaraman: the first things that Instagram did: the Instagram guys started as just another social network, exactly as a competitor. That was a time when, like, every new startup you heard of was a social networking idea; it was considered very hot and all that. The Instagram guys created another social network, and you know what? It was just one of the dozens of social networks coming up. Then the founder went on a vacation, a beach vacation, and he found one particular filter which gave a very interesting appearance to the image. Essentially, he found a set of weights that gave 31:06 - 31:40 Dr Anand Jayaraman: a very interesting appearance to the image. And he released that filter. And you know what? Basically, overnight it took off. Because you would take a regular picture, and it made the image look interesting. In fact, for a while, that was actually called the Instagram filter, because there was only one filter at that time. Now, of course, you have a huge number of filters and all that. That one thing just took off, and then in his mind it became very clear: okay, the way 31:40 - 32:21 Dr Anand Jayaraman: I'm going to distinguish myself from Facebook is that this is going to be an image-only platform, and on it we do many of these image filters and so on, because you have to find your unique spot in this huge landscape of social networks. So: you get to choose the set of weights to create the particular output that you want. Here, what I'm showing you is what we will do. We will take this 3D image. Normally it has only 3 layers, red, green, blue, but I'm just showing 32:21 - 33:04 Dr Anand Jayaraman: you a structure with multiple layers. And this is one filter. I'm not implying that all its numbers are going to be the same; the numbers can be different. This is one filter, and these are a number of different filters. What we'll generally do is take that image, apply multiple filters on it, and create a new 3D structure with a variety of filtered images. The original structure might have, let's say, only 3 layers, but if I am applying, let us say, 12 filters on it, my output object will have 12 layers, each layer from one filter. 33:04 - 33:11 Dr Anand Jayaraman: Is that clear? Hopefully, this animation makes it clear. 33:13 - 33:14 Speaker 4: Yes, professor. 33:14 - 33:58 Dr Anand Jayaraman: Okay. Now, what we are going to do is this; this is the key aspect. We are going to make the neural network learn what the value of each one of these filters should be: what the appropriate filters are to transform the image in such a way that you are able to make the correct prediction of whether it's a cat or a dog or whatever.
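In code, handing that job to the network looks like declaring a convolution layer and letting training fill in the filter values: a minimal sketch assuming tensorflow.keras, with the 8-layer input and 12 filters of the example counted next.

```python
import tensorflow as tf

# 12 filters of size 3x3 over an 8-layer input, as counted in the lecture.
# Keras initializes the filter values randomly; training then adjusts them
# by gradient descent. We never supply the numbers ourselves.
layer = tf.keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation="relu")

x = tf.random.normal((1, 32, 32, 8))   # a batch of one 32x32 image with 8 layers
y = layer(x)
print(y.shape)                         # (1, 30, 30, 12): one output layer per filter

kernels, biases = layer.get_weights()
print(kernels.shape)                   # (3, 3, 8, 12): 3x3x8 weights per filter
```

With a 5 by 5 filter instead of 3 by 3, a 32 by 32 input would shrink to 28 by 28, which is the size quoted on the slide discussed below.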
We are not going to supply the weights of the filters; you're not going to supply them manually. We will design 33:58 - 34:42 Dr Anand Jayaraman: a structure and say: I want to create 12 filters. I don't know what those filters are going to be. These 12 filters: each one of the filters is going to have how many values? Let's just count. This original matrix has how many layers? 1, 2, 3, 4, 5, 6, 7, 8 layers, right? So my filter will be 3 by 3 by 8. Right? That is the number of weights the neural network will have to learn for each filter. And I am asking for 12 filters. So, I am 34:42 - 35:30 Dr Anand Jayaraman: asking it to learn all of those weights in such a way that it creates a new filtered structure, and this filtered structure should be useful in recognizing what the final image is. Professor, why 12? Oh, that's a hyperparameter. I'm just giving you an example. It genuinely is a hyperparameter; you will see that, right. So: you have this original image, 32 by 32 pixels with 3 layers, and you apply the convolution layer with, let's say, 6 different filters. If you apply 6 different filters, you will get a new image 35:30 - 35:42 Dr Anand Jayaraman: which is 6 layers deep but with a reduced size, 28 by 28; it was 32 by 32, now it is a reduced size, 28 by 28. We will stack the outputs of each of these filters together to get a new 3D image. 35:44 - 35:56 Speaker 2: Professor, I understand that we have 3 layers because of the RGB components. But in the previous slide, you showed 8 different layers. Why is that? 35:56 - 36:25 Dr Anand Jayaraman: Because, as you will see: at the end of the first convolution operation, you now have 6 layers. You have 6 layers, and essentially what we will do is apply convolution again on this, you will see. So that image is showing how convolution works on some n-layered input; that is what it shows. 36:27 - 36:29 Speaker 3: Got it, got it. Thank you. 36:29 - 36:50 Dr Anand Jayaraman: It is just showing the general operation. You will see: just like we had one hidden layer of a dense neural network and another hidden layer of a dense neural network and so on, we will similarly have one convolutional layer, and another convolutional layer, and yet another convolutional layer, and so on. 36:51 - 36:55 Speaker 4: So professor, if we have 10 filters, then I will have 10 layers, right? 36:55 - 37:01 Dr Anand Jayaraman: Absolutely, in a single convolutional layer. Yeah, each of those filters is going to give you one output layer. 37:01 - 37:10 Speaker 4: So I have 1 image, I apply 10 filters, and I will see that 1 single image has exploded into 10 layers now. 10 layers. Yes. 37:10 - 37:53 Dr Anand Jayaraman: Right. Now, what we'll also do once in a while is reduce the information that is there in this layer, through what is called a pooling operation. What is a pooling operation? A pooling operation looks at some neighborhood. There are two common types of pooling operation: one is called max pooling and the other is called average pooling. There is also min pooling; you can do all kinds of pooling operations. Now, what does max pooling do? This is a 2 by 2 max pooling. You look at 2 by 2 37:53 - 38:08 Dr Anand Jayaraman: regions together, and in that 2 by 2 region you take the maximum value and put it in the output. That is called max pooling. So here, in this one, the max value is 20.
38:09 - 38:11 Speaker 2: Professor, is this done on the convolved output? 38:12 - 38:51 Dr Anand Jayaraman: It can be done on the convolved output, or it can be done on the original input as well. Right now I am just defining an operation, but yes, usually it will be done on the convolved output. Now, here in this one, this 30 is the largest, so that goes there. Here the largest number was 112, so that is there. Here the largest number is 37, so that is there. This kind of operation is called a pooling operation. I defined the convolution operation; now I have defined the pooling operation. Why is the pooling operation important? It is important because 38:51 - 39:41 Dr Anand Jayaraman: it allows us to impose local translational invariance. What do I mean by that? This is a picture of a cow. Now, this is also a picture of a cow, which looks very similar; it looks similar to our human eye, but there is one minor difference. This picture of the cow has certain pixel values there, right? In this picture, the cow is shifted down by one pixel. The entire layer has been shifted down by one pixel. You see: the first row is all zeros, then 1 3 1, 2 8 2, 1 3 1, and so on. 39:42 - 39:50 Dr Anand Jayaraman: Okay, do you understand what I am saying? This one is shifted, right? Now, in both of them... 39:50 - 39:51 Speaker 2: Professor, is it the left one? 39:52 - 40:01 Dr Anand Jayaraman: No, the right one is shifted down. The right one has been shifted down. Yeah, yeah, now, we want, in both of them... 40:01 - 40:03 Speaker 3: The clouds are on the right one but not on the left one. 40:03 - 40:42 Dr Anand Jayaraman: The clouds and the ground level, yeah. So, we want both of them to be recognized as a cow. It should not be that one is called a cow and the other is not a cow just because it is shifted by one pixel. What happens when you do a pooling operation is this: you look at this local region and ask, what is the maximum value? The maximum value is 8. And here, you look at the corresponding local region and ask what the maximum value is; that is also 8. Right? Basically, in a 40:43 - 41:23 Dr Anand Jayaraman: pooling operation, you are focusing only on the highest value, not on the details of what is nearby. And therefore it provides some amount of resilience to a slight shift of the object to the left or right. It does not matter whether the shirt is at the center or slightly shifted; it is still a shirt. The max pooling operation allows you to continue recognizing that, as long as the image is otherwise the same. It focuses on the largest 41:24 - 42:09 Dr Anand Jayaraman: details of the image rather than minor variations. That is the main effect of the pooling operation. The secondary effect is that pooling also reduces the size of the image. Here is the original image; you apply pooling, and the resulting feature map is much smaller. This is the size of your pooling matrix: that matrix moves 4 times and you have covered the entire image. So it has reduced the size; it allows for dimensionality reduction as well. What I have done is define for you what convolution is, and define 42:09 - 42:22 Dr Anand Jayaraman: for you what pooling is. I have not yet talked about how you join them together to make an image recognizer. And that I am going to do now.
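A minimal NumPy sketch of 2 by 2 max pooling and the shift-resilience point just made. The patch values are invented for illustration, loosely mirroring the 1 3 1 / 2 8 2 rows mentioned in class.

```python
import numpy as np

def max_pool_2x2(img):
    """2x2 max pooling: keep only the maximum of each 2x2 region.
    It also halves each dimension (the dimensionality-reduction effect)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A tiny 4x4 patch, and the same patch shifted down by one pixel.
patch = np.array([
    [1, 3, 1, 0],
    [2, 8, 2, 0],
    [1, 3, 1, 0],
    [0, 0, 0, 0],
])
shifted = np.roll(patch, 1, axis=0)  # the whole image moved down one pixel

print(max_pool_2x2(patch))    # the dominant value 8 survives pooling
print(max_pool_2x2(shifted))  # 8 survives here too: resilience to small shifts
```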
42:22 - 42:24 Speaker 4: Can I ask a question, professor? 42:24 - 42:25 Dr Anand Jayaraman: Please go ahead. 42:25 - 42:40 Speaker 4: Now, this whole exercise we saw assumes the image is centered. What if, to take an example, the cow has turned the other way? In that case your pooling operation may not work the same. 42:40 - 43:16 Dr Anand Jayaraman: It may not work, absolutely. So we are coming to that. Very good questions, very good questions. We need to take care of all of that, and we will talk about all of it towards the end of the session; maybe not today, but in the next class. Tomorrow's class is going to be the lab session, where Dr. Debrook is actually going to show you how to build a convolutional neural network using Azure. 43:16 - 43:26 Speaker 2: Professor, a quick request: by the end of the session, will you spend 5 minutes explaining the assignment? I joined late, so I am not sure if you covered it at the start of the session. 43:26 - 43:43 Dr Anand Jayaraman: The assignment explanation will also be done by Dr. Debrook tomorrow, because, I mean, he will have Azure ML Studio in front of him, and you can ask your questions. He'll clarify. 43:43 - 43:46 Speaker 2: Tomorrow, okay. Thanks. We just wanted to know the rubric, in terms of how it will be marked and things like that. 43:46 - 43:49 Dr Anand Jayaraman: He is the person who is going to be grading it. 43:49 - 43:50 Speaker 2: Oh, great, great. 43:50 - 43:51 Speaker 3: Thank you. 43:51 - 44:36 Dr Anand Jayaraman: Yeah, please go ahead. Professor, I was just looking at the complexity. I think OpenAI just released a tool called Sora, S-O-R-A, text to video. Wondering how complex that could be. Correct, yeah; so we will get there, right. By the end of this module, we will talk about image generation. We have not talked about generation yet; the last session of this module will be on generation. Till now, we have been talking only about classification: looking at data and classifying this or that, or doing a regression problem. We are still not in generative AI, then? 44:36 - 45:38 Dr Anand Jayaraman: Exactly. We are going to talk about generation in the very last session, and then the next module is all about generation. At least the basics, the overall picture, will start becoming clear in the next one. Any other questions? Sorry, I apologize; I am recording, and my throat is slowly starting to give way. Now, what I have done so far is describe to you the mathematical definition of the convolution operation, 45:38 - 46:33 Dr Anand Jayaraman: and I have defined for you the mathematical definition of the pooling operation. That is all I have done. Where are the neural networks in this? There are no neural networks here; it is just matrix operations I am talking about. So where are the neural networks? Here is the connection to neural networks: this convolution that we described can actually be thought of as a neural network with some set of constraints. Now imagine I have a one-dimensional picture. Just a 1-dimensional picture, 46:35 - 47:32 Dr Anand Jayaraman: not a 2D picture, a 1D picture. And I have a filter, which is also a 1-dimensional filter.
This is my filter, and this is my original picture. Now what will I do? I take this filter and apply it first here. So there are some numbers here: w1, w2, w3. And these are going to be some set of values: x1, x2, x3, x4, x5, and so on and so forth. So I will take the filter and apply it: I will compute x1 times w1, x2 times w2, x3 times w3. 47:33 - 48:12 Dr Anand Jayaraman: And then I can move this filter by either 1 pixel, or I can choose to move it by 2 pixels, or by 3 pixels for the next step. The amount by which I choose to move the filter is called the stride. So I can do stride 1, stride 2, or stride 3. That is what I am saying. The first position is like this; for the next one, do I move to here, or to here? It depends on the stride length of my convolution operation. Till now, we 48:12 - 49:02 Dr Anand Jayaraman: have been talking only about stride-length-1 convolution, but I can have stride-length-2 convolution as well, and stride-length-3 convolution as well. When I do stride length 2 or stride length 3, I get a smaller output. For the sake of convenience, let us say I am talking only about stride-length-3 convolution. So, in stride-length-3 convolution, what will happen? Here is my original vector: x1, x2, x3, x4, x5, x6, x7, x8, x9. With stride length 3, my filter will first apply 49:02 - 49:18 Dr Anand Jayaraman: on x1, x2, x3; then it will move and apply on x4, x5, x6; and then it will move directly to x7, x8, x9. Everyone with me so far? This is applying convolution with stride length 3. Yes? 49:18 - 49:19 Speaker 4: Yes, professor. 49:20 - 50:01 Dr Anand Jayaraman: Now, imagine this; this is what is happening here. Here is x1, x2, x3, all the way down. And what am I doing? I am applying this filter w1, w2, w3. You can think of it this way: here I would have multiplied x1 times w1, x2 times w2, x3 times w3, then added all of them. That is exactly what a neuron would do if it were connected to only 3 of the pixels. A neuron connected to only 3 of the pixels would have done exactly that, right? w1 times x1, w2 times x2, w3 times x3, and then 50:01 - 50:37 Dr Anand Jayaraman: all of them added together; that will be the input to this neuron. Now look at the next neuron. This next neuron, let us say, is connected only to the next 3 inputs. It would do the same thing, except, and this is the point: instead of having completely different weights from the first neuron, we are saying it has exactly the same weights as the previous neuron. It has only 3 weights, and they are exactly the same set of weights as the previous one. And the third neuron also has exactly the same set of 50:37 - 51:27 Dr Anand Jayaraman: weights. And those are the outputs that come out. It's as if there are 3 neurons applying the same filter at 3 different locations. Do you see that? In a sense, these are locally connected neurons, as opposed to what we had before: densely connected neurons. You see the connection now? Locally connected neurons: each is connected to only a few of the inputs, and the remaining weights are all 0. And all the neurons in that layer share the same weights; sharing of weights.
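A minimal NumPy sketch of this one-dimensional convolution viewed as locally connected neurons with shared weights; the input and weight values are assumptions, purely for illustration.

```python
import numpy as np

x = np.arange(1.0, 10.0)       # x1..x9, a one-dimensional "picture"
w = np.array([0.2, 0.5, 0.3])  # w1, w2, w3: ONE filter shared by all neurons

def conv1d(x, w, stride):
    """Each output is what one locally connected neuron computes:
    w1*x[i] + w2*x[i+1] + w3*x[i+2]. Every neuron reuses the same three
    weights; only the window position changes, by `stride` each step."""
    f = len(w)
    return np.array([np.dot(w, x[i:i + f])
                     for i in range(0, len(x) - f + 1, stride)])

print(conv1d(x, w, stride=1))  # 7 outputs: windows x1..x3, x2..x4, ...
print(conv1d(x, w, stride=3))  # 3 outputs: windows x1..x3, x4..x6, x7..x9
```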
51:29 - 51:45 Speaker 2: Professor, one question here. When you were executing strides, you were moving the window from one pixel to the next pixel, and so on. But in the neural network diagram above, it is actually moving by 2 pixels. 51:46 - 51:52 Dr Anand Jayaraman: In this case, that is just the drawing; I did it that way because my drawing skills are not great. 51:58 - 52:02 Speaker 2: I am trying to understand: it can be moved by any distance? 52:02 - 52:09 Dr Anand Jayaraman: You can specify any of them. You will see in the design of the neural network code that you can specify the stride length. 52:11 - 52:16 Speaker 2: Does it have to be not more than the filter size? Or can we leave gaps? 52:17 - 52:29 Dr Anand Jayaraman: You can choose to leave gaps, yes. I mean, as I said, there is no neural network police force established yet, so nobody is going to arrest us. Right. 52:30 - 52:35 Speaker 2: But then some pixels will be left out, right, in the output layer? 52:35 - 52:55 Dr Anand Jayaraman: That is the thing. Mostly, you need to make sure that you have complete coverage of all the pixels. So you want to make sure your stride length is not longer than the filter size; otherwise, you are ignoring some pixels. Your intuition is perfect. 52:57 - 53:05 Speaker 4: So Professor, one big question. When do we choose dense connections, and when do we choose local ones, Professor? 53:06 - 53:57 Dr Anand Jayaraman: So, dense connections are what you will be looking at when you are looking at structured data. Structured data is where you have different columns of numbers, and the order of these columns does not matter; a priori, every column is as important as every other. Densely connected neurons matter there, because every neuron needs to look at all of the data. Locally connected neurons are important in unstructured data. Yes, it is true that an image can also be put in the form of a matrix like that, but you are missing 53:57 - 54:47 Dr Anand Jayaraman: the main point of the image when you look at it purely as a matrix. There is some local, regional information there; local connections have value. The neighborhood has value, in the sense of informational value, and locally connected neural networks attempt to extract that value. This type of neuron, the locally connected neuron, is called a convolutional neuron. And you can specify what the size of the filter, the convolutional unit, should be. I will show you code in a minute. And when I am talking about 54:47 - 55:22 Dr Anand Jayaraman: the filter, a 3 by 3 filter or whatever, that's the number of weights it will be learning, right? So if it's a 3 by 3 filter, 9 weights will be learned. Compare that with what you would normally have had to learn in that hidden layer. Each neuron there would have how many weights? There are 7 inputs, right? So, 7 weights for this one. This one will also learn 7 weights, and this one will also learn 7 weights. So, if you have 55:22 - 55:38 Dr Anand Jayaraman: 3 output neurons, then 21 weights will have to be learned. But now, because the weights are being shared, the number of weights is much smaller.
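The weight counting in code: a minimal sketch assuming tensorflow.keras, with stride 2 chosen (an assumption) so that three windows exactly cover the seven inputs, as in the diagram.

```python
import tensorflow as tf

# Dense: 3 neurons, each looking at all 7 inputs -> 7 * 3 = 21 weights
# (bias switched off here to match the in-class count).
dense = tf.keras.layers.Dense(3, use_bias=False)
dense.build((None, 7))
print(dense.count_params())   # 21

# Convolutional: locally connected neurons sharing ONE 3-value filter,
# slid across the 7 inputs (stride 2 gives exactly 3 output positions).
conv = tf.keras.layers.Conv1D(filters=1, kernel_size=3, strides=2, use_bias=False)
conv.build((None, 7, 1))
print(conv.count_params())    # 3: just the shared w1, w2, w3
```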
Sorry, questions again. 55:39 - 55:52 Speaker 3: Professor, just a quick query. I remember, in one of these sessions, you mentioned that deep learning, the neural networks, works on unstructured data and doesn't work well on structured data. 55:53 - 56:52 Dr Anand Jayaraman: So, let me clarify that statement. What I mean is this: the neural network infrastructure is like a sledgehammer, right? A very powerful tool, okay? You don't want to use it to crack open a nut. There are simpler machine learning algorithms that allow you to handle structured data. With unstructured data, the simpler machine learning algorithms struggle, because these simpler algorithms work under the assumption that each column of the data is independent, and there is no way for us to show them that there is meaning in the arrangement of the columns; there is no way to show that in 56:52 - 57:35 Dr Anand Jayaraman: the traditional ML algorithms. But in these neural network algorithms, there is a way to show it: through these convolutional neurons. Convolutional neurons, specifically, are local neurons. And because of that, I'm saying: use these deep neural network models only on unstructured data, because for unstructured data you cannot use a traditional machine learning algorithm. Whereas for most structured data problems, traditional machine learning algorithms are available for you, and they can be made to work. There are of course some situations where it's much harder, like in the case of the YouTube 57:35 - 58:08 Dr Anand Jayaraman: video recommendation, right, because the cardinality of the categorical variables is so large that it becomes very hard. So, to solve that cardinality problem, you use deep neural networks in the YouTube recommendation. But for most business problems, you will find that traditional ML algorithms do just fine, with a much lower computational cost. 58:11 - 58:17 Speaker 3: Okay, Professor, I have a related question, but I think I will hold on till probably the end. 58:17 - 58:19 Dr Anand Jayaraman: No, no, please ask away. Ask away. 58:19 - 58:47 Speaker 3: No, it's just an application I was thinking about. For instance, take a document, let's say a PDF document, which is all text, a lot of text. And then there are numbers involved in it, right? Maybe let's take a contract, or sales numbers; there is a table, there is a lot of data in there. Now, the data we can extract. Does it make sense to analyze that data if you need to take that, put 58:47 - 58:56 Speaker 3: it into a table and th