Musical Playlist Generation using Facial Expression Recognition

Summary

This project presentation details a system for generating musical playlists based on the user's facial expressions. The methodology involves computer vision techniques, including Haar Cascade classifiers and Convolutional Neural Networks (CNNs), to detect and classify emotions in images. The system aims to automate playlist creation, saving the user time.

Full Transcript

Musical Playlist Generation using Facial Expression Recognition
Presented by Team 8.0

Problem Statement

Develop a musical system that generates a playlist for the user based on his or her mood, by capturing his or her facial expressions.

Introduction

Music plays an important role in our lives. It is not just a source of entertainment: it gives us relief and reduces our stress, so music also has a therapeutic value and helps to improve our mental health.

Computer vision is a field of study concerned with how computers see and understand digital images and videos. It involves sensing a visual stimulus, making sense of what has been seen, and extracting complex information that can be used for other machine learning tasks. We implement our use case using the Haar Cascade classifier, an effective object detection approach proposed by Paul Viola and Michael Jones in their 2001 paper "Rapid Object Detection using a Boosted Cascade of Simple Features".

Methodology

1. Input dataset
2. Training the CNN model
3. Capture the face of the user using a webcam
4. Facial feature extraction
5. Detecting the emotion (classifier)
6. Generating the music playlist

Methodology in Detail

Step I: Proposed System

A convolutional neural network is a multilayer perceptron specially designed for the identification of two-dimensional image information. It has four layers: an input layer, a convolution layer, a sample (pooling) layer, and an output layer.

Step II: Face Detection

The Viola-Jones algorithm, developed in 2001 by Paul Viola and Michael Jones, is an object-recognition framework that allows the detection of image features in real time. The Viola-Jones object detection framework combines the concepts of Haar-like features, integral images, the AdaBoost algorithm, and the cascade classifier to create a system for object detection that is fast and accurate.
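As a rough illustration of how the methodology stages fit together, the following Python sketch wires them into a single flow. All function names (capture_face, extract_features, classify_emotion, generate_playlist) and the emotion-to-playlist mapping are hypothetical placeholders, not part of the presentation; a real implementation would use a webcam capture and Haar cascade (e.g. via OpenCV), the trained CNN, and the Spotify API.

```python
# Hypothetical skeleton of the playlist-generation pipeline.
# Each stage is stubbed out; real implementations would use
# OpenCV (webcam capture, Haar cascade face detection), the
# trained CNN, and the Spotify API respectively.

EMOTION_PLAYLISTS = {          # placeholder emotion -> playlist mapping
    "happy": ["Upbeat Mix"],
    "angry": ["Calming Mix"],
    "neutral": ["Everyday Mix"],
    "surprise": ["Discovery Mix"],
}

def capture_face(frame):
    """Stub for face detection; returns the face region of the frame."""
    return frame  # a real system would crop the detected face here

def extract_features(face):
    """Stub for facial feature extraction (CNN feature maps in practice)."""
    return face

def classify_emotion(features):
    """Stub classifier: here the 'features' already name the emotion."""
    return features

def generate_playlist(emotion):
    """Look up a playlist for the detected emotion."""
    return EMOTION_PLAYLISTS.get(emotion, ["Default Mix"])

def pipeline(frame):
    """Run one frame through every stage of the methodology."""
    face = capture_face(frame)
    features = extract_features(face)
    emotion = classify_emotion(features)
    return emotion, generate_playlist(emotion)
```

With the stubs above, `pipeline("happy")` returns `("happy", ["Upbeat Mix"])`; only the stage boundaries, not the stub bodies, reflect the presentation.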
1. Haar-like Features

Haar-like features are named after Alfred Haar, the Hungarian mathematician who developed the concept of Haar wavelets in the early 20th century (a kind of ancestor of Haar-like features). Each feature is a box with a light side and a dark side, which is how the machine determines what the feature is. Sometimes one side will be lighter than the other, as at the edge of an eyebrow; sometimes the middle portion may be brighter than the surrounding boxes, which can be interpreted as a nose. Viola and Jones identified three types of Haar-like features in their research:

1. Edge features
2. Line features
3. Four-sided (four-rectangle) features

2. Integral Image

Computing feature values directly can be very intensive, since the number of pixels involved grows quickly for large features. The integral image lets us perform these calculations fast, so we can decide whether a feature (or a set of features) fits the criteria. Each value of the integral image is the sum of all the pixels above and to the left of it (inclusive). Because Haar-like features are rectangular, this makes evaluating them very easy: the sum over any rectangle in the original image can be recovered from just four values of the integral image, so comparing the light and dark rectangles of a feature takes only a handful of additions and subtractions. Even with a 1000 x 1000 pixel grid, the integral-image method makes the calculations far less intensive and can save a lot of time for any face detection model.

3. Cascade Classifier

A cascade classifier is a multi-stage classifier that can perform detection quickly and accurately. Each stage consists of a strong classifier produced by the AdaBoost algorithm.
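The integral-image trick described above can be demonstrated in a few lines of plain Python. This is an illustrative sketch, not the team's code: rect_sum recovers the sum of any rectangle from four lookups, and haar_edge_feature evaluates a simple two-rectangle edge feature as the difference of two such sums.

```python
def integral_image(img):
    """Build an integral image: ii[y][x] is the sum of img over all
    pixels at rows <= y and columns <= x (inclusive)."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of the image over [top..bottom] x [left..right],
    recovered from just four lookups in the integral image."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total

def haar_edge_feature(ii, top, left, bottom, right):
    """Two-rectangle (edge) feature: the left half minus the right
    half, each half evaluated in constant time via rect_sum."""
    mid = (left + right) // 2
    return (rect_sum(ii, top, left, bottom, mid)
            - rect_sum(ii, top, mid + 1, bottom, right))
```

Note that the cost of rect_sum is independent of the rectangle's size, which is exactly why the method scales to a 1000 x 1000 grid.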
A CNN typically has three kinds of layers: a convolutional layer, a pooling layer, and a fully connected layer.

1. Convolutional Layer

The element that carries out the convolution operation in the first part of a convolutional layer is called the kernel, or filter. The objective of the convolution operation is to extract high-level features, such as edges, from the input image. Conventionally, the first convolution layer is responsible for capturing low-level features such as edges, colour, and gradient orientation; with added layers, the architecture adapts to the high-level features as well. Convolution is a mathematical operation that merges two sets of information; in our case, the convolution is applied to the input data using a convolution filter to produce a feature map. We perform the convolution operation by sliding this filter over the input. At every location, we do element-wise multiplication and sum the result; this sum goes into the feature map. The area where the convolution operation takes place is called the receptive field. Because the filter in this example is 3x3, the receptive field is also 3x3.

2. Pooling Layer

Similar to the convolutional layer, the pooling layer is responsible for reducing the spatial size of the convolved feature. This decreases the computational power required to process the data, through dimensionality reduction. Furthermore, it is useful for extracting dominant features that are rotationally and positionally invariant, thus keeping the training of the model effective. There are two types of pooling: max pooling and average pooling. Max pooling returns the maximum value from the portion of the image covered by the kernel; average pooling, on the other hand, returns the average of all the values from the portion of the image covered by the kernel.

3. Fully Connected Layer

Neurons in this layer have full connectivity with all neurons in the preceding and succeeding layers, as in a regular feed-forward neural network.
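The convolution and max-pooling operations described above can be sketched in plain Python, without any deep learning framework. This is an illustrative sketch, not the presentation's code: convolve2d slides a filter over the input, doing an element-wise multiply-and-sum at each location, and max_pool keeps the maximum of each 2x2 window.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image; at each location do an
    element-wise multiplication and sum the result into the
    feature map ('valid' convolution, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1          # output height
    ow = len(image[0]) - kw + 1       # output width
    feature_map = [[0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            # (y, x) marks the top-left of the receptive field,
            # whose size equals the kernel size.
            feature_map[y][x] = sum(
                image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw)
            )
    return feature_map

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the maximum of each
    size x size window, shrinking the spatial dimensions."""
    h = len(feature_map) // size
    w = len(feature_map[0]) // size
    return [[max(feature_map[y * size + i][x * size + j]
                 for i in range(size) for j in range(size))
             for x in range(w)]
            for y in range(h)]
```

For example, convolving a 4x4 input with a 3x3 kernel yields a 2x2 feature map, and 2x2 max pooling halves each spatial dimension, exactly the dimensionality reduction described above.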
The fully connected layer is also called the dense layer. It learns features from all the combinations of the features of the previous layer, and helps to map the representation between the input and the output. The flattened output is fed to a feed-forward neural network, and backpropagation is applied at every iteration of training. Over a series of epochs, the model becomes able to distinguish between dominating and certain low-level features in images, and to classify them using the SoftMax classification technique.

Result

- Neutral expression detected, and a playlist is generated accordingly.
- Surprise emotion detected, and a playlist is created accordingly.
- Angry emotion detected, and a playlist is created accordingly.

Conclusion

1. A thorough review of the literature shows that there are many approaches to implementing a music recommendation system using facial expressions.
2. Implementation of this project helps the user automatically generate a musical playlist from his or her facial expressions, saving time and manual labour.
3. In this project we generate the playlist according to the emotion of the user: we developed a program for predicting the user's emotion using convolutional neural networks, and we used the Spotify API to generate the playlist.
4. We have applied it to various images and achieved a very high accuracy of more than 99% for the happy, anger, surprise, and fear emotions.

Team
1. K S Akshaya Nivetha
2. T Aswini
3. V Ilakiya
4. V S Kamya
5. R Kaviya Dharshini
6. D Nandika
7. R Rashmika
8. S T Sahanaa
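To close the loop on the CNN description in the transcript, the fully connected (dense) layer and SoftMax classification can be sketched in plain Python. This is an illustrative sketch only; the weights, biases, and class ordering below are made-up placeholders, not values from the team's trained model.

```python
import math

def dense(inputs, weights, biases):
    """Fully connected (dense) layer: every output neuron is a
    weighted sum over ALL inputs plus a bias (full connectivity)."""
    return [sum(w * x for w, x in zip(ws, inputs)) + b
            for ws, b in zip(weights, biases)]

def softmax(logits):
    """SoftMax: exponentiate and normalise so the outputs form a
    probability distribution over the emotion classes."""
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Placeholder 2-input, 2-class head; a real model would take the
# flattened pooling output and emit one logit per emotion class.
probs = softmax(dense([1.0, 2.0],
                      [[0.5, -0.5], [1.0, 1.0]],
                      [0.0, 0.0]))
```

The predicted emotion is then the class with the highest probability, which is what drives the playlist lookup.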
