2-Feature engineering and extraction Part 1.pdf

Full Transcript

Project 1 about segmentation: Brain MRI segmentation

Dataset: https://www.kaggle.com/datasets/mateuszbuda/lgg-mri-segmentation

About Dataset

What is a brain tumor?
A brain tumor is a collection, or mass, of abnormal cells in your brain. Your skull, which encloses your brain, is very rigid. Any growth inside such a restricted space can cause problems. Brain tumors can be cancerous (malignant) or noncancerous (benign). When benign or malignant tumors grow, they can cause the pressure inside your skull to increase. This can cause brain damage, and it can be life-threatening.

The importance of the subject
Early detection and classification of brain tumors is an important research domain in medical imaging. It helps in selecting the most suitable treatment method to save patients' lives.

Methods
According to the World Health Organization (WHO), proper brain tumor diagnosis involves detection, identification of the tumor location, and classification of the tumor on the basis of malignancy, grade, and type. This experimental work on the diagnosis of brain tumors using Magnetic Resonance Imaging (MRI) involves detecting the tumor, classifying it in terms of grade and type, and identifying its location.

About Dataset
This dataset is a combination of the following three datasets: figshare, SARTAJ dataset, and Br35H.
It contains 7023 human brain MRI images, which are classified into 4 classes: glioma, meningioma, no tumor, and pituitary. The no tumor class images were taken from the Br35H dataset. I think the SARTAJ dataset has a problem: the glioma class images are not categorized correctly. I realized this from the results of other people's work as well as from the different models I trained, which is why I deleted the images in this folder and used the images from the figshare site instead.

Note
Pay attention that the images in this dataset have different sizes.
You can resize the images to the desired size after pre-processing and removing the extra margins. This will improve the accuracy of the model.
pre-processing code

Pattern Recognition
2-Feature engineering and extraction
BY PROF. MOHAMED A BERBAR

A feature is a measurable quantity obtained from the patterns present in the signal. Raw features obtained from sensors are fed to a feature engineering step. Feature engineering extracts essential features that can be used to detect different patterns. Good feature representations allow us to build robust models that learn the salient structures in the data. For example, to obtain a new representation of the data, we can express a signal f(t) as a point x in a d-dimensional feature space, using a finite set of d measurements x1, ..., xd, where d is the total number of features.

The common goal of feature extraction and representation techniques is to convert the segmented objects into representations that better describe their main features and attributes. Feature extraction is the process by which certain features of interest within an image are detected and represented for further processing. The resulting representation can subsequently be used as input to a number of pattern recognition and classification techniques, which will then label, classify, or recognize the semantic contents of the image or its objects.

With models such as support vector machines (SVM), multilayer perceptrons (MLP), and random forests, the extracted lesion features are fed to the machine learning (ML) model, which is trained to differentiate the lesions. Feature-based ML requires feature engineering, which is a tedious and error-prone job.
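To make the idea of expressing a signal f(t) as a feature vector x = (x1, ..., xd) concrete, here is a minimal sketch with d = 4. The four measurements (mean, standard deviation, energy, zero-crossing count) and the function name `extract_features` are illustrative assumptions, not features prescribed by the lecture.

```python
import math
import statistics


def extract_features(signal):
    """Map a sampled signal f(t) to a d-dimensional feature vector (d = 4).

    The choice of measurements is an assumption made for illustration.
    """
    mean = statistics.fmean(signal)          # x1: average level
    std = statistics.pstdev(signal)          # x2: spread around the mean
    energy = sum(v * v for v in signal)      # x3: sum of squared samples
    # x4: number of sign changes between consecutive samples
    zero_crossings = sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0)
    return [mean, std, energy, zero_crossings]


# Hypothetical input: 100 samples of a sine wave with 5 cycles.
signal = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
x = extract_features(signal)
print(len(x))  # number of features d
```

A vector like `x` is what would be passed, one per sample, to a classifier such as an SVM or a random forest.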
Machine learning (ML) approaches termed deep learning (DL), such as the convolutional neural network (CNN), can automatically learn high-level representations of objects from large amounts of data instead of using a set of handcrafted features. In short, the CNN takes the image as input and outputs a class label, such as lesion or no lesion. Unlike traditional ML algorithms, the DL method does not require hand-crafted feature engineering.

Complexity of PR: An Example
[Figure: camera]
Problem: Sorting incoming fish on a conveyor belt.
Assumption: Two kinds of fish: (1) sea bass (2) salmon

2.1. Preprocessing
A critical step for reliable feature extraction!
Preprocessing:
◦ Noise removal
◦ Image enhancement
◦ Separate touching or occluding fish
◦ Extract boundary of each fish

Training/Test data
How do we know that we have collected an adequately large and representative set of samples to extract features for training/testing the system?
Training set? Test set?

2.2. Feature Extraction
How to choose a good set of features?
◦ Discriminative features
◦ Invariant features (e.g., invariant to geometric transformations such as translation, rotation and scale)
Are there ways to automatically learn which features are best?

Feature Extraction
Let's assume that a fisherman told us that a sea bass is generally longer than a salmon. We can use length as a feature and decide between sea bass and salmon according to a threshold on length.

Multiple Features
To improve recognition accuracy, we might need to use more than one feature.
◦ Single features might not yield the best performance.
◦ Using combinations of features might yield better performance.
x = [x1, x2]^T, where x1: lightness and x2: width

Multiple Features (cont'd)
Does adding more features always help?
◦ It might be difficult and computationally expensive to extract more features.
◦ Correlated features might not improve performance (i.e., redundancy).
◦ Adding too many features can lead to a worsening of performance.

2.2.1 Morphological Feature Extraction
The shape and margin characteristics of a tumor can be efficiently represented by morphological features, which are considered clinically significant in discriminating benign from malignant tumors in medical applications (Computer Aided Diagnostic software). Benign tumors have a regular shape (round or oval), while malignant tumors are irregularly shaped. The computed morphological features are: area, perimeter, circularity, equivalent diameter, convex area, solidity, Euler number, and the lengths of the major axis and minor axis of the tumor region. The computed morphological features are aggregated to form a morphological feature set (MFS).

Fig. 7 Sample malignant breast ultrasound image, indicating (a) tumor boundary, (b) tumor boundary and convex hull boundary, (c) tumor boundary and bounding rectangle, (d) tumor boundary and ellipse of the tumor

2.2.2 Local binary patterns features
LBP compares each sample in a neighborhood window to the middle pixel and generates a binary code that encodes the local behavior of the signal. For example, an LBP can be generated by applying a binary neighborhood operation G(.) over signal regions Ri, each with a width of k samples. The function G(.) compares the pixels in the neighborhood to a reference pixel (the middle pixel) and produces a k-bit binary number, where a 1 in the kth bit means that the kth pixel in the window is greater than the reference pixel. A signal with N regions will produce N k-bit binary patterns. We convert each binary pattern to its decimal representation to create a feature vector.

An example region of the original image is examined, with neighborhood parameters R = 1 (radius) and N = 8 (number of neighbors).
Neighboring pixels are compared to the center pixel: neighbors whose value is greater than or equal to the center value are assigned 1, and neighbors with a smaller value are assigned 0. The binary values are strung together. This gives a decimal value, which is stored in a matrix with the same width and height as the original image, at the position of the input center pixel. This is done for every pixel of the image. The LBP matrix can be represented as a histogram, which is treated as the feature vector of the original image.

Advantages of LBP
Local Binary Pattern (LBP) has several advantages that make it a popular method for texture analysis in computer vision and image processing:
1. LBP is robust to monotonic illumination variations, because the code depends only on the ordering of pixel intensities and not on their absolute values. It can therefore capture texture information in images taken under different lighting conditions. This makes it particularly useful for applications such as facial recognition and object detection, where lighting conditions can vary significantly.
2. LBP is a computationally efficient method for texture analysis, which makes it suitable for processing large datasets and for real-time applications.
3. Rotation-invariant and multi-scale extensions of LBP exist, so it can capture texture information in images that have been rotated or scaled. The basic operator by itself is not invariant to these transformations.
4. LBP has been shown to be highly discriminative for texture analysis.

Disadvantages of LBP
1. LBP is sensitive to noise in the image. The LBP operator compares neighboring pixel intensities, so noise can produce incorrect binary values that distort the resulting LBP histogram.
2. LBP only captures local texture information near each pixel, which can limit its ability to capture more global texture information in the image.
3. When made rotation invariant, LBP does not capture rotational information in the texture patterns.
This can limit its ability to distinguish between textures that are similar but differ in their rotational patterns.
4. LBP is typically applied to grayscale images, which means that it does not capture color information in the texture patterns.

Example: LBP Based CAD System
The chapter "LBP Based CAD System Designs for Breast Tumor Characterization" proposes an efficient CAD system for the characterization of breast ultrasound images based on LBP texture features and morphological features. The results illustrate that the CAD system based on the ANFC-LH algorithm yields optimal performance for breast tumor characterization.
Link: LBP-Based CAD System Designs for Breast Tumor Characterization.pdf

Combined Feature Set Generation
Fig. 5 Combined feature set generation. (Note: LBP Local binary pattern, TFS Texture feature set, RTFS Reduced texture feature set, MFS Morphological feature set, CFS Combined feature set)
Fig. The experimental workflow adopted for the design of an efficient LBP-based CAD system for breast tumor characterization. (Note: OFS Optimal feature set, LH Linguistic hedges, GA Genetic algorithm, PCA Principal component analysis, SAE Stacked autoencoder, ANFC Adaptive neuro-fuzzy classifier, SVM Support vector machine, SM Softmax)

2.2.3. Region Based (morphological) Features

1- Area

2- Centroid (center of gravity)
To calculate the centroid of an image, you can follow these steps:
1. Determine the size and dimensions of the image (width and height).
2. Create two variables, sum_x and sum_y, both initialized to 0.
3. Traverse each pixel of the image. You can use nested loops for this purpose, where the outer loop iterates through the rows of pixels and the inner loop iterates through the columns.
4. For each pixel at position (x, y), retrieve its intensity or color value.
5. Multiply the x-coordinate of the pixel by its intensity and add the result to the sum_x variable.
6. Multiply the y-coordinate of the pixel by its intensity and add the result to the sum_y variable.
7. After traversing all the pixels, calculate the total sum of the intensities or colors of all the pixels in the image.
8. Finally, calculate the centroid coordinates using the following formulas:
centroid_x = sum_x / total sum of intensities
centroid_y = sum_y / total sum of intensities
The resulting centroid_x and centroid_y values are the coordinates of the centroid of the image.

3- Euler Number
The number of connected components (C) minus the number of holes (H).

4- Perimeter
The perimeter of a binary object Oi can be calculated by counting the number of object pixels (whose value is 1) that have one or more background pixels (whose value is 0) as their neighbors. An alternative method consists of first extracting the edge (contour) of the object and then counting the number of pixels in the resulting border.

5- Thinness Ratio
The thinness ratio is a measure of the roundness of a binary object. It is calculated from the area and the perimeter of the object:
Ti = 4π Ai / Pi^2
where Ai is the area and Pi is the perimeter. The maximum value of Ti is 1, which corresponds to a perfect circle. As the perimeter increases relative to the area of the object, the ratio decreases and the object gets thinner. It can also be used to measure the regularity of an object.
The idea behind the thinness ratio is that, once the perimeter of an unknown shape is known, a shape similar to a circle should have a measured area equal to the theoretical area of a circle whose circumference equals that perimeter. A circle with circumference P has area P^2 / (4π), so dividing the measured area by this value gives 4πA / P^2, which equals 1 for a circle.
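The area, centroid, perimeter, and thinness ratio described above can be sketched for a small binary mask as follows. This is a minimal illustration under stated assumptions: the function name `region_features` is hypothetical, the mask is binary (so every object pixel has intensity 1 in the centroid sums), a boundary pixel is an object pixel with at least one background 4-neighbor, pixels outside the image count as background, and the thinness ratio uses the standard definition 4πA / P^2.

```python
import math


def region_features(mask):
    """Area, centroid, perimeter and thinness ratio of a binary object.

    mask: list of rows, object pixels are 1 and background pixels are 0.
    Assumption: 4-neighbourhood, image border treated as background.
    """
    height, width = len(mask), len(mask[0])

    def is_background(y, x):
        inside = 0 <= y < height and 0 <= x < width
        return (not inside) or mask[y][x] == 0

    area = sum_x = sum_y = perimeter = 0
    for y in range(height):            # outer loop: rows
        for x in range(width):         # inner loop: columns
            if mask[y][x] == 0:
                continue
            area += 1
            sum_x += x                 # intensity is 1 for a binary mask
            sum_y += y
            # Count the pixel once if any 4-neighbour is background.
            if any(is_background(y + dy, x + dx)
                   for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))):
                perimeter += 1
    if area == 0:
        raise ValueError("mask contains no object pixels")

    return {
        "area": area,
        "centroid": (sum_x / area, sum_y / area),
        "perimeter": perimeter,
        "thinness": 4 * math.pi * area / perimeter ** 2,
    }


# Hypothetical example: a 3x3 square inside a 5x5 image.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
features = region_features(mask)
print(features)
```

Counting boundary pixels underestimates the true boundary length, so on small digital shapes the computed ratio can exceed the theoretical maximum of 1. It is best read as a relative measure for comparing regions extracted the same way.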
6- Circularity
Another very common shape factor is the circularity, a function of the perimeter P and the area A:
circularity = 4πA / P^2
Circularity, sometimes called roundness, describes how close an object is to a true circle. A value of 1.0 indicates a perfect circle. As the value approaches 0.0, it indicates an increasingly elongated shape.

7- Eccentricity
The eccentricity of an object is defined as the ratio of the major and minor axes of the object.

8- Aspect Ratio
The aspect ratio (AR) is a measure of the relationship between the dimensions of the bounding box of an object.

Project 2: Brain Tumor Classification (MRI)
Propose an efficient CAD system for the characterization of brain MRI images based on LBP texture features and morphological features.
Use the dataset from: https://www.kaggle.com/datasets/sartajbhuvaji/brain-tumor-classification-mri/data
1- CLAHE for contrast enhancement.
2- Segmentation using the Region Growing algorithm / Region Splitting and Merging algorithm. It will segment the image into regions, each with a label number. Separate each region into its own image. For each region calculate the morphological features: area, centroid, perimeter, thinness ratio, circularity, eccentricity, aspect ratio, mean, major axis, minor axis.
3- For the enhanced complete image, calculate LBP. The zero coefficients are replaced by encoding them using the theory of the popular run-length encoding (RLE) algorithm.
4- Combine the two feature sets.
5- Use SVM for classification as a black box.

ULBPEZ Features: https://doi.org/10.1007/s13755-022-00181-z
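Step 3 of the project relies on the LBP histogram from section 2.2.2. The sketch below computes basic LBP codes with R = 1 and N = 8 and collects them into a 256-bin histogram. The function names, the clockwise bit order starting at the top-left neighbor, the rule that a neighbor greater than or equal to the center gives a 1, and the skipping of border pixels are all assumptions chosen for illustration; other implementations use different conventions. The RLE encoding of zero coefficients mentioned in step 3 is not covered here.

```python
def lbp_image(gray):
    """Basic 3x3 LBP (R = 1, N = 8) for the interior pixels of a grayscale image.

    gray: list of rows of integer intensities. Border pixels are skipped,
    so the result is 2 rows and 2 columns smaller than the input.
    """
    # Neighbour offsets (dy, dx), clockwise from the top-left pixel (assumed order).
    offsets = ((-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1))
    height, width = len(gray), len(gray[0])
    codes = [[0] * (width - 2) for _ in range(height - 2)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            center = gray[y][x]
            code = 0
            for dy, dx in offsets:
                # Shift in a 1 when the neighbour is >= the centre, else a 0.
                bit = 1 if gray[y + dy][x + dx] >= center else 0
                code = (code << 1) | bit
            codes[y - 1][x - 1] = code
    return codes


def lbp_histogram(codes):
    """256-bin histogram of LBP codes, usable as a texture feature vector."""
    hist = [0] * 256
    for row in codes:
        for code in row:
            hist[code] += 1
    return hist


# Hypothetical 3x3 patch with centre value 6.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
codes = lbp_image(patch)
hist = lbp_histogram(codes)
print(codes)  # [[143]]  (binary 10001111)
```

In a full pipeline, the histogram would be concatenated with the morphological features of step 2 before being passed to the classifier.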
