Comparative Analysis of Similarity Methods in High-Dimensional Vectors: A Review
Assiut University
2023
Mohammad Yasser, Khaled F. Hussain, Samia A. Ali
Summary
This paper reviews different similarity methods in high-dimensional vectors, focusing on their strengths and limitations. It covers geometric-based methods such as Euclidean distance, Minkowski distance, Hamming distance, the Jaccard coefficient, Sørensen-Dice similarity, and cosine similarity, as well as neural network approaches. The methods were tested on eleven open-source benchmark datasets.
Full Transcript
Comparative Analysis of Similarity Methods in High-Dimensional Vectors: A Review

2023 International Conference on Artificial Intelligence Science and Applications in Industry and Society (CAISAIS) | 979-8-3503-1478-6/23/$31.00 ©2023 IEEE | DOI: 10.1109/CAISAIS59399.2023.10270776

Mohammad Yasser, Department of Electrical Engineering, Assiut University, Assiut, Assiut
Khaled F. Hussain, Faculty of Computers and Information, Assiut University, Assiut, Assiut
Samia A. Ali, Department of Electrical Engineering, Assiut University, Assiut, Assiut

Abstract—With the increasing availability of high-dimensional data in diverse fields, as well as the need to process this data, the accurate measurement of similarity plays a crucial role in many applications. Specifically, machine learning models that check for similarity between given items need methods that are both time efficient and accurate. The exploration of high-dimensional data has become increasingly crucial in diverse fields, including natural language processing, computer vision, bioinformatics, and more. This paper presents a comprehensive comparison of various similarity methods in high-dimensional vectors, evaluating and analyzing a number of different similarity methods and considering their effectiveness, efficiency, and robustness in capturing similarities within high-dimensional datasets. The methods shown were tested on a total of 11 open-source benchmark datasets. The experimental results on various datasets demonstrate the strengths and limitations of each method, providing valuable insights for researchers and practitioners in selecting appropriate similarity measures for their specific needs.

Index Terms—HD computing, Machine Learning, Similarity methods

I. INTRODUCTION

In recent years, high-dimensional vectors have emerged as a popular data representation in various fields, ranging from natural language processing and computer vision to recommendation systems, information retrieval, machine learning, and data mining. However, as the dimensionality of these vectors increases, measuring similarity between them becomes a challenging task. This article aims to explore the key challenges associated with measuring similarity in high-dimensional vectors and shed light on the impact of these challenges on data analysis tasks.

A. Curse of Dimensionality

One of the primary challenges in high-dimensional vector similarity measurement is the curse of dimensionality. As the number of dimensions increases, the volume of the space grows exponentially, resulting in sparsity and a diminishing density of data points. Consequently, traditional similarity measures, such as Euclidean distance or cosine similarity, may fail to provide accurate results due to increased data dispersion and decreased discriminative power. The curse of dimensionality highlights the need for specialized techniques that can handle the unique characteristics of high-dimensional spaces.

B. Sparsity and Density

High-dimensional vectors often exhibit sparsity, meaning that most of the vector components are zero or close to zero. This sparsity poses a challenge in measuring similarity, as it can lead to misleading results. Traditional similarity measures assume dense vector representations and may not effectively capture the underlying patterns and relationships in sparse high-dimensional data. Addressing sparsity requires tailored similarity methods that consider the distribution and density of the non-zero elements in the vectors.

C. Computational Complexity

As the dimensionality of vectors increases, the computational complexity of measuring similarity also escalates. Traditional algorithms may struggle to handle the computational demands imposed by high-dimensional data, leading to significant processing time and resource requirements. It becomes crucial to develop efficient similarity measurement techniques that can handle the computational complexity of large-scale high-dimensional data while maintaining acceptable accuracy.

D. Dimensionality Reduction and Feature Selection

To alleviate the challenges associated with high-dimensional vectors, dimensionality reduction techniques are often employed. However, dimensionality reduction can impact similarity measurement by distorting the original vector space and potentially discarding relevant information. Additionally, selecting the most informative features from high-dimensional vectors is a non-trivial task that can significantly affect the accuracy of similarity measurements. It becomes essential to carefully consider the impact of dimensionality reduction and feature selection on similarity analysis and to choose appropriate techniques that preserve the underlying structure of the data.

E. Scalability and Indexing

Efficient indexing and retrieval of high-dimensional vectors based on similarity are critical in many applications. However, traditional indexing methods, such as tree-based structures, may suffer from the "curse of dimensionality" and encounter performance degradation in high-dimensional spaces. Scalable indexing techniques, such as locality-sensitive hashing (LSH) or random projections, have been developed to address this challenge. These methods enable approximate similarity search and retrieval in high-dimensional spaces, but they introduce trade-offs between efficiency and accuracy, as sketched below.
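To make the LSH idea concrete, here is a minimal NumPy sketch of random-hyperplane (SimHash-style) bucketing for approximate cosine-similarity search. It is not from the paper; the dimensions, number of planes, and function names are illustrative assumptions.

```python
import numpy as np

def simhash_signatures(vectors, n_planes=16, seed=0):
    """Hash each vector to an n_planes-bit signature using random hyperplanes.

    Vectors whose signatures agree on many bits tend to have high cosine
    similarity, so bucketing by signature gives cheap candidate sets.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_planes, vectors.shape[1]))
    return (vectors @ planes.T > 0).astype(np.uint8)  # shape: (n_vectors, n_planes)

# Toy usage: bucket 1,000 random 512-dimensional vectors by signature.
X = np.random.default_rng(1).standard_normal((1000, 512))
sigs = simhash_signatures(X)
buckets = {}
for i, s in enumerate(sigs):
    buckets.setdefault(s.tobytes(), []).append(i)
# Candidates for a query are only the vectors sharing its bucket,
# trading a little accuracy for a large reduction in comparisons.
```

More planes shrink the buckets (faster search, more misses); fewer planes do the opposite, which is exactly the efficiency/accuracy trade-off noted above.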
This paper presents a comprehensive review of different types of similarity methods employed in high-dimensional vector spaces: mainly geometric-based methods (Euclidean distance, Minkowski distance, Hamming distance, Jaccard coefficient, Sørensen-Dice similarity, and cosine similarity) and the use of neural networks as an approximation method. We discuss the underlying concepts, strengths, and limitations of each method, along with their applications and performance evaluations. Our objective is to provide researchers and practitioners with a better understanding of the diverse similarity techniques available, enabling them to make informed decisions when selecting appropriate methods for their specific use cases. While the information presented in this paper applies to all high-dimensional vector spaces, the focus is on the hyperdimensional computing vector space, as in our previous work. In short, hyperdimensional computing is a novel computing paradigm inspired by the principles of cognitive neuroscience. It utilizes high-dimensional vectors to represent and process information, mimicking the way the human brain encodes and manipulates data. By leveraging the properties of these vectors, such as their ability to capture complex patterns and perform rapid similarity comparisons, hyperdimensional computing holds the potential for efficient and robust solutions in areas like machine learning, pattern recognition, and cognitive computing. The remainder of this paper is organized as follows: Section II gives a brief explanation of the various methods along with their advantages and disadvantages, Section III presents the results of the experiments and commentary, and Section IV concludes the paper.

II. METHODS

A. Introduction

In the era of big data, high-dimensional vectors have become ubiquitous in various fields, including machine learning, data mining, and information retrieval. Analyzing and understanding the similarity between these vectors is crucial for many applications, such as document clustering, image recognition, and recommendation systems. This section provides an overview of similarity methods commonly employed in handling high-dimensional vectors, highlighting their strengths, weaknesses, and applications.

B. Euclidean Distance

The Euclidean distance is a measure of similarity or dissimilarity between two vectors in a high-dimensional space. It calculates the straight-line distance between the two vectors, treating them as points in space. Mathematically, the Euclidean distance between two vectors, X and Y, can be expressed as follows:

EuclideanDistance(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}    (1)

In this equation, x_i and y_i represent the individual elements or dimensions of the vectors X and Y, respectively. The sum is taken over all dimensions of the vectors. The expression (x_i - y_i)^2 calculates the squared difference between the corresponding elements of the two vectors. Finally, the square root of the sum is taken to obtain the Euclidean distance.

The Euclidean distance measures the magnitude of the difference between two vectors in terms of their individual dimensions. It represents the length of the straight-line path between the two vectors in the high-dimensional space. A smaller Euclidean distance indicates a higher similarity or proximity between the vectors, while a larger distance indicates greater dissimilarity or separation. The Euclidean distance has some advantages and disadvantages as a measure of similarity or dissimilarity between vectors.

The advantages of the Euclidean distance can be summarized in its being suitable for continuous or real-valued data representations: it measures the magnitude of the difference between vectors and provides a straightforward measure of dissimilarity. On the other hand, the disadvantages of the Euclidean distance include its emphasis on magnitude, which may not be appropriate in some cases, as well as its lack of consideration of the presence or absence of features.
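A minimal NumPy sketch of Eq. (1), added here for illustration and not taken from the paper:

```python
import numpy as np

def euclidean_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Straight-line distance of Eq. (1): sqrt(sum_i (x_i - y_i)^2)."""
    diff = x - y
    return float(np.sqrt(np.dot(diff, diff)))

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])
print(euclidean_distance(x, y))  # 5.0, since sqrt(3^2 + 4^2 + 0^2) = 5
```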
C. Minkowski Distance

The Minkowski distance is a generalized distance metric that encompasses both the Euclidean distance and the Manhattan distance as special cases. It calculates the distance between two vectors in a high-dimensional space, taking into account the differences between their corresponding elements. Mathematically, the Minkowski distance between two vectors, X and Y, can be expressed as follows:

MinkowskiDistance(X, Y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}    (2)

In this equation, x_i and y_i represent the individual elements of the vectors X and Y, respectively. The sum is taken over all dimensions of the vectors. The expression |x_i - y_i|^p calculates the absolute difference between the corresponding elements of the two vectors raised to the power of p. Finally, the sum is raised to the power of 1/p to obtain the Minkowski distance. The parameter p determines the type of Minkowski distance. When p = 1, it represents the Manhattan distance, which calculates the sum of the absolute differences between the elements of the vectors. When p = 2, it represents the Euclidean distance, which calculates the square root of the sum of the squared differences between the elements. For other values of p, it represents a generalized Minkowski distance. The Minkowski distance allows for flexibility in measuring similarity or dissimilarity, as the choice of p can be adjusted based on the specific requirements of the problem at hand. A smaller Minkowski distance indicates a higher similarity between the vectors, while a larger distance indicates greater dissimilarity. The Minkowski distance has some advantages and disadvantages as a measure of similarity or dissimilarity between vectors.

The Minkowski distance is a generalization of the Euclidean distance, which gives it some of the advantages of the Euclidean distance, such as capturing magnitude in high-dimensional vectors. On top of that, the Minkowski distance also captures direction and can handle both continuous and categorical data representations. On the other hand, some of the disadvantages of the Euclidean distance carry over to the Minkowski distance; in particular, it may not perform well with imbalanced or varying ranges of features.
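A minimal NumPy sketch of Eq. (2), added for illustration and not taken from the paper, showing how the parameter p selects the Manhattan and Euclidean special cases:

```python
import numpy as np

def minkowski_distance(x: np.ndarray, y: np.ndarray, p: float = 2.0) -> float:
    """Eq. (2): (sum_i |x_i - y_i|^p)^(1/p); p=1 is Manhattan, p=2 is Euclidean."""
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

x = np.array([0.0, 0.0])
y = np.array([3.0, 4.0])
print(minkowski_distance(x, y, p=1))  # 7.0 (Manhattan)
print(minkowski_distance(x, y, p=2))  # 5.0 (Euclidean)
```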
Euclidean Distance calculates the sum of the absolute differences between the The Euclidean distance is a measure of similarity or elements of the vectors. When p = 2, it represents the dissimilarity between two vectors in a high-dimensional Euclidean distance, which calculates the square root of the space. It calculates the straight line distance between the two sum of the squared differences between the elements. For vectors, treating them as points in space. Mathematically, the other values of p, it represents a generalized Minkowski Euclidean distance between two vectors, X and Y, can be distance. The Minkowski distance allows for flexibility in Authorized licensed use limited to: Florida Atlantic University. Downloaded on December 09,2024 at 05:27:43 UTC from IEEE Xplore. Restrictions apply. measuring similarity or dissimilarity, as the choice of p can On the other hand, the disadvantages of Hamming distance be adjusted based on the specific requirements of the problem include its assumption of equal importance or weighting of all at hand. A smaller Minkowski distance indicates a higher features. Another disadvantage comes from being Specifically similarity between the vectors, while a larger distance indicates designed for binary or categorical data which makes Hamming greater dissimilarity. Minkowski distance has some advantages distance less applicable to continuous data. and disadvantages as a measure of similarity or dissimilarity between vectors. E. Jaccard Index Minkowski distance is a generalization of Euclidean distance The Jaccard Index, also known as the Jaccard similarity which gives it some of the advantages of Euclidean distance coefficient , is a measure of similarity between two sets. like capturing magnitude in high-dimensional vectors. On top It quantifies the size of the intersection between the sets of that, Minkowski distance also captures direction and can relative to the size of their union. Mathematically, the Jaccard handle both continuous and categorical data representations. coefficient between two sets, X and Y, can be expressed as On the other hand, some of the disadvantages of Euclidean dis- follows: tance carry over to Minkowski distance. Minkowski distance |X ∩ Y | |X ∩ Y | may not perform well with imbalanced or varying ranges of JaccardSimilarity(X, Y ) = = |X ∪ Y | |X| + |Y | − |X ∩ Y | features. (4) In this equation, |X ∩ Y | represents the cardinality (number of D. Hamming distance elements) of the intersection of sets X and Y, while |X ∪ Y | The Hamming distance is a metric used to measure the represents the cardinality of the union of sets X and Y. dissimilarity between two strings or binary vectors of equal The Jaccard coefficient ranges between 0 and 1, where a length. It calculates the number of positions at which the value of 0 indicates no overlap or similarity between the corresponding elements of the two vectors are different. Math- sets, and a value of 1 indicates a perfect match or complete ematically, the Hamming distance between two vectors, x and similarity. In the context of high-dimensional vectors, the y, can be expressed as follows : Jaccard coefficient can be calculated by treating the vectors n X as binary representations, where each element indicates the Hammingdistance (X, Y ) = (xi ̸= yi ) (3) presence or absence of a particular feature or attribute. The sets i=1 X and Y can be considered as the sets of indices of the non- In this equation, xi and yi represent the individual elements zero elements in the respective vectors. 
E. Jaccard Index

The Jaccard index, also known as the Jaccard similarity coefficient, is a measure of similarity between two sets. It quantifies the size of the intersection between the sets relative to the size of their union. Mathematically, the Jaccard coefficient between two sets, X and Y, can be expressed as follows:

JaccardSimilarity(X, Y) = \frac{|X \cap Y|}{|X \cup Y|} = \frac{|X \cap Y|}{|X| + |Y| - |X \cap Y|}    (4)

In this equation, |X ∩ Y| represents the cardinality (number of elements) of the intersection of sets X and Y, while |X ∪ Y| represents the cardinality of the union of sets X and Y. The Jaccard coefficient ranges between 0 and 1, where a value of 0 indicates no overlap or similarity between the sets, and a value of 1 indicates a perfect match or complete similarity. In the context of high-dimensional vectors, the Jaccard coefficient can be calculated by treating the vectors as binary representations, where each element indicates the presence or absence of a particular feature or attribute. The sets X and Y can then be considered as the sets of indices of the non-zero elements in the respective vectors. The Jaccard coefficient has some advantages and disadvantages as a measure of similarity or dissimilarity between vectors.

The Jaccard coefficient measures the size of the intersection relative to the union of two sets, which makes it well suited to binary or categorical data representations. It focuses on the presence or absence of features and provides a similarity measure between 0 and 1, which is particularly useful for comparing sets with different cardinalities. On the other hand, the disadvantages of the Jaccard coefficient stem from its focusing solely on the presence or absence of features, ignoring magnitude or value differences. This can make the Jaccard coefficient unable to provide an accurate measure for datasets with imbalanced or highly varying cardinalities, since it is only applicable to set-like data representations.
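A minimal NumPy sketch of Eq. (4) on binary vectors, added for illustration and not taken from the paper; the handling of two all-zero vectors is a convention assumed here:

```python
import numpy as np

def jaccard_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Eq. (4) on binary vectors: |X ∩ Y| / |X ∪ Y| over the non-zero positions."""
    x_set, y_set = x.astype(bool), y.astype(bool)
    union = np.count_nonzero(x_set | y_set)
    if union == 0:
        return 1.0  # convention assumed here: two all-zero vectors are identical
    return np.count_nonzero(x_set & y_set) / union

x = np.array([1, 1, 0, 1, 0])
y = np.array([1, 0, 0, 1, 1])
print(jaccard_similarity(x, y))  # 0.5: intersection has 2 indices, union has 4
```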
F. Sørensen-Dice Similarity

The Sørensen-Dice similarity, also known as the Sørensen-Dice coefficient or Dice similarity coefficient, is a measure of similarity between two sets. It is commonly used in various fields, including information retrieval and data mining, to compare the similarity between high-dimensional vectors. The Sørensen-Dice similarity is defined mathematically as follows. Assume we have two sets, X and Y, represented by high-dimensional vectors. Each vector can be considered as a binary representation, where each element indicates the presence or absence of a particular feature or attribute. The Sørensen-Dice similarity can then be calculated using the following formula:

SørensenDiceSimilarity(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|}    (5)

In this equation, the numerator represents twice the number of common elements between sets X and Y. The denominator represents the sum of the cardinalities of sets X and Y. The resulting value ranges between 0 and 1. A value of 0 indicates no overlap or similarity between the sets, while a value of 1 indicates a perfect match or complete similarity between the sets. The Sørensen-Dice similarity is often used in cases where the presence or absence of certain features is more important than their actual values or magnitudes. The Sørensen-Dice similarity has some advantages and disadvantages as a measure of similarity or dissimilarity between vectors.

On the advantage side, the Sørensen-Dice similarity measures the overlap between two sets or vectors and provides a simple and intuitive similarity measure between 0 and 1, which also makes it suitable for binary or categorical data representations. On the other hand, this simplicity is a drawback, since it makes it harder to capture complex relationships or dependencies in high-dimensional vectors. This is due to the lack of consideration of the magnitude or value differences between elements in the Sørensen-Dice similarity.
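A minimal NumPy sketch of Eq. (5), added for illustration and not taken from the paper; on the same toy vectors as the Jaccard example, the Dice value comes out slightly higher, as expected from the two formulas:

```python
import numpy as np

def dice_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Eq. (5) on binary vectors: 2|X ∩ Y| / (|X| + |Y|)."""
    x_set, y_set = x.astype(bool), y.astype(bool)
    total = np.count_nonzero(x_set) + np.count_nonzero(y_set)
    if total == 0:
        return 1.0  # convention assumed here: two empty sets are identical
    return 2.0 * np.count_nonzero(x_set & y_set) / total

x = np.array([1, 1, 0, 1, 0])
y = np.array([1, 0, 0, 1, 1])
print(dice_similarity(x, y))  # 2*2 / (3+3) ≈ 0.667, versus a Jaccard value of 0.5
```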
G. Cosine Similarity

Cosine similarity is a measure of similarity between two vectors in a high-dimensional space. It calculates the cosine of the angle between the two vectors, which reflects their similarity in terms of direction rather than magnitude. Mathematically, the cosine similarity between two vectors, X and Y, can be expressed as follows:

CosineSimilarity(X, Y) = \frac{X \cdot Y}{\|X\|\,\|Y\|} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}    (6)

In this equation, X · Y represents the dot product of vectors X and Y, which is the sum of the element-wise products of the corresponding elements. ||X|| and ||Y|| represent the Euclidean norms (magnitudes) of vectors X and Y, respectively, calculated as the square root of the sum of the squared elements. The cosine similarity ranges between -1 and 1, where a value of 1 indicates that the vectors have the same direction and are perfectly similar, a value of -1 indicates they have exactly opposite directions and are dissimilar, and a value of 0 indicates they are orthogonal or independent. In the context of high-dimensional vectors, the cosine similarity can be used to compare vectors by considering the angles between them. It is particularly useful when the magnitude of the vectors is not crucial and the focus is on their relative orientations or directions. Cosine similarity has some advantages and disadvantages as a measure of similarity or dissimilarity between vectors.

Cosine similarity's advantages include being suitable for various data representations, including binary, categorical, and continuous. Cosine similarity measures the angle between vectors, emphasizing direction rather than magnitude, which makes it effective for high-dimensional data where the magnitude is less important. On the other hand, the disadvantages of cosine similarity start from its ignoring magnitude or value differences between elements, focusing on the direction or orientation only. It is also sensitive to the sparsity or distribution of the data.
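A minimal NumPy sketch of Eq. (6), added for illustration and not taken from the paper; the examples show the scale invariance and the orthogonal case described above:

```python
import numpy as np

def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Eq. (6): dot(x, y) / (||x|| * ||y||), in [-1, 1]."""
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return float(np.dot(x, y) / denom)

x = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(x, 10 * x))  # ≈ 1.0: scaling does not change the angle
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # 0.0: orthogonal
```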
H. Neural Network

Neural networks can be used in high-dimensional vector similarity tasks to learn complex patterns and relationships among high-dimensional vectors. By leveraging their ability to capture non-linear dependencies, neural networks can provide powerful solutions for similarity or distance computations between vectors in high-dimensional spaces. Here are a few ways neural networks are applied in high-dimensional vector similarity tasks:

Embedding Learning: Neural networks can be trained to learn low-dimensional representations (embeddings) of high-dimensional vectors. These embeddings aim to preserve the similarity relationships between vectors. By mapping vectors into a lower-dimensional space, the similarity computation can be performed more efficiently. Popular methods like word2vec and sentence encoders leverage neural networks to learn vector embeddings that capture semantic relationships between words or sentences.

Siamese Networks: Siamese networks consist of two or more parallel neural networks that share weights. They are trained to learn similarity or dissimilarity metrics between pairs of vectors. Siamese networks can take high-dimensional vectors as input and produce a similarity score or distance metric as output. These networks are often used in tasks such as image similarity, facial recognition, and document similarity, where high-dimensional vectors need to be compared. (A minimal sketch of this idea appears at the end of this subsection.)

Metric Learning: Neural networks can be trained to learn a similarity metric or distance function directly. By optimizing a loss function that encourages similar vectors to be close and dissimilar vectors to be far apart, the network learns to discriminate between vectors based on their similarity. Metric learning networks are particularly useful in scenarios where the notion of similarity is subjective and cannot be easily captured by traditional metrics.

Deep Metric Learning: Deep metric learning extends the concept of metric learning by employing deep neural networks to learn similarity or distance metrics. These networks typically consist of multiple hidden layers and can capture complex relationships among high-dimensional vectors. Deep metric learning methods are used in applications such as image retrieval, recommendation systems, and content-based search, where the goal is to find similar items based on their high-dimensional representations.

Overall, neural networks provide a flexible and powerful framework. Their advantages include being suitable for various data representations, including binary, categorical, and continuous, and being able to learn complex patterns and relationships in high-dimensional vectors, which offers flexibility and can capture non-linear dependencies. On the other hand, neural networks require a sufficient amount of training data for accurate learning. Another disadvantage is that training neural networks can be computationally expensive and time-consuming. Lack of interpretability is also a common issue compared to other similarity measures. Finally, using them requires expertise in designing and training neural networks.
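Below is a minimal PyTorch sketch of a Siamese encoder trained with a contrastive loss, added purely for illustration; the architecture, dimensions, margin, and names are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    """One shared tower; the same weights embed both vectors of a pair."""
    def __init__(self, in_dim: int, emb_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, emb_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def contrastive_loss(z1, z2, label, margin: float = 1.0):
    """label = 1 for similar pairs, 0 for dissimilar; pulls similar pairs
    together and pushes dissimilar pairs at least `margin` apart."""
    dist = F.pairwise_distance(z1, z2)
    return (label * dist.pow(2) +
            (1 - label) * F.relu(margin - dist).pow(2)).mean()

# Toy training step on random 512-dimensional pairs.
encoder = SiameseEncoder(in_dim=512)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
x1, x2 = torch.randn(16, 512), torch.randn(16, 512)
labels = torch.randint(0, 2, (16,)).float()
loss = contrastive_loss(encoder(x1), encoder(x2), labels)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```

After training, similarity between two new vectors is read off the learned embedding space, for example with the Euclidean or cosine measures discussed earlier.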
III. EXPERIMENTAL RESULTS

In this section, the similarity methods described above are tested on some open-source face datasets as a case of real-life usage of the methods. A framework developed in our previous work was used to measure the performance of the similarity methods under different conditions. The following 11 datasets were used:

CBCL — Subjects: 9 / Images: 1859 / The subjects were asked to rotate their faces in depth, and the lighting conditions were changed by moving a light source around the subject.
NCKU — Subjects: 90 / Images: 6660 / Images were taken every 5 degrees from right profile to left profile in the pan rotation.
Color FERET — Subjects: 994 / Images: 11338 / Many subjects with different poses and facial expressions.
MUCT — Subjects: 276 / Images: 3755 / Diversity of lighting, age, and ethnicity.
CMU — Subjects: 20 / Images: 1872 / Face images of people taken with varying pose (straight, left, right, up), expression (neutral, happy, sad, angry), eyes (wearing sunglasses or not), and size.
UMIST — Subjects: 20 / Images: 1012 / Each subject is shown in a range of poses from profile to frontal views.
Yale2B — Subjects: 28 / Images: 16380 / Different facial expressions.
face94 — Subjects: 153 / Images: 3078 / Subjects were speaking while a sequence of twenty images was taken, to introduce moderate and natural facial expression variation.
grimace — Subjects: 18 / Images: 360 / A sequence of 20 images per individual was taken using a fixed camera. During the sequence, the subject moves his/her head and makes grimaces which get more extreme towards the end of the sequence.
ORL/Olivetti — Subjects: 40 / Images: 400 / Images taken at various times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses).
AR — Subjects: 136 / Images: 3315 / Different facial expressions, illumination conditions, and occlusions.

The framework shown in Fig. 1 consists of two blocks: a feature extraction block and an HD classifier block. The feature extraction block envelopes two stages: a feature extraction stage using InceptionV3 and a feature reduction stage using K-means clustering. The feature extraction block is responsible for mapping the input image from the input space into the feature space µ. The HD classifier block uses a combination of an item memory (IM) and an associative memory (AM) to classify the target dataset. An encoding and decoding structure is set before and after the HD classifier block for interaction with the feature extraction block, to map the features from the feature space µ to the HD computing space τ. The mapping between the two spaces is done through an encoding from feature space to an HD computing structure; the output is extracted using a decoding from HD computing to a flat output space structure. The similarity method under test is used in both the encoding and decoding stages of the framework. (A hedged sketch of this kind of pipeline follows Table I.)

Fig. 1. Block diagram of the used framework. Input image data is mapped from the input space to the feature space µ through the feature extraction block, f : X → µ. Through the HD encoding, the data is mapped from the feature space to the HD space τ, k : µ → τ.

The results can be seen in Table I. As can be seen, the results are very close to each other, signifying that all of the given methods are equally usable in this specific high-dimensional vector similarity application.

TABLE I
RESULT OF TESTING DIFFERENT SIMILARITY METHODS AGAINST DIFFERENT DATASETS (ACCURACY, %)

Method          CBCL   NCKU   FERET  MUCT   CMU    UMIST  Yale2B face94  Grimace  ORL/Olivetti  AR
Sørensen-Dice   99.35  98.59  80.40  99.70  85.86  97.02  99.42  100.00  100.00   96.30         98.50
Cosine          99.35  98.50  80.19  99.70  86.18  97.02  99.39  100.00  100.00   96.30         98.13
Jaccard         99.35  98.59  80.40  99.70  85.86  97.02  99.42  100.00  100.00   96.30         98.50
Euclidean       99.35  98.50  80.19  99.70  86.18  97.02  99.39  100.00  100.00   96.30         98.13
Minkowski       99.35  98.50  80.19  99.70  86.18  97.02  99.39  100.00  100.00   96.30         98.13
Hamming         99.35  98.50  80.19  99.70  86.18  97.02  99.39  100.00  100.00   96.30         98.13
Neural Network  99.03  96.65  75.04  98.96  80.59  95.83  99.24  100.00  100.00   98.15         95.51
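To make the role of a pluggable similarity method concrete, here is a minimal Python sketch in the spirit of the framework above: features are encoded into bipolar hypervectors via a random item memory, class prototypes are bundled into an associative memory, and a query is assigned to the class whose prototype it is most similar to. The encoding scheme, dimensionality, helper names, and toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000          # hypervector dimensionality
N_FEATURES = 64     # length of the (reduced) feature vector
N_LEVELS = 16       # quantization levels for feature values

# Item memory: one random bipolar hypervector per (feature, level) pair.
item_memory = rng.choice([-1, 1], size=(N_FEATURES, N_LEVELS, D))

def encode(features: np.ndarray) -> np.ndarray:
    """Map a feature vector in [0, 1] to a bipolar hypervector by bundling."""
    levels = np.clip((features * N_LEVELS).astype(int), 0, N_LEVELS - 1)
    bundled = item_memory[np.arange(N_FEATURES), levels].sum(axis=0)
    return np.sign(bundled) + (bundled == 0)  # break ties toward +1

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Associative memory: bundle the encodings of each class's training samples.
train = {0: rng.random((20, N_FEATURES)) * 0.5,          # toy class 0
         1: 0.5 + rng.random((20, N_FEATURES)) * 0.5}    # toy class 1
assoc_memory = {c: np.sign(sum(encode(x) for x in xs)) for c, xs in train.items()}

def classify(features: np.ndarray, similarity=cosine) -> int:
    """The similarity method under test is the pluggable `similarity` argument."""
    query = encode(features)
    return max(assoc_memory, key=lambda c: similarity(query, assoc_memory[c]))

print(classify(np.full(N_FEATURES, 0.2)), classify(np.full(N_FEATURES, 0.8)))  # likely 0 1
```

Swapping the `similarity` argument for a Hamming, Jaccard, or Dice implementation is what changing the method under test would amount to in this kind of pipeline.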
IV. CONCLUSION

This review paper delved into the realm of similarity analysis for high-dimensional vectors, providing a review of various prominent similarity techniques used in this domain, each exhibiting unique strengths and limitations. Each method showcased close degrees of applicability and performance, depending on the specific characteristics of the dataset and the analytical objectives. In conclusion, there is no one-size-fits-all similarity method for high-dimensional vectors. The choice of technique depends on the nature of the data, the problem at hand, and the specific requirements of the analysis. Researchers and practitioners must carefully assess the characteristics of their datasets and the objectives of their research to select the most appropriate similarity measure. For instance, the method used to measure the performance of the similarity methods in this paper was based on hyperdimensional computing, which made the results very close to each other; in other high-dimensional problems, the results may vary greatly. As technology and research continue to progress, it is likely that new similarity methods tailored to high-dimensional vectors will emerge. Moreover, advancements in computational power and algorithmic efficiency will further enhance the accuracy and scalability of existing techniques. In the pursuit of innovation and discovery, it is crucial for researchers and practitioners to remain informed about the latest developments in similarity analysis for high-dimensional vectors. Armed with this knowledge, they can make informed decisions, extract meaningful insights, and unlock the full potential of high-dimensional data across a wide range of applications.

REFERENCES

I. H. Sarker, "Machine learning: Algorithms, real-world applications and research directions," SN Computer Science, vol. 2, no. 3, p. 160, 2021.
K. Echihabi, "High-dimensional vector similarity search: from time series to deep network embeddings," in Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, 2020, pp. 2829-2832.
M. M. Najafabadi, F. Villanustre, T. M. Khoshgoftaar, N. Seliya, R. Wald, and E. Muharemagic, "Deep learning applications and challenges in big data analytics," Journal of Big Data, vol. 2, no. 1, pp. 1-21, 2015.
D. Kleyko, D. Rachkovskij, E. Osipov, and A. Rahimi, "A survey on hyperdimensional computing aka vector symbolic architectures, part II: Applications, cognitive models, and challenges," ACM Computing Surveys, vol. 55, no. 9, pp. 1-52, 2023.
M. Yasser, K. F. Hussain, and S. A. E.-F. Ali, "An efficient hyperdimensional computing paradigm for face recognition," IEEE Access, vol. 10, pp. 85170-85179, 2022.
P. Kanerva, "Binary spatter-coding of ordered k-tuples," in International Conference on Artificial Neural Networks. Springer, 1996, pp. 869-873.
——, "Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors," Cognitive Computation, vol. 1, no. 2, pp. 139-159, 2009.
——, "Computing with 10,000-bit words," in 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2014, pp. 304-310.
T. Andreescu and D. Andrica, Complex Numbers from A to... Z. Springer, 2006.
D. J. Robinson, "An introduction to abstract algebra," 2003.
A. H. Murphy, "The Finley affair: A signal event in the history of forecast verification," Weather and Forecasting, vol. 11, no. 1, pp. 3-20, 1996.
T. A. Sorensen, "A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons," Biol. Skar., vol. 5, pp. 1-34, 1948.
P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Addison-Wesley Longman Publishing Co., Inc., 2005.
R. Fischer, J. Skelley, and B. Heisele. (2003) MIT-CBCL face recognition database. Massachusetts Institute of Technology. Last accessed: 24-04-2022. [Online]. Available: http://cbcl.mit.edu/software-datasets/heisele/facerecognition-database.html
T.-H. Wang and J.-J. J. Lien, "Facial expression recognition system based on rigid and non-rigid motion separation and 3D pose estimation," Pattern Recognition, vol. 42, no. 5, pp. 962-977, 2009.
N. C. K. University. FaceDetect and PoseEstimate data set. [Online]. Available: http://robotics.csie.ncku.edu.tw/Databases/FaceDetect PoseEstimate.htm
P. Phillips, H. Moon, S. Rizvi, and P. Rauss, "The FERET evaluation methodology for face-recognition algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 10, pp. 1090-1104, 2000.
U.S. Department of Defense (DoD) Counterdrug Technology Development Program Office. Color FERET database. Last accessed: 24-04-2022. [Online]. Available: https://www.nist.gov/programs-projects/face-recognition-technology-feret
S. Milborrow, J. Morkel, and F. Nicolls, "The MUCT Landmarked Face Database," Pattern Recognition Association of South Africa, 2010, http://www.milbo.org/muct.
——. The MUCT face database. Last accessed: 24-04-2022. [Online]. Available: http://www.milbo.org/muct/
T. M. Mitchell, "Artificial neural networks," Machine Learning, vol. 45, pp. 81-127, 1997.
——. CMU face images data set. Last accessed: 24-04-2022. [Online]. Available: http://archive.ics.uci.edu/ml/datasets/cmu+face+images
D. B. Graham and N. M. Allinson, "Characterising virtual eigensignatures for general purpose face recognition," in Face Recognition. Springer, 1998, pp. 446-456.
——. Face database. Last accessed: 24-04-2022. [Online]. Available: http://eprints.lincoln.ac.uk/id/eprint/16081/
K.-C. Lee, J. Ho, and D. J. Kriegman, "Acquiring linear subspaces for face recognition under variable lighting," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 684-698, 2005.
Y. University. The Extended Yale Face Database B. Last accessed: 24-04-2022. [Online]. Available: http://vision.ucsd.edu/ leekc/ExtYaleDatabase/ExtYaleB.html
C. T. University. Facial images: Faces94. Last accessed: 24-04-2022. [Online]. Available: https://cmp.felk.cvut.cz/ spacelib/faces/faces94.html
——. Facial images: Grimace. Last accessed: 24-04-2022. [Online]. Available: https://cmp.felk.cvut.cz/ spacelib/faces/faces94.html
F. Samaria and A. Harter, "Parameterisation of a stochastic model for human face identification," in Proceedings of 1994 IEEE Workshop on Applications of Computer Vision, 1994, pp. 138-142.
A. L. Cambridge. The Database of Faces. Last accessed: 24-04-2022. [Online]. Available: https://cam-orl.co.uk/facedatabase.html
A. Martinez and R. Benavente, "The AR face database: CVC technical report 24," 1998, Department of Computer Science, Universitat Autònoma de Barcelona.
A. M. Martinez. AR face database. Last accessed: 24-04-2022. [Online]. Available: https://www2.ece.ohio-state.edu/ aleix/ARdatabase.html
C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818-2826.