Feature Spaces and Distance Measures PDF

Summary

This document discusses feature spaces and distance measures in the context of data mining. It outlines the stages of the data mining process (focusing, preprocessing, transformation, data mining, and evaluation) and explains how stemming maps different forms of a word to a common stem.

Full Transcript

focusing
► get the data
► organize data (file/database)
► select relevant data

preprocessing
► integrate heterogeneous data
► check for completeness
► check for consistency

transformation
► discretize numeric attributes
► infer new attributes
► select relevant attributes

data mining
► generate patterns or models

evaluation
► assess "interestingness" for the user
► validate models statistically

Problem 2
► The same word can appear differently (learn, learning; go, went).
► Solution: stemming. Any word is mapped to its stem (base or root form).
► For English texts, algorithmic stemming is possible (Porter's stemming algorithm, see http://tartarus.org/~martin/PorterStemmer/index.html).
► For other languages, dictionaries are required.
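
The stemming idea can be sketched in a few lines of Python. This is a toy illustration, not Porter's algorithm: the suffix list, the minimum stem length of three characters, and the small lookup table for irregular forms are all assumptions made for the example. The lookup table stands in for the dictionary-based approach, since no suffix rule can map "went" to "go".

```python
from typing import List

# Toy illustration only: NOT Porter's algorithm, just crude suffix stripping.
SUFFIXES = ("ing", "ed", "s")  # assumed, illustrative suffix list

# Hypothetical lookup table for irregular forms that suffix rules cannot handle.
IRREGULAR = {"went": "go", "gone": "go"}


def toy_stem(word: str) -> str:
    """Map a word to a crude stem: dictionary lookup first, then suffix stripping."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def stem_tokens(text: str) -> List[str]:
    """Split on whitespace and stem every token."""
    return [toy_stem(tok) for tok in text.split()]


if __name__ == "__main__":
    print(stem_tokens("learn learning go went"))
```

With this sketch, "learn" and "learning" share one stem and "go" and "went" share another, so each pair would count as the same term when building a feature space over words. A real system would use a full stemmer such as Porter's rather than this short suffix list.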
