🎧 New: AI-Generated Podcasts Turn your study notes into engaging audio conversations. Learn more

2. Feature Spaces and Distance Measures.pdf

Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...

Full Transcript

focusing --; â–º get the data g. g- 8 ::,;;...

focusing --; ► get the data g. g- 8 ::,;; CD ◊ ► organize data (file/database) ► select relevant data o' -o :,;: j. § 0 0 preprocessing ► integrate heterogeneous C]D 0 "'O 0 '""" £ m. data 0 a. 0 CD ► check for completeness CJ) ► check for consistency - CJ) 3 o' 0 transformation., 3 iil a. ► discretize numeric attributes... :::, V> CD ► infer new attributes g·...' ► select relevant attributes...... - co data mining co ► generate patterns or models :::, evaluation ► assess "interestingness" for the user 5 ► validate models statistically io i ""a. /1) Problem 2 ► The same word can appear differently (learn, learning; go, went). ► Solution: stemming. Any word is mapped to its stem (base or root form). ► For English texts, algorithmic stemming is possible (Porter's stemming algorithm, see http://tartarus.org/~martin/ PorterStemmer/index. html). ► For other languages, dictionaries are required.

Use Quizgecko on...
Browser
Browser