The Introduction.pdf
The Hong Kong University of Science and Technology
2024
DSAA5002 Data Mining and Knowledge Discovery in Data Science
Li, Jia
DSA Thrust, Information Hub
The Hong Kong University of Science and Technology (Guangzhou)
Spring Term
Jan 22, 2024

This is the DSAA5002 for PhD/MPhil students. There is another 5002 only for MSc students, taught by Prof. Chen Lei.

Instructor
LI, JIA
W2-605
[email protected]
Office hour: Wed, 3:30PM-4:30PM

TA
Chen, Xiaolong; Zhao, Haihong; Li, Yuhan
[email protected] [email protected] [email protected]

Time and Venue
Mon 3:00PM-5:50PM
W1 101

Course Page
https://sites.google.com/view/lijia/courses/dsaa5002

Prerequisites
Data Structure and Algorithms: Decision Tree, Hierarchical Clustering
Linear Algebra: Spectral Clustering
Probability Theory: GMM, HMM, Expectation Maximization

Data Mining Applications
PaLM 2
ChatGPT

In Risk Management
Does the model fit the real scenario?
Is the model robust/efficient enough?
Sense-making

Assessment Scheme
Midterm Examination (50%)
    Late March, open book with printed material only (no Internet access). Samples will be provided through the exercises (1, 2, 3).
Individual Project (50%)
    Early May
    Presentation (25%): oral presentation with slides, 10 min each
    Report (25%)

Project Requirement
A research topic related to the course material, in ACM format with a strict 6-page limit. For reference, see https://kdd.org/kdd2021/calls/view/call-for-research-track-papers
1. The report should consist of at least an introduction, related work, methodology, and experiments. Theoretical derivation is not required but is encouraged.
2. Use concise and clear language.
3. Clearly state how your work differs from previous work.
4. If there is any theoretical derivation, check your assumptions and make sure they are non-fragile.
The report will be graded on: Writing (25%), Novelty (25%), Experiment (25%), Others (25%)

Outline
Data
    Types of data
    Normalization/cleaning
    Similarity
Classification Models
    Decision tree
    Generalization theorem
    SVM
    Ensembles
    Kernel methods
Cluster Models
    K-means
    Hierarchical clustering
    Spectral clustering
Association Analysis
    Apriori
Graph Analytics
    PageRank
    HITS and SimRank
Expectation Maximization
    GMM/HMM
    Topic models
Dimension Reduction
    PCA
Anomaly Detection

Data Mining is NOT Just a Course

Q&A
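
The weights on the Assessment Scheme and Project Requirement slides can be combined into a single course score. The Python sketch below is illustrative only and is not the instructor's grading procedure. It assumes every component is scored on a 0-100 scale, that the four report criteria (each 25%) make up the report score, and that the report then counts for 25% of the course total. The names `COURSE_WEIGHTS`, `REPORT_WEIGHTS`, `weighted_score`, and `final_score` are hypothetical.

```python
# Weights copied from the slides; the 0-100 scale and the way the report
# criteria roll up into the report score are assumptions for illustration.
COURSE_WEIGHTS = {"midterm": 0.50, "presentation": 0.25, "report": 0.25}
REPORT_WEIGHTS = {"writing": 0.25, "novelty": 0.25, "experiment": 0.25, "others": 0.25}


def weighted_score(scores, weights):
    """Return the weighted sum of the scores named in `weights`."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


def final_score(midterm, presentation, report_scores):
    """Combine the midterm, presentation, and report criteria into one score."""
    report = weighted_score(report_scores, REPORT_WEIGHTS)
    components = {"midterm": midterm, "presentation": presentation, "report": report}
    return weighted_score(components, COURSE_WEIGHTS)


if __name__ == "__main__":
    # Made-up scores, purely to show the arithmetic.
    example_report = {"writing": 80, "novelty": 60, "experiment": 100, "others": 80}
    print(final_score(midterm=80, presentation=90, report_scores=example_report))
```

With these made-up inputs the report score is 80, so the total is 0.50 × 80 + 0.25 × 90 + 0.25 × 80 = 82.5.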