EGIT 563: Data Mining (Semester 2/2017)
   
Description

This course provides practical coverage of essential data mining topics, including:
  • Data mining concepts
  • Data and Data exploration
  • Data inputs and outputs
  • Data mining algorithms
  • Evaluating data mining results
  • Advance issues in various applications.

Students will work with the following software:

RapidMiner Studio 8.0 [download]

Weka data mining toolset, version 3.7.10 [download] [manual]

Course introduction on YouTube.

   
Textbook

Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, 2005.

Data Mining: Practical Machine Learning Tools and Techniques (Second Edition) by Ian H. Witten and Eibe Frank. Morgan Kaufmann, 2005.

Course Date

Saturday, 9.00am-12.00pm, Room R403

   
Instructors
Asst. Prof. Sotarat Thammaboosadee, Ph.D.
Office: R305
Line: zotarutto
e-mail: zotarat@gmail.com
Office Hours: by appointment
 


Tentative Course Schedule
Week Date Lectures Topics Materials (Lectures and Lab) Assignments
Week 1: Saturday, January 13, 2018
Lecture: Introduction to Data Science and Data Mining
Topics: Big Data / Data Science Ecosystem / Data Mining Tasks and Processes
Materials: Wk01.0-Course Introduction; Wk01.1-Intro to DM; Wk01.2-RapidMiner
Assignment: HW1 (3pts)
Week 2: Saturday, January 27, 2018
Lecture: Data and Data Exploration
Topics: Data and Data Types / Structured and Unstructured Data / Data Visualization / Measurement
Materials: Wk02.1-Data; Data01_customer-churn.csv
Assignment: HW2 (3pts)
Week 3: Saturday, February 3, 2018
Lecture: Data Integration and Transformation
Topics: Database / Data Warehouse / NoSQL / Data Lake / Data Quality / Data Integration
Materials: Wk03.1-DataIntegration; Data02_customer-info.xlsx; Data03_customer-churn.csv; TeamCoachOhm-DB.xlsx
Week 4: Saturday, February 10, 2018
Lecture: Data Preprocessing
Topics: Attribute Selection / Missing Value Handling / Data Value Transformation / Outlier Analysis
Materials: Wk04.1-DataPreprocessing; Normalize.xlsx
Week 5: Saturday, February 17, 2018
Lecture: Introduction to Machine Learning and Basic Classification Techniques
Topics: Knowledge Representation / Machine Learning Concepts / OneR / Naïve Bayes / Decision Tree / Linear Regression / k-Nearest Neighbors
Materials: Wk05.1-BasicClassification; Data04_Campaign.xlsx; Data05_Campaign-new.xlsx; Data10_Student.csv
Assignment: HW3 (4pts)
Week 6: Saturday, February 24, 2018
Lecture: Model Evaluation Techniques
Topics: Train-Test Validation / Cross-Validation / Leave-One-Out / Confusion Matrix / ROC Curve
Materials: Wk06.1-ModelEvaluation; Data06_Heart-Attack.csv; Data07_Titanic.xlsx; Data08_wine.csv
Week 7: Saturday, March 3, 2018
Lecture: Midterm Examination and Research Paper Assignment
Topics: All topics covered in Weeks 1-6
Materials: (Midterm Examination Outline)
Week 8: Saturday, March 10, 2018
Lecture: Advanced Data Preprocessing Techniques
Topics: Data Rebalancing / Principal Component Analysis / Attribute Weighting / Local Outlier Factor / Feature Subset Selection
Materials: Wk08.1-AdvancedDataPrep; Data06-Tel.xlsx; Data08_Automobile.csv; Data09_Student.csv
Week 9: Saturday, March 17, 2018
Lecture: Advanced Classification Techniques and Model Integration
Topics: Neural Network / Random Forest / Gradient Boosted Tree / Voting Method / Ensemble Method
Materials: Wk09.1-AdvancedClassification.pdf; Data08_car.csv
Assignment: HW4 (4pts) (in lecture slides)
Week 10: Saturday, March 24, 2018
Lecture: Clustering Techniques
Topics: K-Means / Density-based Method / Semi-supervised Learning
Materials: Wk10.1-Clustering Techniques; Data11-StudentHobbies.csv; Data12-RFM
Week 11: Saturday, March 31, 2018
Lecture: Association Rules Discovery
Topics: Apriori / FP-Growth / Sequential Pattern Mining
Materials: Wk11.1-AssociationRules; Data13_supermarket.csv; Data14_Transaction.csv; Data15_bank_call_center.csv
Assignment: HW5 (3pts) (in lecture slides)
Friday, April 6, 2018: Project Proposal Submission
Materials: [Project Proposal Topics] and [Full Project Topics]
Week 12: Saturday, April 7, 2018
Lecture (special lecture): Introduction to Big Data Analytics Ecosystem with Microsoft Azure
Week 13: Saturday, April 21, 2018
Lecture: Introduction to Text Mining; Data Governance and Data-driven Organization
Topics: Introduction to Text Processing Methods / Text Mining Applications; Data Governance Concepts and Framework
Materials: Wk13.1-TextMining; Wk13.2-DataGov; Data16-twitter.csv; Data17-review.zip
Week 14: Saturday, April 28, 2018
Lecture: Research Presentation (Round 1)
Week 15: Saturday, May 5, 2018
Lecture: Research Presentation (Round 2)
Week 16: Saturday, May 12, 2018
Lecture: Final Examination
Materials: (Final Examination Outline)

Remark: There are no classes on January 20 and April 14.


Resources:

Weka: http://www.cs.waikato.ac.nz/~ml/weka/index.html
RapidMiner: http://docs.rapidminer.com/

Grading

Grades will be based on points earned from assignments, the individual project, the research presentation, and the midterm and final examinations, as follows:

Grading Procedures:

Assignments: Assignments are graded on a +/ok/- scale, where + indicates excellent, ok indicates satisfactory, and - indicates needs improvement. Assignments are given in selected classes. Your solutions are how you demonstrate your mastery of the course material to the instructor. They must be clearly different from those submitted by other students and must represent your own individual effort.

Assignment Submission: Each assignment is due one week after the date it is assigned and must be submitted by email to zotarat@gmail.com. The email subject must follow the format "HW1-60xxxxx". The submitted assignment MUST be a PDF file. Late submissions will not be accepted.

Individual Project: Each student must propose a complete data mining process for a selected data set, covering the domain problem, the nature of the data with examples, the selected data preprocessing methods, the selected data mining algorithms with comparisons, model evaluation, and possible real-world deployment. All work must be documented in a report and submitted as a PDF file.


Attendance Policy:

Lectures: For flexibility, attendance is not scored. Students are responsible for attending or following the weekly lectures via e-learning. Students who must miss a presentation or examination must inform the instructor before the scheduled time.