60/15 - DATA MINING
Academic Year 2019/2020
BARBARA PES (Tit.)
|Degree programme|Curriculum|Credits|Hours|
|[60/65] MATHEMATICS|[65/20 - Ord. 2012] Applied|6|48|
|[60/68] PHYSICS|[68/00 - Ord. 2014] COMMON TRACK|6|48|
|[60/73] INFORMATICS|[73/00 - Ord. 2017] COMMON TRACK|6|48|
Students should have background knowledge of algorithms and data structures.
Basic concepts of databases, probability and statistics are also useful.
- What is Data Mining?
- Data Mining and Knowledge Discovery.
- Types of data and general characteristics of datasets
- Data Quality
- Data Pre-processing
- Measures of similarity and dissimilarity.
3) CLASSIFICATION
- General approach to solving a classification problem
- Classification techniques: Decision trees, Rule-based classifiers, Nearest-Neighbor classifiers, Bayesian classifiers, Artificial Neural Networks, Support Vector Machines
- The problem of model overfitting
- Evaluating classification models: methods and metrics for performance evaluation, methods for model comparison.
4) ASSOCIATION ANALYSIS
- Problem definition (market-basket model)
- Support and confidence of association rules
- Apriori algorithm: frequent itemset generation, rule generation
- Evaluation of association rules.
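As a rough illustration of the support and confidence measures listed above, the following Python sketch computes both for a single rule over a toy market-basket dataset. The transactions, the function names and the rule are illustrative assumptions and are not taken from the course material; the definitions used are the standard ones (the support of an itemset is the fraction of transactions containing it, and the confidence of X -> Y is support(X ∪ Y) / support(X)).

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """Confidence of the rule antecedent -> consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        raise ValueError("the antecedent never occurs in the transactions")
    return support(antecedent | consequent, transactions) / base


# Toy market-basket data (hypothetical, for illustration only).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]

# Rule under examination: {milk, diapers} -> {beer}
x = frozenset({"milk", "diapers"})
y = frozenset({"beer"})
rule_support = support(x | y, transactions)       # 2 of 5 transactions
rule_confidence = confidence(x, y, transactions)  # 2 of the 3 containing x
```

In this toy dataset the rule holds in 2 of the 5 transactions and in 2 of the 3 transactions that contain the antecedent. Apriori applies the same two measures at scale, first pruning itemsets below a minimum support and then keeping rules above a minimum confidence.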
5) CLUSTER ANALYSIS
- Different types of clustering
- K-means algorithm (and extensions)
- Hierarchical techniques
- Cluster evaluation.
6) THE WEKA DATA MINING WORKBENCH
- How to apply data mining techniques
The teaching activity consists of 48 hours of lectures, which include guided exercises. Additional exercises are assigned as homework and then discussed in class, giving students the opportunity to reinforce and self-assess their knowledge and skills. The teacher provides further support and personalized assistance during office hours and by e-mail.
Verification of learning
The assessment for this course includes:
- a written exam involving both theory questions (open-ended as well as closed-ended) and exercises on the topics covered during the course; the aim is to evaluate the extent to which the student knows and understands the concepts taught and is able to apply them to practical problems;
- a final project which may involve the discussion of a scientific article or the application of data mining techniques to real-world data (with a presentation/discussion of the results); the aim is to evaluate the extent to which the student can autonomously cope with new topics/case studies.
Grades range from 1/30 to a maximum of 30/30 cum laude.
In more detail:
- up to 28 points are assigned based on the written exam;
- up to 4 points are assigned based on the project;
- the points from the written exam and from the project are added together: to pass the exam, at least 18 points in total are required; if the total exceeds 30 (that is, 31 or 32), the final grade is ‘30 cum laude’.
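The point-combination rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name, the use of whole-number points, the lower bound of 0 and the "fail" label are assumptions, not part of the official regulations.

```python
def final_grade(written_points: int, project_points: int) -> str:
    """Combine written-exam and project points into a final grade label."""
    # Caps stated in the syllabus: up to 28 points (written), up to 4 (project).
    if not 0 <= written_points <= 28:
        raise ValueError("written exam points must be between 0 and 28")
    if not 0 <= project_points <= 4:
        raise ValueError("project points must be between 0 and 4")

    total = written_points + project_points
    if total < 18:
        return "fail"           # assumed label for a total below the pass mark
    if total > 30:
        return "30 cum laude"   # totals of 31 or 32
    return f"{total}/30"
```

For example, 25 points on the written exam plus 3 on the project give 28/30, while 28 plus 3 give a total of 31 and therefore ‘30 cum laude’.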
To pass the exam (final grade of at least 18/30), the student must demonstrate sufficient knowledge of the data mining techniques covered in the course (pre-processing, classification, clustering, association analysis). To achieve the highest grade (‘30 cum laude’), the student must demonstrate excellent knowledge of the course topics and must be able to apply them to solve problems. Communication skills and proper use of terminology also contribute to the final grade.
Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar, Introduction to Data Mining, Pearson, 2018. (primary textbook)
Ian H. Witten, Eibe Frank, Mark A. Hall, Christopher J. Pal, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2016. (for supplementary readings)
Auxiliary learning resources:
lecture slides, exercises (some of them with solutions), scientific articles.
All the course material is available at https://elearning.unica.it/