Second Semester 
Teaching style
Teaching language

Additional information

Course                                    Curriculum       CFU  Length (h)
[60/65] MATHEMATICS [65/20 - Ord. 2012]   Applicativo      6    48
[60/68] PHYSICS [68/00 - Ord. 2014]       PERCORSO COMUNE  6    48
[60/73] INFORMATICS [73/00 - Ord. 2017]   PERCORSO COMUNE  6    48


Students should have background knowledge of algorithms and data structures.
Basic concepts of databases, probability and statistics are also useful.


- What is Data Mining?
- Data Mining and Knowledge Discovery.

- Types of data and general characteristics of datasets
- Data Quality
- Data Pre-processing
- Measures of similarity and dissimilarity.
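
As an illustration of the last topic, the sketch below implements three commonly used measures: Euclidean distance (a dissimilarity), cosine similarity, and the Jaccard coefficient. The function names are chosen here for illustration and are not taken from the course material; the sketch assumes non-zero vectors of equal length and non-empty sets.

```python
import math


def euclidean_distance(x, y):
    """Dissimilarity: straight-line distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Similarity: cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def jaccard_similarity(a, b):
    """Similarity: shared items divided by all items in either set."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)
```

Euclidean distance grows as objects become less alike, whereas the two similarity measures grow as objects become more alike.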

- General approach to solving a classification problem
- Classification techniques: Decision trees, Rule-based classifiers, Nearest-Neighbor classifiers, Bayesian classifiers, Artificial Neural Networks, Support Vector Machines
- The problem of model overfitting
- Evaluating classification models: methods and metrics for performance evaluation, methods for model comparison.
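
A minimal sketch of some standard performance metrics for a binary classifier, computed from the confusion-matrix counts. The function names and the convention of returning 0.0 when a denominator is zero are assumptions made here, not something prescribed by the course.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Assumption: report 0.0 when a metric is undefined (zero denominator).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, `classification_metrics([1, 1, 0, 0], [1, 0, 0, 0])` describes a classifier that finds one of the two positive objects and raises no false alarms.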

- Problem definition (market-basket model)
- Support and confidence of association rules
- Apriori algorithm: frequent itemset generation, rule generation
- Evaluation of association rules.
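
The sketch below illustrates support, confidence, and level-wise frequent itemset generation in the style of Apriori. It is a simplified teaching version, not an optimized implementation: the transactions are made-up toy data, the function names are chosen here for illustration, and the antecedent passed to `confidence` is assumed to occur in at least one transaction.

```python
from itertools import combinations

# Toy market-basket data (made up for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of the whole rule divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search: frequent k-itemsets generate (k+1)-candidates."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items
             if support([i], transactions) >= min_support]
    result = {}
    k = 1
    while level:
        for itemset in level:
            result[itemset] = support(itemset, transactions)
        frequent = set(level)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k))
                 and support(c, transactions) >= min_support]
        k += 1
    return result
```

Calling `confidence({"milk", "diapers"}, {"beer"}, transactions)` estimates how often beer appears in baskets that already contain milk and diapers.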

- Different types of clustering
- K-means algorithm (and extensions)
- Hierarchical techniques
- Cluster evaluation.
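
A minimal sketch of the basic K-means iteration (Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its cluster, until nothing changes. It assumes the caller supplies the initial centroids; initialization strategies and the extensions mentioned above are not covered, and the function name is chosen here for illustration.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Basic K-means on tuples of numbers; returns (centroids, clusters)."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            sq_dists = [sum((a - b) ** 2 for a, b in zip(p, c))
                        for c in centroids]
            clusters[sq_dists.index(min(sq_dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid (a simplification).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters
```

The result depends on the initial centroids, so in practice the algorithm is often run several times with different starting points.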

- How to apply data mining techniques
- Exercises.

Teaching Methods

The course consists of 48 hours of classroom lectures, which include guided exercises. Additional exercises are assigned as homework and then discussed in class, giving students the opportunity to reinforce and self-assess their knowledge and skills. The teacher provides further support and personalized assistance during office hours and by e-mail.

Verification of learning

The assessment for this course includes:
- a written exam comprising theory questions (both open-ended and closed-ended) and exercises on the topics covered during the course; the aim is to evaluate the extent to which the student knows and understands the taught concepts and is able to apply them to practical problems;
- a final project which may involve the discussion of a scientific article or the application of data mining techniques to real-world data (with a presentation/discussion of the results); the aim is to evaluate the extent to which the student can autonomously cope with new topics/case studies.

Grades range from 1/30 to a maximum of 30/30 cum laude.
In more detail:
- up to 28 points are assigned based on the written exam;
- up to 4 points are assigned based on the project;
- the points from the written exam and the project are added together: to pass the exam, at least 18 points in total are required; if the total exceeds 30 (that is, 31 or 32), the final grade is ‘30 cum laude’.
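
The point scheme above can be expressed as a short function. This is only a sketch of the stated rule: the function name, the "fail" label, and the assumption that points are whole numbers are introduced here for illustration.

```python
def final_grade(written_points, project_points):
    """Combine written-exam points (0-28) and project points (0-4)."""
    if not 0 <= written_points <= 28:
        raise ValueError("written exam points must be between 0 and 28")
    if not 0 <= project_points <= 4:
        raise ValueError("project points must be between 0 and 4")
    total = written_points + project_points
    if total < 18:
        return "fail"  # hypothetical label: fewer than 18 points in total
    if total > 30:
        return "30 cum laude"  # a total of 31 or 32
    return f"{total}/30"
```

For instance, 26 points on the written exam plus 3 on the project give `final_grade(26, 3)`, a total of 29/30.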

To pass the exam (final grade of at least 18/30), the student must demonstrate a sufficient knowledge of the data mining techniques covered in the course (pre-processing, classification, clustering, association analysis). To achieve the highest grade (‘30 cum laude’), the student must demonstrate an excellent knowledge of the course topics and must be able to apply them to the solution of problems. Communication skills and proper terminology also contribute to the final grade.


Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar, Introduction to Data Mining, Pearson, 2018. (primary textbook)

Ian H. Witten, Eibe Frank, Mark A. Hall, Christopher J. Pal, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2016. (for supplementary readings)

More Information

Auxiliary learning resources:
lecture slides, exercises (some of them with solutions), scientific articles.

All the course material is available at https://elearning.unica.it/
