Teachings

Professor
BARBARA PES (Tit.)
Period
Second Semester 
Teaching style
Conventional
Teaching language
Italian



Additional information

Course               | Curriculum                                                               | CFU | Length (h)
[60/65] MATHEMATICS  | [65/60 - Ord. 2020] MATEMATICA APPLICATA (Applied Mathematics)           | 6   | 48
[60/68] PHYSICS      | [68/40 - Ord. 2020] FISICA MEDICA E APPLICATA (Medical and Applied Physics) | 6   | 48
[60/73] INFORMATICS  | [73/00 - Ord. 2017] PERCORSO COMUNE (Common Track)                       | 6   | 48

Objectives

The course aims at presenting the main concepts, methods and techniques in the field of data mining and knowledge discovery.

In more detail, the course objectives include:

- to provide students with a solid background on the main KDD (Knowledge Discovery in Databases) tasks, including data preparation, extraction of patterns from data using supervised data mining approaches (classification) as well as unsupervised approaches (clustering, association rule mining), and pattern evaluation [KNOWLEDGE AND UNDERSTANDING];

- to develop problem-solving skills; a number of real-world problems will be presented and discussed [APPLYING KNOWLEDGE AND UNDERSTANDING];

- to develop critical thinking and decision-making skills; the students will learn to differentiate between situations for applying different data mining techniques [MAKING JUDGEMENTS];

- to enable students to discuss data mining issues using proper terminology [COMMUNICATION SKILLS];

- to stimulate students' capacity to explore the course topics in greater depth on their own, including research topics, and to carry out a self-directed piece of practical work that may require applying data mining techniques to new problems or contexts [LEARNING SKILLS].

Prerequisites

Students should have background knowledge of algorithms and data structures.
Basic concepts of databases, probability and statistics are also useful.

Contents

1) INTRODUCTION
- What is Data Mining?
- Data Mining and Knowledge Discovery.

2) DATA
- Types of data and general characteristics of datasets
- Data Quality
- Data Pre-processing
- Measures of similarity and dissimilarity.

3) CLASSIFICATION
- General approach to solving a classification problem
- Classification techniques: Decision trees, Rule-based classifiers, Nearest-Neighbor classifiers, Bayesian classifiers, Artificial Neural Networks, Support Vector Machines
- The problem of model overfitting
- Evaluating classification models: methods and metrics for performance evaluation, methods for model comparison.

4) ASSOCIATION ANALYSIS
- Problem definition (market-basket model)
- Support and confidence of association rules
- Apriori algorithm: frequent itemset generation, rule generation
- Evaluation of association rules.

5) CLUSTERING
- Different types of clustering
- K-means algorithm (and extensions)
- Hierarchical techniques
- Cluster evaluation.

6) THE WEKA DATA MINING WORKBENCH
- How to apply data mining techniques
- Exercises

Teaching Methods

The teaching activity consists of 48 hours of lectures, which also include guided exercises. Additional exercises are assigned as homework and then discussed in class, giving students the opportunity to reinforce and self-assess their knowledge and skills. The teacher provides further support and personalized assistance during the established office hours and by e-mail.

The course will be delivered in person. Lessons may be supplemented with audio-visual materials and streaming.

Verification of learning

The assessment for this course includes:
- a written exam comprising theory questions (both open-ended and closed-ended) and exercises on the topics covered during the course; the aim is to evaluate the extent to which the student knows and understands the concepts taught and is able to apply them to practical problems;
- a final project which may involve the discussion of a scientific article or the application of data mining techniques to real-world data (with a presentation/discussion of the results); the aim is to evaluate the extent to which the student can autonomously cope with new topics/case studies.

The grading system is from 1/30 to a maximum of 30/30 cum laude.
In more detail:
- up to 28 points are assigned based on the written exam;
- up to 4 points are assigned based on the project;
- the points earned in the written exam and in the project are added together: to pass the exam, the student must obtain at least 18 points in total; if the total exceeds 30 (that is, 31 or 32), the final grade is ‘30 cum laude’.
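
As an illustration only, the point-combination rule above can be sketched in Python. The function name final_grade, the use of whole-number points, and the "fail" label for totals below 18 are assumptions made for this sketch and are not part of the official regulations.

```python
def final_grade(written_points: int, project_points: int) -> str:
    """Combine written-exam and project points into a final grade.

    Hypothetical helper: whole-number points and the "fail" label
    are assumptions of this sketch, not stated in the syllabus.
    """
    if not 0 <= written_points <= 28:
        raise ValueError("written exam points must be between 0 and 28")
    if not 0 <= project_points <= 4:
        raise ValueError("project points must be between 0 and 4")

    total = written_points + project_points
    if total < 18:
        # Fewer than 18 points in total: the exam is not passed.
        return "fail"
    if total > 30:
        # A total of 31 or 32 is recorded as 30 cum laude.
        return "30 cum laude"
    return f"{total}/30"
```

For example, under these assumptions, 26 points in the written exam plus 4 in the project give 30/30, while 28 plus 4 give 30 cum laude.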

To pass the exam (final grade of at least 18/30), the student must demonstrate sufficient knowledge of the data mining techniques covered in the course (pre-processing, classification, clustering, association analysis). To achieve the highest grade (‘30 cum laude’), the student must demonstrate an excellent knowledge of the course topics and must be able to apply them to the solution of problems. Communication skills and proper terminology also contribute to the final grade.

Texts

Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar, Introduction to Data Mining, Pearson, 2018. (primary textbook)

Ian H. Witten, Eibe Frank, Mark A. Hall, Christopher J. Pal, DATA MINING: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2016. (for supplementary readings)

More Information

Auxiliary learning resources:
lecture slides, exercises with solutions, scientific articles.

The course materials will be available at https://elearning.unica.it/.
