EC/0056 - METODI DI APPRENDIMENTO STATISTICO PER IL DATA SCIENCE
Academic Year 2021/2022
FRANCESCO MOLA (Tit.)
- Teaching style
- Teaching language
|Degree programme: [11/82] DATA SCIENCE, BUSINESS ANALYTICS AND INNOVATION||Curriculum: [82/00 - Ord. 2018] PERCORSO COMUNE (common track)||Credits: 9||Hours: 54|
The main objective of the course is to focus on data-driven statistical methods based on machine learning algorithms. Several statistical methods that go well beyond classical statistical modelling approaches will be considered. These methods form the basis of the Statistical Learning approach, i.e. methods based on a "learning from data" paradigm. Students will become familiar with, and able to apply, methods developed over the last 20 years thanks to the evolution of computational statistics and the increase in computing power. They will be able to analyze data, build predictive models, and investigate the main factors that influence choices in Economics, Business and Management.
KNOWLEDGE AND UNDERSTANDING
Students will acquire knowledge of new statistical methodologies and will be able to compare them with the methodologies encountered in first-level courses. Parallels with methods used in other Data Science disciplines will be drawn systematically. Particular attention will be given to the critical aspects of interpreting results and to the applicability of alternative methods to economic and business data.
APPLYING KNOWLEDGE AND UNDERSTANDING
Practical laboratory applications, consistent with the theoretical topics covered in class, will allow students to identify the right analytical strategies for problem solving. Both real and artificial data sets will be considered. Particular attention will be paid to the ability to verify the applicability conditions of the proposed statistical methods and models, depending on the problem to be solved and on the size and type of the available data.
Analyzing both real cases and ad hoc artificial cases will allow students to strengthen their autonomy of judgment, as they will be called upon to make decisions. The decision process will be the last step, after the methodologies have been applied to the data, and will take into account the predicted values resulting from the “learning from data” process.
Students must present the methodologies used and the results obtained accurately and clearly. They must be able to address both specialist and non-specialist audiences. Moreover, they must support their arguments with graphics, tables and correctly formalized models.
The periodic analysis of scientific articles, together with the theoretical lessons and the supplementary activities and exercises, will allow students to improve and refine their learning abilities for the rest of their academic and professional careers.
Statistics (exploratory statistics, probability and inference).
Basic computer science (first-level course).
Linear Algebra (basics).
Getting started with Statistical Learning and Data Science.
How to apply Statistical Learning methods to economic and business problems.
Prediction in Regression and Classification.
Linear Discriminant Analysis and Quadratic Discriminant Analysis.
Resampling methods (Bootstrap and Cross Validation).
Model selection (Forward and Backward model selection).
Lasso and Ridge Regression.
Smoothers, Splines and GAMs.
Nonparametric methods: Classification and Regression Trees.
Unsupervised Statistical Learning Methods (Clustering and Principal Component Analysis).
The course consists of 54 hours of lectures and practical sessions. An additional 10 hours will be devoted to lab sessions and the analysis of real data sets. To encourage team building, groups of students will compete against other groups in data analysis tasks.
Verification of learning
Classes are based on a theory/practice approach. The final evaluation is organized in three steps (weights in brackets):
a) Labs and homework (20%): the ability to apply the methods will be checked periodically. The objective is to verify the “applying knowledge and understanding” skills.
b) Report preparation (30%) and presentation of results (10%): this step is based on a real problem. A brief description of the problem and the main objective to attain will be provided. Data sets and software will be available. A final report with the main results, graphics and tables has to be prepared. It will be discussed further in the oral exam (see point c). The objective is to verify communication skills as well as knowledge and understanding.
c) Oral exam (40%): at this stage the results and the methodological choices made in the report of point b) will be discussed. The objective is to verify communication skills and learning skills.
The grade ranges from 18 (minimal skills) to 30 cum laude (maximum skills).
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
An Introduction to Statistical Learning with Applications in R.
Springer (a free e-book version is available).
Additional material, data sets and papers will be made available.
Further reading:
Trevor Hastie, Robert Tibshirani and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2nd edition.
Additional support materials, such as slides and video material, papers, links to international databases and repositories will be distributed.
An interactive whiteboard (LIM), a projector and a blackboard are expected to be used.