Fundamentals and Applications of Machine Learning
Doctoral Programme in Computer Science, Faculty of Sciences, University of Porto, Porto
September 22, 2020
Data, Data, Data
• There is an avalanche of data being produced by people, applications and machines.
• In the last 2 years, more data was produced than in all of previous human history.
• The amount of digital information increases 10x every five years.
• This growth creates important challenges for storage systems.
• Some of these data hide important value.
Motivation
Credit cards: billions of transactions/year; 1M transactions/hour feeding a 2.5 PB database.
Large Synoptic Survey Telescope: generates 140 Tb every 5 days.
One human genome: ~100 Gb, sequenced in 4 to 5 days; ~1 million genomes expected to be completed.
• Smartphone apps
• Public web
• Machine log data
• Data storage
• Sensor data
• Scientific literature archives
• Many other sources ...
• ML is proving to be a critical tool for extracting value from these data across different domains.
• There is high demand for ML specialists to work on problems such as self-driving cars, genome analysis, cancer prediction, climate change and many other fields.
• This demand prompts the need to train the next generation of computer scientists in the theoretical and practical knowledge of ML.
• They need a background that allows them to develop projects using the latest technologies and following best implementation practices.
Motivation
• Understand ML concepts, tasks, and workflow.
• Learn how to implement and apply predictive, classification, clustering, information retrieval and deep learning algorithms to real datasets.
• Develop a critical view and be able to choose, apply and evaluate the most adequate problem-solving techniques in ML.
• Be able to design, specify, implement and validate advanced software tools for specific data analysis problems, and assess the quality of the models using the relevant error metrics.
• Be able to interact with domain experts during software development and produce adequate reports.
Goals
• Fundamentals of ML; Data pre-processing and exploration
• Unsupervised learning:
Clustering; PCA; MDS; t-SNE; UMAP
• Regression:
Univariate and multivariable linear regression
• Classification methods:
Decision Trees; K-NN; Support Vector Machines; Neural Networks; Logistic Regression; Ensemble approaches: Bagging, Boosting, Random Forests, Gradient Boosted Decision Trees
• Predictive pipeline:
Cross-validation, hyper-parameter tuning, etc.
• Neural Networks:
Principles of Feed-Forward NNs; Training; Application to regression and classification
• Deep Learning:
Advanced architectures (deep convolutional, recurrent, LSTMs, auto-encoders, ...)
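As a rough illustration of the "Predictive pipeline" item above, the following is a minimal sketch, not course material. It assumes scikit-learn is installed and uses its bundled iris dataset; the function name tune_knn, the choice of a K-NN classifier, and the candidate values for the number of neighbours are all assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def tune_knn(random_state=0):
    """Tune a scaled K-NN classifier with 5-fold cross-validation.

    Hypothetical helper for illustration only.
    """
    X, y = load_iris(return_X_y=True)

    # Hold out a test set that is never seen during tuning.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )

    # Scaling lives inside the pipeline so it is re-fitted on each
    # training fold, which avoids leaking validation data.
    pipeline = Pipeline(
        [("scale", StandardScaler()), ("knn", KNeighborsClassifier())]
    )

    # Hyper-parameter grid (assumed values) searched with 5-fold CV.
    param_grid = {"knn__n_neighbors": [1, 3, 5, 7, 9]}
    search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)

    # The best pipeline is refitted on the full training set by default.
    test_accuracy = accuracy_score(y_test, search.predict(X_test))
    return search.best_params_, search.best_score_, test_accuracy


if __name__ == "__main__":
    best_params, cv_accuracy, test_accuracy = tune_knn()
    print("best parameters:", best_params)
    print("mean CV accuracy:", round(cv_accuracy, 3))
    print("held-out test accuracy:", round(test_accuracy, 3))
```

Keeping the scaler inside the pipeline is a deliberate choice in this sketch: the cross-validation score then reflects the whole workflow, and the held-out test set gives a final, independent estimate.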
Program
Theoretical Aspects:
• Slides from the classes
• Recommended bibliography
Practical Assignments:
• Python 3 + scikit-learn
• Keras (+ TensorFlow)
• Jupyter Notebooks / Scripts
Evaluation:
• Weekly assignments
• Project presentation and defense
Framework
References
o An Introduction to Statistical Learning with Applications in R, G. James, D. Witten, T. Hastie and R. Tibshirani
o Python Machine Learning, S. Raschka
o Deep Learning With Python, Jason Brownlee
o Python Deep Learning, I. Vasilev et al.
o Hands-On Machine Learning with Scikit-Learn and TensorFlow, Aurélien Géron
Pedro G. Ferreira (Assistant Professor @ DCC – FCUP; pgferreira@fc.up.pt)
Rita Ribeiro (Assistant Professor @ DCC – FCUP; rpribeiro@fc.up.pt)
Pétia Georgieva (Assistant Professor @ DETI/IEETA - UA; petia@ua.pt)
Miguel Rocha (Associate Professor @ DI - UMINHO; mrocha@di.uminho.pt)