ML_course_maastricht

Introduction to Machine Learning for Policy Analysis (Maastricht 2025)

Welcome

Dear Students, welcome to the course repository, where you will find all information supplementing this term’s machine learning for policy analysis course. Here you will find the lectures on the two topics introduced (Supervised Machine Learning & Natural Language Processing) in video format, plus accompanying R Markdown notebooks.

To get the most out of these lectures, I expect you to have R & RStudio installed and updated on your local machine, and to be generally used to doing data analytics in R using the tidyverse ecosystem. If that is not the case, you might want to take a look at the additional resources such as "My R Brush-up course (Bonus)" below, where I recap the fundamentals of working with data in R.

::::::::::::::> Watch this intro video to get started <:::::::::::::::::

Lecturer (briefly about me)

Daniel is a Strategic Business Manager at Novo Nordisk, where his team develops data-driven methods and workflows to improve the performance of clinical trials. This involves the use of machine learning to predict outcomes and costs of clinical trials, and natural language processing to extract information from trial protocols.

He is also an Associate Professor in Data Science & Innovation Economics at the Aalborg University Business School, where he led the Data Science research track at the AI:Growth lab and coordinated teaching at the Social Data Science (SDS) master specialization. His research is dedicated to the development and application of data-driven methods to map, understand, and predict technological change, and its causes and consequences for socioeconomic systems on various levels of aggregation. His current contextual focus is the dynamics of AI research and industry.

His research is featured in leading academic journals such as Research Policy, and has also attracted attention and funding from industry and led to prize-winning applications. Daniel is actively engaged in initiatives to educate (social science) students and researchers, professionals, and policymakers in understanding, evaluating, and applying modern Data Science and Artificial Intelligence methods for data-driven decision making.

As part of the AI:DK project, he coordinates and leads AI proof-of-concept projects within industry. His team also develops enterprise and policy software solutions for IP search and technology mapping.

Live Workshop

A: Case study: Using NLP and ML to predict green patents ::> Html <::

Lectures

Legend:

T: Theory lecture, explaining concepts without using too much code
A: Applications and demonstrations of concepts and techniques, mostly code-based
E: Exercises for you to try your skills

Introduction to Supervised Machine Learning (S-ML) in R

This part will introduce you to the fundamentals of supervised machine learning (S-ML, a.k.a. predictive modelling), and illustrate practical applications thereof in R.

T: Introduction to supervised ML ::> Video 1: Introduction & Statistics Refresher <:: ::> Video 2: Generalization, Hyperparameter Tuning & Model Classes <:: ::> Slides <::
A: Applied supervised machine learning in R: ::> Video 1: Introduction & ML workflows with tidymodels <:: ::> Video 2: Regression problem case <:: ::> Video 3: Classification problem case <:: ::> Html <:: ::> Colab <::

Introduction to Natural-Language-Processing (NLP) in R

In this part you will be introduced to the fundamentals of analysing textual data and their practical application in R. After reviewing the basics of string manipulation, we will cover bag-of-words-style text summaries, and then move on to slightly more advanced applications such as sentiment analysis and topic modelling.

A: Basics of text analysis in R ::> Video 1: Introduction to text analysis in R <:: ::> Html <::
A: Working with long text and extracting text elements in R ::> Video 1 <:: ::> Html <::
A: Text Vectorization and Topic Modelling in R ::> Video 1 <:: ::> Html <::

Further Resources

Find below a list of further resources (including my own material), either to brush up on basic R knowledge, supplement what you learn here, or dive deeper into related or advanced topics.

Own research: Technology forecasting with ML & NLP

Hain, D. S., Jurowetzki, R., Squicciarini, M., & Xu, L. (2023). Unveiling the neurotechnology landscape: scientific advancements innovations and major trends.
Nechaev, I., & Hain, D. S. (2023). Social impacts reflected in CSR reports: Method of extraction and link to firms innovation capacity. Journal of Cleaner Production, 429, 139256.
Hain, D. S., Jurowetzki, R., Buchmann, T., & Wolf, P. (2022). A text-embedding-based approach to measuring patent-to-patent technological similarity. Technological Forecasting and Social Change, 177, 121559.: Own paper, where we introduce text embeddings and use them to map technology based on patent data.
Bekamiri, H., Hain, D. S., & Jurowetzki, R. (2021). PatentSBERTa: A Deep NLP based Hybrid Model for Patent Distance and Classification using Augmented SBERT. arXiv preprint arXiv:2103.11933.: More advanced version of the use of embeddings.
Hain, D. S., Jurowetzki, R., Konda, P., & Oehler, L. (2020). From catching up to industrial leadership: towards an integrated market-technology perspective. An application of semantic patent-to-patent similarity in the wind and EV sector. Industrial and Corporate Change, 29(5), 1233-1255.: Application of the technique.

Data Science in R in general

Wickham, H., & Grolemund, G. (2023). R for data science: import, tidy, transform, visualize, and model data. O’Reilly Media, Inc.: The bible of modern data science in R. Use this to get started.
Baumer, B., Kaplan, D. & Horton, N. (2023) Modern Data Science with R (2nd Ed.). CRC Press: Also a nice supplementary book, touching upon topics such as simulation and network analysis.
Ismay & Kim (2024), Statistical Inference via Data Science: A ModernDive into R and the Tidyverse, CRC Press.: For those who want to first update their knowledge in basic and inferential statistics in a modern R setup.

Supervised Machine Learning

Hain, D., & Jurowetzki, R. (2020). Introduction to Rare-Event Predictive Modeling for Inferential Statisticians–A Hands-On Application in the Prediction of Breakthrough Patents. arXiv preprint arXiv:2003.13441.: One of our introductory papers. A somewhat more elaborate version of what we have done so far, on a more exciting dataset.
Kuhn, M., Silge, J. (2020). Tidy Modeling with R: Great introduction to tidymodels by the makers.
Kuhn, M. & Johnson (2019), Feature Engineering and Selection: A Practical Approach for Predictive Models, Taylor & Francis.: Less code but many deep insights into the details of modern ML, by Max Kuhn, the maker of much of tidymodels and caret.
Silge, Julia (2020). Supervised Machine Learning Case Studies in R. Online course: Great interactive course Julia took out of DataCamp to offer it for free instead. Fully updated to the tidymodels workflow. YOU ALL SHOULD DO IT!

Natural Language Processing

Julia Silge and David Robinson (2020). Text Mining with R: A Tidy Approach, O’Reilly.: Great introduction to the tidytext ecosystem and NLP in R by the package makers.
Emil Hvidfeldt and Julia Silge (2020). Supervised Machine Learning for Text Analysis in R: More advanced introduction to SML based on textual data.

Further topics of (potential) interest

My R Brush-up course (Bonus)

As a bonus, find some very basic introductions to working with data in R (from another course of mine) below. If you are already used to working with R and the tidyverse, there is no need to go through them. But in case you feel your R skills need a bit of a brush-up, feel free to go through the material before auditing my classes.

T: Introduction to the R Data Science Ecosystem ::> Video <:: ::> Slides <::
A: Basics of statistical programming in R ::> Video <:: ::> Html <:: ::> Colab <::
T: Introduction to data ::> Video <:: ::> Slides <::
T: Data manipulation basics in R ::> Video <:: ::> Slides <::
A: Data manipulation in R ::> Video <:: ::> Html <:: ::> Colab <::
T: Data Visualization ::> Video <:: ::> Slides <::
A: Basic data visualization in R using ggplot ::> Video 1 <:: ::> Html <:: ::> Colab <::
E: Data manipulation & visualization basic exercises ::> 1: Basics <:: ::> 2: Joins <:: ::> 3: Data Manipulation Challenge <:: ::> 4: EDA & Dataviz <::