Preparation
Two crucial first steps when designing experiments with machine learning pipelines are the following:
- Understand what’s in your DATA.
- Know what METHODS and TOOLS you can use to perform machine learning tasks.
In this preparation section, we provide links to the dataset and the Python library we are going to use in the third tutorial, accompanied by some guiding questions. In the two tasks below, you are asked to explore these links and reflect on the challenges and opportunities that arise when designing an end-to-end machine learning pipeline (i.e., from acquiring the dataset to making predictions).
Before the tasks: good to know
This tutorial assumes that you have a basic understanding of Python programming. If you are not familiar with Python, we recommend taking the Python for Everybody course on the Coursera platform:
- Python for Everybody: https://www.coursera.org/specializations/python
Task 1
First, take a look at the dataset we are going to use in the tutorial (https://archive.ics.uci.edu/ml/datasets/Heart+Disease).
Your quest is to explore this dataset. Instead of giving you step-by-step instructions, we only provide the link to the dataset. Finding your way around datasets that exist in the wild is a crucial skill for any machine learning task. During the tutorial, we will reflect on the challenges you encountered while exploring the dataset.
Task 2
- Take a look at the methods that the Python library scikit-learn offers for splitting a dataset into train and test sets (https://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection). Do they look familiar? Try to identify the ones we have already discussed in class.
- Also, take a look at the evaluation metrics described in the scikit-learn documentation (https://scikit-learn.org/stable/modules/model_evaluation.html).
  - Do they look familiar? Try to identify the ones we have already discussed in class.
  - Do you know what the following terms mean?
    - Confusion Matrix
    - Precision
    - Recall
    - Accuracy
    - F1 score
    - Cross Validation
If not, refresh your memory by going through the past lectures, as we will discuss all of them during the tutorial.
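To make the terms above more concrete, here is a minimal sketch of how they map onto scikit-learn calls. It is only an illustration under some assumptions: the data is synthetic (generated with `make_classification`, not the tutorial dataset), and the classifier (`LogisticRegression`), the 25% test size, and the 5 folds are arbitrary choices made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic binary classification data (an assumption for illustration only;
# this is NOT the dataset used in the tutorial).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hold out 25% of the samples as a test set. stratify=y keeps the class
# proportions similar in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit a simple classifier on the training set and predict on the test set.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

# Metrics computed from the test-set predictions.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}

# 5-fold cross-validation on the training set; the default score for a
# classifier is accuracy, so this returns one accuracy value per fold.
cv_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_train, y_train, cv=5
)

print("Confusion matrix:\n", cm)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print("Cross-validation accuracy per fold:", cv_scores)
```

While reading the output, try to relate each metric back to the four cells of the confusion matrix. For example, accuracy is the sum of the diagonal divided by the total number of test samples.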