The following questions cover concepts that you should have a solid grasp of before moving on to more advanced courses.
Check Your Understanding
Which of the following suggest a potential problem in using ML for your project?
You only make predictions.
You want to make decisions, not just predictions. Your product should take action on the output of the model. ML is better at making decisions than at giving you insights.
You have a clear use case.
Start with the problem, not the solution. Focus on problems that would be difficult to solve with traditional programming. Make sure you aren't treating ML as a hammer for your problems.
You have access to historical data.
Actually, you're good; this is what you want! Machine learning is about finding patterns in relevant data and applying them to data you haven't seen before. This requires that you have (or can get) existing relevant data.
When using supervised machine learning, your ML problem is well-defined if you have:
BOTH inputs and outputs identified
A well-defined problem has both inputs and outputs. Inputs are the features. Outputs are the labels to predict.
EITHER inputs or outputs identified
If you're missing inputs or outputs, then your problem isn't well-defined.
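As a rough illustration of what "both inputs and outputs identified" can look like in practice, the sketch below separates a few made-up records into features and labels. The house-price records, the column names, and the helper `split_inputs_outputs` are all hypothetical and chosen only for this example.

```python
# Hypothetical historical records; every field name here is an assumption.
records = [
    {"square_meters": 50, "bedrooms": 1, "sale_price": 150_000},
    {"square_meters": 80, "bedrooms": 2, "sale_price": 240_000},
    {"square_meters": 120, "bedrooms": 3, "sale_price": 330_000},
]

FEATURES = ["square_meters", "bedrooms"]  # inputs the model sees
LABEL = "sale_price"                      # output the model should predict


def split_inputs_outputs(rows, feature_names, label_name):
    """Return (inputs, outputs): one feature list and one label per row."""
    inputs = [[row[name] for name in feature_names] for row in rows]
    outputs = [row[label_name] for row in rows]
    return inputs, outputs


inputs, outputs = split_inputs_outputs(records, FEATURES, LABEL)
print(inputs)
print(outputs)
```

If you cannot fill in both `FEATURES` and `LABEL` for your own data, that is a sign the supervised problem is not yet well-defined.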
How many features should you pick when you are first starting a machine learning project?
Pick 1-3 features that seem to have strong predictive power
It's best for your data collection pipeline to start with only one to three features. This will help you confirm that ML is a viable approach to your problem. Also, when you build a baseline from a couple of features, you'll feel like you're making progress!
Pick 4-6 features that seem to have strong predictive power
You might eventually use this many features, but it's still better to start with fewer.
Pick as many features as you can, so you can start observing which features have the strongest predictive power.
Start smaller. The more features you begin with, the harder it is to see what's working. Fewer features usually means fewer unnecessary complications.
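One way to picture starting small is to compare a single-feature model against a reference point that ignores the features entirely. The sketch below is only an illustration under stated assumptions: the data is invented, the model is a plain least-squares line, and the error is measured on the same data it was fit to, whereas a real check would use held-out examples.

```python
# Hypothetical historical data: one input feature and the label to predict.
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
label = [2.1, 3.9, 6.2, 7.8, 10.1]


def mean(values):
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def mean_squared_error(predictions, targets):
    return mean([(p - t) ** 2 for p, t in zip(predictions, targets)])


# Reference point that ignores the feature: always predict the average label.
constant_mse = mean_squared_error([mean(label)] * len(label), label)

# One-feature model: a straight line fit to the single feature.
slope, intercept = fit_line(feature, label)
one_feature_mse = mean_squared_error(
    [slope * x + intercept for x in feature], label
)

print(f"predict-the-mean MSE: {constant_mse:.3f}")
print(f"one-feature model MSE: {one_feature_mse:.3f}")
```

If the one-feature model clearly beats the constant prediction, that suggests the feature carries signal and ML may be viable; adding further features one at a time then makes it easier to see which of them actually help.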
Should you collect data and look for correlations before defining your ML problem?
Usually not. Searching for correlations in existing data dumps is hard because the correlations you find might be spurious. This is only advisable if you have HUGE amounts of data and can conduct live experiments.
Warning: if you try enough experiments, you’ll find something that works, but there's no guarantee that it’ll be useful in production (or even that it’s a real phenomenon).
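The warning above can be made concrete with a small simulation. The sketch below generates a random target and many random candidate features that are unrelated to it by construction, then reports the strongest correlation found. The row count, feature counts, seed, and function names are arbitrary assumptions for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_spurious_correlation(n_rows, n_features, rng):
    """Largest |correlation| between a random target and random features."""
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_features):
        # Each candidate feature is pure noise, independent of the target.
        candidate = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(candidate, target)))
    return best


rng = random.Random(0)  # fixed seed so the run is repeatable
few = best_spurious_correlation(n_rows=20, n_features=1, rng=rng)
many = best_spurious_correlation(n_rows=20, n_features=1000, rng=rng)

print(f"strongest |r| after trying 1 feature:     {few:.2f}")
print(f"strongest |r| after trying 1000 features: {many:.2f}")
```

With only 20 rows, searching through 1000 noise features tends to turn up at least one that looks strongly correlated with the target, even though none of them has any real relationship to it.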