Life of an AI project
Data Science
Discipline of making data useful
Data mining, ML etc.
Map of data science (None, Many, Few decisions)
- Descriptive Analytics - None
- inspired by data
- Machine Learning - Many
- Make a recipe
- Statistical Inference - Few
- Decide wisely
Descriptive Analytics
- Let's find out what is here
- Can we look up the answer? Yes
Prototype to production
- Step 6: Training and Tuning
- Fitting → Validation (should pass) → Testing (should pass)
- Overfitting → Validation (fails) → Go back to Training and Tuning
- This is a dreaded infinite loop
- This is called overfitting limbo.
- Strategy: Start simple → inch your way up to complexity
- The more complex the solution → the higher the chance of overfitting
- The longer the recipe → the more complicated it is
- One way to do it - algorithmically enforce simplicity → called Regularisation (see the sketch after this step)
- Avoid training using data from the future.
- Predicting tomorrow’s stock price using tomorrow's interest rate?
- Treat labels & features with some respect
- You may not have a feature in production that you had in training - pitfall
- The goal is
- to find patterns in your data
- a shortlist of models that seem to work
- Don't try to get it right immediately; it will take a few tries
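One possible way to see regularisation in action, assuming scikit-learn; the data below is synthetic and the alpha value is an arbitrary illustration, not a recommendation:

```python
# Regularisation sketch: Ridge penalises large coefficients, algorithmically
# enforcing a simpler "recipe" than plain linear regression (synthetic data).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))                  # 200 rows, 20 features
y = 3.0 * X[:, 0] + rng.normal(size=200)        # only the first feature matters

plain = LinearRegression().fit(X, y)
regularised = Ridge(alpha=10.0).fit(X, y)       # alpha = strength of the simplicity penalty

# The regularised model keeps its coefficients smaller, reducing overfitting risk.
print("plain coef magnitude:      ", np.abs(plain.coef_).sum().round(2))
print("regularised coef magnitude:", np.abs(regularised.coef_).sum().round(2))
```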
- Step 7: Tune and Debug
- Need a separate dataset from training.
- Comes from the original data split (see the splitting sketch after this list)
- Original data
- Exploratory data
- Training data
- Debugging data
- Validation data
- Test data
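A minimal sketch of one way to carve out those splits, assuming scikit-learn's train_test_split; the fractions and the synthetic data are illustrative assumptions only:

```python
# Splitting sketch: peel off test, validation, and debugging sets from the
# original data so each later step gets data the model has never seen.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))                  # stand-in for the original features
y = (X[:, 0] > 0).astype(int)                   # stand-in for the labels

# Hold out the test set first so it is never touched during development.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
# Then take validation and debugging sets from what remains; the rest is training data.
# (An exploratory slice could be carved off the same way.)
X_rest, X_val, y_rest, y_val = train_test_split(X_rest, y_rest, test_size=0.15, random_state=42)
X_train, X_debug, y_train, y_debug = train_test_split(X_rest, y_rest, test_size=0.15, random_state=42)

print(len(X_train), len(X_debug), len(X_val), len(X_test))
```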
- How to debug?
- Fit a model on the training data → then move to the debugging data
- Check performance on the debugging data
- Look for instances where the model got it wrong
- Analyse what's common among the successes and the failures
- Possibly a feature or combination of features → do feature engineering (see the debugging sketch after this list)
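A sketch of that debugging loop, assuming scikit-learn; the model choice, feature layout, and synthetic data are assumptions for illustration:

```python
# Debugging sketch: fit on training data, score the debugging data, then
# compare what the failures and successes have in common, feature by feature.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 5))
y = ((X[:, 0] + rng.normal(scale=1.5, size=1000)) > 0).astype(int)   # noisy labels
X_train, X_debug, y_train, y_debug = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
pred = model.predict(X_debug)
wrong, right = X_debug[pred != y_debug], X_debug[pred == y_debug]

# A large gap between the two means hints at a feature (or combination of
# features) worth engineering before the next training pass.
for i in range(X.shape[1]):
    print(f"feature {i}: wrong mean={wrong[:, i].mean():+.2f}  right mean={right[:, i].mean():+.2f}")
```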
- Can I skip this step?
- Preferably don't skip - you lose the chance to find the data your model fails to fit
- Tuning? - Tune hyperparameters
- Concepts
- Parameters: Set using the data
- Hyperparameters: numerical settings in an algorithm, fixed before any data is ingested by the algorithm (see the tiny illustration below)
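A tiny illustration of the distinction, assuming scikit-learn; the max_depth value is an arbitrary example:

```python
# Hyperparameter vs. parameter in one example: max_depth is chosen before the
# data is seen; the split thresholds are learned from the data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

tree = DecisionTreeClassifier(max_depth=3)      # hyperparameter: set before fitting
tree.fit(X, y)
print(tree.tree_.threshold[:5])                 # parameters: split thresholds set using the data
```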
- How to tune?
- Basic tuning (holdout method)
- Take a tuning dataset out of the training dataset
- Run iterations over the possible values of the hyperparameter
- Choose the setting that gives the best performance - that's your tuned model (see the holdout sketch below)
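A sketch of the holdout method, assuming scikit-learn; the candidate values, model, and synthetic data are illustrative choices:

```python
# Holdout tuning sketch: carve a tuning set out of the training data, try each
# candidate hyperparameter value, keep whichever scores best on the tuning set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 5))
y = ((X[:, 0] + rng.normal(size=1000)) > 0).astype(int)

X_fit, X_tune, y_fit, y_tune = train_test_split(X, y, test_size=0.25, random_state=0)

best_depth, best_score = None, -1.0
for depth in [2, 4, 8, None]:                    # candidate hyperparameter values
    model = RandomForestClassifier(max_depth=depth, random_state=0).fit(X_fit, y_fit)
    score = model.score(X_tune, y_tune)          # accuracy on the tuning set
    if score > best_score:
        best_depth, best_score = depth, score

print("tuned max_depth:", best_depth, "tuning accuracy:", round(best_score, 3))
```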
- Cross validation (a type of tuning)
- k-fold cross validation
- k = number of chunks we split our training data into (100 data points split into chunks of 20 each → k = 5)
- Use 1 chunk as the evaluation set, the remaining 4 as training data
- Train for each hyperparameter setting
- Evaluate on the held-out chunk and store the performance (each chunk takes a turn as the evaluation set)
- Choose the hyperparameter setting that gives the best aggregated performance (e.g. mean precision)
- Advantage: lets you check model stability
- Helps to find outliers (see the cross-validation sketch below)
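A sketch of k-fold cross-validation for the same tuning decision, assuming scikit-learn's KFold and using mean precision as the aggregate (the data and candidate values are assumptions):

```python
# k-fold cross-validation sketch: k = 5 chunks, each chunk takes one turn as
# the evaluation set; pick the hyperparameter with the best mean precision.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import KFold

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 5))
y = ((X[:, 0] + rng.normal(size=1000)) > 0).astype(int)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for depth in [2, 4, 8]:                          # candidate hyperparameter values
    scores = []
    for train_idx, eval_idx in kf.split(X):
        model = RandomForestClassifier(max_depth=depth, random_state=0)
        model.fit(X[train_idx], y[train_idx])
        scores.append(precision_score(y[eval_idx], model.predict(X[eval_idx])))
    # The spread across folds also hints at model stability and outlier folds.
    print(f"max_depth={depth}: mean precision={np.mean(scores):.3f}  std={np.std(scores):.3f}")
```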
- Can I skip tuning?
- No, you get silly hyperparameters
- Yes - if no hyperparameters + lots of data + method robust to outliers.
- Tuning is more relevant towards the later phases of an ML project
- Debugging → gives you insight
- Tuning → saves you from poor hyperparameter choices
- Step 8: Validate the model
- Why? → An ML project is oblivious to overfitting unless the model is checked on fresh data
- How?
- Evaluate on the validation dataset (sketch below) and either
- go to the next step
- Go back to training and try again
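A minimal sketch of that validation gate, assuming scikit-learn; the acceptance threshold is hypothetical and would come from the project's own requirements:

```python
# Validation sketch: score the candidate model on fresh validation data, then
# either proceed to testing or go back to training and tuning.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
X = rng.normal(size=(1000, 5))
y = ((X[:, 0] + rng.normal(size=1000)) > 0).astype(int)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(max_depth=4, random_state=0).fit(X_train, y_train)
val_score = model.score(X_val, y_val)            # accuracy on data the model never saw

MIN_ACCEPTABLE = 0.80                            # hypothetical bar set by the project
if val_score >= MIN_ACCEPTABLE:
    print(f"validation accuracy {val_score:.3f}: proceed to the next step (testing)")
else:
    print(f"validation accuracy {val_score:.3f}: go back to training and try again")
```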