Artificial intelligence

 
 

Life of an AI project

 

Data Science

Discipline of making data useful
Data mining, ML, etc.
Map of data science (how many decisions need to be made: None, Many, Few)
  • Descriptive Analytics - None
    • Get inspired by the data
  • Machine Learning - Many
    • Make a recipe
  • Statistical Inference - Few
    • Decide wisely

Descriptive Analytics

  • Let's find out what is here
  • Can we look up the answer? Yes

    Prototype to production

    • Step 6: Training and Tuning
      • Fitting → Validation (should pass) → testing (should pass)
      • Overfitting → Validation (fails) → Go back to Training and Tuning
        • This is a dreaded infinite loop
        • This is called overfitting limbo.
      • Strategy: Start simple → inch your way up to complexity
        • The more complex the solution, the higher the chance of overfitting
        • The longer the recipe, the more complicated it is
        • One way to do this: algorithmically enforce simplicity → called Regularisation (see the sketch after this list)
      • Avoid training using data from the future.
        • e.g. predicting tomorrow's stock price using tomorrow's interest rate - that rate won't be known yet
        • Treat labels & features with some respect
          • A feature you train on may not be available in production - a common pitfall
      • The goal is
        • to find patterns in your data
        • a shortlist of models that seem to work
        • don't try to get it right immediately; it will take a few tries
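To make the regularisation idea above concrete, here is a minimal sketch, assuming scikit-learn and a synthetic dataset; the Ridge model, the alpha value, and the data sizes are illustrative choices, not part of the original notes.

```python
# Minimal sketch: regularisation = algorithmically enforced simplicity.
# Assumes scikit-learn; the synthetic data and alpha value are illustrative only.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=50, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

plain = LinearRegression().fit(X_train, y_train)        # free to overfit
regularised = Ridge(alpha=10.0).fit(X_train, y_train)   # penalises large coefficients

print("plain       train/val R^2:", plain.score(X_train, y_train), plain.score(X_val, y_val))
print("regularised train/val R^2:", regularised.score(X_train, y_train), regularised.score(X_val, y_val))
```

If the regularised model gives up a little training performance but holds up better on the validation split, that is the enforced simplicity paying off.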
     
    • Step 7: Tune and Debug
      • Needs a dataset separate from the training data
      • It is part of the original data split (the split and the debugging pass are sketched in code after this step)
        • Original data
          • Exploratory data
            • Training data
              • Debugging data
            • Validation data
          • Test data
      • How to debug?
        • Fit a model on the training data → then move to debugging
        • Check performance on the debugging data
        • Look for instances where the model got it wrong
          • Analyse what is common among the successes and the failures
            • possibly a feature or combination of features → do feature engineering
      • Can I skip this step?
        • Preferably don't - you lose the chance to find the data your model fails to fit
      • Tuning? → Tune the hyperparameters
        • Concepts
          • Parameters: Set using the data
          • Hyperparameters: numerical settings of an algorithm, chosen before any data is ingested
        • How to tune? (both approaches are sketched in code after this step)
          • Basic tuning (holdout method)
            • Take a tuning dataset out of the training dataset
            • Run one training iteration for each candidate hyperparameter value
            • Choose the setting that gives the best performance on the tuning data - that's your tuned model
        • Cross validation (a type of tuning)
          • k-fold cross validation
            • k = the number of chunks the training data is split into (e.g. 100 rows in chunks of 20 → k = 5)
            • Use 1 chunk as the evaluation set and the remaining k-1 (here 4) as training data
            • Train for each hyperparameter setting, rotating which chunk is held out
            • Evaluate on the held-out chunk and store the performance
            • Choose the hyperparameter setting with the best aggregated performance (e.g. mean precision)
          • Advantage: lets you check model stability
            • Helps to find outliers
        • Can I skip tuning?
          • No - you end up with silly hyperparameter values
          • Yes - if no hyperparameters + lots of data + method robust to outliers.
        • Tuning becomes more relevant in the later phases of an ML project
      • Debugging → gives you insight
      • Tuning → saves you from poor hyperparameter choices
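A sketch of the data split and the debugging pass from Step 7, assuming scikit-learn; the split ratios, the logistic-regression model, and the synthetic data are assumptions made only for illustration.

```python
# Sketch of the split hierarchy above plus a simple "where did it go wrong" pass.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Original data -> exploratory data + test data
X_explore, X_test, y_explore, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Exploratory data -> training data + validation data
X_train, X_val, y_train, y_val = train_test_split(X_explore, y_explore, test_size=0.25, random_state=0)
# Training data -> fitting portion + debugging data
X_fit, X_debug, y_fit, y_debug = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)

# Debugging pass: collect the instances the model got wrong and compare them with the successes.
pred = model.predict(X_debug)
wrong, right = X_debug[pred != y_debug], X_debug[pred == y_debug]
print("debugging accuracy:", (pred == y_debug).mean())
print("feature-mean gap (wrong - right):", wrong.mean(axis=0) - right.mean(axis=0))
```

A feature whose mean differs sharply between failures and successes is a candidate for the feature engineering mentioned above.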
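And a sketch of the two tuning approaches, again assuming scikit-learn; the SVC model, the candidate C values, and k = 5 are illustrative assumptions.

```python
# Sketch of basic holdout tuning and k-fold cross-validation over one hyperparameter.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in for the training data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

candidate_C = [0.01, 0.1, 1.0, 10.0]  # hyperparameter values, chosen before any data is ingested

# Basic tuning (holdout): carve a tuning set out of the training data,
# train once per candidate value, keep the best-scoring setting.
X_fit, X_tune, y_fit, y_tune = train_test_split(X, y, test_size=0.25, random_state=0)
holdout_scores = {C: SVC(C=C).fit(X_fit, y_fit).score(X_tune, y_tune) for C in candidate_C}
print("holdout pick:", max(holdout_scores, key=holdout_scores.get), holdout_scores)

# k-fold cross-validation (k = 5): each candidate is trained on 4 chunks and evaluated on the
# held-out 5th, rotating the held-out chunk; the best mean score wins.
search = GridSearchCV(SVC(), {"C": candidate_C}, cv=5).fit(X, y)
print("cross-validation pick:", search.best_params_, round(search.best_score_, 3))
```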
     
    • Step 8: Validate the model
      • Why? → An ML project is oblivious to overfitting unless the model is checked on fresh data
      • How?
        • Evaluate on the validation dataset (a quick check is sketched after this step) and either
          • Go to the next step, or
          • Go back to training and try again
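A short sketch of the Step 8 check, under the same assumptions as the sketches above; the 0.05 accuracy-gap threshold for "go back" is an arbitrary illustrative number, not a rule.

```python
# Sketch of Step 8: check the model on fresh validation data before moving on.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
gap = model.score(X_train, y_train) - model.score(X_val, y_val)  # training vs. fresh-data accuracy
if gap > 0.05:
    print("Large gap on fresh data -> likely overfitting: go back to training and tuning.")
else:
    print("Validation passed: move on to the next step.")
```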
         