Models
My idea of what a model is has changed a lot over time.
I can't even begin to describe my ignorant start. It was insane. Nothing made sense.
The true understanding of what a model could be: that understanding was true only relative to a mindset.
The bad ways of building a model.
Evolution of the process: ground up.
I don't think I quite understood what a model really was when I was at university, but looking back I had NO understanding of anything back then. The maths makes a lot more sense now, but I was so lost at the age of 19-21 studying this crap.
A model can be anything
The mean is a model
The median is a model
Just predicting 1 is a model
Anything can be a model
Any rule for a prediction is a model
It's so simple now, but back then I couldn't get my head around it.
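To make "any rule for a prediction is a model" concrete, here is a minimal sketch in plain Python. The target values and the helper names (`make_constant_model`, `mean_absolute_error`) are made up for illustration; the point is only that each of these rules can be scored like any other model.

```python
from statistics import mean, median


def make_constant_model(value):
    """Return a model that ignores its input and always predicts `value`."""
    def predict(_features=None):
        return value
    return predict


def mean_absolute_error(model, targets):
    """Average absolute gap between the model's prediction and each target."""
    return mean(abs(model() - y) for y in targets)


# Hypothetical target values, with one outlier to separate mean from median.
train_targets = [1.0, 2.0, 2.0, 3.0, 12.0]

models = {
    "mean": make_constant_model(mean(train_targets)),
    "median": make_constant_model(median(train_targets)),
    "always_1": make_constant_model(1.0),
}

# Every rule gets an error score, exactly like a "real" model would.
scores = {name: mean_absolute_error(m, train_targets) for name, m in models.items()}
for name, score in scores.items():
    print(f"{name}: MAE = {score:.2f}")
```

On these numbers the median comes out with the lowest absolute error, which is what you would expect, since the median is the constant that minimises absolute error. Baselines like these are also a useful floor: anything fancier should beat them.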
While in my ignorant state I found Machine Learning through a course my friends were taking, and I just felt socially pressured to join in. It was an interesting course; I got a 50. Let's not talk about that! But it taught me enough to bullshit my way through a phone interview and into a data science internship, which was enough at the time lol.
Fast forward a few years and I was building models at a bank in Toronto. The standard way was to just throw a tree-based model at the data, choose the best features, then build a model on those. But this is essentially what a few AutoML tools are doing. You're just finding some soup of features, parameters and data that produces the best error score.
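For reference, that "standard way" looks roughly like the sketch below. It assumes scikit-learn and synthetic data; the random forest, the choice of `top_k = 2` and the data-generating rule are all assumptions for illustration, not what was actually used at the bank.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: only columns 0 and 3 actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: throw a tree-based model at every feature.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Step 2: keep the features the forest ranks highest.
top_k = 2  # arbitrary cut-off, chosen for this toy example
selected = np.argsort(forest.feature_importances_)[::-1][:top_k]

# Step 3: build the "final" model on the selected features only.
final_model = RandomForestRegressor(n_estimators=100, random_state=0)
final_model.fit(X_train[:, selected], y_train)

test_mse = mean_squared_error(y_test, final_model.predict(X_test[:, selected]))
print("selected feature indices:", sorted(selected.tolist()))
print("test MSE:", round(test_mse, 3))
```

It works, but notice that nothing in it requires looking at the data or the errors. It is a search for the best score and nothing more.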
Looking back I understand why: it's just the simplest thing to do.
But the way model building should be done is through iteration, from the ground up.
This is nice if you have a handful of features. Of course, if you're using hundreds of features at a time then I'm unsure. I guess I would still take a similar approach, but with clusters of features and doing higher level analysis.
Regardless,
Look at your data.
Check what the columns are.
What is the problem you are trying to answer?
Do any columns match that (purely based on name)?
Oh great, you found a few, even some basic features and a target.
Split the data.
Create a model. Keep it crude.
Okay, how does it fit on the data?
Look at the errors. Are they correlated with any of the features you did not use?
The model you used, what were the assumptions behind it?
Do you need to make a transformation?
Do you want to try adding another feature?
Do you want to just take all the features, create a massive basis expansion, then lasso to the data?
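One pass through those steps can be sketched like this. It assumes NumPy and synthetic data; the feature names `x_used` and `x_unused` are hypothetical, and the straight-line fit stands in for whatever crude model you start with.

```python
import numpy as np

# Synthetic data: the target depends on two features, but the first
# crude model will only be given one of them.
rng = np.random.default_rng(0)
n = 400
x_used = rng.normal(size=n)
x_unused = rng.normal(size=n)
y = 2.0 * x_used + 1.5 * x_unused + rng.normal(scale=0.1, size=n)

# Split the data.
idx = rng.permutation(n)
train, test = idx[:300], idx[300:]

# Create a model, keep it crude: a straight line on one feature.
slope, intercept = np.polyfit(x_used[train], y[train], deg=1)
residuals = y[test] - (slope * x_used[test] + intercept)

# Look at the errors: are they correlated with a feature you did not use?
corr_unused = np.corrcoef(residuals, x_unused[test])[0, 1]
print("correlation of errors with unused feature:", round(corr_unused, 2))

# The errors track the unused feature, so add it and fit again.
X_train = np.column_stack([x_used[train], x_unused[train], np.ones(len(train))])
X_test = np.column_stack([x_used[test], x_unused[test], np.ones(len(test))])
coef, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)
new_residuals = y[test] - X_test @ coef

print("error spread before:", round(residuals.std(), 3))
print("error spread after: ", round(new_residuals.std(), 3))
```

Here the strong correlation between the errors and the unused feature is the signal to bring that feature in. On real data the signal is rarely this clean, which is where the gut instinct comes in.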
Navigating this tree, using your gut instinct: that is building a model.
This takes time, care and patience, which I have but can greatly misuse, because I leave model building till the end.
You hear people say, "I spend most of my time making features and little time model building."
I thought that statement was valid, but it is not.
The entire process is one whole unit, and you build the model from the ground up.
Build the model, look at the errors, analyse the data against the errors, rinse, repeat,
until you have your features generated. At that point you need to collect up your code and package it nicely into an efficient ETL.
Then create the model.
Then run it on the data.
Collect test set errors for reporting.
Deploy into an A/B environment.