Thinking About The Output
What are we building? What kind of data are we analyzing? What do we want to make happen? Are we trying to build a model that can classify, or regress? These and many other questions will be considered here, as we think about what kind of output we seek.
Choosing a Model
You may have seen the Starter Guides on choosing models. Naturally, a model choice limits the outputs. Once we have chosen a model, we then have to choose from the range of outputs it can offer, then permutate from there, or rethink the model(s) we have chosen.
Imagining The Output
You may recall from the commentary on what a function means in data science that we are looking to get some output $Y$: we put data ($X$) in, try to minimize the reducible error by modifying our function $f$, and get some output $Y$.
Is $Y$ a fitted regression line? Is it a prediction, a probability representing the chance that an image contains a stop sign in it? Is it multi label output, namely, are the labels that can be applied to it non-exclusive? Is the output a number, a string, an image, or what kind of data type? Are we trying to deploy the model? Are we just building a graph one time and thats it? If you can go through each model choosing page and identify roughly where your output should lie there, often the model and output will naturally choose itself.
For instance, image classification is a very common neural network task. The output in that case would be (implicitly) a model which can make predictions, and the individual probability predictions themselves, a numerical value between 0 and 1. These predictions might come in arrays, or they might be singular one-off predictions.
Being really specific about the form of output is essential. Maybe you aren’t sure when you start, but over time you will get used to knowing roughly what you want as an output- it comes with practice.
Summarizing Output Types
Try to summarize the following things to get a clear idea of your output:
-
What kind of model(s) am I building? K-nearest neighbors, multivariable regression, etc?
-
What data types (floats, strings etc) will be outputted?
-
What kind of analysis am I doing in general? Computer vision, NLP, etc?
-
What kind of metrics should I be using to measure the output?
-
Are we building dashboards, interactive webapps, graphs, databases, predictions, etc out of this? What form do they need the data in to be able to feed in to them?
-
Go through each of the pages in the choosing models section, and write out if your problem seems to prefer a particular type of model. Maybe you think a supervised model is best, or you’re sure that regression is the best type of analysis for your problem. If you aren’t sure, try a few different ways and see if you can develop intuition!