This piece is the first real exercise documented on this blog (more on the way!). It walks through one example of the modelling process. The example let me examine the tf.Estimator and feature column APIs and experiment with feature transformations.
The highlight for me: the choice of optimiser ultimately had the most positive effect on the model when it came to avoiding NaN losses, more so than adding an epsilon to the normalisations and similar fixes.
The data in this example is small: the Automobile data set, with 205 examples. The exercise attempts to predict the price of a car from its features.
Since the goal of this example is to examine modeling and data transformations and because of the small size of the data, splitting into training, evaluation and test data was skipped.
The following transformations were applied to the data:
- made sure all numeric data are actually numeric through coercion using
- filled missing values with 0
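The two cleaning steps above can be sketched with pandas. This is a minimal sketch, not necessarily the code used in the exercise: the use of `pd.to_numeric` is an assumption, as are the tiny `raw_df` frame, its column names, and the idea that missing entries show up as a non-numeric placeholder such as "?".

```python
import pandas as pd

# Hypothetical miniature of the car data; names and values are made up.
raw_df = pd.DataFrame({
    "horsepower": ["111", "?", "154"],
    "price": ["13495", "16500", "?"],
    "make": ["audi", "bmw", "audi"],
})
numeric_feature_names = ["horsepower", "price"]

clean_df = raw_df.copy()
for name in numeric_feature_names:
    # errors="coerce" turns anything that cannot be parsed (such as "?") into NaN.
    clean_df[name] = pd.to_numeric(clean_df[name], errors="coerce")

# Replace the NaNs produced by the coercion with 0.
clean_df[numeric_feature_names] = clean_df[numeric_feature_names].fillna(0)

print(clean_df)
```

The categorical column is left untouched; only the columns listed in `numeric_feature_names` are coerced and filled.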
Without normalisation, I modified the model to achieve the lowest eval loss.
Here, poor hyperparameter choices (mainly the choice of optimiser) caused NaN losses during training.
I fixed this by using the Adagrad optimiser. Because of the small size of the data, pretty much no other solution worked.
Recall: Adagrad and Adam are built-in tf optimisers, just like the gradient descent optimiser, but unlike the latter they maintain a separate effective learning rate for each parameter.
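To make the per-parameter learning rate idea concrete, here is a small NumPy sketch comparing a plain gradient descent update with a textbook Adagrad update on a toy linear regression. It is an illustration under assumptions, not TensorFlow's implementation: the data, the learning rates, and the function names are all invented, and TensorFlow's Adagrad differs in details such as its initial accumulator value.

```python
import numpy as np

# Toy regression with two unscaled features (values invented for illustration).
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 2000.0]])
y = np.array([5000.0, 15000.0, 10000.0])


def mse_gradient(w):
    # Gradient of the mean squared error for the linear model X @ w.
    return 2.0 / len(y) * X.T @ (X @ w - y)


def run_gradient_descent(lr=0.01, steps=200):
    w = np.zeros(2)
    for _ in range(steps):
        w = w - lr * mse_gradient(w)  # one learning rate shared by all weights
    return w


def run_adagrad(lr=0.5, steps=200, eps=1e-8):
    w = np.zeros(2)
    accum = np.zeros(2)  # running sum of squared gradients, one entry per weight
    for _ in range(steps):
        grad = mse_gradient(w)
        accum += grad ** 2
        # Each weight's step is scaled by its own gradient history.
        w = w - lr * grad / (np.sqrt(accum) + eps)
    return w


# Silence the overflow warnings that the diverging run is expected to raise.
with np.errstate(over="ignore", invalid="ignore"):
    w_gd = run_gradient_descent()
w_ada = run_adagrad()

print("gradient descent weights finite:", bool(np.all(np.isfinite(w_gd))))
print("Adagrad weights finite:", bool(np.all(np.isfinite(w_ada))))
```

On this toy data the shared learning rate is far too large for the second feature, so the plain gradient descent weights blow up to non-finite values. Each Adagrad step is at most `lr` in size per weight, because the gradient is divided by the root of its own accumulated squares, so the weights stay finite.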
I visualised the model's predictions using scatter plots. The highlight of this step for me was the predict_input_fn:
    predict_input_fn = tf.estimator.inputs.pandas_input_fn(
        x=x_df,
        batch_size=batch_size,
        shuffle=False)  # similar to the training and evaluation input functions
    predictions = [x['predictions'] for x in est.predict(predict_input_fn)]
- Attempted to add normalisations to the numeric features. Z-score and scale-to-a-range normalisations did not work, as NaN losses were still present.
Visualising each feature in a histogram showed that most were approximately normally distributed. A few had extreme outliers, though apparently not extreme enough to warrant clipping, as the instructor also used z-score first.
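    model_feature_columns = [
        tf.feature_column.numeric_column(
            feature_name,
            # Bind min and max as default arguments so each column keeps its own values;
            # a bare lambda would only look up feature_name when called, after the loop has ended.
            normalizer_fn=lambda val, lo=x_df[feature_name].min(), hi=x_df[feature_name].max():
                (val - lo) / (hi - lo))
        for feature_name in numeric_feature_names
    ]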
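For reference, the two normalisations tried here, with the epsilon guard mentioned at the top, can be sketched in plain NumPy. This is a standalone illustration under assumptions: the function names, the epsilon value, and the sample values are invented, and it is not the normalizer code passed to the feature columns.

```python
import numpy as np


def z_score(values, epsilon=1e-6):
    # Centre on the mean and divide by the standard deviation (plus a small guard).
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / (values.std() + epsilon)


def min_max(values, epsilon=1e-6):
    # Scale to roughly [0, 1] using the observed minimum and maximum.
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo + epsilon)


# A constant column has zero spread, so the unguarded z-score is 0 / 0.
constant = [4.0, 4.0, 4.0]
with np.errstate(invalid="ignore"):
    unguarded = (np.array(constant) - 4.0) / np.std(constant)
guarded = z_score(constant)

print(unguarded)  # all NaN
print(guarded)    # all zeros
print(min_max([0.0, 5.0, 10.0]))
```

The epsilon only protects against a zero denominator. It does nothing about losses that diverge because of the optimiser or learning rate, which fits with the optimiser change being what removed the NaN losses in this exercise.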
I attempted to make a better model using only the categorical features. Using the gradient descent optimiser again produced NaN losses, which were again corrected with either the Adam or the Adagrad optimiser.
The same behaviour was seen when the categorical and numeric data were used together.
Getting more familiar with the input function syntax, tf.Estimator and the feature column APIs was the highlight of this example.
The model used a dense neural network for its predictions.