You may need to apply two kinds of transformations to numeric data:
- Normalizing - transforming numeric data to the same scale as other numeric data.
- Bucketing - transforming numeric (usually continuous) data to categorical data.
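The two transformations above can be sketched in a few lines of plain Python. The income values and bucket boundaries below are invented for illustration, and the `bucketize` helper is a hypothetical name, not a library function:

```python
# Hypothetical income feature (made-up sample values).
incomes = [20_000.0, 35_000.0, 50_000.0, 80_000.0, 200_000.0]

# Normalizing: z-score scaling puts the feature on a common scale
# (mean 0, standard deviation 1).
mean = sum(incomes) / len(incomes)
std = (sum((x - mean) ** 2 for x in incomes) / len(incomes)) ** 0.5
z_scores = [(x - mean) / std for x in incomes]

# Bucketing: map each continuous value to the index of the first
# boundary it falls below, yielding a categorical bucket id.
boundaries = [30_000, 60_000, 100_000]

def bucketize(value, boundaries):
    for i, b in enumerate(boundaries):
        if value < b:
            return i
    return len(boundaries)

buckets = [bucketize(x, boundaries) for x in incomes]
print(buckets)  # [0, 1, 1, 2, 3]
```

Note that bucketing deliberately discards magnitude information within a bucket; it trades precision for a representation the model can treat categorically.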
Why Normalize Numeric Features?
Normalization is necessary when a single feature spans a wide range of values (for example, city population). Without normalization, training can blow up with NaNs when a gradient update grows too large.
You might also have two different features with widely different ranges (e.g., age and income), causing gradient descent to "bounce" and slow convergence. Optimizers like Adagrad and Adam protect against this problem by creating a separate effective learning rate per feature. But optimizers can't save you from a wide range of values within a single feature; in those cases, you must normalize.
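As a minimal sketch of the age/income case, the snippet below z-score normalizes each feature independently so both end up with mean 0 and standard deviation 1. The sample values are invented for illustration:

```python
# Hypothetical feature columns with very different ranges (made-up data).
features = {
    "age": [22.0, 35.0, 48.0, 61.0],
    "income": [18_000.0, 42_000.0, 75_000.0, 130_000.0],
}

def z_normalize(values):
    """Scale a feature column to mean 0 and standard deviation 1."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

# Normalize each feature on its own statistics.
normalized = {name: z_normalize(vals) for name, vals in features.items()}

# After normalization, neither feature dominates the gradient magnitude
# simply because its raw values are numerically larger.
for name, vals in normalized.items():
    print(name, [round(v, 2) for v in vals])
```

With both columns on the same scale, a single global learning rate works reasonably for every weight, which is exactly the bounce-and-slow-convergence problem the paragraph above describes.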