Thursday, August 3, 2023

AI models

standard deviation

Normalization vs Standardization Explained

The scale of the Theta1 feature is smaller than that of Theta2

The difference in feature scales causes oscillations
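
A toy gradient descent run is one way to see this kind of oscillation. The sketch below is only an illustration under stated assumptions: the quadratic loss, the 1-vs-100 curvatures (standing in for mismatched feature scales) and the learning rate are all made up for the example and are not taken from the figure.

```python
# Toy gradient descent on a hypothetical loss
#   loss(theta1, theta2) = 0.5 * (1 * theta1**2 + 100 * theta2**2)
# The 1-vs-100 curvatures and the learning rate are illustrative assumptions.
CURVATURE = (1.0, 100.0)
LEARNING_RATE = 0.019


def gradient(theta):
    """Gradient of the quadratic loss at theta = (theta1, theta2)."""
    return tuple(c * t for c, t in zip(CURVATURE, theta))


def descend(start, steps):
    """Return the list of points visited by plain gradient descent."""
    path = [start]
    theta = start
    for _ in range(steps):
        grad = gradient(theta)
        theta = tuple(t - LEARNING_RATE * g for t, g in zip(theta, grad))
        path.append(theta)
    return path


path = descend((1.0, 1.0), steps=6)
for theta1, theta2 in path:
    # theta1 shrinks slowly; theta2 flips sign on every step (the oscillation).
    print(f"theta1={theta1:+.3f}  theta2={theta2:+.3f}")
```

With these numbers the steep direction overshoots its minimum on every update, while the shallow direction barely moves. Putting both directions on a similar scale lets one learning rate suit both.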

Compute the distance
The horizontal feature scale dominates the vertical scale
- The features need to be standardized
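
To make the point concrete, here is a small sketch with made-up numbers (the two points and their scales are assumptions, not data from the figure). It splits the squared Euclidean distance into the part contributed by each feature.

```python
import math

# Two hypothetical points: (horizontal feature, vertical feature).
# The horizontal feature is on a scale of thousands, the vertical one on a scale of units.
point_a = (1000.0, 1.0)
point_b = (3000.0, 4.0)

# Squared difference contributed by each feature.
horizontal_part = (point_a[0] - point_b[0]) ** 2
vertical_part = (point_a[1] - point_b[1]) ** 2

distance = math.sqrt(horizontal_part + vertical_part)
horizontal_share = horizontal_part / (horizontal_part + vertical_part)

print(f"distance = {distance:.3f}")
print(f"share of squared distance from the horizontal feature = {horizontal_share:.6f}")
```

Almost all of the distance comes from the horizontal feature, so the vertical feature has practically no influence on any distance-based algorithm.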

Normalize along both axes

Compute the distance
Both features contribute roughly equally to the calculated distance
The algorithm that uses the data will no longer be affected by the feature with the higher scale.
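
As a rough check of this claim, the sketch below standardizes a small made-up dataset column by column and then looks at how much each feature contributes to the squared distance between two samples. The sample values and the helper name `standardize_columns` are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical samples: (horizontal feature, vertical feature).
samples = [(1000.0, 1.0), (3000.0, 4.0), (2000.0, 2.0), (4000.0, 3.0)]


def standardize_columns(rows):
    """Rescale every column to mean 0 and population standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]


scaled = standardize_columns(samples)
a, b = scaled[0], scaled[1]

# Squared difference contributed by each feature after standardization.
horizontal_part = (a[0] - b[0]) ** 2
vertical_part = (a[1] - b[1]) ** 2
print(f"horizontal contribution = {horizontal_part:.3f}")
print(f"vertical contribution   = {vertical_part:.3f}")
```

On the raw values the horizontal feature would contribute 4,000,000 against 9 for the vertical one. After standardization the two contributions are of the same order of magnitude.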

How and why to Standardize your data: A python tutorial

I assume that you have a matrix X where each row/line is a sample/observation and each column is a variable/feature (this is the expected input for any sklearn ML function by the way -- X.shape should be [number_of_samples, number_of_features]).

Core of method

The main idea is to normalize/standardize, i.e. μ = 0 and σ = 1, your features/variables/columns of X individually before applying any machine learning model.

StandardScaler() will normalize the features i.e. each column of X, INDIVIDUALLY, so that each column/feature/variable will have μ = 0 and σ = 1.

P.S.: I find the most upvoted answer on this page wrong. I am quoting "each value in the dataset will have the sample mean value subtracted" -- this is not correct: the mean that is subtracted is the mean of the value's own column, not a single mean of the whole dataset.
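
The difference between column-wise and whole-dataset statistics can be sketched with plain NumPy. The data below is hypothetical, and the column-wise line is meant to mirror StandardScaler's default behaviour (population standard deviation, i.e. `ddof=0`) rather than to replace it.

```python
import numpy as np

# Hypothetical data: 3 samples, 2 features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

# Column-wise statistics (axis=0): one mean and one std per feature.
per_column = (X - X.mean(axis=0)) / X.std(axis=0)

# Whole-dataset statistics: a single mean and a single std for all values.
whole_dataset = (X - X.mean()) / X.std()

print("per column   :", per_column.mean(axis=0), per_column.std(axis=0))
print("whole dataset:", whole_dataset.mean(axis=0), whole_dataset.std(axis=0))
```

Only the column-wise version leaves every feature with mean 0 and standard deviation 1. With a single global mean and std, the two columns still sit at different offsets and spreads.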

See also: How and why to Standardize your data: A python tutorial

Example with code

from sklearn.preprocessing import StandardScaler
import numpy as np

# 4 samples/observations and 2 variables/features
data = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

print(data)
[[0 0]
 [1 0]
 [0 1]
 [1 1]]

print(scaled_data)
[[-1. -1.]
 [ 1. -1.]
 [-1.  1.]
 [ 1.  1.]]

Verify that the mean of each feature (column) is 0:

scaled_data.mean(axis = 0)
array([0., 0.])

Verify that the std of each feature (column) is 1:

scaled_data.std(axis = 0)
array([1., 1.])

Appendix: The maths
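
As a sketch of the math behind the example, these are the usual z-score formulas applied to each column. The notation is introduced here for illustration: $N$ is the number of samples, $x_{ij}$ is the value of feature $j$ in sample $i$, and the standard deviation is the population one (dividing by $N$), which matches the verified std of 1 above.

```latex
\mu_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_{ij}-\mu_j\right)^2}, \qquad
z_{ij} = \frac{x_{ij}-\mu_j}{\sigma_j}
```

In the example data each column holds two 0s and two 1s, so $\mu_j = 0.5$ and $\sigma_j = 0.5$. A 0 therefore maps to $(0 - 0.5)/0.5 = -1$ and a 1 maps to $(1 - 0.5)/0.5 = 1$, which is the scaled output shown earlier.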

PCA in Scikit-learn – Principal Component Analysis (with Python Example)

seaborn python plotting
