Normalization vs Standardization Explained
How and why to Standardize your data: A python tutorial
I assume that you have a matrix X where each row/line is a sample/observation and each column is a variable/feature (this is the expected input for any sklearn ML function, by the way -- X.shape should be [number_of_samples, number_of_features]).
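For instance (a minimal sketch with arbitrary values, just to show the shape convention):

import numpy as np

# 3 samples/observations and 2 variables/features
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

print(X.shape)  # (3, 2), i.e. [number_of_samples, number_of_features]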
Core of method
The main idea is to normalize/standardize, i.e. make μ = 0 and σ = 1, for your features/variables/columns of X, individually, before applying any machine learning model.
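Concretely, standardizing a column means subtracting that column's mean and dividing by that column's standard deviation. A minimal NumPy sketch of the same operation done by hand (using the same data as the example below):

import numpy as np

X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Standardize each column individually: subtract the column mean and
# divide by the column (population) standard deviation, as StandardScaler does.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0))  # [0. 0.]
print(X_std.std(axis=0))   # [1. 1.]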
StandardScaler() will standardize the features, i.e. each column of X, INDIVIDUALLY, so that each column/feature/variable will have μ = 0 and σ = 1.
P.S: I find the most upvoted answer on this page wrong. I am quoting: "each value in the dataset will have the sample mean value subtracted" -- this is neither true nor correct, because StandardScaler subtracts each column's own mean, not a single mean over the whole dataset.
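To see the difference, here is a quick check (a sketch with made-up numbers): the mean of all values in the dataset is not what gets subtracted; the per-column means are.

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

print(X.mean())        # 101.0 -> the single mean of all values in the dataset
print(X.mean(axis=0))  # [  2. 200.] -> per-column means

scaler = StandardScaler().fit(X)
print(scaler.mean_)    # [  2. 200.] -> StandardScaler stores the per-column means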
See also: How and why to Standardize your data: A python tutorial
Example with code
from sklearn.preprocessing import StandardScaler
import numpy as np
# 4 samples/observations and 2 variables/features
data = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
scaler = StandardScaler()
# fit computes each column's mean and std; transform applies (x - μ) / σ
scaled_data = scaler.fit_transform(data)
print(data)
[[0 0]
 [1 0]
 [0 1]
 [1 1]]
print(scaled_data)
[[-1. -1.]
 [ 1. -1.]
 [-1.  1.]
 [ 1.  1.]]
Verify that the mean of each feature (column) is 0:
scaled_data.mean(axis=0)
array([0., 0.])
Verify that the std of each feature (column) is 1:
scaled_data.std(axis=0)
array([1., 1.])
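Continuing the same example, the fitted scaler exposes the per-column statistics it computed (the mean_ and scale_ attributes), so you can reproduce the transform by hand:

print(scaler.mean_)   # [0.5 0.5] -> per-column means of data
print(scaler.scale_)  # [0.5 0.5] -> per-column (population) standard deviations

# Reproduce fit_transform manually: subtract the mean, divide by the std
manual = (data - scaler.mean_) / scaler.scale_
print(np.allclose(manual, scaled_data))  # True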
Appendix: The maths
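For each feature/column of X with N samples, StandardScaler computes the z-score of every value x:

z = (x − μ) / σ

where

μ = (1/N) · Σ xᵢ  (the mean of the column)
σ = √((1/N) · Σ (xᵢ − μ)²)  (the standard deviation of the column)

Note that StandardScaler uses the population standard deviation (dividing by N, not N − 1), which matches NumPy's default std(ddof=0).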
See also:
PCA in Scikit-learn – Principal Component Analysis (with Python Example)
https://medium.com/data-science-365/an-in-depth-guide-to-pca-with-numpy-1fb128535b3e
https://medium.com/data-science-365/principal-component-analysis-pca-with-scikit-learn-1e84a0c731b0