

[ Summary part .1 ] Understanding Representation Learning With Autoencoder: Everything You Need to Know About Representation and Feature Learning

 

This first part of the summary covers a post titled "Understanding Representation Learning With Autoencoder: Everything You Need to Know About Representation and Feature Learning". I summarize the post here using the exact words of its author, Nilesh Barla (the passages in quotation marks). A link to the original post is given below under "Original post".

 


 

 

 

The limitation of machine learning

"The machine learning algorithm that predicts the outcome has to learn how each feature correlates with the different outcomes: benign or malignant. ... So in case of any noise or discrepancies in the data, the outcome can be totally different, which is the problem with most machine learning algorithms." 

 

 

 

 

 

Representation learning is useful for supervised learning algorithms

"Representation learning works by reducing high-dimensional data into low-dimensional data, making it easier to find patterns, anomalies, ... It also reduces the complexity of the data, so the anomalies and noise are reduced. This reduction in noise can be very useful for supervised learning algorithms." 

 

 

 

 

 

Deep learning as a black box

"Deep learning is often seen as a black box, where a finite number of functions are used to find parameters that yield good generalization. This is achieved by optimization where the algorithm tries to correct itself by evaluating the model output with the ground truth." 

 

 

 

 

 

Variance and entanglement need to be eliminated

"Two major factors that usually occur in any data distribution are variance and entanglement. ... Any model that we build has to be robust to variance, i.e. it has to be invariant because this can greatly harm the outcome of a deep learning model. ... Entanglement is the way a vector in the data is connected or correlated to other vectors in the data. These connections make the data very complex and hard to decipher."

 

 

 

 

 

Representation learning = deep learning

The ability of representation learning is that it learns abstract patterns that make sense of the data, while deep learning is often ascribed the ability of deep networks to learn representations that are invariant (insensitive) to nuisances such as translations, rotations, and occlusions, and also "disentangled", i.e. separating factors in the high-dimensional space of the data.

...

Invariance in a deep neural network is equivalent to the minimality of the representation it computes, and can be achieved by stacking layers and injecting noise into the computation. Overfitting can be reduced by limiting the information content stored in the weights.
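
A hedged PyTorch sketch of my own (not code from the post) of the two ingredients mentioned above: layers are stacked, noise is injected between them (here with dropout), and weight decay limits the information content stored in the weights.

import torch
import torch.nn as nn

model = nn.Sequential(                                   # stacked layers
    nn.Linear(784, 256), nn.ReLU(), nn.Dropout(p=0.2),   # noise injected in the computation
    nn.Linear(256, 64),  nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 10),
)

# weight_decay penalizes large weights, limiting the information they can store
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

x = torch.randn(32, 784)                                 # a dummy batch
logits = model(x)
print(logits.shape)                                      # torch.Size([32, 10])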

 

 

 

 

 

 

Compressed representations squeezed by the Information Bottleneck 

It can extract relevant information by compressing the amount of information that can traverse the full network, forcing a learned compression of the input data. ... By squeezing the information through a bottleneck, only the features most relevant to general concepts are left. This compressed representation not only reduces the dimensions but also reduces the complexity of the data.
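
Since the post is ultimately about autoencoders, here is a minimal autoencoder sketch in PyTorch that I added (assuming 784-dimensional inputs such as flattened MNIST images): the 32-unit bottleneck forces the network to learn a compressed representation of the input.

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, in_dim=784, bottleneck=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, bottleneck),              # the information bottleneck
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 128), nn.ReLU(),
            nn.Linear(128, in_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)                          # compressed representation
        return self.decoder(z), z

model = Autoencoder()
x = torch.rand(16, 784)                              # dummy batch scaled to [0, 1]
x_hat, z = model(x)
loss = nn.functional.mse_loss(x_hat, x)              # reconstruction error
print(z.shape, loss.item())                          # torch.Size([16, 32]) ...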

 

 

 

 

 

 

a probability distribution p(x) -> a conditional distribution p(x|z) 

To learn the probability distribution over images of cats, we need to define a distribution that can model the complex correlations between all the pixels which form each image. ... Instead of modelling p(x) directly (as most machine learning does), we can introduce an (unobserved) latent variable z and define a conditional distribution p(x|z) for the data, which is called the likelihood. For the example of cat images, z could contain a hidden representation of the type of cat, its color, or its shape.
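
A sketch I added (an assumed architecture, not the post's) of what a conditional distribution p(x|z) can look like in practice: a decoder network maps a low-dimensional latent z to per-pixel Bernoulli parameters, which defines the likelihood p(x|z) over images.

import torch
import torch.nn as nn

latent_dim, img_dim = 8, 784
decoder = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, img_dim), nn.Sigmoid(),    # pixel probabilities in (0, 1)
)

z = torch.randn(4, latent_dim)                # latent codes (e.g. type of cat, color, shape)
probs = decoder(z)                            # parameters of p(x|z)
x = torch.bernoulli(probs)                    # sample images from the likelihood
print(x.shape)                                # torch.Size([4, 784])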

 

 

 

 

 

 

a conditional distribution p(x|z) -> Bayes' theorem -> the data distribution p(x)

Having z, we can further introduce a prior distribution p(z) over the latent variables and compute the joint distribution over observed and latent variables, p(x,z) = p(x|z)p(z). To obtain the data distribution p(x), we need to marginalize over the latent variables. We can also compute the posterior distribution p(z|x) using Bayes' theorem.
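
Written out in the same notation, the relations quoted above are:

p(x, z) = p(x|z) p(z)                 (likelihood times prior)
p(x) = ∫ p(x|z) p(z) dz               (marginalizing over the latent variables)
p(z|x) = p(x|z) p(z) / p(x)           (Bayes' theorem)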

 

 

 

 

 

 

Models with latent variables yield a compressed representation of the data

Models with latent variables can be used to perform a generative process from which the data was generated. This is known as a generative model. ... Mathematical models containing latent variables are by definition latent variable models. These latent variables have much lower dimensions than the observed input vectors. This yields a compressed representation of the data. ... (So) latent variables are basically found at the information bottleneck.
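
A toy generative process I added for illustration (not the post's model): a 2-dimensional latent z is drawn from the prior and decoded into a 100-dimensional x, so the latent code acts as a compressed representation of the data.

import numpy as np

rng = np.random.default_rng(0)
latent_dim, data_dim = 2, 100
W = rng.normal(size=(data_dim, latent_dim))     # a fixed, made-up linear "decoder"

z = rng.normal(size=latent_dim)                 # z ~ p(z), the prior
x = W @ z + rng.normal(0, 0.1, size=data_dim)   # x ~ p(x|z), the likelihood

print(z.shape, "->", x.shape)                   # (2,) -> (100,)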

 

 

 

 

 

 

The manifold hypothesis 

The manifold hypothesis states that high-dimensional data lies on a lower-dimensional manifold.
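
A small illustration I added (not from the post): the points below sit in a 3-dimensional space, but they all lie on a 1-dimensional manifold (a closed curve) generated by a single coordinate t.

import numpy as np

t = np.linspace(0, 2 * np.pi, 200)                                # one intrinsic coordinate
X = np.stack([np.cos(t), np.sin(t), 0.5 * np.cos(2 * t)], axis=1)

print(X.shape)    # (200, 3): the ambient dimension is 3, yet one parameter t generates every point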

 

 

 

 

 

 

 

Representation learning

Current machine learning and deep learning models are still prone to variance and entanglement in the given data. Representation learning can improve a model's performance in three learning frameworks: supervised learning, unsupervised learning, and reinforcement learning.
...
(to be continued in the 2nd part of this summary)

 

 

 

 


 

Original post  

https://neptune.ai/blog/understanding-representation-learning-with-autoencoder-everything-you-need-to-know-about-representation-and-feature-learning