Kleinberg shows that no clustering algorithm can satisfy three simple properties (scale-invariance, richness, and consistency) at once.
A discussion and implementation of “vantage point trees”.
A quick look at how t-SNE uses random walks on graphs to compute affinities.
Code that verifies that the gradient PyTorch computes automatically from the KL divergence agrees with the formula in the t-SNE paper.
An animation of t-SNE on 2,500 MNIST digits.
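
The gradient check mentioned in the list can be sketched roughly as follows. This is not the code from that post: it swaps PyTorch's automatic differentiation for central finite differences so that it runs on the standard library alone. It assumes a symmetric affinity matrix P with zero diagonal that sums to 1, and compares the numerical gradient of the KL cost with the closed-form gradient from the t-SNE paper, dC/dy_i = 4 * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + ||y_i - y_j||^2). All function names and the random test problem are hypothetical, chosen only for illustration.

```python
import math
import random


def pairwise_q(Y):
    """Return (q, w): Student-t affinities q_ij and kernel w_ij = 1 / (1 + d_ij^2)."""
    n = len(Y)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
                w[i][j] = 1.0 / (1.0 + d2)
    total = sum(map(sum, w))  # normalise over all pairs i != j
    q = [[w[i][j] / total for j in range(n)] for i in range(n)]
    return q, w


def kl_cost(P, Y):
    """KL(P || Q) summed over all pairs i != j."""
    q, _ = pairwise_q(Y)
    n = len(Y)
    return sum(P[i][j] * math.log(P[i][j] / q[i][j])
               for i in range(n) for j in range(n) if i != j)


def analytic_grad(P, Y):
    """Closed-form gradient of the KL cost, as given in the t-SNE paper."""
    q, w = pairwise_q(Y)
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                coef = 4.0 * (P[i][j] - q[i][j]) * w[i][j]
                for d in range(dim):
                    grad[i][d] += coef * (Y[i][d] - Y[j][d])
    return grad


def numeric_grad(P, Y, eps=1e-6):
    """Central finite differences; a stand-in for autograd in this sketch."""
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for d in range(dim):
            orig = Y[i][d]
            Y[i][d] = orig + eps
            up = kl_cost(P, Y)
            Y[i][d] = orig - eps
            down = kl_cost(P, Y)
            Y[i][d] = orig  # restore the coordinate
            grad[i][d] = (up - down) / (2 * eps)
    return grad


def random_problem(n=6, dim=2, seed=0):
    """Hypothetical test problem: random symmetric P (sums to 1) and random Y."""
    rng = random.Random(seed)
    raw = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            raw[i][j] = raw[j][i] = rng.random() + 0.1  # strictly positive
    total = sum(map(sum, raw))
    P = [[v / total for v in row] for row in raw]
    Y = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    return P, Y


def max_abs_diff(A, B):
    """Largest absolute entry-wise difference between two nested lists."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


if __name__ == "__main__":
    P, Y = random_problem()
    print(max_abs_diff(analytic_grad(P, Y), numeric_grad(P, Y)))
```

If the two gradients agree, the printed difference should be tiny compared with the gradient entries themselves. With PyTorch, the same comparison would replace `numeric_grad` with a backward pass through the cost.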