To remove similar articles from Touchpal Dialer’s feeds, I did some research on near-duplicate detection algorithms and implemented a few of them.
I will cover three of them in this article.

simhash

Simhash is a well-known algorithm that Google applies to detect near-duplicate web pages.
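
As a rough illustration of the idea (not the implementation used for the feeds), a minimal simhash can be sketched in Python as follows. The word tokenizer, the MD5-based token hash, the equal token weights, and the 64-bit fingerprint width are all assumptions made for this example.

```python
import hashlib
import re

FINGERPRINT_BITS = 64  # assumed fingerprint width for this sketch


def tokenize(text):
    """Split text into lowercase word tokens (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def simhash(text):
    """Return a 64-bit simhash fingerprint of `text` as an int."""
    weights = [0] * FINGERPRINT_BITS
    for token in tokenize(text):
        # Use the first 8 bytes of MD5 as a stable 64-bit token hash.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")
        # Each token votes +1 for every bit set in its hash, -1 otherwise.
        for i in range(FINGERPRINT_BITS):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    # A fingerprint bit is 1 where the votes are positive.
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two articles are then treated as near-duplicates when the Hamming distance between their fingerprints falls below a chosen threshold; the right threshold depends on the data and has to be tuned.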

minhash

pass

LSHForest

I first came across this name while browsing the sklearn user guide.
