What algorithm can be used to sort 10,000 images by visual similarity?

For instance, if you have thumbnails of porn clips and want to sort them by pose without having to develop a supervised model.

sorting implies having a total order

make a total order then
you can always have one (in this case)
but probably OP just means clustering

which comes first {A, 2} or {B, 1}?

I'm not sure clustering would help. How many clusters? How to sort images within clusters? It's not like one of those meme flower vs dogs datasets, the images are very diverse in lighting and angle.

i have an idea of what would help: finding jesus

>what is lexicographical ordering
Anyway even if OP ends up with some clusters in a high dimensional space you can always just apply dimensionality reduction and create a well ordering that way. I don't know why you're trying to make a problem where there isn't one.
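
A minimal sketch of the lexicographic idea above, assuming each image already has a cluster label and a feature vector (both made up here). Tuples compare element by element, so {A, 2} comes before {B, 1}; within a cluster the tie is broken by a 1-D PCA projection. The function names are hypothetical, and the sign of a principal component is arbitrary, so the direction inside a cluster may flip.

```python
import numpy as np

def first_component_scores(features):
    """Project each row onto the first principal component (1-D PCA)."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, strongest first.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

def lexicographic_order(labels, features):
    """Return item indices sorted by (cluster label, 1-D projection)."""
    scores = first_component_scores(features)
    keys = [(labels[i], float(scores[i])) for i in range(len(labels))]
    return sorted(range(len(labels)), key=lambda i: keys[i])

# Tuples are compared lexicographically: first element, then second.
print(("A", 2) < ("B", 1))  # True

# Toy data: 4 "images" with 3-D features and made-up cluster labels.
features = np.array([[0.0, 0.0, 0.0],
                     [1.0, 1.0, 1.0],
                     [4.0, 4.0, 4.0],
                     [5.0, 5.0, 5.0]])
labels = ["B", "A", "A", "B"]
order = lexicographic_order(labels, features)
print([labels[i] for i in order])  # ['A', 'A', 'B', 'B']
```

Any other scalar (t-SNE coordinate, distance to the cluster centre) could replace the projection as the second key.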

nakadashi

>How many clusters?
Some clustering algorithms (DBSCAN or mean shift, for example) determine the number of clusters on their own, and otherwise you can jury-rig it in there somehow.
>How to sort images within clusters?
Whatever you want or whatever is practical given the implementation you chose.
>It's not like one of those meme flower vs dogs datasets, the images are very diverse in lighting and angle.
Yes, you're gonna have to do preprocessing obviously.
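
One way to jury-rig the cluster count, as a rough sketch rather than a recommendation: run k-means for several values of k and keep the k with the best mean silhouette score. Everything below is an assumption for illustration (plain NumPy, a deterministic farthest-point initialisation, toy 2-D points standing in for image features); a library implementation would be the saner choice for 21,000 real feature vectors.

```python
import numpy as np

def kmeans(x, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [x[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen centre.
        d = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[int(np.argmax(d))])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def silhouette(x, labels):
    """Mean silhouette: close to 1 means tight, well separated clusters."""
    ids = set(labels.tolist())
    if len(ids) < 2:
        return -1.0  # undefined for a single cluster; treat as worst case
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    scores = []
    for i in range(len(x)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = dist[i, same].sum() / (same.sum() - 1)  # mean intra-cluster distance
        b = min(dist[i, labels == o].mean() for o in ids if o != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

def pick_k(x, candidates=range(2, 7)):
    """Return the candidate k with the highest mean silhouette score."""
    return max(candidates, key=lambda k: silhouette(x, kmeans(x, k)))

# Toy data: three small blobs around (0,0), (10,0) and (0,10).
offsets = np.array([[-0.5, -0.5], [-0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.0, 0.0]])
points = np.vstack([offsets + c for c in ([0, 0], [10, 0], [0, 10])])
best_k = pick_k(points)
print(best_k)  # 3
```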

It's just an amusing distraction. Solving difficult machine learning problems also forces me to learn new methods and concepts.
>dimensionality reduction
I have tried taking the PCA of the images and sorting by the first principal component; extracting features with a convolutional neural network (vgg16 or something) and sorting by nearest neighbors; and calculating perceptual hashes (colorhash, phash) and sorting those by nearest neighbors too. Nothing worked.
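
For reference, "sorting by nearest neighbors" can be read as a greedy chain: start somewhere and keep hopping to the most similar unvisited image. This is only a sketch of one possible reading, not necessarily what was tried above. The feature matrix is a toy stand-in for CNN features or hashes, `greedy_order` is a made-up name, and rows are assumed to be non-zero.

```python
import numpy as np

def greedy_order(features, start=0):
    """Visit every item once, always hopping to the closest unvisited one."""
    # L2-normalise rows so the dot product equals cosine similarity.
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    order = [start]
    visited = np.zeros(len(features), dtype=bool)
    visited[start] = True
    while len(order) < len(features):
        # Mask out visited items, then take the most similar remaining one.
        row = np.where(visited, -np.inf, sim[order[-1]])
        nxt = int(np.argmax(row))
        order.append(nxt)
        visited[nxt] = True
    return order

# Toy features: items 0 and 2 point one way, items 1 and 3 the other.
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.9, 0.1],
                     [0.1, 0.9]])
order = greedy_order(features)
print(order)  # [0, 2, 3, 1]
```

The full similarity matrix for about 21,000 images is roughly 3.5 GB in float64, so at that scale it may be better to compute one row at a time. A greedy chain also tends to leave the hardest-to-place images for the end, which can make the tail of the ordering look random even when the features are good.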

If you don't organize while it's small and then download+organize from that point forward, it's too late.

Aren't there pretrained human/pose recognition models nowadays? And have you tried segmentation?

that's a man

>preprocessing
What kind of preprocessing? The only thing I have done so far is resize the images to 200x200 pixels so I can compare them directly.
I download all webms from /gif/ every day. I said 10,000 in the OP but it's around 21,000 now.
Not that I could find on github. What do you mean by segmentation? I'm not very familiar with computer vision concepts.

en.wikipedia.org/wiki/Image_segmentation
>I'm not very familiar with computer vision concepts.
The best of luck, then. Why don't you just use metadata from the /gif/ threads/posts/filenames?

>I download all webms from /gif/ every day. I said 10,000 in the OP but it's around 21,000 now.
You should have each thread in its own folder. It may be a bit late to start, but you may as well. Even if it's just the OP number as a folder name, you can go through and see your threads already *partly* sorted.

The sorting itself isn't the issue, but it would probably be much easier to use a pre-trained model rather than making your own. For pose, you'd have to binarize the thumbnails, skeletonize them, build a graph from each skeleton, compute the assignment distance between each pair of graphs, and finally cluster your data.
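
A sketch of just the assignment-distance step from the pipeline above, under heavy assumptions: each skeleton has already been reduced to a small, equal-sized set of 2-D keypoints (the coordinates below are invented), and the distance is the cheapest one-to-one matching between the two sets. Brute force over permutations is factorial in the number of points, so it only works for tiny sets; SciPy's `scipy.optimize.linear_sum_assignment` solves the same matching problem in polynomial time.

```python
from itertools import permutations
from math import dist

def assignment_distance(points_a, points_b):
    """Minimum total Euclidean distance over all one-to-one matchings."""
    if len(points_a) != len(points_b):
        raise ValueError("this sketch only handles equal-sized point sets")
    # Try every way of pairing points_a[i] with some point of points_b.
    return min(
        sum(dist(p, q) for p, q in zip(points_a, perm))
        for perm in permutations(points_b)
    )

# Hypothetical keypoints from three skeletons.
pose_a = [(0, 0), (1, 0)]
pose_b = [(1, 0), (0, 0)]  # same points, listed in a different order
pose_c = [(0, 1), (1, 1)]  # same shape, shifted up by one pixel

print(assignment_distance(pose_a, pose_b))  # 0.0
print(assignment_distance(pose_a, pose_c))  # 2.0
```

Filling a pairwise matrix with these distances gives something a clustering algorithm that accepts precomputed distances could work on.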

Because I was dumb and didn't save the thread names from the beginning, and now I already have 50 GB of content and will not start over.

So I should use an image segmentation method to separate humans from the background before extracting features? Is that what you are saying?
>binarize the thumbnails
Like make skin white and background black?

you might as well start over lmao they post the same shit over and over there
Also you probably have 49.9gb of trap porn lmao

That's what I'm saying. Face recognition might be useful too.

I'd say some 10% of it is girls with dicks, yes. Not all images are reposts: even after 20,000 videos I still get 2,000 new ones for every 3,000 downloaded.

Any particular segmentation algorithm that I should be aware of?

The fuck are you even doing. Just tag them as you download them. Don't tell me you have too much porn to do that.