What is the best perceptual hashing algorithm for sorting 10,000 porn clips?

Attached: hash.png (600x433, 79.14K)

Which have you tried so far?

bcrypt. Better start now.

cksum

>cksum
>perceptual hash

imagehash on GitHub. Of the available hashes, average_hash was the best, but it's not very good (still orders of magnitude better than sorting by name, though).
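For anyone wondering what an average hash actually does, here is a rough stdlib-only sketch. It is not imagehash's code (that library resizes with Pillow first). The block-mean downscale, the requirement that the dimensions divide evenly by hash_size, and the test images are all assumptions made for illustration.

```python
def average_hash(gray, hash_size=8):
    """Average hash of a 2D list of 0-255 grayscale values.

    Assumption: height and width are multiples of hash_size, so the
    image can be shrunk by averaging equal-sized blocks.
    """
    height, width = len(gray), len(gray[0])
    bh, bw = height // hash_size, width // hash_size
    cells = []
    for by in range(hash_size):
        for bx in range(hash_size):
            block = [gray[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for cell in cells:
        # One bit per cell: 1 if brighter than the overall mean.
        bits = (bits << 1) | int(cell > mean)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


# Made-up test images: a horizontal gradient, a brighter copy, a negative.
img = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[min(255, p + 10) for p in row] for row in img]
inverted = [[255 - p for p in row] for row in img]

dist_brighter = hamming(average_hash(img), average_hash(brighter))
dist_inverted = hamming(average_hash(img), average_hash(inverted))
print(dist_brighter, dist_inverted)
```

Small Hamming distance means "looks similar", which is the whole point: a uniform brightness shift leaves the hash alone, while a negative flips every bit.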

Are you trying to eliminate duplicates, or group photosets?

Use Stash, it's made for porn and it creates p-hashes from videos. I helped review the implementation.

Essentially it creates a grid of screenshots from the video, calculates the average color value, then stores each capture as a grid of "higher than the average" or "lower than the average". Works pretty well for finding small crops and reencodes, even at a different bitrate or resolution. The app includes a dedupe tool, but it's kind of clumsy. The API is great though, you can download metadata by just uploading a video p-hash.
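A rough sketch of the scheme as described in that post, not Stash's actual code. It assumes the frames are already decoded, grayscale, tiny, and all the same size, and that the frame count is a multiple of the column count. The function names and the fake "reencode" are made up for illustration.

```python
def tile_frames(frames, cols):
    """Tile equally sized grayscale frames (2D lists) into one sprite.

    Frames are laid out row-major, cols frames per sprite row.
    Assumption: len(frames) is a multiple of cols.
    """
    sprite = []
    for start in range(0, len(frames), cols):
        group = frames[start:start + cols]
        for y in range(len(group[0])):
            row = []
            for frame in group:
                row.extend(frame[y])
            sprite.append(row)
    return sprite


def threshold_bits(sprite):
    """One bit per pixel: 1 if above the sprite's mean brightness."""
    pixels = [p for row in sprite for p in row]
    mean = sum(pixels) / len(pixels)
    return [int(p > mean) for p in pixels]


def flat_frame(value, size=2):
    """Hypothetical stand-in for a decoded, downscaled screenshot."""
    return [[value] * size for _ in range(size)]


# Four fake screenshots: dark, bright, dark, bright.
frames = [flat_frame(v) for v in (10, 200, 30, 220)]
# Crude stand-in for a reencode: slightly lower contrast plus an offset.
reencoded = [[[int(p * 0.9) + 3 for p in row] for row in f] for f in frames]

bits_original = threshold_bits(tile_frames(frames, cols=2))
bits_reencoded = threshold_bits(tile_frames(reencoded, cols=2))
print(bits_original == bits_reencoded)
```

Because every pixel is compared to the sprite's own mean, a global brightness or contrast change moves the mean along with the pixels, which is roughly why reencodes tend to land on the same bits.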

Group by similarity. Not even group, just create a gallery that I can navigate by scrolling and find several similar clips in sequence.
I have tried Stash: it takes forever to calculate the hashes, has only one hash option, and is very bloated. I'm creating my own personal alternative using Flask.
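One simple way to get a "similar clips in sequence" ordering, assuming each clip already has an integer perceptual hash: start anywhere and keep jumping to the nearest unvisited clip by Hamming distance. The filenames and hash values below are invented, and this greedy walk is just one possible heuristic, not the best ordering.

```python
def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def order_by_similarity(hashes):
    """Greedy nearest-neighbour ordering.

    hashes: dict mapping clip name -> integer perceptual hash.
    Returns the names so that each clip is followed by the closest
    clip that has not been placed yet.
    """
    if not hashes:
        return []
    remaining = dict(hashes)
    current = next(iter(remaining))  # arbitrary starting clip
    current_hash = remaining.pop(current)
    order = [current]
    while remaining:
        nearest = min(remaining,
                      key=lambda name: hamming(current_hash, remaining[name]))
        current_hash = remaining.pop(nearest)
        order.append(nearest)
    return order


# Invented 8-bit hashes, just to show the ordering.
clips = {
    "a.mp4": 0b11110000,
    "b.mp4": 0b00001111,
    "c.mp4": 0b11110001,
    "d.mp4": 0b00001110,
}
gallery_order = order_by_similarity(clips)
print(gallery_order)
```

This is quadratic in the number of clips, so for 10,000 clips it means tens of millions of XOR-and-popcount comparisons. The result can be stored once and served as a static list from the Flask app.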

middle out

Attached: middleout.gif (367x206, 907.69K)

>group by similarity
>hash
Nope
Research hashes better. Even the same clip but with a single bit changed will throw your whole idea off.
You're better off using something like average color.
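The single-bit problem is real for cryptographic hashes like SHA-256, which is what this post is describing. A quick sketch of the difference, using made-up "pixel" bytes: nudge one byte, then compare how far the SHA-256 digest moves with how far the average brightness moves.

```python
import hashlib


def bit_diff(a, b):
    """Count differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# 64 identical fake pixels, and a copy with the first pixel nudged by one.
original = bytes([100] * 64)
tweaked = bytes([101]) + original[1:]

# Cryptographic hash: a tiny input change scrambles the whole digest.
crypto_diff = bit_diff(hashlib.sha256(original).digest(),
                       hashlib.sha256(tweaked).digest())

# Average brightness: the same change barely registers.
avg_diff = abs(sum(tweaked) / len(tweaked) - sum(original) / len(original))

print(crypto_diff, "of 256 digest bits changed")
print(avg_diff, "change in average brightness")
```

Perceptual hashes are built to behave like the second number, not the first, which is why they can be compared by distance at all.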

>What is the best perceptual hashing algorithm
What did he mean by "perceptual" hashing?

My fault, I'm retarded

try stash

see and

The Feds use PhotoDNA.

Fun fact: PhotoDNA is reversible
>Microsoft says that the "PhotoDNA hash is not reversible". That's not true. PhotoDNA hashes can be projected into a 26x26 grayscale image that is only a little blurry. 26x26 is larger than most desktop icons; it's enough detail to recognize people and objects. Reversing a PhotoDNA hash is no more complicated than solving a 26x26 Sudoku puzzle; a task well-suited for computers.

rm -rf pornDir

Gah, I forgot the link:
hackerfactor.com/blog/index.php?/archives/929-One-Bad-Apple.html

Check out
same.energy/about
They apparently use a proprietary model similar to OpenAI's CLIP

PhotoDNA is more like a reverse image search algorithm a la Yandex. It's not really the same thing.