What is the best perceptual hashing algorithm for sorting 10,000 porn clips?
Which have you tried so far?
bcrypt. Better start now.
cksum
>cksum
>perceptual hash
imagehash on github. Out of the available hashes, average_hash was the best but it's not very good (orders of magnitude better than sorting by name though)
Are you trying to eliminate duplicates, or group photosets?
Use Stash, it's made for porn and it creates p-hashes from videos. I helped review the implementation.
Essentially it creates a grid of screenshots from the video, calculates the average color value, then stores each capture as a grid of "higher than the average" or "lower than the average". Works pretty well for finding small crops and reencodes, even at a different bitrate or resolution. The app includes a dedupe tool, but it's kind of clumsy. The API is great though, you can download metadata by just uploading a video p-hash
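For anyone who wants to see the "higher than the average / lower than the average" idea in code, here is a minimal sketch of an average hash in plain Python. It is not Stash's actual code; the function names and the tiny 4x4 grayscale "frame" are made up for illustration, and a real tool would first downscale a screenshot grid to a small grayscale image.

```python
def average_hash(pixels):
    """Return an int whose bits mark which pixels are brighter than the mean.

    `pixels` is a 2D list of grayscale values (hypothetical input format).
    """
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        # Append one bit per pixel: 1 if above the average, else 0.
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


# Toy frame: dark left half, bright right half.
frame = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [11, 21, 201, 211],
]
# Uniformly brighter copy, standing in for a re-encode.
brighter = [[min(255, v + 20) for v in row] for row in frame]
# Mirrored copy: the bright side moves, so the hash should differ a lot.
mirrored = [row[::-1] for row in frame]

h_frame = average_hash(frame)
h_brighter = average_hash(brighter)
h_mirrored = average_hash(mirrored)

print(hamming(h_frame, h_brighter))  # small distance: same picture
print(hamming(h_frame, h_mirrored))  # large distance: different layout
```

Comparing hashes by Hamming distance rather than equality is what lets re-encodes and resizes still match.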
Group by similarity. Not even group, just create a gallery that I can navigate by scrolling and find several similar clips in sequence
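One way to get that kind of scrollable ordering is a greedy nearest-neighbour chain: start from any clip, then keep appending the unvisited clip whose hash is closest in Hamming distance. A rough sketch, assuming each clip already has an integer perceptual hash; the file names and 8-bit hashes below are invented for the example.

```python
def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def order_by_similarity(hashes):
    """Greedy nearest-neighbour ordering over a {name: int_hash} dict."""
    if not hashes:
        return []
    remaining = dict(hashes)
    name = next(iter(remaining))  # arbitrary starting clip
    current = remaining.pop(name)
    order = [name]
    while remaining:
        # Pick the unvisited clip closest to the one just placed.
        name = min(remaining, key=lambda n: hamming(current, remaining[n]))
        current = remaining.pop(name)
        order.append(name)
    return order


# Hypothetical clips with made-up 8-bit hashes.
clips = {
    "a.mp4": 0b11110000,
    "b.mp4": 0b00001111,
    "c.mp4": 0b11110001,
    "d.mp4": 0b00001110,
}
ordered = order_by_similarity(clips)
print(ordered)
```

This is quadratic in the number of clips, which is about 50 million hash comparisons for 10,000 clips. It only guarantees that each clip is near its predecessor, not that the overall order is optimal.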
I have tried stash, it takes forever to calculate the hashes, has only one hash option, and is very bloated. I'm creating my own personal alternative using flask
middle out
>group by similarity
>hash
Nope
Research hashes better. Even the same clip but with a single bit changed will throw your whole idea off.
You're better off using something like average color or something
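That is how cryptographic hashes behave; perceptual hashes are built to do the opposite. A quick sketch of the difference, using SHA-256 from the standard library against a toy average hash. The pixel values and helper names are made up for the example.

```python
import hashlib


def average_hash(values):
    """One bit per value: 1 if above the mean of all values, else 0."""
    mean = sum(values) / len(values)
    bits = 0
    for v in values:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integers."""
    return bin(a ^ b).count("1")


pixels = [10, 20, 200, 210, 15, 25, 205, 215]
tweaked = list(pixels)
tweaked[0] ^= 1  # flip the lowest bit of one pixel (10 becomes 11)

sha_a = hashlib.sha256(bytes(pixels)).digest()
sha_b = hashlib.sha256(bytes(tweaked)).digest()
sha_diff = hamming(int.from_bytes(sha_a, "big"), int.from_bytes(sha_b, "big"))

ahash_diff = hamming(average_hash(pixels), average_hash(tweaked))

print("SHA-256 bits changed:", sha_diff)         # typically around half of 256
print("average hash bits changed:", ahash_diff)  # the tweak stays below the mean
```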
>What is the best perceptual hashing algorithm
What did he mean by "perceptual" hashing?
My fault, I'm retarded
try stash
see and
The Feds use PhotoDNA.
Fun fact: PhotoDNA is reversible
>Microsoft says that the "PhotoDNA hash is not reversible". That's not true. PhotoDNA hashes can be projected into a 26x26 grayscale image that is only a little blurry. 26x26 is larger than most desktop icons; it's enough detail to recognize people and objects. Reversing a PhotoDNA hash is no more complicated than solving a 26x26 Sudoku puzzle; a task well-suited for computers.
rm -rf pornDir
Gah i forgot the link
hackerfactor.com
Check out
same.energy
They apparently use a proprietary model similar to OpenAI's CLIP
PhotoDNA is more like a reverse image search algorithm a la Yandex. It's not really the same thing.