Web Scraping, Data Mining, and Data Science, general automation and bots thread

What are you scraping?
What are you analyzing?
What are you automating?
What are you training?

my ballsack
the smell
poop creation
my kegels

I like using Beautiful Soup to scrape price aggregator sites to find good deals. I've been trying to do something similar with Nim, but it doesn't seem to have any library that's even remotely as good.

I'm trying to make some projects for my portfolio but I have no idea what to make. Got any ideas you could recommend?

pretty cool! do you have alarms set or how does it work? And yeah, the Python ecosystem is pretty good for these tasks

just scraping or do you want to do some stat / machine learning, too?

Is the Automate the Boring Stuff book still good or is it out of date now?

How tf do i get around people that don't want me to scrape their websites? Amazon is a pita

Still working on my gpu inventory scraper. Got a couple with it so far

email them you're scraping them for educational purposes only. works every time

Mainly ML stuff, even better if it doesn't involve scraping my own dataset. I would love to do something with natural language but I have no idea what I should do besides the usual "twitter sentiment analysis" thing. Is this the best place to ask or is there a DS/ML general somewhere else?

some basics which work for most sites
>set user agent to chrome / firefox
>random sleep intervals between requests
works for about 98% of all sites
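
rough sketch of those two basics using only the standard library. The user agent string, the delay range and the function names are my own placeholders, not anything specific from the post above, so tune them for the site you're hitting.

```python
import random
import time
import urllib.request

# Assumption: any current desktop browser string will do; this one is only an example.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that sends a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def polite_fetch(urls, min_delay=2.0, max_delay=6.0,
                 sleep=time.sleep, opener=urllib.request.urlopen):
    """Fetch each URL in turn, sleeping a random interval between requests.

    `sleep` and `opener` are injectable so the loop can be tried offline.
    """
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            # Random pause so requests don't arrive at a fixed rhythm.
            sleep(random.uniform(min_delay, max_delay))
        with opener(build_request(url), timeout=30) as resp:
            pages[url] = resp.read()
    return pages
```

call it as `polite_fetch(["https://example.com/a", "https://example.com/b"])`. This won't get you past sites that fingerprint browsers or require JS.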

lots of people still use it

Learning to scrape football match results because no site offers rss feeds for them

I'm not up to date on open datasets but Kaggle has a lot. Twitter sentiment analysis is very basic and often done, some other ideas:
>topic modelling: get some tweets at companies or use online reviews to categorize them. besides the sentiment you can identify specific problems / benefits / types of writing. you can couple that with star ratings
>predicting with NLP (word2vec or similar): predict what headline is read often / gets upvoted and why
>automatic summary: user provides a URL and you summarize the article

I use garbage manga sites like mangakakalot and manganelo so I scrape their front page and filter the things I'm reading.
It's wild how there is nothing decent like novelupdates for manga. Individualized rss feeds based on your follow list can't be that hard.
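
for anyone wanting to do the same, the "scrape the front page and filter by follow list" part could look roughly like this. It's a sketch that assumes each update on the page is a plain `<a href=...>Title</a>` link, which is a guess and not the real markup of those sites; `LinkCollector` and `filter_updates` are made-up names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (title, href) pairs for every <a href=...> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def filter_updates(html, follow_list):
    """Return the links whose title matches the follow list, ignoring case."""
    wanted = {title.casefold() for title in follow_list}
    parser = LinkCollector()
    parser.feed(html)
    return [(title, href) for title, href in parser.links
            if title.casefold() in wanted]
```

feed it the front page HTML plus your reading list and you get back only the links you care about. Turning that into an RSS feed is a separate step.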

Tachiyomi, but I guess you want a desktop solution?

I have a cronjob to run the price checker once a day and if the price is below a certain threshold it mails me an alert. It all runs on a pi zero 2W
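
a minimal sketch of the threshold-and-mail part of a setup like that, assuming the price has already been scraped into a float. The function names, the addresses, the local SMTP server and the crontab path are all placeholders I made up, not the poster's actual script.

```python
import smtplib
from email.message import EmailMessage


def should_alert(price, threshold):
    """True when the scraped price is strictly below the threshold."""
    return price < threshold


def build_alert(item, price, threshold, sender, recipient):
    """Build the alert mail for one item."""
    msg = EmailMessage()
    msg["Subject"] = f"Price alert: {item} at {price:.2f}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"{item} is now {price:.2f}, below your threshold of {threshold:.2f}."
    )
    return msg


def check_and_notify(item, price, threshold, sender, recipient, send=None):
    """Send an alert if the price is low enough; return whether one was sent."""
    if not should_alert(price, threshold):
        return False
    msg = build_alert(item, price, threshold, sender, recipient)
    if send is None:
        # Assumption: a mail server is listening on localhost port 25.
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)
    else:
        send(msg)
    return True


# Hypothetical crontab entry to run the checker once a day at 08:00:
# 0 8 * * * /usr/bin/python3 /home/pi/price_check.py
```

`send` is only there so you can try it without a mail server, e.g. `check_and_notify("GPU", 299.0, 300.0, "me@example.com", "me@example.com", send=print)`.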

some user was making his own manga site, maybe you can ask him to add those features?

What's the best way to scrape TikTok?

Check out sofascore, they're THE source for in-play data. You'll probably have to request from one of their APIs. Or any big sports betting website like Pinnacle.
>inb4 the retards who think reverse-engineering JS API isn't scraping come crawling out of the woodwork

yt-dlp my man: github.com/yt-dlp/yt-dlp

I fucking HATE scraping Google. For a company that built their entire business model scraping the web, they sure don't like you trying to do the same to them.