Web Scraping, Data Mining, and Data Science, general automation and bots thread

Brayden Collins

Web Scraping, Data Mining, and Data Science, general automation and bots thread.

What are you scraping?
What are you analyzing?
What are you automating?
What are you training?

last:

Attached: Untitled-2.jpg (1504x940, 205.57K)

March 23, 2022 - 16:03

Jace Jones

my ballsack
the smell
poop creation
my kegels

March 23, 2022 - 16:06

Zachary Carter

I like using beautiful soup to scrape price aggregator sites to find good deals. I've been trying to do something similar with Nim but it doesn't seem to have any library that's even remotely as good
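For reference, here is a minimal sketch of pulling prices out of a listing page. It deliberately swaps BeautifulSoup for the standard library's `html.parser`, so it runs without installing anything. The markup it expects (`div.item` containing `span.name` and `span.price`) is a hypothetical assumption; real aggregator sites use their own class names. With bs4 you would reach for `soup.select(".price")` instead.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect (name, price) pairs from listing markup.

    Hypothetical markup assumed:
    <div class="item"><span class="name">...</span><span class="price">$12.34</span></div>
    Adjust the class names for the site you are scraping.
    """

    def __init__(self):
        super().__init__()
        self.items = []      # finished (name, price) tuples
        self._field = None   # "name" or "price" while inside that span
        self._current = {}   # fields collected for the listing being parsed

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "item" in classes:
            self._current = {}  # new listing: drop any half-filled one
        elif "name" in classes:
            self._field = "name"
        elif "price" in classes:
            self._field = "price"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field is None:
            return
        self._current[self._field] = data.strip()
        if "name" in self._current and "price" in self._current:
            # "$1,299.00" -> 1299.0
            price = float(self._current["price"].lstrip("$").replace(",", ""))
            self.items.append((self._current["name"], price))
            self._current = {}


def cheapest(html):
    """Return the (name, price) pair with the lowest price, or None."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return min(parser.items, key=lambda item: item[1], default=None)
```

Feed it the HTML you downloaded and compare the result against whatever counts as a "good deal" for you.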

March 23, 2022 - 16:10

Cameron Collins

I'm trying to make some projects for my portfolio but I have no idea of what to make. Got any ideas you could recommend?

March 23, 2022 - 16:11

Liam Thomas

pretty cool! do you have alarms set or how does it work? And yeah the Python ecosystem is pretty good for these tasks

just scraping or do you want to do some stat / machine learning, too?

March 23, 2022 - 16:20

Aiden Ortiz

Is Automate The Boring Stuff book good or out of date now?

March 23, 2022 - 16:23

Angel Harris

How tf do i get around people that don't want me to scrape their websites? Amazon is a pita

March 23, 2022 - 16:23

Nathan Thompson

Still working on my gpu inventory scraper. Got a couple with it so far

March 23, 2022 - 16:25

Jacob Young

email them you're scraping them for educational purposes only. works every time

March 23, 2022 - 16:25

Jaxson Martin

Mainly ML stuff, even better if it doesn't involve scraping my own dataset. I would love doing something with natural language but I have no idea what I should do besides the usual "twitter sentiment analysis" thing. Is this the best place to ask or is there a DS/ML general somewhere else?

March 23, 2022 - 16:29

Nathan Myers

some basics which work for most sites
>set user agent to chrome / firefox
>random sleep intervals between requests
works for about 98% of all sites
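The two basics above (browser user agent, random sleeps) can be sketched with the standard library like this. The Firefox user-agent string is an illustrative assumption; copy a current one from your own browser. The `fetch` and `sleep` parameters only exist so the loop can be exercised without touching the network.

```python
import random
import time
import urllib.request

# Illustrative desktop Firefox User-Agent; replace with a current one.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) "
    "Gecko/20100101 Firefox/98.0"
)


def build_request(url):
    """Return a Request that identifies itself as a regular browser."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def polite_fetch_all(urls, min_delay=2.0, max_delay=6.0, fetch=None, sleep=time.sleep):
    """Fetch each URL in turn, pausing a random interval between requests."""
    if fetch is None:
        def fetch(req):
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.read()

    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            # Uniformly random pause so requests don't arrive on a fixed beat.
            sleep(random.uniform(min_delay, max_delay))
        pages.append(fetch(build_request(url)))
    return pages
```

Randomizing the delay matters as much as its length: a fixed interval is an easy pattern for rate limiters to spot.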

lots of people still use it

March 23, 2022 - 16:29

Jose Richardson

Learning to scrape football match results because no site offers rss feeds for them

March 23, 2022 - 16:39

John Jenkins

I'm not up to date on open datasets but kaggle has a lot. Twitter sentiment analysis is very basic and often done. Some other ideas:
>topic modelling: get some tweets at companies or use online reviews to categorize them. besides the sentiment you can identify specific problems / benefits / types of writing. you can couple that with star ratings
>predicting with NLP (word2vec or similar): predict which headlines get read often / get upvoted and why
>automatic summary: user provides a URL and you summarize the article
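As a starting point for the summary idea, here is a minimal extractive summarizer: it scores each sentence by the average corpus frequency of its non-stopword words and keeps the top few in their original order. This is a simple frequency baseline swapped in for illustration, not word2vec or a neural model, and the tiny stopword list is an assumption. Fetching and cleaning the article at the URL is left out.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "for", "on", "that", "this", "with", "as", "are", "was"}


def summarize(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOPWORDS)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower())
                  if w not in STOPWORDS]
        # Average rather than sum, so long sentences don't win on length alone.
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)
```

A project built on this could compare its output against a model-based summarizer to show what the baseline misses.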

March 23, 2022 - 16:47

Noah Richardson

I use garbage manga sites like mangakakalot and manganelo so I scrape their front page and filter the things I'm reading.
It's wild how there is nothing decent like novelupdates for manga. Individualized rss feeds based on your follow list can't be that hard.
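A personal feed like that could look roughly like this: take whatever the front-page scraper produced, keep only series on your follow list, and write an RSS 2.0 document for any feed reader. The entry keys (`series`, `chapter`, `url`) and the placeholder channel link are hypothetical assumptions, not anything those sites provide.

```python
import xml.etree.ElementTree as ET


def build_feed(entries, follow_list, title="My manga updates",
               link="http://localhost/"):
    """Build an RSS 2.0 document containing only followed series.

    `entries`: list of dicts with hypothetical keys "series", "chapter", "url".
    `link`: placeholder channel link (RSS 2.0 requires one).
    """
    followed = {name.casefold() for name in follow_list}
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Filtered front page"
    for entry in entries:
        if entry["series"].casefold() not in followed:
            continue  # not on the follow list
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f'{entry["series"]} - {entry["chapter"]}'
        ET.SubElement(item, "link").text = entry["url"]
        # Chapter URL doubles as a stable ID so readers don't show repeats.
        ET.SubElement(item, "guid").text = entry["url"]
    return ET.tostring(rss, encoding="unicode")
```

Write the returned string to a file your feed reader can reach (a local web server or a synced folder) and regenerate it on a timer.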

March 23, 2022 - 17:06

Nicholas Thompson

Tachiyomi, but I guess you want a desktop solution?

March 23, 2022 - 17:15

Christopher Lewis

I have a cronjob to run the price checker once a day and if the price is below a certain threshold it mails me an alert. It all runs on a Pi Zero 2 W
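A setup like that might look roughly as follows. The threshold, SMTP host, addresses and script path are hypothetical placeholders, and the price itself is assumed to come from your existing scraper. A crontab entry such as `0 9 * * * /usr/bin/python3 /home/pi/price_check.py` would run the script once a day at 09:00.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical settings; replace with your own.
THRESHOLD = 300.00
SMTP_HOST = "smtp.example.com"
SENDER = "pi@example.com"
RECIPIENT = "me@example.com"


def make_alert(product, price, threshold=THRESHOLD):
    """Return an alert EmailMessage if price is below threshold, else None."""
    if price >= threshold:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"Price alert: {product} is now ${price:.2f}"
    msg["From"] = SENDER
    msg["To"] = RECIPIENT
    msg.set_content(f"{product} dropped below your ${threshold:.2f} threshold.")
    return msg


def send(msg, host=SMTP_HOST, port=587):
    """Send via SMTP with STARTTLS (port 587 is the usual submission port)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        # server.login(user, password)  # most providers require authentication
        server.send_message(msg)
```

Keeping the threshold check separate from sending makes it easy to test on the Pi without spamming your inbox.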

March 23, 2022 - 17:15

Leo Lopez

some user was making his own manga site, maybe you can ask him to add those features?

March 23, 2022 - 17:16

Leo Cruz

What's the best way to scrape TikTok?

Attached: 75223187-53B7-452A-A159-4273F0433FC4.jpg (181x278, 7.86K)

March 23, 2022 - 17:30

Levi Davis

Check out sofascore, they're THE source for in-play data. You'll probably have to request from one of their APIs. Or any big sports betting website like Pinnacle.
>inb4 the retards who think reverse-engineering JS API isn't scraping come crawling out of the woodwork
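The "request from their APIs" route usually means opening the browser's DevTools Network tab, finding the JSON request the page makes, and calling it directly. A rough sketch follows. The endpoint you pass in and the payload shape in `finished_results` are hypothetical assumptions, not the actual format used by Sofascore or Pinnacle, so adapt the keys to what you see in the network tab.

```python
import json
import urllib.request


def fetch_json(url, user_agent="Mozilla/5.0"):
    """GET a JSON endpoint discovered in the browser's network tab."""
    req = urllib.request.Request(
        url, headers={"User-Agent": user_agent, "Accept": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def finished_results(payload):
    """Turn a hypothetical payload into 'Home 2-1 Away' lines.

    Assumed shape:
    {"events": [{"home": str, "away": str, "status": str,
                 "score": {"home": int, "away": int}}, ...]}
    """
    lines = []
    for event in payload.get("events", []):
        if event.get("status") != "finished":
            continue  # skip live or scheduled matches
        score = event["score"]
        lines.append(f'{event["home"]} {score["home"]}-{score["away"]} {event["away"]}')
    return lines
```

Usage would be along the lines of `finished_results(fetch_json(url))`, with `url` copied from the network tab; the output lines can then be fed into an RSS feed.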

March 23, 2022 - 17:34

Anthony Campbell

yt-dlp my man: github.com/yt-dlp/yt-dlp
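yt-dlp ships with a TikTok extractor, so the usual approach is just `yt-dlp <video URL>`. If you want to drive it from a Python script, a minimal sketch is to shell out to the CLI. It assumes `yt-dlp` is installed and on PATH, and uses its `-o` output template so each file is saved as `<out_dir>/<video id>.<ext>`; the `downloads` directory is an arbitrary default.

```python
import subprocess


def ytdlp_command(url, out_dir="downloads"):
    """Build a yt-dlp command that saves the video as <out_dir>/<id>.<ext>."""
    # -o sets the output template; %(id)s and %(ext)s are yt-dlp template fields.
    return ["yt-dlp", "-o", f"{out_dir}/%(id)s.%(ext)s", url]


def download(url, out_dir="downloads"):
    """Run yt-dlp; check=True raises CalledProcessError if it fails."""
    subprocess.run(ytdlp_command(url, out_dir), check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with URLs containing `&` or `?`.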

Attached: sc.jpg (474x534, 36.93K)

March 23, 2022 - 18:11

Justin Johnson

I fucking HATE scraping Google. For a company that built their entire business model scraping the web, they sure don't like you trying to do the same to them.

March 23, 2022 - 18:18
