So far we’ve been able to get 5-10k emails per day from a server running 3,000 scrapers, with 500 scraping in parallel and no delays.
All of the scraping is done with full API emulation, no EB scraping. In my experience, EB scraping causes bans very fast, and the server would have to be extremely powerful to run 3,000 browsers simultaneously.
My question is, what is the best way to do this?
I have tried running multiple blocks of 500 scrapers in parallel with no gaps between them. Sometimes one output file is full of emails while another is nearly empty, as though the API scrapes are being blocked (say 20-30 emails out of 5,000 usernames get scraped).
Jarvee is able to scrape lots of detailed information that I don’t need, such as website, address and bio. However, it doesn’t give me the email or phone number (in the 2nd file).
The first file seems okay, but even it can have gaps sometimes.
In the first file there was a gap of about 45 minutes where no emails came through, and then it continued as normal (I assume this is because my proxies rotate every hour).
About my proxies: 4,000 IPv6 proxies, each IP from a different /48 and /32 subnet, rotating every hour, with residential announcements.
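For what it’s worth, the way I think about spreading the scrapers across the proxy pool can be sketched like this (the proxy URLs and the round-robin assignment are just illustrative, not how Jarvee does it internally):

```python
from itertools import cycle

# Illustrative pool: 4,000 IPv6 proxies, each meant to sit in a
# distinct subnet (addresses here are made-up documentation IPs).
proxies = [f"http://[2001:db8:{i:x}::1]:8080" for i in range(4000)]

def assign_proxies(num_scrapers, pool):
    """Round-robin one proxy per scraper. As long as the pool is at
    least as large as the scraper count, no two scrapers share an IP."""
    rotation = cycle(pool)
    return {scraper_id: next(rotation) for scraper_id in range(num_scrapers)}

# 3,000 scrapers over 4,000 proxies: every scraper gets a unique IP.
assignments = assign_proxies(3000, proxies)
```

With a pool larger than the scraper count, the hourly rotation is the only moment two requests could land on a reused IP, which is consistent with the 45-minute gaps lining up with the rotation window.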
So far we are limited to around 5,000 emails per day per server. But with 6,000 scrapers spread across 2 servers, I’m sure I can do better than that; otherwise I’d need to buy 20 servers to reach 100k emails per day.
I’ve also tried a larger number of blocks of 50 scrapers in parallel with modest breaks between them. That just slows the scraping down and still leaves some files almost empty; say 1-2k emails per day in that case.
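Jarvee’s internals aren’t exposed, but the block-based strategy I’ve been testing (N scrapers in parallel, optional pause, repeat) can be sketched in plain Python. `scrape_profile` is a hypothetical stand-in for one API-emulation call, not a real endpoint:

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

def scrape_profile(username):
    """Hypothetical stand-in for a single API-emulation scrape.
    Returns an email string, or None when the profile has no email
    or the request was blocked."""
    time.sleep(0.001)  # simulate network latency
    return f"{username}@example.com" if random.random() > 0.5 else None

def scrape_in_blocks(usernames, block_size=500, pause_seconds=0):
    """Scrape in parallel blocks of `block_size`, optionally pausing
    between blocks, keeping only the non-empty results."""
    emails = []
    for start in range(0, len(usernames), block_size):
        block = usernames[start:start + block_size]
        with ThreadPoolExecutor(max_workers=len(block)) as pool:
            results = pool.map(scrape_profile, block)
        emails.extend(e for e in results if e is not None)
        if pause_seconds:
            time.sleep(pause_seconds)
    return emails
```

The trade-off I’m seeing maps directly onto the two parameters: `block_size=500, pause_seconds=0` gives the 5-10k/day throughput but with whole files coming back empty, while `block_size=50` with breaks avoids nothing and just drops throughput to 1-2k/day.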