Why shouldn’t I use Puppeteer or Selenium for all my scraping projects?

Because they’re slow, costly, and overkill for most jobs. Start with plain HTTP requests and reach for Puppeteer only when absolutely necessary.

Is scraping behind a login legal?

It is legally risky, so avoid it. “Behind a login” means you have to log in before you can reach the data. Everything you scrape should be available without logging in. A good rule of thumb: if it’s accessible in an incognito browser window, you’re good.

How do I find hidden APIs on websites?

Open your browser’s network tab (DevTools) while browsing. Look for XHR/fetch requests returning JSON. Often those are calls you can replicate directly.

Which proxies work best for web scraping?

Providers like Decodo (Smartproxy), Webshare, Evomi, and Bright Data offer reliable residential and datacenter proxies suited for scraping at scale.

Can I just use Axios or Requests for scraping?

You can, but libraries like got-scraping or Impit are optimized for scraping and handle things like headers, retries, and anti-bot measures better.

Web Scraping Mistakes That Break Production Crawlers

Web scraping mistakes sink production crawlers: Puppeteer-first stacks, login scrapes, HTML parsing, and proxyless bursts top the list.

Fix the architecture early. APIs and HTTP beat browsers for most social and SaaS targets.

In this guide, you’ll learn:

Don’t default to Puppeteer or Selenium
Avoid scraping behind login
Prefer APIs over fragile HTML parsers
Use scraping-ready HTTP libraries, not plain Axios
Why proxies matter at scale

Use the sections below as your playbook.

1. Relying on Puppeteer or Selenium as Your First Option

It’s tempting to jump straight into browser automation tools like Puppeteer or Selenium. They sound impressive, but they should be your last resort, not your first.

Why?

Slow and expensive at scale: launching headless browsers for every request chews up CPU and memory.
Harder to deploy: especially if you’re scaling across cloud servers.
Most sites don’t require it: static HTML, APIs, or lightweight scraping libraries often do the job better.

Best Practice: Start with lightweight HTTP libraries. Keep Puppeteer in your toolbox, but only use it as a last resort.
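
As a rough illustration of the HTTP-first approach, here is a minimal Python sketch that fetches a page with a plain GET and pulls out its title using only the standard library. The function names (fetch_html, extract_title) and the User-Agent value are assumptions made for this example, not part of any particular tool. If the data you need is already in the returned HTML, no browser is required.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html: str) -> str:
    """Return the page title from raw HTML, or an empty string if absent."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Plain HTTP GET with no browser involved (hypothetical helper)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A call such as extract_title(fetch_html("https://example.com")) costs a single request and a few milliseconds of parsing, whereas a headless browser would launch a full rendering engine for the same result.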

2. Scraping Behind a Login

Scraping behind login walls (like Facebook, LinkedIn, or Instagram) is risky. Not only does it raise legal and ethical concerns, but it also adds unnecessary complexity: you have to maintain sessions, handle CAPTCHAs, and cope with being easily flagged by anti-bot systems.

Best Practice: Focus on public-facing data. Many sites expose the same information via APIs or pre-login endpoints. Challenge yourself to find the open data path; it’s often easier, cleaner, and more sustainable.

3. Parsing HTML Instead of Using APIs

Another rookie mistake: scraping raw HTML for data that’s already being fetched via an underlying API call.

HTML parsing = fragile (changes to page layout break your scraper)
APIs = cleaner JSON (structured data, fewer headaches)
Avoid double work: parsing HTML and handling browser rendering when you could just hit an endpoint directly.

Best Practice: Before writing a single scraper, inspect the network tab in your browser’s dev tools. If the content loads dynamically, chances are there’s a hidden API request you can mimic.
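
To make the idea concrete, here is a small Python sketch of calling such an endpoint directly instead of parsing the rendered page. The URL, the query parameters (page, per_page), and the response shape (an "items" list with "name" and "price" fields) are all hypothetical; substitute whatever the network tab shows for the site in question.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint, standing in for the URL seen in the network tab.
API_URL = "https://example.com/api/products"


def build_api_url(page: int, per_page: int = 20) -> str:
    """Build the paginated endpoint URL (parameter names are assumptions)."""
    return f"{API_URL}?{urlencode({'page': page, 'per_page': per_page})}"


def parse_products(payload: str) -> list[dict]:
    """Pick the fields we care about out of the JSON response body."""
    data = json.loads(payload)
    return [
        {"name": item["name"], "price": item["price"]}
        for item in data.get("items", [])
    ]


def fetch_products(page: int) -> list[dict]:
    """Request one page of JSON and return the parsed records."""
    request = Request(build_api_url(page), headers={"Accept": "application/json"})
    with urlopen(request, timeout=10) as response:
        return parse_products(response.read().decode("utf-8"))
```

Because the response is already structured, a layout change on the page does not affect this code; only a change to the endpoint itself would.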

4. Using a Generic HTTP Library

Yes, you can scrape with Axios, Fetch, or Python’s Requests library. But at scale, these options lack the robustness needed for modern web scraping.

Better Tools:

got-scraping (Apify): purpose-built for scraping, handles headers, cookies, retries, etc.
Impit (Apify): a solid scraping-friendly HTTP client.

Best Practice: Use a library built for scraping, not just for generic HTTP calls. You’ll avoid anti-bot pitfalls and cut down debugging time.
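
For a sense of what these libraries take off your hands, here is a hand-rolled Python sketch of two of the chores mentioned above: sending browser-like headers and retrying with exponential backoff. This is not the got-scraping or Impit API. The header values, the set of retryable status codes, and the function names are assumptions chosen for illustration.

```python
import time
import urllib.error
from urllib.request import Request, urlopen

# Example browser-like headers; real scraping libraries generate these for you.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

# Status codes treated as temporary failures (an assumption for this sketch).
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff: base * 2^attempt seconds, capped at `cap`."""
    return min(cap, base * 2 ** attempt)


def fetch_with_retries(url: str, max_attempts: int = 4) -> bytes:
    """GET a URL, retrying temporary failures with increasing delays."""
    for attempt in range(max_attempts):
        last_attempt = attempt == max_attempts - 1
        try:
            request = Request(url, headers=BROWSER_HEADERS)
            with urlopen(request, timeout=10) as response:
                return response.read()
        except urllib.error.HTTPError as error:
            # Give up on non-retryable statuses or when attempts run out.
            if error.code not in RETRYABLE_STATUS or last_attempt:
                raise
        except urllib.error.URLError:
            # Network-level failure (DNS, refused connection, timeout).
            if last_attempt:
                raise
        time.sleep(backoff_delay(attempt))
    raise ValueError("max_attempts must be at least 1")
```

Writing and maintaining this by hand for every project is the debugging time a scraping-focused client is meant to save.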

5. Scraping Without Proxies

Perhaps the biggest mistake: not using proxies. Without them, you’ll hit rate limits, get blocked, or worse, burn your IPs.

Recommended Providers:

Decodo (Smartproxy)
Webshare
Evomi
Bright Data

Best Practice: Always rotate proxies and pair them with proper headers (user agents, etc.) for more natural traffic patterns.
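
A minimal Python sketch of that rotation, assuming a simple round-robin strategy: each request takes the next proxy and the next user agent from a cycling pool. The proxy addresses and user agent strings below are placeholders, and the Rotator class and fetch_via_proxy helper are hypothetical names invented for this example. A real provider gives you its own gateway URLs and credentials.

```python
import itertools
from urllib.request import ProxyHandler, Request, build_opener

# Placeholder proxy endpoints; use the ones your provider issues.
PROXIES = [
    "http://proxy-1.example.com:8000",
    "http://proxy-2.example.com:8000",
    "http://proxy-3.example.com:8000",
]

# Example user agent strings to rotate alongside the proxies.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]


class Rotator:
    """Hands out (proxy, user agent) pairs in round-robin order."""

    def __init__(self, proxies: list[str], user_agents: list[str]) -> None:
        self._proxies = itertools.cycle(proxies)
        self._agents = itertools.cycle(user_agents)

    def next_identity(self) -> tuple[str, str]:
        return next(self._proxies), next(self._agents)


def fetch_via_proxy(url: str, proxy: str, user_agent: str,
                    timeout: float = 10.0) -> bytes:
    """Send one GET through the given proxy with the given user agent."""
    opener = build_opener(ProxyHandler({"http": proxy, "https": proxy}))
    request = Request(url, headers={"User-Agent": user_agent})
    with opener.open(request, timeout=timeout) as response:
        return response.read()
```

In use, each request would call rotator.next_identity() and pass the pair to fetch_via_proxy, so consecutive requests leave from different IPs with different headers.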

Final Thoughts

Web scraping is both art and engineering. Avoiding these five mistakes (overusing Puppeteer, scraping behind logins, parsing fragile HTML, using the wrong HTTP library, and skipping proxies) will set you up for faster, more reliable, and more scalable scraping projects.
