Why Choose a Hosted Solution for Data Crawling

description

Here's why hosted solutions work better than DIY scraping tools. Scraping tools may be easy to use, but in our experience a hosted solution solves more problems. See how hosted solutions score over stand-alone tools.

Transcript of Why Choose a Hosted Solution for Data Crawling

Page 1: Why Choose a Hosted Solution for Data Crawling

Why choose a hosted solution

THE EDGE OF USING A HOSTED SOLUTION OVER DIY TOOLS

Page 2: Why Choose a Hosted Solution for Data Crawling

before you begin

EVALUATE YOUR REQUIREMENT

Page 3: Why Choose a Hosted Solution for Data Crawling

think about

LARGE-SCALE CRAWLS = 100+ WEBSITES

SMALL-SCALE CRAWLS = 5 OR FEWER WEBSITES

Page 4: Why Choose a Hosted Solution for Data Crawling

Data Requirement

Recurring: Large-scale or Small-scale

One-time: Large-scale or Small-scale

Page 5: Why Choose a Hosted Solution for Data Crawling

Support Required

Recurring: Large-scale or Small-scale

One-time: Large-scale or Small-scale

Page 6: Why Choose a Hosted Solution for Data Crawling

Scraping on a tool

Convenient since you don’t have to explain needs to a DaaS provider

Works best when sources are simple & few

Ease of use comes from simply indicating the fields you want

A CSV file appears with the data!

This is neat! But…

Page 7: Why Choose a Hosted Solution for Data Crawling

…problems appear when…

you increase the number of websites and/or add more fields at one time

you submit the request after having laboriously selected all fields from across websites!

scrapes run to 99% and then fail!

Page 8: Why Choose a Hosted Solution for Data Crawling

Will re-running solve this problem?

Support Centers reply:

“Site has blocked the bots.”

Did it really solve your data requirement?
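
Whether a re-run can help depends on why the scrape failed. The sketch below is illustrative only and is not how any particular tool works: it assumes the failure surfaces as an HTTP status code, and the function names `classify_failure` and `should_rerun` are hypothetical. A block (403 Forbidden, 429 Too Many Requests) will usually repeat on a re-run, while a server-side error (5xx) may clear up on its own.

```python
def classify_failure(status_code: int) -> str:
    """Roughly classify an HTTP status code seen during a scrape.

    Assumption: the site signals a block with 403 or 429. Sites can also
    block in other ways (for example a CAPTCHA page served with 200),
    which this sketch does not detect.
    """
    if 200 <= status_code < 300:
        return "ok"
    if status_code in (403, 429):
        return "blocked"
    if 500 <= status_code < 600:
        return "transient"
    return "other"


def should_rerun(status_code: int) -> bool:
    """Re-running is only likely to help for transient server errors."""
    return classify_failure(status_code) == "transient"


if __name__ == "__main__":
    for code in (200, 403, 429, 503):
        print(code, classify_failure(code), should_rerun(code))
```

When the answer is "blocked", simply re-running the same scrape is unlikely to deliver the data.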

Page 9: Why Choose a Hosted Solution for Data Crawling

Scraping via a hosted solution

Up-time

Provider has machines running 24x7

We do!

Scraping tools invariably fail when there are not enough servers available to perform the crawls

Hosted solution gives you continuous data feeds! All the time. Every time!

Page 10: Why Choose a Hosted Solution for Data Crawling

Scalability

Providers scale platforms to meet client numbers & sources

Scaling remains smooth as long as design decisions remain constant

Tools get bogged down as scale increases

Page 11: Why Choose a Hosted Solution for Data Crawling

We had clients who tried running a scraping tool for a complete day to extract data from a huge site. THEIR LAPTOPS DIED.

#TRUESTORY

Page 12: Why Choose a Hosted Solution for Data Crawling

Monitoring

DIY solutions rarely support monitoring

Example:

Your tool extracts data every week

The site changes structure every month!

Hosted solutions have alerts in place to catch and mitigate such changes
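
One simple way such an alert could work is to check each crawl's output for fields that have suddenly gone missing, which is a common symptom of a site changing its structure. This is a minimal sketch under stated assumptions, not a description of any provider's actual monitoring: the field names, the 20% threshold, and the function names are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical field names for illustration; not taken from the slides.
EXPECTED_FIELDS = ["title", "price", "url"]


def is_missing(record: Dict[str, Optional[str]], field: str) -> bool:
    """True if the field is absent, None, or blank in one extracted record."""
    return not (record.get(field) or "").strip()


def structure_change_alert(
    records: List[Dict[str, Optional[str]]],
    expected: List[str],
    max_missing_ratio: float = 0.2,  # assumed threshold
) -> Optional[str]:
    """Return an alert message if too many records lack a field, else None."""
    if not records:
        return "ALERT: crawl returned no records"
    problems = []
    for field in expected:
        missing = sum(1 for r in records if is_missing(r, field))
        ratio = missing / len(records)
        if ratio > max_missing_ratio:
            problems.append(f"{field} missing in {ratio:.0%} of records")
    return "ALERT: " + "; ".join(problems) if problems else None


if __name__ == "__main__":
    last_week = [{"title": "A", "price": "10", "url": "http://example.com/a"}]
    this_week = [{"title": "A", "price": "", "url": "http://example.com/a"}]
    print(structure_change_alert(last_week, EXPECTED_FIELDS))
    print(structure_change_alert(this_week, EXPECTED_FIELDS))
```

A weekly crawl that passes this check one week and fails it the next suggests the site's structure has changed and the extraction rules need updating.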

Page 13: Why Choose a Hosted Solution for Data Crawling

Fail-over & Support

There’s support for everything

Basically, life is easy.

The headache is the provider’s. Trust us, we know.

With DIY Tools, you’re at the mercy of the Support Center. IF your calls get through at all!

Page 14: Why Choose a Hosted Solution for Data Crawling

Not convinced yet? SOME REAL QUERIES WE RECEIVED WHEN DIY SCRAPING TOOLS FAILED...

Page 15: Why Choose a Hosted Solution for Data Crawling

“Is it possible to harvest content according to our specifications…We are using X & we are finding very difficult to get the entire core content from a page…”

X IS A PLATFORM AS A SERVICE WHERE YOU CAN WRITE PLUG-INS TO SET UP YOUR CRAWLERS.

I.E. MORE THAN JUST SOFTWARE

Page 16: Why Choose a Hosted Solution for Data Crawling

“We are currently using Y for crawling & would be interested to understand the advantages you can provide. Is there any way you could frame a work flow & harvest content according to our needs…Y has only been helpful to a limit.”

Y IS DESKTOP SOFTWARE FOR CRAWLING WEB PAGES.

Page 17: Why Choose a Hosted Solution for Data Crawling

We solved these problems. CLICK TO SOLVE YOURS.

or e-mail [email protected]