Why Choose a Hosted Solution for Data Crawling

Post on 26-Jun-2015

Here's why hosted solutions work better than DIY scraping tools. Scraping tools may be easy to use, but in our experience, hosted solutions solve more problems! See how hosted solutions score over stand-alone tools.

Transcript of Why Choose a Hosted Solution for Data Crawling

why choose a hosted solution

THE EDGE OF USING A HOSTED SOLUTION OVER DIY TOOLS

before you begin

EVALUATE YOUR REQUIREMENT

think about

LARGE-SCALE CRAWLS = 100+ WEBSITES

SMALL-SCALE CRAWLS = 5 OR FEWER WEBSITES

Data Requirement

Recurring: large-scale or small-scale

One-time: large-scale or small-scale

Support Required

Recurring: large-scale or small-scale

One-time: large-scale or small-scale

Scraping on a tool

Convenient since you don’t have to explain your needs to a DaaS provider

Works best when sources are simple & few

The ease of use lies in simply indicating the fields you want

A CSV file appears with data!

This is neat! But…

…problems appear when…

you increase the number of websites and/or add more fields at one time

you submit the request after having laboriously selected all fields from across websites!

scrapes run to 99% & then fail!

Will re-running solve this problem?

Support Centers reply:

“Site has blocked the bots.”

Did it really solve your data requirement?

Scraping via a hosted solution

Up-time

Provider has machines running 24x7

We do!

Scraping tools invariably fail when not enough servers are available to perform crawls

Hosted solution gives you continuous data feeds! All the time. Every time!

Scalability

Providers scale platforms to meet client numbers & sources

Scaling remains smooth as long as design decisions remain constant

Tools get bogged down as scale increases

We had clients who tried running a scraping tool for a complete day to extract data from a huge site. THEIR LAPTOPS DIED.

#TRUESTORY

Monitoring

DIY solutions rarely support monitoring

Example:

Your tool extracts data every week

The site changes structure every month!

Hosted solutions have alerts in place to detect such changes and mitigate their impact
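
To make the weekly-extraction example concrete, here is a minimal Python sketch of one way such an alert could work: if a field that used to be filled suddenly comes back empty in most records, the site has probably changed structure. This is only an illustration under our own assumptions, not a description of any provider's actual monitoring. The function names `missing_field_rates` and `structure_alerts`, the field names `title` and `price`, and the 50% threshold are all hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def missing_field_rates(records: List[Record], expected_fields: List[str]) -> Dict[str, float]:
    """Return, per expected field, the fraction of records where it is missing or empty."""
    if not records:
        # No records at all is treated as every field being fully missing.
        return {field: 1.0 for field in expected_fields}
    rates = {}
    for field in expected_fields:
        missing = sum(1 for r in records if not (r.get(field) or "").strip())
        rates[field] = missing / len(records)
    return rates


def structure_alerts(records: List[Record], expected_fields: List[str],
                     threshold: float = 0.5) -> List[str]:
    """List the fields whose missing rate exceeds the (assumed) alert threshold."""
    rates = missing_field_rates(records, expected_fields)
    return [field for field in expected_fields if rates[field] > threshold]


# Hypothetical data: last week's crawl was complete, this week's lost the price field.
last_week = [{"title": "Item A", "price": "10"}, {"title": "Item B", "price": "12"}]
this_week = [{"title": "Item A", "price": ""}, {"title": "Item B", "price": None}]

print(structure_alerts(last_week, ["title", "price"]))
print(structure_alerts(this_week, ["title", "price"]))
```

A check like this runs after every crawl, so a structure change is flagged on the first affected run rather than discovered weeks later in an empty CSV.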

Fail-over & Support

There’s support for everything

Basically, life is easy.

The headache is the provider’s. Trust us, we know.

With DIY Tools, you’re at the mercy of the Support Center. IF your calls get through at all!

not convinced yet?

SOME REAL QUERIES WE RECEIVED WHEN DIY SCRAPING TOOLS FAILED...

“Is it possible to harvest content according to our specifications…We are using X & we are finding very difficult to get the entire core content from a page…”

X IS A PLATFORM AS A SERVICE WHERE YOU CAN WRITE PLUG-INS TO SET UP YOUR CRAWLERS.

I.E. MORE THAN JUST SOFTWARE

“We are currently using Y for crawling & would be interested to understand the advantages you can provide. Is there any way you could frame a work flow & harvest content according to our needs…Y has only been helpful to a limit.”

Y IS DESKTOP SOFTWARE FOR CRAWLING WEB PAGES.

We solved these problems. CLICK TO SOLVE YOURS.

or e-mail sales@promptcloud.com