Building Efficient and Reliable Crawler System With Sidekiq Enterprise
Transcript of Building Efficient and Reliable Crawler System With Sidekiq Enterprise
![Page 2: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/2.jpg)
Has anyone ever written crawlers?
![Page 3: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/3.jpg)
Has anyone ever used cron?
![Page 4: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/4.jpg)
Has anyone ever used Sidekiq?
![Page 5: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/5.jpg)
Gary (Chien-Wei Chu) @icarus4 / @icarus4.chu
Was a C programmer
In love with Ruby since 2013
CTO of Statementdog
![Page 6: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/6.jpg)
![Page 7: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/7.jpg)
I Play
Photo: https://static01.nyt.com/images/2016/08/19/sports/19BADMINTONweb3/19BADMINTONweb3-master675.jpg
![Page 8: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/8.jpg)
![Page 9: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/9.jpg)
Photo: http://classic.battle.net/images/battle/scc/protoss/pix/units/screenshots/d05.jpg
![Page 10: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/10.jpg)
Photo: http://resources.workable.com/wp-content/uploads/2015/08/ruby-560x224.jpg
![Page 11: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/11.jpg)
![Page 12: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/12.jpg)
![Page 13: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/13.jpg)
![Page 14: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/14.jpg)
![Page 15: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/15.jpg)
• Introduction to Statementdog
• Data behind Statementdog
• Past practice of Statementdog
• Problems of the past practice
• How we design our system to solve the problems.
![Page 16: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/16.jpg)
Focus on:
• More reliable job scheduling
• Dealing with throttling issues
![Page 18: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/18.jpg)
![Page 19: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/19.jpg)
![Page 20: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/20.jpg)
![Page 21: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/21.jpg)
![Page 22: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/22.jpg)
![Page 23: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/23.jpg)
![Page 24: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/24.jpg)
![Page 25: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/25.jpg)
![Page 26: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/26.jpg)
![Page 27: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/27.jpg)
![Page 28: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/28.jpg)
![Page 29: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/29.jpg)
![Page 30: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/30.jpg)
![Page 31: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/31.jpg)
![Page 32: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/32.jpg)
![Page 33: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/33.jpg)
![Page 34: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/34.jpg)
(Revenue)
(EPS)
(Gross Margin)
(Net Income)
(Assets)
(Liabilities)
(Operating Cash Flow)
(Free Cash Flow)
(Investing Cash Flow)
(ROE)
(ROA)
(Accounts Receivable)
(Accounts Payable)
(PMI)
GDP
![Page 35: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/35.jpg)
![Page 36: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/36.jpg)
Taiwan Market Observation Post System
Taiwan Stock Exchange
Taiwan Depository & Clearing Corporation
Yahoo Stock Feed
…
…
![Page 37: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/37.jpg)
Yearly - dividend, remuneration of directors and supervisors
Quarterly - quarterly financial statements
Monthly - Revenue
Weekly -
Daily - closing price
Hourly - stock news from Yahoo stock feed
Minutely - important news from Taiwan Market Observation Post System
![Page 38: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/38.jpg)
![Page 39: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/39.jpg)
Something like this, but written in PHP
A super long running process (1 hour+) loops from the first stock to the last one
Stock.find_each do |stock|
  # download xml financial report data …
  # extract xml data …
  # calculate advanced data …
end
![Page 40: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/40.jpg)
![Page 41: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/41.jpg)
![Page 42: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/42.jpg)
![Page 43: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/43.jpg)
A super long running process for quarterly report
A super long running process for monthly revenue
A super long running process for daily price
A super long running process for news
…
![Page 44: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/44.jpg)
![Page 45: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/45.jpg)
![Page 46: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/46.jpg)
![Page 47: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/47.jpg)
When the server load is low
[Figure: server load over time, with Job 1 to Job 5]
![Page 48: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/48.jpg)
When the server load is HIGH
[Figure: server load over time, with another task running]
![Page 49: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/49.jpg)
![Page 50: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/50.jpg)
When the server load is HIGH
[Figure: server load over time, with Job 1 to Job 5 and another task]
Too many crawler processes executed at the same time
![Page 51: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/51.jpg)
• Really slow
• Inefficient - unable to retry only the failed job
• Unpredictable server load
• Scaling out is not easy
![Page 52: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/52.jpg)
![Page 53: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/53.jpg)
![Page 54: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/54.jpg)
![Page 55: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/55.jpg)
![Page 56: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/56.jpg)
• Inherent problems of Unix Cron:
• Unreliable scheduling
• High availability is not easy
• Hard to prioritize job by the popularity
• Not easy to deal with bandwidth throttling issue
![Page 57: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/57.jpg)
![Page 58: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/58.jpg)
Created by Mike Perham
![Page 59: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/59.jpg)
![Page 60: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/60.jpg)
[Figure: requests arrive at the web server and are handled by its process]
![Page 61: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/61.jpg)
[Figure: requests arrive at the web server process, which pushes jobs to the job queue (very fast)]
![Page 62: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/62.jpg)
[Figure: requests arrive at the web server process, which pushes jobs to the job queue (very fast); worker processes on a worker server sit behind the queue]
![Page 63: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/63.jpg)
[Figure: requests arrive at the web server process, which pushes jobs to the job queue (very fast); worker processes on a worker server sit behind the queue]
Add extra servers when needed
![Page 64: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/64.jpg)
[Figure: the same architecture, with the web server process labeled "Producer"]
![Page 65: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/65.jpg)
[Figure: the same architecture, with the web server process labeled "Producer" and the worker processes labeled "Consumer"]
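The producer/consumer flow in the diagram can be sketched in plain Ruby. This is only an in-process illustration, not how Sidekiq is implemented: `Thread::Queue` stands in for the Redis-backed job queue, and `run_pipeline` and its parameters are hypothetical names.

```ruby
# In-process sketch: Thread::Queue stands in for the Redis-backed job queue.
def run_pipeline(job_count:, worker_count:)
  job_queue = Queue.new
  results   = Queue.new

  # Producer: pushing a job is cheap, so the caller is not held up by the work.
  producer = Thread.new do
    (1..job_count).each { |stock_id| job_queue << stock_id }
    job_queue.close # signal that no more jobs will arrive
  end

  # Consumers: each worker pops jobs until the queue is closed and drained.
  workers = Array.new(worker_count) do
    Thread.new do
      while (stock_id = job_queue.pop)
        results << "crawled stock #{stock_id}" # placeholder for the real crawl
      end
    end
  end

  producer.join
  workers.each(&:join)

  processed = []
  processed << results.pop until results.empty?
  processed
end

puts run_pipeline(job_count: 10, worker_count: 3).size
```

Adding capacity in this sketch means raising `worker_count`; in the real system it means adding worker servers that read from the same queue.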
![Page 66: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/66.jpg)
[Figure: a multi-threaded worker process (thread 1 to thread 25) vs. a single-process worker]
![Page 67: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/67.jpg)
![Page 68: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/68.jpg)
[Figure: a multi-threaded worker process (thread 1 to thread 25) vs. a single-process worker, a ratio of 1 : 25]
With the same degree of memory consumption
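To make the 1 : 25 comparison concrete, here is a rough sketch of why threads help for I/O-bound crawling. It assumes each crawl mostly waits on the network (simulated with `sleep`); `fake_crawl` and `crawl_with_threads` are hypothetical helpers, and the timings will vary by machine.

```ruby
require "benchmark"

# Simulated I/O-bound crawl: the thread spends its time waiting, not computing.
def fake_crawl(stock_id)
  sleep 0.02
  stock_id
end

# Drain a list of stock ids with the given number of threads in one process.
def crawl_with_threads(stock_ids, thread_count)
  queue = Queue.new
  stock_ids.each { |id| queue << id }
  queue.close

  done = Queue.new
  threads = Array.new(thread_count) do
    Thread.new do
      while (id = queue.pop)
        done << fake_crawl(id)
      end
    end
  end
  threads.each(&:join)
  done.size
end

ids = (1..25).to_a
single = Benchmark.realtime { crawl_with_threads(ids, 1) }
multi  = Benchmark.realtime { crawl_with_threads(ids, 25) }
puts format("1 thread: %.2fs, 25 threads: %.2fs", single, multi)
```

All 25 threads share one process, so the memory cost stays close to that of a single worker while the waiting overlaps.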
![Page 69: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/69.jpg)
Sidekiq (OSS) Sidekiq Pro
Sidekiq Enterprise
![Page 70: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/70.jpg)
Sidekiq Pro: Batches, Enhanced Reliability, Search in Web UI, Worker Metrics, Expiring Jobs
Sidekiq Enterprise: Rate Limiting, Periodic Jobs, Unique Jobs, Historical Metrics, Multi-process, Encryption
![Page 71: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/71.jpg)
![Page 72: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/72.jpg)
Parallelism Makes Things Faster
![Page 73: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/73.jpg)
• Really slow
• Inefficient - unable to retry only the failed job
• Unpredictable server load
• Scaling out is not easy
![Page 74: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/74.jpg)
• Efficient - retry only the failed job
• Predictable server load
• Easy to scale out
![Page 75: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/75.jpg)
• Really slow
• Inefficient - unable to retry only the failed job
• Unpredictable server load
• Scaling out is not easy
![Page 76: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/76.jpg)
• Inherent problem of Unix Cron:
• Unreliable scheduling
• High availability is not easy
• Hard to prioritize job by the popularity
• Not easy to deal with bandwidth throttling issue
![Page 77: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/77.jpg)
![Page 78: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/78.jpg)
![Page 79: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/79.jpg)
–Mike Perham, CEO, Contributed Systems, Creator of Sidekiq
![Page 80: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/80.jpg)
Keep the state of cron executions in the most robust part of our system: the database
All scheduled jobs are invoked by one particular job that runs every minute
![Page 81: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/81.jpg)
![Page 82: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/82.jpg)
create_table :cron_jobs do |t|
  t.string :klass, null: false
  t.string :cron_expression, null: false
  t.timestamp :next_run_at, null: false, index: true
end
Create table for storing cron settings
table name: cron_jobs
![Page 83: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/83.jpg)
klass: worker class name
![Page 84: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/84.jpg)
cron_expression: something like 0 */2 * * *
![Page 85: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/85.jpg)
next_run_at: when the job should next be executed
![Page 86: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/86.jpg)
klass cron_expression next_run_at
Push2000NewsJobs “0 */2 * * *” …
Push2000DailyPriceJobs “0 2 * * 1-5” …
Push2000MonthlyRevenueJobs “0 0 10 * *” …
…
![Page 87: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/87.jpg)
# Add to your cron settings
every :minute do
  runner 'CronJobWorker.perform_async'
end
Cron only schedules one job, every minute
![Page 88: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/88.jpg)
CronJobWorker to invoke all of your crawlers
CronJobWorker: find the jobs that should be executed
![Page 89: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/89.jpg)
CronJobWorker: push the jobs to the job queue
![Page 90: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/90.jpg)
CronJobWorker: set up the next execution time
![Page 91: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/91.jpg)
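class CronJobWorker
  include Sidekiq::Worker

  def perform
    # Find the jobs that are due (find_each takes no conditions, so filter with where first)
    CronJob.where("next_run_at <= ?", Time.now).find_each do |job|
      # Push the crawler job to the job queue (Sidekiq::Client.push expects string keys)
      Sidekiq::Client.push(
        'class' => job.klass.constantize,
        'args' => ['foo', 'bar']
      )
      # Set up the next execution time from the cron expression
      x = Sidekiq::CronParser.new(job.cron_expression)
      job.update!(next_run_at: x.next.to_time)
    end
  end
end
CronJobWorker to invoke all of your crawlers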
![Page 92: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/92.jpg)
Missed job executions will be picked up on the next minute's run
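A small simulation shows why keeping `next_run_at` in the database makes missed runs recoverable. This is a sketch under simplifying assumptions: `CronJobRow` and `tick` are hypothetical, rows live in memory instead of the `cron_jobs` table, and a fixed interval in seconds replaces the cron expression.

```ruby
# Hypothetical in-memory stand-in for one row of the cron_jobs table.
# interval_seconds replaces the cron expression to keep the sketch stdlib-only.
CronJobRow = Struct.new(:klass, :interval_seconds, :next_run_at)

# One scheduler tick: enqueue every job that is due, then move its
# next_run_at forward from "now". `enqueued` stands in for the job queue.
def tick(rows, now, enqueued)
  rows.select { |row| row.next_run_at <= now }.each do |row|
    enqueued << row.klass
    row.next_run_at = now + row.interval_seconds
  end
end

start    = Time.at(0)
rows     = [CronJobRow.new("Push2000NewsJobs", 7200, start)]
enqueued = []

tick(rows, start, enqueued)        # runs on schedule
# The tick at start + 7200 never happens (for example, the scheduler was down).
tick(rows, start + 7260, enqueued) # the next tick still finds the overdue row
puts enqueued.size
```

Because the due check is `next_run_at <= now` rather than an exact match on the minute, a skipped tick delays the job instead of losing it.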
![Page 93: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/93.jpg)
• Inherent problem of Unix Cron:
• Unreliable scheduling
• Hard to prioritize job by the popularity
• High availability is not easy
• Not easy to deal with bandwidth throttling issue
![Page 94: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/94.jpg)
Drawbacks solved
• Inherent problem of Unix Cron:
• Unreliable scheduling
• Hard to prioritize job by the popularity
• High availability is not easy
• Not easy to deal with bandwidth throttling issue
![Page 95: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/95.jpg)
![Page 96: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/96.jpg)
table: cron_jobs
klass cron_expression args next_run_at
Push2000NewsJobs “0 */2 * * *” [] …
NewsWorker “*/30 * * * *” [popular_stock_id_1] …
NewsWorker “*/30 * * * *” [popular_stock_id_2] …
…
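The rows above could be generated rather than written by hand. The following is a minimal sketch that builds plain hashes; `news_cron_rows` is a hypothetical helper, and the stock ids are made-up placeholders rather than real data.

```ruby
# Build cron_jobs rows: one catch-all job every two hours, plus a more
# frequent NewsWorker row for each popular stock.
def news_cron_rows(popular_stock_ids)
  catch_all = { klass: "Push2000NewsJobs", cron_expression: "0 */2 * * *", args: [] }
  popular = popular_stock_ids.map do |stock_id|
    { klass: "NewsWorker", cron_expression: "*/30 * * * *", args: [stock_id] }
  end
  [catch_all] + popular
end

# 101 and 202 are placeholder ids for illustration only.
news_cron_rows([101, 202]).each { |row| puts row.inspect }
```

Popularity then becomes a matter of data: promoting a stock means inserting a row, not editing a crontab.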
![Page 97: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/97.jpg)
Drawbacks solved
• Inherent problem of Unix Cron:
• Unreliable scheduling
• Hard to prioritize job by the popularity
• High availability is not easy
• Not easy to deal with bandwidth throttling issue
![Page 98: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/98.jpg)
• Inherent problem of Unix Cron:
• Unreliable scheduling
• Hard to prioritize job by the popularity
• High availability is not easy
• Not easy to deal with bandwidth throttling issue
![Page 99: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/99.jpg)
Sidekiq.configure_server do |config|
  config.periodic do |mgr|
    # Standard five-field cron expression: run every minute
    mgr.register("* * * * *", CronJobWorker)
  end
end
![Page 100: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/100.jpg)
• Inherent problem of Unix Cron:
• Unreliable scheduling
• Hard to prioritize job by the popularity
• High availability is not easy
• Not easy to deal with bandwidth throttling issue
![Page 101: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/101.jpg)
• Inherent problem of Unix Cron:
• Unreliable scheduling
• Hard to prioritize job by the popularity
• High availability is not easy
• Not easy to deal with bandwidth throttling issue
![Page 102: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/102.jpg)
![Page 103: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/103.jpg)
You always want your crawler to be as fast as possible
![Page 104: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/104.jpg)
However, your target server doesn't always allow you to crawl at an unlimited rate
![Page 105: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/105.jpg)
If you want to crawl data for your 2000 stocks
Stock.pluck(:id).each do |stock_id|
  SomeWorker.perform_async(stock_id)
end
Insert 2000 jobs to the queue at the same time
![Page 106: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/106.jpg)
Assume the target server accepts requests at a maximum rate of 1 request per second
![Page 107: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/107.jpg)
[Figure: timeline in seconds (1, 2, 3) showing job1 through job2000]
Insert 2000 jobs to the queue at the same time
All of your jobs may be blocked (except the first one)
![Page 108: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/108.jpg)
Improvement 1: Schedule jobs with incremental delays
Stock.pluck(:id).each_with_index do |stock_id, index|
  SomeWorker.perform_in(index, stock_id)
end
![Page 109: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/109.jpg)
[Timeline diagram: time axis in seconds (1, 2, 3, …, 2000); job1, job2, job3, …, job2000 spread out one job per second]
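The incremental-delay idea can be generalized to any allowed rate. The following is a minimal sketch in plain Ruby, not code from the talk: `staggered_delays` and its `rate_per_second` parameter are hypothetical names, and the job IDs are made up. It only computes the delays; with Sidekiq each pair would be passed to `perform_in`.

```ruby
# Hypothetical helper (not part of Sidekiq): spread jobs so that at most
# `rate_per_second` of them become due within any one-second window.
def staggered_delays(job_ids, rate_per_second: 1)
  job_ids.each_with_index.map do |job_id, index|
    delay = index / rate_per_second.to_f # seconds from now
    [job_id, delay]
  end
end

schedule = staggered_delays([101, 102, 103, 104], rate_per_second: 2)
schedule.each do |job_id, delay|
  # With Sidekiq this would be: SomeWorker.perform_in(delay, job_id)
  puts "job #{job_id} is due in #{delay}s"
end
```

With `rate_per_second: 1` this reduces to the `perform_in(index, stock_id)` loop on the slide.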
![Page 110: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/110.jpg)
Workable, but…
If the target server is unreachable
[Timeline diagram: time axis in seconds; job1, job2, job3, …, job2000]
![Page 111: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/111.jpg)
Workable, but…
If the target server is unreachable, jobs 3 to 2000 will still execute at the same time
[Timeline diagram: time axis in seconds (1, 2, 3, …, 2000); job1, job2, job3, …, job2000]
![Page 112: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/112.jpg)
• Limit your worker threads to perform a specific job at a bounded rate
• Sidekiq Enterprise provides two types of rate limiting API
![Page 113: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/113.jpg)

    CONCURRENT_LIMITER = Sidekiq::Limiter.concurrent('price', 10)

    def perform(...)
      CONCURRENT_LIMITER.within_limit do
        # crawl stock data
      end
    end
![Page 114: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/114.jpg)
Only 10 concurrent operations inside the block can happen at any given moment
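To make the "at most N inside the block" behaviour concrete, here is a minimal in-process sketch in plain Ruby. It is an illustration only and not how Sidekiq Enterprise implements its limiter (which coordinates across processes through Redis); `LocalConcurrentLimiter` is a hypothetical class, and the limit of 3 and the 10 threads are arbitrary.

```ruby
# In-process stand-in for a concurrent limiter (hypothetical, for illustration).
class LocalConcurrentLimiter
  def initialize(limit)
    @slots = SizedQueue.new(limit) # holds one token per running block
  end

  # Blocks the caller while `limit` other callers are already inside.
  def within_limit
    @slots.push(:token) # waits here when every slot is taken
    begin
      yield
    ensure
      @slots.pop # free the slot even if the block raises
    end
  end
end

limiter = LocalConcurrentLimiter.new(3)
lock = Mutex.new
running = 0
peak = 0

threads = 10.times.map do
  Thread.new do
    limiter.within_limit do
      lock.synchronize do
        running += 1
        peak = [peak, running].max
      end
      sleep 0.01 # pretend to crawl something
      lock.synchronize { running -= 1 }
    end
  end
end
threads.each(&:join)

puts "peak concurrency: #{peak}" # never more than the limit of 3
```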
![Page 115: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/115.jpg)

    BUCKET_LIMITER = Sidekiq::Limiter.bucket('price', 10, :second)

    def perform(...)
      BUCKET_LIMITER.within_limit do
        # crawl stock data
      end
    end

For every second, you can perform up to 10 operations
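The bucket idea can be sketched in plain Ruby as a counter that resets whenever a new second starts. This is a simplified, single-process illustration and not Sidekiq's Redis-backed implementation; `LocalBucketLimiter`, its `OverLimit` error, and the injectable `clock` are all hypothetical. The sketch simply raises when the bucket is full.

```ruby
# In-process sketch of a per-second bucket limiter (hypothetical, for illustration).
class LocalBucketLimiter
  class OverLimit < StandardError; end

  def initialize(limit, clock: -> { Time.now.to_i })
    @limit = limit
    @clock = clock   # returns the current second; injectable for testing
    @lock = Mutex.new
    @bucket = nil    # the second currently being counted
    @count = 0
  end

  def within_limit
    @lock.synchronize do
      now = @clock.call
      if now != @bucket # a new second started: reset the counter
        @bucket = now
        @count = 0
      end
      raise OverLimit, "bucket for second #{now} is full" if @count >= @limit
      @count += 1
    end
    yield
  end
end

fake_now = 100
limiter = LocalBucketLimiter.new(2, clock: -> { fake_now })
results = []

3.times do
  begin
    limiter.within_limit { results << :ok }
  rescue LocalBucketLimiter::OverLimit
    results << :rejected
  end
end

fake_now = 101 # the next second begins, so the bucket is empty again
limiter.within_limit { results << :ok }

p results # => [:ok, :ok, :rejected, :ok]
```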
![Page 116: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/116.jpg)
You must fine-tune the parameters of your limiter for each data source to get better performance
![Page 117: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/117.jpg)
So far, you have already achieved better performance.
However, the throttling control of your target server may not always be static.
Many websites apply throttling dynamically.
![Page 118: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/118.jpg)
If throttling is detected, pause your workers for a while
![Page 119: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/119.jpg)
Redis (job queue)
![Page 120: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/120.jpg)
[Diagram: Redis (job queue) with queues default, critical, and low]
![Page 121: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/121.jpg)
[Diagram: Redis (job queue) with queues default, critical, and low; five worker threads]
![Page 122: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/122.jpg)
![Page 123: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/123.jpg)
[Diagram: Redis (job queue) with queues default, critical, low, and yahoo; five worker threads]
![Page 124: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/124.jpg)
[Diagram: Redis (job queue) with queues default, critical, low, and yahoo (paused); five worker threads]
Pause this queue when throttled
![Page 125: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/125.jpg)
[Diagram: Redis (job queue) with queues default, critical, low, and yahoo (paused); five worker threads]
Schedule a job in another queue, executed after a few seconds, to "unpause" the paused queue
![Page 126: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/126.jpg)
[Diagram: Redis (job queue) with queues default, critical, low, and yahoo (resumed); five worker threads]
Resumed after the unpause-queue job has executed
![Page 127: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/127.jpg)

    class SomeWorker
      include Sidekiq::Worker

      def perform
        # try to crawl something
        # ...
        if throttled # `throttled` stands for your own throttling check
          queue_name = self.class.get_sidekiq_options['queue']
          queue = Sidekiq::Queue.new(queue_name)
          queue.pause!
          # resume the queue from a different queue after 30 seconds
          ResumeJobQueueWorker.perform_in(30.seconds, queue_name)
        end
      end
    end
![Page 128: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/128.jpg)
![Page 129: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/129.jpg)

    class ResumeJobQueueWorker
      include Sidekiq::Worker
      sidekiq_options queue: :queue_control, unique: :until_executed

      def perform(queue_name)
        queue = Sidekiq::Queue.new(queue_name)
        queue.unpause! if queue.paused?
      end
    end
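The slides leave the `throttled` check itself open. One possible way to fill it in is to inspect the HTTP status of the response, sketched below with Ruby's standard `Net::HTTP` response classes. The `throttled?` helper and the choice of status codes 429 and 503 are assumptions for illustration; real sites may signal throttling differently (for example with a CAPTCHA page or an empty body).

```ruby
require "net/http"

# Assumption: the target server signals throttling with one of these statuses.
THROTTLE_STATUS_CODES = [429, 503].freeze

# Hypothetical helper: true when the response looks like a throttling signal.
def throttled?(response)
  THROTTLE_STATUS_CODES.include?(response.code.to_i)
end

# Build response objects by hand so the example needs no network access.
too_many = Net::HTTPTooManyRequests.new("1.1", "429", "Too Many Requests")
ok       = Net::HTTPOK.new("1.1", "200", "OK")

puts throttled?(too_many) # true
puts throttled?(ok)       # false
```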
![Page 130: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/130.jpg)
The queue for ResumeJobQueueWorker MUST NOT be the same as the paused queue
We have a dedicated queue for ResumeJobQueueWorker
![Page 131: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/131.jpg)
Decrease the Sidekiq server's poll interval for more precise timing control
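As a configuration sketch, assuming the slide refers to Sidekiq's `average_scheduled_poll_interval` setting (the interval at which server processes check for scheduled jobs that are due), the change could look like this. The value of 2 seconds is illustrative, not a recommendation from the talk; a shorter interval means more frequent Redis polling.

```ruby
require "sidekiq"

Sidekiq.configure_server do |config|
  # Illustrative value, in seconds: check for due scheduled jobs more often
  # so the job that resumes a paused queue runs closer to its planned time.
  config.average_scheduled_poll_interval = 2
end
```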
![Page 132: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/132.jpg)
Queue pausing alleviates throttling issues.
Is it possible for us to do things even better?
![Page 133: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/133.jpg)
Most throttling controls aim to block requests from the same IP address
![Page 134: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/134.jpg)
We can change our IP address via a proxy service
![Page 135: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/135.jpg)
[Diagram: Sidekiq server sends a request to the Target server from IP a.b.c.d]
![Page 136: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/136.jpg)
[Diagram: Sidekiq server sends two requests to the Target server, both from IP a.b.c.d]
![Page 137: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/137.jpg)
[Diagram: Sidekiq server sends four requests to the Target server, all from IP a.b.c.d]
Same IP for each request
![Page 138: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/138.jpg)
[Diagram: Sidekiq server sends a request from IP a.b.c.d to the proxy service endpoint; Target server]
![Page 139: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/139.jpg)
[Diagram: Sidekiq server sends a request from IP a.b.c.d to the proxy service endpoint; a proxy server (e.f.g.h) forwards it to the Target server]
![Page 140: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/140.jpg)
[Diagram: Sidekiq server sends two requests from IP a.b.c.d to the proxy service endpoint; two proxy servers (e.f.g.h and i.j.k.l) forward them to the Target server]
![Page 141: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/141.jpg)
[Diagram: Sidekiq server sends four requests from IP a.b.c.d to the proxy service endpoint; four proxy servers (e.f.g.h, i.j.k.l, m.n.o.p, q.r.s.t) forward them to the Target server]
![Page 142: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/142.jpg)
[Diagram: Sidekiq server sends four requests from IP a.b.c.d to the proxy service endpoint; four proxy servers (e.f.g.h, i.j.k.l, m.n.o.p, q.r.s.t) forward them to the Target server]
Different IP for each request
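Routing a request through a proxy endpoint can be sketched with Ruby's standard `Net::HTTP`, which accepts a proxy host and port. The proxy address `proxy.example.com:8080`, the target URL, and the `proxied_client` helper are placeholders, not details from the talk; the sketch also assumes the proxy service itself rotates the outgoing IP behind a single endpoint, as the diagrams suggest.

```ruby
require "net/http"
require "uri"

# Hypothetical proxy service endpoint; substitute your provider's host and port.
PROXY_HOST = "proxy.example.com"
PROXY_PORT = 8080

# Build an HTTP client for `url` that sends its requests through the proxy.
def proxied_client(url, proxy_host: PROXY_HOST, proxy_port: PROXY_PORT)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port, proxy_host, proxy_port)
  http.use_ssl = (uri.scheme == "https")
  http
end

client = proxied_client("http://target.example.com/stocks/2330")
puts client.proxy?        # true
puts client.proxy_address # proxy.example.com

# client.get("/stocks/2330") would then reach the target via the proxy,
# so the target sees the proxy server's IP instead of the Sidekiq server's.
```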
![Page 143: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/143.jpg)
• Inherent problems of Unix cron:
  • Unreliable scheduling
  • Hard to prioritize jobs by popularity
  • High availability is not easy
• Not easy to deal with bandwidth throttling issues
![Page 144: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/144.jpg)
![Page 145: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/145.jpg)
• With Sidekiq (Enterprise) and a proper design, the following problems are solved:
  • Slow crawler
  • Inefficient: unable to retry only the failed jobs
  • Unpredictable server load
  • Scaling out is not easy
  • Inherent problems of Unix cron
  • Not easy to deal with bandwidth throttling issues
![Page 146: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/146.jpg)
![Page 147: Building Efficient and Reliable Crawler System With Sidekiq Enterprise](https://reader034.fdocuments.us/reader034/viewer/2022042706/5880ca051a28abba3b8b70ad/html5/thumbnails/147.jpg)