MapMyCab
preetika-kulshrestha
Transcript of MapMyCab
Motivation
• A tool for data scientists and cab dispatchers to analyze the following by time of day or day of week:
  • cab occupancy
  • miles travelled
  • pickups and drop-offs
• An app for city dwellers to view the real-time status of unoccupied cabs in a given area
Tables
• Hourly data organized by Day of Week
• Aggregate metrics stored in the same table for fast retrieval
Hourly Aggregates by Day of Week

y_m_dow (Day of Week) | c:0 (Hour 0 attributes) | c:1 (hr 1) | … | c:23 (hr 23) | c:Totals
2008_01_Mon | <pickups, dropoffs, avg_occ, avg_dist> | .. | … | .. | <sum(pickups), sum(dropoffs), avg(occ), avg(dist)>
2008_01_Tue | <pickups, dropoffs, avg_occ, avg_dist> | .. | … | .. | <sum(pickups), sum(dropoffs), avg(occ), avg(dist)>
2008_01_Wed | <pickups, dropoffs, avg_occ, avg_dist> | .. | … | .. | <sum(pickups), sum(dropoffs), avg(occ), avg(dist)>
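The wide-row layout in this table can be sketched as a pivot: hourly aggregates keyed by y_m_dow_h are folded into one row per y_m_dow, with one column per hour (c:0 to c:23) plus a c:Totals column. This is a minimal sketch assuming plain Python dicts rather than an HBase client; reading the `c:` prefix as a column-family qualifier, the name `build_rows`, and averaging occupancy and distance as an unweighted mean of the hourly values are all assumptions, not details confirmed by the slides.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Assumed cell shape: (pickups, dropoffs, avg_occ, avg_dist)
Cell = Tuple[int, int, float, float]


def build_rows(hourly: Dict[str, Cell]) -> Dict[str, Dict[str, Cell]]:
    """Pivot {'y_m_dow_h': cell} into {'y_m_dow': {'c:<hour>': cell, 'c:Totals': cell}}.

    build_rows is a hypothetical helper for illustration only.
    """
    rows: Dict[str, Dict[str, Cell]] = defaultdict(dict)
    for key, cell in hourly.items():
        row_key, hour = key.rsplit("_", 1)  # '2008_01_Mon_07' -> ('2008_01_Mon', '07')
        rows[row_key][f"c:{int(hour)}"] = cell

    # Store the aggregate in the same row so a single read returns it.
    for cols in rows.values():
        hours = list(cols.values())  # snapshot taken before c:Totals is added
        n = len(hours)
        cols["c:Totals"] = (
            sum(h[0] for h in hours),      # sum(pickups)
            sum(h[1] for h in hours),      # sum(dropoffs)
            sum(h[2] for h in hours) / n,  # avg(occ): unweighted mean (assumption)
            sum(h[3] for h in hours) / n,  # avg(dist): unweighted mean (assumption)
        )
    return dict(rows)


if __name__ == "__main__":
    demo = {
        "2008_01_Mon_00": (10, 8, 0.5, 2.0),
        "2008_01_Mon_01": (6, 7, 0.25, 4.0),
    }
    print(build_rows(demo)["2008_01_Mon"]["c:Totals"])  # (16, 15, 0.375, 3.0)
```

Keeping the totals in the same row means a dashboard can fetch one day's hourly breakdown and its summary with a single row lookup, which matches the "fast retrieval" goal stated above.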
API and Lessons Learned
• Safeguard the pipeline against corrupt data
• A well-defined workflow is essential when connecting different tools
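One way to act on the corrupt-data lesson is to filter each cab's trace before it reaches the aggregation step. The sketch below assumes records are `(lat, lon, occupancy, unix_time)` tuples already sorted by time for a single cab; the bounding box around San Francisco and the name `clean_trace` are hypothetical choices for illustration, not values taken from the project.

```python
import math
from typing import Iterable, List, Tuple

# Assumed record layout: (lat, lon, occupancy, unix_time)
Point = Tuple[float, float, int, int]

# Hypothetical sanity bounds around San Francisco; tune to the real data.
LAT_MIN, LAT_MAX = 37.0, 38.5
LON_MIN, LON_MAX = -123.0, -121.5


def clean_trace(points: Iterable[Point]) -> List[Point]:
    """Drop records that look corrupt from one cab's time-sorted trace."""
    kept: List[Point] = []
    last_ts = None
    for lat, lon, occ, ts in points:
        if not (math.isfinite(lat) and math.isfinite(lon)):
            continue  # NaN or infinite coordinates
        if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
            continue  # GPS glitch far outside the service area
        if occ not in (0, 1):
            continue  # occupancy must be a 0/1 flag
        if last_ts is not None and ts <= last_ts:
            continue  # duplicate or out-of-order timestamp
        kept.append((lat, lon, occ, ts))
        last_ts = ts
    return kept
```

Dropping bad fixes here, rather than inside each downstream tool, keeps the later stages simple because they can assume well-formed input.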
About Me
• Previous Life - Senior Energy Analyst (EnerNOC Inc.).
• M.S. Electrical Engineering - North Carolina State University (focus on robotics, control systems and smart grid).
• https://github.com/PreetikaKuls
Pipeline
[Pipeline diagram. Components: Script, Message Broker, Real-Time Streaming, HDFS, MrJob Python script, Hive, HBase, UI]
Record format at each stage:
• uid, lat, long, timestamp, occ
• y_m_dow_h, pickups, drops, dist, occ
• y_m_dow, hour(pickups, drops, dist, occ)
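The step from raw records (uid, lat, long, timestamp, occ) to hourly records (y_m_dow_h, pickups, drops, dist, occ) can be sketched in plain Python with a map-style grouping and a reduce-style pass. In an actual mrjob job this logic would live in the `mapper` and `reducer` methods of an `MRJob` subclass; plain functions are used here so the sketch runs without a cluster. Several details are assumptions: a pickup is counted on a vacant-to-occupied transition and a drop-off on the reverse, distance is the haversine distance between consecutive fixes, hours are bucketed in UTC (local San Francisco hours would need a time zone such as `America/Los_Angeles`), and the helper names are hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple

# Raw record, matching the first pipeline stage: (uid, lat, long, timestamp, occ)
Raw = Tuple[str, float, float, int, int]


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two GPS fixes, in miles."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def hour_key(ts: int) -> str:
    """UNIX seconds -> 'y_m_dow_h', e.g. '2008_01_Mon_00' (UTC, C locale assumed)."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y_%m_%a_%H")


def aggregate(records: Iterable[Raw]) -> Dict[str, Tuple[int, int, float, float]]:
    """Return {y_m_dow_h: (pickups, drops, dist_miles, avg_occ)}."""
    # "Map" step: group every fix by cab id.
    by_cab: Dict[str, List[Raw]] = defaultdict(list)
    for rec in records:
        by_cab[rec[0]].append(rec)

    # "Reduce" step: walk each trace in time order; each consecutive pair of
    # fixes is credited to the hour of the later fix.
    acc = defaultdict(lambda: [0, 0, 0.0, 0, 0])  # pickups, drops, dist, occ_sum, n
    for trace in by_cab.values():
        trace.sort(key=lambda r: r[3])
        for prev, cur in zip(trace, trace[1:]):
            bucket = acc[hour_key(cur[3])]
            if prev[4] == 0 and cur[4] == 1:
                bucket[0] += 1  # pickup: vacant -> occupied
            elif prev[4] == 1 and cur[4] == 0:
                bucket[1] += 1  # drop-off: occupied -> vacant
            bucket[2] += haversine_miles(prev[1], prev[2], cur[1], cur[2])
            bucket[3] += cur[4]
            bucket[4] += 1
    return {k: (p, d, dist, occ / n) for k, (p, d, dist, occ, n) in acc.items()}
```

Grouping by cab first is what makes pickups and drop-offs detectable: a transition only exists between two consecutive fixes of the same cab, so the hourly bucketing has to happen after each trace is sorted.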
Data
Item | SF Cabs
Description | GPS coordinates of approx. 500 SF cabs collected over 30 days
Format | [latitude (float), longitude (float), occupancy (boolean), time (timestamp)]
Size | ~500 MB
Throughput | 50-100 messages/sec (500 cabs, 5-10 min granularity)
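The Format row can be turned into a typed record with a small parser. This sketch assumes each raw line is whitespace-separated as `lat long occupancy epoch_seconds`, with occupancy encoded as 0 or 1 and time as UNIX seconds; that line layout and the `CabPoint`/`parse_line` names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CabPoint:
    """One GPS fix: [latitude, longitude, occupancy, time] (hypothetical type)."""
    latitude: float
    longitude: float
    occupied: bool
    time: datetime


def parse_line(line: str) -> CabPoint:
    """Parse 'lat long occupancy epoch_seconds' (assumed whitespace-separated).

    Raises ValueError on a wrong field count, non-numeric fields,
    or an occupancy flag other than 0/1.
    """
    lat_s, lon_s, occ_s, ts_s = line.split()
    if occ_s not in ("0", "1"):
        raise ValueError(f"occupancy must be 0 or 1, got {occ_s!r}")
    return CabPoint(
        latitude=float(lat_s),
        longitude=float(lon_s),
        occupied=occ_s == "1",
        time=datetime.fromtimestamp(int(ts_s), tz=timezone.utc),
    )
```

Raising `ValueError` on malformed input lets the caller decide whether to skip, count, or log bad lines.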
Master Data Set
• Layout 1: Time → CabID → Lat | Long | Occupancy
  Use: Retrieve all data for a given time frame where latitude and longitude fall within a specific range
• Layout 2: CabID → Timestamp → Lat | Long | Occupancy
  Use: Analyze data based on timestamp
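The time-keyed layout supports the first query as a contiguous range read over time followed by a latitude/longitude filter. The sketch below is a hypothetical in-memory stand-in (`TimeKeyedStore`) that keeps keys sorted with `bisect` to mimic a row-key range scan in a sorted key-value store; integer UNIX timestamps as keys and a half-open time window are assumptions. The CabID-keyed layout would serve per-cab queries the same way, with the cab id as the outer key.

```python
import bisect
from typing import Dict, List, Tuple

# Time-keyed layout: timestamp -> {cab_id: (lat, long, occupancy)}
Snapshot = Dict[str, Tuple[float, float, int]]


class TimeKeyedStore:
    """Hypothetical in-memory stand-in for a store with rows sorted by time."""

    def __init__(self) -> None:
        self._times: List[int] = []          # sorted row keys
        self._rows: Dict[int, Snapshot] = {}

    def put(self, ts: int, cab_id: str, lat: float, lon: float, occ: int) -> None:
        if ts not in self._rows:
            bisect.insort(self._times, ts)   # keep keys ordered for range reads
            self._rows[ts] = {}
        self._rows[ts][cab_id] = (lat, lon, occ)

    def query(self, t0: int, t1: int,
              lat_range: Tuple[float, float],
              lon_range: Tuple[float, float]) -> List[Tuple[int, str, float, float, int]]:
        """All fixes with t0 <= ts < t1 whose lat/long fall inside the box."""
        lo = bisect.bisect_left(self._times, t0)
        hi = bisect.bisect_left(self._times, t1)
        out = []
        for ts in self._times[lo:hi]:        # contiguous slice = one range scan
            for cab_id, (lat, lon, occ) in self._rows[ts].items():
                if (lat_range[0] <= lat <= lat_range[1]
                        and lon_range[0] <= lon <= lon_range[1]):
                    out.append((ts, cab_id, lat, lon, occ))
        return out
```

Putting time in the outer key means the time window narrows the read first and the spatial filter only touches rows inside that window.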
Features and Example Queries
Features
• A system that uses crowdsourcing to automatically generate parking spot information for streets
• Parking information overlaid on Google Maps
Queries
• Does West Middlefield Road allow for street parking?
• Can I park on this street for more than 2 hours?
• Which nearby streets might have better parking availability?