Collaborative Mapping with Street- level Images in the...
Transcript of Collaborative Mapping with Street- level Images in the...
![Page 1: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/1.jpg)
Collaborative Mapping with Street-
level Images in the Wild
Yubin Kuang
Co-founder and Computer Vision Lead
![Page 2: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/2.jpg)
Mapillary
Mapillary is a street-level imagery platform, powered by
collaboration and computer vision.
Image
s
Dat
a
Collaboration - Image Capture
Computer Vision
Mapillary
data
SfM/3D
reconstruct
Sign
recognition
Object
recognition
Map
featuresMap
updates
OEMs/Map
Providers
![Page 3: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/3.jpg)
Any device combined with automation can scale
infinitely
Collaborative mapping -
Capture
Phone
sAction
cams360 Dashcams Cars Professional rigs
Collaborative mapping generates fresh, diverse and global map data for HD Maps
![Page 4: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/4.jpg)
Localization and Mapping
• Structure from Motion (SfM)
• Simultaneous Localization and Mapping (SLAM)
• Positioning and scale estimation
Monocular Camera + GPS
Collaborative mapping - Computer
Vision
Sensors: Monocular Camera, GPSRedundancy: Accelerometers,
Compass, IMU, LiDAR, Radar, Stereo Camera
Recognition
• Object Recognition
• Stationary objects
• Moving objects
• Semantic Scene Understanding
• Semantic relations between the map objects
Sensors: Monocular Camera
Redundancy: LiDAR, Radar, Stereo Camera,
![Page 5: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/5.jpg)
Key Components
SfMRecognition Map Data
3D reconstruction Object recognition 3D object extraction
Monocular Camera + GPS
![Page 6: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/6.jpg)
Semantic Segmentation
3D Point cloud Semantic Point Cloud
Traffic Sign Recognition
![Page 7: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/7.jpg)
Traffic Signs Poles
Map Data - Visualization and
API
Map data from 200M images accessible worldwide through
API
![Page 8: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/8.jpg)
Challenges and
Solutions
![Page 9: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/9.jpg)
Moving Objects
- Challenges:
Differentiate between the ego motion and distractor motions in the scene
- Solutions:
- Motion segmentation: Identify motion clusters in the scene and recover ego motion
- Moving object removal: Semantically ignore moving objects in SfM
A moving bus in front of the camera
![Page 10: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/10.jpg)
Moving Objects
Imag
e
Segmentation
Static vs.
Dynamic
Before After
Removal of moving objects
![Page 11: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/11.jpg)
Action Cameras Fisheye
Equirectangular (360)
Database:
- Build a database for camera intrinsics and
projection models
Calibration:
- Crowdsourced calibration
- Self-calibration with multiple images
- End-to-end self-calibration with CNN
Camera Calibration
![Page 12: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/12.jpg)
Camera Calibration
Panorama to
Perspective
Time Travel
![Page 13: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/13.jpg)
Map Updates
- Challenges:
- Traditional SfM pipeline is designed for static/batch processing
- Map updates need to be scalable and consistent
- Solutions:
- Stream processing architecture over batch processing
- Robust local reconstruction alignments under varying imaging conditions
- Distributed map updates given GPS (straightforward)
- Handling boundary conditions
![Page 14: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/14.jpg)
Annotations - Recognition
Cityscape Dataset
- 30 object classes
- 5K fine / 20K coarse annotations
- European cities
- Diverse weather/season
- Instance labels
Mapillary Vistas Dataset (MVD)
- 100 object classes
- 25K fine annotations
- 6 continents
- Diverse weather/season/cameras
- Instance labels
Neuhold et al. ICCV 2017 Mapillary
![Page 15: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/15.jpg)
Annotations - Recognition
- Challenge:
- Annotation is time-consuming in terms of specification, annotations and QA.
- Solutions:
- Synthetic data
- GAN for domain adaptation
- Active learning
- Semi-automatic annotation
- Human in the loop
![Page 16: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/16.jpg)
Annotations - Human in the loop
Machin
e
Data Human
- Challenges:
- Turnaround time from annotations to improvement of algorithms
- Quality control is generally difficult with a large crowd of people
- Solutions:
- Fully connected backend with automatic re-training
- Work with the mapping community that understands and cares the quality of map data
![Page 17: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/17.jpg)
Annotations - Human in the loop
Machine detection to human verification Tagging to machine detection
![Page 18: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/18.jpg)
Rare Objects
- Detecting rare objects (under-represented annotations) is key to the safety and map updates
- Long tail distribution for general objects on the road e.g. a koala on the road
Number of instances for each object class in Mapillary Vistas Dataset
>100K street lights <10k mailboxes
<100 ramps
![Page 19: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/19.jpg)
Rare Objects
- Use adaptive weighting in loss functions to boost performance for rare objects
Loss Max-Pooling for Semantic Image Segmentation. Rota Bulò, Neuhold and Kontschieder CVPR 2017,
Mapillary
![Page 20: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/20.jpg)
Scaling
200 million Images
3.4 million km
15.6 billion objects
190 countries
- Challenges:
- Constant and parallel updates
- Serve billions of map features via API
- Low latency and cost-effective processing
- Time-consuming training
- Solutions:
- Streaming processing over batch processing
- Geo-Index and full-text search for map features
- Optimized GPU processing in AWS ~$5K/100M images
- In-house Titan-XP cluster significantly reduces training time
![Page 21: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/21.jpg)
Map Data - Monocular Camera
![Page 22: Collaborative Mapping with Street- level Images in the Wildon-demand.gputechconf.com/gtc-eu/2017/presentation/...Differentiate between the ego motion and distractor motions in the](https://reader034.fdocuments.us/reader034/viewer/2022042212/5eb470323c9f8a4ae26730a0/html5/thumbnails/22.jpg)
Let’s map the world
together!
To Date
200 million Images
3.4 million km mapped
15.6 billion objects
190 countries