Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

17
Team ByteDance-SEU-Baseline Single-person Human Pose Estimation Track of CVPR '18-LIP Challenge Speaker: Zhenqi Xu Kai Su 1,2 , Dongdong Yu 2 , Zhenqi Xu 2 , Xin Geng 1 , Changhu Wang 2 1 Southeast University, 2 ByteDance AI Lab

Transcript of Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

Page 1: Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

Team ByteDance-SEU-BaselineSingle-person Human Pose Estimation Track of CVPR'18-LIP Challenge

Speaker: Zhenqi Xu

Kai Su1,2, Dongdong Yu2, Zhenqi Xu2,Xin Geng1, Changhu Wang2

1Southeast University, 2ByteDance AI Lab

Page 2: Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

Outline

● Datasets Analysis

● Method Overview

● Single Model Results

● Method Details

● Result Analysis

● Summary

● Future work

Page 3: Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

Datasets Analysis

PCKh is used as evaluation measure.

Dataset Number of images Keypoints

LIP training 30462 (29866 images is valid), validation 1w, testing 1w* All images are cropped from COCO dataset* The annotation is the same as MPII dataset* The image is already cropped, therefore no person detection is needed.

16

COCO training 14w+, validation 5k 17

MPII 28881 valid images for training 16

Page 4: Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

Method Overview

Popular Human Pose Estimation Methods

● Stacked Hourglass Networks [1]

● Cascaded Pyramid Networks [2]

[1]Newell, Alejandro, Kaiyu Yang, and Jia Deng. "Stacked hourglass networks for human pose estimation." European Conference on

Computer Vision. Springer, Cham, 2016.

[2]Chen, Yilun, et al. "Cascaded Pyramid Network for Multi-Person Pose Estimation." arXiv preprint arXiv:1711.07319 (2017).

Page 5: Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

Cascaded Pyramid Networks (CPN)

We adopted CPN, as it performs much better than Hourglass [2].

[2]Chen, Yilun, et al. "Cascaded Pyramid Network for Multi-Person Pose Estimation." arXiv preprint arXiv:1711.07319 (2017).

Method Overview

Page 6: Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

Single Model Results

Team PCKh on LIP test set

Pyramid Stream Network (Multi-Model)2nd in the CVPR'17-LIP challenge

82.1

NTHU-Pose1st in the CVPR'17-LIP challenge

87.4

CPN(Resnet-101) trained on LIP trainset 87.0* batch size is only set to 16, more batch size will perform better.

Page 7: Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

Method Details

● COCO and MPII Datasets pretraining

● Batch Size is critical

● Ensemble models trained with different backbones

Page 8: Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

COCO and MPII pretraining is critical

Team PCKh on LIP test set

Pyramid Stream Network (Multi-Model)2nd in the CVPR'17-LIP challenge

82.1

NTHU-Pose1st in the CVPR'17-LIP challenge

87.4

CPN(Resnet-101) trained on LIP trainset 87.0* batch size is only set to 16, more batch size will perform better.

CPN(Resnet-101) pretrained on COCO and MPII, finetuned on LIP

89.0* batch size is only set to 16, more batch size will perform better.

Page 9: Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

Batch Size is criticalTeam Pre-train Batch Size PCKh on LIP test set

NTHU-Pose1st in the CVPR'17-LIP challenge - - 87.4

CPN(Resnet-101)

N 16 87.0

CPN(Resnet-101)

Y 16 89.0

CPN(Resnet-101)

Y 24 89.8

Page 10: Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

Batch Size is criticalTeam Pre-train Batch Size PCKh on LIP test set

NTHU-Pose1st in the CVPR'17-LIP challenge - - 87.4

CPN(Resnet-50) Y 20 89.4

CPN(Resnet-50) Y 24 89.5

CPN(Resnet-50) Y 32 89.6

Howerver, the performance becomes saturated when increasing batch size.

Page 11: Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

Ensemble models trained with different backbones

Team Pre-train Batch Size PCKh on LIP test set

NTHU-Pose1st in the CVPR'17-LIP challenge - - 87.4

CPN(Resnet-50) Y 32 89.6

CPN(Resnet-101) Y 24 89.8

ensemble CPN(Resnet-50) & CPN(Resnet-101)

Y 32 & 24 90.2

Page 12: Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

Other Details

● Training Augmentation○ Random scale

○ Flip

○ Random rotation

● Testing Augmentation○ Flip

○ 40, -40, 20, -20 rotation

● Usually use 4 Tesla-V100 GPUs for training.

Page 13: Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

Result Analysis

● Details of our submission

*Hip is much more difficult to be located than other joints.

Page 14: Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

Summary

● CPN shows great performance for single pose estimation task.

● Pretraining on the similar datasets is critical.

● Batch size should be large enough.

● Ensemble is critical for higher performance.

● Due to “difficult” joints, more robust architectures are needed.

Page 15: Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

Future work

When we start trials, there are only about 10 days left, many works are left to do.

● Sync BN layer

● Explore more robust network architectures

Page 16: Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

Our team

Kai Su Dongdong Yu Zhenqi Xu Xin Geng Changhu Wang

Page 17: Kai SuXin Geng , Changhu Wang ByteDance AI Lab Team ...

Thanks & Questions