Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data...
Transcript of Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data...
![Page 1: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/1.jpg)
Use of Pandas for data-analysis in allocation problems
Saurabh Kumar, Shana Moothedath & Madhu N. Belur,
Department of Electrical Engineering,
Indian Institute of Technology Bombay
SciPy, 15-Dec-15
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 1 / 13
![Page 2: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/2.jpg)
Outline
1 Two typical allotment problems
2 Two separate groups to be allotted
Course to TA (Teaching Assistant)
Exam-centre to invigilator
3 Allotment versus data-analysis
4 Pandas
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 2 / 13
![Page 3: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/3.jpg)
Allocation/Matching Problems
Given two sets U and V : find a matching between U and V .
Perfect matching: each node is matched
(requires U and V to be same size)
Often: many perfect matchings: find ‘best’ in some sense
1
2
3
a
b
c
U V
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 3 / 13
![Page 4: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/4.jpg)
Allocation/Matching Problems
Given two sets U and V : find a matching between U and V .
Perfect matching: each node is matched
(requires U and V to be same size)
Often: many perfect matchings: find ‘best’ in some sense
1
2
3
a
b
c
U V
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 3 / 13
![Page 5: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/5.jpg)
Allocation/Matching Problems
Given two sets U and V : find a matching between U and V .
Perfect matching: each node is matched
(requires U and V to be same size)
Often: many perfect matchings: find ‘best’ in some sense
1
2
3
a
b
c
U V
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 3 / 13
![Page 6: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/6.jpg)
1
2
3
a
b
c
U V
U and V have no common node. Each edge: one node in U and one in V
‘Bipartite graph’
Other examples: Hospital-patient allocation, student-college admission
But not: room-partner allotment in hostels (non-bipartite, more-complex)
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 4 / 13
![Page 7: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/7.jpg)
1
2
3
a
b
c
U V
U and V have no common node. Each edge: one node in U and one in V
‘Bipartite graph’
Other examples: Hospital-patient allocation, student-college admission
But not: room-partner allotment in hostels (non-bipartite, more-complex)
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 4 / 13
![Page 8: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/8.jpg)
1
2
3
a
b
c
U V
U and V have no common node. Each edge: one node in U and one in V
‘Bipartite graph’
Other examples: Hospital-patient allocation, student-college admission
But not: room-partner allotment in hostels (non-bipartite, more-complex)
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 4 / 13
![Page 9: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/9.jpg)
1
2
3
a
b
c
U V
U and V have no common node. Each edge: one node in U and one in V
‘Bipartite graph’
Other examples: Hospital-patient allocation, student-college admission
But not: room-partner allotment in hostels (non-bipartite, more-complex)
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 4 / 13
![Page 10: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/10.jpg)
1
2
3
a
b
c
U V
U and V have no common node. Each edge: one node in U and one in V
‘Bipartite graph’
Other examples: Hospital-patient allocation, student-college admission
But not: room-partner allotment in hostels (non-bipartite, more-complex)
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 4 / 13
![Page 11: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/11.jpg)
1
2
3
a
b
c
U V
a b c
1
2
3
5 10 15
10 5 35
15 40 10
Edges have weights: preferences, eligibilities
Find maximum ‘total weight match’:
Allotter’s decision versus fairness
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 5 / 13
![Page 12: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/12.jpg)
1
2
3
a
b
c
U V
a b c
1
2
3
5 10 15
10 5 35
15 40 10
Maximum total for 1-a, 2-c and 3-b, even though 1-a value is least
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 6 / 13
![Page 13: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/13.jpg)
1st Allocation Problem: TA (Teaching Assistant) to course
Given two sets: students S and courses C : allot S and C
1 Data: students: course preferences, their score
2 Courses: number of slots: number of TA positions
Eligibility (student performance/specialization)
3 parties: students, course instructors, HoD (administrator)
Student: each wants 1st choice
Course instructor: wants high score student as TA
Administrator ensure: all students/slots are allotted
students/courses are ‘satisfied’ most
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 7 / 13
![Page 14: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/14.jpg)
1st Allocation Problem: TA (Teaching Assistant) to course
Given two sets: students S and courses C : allot S and C
1 Data: students: course preferences, their score
2 Courses: number of slots: number of TA positions
Eligibility (student performance/specialization)
3 parties: students, course instructors, HoD (administrator)
Student: each wants 1st choice
Course instructor: wants high score student as TA
Administrator ensure: all students/slots are allotted
students/courses are ‘satisfied’ most
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 7 / 13
![Page 15: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/15.jpg)
2nd allocation Problem: invigilator/centre
GATE: national level competitive exam: critical for IITs
Two sets: invigilators I and examination centres C : allot I to C
1 Data: invigilators give preference cities.
Some invigilators are more experienced/senior.
2 Centre: number of invigilators required
Centres have different experience
Three parties: invigilators, centres, Allotter
1 Invigilators want favorite city: Goa?
2 Centres want experienced invigilators
3 Allotter: No goof-ups in exam: top priority
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 8 / 13
![Page 16: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/16.jpg)
2nd allocation Problem: invigilator/centre
GATE: national level competitive exam: critical for IITs
Two sets: invigilators I and examination centres C : allot I to C
1 Data: invigilators give preference cities.
Some invigilators are more experienced/senior.
2 Centre: number of invigilators required
Centres have different experience
Three parties: invigilators, centres, Allotter
1 Invigilators want favorite city: Goa?
2 Centres want experienced invigilators
3 Allotter: No goof-ups in exam: top priority
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 8 / 13
![Page 17: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/17.jpg)
2nd allocation Problem: invigilator/centre
GATE: national level competitive exam: critical for IITs
Two sets: invigilators I and examination centres C : allot I to C
1 Data: invigilators give preference cities.
Some invigilators are more experienced/senior.
2 Centre: number of invigilators required
Centres have different experience
Three parties: invigilators, centres, Allotter
1 Invigilators want favorite city: Goa?
2 Centres want experienced invigilators
3 Allotter: No goof-ups in exam: top priority
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 8 / 13
![Page 18: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/18.jpg)
2nd allocation Problem: invigilator/centre
GATE: national level competitive exam: critical for IITs
Two sets: invigilators I and examination centres C : allot I to C
1 Data: invigilators give preference cities.
Some invigilators are more experienced/senior.
2 Centre: number of invigilators required
Centres have different experience
Three parties: invigilators, centres, Allotter
1 Invigilators want favorite city: Goa?
2 Centres want experienced invigilators
3 Allotter: No goof-ups in exam: top priority
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 8 / 13
![Page 19: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/19.jpg)
2nd allocation Problem: invigilator/centre
GATE: national level competitive exam: critical for IITs
Two sets: invigilators I and examination centres C : allot I to C
1 Data: invigilators give preference cities.
Some invigilators are more experienced/senior.
2 Centre: number of invigilators required
Centres have different experience
Three parties: invigilators, centres, Allotter
1 Invigilators want favorite city: Goa?
2 Centres want experienced invigilators
3 Allotter: No goof-ups in exam: top priority
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 8 / 13
![Page 20: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/20.jpg)
Allotter’s requirement
Exam conduct smooth!
1 Equalize experience within centre
2 Balance: Centre experience + invigilator experiences
3 Balance work across people and weekends
1 Allot senior invigilator’s higher choices
2 Minimize average choice number in an allotment
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 9 / 13
![Page 21: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/21.jpg)
Allotment method
1 Construct weights carefully: student-score (seniority), preferences
Maximum weight matching: Hungarian method:
Munkres in Python
max weight matching using NetworkX
2 Markov-Chain-Monte-Carlo (MCMC):
pymc in Python
Allotment: outside scope of today’s talk
Today: data analysis, using Pandas
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 10 / 13
![Page 22: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/22.jpg)
Why analyze data: just allot?
Allotter’s dilemma
1 Is an allotment good?
2 Choices perhaps skewed
3 Seek more choices?
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 11 / 13
![Page 23: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/23.jpg)
Data decides how good an allotment can be
Typical questions:
1 Did people give sufficiently different choices?
2 Some course (city) too popular, some course (city) unwanted
3 1st weekend for GATE more sought than 2nd weekend (for 2016)
4 If data is skewed, allotment inevitably bad
5 Allot some manually: critical courses/cities
6 Quickly extract specific type of choosers
7 Obtain ‘breakups’: pie-charts, bar-diagrams
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 12 / 13
![Page 24: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f356067a3003b13dd6bd0ed/html5/thumbnails/24.jpg)
Why Pandas?
Pandas: simplifies data processing tasks:
1 Reading and writing files
2 Data extraction
3 Data manipulations
4 Getting statistics on data
5 Plotting and visualization
Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 13 / 13