(Very) Loose proposal to revamp MPI_INIT and MPI_FINALIZE

Posted on 19-Jan-2017

© 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public

(Very) Loose Proposal to Revamp MPI_INIT and MPI_FINALIZE

These are the kinds of crazy ideas that we discuss at the MPI Forum

Jeffrey M. Squyres
Cisco Systems
23 September 2015

Before MPI-3.1, this could be erroneous

    // pthread start routines take and return void *
    void *my_thread1_main(void *context) {
        int flag;
        MPI_Initialized(&flag);
        // …
        return NULL;
    }

    void *my_thread2_main(void *context) {
        int flag;
        MPI_Initialized(&flag);
        // …
        return NULL;
    }

    int main(int argc, char **argv) {
        MPI_Init_thread(…, MPI_THREAD_FUNNELED, …);
        pthread_create(…, my_thread1_main, NULL);
        pthread_create(…, my_thread2_main, NULL);
        // …
    }

These two MPI_Initialized calls might run at the same time (!)

The MPI-3.1 solution

• MPI_INITIALIZED (and friends) are allowed to be called at any time
    …even by multiple threads
    …regardless of MPI_THREAD_* level
• This is a simple, easy-to-explain solution
    And probably what most applications do, anyway
• But many other paths were investigated

MPI_INIT / FINALIZE limitations

• Cannot call MPI_INIT more than once
• Cannot set error behavior of MPI_INIT
• Cannot re-initialize MPI after it has been finalized
• Cannot init MPI from different entities within a process without a priori knowledge / coordination

MPI Process

    // Library 1
    MPI_Initialized(&flag);
    if (!flag) MPI_Init(…);

    // Library 2
    MPI_Initialized(&flag);
    if (!flag) MPI_Init(…);

THIS IS INSUFFICIENT / POTENTIALLY ERRONEOUS
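
To see why "check, then init" is insufficient, the sketch below replays one unlucky interleaving by hand. It is a toy model, not MPI: `toy_initialized` and `toy_init` are hypothetical stand-ins for MPI_Initialized and MPI_Init, and the interleaving is written out sequentially so that the outcome is deterministic.

```c
#include <stdbool.h>

/* Toy stand-ins for MPI state; these are NOT real MPI calls. */
static bool toy_is_initialized = false;
static int toy_init_calls = 0;

/* Hypothetical analogue of MPI_Initialized: reports the current state. */
void toy_initialized(int *flag) { *flag = toy_is_initialized ? 1 : 0; }

/* Hypothetical analogue of MPI_Init: counts how often it is invoked. */
void toy_init(void) {
    toy_init_calls++;
    toy_is_initialized = true;
}

/*
 * Replays one possible interleaving of two libraries that each run
 * "check, then init": both checks happen before either init.
 * Returns the number of times the toy init routine was called.
 */
int replay_check_then_init(void) {
    int flag1 = 0, flag2 = 0;

    toy_initialized(&flag1);   /* Library 1 checks: not initialized */
    toy_initialized(&flag2);   /* Library 2 checks: not initialized */

    if (!flag1) toy_init();    /* Library 1 initializes */
    if (!flag2) toy_init();    /* Library 2 initializes a second time */

    return toy_init_calls;
}
```

In this replay the init routine runs twice, which is exactly what MPI_INIT does not allow. The pattern also leaves open which library is responsible for finalizing.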

1994 called.

They want their API design back.

What we should have

• Call MPI_INIT as many times as you like
• By whoever wants to call it

MPI Process

    // Library 1
    MPI_Init(…);

    // Library 2
    MPI_Init(…);

    // Library 3
    MPI_Init(…);

    // Library 4
    MPI_Init(…);

    // Library 5
    MPI_Init(…);

    // Library 6
    MPI_Init(…);

    // Library 7
    MPI_Init(…);

    // Library 8
    MPI_Init(…);

    // Library 9
    MPI_Init(…);

    // Library 10
    MPI_Init(…);

    // Library 11
    MPI_Init(…);

    // Library 12
    MPI_Init(…);

…but that has its own complications

Do you have to call MPI_FINALIZE exactly that many times?

Do you allow MPI_INIT after MPI_FINALIZE?

Or perhaps you only allow MPI_INIT before MPI has been finalized?

How can you tell if it’s safe to call MPI_INIT? Atomic “test-and-init”?

I IS CONFUSED
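
One possible set of answers to the questions above is reference counting behind an atomic "test-and-init". The sketch below is a toy model with hypothetical names (`toy_init`, `toy_finalize`, and the two counters); it does not describe how any MPI library behaves. It hard-codes two policy assumptions: finalize must be called once per init, and init after the last finalize is rejected.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy process-wide state; hypothetical, not part of any MPI library. */
static atomic_flag toy_lock = ATOMIC_FLAG_INIT;
static int toy_refs = 0;            /* outstanding toy_init calls */
static int toy_real_inits = 0;      /* times the real setup ran */
static int toy_real_finalizes = 0;  /* times the real teardown ran */
static bool toy_finalized = false;

/* Minimal spinlock so that test and init happen as one atomic step. */
static void toy_acquire(void) {
    while (atomic_flag_test_and_set_explicit(&toy_lock, memory_order_acquire)) {
        /* spin until the lock is free */
    }
}

static void toy_release(void) {
    atomic_flag_clear_explicit(&toy_lock, memory_order_release);
}

/* Returns 0 on success, -1 if the toy runtime was already finalized. */
int toy_init(void) {
    int rc = 0;
    toy_acquire();
    if (toy_finalized) {
        rc = -1;               /* assumed policy: no init after finalize */
    } else if (toy_refs++ == 0) {
        toy_real_inits++;      /* only the first caller does real setup */
    }
    toy_release();
    return rc;
}

/* Returns 0 on success, -1 if there is no matching toy_init. */
int toy_finalize(void) {
    int rc = 0;
    toy_acquire();
    if (toy_refs == 0) {
        rc = -1;               /* unmatched finalize */
    } else if (--toy_refs == 0) {
        toy_real_finalizes++;  /* only the last caller does real teardown */
        toy_finalized = true;
    }
    toy_release();
    return rc;
}

/* Unlocked accessors, intended for single-threaded inspection only. */
int toy_real_init_count(void) { return toy_real_inits; }
int toy_real_finalize_count(void) { return toy_real_finalizes; }
```

Even this small sketch has to pick answers that the questions above leave open, and every caller has to agree with those answers.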

We need something new

WARNING!

The following are just (incomplete) crazy ideas

New MPI concept: a session

    // pthread start routines take and return void *
    void *my_thread1_main(void *context) {
        MPI_Session session;
        MPI_Session_create(…, &session);

        // Do MPI things

        MPI_Session_free(&session);
        return NULL;
    }

    void *my_thread2_main(void *context) {
        MPI_Session session;
        MPI_Session_create(…, &session);

        // Do MPI things

        MPI_Session_free(&session);
        return NULL;
    }

    int main(int argc, char **argv) {
        pthread_create(…, my_thread1_main, NULL);
        pthread_create(…, my_thread2_main, NULL);
        // …
    }

Now featuring 100% less MPI_INIT!

Create communicators from sessions

    void *my_thread1_main(void *context) {
        MPI_Session session;
        MPI_Comm comm;
        MPI_Session_create(…, &session);
        MPI_Comm_create_from_session(session, &comm);

        // Do MPI things with comm

        MPI_Comm_free(&comm);
        MPI_Session_free(&session);
        return NULL;
    }

    void *my_thread2_main(void *context) {
        MPI_Session session;
        MPI_Comm comm;
        MPI_Session_create(…, &session);
        MPI_Comm_create_from_session(session, &comm);

        // Do MPI things with comm

        MPI_Comm_free(&comm);
        MPI_Session_free(&session);
        return NULL;
    }
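
The create / free ordering on this slide can be modelled without MPI. The sketch below uses hypothetical handle types and functions (`toy_session`, `toy_comm`, and the `toy_*` calls); none of them are real MPI names. It also assumes one policy that the proposal itself does not state: a session refuses to be freed while communicators created from it are still alive.

```c
#include <stdlib.h>

/* Hypothetical handle types; a real MPI library would define its own. */
typedef struct toy_session {
    int live_comms;        /* communicators still derived from this session */
} toy_session;

typedef struct toy_comm {
    toy_session *session;  /* the session this communicator came from */
} toy_comm;

/* Creates an empty session. Returns 0 on success, -1 on allocation failure. */
int toy_session_create(toy_session **session) {
    toy_session *s = malloc(sizeof *s);
    if (s == NULL) return -1;
    s->live_comms = 0;
    *session = s;
    return 0;
}

/* Derives a communicator from a session and records the dependency. */
int toy_comm_create_from_session(toy_session *session, toy_comm **comm) {
    toy_comm *c = malloc(sizeof *c);
    if (c == NULL) return -1;
    c->session = session;
    session->live_comms++;
    *comm = c;
    return 0;
}

/* Frees a communicator and resets the caller's handle to NULL. */
int toy_comm_free(toy_comm **comm) {
    (*comm)->session->live_comms--;
    free(*comm);
    *comm = NULL;
    return 0;
}

/* Assumed policy: fail with -1 while derived communicators still exist. */
int toy_session_free(toy_session **session) {
    if ((*session)->live_comms > 0) return -1;
    free(*session);
    *session = NULL;
    return 0;
}
```

With this policy, freeing in the order shown on the slide (communicator first, then session) succeeds, while the reverse order is reported as an error instead of leaving a dangling communicator.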

Problems that sessions solve

Each entity (library?) in an OS process can have its own session

Any session-local state can be encapsulated in the handle

Entities can create / destroy sessions at any time
    …in any thread
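
A minimal sketch of what "session-local state encapsulated in the handle" could look like follows. Everything here is hypothetical: the `toy_session` fields, the `errors_return` argument (one guess at how per-session error behavior might be requested), and the `toy_*` functions are illustrative assumptions, not part of MPI.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical session handle: all per-entity state lives in here. */
typedef struct toy_session {
    bool errors_return;  /* assumed session-local error behavior */
    int ops;             /* work done through this session only */
} toy_session;

/* The only process-wide state: how many sessions currently exist. */
static atomic_int toy_live_sessions = 0;

/* Creates an independent session. Returns 0 on success, -1 on failure. */
int toy_session_create(bool errors_return, toy_session **session) {
    toy_session *s = malloc(sizeof *s);
    if (s == NULL) return -1;
    s->errors_return = errors_return;
    s->ops = 0;
    atomic_fetch_add(&toy_live_sessions, 1);
    *session = s;
    return 0;
}

/* Stand-in for "Do MPI things": touches only this session's state. */
void toy_session_do_work(toy_session *session) { session->ops++; }

/* Destroys one session without affecting any other session. */
void toy_session_free(toy_session **session) {
    free(*session);
    *session = NULL;
    atomic_fetch_sub(&toy_live_sessions, 1);
}

int toy_live_session_count(void) { return atomic_load(&toy_live_sessions); }
```

Two libraries in one process could each hold their own `toy_session`, configure it differently, and free it whenever they finish, without coordinating with each other.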

…but what about MPI_COMM_WORLD?

MPI_COMM_WORLD. Sigh.

• When is MPI_COMM_WORLD created (and/or initialized)?
• When is MPI_COMM_WORLD destroyed?
• Can you use MPI_COMM_WORLD with any session?

There doesn’t seem to be an obvious relation between MPI_COMM_WORLD and individual sessions (ditto for MPI_COMM_SELF)

What if we get rid of MPI_COMM_WORLD?

Problems that solves

• Addresses logical inconsistency with session concept
• Clean separation of communicators between sub-entities
    …maybe slightly better than we have it today (sub-entities dup’ing COMM_WORLD)
• Side effects:
    Fault tolerance issues become easier
    Opens some possibilities for scalability improvements

Problems that creates

• Users will riot

…but what if they don’t?

Open questions

• What would be the forward / backward compatibility strategy?
    E.g., deprecate INIT, FINALIZE, INITIALIZED, FINALIZED…?
• What are the other arguments to MPI_SESSION_CREATE?
• Can you call both MPI_INIT and MPI_SESSION_CREATE in the same process?
• Can you do anything else with a session?

Sooo… what happens next?

Come to MPI Forum meetings

Discuss this and other scintillating MPI topics

Thank you.