Preparing for the Poster Session
Gagan Agrawal
Outline
Background on the proposal
Overall research focus
Equipment requested
Preparing for the Site Visit
Background
A proposal submitted to the National Science Foundation's (NSF) CISE Research Infrastructure program
The program targets research equipment for multi-investigator teams doing experimental computer science - typically funds 4-5 US universities each year
After initial review of proposals, a set of universities receives a site visit. Final selection is based upon the site visit
History on the Proposal
Proposal involving 14 faculty / senior researchers across CIS, BMI, and OSC (Principal Investigators: Panda, Agrawal, Sadayappan, Shen, Saltz)
Proposal submitted in October 2002 (a 105-page document in all!)
Total request to NSF: $1,350,000 (+ matching from state of Ohio and OSU)
All funds for equipment and one full time support person to manage the equipment
Rated as one of the top three proposals among 22 submissions this year
8 universities are getting a site visit, 4-6 to be funded
Site Visit Schedule
Scheduled for 10th March, will involve two NSF program managers and 2 experts from other universities
Agenda:
Presentations about the department, our research, requested equipment
Discussion about our education programs, diversity, etc.
Meeting with Dean and Vice Provost for research
Tour of facilities and demos
A student poster session
Motivation / Goals for Poster Session
Graduate education is a key mission of NSF – they want to fund where it will make a difference to graduate education
Opportunity to show research beyond talks from PIs
A further opportunity to demonstrate a vibrant group of experimental computer science researchers
A further opportunity to stress our need for equipment
Why Should You Care
New equipment should help your research
Having an award like this will give more visibility to our group / department (will help you when you look for a job)
A good opportunity to present your work
Posters can be reused for open houses, etc.
Your advisor will be unhappy if you don’t do a good job
Rest of this talk
A big research picture that was put in the proposal
Required to show an overall vision / synergy among the investigators
Some details of the equipment and configuration requested
Things to bring out in your poster
The kinds of questions you should be prepared for
Overall Research Focus
Science and high performance computing are becoming data-driven
well recognized, for example in the cyberinfrastructure report
Clusters are a cost-effective way for
storing large datasets (i.e. serve as data repositories)
compute-intensive processing of data
SMPs are also popular architectures for compute-intensive tasks
Processing of data may not always be feasible or desirable where data is hosted
data repositories may be shared resources
may not be the best configuration for compute-intensive tasks
Grid and Cluster Computing Context
Separating processing of data from the cluster hosting the data will be the norm in a wide-area (grid) environment
However, it may also be done within an organization
many users accessing the data
different configuration may be better for compute-intensive tasks
Support for hosting data at a cluster, and processing the data at another cluster or an SMP machine is critically required
a challenging problem
Our overall focus
Research Challenges
Better intra-cluster communication and I/O support for data intensive and interactive applications, and for allowing shared access to data repositories
Need scheduling and resource sharing policies for such an environment
Need high-level programming support to use such an environment (middleware, compilers)
Algorithms from data intensive application areas (data mining, viz.) need to be modified or tuned for such an environment
Need to work with real applications and real datasets to drive the work
Many existing individual projects in these directions, but a common infrastructure will help integrate and evaluate the work
The Equipment we are asking for
Storage cluster - 24 nodes, 80 TB of storage, located at BMI
Compute cluster – 32 nodes, various interconnects (Myrinet, Quadrics, InfiniBand), located at CIS
SMP machine - approx. 16 CPU machine, located at CIS
Visualization equipment (graphics cards, haptic devices)
High-speed networking (1.0 Gb) between CIS and BMI, CIS and OSC, and BMI and OSC
Storage and compute clusters will be upgraded during the 4th year of the grant - inter-site networking up to 10 Gb
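As a rough, back-of-envelope illustration of what these figures imply (not part of the proposal itself), the sketch below estimates how long it would take to move data between sites and how much storage each storage node would hold on average. It assumes the "Gb" figures mean gigabits per second, decimal units, and an ideal, fully utilized link with no protocol overhead; the 1 TB dataset size and the function names are illustrative assumptions.

```python
# Back-of-envelope estimates. Only the 80 TB / 24 node figures and the
# 1 and 10 Gb link rates come from the slide; everything else is assumed.

STORAGE_TB = 80      # storage cluster capacity from the slide
STORAGE_NODES = 24   # storage cluster node count from the slide


def transfer_hours(size_tb: float, link_gbps: float) -> float:
    """Ideal time in hours to move size_tb terabytes over a link_gbps link.

    Assumes decimal units (1 TB = 8e12 bits, 1 Gb/s = 1e9 bits/s) and a
    fully utilized link with no protocol overhead, so real transfers
    would take longer.
    """
    bits = size_tb * 8e12
    seconds = bits / (link_gbps * 1e9)
    return seconds / 3600


def storage_per_node_tb(total_tb: float, nodes: int) -> float:
    """Average storage per node, assuming capacity is spread evenly."""
    return total_tb / nodes


if __name__ == "__main__":
    for gbps in (1.0, 10.0):
        one_tb = transfer_hours(1, gbps)
        full = transfer_hours(STORAGE_TB, gbps) / 24
        print(f"{gbps:g} Gb/s: 1 TB in {one_tb:.2f} h, {STORAGE_TB} TB in {full:.1f} days")
    per_node = storage_per_node_tb(STORAGE_TB, STORAGE_NODES)
    print(f"Average storage per node: {per_node:.2f} TB")
```

Under these idealized assumptions, a 1 TB dataset takes roughly 2.2 hours to move at 1 Gb/s and about 13 minutes at 10 Gb/s, which gives a sense of why the inter-site links matter when data is hosted at one site and processed at another.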
Overall Configuration
[Diagram labels: Compute servers (16 dual Pentium 1.0 GHz); Visualization server; Video wall; Data server (9 dual Pentium 1.0 GHz + terabytes of storage); Ohio Supercomputer Center (production clusters + storage cluster); interconnects: Gigabit Ethernet, Myrinet, GigaNet]
Configuration Within CIS
[Diagram labels. Node groups: 16 quad Pentium 700 MHz; 16 dual Pentium 300 MHz; 8 dual Pentium 2.4 GHz. Interconnects: Myrinet (Lanai 3); Myrinet (Lanai 9), InfiniBand (4), Quadrics (4); Myrinet (Lanai 7), Gigabit Ethernet (8); Myrinet (Lanai 9)]
Rationale
Need to experiment with applications on a distributed collection of compute, storage, and visualization resources
We want to study architectures for storage clusters and compute clusters, and therefore, want crashable resources
Need to work with data-intensive applications with very large datasets, and need sufficient storage for those
We want to evaluate system software in a distributed and heterogeneous environment, but need a setup that will allow repeatable experiments
Research will focus on networked clusters (and SMP machines) but is extendable to a more wide area environment through links to OSC, OSC machines, and links from OSC to elsewhere
Proposed Research
Overall theme: an integrated approach – support at low-level, incorporated into appropriate programming systems, driven or enhanced by research at algorithms level, and tested by end applications
Four components:
Communication and I/O (Panda, Lauria, Wyckoff)
Middleware and Programming Systems (Saltz, Kurc, Catalyurek, Agrawal, Saday)
Data Intensive algorithms (or application areas) – Srini, Hakan, Agrawal, Han-Wei, Raghu, Stredney (?)
End applications: Saltz et al, Stredney, Raghu, Saday, Han-Wei (?)
Area 1: Communication and I/O
Need to enhance communication and I/O mechanisms
Both at the intra-cluster and inter-cluster level
Specific needs for data-intensive and interactive applications
Components:
Support for point-to-point and collective communication, and synchronization – incorporated at the MPI, DSM layers (Panda)
Support for intra- and inter-cluster QoS (Panda)
Support for efficient and parallel I/O at intra- and inter-cluster level (Lauria)
Area 2: Middleware and Programming Systems
Goal: High-level programming systems and policies are required to utilize multiple clusters and SMP machines
Components:
DataCutter (Saltz, Kurc, Catalyurek)
Compiler support on top of DataCutter (Agrawal et al.)
Scheduling task graphs (Saday et al.)
Scheduling across multiple tasks (Saday)
Multiple Query Optimization (Saltz et al.)
Middleware for Datamining (Agrawal)
Indexing and declustering for data repositories (Hakan)
Area 3: Data Intensive Algorithms
Need to develop and/or fine-tune and/or evaluate algorithms and techniques in the areas of data mining, scientific data analysis, and visualization, in our proposed environment and on top of the programming systems developed
Components:
Parallel data mining algorithms, particularly shared memory (Srini, Agrawal)
Scientific data analysis (Machiraju, Srini)
Visualization and imaging etc. (Han-Wei, Raghu)
Area 4: End Data Intensive Applications
We are working with end data-intensive, data-driven, interactive, and/or collaborative applications
to evaluate our work at the communication and I/O, programming systems, and algorithm levels
to obtain large datasets
to demonstrate that our research can benefit real end applications
Components:
Time-varying scientific data visualization (Han-Wei)
Oil reservoir simulation (Saltz)
Medical applications (Saltz, Shen, Stredney, Machiraju)
Scientific (chemistry) application (Saday)
3-d human scan analysis (Machiraju)
Things to bring out in your posters
Interesting experimental computer science research
Involving system software, large datasets, careful performance analysis on dedicated systems, or a distributed environment
Preferably some preliminary experimental results
Show we can do quality experimental research
Demonstrate need for more equipment, if appropriate (part of future work?)
Mention existing or potential collaborations, if appropriate
Some Questions to be Prepared for
What equipment have you used so far?
Do you feel the need for any additional equipment?
For systems posters: what benchmarks/applications you might be using in the future
See if any of the existing work in the areas of visualization, data mining, end applications may be appropriate
For algorithm / application posters: what system support you could use for scaling your work, or going to distributed environments
See if any of the work on QoS, DataCutter, FREERIDE, Scheduling may be relevant
Some Logistics
A rehearsal session on 28th Feb, 3:30 – 4:30, DL 480
Final site visit on 10th March, poster session 1:30 – 2:30 - set up from 11:30 onwards, plan to be available till 3:30 - room TBA
Poster size – 30 inch width, 36 inch height – can have 9-12 slides
Can use department poster printer (ask your advisor) – don’t use it for rehearsal
Be professional during the site visit – no unnecessary talking among yourselves, no use of Hindi / Chinese / …
Dress code - ?