PROPOSAL

Proposal full title: MOTIF-BASED SEQUENCE ANALYSIS TOOLS

Proposal acronym: MSAT

Type of funding scheme: IP

Work programme topics addressed: ICT-2009.5.3: Virtual Physiological Human

Name of the coordinating persons: Darko Živanović (mi06031@alas.matf.bg.ac.rs), Aleksandar Stefanović


MSAT: Motif-Based Sequence Analysis Tools


Project partners

| Part. No. | Organization name | Short name | Country |
|---|---|---|---|
| 1 (Coordinator) | Faculty of Mathematics, University of Belgrade (http://www.matf.bg.ac.rs) | MATF | Serbia |
| 2 | Institute for Genomics and Bioinformatics, Graz University of Technology (http://genome.tugraz.at) | GENOME | Austria |
| 3 | Department of Engineering, Katholieke Universiteit (http://www.esat.kuleuven.ac.be) | GGS | Belgium |
| 4 | Bioinformatics Centre, University of Copenhagen (http://www.binf.ku.dk) | BINF | Denmark |
| 5 | Center for Biological Sequence Analysis, Technical University of Denmark (http://www.cbs.dtu.dk) | CBS | Denmark |
| 6 | Department of Informatics, University of Bergen (http://www.ii.uib.no) | II | Norway |
| 7 | Bonn-Aachen International Center for Information Technology (http://www.b-it-center.de/) | BIT | Germany |
| 8 | Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics (http://cmb.molgen.mpg.de) | CMB | Germany |
| 9 | Faculty of Computer Science, Technische Universität Dresden (http://www.inf.tu-dresden.de) | INF | Germany |
| 10 | School of the Biological Sciences, University of Cambridge (http://bio.cam.ac.uk) | BIO | UK |
| 11 | MediaPrimer, Coimbra (http://www.mediaprimer.pt) | MP | Portugal |


Proposal abstract

Focusing on crucial identified challenges related to the Virtual Physiological Human, MSAT will develop a tool for discovering motifs in a group of related DNA or protein sequences. A motif is a sequence pattern that occurs repeatedly in a group of related protein or DNA sequences. MSAT will represent motifs as position-dependent letter-probability matrices, which describe the probability of each possible letter at each position in the pattern. Patterns with variable-length gaps will be split by MSAT into two or more separate motifs. MSAT will take a group of DNA or protein sequences as input and will output as many motifs as requested. It will use statistical modeling techniques to automatically choose the best width, number of occurrences, and description for each motif. The resulting software toolkit will be disseminated via scientific meetings, the media, and investor contacts.
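To make the motif representation above concrete, here is a minimal Python sketch, assuming a DNA alphabet and a small pseudocount (0.25 per letter) so that unseen letters do not get probability zero. The function names `letter_probability_matrix` and `site_probability` are illustrative only, not part of any MSAT interface. Each row of the matrix holds the probability of each letter at one motif position, estimated from aligned, ungapped occurrences.

```python
from collections import Counter

DNA = "ACGT"  # assumption: DNA alphabet; proteins would use the 20 amino-acid letters


def letter_probability_matrix(sites, alphabet=DNA, pseudocount=0.25):
    """Build a position-dependent letter-probability matrix from aligned,
    equal-width (ungapped) motif occurrences. Row i gives P(letter | position i)."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same width (ungapped motif)")
    total = len(sites) + pseudocount * len(alphabet)
    matrix = []
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        # Pseudocounts keep letters unseen at this position from getting probability zero.
        matrix.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return matrix


def site_probability(matrix, site):
    """Probability of a candidate site under the motif model,
    treating positions as independent."""
    p = 1.0
    for row, letter in zip(matrix, site):
        p *= row[letter]
    return p


sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
matrix = letter_probability_matrix(sites)
print(round(matrix[0]["T"], 2))  # prints 0.85: all 4 sites have T at position 1
```

A pattern with a variable-length gap would, as stated in the abstract, be represented by two or more such matrices rather than a single one.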


Table of Contents

1. Scientific and/or technical quality, relevant to the topics addressed by the call

1.1 Concepts and objectives

1.2 Progress beyond the state-of-the-art

1.3 S/T Methodology and associated work plan

1.3.1 The overall strategy of the work plan

1.3.2 Gantt chart

1.3.3 Detailed work description

1.3.3.1 Work package list

1.3.3.2 Deliverables list

1.3.3.3 List of milestones

1.3.3.4 Description of each work package

1.3.3.5 Summary effort table

1.3.4 Pert chart

1.3.5 Significant risks and associated contingency plans

2. Impact

2.1 Expected impacts listed in the work programme

2.2 Dissemination and/or exploitation of project results

3. Ethical issues


1. Scientific and/or technical quality, relevant to the topics addressed by the call

1.1 Concepts and objectives

The proposed system will allow users to discover signals (called 'motifs') in DNA or protein sequences. The user of MSAT will input a set of sequences believed to share some (unknown) sequence signal(s). For example, some or all of a set of promoters from co-expressed and/or orthologous genes may contain binding sites (the 'signal') for the same transcription factor. Similarly, a set of proteins that interact with a single host protein may do so via similar domains (the 'signal'). Both types of sequence signals can often be represented as motifs: ungapped, approximate sequence patterns. Using a process akin to gapless, local, multiple sequence alignment, MSAT will search for statistically significant motifs in the input sequence set. In this way, MSAT will be able to discover the binding sites for the shared transcription factor in the set of promoters, or the common protein-protein binding domains in the set of proteins. MSAT will also be used to discover motifs describing many other types of DNA or protein signals besides transcription factor binding sites and protein-protein interaction domains.

A typical use case will be as follows. The user provides a set of sequences in FASTA format, either by uploading a file or by cut-and-paste. The only other required input is an email address to which the results will be sent. By default, MSAT will look for up to three motifs, each of which may be present in some or all of the input sequences, and will choose the width and number of occurrences of each motif. By default, only motif widths between 6 and 50 will be considered, but the user will be able to change this, as well as several other aspects of the motif search.

Concept of the Proposed Project: During the course of the project, several state-of-the-art algorithms will be developed and integrated to form the proposed system. All algorithms will work on-line.
Moreover, a novel data architecture will be developed to support data from heterogeneous sources (in various formats) and to enable semantically elevated analyses.

Goals and S/T Objectives: During the first year, the goals are:

a) To develop algorithms for discovering motifs in DNA or protein sequences and binding sites for the shared transcription factor in the set of promoters or the common protein-protein binding domains in the set of proteins;

b) To develop a new data architecture that supports semantically elevated analyses of data of various types and formats;

c) To develop a prototype system that accesses and processes data from heterogeneous sources.
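Goal (a) and the default use case described earlier can be illustrated with a deliberately simple Python sketch: FASTA parsing followed by a greedy, gapless search that assumes exactly one motif occurrence per sequence. This is a swapped-in textbook heuristic (greedy profile search with pseudocounts against a uniform background), not MEME's EM algorithm and not the algorithm MSAT will develop; all function names are hypothetical. It works at one fixed width; choosing among the default widths of 6 to 50 would need a statistical significance measure, which is not shown.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: DNA input


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks).upper()))
            name, chunks = line[1:].strip(), []
        elif name is not None:  # ignore text before the first header
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks).upper()))
    return records


def profile(sites, pseudocount=0.25):
    """Letter-probability matrix (one dict per position) with pseudocounts."""
    total = len(sites) + pseudocount * len(ALPHABET)
    rows = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        rows.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return rows


def log_odds(rows, window, background=0.25):
    """Log-likelihood ratio of a window: motif model vs. uniform background."""
    return sum(math.log(rows[i][ch] / background) for i, ch in enumerate(window))


def windows(seq, width):
    """All gapless windows of the given width that contain only A, C, G, T."""
    return [seq[i:i + width] for i in range(len(seq) - width + 1)
            if set(seq[i:i + width]) <= set(ALPHABET)]


def greedy_motif(seqs, width):
    """One occurrence per sequence: seed with each window of the first sequence,
    then add the best-scoring window from each remaining sequence in turn."""
    best = None
    for seed in windows(seqs[0], width):
        sites = [seed]
        for seq in seqs[1:]:
            candidates = windows(seq, width)
            if not candidates:
                break  # no valid window here, so this seed cannot satisfy the model
            rows = profile(sites)
            sites.append(max(candidates, key=lambda w: log_odds(rows, w)))
        else:  # runs only if every sequence contributed a site
            rows = profile(sites)
            score = sum(log_odds(rows, s) for s in sites)
            if best is None or score > best[0]:
                best = (score, sites)
    return best  # (score, sites), or None if no seed worked


fasta = """>s1
GGCGTATAATCGCG
>s2
CCGTATAATGGC
>s3
GCTATAATCCGG
"""
seqs = [seq for _, seq in parse_fasta(fasta)]
score, sites = greedy_motif(seqs, width=6)
print(sites)  # ['TATAAT', 'TATAAT', 'TATAAT']
```

Seeding from every window of the first sequence keeps the search deterministic, but it misses motifs absent from that sequence; this is one reason a one-occurrence-per-sequence model is only a starting point when a motif may appear in some, but not all, input sequences.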

During the second year, the goals are:


a) To test the developed technology on statistically large test beds, using the prototype systems;

b) To enhance the accuracy and performance of the algorithms and the developed prototype system;

c) To analyze and compare the obtained results.

The main objectives of the proposed research are:

Objective 1: Development of an appropriate data architecture. An adequate data architecture is of vital importance for the efficient discovery of motifs in DNA or protein sequences. The key issues are: (1) the architecture should accept data from heterogeneous sources, commonly in various formats; (2) the semantic component of the data should be addressed properly; and (3) the architecture should also deal with (partially) missing and uncertain data. Success criteria:

Data architecture capable of encapsulating various data types.

Objective 2: To choose the subset of applications of the developed technology that will be covered by the Prototype implementation, to specify the system requirements, and to develop the content for the Prototype application. Based on the MSAT data architecture, a prototype for the efficient discovery of motifs in DNA or protein sequences will be developed. First, a complete set of requirements for the supporting computer system will be identified; then a subset of features will be selected for implementation in the prototype system for demonstration purposes. The results will be delivered as the final software architecture and the prototype architecture. The data will be specified, acquired, and prepared for use in the Prototype system. Success criteria:

Elaborated system requirements. Elaborated prototype requirements subset.
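Requirement (3) of Objective 1, handling (partially) missing and uncertain data, has a simple concrete case at the sequence level: ambiguous nucleotides written with IUPAC codes such as N (any base) or R (A or G). The Python sketch below shows one possible convention, assumed here rather than taken from the proposal: an uncertain letter is scored by pooling the motif and background probabilities of all bases it may stand for, so N contributes nothing to the score. The function name is hypothetical, and the matrix is assumed to be a list of per-position dictionaries mapping A, C, G and T to probabilities.

```python
import math

# IUPAC nucleotide codes mapped to the bases each one may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def ambiguous_log_odds(matrix, window, background=None):
    """Log-odds score of a window that may contain IUPAC ambiguity codes.

    Assumed convention: an uncertain letter contributes
    log(sum of motif probabilities of compatible bases /
        sum of their background probabilities),
    so 'N' contributes (up to rounding) 0 and neither supports nor
    penalizes the motif.
    """
    background = background or {b: 0.25 for b in "ACGT"}
    score = 0.0
    for row, code in zip(matrix, window.upper()):
        bases = IUPAC[code]  # raises KeyError for letters outside the IUPAC set
        score += math.log(sum(row[b] for b in bases) /
                          sum(background[b] for b in bases))
    return score


# Toy 3-position motif that prefers A everywhere (illustrative values).
matrix = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1} for _ in range(3)]
print(ambiguous_log_odds(matrix, "ANA"))  # equals the score of the two known A's
```

Pooling over compatible bases keeps partially known positions informative (R still favours this A-rich motif) instead of discarding every window that contains an ambiguity code.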

Objective 3: To develop a prototype system. A prototype system will be developed to cover the applications in the specified use cases. The Prototype will cover the full workflow required by a complete solution and will provide an extendable core architecture that enables integration of the developed algorithms. Success criteria:

Prototype solution implemented. Use case successfully made on the prototype. Data handling performance meets functional prototype requirements.


Objective 4: To develop intelligent agents and algorithms. This subsystem will learn from previous data accesses. The algorithms required for intelligent routing of the data entering the system will be developed, and integration of the subsystem will also be addressed. Success criteria:

The intelligent agents and algorithms developed. Algorithms achieve adequate performance.

Objective 5: To test the developed technology on a large statistical test base, using the Prototype system. Test scenarios will be selected and developed, and a large test base will be obtained using data from multiple domains. The objective includes the definition, preparation and execution of the test scenarios, and the preparation of an overall test plan to enable further analysis and optimization. Success criteria:

Test cases developed according to the needs of the beneficiaries. Test cases pass.
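For Objective 5, one common way to build a statistically large test base with known answers is to plant a motif in random background sequences and measure how often a motif finder recovers the planted sites. The Python sketch below assumes uniform random DNA as the background and measures recovery by site start position; both choices, and the function names, are illustrative assumptions rather than the project's test plan. A recovery rate computed this way could feed measures such as the percentage of passed tests in Table 1.

```python
import random


def planted_motif_dataset(n_seqs, length, motif, mutation_rate=0.0, seed=0):
    """Generate uniform random DNA sequences, each with one copy of `motif`
    planted at a random position. Each motif letter is, with probability
    `mutation_rate`, replaced by a random base (possibly the same one)."""
    rng = random.Random(seed)  # fixed seed makes the test scenario reproducible
    width = len(motif)
    if width > length:
        raise ValueError("motif is longer than the sequences")
    seqs, positions = [], []
    for _ in range(n_seqs):
        letters = [rng.choice("ACGT") for _ in range(length)]
        pos = rng.randrange(length - width + 1)
        letters[pos:pos + width] = [
            rng.choice("ACGT") if rng.random() < mutation_rate else b for b in motif
        ]
        seqs.append("".join(letters))
        positions.append(pos)
    return seqs, positions


def site_recovery_rate(predicted, planted, tolerance=0):
    """Fraction of sequences whose predicted site start lies within
    `tolerance` positions of the planted one."""
    hits = sum(abs(p - t) <= tolerance for p, t in zip(predicted, planted))
    return hits / len(planted)


# A harder scenario: 20 sequences of length 200 with a noisy planted site.
seqs, planted = planted_motif_dataset(20, 200, "TTGACA", mutation_rate=0.1, seed=42)
# predicted = run_motif_finder(seqs)  # hypothetical call to the system under test
# print(site_recovery_rate(predicted, planted))
```

Varying `mutation_rate`, sequence length and the number of sequences gives a family of scenarios of graded difficulty from a single generator.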

Objective 6: To estimate and enhance the performance. The key issue of this objective is to conduct a system performance analysis and, based on its results, to optimize the system. A quantitative analysis of the computational efficiency of the developed algorithms will be conducted. Success criteria:

Improvements of the algorithms proposed. Prototype performance optimized.
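As a starting point for the quantitative analysis in Objective 6, the Python sketch below pairs an operation count with a wall-clock measurement for the simplest motif-related operation: scoring every gapless window of a sequence against a letter-probability matrix. It assumes a uniform toy matrix and a repeated synthetic sequence, and the function names are hypothetical; the algorithms actually developed in MSAT would be measured in a similar way but are not modelled here.

```python
import time


def window_evaluations(lengths, width):
    """Letter lookups needed to score every gapless window of `width` once
    in sequences of the given lengths: sum over sequences of (L - width + 1) * width."""
    return sum(max(0, n - width + 1) * width for n in lengths)


def time_scan(seq, width, repeats=3):
    """Best-of-`repeats` wall-clock time (seconds) to score all windows of
    `seq` with a toy uniform letter-probability matrix."""
    matrix = [{a: 0.25 for a in "ACGT"} for _ in range(width)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for i in range(len(seq) - width + 1):
            p = 1.0
            for j in range(width):
                p *= matrix[j][seq[i + j]]
        best = min(best, time.perf_counter() - start)
    return best


for n in (1_000, 2_000, 4_000):
    seq = "ACGT" * (n // 4)
    # If the scan is linear in sequence length, doubling n should roughly
    # double both the operation count and the measured time.
    print(n, window_evaluations([n], 8), f"{time_scan(seq, 8):.5f}s")
```

Taking the best of several runs reduces the effect of one-off interference from other processes; the absolute times depend on the machine and are meaningful only relative to each other.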

Objective 7: To disseminate project results to both the scientific and the non-scientific public, and to prepare the targeted beneficiaries (end users) for the uptake of the project's results. The dissemination effort will have two dimensions: (a) scientific and technical meetings, and (b) activities that use the media to prepare the public and the market for this new approach and product. These activities will include defining a dissemination strategy plan, organizing MSAT events, establishing a project website, and developing a project logo and promotional materials. Success criteria:

Quality of presentation. Success rate of campaign.


The following table shows that all success criteria of the objectives are measurable:

| No. | Success criterion | Objective | Measure |
|---|---|---|---|
| 1 | Data architecture capable of encapsulating various data types. | 1 | Percentage of planned data types incorporated. |
| 2 | Elaborated system requirements. | 2 | Percentage of the project objectives included in the requirements above the acceptance level. |
| 3 | Elaborated prototype requirements subset. | 2 | Requirements pass the beneficiaries' acceptance level. |
| 4 | Prototype solution implemented. | 3 | Percentage of prototype requirements met above the acceptance level. |
| 5 | Use case successfully made on the prototype. | 3 | Use case test coverage above the acceptance level. |
| 6 | Data handling performance meets functional prototype requirements. | 3 | Percentage of functional prototype requirements met above the acceptance level. |
| 7 | The intelligent agents and algorithms developed. | 4 | Percentage of non-functional prototype requirements met above the acceptance level. |
| 8 | Algorithms achieve adequate performance. | 4 | Percentage of non-functional prototype requirements regarding performance met above the acceptance level. |
| 9 | Test cases developed according to the needs of the beneficiaries. | 5 | Test coverage of the functional requirements provided by the beneficiaries above the acceptance level. |
| 10 | Test cases pass. | 5 | Percentage of passed tests in the test scenarios above the acceptance level. |
| 11 | Improvements of the algorithms proposed. | 6 | Percentage of the algorithms rated important in the requirements whose performance is at the state-of-the-art optimum level. |
| 12 | Prototype performance optimized. | 6 | Performance of the most important algorithms (according to the requirements) at the level set by the non-functional requirements. |
| 13 | Quality of presentation. | 7 | Ratio of positive vs. negative feedback. |
| 14 | Success rate of campaign. | 7 | Price/performance ratio of the campaign. |

Table 1: Measures for success criteria.


1.2 Progress beyond the state-of-the-art

Although the research examples discussed below do contribute a number of innovative approaches, none of them is holistic enough to offer an integrated approach; this is where this proposal goes beyond the current state-of-the-art. To clarify this point, the following text reviews representative state-of-the-art research. The conclusion is that this proposal offers a holistic approach not yet found in the scientific literature.

Source: Timothy L. Bailey, Nadya Williams, Chris Misleh, and Wilfred W. Li, "MEME: discovering and analyzing DNA and protein sequence motifs", Nucleic Acids Research, Vol. 34, pp. W369-W373, 2006.

MEME (Multiple EM for Motif Elicitation) is one of the most widely used tools for searching for novel 'signals' in sets of biological sequences. Applications include the discovery of new transcription factor binding sites and protein domains. MEME works by searching for repeated, ungapped sequence patterns that occur in the DNA or protein sequences provided by the user. However, the MEME web server does not offer an option that lets the user upload a background sequence model. Furthermore, algorithms for removing low-complexity regions and repeated elements were not implemented.

1.3 S/T Methodology and associated work plan

1.3.1 The overall strategy of the work plan

Note that the main organizational principle of this proposal is to link project objectives and work packages on the basis of a 1:1 correspondence (each project objective is encapsulated in a separate work package). Success criteria and milestones are also linked 1:1 (the major activity at each milestone is to check whether the success criteria are satisfied). Each WP has one or more tasks, and each task has one or more deliverables. The following is extremely important to keep in mind when evaluating this proposal:

a) Since the dissemination of project results is the Achilles' heel of most research and development projects, we have dedicated a special WP to dissemination in the widest sense: (1) to inform the scientific and technical community about the project results, (2) to create an impact on related businesses, and (3) to prepare the market for a new product. Consequently, a partner experienced in preparing business plans has been included to help market the MSAT vision (i.e., to turn it into a market product in an entrepreneurial way).

b) Since a Prototype implementation is a condicio sine qua non for this type of innovative effort, special attention is dedicated to testing the Prototype implementation and its applications. Here, a Prototype implementation means an implementation of a highly reduced subset of the system that is applicable to real-world situations and can help prove the applicability of the concept in real-world scenarios of interest for the final application.

In order to achieve the overall project objectives, the following items are considered as crucial:

• Competent researchers well aware of the state of the art in the research field;

• State-of-the-art research infrastructure;

• Excellent ability to work, communicate and collaborate with researchers from various backgrounds, in various circumstances and environments;

• Large network of contacts in the research community;

• Public awareness of the benefits of the research in a chosen field;

• Promotion of research results and achievements.

The project is organized into 8 work packages (WP), as follows:

• WP0: Project management, BIO;

• WP1: Development of MSAT data architecture, GENOME;

• WP2: Development of a prototype system, INF;

• WP3: Development of intelligent agents and algorithms, MATF;

• WP4: Testing of the Prototype, II;

• WP5: Estimating and enhancing the performance, GGS;

• WP6: Development of MSAT system, CMB;

• WP7: Informing the scientific and technical environment, investors, and market, MP.

1.3.2 Gantt chart

1.3.3 Detailed work description

WP0: Project Management

WP0 (Project Management) will ensure smooth implementation of the project through objective-driven supervision, quality control and overall coordination of the research tasks described in work packages 1-7, adapted to the requirements of this IP project. It will guarantee constant supervision of all partner-related issues, anticipate major problems in advance, and provide sound administrative and financial management coordination based on accepted rules and FP7 guidelines.

WP1: Development of the MSAT data architecture


The goal of this work package is to develop a novel data architecture (hereafter, the MSAT data architecture) for storing and handling semantically elevated data gathered from heterogeneous data sources.

WP2: Development of a prototype system

The main goal of this work package is to implement a prototype computer system. The Prototype will include all elements necessary for testing the developed algorithms and concepts, and will also serve to explore issues of crucial importance for the development of the full-blown system.

WP3: Development of intelligent agents and algorithms

This is an extensive and very important work package, as it is responsible for bringing elements of artificial intelligence into the system. It includes the development of algorithms for discovering motifs in DNA or protein sequences, such as binding sites for a shared transcription factor in a set of promoters or common protein-protein binding domains in a set of proteins.

WP4: Testing of the Prototype

The aim of this work package is to test the Prototype developed in WP2. The data will be carefully selected and prepared to serve as a basis for testing the algorithms integrated in the Prototype. The tests will show how the algorithms work, both individually and together as a fine-tuned system. The results and feedback will be taken into consideration and will serve as guidelines and potential improvement points for the full-blown system.

WP5: Estimating and enhancing the performance

The aim of WP5 is to provide a quantitative analysis of the computational efficiency of the developed algorithms.

WP6: Development of MSAT system

The aim of this work package is to implement the full-blown system.
WP7: Informing the scientific and technical environment, investors, and market

This ongoing work package focuses on promoting the project and its participants, particularly research centers from the EU, and on raising general public and industry awareness of the potential benefits of deploying and exploiting the research results. Different channels will be employed to promote the project. A website will be developed to inform the public about project progress and results, and a unified project logo will be produced to give the project a consistent public image.


1.3.3.1 Work package list

| Work package No. | Work package title | Type of activity | Lead partic. No. | Lead partic. short name | Person-months | Start month | End month |
|---|---|---|---|---|---|---|---|
| WP0 | Project management | MGT | 10 | BIO | 54 | M1 | M24 |
| WP1 | Development of the MSAT data architecture | RTD | 2 | GENOME | 71 | M1 | M6 |
| WP2 | Development of a prototype system | RTD | 9 | INF | 189 | M6 | M12 |
| WP3 | Development of intelligent agents and algorithms | RTD | 1 | MATF | 202 | M6 | M12 |
| WP4 | Testing of the Prototype | RTD | 6 | II | 96 | M12 | M18 |
| WP5 | Estimating and enhancing the performance | RTD | 3 | GGS | 63 | M18 | M24 |
| WP6 | Development of MSAT system | RTD | 8 | CMB | 103 | M18 | M24 |
| WP7 | Informing the scientific and technical environment, investors, and market | OTHER | 11 | MP | 32 | M1 | M24 |


1.3.3.2 Deliverables list

| Del. No. | Deliverable name | WP No. | Nature | Dissemination level | Delivery date |
|---|---|---|---|---|---|
| D0.1 | Minutes from each Steering Committee meeting | 0 | R | PU | M1 (kick-off meeting), M6, M12, M18, M24 |
| D0.2 | Periodic reports at periods specified in the Grant Agreement | 0 | R | PU | M6, M12, M18, M24 |
| D0.3 | Signed Consortium Agreement | 0 | R | PU | M1 |
| D0.4 | Final reports at the conclusion of the project | 0 | R | PU | M24 |
| D1.1 | Specification of the MSAT data architecture | 1 | R | RE | M6 |
| D1.2 | Specification of the algorithms | 1 | R | PU | M6 |
| D2.1 | Skeleton of the prototype | 2 | P | RE | M9 |
| D2.2 | Integrated prototype system | 2 | P | CO | M12 |
| D3.1 | Specification and implementation of algorithms for discovering motifs | 3 | R | CO | M9 |
| D4.1 | Specification of the test scenarios and data | 4 | R | CO | M15 |
| D4.2 | Report on testing results | 4 | R | RE | M15 |
| D4.3 | Analysis of the testing results | 4 | R | RE | M18 |
| D4.4 | Implementation recommendations | 4 | R | RE | M18 |
| D5.1 | Report on simulation results | 5 | R | RE | M22 |
| D5.2 | Report on optimization recommendations | 5 | R | RE | M22 |
| D5.3 | Implementation recommendations | 5 | R | CO | M24 |
| D6.1 | Integrated full-blown system | 6 | P | CO | M24 |
| D7.1 | Dissemination plan | 7 | | PU | M1 |
| D7.2 | Project webpage | 7 | | PU | M2 |
| D7.3 | Scientific articles | 7 | R | PU | M24 |
| D7.4 | Articles published in local printed media | 7 | R | PU | M12, M18, M24 |

1.3.3.3 List of milestones

| Milestone No. | Milestone name | WP(s) involved | Expected date | Means of verification |
|---|---|---|---|---|
| M0.1 | Kick-off meeting | WP0 | M1 | Minutes from kick-off meeting |
| M0.2 | Consortium agreement signed | WP0 | M1 | Consortium agreement with partner signatures |
| M0.3 | Project shut down - all deliverables achieved | WP0 | M24 | Final deliverables report |
| M1 | Finished analysis and design specification | WP1 | M6 | Comparison to the project objectives |
| M2 | Developed prototype architecture | WP2 | M12 | Comparison to the specification |
| M3 | Developed intelligent algorithms and agents | WP3 | M12 | Comparison to the specification |
| M4 | Prototype tested | WP4 | M18 | Comparison with functional requirements |
| M5 | Prototype performance tested | WP5 | M24 | Comparison to the test plan |
| M6 | MSAT prototype improved | WP2, WP5 | M24 | Comparison to the specification and improvement recommendations |
| M7 | Developed MSAT full-blown system | WP6 | M24 | Comparison to the specification |


1.3.3.4 Description of each work package

Work package number: 0 | Start date or starting event: Beginning of M1

Work package title: Project management

Activity type: MGT

| Participant number | 10 | All other partners | Total |
|---|---|---|---|
| Participant short name | BIO | | |
| Person-months per participant | 49 | 0.5 per partner | 54 |

Objectives:

1. Ensuring objective-driven supervision, quality control and overall coordination of the research tasks described in work packages 1-7, adapted to the requirements of this IP project.

2. Constant supervision of all partner-related issues, anticipating any major problems in advance.

3. Providing sound administrative and financial management coordination based on accepted rules and FP7 guidelines.

4. Setting up a framework for communication and managing communication flow among partners.

WP Leader: BIO

Description of work:

T0.1: Consortium management. The management of the Consortium concerns coordination of the project from the standpoint of partner-related issues.

T0.2: Research (content) management. Content management concerns the day-to-day management of the actual work on the project and its progress, ensuring that the deliverables and milestones are achieved on time, on budget and with the required quality (quality control).

T0.3: Risk management.

T0.4: Change management.

T0.5: Quality control management.

All the above tasks will be performed by BIO with the support of all other partners.

Deliverables:


D0.1: Minutes from each Steering Committee meeting (M1, M6, M12, M18, M24).

D0.2: Periodic reports at periods specified in the Grant Agreement (M6, M12, M18, M24).

D0.3: Signed Consortium Agreement (M1).

D0.4: Final reports at the conclusion of the project (M24).

Milestones:

M0.1: Kick-off meeting (M1).

M0.2: The consortium agreement with all partners is signed (M1).

M0.3: Project shut down - all deliverables achieved (M24).

Work package number: 1 | Start date or starting event: M1

Work package title: Development of the MSAT data architecture

Activity type: RTD

| Participant number | 2 | 6 | 1 | Total |
|---|---|---|---|---|
| Participant short name | GENOME | II | MATF | |
| Person-months per participant | 53 | 10 | 8 | 71 |

Objectives:

1. The objective of WP1 is to define the MSAT data architecture.

WP Leader: GENOME

Description of work:

T1.1: Definition of the MSAT data architecture (GENOME, II). A novel data architecture for storing and handling semantically elevated data gathered from heterogeneous data sources will be developed.

T1.2: Definition of the algorithms (MATF).

All the above tasks will be performed by GENOME with the support of II and MATF.

Deliverables:

D1.1: Specification of the MSAT data architecture (M6).

D1.2: Specification of the algorithms (M6).

Milestones:

M1: Finished analysis and design specification (M6).


Work package number: 2 | Start date or starting event: M6

Work package title: Development of a prototype system

Activity type: RTD

| Participant number | 9 | 3 | 8 | 2 | Total |
|---|---|---|---|---|---|
| Participant short name | INF | GGS | CMB | GENOME | |
| Person-months per participant | 136 | 24 | 17 | 12 | 189 |

Objectives:

1. The objective of WP2 is to implement the prototype computer system that will serve as a proof-of-concept.

WP Leader: INF

Description of work:

T2.1: Implementation of the MSAT prototype architecture (INF). The main result of this task will be a skeleton of the prototype system that enables seamless integration of all components and supports the required internal and external interoperability.

T2.2: Implementation of the MSAT data architecture (GGS, CMB, GENOME).

All the above tasks will be performed by INF with the support of GGS, CMB and GENOME.

Deliverables:

D2.1: Skeleton of the prototype (M9).

D2.2: Integrated prototype system (M12).

Milestones:

M2: Developed prototype architecture (M12).

Work package number: 3 | Start date or starting event: M6

Work package title: Development of intelligent agents and algorithms

Activity type: RTD

| Participant number | 1 | All other partners | Total |
|---|---|---|---|
| Participant short name | MATF | | |
| Person-months per participant | 177 | 2.5 per partner | 202 |

Objectives:

1. The objective of WP3 is to develop AI routines and algorithms for discovering motifs in a group of related DNA or protein sequences.

WP Leader: MATF

Description of work:

T3.1: Development of intelligent algorithms for discovering motifs in a group of related DNA or protein sequences. This task will be performed by MATF with the support of all other partners except MP.

Deliverables:

D3.1: Specification and implementation of algorithms for discovering motifs (M9).

Milestones:

M3: Developed intelligent algorithms and agents (M12).

Work package number: 4 | Start date or starting event: M12

Work package title: Testing of the Prototype

Activity type: RTD

| Participant number | 6 | All other partners | Total |
|---|---|---|---|
| Participant short name | II | | |
| Person-months per participant | 81 | 1.5 per partner | 96 |

Objectives:

1. The objective of WP4 is to define a strategy for testing the Prototype in various scenarios.

2. To test the system.

WP Leader: II

Description of work:

T4.1: Data specification and acquisition. The data sources will be carefully selected to (1) address the prototype domain and (2) present a challenge for the developed algorithms.

T4.2: Preparation of the test scenarios.

T4.3: Execution of the test scenarios.

T4.4: Analysis of the results and implementation of improvements.

All the above tasks will be performed by II with the support of all other partners.

Deliverables:

D4.1: Specification of the test scenarios and data (M15).

D4.2: Report on testing results (M15).

D4.3: Analysis of the testing results (M18).

D4.4: Implementation recommendations (M18).

Milestones:

M4: Prototype tested (M18).

Work package number: 5 | Start date or starting event: M18

Work package title: Estimating and enhancing the performance

Activity type: RTD

| Participant number | 3 | Total |
|---|---|---|
| Participant short name | GGS | |
| Person-months per participant | 63 | 63 |

Objectives:

1. Analyze the developed system from the performance standpoint.

2. Optimize and accelerate the implemented algorithms.

WP Leader: GGS

Description of work:

T5.1: Algorithmic engineering. During this task, all developed algorithms will be simulated in order to make an early estimate of their performance in practice and to fine-tune them accordingly.

T5.2: System performance analysis.

T5.3: System optimization.

All the above tasks will be performed by GGS with the support of all other partners.

Deliverables:

D5.1: Report on simulation results (M22).

D5.2: Report on optimization recommendations (M22).

D5.3: Implementation recommendations (M24).

Milestones:

M5: Prototype performance tested (M24).

M6: MSAT prototype improved (M24).

Work package number: 6 | Start date or starting event: M18

Work package title: Development of MSAT system

Activity type: RTD

| Participant number | 8 | 1 | 10 | Total |
|---|---|---|---|---|
| Participant short name | CMB | MATF | BIO | |
| Person-months per participant | 80 | 14 | 9 | 103 |

Objectives:

1. The objective of WP6 is to develop the full-blown MSAT system.

WP Leader: CMB

Description of work:

T6.1: Implementation of the MSAT full-blown system architecture (CMB, MATF, BIO). This task will be performed by CMB with the support of MATF and BIO.

Deliverables:

D6.1: Integrated full-blown system (M24).

Milestones:

M7: Developed MSAT full-blown system (M24).

Work package number: 7 | Start date or starting event: Beginning of M1

Work package title: Informing the scientific and technical environment, investors, and market

Activity type: OTHER

| Participant number | 11 | All other partners | Total |
|---|---|---|---|
| Participant short name | MP | | |
| Person-months per participant | 27 | 0.5 per partner | 32 |

Objectives:

1. To promote the project and its participants, particularly research centers from EU. 2. To effectively disseminate achievements of the project on international, national and

local level. WR Leader: MP Description of work: T7.1: Definition of dissemination strategy plan. T7.2: Promotion of the project and its activities. T7.3: Publications. Throughout the entire duration of the project a number of scientific publications will be produced. All the above tasks will be performed by MP with support of all other partners excluding MP. Deliverables: D7.1: Dissemination plan (M1). D7.2: Project webpage (M2). D7.3: Scientific articles (M24). D7.4: Articles published in local printed media (M12, M18, M24).


1.3.3.5 Summary effort table

The table below shows how the anticipated effort (in person-months) will be distributed among participants and work packages.

Partner   WP0   WP1   WP2   WP3   WP4   WP5   WP6   WP7   Total
MATF      0.5   8     0     177   1.5   0     14    0.5   201.5
GENOME    0.5   53    12    2.5   1.5   0     0     0.5   70
GGS       0.5   0     24    2.5   1.5   63    0     0.5   92
BINF      0.5   0     0     2.5   1.5   0     0     0.5   5
CBS       0.5   0     0     2.5   1.5   0     0     0.5   5
II        0.5   10    0     2.5   81    0     0     0.5   94.5
BIT       0.5   0     0     2.5   1.5   0     0     0.5   5
CMB       0.5   0     17    2.5   1.5   0     80    0.5   102
INF       0.5   0     136   2.5   1.5   0     0     0.5   141
BIO       49    0     0     2.5   1.5   0     9     0.5   62.5
MP        0.5   0     0     0     0     0     0     27    27.5
Total     54    71    189   202   96    63    103   32    806

1.3.4 Graphical presentation of the components showing their interdependencies (Pert chart)

1.3.5 Significant risks and associated contingency plans

The research project is continuously monitored by project management and thoroughly evaluated twice per year at risk analysis meetings. All identified risks will be ranked by their potential impact on the project and the probability of the risk actually occurring (impact multiplied by probability). Specific countermeasures will be defined for each risk, and action points will be assigned to the people responsible for following them up. Progress will be followed up regularly until the risk is mitigated.

Between the risk analysis meetings, project management will remain responsible for identifying potential risks. This ongoing process will ensure that the project stays in line with the initial, and possibly evolving, plan, and that the quality of the work, the deliverables and the results stays at the highest level, ensuring wide acceptance.

At the first project meeting, a risk analysis session will be held to identify the major potential obstacles. Following the process explained above, an action plan will be defined and followed up at subsequent project meetings. The potential risks envisaged at this moment are:

Risk #1: Delays in development, upsetting the work schedule.
Contingency plan: Monthly progress reports will be checked, and, if necessary, additional effort will be organized in time.

Risk #2: Difficulties in acquiring appropriate data, or permission to use data, for the test bed.
Contingency plan: Three separate, independent domains will be examined, allowing for independent testing. Sufficient time has been allocated for finding alternative data sources.

In addition to the regular reporting activities defined in the project plan, all project staff are obliged to report 'out of line' situations immediately to project management, which will assess the issues and deal with them.
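The ranking rule above (impact multiplied by probability) can be illustrated with a short sketch. It assumes, purely for illustration, that impact and probability are each rated on an integer scale from 1 to 5; the plan does not fix a scale, and the risk labels and ratings in the example are hypothetical placeholders, not assessments of Risk #1 or Risk #2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    impact: int       # hypothetical rating: 1 (negligible) to 5 (severe)
    probability: int  # hypothetical rating: 1 (rare) to 5 (almost certain)

    @property
    def score(self) -> int:
        # Ranking criterion from the risk plan: impact multiplied by probability.
        return self.impact * self.probability


def rank_risks(risks: list[Risk]) -> list[Risk]:
    """Return the risks ordered from highest to lowest score.

    sorted() stays stable with reverse=True, so risks with equal
    scores keep the order in which they were entered.
    """
    return sorted(risks, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    # Hypothetical placeholder register; labels and ratings are illustrative only.
    register = [
        Risk("Risk A", impact=4, probability=2),
        Risk("Risk B", impact=3, probability=4),
        Risk("Risk C", impact=2, probability=1),
    ]
    for risk in rank_risks(register):
        print(risk.score, risk.name)
    # prints "12 Risk B", then "8 Risk A", then "2 Risk C"
```

Re-running such a ranking with updated ratings at each semi-annual risk analysis meeting would be one way to support the follow-up process described above.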


2. Impact

2.1 Expected impacts listed in the work programme

The need for a European approach
The substantive activities proposed in this project are of direct relevance to the scientific priorities of each partner, each of which plays a key national role in its respective country. In most of the countries, the work will complement ongoing research activities carried out by each partner in the respective research area. Thus, the results of the project will be exploited directly by the partner countries, while at the same time being directly usable by the EC. The project builds to a significant extent on past and ongoing European research projects and aims to offer a solution that goes beyond the current European state of the art.

External factors
The readiness of the final users to take up the project results will significantly determine the project's impact. Dissemination activities will be crucial to present the benefits of the newly developed solution and to stimulate its uptake. In its current form, the proposal is in line with all EU standards in this field.

2.2 Dissemination and/or exploitation of project results, and management of intellectual property

MSAT will use a multi-level approach, including tailor-made dissemination tools and activities depending on the target audiences and their needs. The first step is to specify the target groups and define appropriate objectives. MSAT has identified target groups on three different levels: European level, project level and local level. Traditional dissemination activities (e.g. web site, e-newsletter, brochure) have already proved their importance in many similar research projects and will also form the basis for MSAT. In addition, MSAT will show its innovative character through a set of new dissemination activities. For the target groups on all three levels, a full set of traditional dissemination activities will be carried out, accompanied by innovative ones.


Figure: Dissemination management in the MSAT project.

European Level
The main dissemination objectives for MSAT at European level are:

1. To provide high-quality contributions to the horizontal dissemination project and to support widespread knowledge and awareness of the MSAT project;
2. To disseminate MSAT at existing international events.

Target Group(s):

1. Decision-makers at EU, regional and national level;
2. The international ICT research community;
3. Technology and equipment providers;
4. Media.

Dissemination activities: MSAT will identify a list of existing international events in the fields of MSAT research and will check with the organizers the possibility of organizing »piggy back« activities (e.g. holding a workshop or distributing MSAT brochures). A number of scientific articles will be published in international peer-reviewed journals. Finally, a number of joint meetings will be organized with other FP7 projects.

Project Level
The main dissemination objectives for MSAT at project level are:


1. To support local dissemination and demonstration of the new research infrastructure;
2. To provide quality assurance for local dissemination;
3. To ensure the empowerment of MSAT working staff;
4. To effectively disseminate the achievements of the project at international, national and local level.

Target Group(s):

1. All MSAT partners, associated partners and their subcontractors;
2. End users.

Dissemination activities: MSAT will use several dissemination tools to spread news, recommendations and results. To this end, a project web site will be set up. A public mailing list will be set up to distribute the latest updates to interested users, and a discussion board will be available for online discussions. The following information will be provided on the web site:

1. Project objectives and achievements;
2. Public deliverables in electronic form;
3. Key persons and contacts;
4. Announcements of forthcoming public events organized by the project (seminars, conferences, press events);
5. Reports on completed events and project updates;
6. Technical achievements and demo information/updates;
7. Co-operation with similar projects/external bodies, references to publications, and other miscellaneous information.

The objective of the web site and presentation is to provide an entry point for the community and to ensure that the project is appropriately presented and represented, so that:

1. The project is widely known and information about it is easily accessible;
2. The project objectives, aims and scientific approaches are well understood.

Furthermore, a project brochure and an e-newsletter will be produced, and a final MSAT conference will be held. During the course of MSAT, a large number of success stories, recommendations and barriers will be encountered. Much of this valuable information could simply be lost through the wrong dissemination approach. Therefore, MSAT will establish an innovative dissemination activity at project level, namely the so-called »story telling principle«. It is based on the word-of-mouth approach and will be used within MSAT to spread the most interesting success stories or recommendations (»legends«) to specific target groups. The approach is innovative for two reasons: first, it repeats selected MSAT »legends« again and again, so that they stay in people's minds; second, it is based on the peer-group approach, meaning that people from the same target group or profession talk to each other. Events will be organized according to the developed plan and will include three workshops (one during each year of the project) and three seminars (one during each year of the project).


Particular attention will be paid to disseminating project results to potential final users and beneficiaries. For that purpose, a number of visits to potential beneficiaries will be organized, not only in the countries involved in the project but also more widely. University students, as the potential researchers of the future, will also be targeted by the dissemination activities. Presentations of the project and discussions of its results will be organized at the different universities involved.

Local Level
The main dissemination objectives for MSAT at local level are:

1. To ensure the visibility of MSAT for the target citizens and customers;
2. To ensure that MSAT is well known among the local partners.

Target Group(s):

1. Local relevant stakeholders;
2. Researchers and students;
3. Citizens;
4. Relevant NGOs;
5. Relevant associations and networks;
6. Newspapers and professional magazines.

Dissemination activities: MSAT will work together with local media (daily newspapers, magazines, TV, radio) to ensure that MSAT also becomes well known at local level. Dissemination through standard media channels is a traditional means of promotion, covering local newspapers, scientific or administrative publications, newsletters, television and radio at local or national level, brochures, CD-ROMs, etc. The objectives of dissemination through media channels are:

1. To raise awareness of the technology and its applications in the local community;
2. To promote researchers in the local community.

This will be a continuous activity, preferably carried out through channels that the project partners already use or have easy access to.

Clustering
The project sees great potential for cooperation with other initiatives pursuing the same strategic objective and beyond. Given its focus on the research environment, the project will identify several such initiatives.

Exploitation of results
MSAT's innovative results (products, areas of knowledge and good practice) will be exploited at local, national and European level. As far as commercial exploitation of the results is concerned, the needs of the European market in the targeted areas will be assessed at the initial stage of project implementation, in order to analyze the situation and to define and channel the exploitation opportunities. Based on this knowledge, an Exploitation Plan will be prepared, linking the existing needs and market opportunities with the results gained through the implementation of the MSAT project. The project results will be exploited for further research by the research partners involved in the project and by researchers from the wider scientific community. There are several possible directions for further research. Through several dissemination and training channels, information on the exploitation potential will be communicated to target audiences at different levels:

1. Exploitation of results by and between partner organizations during and after project conclusion;
2. Exploitation of results by relevant decision-makers;
3. Exploitation of results by the research community (as an indicator of a need for further research and development);
4. Exploitation of results by interested citizens.

With this approach, MSAT will ensure the widest possible promotion of the project results and support their exploitation by a broad range of potential users. As part of the exploitation-related efforts, recommendations, positive and negative lessons learned, and information on the feasibility of specific measures will also be presented. IPR management will involve:

1. Periodic reviews of the experimental data from each work package in search of exploitable results (WPL and PC);
2. Decisions on the dissemination of the project's intellectual property (SC);
3. Decisions on patent filing and IPR distribution (SC).

The Work Package Leaders (WPL) and the Project Coordinator (PC) will be responsible for monitoring all project results and evaluating the possibilities for IP protection of the foreground created by the project. The Steering Committee (SC) will give the final approval for the use and dissemination of the exploitable joint results generated by the project. As generally agreed among all partners at this stage, all IPR matters will be arranged in detail in the Consortium Agreement, to be signed before the Grant Agreement. This will include a positive list of the background that each partner makes available to the project, as well as specific arrangements for the exploitation and protection of the foreground. The Work Package Leaders and the Project Coordinator will be responsible for ensuring that no project foreground is disseminated before the IP protection possibilities have been explored and the Steering Committee has taken a decision on the matter. The default FP7 rules regarding access rights will be respected by all members of the consortium.


3. Ethical issues

The proposed project does not directly involve any ethical, legal, social or safety issues. Owing to the nature of this project, however, the consortium will indirectly have to deal with ethical, legal, social and safety issues relating to the research projects assisted and coached as a consequence of the work foreseen in this proposal. Therefore, training measures for researchers will also cover all relevant ethical, legal, social and safety issues. The consortium will set up a project Ethics Committee to resolve any doubts about its work across all activities, measurement protocols, data collection, data presentation and data transfer. The consortium is fully aware of the importance of ethical issues, particularly in this call.

Applicants confirm that the proposal does not raise sensitive ethical, legal, social or safety questions related to: human beings, human biological samples, personal data (whether identified by name or not), genetic information, or experiments on animals. Applicants confirm that the proposed research does not involve:

1. Research activity aimed at human cloning for reproductive purposes;
2. Research activity intended to modify the genetic heritage of human beings which could make such changes heritable;
3. Research activity intended to create human embryos solely for the purpose of research or for the purpose of stem cell procurement, including by means of somatic cell nuclear transfer.

Particular attention will be paid to environmental issues.

Ethical issues table (YES/NO; PAGE)

Informed Consent
Does the proposal involve children? NO
Does the proposal involve patients or persons not able to give consent? NO
Does the proposal involve adult healthy volunteers? NO
Does the proposal involve Human Genetic Material? NO
Does the proposal involve Human biological samples? NO
Does the proposal involve Human data collection? NO

Research on Human Embryo/Fetus
Does the proposal involve Human embryos? NO
Does the proposal involve Human Fetal Tissue/Cells? NO
Does the proposal involve Human Embryonic Stem Cells? NO

Privacy
Does the proposal involve processing of genetic information or personal data (e.g. health, sexual lifestyle, ethnicity, political opinion, religious or philosophical conviction)?
Does the proposal involve tracking the location or observation of people? NO

Research on Animals
Does the proposal involve research on animals? NO
Are those animals transgenic small laboratory animals? NO
Are those animals transgenic farm animals? NO
Are those animals cloned farm animals? NO
Are those animals non-human primates? NO

Research Involving Developing Countries
Use of local resources (genetic, animal, plant etc.)? NO
Use of local community? NO

Dual Use
Research having direct military application? NO
Research having the potential for terrorist abuse? NO

ICT Implants
Does the proposal involve clinical trials of ICT implants? NO

I CONFIRM THAT NONE OF THE ABOVE ISSUES APPLY TO OUR PROPOSAL: YES