Clinical Data Warehouse on Insect Vector Diseases to Human of Andhra Pradesh

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 5, August 2010, pp. 240-244, ISSN 1947-5500, http://sites.google.com/site/ijcsis/

1 Dr. M. Usha Rani, 2 M. Kalpana Devi, 3 Dr. D. M. Mamatha, 4 Dr. R. Seshadri, 5 Yaswanth Kumar.Avulapti

1 Associate Professor, 2 Research Scholar, Dept. of Computer Science, 3 Associate Professor, Dept. of Seri-Bio Sciences, SPMVV, Tirupati
4 Director, S.V.U. Computer Center, S.V. University, Tirupati
5 Research Scholar, Dept. of Computer Science, S.V. University, Tirupati

ABSTRACT

The widespread transmission of insect vector diseases to humans is causing substantial morbidity and economic loss to our nation. The year 2006 is likely to go down as one of the worst years in terms of public health, having witnessed a high incidence of insect vector diseases such as Malaria, Chikungunya, Dengue, Lymphatic Filariasis, and Japanese Encephalitis. This underlines the need to track the relevant information about these diseases. Reliable and quickly retrievable disease-wise clinical data is the need of the hour, with which planners can prepare their strategies to control and curb the diseases. From this point of view, the data warehouse (DWH) described here is going to be handy to the planners.

Key Words: Insect Vector diseases, Chikungunya, Malaria, Dengue, Lymphatic Filariasis & Japanese Encephalitis, Clinical data, Data warehouse

1. Introduction

Epidemic diseases have been a threat to society from the Stone Age to the present day. Even though we have long experience with epidemic diseases, the problems are still not handled properly. The control of these diseases involves the control of three living beings and their environment, viz. man (the host), the mosquito (the vector) and the deadly pathogen (the parasite). Since the vector and the pathogen are highly adaptable, much of the emphasis is on man, i.e. on raising public awareness of insect vector diseases.

National and international efforts to control these insect vector diseases were highly successful in the late 1950s and the early 1960s. However, for various reasons the control programmes suffered setbacks all over the world, and today the diseases have come back with a vengeance. The present epidemic of Chikungunya in India, after a gap of 30 years, is the largest ever in the world, with over 1.3 million people affected. Among the other mosquito-borne diseases, there has been a threefold increase in Japanese Encephalitis since 2001. Malaria infects 2 million Indians annually. It is time to direct research along these lines and to explore where the system fails in combating these diseases.

2. Origin of the Research Problem

The widespread transmission of insect vector diseases to humans is causing substantial morbidity and economic loss to our nation. The year 2006 is likely to go down as one of the worst years in terms of public health, having witnessed a high incidence of insect vector diseases such as Malaria, Chikungunya, Dengue and Japanese Encephalitis. The WHO Regional Office for South-East Asia has reported 1.3 million cases from 152 districts in 10 states/provinces of India, of which 7,52,245 were from Karnataka alone. Factors that influence the spread of disease include socio-economic aspects, clinical attendance, barriers to health care, and lack of awareness of how to control the diseases. This underlines the need to track the relevant information, the various aspects and the data about these diseases.

3. Significance of the Work

Epidemic diseases are a great threat to India, and there is a need to construct a data warehouse for prevention, early detection and control measures. There is also a need to make the public aware of epidemic diseases. The information provided by the data warehouse is useful to researchers, academicians, doctors, health workers and Govt. servants, as well as to the common man. This data keeps us aware and forearmed to prevent such attacks in future.

4. Objectives

This work is proposed to be undertaken with the following objectives:

• Persons at the helm of affairs in the central Government in general, and in the State Government in particular, are badly in need of disease-wise clinical data to equip themselves with corrective and counter strategies. Reliable and quickly retrievable disease-wise clinical data is the need of the hour, with which planners can prepare their strategies to control and curb the diseases. From this point of view, this data warehouse is going to be handy to the planners.

• This data warehouse is for the future use of researchers, academicians, doctors, health workers and Govt. servants, as well as the common man. This data keeps us aware and forearmed to prevent such attacks in future.

• The data warehouse and analysis reports will be made publicly available for further research.

5. Data Warehouse

Data warehousing is a buzz-phrase that has taken the information systems world by storm. A data warehouse (DWH) can be looked at as an "informational database" that is maintained separately from an organization's operational database. But that would fall short of the full technological implications of the term DWH. Data warehousing is the process of transforming data into information and making it available to the user in a time-bound manner, so that it can make a difference.

In order to serve the decision-making process of the management, the data warehouse has to supply the following primary functionality:

• The DWH is a reflection of the business rules of the enterprise, not just of a specific function or business unit, as they apply to strategic decision support information.

• It is the collection point for the integrated, subject-oriented strategic information that is handled by the data acquisition process.

• It is the historical store of strategic information, with the history relating to either the data or its relationships.

• It is the source of stable data regardless of how the processes may change. This requires a data model that is not influenced by the operational processes creating the data.

Additionally, the data warehouse provides functionality for the support of ad hoc queries.

5.1 The Clinical Data Warehouse

A clinical data warehouse, or CDWH, is a facility that houses all electronic data collected at a clinical center. For any modern clinical institute, it is necessary to separate operational data from informational data by creating a clinical data warehouse. A growing number of technologies for integrating and performing structured analyses of data from disparate sources are competing to win the day for healthcare organizations.

A CDWH is therefore a DWH tailored to the needs of users in a clinical environment, combining information from a variety of legacy health-care databases with cleansed operational data to form a centralized data repository that answers the informational needs of all clinical users.

Data warehouses in a clinical context have traditionally been administrative in nature, focusing on patient billing and patient-care management, the organizational aspects of hospitals, which were optimized using data warehouse technology in much the same way as in contemporary enterprises. Technology, however, evolved quickly, and more complex areas of clinical data management could be tackled. The IT-supported collection of clinical data has a long history, and the promise of a new technology leveraging these collections put physicians, nurses and clinical researchers right next to the administrators on the map.

5.2 Extraction, Transformation and Loading - Three Stage Method

As the data in a data warehouse are highly aggregated, very complex relationships are constructed from various data sources. The process responsible for exactly these transformations is called the Extraction, Transformation and Loading (ETL) process. It handles getting data out of one data store (extraction), modifying it (transformation), and inserting it into a different data store (loading).

Data are extracted from operational databases, legacy systems and external data sources, transformed to match the DWH schema, and loaded into the data warehouse database. Generally, ETL is a complex combination of processes and technology that consumes a significant portion of the data warehouse development resources and time. The ETL process gains further importance from the fact that it is not a one-time event but is run periodically. Typical periods are monthly, weekly or daily updates, depending on the purpose of the data warehouse. ETL also changes as the data warehouse evolves, so ETL processes must be designed for ease of modification.

Once the scope has been set, the relevant data has to be identified from the raw source data available. To formalize this task, we use a method that transforms the raw data into the source data of the DWH in three stages:

• First stage data is the raw data from the operational database.

• Second stage data is transformed, cleansed and normalized from stage 1 data.

• Third stage data is further transformed from stage 2 data and optimized for the final fact data representation.

"Data stage" software is the ETL tool selected to implement the data warehouse.

5.2.1 Stage 1 - Raw Data

The data were collected from the Ministry of Health and Family Welfare, Hyderabad, Andhra Pradesh. The data at this stage are considered raw source data and are in table format. The tables for all five diseases (Dengue Table, Malaria Table, Chikungunya Table, Japanese Encephalitis Table, and Filariasis Table) have the same format. The description of a sample table is as follows.

Dengue Table (District Name Character,
              Year Number,
              Total Blood Samples Collected Number,
              Confirmed Cases Number,
              Number of Deaths Number)
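The raw table layout described above can be pictured as a simple record type. The following is a minimal sketch, assuming the raw rows arrive as comma-separated text; the class name `RawDiseaseRow`, the function `parse_raw_row`, the field names and the delimiter are illustrative assumptions, and the sample values are made up rather than taken from the Ministry data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawDiseaseRow:
    """One Stage 1 row: yearly district-level totals for a single disease."""
    district_name: str
    year: int
    blood_samples_collected: int
    confirmed_cases: int
    deaths: int


def parse_raw_row(line: str) -> RawDiseaseRow:
    """Parse one comma-separated line (assumed format) into a typed row."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    district, year, samples, cases, deaths = parts
    row = RawDiseaseRow(district, int(year), int(samples), int(cases), int(deaths))
    # Counts can never be negative; reject such rows before Stage 2.
    if min(row.blood_samples_collected, row.confirmed_cases, row.deaths) < 0:
        raise ValueError(f"negative count in row: {line!r}")
    return row


# Illustrative values only, not real surveillance figures.
example = parse_raw_row("Chittoor, 2006, 1200, 85, 2")
```

Typing the rows at this point means that malformed lines are rejected before any Stage 2 transformation is attempted.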

5.2.2 Stage 2 - Refined Data

Based on the tables from Stage 1, certain design decisions had to be made before any data modeling could commence. These involved questions such as:

• What are the central facts?

• Which dimensions should be focused on?

In parallel with the modeling process, steps were taken to get an idea of what information could be derived from the data available. The Stage 1 tables are used to construct text files. The description of the data files is as follows:

Disease Table

No. of records: 6

We have taken five mosquito-borne diseases: Malaria, Dengue, Chikungunya, Lymphatic Filariasis, and Japanese Encephalitis. The description of the table is as follows.

Disease Table (S. No Number,
               Disease Id Character,
               Disease Name Character)

(S. No = 0 means no disease.)

District Table

No. of records: 23 (total number of districts in Andhra Pradesh)

The description of the table is as follows.

District Table (District Id Character,
                District Name Character)

Case-year Table

No. of records: the number of confirmed cases of a disease in a particular year varies with the disease and the district.

We have constructed 9 text files, each containing the data for a single year, from 2000 to 2008.

Case Year Table (Case Id Character,
                 District Id Character,
                 Disease Id Character,
                 Blood Samples Collected or Not Logical,
                 Year Number,
                 Disease Status Logical)

Later we combine all 9 text files into a single text file called the CASE_ENTIRE_YEAR text file.
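The step of combining the yearly text files into one file can be sketched as follows. This is only an illustration under stated assumptions: the paper does not give the file layout, so comma-separated rows in the column order of the Case Year Table are assumed, and the function name `combine_case_files`, the identifiers and the sample rows are hypothetical.

```python
import csv
import io
from typing import Iterable, TextIO

# Column order assumed to follow the Case Year Table description.
CASE_COLUMNS = ["case_id", "district_id", "disease_id",
                "blood_samples", "year", "disease_status"]


def combine_case_files(sources: Iterable[TextIO], target: TextIO) -> int:
    """Append every row of each yearly file to one combined file.

    Returns the number of rows written. Rows with the wrong number of
    fields are rejected rather than silently copied.
    """
    writer = csv.writer(target, lineterminator="\n")
    written = 0
    for source in sources:
        for row in csv.reader(source):
            if not row:  # skip blank lines
                continue
            if len(row) != len(CASE_COLUMNS):
                raise ValueError(f"malformed row: {row!r}")
            writer.writerow(row)
            written += 1
    return written


# Two tiny in-memory "yearly files" with made-up rows.
year_2000 = io.StringIO("C0001,D01,MAL,y,2000,y\n")
year_2001 = io.StringIO("C0002,D02,DEN,n,2001,y\nC0003,D01,CHK,y,2001,n\n")
combined = io.StringIO()
rows_written = combine_case_files([year_2000, year_2001], combined)
```

With real files, the `io.StringIO` objects would be replaced by file handles opened with `open(path, newline="")`.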

Case-District-2009 Table

Total No. of records: 33601

We have constructed 23 text files for the 23 districts, each containing the data for the year 2009. (These tables are used for the current-year update.)

Case District 2009 (Case Id Character,
                    Disease Id Character,
                    Blood Samples Logical,
                    Year Number,
                    Disease Status Logical)

Later we combine the text files of all 23 districts into a single text file called the CASE_HISTORY text file.

Death Table

No. of records: 1200

Death Table (Case Id Character,
             District Id Character,
             Disease Id Character,
             Blood Samples Collected or Not Logical,
             Year Number,
             Disease Status Logical,
             Death Id Character)

5.2.3 Stage 3 - Loading Stage (Clinical Warehouse Creation)

Data from all the text files was extracted and stored in Oracle; during the transformation, the primary keys have to be specified in the Oracle table.

Fig: 1 

Data from the Case tables of all 23 districts for a single year (for example, the 2009 data) are combined by a link collector into another sequential file, using the round-robin algorithm.

Fig: 2 Case_2009 Job

The resulting sequential table is then loaded into an Oracle table with the same attributes.

Data from all the Case-year tables for the 9 years (2000 to 2008) are combined by a link collector into another sequential file, using the round-robin algorithm.
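The round-robin collection used in both jobs can be illustrated with a short sketch. It shows only the general idea (take one row from each input in turn and skip inputs that are exhausted); it is not the internal implementation of the ETL tool's link collector, and the function name `round_robin_collect` and the sample rows are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def round_robin_collect(*inputs: Iterable[T]) -> Iterator[T]:
    """Yield one row from each input in turn until all inputs are exhausted."""
    active = deque(iter(src) for src in inputs)
    while active:
        source = active.popleft()
        try:
            row = next(source)
        except StopIteration:
            continue  # this input is finished; drop it from the rotation
        yield row
        active.append(source)  # rejoin the back of the queue


# Made-up rows standing in for three district files of unequal length.
chittoor = ["CH-1", "CH-2", "CH-3"]
kadapa = ["KA-1"]
hyderabad = ["HY-1", "HY-2"]
merged = list(round_robin_collect(chittoor, kadapa, hyderabad))
```

Every input row appears exactly once in the output, so the combined file has as many records as the inputs together, whatever the order of interleaving.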

[Fig. 2 contents: text files for all districts (Case_Chittoor, Case_Kadapa, Case_Hyderabad, ...) feed a Link Collector, which writes the text file Case_entire_year with the columns Case-id, District-id, Disease-id, Blood-samples, Year, Disease_status.]

Fig: 3 Case_History Job

The resulting sequential table is then loaded into an Oracle table with the same attributes.

Fact 1 Table

Hash file 1: To find the death count, we apply a query to the Death table.

Query applied:

    Select Count(*), District_id, Disease_id, Year
    From Death_table
    Group by District_id, Disease_id, Year;

Hash file 2: To find the case count, we apply a query to the Case table.

Query applied:

    Select Count(*), District_id, Disease_id, Year
    From Case_table
    Group by District_id, Disease_id, Year;

Hash file 3: To find the total number of blood samples collected, we apply a query to the Case table.

Query applied:

    Select Count(*), District_id, Year
    From Case_table
    Where blood_samples = 'y'
    Group by District_id, Year;

Fig: 4

Fact 2 Table

Hash file 4: To find the distinct District_id, Disease_id and Year combinations in the Case table.

Query applied:

    Select distinct District_id, Disease_id, Year
    From Case_table;

Fig: 5
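The four hash-file queries can be tried out on a toy data set. The sketch below uses Python's built-in sqlite3 module in place of Oracle, which is an assumption made purely so that the example is self-contained; the column types, the identifiers and all rows are made up, and an ORDER BY clause is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Case_table (
        Case_id TEXT PRIMARY KEY, District_id TEXT, Disease_id TEXT,
        Blood_samples TEXT, Year INTEGER, Disease_status TEXT
    );
    CREATE TABLE Death_table (
        Case_id TEXT, District_id TEXT, Disease_id TEXT,
        Blood_samples TEXT, Year INTEGER, Disease_status TEXT,
        Death_id TEXT PRIMARY KEY
    );
""")
# Made-up rows, for illustration only.
conn.executemany(
    "INSERT INTO Case_table VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("C1", "D01", "MAL", "y", 2006, "y"),
        ("C2", "D01", "MAL", "n", 2006, "y"),
        ("C3", "D01", "DEN", "y", 2006, "y"),
        ("C4", "D02", "MAL", "y", 2007, "y"),
    ],
)
conn.executemany(
    "INSERT INTO Death_table VALUES (?, ?, ?, ?, ?, ?, ?)",
    [("C2", "D01", "MAL", "n", 2006, "y", "X1")],
)

GROUPED = ("GROUP BY District_id, Disease_id, Year "
           "ORDER BY District_id, Disease_id, Year")

# Hash file 1: death count per district, disease and year.
death_counts = conn.execute(
    f"SELECT COUNT(*), District_id, Disease_id, Year FROM Death_table {GROUPED}"
).fetchall()
# Hash file 2: case count per district, disease and year.
case_counts = conn.execute(
    f"SELECT COUNT(*), District_id, Disease_id, Year FROM Case_table {GROUPED}"
).fetchall()
# Hash file 3: blood samples collected per district and year.
sample_counts = conn.execute(
    "SELECT COUNT(*), District_id, Year FROM Case_table "
    "WHERE Blood_samples = 'y' GROUP BY District_id, Year "
    "ORDER BY District_id, Year"
).fetchall()
# Hash file 4: distinct key combinations present in the Case table.
distinct_keys = conn.execute(
    "SELECT DISTINCT District_id, Disease_id, Year FROM Case_table "
    "ORDER BY District_id, Disease_id, Year"
).fetchall()
```

Note that the blood-sample count is grouped by district and year only, so it is shared by all diseases of that district and year.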

All 4 hash tables, the Oracle District table and the Oracle Disease table are combined and transformed to create the final fact table. In the transformation process we apply the following queries to the Oracle Disease table and District table in order to include the names of the diseases and districts in the final fact table.

Query applied on Disease table:

    Select Disease_tb.Disease_id, Disease_tb.Disease_name
    From pro.Disease_tb Disease_tb
    Where Disease_tb.Disease_id = :1;

[Fig. 3 contents: text files with the data for the years 2000 to 2008 (Case_2000_year, Case_2001_year, ..., Case_2008_year) feed a Link Collector, which writes the text file Case_history with the columns Case-id, District-id, Disease-id, Blood-samples, Year, Disease-status.]

Query applied on District table:

    Select District_tb.District_id, District_tb.District_name
    From pro.District_tb District_tb
    Where District_tb.District_id = :1;

Final Fact Job:

Fig: 6
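How the hash-file results and the two name lookups might come together in the final fact table can be sketched as below. This is a hedged illustration, not the actual Final Fact job: plain dictionaries stand in for the hash files, and the function name `build_final_fact`, the output field names and all values are assumptions.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, int]  # (district_id, disease_id, year)


def build_final_fact(
    keys: List[Key],
    case_counts: Dict[Key, int],
    death_counts: Dict[Key, int],
    sample_counts: Dict[Tuple[str, int], int],
    district_names: Dict[str, str],
    disease_names: Dict[str, str],
) -> List[dict]:
    """Join the four lookups and the two name tables into fact rows."""
    facts = []
    for district_id, disease_id, year in keys:
        key = (district_id, disease_id, year)
        facts.append({
            "district_id": district_id,
            "district_name": district_names[district_id],
            "disease_id": disease_id,
            "disease_name": disease_names[disease_id],
            "year": year,
            "confirmed_cases": case_counts.get(key, 0),
            # A missing entry means no deaths were recorded for this key.
            "deaths": death_counts.get(key, 0),
            # Samples are counted per district and year, not per disease.
            "blood_samples": sample_counts.get((district_id, year), 0),
        })
    return facts


# Made-up lookups standing in for hash files 1 to 4 and the name tables.
keys = [("D01", "MAL", 2006), ("D01", "DEN", 2006)]
case_counts = {("D01", "MAL", 2006): 2, ("D01", "DEN", 2006): 1}
death_counts = {("D01", "MAL", 2006): 1}
sample_counts = {("D01", 2006): 2}
district_names = {"D01": "Chittoor"}
disease_names = {"MAL": "Malaria", "DEN": "Dengue"}

fact_rows = build_final_fact(keys, case_counts, death_counts,
                             sample_counts, district_names, disease_names)
```

Looking up the names by key mirrors the parameterised queries above, where `:1` is bound to the identifier of the row being processed.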

5.2.4 Data Model

The data modeling technique we used is the star schema. The advantages of the star schema are that it is easy to understand, makes it easy to define hierarchies, reduces the number of physical joins, and requires little maintenance and only simple metadata. The actual data model for this data warehouse is as follows.
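The figure showing the actual data model is not available in this text, so the following is only a generic sketch of what a star schema for this kind of data could look like: one fact table surrounded by a district dimension and a disease dimension. The table names, column names and rows are assumptions, and sqlite3 is used instead of Oracle to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per district and per disease.
    CREATE TABLE district_dim (district_id TEXT PRIMARY KEY, district_name TEXT);
    CREATE TABLE disease_dim  (disease_id  TEXT PRIMARY KEY, disease_name  TEXT);
    -- Fact table: one row per district, disease and year.
    CREATE TABLE case_fact (
        district_id     TEXT REFERENCES district_dim(district_id),
        disease_id      TEXT REFERENCES disease_dim(disease_id),
        year            INTEGER,
        confirmed_cases INTEGER,
        deaths          INTEGER,
        PRIMARY KEY (district_id, disease_id, year)
    );
""")
# Made-up rows, for illustration only.
conn.executemany("INSERT INTO district_dim VALUES (?, ?)",
                 [("D01", "Chittoor"), ("D02", "Kadapa")])
conn.executemany("INSERT INTO disease_dim VALUES (?, ?)",
                 [("MAL", "Malaria"), ("DEN", "Dengue")])
conn.executemany("INSERT INTO case_fact VALUES (?, ?, ?, ?, ?)",
                 [("D01", "MAL", 2006, 120, 3),
                  ("D02", "MAL", 2006, 80, 1),
                  ("D01", "DEN", 2006, 15, 0)])

# A typical report query needs only one join per dimension used.
report = conn.execute("""
    SELECT d.disease_name, f.year,
           SUM(f.confirmed_cases) AS cases, SUM(f.deaths) AS deaths
    FROM case_fact AS f
    JOIN disease_dim AS d ON d.disease_id = f.disease_id
    GROUP BY d.disease_name, f.year
    ORDER BY d.disease_name, f.year
""").fetchall()
```

The single join in the report query illustrates the "fewer physical joins" advantage claimed for the star schema.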

5.3 Reporting Tool

The Data Stage ETL tool is used to create the data warehouse. The final fact table produced by the ETL tool is passed to the reporting tool, which produces the reports. The reporting tool we used is Business Objects.

Requested sample query:

Fig: 7

Sample report produced:

Fig: 8

Conclusions

Modern tools now come in handy for addressing issues of disease surveillance, control, monitoring and evaluation, such as where health care centers should be situated and what services they should offer. Monitoring and evaluation are an essential part of the health programme, as well as of other programmes related to development. Hence, there is a need to sensitize the public about epidemic diseases. This underlines the need to construct the data warehouse for prevention, early detection and control measures. This data keeps us aware and forearmed to prevent such attacks in future.

The work concentrates on building the data warehouse. Due to time limitations, the current history file is constructed only from the data for 2000 to 2009. This data is to be extended regularly as each following year's data becomes available. This data warehouse is for the future use of researchers, academicians, doctors, health workers and Govt. servants.

