Imran a. Khan – Online Presence, IEEE Papers, Research Papers, Data Analysis Writing Samples &...
7/21/2019 Imran a. Khan – Online Presence, IEEE Papers, Research Papers, Data Analysis Writing Samples & Posters
http://slidepdf.com/reader/full/imran-a-khan-online-presence-ieee-papers-research-papers-data-analysis 1/7
IEEE Papers:
Personalized Electronic Health Record System for Monitoring Patients with Chronic Disease
2013 IEEE Systems and Information Engineering Design Symposium (SIEDS 2013) -- April 2013
The Personalized Electronic Health Record System for Monitoring Patients with Chronic Disease (PEHRS-
MPCD) is designed to permit tracking and monitoring of the symptoms of patients with chronic disease
and provide healthcare professionals with data on patients' lifestyle changes, medication (drug)
changes, diet changes and symptom changes. The current method of assessing the symptoms of
patients with chronic disease uses an episodic approach that includes phone calls to the patient, paper
surveys of health status and on-site examinations. A preventative approach that can actively involve the
patient, monitor multiple conditions and provide real-time information about a patient's health
condition is proven to be effective in chronic disease care. PEHRS-MPCD is designed to continually
monitor patients. The goal of the application is to continuously collect patients' information and provide it to
the patients themselves and to healthcare professionals in order to improve the efficiency of diagnosis and enable timely
intervention, which would yield better quality of care and quality of life for the patient. This personalized
electronic health record system (PEHRS-MPCD) will be objective in providing feedback about patient
lifestyle changes and choices, and in channeling this information to healthcare providers. PEHRS-MPCD
would (a) allow relevant data to be entered by the patient, (b) make relevant data available to the
patient's care provider, in real time and at doctor's visits, (c) generate reports and graphs for the data
and (d) provide secure storage of the data. PEHRS-MPCD is a work in progress, as much of its
functionality and user interface design is still being revised. This paper describes the purpose, need
and design of the application.
Available at: http://tinyurl.com/IEEE-Monitoring-Chronic
Smartphone Application for Transmission of ECG Images in PreHospital STEMI Treatment
2012 IEEE Systems and Information Engineering Design Symposium (SIEDS 2012) -- April 2012
An S-T segment elevation myocardial infarction (STEMI) is a severe heart attack that kills heart muscle
every minute it is left untreated. Therefore, early diagnosis and treatment are crucial for patient
survival. Currently, Charlottesville ambulances that service the University of Virginia hospital are
equipped with proprietary systems that send electrocardiogram (ECG) images while the ambulance is en
route to the hospital. From an ECG, a doctor can diagnose a STEMI prior to patient arrival and prepare
for surgery, thereby reducing the time delay. However, these ambulance systems are costly and provide
no feedback regarding the success of an ECG transmission. This paper describes the development of an
inexpensive iPhone application that transmits ECG images over the AT&T data network to the hospital
prior to a patient's arrival. The application is designed to dovetail with the existing STEMI-care protocol
used by Charlottesville-area Emergency Medical Technicians (EMTs), and it provides a novel red/green
light indicator predicting successful receipt of the image within two minutes. The goal of the application
is to improve process efficiency and information flow allowing the patient to receive early, appropriate
care and the best chance for a successful recovery. Test results, including usability tests, show that the
application fulfills all key requirements. A prototype of the application will be evaluated by
Charlottesville-area Emergency Medical Technicians prior to implementation in emergency response
protocols and long-term deployment elsewhere.
Received the best paper award in the “System Design and Integration” track at the IEEE symposium.
Available at: http://tinyurl.com/IEEE-Medical-App
Poster at: www.tinyurl.com/STEMI-Poster
Research Reports:
30-Day Readmission Trends and Variables of UVA Hospital Dementia Patients
One area of major concern for hospitals is high readmission rates. According to the Centers for Medicare
and Medicaid Services (CMS), 20% of Medicare patients who are discharged from hospitals are readmitted within 30 days.
Currently, patient readmissions cost the U.S. government over $17 billion per year, and this cost is projected to
increase [1, 11]. Although factors such as how a patient is diagnosed, the severity of the illness, and
the patient's behavior may affect the 30-day readmission rate, the Medicare Payment Advisory Commission
(MedPAC) claims that 75% of all readmissions within 30 days can potentially be prevented if
hospitals properly plan patient treatments [2]. Following MedPAC's findings and recommendations, the
U.S. government wants to emphasize the readmission problem within the new Affordable Care Act by
severely penalizing any hospital with an excess readmission rate within a 30-day period [3].
In this research, de-identified electronic health record (EHR) data on 24,954 patients with
dementia are used to predict which patients are at high risk for 30-day readmission. Furthermore, random
forest models are tested with different attribute selections in order to identify the attributes necessary
for significant predictive models of 30-day readmission of dementia patients. Our model correctly
identifies 98% of the patients who are at high risk for 30-day readmission, and it identifies and
investigates the importance of the variables necessary to build a significant model.
Available at: http://tinyurl.com/Dementia-Patients
Building a Quantitative Case for the Medical and Economic Potential of Symptom Tracking Tools
Since 2007, the Centers for Medicare and Medicaid Services (CMS) have devised outcome measures that
focus on high quality patient care. A hospital readmission rate over a 30-day period is one such measure
that allows medical professionals and patients to critically appraise health care providers and provides a
framework for hospitals to meet quality control standards. Hospital readmission rates, especially for
elderly patients, are a significant concern in U.S. healthcare since 1 in 5 patients on Medicare &
Medicaid is readmitted to a hospital within 30 days of treatment. This is currently costing the U.S.
government over $17 billion per year and is projected to increase. In October 2012, the U.S.
government started imposing penalties, which could reach as high as $40 million, on hospitals with
high rates of readmission over a 30-day period. The objective of this paper is to analyze the efficacy of
the use of symptom tracking tools and to build a quantitative case for the medical and economic
potential of symptom tracking tools using the 30-day readmission rate metric.
Available at: http://tinyurl.com/Medical-Economic-Case
Data Analysis Writing Samples:
Cardiac Rhythm Classification
The task is to design and evaluate models for cardiac rhythm classification and to recommend the
approach with the smaller test and cross-validation error. In this research, I use different machine
learning approaches, such as tree learning, rule learning, instance-based learners, and ensemble
methods. The goal is to distinguish atrial fibrillation from normal sinus rhythm and from normal sinus rhythm
with ectopy. After running experiments with different classifiers, this research shows
that Random Forest, with 500 trees and 4 attributes (HRV, LDs, COSEn, and DFA), gives the
highest prediction accuracy, 93.43%.
Full 19 page report available at: http://tinyurl.com/Cardiac-Rhythm
Design Improvements for the University of Virginia Transplant Center
This study considers the number of kidney and liver transplants at UVA and compares
these organ transplants with those at the MCV and Duke centers, overall and across ethnic groups,
especially minorities. UVA has the smallest trend in the number of kidney transplants overall and in the
non-white group compared with the two other centers over the period 1988 – 2012. The t-test shows that
there is a difference between the number of transplants at UVA and at the other centers at the 5% level. The
95% bootstrap confidence interval of the mean difference also indicates that I can reject the null
hypothesis that the mean difference is zero. A time series linear model is constructed to predict the mean
difference between two centers in 2013. The results from bootstrap and Monte Carlo simulation reveal
that the 95% prediction confidence interval does not contain zero, meaning that there is a difference
between the predicted numbers of kidney transplants. The negative confidence interval indicates that the
predicted number of kidney transplants at UVA overall and for non-whites in 2013 is less than the
predicted number of kidney transplants at MCV and Duke. This suggests that UVA should do better at recruiting
people overall and at recruiting people from other ethnicities. For liver transplants, it is hard to conclude
that building the new Roanoke center in 2005 has increased the number of liver transplants at UVA.
The linear model and the Poisson model of the number of liver transplants show contradictory results.
Based on the linear model with time series, the p-value of the Roanoke variable is 0.014, which is less than 0.05.
With this model, I can reject the null hypothesis at the 5% level and conclude that building the Roanoke center
has increased the number of liver transplants. Meanwhile, based on the Poisson model with time series, the
Roanoke variable does not affect the number of liver transplants and is not significant at the 5% level. This
suggests that UVA should do more research on liver transplants, and it may be interesting to collect data at the UVA
C-ville and UVA Roanoke centers.
Full 34 page report available at: http://tinyurl.com/UVA-Transplant
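The bootstrap confidence interval mentioned above can be illustrated with a minimal sketch. It assumes a percentile bootstrap with independent resampling of each center's yearly counts, which may differ from the procedure in the full report. The function name and the counts below are hypothetical and invented for illustration only; they are not the UVA, MCV, or Duke data.

```python
import random
import statistics

def bootstrap_mean_diff_ci(a, b, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b).

    Each group is resampled with replacement independently (an assumption
    of this sketch, not necessarily the report's procedure).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical yearly transplant counts, invented for illustration only.
center_a = [30, 34, 31, 38, 35, 40, 37, 42, 39, 41]
center_b = [55, 60, 58, 63, 61, 66, 64, 70, 68, 72]

low, high = bootstrap_mean_diff_ci(center_a, center_b)
# If the interval excludes zero, the null hypothesis of zero mean
# difference is rejected at the chosen level.
excludes_zero = high < 0 or low > 0
```

An interval that lies entirely below zero, as in this made-up example, corresponds to the "negative confidence interval" reading in the summary: the first center's mean count is lower than the second's.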
Spam Filtering
In this study, I use a logistic regression model to build a static filter design, i.e. to classify e-mails as spam
or ham. I find that there are 3 important variables that need to be considered in filtering out spam, i.e. the
frequency of some words/characters in the message, the longest run-length of capital letters, and the total
run-length of capital letters. The variables are highly significant in the logistic regression model at the 5% level. It
is important to transform the predictors to a log scale, as this increases the accuracy of the model.
The final model selected for spam filtering has the highest accuracy, with lower total error (13.4%) and
fewer false positives (7.9%). It also fits better based on the BIC criterion. For spam filtering, I also build a time
series filter design to predict the daily number of spam e-mails. I found that there is a relationship
between the number of spam e-mails received and their time of arrival. Time is highly significant in the linear
regression model at the 5% level. For the spam data, the residuals can be modeled by an ARMA model with 2
autoregressive (AR) terms and 1 moving average (MA) term, and this model gives the best forecast.
Meanwhile, for the ham data, the residuals can be modeled by ARIMA(1,1,1), and this model gives the best
forecast, with an MSE of 2.0. The ARIMA(1,1,1) model also has the lowest AIC and BIC values. Both models show
adequacy in the Ljung-Box Q-statistic plot, since all the points are insignificant. The static and time
series filter designs can be integrated to produce an overall filter design by using Bayes' rule. This means
that for any e-mail that comes into my classifier, the probability that it is spam is determined
by the probability that the e-mail is spam based on the static filter and the probability that it is spam
based on the time series filter.
Full 36 page report available at: http://tinyurl.com/Spam-Filtering
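One common way to combine two filter outputs with Bayes' rule is sketched below. It assumes that the two filters are conditionally independent given the class and that both share the same prior spam rate. These are assumptions of the sketch, and the full report may integrate the filters differently. The function name and the example probabilities are hypothetical.

```python
def combine_spam_probabilities(p_static, p_time, prior=0.5):
    """Posterior spam probability from two filter outputs.

    Assumes the two filters are conditionally independent given the class
    and were both calibrated under the same prior (assumptions of this
    sketch, not claims from the report).
    """
    for p in (p_static, p_time, prior):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")
    # Under these assumptions, P(spam | both filters) is proportional to
    # p_static * p_time / prior, and likewise for ham with the complements.
    spam_score = p_static * p_time / prior
    ham_score = (1 - p_static) * (1 - p_time) / (1 - prior)
    return spam_score / (spam_score + ham_score)

# Hypothetical filter outputs for a single e-mail.
combined = combine_spam_probabilities(0.9, 0.8)
# A neutral second filter (0.5 under a 0.5 prior) leaves the first unchanged.
unchanged = combine_spam_probabilities(0.9, 0.5)
```

With these inputs, two filters that each lean toward spam reinforce one another, while a filter that reports the prior contributes no evidence.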
Analysis of Train Accidents in the U.S. During 2001 – 2012
Many factors can affect the severity of rail accidents. Based on my findings, season is an
important factor that causes more deaths. I find that more fatalities occur during the summer season. The
rate of change of fatalities during the summer season is estimated to be about 0.22, with a 95% confidence
interval between 0.04 and 0.4. So it is important for the FRA to put extra safety measures in place when trains are
running during the summer season. Type of accident and cause of accident significantly affect the damage
cost at the 5% level. A train accident at an RR grade crossing is more likely to cause costly damage. Putting
greater safety measures in place at RR grade crossings can reduce the severity of the damage cost. Also, the FRA should train
its people well in safety in order to minimize human error.
Full 23 page report available at: http://tinyurl.com/Train-Accidents-Report
Air Traffic Control, Reliability Analysis and Cargo Operations
This report focusses on some of the challenges in air transportation and the aircraft industry,
provides a detailed analysis of these challenges, and proposes solutions and recommendations.
Section 1 talks about air traffic control, focussing more on the landing queue system. Owing to the
safety measure of maintaining a certain separation distance in the queue, the
aircraft industry faces challenges in avoiding flight delays and managing air traffic better. A discrete
event simulation (DES) model is used, and it is found that the average queue length is between 2.48 and
3.13. This leaves room for improvement since it is desirable to have shorter queue lengths. A more
detailed analysis found that about 17.4% to 23% of the time, the queue is clogged, which is defined as
more than 5 planes in the queue. This is far from ideal because a clogged queue means flight delays and
bad customer ratings. Moreover, the total number of planes in the system for a given system requirement
is approximately 14 planes, which again is far from ideal. The average number of planes in recircles is
also high. Thus, there is a significant challenge in terms of reducing queue length, clogged queue
time, the number of planes in the system, the number of planes in recircles, and a variety of additional issues.
Following a static analysis, the report discusses the impact of a decrease in the mean and spread of landing
times by 10%, which may result from relaxing stringent safety rules. It is found that all the
statistics improve to a great extent because of such a small change. Hence, it is advisable to
research further whether this 10% reduction can be incorporated without compromising safety standards.
Another static analysis was conducted to assess the impact of a decrease in recircle distance by 10%.
This analysis does not show a significant improvement in the overall air traffic control system, and hence this change
can be reduced in priority. The third static analysis was conducted to determine the impact of
a 10% decrease in plane separation. This results in a dramatic improvement in the
amount of time the queue is clogged. Hence, this change should definitely be considered by
management with high priority. We believe that the first and the third changes are highly feasible and
should be implemented following final safety tests. These recommendations can help to ensure
effective air traffic control.
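The kind of discrete event simulation described in Section 1 can be sketched as follows. This is a simplified single-runway queue with exponential inter-arrival times and truncated normal landing times. It does not model recircles or separation rules, so it is not the report's model. The function name, parameter values, and time units are all hypothetical.

```python
import random

def simulate_landing_queue(arrival_rate, mean_landing, sd_landing,
                           horizon, clog_threshold=5, seed=1):
    """Event-driven simulation of a single-runway landing queue.

    Returns (time-average queue length, fraction of time clogged), where
    "clogged" means more than clog_threshold planes are waiting.
    """
    rng = random.Random(seed)
    now = 0.0
    waiting = 0              # planes queued, excluding the one landing
    runway_busy = False
    next_arrival = rng.expovariate(arrival_rate)
    next_landing_done = float("inf")
    queue_area = 0.0         # integral of queue length over time
    clogged_time = 0.0

    def landing_time():
        # Normal landing time, truncated below at a small positive value.
        return max(0.01, rng.gauss(mean_landing, sd_landing))

    while True:
        next_event = min(next_arrival, next_landing_done, horizon)
        elapsed = next_event - now
        queue_area += waiting * elapsed
        if waiting > clog_threshold:
            clogged_time += elapsed
        now = next_event
        if now >= horizon:
            break
        if next_arrival <= next_landing_done:
            # Arrival: land immediately if the runway is free, else queue.
            if runway_busy:
                waiting += 1
            else:
                runway_busy = True
                next_landing_done = now + landing_time()
            next_arrival = now + rng.expovariate(arrival_rate)
        else:
            # Landing completed: start the next waiting plane, if any.
            if waiting > 0:
                waiting -= 1
                next_landing_done = now + landing_time()
            else:
                runway_busy = False
                next_landing_done = float("inf")

    return queue_area / horizon, clogged_time / horizon

# Hypothetical baseline versus a 10% cut in mean and spread of landing time.
base_q, base_clog = simulate_landing_queue(0.9, 1.0, 0.3, horizon=50_000)
fast_q, fast_clog = simulate_landing_queue(0.9, 0.9, 0.27, horizon=50_000)
```

Comparing `base_q` with `fast_q`, and `base_clog` with `fast_clog`, mirrors the first static analysis: when the runway is heavily loaded, a modest reduction in landing time can shorten the queue considerably.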
Section 2 focuses on reliability analysis in the same aircraft industry setting. Emotionally driven
customers give a lot of importance to safety. It is critical to understand the risks and how those risks
interact with one another and affect the system as a whole. The overall likelihood of an accident is
extremely small; however, with increasing air traffic, this probability grows and becomes
even more important to the industry. Giving pilots a new dynamic control system, which will limit their
response time in the event of an inflight separation violation, has the potential to reduce this overall
risk. Thus, more errors can be absorbed by the system. Section 1 discussed that the inflight separation
distance can be reduced for effective air traffic control. However, if the inflight separation distance is
reduced, then the new dynamic control system has a demerit because it actually results in more
accidents. Pilots think that there is more room for error with this system, when analysis shows that this is
not the case. The benefits from the new dynamic control system are more than offset by the negatives
of altering the inflight separation distance. It is recommended that these two options be considered
separately rather than together in order to maintain high safety standards from the reliability point of view.
Section 3 deviates from air traffic control and focusses on the cargo operations that take place in an
airport network. We built an optimization model to ensure smooth and cost-effective management of
cargo operations. There are often carrier capacities at each given airport, and there is a cost associated
with transporting cargo from one airport to another. Having an optimization model which minimizes
cost for the aircraft company is always desirable because it would mean more profit and insights into
improving management. The analysis shows the complexity of such a problem, which can be seen
from the fact that Excel fails to give a feasible solution. With regard to the current system, the
conclusion is that the current carrier capacity is insufficient to achieve a global optimum in a week. This is
because of sudden peak influxes of cargo at the airports, which can be handled for only a short period of
time. Not having enough carriers increases the cost by 17%. In addition to purchasing more carriers, the
recommendation is that the weekly demand distribution be smoothed by keeping some extra cargo on