Research Paper Project Proposal
Austin Wilson, Roberto Campos, Isaac Shah
Economic impact of epidemics and pandemics
Market losses!
https://www.europarl.europa.eu/RegData/etudes/BRIE/2020/646195/EPRS_BRI(2020)646195_EN.pdf
Market losses from a pandemic could be up to $500 billion
Lower-middle income countries are impacted more than high income countries
Industries affected
The healthcare industry sees a huge spike in costs when a pandemic occurs. The insurance industry is also affected, because more people go to the doctor.
Industries affected
Agricultural industry is adversely impacted.
● In developed countries the agriculture industry is incentivized to spend on infectious disease prevention.
● In less developed countries agricultural companies are not incentivized to spend to reduce infectious disease
● An outbreak originating in one of these less developed countries can result in travel and trade isolation
Travel industry
● People do not want to travel to places where the disease is running rampant
● People don’t want to be on planes or ships where they think there might be an outbreak
● Estimated $2.8 billion loss to the Mexican travel industry from H1N1
Time Series Data Mining by Philippe Esling
● Data representation: how can time series be represented, what is the shape?
● Similarity measurement: how do we compare two time series objects?
● Indexing method: how can we speed up query time for big data?
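One widely used similarity measure for time series is dynamic time warping (DTW), which tolerates series that are locally out of phase. The following is a minimal illustrative sketch, not the paper's exact method:

```python
# Minimal dynamic time warping (DTW) distance between two sequences.
# Illustrative sketch; library implementations are preferred in practice.

def dtw_distance(a, b):
    """Return the DTW distance between sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

print(dtw_distance([1, 2, 3], [1, 2, 3]))        # 0.0 (identical series)
print(dtw_distance([0, 1, 2, 3], [1, 2, 3, 3]))  # 1.0 (shifted copy stays close)
```

Unlike plain Euclidean distance, DTW can match a point in one series against several consecutive points in the other, which is why the shifted copy above incurs only a small cost.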
Clustering
● Whole series clustering tries to maximize the distance between different clusters while minimizing the variance within each cluster
● We can also use subsequence clustering, where subsequences extracted from a single time series are grouped into clusters
● Classification is similar to whole series clustering, except we are given time series with labels; the task is to train a classifier to label new time series
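Whole series clustering can be sketched by treating each equal-length series as a point and running k-means over those points. This toy implementation (in practice a library such as scikit-learn would be used) is illustrative only; the seeding and the example series are made up:

```python
# Toy whole-series k-means: each equal-length series is a point;
# alternate an assignment step and a centroid-update step.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def kmeans_series(series, k, iters=10):
    """Cluster equal-length series; returns (labels, centroids)."""
    centroids = [list(s) for s in series[:k]]  # seed with the first k series
    labels = [0] * len(series)
    for _ in range(iters):
        # assignment step: each series joins its nearest centroid
        labels = [min(range(k), key=lambda c: euclidean(s, centroids[c]))
                  for s in series]
        # update step: centroid becomes the pointwise mean of its members
        for c in range(k):
            members = [s for s, l in zip(series, labels) if l == c]
            if members:
                centroids[c] = [sum(vals) / len(members)
                                for vals in zip(*members)]
    return labels, centroids

series = [[0, 0, 1], [0, 1, 1], [5, 5, 6], [5, 6, 6]]
labels, _ = kmeans_series(series, k=2)
print(labels)  # the two low series share one label, the two high ones another
```

The "maximize between-cluster distance, minimize within-cluster variance" objective above is exactly what the update step pushes toward: each centroid is the variance-minimizing summary of its members.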
Segmentation
● Create an accurate approximation while reducing dimensionality of the time series
● Want to keep the essential features and drop redundant or uninsightful features
Piecewise linear approximation
● One of the most successful segmentation approaches over the years
● Split the time series into segments
● Fit an individual linear or polynomial curve to each segment
● Sliding windows
○ Keep growing a window until it exceeds an error threshold
● Top-down
○ Recursively partition a data set until some stopping criterion is met
● Bottom-up
○ Start from the finest segments and iteratively merge adjacent segments
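The sliding-window strategy above can be sketched as follows: grow a window until the straight line joining its endpoints deviates from the data by more than a threshold, then close the segment and start a new one. The error measure (max vertical deviation from the endpoint line) and the example data are illustrative assumptions:

```python
# Sliding-window piecewise linear approximation sketch.

def segment_error(ys):
    """Max vertical deviation of ys from the line joining its endpoints."""
    n = len(ys)
    if n < 3:
        return 0.0
    slope = (ys[-1] - ys[0]) / (n - 1)
    return max(abs(ys[i] - (ys[0] + slope * i)) for i in range(n))

def sliding_window_segments(ys, max_error):
    """Return (start, end) index pairs covering ys; ends are inclusive
    and consecutive segments share their boundary point."""
    segments, start = [], 0
    end = start + 1
    while end < len(ys):
        # grow the window until the linear fit exceeds the threshold
        if segment_error(ys[start:end + 1]) > max_error:
            segments.append((start, end - 1))
            start = end - 1
        end += 1
    segments.append((start, len(ys) - 1))
    return segments

# A level shift forces a break: two flat ramps joined by a jump.
print(sliding_window_segments([0, 1, 2, 3, 10, 11, 12], 0.5))
# [(0, 3), (3, 4), (4, 6)]
```

Note how the jump between index 3 and 4 becomes its own short segment: the window stops growing as soon as one line can no longer approximate both ramps.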
Data-adaptive vs non-data-adaptive vs model-based
● Data-adaptive: parameters are modified based on the values of consecutive segments
● Non-data-adaptive: parameters of transformation remain the same for every series
● Model-based: assume the time series has been produced by an underlying model and find the parameters of the model
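As a concrete model-based example, one can assume each series follows an AR(1) process x[t] = φ·x[t−1] + noise and represent the whole series by the single fitted parameter φ. This least-squares sketch (noiseless data for simplicity) is an illustrative assumption, not a method from the source paper:

```python
# Model-based representation sketch: fit the AR(1) coefficient phi
# by least squares on the lag-1 regression x[t] ~ phi * x[t-1].

def fit_ar1(xs):
    """Least-squares estimate of phi for x[t] = phi * x[t-1] + noise."""
    num = sum(xs[t] * xs[t - 1] for t in range(1, len(xs)))
    den = sum(xs[t - 1] ** 2 for t in range(1, len(xs)))
    return num / den

# A series generated with phi = 0.5 (noiseless) recovers that parameter.
xs = [1.0]
for _ in range(50):
    xs.append(0.5 * xs[-1])
print(round(fit_ar1(xs), 3))  # 0.5
```

Once every series is reduced to its model parameters, comparing two series reduces to comparing small parameter vectors, which is the point of the model-based representation.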
Data
COVID - Detailed: Novel Corona Virus 2019 Dataset
COVID - South Korea: https://www.kaggle.com/kimjihoo/coronavirusdataset
Stocks: https://www.kaggle.com/borismarjanovic/price-volume-data-for-all-us-stocks-etfs
Tasks Performed + Task Division
Research: Isaac S.
Study past events and find datasets that we can use to analyze the initial problems in past situations. Locate when the problem first arose, when the situation plateaued, and when it returned to normal.
Tableau: Isaac
Visualization will be done before choosing which models to work with. Try to find visible trends. Seek patterns and similarities between events. Map each case on a US map and check whether there is a correlation with its stock market performance.
Modeling: Roberto, Austin
Explore which types of models can be used to solve each problem. For example, should we use linear regression or logistic regression? Can we find which variables are important? Is a fully connected neural network a useful method for the problem we are currently analyzing? Should we use a CNN to identify important features? Can we use an SVM to categorize the different past events alongside the current event, COVID-19?
Data Pre-Processing: Roberto, Austin
Data pre-processing will play an important role. We have to analyze the types of data we will be inputting into each model, since different models require different processes.
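A typical pre-processing step before any distance-based model is z-normalization: rescale each series to mean 0 and standard deviation 1 so that comparisons reflect shape rather than scale. A minimal sketch (the example values are made up):

```python
# Z-normalize a series: subtract the mean, divide by the (population)
# standard deviation. Constant series are mapped to all zeros.

def z_normalize(xs):
    n = len(xs)
    mean = sum(xs) / n
    std = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in xs]

zs = z_normalize([10, 20, 30, 40])
print([round(z, 3) for z in zs])  # [-1.342, -0.447, 0.447, 1.342]
```

Applied to stock prices and case counts, this keeps a cheap stock and an expensive one comparable when we look for shared patterns around an outbreak.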
Tools
Jupyter
Interactive notebook to visually present our models in detail.
Python
Our language of choice to pre-process data and create ML models. We are interested in using an ANN or CNN for our model. We will also consider simple linear or log-linear models.
R
Used in support of Python, as R is a great statistical tool that provides statistical inference. It can help us statistically test the correlations we seek to establish.
Tableau
A versatile visualization tool for creating custom, robust graphs.
Progress + Experience
Initial design / case study / prototype / experiments
With the combined expertise of the team, we will be able to analyze and seek data that can help us answer our problem statement. Once the data is gathered, quick visualizations will be rendered to gain further insights. All three members of the team have extensive knowledge of Tableau.
Models can be easily prototyped with the scikit-learn and TensorFlow libraries. Two members of the team have experience using these libraries and have access to consulting outside of the classroom.
Progress milestones: what will be completed by weeks 11 and 14
By weeks 11 and 14, the team will have developed visual aids and prototype models, and will begin refining them and addressing specific details, for example, increasing the accuracy of our models.
Experience
● Modeling with scikit-learn and TensorFlow
● Modeling with R
● Data pre-processing
● Tableau
● Google Colab for big data
Sources
Economic impact of epidemics/pandemics: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6491983/
Time Series Data Mining: https://www.researchgate.net/publication/261722458_Time-Series_Data_Mining
Thanks!