Research Paper Project Proposal
Austin Wilson, Roberto Campos, Isaac Shah
Economic impact of epidemics and pandemics
Market losses!
https://www.europarl.europa.eu/RegData/etudes/BRIE/2020/646195/EPRS_BRI(2020)646195_EN.pdf
Market losses from a pandemic could be up to $500 billion
Lower-middle income countries are impacted more than high income countries
Industries affected
The healthcare industry sees a huge spike in costs when a pandemic occurs. The insurance industry is also affected, because more people go to the doctor.
Industries affected
Agricultural industry is adversely impacted.
● In developed countries the agriculture industry is incentivized to spend on infectious disease prevention.
● In less developed countries agricultural companies are not incentivized to spend to reduce infectious disease
● An outbreak originating in one of these less developed countries can result in travel and trade isolation
Travel industry
● People do not want to travel to places where the disease is running rampant
● People don’t want to be on planes or ships where they think there might be an outbreak
● Estimated $2.8 billion loss to the Mexican travel industry from H1N1
Time Series Data Mining by Philippe Esling
● Data representation: how can time series be represented, what is the shape?
● Similarity measurement: how do we compare two time series objects?
● Indexing method: how can we speed up query time for big data?
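One widely used similarity measure for time series is dynamic time warping (DTW), which tolerates series that are locally out of phase. The following is a minimal illustrative sketch, not the paper's exact method:

```python
# Minimal dynamic time warping (DTW) distance between two sequences.
# Illustrative sketch; library implementations are preferred in practice.

def dtw_distance(a, b):
    """Return the DTW distance between sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

print(dtw_distance([1, 2, 3], [1, 2, 3]))        # 0.0 (identical series)
print(dtw_distance([0, 1, 2, 3], [1, 2, 3, 3]))  # 1.0 (shifted copy stays close)
```

Unlike plain Euclidean distance, DTW can match a point in one series against several consecutive points in the other, which is why the shifted copy above incurs only a small cost.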
Clustering
● Whole series clustering tries to maximize the distance between different clusters while minimizing the variance within each cluster
● We can also use subsequence clustering, where subsequences extracted from a single time series are grouped into clusters
● Classification is similar to whole series clustering, except we are given time series with labels; the task is to train a classifier to label new time series
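Whole series clustering can be sketched by treating each equal-length series as a point and running k-means over those points. This toy implementation (in practice a library such as scikit-learn would be used) is illustrative only; the seeding and the example series are made up:

```python
# Toy whole-series k-means: each equal-length series is a point;
# alternate an assignment step and a centroid-update step.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def kmeans_series(series, k, iters=10):
    """Cluster equal-length series; returns (labels, centroids)."""
    centroids = [list(s) for s in series[:k]]  # seed with the first k series
    labels = [0] * len(series)
    for _ in range(iters):
        # assignment step: each series joins its nearest centroid
        labels = [min(range(k), key=lambda c: euclidean(s, centroids[c]))
                  for s in series]
        # update step: centroid becomes the pointwise mean of its members
        for c in range(k):
            members = [s for s, l in zip(series, labels) if l == c]
            if members:
                centroids[c] = [sum(vals) / len(members)
                                for vals in zip(*members)]
    return labels, centroids

series = [[0, 0, 1], [0, 1, 1], [5, 5, 6], [5, 6, 6]]
labels, _ = kmeans_series(series, k=2)
print(labels)  # the two low series share one label, the two high ones another
```

The "maximize between-cluster distance, minimize within-cluster variance" objective above is exactly what the update step pushes toward: each centroid is the variance-minimizing summary of its members.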
Segmentation
● Create an accurate approximation while reducing dimensionality of the time series
● Want to keep the essential features and drop redundant or uninsightful features
Piecewise linear approximation
● One of the most successful segmentation approaches over the years
● Split the time series into segments
● Fit an individual linear or polynomial curve to each segment
● Sliding windows
○ Keep growing a window until it exceeds an error threshold
● Top-down
○ Recursively partition a data set until some stopping criterion is met
● Bottom-up
○ Start from the finest segments and iteratively merge adjacent segments
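The sliding-window strategy above can be sketched as follows: grow a window until the straight line joining its endpoints deviates from the data by more than a threshold, then close the segment and start a new one. The error measure (max vertical deviation from the endpoint line) and the example data are illustrative assumptions:

```python
# Sliding-window piecewise linear approximation sketch.

def segment_error(ys):
    """Max vertical deviation of ys from the line joining its endpoints."""
    n = len(ys)
    if n < 3:
        return 0.0
    slope = (ys[-1] - ys[0]) / (n - 1)
    return max(abs(ys[i] - (ys[0] + slope * i)) for i in range(n))

def sliding_window_segments(ys, max_error):
    """Return (start, end) index pairs covering ys; ends are inclusive
    and consecutive segments share their boundary point."""
    segments, start = [], 0
    end = start + 1
    while end < len(ys):
        # grow the window until the linear fit exceeds the threshold
        if segment_error(ys[start:end + 1]) > max_error:
            segments.append((start, end - 1))
            start = end - 1
        end += 1
    segments.append((start, len(ys) - 1))
    return segments

# A level shift forces a break: two flat ramps joined by a jump.
print(sliding_window_segments([0, 1, 2, 3, 10, 11, 12], 0.5))
# [(0, 3), (3, 4), (4, 6)]
```

Note how the jump between index 3 and 4 becomes its own short segment: the window stops growing as soon as one line can no longer approximate both ramps.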
Data-adaptive vs non-data-adaptive vs model-based
● Data-adaptive: parameters are modified based on the values of consecutive segments
● Non-data-adaptive: parameters of transformation remain the same for every series
● Model-based: assume the time series has been produced by an underlying model and find the parameters of the model
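As a concrete model-based example, one can assume each series follows an AR(1) process x[t] = φ·x[t−1] + noise and represent the whole series by the single fitted parameter φ. This least-squares sketch (noiseless data for simplicity) is an illustrative assumption, not a method from the source paper:

```python
# Model-based representation sketch: fit the AR(1) coefficient phi
# by least squares on the lag-1 regression x[t] ~ phi * x[t-1].

def fit_ar1(xs):
    """Least-squares estimate of phi for x[t] = phi * x[t-1] + noise."""
    num = sum(xs[t] * xs[t - 1] for t in range(1, len(xs)))
    den = sum(xs[t - 1] ** 2 for t in range(1, len(xs)))
    return num / den

# A series generated with phi = 0.5 (noiseless) recovers that parameter.
xs = [1.0]
for _ in range(50):
    xs.append(0.5 * xs[-1])
print(round(fit_ar1(xs), 3))  # 0.5
```

Once every series is reduced to its model parameters, comparing two series reduces to comparing small parameter vectors, which is the point of the model-based representation.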
Data
COVID - Detailed: Novel Corona Virus 2019 Dataset
COVID - South Korea: https://www.kaggle.com/kimjihoo/coronavirusdataset
Stocks: https://www.kaggle.com/borismarjanovic/price-volume-data-for-all-us-stocks-etfs
Tasks Performed + Task Division
Research: Isaac S.
Study past events and find datasets that we can use to analyze the initial problems in past situations. Locate when the problem first arose, when the situation plateaued, and when it returned to normal.
Tableau: Isaac
Visualization will be done before choosing which models to work with. Try to find visible trends. Seek patterns and similarities between events. Map each case on a US map and check whether there is a correlation with its stock market performance.
Modeling: Roberto, Austin
Explore which types of models can be used to solve each problem. For example, should we use linear regression or logistic regression? Can we find which variables are important? Is a fully connected neural network a useful method for the problem we are currently analyzing? Should we use a CNN to identify important features? Can we use an SVM to categorize the different past events alongside the current event, COVID-19?
Data Pre-Processing: Roberto, Austin
Data pre-processing will play an important role. We have to analyze the types of data we will be inputting into each model, since different models require different processes.
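A typical pre-processing step before any distance-based model is z-normalization: rescale each series to mean 0 and standard deviation 1 so that comparisons reflect shape rather than scale. A minimal sketch (the example values are made up):

```python
# Z-normalize a series: subtract the mean, divide by the (population)
# standard deviation. Constant series are mapped to all zeros.

def z_normalize(xs):
    n = len(xs)
    mean = sum(xs) / n
    std = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in xs]

zs = z_normalize([10, 20, 30, 40])
print([round(z, 3) for z in zs])  # [-1.342, -0.447, 0.447, 1.342]
```

Applied to stock prices and case counts, this keeps a cheap stock and an expensive one comparable when we look for shared patterns around an outbreak.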
Tools
Jupyter
Interactive notebook to visually present our models in detail.
Python
Our language of choice to pre-process data and create ML models. We are interested in using an ANN or CNN for our model. We will also consider simple linear or log-linear models.
R
Used in support of Python, as R is a great statistical tool that provides statistical inference. It can help us statistically test the correlations we seek to establish.
Tableau
A versatile visualization tool for creating custom, robust graphs.
Progress + Experience
Initial design / case study / prototype / experiments
With the combined expertise of the team, we will be able to analyze and seek data that can help us answer our problem statement. Once the data is gathered, quick visualizations will be rendered to gain further insights. All three members of the team have extensive knowledge of Tableau.
Models can be easily prototyped with the scikit-learn and TensorFlow libraries. Two members of the team have experience using these libraries and have access to consulting outside of the classroom.
Progress milestones: what will be completed by weeks 11 and 14
By weeks 11 and 14, the team will have developed visual aids and prototype models, and will begin refining them and addressing specific details, for example, increasing the accuracy of our models.
Experience
● Modeling with scikit-learn and TensorFlow
● Modeling with R
● Data pre-processing
● Tableau
● Google Colab for big data
Sources
Economic impact of epidemics/pandemics: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6491983/
Time Series Data Mining: https://www.researchgate.net/publication/261722458_Time-Series_Data_Mining
Thanks!