Big data Ch1,2
-
Upload
jaime-a-sanchez-sanchez -
Category
Documents
-
view
219 -
download
0
Transcript of Big data Ch1,2
-
8/10/2019 Big data Ch1,2
1/20
-
8/10/2019 Big data Ch1,2
2/20
BigData
ARevolutionthatwilltransformhow
welive,workandthink
KeynotesofChapter12
ChangchunZhang
-
8/10/2019 Big data Ch1,2
3/20
Now(Chapter
1)
Examples:
Googletook
the
50
million
most
common
search
termstoidentifyareasinfectedbythefluvirus.
OrenEtzioni predictsifthepriceofplaneticketis
increasingor
decreasing
in
the
future,
to
help
customertodeterminewhentobuytheticket.
Data
become
a
raw
material
of
business,
used
tocreateanewformofeconomicvalue.
-
8/10/2019 Big data Ch1,2
4/20
Lettingthe
data
speak
Thereisnorigorousdefinitionofbigdata.
Oneway
to
think
about
the
issue:
big
data
referstothingsonecandoatalargescalethat
cannot
be
done
at
a
smaller
one,
to
extract
newinsightsorcreatenewformsofvalue,in
waysthatchangemarkets,organizations,the
relationshipsbetween
citizens
and
governments,andmore.
-
8/10/2019 Big data Ch1,2
5/20
-
8/10/2019 Big data Ch1,2
6/20
Thingsarespeedingup.Theamountofstored
informationgrows
four
times
faster
than
the
worldeconomy,whiletheprocessingpowerof
computersgrowsninetimesfaster.
Whatdoesthisincreasingmean?
Bychangingtheamount,wechangetheessence.
Whenwe
increase
the
scale
of
the
data,
we
can
donewthingsthatwerentpossiblewithsmaller
amounts.
-
8/10/2019 Big data Ch1,2
7/20
Big data is about predictions.
Big data is not about trying to teach a
computer to think like humans.
Big data is about applying math to hugequantities of data in order to infer probabilities.
Big data will change fundamental aspects of
life by giving it a quantitative dimension it
never had before.
-
8/10/2019 Big data Ch1,2
8/20
More,messy,
good
enough
Big datas ascendancy represents three shifts in
the way we analyze information that transformhow we understand and organize society:
Analyze far more data.
Loosen up our desire for exactitude.
A move away from the age-old search for
causality.
-
8/10/2019 Big data Ch1,2
9/20
Inthisnewworldwecananalyzefarmore
data.Samplingisartificialfetterbeforetheprevalence
ofhighperformancedigitaltechnologies.
Usingallthedataletsusseedetailswenever
couldwhenwewerelimitedtosmallerquantities.
-
8/10/2019 Big data Ch1,2
10/20
Loosenupthedesireforexactitude:
Withbig
data,
well
often
be
satisfied
with
asense
ofgeneraldirectionratherthanknowinga
phenomenondowntotheinch.
Whatwe
lose
in
accuracy
at
the
micro
level
we
gaininsightatthemacrolevel.
-
8/10/2019 Big data Ch1,2
11/20
Amoveawayfromtheageoldsearchfor
causality.Inabigdataworld,wewonthavetobefixated
oncausality
Instead,wecandiscoverpatternsandcorrelationsinthedatathatofferusnovelandinvaluable
insights.
Thecorrelationsmaynottelluswhyhappening,
butalertusthatishappening.
-
8/10/2019 Big data Ch1,2
12/20
-
8/10/2019 Big data Ch1,2
13/20
Bigdatachangesthenatureofbusiness,
markets,and
society.
Valueshiftedfromphysicalinfrastructureto
intangiblesandnowisexpandingtodata
whichisbecomingasignificantcorporate
asset,avitaleconomicinput,andthe
foundationof
new
business
models.
-
8/10/2019 Big data Ch1,2
14/20
Theeffectonindividualsmaybethebiggest
shockof
all.
Subjectmatterspecialisthavetocontendwith
whatthebigdataanalysissays.
Bigdatawillforceanadjustmenttotraditional
ideasofmanagement,decisionmaking,
humanresources
and
education.
-
8/10/2019 Big data Ch1,2
15/20
Duetodatasvastsize,decisionsmayoftenbe
madenot
by
humans
but
by
machines.
Darksideofbigdataareconsidered Privacy,
Individualvolition,safeguardthesanctityof
theindividual.
Newprinciplesareneededfortheageofbig
data.
-
8/10/2019 Big data Ch1,2
16/20
Summary:
Harnessingvast
quantities
of
data
rather
than
a
smallportion,andprivilegingmoredataofless
exactitude,opensthedoortonewwaysof
understanding.
Bigdataalsooverturnstheideaofidentifying
causalmechanismwhichisselfcongratulatory.
-
8/10/2019 Big data Ch1,2
17/20
More(Chapter
2)
Using all the data at hand instead of just a
portion of it.
Statisticians have shown that sampling
precision improves most dramatically withrandomness, not with increased sample size.
Random sampling has been a hug success and
is the backbone of modern measurement atscale.
-
8/10/2019 Big data Ch1,2
18/20
Fromsome
to
all
Theconceptofsamplingnolongermakesas
muchsense
when
we
can
harness
large
amountsofdata.
Samplinglosesdetail.Inmanycases,ashiftis
takingplacefromcollectingsomedatato
gatheringasmuchaspossible.,andiffeasible,
gettingeverything:
N
=all.
-
8/10/2019 Big data Ch1,2
19/20
Random sampling doesnt scale easily to include
subcategories, as breaking the results down intosmaller and smaller subgroups increases thepossibility of erroneous predictions.
Sampling also requires careful planning andexecution.
Big data is not necessarily big in absolute terms.
What classifies them as big data is that instead ofusing the shortcut of a random sample, data asmuch of the entire dataset as feasible are used.
-
8/10/2019 Big data Ch1,2
20/20
Using all the data instead of a sample isnt
always necessary. But in an increasing numberof cases using all the data at hand does make
sense, and doing so is feasible now where
before it was not.
Sampling will not be the predominant way we
analyze large data set. We will aim to go for allthe data.