Healthcare Costs

Posted on 16-Aug-2015

description

data analysis on healthcare costs

Transcript of Healthcare Costs

8/14/2015 healthcarecosts http://localhost:8888/nbconvert/html/healthcarecosts.ipynb?download=false

Healthcare Costs

I stumbled upon some data regarding healthcare costs for various medical procedures and conditions in the USA. And so, I am going to analyze the data and see what we can find out from it. First, let me set up this IPython notebook with the necessary requirements for our work. I am also going to open the file containing the data (the csv file) and present the first 3 rows of data in order to have an idea of what we are working with. Some computer code will be featured. But if you are not a computer programmer, don't worry about it. You will not need that skill to understand this work. Here we go.

In [5]:
    # importing the various modules we will be using
    import pandas as pd
    import numpy as np
    pd.set_option("display.mpl_style", "default")
    # reading the data file and showing the first 3 rows
    healthdata = pd.read_csv("healthcarecosts.csv")
    healthdata[:3]

Out[5]:
       DRG Definition                          Provider Id  Provider Name                     Provider Street Address    Provider City  Provider State  Provider Zip Code  Hospital Referral Region Description
    0  039 EXTRACRANIAL PROCEDURES W/O CC/MCC  10001        SOUTHEAST ALABAMA MEDICAL CENTER  1108 ROSS CLARK CIRCLE     DOTHAN         AL              36301              AL Dothan
    1  039 EXTRACRANIAL PROCEDURES W/O CC/MCC  10005        MARSHALL MEDICAL CENTER SOUTH     2505 US HIGHWAY 431 NORTH  BOAZ           AL              35957              AL Birmingham
    2  039 EXTRACRANIAL PROCEDURES W/O CC/MCC  10006        ELIZA COFFEE MEMORIAL HOSPITAL    205 MARENGO STREET         FLORENCE       AL              35631              AL Birmingham

OK, now we can see what kind of data we have available. We have the definition of the various medical procedures and conditions, "DRG Definition". We have the provider id, provider name, their address, their state. Also, the Average Covered Charges. In the original file, they say that the Average Covered Charges is the total amount that the provider charges. So, we will use this as the cost for the various procedures. We can also find out how many rows of data we have with the code below. And we find out that we have 163065 rows of data and 12 columns.

In [9]:
    # finding number of rows and columns
    healthdata.shape

Out[9]:
    (163065, 12)

But we don't need all those different kinds of data for our work. We are only interested in "DRG Definition", "Provider State" and "Average Covered Charges". So, let's manipulate the data in order to show only what we need. Also, for those inclined towards computer programming, you may note that the values in "Average Covered Charges" are strings. Well, if you have noticed, don't worry about it. I will also transform those values into numbers (floats) so we can do calculations with them. When I am done, you will no longer see the $ in front of their numbers.

In [11]:
    # combining only "DRG Definition", "Provider State" and "Average Covered Charges" data
    healthdata2 = healthdata[["DRG Definition", "Provider State", "Average Covered Charges"]]
    # dropping the leading $ from the "Average Covered Charges" strings, then converting them to floats
    healthdata2["Average Covered Charges"] = healthdata2["Average Covered Charges"].str[1:].astype(float)
    # showing first 10 rows of the new data
    healthdata2[:10]

Ignore the warning. Everything is alright. Now, before we go further, I am interested in finding out what unique values/names we have for the medical procedures. So, let's create a list that shows only unique values. See below (cell In [13]).
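The conversion in cell In [11] drops the first character of each string and parses what is left as a float. As a minimal, library-free sketch of that same step, it can be written element by element. The helper name `parse_charge` is hypothetical, and the three sample values are the first three charges shown in Out[11], with the leading `$` assumed as the text describes.

```python
# Sample values taken from the notebook's Out[11] output, with the "$" prefix
# put back on the assumption that the raw column looks like "$32963.07".
raw_charges = ["$32963.07", "$15131.85", "$37560.37"]


def parse_charge(text):
    """Turn a string such as "$32963.07" into the float 32963.07.

    Hypothetical helper, not part of the notebook: it strips surrounding
    whitespace, any leading "$", and thousands separators before parsing.
    """
    cleaned = text.strip().lstrip("$").replace(",", "")
    return float(cleaned)


charges = [parse_charge(value) for value in raw_charges]
print(charges)
```

Stripping the `$` explicitly, rather than slicing off the first character, means a value with leading whitespace or a thousands separator would still parse. Whether the real file contains such values is not known from the source.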
    # converting the values from "DRG Definition" into a list. But this will give us several instances of the same values
    newlist = healthdata2["DRG Definition"].tolist()
    # retrieving the unique values by converting the previous list into a set
    dataset = set(newlist)
    # then converting the set back into a list again, but with unique values this time, for ease of operation. And sorting the list
    # then showing the list with its unique values
    datalist = [a for a in dataset]
    datalist = sorted(datalist)
    datalist

    C:\Users\Ricardy\Anaconda\lib\site-packages\IPython\kernel\__main__.py:5: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Out[11]:
       DRG Definition                          Provider State  Average Covered Charges
    0  039 EXTRACRANIAL PROCEDURES W/O CC/MCC  AL              32963.07
    1  039 EXTRACRANIAL PROCEDURES W/O CC/MCC  AL              15131.85
    2  039 EXTRACRANIAL PROCEDURES W/O CC/MCC  AL              37560.37
    3  039 EXTRACRANIAL PROCEDURES W/O CC/MCC  AL              13998.28
    4  039 EXTRACRANIAL PROCEDURES W/O CC/MCC  AL              31633.27
    5  039 EXTRACRANIAL PROCEDURES W/O CC/MCC  AL              16920.79
    6  039 EXTRACRANIAL PROCEDURES W/O CC/MCC  AL              11977.13
    7  039 EXTRACRANIAL PROCEDURES W/O CC/MCC  AL              35841.09
    8  039 EXTRACRANIAL PROCEDURES W/O CC/MCC  AL              28523.39
    9  039 EXTRACRANIAL PROCEDURES W/O CC/MCC  AL              75233.38

Out[13]:
    ['039 EXTRACRANIAL PROCEDURES W/O CC/MCC',
     '057 DEGENERATIVE NERVOUS SYSTEM DISORDERS W/O MCC',
     '064 INTRACRANIAL HEMORRHAGE OR CEREBRAL INFARCTION W MCC',
     '065 INTRACRANIAL HEMORRHAGE OR CEREBRAL INFARCTION W CC',
     '066 INTRACRANIAL HEMORRHAGE OR CEREBRAL INFARCTION W/O CC/MCC',
     '069 TRANSIENT ISCHEMIA',
     '074 CRANIAL & PERIPHERAL NERVE DISORDERS W/O MCC',
     '101 SEIZURES W/O MCC',
     '149 DYSEQUILIBRIUM',
     '176 PULMONARY EMBOLISM W/O MCC',
     '177 RESPIRATORY INFECTIONS & INFLAMMATIONS W MCC',
     '178 RESPIRATORY INFECTIONS & INFLAMMATIONS W CC',
     '189 PULMONARY EDEMA & RESPIRATORY FAILURE',
     '190 CHRONIC OBSTRUCTIVE PULMONARY DISEASE W MCC',
     '191 CHRONIC OBSTRUCTIVE PULMONARY DISEASE W CC',
     '192 CHRONIC OBSTRUCTIVE PULMONARY DISEASE W/O CC/MCC',
     '193 SIMPLE PNEUMONIA & PLEURISY W MCC',
     '194 SIMPLE PNEUMONIA & PLEURISY W CC',
     '195 SIMPLE PNEUMONIA & PLEURISY W/O CC/MCC',
     '202 BRONCHITIS & ASTHMA W CC/MCC',
     '203 BRONCHITIS & ASTHMA W/O CC/MCC',
     '207 RESPIRATORY SYSTEM DIAGNOSIS W VENTILATOR SUPPORT 96+ HOURS',
     '208 RESPIRATORY SYSTEM DIAGNOSIS W VENTILATOR SUPPORT