Project Hints


  • 8/2/2019 Project Hints


    CS154 - Project Part I Cheat Sheet

    This document contains a few helpful hints on how to approach Part I of the project. The first section will include a guide for the project, outlining the various steps in the conversion from a regular expression to a finite automaton, including pseudocode, hints, and tips.

    Note: You do not need to follow the outline provided here. It is an example only. It is not any more "correct" than any other method, nor is it necessarily the most efficient or easiest way to do it, but it may help you in getting an idea of what approaches you might take.

    SECTION 1: Outline of Construction

    STEP 1: Convert Regular Expression Into a Parse Tree - this function is supplied for you in parser.h.

    STEP 2: Convert Parse Tree Into an epsilon-NFA

    Proposed Method: In-Order Tree Traversal by a Recursive Function:

    This function assumes you have some way to represent a graph, using node structures and edge structures. It also assumes you've created a node representing a start state, s, and a node representing a final state, f, which you pass to the function along with the regular expression parse tree to begin the recursion.

    The design is up to you, but keep in mind the kind of information you'll need:

    void RecursiveConvert(RegularExpressionT *t, NodeT *s, NodeT *f)
    {
        switch (t->type) {
        Terminal Cases: (Base Case)
            Create edge with appropriate label.
            link edge from s to f.

        Union:
            Create new state nodes, q1, q2.
            Create new state nodes, p1, p2.
            Create edges with epsilon label, e1, e2.
            Create edges with epsilon label, f1, f2.
            link e1 from s to q1.
            link e2 from s to q2.
            link f1 from p1 to f.
            link f2 from p2 to f.
            RecursiveConvert(t->term, q1, p1);
            RecursiveConvert(t->next, q2, p2);

        Concat:
            Create a new state node, q.
            Create a new state node, p.
            Create edge with epsilon label, e.
            link e from q to p.
            RecursiveConvert(t->term, s, q);
            RecursiveConvert(t->next, p, f);

        Kleene Closure:
            Create a new state node, q.
            Create a new state node, p.
            create edge with epsilon label, e1.
            create edge with epsilon label, e2.
            link e1 from s to q.
            link e2 from p to f.
            create edge with epsilon label, d.
            create edge with epsilon label, g.
            link d from s to f.
            link g from p to q.
            RecursiveConvert(t->term, q, p);

        ...etc.: (cover the rest of the cases)
        }
    }
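The recursion above can be sketched in C roughly as follows. This is only an illustration: TreeT, its field layout, the fixed-size edgeList, and the helper names NewState and AddEdge are all hypothetical stand-ins (the real RegularExpressionT from parser.h may look different), and states are plain integers handed out by a global counter rather than NodeT structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parse-tree shape for this sketch; the real
 * RegularExpressionT comes from parser.h and may differ. */
typedef enum { T_SYM, T_UNION, T_CONCAT, T_STAR } TreeKindT;

typedef struct tree_t {
    TreeKindT type;
    char sym;                    /* used when type == T_SYM */
    struct tree_t *term, *next;  /* operands; next is unused for T_STAR */
} TreeT;

#define EPSILON   (-1)           /* edge label standing for epsilon */
#define MAX_EDGES 256            /* fixed cap keeps the sketch short */

typedef struct { int from, to, label; } EdgeRecT;

EdgeRecT edgeList[MAX_EDGES];
int numEdges = 0;
int numStates = 0;               /* global counter: every state gets a unique name */

int NewState(void) { return numStates++; }

void AddEdge(int from, int to, int label)
{
    assert(numEdges < MAX_EDGES);
    edgeList[numEdges].from = from;
    edgeList[numEdges].to = to;
    edgeList[numEdges].label = label;
    numEdges++;
}

/* Builds an epsilon-NFA fragment for t between states s and f. */
void ConvertTree(const TreeT *t, int s, int f)
{
    switch (t->type) {
    case T_SYM:                               /* base case: one labelled edge */
        AddEdge(s, f, (unsigned char)t->sym);
        break;
    case T_UNION: {
        int q1 = NewState(), p1 = NewState();
        int q2 = NewState(), p2 = NewState();
        AddEdge(s, q1, EPSILON);
        AddEdge(s, q2, EPSILON);
        AddEdge(p1, f, EPSILON);
        AddEdge(p2, f, EPSILON);
        ConvertTree(t->term, q1, p1);
        ConvertTree(t->next, q2, p2);
        break;
    }
    case T_CONCAT: {
        int q = NewState(), p = NewState();
        AddEdge(q, p, EPSILON);
        ConvertTree(t->term, s, q);
        ConvertTree(t->next, p, f);
        break;
    }
    case T_STAR: {
        int q = NewState(), p = NewState();
        AddEdge(s, q, EPSILON);               /* enter the loop body */
        AddEdge(p, f, EPSILON);               /* leave the loop body */
        AddEdge(s, f, EPSILON);               /* zero repetitions */
        AddEdge(p, q, EPSILON);               /* repeat */
        ConvertTree(t->term, q, p);
        break;
    }
    }
}
```

To use it, create the start and final states with NewState() first and then call ConvertTree(root, s, f). For the expression a|b this yields 6 states and 6 edges (4 epsilon edges plus one edge per symbol).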

    Suggestions: Keep track of the edges into nodes as well as the edges out - it makes epsilon-removal easier (you can tell when a node has no more edges into it). Remember that each state must be uniquely named - one solution: a global increment variable. You may also find it useful to save all states in an array.

    STEP 3: Remove epsilon-Transitions

    Proposed Method: Depth First Graph Traversal by Recursive Function

    This function assumes that you have a start state, s, and that you've passed it in to begin the recursion.

    RecursiveEpsilonRemoval(NodeT *s)
    {
        If this state is marked return, otherwise mark it and:
        For each edge out of s,
            if the label is epsilon, let q = destination
                if q has not yet been visited in this function, and q != s, then
                    add edges in q not in s to edges of s
                    (add to the end of the list of edges, so that these
                    edges will be checked in this function as well).
                    If q is accepting, let s be accepting as well.
                Remove the edge.
        For each edge out of s, let q = destination
            RecursiveEpsilonRemoval(q);
    }

    Suggestions: Remember to remove unreachable states after you've finished, otherwise you'll waste memory. Save each node in some kind of ordered array -- this will simplify things when you convert to a transition table anyway.

    Tips on how to free stuff (for this function, and in general): The easiest way is not to worry about freeing states on the fly, but rather to save a table of states and then free them all when you're done (you can free all non-reachable states after STEP 3, and then free all the states after STEP 4, along with all remaining edges). In STEP 3, you'll want to free epsilon-edges as you remove them.
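For comparison, here is a small sketch of epsilon removal on a different representation from the edge lists described above: a tiny adjacency-matrix NFA, where each state simply absorbs the transitions and accepting status of everything in its epsilon-closure. It is not the edge-splicing method from the pseudocode, just another way to get the same result. SmallNfaT, N_MAX, N_SYM, and both function names are hypothetical and sized for illustration only.

```c
#include <assert.h>
#include <string.h>

#define N_MAX 16   /* hypothetical cap on the number of states */
#define N_SYM 4    /* hypothetical tiny alphabet: symbols 0..3 */

typedef struct {
    int n;                                     /* number of states in use */
    unsigned char eps[N_MAX][N_MAX];           /* eps[i][j] = 1: epsilon edge i -> j */
    unsigned char delta[N_MAX][N_SYM][N_MAX];  /* delta[i][a][j] = 1: edge i -> j on a */
    unsigned char accepting[N_MAX];
} SmallNfaT;

/* Depth-first search: marks every state reachable from s using
 * epsilon edges only (s itself included). */
void EpsClosure(const SmallNfaT *m, int s, unsigned char *seen)
{
    if (seen[s]) return;
    seen[s] = 1;
    for (int q = 0; q < m->n; q++)
        if (m->eps[s][q]) EpsClosure(m, q, seen);
}

/* Gives each state the symbol edges and accepting status of its
 * epsilon-closure, then deletes all epsilon edges. */
void RemoveEpsilons(SmallNfaT *m)
{
    for (int s = 0; s < m->n; s++) {
        unsigned char seen[N_MAX] = { 0 };
        EpsClosure(m, s, seen);
        for (int q = 0; q < m->n; q++) {
            if (!seen[q] || q == s) continue;
            if (m->accepting[q]) m->accepting[s] = 1;
            for (int a = 0; a < N_SYM; a++)
                for (int r = 0; r < m->n; r++)
                    if (m->delta[q][a][r]) m->delta[s][a][r] = 1;
        }
    }
    memset(m->eps, 0, sizeof m->eps);
}
```

Unreachable states still have to be removed afterwards, exactly as the suggestions above say; this sketch does not do that.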

    STEP 4: Convert NFA to transition table

    Proposed Method: Depth First Graph Traversal by Recursive Function

    This function assumes that you've created at the root level some structure for representing an NFA transition table, and that the sets of next states have been initialized as empty sets. I'm going to access this structure as one would access a two-dimensional array, though you can have any suitable data structure for this.


    ConvertToNFATable(NFATableT *t, NodeT *s)
    {
        if s has not been visited {
            Mark s as having been visited.
            For each e = edge out of s, let q = destination
                For each v = value of the label of e
                    Add q.name to t->table[s.name][v]
            For each edge out of s, let q = destination,
                ConvertToNFATable(t, q);
        }
    }

    STEP 5: Convert NFA to DFA Transition Table

    Proposed Method: Lazy Evaluation for Subset Construction on NFA State Table.

    This function assumes that you've created at the root level some structure for representing a DFA transition table. In this case I'll be using a dynamic 2-dimensional array of values representing the states. This solution also implies some way of labelling each state in the DFA with the NFA nodes that it represents, though this representation can be discarded after construction. Again, this is just pseudocode, so you can approach this problem however you'd like.

    ConvertToDFATable(DFATableT *d, NFATableT *n)
    {
        add row to 2-D array, representing the set containing only the NFA start state, {0}
        for each q = element in the array of DFA states {
            For each a = input value {
                Let S = empty set
                for each r = NFA state value in representation of q {
                    Add next state set from NFAtable[r][a] to S
                }
                If S represents a new DFA state (check array of DFA state
                representations), create a new row in the DFA table, and add
                the state representation to list of DFA state representations.
                Also, if S contains a final state, remember to make the DFA
                state representing S a final state.
                Add transition to state represented by S to DFAtable[q][a]
            }
        }
    }

    At some point in this construction, if the size of the DFA table exceeds some maximum value, you can quit, thus capping the size of the DFA state table you'll produce. Remember, the general size of the DFA table (unless you cleverly reduce it) is going to be n * inputs * sizeof(element) bytes, where n is the number of states. For a table of 256 inputs and 4 byte integers representing states, a 1024 state table requires a megabyte of storage.
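A compact way to experiment with the lazy subset construction is to represent each set of NFA states as a bitmask. The sketch below assumes at most 32 NFA states, an epsilon-free NFA table, and a tiny alphabet; NSYM, MAX_DFA, and the function name SubsetConstruct are hypothetical, and the linear search for an existing state set is chosen for brevity, not speed.

```c
#include <assert.h>
#include <stdint.h>

#define NSYM    2    /* hypothetical alphabet size for this sketch */
#define MAX_DFA 64   /* quit once the DFA would need more states than this */

/* nfa[r][a] is a bitmask of the NFA states reachable from state r on
 * input a (bit k set means state k). nfaFinal is a bitmask of accepting
 * NFA states. Fills dfa[q][a] and dfaFinal[q]; returns the number of DFA
 * states, or -1 if the MAX_DFA cap was hit. */
int SubsetConstruct(uint32_t nfa[][NSYM], uint32_t nfaFinal, int nfaStart,
                    int dfa[MAX_DFA][NSYM], int dfaFinal[MAX_DFA])
{
    uint32_t rep[MAX_DFA];   /* rep[q] = set of NFA states that DFA state q stands for */
    int numDfa = 0;

    rep[0] = (uint32_t)1 << nfaStart;
    dfaFinal[0] = (rep[0] & nfaFinal) != 0;
    numDfa = 1;

    /* numDfa grows inside the loop, so new states get processed too. */
    for (int q = 0; q < numDfa; q++) {
        for (int a = 0; a < NSYM; a++) {
            uint32_t S = 0;
            for (int r = 0; r < 32; r++)
                if (rep[q] & ((uint32_t)1 << r)) S |= nfa[r][a];

            int target = -1;
            for (int k = 0; k < numDfa; k++)
                if (rep[k] == S) { target = k; break; }

            if (target < 0) {                   /* S is a new DFA state */
                if (numDfa == MAX_DFA) return -1;
                target = numDfa++;
                rep[target] = S;
                dfaFinal[target] = (S & nfaFinal) != 0;
            }
            dfa[q][a] = target;
        }
    }
    return numDfa;
}
```

The empty set becomes an ordinary (dead) DFA state here. For an NFA with more than 32 states the single uint32_t would have to be replaced by an array of words.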

    SECTION 2: Reducing Memory Requirements for DFA Table

    You want to try to allow more states with less memory, without losing any expressivity in your DFA. There are several different approaches you might want to take:

    Element Size: You can immediately reduce memory use by 50% by storing state values as shorts instead of ints. This caps the number of states at 2^16, since that is the number of distinct values a two byte unsigned short can represent. It might not be a problem, because that still allows 65,536 states. You could save 75% by storing states as chars, but then you effectively max out at 2^8 states, which is a small enough table that you wouldn't have to worry about memory anyway, so the benefits are eliminated.
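The size arithmetic is easy to check in code. The two helpers below are a hypothetical sketch (the names DfaTableBytes and MaxStatesForBudget are not part of the project): the first computes n * inputs * sizeof(element), the second turns a memory budget into a state cap.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for a dense table of numStates rows and numInputs columns. */
size_t DfaTableBytes(size_t numStates, size_t numInputs, size_t elementSize)
{
    return numStates * numInputs * elementSize;
}

/* Largest number of states whose dense table fits in budgetBytes. */
size_t MaxStatesForBudget(size_t budgetBytes, size_t numInputs, size_t elementSize)
{
    return budgetBytes / (numInputs * elementSize);
}
```

With 256 inputs, 1024 states of 4 byte elements take 1,048,576 bytes, and the same megabyte holds 2048 states once the elements are 2 bytes wide.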

    Reductions in Input Size: You can try to determine the "important characters" from your regular expression. You can save a separate array of 256 chars, indexed by character. Set the "important characters" indices to unique values [0..255], representing the indices of that input in the DFA table.

    Why? We can thus reduce the width of the DFA table to the size of the "important character" set plus one, representing any non-distinguished characters (or any character). e.g. Say we have 9 distinguished characters (our regular expression is "a.b.*c..def.hi.*(j)*.*", for instance). Our DFA table thus requires only n*10*sizeof(element) bytes as opposed to n*256*sizeof(element) bytes, because all other characters can be represented by the same transitions.
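The input index array might be built along these lines. This is a sketch under assumptions: the caller has already collected the distinguished characters into a string, column 0 is reserved for every other character, and the name BuildInputMap is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Fills map[c] with a DFA table column for every byte value c.
 * Column 0 is shared by all non-distinguished characters; each distinct
 * character in `important` gets its own column 1..k.
 * Returns the table width, k + 1. */
int BuildInputMap(const char *important, unsigned char map[256])
{
    int width = 1;
    memset(map, 0, 256);
    for (const char *p = important; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (map[c] == 0)                 /* skip characters already assigned */
            map[c] = (unsigned char)width++;
    }
    return width;
}
```

A lookup then reads table[state][map[c]] instead of table[state][c]. For the nine distinguished characters in the example above, the returned width is 10.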

    How to determine or expand on the idea of "important characters": Characters which appear only parallel to each other in single character union expressions can be given the same index value, since they will always transition to the same state. In general, *any* two input symbols whose transitions match for all states can be merged and represented by the same index in the DFA table (they would be given the same value in the input index array that I mentioned above). e.g. in "a|b|c", a, b, and c can be represented by the same input index.

    Alternatively, we can save character sets *as* character sets rather than expanding them, and when an input doesn't match any distinguished character, you can check character sets to see if values are represented therein. This only makes sense if you merge equivalent character sets, and might be difficult to implement.

    A combination of these approaches may be the best solution, or you may find a construction that provides a better solution.

    SECTION 3: Other Useful Suggestions and Concepts:

    * Graph representations for FA's are directed graphs. An edge goes from one node to another, but does not imply any sort of edge going the other direction. Thus edge (q,t) is very different from edge (t,q). If you use some kind of edge list in your construction, keep this in mind.

    * Once you've constructed a DFA, try to minimize it. This doesn't speed up the DFA, but it reduces the size, allowing you to store more DFA's in memory, if you decide multiple DFAs are important to your search strategy.

    * You might try to break up large expressions into smaller ones, and use a different search strategy for a combination of these machines. For instance, you could break r|s, where r and s represent regular expressions, into two expressions, convert both to DFAs, and minimize, and then run them in parallel over your input. This should result in a more efficient implementation than running an NFA on the larger expression, and should result in a less memory intensive implementation than the combined DFA, especially if you can then take advantage of a reduction in input symbols.
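Running the two machines for r|s side by side could look like the sketch below. The FlatDfaT layout (one row-major table of 256 columns per state) and the name MatchEither are assumptions made for the example, not something the project prescribes.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int start;
    const int *next;               /* row-major: next[state * 256 + byte] */
    const unsigned char *isFinal;  /* isFinal[state] is nonzero for accepting states */
} FlatDfaT;

/* Steps both DFAs over the same input in a single pass and reports
 * whether either one ends in an accepting state, i.e. whether the
 * whole input matches r|s. */
int MatchEither(const FlatDfaT *r, const FlatDfaT *s,
                const unsigned char *input, size_t len)
{
    int qr = r->start, qs = s->start;
    for (size_t i = 0; i < len; i++) {
        qr = r->next[qr * 256 + input[i]];
        qs = s->next[qs * 256 + input[i]];
    }
    return r->isFinal[qr] || s->isFinal[qs];
}
```

The cost per input byte is two table lookups, and the memory cost is the sum of the two tables rather than the size of the combined DFA.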

    * For very large NFAs or DFAs in which many states have transitions on relatively few symbols, but you can't reduce the input alphabet, you might try using a "sparse matrix". Memory savings could be quite substantial. There are many ways to implement these. You could implement it using a character set to state which symbols are defined for that state, and then storing in a much smaller array, indexed in some way based on the defined inputs for that state, the state transitions only for those symbols on which transitions are defined.
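One possible reading of that sparse layout, as a sketch: each DFA row keeps a 256-bit set of defined symbols plus a packed array of targets, and the position in the packed array is the number of defined symbols smaller than the input byte. SparseRowT, PopCount32, and SparseNext are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t defined[8];   /* 256-bit set: bit c is set if byte c has a transition */
    const int *targets;    /* one next state per set bit, in increasing byte order */
} SparseRowT;

/* Counts the set bits in x. */
int PopCount32(uint32_t x)
{
    int n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

/* Returns the next state for byte c, or -1 if no transition is defined. */
int SparseNext(const SparseRowT *row, unsigned char c)
{
    int word = c >> 5, bit = c & 31;
    uint32_t mask = (uint32_t)1 << bit;
    if (!(row->defined[word] & mask)) return -1;

    int index = 0;                               /* set bits below position c */
    for (int w = 0; w < word; w++)
        index += PopCount32(row->defined[w]);
    index += PopCount32(row->defined[word] & (mask - 1));
    return row->targets[index];
}
```

Each row costs 32 bytes for the set plus one entry per defined symbol, so it only pays off when most of the 256 columns would otherwise be empty.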

    * If, on the other hand, you have lots of transitions on lots of inputs for each state of your NFA, you might consider representing the transitions, the set of current states, and the set of final states by bitfields and using bitwise operators on them to determine set inclusion, etc. (this is also a fast way to do an NFA search). The number of bits required is one per state, so this saves memory only if your next state representation arrays are larger than these fields on average. If you have 1024 states, each bitfield would have 1024 bits, or 128 bytes. This is pretty big, considering that you'll have a table of 1024*256*sizeof(next state structure) bytes. If you store next states as a pointer (4 bytes) to an array of shorts, the average number of next states per table entry would have to be about 62 or more to see any beneficial memory savings (4 + 2k >= 128 bytes for k next states). The search time should be better, however, and it's unlikely (though entirely possible) you'll have NFA's of 1024 states. (Note that the transition table for an NFA of this size would be about 32MB.)
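A bitfield NFA search can be sketched as follows, assuming no more than 64 states so that one uint64_t holds a whole state set, and assuming epsilon transitions have already been removed. The function name and the table layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* next[q][c] is the set (as a bitmask) of states reachable from state q
 * on byte c. start and final are bitmasks of start and accepting states.
 * Returns 1 if the NFA accepts the whole input, 0 otherwise. */
int BitNfaAccepts(uint64_t next[][256], int numStates,
                  uint64_t start, uint64_t final,
                  const unsigned char *input, size_t len)
{
    assert(numStates <= 64);
    uint64_t current = start;
    for (size_t i = 0; i < len; i++) {
        uint64_t following = 0;
        for (int q = 0; q < numStates; q++)
            if (current & ((uint64_t)1 << q))
                following |= next[q][input[i]];   /* set union is a bitwise OR */
        current = following;
    }
    return (current & final) != 0;                /* set intersection is a bitwise AND */
}
```

For 1024 states the single word becomes an array of 16 words of 64 bits, and the OR and AND become short loops over that array.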

    * Be creative! There are 1000's of ways to make a really kick-ass DFA and/or NFA, and many more implementations that make it easier to do certain kinds of searches. Think about Parts 2 and 3 of the project when designing your tables. Think speed. Think compactness. Think flexibility. Good luck!

    SECTION 4: How to Output Your NFA for the End of Part 1:

    Note that you may have to adjust this to fit your program structure. I am demonstrating by accessing values as if they were a part of some structure, though you may have to access them differently.

    /* [...] marks text that is missing from the source */
    printf("NFA\n");
    printf("%s\n", regexpstring);
    printf("StartState=%d\n", nfa->start);

    printf("FinalStates={");
    for (i = 0; i < nfa->finalStateNum - 1; i++) {
        printf("%d,", nfa->finalStates[i]);
    }
    printf("%d}\n", nfa->finalStates[i]);
    for (i = 0; i < nfa->numStates; i++) {
        printf("Stateq%d:\n", nfa->states[i].name);
        for (j = 0; j < [...] nfa->nextstate[i][j][k]);
            printf("%d}\n", nfa->nextstate[i][j][k]);
        }
    }
    printf("EndNFA\n");

    SECTION 5: How to Output Your DFA for the End of Part 1:

    /* [...] marks text that is missing from the source */
    printf("DFA\n");
    printf("%s\n", regexpstring);
    printf("StartState=%d\n", dfa->start);
    printf("FinalStates={");
    for (i = 0; i < dfa->finalStateNum - 1; i++) {
        printf("%d,", dfa->finalStates[i]);
    }
    printf("%d}\n", dfa->finalStates[i]);
    for (i = 0; i < dfa->numStates; i++) {
        printf("Stateq%d:\n", dfa->states[i].name);
        for (j = 0; j < [...] dfa->nextstate[i][j]);
        }
    }
    printf("EndDFA\n");

    Note on SECTION 4 and SECTION 5: We would like to see this format output. If you choose to restrict your inputs, please do so here as well. If you choose to work with charsets, come up with some way of labelling them and outputting the charset itself (this function is provided in parser.h) after you've completed listing out the whole DFA or NFA output, paired with the label you gave it while printing out the machine (and if you can, try to match the ordering as well).

    We strongly suggest that you do this, so that it makes it easier for us to grade and test your project.

    SECTION 6: Good luck!

    Goodluck!

    ADDENDUM 1: Programming Hints

    A lot of people have been saying that they don't know how to get started, or what kinds of data structures to use. I'm going to provide a more concrete example of how you might actually build some of these things in C below. The following code is, again, an example, and moreover, it has never been tested for ease of use or compiled, even, so there may be syntax errors. I am providing it as sort of an outline of how *I* would go about implementing the methods that I outlined above. I recommend that you think about these problems *first*, before looking at these proposed programming approaches, because it's a good intellectual exercise and important for you to develop your own problem solving approaches.

    - Nate

    PART 1: Example Structures:

    Here are some example structures you might try, along the lines of the methods proposed above.

    typedef enum {
        SYM,    // SYM = symbol
        ANY,    // ANY = any character
        E,      // E = epsilon
        CSET    // CSET = character set
    } EdgeLabelT;

    struct node_t;
    struct edge_t;

    typedef struct node_t {
        int name;    // unique identifier for each state

        // The following are dynamic arrays
        struct edge_t *edges_out;
        struct edge_t **edges_in;    // saves you hassle in removing unreachable
                                     // states. Save as pointers, so you'll be able to identify the
                                     // edge structure itself.

        // You'll want to save the allocated size as well to allow dynamic
        // reallocation.
        int numout, numoutalloc;
        int numin, numinalloc;
    } NodeT;

    typedef struct edge_t {
        struct node_t *from;
        struct node_t *to;
        EdgeLabelT type;
        char value;           // for when type = SYM;
        CharSetT *charset;    // for when type = CSET;
    } EdgeT;

    NodeT *stategraph;
    int numstates;
    // You can come up with a way to not have to dynamically reallocate the
    // stategraph. What do we know about the number of states, given this
    // construction, based on the regular expression input string?

    // Table Constructions:
    // For both of these tables, I am assuming that your states are going to
    // be numbered 0 through n-1, where n is the number of states. You'll
    // want to find a way to rename your NFA states so this is true, or to
    // come up with an alternative way of indexing the stategraph. This
    // construction also assumes that you assume a set number of inputs (in
    // this case 256).

    // NFA

    // You'll need a dynamically-reallocatable array of next states:
    typedef struct {
        int *nextStates;
        int nextStateNum;
        int nextStateAlloc;
    } TransitionT;

    typedef struct {
        int start;

        // You'll need a dynamically-reallocatable array of final states:
        int *final;
        int numFinalStates;
        int numFinalStatesAlloc;

        TransitionT *(nextstates[256]);    // coming in, you'll know the number of
                                           // states.
        int numstates;
    } NFATableT;

    // DFA:

    typedef struct {
        int start;

        int *final;
        int numFinalStates;
        int numFinalStatesAlloc;

        int *(nextStates[256]);
        int numstates;
        int numstatesalloc;    // You need to dynamically reallocate. You won't
                               // know how many DFA states you need.
    } DFATableT;

    PART 2: How to dynamically-reallocate an array:

    First, start by allocating some minimum number of expected states. If you make this number large enough, you may never need to dynamically reallocate anything:

    int arrNum = 0;
    int arrNumAlloc = 64;
    ArrayTypeT *arr = (ArrayTypeT *)malloc(sizeof(ArrayTypeT) * arrNumAlloc);
    // in this case, we allocate 64 elements to begin with.

    if (!arr) Error("No memory!");
    // remember that malloc doesn't check for you if you don't have enough
    // memory. You can handle this however you'd like. The best
    // response is probably to bail from the program and try and
    // figure out why this failed, and how you can predict failure and
    // prevent it in the future, for instance by capping the number of
    // states that your DFA or NFA can have (the SUN machines have a
    // lot of memory, but remember that these graphs can be *really*
    // big).

    As you add elements to the array, increment the value of arrNum:

    arr[arrNum] = element;
    arrNum++;

    Whenever you increment arrNum, check to see if the new value of arrNum is equal to the number of allocated elements (it should never be greater than this number -- you should probably produce an error if it is, because then you know you've written your code incorrectly). If it is equal, you need to reallocate the array to a larger size. A safe bet is generally to double the size of the array.

    assert(arrNum [...]

    [...] again have to check to see that arr is not equal to NULL, as this is what realloc, like malloc, returns when there is not enough memory to fulfill your request. You can free a block that you have realloc'ed in the same way that you free a block that you've malloc'ed, namely, with the free function.

    Note that when you are reducing the size of the array (say, removing epsilon transitions), it does not pay to realloc the array smaller, because realloc often just ignores the request and returns the same pointer. It shouldn't matter, because the only thing that is important in the end is the memory you allocate to your NFA and DFA table. For the NFA table, you will know the number of states coming in (it will be the same as the number of states in your graph). For the DFA table, you should pick as the maximum number of states your DFA can have a value that is a power of 2 (if you approach reallocation in this way). Alternatively, when you know the true size of the array, you can save memory by copying it into an array of exactly the size that you needed instead of leaving it in an array that may be wasting up to 50% of its allocated memory.
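Putting the allocation, the capacity check, and the doubling together, a growable array of ints might look like the sketch below. IntArrayT and its functions are hypothetical names, and this version grows the array just before writing a new element rather than just after incrementing the count, which is a small variation on the description above. Assigning the result of realloc to a temporary keeps the old block usable if the request fails.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *data;
    int num;        /* elements in use (arrNum in the text) */
    int numAlloc;   /* elements allocated (arrNumAlloc in the text) */
} IntArrayT;

/* Returns 0 on success, -1 if there is not enough memory. */
int IntArrayInit(IntArrayT *a, int initialAlloc)
{
    assert(initialAlloc > 0);
    a->data = malloc(sizeof(int) * (size_t)initialAlloc);
    a->num = 0;
    a->numAlloc = a->data ? initialAlloc : 0;
    return a->data ? 0 : -1;
}

/* Appends element, doubling the allocation when the array is full.
 * Returns 0 on success, -1 if realloc fails (the array is left unchanged). */
int IntArrayAppend(IntArrayT *a, int element)
{
    assert(a->num <= a->numAlloc);       /* greater would mean a bug elsewhere */
    if (a->num == a->numAlloc) {
        int newAlloc = a->numAlloc * 2;
        int *bigger = realloc(a->data, sizeof(int) * (size_t)newAlloc);
        if (bigger == NULL) return -1;   /* the old block is still valid */
        a->data = bigger;
        a->numAlloc = newAlloc;
    }
    a->data[a->num++] = element;
    return 0;
}
```

Starting from 2 allocated elements, five appends leave the array with 8 allocated elements; free(a.data) releases the block whether or not it was ever reallocated.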

    There may be a method in UNIX to find out exactly how much RAM is available for allocation (I'm not sure, and if there is, it would be platform dependent, I believe). I don't *think* there's an ANSI function to do this, but I could be wrong. If there is such a method, the more ambitious of you can *dynamically* decide the maximum size of your DFA based on the amount of memory available!

    Goodluck!

    ADDENDUM 2: Another Note on Memory:

    This came up in section tonight. On the Unix machines, you have a vast amount of memory at your disposal. The RAM is very large. You can represent very large DFAs in it. The problem is that if you have a very large DFA, it is likely that it will get paged out of main memory by the virtual memory system, and put onto the disk. This will probably start happening long before malloc complains that you don't have enough memory and returns NULL, which is why checking for malloc returning NULL as a test for your DFA being too large is a bad test. Why is it bad if the virtual memory system pages out part of your DFA structure? Because DFAs can have very poor locality of reference. What does that mean? You're not guaranteed that when you read from one location (say the input on a certain state), that the next state you have to read is going to be anywhere near the original state in memory. This is especially true if you don't have your DFA in a contiguous block. So it's likely that when you're going from state to state, you'll move to a different page in memory that may have been paged out. This is very bad because reading a paged-out page takes about 1000 times as long (that's a guess, but it seems to be the general number that they were throwing out in EE182), so it's going to reduce the benefits of having converted to a DFA in the first place (and after having gone through so much trouble, we wouldn't want that!).

    If anyone wants to try a solution that works around this, one possible approach would be to attempt to put your DFA in contiguous memory (once you've built the DFA, allocate a giant block of memory of the size of your DFA), and copy the rows over to that block, attempting to put rows that reference each other closer together, to increase locality of reference. This would result in much extra credit. Moreover, finding an optimal solution to locality of reference in a DFA state graph, I am willing to bet, is NP complete, so you'd want to find a heuristic for it. I'm not sure on that though. Extra credit also for anyone who can prove that this is or is not NP complete. :) If you try this approach, try the other approach as well, and compare the running times of your solution, if you can.
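As a starting point for the contiguous-memory idea, the sketch below copies separately allocated rows into one block and renumbers the states in breadth-first order from the start state, so that a state's row tends to sit near the rows it transitions to. Breadth-first order is only one simple heuristic chosen for illustration, with no claim that it is a good one; PackDfa and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

#define NINPUT 256

/* rows[q] points at the NINPUT next-state values of old state q.
 * Returns one malloc'ed block of (*numPacked) * NINPUT ints in which
 * states are renumbered in breadth-first order from `start` (the start
 * state becomes 0). newName[q] receives the new number of old state q,
 * or -1 if q is unreachable and was dropped. Returns NULL if there is
 * not enough memory. */
int *PackDfa(int **rows, int numStates, int start, int *newName, int *numPacked)
{
    assert(numStates > 0 && start >= 0 && start < numStates);
    int *order = malloc(sizeof(int) * (size_t)numStates);   /* BFS queue of old names */
    int *block = malloc(sizeof(int) * (size_t)numStates * NINPUT);
    if (order == NULL || block == NULL) {
        free(order);
        free(block);
        return NULL;
    }

    for (int q = 0; q < numStates; q++) newName[q] = -1;
    int count = 0;
    newName[start] = count;
    order[count++] = start;

    for (int k = 0; k < count; k++) {            /* count grows as states are found */
        for (int c = 0; c < NINPUT; c++) {
            int t = rows[order[k]][c];
            assert(t >= 0 && t < numStates);
            if (newName[t] < 0) {
                newName[t] = count;
                order[count++] = t;
            }
        }
    }

    for (int k = 0; k < count; k++)              /* copy rows, translating state names */
        for (int c = 0; c < NINPUT; c++)
            block[k * NINPUT + c] = newName[rows[order[k]][c]];

    free(order);
    *numPacked = count;
    return block;
}
```

The packed table is indexed as block[state * NINPUT + input]. Timing it against the original row-per-allocation layout is the comparison the text asks for.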

    Goodluck!