Project Hints
Uploaded by md-mahsin-ul-islam
8/2/2019 Project Hints
CS154 - Project Part I Cheat Sheet
This document contains a few helpful hints on how to approach Part I of the project. The first section includes a Guide for the Project, outlining the various steps in the conversion from a regular expression to a finite automaton, including pseudocode and hints and tips.
Note: You do not need to follow the outline provided here. It is an example only. It is not any more "correct" than any other method, nor is it necessarily the most efficient or easiest way to do it, but it may help you in getting an idea of what approaches you might take.
SECTION 1: Outline of Construction
STEP 1: Convert Regular Expression Into a Parse Tree - this function is supplied for you in parser.h.
STEP 2: Convert Parse Tree Into an epsilon-NFA
Proposed Method: In-Order Tree Traversal by a Recursive Function:
This function assumes you have some way to represent a graph, using node structures and edge structures. It also assumes you've created a node representing a start state, s, and a node representing a final state, f, which you pass to the function along with the regular expression parse tree to begin the recursion.
The design is up to you, but keep in mind the kind of information you'll need:
    void RecursiveConvert(RegularExpressionT *t, NodeT *s, NodeT *f)
    {
        switch (t->type) {
        Terminal Cases: (Base Case)
            Create edge with appropriate label.
            link edge from s to f.

        Union:
            Create new state nodes, q1, q2.
            Create new state nodes, p1, p2.
            Create edges with epsilon label, e1, e2.
            Create edges with epsilon label, f1, f2.
            link e1 from s to q1.
            link e2 from s to q2.
            link f1 from p1 to f.
            link f2 from p2 to f.
            RecursiveConvert(t->term, q1, p1);
            RecursiveConvert(t->next, q2, p2);

        Concat:
            Create a new state node, q.
            Create a new state node, p.
            Create edge with epsilon label, e.
            link e from q to p.
            RecursiveConvert(t->term, s, q);
            RecursiveConvert(t->next, p, f);

        Kleene Closure:
            Create a new state node, q.
            Create a new state node, p.
            create edge with epsilon label, e.
            create edge with epsilon label, h.
            link e from s to q.
            link h from p to f.
            create edge with epsilon label, d.
            create edge with epsilon label, g.
            link d from s to f.
            link g from p to q.
            RecursiveConvert(t->term, q, p);

        ...etc.: (cover the rest of the cases)
        }
    }
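The recursion above can be sketched in C roughly as follows. This is only an illustration: the `Re` type is a hypothetical stand-in for the parse tree from parser.h (whose real definition is not shown here), states are plain integers handed out by a global counter, and edges live in a fixed-size array rather than the node and edge structures you will probably want.

```c
#include <stdlib.h>

#define EPS (-1)          /* edge label meaning "epsilon" */
#define MAX_EDGES 256     /* fixed capacity, for the sketch only */

/* Hypothetical stand-in for the parse tree type supplied by parser.h. */
typedef enum { RE_SYM, RE_UNION, RE_CONCAT, RE_STAR } ReType;
typedef struct Re {
    ReType type;
    char sym;                 /* used when type == RE_SYM */
    struct Re *term, *next;   /* operands; next is NULL for RE_STAR */
} Re;

typedef struct { int from, to, label; } Edge;

static Edge edges[MAX_EDGES];
static int num_edges = 0;
static int num_states = 0;    /* global counter: every state gets a unique name */

static int new_state(void) { return num_states++; }

static void link_edge(int from, int to, int label)
{
    if (num_edges >= MAX_EDGES) abort();
    edges[num_edges].from = from;
    edges[num_edges].to = to;
    edges[num_edges].label = label;
    num_edges++;
}

/* Build the piece of the epsilon-NFA for t between existing states s and f. */
static void recursive_convert(const Re *t, int s, int f)
{
    int q, p, q1, q2, p1, p2;
    switch (t->type) {
    case RE_SYM:                              /* base case: one labelled edge */
        link_edge(s, f, (unsigned char)t->sym);
        break;
    case RE_UNION:
        q1 = new_state(); p1 = new_state();
        q2 = new_state(); p2 = new_state();
        link_edge(s, q1, EPS); link_edge(s, q2, EPS);
        link_edge(p1, f, EPS); link_edge(p2, f, EPS);
        recursive_convert(t->term, q1, p1);
        recursive_convert(t->next, q2, p2);
        break;
    case RE_CONCAT:
        q = new_state(); p = new_state();
        link_edge(q, p, EPS);
        recursive_convert(t->term, s, q);
        recursive_convert(t->next, p, f);
        break;
    case RE_STAR:
        q = new_state(); p = new_state();
        link_edge(s, q, EPS); link_edge(p, f, EPS);
        link_edge(s, f, EPS);                 /* zero repetitions */
        link_edge(p, q, EPS);                 /* loop back for another repetition */
        recursive_convert(t->term, q, p);
        break;
    }
}
```

To use it, create the start and final states with `new_state()` first and pass them in along with the root of the tree, as the text describes.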
Suggestions: Keep track of the edges into nodes as well as the edges out - it makes epsilon-removal easier (you can tell when a node has no more edges into it). Remember that each state must be uniquely named - one solution: a global increment variable. You may also find it useful to save all states in an array.
STEP 3: Remove epsilon-Transitions
Proposed Method: Depth First Graph Traversal by Recursive Function
This function assumes that you have a start state, s, and that you've passed it in to begin the recursion.
    RecursiveEpsilonRemoval(NodeT *s)
    {
        If this state is marked, return; otherwise mark it and:
        For each edge out of s,
            if the label is epsilon, let q = destination
                if q has not yet been visited in this function, and q != s, then
                    add edges in q not in s to edges of s (add to the end of
                    the list of edges, so that these edges will be checked in
                    this function as well).
                    If q is accepting, let s be accepting as well.
                Remove the edge.
        For each edge out of s, let q = destination
            RecursiveEpsilonRemoval(q);
    }
Suggestions: Remember to remove unreachable states after you've finished, otherwise you'll waste memory. Save each node in some kind of ordered array -- this will simplify things when you convert to a transition table anyway.
Tips on how to free stuff (for this function, and in general): The easiest way is not to worry about freeing states on the fly, but rather to save a table of states and then free them all when you're done (you can free all non-reachable states after STEP 3, and then free all the states after STEP 4, along with all remaining edges). In STEP 3, you'll want to free epsilon-edges as you remove them.
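For comparison, here is a small C sketch of epsilon removal on a different representation from the edge lists above: each set of states is a 32-bit mask, so it only handles up to 32 states. It is a closure-based variant, not the edge-splicing method of the pseudocode: it computes the epsilon closure of every state by depth-first search, copies in the symbol transitions of everything in the closure, and then drops the epsilon edges. The `SmallNfa` layout and all names are assumptions made for the example.

```c
#include <stdint.h>

#define MAX_STATES 32      /* one bit per state in a uint32_t */
#define NUM_SYMS   256

typedef struct {
    int num_states;
    uint32_t eps[MAX_STATES];              /* bit q set: epsilon edge s -> q */
    uint32_t step[MAX_STATES][NUM_SYMS];   /* bit q set: edge s -> q on that symbol */
    uint32_t accepting;                    /* bit s set: s is accepting */
} SmallNfa;

/* Depth-first search over epsilon edges; returns seen plus everything reachable from s. */
static uint32_t eps_closure(const SmallNfa *n, int s, uint32_t seen)
{
    int q;
    seen |= (uint32_t)1 << s;
    for (q = 0; q < n->num_states; q++)
        if (((n->eps[s] >> q) & 1u) && !((seen >> q) & 1u))
            seen = eps_closure(n, q, seen);
    return seen;
}

static void remove_epsilons(SmallNfa *n)
{
    uint32_t closure[MAX_STATES];
    int s, q, a;

    /* Compute all closures first, while the epsilon edges still exist. */
    for (s = 0; s < n->num_states; s++)
        closure[s] = eps_closure(n, s, 0);

    for (s = 0; s < n->num_states; s++) {
        for (q = 0; q < n->num_states; q++) {
            if (q == s || !((closure[s] >> q) & 1u))
                continue;
            /* s inherits every symbol transition of q ... */
            for (a = 0; a < NUM_SYMS; a++)
                n->step[s][a] |= n->step[q][a];
            /* ... and becomes accepting if q is. */
            if ((n->accepting >> q) & 1u)
                n->accepting |= (uint32_t)1 << s;
        }
    }
    for (s = 0; s < n->num_states; s++)
        n->eps[s] = 0;
}
```

States that were only reachable through epsilon edges may now be unreachable, so the suggestion above about removing unreachable states afterwards still applies.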
STEP 4: Convert NFA to transition table
Proposed Method: Depth First Graph Traversal by Recursive Function
This function assumes that you've created at the root level some structure for representing an NFA transition table, and that the sets of next states have been initialized to being considered empty sets. I'm going to access this structure as one would access a two-dimensional array, though you can have any suitable data structure for this.
    ConvertToNFATable(NFATableT *t, NodeT *s)
    {
        if s has not been visited {
            Mark s as having been visited.
            For each e = edge out of s, let q = destination
                For each v = value of the label of e
                    Add q.name to t->table[s][v]
            For each edge out of s, let q = destination,
                ConvertToNFATable(t, q);
        }
    }
STEP 5: Convert NFA to DFA Transition Table
Proposed Method: Lazy Evaluation for Subset Construction on NFA State Table.
This function assumes that you've created at the root level some structure for representing a DFA transition table. In this case I'll be using a dynamic 2-dimensional array of values representing the states. This solution also implies some way of labelling each state in the DFA with the NFA nodes that it represents, though this representation can be discarded after construction. Again, this is just pseudocode, so you can approach this problem however you'd like.
    ConvertToDFATable(DFATableT *d, NFATableT *n)
    {
        add row to 2-D array representing NFA state set {0}
        for each q = element in the array of DFA states {
            For each a = input value {
                Let S = empty set
                for each m = NFA state value in representation of q {
                    Add the next state set from NFA table[m][a] to S
                }
                If S represents a new DFA state (check array of DFA state
                representations), create a new row in the DFA table, and add
                the state representation to list of DFA state representations.
                Also, if S contains a final state, remember to make the DFA
                state representing S a final state.
                Add transition to state represented by S to DFA table[q][a]
            }
        }
    }
At some point in this construction, if the size of the DFA table exceeds some maximum value, you can quit, thus capping the size of the DFA state table you'll produce. Remember, the general size of the DFA table (unless you cleverly reduce it) is going to be n * inputs * sizeof(element) bytes, where n is the number of states. For a table of 256 inputs and 4 byte integers representing states, a 1024 state table requires a megabyte of storage (1024 * 256 * 4 = 1,048,576 bytes).
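A compact C sketch of the lazy subset construction with a state cap might look like this. It assumes an epsilon-free NFA with at most 32 states so that each subset fits in one `uint32_t`, and that NFA state 0 is the start state; `NfaTable`, `DfaTable`, and the function names are made up for the example. With `MAX_DFA` at 1024 and 4 byte ints, the `next` array alone is the megabyte computed above.

```c
#include <stdint.h>

#define NUM_SYMS 256
#define MAX_DFA  1024      /* cap on DFA states: give up once it is exceeded */

/* Epsilon-free NFA with at most 32 states (hypothetical layout). */
typedef struct {
    int num_states;
    uint32_t (*step)[NUM_SYMS];   /* step[m][a]: bitmask of next states */
    uint32_t accepting;           /* bitmask of accepting states */
} NfaTable;

typedef struct {
    int num_states;
    uint32_t subset[MAX_DFA];          /* NFA states each DFA state stands for */
    unsigned char is_final[MAX_DFA];
    int next[MAX_DFA][NUM_SYMS];       /* 1024 * 256 * sizeof(int) bytes */
} DfaTable;

/* Index of the DFA state for subset s, adding it if new; -1 if the table is full. */
static int find_or_add(DfaTable *d, const NfaTable *n, uint32_t s)
{
    int i;
    for (i = 0; i < d->num_states; i++)
        if (d->subset[i] == s)
            return i;
    if (d->num_states == MAX_DFA)
        return -1;
    d->subset[i] = s;
    d->is_final[i] = (s & n->accepting) != 0;
    return d->num_states++;
}

/* Returns 0 on success, -1 if the cap on DFA states was hit. */
static int convert_to_dfa(DfaTable *d, const NfaTable *n)
{
    int q, a, m, t;
    d->num_states = 0;
    find_or_add(d, n, 1u);                     /* start: the NFA state set {0} */
    for (q = 0; q < d->num_states; q++) {      /* the array grows as we go */
        for (a = 0; a < NUM_SYMS; a++) {
            uint32_t s = 0;
            for (m = 0; m < n->num_states; m++)
                if ((d->subset[q] >> m) & 1u)
                    s |= n->step[m][a];        /* union over the members of q */
            t = find_or_add(d, n, s);
            if (t < 0)
                return -1;
            d->next[q][a] = t;
        }
    }
    return 0;
}
```

Note that the empty subset shows up as an ordinary DFA state here; it acts as the dead state that every undefined transition leads to.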
SECTION 2: Reducing Memory Requirements for DFA Table
You want to try to allow more states with less memory, without losing any expressivity in your DFA. There are several different approaches you might want to take:
Element Size: You can immediately reduce memory use by 50% by storing state values as shorts instead of ints. This caps the number of states at 2^16, since that is the number of distinct values a two byte (unsigned) short can represent. It might not be a problem, because that still allows 65,536 states. You could save 75% by storing states as chars, but then you effectively max out at 2^8 states, which is a small enough table that you wouldn't have to worry about memory anyway, so the benefits are eliminated.
Reductions in Input Size: You can try to determine the "important characters" from your regular expression. You can save a separate array of 256 chars, indexed by character. Set the "important characters" indices to unique values [0..255], representing the indices of that input in the DFA table.
Why? We can thus reduce the width of the DFA table to the size of the "important character" set plus one, representing any non-distinguished characters (or any character). e.g. Say we have 9 distinguished characters (our regular expression is "a.b.*c..def.hi.*(j)*.*", for instance). Our DFA table thus requires only n * 10 * sizeof(element) bytes as opposed to n * 256 * sizeof(element) bytes, because all other characters can be represented by the same transitions.
How to determine or expand on the idea of "important characters": Characters which appear only parallel to each other in single character union expressions can be given the same index value, since they will always transition to the same state. In general, *any* two input symbols whose transitions match for all states can be merged and represented by the same index in the DFA table (they would be given the same value in the input index array that I mentioned above). e.g. in "a|b|c", a, b, and c can be represented by the same input index.
Alternatively, we can save character sets *as* character sets rather than expanding them, and when an input doesn't match any distinguished character, you can check character sets to see if values are represented therein. This only makes sense if you merge equivalent character sets, and might be difficult to implement.
A combination of these approaches may be the best solution, or you may find a construction that provides a better solution.
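The input index array described above could be built along these lines. This is a minimal sketch: it assumes you have already collected the distinguished characters into a string (how you extract them depends on the regular expression syntax in parser.h), and `build_input_map` is a made-up name. Column 0 is shared by every non-distinguished character.

```c
#include <string.h>

/* Fills map so that map[c] is the DFA table column for character c.
 * Returns the table width: one column per distinct distinguished
 * character, plus column 0 for everything else. */
static int build_input_map(const char *important, unsigned char map[256])
{
    const unsigned char *p;
    int width = 1;                      /* column 0: all other characters */

    memset(map, 0, 256);
    for (p = (const unsigned char *)important; *p; p++)
        if (map[*p] == 0)               /* skip characters already assigned */
            map[*p] = (unsigned char)width++;
    return width;
}
```

A lookup in a flat table then becomes `table[state * width + map[(unsigned char)c]]` instead of `table[state * 256 + c]`. For the 9 distinguished characters in the example above, the function returns a width of 10.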
SECTION 3: Other Useful Suggestions and Concepts:
* Graph representations for FAs are directed graphs. An edge goes from one node to another, but does not imply any sort of edge going the other direction. Thus edge (q,t) is very different from edge (t,q). If you use some kind of edge list in your construction, keep this in mind.
* Once you've constructed a DFA, try to minimize it. This doesn't speed up the DFA, but it reduces the size, allowing you to store more DFAs in memory, if you decide multiple DFAs are important to your search strategy.
* You might try to break up large expressions into smaller ones, and use a different search strategy for a combination of these machines. For instance, you could break r|s, where r and s represent regular expressions, into two expressions, convert both to DFAs, and minimize, and then run them in parallel over your input. This should result in a more efficient implementation than running an NFA on the larger expression, and should result in a less memory intensive implementation than the combined DFA, especially if you can then take advantage of a reduction in input symbols.
* For very large NFAs or DFAs in which many states have transitions on relatively few symbols, but you can't reduce the input alphabet, you might try using a "sparse matrix". Memory savings could be quite substantial. There are many ways to implement these. You could implement it using a character set to state which symbols are defined for that state, and then storing in a much smaller array, indexed in some way based on the defined inputs for that state, the state transitions only for those symbols on which transitions are defined.
* If, on the other hand, you have lots of transitions on lots of inputs for each state of your NFA, you might consider representing the transitions, the set of current states, and the set of final states by bit fields and using bitwise operators on them to determine set inclusion, etc. (this is also a fast way to do an NFA search). The number of bits required is one per state, so this saves memory only if your next state representation arrays are larger than these fields on average. If you have 1024 states, each bit field would have 1024 bits, or 128 bytes. This is pretty big, considering that you'll have a table of 1024 * 256 * sizeof(next state structure) bytes. If you store next states as a pointer (4 bytes) to an array of shorts (2 bytes each), the average number of next states would have to be about 62 or more to see any beneficial memory savings, since 4 + 2 * 62 = 128 bytes (62 edges out of each state per input, on average). The search time should be better, however, and it's unlikely (though entirely possible) you'll have NFAs of 1024 states. (Note that the bit field transition table for an NFA of this size would be about 32 MB: 1024 * 256 * 128 bytes.)
* Be creative! There are 1000's of ways to make a really kick-ass DFA and/or NFA, and many more implementations that make it easier to do certain kinds of searches. Think about Part 2 and 3 of the project when designing your tables. Think speed. Think compactness. Think flexibility. Good luck!
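As an illustration of the bit field suggestion in the list above, here is a minimal C sketch of state sets stored as arrays of 32-bit words, with one NFA step done by OR-ing rows together. The 1024 state size matches the example in the text (128 bytes per set); `StateSet`, `nfa_step`, and the idea of passing in the rows for the current input symbol are assumptions of the sketch.

```c
#include <stdint.h>
#include <string.h>

#define NSTATES 1024
#define WORDS   (NSTATES / 32)     /* 32 words * 4 bytes = 128 bytes per set */

typedef struct { uint32_t w[WORDS]; } StateSet;

static void set_add(StateSet *s, int q)
{
    s->w[q / 32] |= (uint32_t)1 << (q % 32);
}

static int set_has(const StateSet *s, int q)
{
    return (s->w[q / 32] >> (q % 32)) & 1u;
}

/* Non-zero if a and b share a state, e.g. current states vs. final states. */
static int set_intersects(const StateSet *a, const StateSet *b)
{
    int i;
    for (i = 0; i < WORDS; i++)
        if (a->w[i] & b->w[i])
            return 1;
    return 0;
}

/* One NFA step on a fixed input symbol: rows[q] is the next state set of
 * state q on that symbol; next becomes the union over all q in cur. */
static void nfa_step(const StateSet *cur, const StateSet *rows,
                     int num_states, StateSet *next)
{
    int q, i;
    memset(next, 0, sizeof *next);
    for (q = 0; q < num_states; q++)
        if (set_has(cur, q))
            for (i = 0; i < WORDS; i++)
                next->w[i] |= rows[q].w[i];
}
```

After the last input character, `set_intersects(&current, &finals)` tells you whether the NFA accepts.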
SECTION 4: How to Output Your NFA for the End of Part 1:
Note that you may have to adjust this to fit your program structure. I am demonstrating by accessing values as if they were a part of some structure, though you may have to access them differently.
    printf("NFA\n");
    printf("%s\n", regexpstring);
    printf("StartState=%d\n", nfa->start);
    printf("FinalStates={");
    for (i = 0; i < nfa->finalStateNum - 1; i++) {
        printf("%d,", nfa->finalStates[i]);
    }
    printf("%d}\n", nfa->finalStates[i]);
    for (i = 0; i < nfa->numStates; i++) {
        printf("Stateq%d:\n", nfa->states[i].name);
        for (j = 0; j [...] nextstate[i][j][k]);
            printf("%d}\n", nfa->nextstate[i][j][k]);
        }
    }
    printf("EndNFA\n");
SECTION 5: How to Output Your DFA for the End of Part 1:
    printf("DFA\n");
    printf("%s\n", regexpstring);
    printf("StartState=%d\n", dfa->start);
    printf("FinalStates={");
    for (i = 0; i < dfa->finalStateNum - 1; i++) {
        printf("%d,", dfa->finalStates[i]);
    }
    printf("%d}\n", dfa->finalStates[i]);
    for (i = 0; i < dfa->numStates; i++) {
        printf("Stateq%d:\n", dfa->states[i].name);
        for (j = 0; j [...] nextstate[i][j]);
        }
    }
    printf("EndDFA\n");
Note on SECTION 4 and SECTION 5: We would like to see this format output. If you choose to restrict your inputs, please do so here as well. If you choose to work with charsets, come up with some way of labelling them and outputting the charset itself (this function is provided in parser.h) after you've completed listing out the whole DFA or NFA output, paired with the label you gave it while printing out the machine (and if you can, try to match the ordering as well).
We strongly suggest that you do this, so that it makes it easier for us to grade and test your project.
SECTION 6: Good luck!
Good luck!
ADDENDUM 1: Programming Hints
A lot of people have been saying that they don't know how to get started, or what kinds of data structures to use. I'm going to provide a more concrete example of how you might actually build some of these things in C below. The following code is, again, an example, and moreover, it has never been tested for ease of use or compiled, even, so there may be syntax errors. I am providing it as sort of an outline of how *I* would go about implementing the methods that I outlined above. I recommend that you think about these problems *first*, before looking at these proposed programming approaches, because it's a good intellectual exercise and important for you to develop your own problem solving approaches.
- Nate
PART 1: Example Structures:
Here are some example structures you might try, along the lines of the methods proposed above.
    typedef enum {SYM, ANY, E, CSET} EdgeLabelT;
    // SYM = symbol
    // ANY = any character
    // E = epsilon
    // CSET = character set

    struct node_t;
    struct edge_t;

    typedef struct node_t {
        int name;  // unique identifier for each state

        // The following are dynamic arrays
        struct edge_t *edges_out;
        struct edge_t **edges_in;  // saves you hassle in removing unreachable
                                   // states. Save as pointers, so you'll be
                                   // able to identify the edge structure itself.

        // You'll want to save the allocated size as well to allow dynamic
        // reallocation.
        int numout, numoutalloc;
        int numin, numinalloc;
    } NodeT;

    typedef struct edge_t {
        struct node_t *from;
        struct node_t *to;
        EdgeLabelT type;
        char value;         // for when type = SYM;
        CharSetT *charset;  // for when type = CSET;
    } EdgeT;

    NodeT *stategraph;
    int numstates;
    // You can come up with a way to not have to dynamically reallocate the
    // stategraph. What do we know about the number of states, given this
    // construction, based on the regular expression input string?

    // Table Constructions:
    // For both of these tables, I am assuming that your states are going to
    // be numbered 0 through n-1, where n is the number of states. You'll
    // want to find a way to rename your NFA states so this is true, or to
    // come up with an alternative way of indexing the stategraph. This
    // construction also assumes that you assume a set number of inputs (in
    // this case 256).

    // NFA
    // You'll need a dynamically-reallocatable array of next states:
    typedef struct {
        int *nextStates;
        int nextStateNum;
        int nextStateAlloc;
    } TransitionT;

    typedef struct {
        int start;

        // You'll need a dynamically-reallocatable array of final states:
        int *final;
        int numFinalStates;
        int numFinalStatesAlloc;

        // As declared, this is an array of 256 pointers, one per input,
        // so it is indexed as nextstates[input][state].
        TransitionT *(nextstates[256]);  // coming in, you'll know the number of
                                         // states.
        int numstates;
    } NFATableT;

    // DFA:
    typedef struct {
        int start;

        int *final;
        int numFinalStates;
        int numFinalStatesAlloc;

        int *(nextStates[256]);  // indexed as nextStates[input][state]
        int numstates;
        int numstatesalloc;  // You need to dynamically reallocate. You won't
                             // know how many DFA states you need.
    } DFATableT;
PART 2: How to dynamically-reallocate an array:
First, start by allocating some minimum number of expected states. If you make this number large enough, you may never need to dynamically reallocate anything:
    int arrNum = 0;
    int arrNumAlloc = 64;
    ArrayTypeT *arr = (ArrayTypeT *) malloc(sizeof(ArrayTypeT) * arrNumAlloc);
    // in this case, we allocate 64 elements to begin with.
    if (!arr) Error("No memory!");
    // remember that malloc doesn't check for you if you don't have enough
    // memory. You can handle this however you'd like. The best
    // response is probably to bail from the program and try and
    // figure out why this failed, and how you can predict failure and
    // prevent it in the future, for instance by capping the number of
    // states that your DFA or NFA can have (the SUN machines have a
    // lot of memory, but remember that these graphs can be *really*
    // big).
As you add elements to the array, increment the value of arrNum:
    arr[arrNum] = element;
    arrNum++;
Whenever you increment arrNum, check to see if the new value of arrNum is equal to the number of allocated elements (it should never be greater than this number -- you should probably produce an error if it is, because then you know you've written your code incorrectly). If it is equal, you need to reallocate the array to a larger size. A safe bet is generally to double the size of the array.
assert(arrNum
again have to check to see that arr is not equal to NULL, as this is what realloc, like malloc, returns when there is not enough memory to fulfill your request. You can free a block that you have realloc'ed in the same way that you free a block that you've malloc'ed, namely, with the free function.
Note that when you are reducing the size of the array (say, removing epsilon transitions), it does not pay to realloc the array smaller, because usually realloc just ignores the request and returns the same pointer. It shouldn't matter, because the only thing that is important in the end is the memory you allocate to your NFA and DFA table. For the NFA table, you will know the number of states coming in (it will be the same as the number of states in your graph). For the DFA table, you should pick as the maximum number of states your DFA can have a value that is a power of 2 (if you approach reallocation in this way). Alternatively, when you know the true size of the array, you can save memory by copying it into an array of exactly the size that you needed instead of leaving it in an array that may be wasting up to 50% of its allocated memory.
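Putting the doubling strategy together, a growable array of ints might be sketched as below. `IntArray`, `array_init`, and `array_push` are names invented for this example; the one detail worth copying is that the result of realloc goes into a temporary first, because if realloc returns NULL the old block is still allocated and overwriting `data` directly would lose it.

```c
#include <stdlib.h>

typedef struct {
    int *data;
    int num;      /* elements in use (arrNum in the text) */
    int alloc;    /* elements allocated (arrNumAlloc in the text) */
} IntArray;

/* Returns 0 on success, -1 if malloc fails. */
static int array_init(IntArray *a, int initial_alloc)
{
    a->data = malloc(sizeof *a->data * (size_t)initial_alloc);
    a->num = 0;
    a->alloc = a->data ? initial_alloc : 0;
    return a->data ? 0 : -1;
}

/* Appends element, doubling the allocation when the array is full.
 * Returns 0 on success, -1 if realloc fails (the array is left intact). */
static int array_push(IntArray *a, int element)
{
    if (a->num == a->alloc) {
        int new_alloc = a->alloc > 0 ? a->alloc * 2 : 1;
        int *tmp = realloc(a->data, sizeof *tmp * (size_t)new_alloc);
        if (!tmp)
            return -1;            /* a->data is still valid here */
        a->data = tmp;
        a->alloc = new_alloc;
    }
    a->data[a->num++] = element;
    return 0;
}
```

When you are done, a single `free(a.data)` releases the block no matter how many times it was reallocated.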
There may be a method in UNIX to find out exactly how much RAM is available for allocation (I'm not sure, and if there is, it would be platform dependent, I believe). I don't *think* there's an ANSI function to do this, but I could be wrong. If that's the case, the more ambitious of you can *dynamically* decide the maximum size of your DFA based on the amount of memory available!
Good luck!
ADDENDUM 2: Another Note on Memory:
This came up in section tonight. On the Unix machines, you have a vast amount of memory at your disposal. The RAM is very large. You can represent very large DFAs in it. The problem is that if you have a very large DFA, it is likely that it will get paged out of main memory by the virtual memory system, and put onto the disk. This will probably start happening long before malloc complains that you don't have enough memory and returns NULL, which is why checking for malloc returning NULL as a test for your DFA being too large is a bad test. Why is it bad if the virtual memory system pages out part of your DFA structure? Because DFAs can have very poor locality of reference. What does that mean? You're not guaranteed that when you read from one location (say the input on a certain state), the next state you have to read is going to be anywhere near the original state in memory. This is especially true if you don't have your DFA in a contiguous block. So it's likely that when you're going from state to state, you'll move to a different page in memory that may have been paged out. This is very bad because it takes about 1000 times as long (that's a guess, but it seems to be the general number that they were throwing out in EE182), so it's going to reduce the benefits of having converted to a DFA in the first place (and after having gone through so much trouble, we wouldn't want that!).
If anyone wants to try a solution that works around this, one possible approach would be to attempt to put your DFA in contiguous memory (once you've built the DFA, allocate a giant block of memory of the size of your DFA), and copy the rows over to that block, attempting to put rows that reference each other closer together, to increase locality of reference. This would result in much extra credit. Moreover, finding an optimal solution to locality of reference in a DFA state graph, I am willing to bet, is NP complete, so you'd want to find a heuristic for it. I'm not sure on that though. Extra credit also for anyone who can prove that this is or is not NP complete. :) If you try this approach, try the other approach as well, and compare the running times of your solution, if you can.
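One simple heuristic for the copy step, offered only as a starting point and not as anything close to optimal, is to lay the rows out in breadth-first order from the start state, so that a state and the states it points to tend to land near each other. The sketch below assumes the DFA rows were allocated separately as `rows[state][input]`; `pack_dfa` and its parameters are made up for the example. It renumbers the states, drops unreachable ones, and returns a single malloc'ed block.

```c
#include <stdlib.h>

#define NUM_SYMS 256

/* rows[q][a] is the next state of q on input a, for n states (n > 0).
 * Returns one contiguous block where entry [q * NUM_SYMS + a] holds the
 * renumbered next state, with states laid out in breadth-first order
 * from start (which becomes state 0). Stores the number of reachable
 * states in *num_packed. Returns NULL if memory runs out. */
static int *pack_dfa(int **rows, int n, int start, int *num_packed)
{
    int *order = malloc(sizeof *order * (size_t)n);   /* order[new] = old */
    int *renum = malloc(sizeof *renum * (size_t)n);   /* renum[old] = new */
    int *packed = NULL;
    int head = 0, tail = 0, q, a;

    if (!order || !renum)
        goto done;
    for (q = 0; q < n; q++)
        renum[q] = -1;                       /* -1: not discovered yet */
    renum[start] = tail;
    order[tail++] = start;
    while (head < tail) {                    /* breadth-first discovery */
        q = order[head++];
        for (a = 0; a < NUM_SYMS; a++) {
            int t = rows[q][a];
            if (renum[t] < 0) {
                renum[t] = tail;
                order[tail++] = t;
            }
        }
    }
    packed = malloc(sizeof *packed * (size_t)tail * NUM_SYMS);
    if (!packed)
        goto done;
    for (q = 0; q < tail; q++)               /* copy rows in the new order */
        for (a = 0; a < NUM_SYMS; a++)
            packed[q * NUM_SYMS + a] = renum[rows[order[q]][a]];
    *num_packed = tail;
done:
    free(order);
    free(renum);
    return packed;
}
```

Remember to translate your list of final states through the same renumbering, and time both layouts on real input before concluding that it helped.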
Good luck!