S1 DML Syntax and Invocation
Uploaded by arvind-surve
Goal of These Slides
• Provide you with basic DML syntax
• Link to important resources
• Invocation
Non-Goals
• Comprehensive syntax and API coverage
Resources
• Google "Apache SystemML"
• Documentation - https://apache.github.io/incubator-systemml/
• DML Language Reference - https://apache.github.io/incubator-systemml/dml-language-reference.html
• MLContext - https://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html#spark-shell-scala-example
• Github - https://github.com/apache/incubator-systemml
Note
• Some documentation is outdated
• If you find a typo or want to update the document, consider making a Pull Request
• All docs are in Markdown format
  • https://github.com/apache/incubator-systemml/tree/master/docs
About DML Briefly
• DML = Declarative Machine Learning
• R-like syntax, some subtle differences from R
• Dynamically typed
• Data Structures
  • Scalars - Boolean, Integers, Strings, Double Precision
  • Cacheable - Matrices, DataFrames
• Data Structure Terminology in DML
  • Value Type - Boolean, Integers, Strings, Double Precision
  • Data Type - Scalar, Matrices, DataFrames*
  • You can have a DataType[ValueType], not all combinations are supported
    • For instance - matrix[double]
• Scoping
  • One global scope, except inside functions
*Coming soon
About DML Briefly
• Control Flow
  • Sequential imperative control flow (like most other languages)
  • Looping
    • while (<condition>) {…}
    • for (var in <for_predicate>) {…}
    • parfor (var in <for_predicate>) {…}   // Iterations in parallel
  • Guards
    • if (<condition>) {...} [ else if (<condition>) {...} ... else {…} ]
• Functions
  • Built-in - List available in language reference
  • User Defined - (multiple return parameters)
    • functionName = function (<formal_parameters>…) return (<formal_parameters>) {...}
    • Can only access variables defined in the formal_parameters in the body of the function
  • External Function - same as user defined, can call external Java Package
About DML Briefly
• Imports
  • Can import user defined/external functions from other source files
  • Disambiguation using namespaces
• Command Line Arguments
  • By position - $1, $2 …
  • By name - $X, $Y ...
• Limitations
  • A user defined function can only be called on the right hand side of an assignment, as the only expression
  • Cannot write
    • X <- Y + bar()
    • for (i in foo(1,2,3)) {…}
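One way around the limitation above is to call the function in an assignment of its own and use the result afterwards. The sketch below writes such a script from the shell. The function bar, the file name udf_workaround.dml, and the values are hypothetical and only serve the illustration. The run command is printed rather than executed, so the snippet works without a SystemML checkout.

```shell
#!/bin/sh
# Write a small DML script; the quoted 'EOF' keeps the shell from touching its contents.
cat > udf_workaround.dml <<'EOF'
# Hypothetical helper, defined only for this illustration
bar = function (double a) return (double r) {
  r = a * 2
}

Y = 3.0
# "X <- Y + bar(Y)" would run into the limitation, so bar is called on its own first
tmp = bar(Y)
X = Y + tmp
print("X = " + X)
EOF

# Standalone invocation, printed instead of executed
echo "bin/systemml udf_workaround.dml"
```

The same pattern should apply to the loop case: assign the function result to a variable first, then use that variable in the for predicate.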
Sample Code
A = 1.0                                      # A is a double
X <- matrix("4 3 2 5 7 8", rows=3, cols=2)   # X = matrix of size 3,2; '<-' is assignment
Y = matrix(1, rows=3, cols=2)                # Y = matrix of size 3,2 with all 1s
b <- t(X) %*% Y                              # %*% is matrix multiply, t(X) is transpose
S = "hello world"

i = 0
while (i < max_iteration) {
  H = (H * (t(W) %*% (V/(W%*%H)))) / t(colSums(W))   # * is element by element mult
  W = (W * ((V/(W%*%H)) %*% t(H))) / t(rowSums(H))
  i = i + 1;                                          # i is an integer
}

print (toString(H))                          # toString converts a matrix to a string
Sample Code
source("nn/layers/affine.dml") as affine   # import a file in the "affine" namespace
[W, b] = affine::init(D, M)                # calls the init function, multiple return

parfor (i in 1:nrow(X)) {                  # i iterates over 1 through num rows in X in parallel
  for (j in 1:ncol(X)) {                   # j iterates over 1 through num cols in X
    # Computation ...
  }
}

write (M, fileM, format="text")            # M=matrix, fileM=file, also writes to HDFS
X = read (fileX)                           # fileX=file, also reads from HDFS

if (ncol (A) > 1) {
  # Matrix A is being sliced by a given range of columns
  A[,1:(ncol (A) - 1)] = A[,1:(ncol (A) - 1)] - A[,2:ncol (A)];
}
Sample Code
interpSpline = function(double x, matrix[double] X, matrix[double] Y, matrix[double] K) return (double q) {
  i = as.integer(nrow(X) - sum(ppred(X, x, ">=")) + 1)
  # misc computation …
  q = as.scalar(qm)
}

eigen = externalFunction(Matrix[Double] A) return(Matrix[Double] eval, Matrix[Double] evec)
  implemented in (classname="org.apache.sysml.udf.lib.EigenWrapper", exectype="mem")
Sample Code (From LinearRegDS.dml*)

A = t(X) %*% X
b = t(X) %*% y

if (intercept_status == 2) {
  A = t(diag (scale_X) %*% A + shift_X %*% A [m_ext, ])
  A = diag (scale_X) %*% A + shift_X %*% A [m_ext, ]
  b = diag (scale_X) %*% b + shift_X %*% b [m_ext, ]
}

A = A + diag (lambda)

print ("Calling the Direct Solver...")

beta_unscaled = solve (A, b)
*https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/LinearRegDS.dml#L133
MLContext API
• You can invoke SystemML from the
  • Command line or a
  • Spark Program
• The MLContext API lets you invoke it from a Spark Program
• Command line invocation described later
• Available as a Scala API and a Python API
• These slides will only talk about the Scala API
MLContext API - Example Usage

val ml = new MLContext(sc)

val X_train = sc.textFile("amazon0601.txt")
  .filter(!_.startsWith("#"))
  .map(_.split("\t") match { case Array(prod1, prod2) => (prod1.toInt, prod2.toInt, 1.0) })
  .toDF("prod_i", "prod_j", "x_ij")
  .filter("prod_i < 5000 AND prod_j < 5000") // Change to smaller number
  .cache()
MLContext API - Example Usage

val pnmf = """
# data & args
X = read($X)
rank = as.integer($rank)

# Computation ....

write(negloglik, $negloglikout)
write(W, $Wout)
write(H, $Hout)
"""
ml.registerInput("X", X_train)
ml.registerOutput("W")
ml.registerOutput("H")
ml.registerOutput("negloglik")
val outputs = ml.executeScript(pnmf, Map("maxiter" -> "100", "rank" -> "10"))
val negloglik = getScalarDouble(outputs, "negloglik")
Invocation - How to run a DML file
• SystemML can run on
  • Your laptop (Standalone)
  • Spark
  • Hybrid Spark - using the better choice between the driver and the cluster
  • Hadoop
  • Hybrid Hadoop
• For this presentation, we care about standalone, spark & hybrid_spark
• Documentation has detailed instructions on the others
Invocation - How to run a DML file
Standalone
In the systemml directory:
bin/systemml <dml-filename> [arguments]

Example invocations:
bin/systemml LinearRegCG.dml -nvargs X=X.mtx Y=Y.mtx B=B.mtx   # named arguments
bin/systemml oddsRatio.dml -args X.mtx 50 B.mtx                # positional arguments
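To connect the two invocation styles above with the $X and $1 placeholders from the syntax slides, here is a minimal sketch that writes two one-line DML scripts and prints how each would be run. The file names named_args.dml and positional_args.dml are made up for this illustration. The commands are printed rather than executed, so the snippet does not need a SystemML checkout.

```shell
#!/bin/sh
# The quoted 'EOF' stops the shell from expanding $X, $1 and $2 itself,
# so they reach the DML files unchanged.
cat > named_args.dml <<'EOF'
# $X is replaced by the value given as X=... after -nvargs
print("named argument X = " + $X)
EOF

cat > positional_args.dml <<'EOF'
# $1 and $2 are replaced by the first and second value after -args
print("first = " + $1 + ", second = " + $2)
EOF

# Standalone invocations, printed instead of executed
echo "bin/systemml named_args.dml -nvargs X=X.mtx"
echo "bin/systemml positional_args.dml -args X.mtx 50"
```

Named arguments tend to be easier to keep straight once a script takes more than two or three inputs, since the order on the command line no longer matters.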
Invocation - How to run a DML file
Spark / Hybrid Spark
Define SPARK_HOME to point to your Apache Spark installation
Define SYSTEMML_HOME to point to your Apache SystemML installation

In the systemml directory:
scripts/sparkDML.sh <dml-filename> [systemml arguments]

Example invocations:
scripts/sparkDML.sh LinearRegCG.dml --nvargs X=X.mtx Y=Y.mtx B=B.mtx   # named arguments
scripts/sparkDML.sh oddsRatio.dml --args X.mtx 50 B.mtx                # positional arguments
Invocation - How to run a DML file
Spark / Hybrid Spark
Define SPARK_HOME to point to your Apache Spark installation
Define SYSTEMML_HOME to point to your Apache SystemML installation
Using the spark-submit script:

$SPARK_HOME/bin/spark-submit --master <master-url> --class org.apache.sysml.api.DMLScript ${SYSTEMML_HOME}/SystemML.jar -f <dml-filename> <systemml arguments> -exec {hybrid_spark, spark}

Example invocation:
$SPARK_HOME/bin/spark-submit --master local[*] --class org.apache.sysml.api.DMLScript ${SYSTEMML_HOME}/SystemML.jar -f LinearRegCG.dml --nvargs X=X.mtx Y=Y.mtx B=B.mtx
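Because the spark-submit line is long, it can help to assemble it in a small helper and review it before running. The sketch below is a dry run that only prints the command. The function name build_submit_cmd and the fallback paths /opt/spark and /opt/systemml are assumptions made for this illustration. It also puts -exec ahead of the script arguments so that the argument list ends the command line. Treat that ordering as an assumption and check it against the usage message of your SystemML version.

```shell
#!/bin/sh
# Fallback locations are placeholders; export the real ones before use.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
SYSTEMML_HOME="${SYSTEMML_HOME:-/opt/systemml}"

# build_submit_cmd <master-url> <exec-mode> <dml-file> [script arguments...]
# Prints the spark-submit command line without running it.
build_submit_cmd() {
    master="$1"; mode="$2"; dml="$3"
    shift 3   # whatever remains is passed through as script arguments
    printf '%s\n' "$SPARK_HOME/bin/spark-submit --master $master --class org.apache.sysml.api.DMLScript $SYSTEMML_HOME/SystemML.jar -f $dml -exec $mode $*"
}

cmd=$(build_submit_cmd 'local[*]' hybrid_spark LinearRegCG.dml --nvargs X=X.mtx Y=Y.mtx B=B.mtx)
echo "$cmd"
```

When running the printed command by hand, quote local[*] so the shell does not try to expand it as a file pattern.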
Editor Support
• Very rudimentary editor support
• Bit of shameless self-promotion:
• Atom - Hackable Text editor
  • Install package - https://atom.io/packages/language-dml
  • From GUI - http://flight-manual.atom.io/using-atom/sections/atom-packages/
  • Or from command line - apm install language-dml
  • Rudimentary snippet based completion of builtin function
• Vim
  • Install package - https://github.com/nakul02/vim-dml
  • Works with Vundle (vim package manager)
• There is an experimental Zeppelin Notebook integration with DML
  • https://issues.apache.org/jira/browse/SYSTEMML-542
  • Available as a docker image to play with - https://hub.docker.com/r/nakul02/incubator-zeppelin/
• Please send feedback when using these, requests for features, bugs
• I'll work on them when I can
Other Information
• All scripts are in - https://github.com/apache/incubator-systemml/tree/master/scripts
• Algorithm Scripts - https://github.com/apache/incubator-systemml/tree/master/scripts/algorithms
• Test Scripts - https://github.com/apache/incubator-systemml/tree/master/src/test/scripts
• Look inside the test folder for programs that run the tests, play around with some of them - https://github.com/apache/incubator-systemml/tree/master/src/test/java/org/apache/sysml/test
Thanks!
• The documentation might be outdated and have typos
• Please submit fixes
• If a language feature does not make sense or is missing, ask a SystemML team member
• Have Fun!