Smart Software with F#
Joel Pobar
Language Geek
http://callvirt.net/blog
Agenda
- What is it?
- F# Intro
- Algorithms:
  - Search
  - Fuzzy Matching
  - Classification (SVM)
  - Recommendations
- Q&A
All This in 45 mins?
- This is an awareness session!
  - Lots of content, very broad, very fast
  - You’ll get all demos, pointers, and the slide deck to take offline and digest
- Two takeaways:
  - F# is a great language for data
  - Smart algorithms aren’t hard: use them, explore more!
F# is
...a functional, object-oriented, imperative and explorative programming language for .NET
What is Functional Programming?
http://callvirt.net/jaoo.zip
What is Functional Programming?
Wikipedia: “A programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data”
-> Emphasizes functions
-> Emphasizes shapes of data, rather than implementation
-> Modeled on lambda calculus
-> Reduced emphasis on imperative
-> Safely raises level of abstraction
Motivation for Functional
Simplicity in life is good: cheaper, easier, faster, better.
We typically achieve simplicity in software in two ways:
- By raising the level of abstraction (OO was one design to raise abstraction)
- By increasing modularity
Increasing signal to noise is another good strategy:
- Communicate more in less time with more clarity
- Better composition and modularity == reuse
Functional Programming
Safer, while still being useful
[Chart: languages plotted on two axes, Unsafe to Safe and Not Useful to Useful. Labels: C#, C++, …; V.Next#; Haskell; F#]
What is F# for?
- F# is a general-purpose language
  - Can be used for a broad range of programming tasks
  - Superset of imperative and dynamic features
- Great for learning FP concepts
- Some particularly important domains:
  - Financial modeling and analysis
  - Data mining
  - Scientific data analysis
  - Domain-specific modeling
  - Academic
Let
‘Let’ binds values to identifiers
    let helloWorld = "Hello, World"
    // print_any comes from early F# releases; printfn "%A" is the current equivalent
    printfn "%A" helloWorld

    let myNum = 12
    let myAddFunction x y =
        let sum = x + y
        sum

Type inference: the static typing of C# with the succinctness of a scripting language
Tuples
Simple, and most useful data structure
    let site1 = ("msdn.com", 10)
    let site2 = ("abc.net.au", 12)
    let site3 = ("news.com.au", 22)
    let allSites = (site1, site2, site3)

    let fst (a, b) = a
    let snd (a, b) = b
Lists, Arrays, Seq and Options
- Lists & Arrays are first-class citizens
- Options provide a some-or-nothing capability

    let list1 = ["Joel"; "Luke"]
    let array = [|2; 3; 5|]
    let myseq = seq [0; 1; 2]

    let option1 = Some("Joel")
    let option2 = None
Records
Simple concrete type definition
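    type Person = { Name: string; DateOfBirth: System.DateTime }
    // DateOfBirth must be a DateTime, not the string "13/04/81" (read here as 13 April 1981)
    let n = { Name = "Joel"; DateOfBirth = System.DateTime(1981, 4, 13) }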
Immutability (by default)
Values may not be changed
Data is immutable by default
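A minimal sketch of what immutable-by-default looks like in practice. The `Site` record and the value names are made up for illustration; they are not from the deck.

```fsharp
// A plain 'let' binding cannot be reassigned.
let total = 10
// total <- 11        // would not compile: 'total' is not mutable

// Mutation is opt-in: declare with 'mutable', assign with '<-'.
let mutable counter = 0
counter <- counter + 1

// Records are immutable too; copy-and-update builds a new value
// and leaves the original untouched.
type Site = { Url: string; Hits: int }
let original = { Url = "msdn.com"; Hits = 10 }
let updated = { original with Hits = 11 }

printfn "%d %d %d" counter original.Hits updated.Hits
```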
Discriminated Unions
Great for representing the structure of data
    type Make = string
    type Model = string
    type Transport =
        | Car of Make * Model
        | Bicycle

    let me = Car ("Holden", "Barina")
    let you = Bicycle
Both of these identifiers are of type “Transport”
Functions
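- Functions: like delegates, but unified and simple
- Deep type inference

    // Anonymous function
    (fun x -> x + 1)

    // Named function; F# Interactive reports the inferred signature
    let myFunc x = x + 1
    // val myFunc : int -> int

    let rec factorial n =
        if n > 1 then n * factorial (n - 1)
        else 1

    // Sorting with a custom comparer is List.sortWith in current F#
    let data = [5; 3; 4; 4; 5]
    List.sortWith (fun x y -> x - y) data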
Pattern Matching
    open System

    // Destructure a tuple, ignoring the second element
    let (fst, _) = ("first", "second")
    Console.WriteLine(fst)

    // Match on the runtime type (Transport is the union from the earlier slide)
    let switchOnType (a: obj) =
        match a with
        | :? Int32 -> printfn "int!"
        | :? Transport -> printfn "Transport"
        | _ -> printfn "Everything Else!"

- Very important part of F#
- Helps deal with the ‘teasing apart’ of data
- Works best with Discriminated Unions & Records
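As a rough illustration of why pattern matching pairs well with discriminated unions, here is a small sketch that reuses the `Transport` type from the earlier slide. The `describe` function is a hypothetical name introduced only for this example.

```fsharp
type Make = string
type Model = string
type Transport =
    | Car of Make * Model
    | Bicycle

// Each case is matched by shape, and the payload is pulled apart in the
// same step. The compiler warns if a case is left unhandled.
let describe transport =
    match transport with
    | Car ("Holden", model) -> sprintf "a Holden %s" model
    | Car (make, model) -> sprintf "a %s %s" make model
    | Bicycle -> "a bicycle"

printfn "%s" (describe (Car ("Holden", "Barina")))
printfn "%s" (describe Bicycle)
```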
Lists, Types, Interactive
demo
Search
Given a search term and a large document corpus, rank and return a list of the most relevant results…
Blog Crawler
Search
- Words
  - Stemming? Tokenize?
    - E.g. ‘Python/Ruby’
- Markup
  - Title, Author, Date
  - Headings (h1, h2, etc.)
  - Paragraphs
- Links
  - A sign of strength?
Let’s explore something simple…
Search
- Simplify:
  - For easy machine/language manipulation
  - … and most importantly, easy computation
- Vectors: nature’s own quality data structure
  - Convenient machine representation (lists/arrays)
  - Lots of existing vector math algorithms
Example post:
    After a loving incubation period, moonlight 2.0 has been released. <a href="whatever">source code</a><br><a href="something else">FireFox binaries</a> …

Term counts for the post:
    after       2
    incubation  1
    loving      1
    moonlight   6
    firefox     4
    linux       6
    binaries    2
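One way to get from a post to a term-count vector, sketched under some assumptions: markup is simply stripped with a regular expression, words are lowercased, and there is no stemming. The function names `tokenize` and `termCounts` are illustrative and not taken from the demo code.

```fsharp
open System.Text.RegularExpressions

// Drop anything that looks like an HTML tag, lowercase the rest,
// and split on runs of non-alphanumeric characters.
let tokenize (text: string) =
    let withoutTags = Regex.Replace(text, "<[^>]*>", " ")
    Regex.Split(withoutTags.ToLowerInvariant(), "[^a-z0-9]+")
    |> Array.filter (fun word -> word <> "")
    |> Array.toList

// Count how often each term occurs: term -> count.
let termCounts (text: string) =
    tokenize text
    |> Seq.countBy id
    |> Map.ofSeq

let counts =
    termCounts "After a loving incubation period, moonlight 2.0 has been released. <a href=\"whatever\">source code</a>"

printfn "%A" counts
```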
Term Count
Document1: Linux post:
    the 9, incubation 1, crazy 1, moonlight 6, firefox 4, linux 6, penguin 2
Document2: Animal post:
    the 2, dog 1, penguin 5, crazy 2
Vector space:
                 the  incubation  crazy  moonlight  firefox  linux  dog  penguin
    Document1      9           1      1          6        4      6    0        2
    Document2      2           0      2          0        0      0    1        5
Term Count Issues
Query: ‘the dog penguin’
- Linux: 9+0+2 = 11
- Animal: 2+1+5 = 8
‘the’ is overweight
Enter TF-IDF: Term Frequency, Inverse Document Frequency
- A weight to evaluate how important a word is to a document in a corpus
- i.e. if ‘the’ occurs in 98% of all documents, we shouldn’t weight it very highly in the total query
              the  incubation  crazy  moonlight  firefox  linux  dog  penguin
    Linux       9           1      1          6        4      6    0        2
    Animal      2           0      2          0        0      0    1        5
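The raw term-count scoring can be sketched like this. The counts are the ones shown on the slides; the helper names (`countOf`, `toVector`, `score`) are assumptions made for the example.

```fsharp
// Term counts for the two example documents.
let linuxPost =
    Map.ofList [ ("the", 9); ("incubation", 1); ("crazy", 1); ("moonlight", 6);
                 ("firefox", 4); ("linux", 6); ("penguin", 2) ]
let animalPost =
    Map.ofList [ ("the", 2); ("crazy", 2); ("dog", 1); ("penguin", 5) ]

// The shared vocabulary fixes one vector position per term.
let vocabulary =
    [ "the"; "incubation"; "crazy"; "moonlight"; "firefox"; "linux"; "dog"; "penguin" ]

// A term missing from a document counts as 0.
let countOf term (doc: Map<string, int>) =
    match Map.tryFind term doc with
    | Some n -> n
    | None -> 0

let toVector doc = vocabulary |> List.map (fun term -> countOf term doc)

// Naive relevance: add up the raw counts of each query term.
let score (query: string list) doc =
    query |> List.sumBy (fun term -> countOf term doc)

let query = [ "the"; "dog"; "penguin" ]
printfn "%A" (toVector linuxPost)
printfn "%A" (toVector animalPost)
printfn "Linux: %d, Animal: %d" (score query linuxPost) (score query animalPost)
```

The Linux post wins on a query about dogs and penguins purely because of ‘the’, which is the problem TF-IDF addresses.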
TF-IDF
- Normalise the term count:
    tf = termCount / docWordCount
- Measure the importance of the term:
    idf = log(|D| / termDocumentCount)
  where |D| is the total number of documents in the corpus
- tfidf = tf * idf
- A high weight is reached by a high term frequency and a low document frequency
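The formulas translate almost directly into F#. This is a sketch, not the demo code: documents are modelled as plain word lists, the tiny corpus is invented, and giving a term that appears in no document a weight of 0 is an assumption made here to avoid dividing by zero.

```fsharp
// tf = termCount / docWordCount
let tf (term: string) (doc: string list) =
    let termCount = doc |> List.filter (fun word -> word = term) |> List.length
    float termCount / float (List.length doc)

// idf = log (|D| / termDocumentCount), using the natural logarithm
let idf (term: string) (corpus: string list list) =
    let termDocumentCount =
        corpus |> List.filter (fun doc -> List.contains term doc) |> List.length
    // Assumption: unseen terms get weight 0 instead of a division by zero.
    if termDocumentCount = 0 then 0.0
    else log (float (List.length corpus) / float termDocumentCount)

let tfidf term doc corpus = tf term doc * idf term corpus

// Invented three-document corpus for illustration.
let corpus =
    [ [ "the"; "linux"; "penguin" ]
      [ "the"; "dog"; "penguin" ]
      [ "the"; "crazy"; "dog" ] ]

let firstDoc = List.head corpus
printfn "the:   %f" (tfidf "the" firstDoc corpus)
printfn "linux: %f" (tfidf "linux" firstDoc corpus)
```

Because ‘the’ occurs in every document, its idf is log 1 = 0 and it no longer contributes to the score.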
Search Engine in under 10 mins
demo
Fuzzy Matching
- String similarity algorithms:
  - SoundEx; Metaphone
  - Jaro-Winkler distance; Cosine similarity; Sellers; Euclidean distance; …
  - We’ll look at the Levenshtein distance algorithm
- Defined as: the minimum number of edit operations that transform string1 into string2
Fuzzy Matching
Edit costs:
- In-place copy: cost 0
- Delete a character in string1: cost 1
- Insert a character in string2: cost 1
- Substitute a character for another: cost 1
Transform ‘kitten’ into ‘sitting’:
- kitten -> sitten (cost 1: replace k with s)
- sitten -> sittin (cost 1: replace e with i)
- sittin -> sitting (cost 1: add g)
Levenshtein distance: 3
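The edit costs above lead to the usual dynamic-programming formulation. The following is a sketch of the standard two-row version rather than the code from the demo; it works on integer arrays so that no temporary strings are created.

```fsharp
let levenshtein (source: string) (target: string) =
    // previous.[j] holds the distance between the prefix of 'source'
    // handled so far and the first j characters of 'target'.
    let mutable previous = Array.init (target.Length + 1) id
    for i in 1 .. source.Length do
        let current : int[] = Array.zeroCreate (target.Length + 1)
        current.[0] <- i
        for j in 1 .. target.Length do
            let deletion = previous.[j] + 1
            let insertion = current.[j - 1] + 1
            // In-place copy costs 0, substitution costs 1.
            let copyOrSubstitute =
                previous.[j - 1] + (if source.[i - 1] = target.[j - 1] then 0 else 1)
            current.[j] <- min deletion (min insertion copyOrSubstitute)
        previous <- current
    previous.[target.Length]

printfn "%d" (levenshtein "kitten" "sitting")
```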
Fuzzy Matching
Estimated string similarity computation costs:
- Hard on the GC: lots of temporary strings are created and thrown away, so use arrays if possible
- Levenshtein can be computed in O(kl) time, where ‘l’ is the length of the shortest string and ‘k’ is the maximum distance
- Parallelisable: split the set of words to compare across n cores
- Can do approximately 10,000 compares per second on a standard single-core laptop
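Splitting the comparisons across cores could look roughly like the sketch below, which leans on `Array.Parallel.map` from FSharp.Core. The `didYouMean` name and the four-word dictionary are invented for the example, and the helper is the plain O(mn) two-row Levenshtein, not the bounded O(kl) variant.

```fsharp
// Compact two-row Levenshtein over int arrays (no temporary strings).
let levenshtein (a: string) (b: string) =
    let mutable prev = Array.init (b.Length + 1) id
    for i in 1 .. a.Length do
        let row : int[] = Array.zeroCreate (b.Length + 1)
        row.[0] <- i
        for j in 1 .. b.Length do
            let cost = if a.[i - 1] = b.[j - 1] then 0 else 1
            row.[j] <- min (prev.[j - 1] + cost) (min (prev.[j] + 1) (row.[j - 1] + 1))
        prev <- row
    prev.[b.Length]

// Score every candidate in parallel, then keep the closest one.
let didYouMean (dictionary: string[]) (word: string) =
    dictionary
    |> Array.Parallel.map (fun candidate -> candidate, levenshtein word candidate)
    |> Array.minBy snd

let suggestion, distance =
    didYouMean [| "kitten"; "sitting"; "mitten"; "bitten" |] "sittin"
printfn "Did you mean %s? (distance %d)" suggestion distance
```

Each comparison is independent of the others, which is what makes the work easy to spread over n cores.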
Did You Mean?
demo
Classification
- Support Vector Machines (SVM)
  - Supervised learning for binary classification
  - Training inputs: ‘in’ and ‘out’ vectors
  - SVM will then find a separating ‘hyperplane’ in an n-dimensional space
- Training costs, but classification is cheap
- Can retrain on the fly in some cases
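To see why classification is cheap once training is done, consider a linear model: deciding ‘in’ or ‘out’ is one dot product and a comparison. This sketch does not train anything. The `LinearModel` type and the weights are made up, standing in for whatever a real SVM trainer would produce.

```fsharp
// Hypothetical trained model: the hyperplane is w . x + b = 0.
type LinearModel = { Weights: float[]; Bias: float }

let dot (a: float[]) (b: float[]) =
    Array.fold2 (fun acc x y -> acc + x * y) 0.0 a b

// Which side of the hyperplane does the point fall on?
let classify model (point: float[]) =
    if dot model.Weights point + model.Bias >= 0.0 then "in" else "out"

// Made-up weights for a 2-dimensional example: the line x + y = 3.
let model = { Weights = [| 1.0; 1.0 |]; Bias = -3.0 }

printfn "%s" (classify model [| 2.5; 2.0 |])
printfn "%s" (classify model [| 0.5; 1.0 |])
```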
SVM Classification
SVM Issues
- Classification on 2 dimensions is easy, but most input is multi-dimensional
- Some ‘tricks’ are needed to transform the input data
SVM Classifier
demo
F# and Algorithms
Netflix Demo
- Netflix Prize - $1 million USD
  - Must beat Netflix prediction algorithm by 10%
  - 480k users
  - 100 million ratings
  - 18,000 movies
- Great example of deriving value out of large datasets
- Earns Netflix loads and loads of $$$!
Nearest Neighbour
Find neighbours who like what I like

    MovieId  CustomerId  Rating
    Clerks   444444      5
    Clerks   2093393     4
    Clerks   999         5
    Clerks   8668478     1
    Dogma    2432114     3
    Dogma    444444      5
    Dogma    999         5
    ...      ...         ...
Netflix Data Format
Netflix Demo

    MovieId  CustomerId  Rating
    Clerks   444444      5
    Clerks   2093393     4
    Clerks   999         5
    Clerks   8668478     1
    Dogma    2432114     3
    Dogma    444444      5
    Dogma    999         5
    ...      ...         ...
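Ratings matrix (one row per customer; columns are movie IDs 302, 4418, 3, 56, 732; ratings are listed in order, blank cells not shown):
    444444:  5 4 5 2
    999:     5 5 1
    111211:  3 5 3
    66666:   5 5
    1212121: 5 4
    5656565: 1
    454545:  5 5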
Nearest Neighbour Algorithm
- Find all my neighbours’ movies
- Find the best movies my neighbours agree on
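A toy version of the two steps, using rows in the MovieId / CustomerId / Rating format from the earlier slide. Several things here are assumptions made for the sketch: a ‘neighbour’ is anyone who rated 4 or more a movie that I also rated 4 or more, the ‘Mallrats’ row is invented so that there is something to recommend, and the function names are illustrative.

```fsharp
// Rows in the slide's format: MovieId, CustomerId, Rating.
let ratings =
    [ ("Clerks", 444444, 5)
      ("Clerks", 2093393, 4)
      ("Clerks", 999, 5)
      ("Clerks", 8668478, 1)
      ("Dogma", 2432114, 3)
      ("Dogma", 444444, 5)
      ("Dogma", 999, 5)
      ("Mallrats", 999, 4) ]   // hypothetical extra row, not from the slides

// movie -> rating for one customer
let moviesOf customer =
    ratings
    |> List.filter (fun (_, c, _) -> c = customer)
    |> List.map (fun (movie, _, rating) -> movie, rating)
    |> Map.ofList

// Step 1: neighbours are other customers who like (>= 4) what I like (>= 4).
let neighboursOf me =
    let liked = moviesOf me |> Map.filter (fun _ rating -> rating >= 4)
    ratings
    |> List.filter (fun (movie, c, rating) ->
        c <> me && rating >= 4 && Map.containsKey movie liked)
    |> List.map (fun (_, c, _) -> c)
    |> List.distinct

// Step 2: movies my neighbours rated that I have not, best average first.
let recommend me =
    let seen = moviesOf me
    let neighbours = neighboursOf me
    ratings
    |> List.filter (fun (movie, c, _) ->
        List.contains c neighbours && not (Map.containsKey movie seen))
    |> List.groupBy (fun (movie, _, _) -> movie)
    |> List.map (fun (movie, rows) ->
        movie, rows |> List.averageBy (fun (_, _, rating) -> float rating))
    |> List.sortByDescending snd

printfn "%A" (neighboursOf 444444)
printfn "%A" (recommend 444444)
```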
Netflix Recommendations
demo
A Short Stop-over at Vector Math
A (x1,y1)
B (x2,y2)
C (x0,y0)
If we want to calculate the distance between A and B, we call on Euclidean Distance
We can represent the points in the same way using Vectors: Magnitude and Direction.
Having this vector representation allows us to work in ‘n’ dimensions, yet still perform Euclidean distance and angle calculations.
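Both calculations carry over to any number of dimensions once points are plain arrays. A short sketch follows; the function names are chosen for the example.

```fsharp
// Straight-line distance between two points of equal dimension.
let euclideanDistance (a: float[]) (b: float[]) =
    Array.fold2 (fun acc x y -> acc + (x - y) * (x - y)) 0.0 a b |> sqrt

let dot (a: float[]) (b: float[]) =
    Array.fold2 (fun acc x y -> acc + x * y) 0.0 a b

let magnitude (v: float[]) = sqrt (dot v v)

// Cosine of the angle between two vectors:
// 1.0 means same direction, 0.0 means perpendicular.
let cosine (a: float[]) (b: float[]) =
    dot a b / (magnitude a * magnitude b)

printfn "%f" (euclideanDistance [| 0.0; 0.0 |] [| 3.0; 4.0 |])
printfn "%f" (cosine [| 1.0; 2.0; 3.0 |] [| 2.0; 4.0; 6.0 |])
```

Note that `cosine` is undefined for a zero-length vector, since the magnitude in the denominator is 0.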