Python Programming in Context
Chapter 7
Objectives
• To use Python lists as a means of storing data
• To implement a nontrivial data mining application
• To understand and implement cluster analysis
• To use visualization as a means of displaying patterns
Cluster
• Data points that have something in common
• Clusters are dissimilar to each other
• Use simple Euclidean distance to measure how close one point is to another
• Centroid is a point that represents a cluster (not necessarily a real data point)
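To illustrate the last point, here is a minimal sketch of a centroid computed as the coordinate-wise mean of a cluster. The helper name `centroid` and the sample points are assumptions made for this example; they do not come from the slides.

```python
def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of equal-length points."""
    dimensions = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dimensions)]

# Hypothetical two-dimensional cluster
cluster = [[1, 2], [3, 4], [5, 9]]
center = centroid(cluster)

print(center)             # [3.0, 5.0]
print(center in cluster)  # False: the centroid is not one of the data points
```

The centroid summarizes the cluster even though no data point sits at that position.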
Figure 7.1
Figure 7.2
Figure 7.3
Figure 7.4
Listing 7.1

    import math

    def euclidD(point1, point2):
        # add up the squared difference in each dimension
        sum = 0
        for index in range(len(point1)):
            diff = (point1[index]-point2[index]) ** 2
            sum = sum + diff

        euclidDistance = math.sqrt(sum)
        return euclidDistance
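As a quick check of the distance function, the sketch below repeats it in compact form (with the accumulator renamed to `total`) and applies it to two sample inputs. The inputs are assumptions chosen so the results are easy to verify by hand.

```python
import math

def euclidD(point1, point2):
    # square root of the summed squared differences, one term per dimension
    total = 0
    for index in range(len(point1)):
        total = total + (point1[index] - point2[index]) ** 2
    return math.sqrt(total)

print(euclidD([0, 0], [3, 4]))  # 5.0, the 3-4-5 right triangle
print(euclidD([85], [70]))      # 15.0, one-dimensional points such as exam scores
```

Because the loop runs over `len(point1)`, the same function works for points of any dimension, as long as both points have the same length.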
Figure 7.5
Listing 7.2

    def readFile(filename):
        datafile = open(filename, "r")
        datadict = {}

        key = 0
        for aline in datafile:
            key = key + 1
            score = int(aline)

            datadict[key] = [score]
        return datadict
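The following sketch shows the shape of the dictionary that this kind of reader produces. It restates the function using a `with` block (a small variation, not the slide's version) and runs it on a temporary file whose three scores are made up for the example.

```python
import os
import tempfile

def readFile(filename):
    # same logic as the listing; "with" closes the file automatically
    datadict = {}
    key = 0
    with open(filename, "r") as datafile:
        for aline in datafile:
            key = key + 1
            datadict[key] = [int(aline)]
    return datadict

# Hypothetical sample data: three exam scores, one per line
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("88\n72\n95\n")
    path = tmp.name

scores = readFile(path)
os.remove(path)
print(scores)  # {1: [88], 2: [72], 3: [95]}
```

Each value is a one-element list rather than a bare integer, so a score can be treated as a one-dimensional point by the distance function.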
Indefinite Iteration
• Repeating a process an unknown number of times
• Control is based on a boolean expression
• Infinite loop is possible
• Any for loop can be written as a while loop
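A small sketch of indefinite iteration: the number of repetitions is not known in advance and depends only on when the boolean condition becomes false. The function name and the halving task are assumptions picked for illustration.

```python
def halvingsUntilBelowOne(value):
    """Count how many times value must be halved before it drops below 1."""
    count = 0
    while value >= 1:        # boolean condition controls the loop
        value = value / 2    # change of state moves toward termination
        count = count + 1
    return count

print(halvingsUntilBelowOne(10))   # 4  (10 -> 5 -> 2.5 -> 1.25 -> 0.625)
print(halvingsUntilBelowOne(0.5))  # 0  (the condition is false at the start)
```

If the line that halves `value` were removed, the condition would never change and the loop would not terminate for any starting value of 1 or more.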
Listing 7.3

    while <condition>:
        statement1
        statement2
        ...
        statementn
Figure 7.6
Listing 7.4

    sum = 0
    for anum in range(1,11):
        sum = sum + anum
    print(sum)
Listing 7.5

    sum = 0
    anum = 1                  #initialization
    while anum <= 10:         #condition
        sum = sum + anum
        anum = anum + 1       #change of state
    print(sum)
Listing 7.6

    sum = 0
    anum = 1
    while anum <= 10:
        sum = sum + anum      # anum never changes, so this loop never ends
    print(sum)
Listing 7.7

    def readFile(filename):
        datafile = open(filename, "r")
        datadict = {}

        key = 0
        aline = datafile.readline()
        while aline != "":
            key = key + 1
            score = int(aline)
            datadict[key] = [score]
            aline = datafile.readline()
        return datadict
Creating Clusters
• Decide on the number of clusters
• Choose data points to be initial centroids
• Assign each data point to the cluster of its nearest centroid
• Recompute centroids
• Repeat
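The steps above can be sketched as a single loop. This version is a variation that stops when the assignment no longer changes (indefinite iteration) rather than after a fixed number of passes; the names `nearest`, `kmeans`, and `maxPasses`, and the sample points, are assumptions for illustration only.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    distances = [math.sqrt(sum((a - b) ** 2 for a, b in zip(point, c)))
                 for c in centroids]
    return distances.index(min(distances))

def kmeans(points, centroids, maxPasses=100):
    """Repeat assign/recompute until the assignment stops changing."""
    assignment = None
    passes = 0
    while passes < maxPasses:
        newAssignment = [nearest(p, centroids) for p in points]
        if newAssignment == assignment:
            break                       # nothing moved, so we are done
        assignment = newAssignment
        for ci in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == ci]
            if members:                 # leave an empty cluster's centroid alone
                centroids[ci] = [sum(col) / len(members) for col in zip(*members)]
        passes = passes + 1
    return assignment, centroids

# Hypothetical one-dimensional data; the first two points act as initial centroids
points = [[1], [2], [3], [10], [11], [12]]
assignment, centers = kmeans(points, [[1], [2]])
print(assignment)  # [0, 0, 0, 1, 1, 1]
print(centers)     # [[2.0], [11.0]]
```

The `maxPasses` cap is a safety net so the loop always terminates even if the assignment keeps changing.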
Listing 7.8

    import random

    def createCentroids(k, datadict):
        centroids=[]
        centroidCount = 0
        centroidKeys = []

        # keep drawing random keys until k distinct data points are chosen
        while centroidCount < k:
            rkey = random.randint(1,len(datadict))
            if rkey not in centroidKeys:
                centroids.append(datadict[rkey])
                centroidKeys.append(rkey)
                centroidCount = centroidCount + 1
        return centroids
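One possible alternative, sketched here under the assumption that any k distinct data points are acceptable as starting centroids, is to let `random.sample` pick the keys in one call. The function name `createCentroidsSample` and the sample dictionary are hypothetical. The explicit check matters because the while-loop version would never finish if k were larger than the number of data points.

```python
import random

def createCentroidsSample(k, datadict):
    """Pick k distinct data points as initial centroids using random.sample."""
    if k > len(datadict):
        raise ValueError("k cannot exceed the number of data points")
    chosenKeys = random.sample(list(datadict), k)
    # copy each point so later changes to a centroid cannot alter the data
    return [list(datadict[key]) for key in chosenKeys]

# Hypothetical exam scores keyed 1..5
examDict = {1: [88], 2: [72], 3: [95], 4: [60], 5: [79]}
centroids = createCentroidsSample(3, examDict)
print(centroids)   # three distinct points from examDict, in random order
```

Unlike the listing, this version does not assume the keys run from 1 to `len(datadict)`.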
Listing 7.9

    def createClusters(k, centroids, datadict, repeats):
        for apass in range(repeats):
            print("****PASS",apass,"****")
            clusters = []
            for i in range(k):
                clusters.append([])

            for akey in datadict:
                distances = []
                for clusterIndex in range(k):
                    dist = euclidD(datadict[akey],centroids[clusterIndex])
                    distances.append(dist)

                mindist = min(distances)
                index = distances.index(mindist)

                clusters[index].append(akey)

            dimensions = len(datadict[1])
Listing 7.9 continued

            for clusterIndex in range(k):
                sums = [0]*dimensions
                for akey in clusters[clusterIndex]:
                    datapoints = datadict[akey]
                    for ind in range(len(datapoints)):
                        sums[ind] = sums[ind] + datapoints[ind]
                for ind in range(len(sums)):
                    clusterLen = len(clusters[clusterIndex])
                    if clusterLen != 0:
                        sums[ind] = sums[ind]/clusterLen

                centroids[clusterIndex] = sums

            for c in clusters:
                print ("CLUSTER")
                for key in c:
                    print(datadict[key], end=" ")
                print()

        return clusters
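To make the assignment step of the listing easier to follow, here is a minimal sketch of a single pass that groups dictionary keys by nearest centroid. The helper name `assignToClusters`, the two starting centroids, and the scores are assumptions for this example.

```python
def assignToClusters(centroids, datadict):
    """One assignment pass: group dictionary keys by nearest centroid."""
    clusters = [[] for _ in centroids]
    for akey in datadict:
        point = datadict[akey]
        # squared distance is enough when only the minimum matters
        distances = [sum((p - c) ** 2 for p, c in zip(point, centroid))
                     for centroid in centroids]
        clusters[distances.index(min(distances))].append(akey)
    return clusters

# Hypothetical exam scores and two hand-picked centroids
examDict = {1: [88], 2: [72], 3: [95], 4: [60], 5: [79]}
print(assignToClusters([[60], [90]], examDict))  # [[2, 4], [1, 3, 5]]
```

As in the listing, each cluster stores keys rather than the data points themselves, so the points can be looked up again when the centroids are recomputed.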
Figure 7.7
Listing 7.10

    def clusterAnalysis(dataFile):
        examDict = readFile(dataFile)
        examCentroids = createCentroids(5, examDict)
        examClusters = createClusters(5, examCentroids, examDict, 3)

    clusterAnalysis("cs150exams.txt")
Visualizing Clusters
• Earthquake data
• Show clusters on a map
• Use turtle module to plot data
Figure 7.8
Listing 7.11

    import turtle

    def visualizeQuakes(dataFile):
        # assumes readFile returns [longitude, latitude] for each key
        datadict = readFile(dataFile)
        quakeCentroids = createCentroids(6, datadict)
        clusters = createClusters(6, quakeCentroids, datadict, 7)

        quakeT = turtle.Turtle()
        quakeWin = turtle.Screen()
        quakeWin.bgpic("worldmap.gif")
        quakeWin.screensize(448,266)
        quakeWin.setup(width=500, height=300)

        # pixels per degree of longitude and of latitude
        wFactor = (quakeWin.screensize()[0]/2)/180
        hFactor = (quakeWin.screensize()[1]/2)/90

        quakeT.hideturtle()
        quakeT.up()

        colorlist = ["red","green","blue","orange","cyan","yellow"]

        for clusterIndex in range(6):
            quakeT.color(colorlist[clusterIndex])
            for akey in clusters[clusterIndex]:
                lon = datadict[akey][0]
                lat = datadict[akey][1]
                quakeT.goto(lon*wFactor,lat*hFactor)
                quakeT.dot()
        quakeWin.exitonclick()
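The coordinate scaling in the listing can be examined on its own, without opening a turtle window. This sketch assumes the same 448 by 266 canvas and a map centered at longitude 0, latitude 0; the function name `lonLatToScreen` is hypothetical.

```python
import math

def lonLatToScreen(lon, lat, width=448, height=266):
    """Scale degrees to canvas coordinates with (0, 0) at the map center."""
    wFactor = (width / 2) / 180    # pixels per degree of longitude
    hFactor = (height / 2) / 90    # pixels per degree of latitude
    return lon * wFactor, lat * hFactor

print(lonLatToScreen(0, 0))      # the center of the map
print(lonLatToScreen(180, 90))   # close to (224, 133), the top-right corner
print(lonLatToScreen(-90, 45))   # close to (-112, 66.5)

x, y = lonLatToScreen(180, 90)
print(math.isclose(x, 224) and math.isclose(y, 133))  # True
```

Longitude spans 360 degrees and latitude 180 degrees, which is why half the canvas width is divided by 180 and half the height by 90.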
Figure 7.9