Google Refine Tutorial
April 08, 2012
Sathishwaran.R - 10BM60079
Vijaya Prabhu - 10BM60097
Vinod Gupta School of Management, IIT Kharagpur
This tutorial was created using Google Refine version 2.5 on a Windows 7 platform.
Data Cleansing
• Data cleansing is the process of identifying wrong or inaccurate records in a data set and making appropriate corrections to them.
• It involves identifying incomplete, inaccurate, and incorrect parts of the data and then either replacing them with correct data or deleting them.
• Data cleansing results in data that is consistent with other standard data sets and is useful for performing various analyses.
• Errors in the data could be due to data entry mistakes by the user, failures during transmission of data, or improper data definitions.
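The replace-or-delete decision described above can be sketched in a few lines of Python. This is only an illustration outside Google Refine; the field names, the sample records and the repair rules are all invented assumptions.

```python
# Hypothetical records: field names and values are invented for illustration.
records = [
    {"country": "India", "year": "2004", "killed": "100"},
    {"country": "india ", "year": "2004", "killed": ""},    # stray space, wrong case, missing value
    {"country": "Japan", "year": "19x5", "killed": "50"},   # malformed year
]

def clean(record):
    """Return a corrected copy of the record, or None if it cannot be repaired."""
    fixed = dict(record)
    # Inaccurate but repairable: normalise whitespace and letter case.
    fixed["country"] = fixed["country"].strip().title()
    # Incorrect and not repairable: the year cannot be recovered, so drop the record.
    if not fixed["year"].isdigit():
        return None
    # Incomplete: keep the row but mark the value as missing.
    if fixed["killed"] == "":
        fixed["killed"] = None
    return fixed

cleaned = [r for r in (clean(rec) for rec in records) if r is not None]
```

Here two of the three records survive; the one with the malformed year is deleted, which is the kind of information loss discussed under the challenges of data cleansing.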
Need for Data Cleansing
• Incorrect or inaccurate data may lead to false conclusions; in finance, for example, it can cause investments to be misdirected.
• Governments also need accurate population and census data to direct funds to the deserving areas.
• Many organizations tap into customer information. If the data is not accurate, e.g. if an address is wrong, the business runs the risk of sending wrong information and thus losing customers.
Challenges in Data Cleansing
• Loss of Information: In many cases a record may be incomplete, so the whole record may have to be deleted, which leads to loss of information. This could become costly if a large amount of data is deleted.
• Maintenance of Data: Once the data is cleansed, any change in the data specification should affect only the new values. Hence data management solutions should be designed so that the processes of data entry and retrieval are altered to provide correct data.
• Data cleansing is an iterative process that needs significant work in the exploration and correction of entries.
About Google Refine
• Google Refine is a powerful tool that can be effectively used for data cleansing.
• It helps in working with raw data: cleaning it up, transforming it from one format to another, extending it with web services and linking it to databases.
• It is very easy to use and has a web interface.
• It is freely available and works well with any browser.
• Google Refine is a desktop application: it runs a small web server on your system, and we need to point our browser to that server to use Refine.
Getting Started - Installation
1. Download the zip file for your platform (Windows, Mac or Linux) from the link http://code.google.com/p/google-refine/wiki/Downloads?tm=2
2. Uncompress the files from the zip file.
3. Run the “google-refine.exe” file.
4. A command window opens and Google Refine runs, taking the user to the home page in the default browser.
Google Refine Homepage
Importing Data
• Google Refine supports TSV, CSV, Excel (.xls and .xlsx), JSON, XML, and Google Data document formats.
• Once imported, the data is stored in Google Refine’s own data format.
• For this tutorial we have used the TSV data set on disasters worldwide from 1900 to 2008, available from http://www.infochimps.com/datasets/disasters-worldwide-from-1900-2008
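To give a feel for what a TSV import involves, here is a small Python sketch that parses tab-separated text into rows keyed by column name. It is not how Google Refine itself reads files; the sample text and the column names (Country, Type, Cost) are assumptions made up for the example.

```python
import csv
import io

# Hypothetical stand-in for a small TSV file; the column names are assumptions.
sample_tsv = "Country\tType\tCost\nIndia\tFlood\t100\nJapan\tEarthquake\t\n"

# The first line is treated as the header, and each later line becomes one row.
reader = csv.DictReader(io.StringIO(sample_tsv), delimiter="\t")
rows = list(reader)

for row in rows:
    print(row["Country"], row["Type"], repr(row["Cost"]))
```

Note that the blank Cost cell in the second row is read as an empty string, which is the kind of gap the later faceting steps help to find.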
Creating Project: Data Uploaded
Creating Project: Project Created
Faceting
• Faceting is about seeing the big picture of the data and filtering down to the rows you want to change in bulk.
• We can create a facet on a column to get a summary of the values in that column and then filter to a subset of rows with a constraint.
• We can create text facets, numeric facets, timeline facets and scatterplot facets. Customized facets can also be designed.
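The idea behind a text facet can be sketched in Python as counting the distinct values of a column and then filtering rows on one chosen value. This is a simplified illustration, not Refine's implementation; the rows and the helper name text_facet are hypothetical.

```python
from collections import Counter

# Hypothetical rows; only the "Type" column matters for this sketch.
rows = [
    {"Type": "Flood"},
    {"Type": "flood"},
    {"Type": "Earthquake"},
    {"Type": "Flood"},
]

def text_facet(rows, column):
    """Count how often each distinct value appears in the given column."""
    return Counter(row[column] for row in rows)

facet = text_facet(rows, "Type")

# Selecting one facet choice is a filter on the rows.
selected = [row for row in rows if row["Type"] == "Flood"]
```

The facet lists "Flood" and "flood" as separate choices, which is how inconsistencies in a column become visible.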
The Type column has 18 unique options.
Removing Redundancy
Even though they are of the same type, some values show up as different options due to differences in case.
Reduced to 15 unique options
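The effect of removing case differences can be illustrated with a short Python sketch. The values below are invented, and title-casing is only one possible normalisation, assumed here for illustration; it is not necessarily the transform used in the slides.

```python
# Hypothetical column values that differ only in letter case or spacing.
values = ["Flood", "flood", "FLOOD", "Earthquake", "earthquake ", "Drought"]

# Number of distinct choices a text facet would show before cleaning.
before = len(set(values))

# Normalise whitespace and case so that equivalent values collapse together.
normalized = [v.strip().title() for v in values]
after = len(set(normalized))

print(before, "unique options reduced to", after)
```

The number of distinct options drops because values that meant the same thing are now spelled identically.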
Numeric Faceting
Highly clustered towards low values
Cost column is blank and has no value
Calamities with low cost
Calamities with high cost
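A numeric facet can be thought of as a histogram over a column, with blank cells counted separately, plus a range filter. The Python sketch below shows that idea under stated assumptions: the cost values, the bin width and the threshold of 100 are all hypothetical, and the helper numeric_facet is not part of Google Refine.

```python
def numeric_facet(values, bin_width):
    """Count numeric values per bin and count blank cells separately."""
    bins = {}
    blanks = 0
    for raw in values:
        if raw is None or raw.strip() == "":
            blanks += 1
            continue
        # Each value falls into the bin that starts at a multiple of bin_width.
        start = int(float(raw) // bin_width) * bin_width
        bins[start] = bins.get(start, 0) + 1
    return bins, blanks

# Hypothetical cost cells as text, including two blanks.
costs = ["10", "25", "", "40", "950", ""]
bins, blanks = numeric_facet(costs, 100)

# Narrowing the facet range corresponds to a simple filter.
low_cost = [c for c in costs if c and float(c) < 100]
high_cost = [c for c in costs if c and float(c) >= 100]
```

With these sample values most entries land in the lowest bin, mirroring a distribution that is highly clustered towards low values.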
Clustering
• Clustering is used to merge choices that look similar.
Data Merged
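One common way to find choices that look similar is key collision: each value is reduced to a normalised key, and values sharing a key are proposed as one cluster. The Python sketch below is a simplified version of that idea; Refine's own clustering methods may differ in detail, and the sample values and function names are hypothetical.

```python
import string
from collections import defaultdict

def fingerprint(value):
    """Build a normalised key: lowercase, no punctuation, sorted unique words."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)

def cluster(values):
    """Group distinct values that share a fingerprint; keep groups of two or more."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]

# Hypothetical facet choices.
values = ["Mass movement wet", "mass movement wet", "Wet mass movement.", "Storm"]
clusters = cluster(values)
```

Each cluster can then be merged by rewriting all of its members to a single chosen spelling.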
Using Expressions
• Expressions are used to transform existing data to create new data.
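Refine has its own expression language (GREL) for this purpose. As a rough Python analogy only, an expression is a small function applied to each cell of an existing column to fill a new column. The column names "Start" and "Year" and the sample values are assumptions made up for the example.

```python
# Hypothetical rows with a "Start" column holding year-month text.
rows = [{"Start": "1900-05"}, {"Start": "2008-12"}, {"Start": ""}]

def year_of(value):
    """Return the four-digit year prefix of a cell, or None for blank or odd cells."""
    prefix = value[:4]
    return int(prefix) if len(prefix) == 4 and prefix.isdigit() else None

# Applying the expression to every row creates a new column from an existing one.
for row in rows:
    row["Year"] = year_of(row["Start"])
```

The original column is left unchanged, so the derived column can be checked against it before any further cleaning.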
Data Augmentation
• The Reconciliation option in Google Refine allows data to be linked to web pages. Suppose we want details on the country where the calamity struck; we can perform the following steps.
Reconciliation
Data Enrichment
Export
How to Use Twitter Data
Step 1
Step 2
Step 3
Step 4
Step 5
Step 6
Step 7
Step 8
Output
Friends' Events Using Facebook Data
• After splitting the cell using separator },{
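What splitting on the separator },{ does to a cell can be shown with a short Python sketch. The cell content below is a made-up stand-in for a list of event objects, not real Facebook output, and the helper restore is hypothetical.

```python
import json

# Hypothetical cell content: several JSON objects joined by commas.
raw = '{"name": "Event A", "id": "1"},{"name": "Event B", "id": "2"},{"name": "Event C", "id": "3"}'

# Splitting on the separator yields one fragment per event,
# but the separator's braces are consumed in the process.
parts = raw.split("},{")

def restore(fragment):
    """Put back the braces that the split removed so the fragment is valid JSON."""
    if not fragment.startswith("{"):
        fragment = "{" + fragment
    if not fragment.endswith("}"):
        fragment = fragment + "}"
    return fragment

events = [json.loads(restore(part)) for part in parts]
```

This approach assumes that the text },{ never appears inside a value; where that cannot be guaranteed, parsing the whole cell as JSON is the safer route.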
• After updating the other columns and rearranging them, we get the list of events.
Thank You