Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you...

36
Harness Power Query to gain Competitive Analysis Insights from Linkedin (Originally posted on PowerPivotPro.com – Read article here ) Written by Gil Raviv (https:// il.linkedin.com/in/gilraviv ) In this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive analysis insights based on companies search. To get your attention right at the start, we will conduct this tutorial and analyze a domain that we all know so well and love – PowerPivot. We will analyze companies who specialize in PowerPivot. We will show you step by step how to utilize Power Query o extract information from Linkedin including company size, founding year, location, specialties, and more. When we are done, you can also download the workbook , read how to get Linkedin access token in this tutorial (Step 1-6), and start using the workbook as your dashboard for competitive analysis. You can use its parametrized queries to search for companies in any domain, refresh the workbook and get a tailor-made dashboard for the competitive posture of your interest. Before we start, here are few screenshots of what you can get at the end. This screenshot shows the world distribution of the 70 companies who specialize in PowerPivot (Created with Power Query and Power Map).

Transcript of Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you...

Page 1: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

Harness Power Query to gain Competitive Analysis Insights from Linkedin(Originally posted on PowerPivotPro.com – Read article here)

Written by Gil Raviv (https:// il.linkedin.com/in/gilraviv )

In this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive analysis insights based on companies search. To get your attention right at the start, we will conduct this tutorial and analyze a domain that we all know so well and love – PowerPivot. We will analyze companies who specialize in PowerPivot.

We will show you step by step how to utilize Power Query o extract information from Linkedin including company size, founding year, location, specialties, and more.

When we are done, you can also download the workbook, read how to get Linkedin access token in this tutorial (Step 1-6), and start using the workbook as your dashboard for competitive analysis. You can use its parametrized queries to search for companies in any domain, refresh the workbook and get a tailor-made dashboard for the competitive posture of your interest.

Before we start, here are few screenshots of what you can get at the end.

This screenshot shows the world distribution of the 70 companies who specialize in PowerPivot (Created with Power Query and Power Map).

Page 2: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

This screenshot shows the distribution of companies by founding year. It’s interesting to see a gradual incline of younger companies who specialize in PowerPivot from 2009 to 2013, and to see a decline in 2014. It seems that last year fewer companies were founded with PowerPivot as a specialty.

This screenshot shows company distribution by Specialty. You can the most common specialties for companies who specialize in PowerPivot (e.g. Business Intelligence, SQL Server and Excel).

Page 3: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

I am sure that by now, we got your attention

Before we startPower Query doesn’t provide a dedicated Linkedin connector (like Facebook, Salesforce, and Microsoft Dynamics CRM). As a result, the main challenge we will face is to authenticate to Linkedin via external tool, get an access token to use its API, and then start using Power Query’s generic web service import capabilities (Power Query From Web).

Linkedin backgroundLinkedin website provides easy user interface to search for companies by different criteria or certain keywords (see screenshot), but you cannot export the entire dataset to gain insights from the entire data.

Page 4: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

For developers (or data enthusiasts like us) Linkedin provides REST APIs that can be used for our purpose. If you are interested to learn more on the APIs, you can start here. (https://developer-programs.linkedin.com/apis)

In this tutorial I used the company-search API call. For more information on that call go here. (https://developer-programs.linkedin.com/documents/company-search).

Signing-in to Linkedin (OAuth 2.0)To use Linkedin APIs, we will first need to obtain an access token from Linkedin following a successful signing-in. The token is needed for all the API calls we will make. To learn more on Linkedin OAuth 2.0 authentication go here. (https://developer.linkedin.com/docs/oauth2).

Linkedin provides an easy way for developers to test their Apps and learn APIs through REST console: https://apigee.com/console/linkedin

We will use this console to obtain the required access token, and use it with Power Query From Web to extract the data we need.

Note: The method that is described here is meant for learning purposes. Depending on your solution, you may need to register to Linkedin Developer Program and follow the API Terms of Use here (https://developer.linkedin.com/legal/api-terms-of-use).

Page 5: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

Ready to go? Let’s start.

Obtaining the access token1. Go to the Linkedin REST Console here. (https://apigee.com/console/linkedin).2. On the Authentication dropdown menu select OAuth 2.

3. The Request Linkedin permissions dialog will appear. Click Sign in with Linkedin and follow the sign-in/permission process.

Page 6: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

4. After you allow access to Apigee console, you will be back to the Console. On the left pane click Retrieve basic profile data, and click Send.

Page 7: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

5. You will notice that the Request pane contains your access token (See selected blue area in the screenshot below). From the Request pane copy the access token to a notepad – This is the gibberish text that starts after “oauth2_access_token=” and is ends at “&format=json”

Page 8: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

6. Congratulations you just obtained your Linkedin access token. You will need this token in a minute.Note: This token can be used to retrieve personal information from your Linkedin profile. Please don’t share it with others. In this tutorial, we will show you how you can use the access token, and share the workbook while the token never leaves your computer.The method that is described here is meant for learning purposes. Depending on your solution, you may need to register to Linkedin Developer Program and follow the API Terms of Use here (https://developer.linkedin.com/legal/api-terms-of-use).

Building the query7. Create a new workbook, go to Power Query ribbon and click From Web (If you are using Office

2016 Preview, click Data New Query From Other Sources From Web).8. Copy the following URL to your notepad, and replace your access token from step Error:

Reference source not found with the highlighted text:https://api.linkedin.com/v1/company-search:(companies:(id,name,description,founded-year,employee-count-range,specialties,industries,locations:(address:(country-code,city))))?oauth2_access_token=paste the access token here&keywords=powerpivot&count=100&start=0&sort=relevance

Page 9: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

9. Paste the URL and access token to the URL box of From Web dialog, and click OK

10. In Access Web content dialog, select Anonymous and click Connect

11. In the Navigator pane select companies and click Edit

Page 10: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

12. The Query Editor will open. You will see that Power Query extracted the table below, which contains 20 out of 70 companies that match our search criteria (Later we will show you how to get all the 70 companies using a function query, but one step at a time).

Removing access token from workbook13. Before we continue to get the desired data from Linkedin, let’s tune the M expression to ensure

that the access token will not be stored in the workbook, and also to improve our query expression for future reuse.In the Query Editor click View Advanced Editor(Before you proceed, ensure that your access token is saved. We will delete it in a second from the M expression, but will need it soon one last time).

Page 11: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

14. In the Advanced Editor dialog change the line that starts with “Source=” according to this table:

Before Source = Xml.Tables(Web.Contents("https://api.linkedin.com/v1/company-search:(companies:(id,name,description,founded-year,employee-count-range,specialties,industries,locations:(address:(country-code,city))))?oauth2_access_token=[access token]&keywords=powerpivot&count=100&start=0&sort=relevance")),

After apiURL = "https://api.linkedin.com/v1/company-search:(companies:(id,name,description,founded-year,employee-count-range,specialties,industries,locations:(address:(country-code,city))))", Source = Xml.Tables(Web.Contents(apiURL, [ApiKeyName="oauth2_access_token", Query=[keywords="powerpivot", count="20", sort="relevance", start="0"]])),

Let’s explain what we have done here: We created a text field called apiURL for the URL which is served for the Linkedin API

Call, without its parameters. We moved all the URL’s parameters to a Query record. We removed the access token and declared that the field oauth2_access_token is the

key that will be used as part of the Web API authentication.15. Click Done in the Advanced Editor.16. In the Query Editor click Edit Credentials (If you don’t see the a yellow message bar with Edit

Credentials button, click Home Refresh & Preview).

Page 12: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

17. In the dialog Web Access content select Web API, paste the access token (which was obtained in step Error: Reference source not found) in the Key textbox and click Save.

We are now back in the query editor with an improved M expression that doesn’t reveal the access token. This change will allow us to share the workbook with others. New users who open the workbook will be able to edit the credentials with their token (after they obtain the token from linkedin API console as we did).

Let’s begin the transformation18. Click the Expand button in the column header “company”, and click OK.

Page 13: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

The Expand window gives us hint on the type of data we are going to get soon: Company name Description Employee-count-range (We will get the company size out of this field) Industries Locations (We will get the country codes and cities from this field) Specialties (This can be promising. We will be able to correlate between different

specialties). Founded Year (Is PowerPivot adopted by old and mature consulting companies, or by

younger ones?)

Figuring out the company size19. Click the Expand button on the column header “company.empoyee-count-range”. Check the

field “name”, check Use original column name as prefix and click OK.

Page 14: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

20. Rename column “company.empoyee-count-range” to “Size” (To rename the header, double click the header text and change the text).You will notice that the column Size has a mixture of textual values. Some of those are ranges such as “2-10” or “11-50”, but there is also the value “myself only”. In the next steps we will transform this column into a column that shows the upper bound of each range (1,10,50, etc.)

21. Right click the header “Size” and click Replace Values…

Page 15: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

22. In the Replace Values dialog, write “myself only” in the Values To Find textbox, and “1-1” in Replace With textbox. Click OK.

23. Right click the header “Size” and click Split ColumnBy Delimiter…

Page 16: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

24. In the Split a column by delimiter dialog select --Custom—in the first drop down, type “–“ in the textbox, and click OK.

25. We now have two new columns instead of Size: “Size.1” for the lower bound and “Size.2” for the upper bound. As we are interested in the upper bound for the company size, let’s delete “Size.1” and rename “Size.2” to “Size”.

Page 17: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

26. Delete the last 3 columns: “Attribute:total”, “Attribute:count” and “Attirbute:start”.27. Renaming the following columns:

a. company.description Descriptionb. company.founded-year Foundedc. company.id Idd. company.name Company

Where is my Geo-location? (where should I start my next PowerPivot consulting branch?)

28. Click the Expand button on the column header “company.locations”. Check the field “location”, uncheck Use original column name as prefix and click OK.

29. Click the Expand button on the column header “location”. Check the field “address”, uncheck Use original column name as prefix and click OK.

Page 18: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

30. Click the Expand button on the column header “address”. Check the fields “city” and “country-code”, uncheck Use original column name as prefix and click OK.

We now have the geo-location data that is needed to build cool visualizations with Power Map.

Industrious step ahead 31. Click the Expand button on the column header “company.industries”. Check the field “industry”,

uncheck Use original column name as prefix and click OK.

Page 19: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

32. Click the Expand button on the column header “industry”. Check the field “name”, check Use original column name as prefix and click OK.

33. Rename the column “industry.name” to “Industry”.

Special Specialty (Part 1)34. Click the Expand button on the column header “company.specialties”. Check the field

“specialty”, uncheck Use original column name as prefix and click OK.

Page 20: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

35. Click the Expand button on the column header “specialty”. Check the field “Element:text”, uncheck Use original column name as prefix and click OK.

36. Rename column “Elelement:Text” to “Specialty”.

Special Specialty (Part 2) - How to expand a column with mixed types of tables and textLet’s test our query, and change the M expression to call for the next 20 “powerpivot” companies (by changing the highlighted text from “0” to “20”:

Source = Xml.Tables(Web.Contents(apiURL, [ApiKeyName="oauth2_access_token",

Page 21: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

Query=[keywords="powerpivot", count="20", sort="relevance", start="20"]])),

The Query Editor will now show three types of values under the column “specialty”: Table, null and text. Our new challenge is that the expanding we performed on the previous step will yield an error as Power Query cannot expand text. Let’s see how this can be fixed using custom columns and simple if-then-else statements.

37. Delete the last two steps (Expanding “specialty”, and renaming “Element:Text” to Specialty).38. We will now add two custom columns. Into the first column we will copy all the tables from

column “specialty”, so we will be able to expand it. Into the second column we will copy all the text values from column “specialty”. Then we will have two columns that can be merged.

39. Click Add Column Add Custom Column.40. In Custom column formula textbox write:

= if ([specialty] is table) then [specialty] else null

41. Click OK42. Now we will create the second custom column. Click Add Column Add Custom Column.43. In Custom column formula textbox write:

Page 22: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

= if ([specialty] is table) then null else [specialty]44. Click OK

We now have two new columns: Custom with all the tables, and Custom.1 with the text values.45. Delete the original column “specialty”.46. Click the Expand button on the column header “Custom”. Check the field “Element:text”,

uncheck Use original column name as prefix and click OK.

47. Select the two columns “Element:Text” and “Custom.1”, right click and select Merge Columns

Page 23: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

48. In the Merge Columns dialog select “--None--“ as the separator, and “Specialty” as New Column Name, and click OK.

Page 24: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

Scaling it up – From 20 companies to allAs Linkedin limits the results to 20 companies per API call, you will now see how to use Power Query to fetch all companies (Note: Even if we explicitly try to fetch more companies by using the count parameter, we can only get 20 companies).

As we know that there are 70 PowerPivot companies, we will turn the query into a function query and will call it 4 times with the relevant offset (ranging from 0 to 60). In the last section we will also show you the M expression which can dynamically decide how many calls are needed, so you will not need to know in advance how many API calls to use.

49. To create a function query, we will start by renaming our query from “companies” to “getCompanies”Well, this is not enough to turn the query into a function query, but it’s a start ;)

50. Now let’s edit the first few lines of the query expression according to the table below (Edit the query, and click ViewAdvanced Editor).

Beforelet apiURL = "https://api.linkedin.com/v1/company-search:(companies:(id,name,description,founded-year,employee-count-range,specialties,industries,locations:(address:(country-code,city))))", Source = Xml.Tables(Web.Contents(apiURL, [ApiKeyName="oauth2_access_token", Query=[keywords=”powerpivot”, sort="relevance", start=20, count="20"]])),After(offset as text, inputKeywords as text) =>let apiURL = "https://api.linkedin.com/v1/company-search:(companies:(id,name,description,founded-year,employee-count-range,specialties,industries,locations:(address:(country-code,city))))", Source = Xml.Tables(Web.Contents(apiURL, [ApiKeyName="oauth2_access_token", Query=[keywords=inputKeywords, sort="relevance", start=offset, count="20"]])),

51. After you made the changes in the Advanced Editor click Done.You will see that the query was turned into a function query. It’s time to save this query. Click HomeClose & Load.

Page 25: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

52. Create a blank query by clicking Power Query From Other Sources Blank Query (If you use Excel 2016 Preview click Data New Query From Other Sources Blank Query).

53. In the formula bar write:= {0..3}Type Enter

54. Type Enter and click To Table and click OK in the To Table dialog.

Page 26: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

55. Add a new column that multiply the first column by 20. This can be done by clicking Add Column Standard Multiply

56. Type the number 20 in the Multiply dialog and click OK.57. Delete Column1 and rename “Inserted Multiplication” column to “Offset”.58. Change “Offset” type to Text by selecting the column “Offset” right click and click Change Type

Text

Page 27: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

59. Click Add Column Add Custom Column and in the Custom column formula textbox write=getCompanies([Offset],"powerpivot")Click OK in the Custom Column Dialog.

60. Expand the column “Custom” by clicking the Expand button (marked below).

61. Ensure all columns are selected. Uncheck Use original column name as prefix, and click OK.

Page 28: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

62. Delete “Offset” column63. Click Home Close and Load to… and select Add this data to Data Model64. Let’s name this query “LoadAllCompanies”

Scaling it up even furtherPreviously we used a predefined list {0..3} to iterate over the 70 PowerPivot companies. We also used the keyword “powerpivot” inside the Power query expression. At this final section, we will build parameterized query the loads the search keywords from the workbook, and dynamically iterate Linkedin API to extract all the search results (without knowing in advance the number of companies that the search will yield). To know the number of companies in advance we will build a new query that gets reads the attribute “Attribute:Total” from linkedin (We saw it above in step 12).

65. We will create a new function query (using Blank Query)

(inputKeywords as text, max as number) as number =>

let

apiURL = "https://api.linkedin.com/v1/company-search:(companies:(id,name,description,founded-year,employee-count-range,specialties,industries,locations:(address:(country-code,city))))",

Source = Xml.Tables(Web.Contents(apiURL, [ApiKeyName="oauth2_access_token", Query=[keywords=inputKeywords, sort="relevance", start="0", count="1"]])),

Table0 = Source{0}[Table],

#"Changed Type" = Table.TransformColumnTypes(Table0,{{"Attribute:total", Int64.Type}, {"Attribute:count", Int64.Type}, {"Attribute:start", Int64.Type}}),

#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"company", "Attribute:count", "Attribute:start"}),

Page 29: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Attribute:total", "Total"}}),

companyCount = #"Renamed Columns"[Total]{0},

returnNum = if (companyCount > max) then max else companyCount

in

returnNum

The input parameter “max” will be used to limit the number of Linkedin calls we will conduct.66. Save the new query and rename it “getCompaniesCount”.67. We will also adjust “LoadAllCompanies” query to get the keywords from a table in the

workbook. The keywords will be located in a simple Table named “Keywords” with a single column and a header named “Keywords”. The single cell will include our keywords (As shown in the screenshot below).

68. We will now edit “LoadAllCompanies” with the advanced editor and change the expression to the following one (The changes are highlighted):

let keywords = Excel.CurrentWorkbook(){[Name="Keywords"]}[Content][Keywords]{0}, lastNumberInList = Number.RoundUp(getCompaniesCount(keywords, 500) / 20), Source = List.Numbers(0, lastNumberInList), #"Table from List1" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error), #"Inserted Multiplication" = Table.AddColumn(#"Table from List1", "Inserted Multiplication", each List.Product({[Column1], 20}), type number), #"Removed Columns" = Table.RemoveColumns(#"Inserted Multiplication",{"Column1"}), #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Inserted Multiplication", "Offset"}}), #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"Offset", type text}}), #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each getCompanies([Offset], keywords)), #"Expand Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Description", "Size", "Founded", "Id", "Industry", "city", "country-code", "Company", "Specialty"}, {"Description", "Size", "Founded", "Id", "Industry", "city", "country-code", "Company", "Specialty"}), #"Removed Columns1" = Table.RemoveColumns(#"Expand Custom",{"Offset"})in #"Removed Columns1"

The section in yellow imports the keywords from our table. The section in green invokes “getCompaniesCount” to know the total number of companies that Linkedin search should yield, it then calculates the number of API calls that are needed by using Number.RoundUp divided by 20 (the number of companies we get per call).

69. Save the modified query, and refresh the query.

Page 30: Harness Power Query to gain Competitive Analysis … · Web viewIn this document we will show you how to use Power Query in Excel to import data from Linkedin and gain amazing competitive

That’s it. We have just walked through seventy steps to build the ultimate queries for your competitive analysis dashboard. You are encouraged to take the workbook here to see the Dashboard we have created with these queries.

Note: Since we removed the access token from the workbook, if you share your competitive analysis dashboard, please share the instructions on how to get the token from Linkedin Console (steps 1 to 6), and use the token in the Key textbox (as shown in step 17).