Post on 26-Dec-2015
Team #6 Bill Cheng Sabina Del Rosso Stephen HomOmede Firouz Stacy Hsueh Wei Jiang Thoranis Karnasuta
Social Networking Analytics for Calbee
(SNAC)
CLIENT EER/SCHEMA NORMALIZATION QUERIES
Professor Ken Goldberg. IEOR 115. December 9, 2011.
DATABASE
Client Background: Calbee San Francisco
• CALBEE, Inc. is one of the largest snack companies in Japan– Company based on the premise of good health
• Calbee, San Francisco is the company’s first US-based flagship store– Founded in early 2011– Located in Westfield Mall
• Active in social media– Website, Facebook, Twitter
Image from calbeeshop.com
CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE
• Currently do not keep track of social media hits on any site
• Use Point of Sale for sales data and employee clock-ins
Current Infrastructure
CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE
Image from http://www.unrealstudio.com
• Handle future expansion into e-commerce• Increase social media marketing in targeted
demographics• View effect of promotions on sales and social
media to help better cater future promotions• Provide a foundation to maximize profits– Logistic management using integer programming– Data mining and machine learning to predict sales
Database Objectives
CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE
EER Diagram
EER/SCHEMA NORMALIZATION QUERIES DATABASECLIENT
Relational Design Schema (46 relations)Promotion/Sales/Retail: Relations Numbered 0-91. PRODUCT(ProdID, Name, IsSour, IsSweet, IsSalty, IsSavory, ManufCost, RetailPrice)2. PURCHASE(PurchaseID, ProdID1, PromoID3a, CustID6a, StoreID4a, EmpID5a, Timestamp, ipAddress)3a. PROMOTION(PromoID, PromoCode, StoreID, StartDate, EndDate, Discount)3b. PROMOTION_SPREAD_VIA_TWITTER(PromoID3a, TweetID10c)3c. PROMOTION_SPREAD_VIA_F(PromoID3a, F_CID11c)3d. PROMOTION_SPREAD_VIA_G+(PromoID3a, G_CID12c)3e. PROMOTION_SPREAD_VIA_S(PromoID3a, S_DID13c)3f. PROMOTION_SPREAD_VIA_B(PromoID3a, BPost_ID14a)3g. PROMOTION_INFO_VIA_W(PromoID3a, url15)4a. STORE(StoreID, AddressNo,StreetName, City, Country, ZipCode, PhoneNo)4b. STORE_CARRIES(StoreID4a, ProdID1, Stock)5a. EMPLOYEE(EmpID, LName, FName, Position, FavProdID1, StoreID4, AddressNo,StreetName, City, State, Country, ZipCode, SSN)5b. EMPLOYEE_IS_FRIEND(EmpID5a, T_UID10a, F_UID11a, G_UID12a, S_UID13a)5c. EMPLOYEE_IS_CUSTOMER(EmpID5a, CustID6a)6a. CUSTOMER(CustID, LName, FName, AddressNo, StreetName, City, State, Country, ZipCode, FavProd1, BirthDate)6b. CUSTOMER_IS_FRIEND(CustID6a, T_UID10a, F_UID11a, G_UID12a, S_UID13a)8a. PRODUCT_AD(P_Ad_ID, ProductID1, DateBeginAd, DateEndAd, F_or_G_Ad)8b. STORE_AD(S_Ad_ID, Store_ID, DateBeginAd, DateEndAd, F_or_G_Ad)8c. F_P_AD_CLICKED(P_Ad, F_UID, Timestamp, ipAddress)8d. G_P_AD_CLICKED(P_Ad_ID, G_UID, Timestamp, ipAddress)8e. F_S_AD_CLICKED(S_Ad_ID, F_UID, Timestamp, ipAddress)8f. G_S_AD_CLICKED(S_Ad_ID, G_UID, Timestamp, ipAddress)
Relational Design Schema
EER/SCHEMA NORMALIZATION QUERIES DATABASECLIENT
Social Media: Relations Numbered 10-1910a. T_USER(T_UID, T_Username, Fname, Lname, City, State, BirthDate, Email)10b. T_FOLLOWING(T_UID10a, Follower_T_UID10a, DateBeganFollowing)10c. TWEET(TweetID, T_UID10a, Auth_T_UID10a, TextStr, Timestamp)11a. F_USER(F_UID, Fname, Lname, City, State, BirthDate, Email)11b. F_FRIENDS(F_UID11a, Friend_F_UID11a, DateBecameFriends)11c. F_COMMENT(F_CID, Auth_F_UID11a, On_F_CID11c, TextStr, Timestamp)11d. F_LIKE(F_CID11c, F_UID11a, Timestamp)12a. G_USER(G_UID, Fname, Lname, City, State, BirthDate, Email)12b. G_FRIENDS(G_UID12a, Friend_G_UID12a, DateBecameFriends)12c. G_COMMENT(G_CID, Auth_G_UID12a, On_G_CID12c, TextStr, Timestamp)12d. G_LIKE(G_CID12c, G_UID12a, Timestamp)13a. S_USER(S_UID, Fname, Lname, City, State, BirthDate, Email)13b. S_FOLLOWING(S_UID13a, Follower_S_UID13a, DateBeganFollowing)13c. S_DISCOVERY(S_DID, S_UID13a, url, Timestamp)13d. S_REVIEW(S_DID13c, S_UID13a, TextStr, Like/Dislike, Timestamp)14a. BLOG_POST(url, BPost_ID, Author_Emp_ID5a, TextStr, Timestamp)14b. BLOG_COMMENT(BComment_ID, url, BPost_ID14a, TextStr, Timestamp, ipAddress)14c. ASSOCIATE_IP_T(T_UID10a, Timestamp, ipAddress)14d. ASSOCIATE_IP_FB(F_UID11a, Timestamp, ipAddress)14e. ASSOCIATE_IP_G(G_UID12a, Timestamp, ipAddress)14f. ASSOCIATE_IP_S(S_UID13a, Timestamp, ipAddress)15. MAIN_WEBSITE(url, link_to_html_file, Timestamp)
Relational Design Schema Cont.
EER/SCHEMA NORMALIZATION QUERIES DATABASECLIENT
Other Data: Relations Numbered 20-2920a. GOOGLE_TREND(GT_ID, word, city, country, day, hits)20b. RELATED_TREND(word, Related_Prod_ID1)
Relational Design Schema Cont.
EER/SCHEMA NORMALIZATION QUERIES DATABASECLIENT
Access Table Relationships
EER/SCHEMA NORMALIZATION QUERIES DATABASECLIENT
EER/SCHEMA NORMALIZATION QUERIES DATABASECLIENT
Access Table Relationships Cont.
Normalization Analysis: 1NF
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Removal of a multi-valued attribute (flavor):
Normalization Analysis: 2NF
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Removal of a partial FD:{PromoID} {PromoCode, StoreID, StartDate, EndDate, Discount}
Normalization Analysis: 3NF
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Removal of a transitive FD:{T_UID} {T_Username, Fname, Lname, City, State, BirthDate, Email}
Normalization Analysis: BCNF
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Removal of a FD with a non-superkey attribute on the LHS:{PromoCode} {StartDate, EndDate, Discount}
Find out the most talked about products in a city and their quantities (stock). This will help us determine which products to move around to balance inventories in expectation of sale increases. Data can be exported to a solver to do a shipment problem.
Query 1: Popular Product Stock
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
SELECT Product.ProdID, Product.ProdName, (SELECT COUNT(F_Comment.F_CID) FROM F_Comment WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*') AS Hits, Store.City, Store_Carries.StoreID AS Store, Store_Carries.Stock AS StockFROM Product, Store, Store_CarriesWHERE (((Product.ProdID)=[Store_Carries].[ProdID]) AND ((Store.StoreID)=[Store_Carries].[StoreID]))ORDER BY Product.ProdName;
Query 1: SQL
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Query 1: Output
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
• We have a list of stores and their stock of different products– Transportation problem to encourage similar
levels of stock– Minimize shipments, shipping costs, etc.• Subject to: No outliers (stores with low stock)
Possible shipment constraintsPossible traffic constraintsEtc.
Query 1: Data Analysis
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Query 1: Data Analysis (AMPL)
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Query 1: Data Analysis
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Consider a promotion. Compare product social network comments in a given city two weeks before, during, and two week after a promotion to judge its effectiveness.
Order by the return on the investment.
Query 2: Promo Social Networking
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
SELECT Promotion.PromoID, (SELECT COUNT(*)FROM F_Comment, ProductWHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*'AND Product.ProdID = Promotion.ProdIDAND F_Comment.Timestamp < Promotion.StartDateAND F_Comment.Timestamp > Promotion.StartDate - 14) AS HitsBefore, (SELECT COUNT(*)FROM F_Comment, ProductWHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*'AND Product.ProdID = Promotion.ProdIDAND F_Comment.Timestamp < Promotion.EndDateAND F_Comment.Timestamp > Promotion.StartDate) AS HitsDuring, (SELECT COUNT(*)FROM F_Comment, ProductWHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*'AND Product.ProdID = Promotion.ProdIDAND F_Comment.Timestamp < Promotion.EndDate + 14AND F_Comment.Timestamp > Promotion.EndDate) AS HitsAfter, (SELECT SUM(Promotion.Discount*Product.RetailPrice)FROM Product) AS PromoCostFROM PromotionORDER BY PromoCost;
Query 2: SQL
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Use friendship data to rate friends by how many recommendations they have made.
Determine how many of a person's friends became friends with us after they became friends with us.
In this way, we identify possible priority customers of Calbee to target for special advertisements and promotions.
Query 3: Priority Customers
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
SELECT F.F_UID, ( SELECT COUNT(*)FROM F_Friends AS F2WHERE F2.F_UID = F.F_UID AND EXISTS( SELECT F3.DateBecameFriendsFROM F_FRIENDS F3WHERE F3.Friend_F_UID = 1AND F3.F_UID = F2.Friend_F_UIDAND F3.DateBecameFriends > F.DateBecameFriends)) AS friendCountFROM F_Friends AS FWHERE F.Friend_F_UID = 1;
Query 3: SQL
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Query 3: Data Analysis
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Determine priority stores which don’t stock products that they should, as determined by google trend word popularity.
For a given google trend word, find the top 5 cities in which the word is most searched in year 2011. Then, find stores in those cities and which related products they do not stock. This will help us identify how to improve inventory.
Query 4: Google Trends and Stocks
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
SELECT Store.StoreID AS Store, Store.City AS City, Product.ProdID AS ProdFROM Store, Store_Carries, ProductWHERE Store.City IN (SELECT TOP 5 Google_Trend.CityFROM Google_TrendWHERE Google_Trend.Word = 'test')AND Store_Carries.StoreID = Store.StoreIDAND Store_Carries.ProdID = Product.ProdIDAND Store_Carries.Stock = 0AND Product.ProdID IN (SELECT Related_Trend.Related_Prod_IDFROM Related_TrendWHERE Related_Trend.Word = 'test');
Query 4: SQL
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Gather Social Networking, Google Trend, and Purchase data over time to formulate predictive models.For a given product, find the number of social network hits of a product, the related trend word hits, and the number of purchases in that product for a given city on a given day. In this way, we can use social network 'buzz' and trend data to predict purchases as a function of time and city. Order by product then timestamp.
Query 5: Social Network and Purchases
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
SELECT Product.ProdID, Product.ProdName, Purchase.Timestamp, (SELECT COUNT(F_Comment.F_CID) FROM F_Comment WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*'AND F_Comment.Timestamp = Purchase.Timestamp) AS SocialNetworkHits, (SELECT SUM(Google_Trend.hits)FROM Google_TrendWHERE Google_Trend.word = Product.ProdNameAND Google_Trend.Timestamp = Purchase.Timestamp) AS TrendHitsFROM Product, PurchaseWHERE Purchase.ProdID = Product.ProdIDORDER BY Purchase.ProdID, Timestamp;
Query 5: SQL
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Query 5: Output
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Query 5: Data Analysis
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
• Social media networking, Google Trends, and Purchases data used predictively– Group into weekly vectors
– Extract significant data using Principle Component Analysis to project onto 2 dimensions.
– Cluster data using K-Means
See if we can predict future sales using machine learning
Query 5: Data Analysis
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Query 5: Data Analysis
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Employees login here:
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Login Interface
Allows employees to insert data in forms or run selected query
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Switchboard
Enter information on new Calbee employees
Forms: New Employee
NORMALIZATION QUERIES DATABASECLIENT EER/SCHEMA
Questions?
CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE
Thank you!