Blackpink Data

Blackpink Data Release 1.0.0 Marco Fantauzzo May 04, 2022


CONTENTS:

1 Indices and tables

2 How to Build
  2.1 Set up your machine
    2.1.1 Python
    2.1.2 Clone the repository
    2.1.3 Install dependencies
    2.1.4 Set API keys as environment variables
    2.1.5 Twitter API keys
    2.1.6 YouTube API key
    2.1.7 Spotify API key
    2.1.8 Instagram USERNAME and PASSWORD
  2.2 Fork
  2.3 Run
    2.3.1 First run
    2.3.2 Standard run
    2.3.3 Parameters
    2.3.4 Schedule the bot

3 Modules
  3.1 Main script
  3.2 Tweet
  3.3 Utils
  3.4 Birthdays
  3.5 YouTube
  3.6 Instagram
  3.7 Spotify
  3.8 Billboard Charts

Python Module Index


CHAPTER

ONE

INDICES AND TABLES

• genindex

• modindex

• search


CHAPTER

TWO

HOW TO BUILD

2.1 Set up your machine

2.1.1 Python

Make sure you have installed:

• Python 3.8

• pip

Please note that the spotify.py module, which is based on the spotipy library, does not seem to work well on Windows, so I suggest using Linux or WSL on Windows. All the following commands assume that you are in a Linux-like environment.

2.1.2 Clone the repository

Run: git clone https://github.com/marco97pa/Blackpink-Data.git

For more info see this guide

Then cd to the new directory

2.1.3 Install dependencies

Run pip3 install -r requirements.txt to install all the required libraries

2.1.4 Set API keys as environment variables

The project is composed of different modules, such as instagram.py, youtube.py and more. Each module gets data from a different source. To get this data you need the corresponding API keys.


2.1.5 Twitter API keys

Go to the Twitter Developers page, log in, go to Dashboard and create a new app with read and write permissions. Then copy the generated keys and set them as environment variables by running these lines (replace the placeholders with your actual key values):

export TWITTER_CONSUMER_KEY='xxxx'
export TWITTER_CONSUMER_SECRET='xxxx'
export TWITTER_ACCESS_KEY='xxxx'
export TWITTER_ACCESS_SECRET='xxxx'

2.1.6 YouTube API key

Go to Google Developers and follow their instructions on how to get an API key for YouTube. Then copy the generated key and set it as an environment variable by running this line (replace the placeholder with your actual key value):

export YOUTUBE_API_KEY='xxxx'

2.1.7 Spotify API key

Go to Spotify Developer Dashboard, create a new app and get the API keys. Then set them as environment variables by running these lines:

export SPOTIPY_CLIENT_ID='xxxx'
export SPOTIPY_CLIENT_SECRET='xxxx'

2.1.8 Instagram USERNAME and PASSWORD

You can set your username and password like this:

export INSTAGRAM_ACCOUNT_USERNAME='xxxxxx'
export INSTAGRAM_ACCOUNT_PASSWORD='xxxxxx'
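Before the first run it can help to confirm that every variable named in sections 2.1.5 to 2.1.8 is actually set. The following is a minimal sketch and not part of the project: the helper name missing_variables is hypothetical, and only the variable names come from this guide.

```python
import os

# Variable names exactly as listed in sections 2.1.5 to 2.1.8.
REQUIRED_VARS = [
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_KEY",
    "TWITTER_ACCESS_SECRET",
    "YOUTUBE_API_KEY",
    "SPOTIPY_CLIENT_ID",
    "SPOTIPY_CLIENT_SECRET",
    "INSTAGRAM_ACCOUNT_USERNAME",
    "INSTAGRAM_ACCOUNT_PASSWORD",
]


def missing_variables(environ=os.environ):
    """Return the required variable names that are unset or empty.

    Hypothetical helper: it only inspects the mapping it is given.
    """
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All API keys are set.")
```

If you disable a source with one of the parameters described in section 2.3.3, its keys are presumably not needed, so treat a reported name as a hint rather than an error.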

2.2 Fork

By editing the data.yaml file you can make the script work with a different artist group.

For example, you could make a BTS Data Bot by editing the provided sample_data.yaml file and saving it as data.yaml

Edit data.yaml with all the data you know. Leave fields empty or write placeholder data if you don’t know some details: they will be overwritten with the real values the first time the script runs.

With minimal or no code edits, the script could work even for single artists and not only groups.


2.3 Run

2.3.1 First run

Assuming that you have a valid data.yaml file in the same directory as the script, run:

python3 main.py -no-tweet

For the first run it is important to use the -no-tweet option to prevent a flood of tweets in your timeline. You should also check that everything is fine by looking at the command line output and the data.yaml file.

2.3.2 Standard run

From then on, you can just run:

python3 main.py

It will tweet any changes to the dataset.

2.3.3 Parameters

By passing one or more parameters, you can disable individual module sources. The parameters currently allowed are:

• -no-instagram: disables Instagram source

• -no-youtube: disables YouTube source

• -no-spotify: disables Spotify source

• -no-birthday: disables birthdays events source

• -no-twitter: disables Twitter source (used for reposting)

Remember that -no-twitter is different from -no-tweet: -no-tweet prevents the bot from tweeting any update from the enabled sources. The output will still be visible on the console. This is really useful for testing.

2.3.4 Schedule the bot

If you want the bot to run 24/7, you should schedule the script to run periodically (for example, every 5 minutes) to check for updates. Look at How to schedule tasks on Linux using crontab to get an idea of how to do it.


CHAPTER

THREE

MODULES

3.1 Main script

main.check_args()

Checks the arguments passed by the command line

By passing one or more parameters, you can disable individual module sources.

The parameters currently allowed are:

• -no-instagram: disables Instagram source

• -no-youtube: disables YouTube source

• -no-spotify: disables Spotify source

• -no-birthday: disables birthdays events source

• -no-twitter: disables Twitter source (used for reposting)

Remember that -no-twitter is different from -no-tweet:

-no-tweet prevents the bot from tweeting any update from the enabled sources. The output will still be visible on the console. This is really useful for testing.

Returns: A dictionary that contains all the sources and their state (enabled or disabled, True or False)
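As an illustration of the mapping from flags to the returned dictionary, here is a minimal sketch. It is not the code of main.check_args(): the function name parse_sources, the dictionary keys and the separate test_mode value are assumptions; only the flag names come from the list above.

```python
# Flags from the list above; the source names used as keys are assumptions.
FLAGS = {
    "-no-instagram": "instagram",
    "-no-youtube": "youtube",
    "-no-spotify": "spotify",
    "-no-birthday": "birthday",
    "-no-twitter": "twitter",
}


def parse_sources(argv):
    """Return (sources, test_mode) for a list of command-line arguments.

    sources maps each source name to True (enabled) or False (disabled).
    test_mode is True when -no-tweet was passed.
    """
    sources = {name: True for name in FLAGS.values()}
    test_mode = False
    for arg in argv:
        if arg in FLAGS:
            sources[FLAGS[arg]] = False
        elif arg == "-no-tweet":
            # -no-tweet does not disable a source, it only silences posting.
            test_mode = True
    return sources, test_mode


sources, test_mode = parse_sources(["-no-spotify", "-no-tweet"])
print(sources, test_mode)
```

In a real script the list would come from sys.argv[1:]. Keeping -no-tweet apart from the source flags mirrors the distinction between -no-twitter and -no-tweet made above.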

main.load_group()

Reads the data.yaml YAML file

Data about a group is stored inside the data.yaml file in the same directory as the script

Returns: A dictionary that contains all the information about the group

main.write_group(group)

Writes the data.yaml YAML file

Data about a group is stored inside the data.yaml file in the same directory as the script

Args: group: dictionary that contains all the information about the group


3.2 Tweet

tweet.check_duplicates(message)

Checks the tweet message against the 3 latest tweets of the user to avoid duplicate posts

Args: message: a string containing the message to be posted

Returns: Boolean which signals True if a duplicate is found
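The comparison itself can be sketched without the Twitter API. In this hypothetical version a plain list of strings stands in for the tweets that tweet.retrieve_own_tweets() would return; whether the real function normalizes the text before comparing is not documented, so the whitespace stripping below is an assumption.

```python
def is_duplicate(message, recent_messages):
    """Return True if message equals one of the recent messages.

    Hypothetical stand-in for the duplicate check: recent_messages is a
    list of strings, for example the text of the 3 latest tweets.
    """
    normalized = message.strip()
    return any(normalized == old.strip() for old in recent_messages)


recent = ["New video is out!", "Happy birthday!", "100 million views"]
print(is_duplicate("Happy birthday! ", recent))
print(is_duplicate("New single released", recent))
```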

tweet.edit_image(filename, text, text_size=200, crop=False)

Edits an image by adding text to it (uses the Pillow module)

Args:

• filename: filename of the image to be modified

• text: text to be added

• text_size (optional): size of the text (default: 200)

• crop (optional): if enabled removes black bars from a video thumbnail (16:9 over 4:3)

tweet.remove_URLs(text)

Removes URLs from a text string

Args: text: any text containing URL(s)

Returns: the same text without URL(s)
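One common way to do this is a regular expression. The sketch below is an assumption about the approach, not the module's actual code: the pattern only matches links that start with http:// or https://, and the helper name strip_urls is hypothetical.

```python
import re

# Matches http:// or https:// followed by any run of non-space characters.
URL_PATTERN = re.compile(r"https?://\S+")


def strip_urls(text):
    """Return text with URLs removed and leftover whitespace collapsed."""
    without_urls = URL_PATTERN.sub("", text)
    # split() + join() collapses the double spaces left by removed URLs.
    return " ".join(without_urls.split())


print(strip_urls("New video https://youtu.be/abc out now"))
```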

tweet.retrieve_own_tweets(num=3)

Retrieves recent tweets made by the bot.

Args: num: an integer with the number of tweets to retrieve.

Returns: a list of tweet objects

tweet.set_test_mode()

Enables the test mode

Prevents tweets from being posted. They are still printed in the console. This is really useful for debugging purposes

tweet.twitter_post(message)

Posts a message on Twitter (uses the Tweepy module)

Args: message: a string containing the message to be posted

tweet.twitter_post_image(message, filename, text, text_size=200, crop=False)

Posts a photo with a message on Twitter (uses the Tweepy module)

Args:

• message: a string containing the message to be posted

• filename: filename of the image to be posted

tweet.twitter_repost(artist)

Retweets the latest tweets of a given account

Args: artist: a dictionary with all the details of the artist

Returns: a dictionary containing all the updated data of the artist


3.3 Utils

utils.convert_num(mode, num)

Converts a number to a given scale

Example: convert_num(“100K”, 600000) returns 6

Args:

• mode: (string) the scale for the conversion (“100K”, “M”, “10M”, “100M”, “B”)

• num: the number to be converted

Returns: the converted number
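The documented example, convert_num(“100K”, 600000) returning 6, is consistent with an integer division by the size of the scale. The sketch below assumes exactly that; the rounding behaviour for numbers that are not exact multiples (here: rounding down) and the helper name to_scale are assumptions.

```python
# Scale names from the Args list above, mapped to their numeric size.
SCALES = {
    "100K": 100_000,
    "M": 1_000_000,
    "10M": 10_000_000,
    "100M": 100_000_000,
    "B": 1_000_000_000,
}


def to_scale(mode, num):
    """Return how many whole units of the given scale fit in num."""
    return int(num) // SCALES[mode]


print(to_scale("100K", 600000))
```

Rounding down is convenient for milestone checks: the converted value only increases when a full new step of the scale has been reached.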

utils.display_num(num, short=False, decimal=False)

Converts a number into a readable format

Args:

• num: the number to be converted

• short (optional): flag to get a long or short literal (“Mln” vs “million”)

• decimal (optional): flag to print also the first decimal digit (19.1 vs 19)

Returns: a string with a number in a readable format
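To make the two flags concrete, here is a rough sketch of such a formatter. It is not the implementation of utils.display_num(): the helper name readable, the “Bln” and “billion” literals, the truncation instead of rounding and the handling of numbers below one million are all assumptions. Only “Mln”, “million” and the 19.1 vs 19 example come from the description above.

```python
# Hypothetical unit table: (size, short literal, long literal).
# "Mln" and "million" come from the description above; "Bln" is a guess.
UNITS = [
    (1_000_000_000, "Bln", "billion"),
    (1_000_000, "Mln", "million"),
]


def readable(num, short=False, decimal=False):
    """Format an integer such as 19100000 as "19 million" or "19.1 Mln"."""
    for size, short_name, long_name in UNITS:
        if num >= size:
            # Integer arithmetic truncates, so a goal is never overstated.
            whole, tenth = divmod(num * 10 // size, 10)
            value = f"{whole}.{tenth}" if decimal else str(whole)
            return f"{value} {short_name if short else long_name}"
    return str(num)  # numbers below one million are left unchanged


print(readable(19_100_000))
print(readable(19_100_000, short=True, decimal=True))
```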

utils.download(url, filename)

Downloads a file, given a URL and a filename

Args:

• url: source URL from which to download the file

• filename: name of the file to save

utils.download_image(url)

Downloads an image, given a URL

The image is saved in the download.jpg file

Args: url: source URL from which to download the image

3.4 Birthdays

birthdays.check_birthdays(group)

Checks if today is the birthday of a member of the group

It tweets if it is someone’s birthday

Args: group: a dictionary with all the details of the group

Returns: a dictionary containing all the updated data of the group
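The date check at the core of this function can be sketched with the standard library. How birthdays are stored in data.yaml is not documented here, so the sketch works on datetime.date values; the helper names is_birthday and age_on are hypothetical.

```python
from datetime import date


def is_birthday(birthday, today):
    """Return True if today has the same month and day as birthday.

    Note: a 29 February birthday only matches in leap years.
    """
    return (birthday.month, birthday.day) == (today.month, today.day)


def age_on(birthday, today):
    """Return the age in whole years on the given day."""
    had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
    return today.year - birthday.year - (0 if had_birthday else 1)


# Illustrative dates only, not taken from any real data.yaml.
born = date(2000, 5, 4)
print(is_birthday(born, date(2022, 5, 4)), age_on(born, date(2022, 5, 4)))
```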


3.5 YouTube

youtube.youtube_check_channel_change(old_channel, new_channel, hashtags)

Checks if there is any change in the number of subscribers or total views of the channel

It compares the old channel data with the new (already fetched) data.

Args:

• old_channel: dictionary that contains all the old data of the channel

• new_channel: dictionary that contains all the updated data of the channel

• hashtags: hashtags to add to the Tweet

Returns: a dictionary with updated data of the channel

youtube.youtube_check_videos_change(name, old_videos, new_videos, hashtags)

Checks if there is any new video

It compares the old videos list of the artist with the new (already fetched) videos list. It tweets if there is a new release or if a video reaches a new views goal.

Args:

• name: name of the channel

• old_videos: list that contains all the old videos

• new_videos: list that contains all the updated videos

• hashtags: hashtags to append to the Tweet

Returns: new_videos
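The comparison between the two lists can be sketched as follows. This is only an illustration under stated assumptions: each video is taken to be a dictionary with hypothetical "id" and "views" keys, a views goal is taken to be every multiple of a fixed step, and the function name find_changes is not part of the module.

```python
def find_changes(old_videos, new_videos, step=100_000_000):
    """Return (released, milestones) between two lists of video dicts.

    released:   IDs present in new_videos but not in old_videos.
    milestones: (ID, goal) pairs for videos whose view count crossed a
                multiple of step since the previous check.
    """
    old_views = {video["id"]: video["views"] for video in old_videos}
    released, milestones = [], []
    for video in new_videos:
        if video["id"] not in old_views:
            released.append(video["id"])
        elif video["views"] // step > old_views[video["id"]] // step:
            goal = video["views"] // step * step
            milestones.append((video["id"], goal))
    return released, milestones


old = [{"id": "a", "views": 99_000_000}]
new = [{"id": "a", "views": 101_000_000}, {"id": "b", "views": 10}]
print(find_changes(old, new))
```

Comparing the integer quotients before and after avoids tweeting the same goal twice, because the quotient only changes once per step.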

youtube.youtube_data(group)

Runs all the YouTube related tasks

It scrapes data from YouTube for the whole group and the single artists

Args: group: dictionary with the data of the group to scrape

Returns: the same group dictionary with updated data

youtube.youtube_get_channel(api, channel_id)

Gets details about a channel

Args:

• api: The YouTube instance

• channel_id: the ID of that channel on YouTube

Returns: a dictionary containing all the scraped data of that channel

youtube.youtube_get_videos(api, playlist_id, name)

Gets videos from a playlist

Args:

• api: The YouTube instance

• playlist_id: the ID of the playlist on YouTube

• name: name of the channel owner of the playlist

Returns: a list of videos


3.6 Instagram

instagram.clean_caption(caption)

Removes unnecessary parts of an Instagram post caption

It removes all the hashtags and converts tags in plain text (@marco97pa –> marco97pa)

Args: caption: a text

Returns: the same caption without hashtags and tags
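A regular-expression sketch of the two steps described above (dropping hashtags, then turning @tags into plain text). It is not the module's actual code: the patterns, the whitespace collapsing and the helper name tidy_caption are assumptions.

```python
import re


def tidy_caption(caption):
    """Remove hashtags and strip the @ from user tags.

    Whitespace (including line breaks) is collapsed to single spaces,
    which is a simplification of this sketch.
    """
    no_hashtags = re.sub(r"#\w+", "", caption)
    # Keep the username, drop only the leading @ (dots are allowed in names).
    plain_tags = re.sub(r"@(\w[\w.]*)", r"\1", no_hashtags)
    return " ".join(plain_tags.split())


print(tidy_caption("Thanks @marco97pa #blackpink #data"))
```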

instagram.download_profile_pic(url)

Downloads an image, given a URL

The image is saved in the download.jpg file

Args: url: source URL from which to download the image

instagram.instagram_data(group)

Runs all the Instagram related tasks

It scrapes data from Instagram for the whole group and the single artists

Args: group: dictionary with the data of the group to scrape

Returns: the same group dictionary with updated data

instagram.instagram_last_post(artist, user_id)

Gets the last post of a profile

It tweets if there is a new post, that is, if the timestamp of the latest stored post does not match the timestamp of the latest fetched post

Args:

• user_id: a profile ID

• artist: a dictionary with all the details of the artist

Returns: a dictionary containing all the updated data of the artist

instagram.instagram_profile(artist)

Gets the details of an artist on Instagram

It tweets if the artist reaches a new followers goal

Args: artist: a dictionary with all the details of the artist

Returns:

• a dictionary containing all the updated data of the artist

• a Profile ID


3.7 Spotify

spotify.check_new_songs(artist, collection, hashtags)

Checks if there is any new song

It compares the old discography of the artist with the new (already fetched) discography. It tweets if there is a new release or featuring of the artist.

Args:

• artist: dictionary that contains all the data about the single artist

• collection: dictionary that contains all the updated discography of the artist

• hashtags: hashtags to append to the Tweet

Returns: an artist dictionary with updated discography details

spotify.get_artist(spotify, artist, hashtags)

Gets details about an artist

It tweets if the artist reaches a new goal of followers on Spotify

Args:

• spotify: The Spotify instance

• artist: dictionary that contains all the data about the single artist

• hashtags: hashtags to append to the Tweet

Returns: an artist dictionary with updated profile details

spotify.get_discography(spotify, artist)

Gets all the releases of an artist

A release is a single, EP, mini-album or album: Spotify simply calls them all “albums”

Example:

• DDU-DU-DDU-DU of BLACKPINK is a single

• SQUARE UP of BLACKPINK is a mini-album

• THE ALBUM of BLACKPINK is (really) an album

It also gets releases where the artist is featured. Example:

• Sour Candy is a song of Lady Gaga, but BLACKPINK are featured

Spotify also makes many “clones” of the same album: there could be extended albums or albums that had tracks added later. Each of these creates a duplicate of the same album, so this function also tries to clean up the discography by removing duplicates.

Args:

• spotify: The Spotify instance

• artist: dictionary that contains all the data about the single artist

Returns: a dictionary with updated discography details
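The clean-up step can be illustrated with a simple name-based filter. The criterion the real function uses to decide that two albums are “clones” is not documented, so matching on the case-insensitive album name is an assumption, as are the "name" key and the helper name remove_duplicates.

```python
def remove_duplicates(albums):
    """Return albums with later same-named entries dropped.

    albums is a list of dictionaries with a "name" key (hypothetical
    shape). The first occurrence of each name is kept, order preserved.
    """
    seen = set()
    unique = []
    for album in albums:
        key = album["name"].casefold()
        if key not in seen:
            seen.add(key)
            unique.append(album)
    return unique


albums = [
    {"name": "THE ALBUM", "id": "1"},
    {"name": "The Album", "id": "2"},
    {"name": "SQUARE UP", "id": "3"},
]
print([album["id"] for album in remove_duplicates(albums)])
```

A name-only comparison would not catch an extended edition whose title differs, so a real implementation probably needs a looser match.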

spotify.link_album(album_id)

Generates a link to an album

Args: album_id: ID of the album

Returns: The link to that album on Spotify

spotify.link_artist(artist_id)

Generates a link to an artist

Args: artist_id: ID of the artist

Returns: The link to that artist on Spotify
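Both helpers amount to string formatting. The sketch below assumes the public open.spotify.com URL scheme for albums and artists; the exact format the module produces is not shown in this documentation, and the function names and the ID used are placeholders.

```python
# Assumed base URL of the Spotify web player.
BASE_URL = "https://open.spotify.com"


def album_url(album_id):
    """Return a web link for an album ID."""
    return f"{BASE_URL}/album/{album_id}"


def artist_url(artist_id):
    """Return a web link for an artist ID."""
    return f"{BASE_URL}/artist/{artist_id}"


# "abc123" is a placeholder, not a real Spotify ID.
print(album_url("abc123"))
print(artist_url("abc123"))
```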

spotify.login()

Logs in to Spotify

Uses the client credentials authorization flow. The following API keys need to be set as environment variables:

• SPOTIPY_CLIENT_ID

• SPOTIPY_CLIENT_SECRET

You can request API keys on the Spotify Developer Dashboard

See https://spotipy.readthedocs.io/en/2.16.1/#authorization-code-flow for more details

spotify.spotify_data(group)

Runs all the Spotify related tasks

It scrapes data from Spotify for the whole group and the single artists

Args: group: dictionary with the data of the group to scrape

Returns: the same group dictionary with updated data

3.8 Billboard Charts

billboard_charts.billboard_data(group)

Gets Billboard charts of a group

It starts all the tasks needed to get the latest data and, if there are changes, tweets updates. Data is updated once a day.

Args:

• group: dictionary that contains all the data about the group

Returns: the same group dictionary with updated data

billboard_charts.get_artist_rank(artist, chart)

Gets the Billboard Hot 100 chart and tries to find an artist

Args:

• artist: the artist to look for

Returns: a string containing the list of songs found in the chart (it can be empty)


PYTHON MODULE INDEX

• billboard_charts

• birthdays

• instagram

• main

• spotify

• tweet

• utils

• youtube
