MongoDB for storing humongous music database
![Page 1: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/1.jpg)
Working with Humongous Music Database
MongoDB
Prasoon Kumar
#HyderabadDataScienceGroup
![Page 2: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/2.jpg)
Agenda
• MongoDB Features
• Bulk Import
• Full Text Index creation
• Full Text Search
• MusicBrainz Database
![Page 3: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/3.jpg)
MUSIC BRAINZ
![Page 4: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/4.jpg)
What is MusicBrainz?
• MusicBrainz is a community-maintained open-source encyclopedia of music information.
• This means that anyone, including you, can contribute to the project by adding information about your favorite artists and their related works.
• Robert Kaye founded MusicBrainz. The project has grown rapidly from a one-man operation to an international community of enthusiasts who appreciate both music and music metadata.
![Page 5: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/5.jpg)
MusicBrainz
• Along the way, the scope of the project has expanded from its origins as a mere CDDB replacement; today MusicBrainz has become a true encyclopedia of music.
• As an encyclopedia and as a community, MusicBrainz exists solely to collect as much information about music as we can without discriminating or preferring one "type" of music over another.
![Page 6: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/6.jpg)
MusicBrainz Database
The MusicBrainz Database is where all of the various pieces of information we collect about music are stored, from artists and their releases to works and their composers, and of course much more. The majority of the data in the MusicBrainz Database is placed in the Public Domain, which means that anyone can download the data and use it in any way they see fit. The remaining data is released under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 license.
![Page 7: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/7.jpg)
MongoDB
Document Database
Open-Source
General Purpose
![Page 8: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/8.jpg)
Scalability
Auto-Sharding
• Increase capacity as you go
• Commodity and cloud architectures
• Improved operational simplicity and cost visibility
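Auto-sharding partitions a collection into chunks, each a contiguous range of shard-key values, and spreads those chunks across shards; capacity grows by adding shards and moving chunks onto them. The toy model below is plain Python, not MongoDB's actual router or balancer: it routes a compound key such as `(name, _id)` to a chunk by binary search over the split points between chunks. The split points and chunk owners are invented for illustration.

```python
from bisect import bisect_right

# Hypothetical split points between chunks, as (name, _id) tuples. Python compares
# tuples lexicographically, which mirrors a compound shard key { name: 1, _id: 1 }
# for plain string names and integer ids (real BSON ordering covers more types).
SPLIT_POINTS = [("H", 0), ("P", 0)]

# Owner of each chunk: chunk i covers [SPLIT_POINTS[i-1], SPLIT_POINTS[i]),
# with the first chunk open towards MinKey and the last towards MaxKey.
CHUNK_OWNERS = ["shard0000", "shard0001", "shard0002"]


def route(name, doc_id):
    """Return the shard that owns the chunk containing the key (name, doc_id)."""
    # bisect_right counts the split points <= key, which is exactly the chunk index.
    chunk = bisect_right(SPLIT_POINTS, (name, doc_id))
    return CHUNK_OWNERS[chunk]
```

In this model, splitting a chunk means inserting one more split point (and one more owner entry), and migrating a chunk to a new shard only changes its owner, which is why capacity can be added incrementally.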
![Page 9: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/9.jpg)
Drivers & Ecosystem
Support for the most popular languages and frameworks
Java
Python
Perl
Ruby
Morphia
MEAN Stack
![Page 10: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/10.jpg)
Music Mongo
• Load (import)
• Run
– Exact match
– Full text search
• Todo
– Application interface
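The two query modes correspond to two different filter shapes. The sketch below only builds the filter documents, so it runs without a server; with PyMongo they would be passed to `collection.find()`. It assumes a hypothetical `name` field and uses the `$text` operator, which arrived in MongoDB 2.6; the 2.4 releases current at the time of this talk exposed text search through a separate `text` command instead.

```python
def exact_match_filter(name):
    """Equality match on the (hypothetical) name field; served by a regular index on name."""
    return {"name": name}


def full_text_filter(terms):
    """Full text search (MongoDB 2.6+ syntax); requires a text index on the collection."""
    return {"$text": {"$search": terms}}


# Projection that returns the relevance score computed for each text match.
TEXT_SCORE_PROJECTION = {"score": {"$meta": "textScore"}}

# Hypothetical usage, assuming `recordings` is a pymongo Collection:
#   recordings.find(exact_match_filter("Yesterday"))
#   recordings.find(full_text_filter("yellow submarine"), TEXT_SCORE_PROJECTION)
```

An exact match is a cheap index lookup on the whole value, while the text filter matches individual words, which is what makes partial or reordered titles findable.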
![Page 11: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/11.jpg)
AWS Setup
s0 54.225.100.65
s1 54.235.157.214
s2 54.225.100.42
Client & mongos 54.225.100.39
config 184.73.195.120
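These machines are tied together through the mongos router: each shard host is registered with the `addShard` admin command, while the config server address is given to mongos at startup (its `--configdb` option) rather than added as a shard. The sketch builds the `addShard` command documents from the hosts above; port 27017, MongoDB's default, is an assumption because the slide lists no ports.

```python
# Shard hosts from the AWS setup; the port is an assumption (MongoDB's default).
SHARD_HOSTS = {
    "s0": "54.225.100.65",
    "s1": "54.235.157.214",
    "s2": "54.225.100.42",
}
DEFAULT_PORT = 27017


def add_shard_commands(hosts=SHARD_HOSTS, port=DEFAULT_PORT):
    """Build one addShard command document per shard host, ordered by label."""
    return [{"addShard": f"{ip}:{port}"} for _, ip in sorted(hosts.items())]


# Hypothetical usage with PyMongo, connected to the mongos on the client machine:
#   from pymongo import MongoClient
#   client = MongoClient("54.225.100.39", DEFAULT_PORT)
#   for cmd in add_shard_commands():
#       client.admin.command(cmd)
```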
![Page 12: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/12.jpg)
Relevant schema of MusicBrainz:
![Page 13: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/13.jpg)
Import strategies
• Denormalized from source DB – Import TSV in PostgreSQL – Export joined tables from PostgreSQL – mongoimport TSV
• Separate collections from TSV – mongoimport TSVs into temporary collections – “Join” temporary collections in client (PyMongo) and
insert to destination collection
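Both import strategies pass data around as TSV. MusicBrainz dump files, like PostgreSQL `COPY` output, use PostgreSQL's text format: tab-separated columns, `\N` for NULL, and backslash escapes inside values. A loader that builds documents in Python, instead of relying on mongoimport, has to decode those conventions itself. A minimal sketch follows; it handles only the common escapes, and the column names are supplied by the caller (they are hypothetical here) because the dump files carry no header line.

```python
# Backslash escapes used by PostgreSQL's COPY text format (a common subset).
_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "\\": "\\"}
NULL_MARKER = "\\N"


def decode_field(raw):
    """Decode one COPY-format field; the bare marker \\N means SQL NULL."""
    if raw == NULL_MARKER:
        return None
    out, i = [], 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw) and raw[i + 1] in _ESCAPES:
            out.append(_ESCAPES[raw[i + 1]])
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


def row_to_document(line, columns):
    """Turn one TSV line into a dict, omitting NULL columns from the document."""
    # Splitting on a literal tab is safe: tabs inside values are escaped as \t.
    values = line.rstrip("\n").split("\t")
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} columns, got {len(values)}")
    doc = {}
    for column, raw in zip(columns, values):
        value = decode_field(raw)
        if value is not None:
            doc[column] = value
    return doc
```

Omitting NULL columns keeps documents smaller; storing `None` instead would be equally valid if queries need to distinguish an explicit null from a missing field.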
![Page 14: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/14.jpg)
Steps for creating denormalized table:
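The slide's own steps are shown only as an image. As a self-contained stand-in, the sketch below uses Python's built-in `sqlite3` in place of PostgreSQL to show the shape of the idea: join the normalized tables into one wide row per recording, then write the result as TSV for mongoimport. The two tables and their columns are a heavily simplified, hypothetical subset of the MusicBrainz schema.

```python
import csv
import io
import sqlite3


def denormalize_to_tsv(recordings, artist_credits):
    """Join recordings to artist credits in SQL and return the joined rows as TSV text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artist_credit (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE recording (id INTEGER PRIMARY KEY, name TEXT, artist_credit INTEGER)"
    )
    conn.executemany("INSERT INTO artist_credit VALUES (?, ?)", artist_credits)
    conn.executemany("INSERT INTO recording VALUES (?, ?, ?)", recordings)
    # The join happens inside the database, before any data reaches MongoDB.
    rows = conn.execute(
        "SELECT r.id, r.name, ac.name "
        "FROM recording r JOIN artist_credit ac ON ac.id = r.artist_credit "
        "ORDER BY r.id"
    ).fetchall()
    conn.close()
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "name", "artist"])  # header line, usable with mongoimport --headerline
    writer.writerows(rows)
    return buf.getvalue()
```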
![Page 15: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/15.jpg)
Client join
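The client-side join from the second strategy can be sketched as a hash join: load the smaller temporary collection (here, artist credits) into a dictionary keyed by id, then stream the larger one and embed the matching value in each output document. The function takes plain iterables of dicts, so it works on lists in a test or on PyMongo cursors from `find()`. Collection and field names are hypothetical, and batching the output into `insert_many` calls is one reasonable choice, not necessarily what the talk used.

```python
from itertools import islice


def client_join(recordings, artist_credits):
    """Yield recording documents with the artist credit name embedded."""
    # Build side: the smaller collection is held in memory as a dict.
    credit_names = {ac["_id"]: ac["name"] for ac in artist_credits}
    # Probe side: stream the larger collection one document at a time.
    for rec in recordings:
        doc = {"_id": rec["_id"], "name": rec["name"]}
        artist = credit_names.get(rec["artist_credit"])
        if artist is not None:
            doc["artist"] = artist
        yield doc


def batches(docs, size=1000):
    """Group an iterable of documents into lists of at most `size` for bulk inserts."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Hypothetical usage with PyMongo:
#   db = client.musicbrainz2
#   joined = client_join(db.tmp_recording.find(), db.tmp_artist_credit.find())
#   for batch in batches(joined):
#       db.records.insert_many(batch)
```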
![Page 16: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/16.jpg)
Import statistics
recording:
2013-11-11T22:02:51.213+0000 imported 12817015 objects real 69m49.949s
artist_credit:
2013-11-11T22:04:41.469+0000 imported 756247 objects real 1m50.256s
track:
2013-11-11T22:48:59.423+0000 imported 15427255 objects real 44m17.973s
release:
2013-11-11T22:53:06.627+0000 imported 1208854 objects real 4m7.183s
medium:
2013-11-11T22:57:45.030+0000 imported 1343234 objects real 4m38.414s
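Dividing each document count by its wall-clock (`real`) time gives a rough import rate per collection. The sketch parses the `real` durations exactly as logged above and computes documents per second; it uses only the numbers from this slide.

```python
import re

# (collection, documents imported, `real` wall-clock time) as logged above.
IMPORTS = [
    ("recording", 12817015, "69m49.949s"),
    ("artist_credit", 756247, "1m50.256s"),
    ("track", 15427255, "44m17.973s"),
    ("release", 1208854, "4m7.183s"),
    ("medium", 1343234, "4m38.414s"),
]


def seconds(duration):
    """Convert a `time`-style duration such as '69m49.949s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def rates(imports=IMPORTS):
    """Return {collection: documents per second}."""
    return {name: count / seconds(real) for name, count, real in imports}


for name, rate in rates().items():
    print(f"{name:>13}: {rate:,.0f} docs/s")
```

On these figures the rates range from roughly 3,000 documents per second (recording) to about 6,900 (artist_credit).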
![Page 17: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/17.jpg)
Import via Postgres

| Operation | Time |
|---|---|
| Postgres import | 08m11s |
| Denormalize | 14m57s |
| Export | 00m29s |

| Operation | Unsharded | Sharded |
|---|---|---|
| MongoDB import | 14m59s | 12m15s |
| Index | 07m45s | 02m35s |
| Overall | 45m23s | 40m13s |
![Page 18: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/18.jpg)
Indexes & Sharding
![Page 19: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/19.jpg)
Indexes & Sharding - Text Index
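The index definition itself appears only in the slide image. A hedged sketch of creating a text index with PyMongo: `create_index` with `"text"` as the key type (the value of `pymongo.TEXT`). The fields, weights and index name are hypothetical, and `default_language="none"`, which disables stemming and stop-word removal, is a suggested choice for song and artist names in many languages rather than a setting from the talk.

```python
# Hypothetical text-index definition; "text" is the same string as pymongo.TEXT.
TEXT_INDEX_KEYS = [("name", "text"), ("artist", "text")]
TEXT_INDEX_OPTIONS = {
    "name": "name_artist_text",             # explicit index name (hypothetical)
    "weights": {"name": 10, "artist": 5},   # matches in the title rank higher
    "default_language": "none",             # no stemming or stop words
}


def create_text_index(collection):
    """Create the text index on a PyMongo-style collection and return the index name."""
    return collection.create_index(TEXT_INDEX_KEYS, **TEXT_INDEX_OPTIONS)


# Hypothetical usage:
#   from pymongo import MongoClient
#   records = MongoClient("54.225.100.39").musicbrainz2.records3
#   create_text_index(records)
```

A collection can have only one text index, so every searchable field goes into the same index definition.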
![Page 20: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/20.jpg)
Indexes & Sharding - Shard key
musicbrainz2.records3
shard key: { "name" : 1, "_id" : 1 }
chunks:
shard0002 18
shard0000 18
shard0001 18
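Appending `_id` to `name` makes every shard-key value unique, so a chunk full of recordings that share a popular name can still be split. The sketch builds the two admin commands that would shard `musicbrainz2.records3` on this key and checks the chunk distribution listed above (18 chunks on each of three shards). Sending the commands to a live mongos is left as a comment; the client there is hypothetical.

```python
DATABASE = "musicbrainz2"
COLLECTION = "records3"
SHARD_KEY = {"name": 1, "_id": 1}  # compound shard key from the slide

# Chunk counts per shard as listed on the slide.
CHUNKS = {"shard0000": 18, "shard0001": 18, "shard0002": 18}


def sharding_commands(db=DATABASE, coll=COLLECTION, key=SHARD_KEY):
    """Admin commands to enable sharding on the database and shard the collection."""
    return [
        {"enableSharding": db},
        {"shardCollection": f"{db}.{coll}", "key": dict(key)},
    ]


def is_balanced(chunk_counts, tolerance=0):
    """True if the chunk counts per shard differ by at most `tolerance`."""
    counts = list(chunk_counts.values())
    return max(counts) - min(counts) <= tolerance


# Hypothetical usage through mongos; a non-empty collection needs an index
# on the shard key before shardCollection will succeed:
#   for cmd in sharding_commands():
#       client.admin.command(cmd)
```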
![Page 21: MongoDB for storing humongous music database](https://reader034.fdocuments.us/reader034/viewer/2022052323/55855abad8b42a54608b517e/html5/thumbnails/21.jpg)
Thank You
team = { members: ["Jonathan", "Prasoon"], company: "MongoDB" }
@prasoonk