MusicGo GitHub Repo

Authors

Jiadong Yan
Jiaming Xu
Xinyi Jiang

Last Modified

May 10, 2017

Description

A specialized information retrieval system for music. Users can search for tracks by keyword. The system also supports filtering by fields such as artist, duration, and genre.

Functionality

Basic search: matches the description against title, lyric, album, and artist_name.
Advanced search: searches on every available field, including title, lyric, album, artist_name, location, duration, genres, and year.
More like this: finds similar tracks, based on title, lyric, and similarity of the artist.
Sorting:
- default: sort by relevance
- sort by song hotness
- sort by danceability
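
As an illustration of the basic search described above, here is a minimal sketch of a request body that matches one description string against the four basic-search fields. The helper name basic_search_body and the size parameter are hypothetical; the real query construction lives in lib/Search.py and may differ.

```python
# Hypothetical helper: not taken from lib/Search.py.
BASIC_FIELDS = ["title", "lyric", "album", "artist_name"]

def basic_search_body(description, size=10):
    """Return an Elasticsearch request body that matches `description`
    against the four basic-search fields with a multi_match query."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": description,
                "fields": BASIC_FIELDS,
            }
        },
    }

body = basic_search_body("love")
```

With no explicit sort clause, Elasticsearch orders hits by relevance score, which corresponds to the default sorting option.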

Getting Started

Install Homebrew.
brew tap homebrew/science
brew install hdf5
pip install Cython
Install PyTables: sudo pip install git+https://github.com/PyTables/PyTables
Install the other packages listed in Dependency.
Build Elasticsearch as described in Build Elasticsearch.
Start Redis by typing redis-server in a terminal.

Open another terminal window and type:

cd elasticsearch-<version>
./bin/elasticsearch

python query.py
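
Before running query.py it can help to confirm that both servers are reachable. The sketch below uses only the standard library and assumes the default ports (6379 for Redis, 9200 for Elasticsearch HTTP) on the local machine; the function names are hypothetical and not part of this repo.

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Assumed default ports; adjust if your servers are configured differently.
SERVICES = {"redis": 6379, "elasticsearch": 9200}

def check_services(host="127.0.0.1"):
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}

if __name__ == "__main__":
    for name, ok in check_services().items():
        print(name, "is up" if ok else "is NOT reachable")
```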

Build Elasticsearch

cp lib/name_syn.txt [your elasticsearch path]/config/name_syn.txt
cp lib/cat_syn.txt [your elasticsearch path]/config/cat_syn.txt
(It is not recommended, but if you really want your web application to access a folder outside its deployment directory, you need to add a permission to the java.policy file. For details, see http://stackoverflow.com/questions/10454037/java-security-accesscontrolexception-access-denied-java-io-filepermission)
Start the Elasticsearch server:
cd elasticsearch-<version>
./bin/elasticsearch
Run python ./lib/buildElaticSearch.py
Build time: 9s
In another terminal, run redis-server
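
The two synonym files are copied into the Elasticsearch config directory because a synonym token filter resolves synonyms_path relative to that directory. The following is a sketch of index analysis settings that would use them; the filter and analyzer names are hypothetical, and the analyzers actually defined in lib/Track.py may be set up differently.

```python
def synonym_analysis_settings():
    """Return index settings wiring the two synonym files into analyzers.

    The filter and analyzer names are made up for illustration.
    """
    return {
        "analysis": {
            "filter": {
                "name_synonyms": {
                    "type": "synonym",
                    "synonyms_path": "name_syn.txt",  # relative to config/
                },
                "cat_synonyms": {
                    "type": "synonym",
                    "synonyms_path": "cat_syn.txt",  # relative to config/
                },
            },
            "analyzer": {
                "name_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "name_synonyms"],
                },
                "cat_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "cat_synonyms"],
                },
            },
        }
    }

settings = synonym_analysis_settings()
```

If either file is missing from the config directory, index creation with settings like these fails, which is why the cp steps come before running the build script.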

Functionality

We support basic title and lyrics search for whatever you want!
We support many filters, such as duration, artist, and genre!
You can find the hottest songs near your position!
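
One way to turn "near your position" into the min/max latitude and longitude bounds used by the advanced query is a simple bounding box. This README does not say how the app derives those bounds, so the sketch below is only an assumption; the function name and the 111 km per degree approximation are illustrative.

```python
import math

KM_PER_DEGREE = 111.0  # rough length of one degree of latitude

def bounding_box(lat, lon, radius_km):
    """Approximate a square of half-width radius_km around (lat, lon).

    Not valid at the poles, where cos(lat) approaches zero.
    """
    dlat = radius_km / KM_PER_DEGREE
    dlon = radius_km / (KM_PER_DEGREE * math.cos(math.radians(lat)))
    return {
        "min_latitude": lat - dlat,
        "max_latitude": lat + dlat,
        "min_longitude": lon - dlon,
        "max_longitude": lon + dlon,
    }

box = bounding_box(0.0, 0.0, 111.0)
```

Combined with sorting by 'hot', bounds like these would restrict the results to artists located inside the box.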

Dependency

hdf5
Cython
Flask
PyTables
elasticsearch_dsl
elasticsearch
json (Python standard library)
math (Python standard library)
redis

Corpus Source

https://labrosa.ee.columbia.edu/millionsong/
https://www.musixmatch.com/

Corpus format (1000 songs from HDF5 file)

{"1": {
    "trackID": string,
    "title": (song's name) string,
    "year": int,
    "song_hotttnesss": float,
    "artistName": string,
    "artistID": string,
    "artist_hotttnesss": float,
    "artist_location": string,
    "duration": (seconds) int,
    "release": (album name) string,
    "similar_artists": a list of (artistID) string,
    "lyrics": string,
    "artist_longitude": float,
    "artist_latitude": float,
    "danceability": float
},
"2": {

},
…
}
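
A record in this format can be checked with a small validator. The sketch below is written for Python 3 and is not part of the repo; check_record and the sample values are made up for illustration, and numeric fields accept ints because JSON may store whole-valued floats without a decimal point.

```python
NUMBER = (int, float)  # JSON may store whole-valued floats as ints

EXPECTED_TYPES = {
    "trackID": str,
    "title": str,
    "year": int,
    "song_hotttnesss": NUMBER,
    "artistName": str,
    "artistID": str,
    "artist_hotttnesss": NUMBER,
    "artist_location": str,
    "duration": int,
    "release": str,
    "similar_artists": list,
    "lyrics": str,
    "artist_longitude": NUMBER,
    "artist_latitude": NUMBER,
    "danceability": NUMBER,
}

def check_record(record):
    """Return a list of problems found in one corpus record."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in record:
            problems.append("missing: " + field)
        elif not isinstance(record[field], expected):
            problems.append("wrong type: " + field)
    return problems

# Placeholder values, not real corpus data.
sample = {
    "trackID": "TR_EXAMPLE", "title": "Example Song", "year": 1999,
    "song_hotttnesss": 0.5, "artistName": "Example Artist",
    "artistID": "AR_EXAMPLE", "artist_hotttnesss": 0.4,
    "artist_location": "Nowhere", "duration": 180,
    "release": "Example Album", "similar_artists": ["AR_OTHER"],
    "lyrics": "la la la", "artist_longitude": 0.0,
    "artist_latitude": 0.0, "danceability": 0.0,
}
```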

Test Set and sample queries

We provide a test corpus, sample_corpus.json. To build the Elasticsearch index from this corpus, call the build method with the path to the sample corpus as its parameter: build("sample_corpus.json")
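
To give an idea of what a build step can do with the corpus, the sketch below turns the corpus dictionary into bulk-index actions. This is an assumption about the approach, not a copy of lib/buildElaticSearch.py; the index name music_index and the function name are hypothetical. Actions of this shape can be passed to elasticsearch.helpers.bulk, and older Elasticsearch versions may additionally expect a _type key.

```python
def bulk_actions(corpus, index_name="music_index"):
    """Yield one bulk-index action per track in the corpus dict.

    `corpus` maps document ids ("1", "2", ...) to track records,
    as in the corpus format described above.
    """
    for doc_id, track in corpus.items():
        yield {
            "_index": index_name,  # hypothetical index name
            "_id": doc_id,
            "_source": track,
        }

# Tiny placeholder corpus, not real data.
actions = list(bulk_actions({"1": {"title": "Example Song"}}))
```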

search by query

simple query: {'description': u'love'}
advanced search query: d_query = {'title': u'', 'lyric': u'', 'album': u'', 'max_longitude': u'', 'min_longitude': u'', 'description': u'love', 'max_duration': u'', 'min_duration': u'', 'artist_name': u'', 'min_latitude': u'', 'max_latitude': u'', 'year': u'', 'genre': u'', 'artist_location': u''}
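
The advanced query dictionary leaves unused fields as empty strings, so a translation step has to skip them. The sketch below shows one possible mapping to an Elasticsearch bool query; it is an assumption, not the code in lib/Search.py, and the index field names for the range filters (artist_latitude, artist_longitude) are borrowed from the corpus format.

```python
BASIC_FIELDS = ["title", "lyric", "album", "artist_name"]
TEXT_FIELDS = ["title", "lyric", "album", "artist_name",
               "artist_location", "genre"]
# Query-key suffix -> assumed index field for min_/max_ range filters.
RANGES = {
    "duration": "duration",
    "latitude": "artist_latitude",
    "longitude": "artist_longitude",
}

def advanced_query(d_query):
    """Build a bool query from d_query, ignoring empty-string fields."""
    must, filters = [], []
    if d_query.get("description"):
        must.append({"multi_match": {"query": d_query["description"],
                                     "fields": BASIC_FIELDS}})
    for field in TEXT_FIELDS:
        if d_query.get(field):
            must.append({"match": {field: d_query[field]}})
    if d_query.get("year"):
        filters.append({"term": {"year": int(d_query["year"])}})
    for suffix, field in RANGES.items():
        bounds = {}
        if d_query.get("min_" + suffix):
            bounds["gte"] = float(d_query["min_" + suffix])
        if d_query.get("max_" + suffix):
            bounds["lte"] = float(d_query["max_" + suffix])
        if bounds:
            filters.append({"range": {field: bounds}})
    if not must and not filters:
        return {"match_all": {}}
    return {"bool": {"must": must, "filter": filters}}

q = advanced_query({"description": u"love", "min_duration": u"120",
                    "max_duration": u"", "year": u"1999"})
```

Putting the numeric constraints in filter rather than must keeps them from affecting the relevance score, so they only narrow the result set.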

search by track

The parameter is the track_id
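
Searching by track corresponds to the "More like this" feature. A minimal sketch, assuming it is built on Elasticsearch's more_like_this query over title and lyric: the function name, the index name music_index, and the frequency thresholds are hypothetical, and the artist-similarity part described earlier is not covered here.

```python
def more_like_this_query(track_id, index_name="music_index"):
    """Return a more_like_this query seeded with an indexed track."""
    return {
        "more_like_this": {
            "fields": ["title", "lyric"],
            # Refer to the already-indexed document instead of raw text.
            "like": [{"_index": index_name, "_id": str(track_id)}],
            # Low thresholds so short texts still produce query terms.
            "min_term_freq": 1,
            "min_doc_freq": 1,
        }
    }

mlt = more_like_this_query(1)
```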

sort

Pass 'hot' or 'dance' as the sort parameter; the results are then sorted by song_hotttnesss or danceability, respectively.
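
The sort parameter can be mapped to an Elasticsearch sort clause as sketched below. The descending order and the relevance tie-breaker are assumptions, and sort_clause is a hypothetical helper rather than a function from lib/Search.py.

```python
SORT_FIELDS = {"hot": "song_hotttnesss", "dance": "danceability"}

def sort_clause(sort_by=None):
    """Return a sort clause for 'hot', 'dance', or default relevance."""
    field = SORT_FIELDS.get(sort_by)
    if field is None:
        return ["_score"]  # default: relevance only
    # Assumed: highest value first, relevance as tie-breaker.
    return [{field: {"order": "desc"}}, "_score"]

hot_sort = sort_clause("hot")
default_sort = sort_clause()
```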

How to search

search({'description': u'love'}, 'hot')
search(d_query, 'hot')
search_track(1, 'dance')

Modules

query.py

Main entry point and runtime app of the Music Information Retrieval System.
Integrates all the models and handles all HTTP requests.
Search Algorithm implementation.
Session management.

static folder:

CSS file and images

templates folder:

search box view
search results view

music_corpus.json:

The corpus of 10000 songs, in the format described above.

lib/Track.py

This file defines the doc_type track and its fields, as well as the analyzers.

lib/buildElaticSearch.py

This file builds the Elasticsearch index from music_corpus.json.

lib/Search.py

This file takes different types of queries as input, builds the Elasticsearch search query, runs it, and returns the results. It is responsible for searching on all fields and sorting by specific features.

getCorpus.py:

Extracts raw information from the HDF5 file ==> raw_music_corpus.json

getlyrics.py:

Gets lyrics information according to the train data.txt file ==> music_corpus.json

mxm.py:

Gets lyrics and genre information using the API of the MXM website ==> new_music_corpus.json

cleanCorpus.py:

Extracts the desired attributes from the corpus.
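
Selecting attributes amounts to projecting each record onto a list of wanted keys. The sketch below illustrates the idea only; the function names and the field list are hypothetical, and cleanCorpus.py may keep a different set of attributes.

```python
def keep_fields(track, fields):
    """Return a copy of `track` containing only the wanted fields."""
    return {name: track[name] for name in fields if name in track}

def clean_corpus(corpus, fields):
    """Apply keep_fields to every record in the corpus dict."""
    return {doc_id: keep_fields(track, fields)
            for doc_id, track in corpus.items()}

# Placeholder data and field list, for illustration only.
raw = {"1": {"title": "Example Song", "year": 1999, "unused": "x"}}
cleaned = clean_corpus(raw, ["title", "year", "lyrics"])
```

Fields that are requested but absent from a record (lyrics in this example) are simply skipped rather than filled with a default.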