MusicGo
MusicGo GitHub Repo
Author
Jiadong Yan
Jiaming Xu
Xinyi Jiang
Last Modified
May 10th 2017
Description
A specialized information retrieval system for music. Users can search for songs by keyword, and the system also supports filtering by fields such as artist, duration, and genre.
Functionality
- Basic search: matches the description against title, lyric, album and artist_name
- Advanced search: searches on every available field, including title, lyric, album, artist_name, location, duration, genres, year
- More like this: finds similar tracks, based on title, lyric and similarity of the artist
- Sorting:
  - default: sort by relevance
  - sort by song hotness
  - sort by danceability
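As a rough illustration of how basic search and the sort options could map onto an Elasticsearch request body, here is a minimal sketch. It is not the code in lib/Search.py: the helper name `build_basic_search` is hypothetical, and it assumes the index fields are named exactly as listed above (title, lyric, album, artist_name) and that the sort fields are song_hotttnesss and danceability.

```python
import json

# Fields matched by basic search, as listed above (assumed to be the index field names).
BASIC_FIELDS = ["title", "lyric", "album", "artist_name"]

# Sort options mapped to corpus fields; anything else falls back to relevance.
SORT_FIELDS = {"hot": "song_hotttnesss", "dance": "danceability"}


def build_basic_search(description, sort_by=None):
    """Hypothetical helper: build a request body for a basic search."""
    body = {
        "query": {
            "multi_match": {
                "query": description,
                "fields": BASIC_FIELDS,
            }
        }
    }
    field = SORT_FIELDS.get(sort_by)
    if field is not None:
        # Highest hotness or danceability first.
        body["sort"] = [{field: {"order": "desc"}}]
    return body


if __name__ == "__main__":
    print(json.dumps(build_basic_search(u"love", "hot"), indent=2))
```

When no sort option is given, the body carries no "sort" key, so Elasticsearch orders the hits by relevance score.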
Getting Started
- Install Homebrew, then use it to install hdf5, and install Cython:
brew tap homebrew/science
brew install hdf5
pip install Cython
- Install PyTables:
sudo pip install git+https://github.com/PyTables/PyTables
- install other packages mentioned in Dependency.
- build elasticsearch as mentioned in Build Elasticsearch
- type
redis-server
in terminal
- open another terminal window, type:
cd elasticsearch-<version>
./bin/elasticsearch
python query.py
Build Elasticsearch
cp lib/name_syn.txt [your elasticsearch path]/config/name_syn.txt
cp lib/cat_syn.txt [your elasticsearch path]/config/cat_syn.txt
(Not recommended: if you really want your web application to access a folder outside its deployment directory, you need to add a permission to the java.policy file. For details, see http://stackoverflow.com/questions/10454037/java-security-accesscontrolexception-access-denied-java-io-filepermission)
- open the Elasticsearch server:
cd elasticsearch-<version>
./bin/elasticsearch
- run
python ./lib/buildElaticSearch.py
- build time: 9s
- use another terminal to run
redis-server
Functionality
- We support basic title and lyrics search for whatever you want!
- We support many filters, like duration, artist, genre, etc!
- You can find hotttest songs near your position!!!
Dependency
- hdf5
- Cython
- Flask
- PyTables
- elasticsearch_dsl
- elasticsearch
- json
- math
- redis
Corpus Source
- https://labrosa.ee.columbia.edu/millionsong/
- https://www.musixmatch.com/
Corpus format (1000 songs from Hdf5 file)
{"1": {
    "trackID": string,
    "title": (song's name) string,
    "year": int,
    "song_hotttnesss": float,
    "artistName": string,
    "artistID": string,
    "artist_hotttnesss": float,
    "artist_location": string,
    "duration": (seconds) int,
    "release": (album name) string,
    "similar_artists": a list of (artistID) string,
    "lyrics": string,
    "artist_longitude": float,
    "artist_latitude": float,
    "danceability": float
  },
  "2": {
  },
  …
}
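To sanity-check a corpus file against the format above, something like the following could be used. This is only a sketch: `load_corpus`, `missing_fields` and `report` are hypothetical helpers, not part of the repo, and the check only looks at which keys are present, not at their types.

```python
import json

# Field names taken from the corpus format described above.
EXPECTED_FIELDS = [
    "trackID", "title", "year", "song_hotttnesss", "artistName", "artistID",
    "artist_hotttnesss", "artist_location", "duration", "release",
    "similar_artists", "lyrics", "artist_longitude", "artist_latitude",
    "danceability",
]


def load_corpus(path):
    """Read a corpus JSON file into a dict keyed by "1", "2", ..."""
    with open(path) as f:
        return json.load(f)


def missing_fields(track):
    """Return the expected field names that are absent from one track."""
    return [name for name in EXPECTED_FIELDS if name not in track]


def report(corpus):
    """Map each incomplete track key to its missing fields; complete tracks are omitted."""
    result = {}
    for key, track in corpus.items():
        missing = missing_fields(track)
        if missing:
            result[key] = missing
    return result
```

For example, `report(load_corpus("sample_corpus.json"))` should return an empty dict if every track carries all of the fields listed above.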
Test Set and sample queries
We have a test corpus, sample_corpus.json. To build the Elasticsearch index with this corpus, call the build method with the path of the sample corpus as its parameter: build("sample_corpus.json")
search by query
- simple query: {'description': u'love'}
- advanced search query: d_query = {'title': u'', 'lyric': u'', 'album': u'', 'max_longitude': u'', 'min_longitude': u'', 'description': u'love', 'max_duration': u'', 'min_duration': u'', 'artist_name': u'', 'min_latitude': u'', 'max_latitude': u'', 'year': u'', 'genre': u'', 'artist_location': u''}
search by track
The parameter is the track_id
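One way a search by track could be expressed is with Elasticsearch's more_like_this query. The sketch below is an assumption about how this might look, not the repo's implementation: the helper name, the index name "music", and the idea that the document id equals the track_id are all hypothetical, and it only covers similarity on title and lyric (not artist similarity).

```python
def build_more_like_this(track_id, index_name="music"):
    """Hypothetical helper: request body for tracks similar to an indexed track."""
    return {
        "query": {
            "more_like_this": {
                # Compare on the text fields mentioned for "More like this".
                "fields": ["title", "lyric"],
                # Reference the already indexed track (assumes doc id == track_id).
                "like": [{"_index": index_name, "_id": str(track_id)}],
                # Low thresholds so short titles still contribute terms.
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        }
    }
```

Calling `build_more_like_this(1)` produces a body that asks for documents resembling track 1.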
sort
Pass 'hot' or 'dance' as the parameter to sort the results by song_hotttnesss or danceability, respectively.
How to search
search({'description': u'love'}, 'hot')
search(d_query, 'hot')
search_track(1, 'dance')
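The min_*/max_* entries of an advanced query (duration, latitude, longitude) lend themselves to Elasticsearch range clauses. The following is a minimal sketch of that idea under some assumptions: `build_range_filters` is a hypothetical helper, empty strings are treated as "no constraint", and the index field names are assumed to be the key with the min_/max_ prefix removed.

```python
def build_range_filters(d_query):
    """Hypothetical helper: turn non-empty min_*/max_* entries into range clauses."""
    ranges = {}
    for key, value in d_query.items():
        if value == u"":
            # Empty form fields carry no constraint.
            continue
        for prefix, op in (("min_", "gte"), ("max_", "lte")):
            if key.startswith(prefix):
                field = key[len(prefix):]
                ranges.setdefault(field, {})[op] = float(value)
    # Sort by field name so the output order is stable.
    return [{"range": {field: bounds}} for field, bounds in sorted(ranges.items())]


if __name__ == "__main__":
    example = {"description": u"love", "min_duration": u"120", "max_duration": u"300"}
    print(build_range_filters(example))
```

The resulting list could be placed in the "filter" part of a bool query, so that the range limits narrow the results without affecting relevance scores.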
Modules
query.py
- Main entry point and runtime app of the Music Information Retrieval System.
- Integrates all the models and handles all HTTP requests.
- Search Algorithm implementation.
- Session management.
static folder:
css file and images
templates folder:
- search box view
- search results view
music_corpus.json:
The corpus of 10000 songs; the format is described above.
lib/Track.py
This file defines the doc_type track and its fields, as well as the analyzers.
lib/buildElaticSearch.py
This file builds the elasticsearch index from the music_corpus.json
lib/Search.py
This file takes different types of queries as input, builds the Elasticsearch search query, runs the search and returns the results. It is responsible for searching on all fields and sorting by specific features.
getCorpus.py:
get raw information from hdf5 file ==> raw_music_corpus.json
getlyrics.py:
get lyrics information according to train data.txt file ==> music_corpus.json
mxm.py:
get lyrics information and genre information using API to MXM website ==> new_music_corpus.json
cleanCorpus.py:
get desired attributes from the corpus