Case Study - freedb–database loader (music CD database)




Title

freedb–database loader (music CD database)

Development of a system to load information from over 1.6 million music CDs (including artist and track titles) into a database. Data obtained from

   
Previous System / Environment

None.
   
Requirements

In-house Extract Transform Load (ETL) training exercise

  • To extract music CD disc and track information from a compressed freedb–database format archive.
  • To store the extracted information in a database.
  • To create a program to extract information from the database in response to queries from CDDB–aware CD player programs on PCs.
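
As a rough illustration of the query requirement above, the sketch below parses a CDDB "query" command and formats the first line of a reply. It is not the project's code: the class name CddbQuery and its methods are hypothetical, the command is assumed to have been URL-decoded already, and only the exact-match (200) and no-match (202) reply codes of the CDDB protocol are covered.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for CDDB "query" commands; an illustrative sketch only. */
public class CddbQuery {
    public final String discId;
    public final List<Integer> frameOffsets;
    public final int lengthSeconds;

    private CddbQuery(String discId, List<Integer> frameOffsets, int lengthSeconds) {
        this.discId = discId;
        this.frameOffsets = frameOffsets;
        this.lengthSeconds = lengthSeconds;
    }

    /**
     * Parses a command of the form
     * "cddb query discid ntrks off1 ... offN nsecs".
     * Throws IllegalArgumentException if the command is malformed.
     */
    public static CddbQuery parse(String command) {
        String[] parts = command.trim().split("\\s+");
        if (parts.length < 5
                || !parts[0].equalsIgnoreCase("cddb")
                || !parts[1].equalsIgnoreCase("query")) {
            throw new IllegalArgumentException("Not a cddb query command: " + command);
        }
        String discId = parts[2].toLowerCase();
        int trackCount = Integer.parseInt(parts[3]);
        // Expect: 4 leading fields, one offset per track, then the disc length.
        if (trackCount < 1 || parts.length != 4 + trackCount + 1) {
            throw new IllegalArgumentException("Track count does not match offsets: " + command);
        }
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < trackCount; i++) {
            offsets.add(Integer.parseInt(parts[4 + i]));
        }
        int seconds = Integer.parseInt(parts[4 + trackCount]);
        return new CddbQuery(discId, offsets, seconds);
    }

    /** First reply line when exactly one database entry matches. */
    public static String exactMatchResponse(String category, String discId,
                                            String artist, String title) {
        return "200 " + category + " " + discId + " " + artist + " / " + title;
    }

    /** Reply line when no database entry matches. */
    public static String noMatchResponse() {
        return "202 No match found";
    }
}
```

A server built along these lines would look up the parsed disc ID in the database and answer with one of the two reply lines; the category, artist, and title values passed to exactMatchResponse would come from the stored disc record.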
   
New System Diagram
   
Technologies / Components Used
  • Software: Java programming language – core packages, JDBC
  • Software: Apache Jakarta sandbox library "commons-compress"
  • Software: MySQL Database
  • Software: Eclipse 3.0 Integrated Development Environment (IDE)
  • Hardware / Operating System: PC, Linux – Fedora Core 3
   
Implementation Summary

Background

  • Many people around the world listen to music CDs whilst using their PCs.
  • Most PC CD player programs can display information about the disc currently being played (such as the artist and track name) by using databases hosted by internet sites such as http://www.freedb.org.
  • The freedb.org website permits compressed archives of its CD databases to be downloaded.

Project details

  • We downloaded a compressed freedb–database format archive from http://www.freedb.org.
  • The archive size was 366 MB when compressed, 2.7 GB uncompressed, and 6.3 GB fully unpacked.
  • The archive contained information for over 1.6 million CDs. The information for each CD is stored in a single file, using the "freedb file" format (file format definition available from the "download / misc" section of http://www.freedb.org).
  • We created a component to extract freedb files from a compressed freedb–database format archive.
  • We created a component to transform disc and track information from freedb files ready for loading into a database.
  • We created a component to load extracted disc and track information into a database.
  • We updated the business logic of the transformation component to cope with errors that occurred when processing the downloaded archive. These errors were caused by files that did not conform to the freedb file format, for example missing keywords in file headers or header lines joined together.
  • We created a component to extract information from the database in response to queries from CDDB–aware CD player programs (server listening for CDDB Protocol requests on the HTTP port).
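
The transformation step described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the class FreedbFileParser and its Disc holder are hypothetical names, only the DISCID, DTITLE, and TTITLEn keywords are read, and the error handling (record a warning and carry on) is an assumed policy for coping with non-conforming files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical tolerant parser for a single freedb file; a sketch only. */
public class FreedbFileParser {

    /** Simple holder for the fields that would be loaded into the database. */
    public static final class Disc {
        public String discId = "";
        public String artist = "";
        public String title = "";
        public final List<String> trackTitles = new ArrayList<>();
        public final List<String> warnings = new ArrayList<>();
    }

    public static Disc parse(String fileContent) {
        Disc disc = new Disc();
        StringBuilder dtitle = new StringBuilder();
        // Sorted by track number; a keyword repeated on several lines is concatenated.
        Map<Integer, StringBuilder> tracks = new TreeMap<>();

        for (String line : fileContent.split("\\r?\\n")) {
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue; // blank lines and "#" header comments carry no keyword data
            }
            int eq = line.indexOf('=');
            if (eq <= 0) {
                // Assumed policy: note the problem and keep going rather than abort.
                disc.warnings.add("Skipped malformed line: " + line);
                continue;
            }
            String key = line.substring(0, eq);
            String value = line.substring(eq + 1);
            if (key.equals("DISCID")) {
                disc.discId = value; // may hold several comma-separated ids; kept as-is
            } else if (key.equals("DTITLE")) {
                dtitle.append(value);
            } else if (key.startsWith("TTITLE")) {
                try {
                    int n = Integer.parseInt(key.substring("TTITLE".length()));
                    tracks.computeIfAbsent(n, k -> new StringBuilder()).append(value);
                } catch (NumberFormatException e) {
                    disc.warnings.add("Bad track keyword: " + key);
                }
            }
            // Other keywords (DYEAR, DGENRE, EXTD, ...) are ignored in this sketch.
        }

        // DTITLE is conventionally "Artist / Title"; without the separator,
        // this sketch uses the whole value for both fields.
        String full = dtitle.toString();
        int sep = full.indexOf(" / ");
        if (sep >= 0) {
            disc.artist = full.substring(0, sep).trim();
            disc.title = full.substring(sep + 3).trim();
        } else {
            disc.artist = full.trim();
            disc.title = full.trim();
        }
        if (disc.discId.isEmpty()) {
            disc.warnings.add("Missing DISCID");
        }
        if (full.isEmpty()) {
            disc.warnings.add("Missing DTITLE");
        }
        // Titles are listed in track-number order; gaps in numbering are not detected.
        for (StringBuilder t : tracks.values()) {
            disc.trackTitles.add(t.toString());
        }
        return disc;
    }
}
```

Collecting warnings instead of throwing lets a bulk load continue past a bad file and report the problems afterwards, which seems a reasonable fit for an archive of this size; whether the original component worked this way is not stated.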
   
Post Installation Notes

None.

 
