Title |
freedb database loader (music CD database)
Development of a system to load information from over 1.6 million music
CDs (including artist and track titles) into a database. Data obtained
from http://www.freedb.org.
|
|
|
Previous System / Environment |
None. |
|
|
Requirements |
In-house Extract Transform Load (ETL) training exercise
- To extract music CD disc and track information from a compressed freedb database
format archive.
- To store the extracted information in a database.
- To create a program to extract information from the database in response
to queries from CDDB-aware CD player programs on PCs.
|
|
|
New System Diagram |
|
|
|
Technologies / Components Used |
- Software: Java programming language core packages, JDBC
- Software: Apache Jakarta sandbox library "commons-compress"
- Software: MySQL database
- Software: Eclipse 3.0 Integrated Development Environment (IDE)
- Hardware / Operating System: PC, Linux Fedora Core 3
|
|
|
Implementation Summary |
Background
- Many people around the world listen to music CDs whilst using their
PCs.
- Most PC CD player programs can display information about the disc
currently being played (such as the artist and track name) by using
databases hosted by internet sites such as http://www.freedb.org.
- The freedb.org website permits compressed archives of its CD databases
to be downloaded.
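CDDB-aware players typically identify a disc by an eight-digit hexadecimal disc ID derived from the track layout, and freedb files carry this value in their DISCID field. The sketch below shows the widely documented calculation. It is an illustration only: the source does not say how (or whether) this project computed disc IDs, and the class and method names (CddbDiscId, compute, digitSum) are hypothetical.

```java
/** Sketch of the classic CDDB disc ID calculation (illustrative, not the project's code). */
final class CddbDiscId {

    // Audio CDs address positions in frames; there are 75 frames per second.
    private static final int FRAMES_PER_SECOND = 75;

    /** Sums the decimal digits of n, e.g. 142 gives 1 + 4 + 2 = 7. */
    static int digitSum(int n) {
        int sum = 0;
        while (n > 0) {
            sum += n % 10;
            n /= 10;
        }
        return sum;
    }

    /**
     * @param trackOffsetsFrames start offset of each track, in frames
     * @param leadOutFrames      offset of the lead-out (end of disc), in frames
     * @return the disc ID as eight lowercase hexadecimal digits
     */
    static String compute(int[] trackOffsetsFrames, int leadOutFrames) {
        if (trackOffsetsFrames.length == 0) {
            throw new IllegalArgumentException("a disc needs at least one track");
        }
        // Checksum: digit sums of every track's start time in whole seconds.
        int checksum = 0;
        for (int offset : trackOffsetsFrames) {
            checksum += digitSum(offset / FRAMES_PER_SECOND);
        }
        // Playing time: lead-out minus start of the first track, in seconds.
        int playingSeconds = leadOutFrames / FRAMES_PER_SECOND
                - trackOffsetsFrames[0] / FRAMES_PER_SECOND;
        int id = ((checksum % 0xff) << 24) | (playingSeconds << 8) | trackOffsetsFrames.length;
        return String.format("%08x", id);
    }

    public static void main(String[] args) {
        // Made-up three-track disc used purely as a demonstration.
        System.out.println(compute(new int[] {150, 15000, 30000}, 45000));
    }
}
```

For the made-up disc in main (tracks starting at frames 150, 15000 and 30000, lead-out at frame 45000) the checksum is 8, the playing time is 598 seconds and the track count is 3, giving the ID 08025603.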
Project details
- We downloaded a compressed freedb database format archive from
http://www.freedb.org.
- The archive size was 366 MB when compressed, 2.7 GB uncompressed and 6.3 GB
when fully unpacked.
- The archive contained information for over 1.6 million CDs. The information
for each CD is stored in a single file, using the "freedb file"
format (file format definition available from the "download / misc"
section of http://www.freedb.org).
- We created a component to extract freedb files from a compressed freedb database
format archive.
- We created a component to transform disc and track information from
freedb files ready for loading into a database.
- We created a component to load extracted disc and track information
into a database.
- We updated the business logic of the transformation component to cope
with errors that occurred when processing the downloaded archive. These
errors were caused by files not conforming to the freedb file format,
for example missing keywords in file headers or header lines joined together.
- We created a component to extract information from the database in
response to queries from CDDB-aware CD player programs (a server
listening for CDDB protocol requests on the HTTP port).
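The transformation step described above, including its tolerance of non-conforming files, might look roughly like the following sketch. It assumes the usual freedb file layout (comment lines starting with "#", then KEYWORD=value lines such as DISCID, DTITLE and TTITLEn, with a repeated keyword continuing a long value). The class, field and method names (FreedbEntryParser, Disc, parse) are hypothetical, and the recovery rules shown are assumptions rather than the project's actual business logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical tolerant parser for a single freedb file (illustrative names only). */
final class FreedbEntryParser {

    /** Parsed disc data; fields keep their defaults when the file omits them. */
    static final class Disc {
        String discId;
        String artist;
        String title;
        int lengthSeconds = -1;
        final List<String> trackTitles = new ArrayList<>();
        final List<String> warnings = new ArrayList<>();
    }

    // find() searches anywhere in a comment line, so a disc length that was
    // joined onto another header line is still picked up.
    private static final Pattern DISC_LENGTH = Pattern.compile("Disc length:\\s*(\\d{1,9})");
    private static final Pattern TRACK_KEY = Pattern.compile("TTITLE(\\d{1,4})");

    static Disc parse(String fileContent) {
        Disc disc = new Disc();
        StringBuilder discTitle = new StringBuilder();
        Map<Integer, StringBuilder> tracks = new TreeMap<>();

        for (String line : fileContent.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            if (line.startsWith("#")) {
                Matcher length = DISC_LENGTH.matcher(line);
                if (length.find()) {
                    disc.lengthSeconds = Integer.parseInt(length.group(1));
                }
                continue;
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                // Record the problem and carry on instead of rejecting the file.
                disc.warnings.add("Ignored line without '=': " + line);
                continue;
            }
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1);
            Matcher track = TRACK_KEY.matcher(key);
            if (key.equals("DISCID")) {
                disc.discId = value.trim();
            } else if (key.equals("DTITLE")) {
                discTitle.append(value); // a repeated keyword continues the value
            } else if (track.matches()) {
                tracks.computeIfAbsent(Integer.valueOf(track.group(1)),
                        k -> new StringBuilder()).append(value);
            }
            // Other keywords (DYEAR, EXTD, ...) are skipped in this sketch.
        }

        String fullTitle = discTitle.toString();
        if (fullTitle.trim().isEmpty()) {
            disc.warnings.add("Missing DTITLE");
        } else {
            // DTITLE is "artist / title"; without the separator use it for both.
            int sep = fullTitle.indexOf(" / ");
            disc.artist = sep < 0 ? fullTitle.trim() : fullTitle.substring(0, sep).trim();
            disc.title = sep < 0 ? fullTitle.trim() : fullTitle.substring(sep + 3).trim();
        }
        for (StringBuilder title : tracks.values()) {
            disc.trackTitles.add(title.toString().trim());
        }
        if (disc.lengthSeconds < 0) {
            disc.warnings.add("Missing disc length header");
        }
        return disc;
    }
}
```

Collecting warnings on the Disc object, rather than throwing on the first malformed line, lets a bulk load of 1.6 million files continue and leaves the decision about rejecting a record to the loading step.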
|
|
|
Post Installation Notes |
None. |