Data Sources


Bibliographic and Source Description Metadata

The music scores are registered in a library catalog. Each catalog entry carries metadata such as bibliographic notes and identifiers, as well as source description data. Previously, the catalog was available only as LaTeX files, so efficient access to its content was not possible. The database structure for storing and retrieving the metadata follows the organization of the library catalog. Structured data are accessible through standard relational database queries, and unstructured textual data through full-text search queries. The experimental online user interface currently provides search and navigation functions.

Images of Scanned Music Scores

The historical music scores were scanned, and their digital representations were stored in the database as unstructured binary data. Two versions of each image are stored. A high-quality version is used mainly by the image processing algorithms that extract visual content characteristics. A low-quality version, a compressed copy of the image at its original size, is used mainly for user navigation and viewing. The images are inserted into the database structure so that the links between each scanned score and its catalog metadata are preserved. Currently about 100 music scores have been scanned to provide test material for storage, analysis and retrieval.
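The storage scheme described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the project's actual database system, and all table, column and function names (catalog_entry, score_image, store_page, load_image) are assumptions made for illustration, not the project's real schema.

```python
import sqlite3


def create_schema(conn):
    """Create two hypothetical tables: catalog entries and their page images."""
    # Enforce the link between an image and its catalog entry.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        """CREATE TABLE catalog_entry (
               entry_id INTEGER PRIMARY KEY,
               title    TEXT NOT NULL)"""
    )
    conn.execute(
        """CREATE TABLE score_image (
               image_id INTEGER PRIMARY KEY,
               entry_id INTEGER NOT NULL REFERENCES catalog_entry(entry_id),
               page_no  INTEGER NOT NULL,
               quality  TEXT NOT NULL CHECK (quality IN ('high', 'low')),
               data     BLOB NOT NULL,
               UNIQUE (entry_id, page_no, quality))"""
    )


def store_page(conn, entry_id, page_no, high_bytes, low_bytes):
    """Store the high-quality and the low-quality version of one scanned page."""
    conn.executemany(
        "INSERT INTO score_image (entry_id, page_no, quality, data) "
        "VALUES (?, ?, ?, ?)",
        [
            (entry_id, page_no, "high", high_bytes),
            (entry_id, page_no, "low", low_bytes),
        ],
    )


def load_image(conn, entry_id, page_no, quality):
    """Return the stored image bytes, or None if no such image exists."""
    row = conn.execute(
        "SELECT data FROM score_image "
        "WHERE entry_id = ? AND page_no = ? AND quality = ?",
        (entry_id, page_no, quality),
    ).fetchone()
    return None if row is None else bytes(row[0])
```

In this sketch a viewer would request the "low" version for navigation, while an analysis job would request the "high" version. The foreign key rejects any image that does not belong to a registered catalog entry.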

Data Import

The bibliographic and source description metadata are automatically extracted from the library catalog of music manuscripts and imported, together with the corresponding digital images, into the object-relational database management system (ORDBMS) IBM DB2 UDB V.8.1.

An XML structure serves as an intermediary format to map the LaTeX data onto the database (see the sample XML file). The mapping preserves the important formatting of the LaTeX catalog descriptions as tags, which are then recorded in the unstructured text fields of the database.
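As a rough illustration of this mapping, the following sketch reads a small XML record and splits it into structured fields and one unstructured text field that keeps its formatting tags. The element names (catalog, entry, composer, title, description, emph, ref) and the sample record are invented for this example; the project's actual XML structure may look quite different.

```python
import xml.etree.ElementTree as ET

# Invented sample record; not taken from the real catalog.
SAMPLE = """<catalog>
  <entry id="ms-001">
    <composer>Anonymous</composer>
    <title>Sonata in G</title>
    <description>Copied on <emph>oblong</emph> paper; see <ref>ms-002</ref>.</description>
  </entry>
</catalog>"""


def inner_xml(elem):
    """Serialize the content of an element, keeping nested tags as text."""
    parts = [elem.text or ""]
    for child in elem:
        # tostring() also emits the text that follows the child (its tail).
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts).strip()


def entry_to_row(entry):
    """Map one <entry> element to a flat record for a database insert."""
    description = entry.find("description")
    return {
        "entry_id": entry.get("id"),
        "composer": entry.findtext("composer", default=""),
        "title": entry.findtext("title", default=""),
        # Unstructured field: formatting tags are preserved verbatim.
        "description": "" if description is None else inner_xml(description),
    }


def parse_catalog(xml_text):
    """Return one record per <entry> found in the XML text."""
    root = ET.fromstring(xml_text)
    return [entry_to_row(entry) for entry in root.iter("entry")]
```

Calling parse_catalog(SAMPLE) yields one record whose composer and title are plain strings suited to relational queries, while the description retains its emph and ref tags for storage in a text field.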

The Feature Base data is also automatically extracted from a hierarchical representation of the features in HTML files and imported into the database. The handwriting feature vectors are specified during the analysis of the music scores using the Feature Base.


Classification of Handwritten Music Scores

Afterwards, the features of 150 music scores were extracted using the Feature Base, and the resulting feature vectors were stored in the database. Classes of similar handwriting features were generated, each representing exactly one writer. New, unknown feature vectors can now be classified by assigning them to one of these classes. Detailed information can be found under Classification of Handwriting Vectors (Writer Identification).
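The text does not state which classification method is used. Purely as an illustration of assigning an unknown feature vector to one of the writer classes, the sketch below uses a nearest-centroid rule with Euclidean distance. This technique is an assumption, as are the numeric encoding of the features, the function names and the writer labels.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_classes(labelled_vectors):
    """Reduce each writer's known feature vectors to one centroid.

    labelled_vectors maps a writer label to a list of feature vectors.
    """
    return {
        writer: centroid(vectors)
        for writer, vectors in labelled_vectors.items()
    }


def classify(vector, centroids):
    """Return the writer whose centroid is closest to the given vector."""
    return min(centroids, key=lambda writer: math.dist(vector, centroids[writer]))
```

With two hypothetical writers whose vectors cluster around different points, classify returns the label of the nearer cluster. A real system would likely also need a rejection threshold for vectors that match no known writer.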