For the eNoteHistory project, existing data mining tools proved too complex to use. We therefore implemented the data mining techniques for writer identification as UDFs and integrated them into the DB2 database. UDFs are classified by what they return: scalar functions, column functions, and table functions. We implemented a UDF called A_FV as a table function. Given a query feature vector, A_FV finds the nearest stored feature vectors whose distance to the query is below a given threshold and returns an ordered list of the corresponding writers.
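The search that A_FV performs can be illustrated with a minimal Java sketch. This is not the project's code: the class and method names (WriterSearchSketch, Entry, Match, nearestWriters) are hypothetical, the Euclidean distance is an assumed metric because the text does not name one, and sorting by ascending distance is an assumed reading of "ordered list". The DB2 table function interface is left out entirely.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a threshold-based nearest-vector search. */
final class WriterSearchSketch {

    /** A stored feature vector and the writer it belongs to. */
    static final class Entry {
        final String writer;
        final double[] features;

        Entry(String writer, double[] features) {
            this.writer = writer;
            this.features = features;
        }
    }

    /** A writer together with the distance of its vector to the query. */
    static final class Match {
        final String writer;
        final double distance;

        Match(String writer, double distance) {
            this.writer = writer;
            this.distance = distance;
        }
    }

    /** Euclidean distance; an assumption, the actual metric may differ. */
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns every stored entry whose distance to the query is strictly
     * below the threshold, sorted from nearest to farthest.
     */
    static List<Match> nearestWriters(double[] query, List<Entry> stored,
                                      double threshold) {
        List<Match> matches = new ArrayList<>();
        for (Entry e : stored) {
            double d = euclidean(query, e.features);
            if (d < threshold) {
                matches.add(new Match(e.writer, d));
            }
        }
        matches.sort(Comparator.comparingDouble((Match m) -> m.distance));
        return matches;
    }
}
```

In a table function, each Match would correspond to one returned row. A writer with several stored vectors can appear more than once in this sketch; whether A_FV merges such rows is not stated in the text.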
In this implementation, the UDFs class is used by the database, while the UDFapp class can be run from the command line. Both classes provide the UDF method A_FV with similar functionality. The algorithms and data structures were implemented in Java 1.4. All Java classes are packaged in a JAR file named enh, in the package "de.enotehistory". The connection to the database is established via the JDBC interface.
The following UML diagram shows the detailed program steps of the UDF.