# Developers guide

The developers guide summarizes how to extend the library. You do not need to read it if you only intend to use the provided functionality.

### The binding generator

While working on the library, it became increasingly difficult to keep all interfaces coherent by manual updates. This is where the binding generator comes into play. We looked at existing tools such as SWIG, but I was not satisfied with the possibilities: the overhead for exporting a highly templated interface is high, the MATLAB wrapper is not mature (or not even there), and exporting the Python documentation is not easy.

The code generator currently lives in a separate project on GitHub. Clone it into the library folder, so that the resulting folder is fertilized-forests/fertilized-devtools. It will automatically be ignored by git.

After making any change to the library interface, cd into fertilized-devtools/binding_generator and run python generate.py. This invokes the generator and produces up-to-date interfaces for all languages.

You can find a rough outline of how the program works here:

1. We use the CppHeaderParser Python module to extract class information from all header files in the fertilized folder (with some exceptions, which can be found in CodeGenerator/ParseHeader.py).
2. The parsed information still contains errors. Mechanisms in the module correct known parsing errors and enrich the parsed structure with library information.
3. The module parses the doxygen class comments to determine in which interfaces a class or method should be available, under which name, and, if it is templated, with which instantiations (for Matlab and Python).
4. Some checks are run on the class graph. One check ensures that classes used by other classes have all the necessary instantiations. Another important check ensures that if a class is serializable, all classes it uses are serializable as well.
5. The interfaces are generated: the Python and Matlab interfaces as well as the C++ Soil object.
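
The serialization check in step 4 can be pictured as a walk over the class graph. The generator itself is written in Python, so the following C++ sketch only illustrates the logic. The ClassInfo struct and the function find_serialization_violations are hypothetical names and are not part of the generator or the library.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified view of one parsed class.
struct ClassInfo {
  bool serializable;
  std::vector<std::string> used_classes;  // classes this class depends on
};

// Returns (user, used) pairs where a serializable class uses a class that is
// unknown or not serializable. An empty result means the check passed.
std::vector<std::pair<std::string, std::string>> find_serialization_violations(
    const std::map<std::string, ClassInfo> &classes) {
  std::vector<std::pair<std::string, std::string>> violations;
  for (const auto &entry : classes) {
    if (!entry.second.serializable) continue;
    for (const auto &used : entry.second.used_classes) {
      const auto it = classes.find(used);
      if (it == classes.end() || !it->second.serializable) {
        violations.emplace_back(entry.first, used);
      }
    }
  }
  return violations;
}
```

Checking only direct dependencies is enough in this sketch: every serializable dependency is itself checked when the loop reaches it, so indirect dependencies are covered as well.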

#### Classes

When writing a new class, start out by copying "objecttemplate.h" in the 'fertilized' folder to its intended location and renaming it to (all lower case) 'yourobjectname.h'. Go through the file and replace all necessary fields. They are all documented and give you a valid and easy-to-extend stub. The comments should give you a complete walkthrough for implementing your own class.

A note on abstract classes and interfaces: they must be exported in C++ and Python if any subclass is exported to these two languages or if any subclass is serializable. It does not make sense to mark them as available in Matlab, because they cannot be instantiated anyway.

#### Functions

Functions must be commented and annotated similarly to a class, with a few differences:

```cpp
/**
 * This is an example function annotation. Start describing the
 * function similar to a class. Then the annotation follows,
 * starting with 'Available in:'. Optionally use the "Exported name:"
 * line to specify a different function name in the interfaces. This
 * makes using overloaded functions possible in Python and Matlab.
 *
 * -----
 * Available in:
 * - C++
 * - Python
 * - Matlab
 * .
 * Exported name: FunctionNameInInterface
 *
 * -----
 *
 * \param Start your parameter list below the annotation section:
 * otherwise the doxygen generated documentation looks weird.
 */
```
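
To make the annotation format concrete, here is a minimal sketch of how such a comment could be read. It is not the generator's actual parser (that is Python code in the binding_generator). The Annotation struct and the functions strip_comment_prefix and parse_annotation are hypothetical, and the sketch assumes the layout of the example annotation: an 'Available in:' line followed by '- language' items, closed by a line containing only '.', and an optional 'Exported name:' line.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: the interfaces and the optional exported name.
struct Annotation {
  std::vector<std::string> available_in;
  std::string exported_name;  // empty if no "Exported name:" line was found
};

// Removes surrounding whitespace and the leading '*' of a doxygen comment line.
std::string strip_comment_prefix(const std::string &line) {
  const std::string ws = " \t\r";
  std::size_t begin = line.find_first_not_of(ws);
  if (begin == std::string::npos) return "";
  if (line[begin] == '*') {
    begin = line.find_first_not_of(ws, begin + 1);
    if (begin == std::string::npos) return "";
  }
  const std::size_t end = line.find_last_not_of(ws);
  return line.substr(begin, end - begin + 1);
}

// Collects the "- <language>" items after "Available in:" up to the closing
// "." line, and the value of an optional "Exported name:" line.
Annotation parse_annotation(const std::string &comment) {
  Annotation result;
  const std::string name_key = "Exported name:";
  std::istringstream stream(comment);
  std::string raw;
  bool in_list = false;
  while (std::getline(stream, raw)) {
    const std::string line = strip_comment_prefix(raw);
    if (line == "Available in:") {
      in_list = true;
    } else if (in_list && line == ".") {
      in_list = false;
    } else if (in_list && line.compare(0, 2, "- ") == 0) {
      result.available_in.push_back(line.substr(2));
    } else if (line.compare(0, name_key.size(), name_key) == 0) {
      result.exported_name = strip_comment_prefix(line.substr(name_key.size()));
    }
  }
  return result;
}
```

Applied to the example annotation, this sketch would yield the three interfaces C++, Python and Matlab, and the exported name FunctionNameInInterface.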

You can find many examples of this in the library, since all functions are annotated. Because the project grew organically, the code of the binding_generator is not well structured and is sometimes difficult to understand. If you run into problems with the generator that you can't figure out, please file a bug report on GitHub and we'll look into it.

To run the binding generator, simply run python generate.py in the fertilized-devtools/binding_generator subfolder.

### Write a test

All new functionality needs testing! The library provides several testing facilities that you can use: the 'tests' folder contains C++ Boost.Test tests, and the examples in the 'examples' folder are all run on a regular basis as continuous integration tests (except the MATLAB ones, because the test server has no MATLAB installation).

Adding a test for the Boost.Test project is as simple as copying the 'test_template.cpp' file in the tests folder, renaming it to your desired name and adding your test functions. Any .cpp file in the folder is automatically compiled and added to the test suite.

If you want to test the high-level functionality as exposed by the interfaces, go to examples/{LANG} and create an appropriate program to test it. Your example must terminate with value 0 if successful, and non-zero otherwise. Register your example program in examples/CMakeLists.txt.
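
The exit-code convention for a C++ example can be sketched as follows. The Check type and the run_checks function are hypothetical helpers, not part of the library. Only the convention itself (0 on success, non-zero otherwise) is taken from the text above, and the checks would call into the library in a real example.

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type: a named check returning true on success.
using Check = std::pair<std::string, std::function<bool()>>;

// Runs all checks, reports failures on stderr and returns the process exit
// status: 0 if every check passed, 1 otherwise.
int run_checks(const std::vector<Check> &checks) {
  int failures = 0;
  for (const auto &check : checks) {
    if (!check.second()) {
      std::cerr << "FAILED: " << check.first << std::endl;
      ++failures;
    }
  }
  return failures == 0 ? 0 : 1;
}

// In an actual example file, main would forward this status, for instance:
//   int main() { return run_checks(build_checks()); }
// where build_checks() is a placeholder for code that exercises the library.
```

Returning the status from main, rather than only printing a message, is what lets the continuous integration run detect a failing example.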

### An example: adding a new function to the library

In this section, I will explain how to add a new function to an existing library object. In particular, I will explain how the function compute_feature_importances was added to the library.

1. Identify where to add the function.

How general is it going to be? In the case of the compute_feature_importances function, I decided to make it available for all IDeciders, so I added it to the interface and to all implementing classes, in this case only the ThresholdDecider. If you are using the Array class, include the file ndarray.h from the fertilized directory.

The signature in this case was:

```cpp
virtual Array compute_feature_importances()
  const VIRTUAL((Array));
```

You can, but don't have to, use the macro VIRTUAL({return type}) for virtual functions, as it is done throughout the library. If the return type is a pointer, use VIRTUAL_PTR.

2. In what interfaces should it be available?

For each instantiation of the function, you can freely choose in which interfaces it should be available. I selected all interfaces for this one. The binding_generator automatically generates wrappers for all basic C++ types, std::vectors of such types, and Array<dtype, dimensions, row_major_contiguous_dimensions>s (for a more detailed explanation of the Array class, see the API documentation; the last two parameters should always be equal, indicating a row-major array, and specify the number of array dimensions). In C++ and Python, you can also return std::vectors of library objects. This is, however, not yet supported for MATLAB.

This is my doxygen comment for the function for the ThresholdDecider:

```cpp
/**
 * \brief Computes a feature importance vector.
 *
 * The vector is normalized to sum to 1.0. It contains the relative
 * frequencies of the feature occurrences. Its length is the number
 * of available features.
 *
 * -----
 * Available in:
 * - C++
 * - Python
 * - Matlab
 * .
 *
 * -----
 */
```
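
The computation described in this comment can be sketched as follows. This is not the ThresholdDecider implementation. The sketch assumes that the index of the selected feature has already been collected for every decision node, and it uses std::vector in place of the library's Array class. The function name and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Turns a list of selected feature indices into relative frequencies.
// The result has n_features entries and sums to 1.0, unless no feature
// was selected at all, in which case it contains only zeros.
std::vector<double> feature_importances_from_counts(
    const std::vector<std::size_t> &selected_features,
    std::size_t n_features) {
  std::vector<double> importances(n_features, 0.0);
  if (selected_features.empty()) return importances;  // nothing to normalize
  for (std::size_t feature : selected_features) {
    assert(feature < n_features);  // indices must refer to known features
    importances[feature] += 1.0;
  }
  const double total = static_cast<double>(selected_features.size());
  for (double &value : importances) value /= total;
  return importances;
}
```

For example, the selections {0, 2, 2, 2} over three features give the vector {0.25, 0.0, 0.75}.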

After adding the function to all necessary objects, run python generate.py and recompile. You should then be able to use the function from all interfaces.

3. Write a test.

You can add a test to the Boost.Test project, or add an example program to the examples project and use it as a test. For this example, I decided to use the Boost.Test framework: I copied test_template.cpp in the tests folder and renamed the copy to compute_feature_importances.cpp. The template provides you with a near-complete stub for any library test. Any .cpp file in the fertilized_tests folder will automatically be linked to the test project.

Pull requests are very welcome! If you have tested your functionality and find it useful for others (or you want to be able to pull in mainline functionality to keep your project up to date), create your own fork of the project on GitHub, do a git push and send a pull request via the GitHub website.