API Documentation

Base

This module contains the base class for optimizers. Its purpose is to describe the API that optimizers should follow.

class fluentopt.base.Optimizer

Optimizer base class

suggest()

Use the surrogate to suggest the next input to evaluate.

Returns: a dict, a list, or a scalar
update(x, y)

Update the surrogate used by the optimizer using a single evaluation.

Parameters: x : dict, list, or scalar
update_many(xlist, ylist)

Update the surrogate used by the optimizer using a list of evaluations.

Parameters:

xlist : list of dicts, list of lists, or list of scalars

ylist : list of scalars, the outputs corresponding to xlist
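The API above can be sketched as a minimal subclass. This is an illustration of the contract, not the fluentopt implementation; the trivial optimizer here suggests random scalars in [0, 1] and simply records evaluations.

```python
import random

class TrivialOptimizer:
    """Illustrative optimizer following the suggest/update API."""

    def __init__(self):
        self.input_history_ = []
        self.output_history_ = []

    def suggest(self):
        # Return the next input to evaluate (a scalar here).
        return random.random()

    def update(self, x, y):
        # Record a single evaluation (x, y).
        self.input_history_.append(x)
        self.output_history_.append(y)

    def update_many(self, xlist, ylist):
        # Record a batch of evaluations.
        for x, y in zip(xlist, ylist):
            self.update(x, y)

opt = TrivialOptimizer()
for _ in range(10):
    x = opt.suggest()
    opt.update(x, x ** 2)  # evaluate some objective, e.g. f(x) = x^2
```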

Optimizers

class fluentopt.random.RandomSearch(sampler, random_state=None)

a random search optimizer. This optimizer is completely random: it does not use any surrogate model. It uses a sampler, which must be a callable (e.g., a function) that takes a random number generator as input and returns a random sample. The suggest method simply calls the sampler each time.

Parameters:

sampler : callable

a callable used to sample an input for further evaluation. It takes one argument, a random number generator following the API of numpy.random, and returns a dict, a list, or a scalar.

random_state : int or None, optional

controls the random seed used by sampler.

Attributes

input_history_ : list of evaluated inputs

output_history_ : list of outputs corresponding to the evaluated inputs
update(x, y)

Update the surrogate used by the optimizer using a single evaluation.

Parameters: x : dict, list, or scalar
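A sampler, as described above, takes a numpy.random-like generator and returns one sample. The sketch below shows what such a callable might look like; the hyperparameter names are illustrative, not part of fluentopt.

```python
import numpy as np

def sampler(rng):
    # rng follows the numpy.random API (e.g. a RandomState instance).
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),
        "nb_layers": rng.randint(1, 5),
    }

# RandomSearch.suggest simply calls the sampler; an equivalent call:
rng = np.random.RandomState(42)
suggestion = sampler(rng)
```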
class fluentopt.bandit.Bandit(sampler, model=<fluentopt.transformers.Wrapper object>, nb_suggestions=100, score=<function ucb_maximize>, random_state=None)

a bandit-based optimizer which uses a surrogate to model the mapping between inputs and outputs. Each time suggest is called, a total of nb_suggestions inputs are sampled from sampler. A score is then computed for each sampled input, and the next input to evaluate is the one with the maximum score (reward).

Parameters:

sampler : callable

a callable used to sample an input for further evaluation. It takes one argument, a random number generator following the API of numpy.random, and returns a dict, a list, or a scalar.

model : scikit-learn like model instance, optional

default is fluentopt.transformers.Wrapper(GaussianProcessRegressor(normalize_y=True)). Alternatives:

  • fluentopt.transformers.Wrapper(fluentopt.utils.RandomForestRegressorWithUncertainty())
  • or use another model which supports returning uncertainty in prediction:
    fluentopt.transformers.Wrapper(your_model())
  • you can also extend or replace the Wrapper; its goal is to feed vectorized input to the wrapped model.

nb_suggestions : int, optional[default=100]

number of random samples to draw from the sampler in each call of suggest to select the next input to evaluate.

score : callable, optional[default=ucb_maximize]

score function used to select the next input to evaluate. It takes two arguments, a model and a list of inputs, and returns a list of scores. Available scores are ucb_maximize and ucb_minimize.

random_state : int or None, optional

controls the random seed used by sampler.

Attributes

input_history_ : list of evaluated inputs

output_history_ : list of outputs corresponding to the evaluated inputs
get_scores(inputs)

Use score to compute the list of scores for the given inputs.

update(x, y)

Update the surrogate used by the optimizer using a single evaluation.

Parameters: x : dict, list, or scalar
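The selection step described above can be sketched as follows. This is an illustration of the sample-score-argmax logic, not the fluentopt source; the toy sampler and score function are assumptions made for the example.

```python
import numpy as np

def suggest(sampler, score_fn, rng, nb_suggestions=100):
    # Draw nb_suggestions candidate inputs from the sampler.
    candidates = [sampler(rng) for _ in range(nb_suggestions)]
    # Score every candidate, then pick the one with the maximum score.
    scores = score_fn(candidates)
    best = int(np.argmax(scores))
    return candidates[best]

rng = np.random.RandomState(0)
# Toy setup: candidates are scalars in [0, 1]; the score is the value
# itself, so the largest sampled candidate is selected.
picked = suggest(lambda r: r.uniform(0, 1), lambda xs: xs, rng)
```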

Scores

fluentopt.bandit.ucb_maximize(model, inputs, kappa=1.96)

UCB score that can be used as the score parameter of the Bandit optimizer. Use this score if the objective is maximization. UCB scores assume that the model can return std, that is, model.predict should accept a return_std parameter. An exception will be thrown if this is not the case.

Parameters:

model : scikit-learn like estimator with return_std

inputs : numpy array

kappa : float

controls the tradeoff between exploration and exploitation (higher value = more exploration)

fluentopt.bandit.ucb_minimize(model, inputs, kappa=1.96)

UCB score that can be used as the score parameter of the Bandit optimizer. Use this score if the objective is minimization. UCB scores assume that the model can return std, that is, model.predict should accept a return_std parameter. An exception will be thrown if this is not the case.

Parameters:

model : scikit-learn like estimator with return_std

inputs : numpy array

kappa : float

controls the tradeoff between exploration and exploitation (higher value = more exploration)
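The usual UCB acquisition underlying these scores can be sketched as below. The exact fluentopt formulas are an assumption here: for maximization the score is mu + kappa * sigma, and for minimization the sign of the mean flips so that the candidate with the highest score is still the one selected.

```python
import numpy as np

def ucb_maximize_sketch(mu, sigma, kappa=1.96):
    # High mean and high uncertainty both raise the score.
    return mu + kappa * sigma

def ucb_minimize_sketch(mu, sigma, kappa=1.96):
    # Low mean and high uncertainty both raise the score.
    return -mu + kappa * sigma

mu = np.array([0.0, 1.0])
sigma = np.array([1.0, 0.1])
```

With a larger kappa, the uncertainty term dominates and exploration is favored; with kappa close to 0, the predicted mean dominates and exploitation is favored.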

Transformers

This module contains transformers that can vectorize data such as dicts and lists of varying length.

class fluentopt.transformers.Wrapper(model, transform_X=<function vectorize>, transform_y=<function Wrapper.<lambda>>)

wraps a scikit-learn like estimator model to transform inputs and outputs using transform_X and transform_y. This makes it easy to vectorize inputs before they are passed to the model.

Parameters:

model : scikit-learn like estimator instance to wrap

transform_X : callable

used to transform the inputs before passing them to fit and predict

transform_y : callable

used to transform the outputs before passing them to fit
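A minimal sketch of the wrapper idea (not the fluentopt source): transform the inputs before delegating fit and predict to the wrapped model. The EchoModel and the dict-to-array transform below are toy stand-ins for a real estimator and for fluentopt's vectorize.

```python
import numpy as np

class WrapperSketch:
    def __init__(self, model, transform_X, transform_y=lambda y: y):
        self.model = model
        self.transform_X = transform_X
        self.transform_y = transform_y

    def fit(self, X, y):
        # Vectorize inputs (and optionally outputs) before fitting.
        return self.model.fit(self.transform_X(X), self.transform_y(y))

    def predict(self, X, **kwargs):
        return self.model.predict(self.transform_X(X), **kwargs)

class EchoModel:
    """Toy estimator: predicts the row sums of its (vectorized) input."""
    def fit(self, X, y):
        return self
    def predict(self, X):
        return X.sum(axis=1)

# Dicts are vectorized into a 2D array before reaching the model.
to_array = lambda dicts: np.array([[d["a"], d["b"]] for d in dicts], dtype=float)
wrapped = WrapperSketch(EchoModel(), transform_X=to_array)
wrapped.fit([{"a": 1, "b": 2}], [3.0])
preds = wrapped.predict([{"a": 1, "b": 2}, {"a": 0, "b": 5}])
```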

fluentopt.transformers.vectorize(X)
vectorizes X depending on its type:
  • if it is a list of dicts, use vectorize_list_of_dicts.
  • if it is a list of lists of varying length across examples, use vectorize_list_of_varying_length_lists.
  • if it is a list of fixed length lists or list of scalars, just convert to numpy array.
Parameters: X : a list of dicts, a list of varying-length lists, a list of fixed-length lists, or a list of scalars
Returns: 2D numpy array
fluentopt.transformers.vectorize_list_of_dicts(dlist)

vectorize a list of dicts. All columns are considered; missing columns in a row are filled with np.nan.

Parameters:dlist : list of dicts
Returns:2D numpy array

Utils

This module contains utility functions used by other modules, mostly validation functions that check the validity of the parameters passed to a function or class.

fluentopt.utils.check_sampler(sampler)

check whether sampler is a callable

fluentopt.utils.flatten_dict(D)

converts a nested dict D into a flattened version d using a recursive algorithm:

  • start with an empty dict d
  • iterate through the keys and values of D:
    • if the current value is a dict, update d with the flattened version of the value by calling flatten_dict on it
    • if the current value is a list or a tuple, add each element under a key of the form key_i, where i is the index of the element and the value is the element at index i
    • otherwise, copy the key and value from D into d
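The recursive algorithm above can be sketched as follows. This follows the description literally (note that nested keys lose the parent prefix, since the flattened sub-dict is merged in directly); the real fluentopt.utils.flatten_dict may differ in details such as key naming.

```python
def flatten_dict_sketch(D):
    d = {}
    for key, value in D.items():
        if isinstance(value, dict):
            # Flatten the sub-dict and merge it into d.
            d.update(flatten_dict_sketch(value))
        elif isinstance(value, (list, tuple)):
            # Each element gets a key of the form key_i.
            for i, v in enumerate(value):
                d["{}_{}".format(key, i)] = v
        else:
            # Plain value: copy as-is.
            d[key] = value
    return d

flat = flatten_dict_sketch({"a": 1, "b": {"c": 2}, "d": [3, 4]})
```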
fluentopt.utils.dict_vectorizer(dlist, colnames, missing=nan)

Converts a list of dicts into a numpy array. The i-th dimension of the vector corresponds to colnames[i]. If a column does not exist in a dict from dlist, it takes the value defined by missing.

Parameters:

dlist : list of dicts

colnames : list of strings

list of columns to use. The order of the columns in the resulting numpy array corresponds to the order in colnames.

missing : scalar

the value to use for missing columns.

Returns:

2D numpy array
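A sketch of the documented behavior (not the fluentopt source): column i of the output corresponds to colnames[i], and absent keys take the value of missing.

```python
import numpy as np

def dict_vectorizer_sketch(dlist, colnames, missing=np.nan):
    # One row per dict, one column per name in colnames.
    return np.array(
        [[d.get(c, missing) for c in colnames] for d in dlist],
        dtype=float,
    )

arr = dict_vectorizer_sketch([{"a": 1.0, "b": 2.0}, {"a": 3.0}], ["a", "b"])
```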

class fluentopt.utils.RandomForestRegressorWithUncertainty(n_estimators=10, criterion='mse', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_split=1e-07, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False)

an extension of RandomForestRegressor with support for returning uncertainty: it predicts with each tree and computes the std of the per-tree predicted values.
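The uncertainty computation described above can be sketched as: predict with each tree, then take the mean and standard deviation across trees. The per-tree prediction arrays below are toy data standing in for a fitted forest's trees.

```python
import numpy as np

def predict_with_std(per_tree_preds):
    # per_tree_preds has shape (n_trees, n_samples): one prediction
    # vector per tree in the forest.
    preds = np.asarray(per_tree_preds, dtype=float)
    # Mean across trees is the usual forest prediction; std across
    # trees serves as the uncertainty estimate.
    return preds.mean(axis=0), preds.std(axis=0)

# Three "trees", two samples: all trees agree on the first sample,
# and disagree on the second.
mean, std = predict_with_std([[1.0, 2.0], [1.0, 4.0], [1.0, 6.0]])
```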

apply(X)

Apply trees in the forest to X, return leaf indices.

Parameters:

X : array-like or sparse matrix, shape = [n_samples, n_features]

The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.

Returns:

X_leaves : array_like, shape = [n_samples, n_estimators]

For each datapoint x in X and for each tree in the forest, return the index of the leaf x ends up in.

decision_path(X)

Return the decision path in the forest

New in version 0.18.

Parameters:

X : array-like or sparse matrix, shape = [n_samples, n_features]

The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.

Returns:

indicator : sparse csr array, shape = [n_samples, n_nodes]

Return a node indicator matrix where non-zero elements indicate that the sample goes through the corresponding node.

n_nodes_ptr : array of size (n_estimators + 1, )

The columns from indicator[n_nodes_ptr[i]:n_nodes_ptr[i+1]] gives the indicator value for the i-th estimator.

feature_importances_

Return the feature importances (the higher, the more important the feature).

Returns: feature_importances_ : array, shape = [n_features]
fit(X, y, sample_weight=None)

Build a forest of trees from the training set (X, y).

Parameters:

X : array-like or sparse matrix of shape = [n_samples, n_features]

The training input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csc_matrix.

y : array-like, shape = [n_samples] or [n_samples, n_outputs]

The target values (class labels in classification, real numbers in regression).

sample_weight : array-like, shape = [n_samples] or None

Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. In the case of classification, splits are also ignored if they would result in any single class carrying a negative weight in either child node.

Returns:

self : object

Returns self.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:

X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.

score(X, y, sample_weight=None)

Returns the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). Best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.

Parameters:

X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True values for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:

score : float

R^2 of self.predict(X) wrt. y.
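The R^2 definition above can be written out directly; a minimal sketch (unweighted, single-output, not the scikit-learn implementation):

```python
import numpy as np

def r2_score_sketch(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    u = ((y_true - y_pred) ** 2).sum()          # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
    return 1.0 - u / v

# A constant model predicting the mean of y gets a score of 0.0.
score = r2_score_sketch([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```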

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:self
transform(*args, **kwargs)

DEPRECATED: Support to use estimators as feature selectors will be removed in version 0.19. Use SelectFromModel instead.

Reduce X to its most important features.

Uses coef_ or feature_importances_ to determine the most important features. For models with a coef_ for each class, the absolute sum over the classes is used.
Parameters:

X : array or scipy sparse matrix of shape [n_samples, n_features]

The input samples.

threshold : string, float or None, optional (default=None)

The threshold value to use for feature selection. Features whose importance is greater or equal are kept while the others are discarded. If “median” (resp. “mean”), then the threshold value is the median (resp. the mean) of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. If None and if available, the object attribute threshold is used. Otherwise, “mean” is used by default.

Returns:

X_r : array of shape [n_samples, n_selected_features]

The input samples with only the selected features.