Day: January 18, 2021

  • RIVER

    RIVER

    River is a Python library for online machine learning. It is the result of a merger between creme and scikit-multiflow. River’s ambition is to be the go-to library for doing machine learning on streaming data.

    Paper

    Github

    As a quick example, we’ll train a logistic regression to classify the website phishing dataset. Here’s a look at the first observation in the dataset.

    >>> from pprint import pprint
    >>> from river import datasets
    
    >>> dataset = datasets.Phishing()
    
    >>> for x, y in dataset:
    ...     pprint(x)
    ...     print(y)
    ...     break
    {'age_of_domain': 1,
     'anchor_from_other_domain': 0.0,
     'empty_server_form_handler': 0.0,
     'https': 0.0,
     'ip_in_url': 1,
     'is_popular': 0.5,
     'long_url': 1.0,
     'popup_window': 0.0,
     'request_from_other_domain': 0.0}
    True

    Now let’s run the model on the dataset in a streaming fashion. We sequentially interleave predictions and model updates. Meanwhile, we update a performance metric to see how well the model is doing.

    >>> from river import compose
    >>> from river import linear_model
    >>> from river import metrics
    >>> from river import preprocessing
    
    >>> model = compose.Pipeline(
    ...     preprocessing.StandardScaler(),
    ...     linear_model.LogisticRegression()
    ... )
    
    >>> metric = metrics.Accuracy()
    
    >>> for x, y in dataset:
    ...     y_pred = model.predict_one(x)      # make a prediction
    ...     metric = metric.update(y, y_pred)  # update the metric
    ...     model = model.learn_one(x, y)      # make the model learn
    
    >>> metric
    Accuracy: 89.20%