Integrating your ML model with PyBroker

Evaluating model signals with a professional backtesting tool.

Apr 23, 2023

Introduction

Are you tired of using traditional trading strategies that don't always deliver consistent results? If so, you're not alone. Many traders are turning to machine learning signals to improve their trading performance. In this blog post, I'll describe how I integrated PyBroker with machine learning signals to achieve realistic backtesting performance. By the end of this post, you'll have a good understanding of how you can use PyBroker and machine learning signals to achieve similar results.

Background

Trading is a challenging task that requires a deep understanding of market dynamics and the ability to make quick decisions under pressure. Traditional trading strategies typically rely on technical and fundamental analysis to predict market trends. While these strategies can be effective in certain situations, they often fail to deliver consistent results over the long term.

Machine learning signals, on the other hand, use algorithms to analyze large datasets and identify patterns that can be used to make trading decisions. Machine learning signals can help traders to identify profitable trades more accurately and with greater speed than traditional methods. Common machine learning techniques used in trading include regression, classification, and clustering.

PyBroker is a Python package for thoroughly backtesting trading strategies. According to the official documentation, its key features are:

A super-fast backtesting engine built in NumPy and accelerated with Numba.
The ability to create and execute trading rules and models across multiple instruments with ease.
Access to historical data from Alpaca and Yahoo Finance, or from your own data provider.
The option to train and backtest models using Walkforward Analysis, which simulates how the strategy would perform during actual trading.
More reliable trading metrics that use randomized bootstrapping to provide more accurate results.
Caching of downloaded data, indicators, and models to speed up your development process.
Parallelized computations that enable faster performance.

Integration with Machine Learning Signals

I integrated PyBroker with machine learning signals to achieve a better understanding of my trading performance based on the generated signals from the ML model. To do this, I used historical data to train machine learning models to predict market trends. I then used the predictions generated by the models to backtest my strategy through PyBroker. The integration required careful data preprocessing and feature engineering to ensure that the machine learning models could make accurate predictions.

One of the main challenges I faced was ensuring that the predictions generated by the machine learning models were accurate and reliable. I had to fine-tune the models and test them extensively before using them in a live trading environment.

I will not really cover the topic of fine-tuning here, as it is sufficiently extensive to become a new Substack entry. For now, let’s just go ahead and integrate your predictions with PyBroker. At this point I assume you already have a set of predictions from your model and are just curious how to integrate them with PyBroker.

According to the official PyBroker documentation, your final data should look as follows:

where pred (or whatever name you choose) stores your model’s predictions. The example data frame is a slice of the original, which holds about 700 stocks per week. With this data frame you can use PyBroker to test any strategy that relies on the values of pred. Note: everything below must be performed on a holdout set!
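As a rough illustration of that layout, here is a minimal sketch that builds such a frame with pandas. It assumes the column names used later in this post (date, symbol, close, pred) plus the usual open, high and low columns; the tickers, prices and prediction values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical example: two placeholder tickers, three weekly bars each.
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-06", periods=3, freq="W-FRI")

rows = []
for symbol in ["AAA", "BBB"]:  # placeholder tickers, not real data
    close = 100 + rng.normal(0, 1, size=len(dates)).cumsum()
    for d, c in zip(dates, close):
        rows.append({
            "date": d,
            "symbol": symbol,
            "open": c - 0.5,
            "high": c + 1.0,
            "low": c - 1.0,
            "close": c,
            "pred": rng.uniform(0, 1),  # stand-in for a model score
        })

# One row per (date, symbol) pair, with the extra pred column.
data_backtest = (
    pd.DataFrame(rows)
    .sort_values(["date", "symbol"])
    .reset_index(drop=True)
)
print(data_backtest)
```

In practice you would replace the synthetic rows with your own holdout-set prices and model output.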

To do so, we first need to set up the strategy function:

from pybroker import StrategyConfig, Strategy
import pybroker

def pybroker_strategy(ctx):
    if not ctx.long_pos():
        if ctx.pred[-1] >= strategy_params['pred_threshold']:
            ctx.buy_shares = 100
            # ctx.hold_bars = None
            ctx.buy_fill_price = pybroker.common.PriceType.OPEN
    else:
        if ctx.pred[-1] < strategy_params['pred_threshold']:
            ctx.sell_fill_price = pybroker.common.PriceType.OPEN
            ctx.sell_all_shares()

where strategy_params['pred_threshold'] is the prediction threshold you have either chosen or optimized, e.g., 0.5. pybroker_strategy is a long-only strategy: you buy an asset when the prediction is at or above the threshold and sell when it falls below. I trade at the day’s open price, so ctx.buy_fill_price and ctx.sell_fill_price are set to pybroker.common.PriceType.OPEN (by default, the algo picks a middle price). ctx.buy_shares = 100 is just an arbitrary placeholder needed for the next step; if you want to trade a fixed x shares, skip the next paragraph and set ctx.buy_shares = x.
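To make the entry and exit rule easier to reason about (and to unit test), the same logic can be sketched as a plain function without PyBroker. The helper name decide and the sample predictions are hypothetical, and the sketch ignores order delays and fills.

```python
def decide(pred: float, in_long: bool, threshold: float) -> str:
    """Return 'buy', 'sell', or 'hold' for one bar of one symbol."""
    if not in_long:
        return "buy" if pred >= threshold else "hold"
    return "sell" if pred < threshold else "hold"


threshold = 0.5
in_long = False
actions = []
for pred in [0.3, 0.6, 0.7, 0.4, 0.2]:  # made-up prediction series
    action = decide(pred, in_long, threshold)
    actions.append(action)
    # Track the position state the same way the strategy would.
    if action == "buy":
        in_long = True
    elif action == "sell":
        in_long = False

print(actions)  # ['hold', 'buy', 'hold', 'sell', 'hold']
```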

Optional paragraph: another critical point is the weights at which the selected assets are traded. PyBroker does not really show how to do this properly, so grab the function below, which lets you choose between mean and softmax weights:

import numpy as np
from scipy.special import softmax

def pos_size_handler(ctx):
    # Collect this bar's buy signals; nothing to size if there are none.
    signals = tuple(ctx.signals("buy"))
    if not signals:
        return
    if strategy_params['allocation_weights'] == 'softmax':
        # Weight each signal by the softmax of its latest prediction.
        weights = softmax([s.bar_data.pred[-1] for s in signals])
    else:
        # Equal weight for every signal.
        weights = np.full(len(signals), 1 / len(signals))
    for signal, weight in zip(signals, weights):
        dollar_weight = np.round(weight * float(ctx.total_equity), 2)
        shares = float(dollar_weight / signal.bar_data.close[-1])
        ctx.set_shares(signal, shares)

The function above sets the number of shares from the portfolio’s total equity and the chosen weights (in strategy_params['allocation_weights']). Softmax weights follow the signal (pred) strength and allocate a higher weight to a higher value; mean weights are the classic equal-weight allocation.
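To see how the two weighting schemes differ, here is a small NumPy-only sketch. The predictions, prices and equity are made-up numbers, and softmax_weights is a hypothetical helper that should be equivalent to scipy.special.softmax for a 1-D input.

```python
import numpy as np


def softmax_weights(preds):
    """Softmax over a 1-D sequence of predictions (numerically stable)."""
    z = np.asarray(preds, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()


preds = [0.55, 0.70, 0.90]              # made-up model scores
prices = np.array([50.0, 20.0, 125.0])  # made-up latest closes
equity = 100_000.0                      # made-up total equity

w_soft = softmax_weights(preds)
w_mean = np.full(len(preds), 1 / len(preds))

# Dollar allocation divided by price gives (fractional) share counts.
shares_soft = w_soft * equity / prices
shares_mean = w_mean * equity / prices

print("softmax weights:", np.round(w_soft, 4))
print("mean weights:   ", np.round(w_mean, 4))
print("softmax shares: ", np.round(shares_soft, 2))
print("mean shares:    ", np.round(shares_mean, 2))
```

One thing worth noticing: if pred lies in [0, 1], two softmax weights can differ by a factor of at most e (about 2.72), so the tilt towards stronger signals is fairly mild. Scaling the predictions before the softmax would sharpen it, but that is a design choice outside this post.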

Let’s backtest pybroker_strategy now. To begin with, we must register the pred column so that PyBroker recognizes it. This is where a number of other backtesting packages fall short: they only work with predefined columns, and the moment you want to add something new, even ChatGPT won’t help you (I spent two days with ChatGPT trying to figure it out in Backtrader, for instance; we both failed). Here, all you have to do is run this line:

pybroker.register_columns('pred')

and you are done. Next, just run the following line for caching data:

pybroker.enable_data_source_cache('my_neat_strategy_cache')

and begin configuring your strategy. The code below is just an example of how I have done it:

config = StrategyConfig(
    max_long_positions=strategy_params['number_stocks_to_hold'],
    fee_amount=costs,
    fee_mode=pybroker.common.FeeMode.ORDER_PERCENT,
    buy_delay=1,
    sell_delay=1,
    exit_on_last_bar=True,
    initial_cash=init_cash,
    enable_fractional_shares=True,
    bootstrap_sample_size=bootstrap,
)

where:

max_long_positions=strategy_params['number_stocks_to_hold'] is the number of stocks you want to hold (can also be None).
fee_amount=costs is your percentage cost per trade, e.g., 0.002 (so 20 basis points).
fee_mode=pybroker.common.FeeMode.ORDER_PERCENT ensures the costs are calculated as a percentage of the order (and not as some static amount, for instance).
buy_delay=1, sell_delay=1 are the number of periods after which the trade is executed (so, if you have a signal today and trade at the open, you will trade at tomorrow’s open).
exit_on_last_bar=True closes any open trade at the last available bar to include its performance in the backtesting results.
initial_cash=init_cash is your initial wealth.
enable_fractional_shares=True allows buying fractional shares.
bootstrap_sample_size=bootstrap is the size of the bootstrap sample over which a bootstrapped Sharpe ratio is calculated (see here to study why a bootstrapped SR is recommended).

As a rule of thumb, I just picked about 30% of the average number of bars per stock, as follows:

bootstrap = int(data_backtest.groupby('symbol').apply(len).mean() * 0.3)

where data_backtest is the final data mentioned at the beginning of this section.
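For a feel of what this rule of thumb produces, here is a tiny sketch on a made-up frame with three placeholder symbols of different history lengths. It uses groupby(...).size(), which should give the same counts as apply(len).

```python
import pandas as pd

# Hypothetical frame: only the symbol column matters for this calculation.
data_backtest = pd.DataFrame({
    "symbol": ["AAA"] * 101 + ["BBB"] * 80 + ["CCC"] * 120,
})

# Number of bars per symbol, then 30% of the average, truncated to an int.
bars_per_symbol = data_backtest.groupby("symbol").size()
bootstrap = int(bars_per_symbol.mean() * 0.3)

print(bars_per_symbol.to_dict())
print(bootstrap)
```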

Next, let’s prepare the Strategy class:

# Use min/max so the range is correct even if the frame is not sorted by date.
start_date = data_backtest['date'].min().strftime("%Y-%m-%d")
end_date = data_backtest['date'].max().strftime("%Y-%m-%d")
symbols = data_backtest['symbol'].unique().tolist()
strategy = Strategy(data_backtest, start_date, end_date, config=config)

add pybroker_strategy and pos_size_handler:

strategy.add_execution(pybroker_strategy, symbols)
strategy.set_pos_size_handler(pos_size_handler)

and finally run the strategy!

result = strategy.backtest(calc_bootstrap=True)

After the run is done, fetch the important data as follows:

result_df = result.metrics_df
trades = result.trades
conf_intervals = result.bootstrap.conf_intervals

and analyze the results as you please. For instance, in the case of the strategy above, the conf_intervals look as below:

Thanks to the bootstrapped analysis, we can see that we can be only 90% confident that the SR is positive; once the confidence level is raised to 95% or 97.5%, we can no longer be so certain. Generally, you should look at the lower bound of the interval to judge whether the strategy is reliable: both a negative SR and a profit factor below 1 suggest that it may not be.
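If you want intuition for where such intervals come from, below is a minimal sketch of a plain percentile bootstrap for a per-bar Sharpe ratio. This is a simplification for illustration and should not be read as PyBroker’s own implementation; the function names, the synthetic returns and the number of resamples are all assumptions.

```python
import numpy as np


def sharpe(returns):
    """Per-bar Sharpe ratio (no annualization, zero risk-free rate)."""
    sd = returns.std(ddof=1)
    return returns.mean() / sd if sd > 0 else 0.0


def bootstrap_sharpe_ci(returns, sample_size, n_boot=2000,
                        levels=(0.90, 0.95, 0.975), seed=0):
    """Percentile-bootstrap confidence intervals for the Sharpe ratio."""
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns, dtype=float)
    stats = np.empty(n_boot)
    for i in range(n_boot):
        # Resample with replacement and recompute the statistic.
        sample = rng.choice(returns, size=sample_size, replace=True)
        stats[i] = sharpe(sample)
    intervals = {}
    for level in levels:
        alpha = 1 - level
        intervals[level] = (
            float(np.quantile(stats, alpha / 2)),
            float(np.quantile(stats, 1 - alpha / 2)),
        )
    return intervals


# Synthetic per-bar returns standing in for a strategy's return series.
rng = np.random.default_rng(42)
returns = rng.normal(0.001, 0.02, size=250)

intervals = bootstrap_sharpe_ci(returns, sample_size=75)
for level, (lower, upper) in intervals.items():
    print(f"{level:.1%}: lower={lower:.4f}, upper={upper:.4f}")
```

The intervals widen as the confidence level rises, which is why a lower bound can be positive at 90% and turn negative at 95% or 97.5%.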

See the full code used in this section:

import pandas as pd
import numpy as np
from pybroker import StrategyConfig, Strategy
import pybroker
from scipy.special import softmax


def pybroker_strategy(ctx):
    if not ctx.long_pos():
        if ctx.pred[-1] >= strategy_params['pred_threshold']:
            ctx.buy_shares = 100
            # ctx.hold_bars = None
            ctx.buy_fill_price = pybroker.common.PriceType.OPEN
    else:
        if ctx.pred[-1] < strategy_params['pred_threshold']:
            ctx.sell_fill_price = pybroker.common.PriceType.OPEN
            ctx.sell_all_shares()


def pos_size_handler(ctx):
    # Collect this bar's buy signals; nothing to size if there are none.
    signals = tuple(ctx.signals("buy"))
    if not signals:
        return
    if strategy_params['allocation_weights'] == 'softmax':
        # Weight each signal by the softmax of its latest prediction.
        weights = softmax([s.bar_data.pred[-1] for s in signals])
    else:
        # Equal weight for every signal.
        weights = np.full(len(signals), 1 / len(signals))
    for signal, weight in zip(signals, weights):
        dollar_weight = np.round(weight * float(ctx.total_equity), 2)
        shares = float(dollar_weight / signal.bar_data.close[-1])
        ctx.set_shares(signal, shares)


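# Example parameters; these are placeholder values, replace them with your own.
strategy_params = {
    'pred_threshold': 0.5,
    'allocation_weights': 'softmax',  # any other value falls back to mean weights
    'number_stocks_to_hold': 10,
}
costs = 0.002        # cost per trade, as described above
init_cash = 100_000  # initial wealth

data_backtest = pd.read_parquet('your_final_data.parquet')

bootstrap = int(data_backtest.groupby('symbol').apply(len).mean() * 0.3)
pybroker.register_columns('pred')
pybroker.enable_data_source_cache('ranking_and_pos_sizing_test')

config = StrategyConfig(
    max_long_positions=strategy_params['number_stocks_to_hold'],
    fee_amount=costs,
    fee_mode=pybroker.common.FeeMode.ORDER_PERCENT,
    buy_delay=1,
    sell_delay=1,
    exit_on_last_bar=True,
    initial_cash=init_cash,
    enable_fractional_shares=True,
    bootstrap_sample_size=bootstrap,
)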

# Use min/max so the range is correct even if the frame is not sorted by date.
start_date = data_backtest['date'].min().strftime("%Y-%m-%d")
end_date = data_backtest['date'].max().strftime("%Y-%m-%d")
symbols = data_backtest['symbol'].unique().tolist()

strategy = Strategy(data_backtest, start_date, end_date, config=config)
strategy.add_execution(pybroker_strategy, symbols)
strategy.set_pos_size_handler(pos_size_handler)
result = strategy.backtest(calc_bootstrap=True)

result_df = result.metrics_df
trades = result.trades
conf_intervals = result.bootstrap.conf_intervals

Conclusion

Integrating PyBroker with machine learning signals is an effective way to improve trading performance by thoroughly backtesting the strength of your signals. PyBroker lets you neatly integrate machine learning predictions into a backtest, which ultimately delivers robust and realistic statistics on the strategy of your choice.

Call to Action

Feel free to leave a comment or ask a question if you have any doubts. Don’t forget to subscribe to stay on top of similar posts. Happy trading!

Piotr’s Substack
