Expert Interview (Part 2): Tobi Bosede on Using Machine Learning to Predict Trading Volumes
At the Strata Data Conference in New York City in the fall, Paige Roberts of Syncsort had a chance to sit down with Tobi Bosede (@AniTobiB), Sr. Machine Learning Engineer.
In the first part of this blog, Bosede spoke on what goes into being a Machine Learning Engineer as well as some of the projects she is currently involved with. In this part, Bosede will dive into predicting trade volumes and the correlation between volume and volatility.
Roberts: Can you tell us something about what your presentation was on?
Bosede: It was on my master’s thesis, which involved predicting trade volumes and is unrelated to my job. Well, predicting trade volumes was one goal, because regulators are interested in that kind of thing, and traders, of course, as well. Predicting trade volumes is useful in trade strategy, especially when you have all of this algorithmic trading going on. As a trader, you don’t want to move the market, which basically means you don’t want to change the price too much. You want to trade small quantities, and knowing exactly what quantity would be a good amount to trade has to come from someone, or from having that predicted knowledge.
The second part of my talk was about the relationship between trade volume and price volatility. There’s literature, and some past research, that shows that trade volume is correlated with price volatility, and it makes sense intuitively. If the price is very volatile, there is a higher likelihood of people trading more because, essentially, it’s more risky, right? And so, with the risk, it’s also a higher-reward situation.
Roberts: And then basically you get high volume trading which causes high volatility; high volatility causes high volume trading.
Bosede: I’m more confident that the first part of your statement is true than the second. In statistics, there is always the danger of assuming that correlation implies causation. I want to avoid making that mistake here. However, there is a synergy between volume and volatility, and there’s an equilibrium. Essentially, a lot of the analysis that I did validates not just past research but also ideas about how market agents, which are essentially hedgers and speculators, behave, especially based on information.
The reason I mentioned regulators is they’re also concerned about price volatility, so they might implement new regulations based on concerns about particular types of financial instruments and the impact they have on the economy. I’m specifically talking about regulators of futures markets, because those are the trades that I was looking at, so they might restrict trading activity if it alters the price too much. That’s why we’re interested in the relationship there.
I used what’s called penalized spline regression, which is a type of generalized additive model, or GAM, as the methodology on Spark for that analysis. I used a lot of data for the analysis: hundreds of millions of rows.
Spark was really useful in helping me to transform the data in the way that I needed for my statistical analysis or my visualizations. Spline regression is a little-known methodology, but a spline is a real-world tool in the sense that draftsmen and ship builders use really thin pieces of metal for drawing because they are very flexible. In the same way, mathematically, we use what are called splines, which are basis functions, to fit curves. We use them to try to understand what function is underlying the data. It’s not ordinary linear regression, because we are using these splines to understand non-linear relationships that are unknown.
I don’t know if that was clear but basically the idea is that in a traditional linear model, all of our predictors are linear, whereas with the spline you can add nonlinear predictors to your model. And how do you figure out what that non-linear transformation is? You use splines.
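To make the idea of penalized spline regression concrete, here is a minimal sketch in plain Python. It is not the Spark-based GAM from the thesis: it assumes a truncated linear basis (a hinge function at each knot) with a ridge-style penalty on the knot coefficients, which is one common textbook form of a penalized spline. The function names, the toy data, the knot location, and the penalty values are all illustrative assumptions.

```python
# Sketch of penalized spline regression with a truncated linear basis.
# Assumptions: hand-picked knots, ridge penalty on knot coefficients only,
# toy data. This is illustrative, not the thesis implementation.

def basis_row(x, knots):
    """Basis functions for one x: intercept, x, and a hinge (x - k)+ per knot."""
    return [1.0, x] + [max(0.0, x - k) for k in knots]

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_penalized_spline(xs, ys, knots, lam):
    """Minimize squared error plus lam * (sum of squared knot coefficients)."""
    rows = [basis_row(x, knots) for x in xs]
    p = len(rows[0])
    # Normal equations: (B'B + lam * D) beta = B'y
    btb = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    bty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    for i in range(2, p):  # D leaves the intercept and slope unpenalized
        btb[i][i] += lam
    return solve(btb, bty)

def predict(beta, x, knots):
    return sum(b * v for b, v in zip(beta, basis_row(x, knots)))

# Toy non-linear relationship: a V shape with its kink at x = 0.5.
xs = [i / 20 for i in range(21)]
ys = [abs(x - 0.5) for x in xs]
knots = [0.5]

exact = fit_penalized_spline(xs, ys, knots, lam=0.0)   # no penalty: follows the kink
smooth = fit_penalized_spline(xs, ys, knots, lam=1e9)  # heavy penalty: nearly a straight line
```

With no penalty the hinge term is free to bend the fit at the knot, so the curve follows the V shape. With a very large penalty the knot coefficient is shrunk toward zero and the fit collapses toward an ordinary straight line, which is the trade-off the penalty controls.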
Roberts: Do you ever have a feedback loop? If you predict volatility and then feed that back into the system, do you then affect the trade volume?
Bosede: That’s not actually what I did, but you could do that. The only time you would want to use volatility as a predictor of trade volume is if there was a correlation.
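One simple way to check for such a correlation before adding volatility as a predictor is the Pearson correlation coefficient. The sketch below is only an illustration: the `pearson` helper and the five data points are hypothetical and are not taken from the thesis data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical values: price volatility and trade volume per period.
volatility = [0.5, 1.0, 1.5, 2.0, 2.5]
volume = [120, 150, 210, 260, 300]

r = pearson(volatility, volume)  # close to 1 means a strong positive linear relationship
```

A value near 1 or -1 suggests volatility carries linear information about volume. It says nothing about which one causes the other.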
Roberts: So, what did you use to predict?
Bosede: In the data I had, there were fields like price, which was multiplied by 100 so it’s essentially a percent, the actual maturity, and the date the trade took place. I created some fields like time, day of week, and time to maturity. But in terms of the number of fields, it wasn’t super high dimensional. The volatility I’m referring to, price volatility, is actually a derived field: I took the standard deviation across different hours and across different days to show how volatile trade prices were.
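The derived volatility field described above can be sketched as a group-by followed by a standard deviation. This is a small standard-library illustration, not the Spark job used for the thesis. The trade records, the `hourly_volatility` name, and the choice of sample standard deviation are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import stdev

# Hypothetical trades as (timestamp, price) pairs.
trades = [
    (datetime(2017, 9, 25, 9, 5), 100.0),
    (datetime(2017, 9, 25, 9, 40), 102.0),
    (datetime(2017, 9, 25, 10, 15), 101.0),
    (datetime(2017, 9, 25, 10, 50), 101.0),
    (datetime(2017, 9, 26, 9, 10), 99.0),
    (datetime(2017, 9, 26, 9, 30), 103.0),
]

def hourly_volatility(trades):
    """Sample standard deviation of price for each (date, hour) bucket."""
    buckets = defaultdict(list)
    for ts, price in trades:
        buckets[(ts.date(), ts.hour)].append(price)
    # stdev needs at least two observations, so single-trade buckets are skipped.
    return {key: stdev(prices) for key, prices in buckets.items() if len(prices) >= 2}

vol = hourly_volatility(trades)

# Other derived fields mentioned in the interview, such as day of week,
# come straight from the timestamp (Monday is 0).
day_of_week = [ts.weekday() for ts, _ in trades]
```

Grouping by `ts.date()` alone instead of `(ts.date(), ts.hour)` would give a daily rather than hourly volatility figure.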
Roberts: Okay, that makes sense.
Tune in for the final installment when Bosede goes into what it’s like as a woman and person of color in the tech world.
For more information on successfully leveraging machine learning, check out our white paper: Why Data Quality Is Essential for AI and Machine Learning Success