We've all been there: you want a well-behaved AI, but it just won't predict accurately. You may be tempted to use a cost function that disciplines your network in proportion to its errors, but did you know that this may not be the best approach?
Okay, actual AI time. Last night I was investigating network outputs, because that's what I do with my time apparently, and I noticed an interesting detail in the data. For a few days now the networks have been oscillating quite a bit between 37% and 55% directional accuracy. Now, ignoring the fact that 37% accuracy is odd (it means the network is actually quite a good predictor of what the stock isn't going to do), I wanted to take a closer look at why this was happening. The program should theoretically detect this loss of accuracy and make immediate changes to the input weights to put the network back on track, but it kept coming back to this poor accuracy. Obviously, that's no good. So I took a look at these 37% graphs and found that while, on average, they were not very good predictors, they had a real skill at predicting large changes. This is very cool, but still not super helpful for someone actually using the algorithm. Even if the network is accurate for those high-profit or high-loss cases, that doesn't mean somebody looking at the graphs will know whether today's prediction is finally going to be an accurate one or if it's another bad one.
This made me realize a very interesting aspect of AI: training truly does need to be specifically tailored to the AI. For stock market data, most of the days for the stock are pretty boring, +1%, -2% and so on, but every once in a while you'll see something far bigger, like a +10%, and with the MSE (mean-squared error) cost function we use, that means the AI will be punished far more harshly for missing a big change than a small one. This creates a disparity between a "well-trained" network and an "accurate" network. It's a little bit like (as the name of this post suggests) using the cane on a child as a punishment for big transgressions. That will definitely communicate to the child that what they did was bad (or at least had a bad consequence for them), but it doesn't necessarily mean they will learn their lesson for the smaller transgressions. For the record, I do not condone using the cane or any form of physical violence against children; I'm just using it as a metaphor for AI. But much like the child, the network learns to predict those big changes accurately because when it doesn't, the associated cost is large. For the smaller day-to-day changes of a stock, though, the network really doesn't care all that much because those errors have far less of an impact.
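To see just how lopsided this gets, here's a rough sketch in plain NumPy (illustrative only, with made-up returns, not the actual training code) of how a single missed +10% day swamps the MSE over a handful of ordinary days:

```python
import numpy as np

# Hypothetical daily returns (targets) and the network's guesses.
# Most days are small moves; one day is a +10% jump the network misses.
targets     = np.array([0.01, -0.02, 0.005, -0.01, 0.10])
predictions = np.array([0.00,  0.00, 0.000,  0.00, 0.00])

squared_errors = (predictions - targets) ** 2
mse = squared_errors.mean()

# The +10% day alone accounts for roughly 94% of the total squared error,
# so the gradient pressure coming from the small-move days is almost nothing.
print(squared_errors / squared_errors.sum())
print("MSE:", mse)
```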
So how do you deal with this? Do you give your AI a lecture (my parents' personal favorite)? Take away its phone? So far I have thought up three different methods: z scores, logarithms, and categorical data. These are probably methods that apply more to AI than to parenting, but hey, I'm not a parent, so what do I know.
Z Scores:
Z scores are simply the number of standard deviations a data point is from the mean; in other words, they tell you how far a data point sits from everything else. In the code I use, I already compute the z scores for everything so that I can dump bad, glitchy data, because my data provider is interesting... Simply put, you can narrow the acceptable z score window so that on days that are +10%, the network just straight up doesn't see them. This technically works, but it means that on days where the stock does go nuts, your AI won't know what to do, and you'll also have to make sure that as those days come and go, you keep filtering them out.
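As an illustrative sketch (not the real pipeline, and the z_max cutoff of 2 is just an assumption), the filtering could look something like this:

```python
import numpy as np

def filter_by_zscore(returns, z_max=2.0):
    """Drop days whose return is more than z_max standard deviations
    from the mean, so the network never sees the wild +10% days."""
    returns = np.asarray(returns)
    z = (returns - returns.mean()) / returns.std()
    return returns[np.abs(z) <= z_max]

daily_returns = np.array([0.01, -0.02, 0.005, -0.01, 0.10, 0.02])
print(filter_by_zscore(daily_returns))  # in this toy data the 10% day falls outside the window
```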
Logarithms:
Logarithms are awesome. They never cease to be helpful, and in this case, if you pass your data through a logarithmic (or exponential) function, you can basically stretch it. With the right function you can make little changes proportionally more important and big changes proportionally less important, which means the network now has to care about everything instead of just the wild changes. This method is certainly the best combination of reliability and ease of implementation.
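Here's a rough sketch of one transform in that spirit, a signed log squash of the daily returns (the SCALE value is an assumption for illustration, not something from the actual code):

```python
import numpy as np

SCALE = 0.01  # treat a 1% move as the "unit" move; an assumption, tune to taste

def squash(returns, scale=SCALE):
    """Signed log transform: keeps the direction of the move but compresses
    the magnitude, so +10% no longer dwarfs +1% in the cost function."""
    returns = np.asarray(returns)
    return np.sign(returns) * np.log1p(np.abs(returns) / scale)

def unsquash(values, scale=SCALE):
    """Inverse transform to map network outputs back into plain returns."""
    values = np.asarray(values)
    return np.sign(values) * np.expm1(np.abs(values)) * scale

moves = np.array([0.01, 0.02, 0.10])
print(squash(moves))            # the 10x gap in raw returns shrinks to roughly 3.5x
print(unsquash(squash(moves)))  # round-trips back to the original returns
```

The nice part of an invertible squash like this is that you can still read the network's output back out as an ordinary percentage move.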
Categorical Data:
The third option is to actually change the data itself so that the network output is different: instead of a single output unit, you'd use two units, where each cell represents the likelihood of a movement up or a movement down. This approach is an interesting one, and I have experimented with it loosely in the past, but it sacrifices a lot of information by reducing the data down to such a simple model. That can make predictions worse, but to be honest, it's still so "out there" for me that I don't know much about it.
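As a loose sketch of what that two-unit setup might look like (illustrative NumPy only, not the network code itself):

```python
import numpy as np

def to_categorical_target(returns):
    """Convert a signed daily return into a 2-unit target:
    [1, 0] for a move up, [0, 1] for a move down (or flat)."""
    returns = np.asarray(returns)
    up = (returns > 0).astype(float)
    return np.stack([up, 1.0 - up], axis=-1)

def softmax(logits):
    """Turn the network's raw 2-unit output into up/down likelihoods."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

returns = np.array([0.01, -0.02, 0.10])
print(to_categorical_target(returns))
# Note how the +10% day gets the exact same target as the +1% day,
# which is precisely the information that gets sacrificed.

raw_output = np.array([1.2, -0.4])  # hypothetical 2-unit network output
print(softmax(raw_output))          # roughly [0.83, 0.17] up vs. down likelihood
```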
Overall, I have opted to go with the logarithm model because it maintains all the data while taking extra care to highlight the smaller changes and not just the big ones. We shall see how this goes and whether it can give prediction accuracy a boost!