In keeping with my fetish for blogging about my groceries: I went grocery shopping yesterday and was rather alarmed by a relatively large receipt. It was more than twice what I usually spend per week, and the food I bought won’t even last me a week. I went through the receipt and found no errors. Then I started questioning myself, to see whether what I know about my average weekly grocery spending was correct. I keep close tabs on my spending, so I tabulated it. This is the result (and my line of enquiry):
I pay for my groceries using only EFTPOS, never cash. I only shop at Coles, and I shop almost every week. These facts make things very convenient for me (like allowing me to pull very accurate data off the bank’s site). So here’s what my weekly grocery spending (for the past 100 times I went to Coles) looks like as a graph:

Not much there, is there? Nothing discernible. It’s too messy. (Addendum: read the graph from right to left. My EViews screwed up a little, so time is arranged backwards.)
Now let’s look at it from the point of view of distributions:

The bar on the far right? That’s Tuesday’s shopping.
So, the next thing to ask is: what affects my shopping habits (and, by extension, my spending)? Me being me, the first thing that jumped to mind was an endogenous answer (well, I would have jumped to an exogenous answer if I had exogenous data). So one of the first things I did was a general trend check with an HP (Hodrick-Prescott) filter:

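For anyone without EViews, the trend line can be approximated with a few lines of plain Python. This is only a minimal sketch of the standard Hodrick-Prescott formulation (the trend solves (I + λD'D)τ = y, with D the second-difference matrix). The function name `hp_trend`, the smoothing value `lam=1600`, and the sample numbers are all assumptions for illustration; the post does not say which λ EViews used, and the numbers are not the real receipts.

```python
def hp_trend(y, lam=1600.0):
    """Hodrick-Prescott trend: solve (I + lam * D'D) tau = y.

    D is the (n-2) x n second-difference matrix. lam is an assumed
    smoothing value; the post does not state the one actually used.
    """
    n = len(y)
    if n < 3:
        return [float(v) for v in y]
    # Dense n x n system; perfectly fine for ~100 observations.
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0
    for r in range(n - 2):
        # Row r of D has entries (1, -2, 1) in columns r, r+1, r+2.
        d = ((r, 1.0), (r + 1, -2.0), (r + 2, 1.0))
        for i, di in d:
            for j, dj in d:
                a[i][j] += lam * di * dj
    # Gaussian elimination. The matrix is symmetric positive definite,
    # so elimination without pivoting is safe here.
    b = [float(v) for v in y]
    for k in range(n):
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            if f != 0.0:
                for j in range(k, n):
                    a[i][j] -= f * a[k][j]
                b[i] -= f * b[k]
    # Back substitution.
    tau = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * tau[j] for j in range(i + 1, n))
        tau[i] = s / a[i][i]
    return tau


# Made-up spending figures, NOT the data from this post.
spent = [30.0, 42.5, 18.9, 55.0, 33.2, 27.8, 61.4, 35.0]
trend = hp_trend(spent)
cycle = [s - t for s, t in zip(spent, trend)]  # what is left after detrending
```

A larger `lam` gives a smoother trend; as `lam` grows the trend approaches a straight line, so any seasonal (winter) bump only shows up if `lam` is small enough to let the trend bend.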
Who’s willing to bet that I spend more during winter? Anyway, after that I ran a regression on lagged spending, since I had a nagging feeling that one week’s purchases affect the next week’s. This was the result:
| Dependent Variable: SPENT | ||||
| Method: Least Squares | ||||
| Date: 07/29/09 Time: 09:39 | ||||
| Sample (adjusted): 6 100 | ||||
| Included observations: 95 after adjustments | ||||
| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
| C | 29.14656 | 7.661098 | 3.804489 | 0.0003 |
| SPENT(-5) | 0.041743 | 0.104764 | 0.398449 | 0.6913 |
| SPENT(-4) | -0.065367 | 0.109715 | -0.595787 | 0.5528 |
| SPENT(-3) | -0.014150 | 0.109858 | -0.128799 | 0.8978 |
| SPENT(-2) | 0.235955 | 0.106122 | 2.223438 | 0.0287 |
| SPENT(-1) | -0.014903 | 0.106144 | -0.140400 | 0.8887 |
| R-squared | 0.054675 | Mean dependent var | 35.66537 | |
| Adjusted R-squared | 0.001566 | S.D. dependent var | 17.70590 | |
| S.E. of regression | 17.69203 | Akaike info criterion | 8.645180 | |
| Sum squared resid | 27857.69 | Schwarz criterion | 8.806478 | |
| Log likelihood | -404.6461 | Hannan-Quinn criter. | 8.710357 | |
| F-statistic | 1.029493 | Durbin-Watson stat | 2.006994 | |
| Prob(F-statistic) | 0.405294 | | | |
Nothing terribly interesting, except that yesterday’s shopping was apparently affected by the shopping two trips earlier (roughly a fortnight ago) to a statistically significant degree (p = 0.0287, i.e. significant at the 5% level, or about 97% confidence). When a Wald coefficient test was done, SPENT(-2) was also shown to be significant. But the important thing is the R² value. The adjusted R² of 0.001566 means the lagged values explain essentially none of the variation in spending. Here’s what the scatter plot looks like:

Wow, looks pretty random, no? It even has random clusters. Hah, looks like I don’t have to use Yuzoz after all (btw, Yuzoz shut down).
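Going back to the regression table for a moment: its summary rows can be cross-checked from the reported R², the number of observations (95) and the number of lagged regressors (5). The sketch below uses the textbook formulas for adjusted R² and the overall F-statistic; the function names are made up for illustration, and the inputs are the rounded figures printed in the table, so the results only agree with EViews to a few decimal places.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (plus a constant)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F-statistic for the null that all k slope coefficients are zero."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Figures copied from the regression output above.
r2, n, k = 0.054675, 95, 5

print("adjusted R2:", adjusted_r2(r2, n, k))
print("F-statistic:", f_statistic(r2, n, k))

# A t-statistic is just the coefficient divided by its standard error,
# shown here for SPENT(-2).
print("t for SPENT(-2):", 0.235955 / 0.106122)
```

The F-statistic tests all five lags jointly, which is why its p-value (0.405) can be unimpressive even though one individual lag clears the 5% bar.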
But wait… let’s not get ahead of ourselves. Run a unit root test to see if anything turns up. So I ran an Augmented Dickey-Fuller test, and here be the results:
| Null Hypothesis: SPENT has a unit root | ||||
| Exogenous: Constant, Linear Trend | ||||
| Lag Length: 0 (Automatic based on SIC, MAXLAG=12) | ||||
| | | t-Statistic | Prob.* | |
| Augmented Dickey-Fuller test statistic | | -10.33767 | 0.0000 | |
| Test critical values: | 1% level | -4.053392 | | |
| | 5% level | -3.455842 | | |
| | 10% level | -3.153710 | | |
| *MacKinnon (1996) one-sided p-values. | ||||
| Augmented Dickey-Fuller Test Equation | ||||
| Dependent Variable: D(SPENT) | ||||
| Method: Least Squares | ||||
| Date: 07/29/09 Time: 10:04 | ||||
| Sample (adjusted): 2 100 | ||||
| Included observations: 99 after adjustments | ||||
| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
| SPENT(-1) | -1.006491 | 0.097362 | -10.33767 | 0.0000 |
| C | 33.94476 | 4.986012 | 6.807998 | 0.0000 |
| @TREND(1) | 0.043238 | 0.061618 | 0.701714 | 0.4846 |
| R-squared | 0.527568 | Mean dependent var | -0.478586 | |
| Adjusted R-squared | 0.517726 | S.D. dependent var | 25.22728 | |
| S.E. of regression | 17.51933 | Akaike info criterion | 8.594321 | |
| Sum squared resid | 29464.99 | Schwarz criterion | 8.672961 | |
| Log likelihood | -422.4189 | Hannan-Quinn criter. | 8.626139 | |
| F-statistic | 53.60193 | Durbin-Watson stat | 2.031067 | |
| Prob(F-statistic) | 0.000000 | | | |
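The decision rule behind this table is simple enough to write out. The sketch below only re-uses numbers printed in the output above; the function names are hypothetical. It also reads off one extra quantity: because the test equation regresses D(SPENT) on SPENT(-1), a coefficient of γ on SPENT(-1) corresponds to an AR(1) coefficient of 1 + γ on the level of the series.

```python
def adf_rejects_unit_root(stat, critical_values):
    """Reject the unit-root null at each level where stat is below the critical value."""
    return {level: stat < cv for level, cv in critical_values.items()}


def implied_ar1(gamma):
    """D(y_t) = c + gamma * y_(t-1) + e_t  is the same as  y_t = c + (1 + gamma) * y_(t-1) + e_t."""
    return 1.0 + gamma


gamma, std_error = -1.006491, 0.097362  # SPENT(-1) row of the test equation
critical = {"1%": -4.053392, "5%": -3.455842, "10%": -3.153710}

adf_stat = gamma / std_error  # the ADF statistic is this t-ratio
print("ADF statistic:", adf_stat)
print("reject unit root:", adf_rejects_unit_root(adf_stat, critical))
print("implied AR(1) coefficient:", implied_ar1(gamma))
```

With γ this close to -1, the implied AR(1) coefficient is very close to zero, which on its face suggests a series that forgets last week almost entirely rather than one that leans on it.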
Aaaand be set up for disappointment. The Dickey-Fuller test statistic was -10.34, far below even the 1% critical value of -4.05, which in short means no, there is no unit root. The series is stationary, and it should have some form of autocorrelation (i.e. it’s not random). And yet, when a proper autocorrelation test is done:
| Breusch-Godfrey Serial Correlation LM Test: | ||||
| F-statistic | 0.132627 | Prob. F(2,94) | 0.8760 | |
| Obs*R-squared | 0.275764 | Prob. Chi-Square(2) | 0.8712 | |
| Test Equation: | ||||
| Dependent Variable: RESID | ||||
| Method: Least Squares | ||||
| Date: 07/29/09 Time: 10:19 | ||||
| Sample: 3 100 | ||||
| Included observations: 98 | ||||
| Presample missing value lagged residuals set to zero. | ||||
| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
| C | 4.379644 | 9.744755 | 0.449436 | 0.6542 |
| SPENT(-2) | -0.120337 | 0.263558 | -0.456589 | 0.6490 |
| RESID(-1) | -0.016946 | 0.103238 | -0.164147 | 0.8700 |
| RESID(-2) | 0.138355 | 0.282935 | 0.489001 | 0.6260 |
| R-squared | 0.002814 | Mean dependent var | 1.16E-15 | |
| Adjusted R-squared | -0.029011 | S.D. dependent var | 17.00190 | |
| S.E. of regression | 17.24676 | Akaike info criterion | 8.573086 | |
| Sum squared resid | 27960.37 | Schwarz criterion | 8.678595 | |
| Log likelihood | -416.0812 | Hannan-Quinn criter. | 8.615762 | |
| F-statistic | 0.088418 | Durbin-Watson stat | 1.991467 | |
| Prob(F-statistic) | 0.966240 | | | |
Hah! It is not autocorrelated! And look at the Durbin-Watson statistic: if it’s around 1.3 or below, that is a strong indication of positive autocorrelation. If it’s around 2, there is no first-order autocorrelation, and if it’s well above 2, the residuals are negatively autocorrelated.
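The Durbin-Watson rule of thumb follows from how the statistic is built: it is the sum of squared changes in the residuals divided by the sum of squared residuals, which works out to roughly 2(1 - r), where r is the lag-1 autocorrelation of the residuals. A minimal sketch in plain Python, with made-up function names and toy residuals rather than the real ones:

```python
def durbin_watson(resid):
    """DW = sum((e_t - e_(t-1))^2) / sum(e_t^2); lies between 0 and 4."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


def implied_lag1_autocorr(dw):
    """Invert the approximation DW ~ 2 * (1 - r)."""
    return 1.0 - dw / 2.0


# Toy residuals (not the post's data) showing the two extremes.
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # residuals never change sign: DW near 0
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # residuals flip every step: DW well above 2

# Applying the approximation to the value EViews reported for the regression.
print(implied_lag1_autocorr(2.006994))
```

Note that Durbin-Watson only looks at lag 1, which is one reason the Breusch-Godfrey test (here with two lags) is usually run alongside it.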
Not to be defeated, I tried a novel way of checking whether my grocery spending was random: subjecting it to a randomness test, i.e. the Diehard tests. That failed. Diehard didn’t even generate a proper report. I tried Ent instead, and this was the result:

I have no idea why Ent showed a different result for the serial correlation test. If I am not mistaken, Ent uses the LM test as well. Maybe the variables are different, but Ent determined that my grocery spending wasn’t random either (the easiest test to read and interpret is the chi-squared test). I am also guessing that 600 KB might be too small a file for Ent to determine whether it’s random or not.
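It may help to see what a byte-level chi-squared test actually measures. Ent works on the bytes of the file, comparing how often each of the 256 possible byte values occurs against a uniform expectation. The sketch below is an illustration of that idea, not Ent’s actual code, and the function name is made up. The post does not say how the spending figures were written to the file; the second example assumes plain text purely to show what would happen in that case.

```python
from collections import Counter


def byte_chi_square(data: bytes) -> float:
    """Chi-squared statistic of byte frequencies against a uniform distribution."""
    if not data:
        raise ValueError("need at least one byte")
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


# Every byte value equally often: statistic is exactly zero.
print(byte_chi_square(bytes(range(256)) * 4))

# ASSUMPTION for illustration: amounts stored as text such as "35.67".
# Only digits, '.' and newline ever appear, so most byte values have a
# count of zero and the statistic is enormous, whatever the amounts are.
print(byte_chi_square(b"35.67\n" * 1000))
```

In other words, a test like this judges the file’s encoding as much as the numbers in it, so a “not random” verdict from it says little about the spending series itself.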
So, to recap:
- No autocorrelation shown by 2 tests – this usually implies randomness
- No unit root – this usually implies there is some form of autocorrelation
- Randomness tests show that it’s not random.
- Lingam would say: Looks random, feels random, doesn’t mean it is. (Damn, you’ve gotta love the way lawyers can twist the truth)
My thoughts on this: I think there is something I am missing. It could be that my knowledge of statistics and econometrics is sorely lacking, or it could be that my understanding of randomness is sorely lacking. If it’s the latter, O great Benoit Mandelbrot, comment on this blog and tell me where I have gone wrong. If Mandelbrot doesn’t comment on blogs, then his protégé Taleb would be welcome as well ;P (Actually Taleb would probably slaughter me for using regression on this one. Ah, but it’s one of the few tools I actually learnt, and given that I am pretty much mathematically hopeless, I could have interpreted the data totally wrongly).
Of course, there is also the off chance that this might actually be a new form of time series (I know for a fact that my groceries are test data for some other model I had in mind), as Aaron suggested (quite sure that was in jest though), though that would be highly unlikely. Or is it some new kind of randomness? Could it be because I misunderstood the fact that Not Autocorrelated = Random?
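On that last question, a small experiment shows that “not autocorrelated” and “random” are not the same thing. The logistic map x → 4x(1 - x) is completely deterministic, yet its lag-1 autocorrelation is essentially zero. This is a standard textbook example and has nothing to do with the grocery data; the function names and the starting value 0.3 are arbitrary choices for illustration.

```python
def autocorr(x, lag=1):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cov / var


def logistic_series(n, x0=0.3):
    """n steps of the deterministic map x -> 4x(1 - x), starting from x0."""
    xs, x = [], x0
    for _ in range(n):
        x = 4.0 * x * (1.0 - x)
        xs.append(x)
    return xs


series = logistic_series(2000)
print("lag-1 autocorrelation:", autocorr(series, 1))

# Yet every value is an exact function of the previous one.
print(all(series[t + 1] == 4.0 * series[t] * (1.0 - series[t])
          for t in range(len(series) - 1)))
```

Autocorrelation tests such as Durbin-Watson and Breusch-Godfrey only look for linear dependence on past values, so a series can pass them and still be predictable, and the reverse confusion is just as easy: having no unit root says nothing about whether autocorrelation is present.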
Questions questions. Maybe you should answer me.
