42

Antworten zum Universum

July 29th, 2009 at 10:57 am

A New Kind of Random?

In keeping with my fetish for blogging about my groceries: I went grocery shopping yesterday and was rather alarmed by a relatively large receipt. It was more than twice what I usually spend per week, and the food I bought won’t even last me a week. I went through the receipt and found no errors. Then I started questioning myself, to see whether what I thought I knew about my average weekly grocery spending was correct. I keep close tabs on my spending, so I tabulated it. This is the result (and my line of enquiry):

I pay for my groceries using only EFTPOS, never cash. I only shop at Coles, and I shop almost every week. These facts have made things very convenient for me (like allowing me to pull very accurate data off the bank’s site). So here’s what my weekly grocery spending (for the past 100 times I went to Coles) looks like in graphs:

Groceries Line Graph

Not much there, is there? Nothing discernible. It’s too messy. (Addendum: Read the graph from right to left. My EViews screwed up a little, so time runs backwards.)

Now let’s look at it from the point of view of distributions:

Distro of Groceries

The bar on the far right? That’s Tuesday’s shopping.

So, the next thing to ask is: what affects my shopping habits (and, by extension, my spending)? Me being me, the first thing that jumps to mind is an endogenous answer (well, I would have jumped to an exogenous answer if I had exogenous data). So one of the first things I did was a general trend check with an HP filter:

Groceries spending, trended with a HP Filter
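For anyone who wants to try the same trend check without EViews, here is a minimal sketch of a Hodrick-Prescott filter in plain Python. It is not what EViews runs internally, only the textbook definition: the trend solves (I + lam * D'D) trend = y, where D takes second differences. The function name `hp_filter`, the default `lam=1600` (the post does not say which smoothing value was used) and the sample numbers are all assumptions for illustration.

```python
def hp_filter(y, lam=1600.0):
    """Return the Hodrick-Prescott trend of y (a list of numbers).

    Solves (I + lam * D'D) trend = y, where D takes second differences.
    lam=1600 is only a placeholder; the post does not state the value used.
    """
    n = len(y)
    if n < 3:
        return [float(v) for v in y]
    # Build A = I + lam * D'D as a dense n x n matrix (fine for ~100 points).
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    weights = (1.0, -2.0, 1.0)
    for r in range(n - 2):
        cols = (r, r + 1, r + 2)
        for ci, wi in zip(cols, weights):
            for cj, wj in zip(cols, weights):
                a[ci][cj] += lam * wi * wj
    # Solve A x = y by Gaussian elimination with partial pivoting.
    b = [float(v) for v in y]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            if f != 0.0:
                for j in range(k, n):
                    a[i][j] -= f * a[k][j]
                b[i] -= f * b[k]
    # Back-substitution.
    trend = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * trend[j] for j in range(i + 1, n))
        trend[i] = s / a[i][i]
    return trend


# Made-up numbers purely for illustration, NOT the real grocery data.
spent = [30.0, 42.5, 28.0, 55.0, 33.0, 61.0, 29.5, 47.0]
print([round(v, 2) for v in hp_filter(spent)])
```

A larger `lam` gives a straighter trend line and a smaller one follows the data more closely, so the "more in winter" impression depends partly on that choice.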

Who’s willing to bet that I spend more during winter? Anyway, after that I ran a regression, since I had a nagging feeling that one week’s purchases affect the next week’s. This was the result:

Dependent Variable: SPENT
Method: Least Squares
Date: 07/29/09 Time: 09:39
Sample (adjusted): 6 100
Included observations: 95 after adjustments

Variable      Coefficient    Std. Error    t-Statistic    Prob.
C                29.14656      7.661098       3.804489    0.0003
SPENT(-5)        0.041743      0.104764       0.398449    0.6913
SPENT(-4)       -0.065367      0.109715      -0.595787    0.5528
SPENT(-3)       -0.014150      0.109858      -0.128799    0.8978
SPENT(-2)        0.235955      0.106122       2.223438    0.0287
SPENT(-1)       -0.014903      0.106144      -0.140400    0.8887

R-squared              0.054675    Mean dependent var       35.66537
Adjusted R-squared     0.001566    S.D. dependent var       17.70590
S.E. of regression     17.69203    Akaike info criterion    8.645180
Sum squared resid      27857.69    Schwarz criterion        8.806478
Log likelihood        -404.6461    Hannan-Quinn criter.     8.710357
F-statistic            1.029493    Durbin-Watson stat       2.006994
Prob(F-statistic)      0.405294

Nothing terribly interesting, except that a given week’s shopping appears to be affected by the shopping from a fortnight ago: SPENT(-2) is statistically significant at the 5% level (p = 0.0287). When a Wald coefficient test was done, SPENT(-2) was also shown to be significant. But the important thing is the R² value. The adjusted R² of 0.001566 means the lagged values explain practically none of the variation in spending, and the F-statistic (p = 0.405) says the lags are not jointly significant. Here’s what the scatter plot looks like:

Groceries scatter plot

Wow, looks pretty random, no? It even has random clusters. Hah, looks like I don’t have to use Yuzoz after all (btw, Yuzoz shut down :( ).
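As a sanity check on the regression output further up, the headline figures can be recomputed from the other numbers in the same table. The sketch below uses the standard textbook formulas for the t-statistic, adjusted R² and F-statistic; the function names are my own, and the inputs are copied from the table (95 observations, 5 lagged regressors plus a constant).

```python
def t_statistic(coefficient, std_error):
    """t-statistic for a single coefficient."""
    return coefficient / std_error


def adjusted_r_squared(r2, n_obs, n_regressors):
    """Adjusted R-squared; n_regressors excludes the constant."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


def f_statistic(r2, n_obs, n_regressors):
    """F-statistic for the joint significance of all regressors."""
    return (r2 / n_regressors) / ((1.0 - r2) / (n_obs - n_regressors - 1))


# Figures copied from the regression output in the post.
r2, n_obs, k = 0.054675, 95, 5
print(t_statistic(0.235955, 0.106122))     # SPENT(-2) row
print(adjusted_r_squared(r2, n_obs, k))
print(f_statistic(r2, n_obs, k))
```

The three printed values should land very close to the reported 2.223438, 0.001566 and 1.029493, with small differences coming from the rounding of the inputs. The adjusted R² is so much lower than the raw R² because five lags are being charged against only 95 observations.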

But wait… let’s not get ahead of ourselves. Run a unit root test to see if anything turns up. So, I ran an Augmented Dickey-Fuller test and here be the results:

Null Hypothesis: SPENT has a unit root
Exogenous: Constant, Linear Trend
Lag Length: 0 (Automatic based on SIC, MAXLAG=12)

                                           t-Statistic    Prob.*
Augmented Dickey-Fuller test statistic       -10.33767    0.0000
Test critical values:    1% level            -4.053392
                         5% level            -3.455842
                         10% level           -3.153710

*MacKinnon (1996) one-sided p-values.

Augmented Dickey-Fuller Test Equation
Dependent Variable: D(SPENT)
Method: Least Squares
Date: 07/29/09 Time: 10:04
Sample (adjusted): 2 100
Included observations: 99 after adjustments

Variable      Coefficient    Std. Error    t-Statistic    Prob.
SPENT(-1)       -1.006491      0.097362      -10.33767    0.0000
C                33.94476      4.986012       6.807998    0.0000
@TREND(1)        0.043238      0.061618       0.701714    0.4846

R-squared              0.527568    Mean dependent var      -0.478586
Adjusted R-squared     0.517726    S.D. dependent var       25.22728
S.E. of regression     17.51933    Akaike info criterion    8.594321
Sum squared resid      29464.99    Schwarz criterion        8.672961
Log likelihood        -422.4189    Hannan-Quinn criter.     8.626139
F-statistic            53.60193    Durbin-Watson stat       2.031067
Prob(F-statistic)      0.000000

Aaaand be set up for disappointment. The Dickey-Fuller test statistic was -10.3, far below even the 1% critical value of -4.05, which in short means no, there is no unit root. The series is stationary, and as I understand it that should mean it has some form of autocorrelation (i.e. it’s not random). And yet, when a proper autocorrelation test is done:

Breusch-Godfrey Serial Correlation LM Test:

F-statistic        0.132627    Prob. F(2,94)          0.8760
Obs*R-squared      0.275764    Prob. Chi-Square(2)    0.8712

Test Equation:
Dependent Variable: RESID
Method: Least Squares
Date: 07/29/09 Time: 10:19
Sample: 3 100
Included observations: 98
Presample missing value lagged residuals set to zero.

Variable      Coefficient    Std. Error    t-Statistic    Prob.
C                4.379644      9.744755       0.449436    0.6542
SPENT(-2)       -0.120337      0.263558      -0.456589    0.6490
RESID(-1)       -0.016946      0.103238      -0.164147    0.8700
RESID(-2)        0.138355      0.282935       0.489001    0.6260

R-squared              0.002814    Mean dependent var       1.16E-15
Adjusted R-squared    -0.029011    S.D. dependent var       17.00190
S.E. of regression     17.24676    Akaike info criterion    8.573086
Sum squared resid      27960.37    Schwarz criterion        8.678595
Log likelihood        -416.0812    Hannan-Quinn criter.     8.615762
F-statistic            0.088418    Durbin-Watson stat       1.991467
Prob(F-statistic)      0.966240

Hah! It is not autocorrelated! And look at the Durbin-Watson statistic, which runs from 0 to 4: a value well below 2 (a rough rule of thumb is about 1.3 or less) is a strong indication of positive autocorrelation. If it’s around 2, there is no first-order autocorrelation, and if it’s well above 2, the autocorrelation is negative.
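Both statistics can be checked by hand. The sketch below computes the Durbin-Watson statistic from a list of residuals using its usual definition, and reproduces the Breusch-Godfrey Obs*R-squared figure and its p-value from the numbers in the output. The function names are my own, the toy residuals are made up, and the p-value shortcut exp(-x/2) only holds for a chi-square with exactly 2 degrees of freedom, which is the case here.

```python
import math


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def lm_p_value_2df(n_obs, r2_aux):
    """Obs*R-squared and its chi-square(2) p-value.

    Valid for 2 degrees of freedom only, where P(X > x) = exp(-x / 2).
    """
    lm = n_obs * r2_aux
    return lm, math.exp(-lm / 2.0)


# Toy residuals, made up to show the two extremes.
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # → 0.0 (never changes sign)
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # → 3.0 (flips every step)

# Figures copied from the Breusch-Godfrey output in the post.
print(lm_p_value_2df(98, 0.002814))
```

The last line should come out near 0.2758 and 0.8712, matching the table. Durbin-Watson is roughly 2 × (1 − lag-1 autocorrelation of the residuals), which is why 2 is the "nothing going on" value.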

Not to be defeated, I tried a novel way of checking whether my grocery spending was random: subjecting it to a randomness test, i.e. the Diehard tests. That didn’t work. Diehard didn’t even generate a proper report. I tried Ent instead, and this was the result:

Randomness Test with Ent

I have no idea why Ent showed different results for the serial correlation test. If I am not mistaken, Ent uses the LM test as well. Maybe the variables are different, but Ent determined that my grocery spending wasn’t random either (the easiest test to read and interpret is the chi-squared test). I am also guessing that 600kb might be too small a file for Ent to determine whether it’s random or not.
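One possible reason for the disagreement is that Ent, as far as I understand its documentation, works on the raw bytes of the file rather than on the numbers written in it. The sketch below is not Ent itself, only a byte-frequency chi-square of the kind it reports. It compares made-up spending amounts saved as text against the same number of genuinely random bytes. The function name, the seed and the amounts are all assumptions for illustration.

```python
import random


def byte_chi_square(data):
    """Chi-square of byte counts against an even spread over 256 values."""
    expected = len(data) / 256.0
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return sum((c - expected) ** 2 / expected for c in counts)


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Made-up amounts, NOT the real grocery data.
amounts = [round(rng.uniform(5.0, 90.0), 2) for _ in range(100)]
as_text = "\n".join(f"{a:.2f}" for a in amounts).encode("ascii")
as_noise = bytes(rng.randrange(256) for _ in range(len(as_text)))

print(len(as_text), byte_chi_square(as_text))
print(len(as_noise), byte_chi_square(as_noise))
```

A text file of amounts only ever contains digits, a decimal point and newlines, so its bytes are nowhere near evenly spread, however random the amounts themselves are. If that is what happened here, Ent’s verdict says more about the file format than about the spending.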

So, to recap:

  1. No autocorrelation shown by 2 tests – this usually implies randomness
  2. No unit root – this usually implies there is some form of autocorrelation
  3. Randomness tests show that it’s not random.
  4. Lingam would say: Looks random, feels random, doesn’t mean it is. (Damn, you’ve gotta love the way lawyers can twist the truth)

My thoughts on this: I think there is something I am missing. It could be that my knowledge of statistics and econometrics is sorely lacking, or it could be that my understanding of randomness is sorely lacking. If it’s the latter, O great Benoit Mandelbrot, comment on this blog and tell me where I have gone wrong. If Mandelbrot doesn’t comment on blogs, then his protege Taleb would be welcome as well ;P (Actually, Taleb would probably slaughter me for using regression on this one. Ah, but it’s one of the few tools I actually learnt, and given that I am pretty much mathematically retarded, I could have interpreted the data totally wrongly).

Of course, there is also the off chance that this might actually be a new form of time series (I know for a fact that my groceries are test data for some other model I had in mind), as Aaron suggested (quite sure that was in jest though), though that would be highly unlikely. Or is it some new kind of randomness? Could it be that I was wrong to assume that Not Autocorrelated = Random?
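One way to poke at that last question is to run the same two checks on data that is random by construction. The sketch below simulates 100 independent draws and a random walk, then computes a lag-1 autocorrelation and a bare-bones Dickey-Fuller t-statistic for each. It is a simplified stand-in, not the EViews procedure: the regression has a constant but no trend and no extra lags, the data is simulated rather than the real spending, and the mean and spread are only borrowed from the regression output above.

```python
import math
import random


def lag1_autocorrelation(y):
    """Sample correlation between each value and the one before it."""
    n = len(y)
    mean = sum(y) / n
    num = sum((y[t] - mean) * (y[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in y)
    return num / den


def dickey_fuller_t(y):
    """t-statistic on b in: y[t] - y[t-1] = a + b * y[t-1] + error."""
    x = y[:-1]
    d = [y[t] - y[t - 1] for t in range(1, len(y))]
    n = len(x)
    mx, md = sum(x) / n, sum(d) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxd = sum((x[i] - mx) * (d[i] - md) for i in range(n))
    b = sxd / sxx
    a = md - b * mx
    rss = sum((d[i] - a - b * x[i]) ** 2 for i in range(n))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return b / se_b


rng = random.Random(2009)  # fixed seed so the sketch is repeatable

# Simulated data, NOT the real grocery spending.
noise = [rng.gauss(35.67, 17.7) for _ in range(100)]  # independent draws
walk = [0.0]
for _ in range(99):
    walk.append(walk[-1] + rng.gauss(0.0, 17.7))      # has a unit root

print("noise:", lag1_autocorrelation(noise), dickey_fuller_t(noise))
print("walk: ", lag1_autocorrelation(walk), dickey_fuller_t(walk))
```

If the independent draws come out with a strongly negative test statistic and a lag-1 autocorrelation near zero, as textbook theory suggests they should, then "no unit root" and "no autocorrelation" can both be true of the same series, and items 1 and 2 of the recap need not contradict each other. The critical values quoted in the post are for the version with a trend, so they are only a rough guide for this simpler regression.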

Questions questions. Maybe you should answer me. :D
