Finding relationships in data


This publication has received some attention in the popular press (see the Wired article):

In just over a day, a powerful computer program accomplished a feat that took physicists centuries to complete: extrapolating the laws of motion from a pendulum’s swings.

Developed by Cornell researchers, the program deduced the natural laws without a shred of knowledge about physics or geometry.

The research is being heralded as a potential breakthrough for science in the Petabyte Age, where computers try to find regularities in massive datasets that are too big and complex for the human mind and its standard computational tools.

The cool part is not just the physics, but the use of an evolutionary algorithm to fit deterministic equations to any data set. The point is to find relationships in real data and see whether they have any predictive power.
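To make the "evolutionary" idea concrete, here is a rough sketch in Python. It is not the tool's actual algorithm: it assumes the form of the equation is already known (a * x^2 + b * sin(x)) and only evolves the two coefficients by random mutation, keeping a child whenever it fits the data better than its parent. The real program also searches over the form of the equation itself. All function names and parameters below are made up for illustration.

```python
import math
import random

def model(coeffs, x):
    # Assumed fixed form: a * x^2 + b * sin(x). Only a and b evolve here.
    a, b = coeffs
    return a * x**2 + b * math.sin(x)

def mean_squared_error(coeffs, xs, ys):
    # Fitness: lower is better.
    return sum((model(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def evolve(xs, ys, iterations=2000, seed=0):
    rng = random.Random(seed)
    parent = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    parent_err = mean_squared_error(parent, xs, ys)
    steps = [0.5, 0.5]  # one mutation size per coefficient
    for _ in range(iterations):
        # Mutate a single randomly chosen coefficient.
        i = rng.randrange(len(parent))
        child = list(parent)
        child[i] += rng.gauss(0, steps[i])
        child_err = mean_squared_error(child, xs, ys)
        if child_err < parent_err:
            # The child is fitter: it replaces the parent, and we get bolder.
            parent, parent_err = child, child_err
            steps[i] *= 1.5
        else:
            # The child is discarded: try smaller mutations next time.
            steps[i] *= 0.9
    return parent, parent_err

# Noise-free data from a known function, so the fit can be checked by eye.
xs = list(range(-10, 11))
ys = [x**2 / 32 + math.sin(x) for x in xs]
(a, b), err = evolve(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}, mse = {err:.2e}")
```

If it works as intended, a should come out close to 1/32 and b close to 1. Fixing the form makes this a much easier problem than the one the tool solves, but the mutate, score, keep-the-fittest loop is the same basic idea.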

This approach, of course, misses the stochastic/probabilistic properties of nature, but it is still a useful tool.

The other cool part is that the tool is free for the public to download (Windows, Linux or Mac).

I ran a few small tests to see how it works. In Excel I created the simple function f(x) = x^2 / 32 + sin(x). For the x’s I took a list of automatically generated random integers from random.org and set the confidence level at 0.9 for all numbers.
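For anyone who wants to rebuild a similar test set without Excel, a minimal Python sketch might look like this. It uses Python's own pseudo-random generator as a stand-in for random.org, and the sample size, integer range, and seed are my assumptions, since the post does not state them.

```python
import math
import random

def f(x):
    # The test function from the post; sin takes radians, as in Excel.
    return x**2 / 32 + math.sin(x)

def make_rows(n=50, low=1, high=100, seed=42):
    # n, low, high and seed are illustrative values, not from the post.
    rng = random.Random(seed)
    xs = [rng.randint(low, high) for _ in range(n)]
    return [(x, f(x)) for x in xs]

rows = make_rows()
for x, y in rows[:5]:
    print(x, round(y, 4))
```

The (x, f(x)) pairs can then be pasted into whichever fitting tool you want to try.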

After about 2 minutes, it came up with this:

1/32 ≈ 0.03, so the coefficient the program found is close to the true one; it did a pretty good job, I think.

Now let's try something more interesting: inputting two lists of random numbers for x and y:

If the purpose of the engine is to find relationships in raw data, it stands to reason that it could also find relationships where none exist.
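A simple way to see how this can happen, sketched here with plain correlation rather than equation fitting: generate one random series, compare it against many other unrelated random series, and report the strongest correlation found. The sample sizes and seed are arbitrary choices of mine.

```python
import math
import random

def pearson_r(xs, ys):
    # Standard Pearson correlation coefficient.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def strongest_chance_correlation(n_points=20, n_candidates=200, seed=1):
    rng = random.Random(seed)
    y = [rng.random() for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        # Each candidate is pure noise, unrelated to y by construction.
        x = [rng.random() for _ in range(n_points)]
        best = max(best, abs(pearson_r(x, y)))
    return best

best_r = strongest_chance_correlation()
print(f"strongest |r| among 200 unrelated series: {best_r:.2f}")
```

With only 20 points per series, the best of 200 tries tends to look like a respectable correlation even though every series is noise. A search engine that tries many candidate equations is exposed to the same effect.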

Let's test the meme that LA Lakers wins and losses predict stock market price changes. I used the data on Nasdaq rates (between 1987 and 2007) provided by the LA Times and converted the Lakers' performance into quantitative terms: -1 represents a championship loss, +1 a championship win, and 0 a year when the Lakers didn’t make the finals:
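The encoding step itself is trivial, but for completeness here is how it might look in Python. The label strings are hypothetical names of my own, and the example sequence is a placeholder, not the actual Lakers record.

```python
# Hypothetical labels; the post only specifies the numeric codes.
CODES = {"won_finals": 1, "lost_finals": -1, "missed_finals": 0}

def encode_seasons(outcomes):
    """Map a list of season outcome labels to the -1/0/+1 codes."""
    return [CODES[outcome] for outcome in outcomes]

# Placeholder sequence for illustration, not the real Lakers record.
example = ["won_finals", "missed_finals", "lost_finals"]
print(encode_seasons(example))
```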

Let's graph the function:

According to our function, a Lakers victory is suggestive of stock market losses, and a finals loss goes with some NASDAQ gains. Of course, the last two years of data have smashed these “predictions.” Interestingly, our equation predicts the highest gains when the Lakers don’t make the finals at all. I think this is suggestive.

Comments

  1. Shane says:

    This is a wonderful article, great topic thank you for writing it!

    I’m new to the statistical world and am wondering if Eureqa is a good choice for this challenge or something else?

    —-
    Let's say I have 10 data sets.

    I would like to see if there is a correlation or relationship between any or all of these data sets. Maybe the 1st data set correlates well with the 9th, etc. I would also like to check variations of each data set: a moving average of the data, a time offset, etc.

    I know I can put this all in Excel and use the CORREL function, but is there a program or a faster way to do this?
    —-

    Thank you!
    Shane
