The rdatamarket post on the Revolutions blog and this post on Decision Stats reminded me about my list of data APIs/feeds available as packages in R on Cross Validated (which is a great site that you all should use). Many of these packages are from Omegahat, which is an awesome site.
This is the most comprehensive list I'm aware of, so please alert me to any omissions:
Practical tools for predictive modeling, data science, machine learning and web scraping
Friday, August 26, 2011
Because it's Friday: Spurious correlation edition
If Flight of the Conchords taught me anything, it's that you can't trust Australians. This morning I was poking around the DataMarket site when I noticed something suspicious about Australian sheep production:
Labels:
Friday,
Global Warming,
Great Australian Sheep Decline,
R
Tuesday, August 23, 2011
Graphically analyzing variable interactions in R
I studied ecology as an undergraduate, which meant I spent a lot of time gathering and analyzing field data. Two of the basic tools we used to look for relationships in a large set of variables were correlation matrices and scatterplot matrices. Each of these requires a single line of code in R:
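As a minimal sketch of those two one-liners, here they are on the built-in `mtcars` data set. The data set and the four columns are assumptions chosen for illustration, standing in for real field data:

```r
# Stand-in data: four numeric columns from the built-in mtcars data set
# (an illustrative choice, not field data).
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Correlation matrix: pairwise Pearson correlations between all columns.
cor_matrix <- cor(vars)
print(round(cor_matrix, 2))

# Scatterplot matrix: one scatterplot for every pair of columns.
pairs(vars)
```

Both `cor` and `pairs` ship with base R, so no extra packages are needed.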


Labels:
R graphics
Monday, August 22, 2011
Recession forecasting II: Assessing Hussman's Accuracy
In my last post on recessions, I implemented John Hussman's Recession Warning Composite in R. In this post I will examine how well this index performs and discuss how we might improve it. If you would like to follow along at home, be sure to run the code from the last post before running anything from this post.
Labels:
finance,
R,
recessions
Wednesday, August 10, 2011
Using the Google Prediction API from R
Google has a "black box" prediction API for building recommender systems or filtering spam. They also provide an R package for interfacing with that API, but try as I might, I cannot get it to work under Windows. Here are the instructions for setting up the API to run in R under Linux. I haven't tried this out yet, so let me know in the comments if it works, or if you can get it to run on Windows.
Labels:
model building,
R
Scraping web data in R
In my last post, I went through a lot of effort to scrape the PMI index off the ISM website. It turns out that was unnecessary effort, as commenter "senne" pointed out that this index is available from FRED under the symbol NAPM. I've updated my code, which now pulls all the data straight from FRED.
However, it was surprisingly easy to scrape web data into R, using the readHTMLTable function in the XML package. I thought I'd keep the code I used on my blog, as it's a good example of how easily you can pull web data into R.
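Here is a minimal sketch of the idea, not the scraping code from the earlier post. To keep it self-contained it parses a small inline HTML string rather than a live page; the table contents are made-up placeholder values and the object names are hypothetical:

```r
library(XML)

# A tiny inline HTML page standing in for a real web page.
# The values are placeholders, not real data.
html_text <- '<html><body><table>
<tr><th>Month</th><th>Index</th></tr>
<tr><td>Jan</td><td>10.5</td></tr>
<tr><td>Feb</td><td>11.0</td></tr>
</table></body></html>'

# Parse the HTML, then convert every <table> in it to a data frame.
doc <- htmlParse(html_text, asText = TRUE)
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

# readHTMLTable returns a list with one element per table; take the first.
example_table <- tables[[1]]

# Cells come back as character strings, so convert numeric columns by hand.
example_table$Index <- as.numeric(example_table$Index)
print(example_table)
```

For a real page you would pass the page's URL or downloaded HTML to `readHTMLTable` instead of the inline string, and then work out which element of the returned list holds the table you want.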
Labels:
R,
web scraping
Tuesday, August 9, 2011
Forecasting recessions
John Hussman has a Recession Warning Composite that I am attempting to replicate and improve. The underlying data seems easy enough to get from FRED using the quantmod package in R. I don't quite understand the index Hussman is using for commercial paper, so I used the '3-month AA financial commercial paper index' from FRED.
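Pulling a series from FRED with quantmod can be sketched roughly as follows. The helper name `fetch_fred` is hypothetical, and the series ID in the commented-out call is an assumption that should be checked against the FRED website before use:

```r
# Hypothetical helper (not from the original post): download one FRED series
# as an xts object using quantmod. Needs the quantmod package and a network
# connection when it is actually called.
fetch_fred <- function(symbol) {
  if (!requireNamespace("quantmod", quietly = TRUE)) {
    stop("Please install the 'quantmod' package first.")
  }
  # auto.assign = FALSE returns the data instead of creating a variable
  # named after the symbol in the calling environment.
  quantmod::getSymbols(symbol, src = "FRED", auto.assign = FALSE)
}

# Example call, left commented out because it needs internet access.
# "CPF3M" is assumed here to be the ID of the 3-month AA financial
# commercial paper series; verify it on FRED before relying on it.
# cp3m <- fetch_fred("CPF3M")
# head(cp3m)
```

Returning the series with `auto.assign = FALSE` keeps the workspace tidy when several indicators are downloaded and later merged into one composite.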
Labels:
finance,
R,
recessions