Hi,
Does anybody know when the new variables LC announced in their most recent blog-post (
http://blog.lendingclub.com/2012/09/28/investor-updates-and-enhancements/) are going to show up in the CSV files for download?
Also, when you guys are picking loans, how do you count loans marked as late (or grace period) in the LoanStats.csv file? I'm building a model on the data and am currently counting them as bad, but that might discount the fact that many of them do recover. This may make the model forecast too pessimistic. Any thoughts?
Markus
Most people who build models for Lending Club and Prosper on the entire database use a discounting method for late loans. You can see the Lendstats model here:
http://www.lendstats.com/loansearch/lc/lcloanfilter.php
They use loss factors of 0.5 for payment plans, 0.25 for in grace period, 0.5 for 16-30 days late, 0.75 for 31-120 days late and 0.99 for defaults.
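To make the discounting idea concrete, here is a rough sketch of how loss factors like these could be applied. The factors are the ones quoted above; the status labels, the function names, and the choice to apply the factor to outstanding principal are my assumptions for illustration, not Lendstats' actual code.

```python
# Loss factors as quoted in the post above.
# The status label strings are assumed; match them to whatever your CSV uses.
LOSS_FACTORS = {
    "Performing Payment Plan": 0.5,
    "In Grace Period": 0.25,
    "Late (16-30 days)": 0.5,
    "Late (31-120 days)": 0.75,
    "Default": 0.99,
}


def expected_loss(status, outstanding_principal):
    """Expected loss for one loan; statuses without a factor count as no loss."""
    return LOSS_FACTORS.get(status, 0.0) * outstanding_principal


def discounted_principal(loans):
    """Sum of outstanding principal after subtracting expected losses.

    `loans` is an iterable of (status, outstanding_principal) pairs.
    """
    return sum(p - expected_loss(s, p) for s, p in loans)
```

For example, a loan with $400 outstanding that is 31-120 days late would be carried at $100 under these factors.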
Others, such as Interest Radar, use the Lending Club recovery rate data, which is more optimistic than Lendstats:
https://www.lendingclub.com/info/statistics-performance.action
One way to go is to model only on loans that have termed out. No discounting necessary. The downside is that you're three years out of date.
Does the LendStats model use historic data? It seems that the data available for download shows only the current status of each loan and does not include historical status changes. For example, I just dug out old loan files from September and October, and LoanID 1024323 went from Late31-120 to Late16-30 in September and back to Current in October. Unless these changes over time are accounted for and, say, the "worst" state over the lifetime of the loan is used in the model, the discounting factor of a loan would change all the time.
Does anybody know of a way to get old versions of the LoanStats files?
You are right that once a loan goes back to current at Lendstats and others it is treated as always being current. But we know that it has a higher likelihood of default than a loan that has never been late.
There is no way to obtain old versions of Loanstats.csv; it is updated every day. I have downloaded about a dozen versions to my computer dating back to 2010 so I can see how things change over time.
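With several dated snapshots on hand, the "worst state over the lifetime of the loan" idea raised above could be sketched roughly like this. The severity ordering and the function name are assumptions for illustration, and the status labels follow the ones mentioned earlier in the thread; adjust both to the labels in your files.

```python
# Assumed severity ordering, from least to most severe.
SEVERITY = ["Current", "In Grace Period", "Late16-30", "Late31-120", "Default"]
RANK = {status: i for i, status in enumerate(SEVERITY)}


def worst_status(snapshots):
    """Return the most severe status each loan has shown in any snapshot.

    `snapshots` is an iterable of dicts mapping loan id -> status,
    one dict per downloaded file. Unknown statuses rank as least severe.
    """
    worst = {}
    for snapshot in snapshots:
        for loan_id, status in snapshot.items():
            previous = worst.get(loan_id)
            if previous is None or RANK.get(status, 0) > RANK.get(previous, 0):
                worst[loan_id] = status
    return worst
```

Run on three snapshots where loan 1024323 is Late31-120, then Late16-30, then Current, this keeps Late31-120 for that loan, so a loan that recovered is still flagged as having been seriously late.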
Could you put them in a zip file somewhere? :-) Then I can update my model to account for loans that were paying late.
Actually I don't mind at all. Here is a link to the zip file with five different Loanstats files from 2010 and 2011. Warning: the file is 85 MB.
https://www.dropbox.com/s/mh8jl5lh5dfhpzu/LoanStatsArchive.zip
Let me know what your analysis finds.
Sweet! Thank you very much. I'll look into it and will report back :-)
The loan ratings my current model comes up with (updated occasionally) are at
http://cervisia.org/lc_credit/, btw. The model uses a variety of features I've derived from the data (among them some from the loan description), but it is a simple binary model that counts late/grace/defaults as bad and fully paid as good (loans that are current aren't used). Given how many late loans recover, its estimates are probably too conservative.
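For anyone who wants to try the same kind of labeling, here is a minimal sketch of the binary good/bad rule described above. This is not the actual code behind the ratings page; the status strings, the function name, and the `count_grace_as_bad` switch are assumptions for illustration.

```python
# Assumed status labels; match them to the values in your LoanStats.csv.
GOOD_STATUSES = {"Fully Paid"}
LATE_OR_DEFAULT = {"Late (16-30 days)", "Late (31-120 days)", "Default", "Charged Off"}
GRACE_STATUS = "In Grace Period"


def label(status, count_grace_as_bad=True):
    """Return 1 for a bad loan, 0 for a good loan, None if the loan is not used.

    Current loans (and any unrecognized status) return None and are
    left out of the training set.
    """
    if status in GOOD_STATUSES:
        return 0
    if status in LATE_OR_DEFAULT:
        return 1
    if status == GRACE_STATUS and count_grace_as_bad:
        return 1
    return None
```

Passing `count_grace_as_bad=False` drops grace-period loans from the training set instead of counting them as bad, which is one way to make the labels less pessimistic.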
Thanks. Look forward to your results. And thanks for the URL to your model - I have seen something similar developed by other investors.
Quick update: it looks like using the archived files skewed my model into predicting better (more accurate?) results for loans in lower credit tranches. It's a bit odd, and I had to change my definition of bad loans to exclude loans in grace period. It looks like almost everyone misses a payment every now and then.