ETH.LOAN

News:

This was the original Lend Academy peer-to-peer lending forum, since forensically restored by deBanked and now reintroduced to eth.loan.

To restore access to your user account, email [email protected]. We apologize for errors you may experience during the recovery.


Old Data Download

Started by Peter, December 06, 2015, 11:00:00 PM


qwertyfan

Does anyone have a logged-in Lending Club data download with the old, augmented fields? The files were named LoanStats3a_securev1.csv et al. The current files on Lending Club have significantly fewer fields. The Internet Archive does not have the 'secure' version of the files either, only an insecure version without all the fields. The latest file I found is from April 2014, which leaves a big hole in my data and makes modeling harder. I would really appreciate anyone sharing their data, and would be glad to buy them the digital equivalent of a coffee :)

PhilGD

The only files I am aware of with *all* the credit data available are the ones on the Internet Archive. They cover August 2012 through June 2014, which is all of LoanStats3b and January to June of LoanStats3c. Prior to August 2012, LC did not provide expanded credit data.

Those old files on the internet archive, as you mentioned, are the "insecure" files. The data that is missing from the "insecure" files is the borrower FICO score at time of origination and the 3-digit zip code. These data points can be pulled from the current files provided by LC, and merged with the "old" files.
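The merge described above could look something like the following minimal sketch. It uses only the standard library and joins on a loan identifier. The column names (`id`, `fico_range_low`, `fico_range_high`, `zip_code`) and the sample rows are assumptions for illustration, so check them against the headers in your own downloads.

```python
import csv
import io

def read_rows(text):
    """Parse CSV text into a list of dicts, one per loan."""
    return list(csv.DictReader(io.StringIO(text)))

def merge_missing_fields(old_rows, current_rows, key="id",
                         fields=("fico_range_low", "fico_range_high", "zip_code")):
    """Copy `fields` from current_rows into old_rows, matching on `key`.

    Loans that are absent from the current file get an empty string
    for each copied field. The field names are assumed, not confirmed.
    """
    lookup = {row[key]: row for row in current_rows}
    merged = []
    for row in old_rows:
        match = lookup.get(row[key], {})
        combined = dict(row)
        for field in fields:
            combined[field] = match.get(field, "")
        merged.append(combined)
    return merged

# Made-up samples standing in for the archived and the current downloads.
old_csv = "id,open_acc,total_acc\n1001,5,12\n1002,9,30\n"
current_csv = "id,fico_range_low,fico_range_high,zip_code\n1001,700,704,941xx\n"

merged = merge_missing_fields(read_rows(old_csv), read_rows(current_csv))
for row in merged:
    print(row)
```

With real files you would open them with `csv.DictReader` directly instead of parsing strings, and it is worth counting how many old loans find no match before trusting the merged result.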

If anyone has data beyond this, I'd be more than happy to provide an additional digital equivalent of a coffee.

TravelingPennies

Great, thank you for your help.

As you suggested, I looked on the Internet Archive and got all the data up to Sep 30th, 2014. The only fields missing are the ones you indicated, which are available in the new data. Unless someone downloaded data after that date but before LC got rid of the additional variables, I believe this is the best we'll do.

Rob L

I think I have the old files you are looking for.
Send me a PM and I'll send you a Dropbox link that will let you upload them.

qwertuser

I am looking for the old Lending Club files as well. I tried searching the Internet Archive but couldn't find them. It would be very helpful if someone could provide a link to the old files.
 
Thanks

panther02912

This is regarding data that LC once provided, but no longer posts for fresh or recent loans, right? If so, then the missing data would help predict/model older loans, but it's not clear how one could use it to predict new loans.

Or are you, qwertyuser et al, trying to figure out performance of older loans for Folio purchases?

TravelingPennies

The question in the OP isn't relevant anymore because LC is back to providing the expanded data attributes for all loans issued from August 2012 - present in their historical data downloads. Therefore the granularity of the historical data is now on par (for the most part) with the granularity of the data on loans available for investment.

TravelingPennies

There were quite a number of data fields in the Loanstats files prior to the LC IPO that were sanitized or removed.
It was assumed the changes were made to better protect the identity of the borrowers.
However, since this info isn't available any more it's hard to see what value it could have today (except possibly for Folio).

TravelingPennies

I am very new to P2P lending and only started putting a little bit of money in last month. Based on what I've read here, returns seem to have gone down in the last couple of months. However, I am trying to take an algorithmic/AI-based approach to figure out the best loans. A model is only as good as the data you build it on.

I am trying to find variables that are good predictors. The usual ones like Purpose, Home Ownership, dti, etc. are fairly commonly used when deciding whether to buy a loan or not. However, there are more than 100 variables in that dataset, and I am trying to see whether there are other predictors I can find.

One other issue I am facing is that I am unable to identify which columns in the files don't change once a loan is issued. There are columns like "open_acc" and "total_acc". Let's say a new loan was available for investment in 2012 with an "open_acc" of 5. The loan was paid back in 2015, but 3 accounts were opened in 2013. Will the Lending Club file show "open_acc" as 5 (original value) or 8 (updated value)? There are probably 50 such variables where this sort of update can take place. For example, "last_fico_high" is a very good predictor of defaults, but I am sure that it changes after the loan is issued. So I am trying to find out which variables are immutable once a loan is issued and which ones change thereafter. I am guessing the old files might help me figure this out. I have looked at the data dictionary on their website, but it has not been very helpful. I was wondering if anyone has suggestions for tackling this problem.
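One way to approach this empirically, assuming two downloads of the same LoanStats file taken at different dates are available, is to compare them loan by loan and count how often each column differs. The sketch below is only an illustration: the `id` key, the column names, and the sample values are made up, and a column showing zero changes between two snapshots is evidence of immutability, not proof.

```python
import csv
import io

def read_rows(text):
    """Parse CSV text into a list of dicts, one per loan."""
    return list(csv.DictReader(io.StringIO(text)))

def changed_columns(earlier_rows, later_rows, key="id"):
    """Return {column: number of loans whose value differs between snapshots}.

    Only loans and columns present in both snapshots are compared.
    """
    later_by_id = {row[key]: row for row in later_rows}
    counts = {}
    for row in earlier_rows:
        later = later_by_id.get(row[key])
        if later is None:
            continue  # loan missing from the later snapshot
        for column, value in row.items():
            if column == key or column not in later:
                continue
            counts.setdefault(column, 0)
            # Treat missing values as empty strings before comparing.
            if (value or "").strip() != (later[column] or "").strip():
                counts[column] += 1
    return counts

# Made-up snapshots of the same two loans, downloaded at different times.
earlier_csv = "id,open_acc,last_fico_high\n1001,5,704\n1002,9,689\n"
later_csv = "id,open_acc,last_fico_high\n1001,5,654\n1002,9,689\n"

counts = changed_columns(read_rows(earlier_csv), read_rows(later_csv))
for column, n in sorted(counts.items()):
    print(column, "changed for", n, "loan(s)")
```

Columns with a nonzero count are candidates to exclude from a model meant to score loans at issuance, since their values would not have been known at that time.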

Also, some of the files through 2014 are missing columns such as "open_acc_6m", "open_il_6m", "open_il_12m", "open_il_24m", and "mths_since_rcnt_il". I am not sure whether the old data has these variables, so I want to take a look at it.

Any suggestions are welcome.
Thanks

Fred93


TravelingPennies

Thanks for replying. Is there a comprehensive list of immutable variables?

There are variables like mthsSinceLastDelinq, mthsSinceLastMajorDerog, mthsSinceLastRecord, mthsSinceRcntIl or totCollAmt, totCurBal, totHiCredLim, totalAcc or openIl12m, openIl24m, openIl6m, percentBcGt75 and many more. I don't know whether any of them are even useful, and I am not sure which ones to discard completely.



TravelingPennies

Start with only the data fields (CSV columns) that are available in the files containing new loans, which are made available four times a day. For making buy/pass decisions nothing else matters. A couple of these fields change in real time as notes are being purchased by investors, for example FUNDED_AMOUNT and possibly INVESTOR_COUNT. I strongly believe all the rest are immutable. Many, but not all, of these fields are simply copied into the LoanStats file for historic record (and back testing, of course) and never change once there. If this were not true, then back testing with NSR or equivalent would have been fundamentally flawed from the start, and I've never heard anyone propose that. One caution: an empty field may indicate data not available, or it may not. For example, if the field MTHS_SINCE_RECENT_BC_DLQ is empty it could mean data not available, or more likely that there has never been a BC_DLQ.
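A rough sketch of that filtering step is below. It keeps only the historical columns that also appear in the new-loan listing header, drops the fields that move while a note is funding, and encodes an empty "months since" value as a separate flag rather than as zero. The case-insensitive name matching, the sample headers, and the helper names are all assumptions for illustration. Reading an empty field as "no such event" follows the more likely interpretation above, but it is not guaranteed.

```python
# Fields said to change in real time while a loan is funding.
VOLATILE = {"funded_amount", "investor_count"}

def usable_columns(listing_header, history_header, volatile=VOLATILE):
    """Return history columns that also exist in the listing file.

    Names are matched case-insensitively (an assumption about how the
    two files relate), and volatile listing fields are excluded.
    """
    listing = {name.strip().lower() for name in listing_header}
    return [name for name in history_header
            if name.strip().lower() in listing
            and name.strip().lower() not in volatile]

def months_since(value):
    """Return (months, has_event) for a 'months since ...' field.

    An empty value becomes (None, False) so that 'never happened or
    unknown' is not confused with 'happened 0 months ago'.
    """
    text = (value or "").strip()
    if text == "":
        return None, False
    return int(float(text)), True

# Made-up headers standing in for the listing and LoanStats files.
listing_header = ["ID", "OPEN_ACC", "MTHS_SINCE_RECENT_BC_DLQ", "FUNDED_AMOUNT"]
history_header = ["id", "open_acc", "mths_since_recent_bc_dlq",
                  "funded_amount", "last_fico_high", "loan_status"]

features = usable_columns(listing_header, history_header)
print(features)
print(months_since(""), months_since("14"))
```

Keeping the flag separate lets a model learn its own treatment of empty values instead of having one imposed by the encoding.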
