
  • Welcome to P2P Lending / NFT Lending Forum.
 

ETH.LOAN

News:

This was the original Lend Academy peer-to-peer lending forum, since forensically restored by deBanked and now reintroduced to eth.loan.



Automatic order creation and selection

Started by Peter, February 13, 2013, 11:00:00 PM


sociallender

Hello everyone,

I am new to Lending Club but pretty good with numbers and programming.  I started an account at LC and found it too time-consuming to choose good loans and then create an order including each loan.  So I created some software that statistically data-mines the loans.  I also created a Windows application that lets LC users create an order without having to manually click on each loan's URL.

I just started with LC so not sure how the loan selection is going to turn out but thought some of you may be interested in trying out the software (in beta test now) as well as the loan selection process.  The site is still under construction but the URL is sociallender.blogspot.com

I will be updating the site daily with the loans that meet the statistical model's criteria.  I currently use regression with penalization to create strict selection criteria.

For those that are interested in my other investment venture, I have a stock market system that I have implemented using similar modelling techniques (NNs) with over a year of production history.  It is assetclassta.blogspot.com.  As you can see, I enjoy numbers :)

John





TravelingPennies

zpbsfg,

The loans are modeled using LC's historical database.  Unfortunately, it is very difficult to explain the rules created by the model.  Many algorithms, such as neural networks, are black boxes with no way to tell (well, actually there are some new processes that can on some).  Other algos, such as decision trees, can show the process, but with the number of attributes, the trees contain hundreds of nodes.  In my case, I tested many algos such as SVMs, Bayes, linear regression, and NNs, and settled on random forests with cost adjustment for imbalanced data.  My stat package provides attribute selection as well as precision metrics.  With 10-fold cross-validation, the model seems to do a good job generalizing on true positives (good loans), with precision at approx. 93%.  However, many of the loans are discarded due to the cost penalization even if they do become fully paid.  The idea for me was to only choose loans that have a good chance of being paid in full, even if many false negatives (good loans that were incorrectly classified as bad) were discarded.
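The trade-off described here (discarding many good loans in order to keep precision high) can be illustrated with a simple cost-sensitive decision rule. This is only a sketch and not the Weka random forest setup from the post: the function names, the predicted probabilities, and the 5:1 cost ratio are all made up for illustration.

```python
def approval_threshold(cost_bad_accepted: float, cost_good_rejected: float) -> float:
    """Minimum P(good) at which accepting a loan has the lower expected cost.

    Accepting costs (1 - p) * cost_bad_accepted on average, rejecting costs
    p * cost_good_rejected, so accepting wins once p exceeds this ratio.
    """
    return cost_bad_accepted / (cost_bad_accepted + cost_good_rejected)


def select_loans(prob_good_by_loan: dict, threshold: float) -> list:
    """Return the IDs of loans whose predicted P(good) meets the threshold."""
    return sorted(loan for loan, p in prob_good_by_loan.items() if p >= threshold)


# Assumed costs: funding a bad loan hurts five times more than skipping a good one.
threshold = approval_threshold(cost_bad_accepted=5.0, cost_good_rejected=1.0)

# Hypothetical model outputs for three loans.
predictions = {"loan_a": 0.95, "loan_b": 0.80, "loan_c": 0.85}
print(round(threshold, 3), select_loans(predictions, threshold))
```

With these assumed costs the threshold is 5/6, so loan_b is skipped even though the model thinks it is more likely good than bad. Raising the penalty on bad loans raises the threshold and discards more good loans along the way.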

The stock market system is a completely different methodology using neural networks.  I am working on an overview document of how it works in detail.  I hope to have it posted there in the next few weeks.

Thanks for pointing out that the link wasn't working.  It was not my intention.  I just changed it to point to the folder of the spreadsheet.  I am not quite sure why gdocs is giving me issues.  I am using googlecl to upload the docs, but it doesn't seem to be putting them in the subfolder correctly.  I have to manually move them, and I think that may be the issue.  The folder link should be working tho (and the icon now points to the folder).

John

yojoakak

You should add a link directly to the loan, e.g. add a new column to the left of A and fill it with this:

="https://www.lendingclub.com/browse/loanDetail.action?loan_id=" & B2
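For anyone building the sheet from a script rather than by hand, the same link column could be produced before upload. A minimal sketch, assuming the first row is a header and the loan ID is in the first column of each data row; the function name and the sample row are hypothetical.

```python
LOAN_DETAIL_URL = "https://www.lendingclub.com/browse/loanDetail.action?loan_id="


def add_link_column(rows):
    """Prepend a loan-detail URL column to every row.

    Assumes rows[0] is a header and each data row starts with the loan ID.
    """
    header, *data = rows
    linked = [["link"] + list(header)]
    for row in data:
        linked.append([LOAN_DETAIL_URL + str(row[0])] + list(row))
    return linked


# Hypothetical sheet contents; the loan ID and rate are made up.
sheet = [["loan_id", "interest_rate"], [12345, "13.11%"]]
for row in add_link_column(sheet):
    print(row)
```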


breitenm

Hi John,

That looks pretty interesting. I built something similar, but I am counting all loans that are (or ever were) late as bad loans. I'm a very conservative investor and try to only invest in loans that cause no problems whatsoever :)

Can you share a variable importance plot from the random forest? Does desc_len in the spreadsheet refer to the length of the loan description? Also, have you tried splitting the data in time? What I mean is you could train the model only on loans issued up to, say, January 2011, and then estimate the accuracy on loans that were issued after that point in time.
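The time-based split suggested here could look something like the following. It is a sketch under assumptions: the record layout, the `issue_date` field name, and the sample loans are hypothetical, and the cutoff date is just the example from the question.

```python
from datetime import date


def split_by_issue_date(loans, cutoff):
    """Train on loans issued on or before `cutoff`, test on later ones."""
    train = [loan for loan in loans if loan["issue_date"] <= cutoff]
    test = [loan for loan in loans if loan["issue_date"] > cutoff]
    return train, test


# Hypothetical records; real rows would come from the historical loan file.
loans = [
    {"loan_id": 1, "issue_date": date(2010, 6, 1), "status": "Fully Paid"},
    {"loan_id": 2, "issue_date": date(2011, 1, 15), "status": "Charged Off"},
    {"loan_id": 3, "issue_date": date(2011, 9, 30), "status": "Fully Paid"},
]
train, test = split_by_issue_date(loans, cutoff=date(2011, 1, 31))
print(len(train), "training loans,", len(test), "test loans")
```

Because every test loan was issued after every training loan, the accuracy estimate reflects how the model would have done on loans it could not have seen at training time.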

New Jersey Guy

"Of the over 18,000 loans, approximately 18% of the loans were charged off (loan was not paid in full).  As expected, grade A loans perform the best with an average default rate of only 8% while grade G defaulted 44% of the time."

Algorithms, models, peanut logs and fruitcakes.  All of this is beyond a simpleton like me.   As a regular Joe that can't add 3 numbers with a calculator correctly, I even find this statement off your website hard to believe.  Nearly half of G loans will default?

Perhaps some of you smarter old-timers who have been doing this longer can elaborate on the accuracy of this.  It appears to me it is inconsistent with what Lending Club reports.  Or, am I wrong?

With a 33% to 43% default rate on E, F and G loans, it doesn't seem possible to achieve a positive return if these are the grades you're diversifying in.



TravelingPennies

@yojoakak

Good suggestion.  I am working on that for today's run

@zpbsfg

(for my other stock market blog) the second column is for my system (weekly market trades), the 3rd column is the benchmark S&P500.  So for example, over the past 99 weeks, my system has a compound return of 64% while the S&P during that same period is at 13%.  You can take a look at the performance page for more of the trades and breakdown.
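To put the 99-week figures on a yearly footing, the compound returns can be annualized. A rough sketch, assuming the 64% and 13% are total compound returns over the full 99 weeks and using 52 weeks per year; the function name is made up.

```python
def annualized_return(total_return: float, weeks: float, weeks_per_year: float = 52.0) -> float:
    """Convert a total compound return over `weeks` into a per-year rate."""
    return (1.0 + total_return) ** (weeks_per_year / weeks) - 1.0


system = annualized_return(0.64, weeks=99)      # the system's 64% over 99 weeks
benchmark = annualized_return(0.13, weeks=99)   # S&P 500's 13% over the same period
print(f"system: {system:.1%}, benchmark: {benchmark:.1%}")
```

Under those assumptions the system works out to a little under 30% per year, against roughly 6 to 7% for the benchmark.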

@breitenm

Sadly, my stat package (Weka) does not have the ability to display the RF trees.  Even if it did, it uses multiple trees, so determining the importance of a single attribute would also need to be coded.  It just doesn't have that function, to my knowledge.  However, I did run an attribute evaluator (InfoGain and Ranker) with the following results (descending order of importance):

Ranked attributes:
 0.033125    4 credit_grade
 0.031705    2 interest_rate
 0.020555    3 loan_length
 0.017653    9 fico_range
 0.013517   14 revolving_line_utilization
 0.009888    5 loan_purpose
 0.007297    8 monthly_income
 0.005468   15 inquiries_in_the_last_6
 0.004476    6 debt-to-income_ratio
 0.003225   21 months_since_last_record
 0.003147   20 public_records_on_file
 0.00206    22 employment_length
 0.001964   12 total_credit_lines
 0.001875   11 open_credit_lines
 0.001812    1 amount_requested
 0.001585   10 earliest_credit_line
 0.00086    23 desc_len
 0.000705    7 home_ownership
 0          13 revolving_credit_balance
 0          19 months_since_last_delinquency
 0          18 delinquencies_(last_2_yrs)
 0          16 accounts_now_delinquent
 0          17 delinquent_amount
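For readers unfamiliar with the score in the first column, information gain measures how much knowing an attribute reduces the entropy of the class label. The sketch below shows the calculation for a categorical attribute only. It is not Weka's implementation (which, among other things, discretizes numeric attributes first), and the sample data is invented.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Entropy of the labels minus their entropy after grouping by attribute value."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Invented toy data: grade separates the outcomes perfectly, purpose not at all.
outcome = ["paid", "paid", "charged_off", "charged_off"]
grade = ["A", "A", "G", "G"]
purpose = ["car", "home", "car", "home"]
print(information_gain(grade, outcome), information_gain(purpose, outcome))
```

An attribute that splits the loans into pure groups gets the full entropy of the label as its gain, while one that leaves every group as mixed as the whole gets zero, which is how the attributes at the bottom of the ranking should be read.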

Also, if I understand the time question, you are interested in knowing the accuracy using a percentage split of the training set instead of cross-validation folds?  I would have to include the date to get a correct split (no dates are used in the current training set).  However, the loans are sorted oldest to newest in the training file, so after doing a 66% split, with the remaining 34% held out for testing (presumably the most recent loans), the true-positive precision is still 94%.  Of the 6526 instances in the test set, 1429 loans were classified as good.  Of these 1429 instances, 1351 were correct and 78 were incorrect.  This is consistent with 10-fold cross-validation.  Hope that answers your question.
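The precision figure follows directly from the counts quoted above. A small worked check, using only the numbers from the post; the function name is just for illustration.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of loans classified as good that really were good."""
    return true_positives / (true_positives + false_positives)


test_set_size = 6526       # loans held out for testing
classified_good = 1429     # loans the model selected
correct = 1351             # selected loans that were actually good
incorrect = 78             # selected loans that were actually bad

print(f"precision: {precision(correct, incorrect):.1%}")
print(f"share of test loans selected: {classified_good / test_set_size:.1%}")
```

So the model is right about 94.5% of the loans it picks, but it only picks about 22% of the loans in the test set.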

@New Jersey Guy

Mmmm... I love peanuts and fruitcake!  I just did a simple pivot table in Excel of the loanStats.csv file that you can download from lendingclub.com.  However, I need to qualify that loans were categorized as charged off if they were:

Charged Off
Default
Does not meet the current credit policy  Status: Charged Off
Does not meet the current credit policy  Status: Default

Unless I messed something up, these are the default rates for each loan grade.  However, the average is 18% across all loan grades.  If I am right (someone please confirm), then it pays to select your loans wisely.
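Anyone who wants to confirm the numbers without Excel could reproduce the pivot table in a few lines. This is a sketch, not the exact calculation from the post: the status strings are copied from the list above and may not match the CSV exactly, the input is assumed to be (grade, status) pairs already read from loanStats.csv, and the sample rows are invented.

```python
from collections import defaultdict

# Statuses counted as charged off, copied from the list in the post.
CHARGED_OFF_STATUSES = {
    "Charged Off",
    "Default",
    "Does not meet the current credit policy  Status: Charged Off",
    "Does not meet the current credit policy  Status: Default",
}


def default_rate_by_grade(loans):
    """Map each grade to the fraction of its loans with a charged-off status."""
    totals = defaultdict(int)
    bad = defaultdict(int)
    for grade, status in loans:
        totals[grade] += 1
        if status in CHARGED_OFF_STATUSES:
            bad[grade] += 1
    return {grade: bad[grade] / totals[grade] for grade in sorted(totals)}


# Invented sample rows, only to show the shape of the input.
sample = [
    ("A", "Fully Paid"), ("A", "Fully Paid"), ("A", "Fully Paid"), ("A", "Charged Off"),
    ("G", "Default"), ("G", "Fully Paid"),
]
print(default_rate_by_grade(sample))
```

Note that every loan in a grade counts toward the denominator here, so which statuses are included in the file changes the rates.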

John


TravelingPennies

If you freeze the first row (View > Freeze Rows > Freeze 1 row) then the headers will stay in place.

TravelingPennies

@sociallender

Is the only reason you're investing with LC because you want more diversification?
With the returns you have achieved in the stock market, it seems like a much better return than LC could possibly yield....?



TravelingPennies

"My stock market strategy is very volatile (low Sharpe).  It is profitable but takes confidence and faith.  Not something that I am going to bet the whole bank on until I have a few more years of evidence to support more investment."

It's me again, Mr. Simpleton.
My portfolio of stocks and bonds is nothing more than mutual funds.  See, simple!  I put money in and hopefully it grows.

Personally, I'm glad to have you on board.  There are others on this board who are equally knowledgeable and into crunching numbers in order to squeeze out an extra .001%.
You'll fit right in and I look forward to taking advantage of all your hard work!

