Project 3 - Team Gryffindor Homepage

Also you can check Assignment 2 - Form to PHP to SQL
Hello Gryffindor

The Lending Club


The Lending Club data is a publicly available dataset that records a relatively large number of features (63 columns) for a sample of individual loans across the United States. Although the dataset is not terribly large (686 rows), it does provide an opportunity to test our data wrangling and analytical abilities.

The main challenge for this type of project is making sure that we all work with the same data without running into version control issues. While we could have just placed the dataset, or even a whole SQL database, in a shared environment like GitHub, this would still not have given us real-time access.

We decided instead to create a MySQL server on AWS (cloud) and build a database containing the main Lending Club table plus other associated tables that could help our SQL queries. This way the whole team could access the data in real time without worrying about version control issues.

Below you will find a few SQL queries that, through PHP, query the MySQL database directly. Each example responds with an HTML table, which can easily be read from R using the rvest package.
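To illustrate what consuming such an HTML table involves, here is a minimal sketch in Python, used as a stand-in for rvest and not the code from our .Rmd file. It relies only on the standard library. The sample markup and the column names (state, home_ownership, n_loans) are hypothetical placeholders, not actual output from our PHP pages.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> in an HTML document as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_html_table(html):
    """Return the table as a list of dicts keyed by the first (header) row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical response body; the real pages may use different columns.
SAMPLE_HTML = """
<table>
  <tr><th>state</th><th>home_ownership</th><th>n_loans</th></tr>
  <tr><td>XX</td><td>RENT</td><td>3</td></tr>
  <tr><td>YY</td><td>OWN</td><td>1</td></tr>
</table>
"""

records = parse_html_table(SAMPLE_HTML)
```

All values come back as strings, so numeric columns would still need to be converted before analysis.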

The final part of our project was to analyze the data. You can see a few examples below. Finally, we provide links to our .Rmd file used for data access, data wrangling, visualization and analytics.

Examples of Visualizations

[Image gallery: 15 visualizations]

Findings from our data analysis

In conclusion, we found that the most common job titles were NA (missing values), manager, teacher, owner, registered nurse and driver. Teachers had the highest proportion of E-graded loans, whilst owners had the highest proportion of A-graded loans. Loan purposes were dominated by debt consolidation and credit card refinancing. FICO scores had a clearly negative linear relationship with loan rates. The states with the most loan requests were California, New York and Florida; however, Missouri and Idaho had the highest mean loan amounts. Strikingly, Missouri also had one of the lowest mean annual incomes, while areas such as Maryland/DC, Boston, New Jersey and Massachusetts had the highest salaries and relatively modest loan rates.

Teachers in Missouri have the second-lowest starting salary, according to research by Study.com (https://study.com/academy/popular/teacher-salary-by-state.html). Although the cost of living in Missouri is not that high (7th lowest in the nation according to the Missouri Economic Research and Information Center, https://meric.mo.gov/data/cost-living-data-series), one might assume that teachers seek loans in order to repay education loans.

Next steps would be to merge demographic data from the Census API to analyze how these findings relate to income, race, sex and more focused geographical boundaries.


Examples of Web to MySQL queries

Show Structure of LendingCLub Table

List all loans in the database (Selected Fields)

List number of loans by State and Home Ownership

List Loan Balances by State and Home Ownership

List number of loans by State and Loan Status

List Loan Balances by State and Loan Status
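
The four summary queries above all follow the same pattern: group by state plus one other column, then count loans or sum balances. The sketch below shows that pattern in Python, with an in-memory SQLite database standing in for our MySQL server on AWS. The table name (loans), the column names and the rows are hypothetical placeholders, not the actual schema or data.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans ("
    "state TEXT, home_ownership TEXT, loan_status TEXT, balance REAL)"
)

# Synthetic rows, for illustration only.
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?, ?)",
    [
        ("XX", "RENT", "Current", 1000.0),
        ("XX", "RENT", "Fully Paid", 0.0),
        ("XX", "OWN", "Current", 2500.0),
        ("YY", "RENT", "Current", 400.0),
    ],
)


def summarize_by(conn, second_column):
    """Count loans and sum balances per (state, second_column) group."""
    # Column names cannot be bound as parameters, so check against a whitelist.
    allowed = {"home_ownership", "loan_status"}
    if second_column not in allowed:
        raise ValueError(f"unsupported grouping column: {second_column}")
    sql = (
        f"SELECT state, {second_column}, "
        "COUNT(*) AS n_loans, SUM(balance) AS total_balance "
        f"FROM loans GROUP BY state, {second_column} "
        f"ORDER BY state, {second_column}"
    )
    return conn.execute(sql).fetchall()


by_ownership = summarize_by(conn, "home_ownership")
by_status = summarize_by(conn, "loan_status")
```

A single GROUP BY returns both the count and the balance total, so one query per grouping column can serve both the "number of loans" and the "loan balances" pages.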

Links to our code, documents, etc

Project's Presentation Slides

GITHUB - Data Folder

Miscellaneous Code and Files

PHP Code Files

Folder with Images