Citibike Analysis & Dashboard
Project Overview and Goal
The Citibike business development team proposed several key business questions about rider activity, the effect of weather on bicycle demand, and potential locations for new bicycle sharing stations. My goal was to not only answer these questions effectively, but to transform the data into a cohesive dashboard that various stakeholders can comprehend.
Methodologies and Tools
This project required attention to detail from start to finish. The data was checked for consistency, prepared, and sampled before I analyzed it in depth, answering the key questions and building visualizations with various Python libraries in JupyterLab. Other steps included web scraping and the development and deployment of an interactive dashboard with Streamlit.
Data Cleaning / Consistency Checks:
The NYC Citibike dataset had over 28 million rows across several .csv files, which I merged and sampled down to 25% using a random seed for reproducibility. After performing consistency checks (formatting, duplicates, null values) in JupyterLab, I excluded unnecessary columns. Once the data was clean, I created additional columns to better analyze rider behavior: total ride time, season, and eventually average daily temperature, once the NOAA data had been collected and merged. I also noticed that stations with the same name sometimes had different latitude and longitude values. I fixed this by taking the coordinates from the first instance of each station and applying them to every later instance of that station.
Example of Code to address latitude / longitude discrepancies. This was applied to both starting and ending stations.
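A minimal sketch of the sampling and coordinate fix described above, not the original notebook code. The function names, the seed value, and the column names used in the usage note (`start_station_name`, `start_lat`, `start_lng`, and their `end_` counterparts) are assumptions for illustration.

```python
import pandas as pd


def merge_and_sample(frames, frac=0.25, seed=42):
    """Concatenate trip tables and draw a reproducible random sample."""
    merged = pd.concat(frames, ignore_index=True)
    # A fixed random_state returns the same rows on every run.
    return merged.sample(frac=frac, random_state=seed)


def standardize_station_coords(df, name_col, lat_col, lng_col):
    """Give every row of a station the first coordinates recorded for it."""
    out = df.copy()
    # "first" returns the first non-null value within each station group.
    first = out.groupby(name_col)[[lat_col, lng_col]].transform("first")
    # Rows without a station name have no group, so keep their own values.
    out[lat_col] = first[lat_col].fillna(out[lat_col])
    out[lng_col] = first[lng_col].fillna(out[lng_col])
    return out
```

Under these assumptions, the function would be called once with the start-station columns and once with the end-station columns, for example `standardize_station_coords(trips, "start_station_name", "start_lat", "start_lng")`.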
Web Scraping:
Identifying seasonal trends required weather data that was not present in the Citibike data. I used an API to collect that year's weather data from NOAA. I then reformatted the Citibike dataset, setting the date column as the index, and merged the two datasets on the date to add a column for the average daily temperature. Next, I created a column dividing the dates into the four seasons and populated it by looping through the dataset. I felt ready to continue my analysis... until I realized that rider activity might also be affected by landmarks. After all, NYC is a tourism hotspot. I decided to scrape a website that listed popular NYC destinations, then manually added the latitude and longitude of each.
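The weather merge and season column can be sketched as follows. This is an illustration under stated assumptions rather than the original code: the column names (`started_at`, `date`, `avg_temp`) and the month-to-season boundaries are hypothetical, and a vectorized `map` stands in for the loop mentioned above.

```python
import pandas as pd

# Assumed season boundaries; the original mapping is not shown.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def add_weather_and_season(trips, weather):
    """Attach a season label and the daily average temperature to each trip."""
    out = trips.copy()
    started = pd.to_datetime(out["started_at"])
    out["season"] = started.dt.month.map(SEASON_BY_MONTH)
    # Reduce both tables to calendar dates so they share a join key.
    out["date"] = started.dt.date
    daily = weather.copy()
    daily["date"] = pd.to_datetime(daily["date"]).dt.date
    # A left join keeps every trip, even on days with no weather record.
    return out.merge(daily[["date", "avg_temp"]], on="date", how="left")
```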
Interactive Mapping:
I was excited to map the data, as I knew I wanted to create two separate maps with Kepler.gl: the first showing the most popular ride paths relative to the previously scraped landmarks, the second showing all stations, with trip count indicating how popular each station was. After loading my data, I realized it was too large to initialize properly, so I sampled the data again, using another random seed to keep the results reproducible. The two maps I created were integral in answering some of the questions posed by the business team.
Screenshots of both maps initialized in Kepler.gl. One map shows the most popular ride paths relative to popular NYC landmarks, while the other shows all Citibike stations, with darker colors indicating increased rider activity.
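Both maps rely on aggregated counts rather than raw trips. A minimal sketch of those aggregations, assuming the station columns are named `start_station_name` and `end_station_name` (the function names are hypothetical):

```python
import pandas as pd


def count_ride_paths(trips):
    """Count trips per start/end station pair, most popular first."""
    return (
        trips.groupby(["start_station_name", "end_station_name"])
        .size()
        .reset_index(name="trip_count")
        .sort_values("trip_count", ascending=False, ignore_index=True)
    )


def count_station_departures(trips):
    """Count trips starting at each station, most popular first."""
    return (
        trips["start_station_name"]
        .value_counts()
        .rename_axis("station")
        .reset_index(name="trip_count")
    )
```

Tables like these are much smaller than the trip-level data, which may also help with the map loading issue described above.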
Data Visualization and Dashboard Design:
Finally, I was able to begin the dashboard design and create the visualizations for this project! Being unfamiliar with Streamlit, I encountered a bit of a learning curve when ensuring that all of my data and files were accessible in both my remote and local environments. Once I worked through this hiccup, I decided on a color scheme for my dashboard and visualizations. Choosing blues and reds to complement Citibike’s pre-existing branding, I tried to keep contrast high with accessibility in mind. There was a lot of stop-and-go during this step as I continued to make minor changes that would need to be pushed to GitHub. I made sure to answer all business questions and make thorough recommendations with the interactive visualizations to back my suggestions.
Examples of Visualizations included in the Citibike Dashboard.
Recommendations / Response to Key Business Questions:
- Demand clearly increases in the summer months, so scaling back during the winter months is a practical way to reduce costs and improve profit margin. The winter and spring months (Nov-Apr) combined account for only 36% of all rides, while the summer months alone account for 34.5% of total rides. Scaling back bike availability by 50% in the colder months would still leave room for growth in bicycle demand while reducing the number of bikes sitting unused.
- To identify potential need along waterfront locales, the first step would be to observe the most popular ride paths and the popularity of current starting and ending stations. Along the Hudson River, there is a long (but popular) ride path that could be shortened by adding a station near Chelsea Waterside Park. Gansevoort Peninsula and Pier 45 are also along this ride path and could be viable station locations. Along the West Channel, Citibike could benefit from a station near John Jay Park, as it is close to existing popular ride paths. Additional stations surrounding Central Park also show potential based on popular ride paths.
- To ensure bicycle availability, Citibike could implement a discount or incentive for riders to make round trips from popular stations. While this might not mean the exact same bike is returned, a similar rideable would complete the trip. The incentive could take the form of a discount offered to riders who return a rideable to the popular station from which they departed within the next 12-24 hours. Because many riders are members, they are more likely to be consistent users who would benefit from a round-trip type of service.
- Additional recommendations include acquiring more electric rideables, as they appear to have slightly greater appeal to member riders (the majority of riders in the NYC area). Although casual riders are not the majority, they consistently take significantly longer rides. Marketing efforts to brand Citibike as convenient and efficient could draw in new riders and incentivize additional memberships.
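The seasonal shares quoted in the first recommendation can be recomputed from the trip data. A minimal sketch, assuming a `started_at` timestamp column (the function name is hypothetical):

```python
import pandas as pd


def ride_share_for_months(trips, months):
    """Return the percentage of trips that started in the given months."""
    month = pd.to_datetime(trips["started_at"]).dt.month
    # The mean of a boolean series is the fraction of True values.
    return month.isin(months).mean() * 100
```

For example, `ride_share_for_months(trips, [11, 12, 1, 2, 3, 4])` would give the Nov-Apr share.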