The data we used covers the month of July 2017. It is open data available online, but we pulled it from the NYU CUSP data facility gateway server. We read it in with the following code so that this step can be reproduced:
```python
import pandas as pd
df = pd.read_csv("/gws/open/Student/citibike/201707-citibike-tripdata.csv.zip")
```
After reading in the data, we created a new data frame containing only the columns relevant to our question: the "tripduration" and "gender" features. We then removed outliers to make the data easier to visualize. Specifically, we treated all trips longer than 5000 seconds (about 83 minutes) as outliers and excluded them from the analysis.
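The column selection and outlier filter described above could look roughly like the sketch below. It assumes pandas and uses a small made-up DataFrame in place of the real CSV so it runs without access to the CUSP server. The values, the extra `bikeid` column, and the `MAX_DURATION_S` name are illustrative assumptions, not part of our actual analysis.

```python
import pandas as pd

# Hypothetical stand-in for the July 2017 Citi Bike trips; in the real
# analysis df comes from pd.read_csv(...) on the CUSP gateway path.
df = pd.DataFrame({
    "tripduration": [300, 1200, 4999, 5000, 5001, 86400],  # seconds (made up)
    "gender": [1, 2, 0, 1, 2, 1],                           # made-up codes
    "bikeid": [101, 102, 103, 104, 105, 106],               # extra column to drop
})

# Keep only the two features relevant to the question.
trips = df[["tripduration", "gender"]]

# Exclude trips longer than 5000 seconds (~83 minutes) as outliers.
MAX_DURATION_S = 5000  # hypothetical constant name for the cutoff
trips = trips[trips["tripduration"] <= MAX_DURATION_S].reset_index(drop=True)

print(len(trips))
```

With these sample values, the 5001-second and 86400-second trips are dropped, leaving four rows. Using `<=` keeps a trip of exactly 5000 seconds, matching the rule that only trips *greater than* 5000 seconds are excluded.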