
Project 2: Bicycle Race

CS 424 Project


Project Data

The visualizations created for Project 2: Bicycle Race were built entirely on the data provided to the team. The source data was available at the Bike Sharing Data Website, which also offered cleaned-up data in multiple forms, giving the team everything it needed. The SQL data for the visualizations was pulled from Steve Vance's GitHub, which let the team load the large dataset onto a MySQL server and transform it according to each team member's needs as well as the project's as a whole. The time interval to be explored for the project was June 28 through December 31, 2013. The team also had to show weather during calendar playback mode so the user can get a sense of usage patterns during major weather events and daily events such as sunrise and sunset.

Data Separation and Data Changes
Mapping Stations with Communities:

The data we received was not completely clean. For example, the stations_data.csv file contains all station-related information, including latitude and longitude, but it did not include communities. We manually mapped all 300 stations to one of the 77 communities, which made it possible to investigate and produce visualizations at the community level as well.
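The manual mapping above could also be sketched programmatically with a point-in-polygon test against community boundaries. The snippet below is a minimal illustration, not the project's actual code: the column names (`latitude`, `longitude`) and the community polygon are assumptions, and the coordinates are made up.

```python
import csv
from io import StringIO

def point_in_polygon(lat, lon, polygon):
    """Ray-casting test: is (lat, lon) inside the polygon given as a
    list of (lat, lon) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        y1, x1 = polygon[i]
        y2, x2 = polygon[(i + 1) % n]
        if (x1 > lon) != (x2 > lon):          # edge crosses this longitude
            t = (lon - x1) / (x2 - x1)
            if lat < y1 + t * (y2 - y1):      # point is below the edge
                inside = not inside
    return inside

def map_stations(stations_csv, communities):
    """Attach a community name to each station row.
    `communities` maps a community name to its polygon vertex list."""
    out = []
    for row in csv.DictReader(StringIO(stations_csv)):
        lat, lon = float(row["latitude"]), float(row["longitude"])
        row["community"] = next(
            (name for name, poly in communities.items()
             if point_in_polygon(lat, lon, poly)),
            "UNKNOWN")
        out.append(row)
    return out

# Tiny smoke test with a single square "community" (made-up coordinates).
communities = {"Example Community": [
    (41.0, -87.7), (41.0, -87.6), (41.1, -87.6), (41.1, -87.7)]}
stations_csv = ("id,name,latitude,longitude\n"
                "1,Inside St,41.05,-87.65\n"
                "2,Outside St,41.2,-87.65\n")
mapped = map_stations(stations_csv, communities)
```

Stations that fall outside every polygon get an `UNKNOWN` tag, which is also a convenient way to spot rows that still need manual review.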

Historical Weather Data

The data has been stored in JSONs and can be found here. This includes historical data for each day, subdivided by the hour.

Java Code Used for Data Cleaning
The sample scripts used for data cleaning are available in this public repository. These scripts generated JSON files containing all the data required for each particular visualization. All the JSON files we generated are stored here.
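As a rough sketch of what such a cleaning script does, the function below collapses a cleaned trips CSV into the kind of JSON a single visualization might consume: trip counts per station per hour. The column names (`starttime`, `from_station_id`) and the timestamp format are assumptions about the cleaned export, not the project's actual schema.

```python
import csv
import json
from collections import Counter
from io import StringIO

def trips_to_hourly_json(trips_csv):
    """Aggregate a trips CSV into JSON records of the form
    {"station": ..., "hour": ..., "trips": ...}.
    Assumes starttime is formatted 'YYYY-MM-DD HH:MM'."""
    counts = Counter()
    for row in csv.DictReader(StringIO(trips_csv)):
        hour = row["starttime"][11:13]        # zero-padded hour of day
        counts[(row["from_station_id"], hour)] += 1
    return json.dumps(
        [{"station": s, "hour": h, "trips": n}
         for (s, h), n in sorted(counts.items())])

# Example input (made-up rows in the assumed format).
sample = ("starttime,from_station_id\n"
          "2013-07-04 08:15,5\n"
          "2013-07-04 08:40,5\n"
          "2013-07-04 17:05,5\n"
          "2013-07-04 08:59,6\n")
result = trips_to_hourly_json(sample)
```

Writing the aggregate out as JSON keeps the file small and lets D3 load it directly, instead of forcing the browser to crunch raw trip records.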

MySQL Server

The sheer size of the dataset being explored was always going to be a challenge for the team. The solution was a MySQL database, which gave the team multiple granular levels at which to transform and analyze the data according to each requirement of the project. The University of Illinois at Chicago (UIC) provides a MySQL database with each student's personal website, and the team used one of these for the project. Once the database was deployed, the team used HeidiSQL to upload the available SQL data. The team then normalized the data into smaller, finer subsets, which made it easier to access. It was also easier to apply indexes to the tables, which resulted in faster execution.

The team made a conscious decision to use Java/Python to subdivide the data into multiple CSVs for use with D3 and Leaflet. The Leaflet and D3 JavaScript libraries are readily compatible with file formats such as CSV, JSON, and GeoJSON, which made accessing and filtering data during the coding phase easier and caused fewer headaches. It is also easy to access JSON data as variables by including the files as JavaScript. Another concern, raised by various team members, was the large delay in accessing the database on the fly from inside the code. Therefore, the team decided to generate data locally according to each team member's needs.
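The workflow described above, indexing the trips table, filtering to the project's date window, and exporting the subset as a local CSV, might be sketched as follows. This uses Python's built-in sqlite3 as a stand-in for the course MySQL server, and the table and column names (`trips`, `starttime`, `from_station_id`) are assumptions, not the project's actual schema.

```python
import csv
import io
import sqlite3

# In-memory sqlite3 database stands in for the UIC MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (starttime TEXT, from_station_id INTEGER)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("2013-06-27 09:00", 5),    # before the project window
     ("2013-07-04 10:30", 5),
     ("2013-12-31 23:59", 6)])

# An index on the filter column keeps date-range scans fast.
conn.execute("CREATE INDEX idx_trips_start ON trips (starttime)")

# Pull only the project's time interval: June 28 - December 31, 2013.
rows = conn.execute(
    """SELECT starttime, from_station_id FROM trips
       WHERE starttime BETWEEN '2013-06-28 00:00' AND '2013-12-31 23:59'
       ORDER BY starttime""").fetchall()

# Dump the subset to CSV so D3/Leaflet can load it locally,
# instead of hitting the database on the fly.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["starttime", "from_station_id"])
writer.writerows(rows)
```

Because the timestamps are stored in `YYYY-MM-DD HH:MM` form, plain string comparison in the `BETWEEN` clause sorts them chronologically, which is what makes the index effective here.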

Data Separation


Data was primarily needed for the following tasks during the project:

Historical Weather Data: stored as JSONs and found on git in the /App/Json/Map/Weather path. This includes historical data for each day, subdivided by the hour; the data is kept at an hourly interval.

D3 Visualization

Leaflet/Map Visualization

Scripts for Data Cleaning: the sample scripts used for data cleaning are available in this public repository.
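A day-keyed, hour-keyed structure is one natural shape for the per-hour weather files used during calendar playback. The sketch below is purely illustrative: the field names (`temp_f`, `conditions`) and values are assumptions, not the actual contents of the files under /App/Json/Map/Weather.

```python
import json

# Hypothetical shape for one weather file: an object per day,
# keyed by zero-padded hour of day.
weather_json = json.dumps({
    "2013-07-04": {
        "08": {"temp_f": 71, "conditions": "Clear"},
        "17": {"temp_f": 84, "conditions": "Partly Cloudy"},
    }
})

def conditions_at(weather, day, hour):
    """Look up the weather record for a given day and hour during
    calendar playback; returns None when the hour is missing."""
    return weather.get(day, {}).get(hour)

weather = json.loads(weather_json)
```

Returning None for missing hours lets the playback code skip a weather overlay gracefully rather than crashing on gaps in the historical record.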