Cafe Sales Data Project

description

This project demonstrates transforming raw sales data into a clean, structured,
and queryable relational database. It includes data cleaning, normalization
was our prompted client for the project, and each of us had to create a website
into 3NF, schema design, data insertion with Pandas, and SQL queries.
while also accounting for usability and design. The project consisted of keeping track
of user stories, user testing, a design guide, and more. The project required using
HTML, CSS, JavaScript, jQuery, API Integration, form validation, and more.
Due to the time constraint and requirements of the project, there are no media queries.

The raw data can be found here.
View all queries and code here.

timeline

06/2025

role

individual

tools

sqlite,
vscode,
sql,
python (pandas),


design


entity relationship diagram

The original data was a flat, unnormalized table. In order to achieve 3NF within the data, I constructed an ERD Diagram showcasing the different connections within the data through separate tables and keys, involving both cardinality and modality, key relationships, and data types.

data dictionary

In order to have a clear reference for my data, I created a data dictionary that specifies the characteristics of each input, separated into the tables the final database would hold.


data cleaning / insertion


data cleaning

I utilized Pandas to clean the dirty_cafe_sales.csv (10,000 rows of data). Although I didn't have much prior experience with Pandas, I learned a lot as I worked towards how exactly I wanted to clean the data and what would make sense in a real-world scenario. I removed duplicates and standardized formatting, converted data types, calculated missing values, and ensured NaN usage, preserving rows with missing values in the Transactions table as to not lose/skew data.

schema and insert

In order to create my database tables, I used both Pandas and SQL. I used SQL to create the relational tables, which I based on my Data Dictionary and ERD Diagram. I used Pandas to insert data from the new cleaned CSV into my schema tables, and ensured uniqueness in entries and correct usage of merging.


sql queries


queries

I came up with 10 different questions to write queries for to demonstrate aggregate functions, joins, grouping, and more. I wrote each question keeping in mind what the data could be used for in the real world. Results can be viewed below.