Data Management and Application Development
Write a Python program that reads the olympics.csv file found on the BUS 640 Canvas site and meets the specifications described below. The olympics.csv file contains a record for each Olympic athlete who has competed in the Olympic Games since 1896. Its fields (columns) are shown in the attached table.
Exam Specifications:
Your program should find and print the ten countries that have won the most gold medals, along with the number of gold medals each of these countries has won.
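A minimal pandas sketch of this count, assuming the file has a Medal column whose values include the string "Gold" and an NOC column holding a country code. Both names are assumptions; use the names listed in the attached field table. The small DataFrame is made-up sample data so the sketch runs without the real file.

```python
import pandas as pd

def top_gold_countries(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Return the n countries with the most gold medals, largest count first."""
    # Assumed column names: "Medal" (values like "Gold") and "NOC" (country code).
    gold = df[df["Medal"] == "Gold"]
    # value_counts() sorts in descending order, so head(n) gives the top n.
    return gold["NOC"].value_counts().head(n)

# Made-up sample data; with the real file use df = pd.read_csv("olympics.csv").
sample = pd.DataFrame({
    "NOC":   ["USA", "USA", "USA", "GBR", "GBR", "FRA", "FRA"],
    "Medal": ["Gold", "Gold", "Gold", "Gold", "Gold", "Silver", None],
})

for country, golds in top_gold_countries(sample).items():
    print(f"{country:<6}{int(golds):>5}")
```

Because each row is one athlete record, a team gold is counted once per team member. If one medal per team is wanted, drop duplicate rows across country, year, event, and medal before counting (again assuming such columns exist).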
Your program should find and print the ten athletes who have won the most gold medals. For each of these ten athletes, also print:
The years in which the athlete competed in the Olympic Games
The number of gold medals the athlete won in each of those years
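One way to sketch the athlete report in pandas, assuming columns named Name, Year, and Medal (hypothetical names; check the attached field table). It also assumes athlete names are unique; if the file has an athlete ID column, grouping on that is safer. The sample rows are made up.

```python
import pandas as pd

def top_gold_athletes(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Return the n athletes with the most gold medals, largest count first."""
    # Assumed column names: "Name", "Year", "Medal".
    gold = df[df["Medal"] == "Gold"]
    return gold["Name"].value_counts().head(n)

def golds_by_year(df: pd.DataFrame, name: str) -> pd.Series:
    """Gold medals won by one athlete, indexed by every year the athlete competed."""
    rows = df[df["Name"] == name]
    # Flag gold rows, then sum per year; years with no gold still appear with 0.
    is_gold = rows["Medal"].eq("Gold")
    return is_gold.groupby(rows["Year"]).sum().astype(int)

# Made-up sample data standing in for olympics.csv.
sample = pd.DataFrame({
    "Name":  ["A. Swimmer"] * 4 + ["B. Runner"] * 2,
    "Year":  [2004, 2004, 2008, 2012, 2008, 2012],
    "Medal": ["Gold", "Gold", "Gold", None, "Gold", "Silver"],
})

for name, total in top_gold_athletes(sample).items():
    print(f"{name}: {int(total)} gold medals")
    for year, golds in golds_by_year(sample, name).items():
        print(f"    {year}: {int(golds)}")
```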
Your program should find and print, for each decade of life, the top three sporting events in which athletes of that age group have won medals, counting all medal categories (bronze, silver, and gold). Also print the number of medals won in each of these events.
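A possible sketch of the decade breakdown, assuming columns named Age, Event, and Medal, with medal values "Gold", "Silver", and "Bronze" (all assumptions). Here a decade is labeled by its lower bound, so 20 means ages 20 to 29; rows with a missing age are dropped. The sample data is made up.

```python
import pandas as pd

def top_events_by_decade(df: pd.DataFrame, n: int = 3) -> dict:
    """Map each age decade (20 means ages 20-29) to its top-n events by medal count."""
    # Assumed column names: "Age", "Event", "Medal".
    medals = df[df["Medal"].isin(["Gold", "Silver", "Bronze"])].dropna(subset=["Age"])
    # Integer-divide by 10 and scale back up to get the decade's lower bound.
    decade = (medals["Age"] // 10 * 10).astype(int)
    result = {}
    for dec, group in medals.groupby(decade):
        # value_counts() is sorted descending, so head(n) keeps the top events.
        result[int(dec)] = group["Event"].value_counts().head(n)
    return result

# Made-up sample data standing in for olympics.csv.
sample = pd.DataFrame({
    "Age":   [23, 25, 27, 21, 34, 38, None, 29],
    "Event": ["100m", "100m", "100m", "Long Jump", "Marathon", "Marathon", "100m", "Long Jump"],
    "Medal": ["Gold", "Silver", "Bronze", "Gold", "Gold", None, "Gold", "Silver"],
})

for dec, events in top_events_by_decade(sample).items():
    print(f"Ages {dec}-{dec + 9}:")
    for event, count in events.items():
        print(f"    {event:<12}{int(count):>4}")
```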
Format the output of your program as shown in the examples on the following pages.
Project Overview
The Final Project is a comprehensive assessment designed to evaluate your ability to apply Python programming, data analysis, pandas, and introductory machine learning techniques to a real-world dataset. You will demonstrate both technical proficiency and analytical reasoning by completing a full data analysis workflow and communicating your findings in a professionally written paper.
This project integrates all major topics covered in the course, including Python fundamentals, decision statements, iteration, data structures, file input and output, pandas Series and DataFrames, data aggregation, and linear regression.
Dataset Requirement (Mandatory)
You are required to obtain your dataset from Kaggle. To complete this project, you must:
Create a free Kaggle account if you do not already have one
Select a dataset available in CSV format
Choose a dataset with multiple numerical variables suitable for aggregation and regression
Ensure the dataset is sufficiently large and complex to support meaningful analysis
Datasets that are overly simplified, lack numerical variables, or do not allow grouping or modeling will not be accepted. You must include the Kaggle dataset title and direct link in your written paper.
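Before committing to a dataset, a quick profile can show whether it has enough rows and numerical columns. The sketch below is one hypothetical way to do that with pandas; the CSV text is a made-up stand-in for a downloaded Kaggle file, and whether a dataset is acceptable remains the instructor's judgment.

```python
import io
import pandas as pd

def describe_candidate(df: pd.DataFrame) -> dict:
    """Summarize properties that matter when judging a CSV for this project."""
    # select_dtypes("number") keeps integer and float columns only.
    numeric = df.select_dtypes(include="number")
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "numeric_columns": list(numeric.columns),
        "missing_cells": int(df.isna().sum().sum()),
    }

# Made-up stand-in for pd.read_csv("your_kaggle_file.csv").
csv_text = """region,units,price,revenue
East,10,2.5,25.0
West,4,3.0,12.0
East,,2.5,
"""
summary = describe_candidate(pd.read_csv(io.StringIO(csv_text)))
print(summary)
```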
Project Scenario
You are acting as a data analyst tasked with analyzing a real-world dataset to extract insights that support decision-making. Your goal is to identify patterns, summarize trends, and model relationships between variables using linear regression.
The project should reflect an applied perspective, demonstrating how data analysis supports decisions in business, public policy, healthcare, technology, or another relevant domain.
Part 1: Python Program (100 Points)
You will develop a Python script or Jupyter Notebook that performs a complete data analysis pipeline using your selected Kaggle dataset.
Program Requirements
Your program must include:
Reading data from a CSV file using pandas
Use of variables and arithmetic expressions
Use of decision statements
Use of iteration using for or while loops
Use of at least one list and one dictionary
Data cleaning or preparation steps such as filtering, handling missing values, or type conversion
Creation and manipulation of pandas Series and DataFrames
Data aggregation using groupby or pivot tables
At least one linear regression analysis
Clear output of results using printed summaries, tables, or visualizations
Your code must be logically organized and commented to explain major steps.
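The requirements above can be combined into one short, commented script. The sketch below runs on a small made-up CSV embedded as text; the columns region, ad_spend, and sales are hypothetical, as is the 100-unit threshold used in the decision statement. np.polyfit with deg=1 is one way to fit an ordinary least-squares line; scikit-learn's LinearRegression or statsmodels' OLS are common alternatives.

```python
import io
import numpy as np
import pandas as pd

# Made-up stand-in for a Kaggle CSV; replace with pd.read_csv("your_file.csv").
CSV_TEXT = """region,ad_spend,sales
East,10,50
East,20,80
East,30,
West,40,140
West,50,170
"""

# 1. Read the data with pandas.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# 2. Clean: drop rows missing the target and convert ad_spend to float.
df = df.dropna(subset=["sales"])
df["ad_spend"] = df["ad_spend"].astype(float)

# 3. Derive a new Series with an arithmetic expression.
df["sales_per_dollar"] = df["sales"] / df["ad_spend"]

# 4. Aggregate with groupby and with a pivot table.
by_region = df.groupby("region")["sales"].mean()
pivot = df.pivot_table(index="region", values=["ad_spend", "sales"], aggfunc="sum")

# 5. A list, a dictionary, a for loop, and an if/else decision (hypothetical threshold).
regions = list(by_region.index)
labels = {}
for region in regions:
    if by_region[region] >= 100:
        labels[region] = "high"
    else:
        labels[region] = "low"

# 6. Simple linear regression of sales on ad_spend (least squares, degree 1).
slope, intercept = np.polyfit(df["ad_spend"], df["sales"], deg=1)

print(by_region, pivot, labels, sep="\n\n")
print(f"predicted sales = {intercept:.2f} + {slope:.2f} * ad_spend")
```

Numbered comments like these also make it easy to map each requirement to the code that satisfies it.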
Part 2: Written Paper (100 Points)
You will submit a written paper explaining your project, methods, and findings in a clear and professional manner. The paper should be written for an audience that may not have a technical background.
Paper Requirements
Length: 6 to 8 pages, double-spaced, excluding the title page and references
Formatting: Anderson University (AU) Style Formatting
Your paper must include the following sections:
Introduction
Describe the purpose of the analysis and why the chosen Kaggle dataset is appropriate.
Dataset Description
Explain the dataset, including size, variables, and source. Include the Kaggle dataset title and link.
Methodology
Describe how the data was cleaned, prepared, and analyzed, including aggregation and regression.
Results
Summarize key findings from descriptive statistics, aggregation, and linear regression.
Discussion
Interpret results and explain their real-world or business implications.
Conclusion
Summarize insights, discuss limitations, and suggest future analysis.
References
Include the Kaggle dataset and any additional sources using Anderson University style formatting.