Create a new pandas dataframe of data that will be used

Assignment Help Python Programming
Reference no: EM131555254

Assignment: Python and postgres

In this exercise you will create some data "assets" for use by the XYZ company in direct marketing campaigns. You will download data from the SSCC server to your local computer, create relational database tables that you'll store locally, create a "flat" file with selected customers and variables, and report on purchasing by gender. You'll document your work by providing your commented code. You'll save the new assets you created for future use and for sharing with others.

To do this assignment you'll be using the SSCC and a Postgres server running on it, Python and the pandas package, and the sqlite database. You'll use Python and pandas for data manipulation and reporting.

Getting Your Data from the SSCC

XYZ's data are in a Postgres DB server on the SSCC. To get them, log in to the dornick server. You'll need to do this using a VPN client if you are off campus. Then, connect to the Postgres database server.

The Working with the SSCC pdf document and the SSCC Cheat Sheet pdf on Canvas provide some information about how to connect to the SSCC and to get your data from the Postgres DB.

You'll find three tables in the Postgres DB pilot schema that are named item, mail, and customer. Export each table as a csv file with a header record, using a temporary view to do each one. Using psql is the easiest way to do this. After you are done, delete each temporary view.

Download your three csv files to your computer so that you can work with the data in them using Python. Don't forget to log off from dornick.

You'll find documentation about XYZ's data (attached). Note that like most real world documentation, it's not "perfect." But it is the real thing.

Do These Things

Once you have your csv files on your computer, do the following five (5) things, most of which have "subthings:"

#1) Import each of the csv files you downloaded from the SSCC into a pandas DataFrame.

(a) Provide the code you used to do this.
(b) Print out the column names of your item DataFrame and the first four (4) records in it.
(c) Decribe the data types of the columns in the DataFrame.

Include your commented code for each of the above.

#2) Write each of you pandas DataFrames to a local SQLite DB named xyz.db. Include only data for active buyers in these tables. Verify that you have written the tables to your SQLite DB correctly.

(Commented code, of course.)

#3) Now, using the same data as you used for 2, above, create a new table called custSum that you also write to xyz.db, and that has the following characteristics. This table should have one row per customer record.

(a) Include on each customer's record a binary, Y or N, indicator of whether the customer is a 'heavy buyer,' where the definition of a 'heavy buyer' is a customer whose YTD purchasing in 2009 is greater than 90% of the 2009 YTD purchasing of all customers who are active buyers. Verify your coding of this new variable by crosstabulating it with an indicator of whether their 2009 YTD purchasing is at at least the 90th percentile of all 2009 YTD purchasing.

(b) Add to each customer's record whether the customer has the following credit cards: AMEX, Discover, VISA, and Mastercard, with each credit card variable codes as a Y or a N for yes or no, respectively. Document your creation of these codes by showing how they are related to the code values in the data

(c) Add to each customer's record their estimated HH income, and the genders of adults "1" and "2."

(d) Add to each customer's record their ZIP code and ZIP+4 code.

(e) Be sure to include the account number on each record in the SQL tables you create so that the tables can be related to each other, later.

(f) Provide a count of the number of records in each table.

(g) Verify that you have written this table to your SQLite DB correctly.

(Don't forget to comment your code so that a reader can understand what it is supposed to do.)

#4) Create a new pandas DataFrame of data that will be used for target maketing and write it out to a headered csv file.

(a) This DataFrame should have one row per customer. The customers included should be active buyers or lapsed buyers.

(b) The row for each customer should include the customer's account identifier, and an indicator variable (Y/N, or 1/0) for each product category the customer has made at least one purchase in.

(c) Include for each customer their buyer status, and the total dollar amount of the purchases they have made from XYZ using all data available for him or her.

(d) Write your DataFrame to a csv file, and also store it in a shelve database.

(e) Verify that the files you wrote your customer DataFrame to were written correctly.

(Commented code, of course.)

#5) Report the six most frequently purchased product categories by the gender of "adult 1" using the data for the active customers. Include for these categories the total spend in dollars on each category, the total number of products purchased in these categories, and the number of adults in each gender category.

(Be sure to comment your code.)

Your Deliverables

Provide the above in up to six (6) pages, but in no more than 7 pages, in a pdf file. Be sure that everything is readable. Address each of the five above parts in turn. Do 1 by providing your commented code and results. Then do 2 providing code + results, and so on.

Do not provide a list of code for all of the above items in a block, followed by the results of your code in a block. An assignment organized in this way will be returned ungraded. Be sure all of your code is syntactically correct, and that it approximates good Python coding style.

Reference no: EM131555254

Questions Cloud

Create a table or matrix to perform evaluation comparison : Create a table or matrix to perform your evaluation comparison. describe in detail the evaluation method that you plan to use to compare and contrast 3 options.
What computations with q and r will produce the solution : Provide a glimpse of some widely used matrix factorizations, some of which are discussed later in the text.
Testing the advertising claims of a tire manufacturer : You work for a consumer watchdog publication and are testing the advertising claims of a tire manufacturer.
How the issue might affect the role or duty of the police : Identify how the issue might affect the role or duty of the police, judiciary, or corrections.
Create a new pandas dataframe of data that will be used : Provide a count of the number of records. Create a new pandas DataFrame of data that will be used for target maketing and write it out to a headered csv file.
Describe the functions or commands of a matrix program : For block operations, it may be necessary to access or enter submatrices of a large matrix.
How a single court ruling can have systemic effects : Select a Supreme Court case from the list provided below, outline details of the case and its deposition, and trace the ruling's repercussions .
Describe the commands or operations of the matrix program : Suppose memory or size restrictions prevent a matrix program from working with matrices having more than 32 rows and 32 columns.
What are the most important features that you would look : Explain why is this critical. What are the most important features (at least 5) that you would look for in a tool to capture evidence correctly?

Reviews

Write a Review

Python Programming Questions & Answers

  Do a python project and get it running using app engine

Do a python project and get it running using App engine. Something very basic and simple

  Python regular expressions and dictionaries

Complete as many exercises from the book as necessary to understand the concepts. These will not be graded. The graded part of the assignment is to use regular expression

  Design a prgram using python

Design a prgram USING PYTHON that students can use to calculate what score they need on final exam to get a certan final grade for a course.

  Write a program that asks for the user age

Write a program that asks for the user's age. Based on their response print "You can vote" (18 years old or older) or "You can't vote" and also whether or not he/she is a senior citizen (over 64)

  Write a program for checking a circle

Write a program for checking a circle program must either print "is a circle: YES" or "is a circle: NO", appropriately.

  Implement your algorithm in python

Write an algorithm in structured English (pseudocode) that describes the steps required to perform the task specified and reinforce topic material related to the programming work cycle, and the input, processing, output program structure.

  Write a program that asks the user for a letter

Write a program that asks the user for a letter. The program should then determine if the letter is a vowel or not

  Construction of the data dictionary

A first systematic step to a data science process, as we have learned from the lectures and practical sessions, is to construct a data dictionary for the dataset.

  Python function to calculate two roots

Write a Python function main() to calculate two roots. You must input a,b and c from keyboard, and then print two roots. Suppose the discriminant D= b2-4ac is positive.

  Create a python program that allows user enter class name

Create a Python program that allows user enter class name, faculty name, name of each student, and letter grade each student earned.

  Compare the scores of two volleyball teams

Compare the scores of two volleyball teams that play each other. To win a match in volleyball, a team must get 25 points.

  Question 1 research 5-8 species within one family of birds

question 1 research 5-8 species within one family of birds. be sure to use primary or very good secondary literature

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd