Write a basic word count mapreduce program

Reference no: EM133883227

Lab: Hadoop MapReduce Implementation in AWS using Amazon EMR

Objective:
In this lab, students will:
Set up a Hadoop cluster using Amazon EMR.
Write and deploy a simple MapReduce program in Java or Python.
Run the MapReduce job on the EMR cluster and analyze the output.

Step 1: Setting Up an Amazon EMR Cluster

Log in to the AWS Console:
Go to the AWS Management Console sign-in page.
Log in with your credentials.
Navigate to Amazon EMR:
In the search bar, type EMR and select Amazon EMR from the services list.
Create a Cluster:
Click on the Create cluster button.
Select Go to advanced options.
Configure Software and Steps:
Under Software Configuration, select Release: emr-6.x.x (latest).
For Applications, ensure that Hadoop is selected. Other applications like Hive and Spark can be unchecked for simplicity.
Click Next.

Configure Hardware:
Set the Instance Type for both Master and Core nodes to m5.xlarge (or another instance type depending on your resource needs and budget).
Keep the Number of Core Instances as 1.
Click Next.

General Cluster Settings:
Give the cluster a meaningful name, like `Hadoop-MapReduce-Lab`.
Leave the default settings for networking and permissions.
Click Next.

Security Settings:
Create a new EC2 key pair if you don't have one, or select an existing key pair.
Click Create cluster.
Wait for the Cluster to Launch:
The cluster will take a few minutes to launch. You can monitor the status in the Cluster List.
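
The console choices above can also be written down as a request for the boto3 EMR client. The following is only a sketch under several assumptions: boto3 is installed and configured with credentials, the default roles `EMR_EC2_DefaultRole` and `EMR_DefaultRole` already exist in the account, and the helper names `build_cluster_request` and `create_cluster` are made up for illustration. The release label is left as a parameter because the lab only specifies `emr-6.x.x`.

```python
def build_cluster_request(name, release_label, key_pair_name,
                          instance_type="m5.xlarge", core_count=1):
    """Return keyword arguments for the boto3 EMR client's run_job_flow call."""
    return {
        "Name": name,
        # Pass the emr-6.x.x release you picked in the console.
        "ReleaseLabel": release_label,
        # Only Hadoop is needed for this lab.
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_count},
            ],
            "Ec2KeyName": key_pair_name,
            # Keep the cluster running so steps can be added later.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumption: the default EMR roles exist in this account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def create_cluster(request, region_name):
    """Submit the request and return the new cluster ID (needs AWS credentials)."""
    import boto3  # imported here so the request can be built without boto3

    emr = boto3.client("emr", region_name=region_name)
    return emr.run_job_flow(**request)["JobFlowId"]
```

Building the request separately from submitting it makes it easy to inspect the settings before anything is launched or billed.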

Step 2: Writing the MapReduce Program

You can write a basic word count MapReduce program. Here's an example in Python:
Mapper (mapper.py):
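
A minimal sketch of the mapper, assuming Python 3 is available on the cluster nodes and that words are simply separated by whitespace. The helper name `map_words` is illustrative; Hadoop streaming only requires that the script read lines from standard input and write tab-separated key/value pairs to standard output.

```python
#!/usr/bin/env python3
"""mapper.py: emit "<word><TAB>1" for every word read from standard input."""
import sys


def map_words(lines):
    """Yield one "word\t1" record per whitespace-separated word."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"


if __name__ == "__main__":
    # Hadoop streaming feeds the input split to the script on stdin.
    for record in map_words(sys.stdin):
        print(record)
```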

Reducer (reducer.py):
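
A minimal sketch of the reducer, again assuming Python 3. It relies on the fact that Hadoop sorts the mapper output by key before the reducer sees it, so all lines for the same word arrive together. The helper name `reduce_counts` is illustrative.

```python
#!/usr/bin/env python3
"""reducer.py: sum the counts for each word in key-sorted "<word><TAB><n>" input."""
import sys


def reduce_counts(lines):
    """Yield one "word\ttotal" record per distinct word in sorted input."""
    current_word = None
    current_count = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        word, _, count = line.partition("\t")
        try:
            count = int(count)
        except ValueError:
            continue  # skip malformed lines instead of failing the job
        if word == current_word:
            current_count += count
        else:
            # A new word starts, so the previous word's total is final.
            if current_word is not None:
                yield f"{current_word}\t{current_count}"
            current_word = word
            current_count = count
    if current_word is not None:
        yield f"{current_word}\t{current_count}"


if __name__ == "__main__":
    for record in reduce_counts(sys.stdin):
        print(record)
```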

Program Structure:
The Mapper splits input text into words and outputs each word with a count of 1.
The Reducer sums the counts for each word and outputs the total.
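
The map, sort, and reduce phases described above can be imitated locally in a few lines, which is a convenient way to check the logic before paying for a cluster. This is a sketch only: the function names are hypothetical, and the in-memory sort stands in for Hadoop's distributed shuffle-and-sort.

```python
from itertools import groupby
from operator import itemgetter


def map_line(line):
    """Map phase: one (word, 1) pair per whitespace-separated word."""
    return [(word, 1) for word in line.split()]


def word_count(lines):
    """Run map, sort, and reduce in memory and return (word, total) pairs."""
    mapped = [pair for line in lines for pair in map_line(line)]
    # Stand-in for Hadoop's shuffle-and-sort: group equal keys together.
    mapped.sort(key=itemgetter(0))
    # Reduce phase: sum the counts within each group of identical words.
    return [
        (word, sum(count for _, count in group))
        for word, group in groupby(mapped, key=itemgetter(0))
    ]


print(word_count(["the cat", "the dog"]))
# prints [('cat', 1), ('dog', 1), ('the', 2)]
```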

Step 3: Deploying the Program on Amazon EMR

Upload the Python Scripts to S3:
Navigate to the S3 service in the AWS Console.
Create a bucket (e.g., `hadoop-mapreduce-lab-bucket`) and upload the `mapper.py` and `reducer.py` files.

Submit the Job to the EMR Cluster:
Go back to your EMR cluster page.
Click on the Steps tab.
Select Add Step.
Choose Custom JAR as the step type.
For the JAR location, enter `command-runner.jar`, the EMR-provided utility that runs the `hadoop-streaming` command passed in the arguments.
For Arguments, enter:
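
The exact argument string is not shown here. A typical Hadoop streaming invocation for this lab can be assembled as in the sketch below, which assumes the scripts sit at the top level of the bucket, the data uses the `input/` and `output/` folders mentioned in this lab, and each script starts with a Python shebang line so it can be run by name. The helper `build_streaming_args` is hypothetical.

```python
def build_streaming_args(bucket="<your-bucket-name>"):
    """Return a space-separated argument string for the Custom JAR step."""
    base = f"s3://{bucket}"
    return " ".join([
        "hadoop-streaming",
        # Ship both scripts to every node that runs a task.
        "-files", f"{base}/mapper.py,{base}/reducer.py",
        "-mapper", "mapper.py",
        "-reducer", "reducer.py",
        "-input", f"{base}/input/",
        "-output", f"{base}/output/",
    ])


print(build_streaming_args())
```

Hadoop refuses to write to an output location that already exists, so the `output/` folder should not be present before the step runs.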

Replace `<your-bucket-name>` with your actual S3 bucket name.

Add Input Data:
Upload a sample text file to the `input/` folder in your S3 bucket (e.g., a text file containing a few paragraphs). Do this before submitting the step so the job has data to read.

Run the Job:
Once the step is submitted, the job will start running, and you can monitor its progress in the Steps tab.

Step 4: Analyzing the Output

View the Output:
Once the job is complete, navigate to the `output/` folder in your S3 bucket.
Download the output files (`part-00000`, etc.) and view them to see the word count results.
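
Once the part files are downloaded, a short script can merge them and list the most frequent words. This is a sketch that assumes the files were saved to a local folder and contain tab-separated `word<TAB>count` lines, as written by a reducer that emits that format; `load_counts` and `top_words` are illustrative names.

```python
from collections import Counter
from pathlib import Path


def load_counts(output_dir):
    """Merge every part-* file in output_dir into a single Counter."""
    counts = Counter()
    for part in sorted(Path(output_dir).glob("part-*")):
        with part.open(encoding="utf-8") as handle:
            for line in handle:
                word, sep, count = line.rstrip("\n").partition("\t")
                # Ignore blank or malformed lines.
                if sep and count.strip().isdigit():
                    counts[word] += int(count)
    return counts


def top_words(output_dir, n=10):
    """Return the n most frequent (word, count) pairs."""
    return load_counts(output_dir).most_common(n)
```

For example, `top_words("output", 5)` would return the five most frequent words from files saved in a local folder named `output`.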

Clean Up Resources:
Terminate the EMR cluster to avoid additional charges.
Delete the S3 bucket if you no longer need it; a bucket must be emptied before it can be deleted.
