Analyse social media by applying natural language processing

Reference no: EM133997640, Length: 1500 words

Social Media and Network Analysis

Overview
One of the tasks a data scientist or analyst performs is answering questions from data; these questions could be business, social or even research oriented. In this assignment, you will answer some (interesting) questions using Reddit data, covering the whole process from data collection to communicating the findings.

Learning Outcome 1: Apply data science to analyse social media and social networks;

Learning Outcome 2: Analyse social media by applying Natural Language Processing (NLP) techniques to detect sentiment and topics;

Learning Outcome 3: Synthesise and present insights from the social media and network analysis performed.

Background

Scenario
Imagine you are working for a hypothetical organisation, CarefreeInsights, which uses social media to help its clients do market research. CarefreeInsights has a potential client who wishes to conduct some market research about their brand. In particular, they are interested in knowing what people are saying about them online, and what their feelings are towards the brand. This client has never used social-media-based analytics before, and would like a demonstration of what can be done. Your manager has asked you to provide this demonstration.

Assignment Details
In this assignment, you'll use Reddit data to provide such a demonstration. You'll get to practise going through the whole data science process. The assignment can be broken into 5 parts, but this is by no means the order you have to follow: the process tends not to be sequential (a waterfall model) but iterative, with constant backtracking.
First we examine the questions to answer.

Questions to Answer
To meet the client's goal of understanding what people are talking about and how they perceive their brand, you should consider and answer the following questions:
What are the trending concepts and topics associated with this person or event?
What are the perceptions and feelings towards this person or event?

Data Collection
The next step is to collect data. Our focus is on Reddit, but your manager (and lecturer) want you to analyse something of interest to you. Hence, for this task, select a person, brand, product or event that you are interested in. Using the API, gather at least 1 month's worth of posts [1]. For the analysis to be interesting, you'll want to select an entity that generates a fair amount of interest, ideally with plenty of submissions and comments.
If you are having trouble selecting an entity to study and analyse, consider selecting:
Your favourite band, actress/actor, TV series, sports team
Your favourite hobby
Current major events
A well-known brand or company
As a suggestion, gather some initial subreddits, submissions and comments for your selected entity, and examine the number of "posts" collected. If there are too few, select another entity. If there are too many, e.g., millions of posts in a week, also select another one, as processing them will take a substantial amount of your time.
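The volume check suggested above can be sketched as follows. This is a minimal illustration, not a prescribed method: it assumes each collected post is a dictionary carrying the `created_utc` timestamp that Reddit attaches to submissions and comments, and the `min_posts` and `max_posts` thresholds are hypothetical placeholders you would tune for your own entity.

```python
from collections import Counter
from datetime import datetime, timezone


def posts_per_day(posts):
    """Count posts per UTC calendar day using their `created_utc` timestamps."""
    days = Counter()
    for post in posts:
        day = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc).date()
        days[day] += 1
    return dict(sorted(days.items()))


def volume_verdict(posts, min_posts=500, max_posts=200_000):
    """Classify the collection size; both thresholds are illustrative guesses."""
    n = len(posts)
    if n < min_posts:
        return "too few"
    if n > max_posts:
        return "too many"
    return "ok"


# Toy sample standing in for collected submissions and comments.
sample = [
    {"id": "a1", "created_utc": 1_700_000_000},
    {"id": "a2", "created_utc": 1_700_000_000 + 3_600},   # same UTC day
    {"id": "a3", "created_utc": 1_700_000_000 + 86_400},  # next UTC day
]

daily = posts_per_day(sample)
print(daily)
print(volume_verdict(sample))
```

Looking at the per-day counts as well as the total helps spot an entity whose discussion is concentrated in a single burst rather than spread over the month.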

After you have selected an entity to analyse, please submit it via the form in OneDrive no later than the end of Tuesday, March 24. Note that we are trialling unique topics, so we want everyone to study something different. The list of proposed topics will be publicly visible and a first come, first served principle applies: whoever first states a topic on the form gets it, and subsequent students proposing the same topic will be asked to choose something else. So there is an incentive to think about and propose a topic as early as possible. Feel free to reach out to Jeff for ideas or feedback about topic selection.

[1] The 1-month figure is a rough guide; you may need more or less, and there is no hard rule.

Data Pre-processing and Exploration
The next suggested step is to pre-process the data and do some initial exploration of it. There might be a feedback loop between these two steps.
To help you get started, you might want to do the following. Compute basic statistics on the data, e.g., number of submissions, top K unique words etc. Does the data appear adequate to answer the questions? Hint: remember you want a reasonable number of posts, and the data is likely to be more interesting to analyse if there are enough unique words, indicating a diversity of topics discussed. Open some gathered submissions and comments and read them. Do we need to do any pre-processing or cleaning? E.g., are there foreign-language Reddit posts (for this assignment we stick with English, so your lecturer can also read them!)? Are there characters that aren't useful for analysis?
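As a starting point for the basic statistics mentioned above, here is a small sketch using only the Python standard library. The tokenisation rule (lowercase runs of letters and apostrophes) and the function name `basic_stats` are assumptions made for illustration; your labs may suggest a different tokeniser.

```python
import re
from collections import Counter

# Assumed tokenisation: lowercase runs of letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def basic_stats(texts, k=5):
    """Return post count, token count, vocabulary size and the top-k words."""
    tokens = [tok for text in texts for tok in TOKEN_RE.findall(text.lower())]
    counts = Counter(tokens)
    return {
        "n_posts": len(texts),
        "n_tokens": len(tokens),
        "n_unique": len(counts),
        "top_k": counts.most_common(k),
    }


# Toy texts standing in for gathered submissions and comments.
texts = [
    "The new album is great",
    "Great tour, great band",
    "Tickets for the tour sold out",
]

stats = basic_stats(texts, k=2)
print(stats)
```

A low ratio of unique words to total tokens can hint that the discussion is repetitive, which ties back to the question of whether the data is diverse enough to analyse.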
After initial exploration and pre-processing of the data, consider what models or approaches you'll use to answer the questions. Do they need the data to be pre-processed? If so, what kind of pre-processing? Perform this pre-processing.
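One possible shape for such a pre-processing step is sketched below. Everything here is an assumption for illustration: the stop-word set is a tiny hand-picked sample, and `looks_english` is a crude heuristic based on the share of ASCII characters, so it filters out posts in non-Latin scripts but cannot tell English from, say, German.

```python
import re

URL_RE = re.compile(r"https?://\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "and", "to", "of", "it", "this", "for"}


def looks_english(text, min_ascii_ratio=0.9):
    """Crude filter: keep texts whose characters are mostly ASCII."""
    if not text:
        return False
    ascii_chars = sum(1 for ch in text if ch.isascii())
    return ascii_chars / len(text) >= min_ascii_ratio


def clean_tokens(text):
    """Lowercase, drop URLs and non-letter characters, remove stop words."""
    text = URL_RE.sub(" ", text.lower())
    text = NON_LETTER_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


raw = "Check this out: https://example.com THE band is AMAZING!!! 10/10"
tokens = clean_tokens(raw)
print(tokens)
print(looks_english(raw), looks_english("これは日本語の投稿です"))
```

Whether to remove digits, emoji or stop words depends on the model you pick later; some sentiment tools, for instance, make use of punctuation and capitalisation, so aggressive cleaning is not always the right choice.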

Method/Model
Run the selected models/approaches. Perform initial analysis: what do the results indicate? Does the selected approach have parameters? If so, what effect do the parameter settings have on the results obtained? Would a different approach produce a different answer? This step requires some exploration and analysis, and that might lead to more data pre-processing.
To get started, examine the previous labs to get ideas on possible approaches.
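To make the question about parameter settings concrete, here is a toy lexicon-based sentiment scorer. It is only a sketch: the `LEXICON` values are invented for illustration (an established lexicon such as VADER would normally take its place), and `neutral_band` is a hypothetical parameter controlling how far from zero a score must be before a post counts as positive or negative.

```python
from collections import Counter

# Invented toy lexicon; not taken from any real sentiment resource.
LEXICON = {
    "great": 1.0, "love": 1.0, "good": 0.5,
    "bad": -0.5, "awful": -1.0, "hate": -1.0,
}


def sentiment_score(tokens):
    """Mean lexicon score over the tokens found in the lexicon (0.0 if none)."""
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def label(score, neutral_band=0.25):
    """Map a score to a label; scores within the band count as neutral."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def label_distribution(posts, neutral_band):
    """Count labels across tokenised posts for a given neutral band."""
    return Counter(label(sentiment_score(p), neutral_band) for p in posts)


posts = [
    ["love", "the", "new", "album"],     # score 1.0
    ["good", "show", "bad", "sound"],    # score 0.0
    ["good", "but", "pricey"],           # score 0.5
    ["awful", "queue", "hate", "it"],    # score -1.0
]

narrow = label_distribution(posts, neutral_band=0.25)
wide = label_distribution(posts, neutral_band=0.75)
print(narrow)
print(wide)
```

Widening the band moves the mildly positive post into the neutral group, so the same data yields a different sentiment breakdown. Reporting how sensitive your findings are to settings like this is part of the analysis the section asks for.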

Analysis
Remember we are trying to use data to answer the questions. Hence, for this component, first present the outputs of your models and/or data analysis that answer the questions. E.g., present what topics have been discussed about a user via the top-K terms, or a word cloud of the topics discovered by topic modelling. Then discuss this: what are the topics, do they correspond to recent news or other sources of information, and if the results don't correspond to background knowledge, why do you think that is so?
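One hedged way to turn top-K terms into "trending" terms is to compare how often each term appears in a recent window against an earlier one. The sketch below is an assumption about how this could be done, not a required method: `trending_terms`, the add-one smoothing and the `min_recent` cut-off are all illustrative choices.

```python
from collections import Counter


def trending_terms(earlier_docs, recent_docs, k=3, min_recent=2):
    """Rank terms by smoothed relative frequency in recent vs earlier posts."""
    earlier = Counter(t for doc in earlier_docs for t in doc)
    recent = Counter(t for doc in recent_docs for t in doc)
    vocab = set(earlier) | set(recent)
    # Add-one smoothing so terms unseen in the earlier window do not divide by zero.
    n_earlier = sum(earlier.values()) + len(vocab)
    n_recent = sum(recent.values()) + len(vocab)
    scores = {}
    for term, count in recent.items():
        if count < min_recent:
            continue  # skip terms too rare to call a trend
        p_recent = (count + 1) / n_recent
        p_earlier = (earlier[term] + 1) / n_earlier
        scores[term] = p_recent / p_earlier
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Toy tokenised posts split into two time windows.
earlier_docs = [["album", "tour"], ["album", "review"]]
recent_docs = [["tour", "tickets"], ["tickets", "presale"], ["tour", "tickets"]]

result = trending_terms(earlier_docs, recent_docs)
for term, ratio in result:
    print(f"{term}: {ratio:.2f}")
```

A ratio above 1 means the term takes a larger share of the recent discussion than of the earlier one. Checking whether the top terms line up with recent news about your entity is exactly the kind of discussion this section expects.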

Communication
In this assignment, you'll produce a report about your answers, findings and insights to the questions. Describe your data, outline and describe your approach, your findings and insights to the questions. Use tables, plots/graphs, word clouds and other visualisations to help you communicate the results (in addition to text).

As the audience of the report is both your hypothetical client and your lecturer, you'll include things that a business report might not typically include, such as describing the approach in some detail.

Assessment
The assessment is predominantly based on your report and submitted scripts.

Report: Examine the assessment rubric below, but you'll be assessed on how you approached answering the questions and what findings and conclusions you derived from your analysis. The report presentation is also important. The report should have a maximum of 15 A4 pages.

Code: In addition, your code will be assessed on its readability and on whether your code and actual analysis justify the findings and conclusions described in your report. You will be asked to demonstrate your code to a tutor in person at a nominated time, where you will essentially receive a pass/fail mark for this component. This is an opportunity for you to explain your code and approach. If you fail this part due to an inability to explain your code or answer questions about it, you may be referred for academic misconduct.

COSC2671 Social Media and Network Analysis, RMIT University