1) The average number of sentences per word over time
Do this two ways: First, using a simple regex that splits on punctuation. Second, using the Natural Language Toolkit's (NLTK) sentence tokenizer.
2) The average number of unique words per 100 words over time
- remove common words (see the lecture slide for a list)
Do this two ways:
First: assume that different words are all unique, even if they share a stem (like run and running)
Second: using the stemming code from the NLTK.
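One possible shape for statistic 2 is sketched below. It is only a sketch under stated assumptions: COMMON_WORDS is a hypothetical stand-in for the list on the lecture slide, "unique words per 100 words" is read as 100 times unique words divided by total words after the common words are removed, and unique_per_100 is an illustrative name. The stem argument is left as a plain callable so that the first way (no stemming) and the second way (for example, passing nltk.stem.PorterStemmer().stem when NLTK is installed) share one function.

```python
import re

# Hypothetical stand-in for the common-word list on the lecture slide.
COMMON_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "that", "it"}

def unique_per_100(text, stem=None):
    """Return unique words per 100 words after dropping common words.

    stem is an optional callable that maps a word to its stem. Leave it
    as None to treat every distinct spelling as a distinct word.
    """
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in COMMON_WORDS]
    if not words:
        return 0.0
    if stem is not None:
        words = [stem(w) for w in words]
    return 100.0 * len(set(words)) / len(words)
```

Calling unique_per_100(text) covers the first way; calling it again with a stemmer as the second argument covers the second way without duplicating the counting logic.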
Make sure that you sort the speeches by their date, and write the data into a text file with two columns: date and statistic.
EC) Build a word cloud, as seen in the slides from lectures 11 and 12
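The regex version of statistic 1, together with the sorting and two-column output described above, might look roughly like this. The function names are illustrative, the statistic follows the assignment's wording (sentences per word), and the code assumes each speech is a (date, text) pair whose date is an ISO string such as 1789-04-30, so that plain string sorting is chronological. Other date formats would need parsing first.

```python
import re

def sentences_per_word(text):
    """Sentences per word, splitting sentences on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return len(sentences) / len(words) if words else 0.0

def stat_rows(speeches, statistic=sentences_per_word):
    """Return (date, statistic) pairs sorted by date.

    Assumes dates are ISO strings (YYYY-MM-DD), which sort chronologically.
    """
    ordered = sorted(speeches, key=lambda pair: pair[0])
    return [(date, statistic(text)) for date, text in ordered]

def write_rows(rows, path):
    """Write one line per row: date, a tab, then the statistic."""
    with open(path, "w") as out:
        for date, value in rows:
            out.write(f"{date}\t{value}\n")
```

A regex split like this miscounts abbreviations such as "Mr." as sentence ends, which is one reason the results may differ from those of the NLTK sentence tokenizer.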
Part 2: Collect an interesting web statistic.
Use urlweb and a website of your choice to collect a statistic. Write a one-paragraph description of your statistic at the top of the code file.
Examples include sports game data, weather statistics, name statistics, etc.
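As a rough illustration of Part 2, the sketch below downloads a page and counts how often a term appears in its text. It assumes Python 3 and uses the standard library's urllib.request in place of the module named above, which is an assumption on my part. fetch_page and count_term are hypothetical helper names, and the tag stripping is deliberately crude (it does not skip script or style contents).

```python
import re
from urllib.request import urlopen

def fetch_page(url):
    """Download a page and return its body as text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

def count_term(html, term):
    """Count case-insensitive whole-word occurrences of term outside tags."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude removal of HTML tags
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, re.IGNORECASE))
```

A call such as count_term(fetch_page(url), "rain"), with url set to the page you chose, would then give one number per page, which can be recorded with the date it was collected.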