SCMP 118 – Lab 5: President’s Inaugural Speeches Word Analysis

Project Goal

The goal of this lab is to create a program that checks how often a given word occurs in each of the US presidents’ inaugural speeches.

For this project we need a lot of files, and OnlineGDB will not work. Therefore we are using CodeBoard to write the program. You will need to create an account, save the program, and turn in the link just as you do with OnlineGDB.

A project like this needs a few resources. These include all the inaugural speeches, which can be found at https://cs.jimskon.com/text/president/. They are also included in the CodeBoard project.

Finally, this project is complicated enough that you will need to break it up into several functions, and you need to write and test each of these before assembling them into a complete working solution.

Method

You will start with this project: Project Link

NOTE: In CodeBoard you must precede file names with “./Root/”. For example, “./Root/president/09.JamesMonroe2.txt”.
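One way to avoid forgetting the prefix is to build every path through a small helper. The sketch below assumes the speeches sit in a `president` folder under `./Root/`, as in the example above. The helper name `speechPath` and the constant `ROOT` are hypothetical and not part of the starter code.

```python
# Hypothetical helper, not part of the starter code.
ROOT = "./Root/"  # prefix CodeBoard requires in front of every file name

def speechPath(filename, folder="president"):
    # Join the CodeBoard prefix, the folder, and the file name
    return ROOT + folder + "/" + filename

print(speechPath("09.JamesMonroe2.txt"))
```

With a helper like this, the rest of the program only ever deals with bare file names such as `09.JamesMonroe2.txt`.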

import random
import requests
import sys
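def ReadSpeech(filename):
    # Read the speech file
    # Return a list of lines
    # ...
    text = []  # placeholder: replace with the lines read from the file
    return text
def getSpeechList():
    # Read in the speeches file, return a list of filenames
    speechList = []  # placeholder: replace with the filenames read from the file
    return speechList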
def removePunctuation(s):
    # Remove all punctuation from a line
    import string
    for c in string.punctuation:
        s = s.replace(c, "")
    return s
def countWord(d, w):
    # Given a dictionary and a word:
    # if the word is in the dictionary,
    # add one to its count;
    # otherwise add the word with a count of 1
    return 0
def createFreqDict(text):
    # Build a dictionary of words and counts
    # Given text, a list of lines,
    # parse the lines into words
    # and if a word is not in the dictionary
    # add it with a count of 1,
    # otherwise increment its count
    freqList = {}
    for line in text:
        print(line)
    return freqList
def main():
    wordFreqList = []
    # 1. Get the list of speech names
    speechnames = {"test"}
    # 2. For each speech
    for speechname in speechnames:
        # 3. read in the speech
  
        print(".", end="")
        sys.stdout.flush()
        # 4. Build a word frequency dictionary
        # 5. Add to list of dictionaries
    print()
    done = False
    while not done:
        # 6. Get the word
        word = input("Enter word to search for in speeches: ").lower()
        # 7. Check each speech for the word
        for dic in wordFreqList:
            pass  # placeholder: report how often word occurs in this speech
        # 8. Do we continue?  
        if input("Hit 'e' to stop...") == 'e':
            done = True
main()

As you did in Lab 4, you should go through and complete and test each function, and then put them together as a final, working system. You will not be given separate projects for this; you are left to do this on your own. You may either write and test the functions in this project, or create other projects (for example, on Replit) to do this.
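As an illustration of testing one function on its own, the sketch below copies `removePunctuation` from the starter code and checks it against a few hand-picked inputs. The test strings are made up for this example. Note that `string.punctuation` only covers ASCII punctuation, so curly quotes in a speech file would not be removed by this function.

```python
import string

def removePunctuation(s):
    # Same logic as the starter code: drop every ASCII punctuation character
    for c in string.punctuation:
        s = s.replace(c, "")
    return s

# Quick checks: each of these should hold before the function is used elsewhere
assert removePunctuation("Hello, world!") == "Hello world"
assert removePunctuation("We've vice-president") == "Weve vicepresident"
assert removePunctuation("") == ""
print("removePunctuation passed its checks")
```

The same pattern (call the function on a small input, compare against the answer you worked out by hand) applies to each of the other functions.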

Example

The program should look something like this:

Enter word to search for in speeches: book
'book' occurs, 1,  times in 38.FranklinDelanoRoosevelt2.txt
'book' occurs, 1,  times in 52.GeorgeHWBush.txt
Hit 'e' to stop...
Enter word to search for in speeches: wife
'wife' occurs, 1,  times in 19.AbrahamLincoln1.txt
'wife' occurs, 2,  times in 48.GeraldFord.txt
Hit 'e' to stop...kenyon
Enter word to search for in speeches: kenyon
Sorry, kenyon is not in any speeches
Hit 'e' to stop...
Enter word to search for in speeches: congress
'congress' occurs, 1,  times in 03.JohnQuincyAdams.txt
'congress' occurs, 5,  times in 09.JamesMonroe2.txt
'congress' occurs, 6,  times in 10.JohnAdams.txt
'congress' occurs, 2,  times in 11.AndrewJackson1.txt
'congress' occurs, 1,  times in 13.MartinVanBuren.txt
'congress' occurs, 11,  times in 14.WilliamHenryHarrison.txt
'congress' occurs, 3,  times in 15.JamesKPolk.txt
'congress' occurs, 4,  times in 16.ZacharyTaylor.txt
'congress' occurs, 9,  times in 18.JamesBuchanan.txt
'congress' occurs, 5,  times in 19.AbrahamLincoln1.txt
'congress' occurs, 1,  times in 21.UlyssesSGrant1.txt
'congress' occurs, 1,  times in 22.UlyssesSGrant2.txt
'congress' occurs, 1,  times in 23.RutherfordBHayes.txt
'congress' occurs, 9,  times in 24.JamesAGarfield.txt
'congress' occurs, 8,  times in 26.BenjaminHarrison.txt
'congress' occurs, 18,  times in 28.WilliamMcKinley1.txt
'congress' occurs, 9,  times in 29.WilliamMcKinley2.txt
'congress' occurs, 14,  times in 31.WilliamHowardTaft.txt
'congress' occurs, 1,  times in 34.WarrenGHarding.txt
'congress' occurs, 2,  times in 35.CalvinCoolidge.txt
'congress' occurs, 3,  times in 36.HerbertHoover.txt
'congress' occurs, 4,  times in 37.FranklinDelanoRoosevelt1.txt
'congress' occurs, 1,  times in 48.GeraldFord.txt
'congress' occurs, 1,  times in 50.RonaldReagan1.txt
'congress' occurs, 1,  times in 51.RonaldReagan2.txt
'congress' occurs, 3,  times in 52.GeorgeHWBush.txt
'congress' occurs, 3,  times in 53.BillClinton1.txt
'congress' occurs, 2,  times in 54.BillClinton2.txt
'congress' occurs, 1,  times in 57.BarackObama1.txt
'congress' occurs, 1,  times in 58.BarackObama2.txt
Hit 'e' to stop...e

Learning Goals

  1. Reading from files on the web
  2. Using dictionaries
  3. Using Lists
  4. Using functions (including parameters and returning values)

Method

For this assignment it is essential that you break the problem up into smaller pieces and write and test each before assembling them into a final version. The following steps will help you with this.

The following is a framework for the program. You are to finish each of the functions, and then the final program: Lab5

Step 1 – Get the speech file name list. This function reads the speech filenames into a list.

def getSpeechList():
  # Read in the speeches.txt file
  return filenames
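A minimal sketch of this step follows. It assumes that the list file holds one speech file name per line, which this handout does not confirm, and it takes the path as a parameter (`listFile`, a hypothetical name) rather than hard-coding it as the starter function does. The demo writes a small stand-in file so that the sketch runs anywhere.

```python
import os
import tempfile

def getSpeechList(listFile):
    # Assumption: listFile holds one speech file name per line
    filenames = []
    with open(listFile) as f:
        for line in f:
            name = line.strip()
            if name:  # skip blank lines
                filenames.append(name)
    return filenames

# Demo with a small stand-in file (not the real speeches.txt)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "speeches.txt")
    with open(path, "w") as f:
        f.write("09.JamesMonroe2.txt\n10.JohnAdams.txt\n\n")
    demoList = getSpeechList(path)

print(demoList)
```

Stripping each line matters: without it every name would end in a newline character, and opening the speech file later would fail.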

Step 2 – Create a word frequency dictionary

This function creates a dictionary of words with frequency counts. The trick is to read each line from the file, remove the punctuation, split the line into words, and then check whether each word is in the stop word dictionary. If it is not, we check whether the word is in the word frequency dictionary. If it isn’t, we add it with a word count of 1; otherwise we update the word count by adding one.

The word frequency dictionary should look like this (Biden’s speech):

{'chief': 1, 'justice': 5, 'roberts': 1, 'vicepresident': 2, 'harris': 2, 'speaker': 1, 'pelosi': 1, 'leader': 2, 'schumer': 1, 'mcconnell': 1, 'pence': 1, 'distinguished': 1, 'guests': 1, 'fellow': 6, 'americans': 11, 'americas': 2, 'day': 7, 'democracys': 1, 'history': 7, 'hope': 3, 'renewal': 1, 'resolve': 3, 'through': 8, 'crucible': 1, 'ages': 2, 'america': 16, 'tested': 3, 'anew': 1, 'risen': 1, 'challenge': 3, 'today': 7, 'celebrate': 1, 'triumph': 1, 'candidate': 1, 'cause': 4, 'democracy': 11, 'people': 11, 'heard': 1, 'heeded': 1, 'weve': 5, 'learned': 1, 'precious': 1, 'fragile': 1, 'hour': 2, 'friends': 2, 'prevailed': 2, ...}
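The procedure described in this step can be sketched as follows. This is one possible arrangement, not the required solution. The stop word collection `STOP_WORDS` is a tiny made-up stand-in, since the handout does not say where the real stop words come from, and the sample lines are invented for the demo. The function names match the starter code.

```python
import string

# Assumption: a tiny stand-in stop word dictionary for illustration only
STOP_WORDS = {"the": True, "of": True, "and": True, "a": True}

def removePunctuation(s):
    # Drop every ASCII punctuation character
    for c in string.punctuation:
        s = s.replace(c, "")
    return s

def countWord(d, w):
    # Add one to the count if the word is present, otherwise start it at 1
    if w in d:
        d[w] += 1
    else:
        d[w] = 1

def createFreqDict(text):
    # text is a list of lines; returns {word: count}
    freqDict = {}
    for line in text:
        for word in removePunctuation(line).lower().split():
            if word not in STOP_WORDS:
                countWord(freqDict, word)
    return freqDict

sample = ["The cause of democracy.", "Democracy, and the people!"]
print(createFreqDict(sample))  # {'cause': 1, 'democracy': 2, 'people': 1}
```

Lowercasing each line before counting is what lets the later search, which lowercases the user's word, match "Democracy" and "democracy" alike.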

Step 3 – Create a list of word frequency dictionaries

Above we saw how to create a word frequency dictionary for a single speech. What we need to do is create a different one for every speech, and then collect them together into a list of word frequency dictionaries. E.g.

#wordFreqDicList = []
#for each speech sp:
#   wordFreqDic = createFreqDict(sp)
#   wordFreqDicList.append(wordFreqDic)

Step 4 – Query for a word, and list its frequency in every speech
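A sketch of the query step, under the assumption that the speech names and the dictionaries are kept in two parallel lists. The function name `findWord` and the sample data are hypothetical. The message formats are copied from the example output earlier in this handout.

```python
def findWord(word, speechNames, wordFreqDicList):
    # Return one report line per speech that contains the word
    lines = []
    for name, dic in zip(speechNames, wordFreqDicList):
        if word in dic:
            lines.append(f"'{word}' occurs, {dic[word]},  times in {name}")
    if not lines:
        lines.append(f"Sorry, {word} is not in any speeches")
    return lines

# Made-up data for the demo
speechNames = ["speechA.txt", "speechB.txt"]
wordFreqDicList = [{"people": 2}, {"congress": 1, "people": 1}]

for line in findWord("people", speechNames, wordFreqDicList):
    print(line)
for line in findWord("kenyon", speechNames, wordFreqDicList):
    print(line)
```

Returning the lines instead of printing them inside the function makes the function easy to test on its own before it is wired into the input loop in `main`.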

Step 5 – Put it all together.

Now we just need to assemble the final solution.

Turn in

  1. Submit the project on repl.it
  2. Turn in the project URL on Moodle
  3. Run the program and turn in a file containing a copy of the output.

Grading

Requirements                                                Grading Comments    Points    Score
Fully functioning code that works exactly as described                          60
Completion of all functions in the starter code                                 30
Fully commented code, with your name and date at the top                        10
Total                                                                           100