Project Goal
The goal of this lab is to create a program that checks how often a given word occurs in each of the US presidents' inaugural speeches.
A project like this needs a few resources, chiefly the text of all the inaugural speeches. These can be found at https://cs.jimskon.com/text/president/ and are also included in the onlineGDB project.
This project is complicated enough that you will need to break it up into several functions, and you should write and test each one before assembling them into a complete, working solution.
Method
You will start with the project given on onlineGDB, which contains the following skeleton code. Copy each of the txt files from https://cs.jimskon.com/text/president/ and add them to your project.
```python
import random
import requests
import sys

def ReadSpeech(filename):
    # Read the speech file
    # Return a list of lines
    text = []
    # ...
    return text

def getSpeechList():
    # Read in the speeches file, return a list of filenames
    speechList = []
    return speechList

def removePunctuation(s):
    # Remove all punctuation from a line
    import string
    for c in string.punctuation:
        s = s.replace(c, "")
    return s

def countWord(d, w):
    # Given a dictionary and a word:
    # if the word is in the dictionary,
    # add one to its count;
    # otherwise add the word with a count of 1
    return 0

def createFreqDict(text):
    # Build a dictionary of words and counts
    # Given text, a list of lines,
    # parse the lines into words,
    # and if a word is not in the dictionary,
    # add it with a count of 1;
    # otherwise increment its count
    freqList = {}
    for line in text:
        print(line)
    return freqList

def main():
    wordFreqList = []
    # 1. Get the list of speech names
    speechnames = {"test"}
    # 2. For each speech
    for speechname in speechnames:
        # 3. Read in the speech
        print(".", end="")
        sys.stdout.flush()
        # 4. Build a word frequency dictionary
        # 5. Add it to the list of dictionaries
    print()
    done = False
    while not done:
        # 6. Get the word
        word = input("Enter word to search for in speeches: ").lower()
        # 7. Check each speech for the word
        for dic in wordFreqList:
            pass  # Print the word's count for this speech here
        # 8. Do we continue?
        if input("Hit 'e' to stop...") == 'e':
            done = True

main()
```
As you did in Lab 4, go through and complete and test each function, and then put them together into a final, working system. You will not be given separate projects for this; you should do this on your own. You may either write and test the functions in this project, or create separate programs to test them.
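One way to test a function on its own is to fill it in and check it with a few `assert` statements before wiring it into `main()`. The sketch below does this for the provided `removePunctuation` and for `countWord`, assuming `countWord` should update the dictionary in place (the skeleton's `return 0` is only a placeholder).

```python
import string

def removePunctuation(s):
    # Remove all punctuation from a line (same approach as the skeleton)
    for c in string.punctuation:
        s = s.replace(c, "")
    return s

def countWord(d, w):
    # Assumed behavior: update d in place and return nothing.
    # If w is already a key, add one to its count; otherwise add it with a 1.
    if w in d:
        d[w] += 1
    else:
        d[w] = 1

# Quick checks, run before the functions are used anywhere else
assert removePunctuation("We've learned, again!") == "Weve learned again"
counts = {}
for w in ["congress", "people", "congress"]:
    countWord(counts, w)
assert counts == {"congress": 2, "people": 1}
print("All tests passed")
```

If an assertion fails, Python stops with an `AssertionError` that points at the failing line, which narrows a bug down to one function before the pieces are combined.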
Example
The program should look something like this:
```text
Enter word to search for in speeches: book
'book' occurs, 1, times in 38.FranklinDelanoRoosevelt2.txt
'book' occurs, 1, times in 52.GeorgeHWBush.txt
Hit 'e' to stop...
Enter word to search for in speeches: wife
'wife' occurs, 1, times in 19.AbrahamLincoln1.txt
'wife' occurs, 2, times in 48.GeraldFord.txt
Hit 'e' to stop...kenyon
Enter word to search for in speeches: kenyon
Sorry, kenyon is not in any speeches
Hit 'e' to stop...
Enter word to search for in speeches: congress
'congress' occurs, 1, times in 03.JohnQuincyAdams.txt
'congress' occurs, 5, times in 09.JamesMonroe2.txt
'congress' occurs, 6, times in 10.JohnAdams.txt
'congress' occurs, 2, times in 11.AndrewJackson1.txt
'congress' occurs, 1, times in 13.MartinVanBuren.txt
'congress' occurs, 11, times in 14.WilliamHenryHarrison.txt
'congress' occurs, 3, times in 15.JamesKPolk.txt
'congress' occurs, 4, times in 16.ZacharyTaylor.txt
'congress' occurs, 9, times in 18.JamesBuchanan.txt
'congress' occurs, 5, times in 19.AbrahamLincoln1.txt
'congress' occurs, 1, times in 21.UlyssesSGrant1.txt
'congress' occurs, 1, times in 22.UlyssesSGrant2.txt
'congress' occurs, 1, times in 23.RutherfordBHayes.txt
'congress' occurs, 9, times in 24.JamesAGarfield.txt
'congress' occurs, 8, times in 26.BenjaminHarrison.txt
'congress' occurs, 18, times in 28.WilliamMcKinley1.txt
'congress' occurs, 9, times in 29.WilliamMcKinley2.txt
'congress' occurs, 14, times in 31.WilliamHowardTaft.txt
'congress' occurs, 1, times in 34.WarrenGHarding.txt
'congress' occurs, 2, times in 35.CalvinCoolidge.txt
'congress' occurs, 3, times in 36.HerbertHoover.txt
'congress' occurs, 4, times in 37.FranklinDelanoRoosevelt1.txt
'congress' occurs, 1, times in 48.GeraldFord.txt
'congress' occurs, 1, times in 50.RonaldReagan1.txt
'congress' occurs, 1, times in 51.RonaldReagan2.txt
'congress' occurs, 3, times in 52.GeorgeHWBush.txt
'congress' occurs, 3, times in 53.BillClinton1.txt
'congress' occurs, 2, times in 54.BillClinton2.txt
'congress' occurs, 1, times in 57.BarackObama1.txt
'congress' occurs, 1, times in 58.BarackObama2.txt
Hit 'e' to stop...e
```
Learning Goals
- Reading from files
- Using dictionaries
- Using lists
- Using functions (including parameters and returning values)
Method
For this assignment it is essential that you break the problem up into smaller pieces and write and test each before assembling them into a final version. The following steps will help you with this.
The following is a framework for the program. Finish each of the functions, and then the final program: Lab5
Step 1 – Get the speech file name list. This function reads the speech filenames from speeches.txt into a list.
```python
def getSpeechList():
    # Read in the speeches.txt file and make a list of filenames, one for each speech
    filenames = []
    return filenames
```
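A minimal sketch of Step 1, assuming speeches.txt lists one speech filename per line and that all the files are UTF-8 text in the same folder as the program. It also fills in `ReadSpeech`, since the filenames are only useful once each file can be read into a list of lines. The `listfile` parameter is an addition for easier testing, not part of the skeleton.

```python
def getSpeechList(listfile="speeches.txt"):
    # Read the list file and return one filename per non-blank line
    filenames = []
    with open(listfile, encoding="utf-8") as f:
        for line in f:
            name = line.strip()  # drop the newline and stray spaces
            if name:             # skip blank lines
                filenames.append(name)
    return filenames

def ReadSpeech(filename):
    # Return the speech as a list of lines, newline characters removed
    with open(filename, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]
```

With the files in place, `speechnames = getSpeechList()` returns the filenames in the order they appear in speeches.txt, and `ReadSpeech(speechnames[0])` returns the first speech's lines.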
Step 2 – Create a word frequency dictionary
This function creates a dictionary of words with frequency counts. The trick is to read each line from the file, remove the punctuation, convert it to lowercase, and split it into words. For each word, check whether it is in the stop word dictionary. If it is not, check whether the word is in the word frequency dictionary: if it isn't, add it with a count of 1; otherwise, add one to its count.
The word frequency dictionary should look like this (Biden’s speech):
```text
{'chief': 1, 'justice': 5, 'roberts': 1, 'vicepresident': 2, 'harris': 2, 'speaker': 1, 'pelosi': 1, 'leader': 2, 'schumer': 1, 'mcconnell': 1, 'pence': 1, 'distinguished': 1, 'guests': 1, 'fellow': 6, 'americans': 11, 'americas': 2, 'day': 7, 'democracys': 1, 'history': 7, 'hope': 3, 'renewal': 1, 'resolve': 3, 'through': 8, 'crucible': 1, 'ages': 2, 'america': 16, 'tested': 3, 'anew': 1, 'risen': 1, 'challenge': 3, 'today': 7, 'celebrate': 1, 'triumph': 1, 'candidate': 1, 'cause': 4, 'democracy': 11, 'people': 11, 'heard': 1, 'heeded': 1, 'weve': 5, 'learned': 1, 'precious': 1, 'fragile': 1, 'hour': 2, 'friends': 2, 'prevailed': 2, ...}
```
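The sketch below follows those steps. The lab does not say where the stop words come from, so here they are passed in as a parameter, `stopWords` (a hypothetical name), holding lowercase words. A set works the same as a dictionary for an `in` test. Note that `string.punctuation` covers only ASCII punctuation, so curly quotes such as ’ would survive unless they are removed separately.

```python
import string

def removePunctuation(s):
    # Delete every ASCII punctuation character, as in the skeleton
    for c in string.punctuation:
        s = s.replace(c, "")
    return s

def createFreqDict(text, stopWords):
    # text: list of lines; stopWords: collection of lowercase words to skip (assumed)
    freqDict = {}
    for line in text:
        for word in removePunctuation(line).lower().split():
            if word in stopWords:
                continue              # ignore stop words entirely
            if word in freqDict:
                freqDict[word] += 1   # seen before: add one
            else:
                freqDict[word] = 1    # first time: start at 1
    return freqDict

# Sample lines and stop words, made up for illustration
lines = ["Chief Justice Roberts, Vice-President Harris,",
         "We've learned again that democracy is precious."]
print(createFreqDict(lines, {"again", "that", "is"}))
# {'chief': 1, 'justice': 1, 'roberts': 1, 'vicepresident': 1, 'harris': 1,
#  'weve': 1, 'learned': 1, 'democracy': 1, 'precious': 1}
```

Removing punctuation before splitting is what turns "Vice-President" and "We've" into single keys like 'vicepresident' and 'weve', matching the dictionary shown above.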
Step 3 – Create a list of word frequency dictionaries
Above we saw how to create a word frequency dictionary for a single speech. Now we need to create one for every speech and collect them into a list of word frequency dictionaries. For example:
```python
# wordFreqDicList = []
# for each speech sp:
#     wordfreqDic = createWordFreqDic(sp)
#     wordFreqDicList.append(wordfreqDic)
```
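A runnable version of that loop might look like this. To keep it self-contained, `createFreqDict` here is a simplified stand-in for Step 2 (it lowercases and splits, but skips the punctuation and stop-word handling). The speeches are passed in as lists of lines, the form `ReadSpeech` returns, and `buildFreqDictList` is a hypothetical helper name.

```python
def createFreqDict(text):
    # Simplified stand-in for Step 2: lowercase, split, and count
    freqDict = {}
    for line in text:
        for word in line.lower().split():
            freqDict[word] = freqDict.get(word, 0) + 1
    return freqDict

def buildFreqDictList(speeches):
    # speeches: list of speeches, each a list of lines
    wordFreqDicList = []
    for sp in speeches:
        wordfreqDic = createFreqDict(sp)
        wordFreqDicList.append(wordfreqDic)
    return wordFreqDicList

speeches = [["We the people", "the people"], ["Congress shall"]]
print(buildFreqDictList(speeches))
# [{'we': 1, 'the': 2, 'people': 2}, {'congress': 1, 'shall': 1}]
```

Because the dictionaries are appended in the same order as the speeches, position `i` in the list still matches the `i`-th filename, which lets the query step pair each dictionary with its file.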
Step 4 – Query for a word, and list its frequency in every speech
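One way to do the query is a function that walks the speech names and their dictionaries side by side with `zip` and prints a line in the same format as the example output. `reportWord` is a hypothetical name, and it assumes the two lists are in the same order.

```python
def reportWord(word, speechnames, wordFreqDicList):
    # speechnames[i] is assumed to be the file that produced wordFreqDicList[i]
    found = False
    for name, dic in zip(speechnames, wordFreqDicList):
        if word in dic:
            print(f"'{word}' occurs, {dic[word]}, times in {name}")
            found = True
    if not found:
        print(f"Sorry, {word} is not in any speeches")
    return found

speechnames = ["a.txt", "b.txt"]
wordFreqDicList = [{"wife": 1, "book": 1}, {"wife": 2}]
reportWord("wife", speechnames, wordFreqDicList)
reportWord("kenyon", speechnames, wordFreqDicList)
```

Returning `found` is optional; it simply makes the function easy to check with an `assert`.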
Step 5 – Put it all together.
Now we just need to assemble the final solution.
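A compact sketch of the assembled program, following the numbered steps in the skeleton's `main()`. It assumes speeches.txt and the speech files sit in the program's folder as UTF-8 text, adds a hypothetical `reportWord` helper for the query, and passes no stop words, since the lab does not specify a list. If speeches.txt is missing it prints a message instead of crashing.

```python
import string
import sys

def getSpeechList(listfile="speeches.txt"):
    # One speech filename per non-blank line of the list file
    with open(listfile, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def ReadSpeech(filename):
    # Return the speech as a list of lines
    with open(filename, encoding="utf-8") as f:
        return f.readlines()

def removePunctuation(s):
    # Delete every ASCII punctuation character
    for c in string.punctuation:
        s = s.replace(c, "")
    return s

def countWord(d, w):
    # Add one to w's count, inserting it with 1 if it is new
    if w in d:
        d[w] += 1
    else:
        d[w] = 1

def createFreqDict(text, stopWords=frozenset()):
    # Build {word: count} for one speech, skipping any stop words
    freqDict = {}
    for line in text:
        for word in removePunctuation(line).lower().split():
            if word not in stopWords:
                countWord(freqDict, word)
    return freqDict

def reportWord(word, speechnames, wordFreqList):
    # Print the word's count in every speech that contains it
    found = False
    for name, dic in zip(speechnames, wordFreqList):
        if word in dic:
            print(f"'{word}' occurs, {dic[word]}, times in {name}")
            found = True
    if not found:
        print(f"Sorry, {word} is not in any speeches")

def main():
    # 1. Get the list of speech names
    try:
        speechnames = getSpeechList()
    except FileNotFoundError:
        print("Could not open speeches.txt")
        return
    # 2-5. Read each speech and keep its word frequency dictionary
    wordFreqList = []
    for speechname in speechnames:
        wordFreqList.append(createFreqDict(ReadSpeech(speechname)))
        print(".", end="")
        sys.stdout.flush()
    print()
    done = False
    while not done:
        # 6-7. Get the word and check each speech for it
        word = input("Enter word to search for in speeches: ").lower()
        reportWord(word, speechnames, wordFreqList)
        # 8. Do we continue?
        if input("Hit 'e' to stop...") == 'e':
            done = True

if __name__ == "__main__":
    main()
```

Keeping `speechnames` and `wordFreqList` as two lists in the same order is the simplest design for this lab; a list of `(name, dictionary)` pairs would work just as well.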
Turn in
Submit the project on onlineGDB and turn in the project link to Moodle. Be sure to include the standard information and the honor statement at the top of your program file. Any submission that is missing the Academic Integrity Statement will not be graded.
Grading
| Requirements | Grading Comments | Points | Score |
|---|---|---|---|
| Fully functioning code that works exactly like the example shown above | | 50 | |
| Completion of all functions in the start code | | 40 | |
| Fully commented code, with name, date, description, honor statement, and comments on all functions | | 10 | |
| Total | | 100 | |
