Emily Dickinson Activity 2

In-Class Group Activity: Word Counting and Indexing

Objective

Develop your programming skills by creating functions that count word occurrences and index words within a text file using dictionaries in Python.

Activity Steps


Step 1: Creating a Word Count Dictionary

Task:
Implement a function create_count_dic(filename) that reads a text file and returns a dictionary mapping each word to its frequency.

Example Output:

{'abide': 1, 'afternoon': 3, 'are': 15}

Guidelines:

  • Open and read the specified file.
  • Process the text to extract words.
  • Populate the dictionary with word counts.
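
The guidelines above can be sketched as follows. This is one possible approach, not the required solution. Lowercasing the words and stripping ASCII punctuation from their edges are assumptions that the task does not spell out; adjust them to match what your group decides a "word" is.

```python
import string


def create_count_dic(filename):
    """Return a dictionary mapping each word in the file to its frequency."""
    counts = {}
    with open(filename, encoding="utf-8") as infile:
        for line in infile:
            # split() with no arguments breaks the line on any whitespace.
            for raw in line.split():
                # Assumption: "Tree," and "tree" count as the same word.
                word = raw.strip(string.punctuation).lower()
                if word:  # skip tokens that were only punctuation, such as "--"
                    counts[word] = counts.get(word, 0) + 1
    return counts
```

Calling create_count_dic with the name of your poem file returns a dictionary of the kind shown in the example output. Note that string.punctuation covers ASCII punctuation only, so typographic dashes or curly quotes in the text would need extra handling.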

Step 2: Counting Unique Words

Task:
Use the dictionary from Step 1 to determine the total number of unique words in the text.

Expected Outcome:
Display the count of distinct words.

Guidelines:

  • Call len() on the dictionary. Each key is a distinct word, so len(aDic) is the number of unique words; len(aDic.keys()) gives the same result.
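
A minimal sketch of this step, using a small hand-made dictionary in place of the real output of Step 1. The helper name count_unique_words and the sample data are illustrative, not part of the assignment.

```python
def count_unique_words(aDic):
    """Each key is a distinct word, so the dictionary's length is the answer."""
    return len(aDic)


# Hypothetical sample standing in for the dictionary built in Step 1.
sample_counts = {'abide': 1, 'afternoon': 3, 'are': 15}
print("Unique words:", count_unique_words(sample_counts))  # prints: Unique words: 3
```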

Step 3: Sorting and Displaying Word Counts Alphabetically

Task:
Modify your program to print an alphabetically sorted list of words along with their counts.

Example Output:

abide: 1
afternoon: 3
are: 15

Guidelines:

  • Convert dictionary keys to a list:
    aList = list(aDic.keys())
  • Sort the list alphabetically using sorted():
    sortedList = sorted(aList)
  • Iterate through the sorted list and print each word with its corresponding count.
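
One way to put these guidelines together is sketched below. The function name format_sorted_counts is an assumption for illustration. It follows the two steps above, although sorted(aDic) alone would also work because iterating over a dictionary yields its keys. Keep in mind that sorted() places uppercase letters before lowercase ones, so the order is only truly alphabetical if the words were lowercased first.

```python
def format_sorted_counts(aDic):
    """Return 'word: count' strings in alphabetical order of the words."""
    aList = list(aDic.keys())      # dictionary keys as a list
    sortedList = sorted(aList)     # new list, sorted alphabetically
    return [f"{word}: {aDic[word]}" for word in sortedList]


# Hypothetical sample standing in for the dictionary built in Step 1.
aDic = {'are': 15, 'abide': 1, 'afternoon': 3}
for entry in format_sorted_counts(aDic):
    print(entry)
```

With the sample dictionary, the loop prints the three lines shown in the example output.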

Step 4: Building a Word Index with Line Numbers

Task:
Create a dictionary that maps each word to a list of line numbers where the word appears, effectively creating an index similar to that found in books.

Example Output:

wordIndex['tree'] = [3, 6, 8, 12, 16]

Interpretation:
The word “tree” appears on lines 3, 6, 8, 12, and 16.

Guidelines:

  • Read the text file line by line.
  • For each word in a line, add the line number to the corresponding list in the dictionary.
  • Ensure that each word maps to a list of unique line numbers: if a word appears several times on the same line, record that line number only once.
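
A possible sketch of the index builder follows. The function name create_word_index is an assumption, as are numbering the lines from 1 and normalizing words the same way as in the Step 1 sketch (lowercase, ASCII punctuation stripped).

```python
import string


def create_word_index(filename):
    """Return a dictionary mapping each word to the line numbers it appears on."""
    wordIndex = {}
    with open(filename, encoding="utf-8") as infile:
        # Assumption: the first line of the file is line 1.
        for line_number, line in enumerate(infile, start=1):
            for raw in line.split():
                word = raw.strip(string.punctuation).lower()
                if not word:
                    continue
                numbers = wordIndex.setdefault(word, [])
                # Lines are read in order, so a repeat on the same line
                # can only match the last number already stored.
                if not numbers or numbers[-1] != line_number:
                    numbers.append(line_number)
    return wordIndex
```

Because the file is read from top to bottom, each list comes out sorted without any extra work.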

Step 5: Querying Line Numbers for a Specific Word

Task:
Allow users to input a word and retrieve the list of line numbers where that word appears.

Guidelines:

  • Prompt the user to enter a word.
  • Look up the word in the wordIndex dictionary.
  • Display the list of line numbers or inform the user if the word is not found.
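
The lookup can be sketched as below. The names describe_word and query_word are illustrative, and lowercasing the user's input assumes the index keys were stored in lowercase. The prompt is wrapped in a function so that nothing waits for input until you call it.

```python
def describe_word(wordIndex, word):
    """Return a message listing the word's line numbers, or a not-found note."""
    key = word.strip().lower()  # assumption: index keys are lowercase
    if key in wordIndex:
        return f"'{key}' appears on lines: {wordIndex[key]}"
    return f"'{key}' was not found in the text."


def query_word(wordIndex):
    """Prompt the user for a word and print the lookup result."""
    word = input("Enter a word: ")
    print(describe_word(wordIndex, word))
```

To try it interactively, build wordIndex as in Step 4 and then call query_word(wordIndex).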

Step 6: Displaying Lines Containing a Specific Word

Task:
Enhance the program to display the actual lines from the text that contain the user-specified word.

Guidelines:

  • After retrieving the list of line numbers from Step 5, read the text file again.
  • Extract and display the lines corresponding to those line numbers.
  • Format the output for readability, indicating which line number corresponds to each line of text.
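
A minimal sketch of this step is shown below. The function name lines_containing is an assumption, and line numbers are taken to start at 1 so that they match an index built with the same convention.

```python
def lines_containing(filename, line_numbers):
    """Return 'number: text' strings for the requested lines of the file."""
    wanted = set(line_numbers)  # a set makes each membership test fast
    matches = []
    with open(filename, encoding="utf-8") as infile:
        for line_number, line in enumerate(infile, start=1):
            if line_number in wanted:
                # Right-align the number in 4 columns; drop the trailing newline.
                matches.append(f"{line_number:>4}: {line.rstrip()}")
    return matches
```

Printing each returned string on its own line gives a readable listing in which every line of text is labeled with its line number.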

Summary

By completing this activity, you’ll gain hands-on experience with:

  • Reading and processing text files.
  • Utilizing dictionaries to count word frequencies and index word occurrences.
  • Sorting and displaying data in an organized manner.
  • Implementing user interactions to query and display specific information.

Next Steps:

  • Experiment with additional features, such as ignoring case (treating "Tree" and "tree" as the same word) or excluding common stop words.
  • Optimize the functions for larger text files.
  • Explore more advanced text processing techniques using Python's built-in re module for regular expressions.
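
As a starting point for these extensions, here is a small sketch that combines case-insensitive matching, a stop-word filter, and re. The function name extract_words, the stop-word set, and the pattern are all illustrative assumptions; a real stop-word list would be much longer.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def extract_words(text, stop_words=STOP_WORDS):
    """Return the lowercase words in text, leaving out any stop words."""
    # Lowercasing first makes the match case-insensitive; the pattern keeps
    # runs of letters and apostrophes, so "bird's" stays in one piece.
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in stop_words]


print(extract_words("The Tree is tall -- and THE bird's nest."))
```

The pattern only recognizes the letters a to z, so accented characters would split a word; widen it if your text needs that.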