In-Class Group Activity: Word Counting and Indexing
Objective
Develop your programming skills by creating functions that count word occurrences and index words within a text file using dictionaries in Python.
Resources
- Word Count Code Example: access it as a Codeboard project or as an onlineGDB project.
Activity Steps
Step 1: Creating a Word Count Dictionary
Start code: link
Task:
Implement a function create_count_dic(filename) that reads a text file and returns a dictionary mapping each word to its frequency.
Example Output:
{'abide': 1, 'afternoon': 3, 'are': 15}
Guidelines:
- Open and read the specified file.
- Process the text to extract words.
- Populate the dictionary with word counts.
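One possible sketch of create_count_dic follows. It assumes a "word" is any whitespace-separated token with leading and trailing punctuation stripped, and it keeps case unchanged, because ignoring case is listed under Next Steps. Reading the file as UTF-8 is also an assumption about your input.

```python
import string

def create_count_dic(filename):
    """Return a dict mapping each word in filename to its number of occurrences.

    Assumption: a word is a whitespace-separated token with leading and
    trailing punctuation removed; case is preserved.
    """
    count_dic = {}
    with open(filename, encoding="utf-8") as f:
        for line in f:
            for token in line.split():
                word = token.strip(string.punctuation)
                if word:  # skip tokens that were only punctuation, e.g. "--"
                    # .get(word, 0) supplies 0 the first time a word is seen
                    count_dic[word] = count_dic.get(word, 0) + 1
    return count_dic
```

For example, counts = create_count_dic("sample.txt") would build the dictionary for a file named sample.txt (a hypothetical file name). Using dict.get with a default avoids a KeyError on a word's first occurrence.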
Step 2: Counting Words
Task:
Utilize the dictionary from Step 1 to determine the total number of unique words in the text, as well as the total number of words in the file.
Expected Outcome:
Display the count of distinct words, as well as the total number of words.
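A minimal sketch, assuming the dictionary has the word-to-count shape from Step 1: the number of distinct words is the number of keys, and the total number of words is the sum of all the counts. The helper name word_totals is hypothetical.

```python
def word_totals(count_dic):
    """Return (distinct_words, total_words) for a word -> count dictionary."""
    distinct = len(count_dic)          # one key per distinct word
    total = sum(count_dic.values())    # every occurrence is included in some count
    return distinct, total

counts = {'abide': 1, 'afternoon': 3, 'are': 15}  # the example output from Step 1
distinct, total = word_totals(counts)
print("Distinct words:", distinct)  # Distinct words: 3
print("Total words:", total)        # Total words: 19
```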
Step 3: Sorting and Displaying Word Counts Alphabetically
Task:
Modify your program to print an alphabetically sorted list of words along with their counts.
Example Output:
abide: 1
afternoon: 3
are: 15
Guidelines:
- Convert the dictionary keys to a list: aList = list(aDic.keys())
- Sort the list alphabetically using sorted(): sortedList = sorted(aList)
- Iterate through the sorted list and print each word with its corresponding count.
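The guideline steps above can be put together roughly as follows. The function name print_sorted_counts is hypothetical; aDic, aList, and sortedList are the names used in the guidelines.

```python
def print_sorted_counts(aDic):
    """Print each word and its count, in alphabetical order of the words."""
    aList = list(aDic.keys())    # convert the keys to a list
    sortedList = sorted(aList)   # sorted() returns a new, ordered list
    for word in sortedList:
        print(f"{word}: {aDic[word]}")

print_sorted_counts({'are': 15, 'abide': 1, 'afternoon': 3})
# abide: 1
# afternoon: 3
# are: 15
```

Note that sorted() compares strings by Unicode code point, so capitalized words such as "Zebra" sort before lowercase ones such as "apple". If that is not what you want, sorted(aList, key=str.lower) orders the words case-insensitively.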
Step 4: Building a Word Index with Line Numbers
Task:
Create a dictionary that maps each word to a list of line numbers where the word appears, effectively creating an index similar to that found in books.
Example Output:
wordIndex['tree'] = [3, 6, 8, 12, 16]
Interpretation:
The word “tree” appears on lines 3, 6, 8, 12, and 16.
Guidelines:
- Read the text file line by line.
- For each word in a line, add the line number to the corresponding list in the dictionary.
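- Ensure that each word maps to a list of unique line numbers: a word that appears more than once on the same line should have that line number recorded only once.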
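A sketch of the index-building step, assuming line numbers start at 1 (as in a book) and that words are whitespace-separated tokens with surrounding punctuation stripped and case preserved. The function name create_word_index is hypothetical.

```python
import string

def create_word_index(filename):
    """Return a dict mapping each word to an ascending list of the line
    numbers (starting at 1) on which it appears."""
    wordIndex = {}
    with open(filename, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):  # number lines from 1
            for token in line.split():
                word = token.strip(string.punctuation)
                if not word:
                    continue
                # setdefault creates an empty list the first time a word is seen
                lines = wordIndex.setdefault(word, [])
                # Lines are read in increasing order, so comparing with the last
                # entry is enough to keep each line number from being added twice.
                if not lines or lines[-1] != line_no:
                    lines.append(line_no)
    return wordIndex
```

Because the file is read top to bottom, each list comes out already sorted. An alternative is to collect line numbers in a set per word and convert with sorted() at the end.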
Step 5: Querying Line Numbers for a Specific Word
Task:
Allow users to input a word and retrieve the list of line numbers where that word appears.
Guidelines:
- Prompt the user to enter a word.
- Search for the word in the wordIndex dictionary.
- Display the list of line numbers, or inform the user if the word is not found.
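A minimal sketch of the query step. The names report_lines and ask_and_report are hypothetical; the message is built by a separate function so the lookup logic can be tried without typing input.

```python
def report_lines(wordIndex, word):
    """Build the message shown to the user for a queried word."""
    lines = wordIndex.get(word)  # .get() returns None instead of raising KeyError
    if lines is None:
        return f"'{word}' was not found in the text."
    return f"'{word}' appears on lines: {lines}"

def ask_and_report(wordIndex):
    """Prompt the user for a word and print where it appears."""
    word = input("Enter a word: ").strip()
    print(report_lines(wordIndex, word))

sample_index = {'tree': [3, 6, 8, 12, 16]}  # the example from Step 4
print(report_lines(sample_index, 'tree'))   # 'tree' appears on lines: [3, 6, 8, 12, 16]
print(report_lines(sample_index, 'river'))  # 'river' was not found in the text.
```

In the full program you would call ask_and_report(wordIndex) after building the index, possibly inside a loop so the user can look up several words.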
Step 6: Displaying Lines Containing a Specific Word
Task:
Enhance the program to display the actual lines from the text that contain the user-specified word.
Guidelines:
- After retrieving the list of line numbers from Step 5, read the text file again.
- Extract and display the lines corresponding to those line numbers.
- Format the output for readability, indicating which line number corresponds to each line of text.
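One way to sketch this step, assuming line numbers start at 1. The function name show_lines is hypothetical. It reads the file once and prints each requested line with its number right-aligned.

```python
def show_lines(filename, line_numbers):
    """Print the text of each requested line, prefixed with its line number.

    Assumption: line_numbers count from 1.
    """
    wanted = set(line_numbers)  # set membership checks are fast
    with open(filename, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if line_no in wanted:
                # right-align the number in 4 columns; drop the trailing newline
                print(f"{line_no:>4}: {line.rstrip()}")
```

For example, show_lines("sample.txt", wordIndex["tree"]) would print only the lines containing "tree" (sample.txt is a hypothetical file name). Scanning the file once, instead of loading every line with readlines(), also helps with larger files.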
Summary
By completing this activity, you’ll gain hands-on experience with:
- Reading and processing text files.
- Utilizing dictionaries to count word frequencies and index word occurrences.
- Sorting and displaying data in an organized manner.
- Implementing user interactions to query and display specific information.
Next Steps:
- Experiment with additional features, such as ignoring case or excluding common stop words.
- Optimize the functions for larger text files.
- Explore more advanced text processing techniques using libraries such as Python's built-in re module for regular expressions.
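As a starting point for these extensions, here is a sketch of a word extractor that uses the re module, optionally lowercases the text, and drops stop words. The stop-word set is a small illustrative sample, and treating apostrophe contractions such as "don't" as single words is an assumption; extract_words, STOP_WORDS, and WORD_RE are hypothetical names.

```python
import re

# Small illustrative stop-word set (assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}

# \w+ matches runs of letters, digits, and underscores; the optional group
# keeps contractions such as "don't" together (an assumption about what a word is).
WORD_RE = re.compile(r"\w+(?:'\w+)?")

def extract_words(line, ignore_case=True, stop_words=STOP_WORDS):
    """Return the words in line, optionally lowercased, without stop words."""
    words = WORD_RE.findall(line.lower() if ignore_case else line)
    # Compare in lowercase so "The" is treated as a stop word either way.
    return [w for w in words if w.lower() not in stop_words]

print(extract_words("The Tree and the river: don't stop!"))
# ['tree', 'river', "don't", 'stop']
```

Replacing line.split() plus strip() with a function like this in your counting and indexing code would apply both extensions at once.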
