Python word count does not work

Textfile1 contains words, some of which are repeated:

Train

21street

Train

And many more.

I need to count how many times each word appears and write the words to Textfile2 without duplicates, each followed by its count. The output should also be in alphabetical order, which is why I sort the list. Example of what the final Textfile2 should look like:

Train 2

21street 1

… and so on.

Here is my attempt:

file1=open(textfile1,"r")
list1=[]

for line in file1:
    list1.append(line)

import collections

counter=collections.Counter(list1) #not sure how I can use this in my program

list2=list(set(list1))

list3=sorted(list2)

file2=open(textfile2,"w")

for i in list3:
    file2.write(i+count((i)in list1))

The word count doesn’t work, and I don’t know how to fix it. Thanks for your help.

Solution

Let’s make some changes step by step, starting with the error you get first.

file2.write(i+count((i)in list1))
#             ^^^^^^^^^^^^^^^^^^ 
# NameError: name 'count' is not defined

The problem is that the way you look up the count is incorrect: there is no function named count. A Counter works like a dict: the key is the thing being counted and the value is its count (an int). You named your Counter counter, so to get the count for line i, change the call to this, which produces a different error:

file2.write(i+counter[i])
#             ^^^^^^^^^^ 
# TypeError: must be str, not int

Even though we now get the count, we can’t append it to line i like this. The line and the count are two different types: one is text (str) and the other is a number (int). We need to convert the number to its text representation. If this confuses you, think of it this way: 2 + 2 == 4 and "2" + "2" == "22". Here’s how:

file2.write(i+str(counter[i]))
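
If you want to try the lookup and the conversion on their own first, here is a minimal sketch. The words list is made-up sample data standing in for the lines of Textfile1, and the f-string at the end is just an alternative to str() that I am adding for comparison, not something your code needs:

```python
import collections

# Hypothetical sample data standing in for the lines of Textfile1.
words = ["Train", "21street", "Train"]
counter = collections.Counter(words)

# Indexing a Counter returns an int, just like a dict lookup.
train_count = counter["Train"]

# A key that was never counted returns 0 instead of raising KeyError.
missing_count = counter["Bus"]

# str() turns the int into text so it can be joined to another str.
joined = "Train" + str(train_count)

# An f-string performs the same conversion implicitly.
formatted = f"Train{train_count}"

print(train_count, missing_count, joined, formatted)
```

Either form of the conversion works; str() stays closest to the code you already have.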

No more errors, but depending on how you test, the file opened as file2 may still be empty. Writes are buffered, and they are only guaranteed to reach the disk once the file is flushed or closed. So that you never forget to close it, you can use the with statement and leave the bookkeeping to Python: at the end of the indented block, the file is closed automatically. Below is the complete code, with the changes commented:

# imports at the top
import collections

list1=[]
with open(textfile1,"r") as file1:
    for line in file1:
        list1.append(line)
# file1 automatically closed here
counter=collections.Counter(list1)
list2=list(set(list1))
list3=sorted(list2)
with open(textfile2,"w") as file2:
    # i implies index which it isn't; let's call it line too
    for line in list3:
        file2.write(line+str(counter[line]))
# file2 automatically closed here

Once run, the file opened with file2 will look like this:

21street
1Train
2

The number ends up on the next line. This happens because the lines you store in the list are not just "21street" and "Train", but "21street\n" and "Train\n". The "\n" at the end is a newline character, which is used as a line separator. Any text you add after it ends up on a new line; that’s the point of it. In the list, the separator is no longer needed, so let’s remove it:

        list1.append(line.rstrip("\n"))
        #                ^^^^^^^^^^^^^
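
As a side check, separate from your program, you can look at the trailing newline directly. This is only a sketch: io.StringIO stands in for an opened text file, and its contents are made up.

```python
import io

# io.StringIO behaves like an opened text file; the contents are sample data.
fake_file = io.StringIO("Train\n21street\nTrain\n")

# Iterating over a file yields each line with its "\n" still attached.
raw_lines = [line for line in fake_file]

# rstrip("\n") removes only trailing newline characters.
clean_lines = [line.rstrip("\n") for line in raw_lines]

print(raw_lines)
print(clean_lines)
```

Printing a list shows the repr of each string, so the "\n" is visible in the first list and gone in the second.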

Now your output will be something like this:

21street1Train2

When writing to the file again, you need to add the separator back in the correct place. What is the right place? The end of each line. A space between the line and the count would also be nice:

        file2.write(line+" "+str(counter[line])+"\n")
        #               ^^^^                   ^^^^^

The final output is as desired:

21street 1
Train 2
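
Putting all of the steps together, here is one possible condensed version. Treat it as a sketch rather than the only way to do it: the function name write_word_counts and the temporary file names are my own placeholders, and the sample contents are made up. It relies on the fact that a Counter accepts any iterable and that sorted() over a Counter yields its unique keys, so the separate list, set and sorted steps are not needed.

```python
import collections
import os
import tempfile


def write_word_counts(source_path, target_path):
    """Count each line of source_path and write 'word count' lines, sorted."""
    with open(source_path, "r") as source:
        # Strip the line separator before counting.
        counter = collections.Counter(line.rstrip("\n") for line in source)
    with open(target_path, "w") as target:
        # Iterating a Counter yields each distinct key once.
        for word in sorted(counter):
            target.write(f"{word} {counter[word]}\n")


# Demo on throwaway files with made-up contents.
with tempfile.TemporaryDirectory() as folder:
    source_path = os.path.join(folder, "textfile1.txt")
    target_path = os.path.join(folder, "textfile2.txt")
    with open(source_path, "w") as handle:
        handle.write("Train\n21street\nTrain\n")
    write_word_counts(source_path, target_path)
    with open(target_path) as handle:
        result = handle.read()

print(result)
```

Note that sorted() compares strings by code point, so digits come before uppercase letters and uppercase letters come before lowercase ones. If you want a case-insensitive order, sorted(counter, key=str.casefold) is one option.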
