Python word count does not work
Textfile1 contains words, some of which are repeated:
Train
21street
Train
And many more.
I need to count how many times each word appears and write the words with their counts to Textfile2, without duplicates. The output should also be in alphabetical order, which is why I use sorted in my code. Here is an example of what the final Textfile2 should look like:
Train 2
21street 1
… and so on.
Here is my attempt:
file1=open(textfile1,"r")
list1=[]
for line in file1:
    list1.append(line)
import collections
counter=collections.Counter(list1) #not sure how I can use this in my program
list2=list(set(list1))
list3=sorted(list2)
file2=open(textfile2,"w")
for i in list3:
    file2.write(i+count((i)in list1))
The word count doesn’t seem to work, and I don’t know how to fix it. Thanks for your help.
Solution
Let’s make the changes step by step, starting with the error you are getting.
file2.write(i+count((i)in list1))
#             ^^^^^^^^^^^^^^^^^^
# NameError: name 'count' is not defined
The problem is that count is not defined anywhere, so the call fails. A Counter works like a dict: the key is the thing being counted and the value is its count (an int). You named your Counter counter, so to get the count for line i, change the call to this, which raises a different error:
file2.write(i+counter[i])
#             ^^^^^^^^^^
# TypeError: must be str, not int
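To see the dict-like lookup on its own, here is a minimal sketch using the sample words from the question. The variable names are made up for illustration. It also shows that the lookup returns an int, which is what the TypeError above is complaining about.

```python
import collections

# Sample words from the question (trailing newlines left out for clarity).
words = ["Train", "21street", "Train"]
counter = collections.Counter(words)

train_count = counter["Train"]    # lookup by key, like a dict
missing_count = counter["Bus"]    # a missing key gives 0, not KeyError

print(train_count, type(train_count).__name__)  # 2 int
print(missing_count)                            # 0
```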
Even though we now get the count, we can’t append it to line i like this. The line and the count are two different types: one is text (str) and the other is a number (int). We need to convert the number to its text representation. If this confuses you, think of it this way: 2 + 2 == 4, but "2" + "2" == "22". Here’s how:
file2.write(i+str(counter[i]))
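If you prefer, an f-string does the same conversion implicitly. This is only a sketch of the alternative, assuming Python 3.6 or newer. The names word and count are stand-ins for one line and its count, not names from your code.

```python
# Stand-in values for one line and its count.
word = "Train"
count = 2

# Concatenation needs an explicit conversion of the int to str.
joined = word + str(count)

# An f-string formats the int as text for you.
formatted = f"{word}{count}"

print(joined)     # Train2
print(formatted)  # Train2
```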
No more errors, but depending on how you tested, the file opened as file2 may still be empty. Writes are buffered, and they are only guaranteed to reach the disk once the file is closed. So that you never forget to close it, you can use the with statement and leave the bookkeeping to Python: at the end of the indented block, the file is closed automatically. Below is the complete code, with comments on some of the changes:
# imports at the top
import collections
list1=[]
with open(textfile1,"r") as file1:
    for line in file1:
        list1.append(line)
# file1 automatically closed here
counter=collections.Counter(list1)
list2=list(set(list1))
list3=sorted(list2)
with open(textfile2,"w") as file2:
    # i implies index which it isn't; let's call it line too
    for line in list3:
        file2.write(line+str(counter[line]))
# file2 automatically closed here
After running this, the file opened as file2 will look like this:
21street
1Train
2
The number ends up on the next line. This happens because the lines you store in the list are not just "21street" and "Train", but "21street\n" and "Train\n". The "\n" at the end is a newline character, which is used as a line separator. Any text you add after it ends up on a new line; that is the whole point of it. In the list, the separator is no longer needed, so let’s strip it:
list1.append(line.rstrip("\n"))
#                ^^^^^^^^^^^^^
Now your output will be something like this:
21street1Train2
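To see the stripping in isolation, here is a small sketch. The list raw_lines is made-up sample data that mimics what iterating over the file gives you. repr is used so the newline character is visible when printed.

```python
# Lines as they come out of a file: each one ends with "\n".
raw_lines = ["Train\n", "21street\n", "Train\n"]

# rstrip("\n") removes trailing newline characters only.
stripped = [line.rstrip("\n") for line in raw_lines]

print(repr(raw_lines[0]))  # 'Train\n'
print(repr(stripped[0]))   # 'Train'
```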
When writing to the file, you need to add the separator back in the right place. What is the right place? The end of each line. A space between the line and the count would be nice, too:
file2.write(line+" "+str(counter[line])+"\n")
#               ^^^^                   ^^^^^
The final output is as desired:
21street 1
Train 2
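As a possible tidy-up: iterating over a Counter already yields each distinct key once, so the separate set and list steps are optional. The following is only a sketch under that assumption. The function name write_word_counts and the temporary file names are made up for the demo and are not part of your code.

```python
import collections
import os
import tempfile


def write_word_counts(source_path, target_path):
    """Count each line of source_path and write 'word count' lines, sorted."""
    with open(source_path, "r") as source:
        counter = collections.Counter(line.rstrip("\n") for line in source)

    with open(target_path, "w") as target:
        # Iterating a Counter yields each distinct key once.
        for word in sorted(counter):
            target.write(f"{word} {counter[word]}\n")


# Demo on temporary files so the sketch is self-contained.
with tempfile.TemporaryDirectory() as folder:
    textfile1 = os.path.join(folder, "textfile1.txt")
    textfile2 = os.path.join(folder, "textfile2.txt")

    with open(textfile1, "w") as handle:
        handle.write("Train\n21street\nTrain\n")

    write_word_counts(textfile1, textfile2)

    with open(textfile2, "r") as handle:
        result = handle.read()

print(result, end="")
```

Note that sorted compares strings case-sensitively, so uppercase words sort before lowercase ones. If you want to ignore case, you could pass key=str.lower to sorted.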