Word counter for PDF files

A program that counts the words in PDF files, using word tokenization.

Note that this program does not work on scanned PDF files.
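As a rough illustration of what such a counter does, the sketch below extracts text with PyPDF2, tokenizes it with NLTK, drops stopwords and punctuation, and counts what is left. It is not the project's main.py. The function names extract_text, count_words, and count_pdf_words are hypothetical, and the sketch assumes a PyPDF2 release that provides PdfReader and that the NLTK data described under Tip is already installed.

```python
from collections import Counter


def extract_text(pdf_path):
    """Return the text of every page in the PDF, joined by newlines."""
    # Imported here so count_words() stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader  # assumption: a release that has PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() yields little or nothing for scanned, image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def count_words(tokens, stop_words=()):
    """Count alphabetic tokens, case-insensitively, skipping stop_words."""
    stop_words = set(stop_words)
    words = (token.lower() for token in tokens if token.isalpha())
    return Counter(word for word in words if word not in stop_words)


def count_pdf_words(pdf_path):
    """Tokenize the PDF's text with NLTK and count the non-stopword words."""
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    tokens = word_tokenize(extract_text(pdf_path))
    return count_words(tokens, stopwords.words("english"))
```

Calling count_pdf_words('Your_file.pdf') returns a Counter, so sum(counts.values()) gives the total number of counted words. A scanned file would most likely produce an empty or near-empty result, because there is no text layer to extract.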

How to run

To run the program, follow the steps below:

Install Python 3, pip, PyPDF2, and nltk.
Clone the Word_counter project.
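Before running the program, it can help to confirm that the two third-party packages are importable. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical and not part of the project.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# PyPDF2 and nltk are the import names of the packages listed above.
missing = missing_modules(["PyPDF2", "nltk"])
if missing:
    print("Install first:", ", ".join(missing))
else:
    print("All required packages are available.")
```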

Tip

NLTK data packages are required.

To install them on your system, run the following code:

import nltk
nltk.download('stopwords')
nltk.download('punkt')
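To avoid downloading the data on every run, the download can be made conditional. The sketch below relies on nltk.data.find raising LookupError when a resource is missing. The helper name ensure_nltk_data and its nltk_module parameter are hypothetical, added only so the function can be exercised with a stand-in object. Some newer NLTK releases look for a resource named 'punkt_tab' instead of 'punkt', so the mapping may need adjusting for your version.

```python
def ensure_nltk_data(resources, nltk_module=None):
    """Download each NLTK resource that is not installed yet.

    `resources` maps a lookup path (e.g. "corpora/stopwords") to the
    package name passed to nltk.download(). Returns the names downloaded.
    """
    if nltk_module is None:
        import nltk as nltk_module  # the real library unless a stand-in is given

    downloaded = []
    for path, name in resources.items():
        try:
            nltk_module.data.find(path)  # raises LookupError when missing
        except LookupError:
            nltk_module.download(name)
            downloaded.append(name)
    return downloaded
```

For the two packages above, the call would be ensure_nltk_data({"corpora/stopwords": "stopwords", "tokenizers/punkt": "punkt"}).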

Parameters

To choose the input file, set the filename variable:

filename = 'Your_file.pdf'

To change how many words are output, set the count_word variable:

count_word = 30
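As an illustration of how a limit like count_word typically shapes the output, the sketch below keeps only the most frequent words. It assumes the program reports the top count_word entries by frequency, which this README does not state explicitly. The helper top_words and the sample token list are hypothetical.

```python
from collections import Counter

# The two parameters described above (values are the README's examples).
filename = 'Your_file.pdf'
count_word = 30


def top_words(tokens, limit):
    """Return the `limit` most frequent tokens as (word, count) pairs."""
    return Counter(tokens).most_common(limit)


# Hypothetical token list standing in for the words read from `filename`.
sample_tokens = ["pdf", "word", "pdf", "count", "word", "pdf"]
for word, count in top_words(sample_tokens, count_word):
    print(word, count)
```

When the document has fewer distinct words than count_word, most_common simply returns all of them.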

TODO List

Create a CSV file
Create a word cloud
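One possible shape for the CSV item is sketched below with the standard csv module. It is not the project's implementation; the function name write_counts_csv and the column names word and count are assumptions, and the counts are made up for the demonstration.

```python
import csv
import io


def write_counts_csv(word_counts, out_file):
    """Write (word, count) pairs to an open text file as CSV rows."""
    writer = csv.writer(out_file)
    writer.writerow(["word", "count"])  # header row; column names are assumed
    writer.writerows(word_counts)


# Demonstration with an in-memory file and made-up counts.
buffer = io.StringIO()
write_counts_csv([("pdf", 3), ("word", 2)], buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open('out.csv', 'w', newline='', encoding='utf-8'); the newline='' argument is what the csv module's documentation recommends to avoid blank lines between rows on some platforms.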
