Natural-Language-Processing

This repository applies natural language processing (NLP) using a variety of packages such as keras, textblob, and gensim. It demonstrates NLP concepts using regular expressions and NLTK, and applies LDA and LSA to text summarization.

Latent Dirichlet Allocation (LDA) is used to assign the text in a document to particular topics. It builds a topics-per-document model and a words-per-topic model, both modeled with Dirichlet priors.

  • Each document is modeled as a multinomial distribution of topics and each topic is modeled as a multinomial distribution of words.
  • LDA assumes that every chunk of text we feed into it contains words that are somehow related, so choosing the right corpus is crucial.
  • It also assumes documents are produced from a mixture of topics. Those topics then generate words based on their probability distribution.
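The generative assumptions above can be sketched with the standard library alone. This is an illustrative sketch, not the repository's code: the vocabulary, the number of topics, the prior value 0.5, and the helper names `sample_dirichlet` and `generate_document` are all hypothetical choices made for the example.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word_dists, vocab, doc_alpha, length, rng):
    """Generate one document following LDA's generative story."""
    # Per-document topic mixture, drawn from a Dirichlet prior.
    theta = sample_dirichlet(doc_alpha, rng)
    words = []
    for _ in range(length):
        # Pick a topic for this word position from the document's mixture...
        topic = rng.choices(range(len(theta)), weights=theta)[0]
        # ...then pick a word from that topic's word distribution.
        words.append(rng.choices(vocab, weights=topic_word_dists[topic])[0])
    return theta, words

rng = random.Random(0)  # fixed seed so the run is repeatable
vocab = ["goal", "match", "team", "vote", "party", "election"]  # toy vocabulary
num_topics = 2  # assumed for illustration

# Words-per-topic distributions, each drawn from a Dirichlet prior.
topic_word_dists = [sample_dirichlet([0.5] * len(vocab), rng) for _ in range(num_topics)]

theta, words = generate_document(topic_word_dists, vocab, [0.5] * num_topics, 8, rng)
print("topic mixture:", theta)
print("document:", " ".join(words))
```

Fitting LDA runs this story in reverse: given only the observed words, it infers the topic mixtures and the topic-word distributions that most plausibly produced them.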

Latent Semantic Analysis (LSA) creates structured data from a collection of unstructured texts. The intuition behind it is that when we write text, the words are not chosen at random from a vocabulary.
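As a rough illustration of turning unstructured text into structured data, the sketch below builds a term-document count matrix with a regular expression and `collections.Counter`. The two sample sentences and the helper names `tokenize` and `term_document_matrix` are made up for this example; a real pipeline would likely also remove stop words and apply a weighting such as TF-IDF.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts vocab[i] in docs[j]."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix

docs = ["The cat sat on the mat.", "The dog chased the cat."]  # toy corpus
vocab, matrix = term_document_matrix(docs)
for term, row in zip(vocab, matrix):
    print(f"{term:>7} {row}")
```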

Word frequencies across documents play a key role in extracting the latent topics. LSA extracts these latent dimensions using Singular Value Decomposition (SVD), a matrix factorization technique from linear algebra.
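A minimal sketch of the SVD step, assuming NumPy is available. The count matrix, the term labels, and the choice of `k = 2` latent dimensions are invented for illustration and are not taken from this repository.

```python
import numpy as np

# Toy term-document count matrix: rows are terms, columns are documents.
terms = ["ball", "goal", "team", "vote", "party"]
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

# Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # number of latent dimensions to keep (an assumption for this example)
term_vectors = U[:, :k] * s[:k]      # each row places a term in the latent space
doc_vectors = Vt[:k, :].T * s[:k]    # each row places a document in the latent space

# Rank-k approximation of the original matrix.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print("singular values:", np.round(s, 3))
for term, vec in zip(terms, term_vectors):
    print(f"{term:>6} {np.round(vec, 3)}")
```

Terms and documents that tend to co-occur end up close together in the reduced space, which is what lets LSA surface latent topics from raw word frequencies.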