KEY PHRASE EXTRACTION (PYTHON)

Challenge

Scanning large amounts of text to find the most relevant feedback is one of the greatest challenges in marketing. We had an enormous amount of unstructured text and needed a way to quickly identify the main points in a collection of feedback. Together with sentiment analysis, key phrase extraction helps focus on the top items that need immediate action. For example: what are the top 10 key phrases linked to positive sentiment? Which top 10 key phrases linked to negative sentiment should be prioritized to improve the experience? While it is not realistic to fix every problem at once, we can focus on the 20% of problems whose fixes improve 80% of the experience, following the 80/20 Pareto principle!

Idea and Solution

Keyword extraction uses natural language processing (NLP), a branch of artificial intelligence (AI), to break down human language so that it can be understood and analyzed by machines. Using Python and Microsoft Power BI, we implemented an algorithm called RAKE (Rapid Automatic Keyword Extraction) to solve this problem. RAKE is a keyword extraction method that uses a list of stopwords and phrase delimiters to detect the most relevant words or phrases in a piece of text. We then used the Python + Power BI combination to visualize the key phrases in word clouds and tables.
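
To make the idea concrete, here is a minimal sketch of RAKE-style scoring in plain Python. It is not the code from the project: the tiny stopword list, the delimiter set, and the function names (`candidate_phrases`, `rake`) are assumptions chosen for illustration. Text is split into candidate phrases at punctuation and stopwords, each word is scored as degree divided by frequency, and a phrase's score is the sum of its word scores.

```python
import re
from collections import defaultdict

# Small illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {
    "a", "an", "and", "are", "at", "but", "for", "in", "is", "it",
    "of", "on", "the", "to", "was", "were", "with", "very",
}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Punctuation acts as a phrase delimiter.
    for fragment in re.split(r'[.,;:!?()\[\]"\n]+', text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9'-]+", fragment):
            if word in stopwords:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake(text, stopwords=STOPWORDS):
    """Return (phrase, score) pairs sorted by descending score."""
    phrases = candidate_phrases(text, stopwords)
    freq = defaultdict(int)    # how often each word occurs
    degree = defaultdict(int)  # summed length of the phrases it occurs in
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {
        " ".join(phrase): sum(word_score[w] for w in phrase)
        for phrase in set(phrases)
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    feedback = "The keynote speaker was great, but the registration process was very slow."
    for phrase, score in rake(feedback):
        print(f"{score:4.1f}  {phrase}")
```

Because a word's degree grows with the length of the phrases it appears in, multi-word phrases such as "registration process" tend to rank above isolated words, which is usually what you want for feedback summaries.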

Word Cloud with correlation of negative sentiments

 

Word Cloud with correlation of positive sentiments

GitHub: Key Phrase Extraction and Visualization

 

Results

Keyword extraction has helped us automatically index data, summarize text, and generate tag clouds with the most representative keywords. We now have real-time customer experience insights right after a marketing event. This project has not only helped improve customer experience and sentiment but has also saved a huge amount of time previously spent reviewing large amounts of text, which allows us to analyze as much data as we want. On top of that, checking the trends of positive, neutral, and negative comments over time has become a piece of cake!

Follow me on LinkedIn, Medium, and GitHub for more stuff like this.