Key Phrase Extraction and Visualization: Python and Microsoft Power BI

By jayant kodwani on March 10, 2021

Discover insights in unstructured text

Implementing the RAKE algorithm in Python and integrating it with Power BI

Key-Phrase Extraction, Photo by Rabie Madaci on Unsplash

We live in an age where data is the new currency! It is a big part of why the tech giants are among the richest companies in the world, and investing in data could be the best investment of the next few decades. So what do these companies do with all this data? How can anyone make sense of large volumes of unstructured text from Facebook posts, Twitter, or LinkedIn? To a layman, sampling or scanning might sound like a good idea, but data scientists know the risks of sampling and the pain of scanning text row by row and word by word 😬. This is where data experts turn to key-phrase extraction.

Key-phrase extraction is the task of evaluating unstructured text and returning a list of its key phrases. For example, given the input text “The food was delicious and there were wonderful staff”, a key-phrase extraction service returns the main talking points: “food” and “wonderful staff”.

What will we Discuss?

In this story, we will extract key phrases from a sample dataset using the RAKE algorithm in Python and then visualize them in Microsoft Power BI.

Here is the link to the sample data that we will use: Sample Data

What is RAKE?

RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm. It determines the key phrases in a body of text by analyzing how frequently each word appears and how often it co-occurs with other words in the text.
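To make the frequency and co-occurrence idea concrete, here is a minimal pure-Python sketch of the standard RAKE scoring, not the python-rake package's own code. Candidate phrases are the runs of words between stop words and punctuation; each word is scored as its degree (how many words it co-occurs with inside candidate phrases, counting itself) divided by its frequency; a phrase scores the sum of its word scores. The tiny STOPWORDS set and the punctuation regex are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumption: a tiny illustrative stopword set; real lists hold hundreds of words.
STOPWORDS = {"a", "an", "and", "the", "was", "were", "there", "is", "of", "to", "in"}

def candidate_phrases(text, stopwords):
    """Split text into candidate phrases: maximal runs of non-stopwords,
    broken at stopwords and at punctuation."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9']+", fragment):
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases

def rake_scores(text, stopwords=STOPWORDS):
    """Return (phrase, score) pairs, best first, using the deg(w) / freq(w) word score."""
    phrases = candidate_phrases(text, stopwords)
    freq = defaultdict(int)    # how many times each word appears in candidates
    degree = defaultdict(int)  # co-occurrence count within candidates, including the word itself
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    phrase_score = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(phrase_score.items(), key=lambda kv: kv[1], reverse=True)

print(rake_scores("The food was delicious and there were wonderful staff"))
# [('wonderful staff', 4.0), ('food', 1.0), ('delicious', 1.0)]
```

Multi-word phrases score higher because their words have a larger degree, which is why “wonderful staff” comes out on top here.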

Resources Required

Python instance (e.g., Spyder)
Microsoft Power BI Desktop (Pro License)
(OPTIONAL) Microsoft Azure subscription (free trial or paid) to add sentiment analysis, so key phrases can be correlated with sentiment.

Are you ready? Here we go 🏄

Step 1: Install the RAKE package and save a stopword list

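1.1 Installation: Open a Python instance (e.g., Spyder 🐍) and run the command below to install the python-rake package. The leading ! runs pip from Spyder's IPython console; omit it if you run the command in a regular terminal. Install the package into the same Python environment that Power BI Desktop is configured to use (File > Options and settings > Options > Python scripting); otherwise the script in Step 2 will fail to import RAKE.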

!pip install python-rake==1.4.4

Installing the RAKE package in a Spyder Python instance

1.2 Create a stopword list: Stop words are common words (such as “the”, “and”, and “was”) that generally do not help in text analysis, so information retrieval systems and text analyses typically drop them as meaningless. Words that carry meaning related to the text are described as content-bearing and are called content words. In RAKE, stop words also act as phrase delimiters: the candidate key phrases are the runs of words between them, so the list you choose directly shapes the output. You can download the stopwords list here and customize it to your requirements. Save it to a location of your choice and copy the path; you will need it to configure the Python script.
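Customizing the list is easier with a small helper. The sketch below assumes a plain-text file with one word per line, the layout of common RAKE stoplists; whether python-rake treats `#` lines as comments is not confirmed here, so the helper skips them on read and writes back a comment-free file. The function names, the temporary path, and the added or removed words are all hypothetical.

```python
import tempfile
from pathlib import Path

def load_stopwords(path):
    """Read a one-word-per-line stopword file, ignoring blank lines and '#' comments."""
    words = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.add(word)
    return words

def save_stopwords(path, words):
    """Write the words back out, one per line, sorted for easy review."""
    Path(path).write_text("\n".join(sorted(words)) + "\n", encoding="utf-8")

# Usage: customize a copy of the list in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stopwords.txt"
    path.write_text("# English stop words\nthe\nand\nwas\n\n", encoding="utf-8")
    words = load_stopwords(path)
    words |= {"course", "session"}   # hypothetical filler words for a feedback survey
    words.discard("was")             # hypothetical removal
    save_stopwords(path, words)
    customized = load_stopwords(path)

print(sorted(customized))  # ['and', 'course', 'session', 'the']
```

Adding domain filler words splits phrases at those words, so they stop showing up inside key phrases.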

Step 2: Open Power BI, Import Data & Configure Python script

2.1 Power BI data import: Open a new instance of Power BI Desktop >> Import data from Excel (Sample Data) >> Browse to the sample data file >> Import data >> In Power Query Editor, select “Run Python script” (under the Transform tab)

2.2 Prepare your Python script: You can use the Python script below and customize it by replacing the stopword-list path assigned to stop_dir.

You can also specify or restrict the number of key phrases extracted from each text by changing the slice on the line that calls rake_object.run(): for example, replace [-1:] with [-5:] to get up to 5 key phrases from each text input.
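The [-1:] slice works because Sort_Tuple sorts the (phrase, score) pairs in ascending order of score, so the best-scoring phrases sit at the end of the list and [-n:] keeps the top n. The sketch below reproduces that selection on made-up pairs and shows a best-first alternative with heapq.nlargest; keep_top and keep_top_heap are hypothetical helper names.

```python
import heapq

# Illustrative (phrase, score) pairs; in the script these come from rake_object.run()
pairs = [("wonderful staff", 4.0), ("food", 1.0), ("delicious", 1.0), ("friendly service", 4.5)]

def keep_top(pairs, n):
    """Mimic the script: sort ascending by score, then slice off the last n (the best)."""
    if n <= 0:
        return []  # guard: [-0:] would return the WHOLE list, not an empty one
    return sorted(pairs, key=lambda p: p[1])[-n:]

def keep_top_heap(pairs, n):
    """Same selection, returned best first, without sorting the whole list."""
    return heapq.nlargest(n, pairs, key=lambda p: p[1])

print(keep_top(pairs, 2))       # [('wonderful staff', 4.0), ('friendly service', 4.5)]
print(keep_top_heap(pairs, 2))  # [('friendly service', 4.5), ('wonderful staff', 4.0)]
```

If a text yields fewer phrases than requested, both functions simply return all of them, which is why [-5:] gives “up to” 5 key phrases.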

	"""
	@author: Jayant Kumar Kodwani
	"""
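	# 'dataset' holds the input data for this script: Power BI passes the
	# current Power Query table in as a pandas DataFrame with this name

	import RAKE
	import pandas as pd

	# Add the stopword list; REPLACE the path as required
	stop_dir = r"C:\Users\Jayant\.spyder-py3\stopwords.txt"
	rake_object = RAKE.Rake(stop_dir)

	# Assign your dataset to a variable
	df = dataset

	def Sort_Tuple(tup):
		"""Sort (phrase, score) tuples by score, ascending, so the best come last."""
		tup.sort(key=lambda x: x[1])
		return tup

	# Collect one small DataFrame per input row and concatenate once at the end
	# (DataFrame.append was removed in pandas 2.0)
	frames = []

	# Loop through all rows and apply RAKE to the 'Answer' field
	for _, row in df.iterrows():
		subtitles = row['Answer']
		if not isinstance(subtitles, str) or not subtitles.strip():
			continue  # skip empty cells, which can arrive as None/NaN

		# Run RAKE; change [-1:] to e.g. [-5:] to keep up to 5 key phrases per text
		keywords = Sort_Tuple(rake_object.run(subtitles))[-1:]

		# Create a DataFrame from the RAKE output: one row per (phrase, score) pair
		Output = pd.DataFrame(keywords, columns=['Word', 'KeywordScore'])
		Output['Keywords'] = keywords
		Output['KeywordScore'] = Output['KeywordScore'].astype('float')

		# Carry the source fields along with each key phrase
		Output['Date'] = row['Date']
		Output['Question'] = row['Question']
		Output['Answer'] = row['Answer']
		Output['Index'] = row['Index']

		frames.append(Output)

	columns = ['Word', 'KeywordScore', 'Keywords', 'Date', 'Question', 'Answer', 'Index']
	Rake_Final_Output = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame(columns=columns)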


Once you have finished customizing, run the script and expand the “Rake_Final_Output” table. Then save and close the Power Query Editor (Close & Apply) to apply the script. Your dataset now contains new fields for the key phrases and their scores.
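To debug the transform outside Power BI, or to avoid the row-by-row loop, the same reshaping (one output row per input row and key phrase, with the source fields carried along) can be sketched with pandas' explode. Everything here is hypothetical: extract_stub stands in for rake_object.run() and just scores comma-separated chunks by word count, and the mock dataset is invented, not the post's sample data. It assumes pandas 1.1 or later (for explode's ignore_index), keeps phrases best first, and omits the script's Keywords tuple column.

```python
import pandas as pd

def extract_stub(text):
    """Hypothetical stand-in for rake_object.run(text): returns (phrase, score) pairs.
    It splits on commas and scores each chunk by its word count."""
    chunks = [c.strip().lower() for c in text.split(",") if c.strip()]
    return [(c, float(len(c.split()))) for c in chunks]

def to_long(dataset, extract, n=1):
    """One output row per (input row, key phrase), keeping the source columns."""
    out = dataset.copy()
    # Keep the n best pairs per text; empty or non-text cells get no phrases
    out["Pair"] = out["Answer"].apply(
        lambda t: sorted(extract(t), key=lambda p: p[1], reverse=True)[:n]
        if isinstance(t, str) else []
    )
    # explode turns each list into rows; empty lists become NaN and are dropped
    out = out.explode("Pair", ignore_index=True).dropna(subset=["Pair"])
    out["Word"] = out["Pair"].apply(lambda p: p[0])
    out["KeywordScore"] = out["Pair"].apply(lambda p: p[1]).astype(float)
    return out.drop(columns="Pair").reset_index(drop=True)

# Invented mock of the table Power BI passes in as `dataset`
dataset = pd.DataFrame({
    "Index": [1, 2, 3],
    "Date": ["2021-03-01", "2021-03-02", "2021-03-03"],
    "Question": ["Feedback?"] * 3,
    "Answer": ["wonderful staff, food", "slower connections", None],
})
Rake_Final_Output = to_long(dataset, extract_stub, n=2)
print(Rake_Final_Output[["Index", "Word", "KeywordScore"]])
```

Swapping extract_stub for rake_object.run gives real RAKE scores; the explode step is what replaces the loop-and-concatenate pattern.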

Step 3: Power BI Integration and Visualization

Now comes the fun part that we all love, the visualizations! 💝

To visualize the key phrases, I recommend a Word Cloud ☁️ (available as a custom visual from AppSource) together with tables, preferably combined with sentiment analysis 😃, so you can relate the key phrases to positive, neutral, and negative sentiments.

You can download a sample Power BI template that combines sentiment analysis and key-phrase extraction in a single report.

In the example report, the “Top 10 Key phrases with negative sentiments” visual shows that phrases like “Slower Connections” and “restart 10 times” are associated with negative sentiment 😢.

Similarly, the “Top 10 Key phrases with positive sentiments” visual shows that phrases like “explained neatly” and “great in depth knowledge” are associated with positive sentiment 😃.
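The template's exact measures are not shown in this post, so here is an assumption about what a “Top 10 Key phrases with negative sentiments” visual computes: for each sentiment label, count how often each key phrase appears and keep the ten most frequent. In Power BI itself you would typically get the same effect with a Top N filter on the visual; the sketch only makes the logic explicit. It assumes each extracted phrase already carries a sentiment label (for example from the optional Azure sentiment analysis), and the rows are illustrative, not the post's dataset.

```python
from collections import Counter, defaultdict

def top_phrases_by_sentiment(rows, n=10):
    """rows: iterable of (key_phrase, sentiment_label) pairs, one per extracted phrase.
    Returns {label: [(phrase, count), ...]} with the n most frequent phrases per label."""
    counts = defaultdict(Counter)
    for phrase, label in rows:
        counts[label][phrase.strip().lower()] += 1  # normalize case so variants merge
    return {label: counter.most_common(n) for label, counter in counts.items()}

# Illustrative rows only
rows = [
    ("restart 10 times", "negative"),
    ("Restart 10 times", "negative"),
    ("slower connections", "negative"),
    ("explained neatly", "positive"),
    ("great in depth knowledge", "positive"),
]
top = top_phrases_by_sentiment(rows, n=10)
print(top["negative"])  # [('restart 10 times', 2), ('slower connections', 1)]
```

The same counts can also drive the Word Cloud, where a phrase's size reflects how often it occurs.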

Conclusion

We learned 📘 how to apply the RAKE algorithm to extract key phrases and how to integrate the analysis into Microsoft Power BI to build visualizations.

You could use other datasets and customize the code to see what suits your use case best! 👍

Came across a different approach to key-phrase extraction? Please drop it in the comments!

References

[1] https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/tutorials/tutorial-power-bi-key-phrases

[2] https://towardsdatascience.com/analyzing-and-visualizing-sentiments-from-unstructured-raw-data-c263ba96cc2c

[3] Data source: prepared manually by the Author

Follow me on LinkedIn, Medium, and GitHub for more content like this.

Categories: Data Science


JayantKodwani.com

परोपकारार्थं इदं शरीरं – This life is to help others
