Hello Readers, Are you a programmer like me, who have who have mastered SQL over the years and then ventured into the world of Python?? As per the TIOBE (The Importance Of Being Earnest ) and PYPL (PopularitY of Programming Language Index), Python is ranked as #1 most popular programming language based on the number of skilled engineers world-wide, courses and third party vendors and how often language tutorials are searched on Google.

TIOBE Index report https://www.tiobe.com/tiobe-index/
While I have been a VBA/SQL guy 🧰 for over a decade and I used to resist learning Python, the industry trends and data science boom pulled me like a black hole. Once you get in, there is no way out! In the end, I fell in love 💖 with Python but at the same time I could never underestimate the power of VBA and SQL for data analytics and data science. One of the initial challenges programmers like me face are why should I relearn techniques in Python when I already know how to do the same by SQL. Some practitioners may get some help from Python libraries like sqlite or pandasql but honestly, you will see that at times they are hard to implement and increases the complexity level. As a result, you realize if there was an easy way out to just learn and convert your SQL commands to Python.
Relax, I have got you covered! Here is a Top 25 SQL commands to Python Cheat Sheet ✍️which has been guaranteed and tested OK by me. You can easily learn and convert your most common SQL commands (JOINS, UPDATE, REPLACE) to Python by leveraging this cheat sheet. If you wish, you can also download the sample raw data files and scripts used for the learning, validation and demonstration purposes.
Press Ctrl + D to Bookmark this page” for references”
Top 25 SQL commands to Python Cheat Sheet by Jayant Kodwani
# | Type of Action | SQL | Python using Pandas |
1 | Select all rows from Table | SELECT * FROM [Table1] | print(df) |
2 | Select Top 3 rows from Table | SELECT TOP 3 * FROM [Table1] | print(df.head(3)) |
3 | Select all rows and sort by Age in Ascending order (ensure null values appear first) | SELECT * FROM [Table1] ORDER BY Age | print(df.sort_values(by=[‘Age’], ascending=True, na_position=’first’)) |
4 | Select all rows and sort by Age in Descending order (ensure null values appear last) | SELECT * FROM [Table1] ORDER BY Age DESC | print(df.sort_values(by=[‘Age’], ascending=False, na_position=’last’)) |
5 | Select DISTINCT Column values | SELECT DISTINCT [Department] FROM [Table1] | print(df.Department.unique()) |
6 | Select count of rows | SELECT Count(*) FROM [Table1] | print(len(df)) |
7 | Select SUM of any column | SELECT sum(Salary) FROM [Table1] | print(df[‘Salary’].sum()) |
8 | Select all rows where specific column values are NULL | SELECT * FROM [Table1] WHERE [Age] IS NULL | print(df[df[‘Age’].isnull()]) |
9 | Select all rows where specific column values are NOT NULL | SELECT * FROM [Table1] WHERE [Age] IS NOT NULL | print(df[~df[‘Age’].isnull()]) |
10 | Select all rows where specific column values > numeric value i.e. 35 | SELECT * FROM [Table1] WHERE [Age] > 35 | print(df[df.Age >= 35]) |
11 | Select all rows where specific column values is conditioned by AND operator | SELECT * FROM [Table1] WHERE [Age] >=30 AND [Age] <= 40 | print(df[(df.Age >= 30) & (df.Age <= 40)]) |
12 | Select all rows where specific column values is conditioned by OR operator | SELECT * FROM [Table1] WHERE [Department] =’HR’ OR [Department] = ‘Marketing’ | print(df[(df.Department==’HR’) | (df.Department==’Marketing’)]) |
13 | Select all rows where specific column values is conditioned by NOT operator | SELECT * FROM [Table1] WHERE [Department] <> ‘HR’ | print(df[~(df.Department==’HR’)]) |
14 | Group by the data by specific column (i.e. Department) and get sum of other column (i.e. Salary) | SELECT Department, SUM(Salary) AS ‘Total’ FROM [Table1] Group By Department | print(df.groupby([‘Department’])[‘Salary’].sum()) |
15 | Replace NULL values in a specific column with the mean of the column | UPDATE [Table1] SET Age=(Select Avg(Age) From Table1) WHERE AGE IS NULL SELECT * FROM [Table1] | print(df[‘Age’].fillna((df[‘Age’].mean()), inplace=True)) print(df) |
16 | Replace NULL values in a specific column with the median of the column | UPDATE [Table1] SET Age=( SELECT x.Age from Table1 x, Table1 y GROUP BY x.Age HAVING SUM(SIGN(1-SIGN(y.Age-x.Age))) = (COUNT(*)+1)/2) WHERE AGE IS NULL | print(df[‘Age’].fillna((df[‘Age’].median()), inplace=True)) print(df) |
17 | INNER Join | SELECT * FROM [Table1] T1 INNER JOIN [Table2] T2 ON T1.EmployeeID = T2.EmployeeID | print(pd.merge(df,df2,on=’EmployeeID’)) |
18 | LEFT Join | SELECT * FROM [Table1] T1 LEFT JOIN [Table2] T2 ON T1.EmployeeID = T2.EmployeeID | print(pd.merge(df,df2,on=’EmployeeID’,how=’left’)) |
19 | RIGHT Join | SELECT * FROM [Table1] T1 RIGHT JOIN [Table2] T2 ON T1.EmployeeID = T2.EmployeeID | print(pd.merge(df,df2,on=’EmployeeID’,how=’right’)) |
20 | FULL OUTER Join | SELECT * FROM [Table1] T1 FULL OUTER JOIN [Table2] T2 ON T1.EmployeeID = T2.EmployeeID | print(pd.merge(df,df2,on=’EmployeeID’,how=’outer’)) |
21 | INNER Join with multiple keys | SELECT * FROM [Table1] T1 INNER JOIN [Table2] T2 ON T1.EmployeeID = T2.EmployeeID AND T1.Name=T2.Name | print(pd.merge(df,df2,how=’inner’,left_on=[‘EmployeeID’,’Name’],right_on=[‘EmployeeID’,’Name’])) |
22 | RANK Function | SELECT *, RANK() OVER (ORDER BY Age DESC) FROM Table1 | df[“AgeRank”] = df.Age.rank(ascending=False) print(df) |
23 | DENSE RANK Function | SELECT *, DENSE_RANK() OVER (ORDER BY Age DESC) FROM Table1 | df[“AgeRank”] = df.Age.rank(method=”dense”, ascending=False) print(df) |
24 | ROW NUMBER Function | SELECT *, ROW_NUMBER() OVER (ORDER BY Age DESC) FROM Table1 | import numpy as np df.insert(loc=0, column=’RowNum’, value=np.arange(len(df))+1) print(df) |
25 | CTE Function (CASE WHEN) | SELECT *, CASE WHEN Salary >0 AND Salary <=20000 THEN ‘Low’ WHEN Salary >20000 AND Salary <=40000 THEN ‘Medium’ WHEN Salary >40000 THEN ‘High’ ELSE ‘None’ END AS ‘Category’ FROM Table1 | df[‘Category’] = df.apply(lambda row: ‘High’ if row[‘Salary’] > 40000 else (‘Medium’ if row[‘Salary’] > 20000 and row[‘Salary’] <= 40000 else (‘Low’ if row[‘Salary’] > 0 and row[‘Salary’] <= 20000 else None)),axis = 1) print(df) |
26 | Pivot Table (Bonus) | SELECT column_name, Sum(Salary) AS Total_Salary FROM [Table1] T1 GROUP BY column_name | pivot = df.pivot_table(index=[‘column_name’], values=[‘Salary’], aggfunc=’sum’) |
References
[2] TIOBE Index for January 2022 https://www.tiobe.com/tiobe-index/
[3] The PYPL PopularitY of Programming Language Index https://pypl.github.io/PYPL.html
Follow me
Follow me on Linkedin, Medium, GitHub for more stuff like thisCategories: Data Science