Data Science

Top 25 SQL Commands to Python Cheat Sheet

SQL to Python v1

Hello Readers, Are you a programmer like me, who have who have mastered SQL over the years and then ventured into the world of Python??  As per the TIOBE (The Importance Of Being Earnest ) and PYPL (PopularitY of Programming Language Index),  Python is ranked as #1 most popular programming language based on the number of skilled engineers world-wide, courses and third party vendors and how often language tutorials are searched on Google. 

While I have been a VBA/SQL  guy 🧰 for over a decade and I used to resist learning Python, the industry trends and data science boom pulled me like a black hole. Once you get in, there is no way out! In the end, I fell in love 💖 with Python but at the same time I could never underestimate the power of VBA and SQL for data analytics and data science.  One of the initial challenges  programmers like me face are why should I relearn techniques in Python when I already know how to do the same by SQL. Some practitioners may get some help from Python libraries like sqlite or pandasql but honestly, you will see that at times they are hard to implement and increases the complexity level. As a result, you realize if there was an easy way out to just learn and convert your SQL commands to Python. 

Relax, I have got you covered! Here is a Top 25 SQL commands to Python Cheat Sheet ✍️which has been guaranteed and tested OK by me. You can easily learn and convert your most common SQL commands (JOINS, UPDATE, REPLACE) to Python  by leveraging this cheat sheet.  If you wish, you can also download the sample raw data files and scripts used for the learning, validation and demonstration purposes.

Press Ctrl + D to Bookmark this page” for references” 

Top 25 SQL commands to Python Cheat Sheet by Jayant Kodwani

#Type of ActionSQLPython using Pandas
1Select all rows from TableSELECT * FROM [Table1]  print(df)
2Select Top 3 rows from TableSELECT TOP 3 * FROM [Table1] print(df.head(3))
3Select all rows and sort by Age in Ascending order (ensure null values appear first)SELECT * FROM [Table1] ORDER BY Ageprint(df.sort_values(by=[‘Age’], ascending=True, na_position=’first’))
4Select all rows and sort by Age in Descending order (ensure null values appear last)SELECT * FROM [Table1] ORDER BY Age DESCprint(df.sort_values(by=[‘Age’], ascending=False, na_position=’last’))
5Select DISTINCT Column valuesSELECT DISTINCT [Department] FROM [Table1]print(df.Department.unique())
6Select count of rowsSELECT Count(*) FROM [Table1]print(len(df))
7Select SUM of any columnSELECT sum(Salary) FROM [Table1]print(df[‘Salary’].sum())
8Select all rows where specific column values are NULLSELECT * FROM [Table1] WHERE [Age] IS NULLprint(df[df[‘Age’].isnull()])
9Select all rows where specific column values are NOT NULLSELECT * FROM [Table1] WHERE [Age] IS NOT NULLprint(df[~df[‘Age’].isnull()])
10Select all rows where specific column values > numeric value i.e. 35SELECT * FROM [Table1] WHERE [Age] > 35 print(df[df.Age >= 35])
11Select all rows where specific column values is conditioned by AND operatorSELECT * FROM [Table1] WHERE [Age] >=30 AND [Age] <= 40print(df[(df.Age >= 30) & (df.Age <= 40)])
12Select all rows where specific column values is conditioned by OR operatorSELECT * FROM [Table1] WHERE [Department] =’HR’ OR [Department] = ‘Marketing’print(df[(df.Department==’HR’) | (df.Department==’Marketing’)])
13Select all rows where specific column values is conditioned by NOT operatorSELECT * FROM [Table1] WHERE [Department] <> ‘HR’print(df[~(df.Department==’HR’)])
14Group by the data by specific column (i.e. Department) and get sum of other column (i.e. Salary)SELECT Department, SUM(Salary) AS ‘Total’ FROM [Table1] Group By Departmentprint(df.groupby([‘Department’])[‘Salary’].sum())
15Replace NULL values in a specific column with the mean of the columnUPDATE [Table1] SET Age=(Select Avg(Age) From Table1) WHERE AGE IS NULL
SELECT * FROM [Table1]
print(df[‘Age’].fillna((df[‘Age’].mean()), inplace=True))
print(df)
16Replace NULL values in a specific column with the median of the columnUPDATE [Table1] SET Age=(
SELECT x.Age from Table1 x, Table1 y
GROUP BY x.Age
HAVING SUM(SIGN(1-SIGN(y.Age-x.Age))) = (COUNT(*)+1)/2)
WHERE AGE IS NULL
print(df[‘Age’].fillna((df[‘Age’].median()), inplace=True))
print(df)
17INNER JoinSELECT * FROM [Table1] T1 INNER JOIN [Table2] T2 ON T1.EmployeeID = T2.EmployeeIDprint(pd.merge(df,df2,on=’EmployeeID’))
18LEFT JoinSELECT * FROM [Table1] T1 LEFT JOIN [Table2] T2 ON T1.EmployeeID = T2.EmployeeIDprint(pd.merge(df,df2,on=’EmployeeID’,how=’left’))
19RIGHT JoinSELECT * FROM [Table1] T1 RIGHT JOIN [Table2] T2 ON T1.EmployeeID = T2.EmployeeIDprint(pd.merge(df,df2,on=’EmployeeID’,how=’right’))
20FULL OUTER JoinSELECT * FROM [Table1] T1 FULL OUTER JOIN [Table2] T2 ON T1.EmployeeID = T2.EmployeeIDprint(pd.merge(df,df2,on=’EmployeeID’,how=’outer’))
21INNER Join with multiple keysSELECT * FROM [Table1] T1 INNER JOIN [Table2] T2 ON T1.EmployeeID = T2.EmployeeID AND T1.Name=T2.Nameprint(pd.merge(df,df2,how=’inner’,left_on=[‘EmployeeID’,’Name’],right_on=[‘EmployeeID’,’Name’]))
22RANK FunctionSELECT *, RANK() OVER (ORDER BY Age DESC) FROM Table1df[“AgeRank”] = df.Age.rank(ascending=False)
print(df)
23DENSE RANK FunctionSELECT *, DENSE_RANK() OVER (ORDER BY Age DESC) FROM Table1df[“AgeRank”] = df.Age.rank(method=”dense”, ascending=False)
print(df)
24ROW NUMBER FunctionSELECT *, ROW_NUMBER() OVER (ORDER BY Age DESC) FROM Table1import numpy as np
df.insert(loc=0, column=’RowNum’, value=np.arange(len(df))+1)
print(df)
25CTE Function (CASE WHEN)SELECT *,
CASE WHEN Salary >0 AND Salary <=20000 THEN ‘Low’
     WHEN Salary >20000 AND Salary <=40000 THEN ‘Medium’
WHEN Salary >40000 THEN ‘High’
ELSE ‘None’ END AS ‘Category’
FROM Table1
df[‘Category’] = df.apply(lambda row:
        ‘High’ if row[‘Salary’] > 40000
  else (‘Medium’ if row[‘Salary’] > 20000 and row[‘Salary’] <= 40000
  else (‘Low’ if row[‘Salary’] > 0 and row[‘Salary’] <= 20000
  else None)),axis = 1)
print(df)
26Pivot Table (Bonus)SELECT column_name, Sum(Salary) AS Total_Salary
FROM [Table1] T1
GROUP BY column_name
pivot = df.pivot_table(index=[‘column_name’], values=[‘Salary’], aggfunc=’sum’)
Top 25 SQL Commands to Python Cheat Sheet by JayantKodwani.com

References

[1] GitHub https://github.com/jayantkodwani/Python/tree/main/Top%2025%20SQL%20Commands%20to%20Python%20Cheat%20Sheet%20by%20JayantKodwani.com

[2] TIOBE Index for January 2022 https://www.tiobe.com/tiobe-index/

[3] The PYPL PopularitY of Programming Language Index https://pypl.github.io/PYPL.html


Follow me

Follow me on Linkedin, Medium, GitHub for more stuff like this

Categories: Data Science

Tagged as: , , , , ,

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.