Chatbot
Chatbot Toggle
Chat with our AI

SQL Queries On Pandas DataFrame

SQL Queries On Pandas DataFrame

Writing SQL query for Pandas DataFrame?
Yes we can use Pandasql which support you to write SQL query for Pandas Data Frame. If you are very much comfortable with SQL queries than pandas code for filter the DataFrame we can use pandasql.sqldf to run sql type of queries on Dataframe.

Install Pandasql

Pip install pandasql

Syntax

psql.sqldf(query, locals())

Run Basic Query on Pandas Dataframe

import pandas as pd
import pandasql as psql
# Create a sample DataFrame
data = {
‘name’: [‘Abhi’, ‘Bharath’, ‘Chandan’, ‘Dravid’, ‘Keerthi’],
‘age’: [24, 27, 22, 32, 29],
‘score’: [85, 88, 92, 95, 78]
}
df = pd.DataFrame(data)
# Define a SQL query
query = “SELECT * FROM df WHERE age > 25”
# Execute the query
result = psql.sqldf(query, locals())
print(result)

Joining DataFrames: You can perform SQL joins on multiple DataFrames:

import pandas as pd
import pandasql as psql
# Create sample DataFrames
df1 = pd.DataFrame({‘id’: [1, 2, 3],
‘name’: [‘Abhi’, ‘Bharath’, ‘Chandan’]})
df2 = pd.DataFrame({
‘id’: [1, 2, 4],
‘score’: [85, 88, 92]})
# SQL query to join DataFrames
query = “””SELECT df1.id, df1.name, df2.score
FROM df1
LEFT JOIN df2 ON df1.id = df2.id”””
# Execute the query
result = psql.sqldf(query, locals())
print(result)

Aggregations and Group By

You can perform aggregations and use the `GROUP BY` clause:
import pandas as pd
import pandasql as psql
# Create a sample DataFrame
data = {
‘name’: [‘Abhi’, ‘Bharath’, ‘Chandan’, ‘Dravid’, ‘Keerthi’],
‘city’: [‘Bangalore’, ‘Chennai’, ‘Mysore’, ‘Hassan’, ‘HYD’],
‘score’: [85, 88, 92, 95, 78]}
df = pd.DataFrame(data)
# SQL query to aggregate data
query = “””
SELECT city, AVG(score) as avg_score
FROM df
GROUP BY city
“””
# Execute the query
result = psql.sqldf(query, locals())
print(result)

Subqueries

You can use subqueries to perform more complex operations:
import pandas as pd
import pandasql as psql
# Create a sample DataFrame
data = {
‘name’: [‘Abhi’, ‘Bharath’, ‘Chandan’, ‘Dravid’, ‘Keerthi’],
‘age’: [24, 27, 22, 32, 29],
‘score’: [85, 88, 92, 95, 78]}
df = pd.DataFrame(data)
# SQL query with subquery
query = “””
SELECT name, age
FROM df
WHERE score > (
SELECT AVG(score) FROM df
)
# Execute the query
result = psql.sqldf(query, locals())
print(result)

complex SQL queries involving multiple conditions and calculations

import pandas as pd
import pandasql as psql
# Create a sample DataFrame
data = {
‘name’: [‘Abhi’, ‘Bharath’, ‘Chandan’, ‘Dravid’, ‘Keerthi’],
‘age’: [24, 27, 22, 32, 29],
‘score’: [85, 88, 92, 95, 78]}
df = pd.DataFrame(data)
# Complex SQL query
query = “””
SELECT name, age, score,
CASE
WHEN age < 25 THEN ‘Young’
WHEN age BETWEEN 25 AND 30 THEN ‘Adult’
ELSE ‘Senior’
END as age_group
FROM df
WHERE score > 80
ORDER BY score DESC
“””
# Execute the query
result = psql.sqldf(query, locals())
print(result)

Window Functions

Implementing SQL window functions for advanced analytical queries.
import pandas as pd
import pandasql as psql
# Sample DataFrame
data = {
‘name’: [‘Abhi’, ‘Bharath’, ‘Chandan’, ‘Dravid’, ‘Keerthi’],
‘city’: [‘Bangalore’, ‘Chennai’, ‘Hassan’, ‘Hassan’, ‘HYD’],
‘score’: [85, 88, 92, 95, 78]}
df = pd.DataFrame(data)
# SQL query using window functions
query = “””
SELECT name, city, score,
RANK() OVER (PARTITION BY city ORDER BY score DESC) as rank
FROM df
“””
# Execute the query
result = psql.sqldf(query, locals())
print(result)

Best Practices for Advanced PandasSQL

  1. Combine SQL and Pandas: Use SQL for complex querying and aggregations,
    and Pandas for further data manipulation and analysis.
  2. Optimize Queries: Index your DataFrames or SQL tables where possible to
    improve query performance.
  3. Leverage External Databases: For very large datasets, connect to scalable
    databases like PostgreSQL or MySQL and perform heavy lifting there before
    bringing data into Pandas.
  4. Use Window Functions: Utilize window functions for complex analytics,
    such as running totals, moving averages, and ranking.
  5. Handle Hierarchical Data: Use recursive CTEs for hierarchical data, which
    can simplify complex recursive logic.

Limitations and Considerations

  1. Memory Constraints: `pandasql` loads data into memory, so be mindful of
    memory usage with large datasets.
  2. SQLite Limitations: `pandasql` uses SQLite under the hood, which may not
    support all SQL features available in other databases.
  3. Complex Queries: Complex SQL queries can be harder to debug and may
    require thorough testing and validation

Data Engineer

Full Stack Development

DevOps

Microsoft Business Intelligence

Full Stack Development - MERN

Business Analyst

Quality Assurance

Data & Analytics | Data Science & AI

Medical Healthcare BPO (US) | Human Recourses Recruitment

Employee Wellness

Full Stack Development | Mobile App Development

Investment Management (Wealth) | Financial Analysis

Business Administration

Commerce

Arts and Others

Engineering

Communication & Self Development

Customized program to select ?

Sign in to

Sign in to

To assist you better, please provide the following information: