What Are Pandas in Python? Easy Step-by-Step Guide for Beginners 2026
What are Pandas? Let's Start with the Basics
Imagine you have a big pile of shopping receipts from your family's shop. Each receipt shows the names of items, prices, and how many you sold. Now, you want to quickly add up total sales or find the most popular item. Doing this by hand takes hours. That's where Pandas comes in – it's like a smart helper that organizes this messy paper into neat tables on your computer, so we can play with the numbers easily.
Pandas is a free tool for Python, the simple programming language many of us use. It helps us handle data in tables, just like a notebook where rows are your records and columns are details like name, age, or price. We call these tables DataFrames. Think of a DataFrame as an Excel sheet but super fast and powerful inside Python.
Inside Pandas, there are two main stars: Series and DataFrame.
- A Series is like one single column from your table – say, just the list of prices. It's the smallest building block.
- A DataFrame is a bunch of Series put together side by side, making a full table with rows and columns.
Let's see this in action with a simple example from daily life. Suppose we run a small fruit shop in our neighborhood. We have lists of fruits, their prices, and quantities sold today. We'll turn these into a Pandas table right now.
First, we need Python ready. If you're new, no worry – just open a notebook like Jupyter or Google Colab (we'll talk more about setup soon).
Here's how we make our first Series and DataFrame:
```python
import pandas as pd  # We bring Pandas in with this line, like inviting a friend

# Simple lists from our fruit shop
fruits = ['Apple', 'Banana', 'Orange', 'Mango']
prices = [50, 30, 40, 80]    # in rupees per kg
quantity = [10, 20, 15, 5]

# Make Series – one column each
fruit_series = pd.Series(fruits)
price_series = pd.Series(prices)
qty_series = pd.Series(quantity)

print("Fruit names as Series:")
print(fruit_series)
print("\nPrices as Series:")
print(price_series)
```
When we run this, Pandas shows:
```
Fruit names as Series:
0     Apple
1    Banana
2    Orange
3     Mango
dtype: object

Prices as Series:
0    50
1    30
2    40
3    80
dtype: int64
```
See? Each Series has numbers on the left (called index, like row numbers starting from 0) and our data on the right. It's simple, like labeling shelves in your shop.
Now, the fun part – let's glue these into a DataFrame, our full table:
```python
# Make a DataFrame from lists – easy way
data = {
    'Fruit': fruits,
    'Price': prices,
    'Quantity': quantity
}
df = pd.DataFrame(data)
print("Our first fruit shop table:")
print(df)
```
Output looks like this neat table:
| | Fruit | Price | Quantity |
|---|---|---|---|
| 0 | Apple | 50 | 10 |
| 1 | Banana | 30 | 20 |
| 2 | Orange | 40 | 15 |
| 3 | Mango | 80 | 5 |
Wow! Just a few lines, and we have a proper table. No more scribbling on
paper. We can print it anytime with print(df), and it shows
clearly. This DataFrame remembers everything – we can add more rows later,
like tomorrow's sales.
Why is this useful in real life? Remember last Diwali when we tracked
festival sales? Instead of Excel crashes with big files, Pandas handles
thousands of rows without sweat. For example, if you're a teacher marking
student scores, turn names and marks into a DataFrame. One command shows
average marks: df['Marks'].mean() – instant result!
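Here's a tiny runnable sketch of that teacher idea – the names and marks below are made up for illustration:

```python
import pandas as pd

# Hypothetical class register – names and marks invented for illustration
marks_df = pd.DataFrame({
    'Name': ['Asha', 'Bilal', 'Chitra', 'Dev'],
    'Marks': [72, 85, 90, 65]
})

average = marks_df['Marks'].mean()  # one command, instant class average
print(average)  # 78.0
```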
Let's try another everyday example. Suppose we track our weekly expenses for the home budget. Lists: items like 'Rice', 'Milk', 'Petrol'; costs: 500, 50, 2000; dates.
```python
expenses_data = {
    'Item': ['Rice', 'Milk', 'Petrol', 'Veggies'],
    'Cost': [500, 50, 2000, 300],
    'Date': ['2026-01-10', '2026-01-12', '2026-01-15', '2026-01-16']
}
expense_df = pd.DataFrame(expenses_data)
print("Home expenses table:")
print(expense_df)
```
This prints:
| | Item | Cost | Date |
|---|---|---|---|
| 0 | Rice | 500 | 2026-01-10 |
| 1 | Milk | 50 | 2026-01-12 |
| 2 | Petrol | 2000 | 2026-01-15 |
| 3 | Veggies | 300 | 2026-01-16 |
Perfect for spotting where money goes fast – Petrol is the big eater! Pandas adds smart labels automatically.
One cool thing: DataFrames can mix types. Names stay as text, numbers stay math-ready. If we do total_sales = df['Price'] * df['Quantity'], it gives 500, 600, 600 and 400 – one total per fruit. Boom, sales figures!
But wait, what if our lists are uneven? A DataFrame built from plain lists of different lengths raises an error, but build it from Series and Pandas aligns them, marking the gaps as NaN. For shop owners like us in Lucknow markets, this means quick checks: "How many apples left?" – just slice the table.
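Here's a small sketch of the safe pattern – the column names are invented for illustration:

```python
import pandas as pd

# Uneven lists: wrapping them as Series lets Pandas align by index
# (plain lists of different lengths would raise a ValueError instead)
data = {
    'Fruit': pd.Series(['Apple', 'Banana', 'Orange']),
    'Price': pd.Series([50, 30])  # one price missing
}
df_uneven = pd.DataFrame(data)
print(df_uneven)
# The Orange row gets NaN in Price – Pandas flags the gap instead of guessing
```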
To make it even clearer, here's a comparison table showing Series vs DataFrame, like choosing between a single notebook page or a full ledger book:
| Feature | Series (One Column) | DataFrame (Full Table) |
|---|---|---|
| What it holds | A single list, like just the prices | Rows and columns: fruits + prices + qty |
| Daily Example | Your phone's contact list (names only) | Full phonebook with names, numbers, emails |
| Size | Small, fast for one thing | Big, handles shops or school records |
| Print Look | Vertical list with index | Grid like Excel sheet |
| Use When | Quick math on one list | Compare across items, like sales report |
This table helps us see why DataFrame is the hero – it connects everything.
We can even make a DataFrame from a plain list of lists, like reading from a notebook:
```python
simple_list = [
    ['Apple', 50, 10],
    ['Banana', 30, 20],
    ['Orange', 40, 15]
]
df_from_list = pd.DataFrame(simple_list, columns=['Fruit', 'Price', 'Quantity'])
print(df_from_list)
```
Same neat table! Great for when data comes from forms or apps.
In our fruit shop, printing df daily lets customers see stock.
Or for students, track homework scores. Pandas makes data feel like chatting
with a friend – ask, and it answers.
Think of a teacher in our BCA class. We have 30 students' names and test
scores. One DataFrame, and print(df) shows the class list
instantly during roll call.
We've now built our first tables hands-on. It's exciting to see lists turn into something we can touch and change.
Now that we've set up our tables and peeked at the data, real life hits us – data from shops, schools, or forms is often messy. Prices missing here, wrong numbers there, repeats everywhere. Like vegetables from the market: some rotten, some doubled up. We need to clean it fast. Let's fix this dirty data step by step, using our fruit shop as the example.
We'll start with a messy DataFrame. Imagine we got sales data from three sellers, but emails forgot to fill some spots, quantities typed wrong, and one sale listed twice.
```python
import pandas as pd

messy_data = {
    'Fruit': ['Apple', 'Banana', 'Orange', 'Mango', 'Apple', None],
    'Price': [50, 30, '40', 80.0, 50, 60],
    'Quantity': [10, 20, 15, 5, 10, 8],
    'Seller': ['Ram', 'Shyam', 'Ram', 'Geeta', 'Ram', 'Shyam']
}
df_messy = pd.DataFrame(messy_data)
print("Messy shop data:")
print(df_messy)
```
It looks like:
| | Fruit | Price | Quantity | Seller |
|---|---|---|---|---|
| 0 | Apple | 50 | 10 | Ram |
| 1 | Banana | 30 | 20 | Shyam |
| 2 | Orange | 40 | 15 | Ram |
| 3 | Mango | 80.0 | 5 | Geeta |
| 4 | Apple | 50 | 10 | Ram |
| 5 | NaN | 60 | 8 | Shyam |
See the problems? Empty fruit (None), Price as text '40', duplicate Apple row.
Handling Missing Values – No More Gaps
Missing data is common, like a seller forgetting to note quantity. Pandas shows them as NaN (Not a Number).
- Drop them with dropna(): Remove whole rows with gaps.

```python
df_no_missing = df_messy.dropna()
print("After dropping missing:")
print(df_no_missing)
```
This kills row 5. Good if gaps are few, but we lose sales info!
- Fill them smartly with fillna(): Put an average or zero instead. Like filling a missing weight on a parcel with the shop average.

```python
# Fill missing fruit with 'Unknown'
df_messy['Fruit'] = df_messy['Fruit'].fillna('Unknown')
# Fill any future gaps in Quantity with 0
df_messy['Quantity'] = df_messy['Quantity'].fillna(0)
print("After filling:")
print(df_messy)
```
Now no NaNs. For prices, we might fill gaps with the average: df_messy['Price'].fillna(df_messy['Price'].mean()) – just fix the text-number type first (next section), or the mean can't be computed.
Daily Example: In our home budget tracker, if petrol cost is missing one day, fill with last week's average – keeps total spend real.
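A runnable sketch of that budget trick – the costs here are invented:

```python
import pandas as pd
import numpy as np

# Invented week of costs with one missing entry (np.nan)
costs = pd.Series([500.0, np.nan, 2000.0, 300.0])

filled = costs.fillna(costs.mean())  # mean() skips the NaN automatically
print(filled)
```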
Fix Wrong Data Types – Make Numbers Work
Prices should be numbers for math, but '40' is text. Pandas mixes them, but math fails.
- Change with astype(): Turn text into numbers.

```python
df_messy['Price'] = df_messy['Price'].astype(float)  # Now all numbers
print("Fixed Price type:")
print(df_messy['Price'].dtype)  # Shows float64
```
Test: df_messy['Price'].mean() now works – average 51.67 rupees.
Pro Tip Table for types:
| Data Problem | Fix Command | Example Output | When to Use |
|---|---|---|---|
| Text as Number | df['col'].astype(float) | float64 | Prices, ages for math |
| Dates as Text | pd.to_datetime(df['Date']) | datetime64 | Sales by week |
| Yes/No Text | df['col'].map({'Yes': 1, 'No': 0}) | int64 | Count approvals |
| Too Many Decimals | df['col'].round(2) | Still float | Money: 50.00 not 49.999 |
Like fixing a bike speedometer – wrong units, no ride!
Remove Duplicates – Clean Repeats
Duplicate Apple row? Spot with eyes, but thousands? No way.
```python
df_clean = df_messy.drop_duplicates()
print("No duplicates:")
print(df_clean)
```
Pops row 4. Keeps first one. For shops, this avoids double-counting sales.
Life Hack: In student attendance, drop_duplicates() removes kids listed twice by mistake.
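A quick sketch of that attendance trick, with made-up roll numbers:

```python
import pandas as pd

# Made-up register where Priya got entered twice
attendance = pd.DataFrame({
    'Roll': [1, 2, 2, 3],
    'Name': ['Amit', 'Priya', 'Priya', 'Ravi']
})

present = attendance.drop_duplicates()  # keeps the first of each repeat
print(present)
```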
Now our df_clean is shiny. Let's move to ordering and picking.
Sort and Filter – Find What We Need Fast
Clean data is great, but unsorted like a messy almirah. We sort and filter to spotlight stars.
Sort with sort_values() – High to Low or A to Z
Want top sellers first? Like arranging students by marks.
```python
# Sort by Price high to low
df_sorted = df_clean.sort_values('Price', ascending=False)
print("Highest price first:")
print(df_sorted)
```
Output:
| | Fruit | Price | Quantity | Seller |
|---|---|---|---|---|
| 3 | Mango | 80.0 | 5 | Geeta |
| 5 | Unknown | 60.0 | 8 | Shyam |
| 0 | Apple | 50.0 | 10 | Ram |
| 2 | Orange | 40.0 | 15 | Ram |
| 1 | Banana | 30.0 | 20 | Shyam |
Mango tops! Add by=['Price', 'Quantity'] for multi-sort.
Shop Example: Sort Quantity descending – restock Bananas first (20 sold).
Filter – Pick Matching Rows
Like "Show only adults over 25" or "Fruits above 40 rupees".
```python
# Filter Price > 40
expensive = df_clean[df_clean['Price'] > 40]
print("Expensive fruits:")
print(expensive)
```
Gives Apple, Mango, and Unknown. Use & for AND: df_clean[(df_clean['Price'] > 40) & (df_clean['Seller'] == 'Ram')].
Daily Use: Filter home expenses >500 – spot big spends like petrol.
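Using the expense_df from earlier (recreated here so the snippet runs on its own), that check is one line:

```python
import pandas as pd

# Recreated from the earlier home-budget example
expense_df = pd.DataFrame({
    'Item': ['Rice', 'Milk', 'Petrol', 'Veggies'],
    'Cost': [500, 50, 2000, 300]
})

big_spends = expense_df[expense_df['Cost'] > 500]
print(big_spends)  # only Petrol crosses 500
```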
Filter Quick Guide:
- Equals: df[df['City'] == 'Delhi']
- Greater: df[df['Sales'] > 1000]
- In list: df[df['Fruit'].isin(['Apple', 'Banana'])]
- Not null: df[df['Price'].notna()]
Add New Columns – Create Magic Numbers
Now, calculate totals without calculator. Like adding profit column.
```python
df_clean['Total Sales'] = df_clean['Price'] * df_clean['Quantity']
print("With Total Sales:")
print(df_clean)
```
New table:
| Fruit | Price | Quantity | Seller | Total Sales |
|---|---|---|---|---|
| Apple | 50.0 | 10 | Ram | 500.0 |
| Banana | 30.0 | 20 | Shyam | 600.0 |
| ... | ... | ... | ... | ... |
Banana wins! Other math: df['Discount'] = df['Price'] * 0.1 for 10% off.
Examples:
- Age group: df['Adult'] = df['Age'] > 18 (True/False)
- Category: df['Type'] = np.where(df['Price'] < 50, 'Cheap', 'Premium') (needs import numpy as np)
For budgets: df['Daily Avg'] = df['Total']/7.
Group and Count – Team Up Data
Group by Seller – total sales per person?
```python
sales_by_seller = df_clean.groupby('Seller')['Total Sales'].sum()
print("Sales per seller:")
print(sales_by_seller)
```
Ram leads with 1100 (Apple 500 + Orange 600), Shyam is close behind at 1080, and Geeta has 400.
Average: .mean(). Count: .count().
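A small runnable sketch of mean and count per group, with sample numbers:

```python
import pandas as pd

sales_df = pd.DataFrame({
    'Seller': ['Ram', 'Shyam', 'Ram', 'Geeta'],
    'Total Sales': [500, 600, 600, 400]
})

avg_per_seller = sales_df.groupby('Seller')['Total Sales'].mean()
count_per_seller = sales_df.groupby('Seller')['Total Sales'].count()
print(avg_per_seller)    # average sale value per seller
print(count_per_seller)  # number of sales per seller
```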
Multi-group Table:
| Group By | Command Example | Output Meaning |
|---|---|---|
| One Column | groupby('City')['Sales'].sum() | Total sales per city |
| Average | groupby('Month')['Price'].mean() | Avg price each month |
| Count | groupby('Seller').size() | How many sales per seller |
| Max | groupby('Fruit')['Qty'].max() | Peak quantity per fruit |
Like school: groupby('Class')['Marks'].mean() – best class average.
Life Example: Group expenses by 'Item' – Rice eats most budget?
Save Clean Data – Keep It Forever
Work done? Save to file, share with team.
```python
df_clean.to_csv('clean_fruit_sales.csv', index=False)   # No row numbers
df_clean.to_excel('clean_fruit_sales.xlsx', index=False)
```
Open in Excel anytime. Add index=False for clean look.
Save Options:
- CSV: Small, works everywhere (shops share via WhatsApp)
- Excel: Colors, formulas (boss reports)
- JSON: For apps (online store)
Our shop now has 'clean_fruit_sales.csv' ready for tomorrow.
We've turned mess into gold – sorted, filtered, grouped, saved. Perfect for daily hustle.
Next, we'll combine tables like joining student lists with grades for full stories.
Building on our clean, saved tables, sometimes one group isn't enough – we need cross looks, like sales by city and month. Or join separate sheets. Plus, dates and text need tweaks, and trends over time. Let's level up with these tools, using school and sales examples from our daily world.
Multiple Groups with Pivot Table – Cross Views Easy
Pivot tables are like magic spreadsheets that twist data two ways. No Excel needed – Pandas does it in one line. Perfect for reports: "Show Banana sales by seller and month?"
First, let's make sample sales data for a chain of fruit shops.
```python
import pandas as pd

pivot_data = {
    'Fruit': ['Apple', 'Banana', 'Apple', 'Banana', 'Mango', 'Apple'],
    'City': ['Delhi', 'Delhi', 'Mumbai', 'Mumbai', 'Delhi', 'Mumbai'],
    'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Feb'],
    'Sales': [500, 600, 400, 700, 300, 450]
}
df_pivot = pd.DataFrame(pivot_data)
print("Raw sales:")
print(df_pivot)
```
Now, pivot for Sales by City AND Month:
```python
pivot_table = df_pivot.pivot_table(values='Sales', index='City',
                                   columns='Month', aggfunc='sum')
print("Pivot: Sales by City & Month:")
print(pivot_table)
```
Output table (Pandas sorts the Month columns alphabetically, so Feb shows up before Jan):

| City | Feb | Jan |
|---|---|---|
| Delhi | 300 | 1100 |
| Mumbai | 1150 | 400 |
Delhi rocks Jan, Mumbai Feb! Like a shop owner checking which city booms when.
Pivot Power Tips:
- Multiple Values: aggfunc=['sum', 'mean'] – totals and averages.
- Rows & Cols Swap: index='Month', columns='City'.
- Fill Empty: fill_value=0 for zero gaps.
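Here's fill_value in action on the same pivot_data, pivoted by Fruit and Month this time – Mango had no Jan sales, so its gap becomes 0:

```python
import pandas as pd

df_pivot = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Apple', 'Banana', 'Mango', 'Apple'],
    'City': ['Delhi', 'Delhi', 'Mumbai', 'Mumbai', 'Delhi', 'Mumbai'],
    'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Feb'],
    'Sales': [500, 600, 400, 700, 300, 450]
})

# fill_value=0 turns the empty Mango/Jan cell into 0 instead of NaN
table = df_pivot.pivot_table(values='Sales', index='Fruit',
                             columns='Month', aggfunc='sum', fill_value=0)
print(table)
```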
Daily Pivot Examples Table:
| Scenario | Pivot Code | What It Shows |
|---|---|---|
| Shop Sales by Fruit/City | index='Fruit', columns='City' | Apples best in Delhi? |
| Student Marks by Subject/Class | index='Class', columns='Subject' | Math avg in Class 10 |
| Expenses by Category/Month | index='Month', columns='Category' | Food spend jumps in festivals |
| Website Visits by Page/Day | index='Day', columns='Page' | Home page peaks weekends |
For freelancers like us, pivot invoices by client and month – spot slow payers.
Join Tables with Merge – Combine Worlds
Got students list separate from marks? Merge glues them like stapling sheets.
Two tables:
```python
students = pd.DataFrame({
    'Student_ID': [1, 2, 3, 4],
    'Name': ['Amit', 'Priya', 'Ravi', 'Seema'],
    'Age': [20, 21, 19, 22]
})
marks = pd.DataFrame({
    'Student_ID': [1, 2, 3, 5],
    'Subject': ['Math', 'Math', 'Math', 'Math'],
    'Score': [85, 92, 78, 88]
})
print("Students:")
print(students)
print("\nMarks:")
print(marks)
```
Join on Student_ID:
```python
full_data = pd.merge(students, marks, on='Student_ID', how='inner')  # Only matching rows
print("Merged students + marks:")
print(full_data)
```
Result:
| Student_ID | Name | Age | Subject | Score |
|---|---|---|---|---|
| 1 | Amit | 20 | Math | 85 |
| 2 | Priya | 21 | Math | 92 |
| 3 | Ravi | 19 | Math | 78 |
Seema and ID5 missing – inner join skips non-matches.
Merge Types Table:
| Type | Code: how='...' | Keeps What | Example Use |
|---|---|---|---|
| inner | 'inner' | Only matches both sides | Common students with marks |
| left | 'left' | All from left, match from right | All students, even no marks |
| right | 'right' | All from right, match from left | All marks, even unknown kids |
| outer | 'outer' | Everything, NaN for no match | Full audit |
Life Example: Merge customer orders with delivery status – track delays.
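And here's how='left' on the student tables (rebuilt so the snippet runs alone) – Seema stays, with NaN for her missing score:

```python
import pandas as pd

students = pd.DataFrame({
    'Student_ID': [1, 2, 3, 4],
    'Name': ['Amit', 'Priya', 'Ravi', 'Seema']
})
marks = pd.DataFrame({
    'Student_ID': [1, 2, 3, 5],
    'Score': [85, 92, 78, 88]
})

# how='left' keeps every student; Seema has no marks, so her Score is NaN
left_joined = pd.merge(students, marks, on='Student_ID', how='left')
print(left_joined)
```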
Time Data – Make Dates Smart
Dates as text? Can't group by week. Convert first.
```python
time_data = {
    'Date': ['2026-01-01', '2026-01-03', '2026-01-08', '2026-01-15'],
    'Sales': [100, 150, 120, 200]
}
df_time = pd.DataFrame(time_data)
df_time['Date'] = pd.to_datetime(df_time['Date'])  # Fix to date type
print("Dates fixed:")
print(df_time['Date'].dtype)  # datetime64[ns]
```
Now, sales by week:
```python
df_time['Week'] = df_time['Date'].dt.isocalendar().week
weekly_sales = df_time.groupby('Week')['Sales'].sum()
print("Sales by week:")
print(weekly_sales)
```
Or by month: df_time['Month'] = df_time['Date'].dt.month_name().
Date Tricks:
- Day name: .dt.day_name()
- Year: .dt.year
- Resample monthly: df_time.set_index('Date')['Sales'].resample('M').sum()
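One hedge on resample: it needs the dates as the index. A version-safe alternative is grouping on .dt.month – here's a sketch with sample dates spread over two months (different from our shop dates, just for illustration):

```python
import pandas as pd

# Sample dates spread over two months – illustration only
df_months = pd.DataFrame({
    'Date': pd.to_datetime(['2026-01-01', '2026-01-03', '2026-02-08', '2026-02-15']),
    'Sales': [100, 150, 120, 200]
})

# Group on the month number – no index gymnastics needed
monthly = df_months.groupby(df_months['Date'].dt.month)['Sales'].sum()
print(monthly)  # Jan (1): 250, Feb (2): 320
```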
Shop Calendar Table:
| Goal | Code | Output Example |
|---|---|---|
| Sales by Month | df['Month'] = df['Date'].dt.month | 1 for Jan |
| Weekday Trends | df['Day'] = df['Date'].dt.weekday | 0=Monday sales low? |
| Days Since Start | df['Days'] = (df['Date'] - df['Date'].min()).dt.days | Trend over time |
| Quarterly Total | resample('Q').sum() | Jan-Mar total |
Like tracking YouTube views by upload date – peaks on weekends?
Window Functions – Trends Without Losing Rows
Want 3-day rolling average sales, but keep all days? Windows slide over data.
```python
df_time = df_time.sort_values('Date')
df_time['Rolling_Avg_3'] = df_time['Sales'].rolling(window=3).mean()
print("With rolling average:")
print(df_time)
```
| Date | Sales | Rolling_Avg_3 |
|---|---|---|
| 2026-01-01 | 100 | NaN |
| 2026-01-03 | 150 | NaN |
| 2026-01-08 | 120 | 123.33 |
| 2026-01-15 | 200 | 156.67 |
Smooths ups/downs – sales steady?
Rank within groups:
```python
df_pivot['Rank_City'] = df_pivot.groupby('City')['Sales'].rank(ascending=False)
print("Rank per city:")
print(df_pivot)
```
Window Examples:
- Cumulative: .cumsum() – running total sales.
- Shift: .shift(1) – yesterday's sales.
- Percent rank: .rank(pct=True)
Freelancer Use: Rolling avg earnings over 7 days – steady income?
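A tiny sketch of cumsum and shift on sample sales numbers:

```python
import pandas as pd

sales = pd.Series([100, 150, 120, 200])

running_total = sales.cumsum()  # running total: 100, 250, 370, 570
yesterday = sales.shift(1)      # previous row's value; first row becomes NaN
print(running_total)
print(yesterday)
```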
Text Cleaning – Tidy Names and Notes
Names like "Ram Kumar" with spaces? Split or clean.
```python
df_text = pd.DataFrame({
    'Full_Name': ['Ram Kumar ', 'Priya Sharma!!', ' ravi singh'],
    'City': ['Delhi', 'Mumbai', 'Delhi']
})
# Strip spaces, drop the stray '!!', fix capitals
df_text['Clean_Name'] = df_text['Full_Name'].str.strip().str.replace('!!', '').str.title()
# Split out the first name
df_text['First_Name'] = df_text['Full_Name'].str.split().str[0].str.title()
print("Cleaned text:")
print(df_text)
```
Ram Kumar, Priya Sharma, Ravi Singh. Perfect.
Patterns: df['Phone'] = df['Text'].str.extract(r'(\d{10})', expand=False) grabs 10-digit numbers (expand=False keeps the result a single column).
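A runnable sketch of that phone-number grab – the text rows are made up:

```python
import pandas as pd

# Made-up text rows for illustration
df_notes = pd.DataFrame({'Text': ['Call 9876543210 now', 'no number here']})

# expand=False keeps the result a single column (a Series)
df_notes['Phone'] = df_notes['Text'].str.extract(r'(\d{10})', expand=False)
print(df_notes)  # rows without a match get NaN
```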
Text Tools Table:
| Task | Code Example | Before/After |
|---|---|---|
| Remove Spaces | .str.strip() | " ram " → "ram" |
| Upper/Lower | .str.upper() | "ram" → "RAM" |
| Split Words | .str.split(expand=True) | "A B" → cols A, B |
| Replace | .str.replace('old', 'new') | "bad" → "good" |
| Contains Pattern | .str.contains('apple') | True/False flag |
| Length | .str.len() | "abc" → 3 |
Example: Clean customer feedback – count "good" mentions.
Like fixing addresses for delivery – no more lost parcels.
These advanced moves turn raw info into insights, ready for visuals next.
Up next, we'll plot these tables into charts and style them pretty.
With our data cleaned, grouped, and analyzed, it's time to show it off – not just numbers, but pictures everyone understands. Like turning shop ledger into colorful charts for family or boss. Pandas has built-in plots, plus ways to make tables look pro. We'll use our fruit sales and school examples to draw them step by step.
Pandas Built-in Plots – Charts in Seconds
Pandas plots with Matplotlib under the hood – no extra setup. Just df.plot() and boom! Great for quick checks, like "Does sales rise on weekends?"
First, recall our df_time with dates and sales:
| Date | Sales | Rolling_Avg_3 |
|---|---|---|
| 2026-01-01 | 100 | NaN |
| 2026-01-03 | 150 | NaN |
| 2026-01-08 | 120 | 123.33 |
| 2026-01-15 | 200 | 156.67 |
Line Charts – Trends Over Time
Perfect for sales growth, like watching YouTube subscribers climb.
```python
import matplotlib.pyplot as plt  # Helper for titles

df_time.set_index('Date')['Sales'].plot(kind='line', title='Daily Fruit Sales')
plt.show()
```
This draws a line jumping 100→150→120→200. See the spike? Restock day!
Add rolling avg on same chart:
```python
df_time.set_index('Date')[['Sales', 'Rolling_Avg_3']].plot(kind='line')
plt.title('Sales with 3-Day Smooth')
plt.ylabel('Rupees')
plt.show()
```
Smooth blue line under wiggly sales – trends clear!
Line Chart Tips:
- Multiple lines: Pass a list of columns.
- Zoom: xlim=('2026-01-01', '2026-01-15')
- Markers: marker='o' for dots.
Daily Life Lines:
- Home power bill over months – spot the summer AC jump.
- Weight tracker – steady loss? Good!
- Freelance earnings weekly – ups after a new client.
Bar Charts – Compare Categories
Who sold most? Bars shine for groups.
From df_pivot sales by city/month:
```python
pivot_table.plot(kind='bar', title='Sales by City')
plt.ylabel('Total Sales')
plt.show()
```
Delhi tall bar, Mumbai shorter – easy compare!
Horizontal: kind='barh' for long names.
Grouped bars: Use pivot with fruits.
```python
# From earlier df_clean
df_clean.plot(x='Fruit', y='Total Sales', kind='bar', title='Sales per Fruit')
plt.show()
```
Banana highest bar – stock more!
Bar Chart Guide Table:
| Chart Type | Code: kind='...' | Best For | Example Output Insight |
|---|---|---|---|
| bar | 'bar' | Compare groups side-by-side | Fruits: Banana leads |
| barh | 'barh' | Long labels (cities, names) | Sellers ranked |
| bar stacked | stacked=True | Parts to whole (sales/fruit) | Total = Apple + Banana |
| bar grouped | Use pivot first | Multi-category (city/fruit) | Delhi Apples vs Mumbai |
School Example: full_data.plot(x='Name', y='Score', kind='bar') – Priya tops the class visually.
Histograms and Pie – Distributions and Shares
Histogram: How many sales buckets? Like age groups in class.
```python
sales_big = pd.DataFrame({'Sales': [100, 150, 200, 300, 120, 500, 80, 250, 400, 90]})
sales_big['Sales'].plot(kind='hist', bins=5, title='Sales Distribution')
plt.show()
```
Shows most sales 80-200, few high – normal shop day.
Pie for shares:
```python
df_clean.groupby('Seller')['Total Sales'].sum().plot(kind='pie', autopct='%1.1f%%')
plt.title('Seller Share')
plt.show()
```
Ram takes the biggest slice (about 43%) – bonus time?
Other Plots Table:
| Plot Type | Code Example | Use Case | Pro Tip |
|---|---|---|---|
| hist | df['Col'].plot(kind='hist') | Spread of numbers (prices) | bins=10 for more detail |
| pie | groupby().plot(kind='pie') | % shares (expenses) | autopct for labels |
| scatter | df.plot.scatter(x='Price', y='Qty') | Relation (high price, low qty?) | Spot outliers |
| box | df['Sales'].plot(kind='box') | Outliers in data | Whiskers show range |
| area | kind='area' | Stacked trends over time | Cumulative sales |
YouTube Creator Example: Histogram of video views – most under 1k, virals above 10k.
For the shop, df_clean.plot.scatter('Price', 'Quantity') – do cheap fruits sell more?
Customize all:
```python
df_clean.plot(kind='bar', color=['red', 'green', 'blue'], figsize=(10, 6))
plt.title('Fruit Sales Colors')
plt.xlabel('Fruits')
plt.ylabel('Total')
plt.legend()
plt.show()
```
Bigger, colorful – report ready!
Full Plot Workflow:
- Clean data first.
- Group if needed.
- df.plot() – tweak title, labels.
- plt.savefig('chart.png') – save the image for blog/YouTube.
Like our Pratap Solution blog – charts boost reader stay time 2x!
Style Tables – Make Them Pop
Plots are great, but what about tables in notebooks and reports? Make them shine with colors. Pandas style highlights cells like Excel conditional formatting.
From df_clean:
```python
def highlight_max(s):
    is_max = s == s.max()
    return ['background-color: yellow' if v else '' for v in is_max]

styled = (df_clean.style
          .apply(highlight_max, subset=['Total Sales'])
          .format({'Total Sales': '{:.0f}'}))
styled  # In Jupyter, shows the pretty table
```
Yellow background on top sales row – eyes go there!
Color Conditions – Rules Like Traffic Lights
```python
def color_sales(val):
    color = 'green' if val > 500 else 'orange' if val > 300 else 'red'
    return f'color: {color}'

# .style lives on the DataFrame, not on a single column – use subset to target one
df_clean.style.map(color_sales, subset=['Total Sales']).format({'Total Sales': '{:.0f}'})
```
Green for stars, red for low – quick scan!
Style Recipes Table:
| Goal | Code Snippet | Effect |
|---|---|---|
| Highlight Max | style.highlight_max() | Bold/yellow top value |
| Min Lowlight | style.highlight_min(props='color:red') | Red for worst |
| Bars in Cells | style.bar(subset=['Sales']) | Mini bar chart in table |
| Percent Format | style.format({'Pct': '{:.1%}'}) | 25.0% nice |
| Precision | style.format(precision=0) | No decimals |
| Background Gradient | style.background_gradient() | Color fade high to low |
Combine:
```python
styled_full = (df_clean.style
               .background_gradient(subset='Total Sales')
               .highlight_max(subset='Quantity')
               .format({'Price': '₹{:.0f}', 'Total Sales': '{:,.0f}'})
               .set_caption('Styled Fruit Report'))
styled_full
```
Gradient green-red on sales, yellow max qty, rupee signs – boss impressed!
Export Styled:
```python
styled_full.to_html('report.html')   # Web page
styled_full.to_excel('styled.xlsx')  # Excel keeps some style
```
Daily Examples:
- Budget Table: Red overspend, green savings.
- Class Marks: Green >80, amber 60-80, red below.
- Shop Stock: Bold low stock items.
- Blog Analytics: Gradient on top posts' views.
For our content creation, style top viral videos table – thumbnails next to colored rows.
Advanced Styling:
- Icons? Custom functions returning HTML.
- Themes: style.set_table_styles([{'selector': 'th', 'props': [('font-weight', 'bold')]}])
- Conditional text: hide negatives with display: none if val < 0.
In Google Colab, styles shine for sharing links.
Plot + Style Combo Workflow:
- df.head().style – quick pretty peek.
- Plot trends.
- Style the summary table.
- Save both for the presentation.
Imagine pitching freelance project: "See this chart? Sales up 30%!"
Engagement Boosters:
- Use figsize=(12, 8) for big screens.
- plt.tight_layout() – no overlap.
- Subplots: fig, axs = plt.subplots(2, 2); df.plot(ax=axs[0, 0])
For YouTube Shorts, screenshot styled table – hook: "Pandas magic in 60s!"
We've visualized and styled – data now speaks loud and clear.
This wraps our Pandas journey, but real power comes in combining with your projects.
Now our tables dazzle with charts and colors, but what about huge files from big shops or exam results? Or slow code on old laptops? Let's add pro tricks for speed, big data, and fancy structures – like upgrading from cycle to bike for city traffic.
Big Data Tricks – Handle Giant Files
Million rows crash notebooks? Pandas reads smart, not all at once.
Chunksize in read_csv – Bite-Sized Loads
Like eating big mango one slice at a time. For 10GB sales log:
```python
chunk_list = []
for chunk in pd.read_csv('huge_sales.csv', chunksize=10000):
    # Process each small piece: clean, add a column
    chunk['Total'] = chunk['Price'] * chunk['Qty']
    chunk_list.append(chunk)

big_df = pd.concat(chunk_list, ignore_index=True)
```
Each chunk 10k rows – memory safe! Process: filter errors per chunk.
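To try chunking without a 10GB file, we can fake one with an in-memory buffer – io.StringIO stands in for huge_sales.csv here:

```python
import io
import pandas as pd

# io.StringIO stands in for a huge CSV file on disk
csv_text = "Price,Qty\n50,10\n30,20\n40,15\n80,5\n"

chunk_list = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    chunk['Total'] = chunk['Price'] * chunk['Qty']  # process each bite
    chunk_list.append(chunk)

big_df = pd.concat(chunk_list, ignore_index=True)
print(big_df['Total'].sum())  # 2100
```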
When to Chunk Table:
| File Size | Chunksize Tip | Example Scenario |
|---|---|---|
| <100MB | No need | Daily shop CSV |
| 100MB-1GB | 50,000 | Monthly e-commerce |
| >1GB | 10,000-100k | Yearly bank statements |
| Streaming | Process in loop, no save | Live website logs |
Life Example: Gov scheme applicant list (lakhs rows) – chunk to find duplicates without crash.
Sample Large Files – Quick Taste
Don't load all – peek 10%.
```python
df_sample = pd.read_csv('huge_file.csv', nrows=1000)  # First 1k rows
df_random = pd.read_csv('huge_file.csv', nrows=10000).sample(frac=0.1)  # 10% of those, random
print(df_sample.shape)
```
Fast preview: averages, issues. Like tasting sabzi before full plate.
Sampling Types:
- nrows=5000: Top rows only.
- skiprows=range(1, 10000): Skip the first 10k rows (starting at 1 keeps the header).
- sample(n=1000): Random pick after loading.
For YouTube analytics dump – sample to spot viral patterns quick.
Speed Hacks – Run Like Flash
Loops slow like walking in Lucknow heat. Pandas loves vector ops.
Vector Operations – No Loops Needed
Bad: Loop over rows.
```python
# Slow loop – avoid this
for i in range(len(df)):
    df.loc[i, 'Discount'] = df.loc[i, 'Price'] * 0.1
```
Fast: Whole column!
```python
df['Discount'] = df['Price'] * 0.1  # Vector – often 100x faster!
df['Tax'] = df['Total'] * 0.18
```
Math on arrays – lightning!
Speed Comparison Table:
| Method | Time for 1M Rows | Code Style | Use For |
|---|---|---|---|
| Loop (for) | 30 seconds | Row by row | Never! Avoid |
| Vector (*) | 0.1 seconds | df['New'] = df.A * df.B | Math, filters |
| apply() | 2 seconds | df['New'].apply(func) | Simple functions |
| Vectorized str | 0.5s | df['Name'].str.upper() | Text ops |
Example: 10k student records – vector ages>18 in blink.
Apply vs Vector – Choose Wise
Apply runs function per row – ok for complex.
```python
# apply example
def cat(price):
    if price > 50:
        return 'Premium'
    return 'Regular'

df['Category'] = df['Price'].apply(cat)
```
But vector is better: df['Category'] = np.where(df['Price'] > 50, 'Premium', 'Regular') – faster!
Pro Rule: Vector first, apply last resort.
Freelance invoices: Vector discount calc saves hours monthly.
MultiIndex – Group Multiple Levels
Like nested folders: Sales > City > Month > Fruit.
From pivot_data:
```python
multi = df_pivot.set_index(['City', 'Month'])
print(multi.index)  # MultiIndex
```
Access: multi.loc['Delhi'] – all Delhi rows.
Groupby multi-level:
```python
grouped = df_pivot.groupby(['City', 'Fruit'])['Sales'].sum()
print(grouped)
```
```
City    Fruit
Delhi   Apple     500
        Banana    600
        Mango     300
Mumbai  Apple     850
        Banana    700
Name: Sales, dtype: int64
```
Unstack to table: grouped.unstack().
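Here's the groupby-plus-unstack round trip on the same pivot_data (rebuilt so the snippet runs alone):

```python
import pandas as pd

df_pivot = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Apple', 'Banana', 'Mango', 'Apple'],
    'City': ['Delhi', 'Delhi', 'Mumbai', 'Mumbai', 'Delhi', 'Mumbai'],
    'Sales': [500, 600, 400, 700, 300, 450]
})

grouped = df_pivot.groupby(['City', 'Fruit'])['Sales'].sum()
wide = grouped.unstack()  # Fruit values become columns, one row per city
print(wide)  # Mumbai sold no Mango, so that cell is NaN
```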
MultiIndex Uses Table:
| Structure | Code | Benefit |
|---|---|---|
| Set Index Multi | set_index(['A', 'B']) | Easy slice: loc[('Delhi', 'Jan')] |
| Groupby List | groupby(['City', 'Month']) | Nested sums/avgs |
| Pivot to Multi | pivot_table(..., index=['City', 'Fruit']) | Spreadsheet feel |
| Swap Levels | swaplevel(0, 1) | Flip order |
Example: Blog posts by Topic > Year > Month – top performer drill-down.
School: Marks by Class > Subject > Student.
Custom Functions – Your Own Tools
Lambda in Apply – Quick Math
Short functions:
```python
df['Profit'] = df['Sales'].apply(lambda x: x * 0.2)
```
Or complex:
```python
df['Grade'] = df['Score'].apply(lambda s: 'A' if s >= 90 else 'B' if s >= 80 else 'C')
```
Inline power!
Complex Cleaning – Define Once, Use Many
```python
def clean_name(name):
    return name.strip().title().replace('!!', '')

df['Clean_Name'] = df['Full_Name'].apply(clean_name)
```
Reusable: Phone validate, address standardize.
Custom Func Table:
| Task | Lambda Example | When to Write a Full Function |
|---|---|---|
| Simple Calc | lambda x: x * 1.1 (10% hike) | Lambda is enough |
| If-Else | lambda p: 'High' if p > 100 else 'Low' | 3+ conditions |
| Text Parse | lambda t: t.split()[0] | Regex needed |
| Date Custom | lambda d: d.weekday() == 4 (Friday) | Business logic |
Content Creator: Lambda video length to category: Shorts <60s.
Export Pro – Share Like Boss
Multiple Sheets Excel
One file, many tables.
```python
with pd.ExcelWriter('report.xlsx') as writer:
    df_clean.to_excel(writer, sheet_name='Sales', index=False)
    pivot_table.to_excel(writer, sheet_name='Pivot')
    students.to_excel(writer, sheet_name='Students')
```
Boss opens: Tabs for all!
JSON and SQL – Modern Saves
JSON for apps:
```python
df_clean.to_json('data.json', orient='records')  # List of dicts
```
SQL database:
```python
from sqlalchemy import create_engine

engine = create_engine('sqlite:///shop.db')
df_clean.to_sql('sales', engine, if_exists='replace')
```
Query later: pd.read_sql('SELECT * FROM sales', engine)
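If SQLAlchemy isn't installed, the standard-library sqlite3 driver also works for SQLite – a minimal sketch with an in-memory database, so nothing touches disk:

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Banana'], 'Total': [500, 600]})

# In-memory SQLite via the standard library – no SQLAlchemy needed
conn = sqlite3.connect(':memory:')
df.to_sql('sales', conn, index=False, if_exists='replace')

back = pd.read_sql('SELECT * FROM sales WHERE Total > 550', conn)
print(back)  # only the Banana row
conn.close()
```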
Export Options Table:
| Format | Code | Best For | Size/Features |
|---|---|---|---|
| Excel Multi | ExcelWriter() | Reports, bosses | Colors, multiple sheets |
| JSON | to_json(orient='records') | APIs, web apps | Human read, compact |
| SQL | to_sql() | Databases, reuse | Queryable, big data |
| Parquet | to_parquet() (needs pyarrow) | Fast load, big files | 1/10 size of CSV |
| HTML | to_html() | Blogs, emails | Styled tables |
Pro Tip: pd.options.display.max_columns = None before export – previews show all columns.
For our blogs: Export styled to HTML, embed in posts.
Gov data: SQL for ongoing queries like "Ration card updates".
Full Pro Workflow:
- Chunk big loads.
- Vector clean.
- Multi-group.
- Custom apply.
- Plot + style.
- Multi-export.
Like BCA project: Shop dashboard from raw CSV to Excel + SQL.
These hacks make Pandas your daily superpower – fast, big, flexible.
We've covered from basics to pro, ready for your next data adventure.
Pandas Mastery – Your Toolkit Ready
We've journeyed from simple tables to pro charts, big data, and speedy exports. Now, quick answers to common hurdles.
FAQ – Fast Fixes
- Slow on big files? Use chunksize=10000 in read_csv.
- Memory crash? Sample with nrows=1000 first.
- No plots show? Add plt.show() or %matplotlib inline in Colab.
- Wrong types? astype(float) or pd.to_datetime().
- Export with style? df.style.to_excel() for the basics.
In summary, Pandas turns messy data into clear stories – for shops, schools, blogs, or budgets. Start small: load, clean, plot. Practice on your files daily.
Takeaway: Copy our fruit shop code, tweak for your world. Share your first chart in comments – we're here to cheer!
Happy data crunching, friends!