What Are Pandas in Python? Easy Step-by-Step Guide for Beginners 2026



What are Pandas? Let's Start with the Basics

Imagine you have a big pile of shopping receipts from your family's shop. Each receipt shows the names of items, prices, and how many you sold. Now, you want to quickly add up total sales or find the most popular item. Doing this by hand takes hours. That's where Pandas comes in – it's like a smart helper that organizes this messy paper into neat tables on your computer, so we can play with the numbers easily.


Pandas is a free tool for Python, the simple programming language many of us use. It helps us handle data in tables, just like a notebook where rows are your records and columns are details like name, age, or price. We call these tables DataFrames. Think of a DataFrame as an Excel sheet but super fast and powerful inside Python.

Inside Pandas, there are two main stars: Series and DataFrame.

  • A Series is like one single column from your table – say, just the list of prices. It's the smallest building block. 
  • A DataFrame is a bunch of Series put together side by side, making a full table with rows and columns.

Let's see this in action with a simple example from daily life. Suppose we run a small fruit shop in our neighborhood. We have lists of fruits, their prices, and quantities sold today. We'll turn these into a Pandas table right now.

First, we need Python ready. If you're new, no worry – just open a notebook like Jupyter or Google Colab (we'll talk more about setup soon). 

Here's how we make our first Series and DataFrame:

import pandas as pd  # We bring Pandas in with this line, like inviting a friend

# Simple lists from our fruit shop
fruits = ['Apple', 'Banana', 'Orange', 'Mango']
prices = [50, 30, 40, 80]    # in rupees per kg
quantity = [10, 20, 15, 5]

# Make Series – one column each
fruit_series = pd.Series(fruits)
price_series = pd.Series(prices)
qty_series = pd.Series(quantity)

print("Fruit names as Series:")
print(fruit_series)
print("\nPrices as Series:")
print(price_series)

When we run this, Pandas shows:

Fruit names as Series:
0     Apple
1    Banana
2    Orange
3     Mango
dtype: object

Prices as Series:
0    50
1    30
2    40
3    80
dtype: int64

See? Each Series has numbers on the left (called index, like row numbers starting from 0) and our data on the right. It's simple, like labeling shelves in your shop.
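The index doesn't have to be 0, 1, 2 either – we can label rows ourselves, like naming the shelves. Here's a small sketch using the fruit names as labels:

```python
import pandas as pd

# Same prices, but labeled by fruit name instead of 0..3
prices = pd.Series([50, 30, 40, 80],
                   index=['Apple', 'Banana', 'Orange', 'Mango'])

# Now we look up by label, not position
print(prices['Mango'])
```

Ask "prices['Mango']" and Pandas answers 80 – like asking the shopkeeper directly instead of counting shelves.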

Now, the fun part – let's glue these into a DataFrame, our full table:

# Make a DataFrame from lists – easy way
data = {
    'Fruit': fruits,
    'Price': prices,
    'Quantity': quantity
}
df = pd.DataFrame(data)
print("Our first fruit shop table:")
print(df)

Output looks like this neat table:

Fruit Price Quantity
0 Apple 50 10
1 Banana 30 20
2 Orange 40 15
3 Mango 80 5

Wow! Just a few lines, and we have a proper table. No more scribbling on paper. We can print it anytime with print(df), and it shows clearly. This DataFrame remembers everything – we can add more rows later, like tomorrow's sales.

Why is this useful in real life? Remember last Diwali when we tracked festival sales? Instead of Excel crashes with big files, Pandas handles thousands of rows without sweat. For example, if you're a teacher marking student scores, turn names and marks into a DataFrame. One command shows average marks: df['Marks'].mean() – instant result!
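That teacher example fits in four lines – a tiny sketch with made-up names and marks:

```python
import pandas as pd

# Hypothetical class register: three students, their test marks
marks_df = pd.DataFrame({
    'Name': ['Amit', 'Priya', 'Ravi'],
    'Marks': [85, 92, 78],
})
average = marks_df['Marks'].mean()  # one command, class average
print("Class average:", average)
```

One `.mean()` and the average pops out – no calculator, no formula dragging like in Excel.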

Let's try another everyday example. Suppose we track our weekly expenses for the home budget. Lists: items like 'Rice', 'Milk', 'Petrol'; costs: 500, 50, 2000; dates.

expenses_data = {
    'Item': ['Rice', 'Milk', 'Petrol', 'Veggies'],
    'Cost': [500, 50, 2000, 300],
    'Date': ['2026-01-10', '2026-01-12', '2026-01-15', '2026-01-16']
}
expense_df = pd.DataFrame(expenses_data)
print("Home expenses table:")
print(expense_df)

This prints:

Item Cost Date
0 Rice 500 2026-01-10
1 Milk 50 2026-01-12
2 Petrol 2000 2026-01-15
3 Veggies 300 2026-01-16

Perfect for spotting where money goes fast – Petrol is the big eater! Pandas adds smart labels automatically.

One cool thing: DataFrames can mix types. Names as text, numbers as math-ready values. If we do total_sales = df['Price'] * df['Quantity'], it gives 500, 600, 600, 400 – one multiplied value per row. Boom, sales figures!
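Let's check that multiplication ourselves – a self-contained sketch rebuilding the fruit table:

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Orange', 'Mango'],
    'Price': [50, 30, 40, 80],
    'Quantity': [10, 20, 15, 5],
})
# Column times column: Pandas multiplies row by row, no loop needed
total_sales = df['Price'] * df['Quantity']
print(total_sales.tolist())  # [500, 600, 600, 400]
```

Each fruit's price meets its own quantity automatically – that's the magic of column math.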

But wait, what if our lists are uneven? Pandas refuses to guess – it raises an error so we catch the mistake before wrong totals sneak in. For shop owners like us in Lucknow markets, this means quick checks: "How many apples left?" – just slice the table.
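We can see the uneven-list case for ourselves. A quick sketch – in recent Pandas versions the DataFrame constructor raises a ValueError rather than filling gaps silently:

```python
import pandas as pd

# Two fruits but only one price – lists don't match up
try:
    df_bad = pd.DataFrame({'Fruit': ['Apple', 'Banana'], 'Price': [50]})
    uneven_ok = True
except ValueError as e:
    uneven_ok = False  # Pandas stopped us before wrong numbers spread
    print("Pandas stops us:", e)
```

Better a loud complaint now than a silent wrong bill later.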

To make it even clearer, here's a comparison table showing Series vs DataFrame, like choosing between a single notebook page or a full ledger book:

Feature       | Series (One Column)                     | DataFrame (Full Table)
What it holds | One list, like prices                   | Rows and columns: fruits + prices + qty
Daily Example | Your phone's contact list (names only)  | Full phonebook with names, numbers, emails
Size          | Small, fast for one thing               | Big, handles shops or school records
Print Look    | Vertical list with index                | Grid like Excel sheet
Use When      | Quick math on one list                  | Compare across items, like sales report

This table helps us see why DataFrame is the hero – it connects everything.

We can even make a DataFrame from a plain list of lists, like reading from a notebook:

simple_list = [ ['Apple', 50, 10], ['Banana', 30, 20], ['Orange', 40, 15] ] df_from_list = pd.DataFrame(simple_list, columns=['Fruit', 'Price', 'Quantity']) print(df_from_list)

Same neat table! Great for when data comes from forms or apps.

In our fruit shop, printing df daily lets customers see stock. Or for students, track homework scores. Pandas makes data feel like chatting with a friend – ask, and it answers.

Think of a teacher in our BCA class. We have 30 students' names and test scores. One DataFrame, and print(df) shows the class list instantly during roll call.

We've now built our first tables hands-on. It's exciting to see lists turn into something we can touch and change.


Now that we've set up our tables and peeked at the data, real life hits us – data from shops, schools, or forms is often messy. Prices missing here, wrong numbers there, repeats everywhere. Like vegetables from the market: some rotten, some doubled up. We need to clean it fast. Let's fix this dirty data step by step, using our fruit shop as the example.

We'll start with a messy DataFrame. Imagine we got sales data from three sellers, but emails forgot to fill some spots, quantities typed wrong, and one sale listed twice.

import pandas as pd

messy_data = {
    'Fruit': ['Apple', 'Banana', 'Orange', 'Mango', 'Apple', None],
    'Price': [50, 30, '40', 80.0, 50, 60],
    'Quantity': [10, 20, 15, 5, 10, 8],
    'Seller': ['Ram', 'Shyam', 'Ram', 'Geeta', 'Ram', 'Shyam']
}
df_messy = pd.DataFrame(messy_data)
print("Messy shop data:")
print(df_messy)

It looks like:

    Fruit Price  Quantity Seller
0   Apple    50        10    Ram
1  Banana    30        20  Shyam
2  Orange    40        15    Ram
3   Mango  80.0         5  Geeta
4   Apple    50        10    Ram
5    None    60         8  Shyam

See the problems? Empty fruit (None), Price as text '40', duplicate Apple row.
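Eyes work for six rows, but code works for six lakh. A small sketch that counts the problems for us:

```python
import pandas as pd

df_messy = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Orange', 'Mango', 'Apple', None],
    'Price': [50, 30, '40', 80.0, 50, 60],
    'Quantity': [10, 20, 15, 5, 10, 8],
    'Seller': ['Ram', 'Shyam', 'Ram', 'Geeta', 'Ram', 'Shyam'],
})
missing = df_messy.isna().sum()        # gaps per column
repeats = df_messy.duplicated().sum()  # full-row repeats
print(missing)
print("Duplicate rows:", repeats)
```

One gap in Fruit, one duplicate row – Pandas spots them instantly, even in huge files.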

Handling Missing Values – No More Gaps

Missing data is common, like a seller forgetting to note quantity. Pandas shows them as NaN (Not a Number).

  • Drop them with dropna(): Remove whole rows with gaps.

df_no_missing = df_messy.dropna()
print("After dropping missing:")
print(df_no_missing)

This kills row 5. Good if gaps are few, but we lose sales info!

  • Fill them smartly with fillna(): Put average or zero instead.

Like filling missing weight on a parcel with shop average.

# Fill missing fruit with 'Unknown'
df_messy['Fruit'] = df_messy['Fruit'].fillna('Unknown')
# Fill any future gaps in Quantity with 0
df_messy['Quantity'] = df_messy['Quantity'].fillna(0)
print("After filling:")
print(df_messy)

Now no NaNs. For prices, we might fill with average: df_messy['Price'].fillna(df_messy['Price'].mean()).

Daily Example: In our home budget tracker, if petrol cost is missing one day, fill with last week's average – keeps total spend real.

Fix Wrong Data Types – Make Numbers Work


Prices should be numbers for math, but '40' is text. Pandas mixes them, but math fails.

  • Change with astype(): Turn text to number.

df_messy['Price'] = df_messy['Price'].astype(float)  # Now all numbers
print("Fixed Price type:")
print(df_messy['Price'].dtype)  # Shows float64

Test: df_messy['Price'].mean() now works – average 51.67 rupees.

Pro Tip Table for types:

Data Problem      | Fix Command                | Example Output | When to Use
Text as Number    | df['col'].astype(float)    | float64        | Prices, ages for math
Dates as Text     | pd.to_datetime(df['Date']) | datetime64     | Sales by week
Yes/No Text       | df['col'].map({'Yes': 1})  | int64          | Count approvals
Too Many Decimals | df['col'].round(2)         | Still float    | Money: 50.00 not 49.999

Like fixing a bike speedometer – wrong units, no ride!

Remove Duplicates – Clean Repeats

Duplicate Apple row? Spot with eyes, but thousands? No way.

df_clean = df_messy.drop_duplicates()
print("No duplicates:")
print(df_clean)

Pops row 4. Keeps first one. For shops, this avoids double-counting sales.

Life Hack: In student attendance, drop_duplicates() removes kids listed twice by mistake.

Now our df_clean is shiny. Let's move to ordering and picking.

Sort and Filter – Find What We Need Fast

Clean data is great, but unsorted like a messy almirah. We sort and filter to spotlight stars.

Sort with sort_values() – High to Low or A to Z

Want top sellers first? Like arranging students by marks.

# Sort by Price high to low
df_sorted = df_clean.sort_values('Price', ascending=False)
print("Highest price first:")
print(df_sorted)

Output:

     Fruit Price  Quantity Seller
3    Mango  80.0         5  Geeta
5  Unknown  60.0         8  Shyam
0    Apple  50.0        10    Ram
2   Orange  40.0        15    Ram
1   Banana  30.0        20  Shyam

Mango tops! Add by=['Price', 'Quantity'] for multi-sort.

Shop Example: Sort Quantity descending – restock Bananas first (20 sold).

Filter – Pick Matching Rows

Like "Show only adults over 25" or "Fruits above 40 rupees".

# Filter Price > 40
expensive = df_clean[df_clean['Price'] > 40]
print("Expensive fruits:")
print(expensive)

Gives Apple, Mango, Unknown. Use & for AND: df_clean[(df_clean['Price'] > 40) & (df_clean['Seller'] == 'Ram')].

Daily Use: Filter home expenses >500 – spot big spends like petrol.

Filter Quick Guide:

  • Equals: df[df['City']=='Delhi']

  • Greater: df[df['Sales']>1000]

  • In list: df[df['Fruit'].isin(['Apple','Banana'])]

  • Not null: df[df['Price'].notna()]
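Conditions combine too – brackets around each one matter, or Python gets confused. A small self-contained sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Orange', 'Mango'],
    'Price': [50, 30, 40, 80],
    'Seller': ['Ram', 'Shyam', 'Ram', 'Geeta'],
})
# AND filter: Ram's fruits above 40 rupees – note the brackets around each condition
rams_pricey = df[(df['Price'] > 40) & (df['Seller'] == 'Ram')]
print(rams_pricey['Fruit'].tolist())  # ['Apple']
```

Use `&` for AND, `|` for OR – and always wrap each condition in its own brackets.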

Add New Columns – Create Magic Numbers

Now, calculate totals without calculator. Like adding profit column.

df_clean['Total Sales'] = df_clean['Price'] * df_clean['Quantity']
print("With Total Sales:")
print(df_clean)

New table:

Fruit Price Quantity Seller Total Sales
Apple 50.0 10 Ram 500.0
Banana 30.0 20 Shyam 600.0
... ... ... ... ...

Banana and Orange tie at the top with 600 each! Other math: df['Discount'] = df['Price'] * 0.1 for 10% off.

Examples:

  • Age group: df['Adult'] = df['Age'] > 18 (True/False)

  • Category: df['Type'] = np.where(df['Price'] < 50, 'Cheap', 'Premium') (needs import numpy as np)

For budgets: df['Daily Avg'] = df['Total']/7.

Group and Count – Team Up Data

Group by Seller – total sales per person?

sales_by_seller = df_clean.groupby('Seller')['Total Sales'].sum()
print("Sales per seller:")
print(sales_by_seller)

With our data: Ram 500 + 600 = 1100 (Apple plus Orange), Shyam 600 + 480 = 1080, Geeta 400. Ram leads, Shyam close behind.

Average: .mean(). Count: .count().

Multi-group Table:

Group By   | Command                          | Example Output Meaning
One Column | groupby('City')['Sales'].sum()   | Total sales per city
Average    | groupby('Month')['Price'].mean() | Avg price each month
Count      | groupby('Seller').size()         | How many sales per seller
Max        | groupby('Fruit')['Qty'].max()    | Peak quantity per fruit

Like school: groupby('Class')['Marks'].mean() – best class average.

Life Example: Group expenses by 'Item' – Rice eats most budget?
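That expenses idea in runnable form – a small sketch with made-up costs, using .agg() to get sum, average, and count in one go:

```python
import pandas as pd

expenses = pd.DataFrame({
    'Item': ['Rice', 'Milk', 'Petrol', 'Milk', 'Petrol'],
    'Cost': [500, 50, 2000, 60, 1800],
})
# One groupby, three answers per item
summary = expenses.groupby('Item')['Cost'].agg(['sum', 'mean', 'count'])
print(summary)
```

Petrol's sum towers over the rest – the budget eater revealed in one line.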

Save Clean Data – Keep It Forever

Work done? Save to file, share with team.

df_clean.to_csv('clean_fruit_sales.csv', index=False)   # No row numbers
df_clean.to_excel('clean_fruit_sales.xlsx', index=False)

Open in Excel anytime. Add index=False for clean look.

Save Options:

  • CSV: Small, works everywhere (shops share via WhatsApp)

  • Excel: Colors, formulas (boss reports)

  • JSON: For apps (online store)

Our shop now has 'clean_fruit_sales.csv' ready for tomorrow.

We've turned mess into gold – sorted, filtered, grouped, saved. Perfect for daily hustle.

Next, we'll combine tables like joining student lists with grades for full stories.

Building on our clean, saved tables, sometimes one group isn't enough – we need cross looks, like sales by city and month. Or join separate sheets. Plus, dates and text need tweaks, and trends over time. Let's level up with these tools, using school and sales examples from our daily world.

Multiple Groups with Pivot Table – Cross Views Easy

Pivot tables are like magic spreadsheets that twist data two ways. No Excel needed – Pandas does it in one line. Perfect for reports: "Show Banana sales by seller and month?"

First, let's make sample sales data for a chain of fruit shops.

import pandas as pd

pivot_data = {
    'Fruit': ['Apple', 'Banana', 'Apple', 'Banana', 'Mango', 'Apple'],
    'City': ['Delhi', 'Delhi', 'Mumbai', 'Mumbai', 'Delhi', 'Mumbai'],
    'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Feb'],
    'Sales': [500, 600, 400, 700, 300, 450]
}
df_pivot = pd.DataFrame(pivot_data)
print("Raw sales:")
print(df_pivot)

Now, pivot for Sales by City AND Month:

pivot_table = df_pivot.pivot_table(values='Sales', index='City',
                                   columns='Month', aggfunc='sum')
print("Pivot: Sales by City & Month:")
print(pivot_table)

Output table:

Month    Feb   Jan
City
Delhi    300  1100
Mumbai  1150   400

(Month names sort alphabetically, so Feb comes before Jan.)

Delhi rocks Jan, Mumbai Feb! Like a shop owner checking which city booms when.

Pivot Power Tips:

  • Multiple Values: aggfunc=['sum', 'mean'] – totals and averages.

  • Rows & Cols Swap: index='Month', columns='City'.

  • Fill Empty: fill_value=0 for zero gaps.
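Here's fill_value in action – a sketch pivoting the same sales by Fruit and City, with zeros where a fruit never sold:

```python
import pandas as pd

df_pivot = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Apple', 'Banana', 'Mango', 'Apple'],
    'City': ['Delhi', 'Delhi', 'Mumbai', 'Mumbai', 'Delhi', 'Mumbai'],
    'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Feb'],
    'Sales': [500, 600, 400, 700, 300, 450],
})
# Mango never sold in Mumbai – fill_value=0 puts a 0 instead of NaN
by_fruit = df_pivot.pivot_table(values='Sales', index='Fruit',
                                columns='City', aggfunc='sum', fill_value=0)
print(by_fruit)
```

Apple sums to 850 in Mumbai (400 + 450), and Mango shows a clean 0 there – no NaN holes in the report.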

Daily Pivot Examples Table:

Scenario Pivot Code What It Shows
Shop Sales by Fruit/City index='Fruit', columns='City' Apples best in Delhi?
Student Marks by Subject/Class index='Class', columns='Subject' Math avg in Class 10
Expenses by Category/Month index='Month', columns='Category' Food spend jumps in festivals
Website Visits by Page/Day index='Day', columns='Page' Home page peaks weekends

For freelancers like us, pivot invoices by client and month – spot slow payers.

Join Tables with Merge – Combine Worlds

Got students list separate from marks? Merge glues them like stapling sheets.

Two tables:

students = pd.DataFrame({
    'Student_ID': [1, 2, 3, 4],
    'Name': ['Amit', 'Priya', 'Ravi', 'Seema'],
    'Age': [20, 21, 19, 22]
})
marks = pd.DataFrame({
    'Student_ID': [1, 2, 3, 5],
    'Subject': ['Math', 'Math', 'Math', 'Math'],
    'Score': [85, 92, 78, 88]
})
print("Students:")
print(students)
print("\nMarks:")
print(marks)

Join on Student_ID:

full_data = pd.merge(students, marks, on='Student_ID', how='inner')  # Only matching
print("Merged students + marks:")
print(full_data)

Result:

Student_ID Name Age Subject Score
1 Amit 20 Math 85
2 Priya 21 Math 92
3 Ravi 19 Math 78

Seema and ID5 missing – inner join skips non-matches.

Merge Types Table:

Type  | Code: how='...' | Keeps What                      | Example Use
inner | 'inner'         | Only matches both sides         | Common students with marks
left  | 'left'          | All from left, match from right | All students, even no marks
right | 'right'         | All from right, match from left | All marks, even unknown kids
outer | 'outer'         | Everything, NaN for no match    | Full audit

Life Example: Merge customer orders with delivery status – track delays.
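And here's how='left' in action – a sketch where Seema stays in the table even with no marks yet:

```python
import pandas as pd

students = pd.DataFrame({
    'Student_ID': [1, 2, 3, 4],
    'Name': ['Amit', 'Priya', 'Ravi', 'Seema'],
})
marks = pd.DataFrame({
    'Student_ID': [1, 2, 3, 5],
    'Score': [85, 92, 78, 88],
})
# Left join keeps every student; Seema gets NaN where no marks exist
all_students = pd.merge(students, marks, on='Student_ID', how='left')
print(all_students)
```

Four rows come back, with NaN in Seema's Score – nobody silently dropped from the register.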

Time Data – Make Dates Smart


Dates as text? Can't group by week. Convert first.

time_data = {
    'Date': ['2026-01-01', '2026-01-03', '2026-01-08', '2026-01-15'],
    'Sales': [100, 150, 120, 200]
}
df_time = pd.DataFrame(time_data)
df_time['Date'] = pd.to_datetime(df_time['Date'])  # Fix to date type
print("Dates fixed:")
print(df_time['Date'].dtype)  # datetime64

Now, sales by week:

df_time['Week'] = df_time['Date'].dt.isocalendar().week
weekly_sales = df_time.groupby('Week')['Sales'].sum()
print("Sales by week:")
print(weekly_sales)

Or month: df_time['Month'] = df_time['Date'].dt.month_name().

Date Tricks:

  • Day name: .dt.day_name()

  • Year: .dt.year

  • Resample monthly: df_time.set_index('Date')['Sales'].resample('M').sum()
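Resampling in full – a self-contained sketch rolling daily sales up into weekly totals (weekly 'W' here; month-end resampling works the same way):

```python
import pandas as pd

df_time = pd.DataFrame({
    'Date': pd.to_datetime(['2026-01-01', '2026-01-03',
                            '2026-01-08', '2026-01-15']),
    'Sales': [100, 150, 120, 200],
})
# Set the date as index, then let resample build weekly buckets
weekly = df_time.set_index('Date')['Sales'].resample('W').sum()
print(weekly)
```

The first two sales land in one week (100 + 150 = 250) and each later sale gets its own week – irregular dates become a tidy time series.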

Shop Calendar Table:

Goal             | Code                                                 | Output Example
Sales by Month   | df['Month'] = df['Date'].dt.month                    | 1 for Jan
Weekday Trends   | df['Day'] = df['Date'].dt.weekday                    | 0=Monday sales low?
Days Since Start | df['Days'] = (df['Date'] - df['Date'].min()).dt.days | Trend over time
Quarterly Total  | resample('Q').sum()                                  | Jan-Mar total

Like tracking YouTube views by upload date – peaks on weekends?

Want 3-day rolling average sales, but keep all days? Windows slide over data.

df_time = df_time.sort_values('Date')
df_time['Rolling_Avg_3'] = df_time['Sales'].rolling(window=3).mean()
print("With rolling average:")
print(df_time)

Output:
Date Sales Rolling_Avg_3
2026-01-01 100 NaN
2026-01-03 150 NaN
2026-01-08 120 123.33
2026-01-15 200 156.67

Smooths ups/downs – sales steady?

Rank within groups:

df_pivot['Rank_City'] = df_pivot.groupby('City')['Sales'].rank(ascending=False)
print("Rank per city:")
print(df_pivot)

Window Examples:

  • Cumulative: .cumsum() – running total sales.

  • Shift: .shift(1) – yesterday's sales.

  • Percent rank: .rank(pct=True)

Freelancer Use: Rolling avg earnings over 7 days – steady income?
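Cumulative sums and shifts in one small sketch – running total plus day-over-day change:

```python
import pandas as pd

sales = pd.Series([100, 150, 120, 200])
running_total = sales.cumsum()  # running total: 100, 250, 370, 570
yesterday = sales.shift(1)      # each value pushed down one row
change = sales - yesterday      # today minus yesterday
print(running_total.tolist())
print(change.tolist())
```

The first change is NaN (no yesterday on day one), then +50, -30, +80 – ups and downs at a glance.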

Text Cleaning – Tidy Names and Notes

Names like "Ram Kumar" with spaces? Split or clean.

df_text = pd.DataFrame({
    'Full_Name': ['Ram Kumar ', 'Priya Sharma!!', ' ravi singh'],
    'City': ['Delhi', 'Mumbai', 'Delhi']
})
# Strip spaces, drop stray '!!', fix capitals
df_text['Clean_Name'] = (df_text['Full_Name']
                         .str.strip()
                         .str.replace('!!', '', regex=False)
                         .str.title())
# Split first name out
df_text['First_Name'] = df_text['Full_Name'].str.split().str[0].str.title()
print("Cleaned text:")
print(df_text)

Ram Kumar, Priya Sharma, Ravi Singh. Perfect.

Patterns: df['Phone'] = df['Text'].str.extract(r'(\d{10})') grabs 10-digit numbers.

Text Tools Table:

Task             | Code Example               | Before/After
Remove Spaces    | .str.strip()               | " ram " → "ram"
Upper/Lower      | .str.upper()               | "ram" → "RAM"
Split Words      | .str.split(expand=True)    | "A B" → cols A, B
Replace          | .str.replace('old', 'new') | "bad" → "good"
Contains Pattern | .str.contains('apple')     | True/False flag
Length           | .str.len()                 | "abc" → 3

Example: Clean customer feedback – count "good" mentions.

Like fixing addresses for delivery – no more lost parcels.
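Counting "good" mentions in feedback fits in two lines – a sketch with made-up comments, lowercasing first so GOOD and good both count:

```python
import pandas as pd

feedback = pd.Series(['Good fruit!', 'price too high', 'good service',
                      'GOOD and fresh', 'will come again'])
# Lowercase first, then check the pattern; True counts as 1 when summed
good_count = feedback.str.lower().str.contains('good').sum()
print("Good mentions:", good_count)
```

Three happy customers out of five – a feedback report without reading every line by hand.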

These advanced moves turn raw info into insights, ready for visuals next.

Up next, we'll plot these tables into charts and style them pretty.

With our data cleaned, grouped, and analyzed, it's time to show it off – not just numbers, but pictures everyone understands. Like turning shop ledger into colorful charts for family or boss. Pandas has built-in plots, plus ways to make tables look pro. We'll use our fruit sales and school examples to draw them step by step.

Pandas Built-in Plots – Charts in Seconds


Pandas plots with Matplotlib under the hood – no extra setup. Just df.plot() and boom! Great for quick checks, like "Does sales rise on weekends?"

First, recall our df_time with dates and sales:

Date Sales Rolling_Avg_3
2026-01-01 100 NaN
2026-01-03 150 NaN
2026-01-08 120 123.33
2026-01-15 200 156.67

Line Charts – Trends Over Time

Perfect for sales growth, like watching YouTube subscribers climb.

import matplotlib.pyplot as plt  # Helper for titles

df_time.set_index('Date')['Sales'].plot(kind='line', title='Daily Fruit Sales')
plt.show()

This draws a line jumping 100→150→120→200. See the spike? Restock day!

Add rolling avg on same chart:

df_time.set_index('Date')[['Sales', 'Rolling_Avg_3']].plot(kind='line')
plt.title('Sales with 3-Day Smooth')
plt.ylabel('Rupees')
plt.show()

Smooth blue line under wiggly sales – trends clear!

Line Chart Tips:

  • Multiple lines: Pass list of columns.

  • Zoom: xlim=('2026-01-01', '2026-01-15')

  • Markers: marker='o' for dots.

Daily Life Lines:

  • Home power bill over months – spot summer AC jump.

  • Weight tracker – steady loss? Good!

  • Freelance earnings weekly – ups after new client.

Bar Charts – Compare Categories

Who sold most? Bars shine for groups.

From df_pivot sales by city/month:

pivot_table.plot(kind='bar', title='Sales by City')
plt.ylabel('Total Sales')
plt.show()

Delhi tall bar, Mumbai shorter – easy compare!

Horizontal: kind='barh' for long names.

Grouped bars: Use pivot with fruits.

# From earlier df_clean
df_clean.plot(x='Fruit', y='Total Sales', kind='bar', title='Sales per Fruit')
plt.show()

Banana and Orange share the tallest bars – stock more!

Bar Chart Guide Table:

Chart Type  | Code: kind='...' | Best For                     | Example Output Insight
bar         | 'bar'            | Compare groups side-by-side  | Fruits: Banana leads
barh        | 'barh'           | Long labels (cities, names)  | Sellers ranked
bar stacked | stacked=True     | Parts to whole (sales/fruit) | Total = Apple + Banana
bar grouped | Use pivot first  | Multi-category (city/fruit)  | Delhi Apples vs Mumbai

School Example: df_students.plot(x='Name', y='Score', kind='bar') – Priya tops class visually.

Histograms and Pie – Distributions and Shares

Histogram: How many sales buckets? Like age groups in class.

sales_big = pd.DataFrame({'Sales': [100, 150, 200, 300, 120, 500, 80, 250, 400, 90]})
sales_big['Sales'].plot(kind='hist', bins=5, title='Sales Distribution')
plt.show()

Shows most sales 80-200, few high – normal shop day.

Pie for shares:

df_clean.groupby('Seller')['Total Sales'].sum().plot(kind='pie', autopct='%1.1f%%')
plt.title('Seller Share')
plt.show()

Ram about 43%, Shyam close behind, Geeta the rest – bonus time?

Other Plots Table:

Plot Type | Code Example                        | Use Case                        | Pro Tip
hist      | df['Col'].plot(kind='hist')         | Spread of numbers (prices)      | bins=10 for more detail
pie       | groupby().plot(kind='pie')          | % shares (expenses)             | autopct for labels
scatter   | df.plot.scatter(x='Price', y='Qty') | Relation (high price, low qty?) | Spot outliers
box       | df['Sales'].plot(kind='box')        | Outliers in data                | Whiskers show range
area      | kind='area'                         | Stacked trends over time        | Cumulative sales

YouTube Creator Example: Histogram of video views – most under 1k, virals above 10k.

For shop, df_clean.plot.scatter('Price', 'Quantity') – cheap fruits sell more?

Customize all:

df_clean.plot(kind='bar', color=['red', 'green', 'blue'], figsize=(10, 6))
plt.title('Fruit Sales Colors')
plt.xlabel('Fruits')
plt.ylabel('Total')
plt.legend()
plt.show()

Bigger, colorful – report ready!

Full Plot Workflow:

  1. Clean data first.

  2. Group if needed.

  3. df.plot() – tweak title, labels.

  4. plt.savefig('chart.png') – save image for blog/YouTube.

Like our Pratap Solution blog – charts boost reader stay time 2x!

Style Tables – Make Them Pop

Plots great, but tables in notebooks/reports? Make them shine with colors.

Pandas style highlights like Excel conditional formatting.

From df_clean:

def highlight_max(s):
    is_max = s == s.max()
    return ['background-color: yellow' if v else '' for v in is_max]

styled = (df_clean.style
          .apply(highlight_max, subset=['Total Sales'])
          .format({'Total Sales': '{:.0f}'}))
styled  # In Jupyter, shows pretty table

Yellow background on top sales row – eyes go there!

Color Conditions – Rules Like Traffic Lights

def color_sales(val):
    color = 'green' if val > 500 else 'orange' if val > 300 else 'red'
    return f'color: {color}'

# Styling lives on the DataFrame, not a single column – use subset instead
df_clean.style.map(color_sales, subset=['Total Sales']).format({'Total Sales': '{:.0f}'})

Green for stars, red for low – quick scan!

Style Recipes Table:

Goal                | Code Snippet                           | Effect
Highlight Max       | style.highlight_max()                  | Bold/yellow top value
Min Lowlight        | style.highlight_min(props='color:red') | Red for worst
Bars in Cells       | style.bar(subset=['Sales'])            | Mini bar chart in table
Percent Format      | style.format({'Pct': '{:.1%}'})        | 25.0% nice
Precision           | style.format(precision=0)              | No decimals
Background Gradient | style.background_gradient()            | Color fade high to low

Combine:

styled_full = (df_clean.style
               .background_gradient(subset='Total Sales')
               .highlight_max(subset='Quantity')
               .format({'Price': '₹{:.0f}', 'Total Sales': '{:,.0f}'})
               .set_caption('Styled Fruit Report'))
styled_full

Gradient green-red on sales, yellow max qty, rupee signs – boss impressed!

Export Styled:

styled_full.to_html('report.html')   # Web page
styled_full.to_excel('styled.xlsx')  # Excel keeps some style

Daily Examples:

  • Budget Table: Red overspend, green savings.

  • Class Marks: Green >80, amber 60-80, red below.

  • Shop Stock: Bold low stock items.

  • Blog Analytics: Gradient on top posts views.

For our content creation, style top viral videos table – thumbnails next to colored rows.

Advanced Styling:

  • Icons? Custom functions with HTML.

  • Themes: style.set_table_styles([{'selector': 'th', 'props': [('font-weight', 'bold')]}])

  • Conditional text: Hide negatives with a CSS rule, e.g. return 'display: none' when val < 0.

In Google Colab, styles shine for sharing links.

Plot + Style Combo Workflow:

  1. df.head().style – quick pretty peek.

  2. Plot trends.

  3. Style summary table.

  4. Save both for presentation.

Imagine pitching freelance project: "See this chart? Sales up 30%!"

Engagement Boosters:

  • Use figsize=(12,8) for big screens.

  • plt.tight_layout() no overlap.

  • Subplots: fig, axs = plt.subplots(2,2); df.plot(ax=axs[0,0])

For YouTube Shorts, screenshot styled table – hook: "Pandas magic in 60s!"

We've visualized and styled – data now speaks loud and clear.

This wraps our Pandas journey, but real power comes in combining with your projects.

Now our tables dazzle with charts and colors, but what about huge files from big shops or exam results? Or slow code on old laptops? Let's add pro tricks for speed, big data, and fancy structures – like upgrading from cycle to bike for city traffic.

Big Data Tricks – Handle Giant Files


Million rows crash notebooks? Pandas reads smart, not all at once.

Chunksize in read_csv – Bite-Sized Loads

Like eating big mango one slice at a time. For 10GB sales log:

chunk_list = []
for chunk in pd.read_csv('huge_sales.csv', chunksize=10000):
    # Process small piece: clean, add column
    chunk['Total'] = chunk['Price'] * chunk['Qty']
    chunk_list.append(chunk)

big_df = pd.concat(chunk_list, ignore_index=True)

Each chunk 10k rows – memory safe! Process: filter errors per chunk.

When to Chunk Table:

File Size | Chunksize Tip            | Example Scenario
<100MB    | No need                  | Daily shop CSV
100MB-1GB | 50,000                   | Monthly e-commerce
>1GB      | 10,000-100k              | Yearly bank statements
Streaming | Process in loop, no save | Live website logs

Life Example: Gov scheme applicant list (lakhs rows) – chunk to find duplicates without crash.

Sample Large Files – Quick Taste

Don't load all – peek 10%.

df_sample = pd.read_csv('huge_file.csv', nrows=1000)  # First 1k rows
df_random = pd.read_csv('huge_file.csv', nrows=10000).sample(frac=0.1)  # random 10% of first 10k
print(df_sample.shape)

Fast preview: averages, issues. Like tasting sabzi before full plate.

Sampling Types:

  • nrows=5000: Top rows.

  • skiprows=range(1,10000): Skip first 10k.

  • sample(n=1000): Random pick post-load.

For YouTube analytics dump – sample to spot viral patterns quick.
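Here's the sampling idea end to end – a sketch using an in-memory CSV (io.StringIO stands in for a real file path, which is an assumption for the demo):

```python
import io
import pandas as pd

# Simulate a big CSV in memory: 100 rows of sales numbers
csv_text = "Sales\n" + "\n".join(str(i) for i in range(1, 101))

peek = pd.read_csv(io.StringIO(csv_text), nrows=10)  # load only the first 10 rows
random_bit = peek.sample(n=3, random_state=1)        # then 3 random rows from the peek
print(peek.shape, len(random_bit))
```

Swap StringIO for your real filename and the same nrows/sample combo gives a fast taste of any giant file.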

Speed Hacks – Run Like Flash

Loops slow like walking in Lucknow heat. Pandas loves vector ops.

Vector Operations – No Loops Needed

Bad: Loop over rows.

# Slow loop
for i in range(len(df)):
    df.loc[i, 'Discount'] = df.loc[i, 'Price'] * 0.1

Fast: Whole column!

df['Discount'] = df['Price'] * 0.1  # Vector – 100x faster!
df['Tax'] = df['Total'] * 0.18

Math on arrays – lightning!

Speed Comparison Table:

Method         | Time for 1M Rows | Code Style              | Use For
Loop (for)     | 30 seconds       | Row by row              | Never! Avoid
Vector (*)     | 0.1 seconds      | df['New'] = df.A * df.B | Math, filters
apply()        | 2 seconds        | df['New'].apply(func)   | Simple functions
Vectorized str | 0.5 seconds      | df['Name'].str.upper()  | Text ops

Example: 10k student records – vector ages>18 in blink.
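That student check, vectorized – a small sketch with 20 made-up ages:

```python
import pandas as pd

ages = pd.Series(range(10, 30))  # 20 students, ages 10 to 29
# One comparison over the whole column, then True values summed – no loop
adults = (ages > 18).sum()
print("Adults:", adults)
```

The comparison runs over the whole column at once; the same line works unchanged on 10 lakh rows.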

Apply vs Vector – Choose Wise

Apply runs function per row – ok for complex.

# apply example
def cat(price):
    if price > 50:
        return 'Premium'
    return 'Regular'

df['Category'] = df['Price'].apply(cat)

But vector better: df['Category'] = np.where(df['Price'] > 50, 'Premium', 'Regular') (with import numpy as np at the top) – faster!

Pro Rule: Vector first, apply last resort.

Freelance invoices: Vector discount calc saves hours monthly.

MultiIndex – Group Multiple Levels

Like nested folders: Sales > City > Month > Fruit.

From pivot_data:

multi = df_pivot.set_index(['City', 'Month'])
print(multi.index)  # MultiIndex

Access: multi.loc['Delhi'] – all Delhi rows.

Groupby multi-level:

grouped = df_pivot.groupby(['City', 'Fruit'])['Sales'].sum()
print(grouped)

Output:

City    Fruit
Delhi   Apple     500
        Banana    600
        Mango     300
Mumbai  Apple     850
        Banana    700

Unstack to table: grouped.unstack().

MultiIndex Uses Table:

Structure       | Code                                     | Benefit
Set Index Multi | set_index(['A','B'])                     | Easy slice: loc['Delhi','Jan']
Groupby List    | groupby(['City','Month'])                | Nested sums/avgs
Pivot to Multi  | pivot_table(..., index=['City','Fruit']) | Spreadsheet feel
Swap Levels     | swaplevel(0,1)                           | Flip order

Example: Blog posts by Topic > Year > Month – top performer drill-down.

School: Marks by Class > Subject > Student.
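The school case as a runnable sketch – group by two levels, then unstack back into a grid:

```python
import pandas as pd

marks = pd.DataFrame({
    'Class': ['10A', '10A', '10B', '10B'],
    'Subject': ['Math', 'English', 'Math', 'English'],
    'Score': [85, 78, 90, 82],
})
# Two-level group gives a MultiIndex Series: Class > Subject
nested = marks.groupby(['Class', 'Subject'])['Score'].mean()
# unstack() lifts Subject into columns – spreadsheet view again
table = nested.unstack()
print(table)
```

Drill down with nested.loc[('10A', 'Math')] for one cell, or read the unstacked grid like a report card.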

Custom Functions – Your Own Tools

Lambda in Apply – Quick Math

Short functions: df['Profit'] = df['Sales'].apply(lambda x: x * 0.2)

Or complex:

df['Grade'] = df['Score'].apply(lambda s: 'A' if s>=90 else 'B' if s>=80 else 'C')

Inline power!

Complex Cleaning – Define Once, Use Many

def clean_name(name):
    return name.strip().title().replace('!!', '')

df['Clean_Name'] = df['Full_Name'].apply(clean_name)

Reusable: Phone validate, address standardize.

Custom Func Table:

Task        | Lambda Example                       | Use Full Function When
Simple Calc | lambda x: x*1.1 (10% hike)           | Rarely – lambda is fine
If-Else     | lambda p: 'High' if p>100 else 'Low' | 3+ conditions
Text Parse  | lambda t: t.split()[0]               | Regex needed
Date Custom | lambda d: d.weekday() == 4 (Friday)  | Business logic

Content Creator: Lambda video length to category: Shorts <60s.

Export Pro – Share Like Boss

Multiple Sheets Excel

One file, many tables.

with pd.ExcelWriter('report.xlsx') as writer:
    df_clean.to_excel(writer, sheet_name='Sales', index=False)
    pivot_table.to_excel(writer, sheet_name='Pivot')
    students.to_excel(writer, sheet_name='Students')

Boss opens: Tabs for all!

JSON and SQL – Modern Saves

JSON for apps:

df_clean.to_json('data.json', orient='records') # List of dicts

SQL database:

from sqlalchemy import create_engine

engine = create_engine('sqlite:///shop.db')
df_clean.to_sql('sales', engine, if_exists='replace')

Query later: pd.read_sql('SELECT * FROM sales', engine)

Export Options Table:

Format      | Code                         | Best For             | Size/Features
Excel Multi | ExcelWriter()                | Reports, bosses      | Colors, sheets (about 1 million rows per sheet limit)
JSON        | to_json(orient='records')    | APIs, web apps       | Human readable, compact
SQL         | to_sql()                     | Databases, reuse     | Queryable, big data
Parquet     | to_parquet() (needs pyarrow) | Fast load, big files | Often ~1/10 size of CSV
HTML        | to_html()                    | Blogs, emails        | Styled tables

Pro Tip: pd.options.display.max_columns = None shows every column when printing – handy for a final check before export.

For our blogs: Export styled to HTML, embed in posts.

Gov data: SQL for ongoing queries like "Ration card updates".

Full Pro Workflow:

  1. Chunk big load.

  2. Vector clean.

  3. Multi-group.

  4. Custom apply.

  5. Plot + style.

  6. Multi-export.

Like BCA project: Shop dashboard from raw CSV to Excel + SQL.

These hacks make Pandas your daily superpower – fast, big, flexible.

We've covered from basics to pro, ready for your next data adventure.

Pandas Mastery – Your Toolkit Ready


We've journeyed from simple tables to pro charts, big data, and speedy exports. Now, quick answers to common hurdles.

FAQ – Fast Fixes

  • Slow on big files? Use chunksize=10000 in read_csv.

  • Memory crash? Sample with nrows=1000 first.

  • No plots show? Add plt.show() or %matplotlib inline in Colab.

  • Wrong types? astype(float) or pd.to_datetime().

  • Export with style? df.style.to_excel() for basics.

In summary, Pandas turns messy data into clear stories – for shops, schools, blogs, or budgets. Start small: load, clean, plot. Practice on your files daily.

Takeaway: Copy our fruit shop code, tweak for your world. Share your first chart in comments – we're here to cheer!

Happy data crunching, friends!