What Are Pandas in Python? Easy Step-by-Step Guide for Beginners 2026
What are Pandas? Let's Start with the Basics
Imagine you have a big pile of shopping receipts from your family's shop. Each receipt shows the names of items, prices, and how many you sold. Now, you want to quickly add up total sales or find the most popular item. Doing this by hand takes hours. That's where Pandas comes in – it's like a smart helper that organizes this messy paper into neat tables on your computer, so we can play with the numbers easily.
Pandas is a free tool for Python, the simple programming language many of us use. It helps us handle data in tables, just like a notebook where rows are your records and columns are details like name, age, or price. We call these tables DataFrames. Think of a DataFrame as an Excel sheet but super fast and powerful inside Python.
Inside Pandas, there are two main stars: Series and DataFrame.
- A Series is like one single column from your table – say, just the list of prices. It's the smallest building block.
- A DataFrame is a bunch of Series put together side by side, making a full table with rows and columns.
Let's see this in action with a simple example from daily life. Suppose we run a small fruit shop in our neighborhood. We have lists of fruits, their prices, and quantities sold today. We'll turn these into a Pandas table right now.
First, we need Python ready. If you're new, no worry – just open a notebook like Jupyter or Google Colab (we'll talk more about setup soon).
Here's how we make our first Series and DataFrame:
```python
import pandas as pd  # We bring Pandas in with this line, like inviting a friend

# Simple lists from our fruit shop
fruits = ['Apple', 'Banana', 'Orange', 'Mango']
prices = [50, 30, 40, 80]    # in rupees per kg
quantity = [10, 20, 15, 5]

# Make Series – one column each
fruit_series = pd.Series(fruits)
price_series = pd.Series(prices)
qty_series = pd.Series(quantity)

print("Fruit names as Series:")
print(fruit_series)
print("\nPrices as Series:")
print(price_series)
```
When we run this, Pandas shows:
```
Fruit names as Series:
0     Apple
1    Banana
2    Orange
3     Mango
dtype: object

Prices as Series:
0    50
1    30
2    40
3    80
dtype: int64
```
See? Each Series has numbers on the left (called index, like row numbers starting from 0) and our data on the right. It's simple, like labeling shelves in your shop.
Now, the fun part – let's glue these into a DataFrame, our full table:
```python
# Make a DataFrame from lists – easy way
data = {
    'Fruit': fruits,
    'Price': prices,
    'Quantity': quantity
}
df = pd.DataFrame(data)
print("Our first fruit shop table:")
print(df)
```
Output looks like this neat table:
| | Fruit | Price | Quantity |
|---|---|---|---|
| 0 | Apple | 50 | 10 |
| 1 | Banana | 30 | 20 |
| 2 | Orange | 40 | 15 |
| 3 | Mango | 80 | 5 |
Wow! Just a few lines, and we have a proper table. No more scribbling on
paper. We can print it anytime with print(df), and it shows
clearly. This DataFrame remembers everything – we can add more rows later,
like tomorrow's sales.
Why is this useful in real life? Remember last Diwali when we tracked
festival sales? Instead of Excel crashes with big files, Pandas handles
thousands of rows without sweat. For example, if you're a teacher marking
student scores, turn names and marks into a DataFrame. One command shows
average marks: df['Marks'].mean() – instant result!
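Here's a tiny runnable sketch of that teacher idea – the names and marks below are made up for illustration:

```python
import pandas as pd

# Hypothetical class register – names and marks invented for illustration
marks_df = pd.DataFrame({
    'Name': ['Asha', 'Bilal', 'Chitra', 'Dev'],
    'Marks': [72, 85, 90, 65]
})

average = marks_df['Marks'].mean()  # one command, instant class average
print(average)  # 78.0
```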
Let's try another everyday example. Suppose we track our weekly expenses for the home budget. Lists: items like 'Rice', 'Milk', 'Petrol'; costs: 500, 50, 2000; dates.
```python
expenses_data = {
    'Item': ['Rice', 'Milk', 'Petrol', 'Veggies'],
    'Cost': [500, 50, 2000, 300],
    'Date': ['2026-01-10', '2026-01-12', '2026-01-15', '2026-01-16']
}
expense_df = pd.DataFrame(expenses_data)
print("Home expenses table:")
print(expense_df)
```
This prints:
| | Item | Cost | Date |
|---|---|---|---|
| 0 | Rice | 500 | 2026-01-10 |
| 1 | Milk | 50 | 2026-01-12 |
| 2 | Petrol | 2000 | 2026-01-15 |
| 3 | Veggies | 300 | 2026-01-16 |
Perfect for spotting where money goes fast – Petrol is the big eater! Pandas adds smart labels automatically.
One cool thing: DataFrames can mix types. Names stay as text, numbers stay math-ready. If we do total_sales = df['Price'] * df['Quantity'], it gives 500, 600, 600 and 400 – one total per fruit. Boom, sales figures!
But wait, what if our lists are uneven? A DataFrame built from plain lists of different lengths raises an error, but build it from Series and Pandas aligns them, marking the gaps as NaN. For shop owners like us in Lucknow markets, this means quick checks: "How many apples left?" – just slice the table.
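Here's a small sketch of the safe pattern – the column names are invented for illustration:

```python
import pandas as pd

# Uneven lists: wrapping them as Series lets Pandas align by index
# (plain lists of different lengths would raise a ValueError instead)
data = {
    'Fruit': pd.Series(['Apple', 'Banana', 'Orange']),
    'Price': pd.Series([50, 30])  # one price missing
}
df_uneven = pd.DataFrame(data)
print(df_uneven)
# The Orange row gets NaN in Price – Pandas flags the gap instead of guessing
```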
To make it even clearer, here's a comparison table showing Series vs DataFrame, like choosing between a single notebook page or a full ledger book:
| Feature | Series (One Column) | DataFrame (Full Table) |
|---|---|---|
| What it holds | A single list, like just the prices | Rows and columns: fruits + prices + qty |
| Daily Example | Your phone's contact list (names only) | Full phonebook with names, numbers, emails |
| Size | Small, fast for one thing | Big, handles shops or school records |
| Print Look | Vertical list with index | Grid like Excel sheet |
| Use When | Quick math on one list | Compare across items, like sales report |
This table helps us see why DataFrame is the hero – it connects everything.
We can even make a DataFrame from a plain list of lists, like reading from a notebook:
```python
simple_list = [
    ['Apple', 50, 10],
    ['Banana', 30, 20],
    ['Orange', 40, 15]
]
df_from_list = pd.DataFrame(simple_list, columns=['Fruit', 'Price', 'Quantity'])
print(df_from_list)
```
Same neat table! Great for when data comes from forms or apps.
In our fruit shop, printing df daily lets customers see stock.
Or for students, track homework scores. Pandas makes data feel like chatting
with a friend – ask, and it answers.
Think of a teacher in our BCA class. We have 30 students' names and test
scores. One DataFrame, and print(df) shows the class list
instantly during roll call.
We've now built our first tables hands-on. It's exciting to see lists turn into something we can touch and change.
Now that we've set up our tables and peeked at the data, real life hits us – data from shops, schools, or forms is often messy. Prices missing here, wrong numbers there, repeats everywhere. Like vegetables from the market: some rotten, some doubled up. We need to clean it fast. Let's fix this dirty data step by step, using our fruit shop as the example.
We'll start with a messy DataFrame. Imagine we got sales data from three sellers, but emails forgot to fill some spots, quantities typed wrong, and one sale listed twice.
```python
import pandas as pd

messy_data = {
    'Fruit': ['Apple', 'Banana', 'Orange', 'Mango', 'Apple', None],
    'Price': [50, 30, '40', 80.0, 50, 60],
    'Quantity': [10, 20, 15, 5, 10, 8],
    'Seller': ['Ram', 'Shyam', 'Ram', 'Geeta', 'Ram', 'Shyam']
}
df_messy = pd.DataFrame(messy_data)
print("Messy shop data:")
print(df_messy)
```
It looks like:
| | Fruit | Price | Quantity | Seller |
|---|---|---|---|---|
| 0 | Apple | 50 | 10 | Ram |
| 1 | Banana | 30 | 20 | Shyam |
| 2 | Orange | 40 | 15 | Ram |
| 3 | Mango | 80.0 | 5 | Geeta |
| 4 | Apple | 50 | 10 | Ram |
| 5 | NaN | 60 | 8 | Shyam |
See the problems? Empty fruit (None), Price as text '40', duplicate Apple row.
Handling Missing Values – No More Gaps
Missing data is common, like a seller forgetting to note quantity. Pandas shows them as NaN (Not a Number).
- Drop them with dropna(): Remove whole rows with gaps.

```python
df_no_missing = df_messy.dropna()
print("After dropping missing:")
print(df_no_missing)
```
This kills row 5. Good if gaps are few, but we lose sales info!
- Fill them smartly with fillna(): Put an average or zero instead. Like filling a missing weight on a parcel with the shop average.

```python
# Fill missing fruit with 'Unknown'
df_messy['Fruit'] = df_messy['Fruit'].fillna('Unknown')
# Fill any future gaps in Quantity with 0
df_messy['Quantity'] = df_messy['Quantity'].fillna(0)
print("After filling:")
print(df_messy)
```
Now no NaNs. For prices, we might fill gaps with the average: df_messy['Price'].fillna(df_messy['Price'].mean()) – just fix the text-number type first (next section), or the mean can't be computed.
Daily Example: In our home budget tracker, if petrol cost is missing one day, fill with last week's average – keeps total spend real.
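A runnable sketch of that budget trick – the costs here are invented:

```python
import pandas as pd
import numpy as np

# Invented week of costs with one missing entry (np.nan)
costs = pd.Series([500.0, np.nan, 2000.0, 300.0])

filled = costs.fillna(costs.mean())  # mean() skips the NaN automatically
print(filled)
```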
Fix Wrong Data Types – Make Numbers Work
Prices should be numbers for math, but '40' is text. Pandas mixes them, but math fails.
- Change with astype(): Turn text into numbers.

```python
df_messy['Price'] = df_messy['Price'].astype(float)  # Now all numbers
print("Fixed Price type:")
print(df_messy['Price'].dtype)  # Shows float64
```
Test: df_messy['Price'].mean() now works – average 51.67 rupees.
Pro Tip Table for types:
| Data Problem | Fix Command | Example Output | When to Use |
|---|---|---|---|
| Text as Number | df['col'].astype(float) | float64 | Prices, ages for math |
| Dates as Text | pd.to_datetime(df['Date']) | datetime64 | Sales by week |
| Yes/No Text | df['col'].map({'Yes': 1, 'No': 0}) | int64 | Count approvals |
| Too Many Decimals | df['col'].round(2) | Still float | Money: 50.00 not 49.999 |
Like fixing a bike speedometer – wrong units, no ride!
Remove Duplicates – Clean Repeats
Duplicate Apple row? Spot with eyes, but thousands? No way.
```python
df_clean = df_messy.drop_duplicates()
print("No duplicates:")
print(df_clean)
```
Pops row 4. Keeps first one. For shops, this avoids double-counting sales.
Life Hack: In student attendance, drop_duplicates() removes kids listed twice by mistake.
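A quick sketch of that attendance trick, with made-up roll numbers:

```python
import pandas as pd

# Made-up register where Priya got entered twice
attendance = pd.DataFrame({
    'Roll': [1, 2, 2, 3],
    'Name': ['Amit', 'Priya', 'Priya', 'Ravi']
})

present = attendance.drop_duplicates()  # keeps the first of each repeat
print(present)
```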
Now our df_clean is shiny. Let's move to ordering and picking.
Sort and Filter – Find What We Need Fast
Clean data is great, but unsorted like a messy almirah. We sort and filter to spotlight stars.
Sort with sort_values() – High to Low or A to Z
Want top sellers first? Like arranging students by marks.
```python
# Sort by Price high to low
df_sorted = df_clean.sort_values('Price', ascending=False)
print("Highest price first:")
print(df_sorted)
```
Output:
| | Fruit | Price | Quantity | Seller |
|---|---|---|---|---|
| 3 | Mango | 80.0 | 5 | Geeta |
| 5 | Unknown | 60.0 | 8 | Shyam |
| 0 | Apple | 50.0 | 10 | Ram |
| 2 | Orange | 40.0 | 15 | Ram |
| 1 | Banana | 30.0 | 20 | Shyam |
Mango tops! Add by=['Price', 'Quantity'] for multi-sort.
Shop Example: Sort Quantity descending – restock Bananas first (20 sold).
Filter – Pick Matching Rows
Like "Show only adults over 25" or "Fruits above 40 rupees".
```python
# Filter Price > 40
expensive = df_clean[df_clean['Price'] > 40]
print("Expensive fruits:")
print(expensive)
```
Gives Apple, Mango, and Unknown. Use & for AND: df_clean[(df_clean['Price'] > 40) & (df_clean['Seller'] == 'Ram')].
Daily Use: Filter home expenses >500 – spot big spends like petrol.
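Using the expense_df from earlier (recreated here so the snippet runs on its own), that check is one line:

```python
import pandas as pd

# Recreated from the earlier home-budget example
expense_df = pd.DataFrame({
    'Item': ['Rice', 'Milk', 'Petrol', 'Veggies'],
    'Cost': [500, 50, 2000, 300]
})

big_spends = expense_df[expense_df['Cost'] > 500]
print(big_spends)  # only Petrol crosses 500
```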
Filter Quick Guide:
- Equals: df[df['City'] == 'Delhi']
- Greater: df[df['Sales'] > 1000]
- In list: df[df['Fruit'].isin(['Apple', 'Banana'])]
- Not null: df[df['Price'].notna()]
Add New Columns – Create Magic Numbers
Now, calculate totals without calculator. Like adding profit column.
```python
df_clean['Total Sales'] = df_clean['Price'] * df_clean['Quantity']
print("With Total Sales:")
print(df_clean)
```
New table:
| Fruit | Price | Quantity | Seller | Total Sales |
|---|---|---|---|---|
| Apple | 50.0 | 10 | Ram | 500.0 |
| Banana | 30.0 | 20 | Shyam | 600.0 |
| ... | ... | ... | ... | ... |
Banana wins! Other math: df['Discount'] = df['Price'] * 0.1 for 10% off.
Examples:
- Age group: df['Adult'] = df['Age'] > 18 (True/False)
- Category: df['Type'] = np.where(df['Price'] < 50, 'Cheap', 'Premium') (needs import numpy as np)
For budgets: df['Daily Avg'] = df['Total']/7.
Group and Count – Team Up Data
Group by Seller – total sales per person?
```python
sales_by_seller = df_clean.groupby('Seller')['Total Sales'].sum()
print("Sales per seller:")
print(sales_by_seller)
```
Ram leads with 1100 (Apple 500 + Orange 600), Shyam is close behind at 1080, and Geeta has 400.
Average: .mean(). Count: .count().
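A small runnable sketch of mean and count per group, with sample numbers:

```python
import pandas as pd

sales_df = pd.DataFrame({
    'Seller': ['Ram', 'Shyam', 'Ram', 'Geeta'],
    'Total Sales': [500, 600, 600, 400]
})

avg_per_seller = sales_df.groupby('Seller')['Total Sales'].mean()
count_per_seller = sales_df.groupby('Seller')['Total Sales'].count()
print(avg_per_seller)    # average sale value per seller
print(count_per_seller)  # number of sales per seller
```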
Multi-group Table:
| Group By | Command Example | Output Meaning |
|---|---|---|
| One Column | groupby('City')['Sales'].sum() | Total sales per city |
| Average | groupby('Month')['Price'].mean() | Avg price each month |
| Count | groupby('Seller').size() | How many sales per seller |
| Max | groupby('Fruit')['Qty'].max() | Peak quantity per fruit |
Like school: groupby('Class')['Marks'].mean() – best class average.
Life Example: Group expenses by 'Item' – Rice eats most budget?
Save Clean Data – Keep It Forever
Work done? Save to file, share with team.
```python
df_clean.to_csv('clean_fruit_sales.csv', index=False)   # No row numbers
df_clean.to_excel('clean_fruit_sales.xlsx', index=False)
```
Open in Excel anytime. Add index=False for clean look.
Save Options:
- CSV: Small, works everywhere (shops share via WhatsApp)
- Excel: Colors, formulas (boss reports)
- JSON: For apps (online store)
Our shop now has 'clean_fruit_sales.csv' ready for tomorrow.
We've turned mess into gold – sorted, filtered, grouped, saved. Perfect for daily hustle.
Next, we'll combine tables like joining student lists with grades for full stories.
Building on our clean, saved tables, sometimes one group isn't enough – we need cross looks, like sales by city and month. Or join separate sheets. Plus, dates and text need tweaks, and trends over time. Let's level up with these tools, using school and sales examples from our daily world.
Multiple Groups with Pivot Table – Cross Views Easy
Pivot tables are like magic spreadsheets that twist data two ways. No Excel needed – Pandas does it in one line. Perfect for reports: "Show Banana sales by seller and month?"
First, let's make sample sales data for a chain of fruit shops.
```python
import pandas as pd

pivot_data = {
    'Fruit': ['Apple', 'Banana', 'Apple', 'Banana', 'Mango', 'Apple'],
    'City': ['Delhi', 'Delhi', 'Mumbai', 'Mumbai', 'Delhi', 'Mumbai'],
    'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Feb'],
    'Sales': [500, 600, 400, 700, 300, 450]
}
df_pivot = pd.DataFrame(pivot_data)
print("Raw sales:")
print(df_pivot)
```
Now, pivot for Sales by City AND Month:
```python
pivot_table = df_pivot.pivot_table(values='Sales', index='City',
                                   columns='Month', aggfunc='sum')
print("Pivot: Sales by City & Month:")
print(pivot_table)
```
Output table (Pandas sorts the Month columns alphabetically, so Feb shows up before Jan):

| City | Feb | Jan |
|---|---|---|
| Delhi | 300 | 1100 |
| Mumbai | 1150 | 400 |
Delhi rocks Jan, Mumbai Feb! Like a shop owner checking which city booms when.
Pivot Power Tips:
- Multiple Values: aggfunc=['sum', 'mean'] – totals and averages.
- Rows & Cols Swap: index='Month', columns='City'.
- Fill Empty: fill_value=0 for zero gaps.
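Here's fill_value in action on the same pivot_data, pivoted by Fruit and Month this time – Mango had no Jan sales, so its gap becomes 0:

```python
import pandas as pd

df_pivot = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Apple', 'Banana', 'Mango', 'Apple'],
    'City': ['Delhi', 'Delhi', 'Mumbai', 'Mumbai', 'Delhi', 'Mumbai'],
    'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Feb'],
    'Sales': [500, 600, 400, 700, 300, 450]
})

# fill_value=0 turns the empty Mango/Jan cell into 0 instead of NaN
table = df_pivot.pivot_table(values='Sales', index='Fruit',
                             columns='Month', aggfunc='sum', fill_value=0)
print(table)
```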
Daily Pivot Examples Table:
| Scenario | Pivot Code | What It Shows |
|---|---|---|
| Shop Sales by Fruit/City | index='Fruit', columns='City' | Apples best in Delhi? |
| Student Marks by Subject/Class | index='Class', columns='Subject' | Math avg in Class 10 |
| Expenses by Category/Month | index='Month', columns='Category' | Food spend jumps in festivals |
| Website Visits by Page/Day | index='Day', columns='Page' | Home page peaks weekends |
For freelancers like us, pivot invoices by client and month – spot slow payers.
Join Tables with Merge – Combine Worlds
Got students list separate from marks? Merge glues them like stapling sheets.
Two tables:
```python
students = pd.DataFrame({
    'Student_ID': [1, 2, 3, 4],
    'Name': ['Amit', 'Priya', 'Ravi', 'Seema'],
    'Age': [20, 21, 19, 22]
})
marks = pd.DataFrame({
    'Student_ID': [1, 2, 3, 5],
    'Subject': ['Math', 'Math', 'Math', 'Math'],
    'Score': [85, 92, 78, 88]
})
print("Students:")
print(students)
print("\nMarks:")
print(marks)
```
Join on Student_ID:
```python
full_data = pd.merge(students, marks, on='Student_ID', how='inner')  # Only matching rows
print("Merged students + marks:")
print(full_data)
```
Result:
| Student_ID | Name | Age | Subject | Score |
|---|---|---|---|---|
| 1 | Amit | 20 | Math | 85 |
| 2 | Priya | 21 | Math | 92 |
| 3 | Ravi | 19 | Math | 78 |
Seema and ID5 missing – inner join skips non-matches.
Merge Types Table:
| Type | Code: how='...' | Keeps What | Example Use |
|---|---|---|---|
| inner | 'inner' | Only matches both sides | Common students with marks |
| left | 'left' | All from left, match from right | All students, even no marks |
| right | 'right' | All from right, match from left | All marks, even unknown kids |
| outer | 'outer' | Everything, NaN for no match | Full audit |
Life Example: Merge customer orders with delivery status – track delays.
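And here's how='left' on the student tables (rebuilt so the snippet runs alone) – Seema stays, with NaN for her missing score:

```python
import pandas as pd

students = pd.DataFrame({
    'Student_ID': [1, 2, 3, 4],
    'Name': ['Amit', 'Priya', 'Ravi', 'Seema']
})
marks = pd.DataFrame({
    'Student_ID': [1, 2, 3, 5],
    'Score': [85, 92, 78, 88]
})

# how='left' keeps every student; Seema has no marks, so her Score is NaN
left_joined = pd.merge(students, marks, on='Student_ID', how='left')
print(left_joined)
```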
Time Data – Make Dates Smart
Dates as text? Can't group by week. Convert first.
```python
time_data = {
    'Date': ['2026-01-01', '2026-01-03', '2026-01-08', '2026-01-15'],
    'Sales': [100, 150, 120, 200]
}
df_time = pd.DataFrame(time_data)
df_time['Date'] = pd.to_datetime(df_time['Date'])  # Fix to date type
print("Dates fixed:")
print(df_time['Date'].dtype)  # datetime64[ns]
```
Now, sales by week:
```python
df_time['Week'] = df_time['Date'].dt.isocalendar().week
weekly_sales = df_time.groupby('Week')['Sales'].sum()
print("Sales by week:")
print(weekly_sales)
```
Or by month: df_time['Month'] = df_time['Date'].dt.month_name().
Date Tricks:
- Day name: .dt.day_name()
- Year: .dt.year
- Resample monthly: df_time.set_index('Date')['Sales'].resample('M').sum()
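One hedge on resample: it needs the dates as the index. A version-safe alternative is grouping on .dt.month – here's a sketch with sample dates spread over two months (different from our shop dates, just for illustration):

```python
import pandas as pd

# Sample dates spread over two months – illustration only
df_months = pd.DataFrame({
    'Date': pd.to_datetime(['2026-01-01', '2026-01-03', '2026-02-08', '2026-02-15']),
    'Sales': [100, 150, 120, 200]
})

# Group on the month number – no index gymnastics needed
monthly = df_months.groupby(df_months['Date'].dt.month)['Sales'].sum()
print(monthly)  # Jan (1): 250, Feb (2): 320
```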
Shop Calendar Table:
| Goal | Code | Output Example |
|---|---|---|
| Sales by Month | df['Month'] = df['Date'].dt.month | 1 for Jan |
| Weekday Trends | df['Day'] = df['Date'].dt.weekday | 0=Monday sales low? |
| Days Since Start | df['Days'] = (df['Date'] - df['Date'].min()).dt.days | Trend over time |
| Quarterly Total | resample('Q').sum() | Jan-Mar total |
Like tracking YouTube views by upload date – peaks on weekends?
Window Functions – Trends Without Losing Rows
Want 3-day rolling average sales, but keep all days? Windows slide over data.
```python
df_time = df_time.sort_values('Date')
df_time['Rolling_Avg_3'] = df_time['Sales'].rolling(window=3).mean()
print("With rolling average:")
print(df_time)
```
| Date | Sales | Rolling_Avg_3 |
|---|---|---|
| 2026-01-01 | 100 | NaN |
| 2026-01-03 | 150 | NaN |
| 2026-01-08 | 120 | 123.33 |
| 2026-01-15 | 200 | 156.67 |
Smooths ups/downs – sales steady?
Rank within groups:
```python
df_pivot['Rank_City'] = df_pivot.groupby('City')['Sales'].rank(ascending=False)
print("Rank per city:")
print(df_pivot)
```
Window Examples:
- Cumulative: .cumsum() – running total sales.
- Shift: .shift(1) – yesterday's sales.
- Percent rank: .rank(pct=True)
Freelancer Use: Rolling avg earnings over 7 days – steady income?
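A tiny sketch of cumsum and shift on sample sales numbers:

```python
import pandas as pd

sales = pd.Series([100, 150, 120, 200])

running_total = sales.cumsum()  # running total: 100, 250, 370, 570
yesterday = sales.shift(1)      # previous row's value; first row becomes NaN
print(running_total)
print(yesterday)
```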
Text Cleaning – Tidy Names and Notes
Names like "Ram Kumar" with spaces? Split or clean.
```python
df_text = pd.DataFrame({
    'Full_Name': ['Ram Kumar ', 'Priya Sharma!!', ' ravi singh'],
    'City': ['Delhi', 'Mumbai', 'Delhi']
})
# Strip spaces, drop the stray '!!', fix capitals
df_text['Clean_Name'] = df_text['Full_Name'].str.strip().str.replace('!!', '').str.title()
# Split out the first name
df_text['First_Name'] = df_text['Full_Name'].str.split().str[0].str.title()
print("Cleaned text:")
print(df_text)
```
Ram Kumar, Priya Sharma, Ravi Singh. Perfect.
Patterns: df['Phone'] = df['Text'].str.extract(r'(\d{10})', expand=False) grabs 10-digit numbers (expand=False keeps the result a single column).
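A runnable sketch of that phone-number grab – the text rows are made up:

```python
import pandas as pd

# Made-up text rows for illustration
df_notes = pd.DataFrame({'Text': ['Call 9876543210 now', 'no number here']})

# expand=False keeps the result a single column (a Series)
df_notes['Phone'] = df_notes['Text'].str.extract(r'(\d{10})', expand=False)
print(df_notes)  # rows without a match get NaN
```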
Text Tools Table:
| Task | Code Example | Before/After |
|---|---|---|
| Remove Spaces | .str.strip() | " ram " → "ram" |
| Upper/Lower | .str.upper() | "ram" → "RAM" |
| Split Words | .str.split(expand=True) | "A B" → cols A, B |
| Replace | .str.replace('old', 'new') | "bad" → "good" |
| Contains Pattern | .str.contains('apple') | True/False flag |
| Length | .str.len() | "abc" → 3 |
Example: Clean customer feedback – count "good" mentions.
Like fixing addresses for delivery – no more lost parcels.
These advanced moves turn raw info into insights, ready for visuals next.
Up next, we'll plot these tables into charts and style them pretty.
With our data cleaned, grouped, and analyzed, it's time to show it off – not just numbers, but pictures everyone understands. Like turning shop ledger into colorful charts for family or boss. Pandas has built-in plots, plus ways to make tables look pro. We'll use our fruit sales and school examples to draw them step by step.
Pandas Built-in Plots – Charts in Seconds
Pandas plots with Matplotlib under the hood – no extra setup. Just df.plot() and boom! Great for quick checks, like "Does sales rise on weekends?"
First, recall our df_time with dates and sales:
| Date | Sales | Rolling_Avg_3 |
|---|---|---|
| 2026-01-01 | 100 | NaN |
| 2026-01-03 | 150 | NaN |
| 2026-01-08 | 120 | 123.33 |
| 2026-01-15 | 200 | 156.67 |
Line Charts – Trends Over Time
Perfect for sales growth, like watching YouTube subscribers climb.
```python
import matplotlib.pyplot as plt  # Helper for titles

df_time.set_index('Date')['Sales'].plot(kind='line', title='Daily Fruit Sales')
plt.show()
```
This draws a line jumping 100→150→120→200. See the spike? Restock day!
Add rolling avg on same chart:
```python
df_time.set_index('Date')[['Sales', 'Rolling_Avg_3']].plot(kind='line')
plt.title('Sales with 3-Day Smooth')
plt.ylabel('Rupees')
plt.show()
```
Smooth blue line under wiggly sales – trends clear!
Line Chart Tips:
- Multiple lines: Pass a list of columns.
- Zoom: xlim=('2026-01-01', '2026-01-15')
- Markers: marker='o' for dots.
Daily Life Lines:
- Home power bill over months – spot the summer AC jump.
- Weight tracker – steady loss? Good!
- Freelance earnings weekly – ups after a new client.
Bar Charts – Compare Categories
Who sold most? Bars shine for groups.
From df_pivot sales by city/month:
```python
pivot_table.plot(kind='bar', title='Sales by City')
plt.ylabel('Total Sales')
plt.show()
```
Delhi tall bar, Mumbai shorter – easy compare!
Horizontal: kind='barh' for long names.
Grouped bars: Use pivot with fruits.
```python
# From earlier df_clean
df_clean.plot(x='Fruit', y='Total Sales', kind='bar', title='Sales per Fruit')
plt.show()
```
Banana highest bar – stock more!
Bar Chart Guide Table:
| Chart Type | Code: kind='...' | Best For | Example Output Insight |
|---|---|---|---|
| bar | 'bar' | Compare groups side-by-side | Fruits: Banana leads |
| barh | 'barh' | Long labels (cities, names) | Sellers ranked |
| bar stacked | stacked=True | Parts to whole (sales/fruit) | Total = Apple + Banana |
| bar grouped | Use pivot first | Multi-category (city/fruit) | Delhi Apples vs Mumbai |
School Example: full_data.plot(x='Name', y='Score', kind='bar') – Priya tops the class visually.
Histograms and Pie – Distributions and Shares
Histogram: How many sales buckets? Like age groups in class.
```python
sales_big = pd.DataFrame({'Sales': [100, 150, 200, 300, 120, 500, 80, 250, 400, 90]})
sales_big['Sales'].plot(kind='hist', bins=5, title='Sales Distribution')
plt.show()
```
Shows most sales 80-200, few high – normal shop day.
Pie for shares:
```python
df_clean.groupby('Seller')['Total Sales'].sum().plot(kind='pie', autopct='%1.1f%%')
plt.title('Seller Share')
plt.show()
```
Ram takes the biggest slice (about 43%) – bonus time?
Other Plots Table:
| Plot Type | Code Example | Use Case | Pro Tip |
|---|---|---|---|
| hist | df['Col'].plot(kind='hist') | Spread of numbers (prices) | bins=10 for more detail |
| pie | groupby().plot(kind='pie') | % shares (expenses) | autopct for labels |
| scatter | df.plot.scatter(x='Price', y='Qty') | Relation (high price, low qty?) | Spot outliers |
| box | df['Sales'].plot(kind='box') | Outliers in data | Whiskers show range |
| area | kind='area' | Stacked trends over time | Cumulative sales |
YouTube Creator Example: Histogram of video views – most under 1k, virals above 10k.
For the shop, df_clean.plot.scatter('Price', 'Quantity') – do cheap fruits sell more?
Customize all:
```python
df_clean.plot(kind='bar', color=['red', 'green', 'blue'], figsize=(10, 6))
plt.title('Fruit Sales Colors')
plt.xlabel('Fruits')
plt.ylabel('Total')
plt.legend()
plt.show()
```
Bigger, colorful – report ready!
Full Plot Workflow:
- Clean data first.
- Group if needed.
- df.plot() – tweak title, labels.
- plt.savefig('chart.png') – save the image for blog/YouTube.
Like our Pratap Solution blog – charts boost reader stay time 2x!
Style Tables – Make Them Pop
Plots are great, but what about tables in notebooks and reports? Make them shine with colors. Pandas style highlights cells like Excel conditional formatting.
From df_clean:
```python
def highlight_max(s):
    is_max = s == s.max()
    return ['background-color: yellow' if v else '' for v in is_max]

styled = (df_clean.style
          .apply(highlight_max, subset=['Total Sales'])
          .format({'Total Sales': '{:.0f}'}))
styled  # In Jupyter, shows the pretty table
```
Yellow background on top sales row – eyes go there!
Color Conditions – Rules Like Traffic Lights
```python
def color_sales(val):
    color = 'green' if val > 500 else 'orange' if val > 300 else 'red'
    return f'color: {color}'

# .style lives on the DataFrame, not on a single column – use subset to target one
df_clean.style.map(color_sales, subset=['Total Sales']).format({'Total Sales': '{:.0f}'})
```
Green for stars, red for low – quick scan!
Style Recipes Table:
| Goal | Code Snippet | Effect |
|---|---|---|
| Highlight Max | style.highlight_max() | Bold/yellow top value |
| Min Lowlight | style.highlight_min(props='color:red') | Red for worst |
| Bars in Cells | style.bar(subset=['Sales']) | Mini bar chart in table |
| Percent Format | style.format({'Pct': '{:.1%}'}) | 25.0% nice |
| Precision | style.format(precision=0) | No decimals |
| Background Gradient | style.background_gradient() | Color fade high to low |
Combine:
```python
styled_full = (df_clean.style
               .background_gradient(subset='Total Sales')
               .highlight_max(subset='Quantity')
               .format({'Price': '₹{:.0f}', 'Total Sales': '{:,.0f}'})
               .set_caption('Styled Fruit Report'))
styled_full
```
Gradient green-red on sales, yellow max qty, rupee signs – boss impressed!
Export Styled:
```python
styled_full.to_html('report.html')   # Web page
styled_full.to_excel('styled.xlsx')  # Excel keeps some style
```
Daily Examples:
- Budget Table: Red overspend, green savings.
- Class Marks: Green >80, amber 60-80, red below.
- Shop Stock: Bold low stock items.
- Blog Analytics: Gradient on top posts' views.
For our content creation, style top viral videos table – thumbnails next to colored rows.
Advanced Styling:
- Icons? Custom functions returning HTML.
- Themes: style.set_table_styles([{'selector': 'th', 'props': [('font-weight', 'bold')]}])
- Conditional text: hide negatives with display: none if val < 0.
In Google Colab, styles shine for sharing links.
Plot + Style Combo Workflow:
- df.head().style – quick pretty peek.
- Plot trends.
- Style the summary table.
- Save both for the presentation.
Imagine pitching freelance project: "See this chart? Sales up 30%!"
Engagement Boosters:
- Use figsize=(12, 8) for big screens.
- plt.tight_layout() – no overlap.
- Subplots: fig, axs = plt.subplots(2, 2); df.plot(ax=axs[0, 0])
For YouTube Shorts, screenshot styled table – hook: "Pandas magic in 60s!"
We've visualized and styled – data now speaks loud and clear.
This wraps our Pandas journey, but real power comes in combining with your projects.
Now our tables dazzle with charts and colors, but what about huge files from big shops or exam results? Or slow code on old laptops? Let's add pro tricks for speed, big data, and fancy structures – like upgrading from cycle to bike for city traffic.
Big Data Tricks – Handle Giant Files
Million rows crash notebooks? Pandas reads smart, not all at once.
Chunksize in read_csv – Bite-Sized Loads
Like eating big mango one slice at a time. For 10GB sales log:
```python
chunk_list = []
for chunk in pd.read_csv('huge_sales.csv', chunksize=10000):
    # Process each small piece: clean, add a column
    chunk['Total'] = chunk['Price'] * chunk['Qty']
    chunk_list.append(chunk)

big_df = pd.concat(chunk_list, ignore_index=True)
```
Each chunk 10k rows – memory safe! Process: filter errors per chunk.
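To try chunking without a 10GB file, we can fake one with an in-memory buffer – io.StringIO stands in for huge_sales.csv here:

```python
import io
import pandas as pd

# io.StringIO stands in for a huge CSV file on disk
csv_text = "Price,Qty\n50,10\n30,20\n40,15\n80,5\n"

chunk_list = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    chunk['Total'] = chunk['Price'] * chunk['Qty']  # process each bite
    chunk_list.append(chunk)

big_df = pd.concat(chunk_list, ignore_index=True)
print(big_df['Total'].sum())  # 2100
```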
When to Chunk Table:
| File Size | Chunksize Tip | Example Scenario |
|---|---|---|
| <100MB | No need | Daily shop CSV |
| 100MB-1GB | 50,000 | Monthly e-commerce |
| >1GB | 10,000-100k | Yearly bank statements |
| Streaming | Process in loop, no save | Live website logs |
Life Example: Gov scheme applicant list (lakhs rows) – chunk to find duplicates without crash.
Sample Large Files – Quick Taste
Don't load all – peek 10%.
```python
df_sample = pd.read_csv('huge_file.csv', nrows=1000)  # First 1k rows
df_random = pd.read_csv('huge_file.csv', nrows=10000).sample(frac=0.1)  # 10% of those, random
print(df_sample.shape)
```
Fast preview: averages, issues. Like tasting sabzi before full plate.
Sampling Types:
- nrows=5000: Top rows only.
- skiprows=range(1, 10000): Skip the first 10k rows (starting at 1 keeps the header).
- sample(n=1000): Random pick after loading.
For YouTube analytics dump – sample to spot viral patterns quick.
Speed Hacks – Run Like Flash
Loops slow like walking in Lucknow heat. Pandas loves vector ops.
Vector Operations – No Loops Needed
Bad: Loop over rows.
```python
# Slow loop – avoid this
for i in range(len(df)):
    df.loc[i, 'Discount'] = df.loc[i, 'Price'] * 0.1
```
Fast: Whole column!
```python
df['Discount'] = df['Price'] * 0.1  # Vector – often 100x faster!
df['Tax'] = df['Total'] * 0.18
```
Math on arrays – lightning!
Speed Comparison Table:
| Method | Time for 1M Rows | Code Style | Use For |
|---|---|---|---|
| Loop (for) | 30 seconds | Row by row | Never! Avoid |
| Vector (*) | 0.1 seconds | df['New'] = df.A * df.B | Math, filters |
| apply() | 2 seconds | df['New'].apply(func) | Simple functions |
| Vectorized str | 0.5s | df['Name'].str.upper() | Text ops |
Example: 10k student records – vector ages>18 in blink.
Apply vs Vector – Choose Wise
Apply runs function per row – ok for complex.
```python
# apply example
def cat(price):
    if price > 50:
        return 'Premium'
    return 'Regular'

df['Category'] = df['Price'].apply(cat)
```
But vector is better: df['Category'] = np.where(df['Price'] > 50, 'Premium', 'Regular') – faster!
Pro Rule: Vector first, apply last resort.
Freelance invoices: Vector discount calc saves hours monthly.
MultiIndex – Group Multiple Levels
Like nested folders: Sales > City > Month > Fruit.
From pivot_data:
```python
multi = df_pivot.set_index(['City', 'Month'])
print(multi.index)  # MultiIndex
```
Access: multi.loc['Delhi'] – all Delhi rows.
Groupby multi-level:
```python
grouped = df_pivot.groupby(['City', 'Fruit'])['Sales'].sum()
print(grouped)
```
```
City    Fruit
Delhi   Apple     500
        Banana    600
        Mango     300
Mumbai  Apple     850
        Banana    700
Name: Sales, dtype: int64
```
Unstack to table: grouped.unstack().
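Here's the groupby-plus-unstack round trip on the same pivot_data (rebuilt so the snippet runs alone):

```python
import pandas as pd

df_pivot = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Apple', 'Banana', 'Mango', 'Apple'],
    'City': ['Delhi', 'Delhi', 'Mumbai', 'Mumbai', 'Delhi', 'Mumbai'],
    'Sales': [500, 600, 400, 700, 300, 450]
})

grouped = df_pivot.groupby(['City', 'Fruit'])['Sales'].sum()
wide = grouped.unstack()  # Fruit values become columns, one row per city
print(wide)  # Mumbai sold no Mango, so that cell is NaN
```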
MultiIndex Uses Table:
| Structure | Code | Benefit |
|---|---|---|
| Set Index Multi | set_index(['A', 'B']) | Easy slice: loc[('Delhi', 'Jan')] |
| Groupby List | groupby(['City', 'Month']) | Nested sums/avgs |
| Pivot to Multi | pivot_table(..., index=['City', 'Fruit']) | Spreadsheet feel |
| Swap Levels | swaplevel(0, 1) | Flip order |
Example: Blog posts by Topic > Year > Month – top performer drill-down.
School: Marks by Class > Subject > Student.
Custom Functions – Your Own Tools
Lambda in Apply – Quick Math
Short functions:
```python
df['Profit'] = df['Sales'].apply(lambda x: x * 0.2)
```
Or complex:
```python
df['Grade'] = df['Score'].apply(lambda s: 'A' if s >= 90 else 'B' if s >= 80 else 'C')
```
Inline power!
Complex Cleaning – Define Once, Use Many
```python
def clean_name(name):
    return name.strip().title().replace('!!', '')

df['Clean_Name'] = df['Full_Name'].apply(clean_name)
```
Reusable: Phone validate, address standardize.
Custom Func Table:
| Task | Lambda Example | When to Write a Full Function |
|---|---|---|
| Simple Calc | lambda x: x * 1.1 (10% hike) | Lambda is enough |
| If-Else | lambda p: 'High' if p > 100 else 'Low' | 3+ conditions |
| Text Parse | lambda t: t.split()[0] | Regex needed |
| Date Custom | lambda d: d.weekday() == 4 (Friday) | Business logic |
Content Creator: Lambda video length to category: Shorts <60s.
Export Pro – Share Like Boss
Multiple Sheets Excel
One file, many tables.
```python
with pd.ExcelWriter('report.xlsx') as writer:
    df_clean.to_excel(writer, sheet_name='Sales', index=False)
    pivot_table.to_excel(writer, sheet_name='Pivot')
    students.to_excel(writer, sheet_name='Students')
```
Boss opens: Tabs for all!
JSON and SQL – Modern Saves
JSON for apps:
```python
df_clean.to_json('data.json', orient='records')  # List of dicts
```
SQL database:
```python
from sqlalchemy import create_engine

engine = create_engine('sqlite:///shop.db')
df_clean.to_sql('sales', engine, if_exists='replace')
```
Query later: pd.read_sql('SELECT * FROM sales', engine)
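If SQLAlchemy isn't installed, the standard-library sqlite3 driver also works for SQLite – a minimal sketch with an in-memory database, so nothing touches disk:

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Banana'], 'Total': [500, 600]})

# In-memory SQLite via the standard library – no SQLAlchemy needed
conn = sqlite3.connect(':memory:')
df.to_sql('sales', conn, index=False, if_exists='replace')

back = pd.read_sql('SELECT * FROM sales WHERE Total > 550', conn)
print(back)  # only the Banana row
conn.close()
```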
Export Options Table:
| Format | Code | Best For | Size/Features |
|---|---|---|---|
| Excel Multi | ExcelWriter() | Reports, bosses | Colors, multiple sheets |
| JSON | to_json(orient='records') | APIs, web apps | Human read, compact |
| SQL | to_sql() | Databases, reuse | Queryable, big data |
| Parquet | to_parquet() (needs pyarrow) | Fast load, big files | 1/10 size of CSV |
| HTML | to_html() | Blogs, emails | Styled tables |
Pro Tip: pd.options.display.max_columns = None before export – previews show all columns.
For our blogs: Export styled to HTML, embed in posts.
Gov data: SQL for ongoing queries like "Ration card updates".
Full Pro Workflow:
- Chunk big loads.
- Vector clean.
- Multi-group.
- Custom apply.
- Plot + style.
- Multi-export.
Like BCA project: Shop dashboard from raw CSV to Excel + SQL.
These hacks make Pandas your daily superpower – fast, big, flexible.
We've covered from basics to pro, ready for your next data adventure.
Pandas Mastery – Your Toolkit Ready
We've journeyed from simple tables to pro charts, big data, and speedy exports. Now, quick answers to common hurdles.
FAQ – Fast Fixes
- Slow on big files? Use chunksize=10000 in read_csv.
- Memory crash? Sample with nrows=1000 first.
- No plots show? Add plt.show() or %matplotlib inline in Colab.
- Wrong types? astype(float) or pd.to_datetime().
- Export with style? df.style.to_excel() for the basics.
In summary, Pandas turns messy data into clear stories – for shops, schools, blogs, or budgets. Start small: load, clean, plot. Practice on your files daily.
Takeaway: Copy our fruit shop code, tweak for your world. Share your first chart in comments – we're here to cheer!
Happy data crunching, friends!