Popular Libraries (NumPy, Pandas, Matplotlib)

NumPy

Installation

bash
pip install numpy



Basic Operations


import numpy as np  # Import NumPy with alias 'np' (convention)

Creating different types of arrays: various methods to initialize NumPy arrays
arr1 = np.array([1, 2, 3, 4, 5])           # Create array from Python list (most common method)
arr2 = np.zeros(5)                          # Create array of 5 zeros: [0., 0., 0., 0., 0.] (float type)
arr3 = np.ones(3)                           # Create array of 3 ones: [1., 1., 1.] (float type)
arr4 = np.arange(0, 10, 2)                  # Create array with step: [0, 2, 4, 6, 8] (like range() but returns array)
arr5 = np.linspace(0, 1, 5)                 # Create 5 evenly spaced values: [0., 0.25, 0.5, 0.75, 1.] (inclusive endpoints)

Vectorized operations: apply operations to entire arrays at once (much faster than loops)
print(arr1 + 2)      # Element-wise addition: [3, 4, 5, 6, 7] (adds 2 to each element)
print(arr1 * 2)      # Element-wise multiplication: [2, 4, 6, 8, 10] (multiplies each element by 2)
print(arr1 ** 2)     # Element-wise exponentiation: [1, 4, 9, 16, 25] (squares each element)


Multi-dimensional Arrays


Multi-dimensional arrays: working with 2D and 3D data structures
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # Create 2D array (3x3 matrix) from nested lists
print(matrix.shape)  # Output: (3, 3) - shows dimensions (rows, columns) as tuple

Creating special matrices for different purposes: common matrix initialization patterns
zeros_matrix = np.zeros((3, 3))    # 3x3 matrix filled with zeros (useful for initialization)
identity_matrix = np.eye(3)        # 3x3 identity matrix (1s on diagonal, 0s elsewhere) - useful for linear algebra
random_matrix = np.random.rand(3, 3)  # 3x3 matrix with random values between 0 and 1 (uniform distribution)

Indexing and slicing multi-dimensional arrays: accessing specific elements and subarrays
print(matrix[0, 1])      # Access element at row 0, column 1: 2 (zero-based indexing)
print(matrix[0, :])      # Get entire first row: [1, 2, 3] (colon means "all columns")
print(matrix[:, 1])      # Get entire second column: [2, 5, 8] (colon means "all rows")


Mathematical Operations


arr = np.array([1, 2, 3, 4, 5])  # Sample array for statistical operations

Basic statistical operations on arrays: common mathematical functions
print(np.mean(arr))      # Calculate arithmetic mean: 3.0 (sum of all elements divided by count)
print(np.std(arr))       # Calculate standard deviation: 1.414... (measure of data spread)
print(np.sum(arr))       # Calculate sum of all elements: 15 (total of all values)
print(np.max(arr))       # Find maximum value: 5 (highest value in array)
print(np.min(arr))       # Find minimum value: 1 (lowest value in array)

Matrix operations: working with 2D arrays for linear algebra
A = np.array([[1, 2], [3, 4]])  # First 2x2 matrix for demonstration
B = np.array([[5, 6], [7, 8]])  # Second 2x2 matrix for demonstrationprint(A + B)             # Element-wise addition (adds corresponding elements position by position)
print(A * B)             # Element-wise multiplication (multiplies corresponding elements position by position)
print(np.dot(A, B))      # Matrix multiplication (dot product) - proper matrix multiplication
print(A @ B)             # Matrix multiplication using @ operator (Python 3.5+) - same as np.dot()
Broadcasting


Broadcasting: NumPy's ability to perform operations on arrays of different shapes automatically
arr = np.array([1, 2, 3, 4, 5])                    # 1D array with 5 elements
matrix = np.array([[1, 2, 3], [4, 5, 6]])          # 2D array (2x3 matrix)print(arr + 10)          # Broadcasting: scalar 10 is added to each element: [11, 12, 13, 14, 15] (scalar expands to match array)
print(matrix * 2)        # Broadcasting: scalar 2 multiplies each element: [[2, 4, 6], [8, 10, 12]] (scalar expands to match matrix)
Pandas

Installation

bash
pip install pandas



Series


import pandas as pd  # Import pandas with alias 'pd' (convention)

Creating different types of Series (1D labeled arrays): various initialization methods
s1 = pd.Series([1, 2, 3, 4, 5])                        # Series with default numeric index (0, 1, 2, 3, 4)
s2 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])  # Series with custom string index
s3 = pd.Series({'a': 1, 'b': 2, 'c': 3})               # Series from dictionary (keys become index automatically)

Statistical operations on Series: built-in methods for data analysis
print(s1.mean())         # Calculate mean: 3.0 (arithmetic average of all values)
print(s1.std())          # Calculate standard deviation: 1.581... (measure of data variability)
print(s1.describe())     # Generate comprehensive summary statistics (count, mean, std, min, 25%, 50%, 75%, max)


DataFrames


Creating DataFrame from dictionary (most common method): structured data container
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],      # Column 1: names (string data)
    'Age': [25, 30, 35, 28],                           # Column 2: ages (integer data)
    'City': ['NYC', 'LA', 'Chicago', 'Boston'],        # Column 3: cities (string data)
    'Salary': [50000, 60000, 70000, 55000]             # Column 4: salaries (integer data)
}
df = pd.DataFrame(data)  # Create DataFrame from dictionary (keys become column names)

Basic DataFrame operations and information: essential methods for data exploration
print(df.head())         # Display first 5 rows of DataFrame (quick preview of data)
print(df.info())         # Show DataFrame structure and data types (memory usage and non-null counts)
print(df.describe())     # Generate statistical summary of numeric columns (count, mean, std, min, 25%, 50%, 75%, max)


Reading and Writing Data


Reading data from different file formats: pandas supports many data sources
df = pd.read_csv('data.csv')      # Read data from CSV file (most common format for tabular data)
df = pd.read_excel('data.xlsx')   # Read data from Excel file (requires openpyxl or xlrd package)
df = pd.read_json('data.json')    # Read data from JSON file (nested data structures)

Writing data to different file formats: export DataFrames to various formats
df.to_csv('output.csv', index=False)    # Save as CSV (index=False excludes row numbers from output)
df.to_excel('output.xlsx', index=False) # Save as Excel file (requires openpyxl package)
df.to_json('output.json')               # Save as JSON file (preserves data structure)


Data Manipulation


Data selection and filtering operations: accessing and subsetting data
names = df['Name']                    # Select single column (returns Series object)
subset = df[['Name', 'Age']]          # Select multiple columns (returns DataFrame with specified columns)

Filtering data based on conditions: boolean indexing for row selection
young_people = df[df['Age'] < 30]     # Filter rows where Age is less than 30 (boolean mask)
high_salary = df[df['Salary'] > 60000] # Filter rows where Salary is greater than 60000 (boolean mask)

Sorting data: arrange rows based on column values
df_sorted = df.sort_values('Age', ascending=False)  # Sort by Age in descending order (oldest first)

Grouping and aggregation: split data into groups and apply functions
grouped = df.groupby('City')['Salary'].mean()  # Group by City and calculate mean salary for each city


Handling Missing Data


Handling missing data (NaN values) in DataFrames: essential for data cleaning
print(df.isnull().sum())  # Count missing values in each column (returns Series with column names and counts)

Filling missing values with appropriate strategies: different approaches for different scenarios
df['Age'].fillna(df['Age'].mean(), inplace=True)  # Fill missing ages with mean age (inplace=True modifies original DataFrame)
df.dropna(inplace=True)  # Remove rows that contain any missing values (inplace=True modifies original DataFrame)


Merging and Joining


Creating another DataFrame for demonstration of merging operations: combining data from multiple sources
df2 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Eve'],           # Names (some overlap with df for demonstration)
    'Department': ['IT', 'HR', 'Marketing']    # Department information (new data to merge)
})

Different types of joins between DataFrames: SQL-like operations for combining data
merged = pd.merge(df, df2, on='Name', how='inner')      # Inner join: only rows that exist in both DataFrames
left_merged = pd.merge(df, df2, on='Name', how='left')  # Left join: all rows from left DataFrame (df), matching rows from right DataFrame (df2)


Matplotlib

Installation

bash
pip install matplotlib

Basic Plotting

import matplotlib.pyplot as plt  # Import matplotlib for plotting and visualization
import numpy as np               # Import numpy for numerical operations and array generation

Creating a simple line plot: basic visualization example
x = np.linspace(0, 10, 100)      # Create 100 evenly spaced points from 0 to 10 (x-axis values)
y = np.sin(x)                    # Calculate sine values for each x point (y-axis values)plt.plot(x, y)                   # Create line plot of x vs y (connects points with lines)
plt.title('Sine Wave')           # Set plot title (appears at top of figure)
plt.xlabel('x')                  # Set x-axis label (horizontal axis description)
plt.ylabel('sin(x)')             # Set y-axis label (vertical axis description)
plt.grid(True)                   # Add grid lines to plot (helps with reading values)
plt.show()                       # Display the plot (opens in new window or notebook cell)

Multiple Plots

Creating multiple subplots in a single figure: organize multiple plots in one window
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))  # Create 1 row, 2 columns of subplots with specified figure size

First subplot: sine wave
ax1.plot(x, np.sin(x))           # Plot sine function on first subplot (ax1 is the first subplot object)
ax1.set_title('Sine Wave')       # Set title for first subplot (appears above the plot)
ax1.grid(True)                   # Add grid to first subplot (horizontal and vertical reference lines)

Second subplot: cosine wave
ax2.plot(x, np.cos(x))           # Plot cosine function on second subplot (ax2 is the second subplot object)
ax2.set_title('Cosine Wave')     # Set title for second subplot (appears above the plot)
ax2.grid(True)                   # Add grid to second subplot (horizontal and vertical reference lines)plt.tight_layout()               # Automatically adjust spacing between subplots (prevents overlap)
plt.show()                       # Display the figure with both subplots (opens in new window or notebook cell)

Different Plot Types

Different types of plots for various data visualization needs

Scatter plot: shows relationship between two variables
x = np.random.randn(100)          # Generate 100 random x values
y = np.random.randn(100)          # Generate 100 random y values
plt.scatter(x, y, alpha=0.6)      # Create scatter plot with transparency
plt.title('Scatter Plot')         # Set plot title
plt.show()                        # Display plot

Bar plot: shows categorical data
categories = ['A', 'B', 'C', 'D']  # Category labels
values = [4, 3, 2, 1]             # Corresponding values
plt.bar(categories, values)        # Create bar plot
plt.title('Bar Plot')             # Set plot title
plt.show()                        # Display plot

Histogram: shows distribution of data
data = np.random.randn(1000)      # Generate 1000 random values
plt.hist(data, bins=30, alpha=0.7) # Create histogram with 30 bins and transparency
plt.title('Histogram')            # Set plot title
plt.show()                        # Display plot

Pie chart: shows proportions of a whole
sizes = [30, 20, 25, 15, 10]      # Values representing proportions
labels = ['A', 'B', 'C', 'D', 'E'] # Labels for each slice
plt.pie(sizes, labels=labels, autopct='%1.1f%%')  # Create pie chart with percentage labels
plt.title('Pie Chart')            # Set plot title
plt.show()                        # Display plot

Plotting with Pandas

Using pandas DataFrame
df = pd.DataFrame({
    'x': np.linspace(0, 10, 100),
    'y1': np.sin(np.linspace(0, 10, 100)),
    'y2': np.cos(np.linspace(0, 10, 100))
})

Line plot
df.plot(x='x', y=['y1', 'y2'])
plt.title('Sine and Cosine Waves')
plt.show()

Scatter plot
df.plot.scatter(x='x', y='y1')
plt.title('Scatter Plot')
plt.show()

Customizing Plots

Setting style
plt.style.use('seaborn')

Creating figure with custom size
fig, ax = plt.subplots(figsize=(10, 6))

Plotting with custom colors and styles
ax.plot(x, np.sin(x), color='blue', linewidth=2, label='sin(x)')
ax.plot(x, np.cos(x), color='red', linewidth=2, linestyle='--', label='cos(x)')

Customizing appearance
ax.set_title('Trigonometric Functions', fontsize=16, fontweight='bold')
ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.legend(fontsize=12)
ax.grid(True, alpha=0.3)

Setting limits
ax.set_xlim(0, 10)
ax.set_ylim(-1.5, 1.5)plt.tight_layout()
plt.show()

Saving Plots

Save as PNG
plt.savefig('plot.png', dpi=300, bbox_inches='tight')

Save as PDF
plt.savefig('plot.pdf', bbox_inches='tight')

Save as SVG
plt.savefig('plot.svg', bbox_inches='tight')

Integration Example

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Generate sample data
np.random.seed(42)
data = {
    'x': np.random.randn(1000),
    'y': np.random.randn(1000),
    'category': np.random.choice(['A', 'B', 'C'], 1000)
}
df = pd.DataFrame(data)

Create visualization
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(12, 10))

Scatter plot
ax1.scatter(df['x'], df['y'], alpha=0.6)
ax1.set_title('Scatter Plot')

Histogram
ax2.hist(df['x'], bins=30, alpha=0.7)
ax2.set_title('Histogram of X')

Box plot by category
df.boxplot(column='y', by='category', ax=ax3)
ax3.set_title('Box Plot by Category')

Correlation heatmap
correlation = df[['x', 'y']].corr()
im = ax4.imshow(correlation, cmap='coolwarm', aspect='auto')
ax4.set_xticks([0, 1])
ax4.set_yticks([0, 1])
ax4.set_xticklabels(['x', 'y'])
ax4.set_yticklabels(['x', 'y'])
ax4.set_title('Correlation Matrix')plt.colorbar(im, ax=ax4)
plt.tight_layout()
plt.show()

--- Previous Chapter | Next Chapter

Quick Access

Complete Cheat Sheet

Best Practices

Security Guide

Testing Guide

Performance Tips

Deployment Guide

Complete Navigation

Getting Started

Core Programming

File & Web Development

Testing & Debugging

Performance & Best Practices

Advanced Topics

Deployment & Security

Quick Reference

Project Stats

Popular Libraries (NumPy, Pandas, Matplotlib)

NumPy

Installation

Basic Operations

Creating different types of arrays: various methods to initialize NumPy arrays

Vectorized operations: apply operations to entire arrays at once (much faster than loops)

Multi-dimensional Arrays

Multi-dimensional arrays: working with 2D and 3D data structures

Creating special matrices for different purposes: common matrix initialization patterns

Indexing and slicing multi-dimensional arrays: accessing specific elements and subarrays

Mathematical Operations

Basic statistical operations on arrays: common mathematical functions

Matrix operations: working with 2D arrays for linear algebra

Broadcasting

Broadcasting: NumPy's ability to perform operations on arrays of different shapes automatically

Pandas

Installation

Series

Creating different types of Series (1D labeled arrays): various initialization methods

Statistical operations on Series: built-in methods for data analysis

DataFrames

Creating DataFrame from dictionary (most common method): structured data container

Basic DataFrame operations and information: essential methods for data exploration

Reading and Writing Data

Reading data from different file formats: pandas supports many data sources

Writing data to different file formats: export DataFrames to various formats

Data Manipulation

Data selection and filtering operations: accessing and subsetting data

Filtering data based on conditions: boolean indexing for row selection

Sorting data: arrange rows based on column values

Grouping and aggregation: split data into groups and apply functions

Handling Missing Data

Handling missing data (NaN values) in DataFrames: essential for data cleaning

Filling missing values with appropriate strategies: different approaches for different scenarios

Merging and Joining

Creating another DataFrame for demonstration of merging operations: combining data from multiple sources

Different types of joins between DataFrames: SQL-like operations for combining data

Matplotlib

Installation

Basic Plotting

Creating a simple line plot: basic visualization example

Multiple Plots

Creating multiple subplots in a single figure: organize multiple plots in one window

First subplot: sine wave

Second subplot: cosine wave

Different Plot Types

Different types of plots for various data visualization needs

Scatter plot: shows relationship between two variables

Bar plot: shows categorical data

Histogram: shows distribution of data

Pie chart: shows proportions of a whole

Plotting with Pandas

Using pandas DataFrame

Line plot

Scatter plot

Customizing Plots

Setting style

Creating figure with custom size

Plotting with custom colors and styles

Customizing appearance

Setting limits

Saving Plots

Save as PNG

Save as PDF

Save as SVG

Integration Example

Generate sample data

Create visualization

Scatter plot

Histogram