1. General and Utility Functions

  • pd.read_csv() - Read a CSV file into a DataFrame.
  • pd.read_excel() - Read an Excel file.
  • pd.read_json() - Read JSON data.
  • pd.DataFrame() - Create a new DataFrame.
  • pd.Series() - Create a new Series.
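pd.Series() and pd.read_json() are listed without examples elsewhere in this document. Below is a minimal sketch with made-up values. The JSON text is wrapped in io.StringIO because recent pandas versions deprecate passing a literal JSON string to read_json():

```python
from io import StringIO

import pandas as pd

# A labeled one-dimensional Series (sample values are illustrative)
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'], name='points')
print(s['b'])  # 20

# JSON records parsed into a DataFrame; orient='records' means a list of row objects
json_text = '[{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]'
df = pd.read_json(StringIO(json_text), orient='records')
print(df)
```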

2. Inspection and Information

  • .head(n) - Display the first n rows.
  • .tail(n) - Display the last n rows.
  • .info() - Get a concise summary of the DataFrame.
  • .describe() - Generate descriptive statistics.
  • .shape - Get dimensions of the DataFrame.
  • .columns - Get or set column labels.
  • .index - Get or set index labels.
  • .dtypes - Display data types of columns.
  • .memory_usage() - Memory usage of each column, in bytes.
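Several attributes in this list are not demonstrated later. Here is a short sketch on a small illustrative DataFrame. Passing deep=True to .memory_usage() makes pandas measure the actual contents of object (string) columns rather than only the pointer size:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

print(df.shape)    # (3, 2): rows, columns
print(df.dtypes)   # data type of each column
print(df.tail(2))  # last two rows (Bob, Charlie)

# Bytes used per column; the first entry, 'Index', is the row index
usage = df.memory_usage(deep=True)
print(usage)
```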

3. Selection and Indexing

  • .loc[] - Access rows and columns by labels.
  • .iloc[] - Access rows and columns by integer location.
  • .at[] - Access single values by label.
  • .iat[] - Access single values by integer location.
  • .set_index() - Set the DataFrame index.
  • .reset_index() - Reset the index to default.
  • .sort_index() - Sort the DataFrame by index.
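The scalar accessors and index commands work well together. This sketch uses invented data and makes the Name column the index so that .at[] can look values up by name:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Charlie', 'Alice', 'Bob'], 'Age': [35, 25, 30]})

# Use the 'Name' column as row labels
indexed = df.set_index('Name')

print(indexed.at['Alice', 'Age'])  # 25: single value by row/column label
print(indexed.iat[0, 0])           # 35: single value by position (row 0 is Charlie)

# Order rows alphabetically by their index labels
by_name = indexed.sort_index()
print(by_name)

# Move 'Name' back into a regular column with a 0..n-1 index
restored = by_name.reset_index()
print(restored)
```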

4. Data Manipulation

  • .rename() - Rename columns or index.
  • .drop() - Drop rows or columns.
  • .append() - Removed in pandas 2.0; use pd.concat() to add rows.
  • .merge() - Merge DataFrames.
  • .join() - Join DataFrames on indices.
  • pd.concat() - Concatenate DataFrames along rows or columns.
  • .pivot() - Pivot data into a new DataFrame.
  • .pivot_table() - Create a pivot table with aggregation.
  • .melt() - Unpivot a DataFrame.
  • .replace() - Replace values.
  • .fillna() - Fill missing values.
  • .dropna() - Drop rows/columns with missing values.
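The reshaping commands are easiest to see on a small long-format table. The sales data below is invented for illustration. Note that .pivot() requires each index/column pair to be unique, whereas .pivot_table() aggregates duplicates with the function you pass:

```python
import pandas as pd

# Long-format sales records (illustrative data)
sales = pd.DataFrame({
    'Month': ['Jan', 'Jan', 'Feb', 'Feb'],
    'Region': ['East', 'West', 'East', 'West'],
    'Units': [10, 20, 30, 40],
})

# .pivot(): one row per Month, one column per Region
wide = sales.pivot(index='Month', columns='Region', values='Units')
print(wide)

# .melt(): unpivot the wide table back to one row per Month/Region pair
long_again = wide.reset_index().melt(id_vars='Month', var_name='Region', value_name='Units')
print(long_again)

# .pivot_table(): aggregate, here total Units per Region
totals = sales.pivot_table(index='Region', values='Units', aggfunc='sum')
print(totals)

# .replace(): per-column mapping of old values to new ones
short = sales.replace({'Region': {'East': 'E', 'West': 'W'}})
print(short)
```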

5. Aggregation and Grouping

  • .groupby() - Group by values and perform aggregation.
  • .agg() - Aggregate data using functions.
  • .count() - Count non-NA/null values.
  • .sum() - Sum of values.
  • .mean() - Mean of values.
  • .median() - Median of values.
  • .min() - Minimum value.
  • .max() - Maximum value.
  • .std() - Standard deviation.
  • .var() - Variance.
  • .mode() - Most frequent value(s), returned as a Series.
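Most of these reductions skip missing values by default. The small sketch below includes one NaN. By default .var() and .std() compute the sample statistics (ddof=1), and .mode() returns a Series because several values can tie for most frequent:

```python
import numpy as np
import pandas as pd

scores = pd.Series([80, 90, 70, 70, np.nan])

print(scores.count())   # 4: the NaN is not counted
print(scores.median())  # 75.0
print(scores.var())     # sample variance (ddof=1), about 91.67
print(scores.std())     # square root of the variance
print(scores.mode())    # a Series containing 70.0

# With a tie, .mode() returns every most-frequent value
print(pd.Series([1, 1, 2, 2]).mode())  # 1 and 2
```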

6. Sorting and Ranking

  • .sort_values() - Sort by values.
  • .rank() - Rank values in a DataFrame.
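.rank() has no example later in the document. This sketch uses invented scores with a tie to show how the method parameter changes the ranks that tied values receive:

```python
import pandas as pd

scores = pd.Series([85, 95, 75, 95], index=['Alice', 'Bob', 'Charlie', 'Dana'])

# Default: ascending, ties share the average of their positions (Bob and Dana get 3.5)
print(scores.rank())

# Highest score first; method='min' gives tied values the lowest shared rank
print(scores.rank(ascending=False, method='min'))
```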

7. Data Cleaning

  • .isna() / .isnull() - Detect missing values.
  • .notna() / .notnull() - Detect non-missing values.
  • .astype() - Change data type of a column.
  • .duplicated() - Identify duplicate rows.
  • .drop_duplicates() - Remove duplicate rows.
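Type conversion and de-duplication often go together when cleaning imported data. This is a sketch on invented rows where ages arrived as strings and one row is repeated:

```python
import pandas as pd

# Ages stored as strings, and one repeated row (illustrative data)
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Alice'], 'Age': ['25', '30', '25']})

# Convert the column from strings to integers
df['Age'] = df['Age'].astype(int)

# True marks a row that repeats an earlier one
print(df.duplicated())  # False, False, True

# Keep the first occurrence of each duplicate group
deduped = df.drop_duplicates()
print(deduped)
```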

8. Time Series

  • pd.to_datetime() - Convert to datetime.
  • .dt - Access datetime attributes.
  • .resample() - Resample time series data.
  • .rolling() - Rolling window calculations.
  • .expanding() - Expanding window calculations.
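Here is a sketch of the window functions and the .dt accessor on made-up daily data. A rolling window yields NaN until it holds window values, while an expanding window grows to include every value seen so far:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=pd.date_range('2023-01-01', periods=4, freq='D'))

print(s.rolling(window=2).mean())  # NaN, 1.5, 2.5, 3.5
print(s.expanding().sum())         # 1, 3, 6, 10

# .dt exposes datetime fields of a datetime Series
dates = pd.Series(pd.to_datetime(['2023-01-15', '2023-02-20']))
print(dates.dt.month)       # 1, 2
print(dates.dt.day_name())  # Sunday, Monday
```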

9. Statistical and Mathematical Operations

  • .corr() - Compute correlation matrix.
  • .cov() - Compute covariance matrix.
  • .cumsum() - Cumulative sum.
  • .cumprod() - Cumulative product.
  • .diff() - Difference between consecutive elements.
  • .clip() - Clip values to within a specified range.
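The element-wise operations in this list can be checked by hand on a short invented Series. The covariance matrix at the end uses a column B that is exactly twice A:

```python
import pandas as pd

s = pd.Series([2, 5, 3, 10])

print(s.cumprod())               # 2, 10, 30, 300
print(s.diff())                  # NaN, 3.0, -2.0, 7.0 (each value minus the previous one)
print(s.clip(lower=3, upper=8))  # 3, 5, 3, 8

# Covariance matrix: diagonal entries are each column's (sample) variance
df = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 4, 6]})
print(df.cov())
```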

10. File I/O

  • .to_csv() - Write a DataFrame to a CSV file.
  • .to_excel() - Write to an Excel file.
  • .to_json() - Write to a JSON file.
  • .to_sql() - Write to a SQL database.
  • .to_pickle() - Serialize to a pickle file.
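Only .to_csv() and .to_excel() get examples later. The sketch below shows the other writers on a tiny illustrative DataFrame. It assumes an in-memory SQLite database via Python's built-in sqlite3 module; pandas accepts a sqlite3 connection directly, while other databases are typically reached through SQLAlchemy. The table name people and the file name data.pkl are arbitrary:

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# With no path argument, to_json returns the JSON as a string
json_text = df.to_json(orient='records')
print(json_text)

# Write to an in-memory SQLite table, then read it back
conn = sqlite3.connect(':memory:')
df.to_sql('people', conn, index=False)
from_db = pd.read_sql('SELECT * FROM people', conn)
conn.close()

# A pickle round trip preserves dtypes and index exactly
df.to_pickle('data.pkl')
restored = pd.read_pickle('data.pkl')
print(restored.equals(df))  # True
```

Only read pickle files from sources you trust, since unpickling can execute arbitrary code.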

Here’s a more detailed explanation of pandas commands, with examples:

1. General and Utility Functions

  • Reading a CSV file
import pandas as pd

# Create and read CSV
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
df.to_csv('data.csv', index=False)

# Read the CSV file
df = pd.read_csv('data.csv')
print(df)
  • Creating a DataFrame
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)

2. Inspection and Information

  • View first few rows
print(df.head())  # Default shows 5 rows
  • Get DataFrame information
df.info()  # Prints the summary itself and returns None, so no print() is needed
  • View basic statistics
print(df.describe())

3. Selection and Indexing

  • Select a single column
print(df['Name'])
  • Select multiple columns
print(df[['Name', 'Age']])
  • Access rows by label (loc)
# Row with index label 0 (the first row under the default 0..n-1 index)
print(df.loc[0])
  • Access rows by integer position (iloc)
# Access second row
print(df.iloc[1])

4. Data Manipulation

  • Add a new column
df['Salary'] = [50000, 60000]
print(df)
  • Rename columns
df.rename(columns={'Age': 'Years'}, inplace=True)
print(df)
  • Drop a column
df.drop('Salary', axis=1, inplace=True)
print(df)
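The combining commands from the summary list (.merge(), pd.concat(), .join()) are not shown above. Here is a sketch on two small illustrative tables. Because DataFrame.append() was removed in pandas 2.0, pd.concat() is the usual way to add rows:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
salaries = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]})

# Database-style join on a shared column
merged = people.merge(salaries, on='Name', how='inner')
print(merged)

# Add rows by concatenating; ignore_index renumbers the result 0..n-1
new_row = pd.DataFrame({'Name': ['Charlie'], 'Age': [35]})
stacked = pd.concat([people, new_row], ignore_index=True)
print(stacked)

# .join() aligns on the index, so set 'Name' as the index on both sides first
joined = people.set_index('Name').join(salaries.set_index('Name'))
print(joined)
```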

5. Aggregation and Grouping

  • Group by and compute mean
data = {'Name': ['Alice', 'Bob', 'Alice'], 'Score': [85, 95, 75]}
df = pd.DataFrame(data)
grouped = df.groupby('Name')['Score'].mean()
print(grouped)
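A group can also be summarized with several functions at once through .agg(). This sketch reuses the same kind of score data. Named aggregation, shown second, lets you choose the output column names; the names avg_score and best are arbitrary:

```python
import pandas as pd

scores = pd.DataFrame({'Name': ['Alice', 'Bob', 'Alice'], 'Score': [85, 95, 75]})

# Several aggregations at once: one column per function
summary = scores.groupby('Name')['Score'].agg(['count', 'mean', 'max'])
print(summary)

# Named aggregation: output_name=(column, function)
named = scores.groupby('Name').agg(avg_score=('Score', 'mean'), best=('Score', 'max'))
print(named)
```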

6. Sorting and Ranking

  • Sort rows by column values
data = {'Name': ['Charlie', 'Alice', 'Bob'], 'Age': [35, 25, 30]}
df = pd.DataFrame(data)
df.sort_values(by='Age', inplace=True)
print(df)

7. Data Cleaning

  • Handle missing values
data = {'Name': ['Alice', 'Bob', None], 'Age': [25, None, 35]}
df = pd.DataFrame(data)

# Detect missing values
print(df.isna())

# Fill missing values
df['Age'] = df['Age'].fillna(0)  # Assign back; inplace on a selected column is unreliable in pandas 2.x

# Drop rows with missing values
df.dropna(inplace=True)
print(df)
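Two common follow-ups to the example above are counting missing values per column and filling each column with its own default. Here is a sketch on the same kind of data; fillna() with a dict returns a new DataFrame:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', None], 'Age': [25, None, 35]})

# Number of missing values in each column
print(people.isna().sum())  # Name 1, Age 1

# A different fill value per column
filled = people.fillna({'Name': 'Unknown', 'Age': 0})
print(filled)
```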

8. Time Series

  • Convert to datetime
data = {'Date': ['2023-01-01', '2023-01-02'], 'Value': [100, 200]}
df = pd.DataFrame(data)

df['Date'] = pd.to_datetime(df['Date'])
print(df)

9. Statistical and Mathematical Operations

  • Compute correlation matrix
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
print(df.corr())
  • Cumulative sum
df['Cumulative_Sum'] = df['A'].cumsum()
print(df)

10. File I/O

  • Save and read CSV
df.to_csv('output.csv', index=False)
df_read = pd.read_csv('output.csv')
print(df_read)

Here's an expanded reference of pandas commands, each with an explanation and an example:

1. Reading and Writing Data

Command: pd.read_csv()

Explanation: Reads a CSV file and creates a DataFrame.

import pandas as pd

# Read CSV file
df = pd.read_csv('data.csv')  # Ensure 'data.csv' exists
print(df.head())  # Display the first 5 rows

Command: .to_csv()

Explanation: Writes a DataFrame to a CSV file.

df.to_csv('output.csv', index=False)

2. Creating a DataFrame

Command: pd.DataFrame()

Explanation: Creates a DataFrame from a dictionary.

data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)

3. Viewing and Inspecting Data

Command: .head()

Explanation: Returns the first few rows of the DataFrame.

print(df.head(3))  # First 3 rows

Command: .info()

Explanation: Displays summary information about the DataFrame.

df.info()  # Prints the summary itself and returns None

Command: .describe()

Explanation: Provides summary statistics for numeric columns.

print(df.describe())

4. Selecting and Indexing

Command: df[column_name]

Explanation: Selects a single column.

print(df['Name'])  # Select the 'Name' column

Command: .loc[]

Explanation: Select rows and columns by labels.

print(df.loc[0])  # Select the row labeled 0 (the first row with the default index)

Command: .iloc[]

Explanation: Select rows and columns by position.

print(df.iloc[1, 0])  # Second row, first column
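.loc[] and .iloc[] accept a row selector and a column selector together, and .loc[] also accepts a boolean mask. Here is a sketch on a separate illustrative DataFrame named people:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# Row label and column label together
print(people.loc[1, 'Name'])  # Bob

# A boolean mask selects rows; the second argument picks the column
print(people.loc[people['Age'] > 28, 'Name'])  # Bob, Charlie

# iloc uses positions, and slices exclude the end position
print(people.iloc[0:2, 1])  # Ages of the first two rows: 25, 30
```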

5. Manipulating Data

Command: .rename()

Explanation: Renames columns or indices.

df.rename(columns={'Age': 'Years'}, inplace=True)
print(df)

Command: .drop()

Explanation: Drops rows or columns.

df.drop('Years', axis=1, inplace=True)  # Drop column 'Years'
print(df)

Command: df['new_column']

Explanation: Adds or modifies a column.

df['Salary'] = [50000, 60000]
print(df)

6. Sorting

Command: .sort_values()

Explanation: Sorts rows by a column.

df.sort_values(by='Salary', ascending=False, inplace=True)
print(df)
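.sort_values() also takes a list of columns, with a matching list for ascending. Here is a sketch with invented staff data, sorted by department and then by salary within each department:

```python
import pandas as pd

staff = pd.DataFrame({
    'Dept': ['Sales', 'IT', 'Sales', 'IT'],
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Salary': [50000, 70000, 60000, 65000],
})

# Department A-Z, then salary high-to-low within each department
ordered = staff.sort_values(by=['Dept', 'Salary'], ascending=[True, False])
print(ordered)
```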

7. Handling Missing Data

Command: .isna()

Explanation: Detects missing values.

print(df.isna())

Command: .fillna()

Explanation: Replaces missing values.

df['Salary'] = df['Salary'].fillna(0)  # 'Age' was dropped above; assign back rather than using inplace

Command: .dropna()

Explanation: Drops rows with missing values.

df.dropna(inplace=True)

8. Aggregation and Grouping

Command: .groupby()

Explanation: Groups rows and performs aggregate functions.

data = {'Name': ['Alice', 'Bob', 'Alice'], 'Score': [85, 95, 75]}
df = pd.DataFrame(data)
grouped = df.groupby('Name')['Score'].mean()
print(grouped)

9. Time Series

Command: pd.to_datetime()

Explanation: Converts strings to datetime objects.

df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02'], 'Value': [100, 200]})
df['Date'] = pd.to_datetime(df['Date'])
print(df)

Command: .resample()

Explanation: Resamples time series data.

time_df = df.set_index('Date').resample('D').sum()
print(time_df)
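Resampling is most useful when moving to a coarser frequency. This sketch sums invented daily values into weekly totals. It relies on the Series having a DatetimeIndex, which .resample() needs unless a datetime column is passed with on=:

```python
import pandas as pd

# Six consecutive days starting Sunday 2023-01-01 (illustrative values)
daily = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('2023-01-01', periods=6, freq='D'))

# 'W' bins end on Sunday, and each bin is labeled by its end date
weekly = daily.resample('W').sum()
print(weekly)  # 2023-01-01 -> 1, 2023-01-08 -> 20
```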

10. Statistical Operations

Command: .sum()

Explanation: Calculates the sum of a column.

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
print(df['Age'].sum())  # 90

Command: .mean()

Explanation: Calculates the mean of a column.

print(df['Age'].mean())

Command: .corr()

Explanation: Computes the correlation matrix.

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
print(df.corr())

11. Saving Data

Command: .to_csv()

Explanation: Writes a DataFrame to a CSV file.

df.to_csv('output.csv', index=False)

Command: .to_excel()

Explanation: Writes a DataFrame to an Excel file.

df.to_excel('output.xlsx', index=False)  # Writing .xlsx requires an engine such as openpyxl