1. General and Utility Functions
- pd.read_csv() - Read a CSV file into a DataFrame.
- pd.read_excel() - Read an Excel file.
- pd.read_json() - Read JSON data.
- pd.DataFrame() - Create a new DataFrame.
- pd.Series() - Create a new Series.
2. Inspection and Information
- .head(n) - Display the first n rows.
- .tail(n) - Display the last n rows.
- .info() - Get a concise summary of the DataFrame.
- .describe() - Generate descriptive statistics.
- .shape - Get the dimensions of the DataFrame.
- .columns - Get or set column labels.
- .index - Get or set index labels.
- .dtypes - Display the data types of columns.
- .memory_usage() - Memory usage of the DataFrame.
3. Selection and Indexing
- .loc[] - Access rows and columns by labels.
- .iloc[] - Access rows and columns by integer location.
- .at[] - Access single values by label.
- .iat[] - Access single values by integer location.
- .set_index() - Set the DataFrame index.
- .reset_index() - Reset the index to default.
- .sort_index() - Sort the DataFrame by index.
4. Data Manipulation
- .rename() - Rename columns or index labels.
- .drop() - Drop rows or columns.
- .append() - Append rows to the DataFrame (deprecated in pandas 1.4 and removed in 2.0; use pd.concat() instead).
- .merge() - Merge DataFrames.
- .join() - Join DataFrames on indices.
- pd.concat() - Concatenate DataFrames.
- .pivot() - Pivot data into a new DataFrame.
- .pivot_table() - Create a pivot table with aggregation.
- .melt() - Unpivot a DataFrame from wide to long format.
- .replace() - Replace values.
- .fillna() - Fill missing values.
- .dropna() - Drop rows/columns with missing values.
5. Aggregation and Grouping
- .groupby() - Group by values and perform aggregation.
- .agg() - Aggregate data using functions.
- .count() - Count non-NA/null values.
- .sum() - Sum of values.
- .mean() - Mean of values.
- .median() - Median of values.
- .min() - Minimum value.
- .max() - Maximum value.
- .std() - Standard deviation.
- .var() - Variance.
- .mode() - Most frequent value.
6. Sorting and Ranking
- .sort_values() - Sort by values.
- .rank() - Rank values in a DataFrame.
7. Data Cleaning
- .isna() / .isnull() - Detect missing values.
- .notna() / .notnull() - Detect non-missing values.
- .astype() - Change the data type of a column.
- .duplicated() - Identify duplicate rows.
- .drop_duplicates() - Remove duplicate rows.
8. Time Series
- pd.to_datetime() - Convert values to datetime.
- .dt - Access datetime properties of a Series (e.g., .dt.year).
- .resample() - Resample time series data to a new frequency.
- .rolling() - Rolling window calculations.
- .expanding() - Expanding window calculations.
9. Statistical and Mathematical Operations
- .corr() - Compute the correlation matrix.
- .cov() - Compute the covariance matrix.
- .cumsum() - Cumulative sum.
- .cumprod() - Cumulative product.
- .diff() - Difference between consecutive elements.
- .clip() - Clip values to within a specified range.
10. File I/O
- .to_csv() - Write a DataFrame to a CSV file.
- .to_excel() - Write to an Excel file.
- .to_json() - Write to a JSON file.
- .to_sql() - Write to a SQL database.
- .to_pickle() - Serialize to a pickle file.
Here’s a more detailed explanation of Pandas commands with examples:
1. General and Utility Functions
- Reading a CSV file
import pandas as pd
# Create and read CSV
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
df.to_csv('data.csv', index=False)
# Read the CSV file
df = pd.read_csv('data.csv')
print(df)
- Creating a DataFrame
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)
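The list above also mentions pd.Series(), which has no example yet. A minimal sketch, using made-up sample values, of a one-dimensional Series with explicit index labels:

```python
import pandas as pd

# Illustrative sample data: a Series is a 1-D labeled array
ages = pd.Series([25, 30, 35], index=['Alice', 'Bob', 'Charlie'], name='Age')
print(ages)

# Values are looked up by index label, much like a dictionary
print(ages['Bob'])  # 30
```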
2. Inspection and Information
- View first few rows
print(df.head())  # Default shows 5 rows
- Get DataFrame information
df.info()  # info() prints the summary itself and returns None, so no print() is needed
- View basic statistics
print(df.describe())
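The remaining inspection attributes from the list can be tried on a small frame. This sketch uses hypothetical sample data; deep=True is passed to memory_usage() so that string contents are counted, not just object pointers:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

print(df.tail(2))           # last 2 rows
print(df.shape)             # (rows, columns) -> (3, 2)
print(df.dtypes)            # data type of each column
print(df.columns.tolist())  # column labels as a plain list

mem = df.memory_usage(deep=True)  # bytes used by the index and each column
print(mem)
```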
3. Selection and Indexing
- Select a single column
print(df['Name'])
- Select multiple columns
print(df[['Name', 'Age']])
- Access rows by label (loc)
# Access the row whose index label is 0 (the first row under the default index)
print(df.loc[0])
- Access rows by integer position (iloc)
# Access the second row (position 1)
print(df.iloc[1])
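The scalar accessors and index helpers from the list (.at[], .iat[], .set_index(), .sort_index(), .reset_index()) can be sketched together on illustrative data:

```python
import pandas as pd

# Illustrative sample data with a default 0..n-1 index
df = pd.DataFrame({'Name': ['Charlie', 'Alice', 'Bob'], 'Age': [35, 25, 30]})

print(df.at[0, 'Name'])  # single value by row label and column label -> Charlie
print(df.iat[1, 1])      # single value by integer position -> 25

# Use Name as the index, then sort rows alphabetically by that index
by_name = df.set_index('Name').sort_index()
print(by_name.loc['Bob', 'Age'])  # label lookup on the new index -> 30

# Move the index back into a column and restore a default integer index
restored = by_name.reset_index()
print(restored)
```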
4. Data Manipulation
- Add a new column
df['Salary'] = [50000, 60000]
print(df)
- Rename columns
df.rename(columns={'Age': 'Years'}, inplace=True)
print(df)
- Drop a column
df.drop('Salary', axis=1, inplace=True)
print(df)
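Combining DataFrames (.merge(), .join(), pd.concat()) is listed above but not demonstrated. A minimal sketch with hypothetical people/salary tables; pd.concat() is shown as the replacement for the removed DataFrame.append():

```python
import pandas as pd

# Hypothetical sample tables that share a Name column
people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
salaries = pd.DataFrame({'Name': ['Alice', 'Bob', 'Dana'], 'Salary': [50000, 60000, 70000]})

# merge: SQL-style join on a column; how='inner' keeps only names found in both
merged = people.merge(salaries, on='Name', how='inner')
print(merged)

# join: aligns on the index, so Name is made the index on both sides first
joined = people.set_index('Name').join(salaries.set_index('Name'), how='left')
print(joined)  # Charlie has no salary row, so his Salary is NaN

# concat: stack frames vertically; ignore_index renumbers rows 0..n-1
new_rows = pd.DataFrame({'Name': ['Eve'], 'Age': [28]})
combined = pd.concat([people, new_rows], ignore_index=True)
print(combined)
```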
5. Aggregation and Grouping
- Group by and compute mean
data = {'Name': ['Alice', 'Bob', 'Alice'], 'Score': [85, 95, 75]}
df = pd.DataFrame(data)
grouped = df.groupby('Name')['Score'].mean()
print(grouped)
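Beyond a single mean, .agg() can apply several of the listed aggregations per group in one call. A sketch reusing the same illustrative scores:

```python
import pandas as pd

# Same illustrative data as the groupby example
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Alice'], 'Score': [85, 95, 75]})

# One row per Name, one column per aggregation
summary = df.groupby('Name')['Score'].agg(['count', 'sum', 'mean', 'min', 'max'])
print(summary)

# Whole-column statistics without grouping
print(df['Score'].median())  # 85.0
print(df['Score'].std())     # sample standard deviation (ddof=1)
```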
6. Sorting and Ranking
- Sort rows by column values
data = {'Name': ['Charlie', 'Alice', 'Bob'], 'Age': [35, 25, 30]}
df = pd.DataFrame(data)
df.sort_values(by='Age', inplace=True)
print(df)
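.rank() is listed without an example. This sketch, on hypothetical scores, ranks from highest to lowest; method='min' is one of several tie-breaking options and gives tied values the lowest rank in their group:

```python
import pandas as pd

# Hypothetical sample data with a tie at 85
df = pd.DataFrame({'Name': ['Charlie', 'Alice', 'Bob', 'Dana'], 'Score': [90, 85, 95, 85]})

# Highest score gets rank 1; both 85s share rank 3
df['Rank'] = df['Score'].rank(ascending=False, method='min')
print(df.sort_values('Rank'))
```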
7. Data Cleaning
- Handle missing values
data = {'Name': ['Alice', 'Bob', None], 'Age': [25, None, 35]}
df = pd.DataFrame(data)
# Detect missing values
print(df.isna())
# Fill missing values
df['Age'] = df['Age'].fillna(0)  # assign the result back; chained inplace=True is unreliable in recent pandas
# Drop rows with missing values
df.dropna(inplace=True)
print(df)
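The other cleaning helpers from the list (.duplicated(), .drop_duplicates(), .astype(), .replace()) can be sketched on a small, made-up table where numbers arrive as strings and a placeholder marks an unknown city:

```python
import pandas as pd

# Made-up sample data: row 2 repeats row 0, Age is stored as text
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Charlie'],
    'Age': ['25', '30', '25', '35'],
    'City': ['NY', 'LA', 'NY', 'N/A'],
})

dupes = df.duplicated()  # True for rows that repeat an earlier row
print(dupes)
df = df.drop_duplicates()  # keeps the first occurrence of each row

df['Age'] = df['Age'].astype(int)                 # text -> integer
df['City'] = df['City'].replace('N/A', 'Unknown')  # swap a placeholder value
print(df)
```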
8. Time Series
- Convert to datetime
data = {'Date': ['2023-01-01', '2023-01-02'], 'Value': [100, 200]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
print(df)
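Once a column holds datetimes, the .dt accessor and window functions from the list apply. A sketch on hypothetical daily values; the window size of 2 is an arbitrary choice for illustration:

```python
import pandas as pd

# Hypothetical daily values
df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04']),
    'Value': [100, 200, 300, 400],
})

# .dt exposes datetime properties of a datetime Series
df['Weekday'] = df['Date'].dt.day_name()

# rolling(2): mean of the current row and the previous one (first row is NaN)
df['Rolling_Mean'] = df['Value'].rolling(window=2).mean()

# expanding(): mean of all rows up to and including the current one
df['Expanding_Mean'] = df['Value'].expanding().mean()
print(df)
```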
9. Statistical and Mathematical Operations
- Compute correlation matrix
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
print(df.corr())
- Cumulative sum
df['Cumulative_Sum'] = df['A'].cumsum()
print(df)
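The other operations listed in this section (.cov(), .cumprod(), .diff(), .clip()) can be sketched on illustrative columns where B is exactly twice A:

```python
import pandas as pd

# Illustrative data: B = 2 * A
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 4, 6, 8]})

cov = df.cov()  # sample covariance matrix (ddof=1)
print(cov)

df['A_cumprod'] = df['A'].cumprod()               # 1, 2, 6, 24
df['B_diff'] = df['B'].diff()                     # NaN, 2.0, 2.0, 2.0
df['A_clipped'] = df['A'].clip(lower=2, upper=3)  # 2, 2, 3, 3
print(df)
```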
10. File I/O
- Save and read CSV
df.to_csv('output.csv', index=False)
df_read = pd.read_csv('output.csv')
print(df_read)
Here's an expanded explanation of each Pandas command, with examples:
1. Reading and Writing Data
Command: pd.read_csv()
Explanation: Reads a CSV file and creates a DataFrame.
import pandas as pd
# Read CSV file
df = pd.read_csv('data.csv')  # Ensure 'data.csv' exists
print(df.head())  # Display the first 5 rows
Command: .to_csv()
Explanation: Writes a DataFrame to a CSV file.
df.to_csv('output.csv', index=False)
2. Creating a DataFrame
Command: pd.DataFrame()
Explanation: Creates a DataFrame from a dictionary.
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)
3. Viewing and Inspecting Data
Command: .head()
Explanation: Returns the first n rows of the DataFrame (5 by default).
print(df.head(3))  # First 3 rows
Command: .info()
Explanation: Displays summary information about the DataFrame.
df.info()  # prints the summary directly; wrapping it in print() would also print None
Command: .describe()
Explanation: Provides summary statistics for numeric columns.
print(df.describe())
4. Selecting and Indexing
Command: df[column_name]
Explanation: Selects a single column.
print(df['Name'])  # Select the 'Name' column
Command: .loc[]
Explanation: Select rows and columns by labels.
print(df.loc[0])  # Select the row with index label 0 (the first row under the default index)
Command: .iloc[]
Explanation: Select rows and columns by position.
print(df.iloc[1, 0])  # Second row, first column
5. Manipulating Data
Command: .rename()
Explanation: Renames columns or indices.
df.rename(columns={'Age': 'Years'}, inplace=True)
print(df)
Command: .drop()
Explanation: Drops rows or columns.
df.drop('Years', axis=1, inplace=True)  # Drop column 'Years'
print(df)
Command: df['new_column']
Explanation: Adds or modifies a column.
df['Salary'] = [50000, 60000]
print(df)
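Reshaping with .pivot_table() and .melt() is listed in the first section but not shown. A sketch on a hypothetical sales table: pivot_table() aggregates into a wide layout, and melt() turns that wide table back into long format:

```python
import pandas as pd

# Hypothetical sales records; East/A appears twice
sales = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'East'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Units': [10, 5, 8, 7, 4],
})

# One row per Region, one column per Product, summing Units
table = sales.pivot_table(index='Region', columns='Product', values='Units', aggfunc='sum')
print(table)

# Back to long format: one row per (Region, Product) pair
long = table.reset_index().melt(id_vars='Region', var_name='Product', value_name='Units')
print(long)
```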
6. Sorting
Command: .sort_values()
Explanation: Sorts rows by a column.
df.sort_values(by='Salary', ascending=False, inplace=True)
print(df)
7. Handling Missing Data
Command: .isna()
Explanation: Detects missing values.
print(df.isna())
Command: .fillna()
Explanation: Replaces missing values.
df['Salary'] = df['Salary'].fillna(0)  # fill missing salaries with 0 and assign the result back
Command: .dropna()
Explanation: Drops rows with missing values.
df.dropna(inplace=True)
8. Aggregation and Grouping
Command: .groupby()
Explanation: Groups rows and performs aggregate functions.
data = {'Name': ['Alice', 'Bob', 'Alice'], 'Score': [85, 95, 75]}
df = pd.DataFrame(data)
grouped = df.groupby('Name')['Score'].mean()
print(grouped)
9. Time Series
Command: pd.to_datetime()
Explanation: Converts strings to datetime objects.
df['Date'] = pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03'])  # one date per row
print(df)
Command: .resample()
Explanation: Resamples time series data.
time_df = df.set_index('Date')['Score'].resample('D').sum()  # daily totals of the numeric Score column
print(time_df)
10. Statistical Operations
Command: .sum()
Explanation: Calculates the sum of a column.
print(df['Score'].sum())
Command: .mean()
Explanation: Calculates the mean of a column.
print(df['Score'].mean())
Command: .corr()
Explanation: Computes the correlation matrix.
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
print(df.corr())
11. Saving Data
Command: .to_csv()
Explanation: Writes a DataFrame to a CSV file.
df.to_csv('output.csv', index=False)
Command: .to_excel()
Explanation: Writes a DataFrame to an Excel file (requires an Excel engine such as openpyxl to be installed).
df.to_excel('output.xlsx', index=False)
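The first section also lists .to_json(), .to_pickle() and .to_sql(). A minimal round-trip sketch with illustrative data and file names; it assumes an in-memory SQLite database via the standard sqlite3 module, which pandas accepts directly (other databases generally need SQLAlchemy):

```python
import sqlite3

import pandas as pd

# Illustrative data; output.json, output.pkl and the 'people' table are example names
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# JSON: orient='records' writes a list of {column: value} objects
df.to_json('output.json', orient='records')
from_json = pd.read_json('output.json', orient='records')

# Pickle: preserves dtypes and the index exactly
df.to_pickle('output.pkl')
from_pickle = pd.read_pickle('output.pkl')

# SQL: write to an in-memory SQLite table, then query it back
conn = sqlite3.connect(':memory:')
df.to_sql('people', conn, index=False)
from_sql = pd.read_sql('SELECT * FROM people', conn)
conn.close()
print(from_sql)
```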