1. General and Utility Functions
pd.read_csv() - Read a CSV file into a DataFrame.
pd.read_excel() - Read an Excel file.
pd.read_json() - Read JSON data.
pd.DataFrame() - Create a new DataFrame.
pd.Series() - Create a new Series.
2. Inspection and Information
.head(n) - Display the first n rows.
.tail(n) - Display the last n rows.
.info() - Print a concise summary of the DataFrame.
.describe() - Generate descriptive statistics.
.shape - Get the dimensions of the DataFrame as (rows, columns).
.columns - Get or set column labels.
.index - Get or set index (row) labels.
.dtypes - Display the data type of each column.
.memory_usage() - Report the memory usage of each column.
3. Selection and Indexing
.loc[] - Access rows and columns by label.
.iloc[] - Access rows and columns by integer position.
.at[] - Access a single value by label.
.iat[] - Access a single value by integer position.
.set_index() - Set the DataFrame index.
.reset_index() - Reset the index to the default integer index.
.sort_index() - Sort the DataFrame by its index.
4. Data Manipulation
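.rename() - Rename columns or index labels.
.drop() - Drop rows or columns.
.append() - Append rows to the DataFrame (deprecated in pandas 1.4 and removed in 2.0; use pd.concat() instead).
.merge() - Merge DataFrames on key columns or indexes (database-style join).
.join() - Join DataFrames on their indexes.
pd.concat() - Concatenate DataFrames along rows or columns.
.pivot() - Reshape data into a new DataFrame using column values, without aggregation.
.pivot_table() - Create a pivot table with aggregation.
.melt() - Unpivot a DataFrame from wide to long format.
.replace() - Replace values.
.fillna() - Fill missing values.
.dropna() - Drop rows/columns with missing values.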
5. Aggregation and Grouping
.groupby() - Group by values and perform aggregation.
.agg() - Aggregate data using one or more functions.
.count() - Count non-NA/null values.
.sum() - Sum of values.
.mean() - Mean of values.
.median() - Median of values.
.min() - Minimum value.
.max() - Maximum value.
.std() - Standard deviation.
.var() - Variance.
.mode() - Most frequent value(s).
6. Sorting and Ranking
.sort_values() - Sort by values.
.rank() - Rank values in a DataFrame.
7. Data Cleaning
.isna()/.isnull() - Detect missing values.
.notna()/.notnull() - Detect non-missing values.
.astype() - Change the data type of a column.
.duplicated() - Identify duplicate rows.
.drop_duplicates() - Remove duplicate rows.
8. Time Series
pd.to_datetime() - Convert values to datetime.
.dt - Access datetime attributes of a Series (e.g. .dt.year).
.resample() - Resample time series data to a new frequency.
.rolling() - Rolling (moving) window calculations.
.expanding() - Expanding window calculations.
9. Statistical and Mathematical Operations
.corr() - Compute the correlation matrix.
.cov() - Compute the covariance matrix.
.cumsum() - Cumulative sum.
.cumprod() - Cumulative product.
.diff() - Difference between consecutive elements.
.clip() - Clip values to within a specified range.
10. File I/O
.to_csv() - Write a DataFrame to a CSV file.
.to_excel() - Write to an Excel file.
.to_json() - Write to a JSON file (or return a JSON string).
.to_sql() - Write to a SQL database table.
.to_pickle() - Serialize to a pickle file.
Here’s a more detailed explanation of Pandas commands with examples:
1. General and Utility Functions
- Reading a CSV file
import pandas as pd
# Create and read CSV
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
df.to_csv('data.csv', index=False)
# Read the CSV file
df = pd.read_csv('data.csv')
print(df)
- Creating a DataFrame
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)
2. Inspection and Information
- View first few rows
print(df.head()) # Default shows 5 rows
- Get DataFrame information
df.info() # Prints the summary itself; it returns None, so wrapping it in print() adds a stray "None"
- View basic statistics
print(df.describe())
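The remaining inspection attributes from the quick reference (.shape, .columns, .dtypes, .tail() and .memory_usage()) can be tried on a small DataFrame. This is a minimal sketch; the data is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

print(df.shape)              # (rows, columns)
print(df.columns.tolist())   # column labels as a plain Python list
print(df.dtypes)             # data type of each column
print(df.tail(2))            # last 2 rows
print(df.memory_usage(deep=True))  # bytes per column (plus the index)
```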
3. Selection and Indexing
- Select a single column
print(df['Name'])
- Select multiple columns
print(df[['Name', 'Age']])
- Access rows by labels (loc)
# Access the row labeled 0 (the first row when the index is the default 0, 1, 2, ...)
print(df.loc[0])
- Access rows by position (iloc)
# Access second row
print(df.iloc[1])
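The single-value accessors and index helpers listed under Selection and Indexing work along the same label-versus-position split as .loc and .iloc. A minimal sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# .at / .iat: fast access to one value
print(df.at[1, 'Age'])   # row label 1, column label 'Age'
print(df.iat[2, 0])      # third row, first column (by position)

# Use 'Name' as the index, then look a row up by that label
by_name = df.set_index('Name')
print(by_name.loc['Bob'])

# Sort by the index (descending), then turn 'Name' back into a column
restored = by_name.sort_index(ascending=False).reset_index()
print(restored)
```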
4. Data Manipulation
- Add a new column
df['Salary'] = [50000, 60000]
print(df)
- Rename columns
df.rename(columns={'Age': 'Years'}, inplace=True)
print(df)
- Drop a column
df.drop('Salary', axis=1, inplace=True)
print(df)
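Combining DataFrames is the other half of data manipulation. The sketch below uses .merge() for a key-based join and pd.concat() to add rows (the replacement for .append(), which pandas 2.0 removed), then .replace() to change a value. The tables are hypothetical.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
salaries = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]})

# merge: SQL-style inner join on the shared 'Name' column
merged = people.merge(salaries, on='Name', how='inner')

# concat: stack a new row underneath; ignore_index renumbers 0..n-1
new_row = pd.DataFrame({'Name': ['Charlie'], 'Age': [35], 'Salary': [70000]})
combined = pd.concat([merged, new_row], ignore_index=True)

# replace: swap specific values, here only inside the 'Name' column
combined = combined.replace({'Name': {'Bob': 'Robert'}})
print(combined)
```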
5. Aggregation and Grouping
- Group by and compute mean
data = {'Name': ['Alice', 'Bob', 'Alice'], 'Score': [85, 95, 75]}
df = pd.DataFrame(data)
grouped = df.groupby('Name')['Score'].mean()
print(grouped)
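When one statistic per group is not enough, .agg() accepts a list of function names, or keyword "named aggregations" that also choose the output column names. A minimal sketch on the same kind of score data (the column names total_score and best are arbitrary):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Alice'], 'Score': [85, 95, 75]})

# Several aggregations of one column per group
summary = df.groupby('Name')['Score'].agg(['count', 'mean', 'max'])
print(summary)

# Named aggregation: new_column=(source_column, function)
totals = df.groupby('Name').agg(total_score=('Score', 'sum'), best=('Score', 'max'))
print(totals)
```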
6. Sorting and Ranking
- Sort rows by column values
data = {'Name': ['Charlie', 'Alice', 'Bob'], 'Age': [35, 25, 30]}
df = pd.DataFrame(data)
df.sort_values(by='Age', inplace=True)
print(df)
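.rank() complements sorting: it assigns each value its position without reordering rows, and by default tied values share the average of their ranks. The sketch also sorts by two keys with a separate direction for each. The scores are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Charlie', 'Alice', 'Bob', 'Dana'],
                   'Score': [88, 95, 88, 70]})

# Highest score gets rank 1; the two 88s tie and both get 2.5
df['Rank'] = df['Score'].rank(ascending=False)

# Sort by Score descending, breaking ties by Name ascending
ordered = df.sort_values(by=['Score', 'Name'], ascending=[False, True])
print(ordered)
```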
7. Data Cleaning
- Handle missing values
data = {'Name': ['Alice', 'Bob', None], 'Age': [25, None, 35]}
df = pd.DataFrame(data)
# Detect missing values
print(df.isna())
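# Fill missing ages with 0 (assign the result back; calling fillna(inplace=True) on a single column is unreliable in pandas 2.x)
df['Age'] = df['Age'].fillna(0)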
# Drop rows with missing values
df.dropna(inplace=True)
print(df)
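The other cleaning commands, .duplicated(), .drop_duplicates() and .astype(), often go together: remove repeated rows, then fix column types. A minimal sketch, assuming ages arrived as strings (hypothetical data):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Alice'], 'Age': ['25', '30', '25']})

print(df.duplicated())          # True for rows that repeat an earlier row
deduped = df.drop_duplicates()  # keeps the first occurrence of each row

# Convert the string ages to integers
deduped = deduped.astype({'Age': 'int64'})
print(deduped.dtypes)
```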
8. Time Series
- Convert to datetime
data = {'Date': ['2023-01-01', '2023-01-02'], 'Value': [100, 200]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
print(df)
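Once a column holds datetimes, the .dt accessor exposes date parts, and .rolling() / .expanding() compute window statistics. A minimal sketch on four hypothetical daily values:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04']),
    'Value': [100, 200, 300, 400],
})

df['Weekday'] = df['Date'].dt.day_name()                   # e.g. 'Sunday'
df['Rolling_Mean'] = df['Value'].rolling(window=2).mean()  # mean of current and previous row; first is NaN
df['Running_Max'] = df['Value'].expanding().max()          # max of all rows so far
print(df)
```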
9. Statistical and Mathematical Operations
- Compute correlation matrix
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
print(df.corr())
- Cumulative sum
df['Cumulative_Sum'] = df['A'].cumsum()
print(df)
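.diff(), .cumprod() and .clip() from the quick reference follow the same column-in, column-out pattern as .cumsum(). A minimal sketch with hypothetical values:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 4, 7]})

df['Diff'] = df['A'].diff()                     # change from the previous row; first is NaN
df['Cum_Prod'] = df['A'].cumprod()              # running product
df['Clipped'] = df['A'].clip(lower=2, upper=5)  # values forced into [2, 5]
print(df)
```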
10. File I/O
- Save and read CSV
df.to_csv('output.csv', index=False)
df_read = pd.read_csv('output.csv')
print(df_read)
Here's an expanded walkthrough of Pandas commands, each with an example and a short explanation:
1. Reading and Writing Data
Command: pd.read_csv()
Explanation: Reads a CSV file and creates a DataFrame.
import pandas as pd
# Read CSV file
df = pd.read_csv('data.csv') # Ensure 'data.csv' exists
print(df.head()) # Display the first 5 rows
Command: df.to_csv()
Explanation: Writes a DataFrame to a CSV file.
df.to_csv('output.csv', index=False)
2. Creating a DataFrame
Command: pd.DataFrame()
Explanation: Creates a DataFrame from a dictionary.
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)
3. Viewing and Inspecting Data
Command: .head()
Explanation: Returns the first few rows of the DataFrame.
print(df.head(3)) # First 3 rows
Command: .info()
Explanation: Displays summary information about the DataFrame.
df.info() # Prints directly; returns None
Command: .describe()
Explanation: Provides summary statistics for numeric columns.
print(df.describe())
4. Selecting and Indexing
Command: df[column_name]
Explanation: Selects a single column.
print(df['Name']) # Select the 'Name' column
Command: .loc[]
Explanation: Select rows and columns by labels.
print(df.loc[0]) # Select the row labeled 0 (the first row here)
Command: .iloc[]
Explanation: Select rows and columns by position.
print(df.iloc[1, 0]) # Second row, first column
5. Manipulating Data
Command: .rename()
Explanation: Renames columns or indices.
df.rename(columns={'Age': 'Years'}, inplace=True)
print(df)
Command: .drop()
Explanation: Drops rows or columns.
df.drop('Years', axis=1, inplace=True) # Drop column 'Years'
print(df)
Command: df['new_column']
Explanation: Adds or modifies a column.
df['Salary'] = [50000, 60000]
print(df)
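Reshaping is another common manipulation: .pivot_table() turns long data into a wide summary table, and .melt() turns it back. A minimal sketch with hypothetical sales figures:

```python
import pandas as pd

sales = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West'],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'Revenue': [100, 150, 200, 250],
})

# One row per Region, one column per Quarter, values summed
wide = sales.pivot_table(index='Region', columns='Quarter', values='Revenue', aggfunc='sum')

# Back to long format: one row per (Region, Quarter) pair
long = wide.reset_index().melt(id_vars='Region', var_name='Quarter', value_name='Revenue')
print(wide)
print(long)
```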
6. Sorting
Command: .sort_values()
Explanation: Sorts rows by a column.
df.sort_values(by='Salary', ascending=False, inplace=True)
print(df)
7. Handling Missing Data
Command: .isna()
Explanation: Detects missing values.
data = {'Name': ['Alice', 'Bob', None], 'Age': [25, None, 35]}
df = pd.DataFrame(data)
print(df.isna())
Command: .fillna()
Explanation: Replaces missing values.
df['Age'] = df['Age'].fillna(0) # Assign back instead of inplace on a single column
Command: .dropna()
Explanation: Drops rows with missing values.
df.dropna(inplace=True)
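Both missing-data commands accept finer control: .fillna() can take a dict with a different fill value per column, and .dropna(subset=...) only checks the listed columns. A minimal sketch; filling ages with the column mean is just one illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', None], 'Age': [25, None, 35]})

# Per-column fill values: a placeholder name and the mean age
filled = df.fillna({'Name': 'Unknown', 'Age': df['Age'].mean()})

# Drop a row only when 'Name' is missing; a missing 'Age' is kept
named_only = df.dropna(subset=['Name'])
print(filled)
print(named_only)
```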
8. Aggregation and Grouping
Command: .groupby()
Explanation: Groups rows and performs aggregate functions.
data = {'Name': ['Alice', 'Bob', 'Alice'], 'Score': [85, 95, 75]}
df = pd.DataFrame(data)
grouped = df.groupby('Name')['Score'].mean()
print(grouped)
9. Time Series
Command: pd.to_datetime()
Explanation: Converts strings to datetime objects.
df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02'], 'Value': [100, 200]})
df['Date'] = pd.to_datetime(df['Date'])
print(df)
Command: .resample()
Explanation: Resamples time series data.
time_df = df.set_index('Date').resample('D').sum()
print(time_df)
10. Statistical Operations
Command: .sum()
Explanation: Calculates the sum of a column.
df = pd.DataFrame({'Age': [25, 30, 35]})
print(df['Age'].sum())
Command: .mean()
Explanation: Calculates the mean of a column.
print(df['Age'].mean())
Command: .corr()
Explanation: Computes the correlation matrix.
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
print(df.corr())
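The remaining summary statistics from the Aggregation list (.median(), .var(), .std(), .mode(), .min(), .max()) work the same way on a single column. A minimal sketch on a hypothetical Series; note that pandas computes the sample variance (ddof=1) by default.

```python
import pandas as pd

s = pd.Series([2, 4, 4, 6, 9])

print(s.median())        # middle value of the sorted data
print(s.var())           # sample variance: sum of squared deviations / (n - 1)
print(s.std())           # square root of the variance
print(s.mode())          # most frequent value(s), returned as a Series
print(s.min(), s.max())
```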
11. Saving Data
Command: .to_csv()
Explanation: Writes a DataFrame to a CSV file.
df.to_csv('output.csv', index=False)
Command: .to_excel()
Explanation: Writes a DataFrame to an Excel file.
df.to_excel('output.xlsx', index=False) # Requires an Excel engine such as openpyxl
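The other writers from the File I/O list each have a matching reader. This sketch round-trips one DataFrame through JSON, pickle and an in-memory SQLite database (the file names and the table name numbers are arbitrary). Pickle preserves dtypes and the index exactly, while JSON and SQL go through a text or database representation.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# JSON: orient='records' writes one object per row
df.to_json('output.json', orient='records')
from_json = pd.read_json('output.json', orient='records')

# Pickle: binary serialization of the whole DataFrame
df.to_pickle('output.pkl')
from_pickle = pd.read_pickle('output.pkl')

# SQL: pandas accepts a plain sqlite3 connection
conn = sqlite3.connect(':memory:')
df.to_sql('numbers', conn, index=False)
from_sql = pd.read_sql('SELECT * FROM numbers', conn)
conn.close()
print(from_sql)
```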