1. General and Utility Functions
pd.read_csv() - Read a CSV file into a DataFrame.
pd.read_excel() - Read an Excel file.
pd.read_json() - Read JSON data.
pd.DataFrame() - Create a new DataFrame.
pd.Series() - Create a new Series.
2. Inspection and Information
.head(n) - Display the first n rows.
.tail(n) - Display the last n rows.
.info() - Get a concise summary of the DataFrame.
.describe() - Generate descriptive statistics.
.shape - Get dimensions of the DataFrame.
.columns - Get or set column labels.
.index - Get or set index labels.
.dtypes - Display data types of columns.
.memory_usage() - Memory usage of the DataFrame.
3. Selection and Indexing
.loc[] - Access rows and columns by labels.
.iloc[] - Access rows and columns by integer location.
.at[] - Access single values by label.
.iat[] - Access single values by integer location.
.set_index() - Set the DataFrame index.
.reset_index() - Reset the index to default.
.sort_index() - Sort the DataFrame by index.
4. Data Manipulation
.rename() - Rename columns or index.
.drop() - Drop rows or columns.
.append() - Append rows to the DataFrame (removed in pandas 2.0; use pd.concat() instead).
.merge() - Merge DataFrames.
.join() - Join DataFrames on indices.
pd.concat() - Concatenate DataFrames.
.pivot() - Pivot data into a new DataFrame.
.pivot_table() - Create a pivot table with aggregation.
.melt() - Unpivot a DataFrame.
.replace() - Replace values.
.fillna() - Fill missing values.
.dropna() - Drop rows/columns with missing values.
5. Aggregation and Grouping
.groupby() - Group by values and perform aggregation.
.agg() - Aggregate data using functions.
.count() - Count non-NA/null values.
.sum() - Sum of values.
.mean() - Mean of values.
.median() - Median of values.
.min() - Minimum value.
.max() - Maximum value.
.std() - Standard deviation.
.var() - Variance.
.mode() - Most frequent value.
6. Sorting and Ranking
.sort_values() - Sort by values.
.rank() - Rank values in a DataFrame.
7. Data Cleaning
.isna() / .isnull() - Detect missing values.
.notna() / .notnull() - Detect non-missing values.
.astype() - Change data type of a column.
.duplicated() - Identify duplicate rows.
.drop_duplicates() - Remove duplicate rows.
8. Time Series
pd.to_datetime() - Convert to datetime.
.dt - Access datetime attributes.
.resample() - Resample time series data.
.rolling() - Rolling window calculations.
.expanding() - Expanding window calculations.
9. Statistical and Mathematical Operations
.corr() - Compute correlation matrix.
.cov() - Compute covariance matrix.
.cumsum() - Cumulative sum.
.cumprod() - Cumulative product.
.diff() - Difference between consecutive elements.
.clip() - Clip values to within a specified range.
10. File I/O
.to_csv() - Write a DataFrame to a CSV file.
.to_excel() - Write to an Excel file.
.to_json() - Write to a JSON file.
.to_sql() - Write to a SQL database.
.to_pickle() - Serialize to a pickle object.
The examples below show the most common of these commands in use:
1. General and Utility Functions
- Reading a CSV file
import pandas as pd
# Create and read CSV
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
df.to_csv('data.csv', index=False)
# Read the CSV file
df = pd.read_csv('data.csv')
print(df)
- Creating a DataFrame
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)
2. Inspection and Information
- View first few rows
print(df.head()) # Default shows 5 rows
- Get DataFrame information
df.info() # info() prints the summary itself and returns None
- View basic statistics
print(df.describe())
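The attributes from the inspection list (.shape, .columns, .dtypes, .memory_usage()) have no example above. A minimal sketch, assuming the same two-row sample data used earlier; the variable names are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# .shape is a (rows, columns) tuple
rows, cols = df.shape

# .columns holds the column labels
column_names = list(df.columns)

# .dtypes maps each column to its data type
print(df.dtypes)

# memory_usage returns bytes per column (plus the index);
# deep=True also counts the contents of string/object columns
mem = df.memory_usage(deep=True)
print(rows, cols, column_names)
print(mem)
```

Note that .shape, .columns, and .dtypes are attributes, not methods, so they take no parentheses.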
3. Selection and Indexing
- Select a single column
print(df['Name'])
- Select multiple columns
print(df[['Name', 'Age']])
- Access rows by label (loc)
# Access the row whose index label is 0
print(df.loc[0])
- Access rows by integer position (iloc)
# Access the second row (position 1)
print(df.iloc[1])
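The selection list also names .at, .iat, .set_index(), and .reset_index(). A short sketch of how they might fit together, assuming the same Name/Age sample data (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# .at and .iat read a single scalar, by label and by position
age_by_label = df.at[0, 'Age']   # row label 0, column 'Age'
name_by_pos = df.iat[1, 0]       # second row, first column

# set_index promotes a column to the index, so .loc can look up by name
by_name = df.set_index('Name')
bob_age = by_name.loc['Bob', 'Age']

# reset_index moves the index back into an ordinary column
restored = by_name.reset_index()

print(age_by_label, name_by_pos, bob_age)
print(restored)
```

Both set_index and reset_index return a new DataFrame here; df itself is left unchanged.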
4. Data Manipulation
- Add a new column
df['Salary'] = [50000, 60000]
print(df)
- Rename columns
df.rename(columns={'Age': 'Years'}, inplace=True)
print(df)
- Drop a column
df.drop('Salary', axis=1, inplace=True)
print(df)
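Combining DataFrames is listed above but not demonstrated. The sketch below assumes two small hypothetical tables that share a Name column, and uses pd.concat for adding rows since DataFrame.append is no longer available in pandas 2.0 and later:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
salaries = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]})

# merge joins two DataFrames on a shared key column
merged = people.merge(salaries, on='Name')

# pd.concat stacks rows; ignore_index=True renumbers the result from 0
extra = pd.DataFrame({'Name': ['Charlie'], 'Age': [35]})
stacked = pd.concat([people, extra], ignore_index=True)

print(merged)
print(stacked)
```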
5. Aggregation and Grouping
- Group by and compute mean
data = {'Name': ['Alice', 'Bob', 'Alice'], 'Score': [85, 95, 75]}
df = pd.DataFrame(data)
grouped = df.groupby('Name')['Score'].mean()
print(grouped)
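When one statistic per group is not enough, .agg() can compute several at once. A minimal sketch on the same scores data; the choice of count, mean, and max is just an example:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Alice'], 'Score': [85, 95, 75]})

# agg applies each named aggregation to every group
summary = df.groupby('Name')['Score'].agg(['count', 'mean', 'max'])
print(summary)
```

The result has one row per name and one column per aggregation.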
6. Sorting and Ranking
- Sort rows by column values
data = {'Name': ['Charlie', 'Alice', 'Bob'], 'Age': [35, 25, 30]}
df = pd.DataFrame(data)
df.sort_values(by='Age', inplace=True)
print(df)
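Ranking is listed in this section but has no example. The sketch below assumes a hypothetical Score column with a tie, to show how the method parameter decides what tied values receive:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Charlie', 'Alice', 'Bob'], 'Score': [90, 85, 90]})

# Highest score gets rank 1; method='min' gives tied values
# the lowest rank in their group, so the next rank is skipped
df['Rank'] = df['Score'].rank(ascending=False, method='min')
print(df)
```

With the default method='average', the two tied scores would each get 1.5 instead.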
7. Data Cleaning
- Handle missing values
data = {'Name': ['Alice', 'Bob', None], 'Age': [25, None, 35]}
df = pd.DataFrame(data)
# Detect missing values
print(df.isna())
# Fill missing values
df['Age'] = df['Age'].fillna(0) # Assign back; inplace on a selected column is unreliable in pandas 2.x
# Drop rows with missing values
df.dropna(inplace=True)
print(df)
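The cleaning list also covers duplicates and type conversion. A small sketch, assuming hypothetical data where ages arrive as strings and one row is repeated:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Alice'], 'Age': ['25', '30', '25']})

# duplicated marks rows that repeat an earlier row
dup_mask = df.duplicated()

# drop_duplicates keeps the first occurrence of each row
deduped = df.drop_duplicates().copy()

# astype converts the string ages to integers
deduped['Age'] = deduped['Age'].astype(int)

print(dup_mask)
print(deduped)
```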
8. Time Series
- Convert to datetime
data = {'Date': ['2023-01-01', '2023-01-02'], 'Value': [100, 200]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
print(df)
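Once a column holds datetimes, the .dt accessor and the window methods from the list become available. A sketch under the assumption of four consecutive daily values (the data is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04']),
    'Value': [100, 200, 300, 400],
})

# .dt exposes components of a datetime column
df['Day'] = df['Date'].dt.day

# Window methods work on a Series indexed by date
ts = df.set_index('Date')['Value']

# Mean over a sliding window of 2 rows; the first result is NaN
rolling_mean = ts.rolling(window=2).mean()

# Running total over all rows seen so far
running_total = ts.expanding().sum()

# Group into 2-day bins and sum each bin
two_day_sum = ts.resample('2D').sum()

print(rolling_mean)
print(running_total)
print(two_day_sum)
```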
9. Statistical and Mathematical Operations
- Compute correlation matrix
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
print(df.corr())
- Cumulative sum
df['Cumulative_Sum'] = df['A'].cumsum()
print(df)
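The remaining element-wise operations from the list, .diff(), .cumprod(), and .clip(), can be sketched on a single Series. The values and the clip bounds here are arbitrary:

```python
import pandas as pd

s = pd.Series([1, 4, 9, 16])

# Difference from the previous element; the first entry has no predecessor
diffs = s.diff()

# Running product of all elements so far
products = s.cumprod()

# Values below 2 become 2, values above 10 become 10
clipped = s.clip(lower=2, upper=10)

print(diffs)
print(products)
print(clipped)
```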
10. File I/O
- Save and read CSV
df.to_csv('output.csv', index=False)
df_read = pd.read_csv('output.csv')
print(df_read)
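JSON works much like CSV. The sketch below is one possible round trip, kept in memory so it does not touch the disk; passing a file path to to_json and read_json instead would write and read a file. The orient='records' choice is an assumption, not the only valid layout:

```python
import io

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Without a path, to_json returns the JSON text as a string
json_text = df.to_json(orient='records')

# read_json accepts a file-like object, so wrap the string in StringIO
restored = pd.read_json(io.StringIO(json_text), orient='records')

print(json_text)
print(restored)
```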
The walkthrough below pairs each command with a short explanation and an example:
1. Reading and Writing Data
Command: pd.read_csv()
Explanation: Reads a CSV file and creates a DataFrame.
import pandas as pd
# Read CSV file
df = pd.read_csv('data.csv') # Ensure 'data.csv' exists
print(df.head()) # Display the first 5 rows
Command: df.to_csv()
Explanation: Writes a DataFrame to a CSV file.
df.to_csv('output.csv', index=False)
2. Creating a DataFrame
Command: pd.DataFrame()
Explanation: Creates a DataFrame from a dictionary.
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)
3. Viewing and Inspecting Data
Command: .head()
Explanation: Returns the first few rows of the DataFrame.
print(df.head(3)) # First 3 rows
Command: .info()
Explanation: Displays summary information about the DataFrame.
df.info() # info() prints the summary itself and returns None
Command: .describe()
Explanation: Provides summary statistics for numeric columns.
print(df.describe())
4. Selecting and Indexing
Command: df[column_name]
Explanation: Selects a single column.
print(df['Name']) # Select the 'Name' column
Command: .loc[]
Explanation: Select rows and columns by labels.
print(df.loc[0]) # Select the first row
Command: .iloc[]
Explanation: Select rows and columns by position.
print(df.iloc[1, 0]) # Second row, first column
5. Manipulating Data
Command: .rename()
Explanation: Renames columns or indices.
df.rename(columns={'Age': 'Years'}, inplace=True)
print(df)
Command: .drop()
Explanation: Drops rows or columns.
df.drop('Years', axis=1, inplace=True) # Drop column 'Years'
print(df)
Command: df['new_column']
Explanation: Adds or modifies a column.
df['Salary'] = [50000, 60000]
print(df)
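Reshaping with .pivot(), .pivot_table(), and .melt() appears in the command list but not in the examples. A minimal sketch, assuming a hypothetical sales table with one row per region and quarter:

```python
import pandas as pd

sales = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West'],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'Revenue': [10, 20, 30, 40],
})

# pivot: one row per region, one column per quarter
wide = sales.pivot(index='Region', columns='Quarter', values='Revenue')

# pivot_table: like pivot, but aggregates when several rows share a key
totals = sales.pivot_table(index='Region', values='Revenue', aggfunc='sum')

# melt: turn the quarter columns back into rows
long = wide.reset_index().melt(id_vars='Region', var_name='Quarter',
                               value_name='Revenue')

print(wide)
print(totals)
print(long)
```

pivot raises an error if an index/column pair occurs more than once, which is the case pivot_table is meant to handle.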
6. Sorting
Command: .sort_values()
Explanation: Sorts rows by a column.
df.sort_values(by='Salary', ascending=False, inplace=True)
print(df)
7. Handling Missing Data
Command: .isna()
Explanation: Detects missing values.
print(df.isna())
Command: .fillna()
Explanation: Replaces missing values.
# df no longer has an 'Age' column at this point, so fill 'Salary' and assign the result back
df['Salary'] = df['Salary'].fillna(0)
Command: .dropna()
Explanation: Drops rows with missing values.
df.dropna(inplace=True)
8. Aggregation and Grouping
Command: .groupby()
Explanation: Groups rows and performs aggregate functions.
data = {'Name': ['Alice', 'Bob', 'Alice'], 'Score': [85, 95, 75]}
df = pd.DataFrame(data)
grouped = df.groupby('Name')['Score'].mean()
print(grouped)
9. Time Series
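Command: pd.to_datetime()
Explanation: Converts strings to datetime objects.
# Start from a fresh DataFrame so the column lengths match
df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02'], 'Value': [100, 200]})
df['Date'] = pd.to_datetime(df['Date'])
print(df)
Command: .resample()
Explanation: Resamples time series data on a datetime index.
time_df = df.set_index('Date').resample('D').sum() # Daily totals
print(time_df)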
10. Statistical Operations
Command: .sum()
Explanation: Calculates the sum of a column.
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
print(df['Age'].sum()) # 55
Command: .mean()
Explanation: Calculates the mean of a column.
print(df['Age'].mean()) # 27.5
Command: .corr()
Explanation: Computes the correlation matrix.
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
print(df.corr())
11. Saving Data
Command: .to_csv()
Explanation: Writes a DataFrame to a CSV file.
df.to_csv('output.csv', index=False)
Command: .to_excel()
Explanation: Writes a DataFrame to an Excel file.
df.to_excel('output.xlsx', index=False) # Writing .xlsx requires the openpyxl package