Dictionaries and Working with Pandas DataFrames
Python's built-in dictionaries and the DataFrames provided by the Pandas library are powerful data structures that are essential for handling and analyzing data efficiently.
1. Introduction to Dictionaries in Python
A dictionary is a mutable collection of key-value pairs, where each key is unique and is used to access the corresponding value. Since Python 3.7, dictionaries preserve the order in which keys were inserted.
Example of a Dictionary
# Creating a dictionary
student = {
    "name": "Alice",
    "age": 22,
    "course": "Data Science"
}
# Accessing values
print(student["name"]) # Alice
print(student.get("age")) # 22
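The two access styles differ when a key is missing: square brackets raise a `KeyError`, while `get()` returns `None` or a default you supply. A minimal sketch using the same `student` dictionary, where `"email"` is a hypothetical key that does not exist:

```python
student = {"name": "Alice", "age": 22, "course": "Data Science"}

# get() returns None for a missing key instead of raising an error
email = student.get("email")
print(email)  # None

# A second argument to get() supplies a fallback value
email_or_default = student.get("email", "not provided")
print(email_or_default)  # not provided

# Square-bracket access raises KeyError for a missing key
try:
    student["email"]
except KeyError as err:
    print("Missing key:", err)  # Missing key: 'email'
```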
2. Dictionary Operations and Methods
Dictionaries have built-in methods for modification and retrieval.
Common Dictionary Methods
# Adding a new key-value pair
student["grade"] = "A"
# Updating an existing value
student["age"] = 23
# Removing a key-value pair
student.pop("course")
# Looping through a dictionary
for key, value in student.items():
    print(key, ":", value)
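Applying those operations in order to the `student` dictionary from Section 1 gives the result below. The sketch also shows `pop()` with a default, which avoids a `KeyError` when the key might be absent, and `update()`, which adds or overwrites several pairs at once (the `"city"` key is a made-up example).

```python
student = {"name": "Alice", "age": 22, "course": "Data Science"}

student["grade"] = "A"            # add a new key
student["age"] = 23               # overwrite an existing value
removed = student.pop("course")   # remove a key and return its value

# pop() with a default returns the default instead of raising KeyError
missing = student.pop("course", None)

# update() adds new keys and overwrites existing ones in a single call
student.update({"city": "Paris", "grade": "A+"})

for key, value in student.items():
    print(key, ":", value)
# name : Alice
# age : 23
# grade : A+
# city : Paris
```

Overwriting `"grade"` keeps its original position; only genuinely new keys such as `"city"` are appended at the end.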
Checking for Key Existence
if "name" in student:
    print("Key exists!")
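The `in` operator checks keys only, not values. A short sketch on the same `student` dictionary:

```python
student = {"name": "Alice", "age": 22, "course": "Data Science"}

# `in` on a dictionary tests keys, not values
has_name_key = "name" in student       # True
has_alice_key = "Alice" in student     # False: "Alice" is a value, not a key

# To test values, check the values() view instead
has_alice_value = "Alice" in student.values()  # True

print(has_name_key, has_alice_key, has_alice_value)  # True False True
```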
Practice Problems
- Create a dictionary storing the details of a laptop (brand, model, RAM, price). Print the model and price.
- Write a function that counts the occurrences of each word in a sentence using a dictionary.
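One possible approach to the word-count problem, sketched under simple assumptions: words are separated by whitespace, and case and punctuation are not normalized. The function name `count_words` is hypothetical.

```python
def count_words(sentence):
    """Return a dictionary mapping each word to how many times it appears."""
    counts = {}
    for word in sentence.split():
        # get() supplies 0 the first time a word is seen
        counts[word] = counts.get(word, 0) + 1
    return counts


result = count_words("the cat saw the dog")
print(result)  # {'the': 2, 'cat': 1, 'saw': 1, 'dog': 1}
```

The standard library's `collections.Counter` performs the same counting in one call, `Counter(sentence.split())`, and is a good choice once the manual version is understood.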
3. Introduction to Pandas and DataFrames
Pandas is a third-party Python library for data manipulation and analysis. A DataFrame is a two-dimensional, table-like data structure with labeled rows and columns, similar to an Excel spreadsheet.
Installing Pandas
pip install pandas
Importing Pandas
import pandas as pd
Creating a DataFrame
# Creating a simple DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000]
}
df = pd.DataFrame(data)
print(df)
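When a DataFrame is built from a dictionary, each key becomes a column name and each list supplies that column's values, so all lists must have the same length. A sketch of inspecting the result, plus an equivalent construction from a list of row dictionaries:

```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
}
df = pd.DataFrame(data)

print(df.shape)            # (3, 3): 3 rows, 3 columns
print(list(df.columns))    # ['Name', 'Age', 'Salary']
print(df["Age"].tolist())  # [25, 30, 35]

# Alternative: one dictionary per row; the keys again become column names
rows = [
    {"Name": "Alice", "Age": 25, "Salary": 50000},
    {"Name": "Bob", "Age": 30, "Salary": 60000},
    {"Name": "Charlie", "Age": 35, "Salary": 70000},
]
df_from_rows = pd.DataFrame(rows)
print(df_from_rows.equals(df))  # True
```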
4. Loading and Exploring Data with Pandas
Pandas can load data from many sources, including CSV files (pd.read_csv), Excel workbooks (pd.read_excel), and SQL databases (pd.read_sql).
Loading Data
df = pd.read_csv("data.csv") # Load data from a CSV file
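`read_csv()` accepts a file path, a URL, or any file-like object. The sketch below reads from an in-memory `io.StringIO` buffer in place of a real `data.csv`, so it runs without a file on disk; the CSV contents are made up for illustration.

```python
import io

import pandas as pd

# Made-up CSV text standing in for the contents of data.csv
csv_text = """Name,Age,Salary
Alice,25,50000
Bob,30,
Charlie,35,70000
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)
```

The empty Salary field for Bob is read as `NaN`, which makes the whole Salary column floating point.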
Exploring the Data
print(df.head()) # Display first 5 rows
df.info()  # Print column types and non-null counts (info() prints directly and returns None)
print(df.describe()) # Summary statistics
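A few more exploration calls, sketched on a small hand-built DataFrame with one missing salary: `head(n)` takes the number of rows to show, `columns` lists the column names, and `isna().sum()` counts missing values in each column.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, None, 70000],  # one missing value for illustration
})

print(df.head(2))           # head(n) shows the first n rows
print(df.columns.tolist())  # ['Name', 'Age', 'Salary']

# isna() marks missing cells as True; sum() counts them per column
missing_per_column = df.isna().sum()
print(missing_per_column)   # Salary has 1 missing value; the others have 0
```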
Practice Problems
- Load a CSV file into a DataFrame and display the first 10 rows.
- Print the column names and check for missing values in the dataset.
5. Data Manipulation with Pandas
Pandas provides various functions to manipulate and clean data.
Filtering Data
filtered_df = df[df["Age"] > 30] # Get rows where Age is greater than 30
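Conditions can be combined with `&` (and), `|` (or), and `~` (not); each condition needs its own parentheses because these operators bind more tightly than comparisons. A sketch on the sample data from Section 3:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

# Boolean mask: True for rows where Age is strictly greater than 30
mask = df["Age"] > 30
older = df[mask]

# Both conditions must hold; parentheses are required around each one
mid_range = df[(df["Age"] >= 25) & (df["Salary"] < 70000)]

# isin() keeps rows whose value appears in the given list
selected = df[df["Name"].isin(["Alice", "Charlie"])]

print(older["Name"].tolist())      # ['Charlie']
print(mid_range["Name"].tolist())  # ['Alice', 'Bob']
print(selected["Name"].tolist())   # ['Alice', 'Charlie']
```

Bob (Age 30) is excluded by `> 30`; use `>= 30` to include the boundary.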
Sorting Data
sorted_df = df.sort_values(by="Salary", ascending=False) # Sort by Salary in descending order
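`sort_values()` also accepts a list of columns, with a matching list for `ascending`, to break ties. This sketch adds a fourth, made-up row (Dana) whose age ties with Bob's:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Age": [25, 30, 35, 30],
    "Salary": [50000, 60000, 70000, 65000],
})

# Sort by Age ascending, then by Salary descending among equal ages
sorted_df = df.sort_values(by=["Age", "Salary"], ascending=[True, False])
print(sorted_df["Name"].tolist())  # ['Alice', 'Dana', 'Bob', 'Charlie']

# sort_values() returns a new DataFrame; the original order is unchanged
print(df["Name"].tolist())  # ['Alice', 'Bob', 'Charlie', 'Dana']
```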
Adding a New Column
df["Bonus"] = df["Salary"] * 0.10 # Calculate a 10% bonus
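Column arithmetic like the bonus calculation is applied element-wise to every row at once. A per-row function can derive a categorical column in a similar way; the sketch below follows the Junior/Senior rule from the practice problems, using `Series.apply()` with a small helper, `level_for_age`, whose name is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

# Element-wise arithmetic: each Salary value is multiplied by 0.10
df["Bonus"] = df["Salary"] * 0.10


def level_for_age(age):
    """Hypothetical rule: under 28 is Junior, otherwise Senior."""
    return "Junior" if age < 28 else "Senior"


# apply() calls the function once for each value in the Age column
df["Experience Level"] = df["Age"].apply(level_for_age)

print(df[["Name", "Bonus", "Experience Level"]])
```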
Handling Missing Data
df_no_missing = df.dropna()  # Return a new DataFrame without rows that contain missing values
df_filled = df.fillna(0)  # Return a new DataFrame with missing values replaced by 0
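`dropna()` and `fillna()` also accept finer-grained arguments. The sketch below, on made-up data with gaps, uses `dropna(subset=...)` to drop a row only when a specific column is missing and passes a dictionary to `fillna()` to choose a replacement per column.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, None, 35],
    "Salary": [50000, 60000, None],
})

# Drop only the rows whose Salary is missing; a missing Age is tolerated
has_salary = df.dropna(subset=["Salary"])

# Fill each column with its own replacement: mean Age, zero Salary
filled = df.fillna({"Age": df["Age"].mean(), "Salary": 0})

print(has_salary["Name"].tolist())  # ['Alice', 'Bob']
print(filled)

# Neither call modified df, which still has its 2 missing values
print(df.isna().sum().sum())  # 2
```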
Practice Problems
- Add a new column "Experience Level" based on Age (e.g., "Junior" if Age < 28, "Senior" otherwise).
- Remove rows where the Salary is missing or null.
- Sort the DataFrame by Age in ascending order and print the result.
Conclusion
- Dictionaries store key-value pairs and provide fast lookup by key (constant time on average).
- Pandas simplifies data handling with its powerful DataFrame structure.
- DataFrames make exploring, manipulating, and analyzing tabular data efficient.
These concepts are crucial for anyone working with data in Python.