Dictionaries and Working with Pandas DataFrames

Python's built-in dictionaries and the DataFrames provided by the third-party Pandas library are essential tools for handling and analyzing data efficiently.


1. Introduction to Dictionaries in Python

A dictionary is a collection of key-value pairs, where each key is unique and is used to access the corresponding value. Since Python 3.7, dictionaries preserve the order in which keys were inserted.

Example of a Dictionary

# Creating a dictionary
student = {
    "name": "Alice",
    "age": 22,
    "course": "Data Science"
}

# Accessing values
print(student["name"])  # Alice
print(student.get("age"))  # 22
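
The two access styles above differ when a key is missing. The following minimal sketch reuses the student dictionary and looks up "grade", a key that has not been added yet, to show the difference:

```python
student = {"name": "Alice", "age": 22, "course": "Data Science"}

# Square brackets raise KeyError when the key is missing
try:
    student["grade"]
except KeyError as err:
    missing_key = err.args[0]
    print("Missing key:", missing_key)

# get() returns None, or a default you supply, instead of raising
grade = student.get("grade")
grade_or_default = student.get("grade", "N/A")
print(grade, grade_or_default)
```

Use square brackets when a missing key should be treated as an error, and get() when a fallback value is acceptable.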

2. Dictionary Operations and Methods

Dictionaries have built-in methods for modification and retrieval.

Common Dictionary Methods

# Adding a new key-value pair
student["grade"] = "A"

# Updating an existing value
student["age"] = 23

# Removing a key-value pair
student.pop("course")

# Looping through a dictionary
for key, value in student.items():
    print(key, ":", value)
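
A few more built-in methods are worth knowing. This is a short sketch rather than a complete list; the "city" key and its value are made up for illustration:

```python
student = {"name": "Alice", "age": 23, "grade": "A"}

# update() merges another dictionary in, overwriting keys that already exist
student.update({"age": 24, "city": "Example City"})  # "city" is illustrative

# pop() with a default avoids KeyError when the key is absent
removed = student.pop("course", None)

# keys() and values() return views; wrap them in list() to get lists
keys = list(student.keys())
values = list(student.values())

print(keys)
print(values)
print(removed)
```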

Checking for Key Existence

if "name" in student:
    print("Key exists!")

Practice Problems

  1. Create a dictionary storing the details of a laptop (brand, model, RAM, price). Print the model and price.
  2. Write a function that counts the occurrence of each word in a sentence using a dictionary.

3. Introduction to Pandas and DataFrames

Pandas is a Python library used for data manipulation and analysis. A DataFrame is a table-like data structure similar to an Excel spreadsheet.

Installing Pandas

pip install pandas

Importing Pandas

import pandas as pd

Creating a DataFrame

# Creating a simple DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000]
}

df = pd.DataFrame(data)
print(df)
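
A DataFrame can also be built from a list of dictionaries, one per row, which connects it to the first half of this guide. The sketch below assumes Pandas is installed and reuses the same sample data:

```python
import pandas as pd

# Each dictionary is one row; the keys become column names
records = [
    {"Name": "Alice", "Age": 25, "Salary": 50000},
    {"Name": "Bob", "Age": 30, "Salary": 60000},
    {"Name": "Charlie", "Age": 35, "Salary": 70000},
]
df = pd.DataFrame(records)

print(df.shape)             # (number of rows, number of columns)
print(list(df.columns))     # column names
print(df["Name"].tolist())  # one column as a plain list

# Convert back to a list of dictionaries, one per row
round_trip = df.to_dict(orient="records")
print(round_trip[0])
```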

4. Loading and Exploring Data with Pandas

Pandas allows loading data from various sources like CSV, Excel, and SQL databases.

Loading Data

df = pd.read_csv("data.csv")  # Load data from a CSV file

Exploring the Data

print(df.head())  # Display first 5 rows
df.info()  # Print information about the dataset (info() prints directly and returns None)
print(df.describe())  # Summary statistics
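
To try these calls without a file on disk, the CSV text can be read from memory. In this sketch the CSV content is made up and simply stands in for data.csv:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """Name,Age,Salary
Alice,25,50000
Bob,30,60000
Charlie,35,70000
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

first_two = df.head(2)  # head() takes the number of rows to return
print(first_two)
print(df.shape)         # (rows, columns)
print(df.dtypes)        # data type of each column
```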

Practice Problems

  1. Load a CSV file into a DataFrame and display the first 10 rows.
  2. Print the column names and check for missing values in the dataset.

5. Data Manipulation with Pandas

Pandas provides various functions to manipulate and clean data.

Filtering Data

filtered_df = df[df["Age"] > 30]  # Get rows where Age is greater than 30
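
Filtering works by building a boolean Series and using it to select rows. The sketch below, using the sample data from earlier, shows that mask and how two conditions can be combined:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

# The comparison yields one True/False value per row
mask = df["Age"] > 30
print(mask.tolist())

# Combine conditions with & (and) or | (or); wrap each one in parentheses
both = df[(df["Age"] >= 30) & (df["Salary"] < 70000)]
either = df[(df["Age"] < 30) | (df["Salary"] >= 70000)]

print(both["Name"].tolist())
print(either["Name"].tolist())
```

Python's own and / or keywords do not work element-wise on a Series, which is why & and | are used here.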

Sorting Data

sorted_df = df.sort_values(by="Salary", ascending=False)  # Sort by Salary in descending order
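
sort_values can also sort by several columns at once. The following is a small sketch with made-up rows (including a fourth person, "Dana") so that ties in Age are visible:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],  # illustrative data
    "Age": [30, 25, 30, 25],
    "Salary": [50000, 60000, 70000, 55000],
})

# Sort by Age ascending, then by Salary descending within each Age
sorted_df = df.sort_values(by=["Age", "Salary"], ascending=[True, False])

# sort_values returns a new DataFrame and keeps the old index labels;
# reset_index(drop=True) renumbers the rows from 0
sorted_df = sorted_df.reset_index(drop=True)
print(sorted_df)
```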

Adding a New Column

df["Bonus"] = df["Salary"] * 0.10  # Calculate a 10% bonus

Handling Missing Data

cleaned_df = df.dropna()  # New DataFrame without the rows that have missing values
filled_df = df.fillna(0)  # New DataFrame with missing values replaced by 0
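
Both dropna() and fillna() return a new DataFrame and leave the original unchanged unless the result is assigned. The sketch below uses made-up data with gaps to count missing values and fill each column differently; using the mean for Age is only one possible choice:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, None, 35],          # illustrative gap
    "Salary": [50000, 60000, None]  # illustrative gap
})

# Count missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# fillna() accepts a dictionary mapping column names to fill values
filled = df.fillna({"Age": df["Age"].mean(), "Salary": 0})
print(filled)

# The original DataFrame still contains its missing values
print(df.isna().sum().sum())
```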

Practice Problems

  1. Add a new column "Experience Level" based on Age (e.g., "Junior" if Age < 28, "Senior" otherwise).
  2. Remove rows where the Salary is missing or null.
  3. Sort the DataFrame by Age in ascending order and print the result.

Conclusion

  • Dictionaries store key-value pairs and provide fast data retrieval.
  • Pandas simplifies data handling with its powerful DataFrame structure.
  • DataFrames make it efficient to explore, manipulate, and analyze data.

These concepts are crucial for anyone working with data in Python.