Skip to content

subhashpolisetti/TimeSeriesDataAggregation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Hopping Window Temperature Aggregation

This project demonstrates the use of a hopping window function to aggregate mean values of temperature sensor readings. This technique simulates real-time data processing where a distributed system calculates the average temperature over overlapping time intervals to enhance throughput and reduce latency.

Project Overview

The script generates a sample dataset of hourly temperature readings. It then uses a hopping window to calculate mean values for each time window and aggregates the data for analysis. This project is relevant to applications in streaming data processing where overlapping windows (hopping windows) provide insights from continuously generated data.

Table of Contents

  1. Objective
  2. Hopping Window Explanation
  3. Methodology
  4. Code Explanation
  5. Usage Instructions
  6. Sample Output

Objective

The goal is to calculate the mean values of temperature readings from a simulated sensor system using a hopping window function. This enables a system to obtain insights with low latency by frequently computing values over an overlapping window.

Hopping Window Explanation

A hopping window is a time-based windowing function in streaming data. It:

  • Collects data points within a fixed duration, e.g., 3 hours.
  • Advances at a smaller, fixed interval, e.g., every 1 hour.
  • Ensures overlap, where each data point may contribute to multiple windows.

For example, a three-hour window hopping every hour would result in windows covering:

  • 00:00 - 03:00
  • 01:00 - 04:00
  • 02:00 - 05:00 …and so on.

Methodology

  1. Data Simulation: A DataFrame simulates 100 hourly readings, representing temperature values.
  2. Hopping Window Parameters:
    • window_size = '3h': Sets each window to capture three hours of data.
    • hop_size = '1h': Advances the window every hour, ensuring overlap.
  3. Rolling Mean Calculation:
    • Uses rolling(window=window_size).mean() to get a mean for each 3-hour window.
    • Resamples the result with resample(hop_size).mean() to ensure hopping intervals.
  4. Output: A DataFrame with timestamps and corresponding mean values.

Code Explanation

import pandas as pd
import numpy as np

# Step 1: Simulate sensor data as a time series DataFrame
# Generates 100 hourly periods starting from 2024-01-01, simulating sensor readings
data = {
    'timestamp': pd.date_range(start='2024-01-01', periods=100, freq='h'),
    'value': np.random.randint(1, 10, 100)
}
df = pd.DataFrame(data)
df.set_index('timestamp', inplace=True)

# Step 2: Define window and hop parameters
window_size = '3h'
hop_size = '1h'

# Step 3: Calculate mean temperature values in each hopping window
# Rolling mean over a 3-hour window, resampled every hour to match hop size intervals
hopping_means = df['value'].rolling(window=window_size).mean().resample(hop_size).mean()

# Convert to DataFrame with descriptive column name
hopping_means_df = hopping_means.to_frame(name="Hopping Window Mean Values")
print(hopping_means_df)

Usage Instructions

  1. Clone or download this repository.
  2. Ensure you have pandas and numpy installed:
    pip install pandas numpy
  3. Run the script in your Python environment:
    python HoppingWindowSolution.py

Sample Output

Below is an example output, where each row shows the mean of temperature values for a 3-hour window, updated every 1 hour:

timestamp              | Hopping Window Mean Values
-----------------------|---------------------------
2024-01-01 00:00:00    | 3.000000
2024-01-01 01:00:00    | 5.000000
2024-01-01 02:00:00    | 4.000000
2024-01-01 03:00:00    | 3.666667
2024-01-01 04:00:00    | 3.333333
...

Conclusion

This script illustrates the practical application of hopping windows in time-series data analysis, providing an effective way to calculate overlapping aggregations for real-time data processing.

About

A Python project demonstrating time-series data aggregation using a hopping window function to calculate mean temperature readings from simulated sensor data. Ideal for streaming applications requiring low-latency insights over overlapping time intervals in real-time

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors