A high school research project that collects temperature and humidity data from an Arduino UNO with a DHT11 sensor, streams it through Apache Kafka, analyzes it with Apache Spark, and predicts future values with LSTM models. Includes modern and legacy web dashboards for real-time visualization.
Arduino UNO + DHT11 → Serial → Python Producer → Kafka → Spark Streaming → Analysis
                                                                               ↓
                                                    LSTM Model ← Historical Data
                                                                               ↓
                                          Web Dashboard (Real-time visualization)
# Start entire system (Docker + Conda environment)
./start.sh
# Stop system
./stop.sh
# Start sensor data producer (Terminal 1)
./run_producer.sh
# Start Spark streaming analytics (Terminal 2)
./run_spark.sh
# Start main dashboard (Terminal 3)
./run_dashboard.sh
# Start legacy dashboard (optional)
python dashboard_legacy.py
- Main Dashboard: http://localhost:8050 (modern glassmorphism design)
- Legacy Dashboard: http://localhost:8060 (simple design)
- Kafka UI: http://localhost:8081
- Spark UI: http://localhost:8080
- docker-compose.yml: Kafka + Spark cluster with KRaft mode (no Zookeeper)
- environment.yml: Conda Python environment setup
- simple_producer.py: Kafka producer for sensor data (supports both sample and real Arduino data)
- spark-apps/spark_streaming.py: Spark streaming job for real-time analytics
- simple_lstm.py: Simplified dual LSTM system with separate temperature and humidity models
- Key class: `SimpleLSTM` with methods `train()`, `predict()`, and `prepare_data()`
- Uses MinMaxScaler for normalization and a sliding-window approach
- Sequence length: 10 data points, Epochs: 10, Batch size: 4
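The sketch below shows roughly what such a class could look like, assuming a small Keras model and the hyperparameters listed above; the exact architecture in simple_lstm.py may differ.

```python
# Minimal sketch of a SimpleLSTM-style class (assumed structure; the real
# simple_lstm.py may use different layer sizes or hyperparameters).
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

class SimpleLSTM:
    def __init__(self, sequence_length=10):
        self.sequence_length = sequence_length
        self.scaler = MinMaxScaler()
        self.model = Sequential([
            LSTM(32, input_shape=(sequence_length, 1)),
            Dense(1),
        ])
        self.model.compile(optimizer="adam", loss="mse")

    def prepare_data(self, values):
        # Scale to [0, 1] and build sliding windows of length `sequence_length`.
        scaled = self.scaler.fit_transform(np.array(values).reshape(-1, 1))
        X, y = [], []
        for i in range(len(scaled) - self.sequence_length):
            X.append(scaled[i:i + self.sequence_length])
            y.append(scaled[i + self.sequence_length])
        return np.array(X), np.array(y)

    def train(self, values, epochs=10, batch_size=4):
        X, y = self.prepare_data(values)
        self.model.fit(X, y, epochs=epochs, batch_size=batch_size, verbose=0)

    def predict(self, recent_values):
        # Predict the next value from the most recent window (call train() first).
        window = np.array(recent_values[-self.sequence_length:]).reshape(-1, 1)
        scaled = self.scaler.transform(window)
        pred = self.model.predict(scaled.reshape(1, self.sequence_length, 1), verbose=0)
        return float(self.scaler.inverse_transform(pred)[0, 0])
```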
- dashboard.py: Modern dashboard with glassmorphism design and professional styling
- Real-time charts with fixed Y-axis ranges (temp: 15-35°C, humid: 25-85%)
- Automatic model retraining every 1 minute
- CSS Grid layout with gradient backgrounds
- dashboard_legacy.py: Simple minimal design dashboard on port 8060
- Basic HTML tables and simple charts without fancy styling
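Both dashboards pin the chart Y-axes to the ranges noted above (15-35 °C, 25-85 %). A minimal Plotly sketch of that configuration, assuming `plotly.graph_objects` figures (the actual chart-building code in dashboard.py may be organized differently):

```python
# Sketch of fixed Y-axis ranges for the temperature and humidity charts
# (assumed plotly.graph_objects usage; actual chart code may differ).
import plotly.graph_objects as go

def build_temp_figure(timestamps, temperatures):
    fig = go.Figure(go.Scatter(x=timestamps, y=temperatures, mode="lines+markers", name="Temperature"))
    fig.update_yaxes(range=[15, 35], title_text="°C")  # fixed range keeps the line visible
    return fig

def build_humidity_figure(timestamps, humidities):
    fig = go.Figure(go.Scatter(x=timestamps, y=humidities, mode="lines+markers", name="Humidity"))
    fig.update_yaxes(range=[25, 85], title_text="%")
    return fig
```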
- arduino_code.ino: Arduino sketch for the DHT11 sensor (connected to digital pin 2)
- Configure `USE_SAMPLE_DATA = False` in simple_producer.py for real sensor data
- Update serial port path in simple_producer.py (e.g., '/dev/cu.usbserial-140')
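A rough sketch of how the sample/real switch and serial read might fit together in the producer; the message fields, baud rate, and serial line format are assumptions, so adjust them to match simple_producer.py and arduino_code.ino.

```python
# Sketch of the sample/real data switch in a Kafka producer
# (assumed message format and serial parsing; adjust to match simple_producer.py).
import json, random, time
import serial                    # pyserial
from kafka import KafkaProducer  # kafka-python

USE_SAMPLE_DATA = False                  # set True to emit generated readings instead
SERIAL_PORT = "/dev/cu.usbserial-140"    # update to your Arduino's port

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

if not USE_SAMPLE_DATA:
    arduino = serial.Serial(SERIAL_PORT, 9600, timeout=5)

def read_sensor():
    if USE_SAMPLE_DATA:
        return {"temperature": round(random.uniform(20, 30), 1),
                "humidity": round(random.uniform(40, 70), 1)}
    line = arduino.readline().decode("utf-8", errors="ignore").strip()
    temp, humid = (float(x) for x in line.split(","))  # assumes "temp,humidity" lines
    return {"temperature": temp, "humidity": humid}

while True:
    reading = read_sensor()
    reading["timestamp"] = time.time()
    producer.send("sensor-data", reading)
    time.sleep(2)  # Arduino sends readings every 2 seconds
```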
- Collection: Arduino DHT11 sensor → Serial every 2 seconds
- Streaming: Kafka topic 'sensor-data' with KRaft mode
- Analytics: Spark calculates hourly averages/max and minute averages
- Prediction: Dual LSTM models trained on minute averages
- Visualization: Real-time dashboards with live charts
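For reference, a condensed PySpark sketch of the per-minute aggregation described above, assuming JSON messages with `temperature`, `humidity`, and `timestamp` fields (the real spark-apps/spark_streaming.py also computes hourly averages and maxima, and needs the Spark-Kafka connector on the classpath):

```python
# Sketch of the per-minute aggregation step (assumed schema and field names).
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col, window, avg
from pyspark.sql.types import StructType, StructField, DoubleType

spark = SparkSession.builder.appName("sensor-analytics").getOrCreate()

schema = StructType([
    StructField("temperature", DoubleType()),
    StructField("humidity", DoubleType()),
    StructField("timestamp", DoubleType()),
])

raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "sensor-data")
       .load())

readings = (raw.selectExpr("CAST(value AS STRING) AS json")
            .select(from_json(col("json"), schema).alias("r"))
            .select(col("r.temperature").alias("temperature"),
                    col("r.humidity").alias("humidity"),
                    col("r.timestamp").cast("timestamp").alias("event_time")))

minute_avg = (readings
              .withWatermark("event_time", "2 minutes")
              .groupBy(window(col("event_time"), "1 minute"))
              .agg(avg("temperature").alias("avg_temp"),
                   avg("humidity").alias("avg_humidity")))

query = minute_avg.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```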
- Separate Models: Independent LSTM models for temperature and humidity
- Training Data: Uses minute-averaged sensor readings
- Prediction: Forecasts next minute's temperature and humidity
- Auto-retraining: Models retrain every minute with new data
- Performance: Tracks MSE for both temperature and humidity models
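In outline, the dual-model setup is just two independent `SimpleLSTM` instances (see the class sketch above) trained on the minute-averaged series; variable names here are illustrative:

```python
# Sketch of the dual-model setup: one SimpleLSTM per measurement
# (assumes the SimpleLSTM sketch above; names are illustrative).
temp_model = SimpleLSTM(sequence_length=10)
humid_model = SimpleLSTM(sequence_length=10)

def retrain_and_predict(minute_avg_temps, minute_avg_humids):
    # Retrain each model on its own minute-averaged history,
    # then forecast the next minute's value for each.
    temp_model.train(minute_avg_temps)
    humid_model.train(minute_avg_humids)
    next_temp = temp_model.predict(minute_avg_temps)
    next_humid = humid_model.predict(minute_avg_humids)
    return next_temp, next_humid
```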
- Mode: KRaft (no Zookeeper dependency)
- Topic: 'sensor-data'
- Bootstrap Server: localhost:9092
- UI: Available at localhost:8081
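A quick way to check the broker and topic from Python, assuming kafka-python (already used by the producer) is installed:

```python
# Quick connectivity check for the 'sensor-data' topic (kafka-python assumed installed).
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "sensor-data",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating if no messages arrive within 5 s
)
print("Known topics:", consumer.topics())
for message in consumer:
    print(message.value)
```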
- Version: 3.4.1
- Mode: Streaming with micro-batches
- Analytics: Hourly and minute-level aggregations
- UI: Available at localhost:8080
# Test simple LSTM independently
python simple_lstm.py
- Check Kafka topics: http://localhost:8081
- Monitor Spark jobs: http://localhost:8080
- View producer logs in terminal
- Check dashboard console for model training status
# Find serial ports (macOS)
ls /dev/cu.*
# Find serial ports (Linux)
ls /dev/ttyUSB* /dev/ttyACM*
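Alternatively, pyserial (which the producer already requires) can list ports on any platform:

```python
# Cross-platform serial port listing with pyserial.
from serial.tools import list_ports

for port in list_ports.comports():
    print(port.device, "-", port.description)
```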
- MSE showing 0.0: Usually indicates insufficient training data or model architecture issues
- Implausible or placeholder-looking predictions: Ensure the separate temperature and humidity models are both properly trained
- Performance not updating: Check if models are actually retraining with new data
- One chart line not visible: Y-axis scaling issue; fixed with range=[15, 35] for temperature and range=[25, 85] for humidity
- Empty charts: Check if historical_data list is being populated correctly
# Check container status
docker ps | grep kafka
# View Kafka logs
docker logs kafka
- Python: 3.9 with Conda
- Kafka: 7.4.0 (KRaft mode)
- Spark: 3.4.1
- TensorFlow: 2.13.0
- Dash: 2.14.1
- Docker Compose: For infrastructure
- Modern Dashboard: Glassmorphism with CSS Grid, gradients, professional styling
- Legacy Dashboard: Basic HTML tables, minimal CSS, simple design
- Chart Configuration: Both use fixed Y-axis ranges to prevent visibility issues
- Initial Training: Uses sample data for quick startup
- Continuous Learning: Retrains every minute with real data
- Data Requirements: Minimum 20 data points for retraining
- Performance Tracking: Separate MSE tracking for temperature and humidity
- Historical Data: Keeps last 100 sensor readings in memory
- Predictions: Maintains last 10 predictions for visualization
- Sample vs Real: Configurable via USE_SAMPLE_DATA flag
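Putting those numbers together, the retraining step looks roughly like the sketch below, which builds on the illustrative `retrain_and_predict()` helper shown earlier; the names are hypothetical, not the dashboard's actual callbacks.

```python
# Sketch of the periodic retraining logic (illustrative names; the actual
# dashboard callback code may differ).
from collections import deque

historical_data = deque(maxlen=100)  # last 100 raw sensor readings kept for the charts
predictions = deque(maxlen=10)       # last 10 predictions kept for visualization

def retrain_if_ready(minute_avg_temps, minute_avg_humids):
    # Skip retraining until at least 20 minute-averaged points are available.
    if len(minute_avg_temps) < 20 or len(minute_avg_humids) < 20:
        return None
    # Uses the retrain_and_predict() helper from the dual-model sketch above.
    next_temp, next_humid = retrain_and_predict(minute_avg_temps, minute_avg_humids)
    predictions.append({"temperature": next_temp, "humidity": next_humid})
    return next_temp, next_humid
```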
This system demonstrates real-time data processing, machine learning prediction, and modern web visualization techniques suitable for educational and research purposes.