A high school research project that collects temperature and humidity data from an Arduino UNO with a DHT11 sensor, streams it through Apache Kafka, analyzes it with Apache Spark, and predicts future values with LSTM models. Includes modern and legacy web dashboards for real-time visualization.
Arduino UNO + DHT11 → Serial → Python Producer → Kafka → Spark Streaming → Analysis
                                                                               ↓
                                                    LSTM Model ← Historical Data
                                                                               ↓
                                          Web Dashboard (Real-time visualization)
# Start entire system (Docker + Conda environment)
./start.sh
# Stop system
./stop.sh
# Start sensor data producer (Terminal 1)
./run_producer.sh
# Start Spark streaming analytics (Terminal 2)
./run_spark.sh
# Start main dashboard (Terminal 3)
./run_dashboard.sh
# Start legacy dashboard (optional)
python dashboard_legacy.py
- Main Dashboard: http://localhost:8050 (modern glassmorphism design)
- Legacy Dashboard: http://localhost:8060 (simple design)
- Kafka UI: http://localhost:8081
- Spark UI: http://localhost:8080
- docker-compose.yml: Kafka + Spark cluster with KRaft mode (no Zookeeper)
- environment.yml: Conda Python environment setup
- simple_producer.py: Kafka producer for sensor data (supports both sample and real Arduino data)
- spark-apps/spark_streaming.py: Spark streaming job for real-time analytics
- simple_lstm.py: Simplified dual LSTM system with separate temperature and humidity models
- Key class: `SimpleLSTM` with methods `train()`, `predict()`, and `prepare_data()`
- Uses MinMaxScaler for normalization and a sliding-window approach
- Sequence length: 10 data points, Epochs: 10, Batch size: 4
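The sketch below shows roughly what such a class could look like, assuming a small Keras model and the hyperparameters listed above; the exact architecture in simple_lstm.py may differ.

```python
# Minimal sketch of a SimpleLSTM-style class (assumed structure; the real
# simple_lstm.py may use different layer sizes or hyperparameters).
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

class SimpleLSTM:
    def __init__(self, sequence_length=10):
        self.sequence_length = sequence_length
        self.scaler = MinMaxScaler()
        self.model = Sequential([
            LSTM(32, input_shape=(sequence_length, 1)),
            Dense(1),
        ])
        self.model.compile(optimizer="adam", loss="mse")

    def prepare_data(self, values):
        # Scale to [0, 1] and build sliding windows of length `sequence_length`.
        scaled = self.scaler.fit_transform(np.array(values).reshape(-1, 1))
        X, y = [], []
        for i in range(len(scaled) - self.sequence_length):
            X.append(scaled[i:i + self.sequence_length])
            y.append(scaled[i + self.sequence_length])
        return np.array(X), np.array(y)

    def train(self, values, epochs=10, batch_size=4):
        X, y = self.prepare_data(values)
        self.model.fit(X, y, epochs=epochs, batch_size=batch_size, verbose=0)

    def predict(self, recent_values):
        # Predict the next value from the most recent window (call train() first).
        window = np.array(recent_values[-self.sequence_length:]).reshape(-1, 1)
        scaled = self.scaler.transform(window)
        pred = self.model.predict(scaled.reshape(1, self.sequence_length, 1), verbose=0)
        return float(self.scaler.inverse_transform(pred)[0, 0])
```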
- dashboard.py: Modern dashboard with glassmorphism design and professional styling
- Real-time charts with fixed Y-axis ranges (temp: 15-35°C, humid: 25-85%)
- Automatic model retraining every 1 minute
- CSS Grid layout with gradient backgrounds
- dashboard_legacy.py: Simple minimal design dashboard on port 8060
- Basic HTML tables and simple charts without fancy styling
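Both dashboards pin the chart Y-axes to the ranges noted above (15-35 °C, 25-85 %). A minimal Plotly sketch of that configuration, assuming `plotly.graph_objects` figures (the actual chart-building code in dashboard.py may be organized differently):

```python
# Sketch of fixed Y-axis ranges for the temperature and humidity charts
# (assumed plotly.graph_objects usage; actual chart code may differ).
import plotly.graph_objects as go

def build_temp_figure(timestamps, temperatures):
    fig = go.Figure(go.Scatter(x=timestamps, y=temperatures, mode="lines+markers", name="Temperature"))
    fig.update_yaxes(range=[15, 35], title_text="°C")  # fixed range keeps the line visible
    return fig

def build_humidity_figure(timestamps, humidities):
    fig = go.Figure(go.Scatter(x=timestamps, y=humidities, mode="lines+markers", name="Humidity"))
    fig.update_yaxes(range=[25, 85], title_text="%")
    return fig
```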
- arduino_code.ino: Arduino sketch for the DHT11 sensor (connected to digital pin 2)
- Configure `USE_SAMPLE_DATA = False` in simple_producer.py for real sensor data
- Update serial port path in simple_producer.py (e.g., '/dev/cu.usbserial-140')
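A rough sketch of how the sample/real switch and serial read might fit together in the producer; the message fields, baud rate, and serial line format are assumptions, so adjust them to match simple_producer.py and arduino_code.ino.

```python
# Sketch of the sample/real data switch in a Kafka producer
# (assumed message format and serial parsing; adjust to match simple_producer.py).
import json, random, time
import serial                    # pyserial
from kafka import KafkaProducer  # kafka-python

USE_SAMPLE_DATA = False                  # set True to emit generated readings instead
SERIAL_PORT = "/dev/cu.usbserial-140"    # update to your Arduino's port

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

if not USE_SAMPLE_DATA:
    arduino = serial.Serial(SERIAL_PORT, 9600, timeout=5)

def read_sensor():
    if USE_SAMPLE_DATA:
        return {"temperature": round(random.uniform(20, 30), 1),
                "humidity": round(random.uniform(40, 70), 1)}
    line = arduino.readline().decode("utf-8", errors="ignore").strip()
    temp, humid = (float(x) for x in line.split(","))  # assumes "temp,humidity" lines
    return {"temperature": temp, "humidity": humid}

while True:
    reading = read_sensor()
    reading["timestamp"] = time.time()
    producer.send("sensor-data", reading)
    time.sleep(2)  # Arduino sends readings every 2 seconds
```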
- Collection: Arduino DHT11 sensor → Serial every 2 seconds
- Streaming: Kafka topic 'sensor-data' with KRaft mode
- Analytics: Spark calculates hourly averages/max and minute averages
- Prediction: Dual LSTM models trained on minute averages
- Visualization: Real-time dashboards with live charts
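For reference, a condensed PySpark sketch of the per-minute aggregation described above, assuming JSON messages with `temperature`, `humidity`, and `timestamp` fields (the real spark-apps/spark_streaming.py also computes hourly averages and maxima, and needs the Spark-Kafka connector on the classpath):

```python
# Sketch of the per-minute aggregation step (assumed schema and field names).
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col, window, avg
from pyspark.sql.types import StructType, StructField, DoubleType

spark = SparkSession.builder.appName("sensor-analytics").getOrCreate()

schema = StructType([
    StructField("temperature", DoubleType()),
    StructField("humidity", DoubleType()),
    StructField("timestamp", DoubleType()),
])

raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "sensor-data")
       .load())

readings = (raw.selectExpr("CAST(value AS STRING) AS json")
            .select(from_json(col("json"), schema).alias("r"))
            .select(col("r.temperature").alias("temperature"),
                    col("r.humidity").alias("humidity"),
                    col("r.timestamp").cast("timestamp").alias("event_time")))

minute_avg = (readings
              .withWatermark("event_time", "2 minutes")
              .groupBy(window(col("event_time"), "1 minute"))
              .agg(avg("temperature").alias("avg_temp"),
                   avg("humidity").alias("avg_humidity")))

query = minute_avg.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```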
- Separate Models: Independent LSTM models for temperature and humidity
- Training Data: Uses minute-averaged sensor readings
- Prediction: Forecasts next minute's temperature and humidity
- Auto-retraining: Models retrain every minute with new data
- Performance: Tracks MSE for both temperature and humidity models
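In outline, the dual-model setup is just two independent `SimpleLSTM` instances (see the class sketch above) trained on the minute-averaged series; variable names here are illustrative:

```python
# Sketch of the dual-model setup: one SimpleLSTM per measurement
# (assumes the SimpleLSTM sketch above; names are illustrative).
temp_model = SimpleLSTM(sequence_length=10)
humid_model = SimpleLSTM(sequence_length=10)

def retrain_and_predict(minute_avg_temps, minute_avg_humids):
    # Retrain each model on its own minute-averaged history,
    # then forecast the next minute's value for each.
    temp_model.train(minute_avg_temps)
    humid_model.train(minute_avg_humids)
    next_temp = temp_model.predict(minute_avg_temps)
    next_humid = humid_model.predict(minute_avg_humids)
    return next_temp, next_humid
```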
- Mode: KRaft (no Zookeeper dependency)
- Topic: 'sensor-data'
- Bootstrap Server: localhost:9092
- UI: Available at localhost:8081
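A quick way to check the broker and topic from Python, assuming kafka-python (already used by the producer) is installed:

```python
# Quick connectivity check for the 'sensor-data' topic (kafka-python assumed installed).
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "sensor-data",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating if no messages arrive within 5 s
)
print("Known topics:", consumer.topics())
for message in consumer:
    print(message.value)
```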
- Version: 3.4.1
- Mode: Streaming with micro-batches
- Analytics: Hourly and minute-level aggregations
- UI: Available at localhost:8080
# Test simple LSTM independently
python simple_lstm.py
- Check Kafka topics: http://localhost:8081
- Monitor Spark jobs: http://localhost:8080
- View producer logs in terminal
- Check dashboard console for model training status
# Find serial ports (macOS)
ls /dev/cu.*
# Find serial ports (Linux)
ls /dev/ttyUSB* /dev/ttyACM*
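Alternatively, pyserial (which the producer already requires) can list ports on any platform:

```python
# Cross-platform serial port listing with pyserial.
from serial.tools import list_ports

for port in list_ports.comports():
    print(port.device, "-", port.description)
```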
- MSE showing 0.0: Usually indicates insufficient training data or model architecture issues
- Implausible or placeholder-looking predictions: Ensure the separate temperature and humidity models are both properly trained
- Performance not updating: Check if models are actually retraining with new data
- One chart line not visible: Y-axis scaling issue; fixed with range=[15, 35] for temperature and range=[25, 85] for humidity
- Empty charts: Check if historical_data list is being populated correctly
# Check container status
docker ps | grep kafka
# View Kafka logs
docker logs kafka
- Python: 3.9 with Conda
- Kafka: 7.4.0 (KRaft mode)
- Spark: 3.4.1
- TensorFlow: 2.13.0
- Dash: 2.14.1
- Docker Compose: For infrastructure
- Modern Dashboard: Glassmorphism with CSS Grid, gradients, professional styling
- Legacy Dashboard: Basic HTML tables, minimal CSS, simple design
- Chart Configuration: Both use fixed Y-axis ranges to prevent visibility issues
- Initial Training: Uses sample data for quick startup
- Continuous Learning: Retrains every minute with real data
- Data Requirements: Minimum 20 data points for retraining
- Performance Tracking: Separate MSE tracking for temperature and humidity
- Historical Data: Keeps last 100 sensor readings in memory
- Predictions: Maintains last 10 predictions for visualization
- Sample vs Real: Configurable via USE_SAMPLE_DATA flag
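Putting those numbers together, the retraining step looks roughly like the sketch below, which builds on the illustrative `retrain_and_predict()` helper shown earlier; the names are hypothetical, not the dashboard's actual callbacks.

```python
# Sketch of the periodic retraining logic (illustrative names; the actual
# dashboard callback code may differ).
from collections import deque

historical_data = deque(maxlen=100)  # last 100 raw sensor readings kept for the charts
predictions = deque(maxlen=10)       # last 10 predictions kept for visualization

def retrain_if_ready(minute_avg_temps, minute_avg_humids):
    # Skip retraining until at least 20 minute-averaged points are available.
    if len(minute_avg_temps) < 20 or len(minute_avg_humids) < 20:
        return None
    # Uses the retrain_and_predict() helper from the dual-model sketch above.
    next_temp, next_humid = retrain_and_predict(minute_avg_temps, minute_avg_humids)
    predictions.append({"temperature": next_temp, "humidity": next_humid})
    return next_temp, next_humid
```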
This system demonstrates real-time data processing, machine learning prediction, and modern web visualization techniques suitable for educational and research purposes.