An interesting paper from the KDD conference (2015).
- The air quality of current time and the past few hours
- The meteorological data of current time and past few hours
- Humidity, temperature,..
- Sunny, foggy, overcast, cloudy…
- Minor rainy, moderate rainy, heavily rainy, rain storm
- Wind direction, wind speed
- Weather forecast
Models used in this paper:
- Temporal predictor: Multivariate Linear Regression
- Spatial predictor: Neural Network
Multivariate Linear Regression
Do not confuse this with multiple linear regression. The variants differ in the number of variables involved:
- Simple linear regression: one y and one x. For example, suppose we wish to predict house price based on house size.
- Multiple linear regression: one y and several x’s. We could attempt to improve our prediction of house price by using more than one independent variable, for example house size, number of bedrooms, and number of bathrooms.
- Multivariate multiple linear regression: several y’s and several x’s. We may wish to predict several y’s at once (such as the house price last year and the house price before the housing bubble burst).
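To make the distinction concrete, here is a minimal NumPy sketch of the three cases. The data is synthetic and the helper name `fit` is made up for illustration; it uses `np.linalg.lstsq` rather than TensorFlow and leaves out the intercept to stay short. The thing to look at is the shape of the fitted weights.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 50  # number of houses (synthetic, for illustration only)

x_simple = rng.normal(size=(m, 1))   # one x per house
x_multi = rng.normal(size=(m, 3))    # several x's per house

# Arbitrary weights used to generate two targets per house.
true_w = np.array([[2.0, -1.0], [0.5, 3.0], [-4.0, 1.5]])  # shape (3, 2)
y_multi = x_multi @ true_w                                  # shape (m, 2)


def fit(x, y):
    """Least-squares weights for y ~ x @ w (no intercept)."""
    w, *_ = np.linalg.lstsq(x, y, rcond=None)
    return w


w_simple = fit(x_simple, 2.0 * x_simple[:, 0])  # simple: one x, one y
w_multiple = fit(x_multi, y_multi[:, 0])        # multiple: several x's, one y
w_multivariate = fit(x_multi, y_multi)          # multivariate: several x's, several y's

print(w_simple.shape, w_multiple.shape, w_multivariate.shape)
```

In the multivariate case the weights form a matrix with one column per target, which is the only structural change from multiple linear regression.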
Implementation using TensorFlow
Softmax regression (or multinomial logistic regression) is introduced in TensorFlow’s MNIST For ML Beginners tutorial. Here, however, we would like to implement multivariate multiple linear regression using TensorFlow.
Simple linear regression: (Code)
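    # X and Y hold the inputs and targets; m is the number of training examples.
    W = tf.Variable(0.0, name="weight")
    b = tf.Variable(0.0, name="bias")
    # Linear model X * W + b (tf.mul was renamed tf.multiply in TensorFlow 1.0).
    activation = tf.add(tf.mul(X, W), b)
    # Sum of squared errors divided by 2m.
    cost = tf.reduce_sum(tf.pow((activation - Y), 2)) / (2*m)
    optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate).minimize(cost)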
Minimizing the cost function finds the W and b for which the sum of squared differences between predictions and targets is smallest.
If the learning rate is too large, gradient descent may fail to converge; if it is too small, convergence is slow. At the minimum the partial derivatives of the cost are zero, so the update steps shrink as the minimum is approached and the process can converge with a fixed learning rate.
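The effect of the learning rate can be checked with a small NumPy sketch. It is a stand-in for the TensorFlow graph, not the original code: the function name `gradient_descent`, the synthetic line and the three learning rates are all chosen for illustration. It records the size of each update so the shrinking steps are visible.

```python
import numpy as np


def gradient_descent(x, y, learning_rate, steps=100):
    """Fit y ~ w * x + b by batch gradient descent.

    Returns the fitted w and b plus the cost and update size at every step.
    """
    m = len(x)
    w, b = 0.0, 0.0
    costs, step_sizes = [], []
    for _ in range(steps):
        error = w * x + b - y
        costs.append(float(np.sum(error ** 2)) / (2 * m))
        grad_w = float(np.dot(error, x)) / m
        grad_b = float(np.sum(error)) / m
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        step_sizes.append(learning_rate * float(np.hypot(grad_w, grad_b)))
    return w, b, costs, step_sizes


x = np.linspace(-1.0, 1.0, 21)
y = 3.0 * x + 1.0  # synthetic, noise-free line

for lr in (0.01, 0.5, 2.5):
    w, b, costs, step_sizes = gradient_descent(x, y, lr)
    print(f"lr={lr}: first step={step_sizes[0]:.3g}, "
          f"last step={step_sizes[-1]:.3g}, final cost={costs[-1]:.3g}")
```

On this data the middle rate converges with steps that shrink towards zero, the smallest rate is still far from the minimum after 100 steps, and the largest rate makes the cost grow instead of shrink.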
Multiple linear regression: (Code)
For multiple features it works the same way, except that the variables are vectors. Feature scaling is important because it makes gradient descent converge faster. To scale a feature, subtract its mean and divide by its range (or standard deviation).
    def feature_normalize(train_X):
        return (train_X - np.mean(train_X, axis=0)) / np.std(train_X, axis=0)
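One detail worth keeping in mind: a new input has to be scaled with the mean and standard deviation of the training set, not with its own statistics. A minimal sketch of that, assuming hypothetical helper names `fit_scaler` and `apply_scaler` and made-up training rows (not the Portland data):

```python
import numpy as np


def fit_scaler(train_X):
    """Return the per-column mean and standard deviation of the training features."""
    return np.mean(train_X, axis=0), np.std(train_X, axis=0)


def apply_scaler(X, mean, std):
    """Scale X with statistics that were computed on the training set."""
    return (X - mean) / std


# Made-up rows (size in square feet, number of bedrooms).
train_X = np.array([[1000.0, 2.0], [1500.0, 3.0], [2000.0, 3.0], [2500.0, 4.0]])
mean, std = fit_scaler(train_X)
train_scaled = apply_scaler(train_X, mean, std)

# A new house is scaled with the training mean and std before prediction.
query = np.array([[1650.0, 3.0]])
query_scaled = apply_scaler(query, mean, std)
print(query_scaled)
```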
Tune the learning rate for convergence: change it by about half an order of magnitude (roughly a factor of 3) at a time and observe how the cost function evolves with the number of iterations.
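A sweep like this is easy to script. The sketch below uses NumPy instead of TensorFlow, with synthetic features that are already normalized; `run_gd`, the weights and the list of rates are assumptions made for illustration. Weights are now a vector, as described above.

```python
import numpy as np


def run_gd(X, y, learning_rate, iterations=50):
    """Batch gradient descent for y ~ X @ W + b; returns the cost per iteration."""
    m, n = X.shape
    W, b = np.zeros(n), 0.0
    costs = []
    for _ in range(iterations):
        error = X @ W + b - y
        costs.append(float(error @ error) / (2 * m))
        W -= learning_rate * (X.T @ error) / m
        b -= learning_rate * float(error.mean())
    return costs


# Synthetic, already-normalized features; the generating weights are arbitrary.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([4.0, -2.0]) + 10.0

final_costs = {}
for lr in (0.001, 0.003, 0.01, 0.03, 0.1, 0.3):  # steps of roughly a factor of 3
    costs = run_gd(X, y, lr)
    final_costs[lr] = costs[-1]
    print(f"learning rate {lr}: cost after 50 iterations = {costs[-1]:.4g}")
```

Plotting each `costs` list against the iteration number gives the curves to compare; a rate whose curve rises or oscillates is too large.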
I used the Portland housing prices data set to confirm that the implementation was right.
The curve of the cost function was plotted at every step, using a learning rate of 0.1 and 50 iterations.
And I wound up getting the same results as the solution suggested.
    Training Cost= 2.04328e+09
    W= [[ 109447.78125 ]
     [ -6578.36279297]]
    b= [ 340412.65625]
    Predict.... (Predict a house with 1650 square feet and 3 bedrooms.)
    House price(Y) = [[ 293081.46875]]
Normal Equations: (Code)
We can also use the closed-form solution for linear regression.
    # theta = (X'*X)^-1 * X' * y
    theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(tf.transpose(X), X)), tf.transpose(X)), Y)
Gradient descent:
- Need to choose a learning rate
- Needs many iterations
- Works well for large n, the number of features (switch to it for n > 10,000)
Normal equation:
- No need for a learning rate or many iterations
- Feature scaling is not necessary
- Needs to compute (X’ * X)^-1, which costs O(n^3). Switch to gradient descent for large n.
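For comparison, here is the closed-form solution as a NumPy sketch on unscaled synthetic features. It is not the TensorFlow code: the name `normal_equation` and the data are made up, and it swaps the explicit matrix inverse for `np.linalg.solve` on the same system, which is the usual numerically safer choice.

```python
import numpy as np


def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y; X must already contain a column of ones."""
    return np.linalg.solve(X.T @ X, X.T @ y)


# Synthetic, unscaled features (size, bedrooms); the weights are arbitrary.
rng = np.random.default_rng(1)
features = rng.normal(loc=[2000.0, 3.0], scale=[500.0, 1.0], size=(40, 2))
X = np.column_stack([np.ones(len(features)), features])  # prepend the bias column
true_theta = np.array([50.0, 0.1, -5.0])
y = X @ true_theta

theta = normal_equation(X, y)
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # cross-check
print(theta)
```

No learning rate or iteration count appears anywhere, and the features were not scaled, which matches the trade-offs listed above.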
Multivariate multiple linear regression: (Code)
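As a rough idea of what changes for several targets, here is a NumPy sketch rather than the linked TensorFlow code. The function name `fit_multivariate`, the hyperparameters and the synthetic data are assumptions. The update rule is the same as before; W becomes a matrix with one column per target and b a vector.

```python
import numpy as np


def fit_multivariate(X, Y, learning_rate=0.1, iterations=500):
    """Gradient descent for Y ~ X @ W + b with several target columns."""
    m, n = X.shape
    k = Y.shape[1]
    W = np.zeros((n, k))
    b = np.zeros(k)
    for _ in range(iterations):
        error = X @ W + b - Y                    # shape (m, k)
        W -= learning_rate * (X.T @ error) / m   # one gradient column per target
        b -= learning_rate * error.mean(axis=0)
    return W, b


# Synthetic data: 3 features, 2 targets, arbitrary generating weights.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
true_W = np.array([[1.0, -2.0], [0.5, 0.0], [3.0, 1.5]])
true_b = np.array([10.0, -4.0])
Y = X @ true_W + true_b

W, b = fit_multivariate(X, Y)
print(W)
print(b)
```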
Gain an insight into data
Period of data collection: 2014/05/01~2015/04/30
The dataset consists of six parts (number of records in parentheses):
- city data: (43)
- district data: (380)
- air quality station data: (437)
- air quality data: (2,891,393)
- meteorological data: (1,898,453)
- weather forecast data: (910,576)
All parts were provided in .csv file format. The schema of each part and some example rows are shown below:
City Data
City ID | Chinese Name | English Name | Latitude | Longitude | Cluster ID
District Data
District ID | Chinese Name | English Name | City ID
Air Quality Station Data
Station ID | Chinese Name | English Name | Latitude | Longitude | District ID
Air Quality Data
Station ID | Time | PM25 | PM10 | NO2 | CO | O3 | SO2
Missing values are represented by NULL in the data files. The AQI (air quality index) can be calculated according to HJ633-2012.
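A possible way to handle both points when reading the air quality file is sketched below, using only the standard library. Several things here are assumptions: the sample rows and the station ID are invented, the helper names are made up, and the PM2.5 breakpoints are what I take to be the 24-hour table of HJ633-2012, so check them against the standard before relying on the numbers. The sub-index is a linear interpolation between breakpoints, and the overall AQI is the largest sub-index over all pollutants.

```python
import csv
import io

# Assumed PM2.5 breakpoints (ug/m3) and matching sub-index levels.
# Verify against HJ633-2012 before use.
PM25_BREAKPOINTS = [0, 35, 75, 115, 150, 250, 350, 500]
IAQI_LEVELS = [0, 50, 100, 150, 200, 300, 400, 500]


def parse_value(text):
    """Convert a CSV field to float, mapping the NULL marker to None."""
    return None if text == "NULL" else float(text)


def pm25_iaqi(concentration):
    """Interpolate the PM2.5 sub-index linearly; a missing value stays None."""
    if concentration is None:
        return None
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    for i in range(len(PM25_BREAKPOINTS) - 1):
        lo, hi = PM25_BREAKPOINTS[i], PM25_BREAKPOINTS[i + 1]
        if lo <= concentration <= hi:
            i_lo, i_hi = IAQI_LEVELS[i], IAQI_LEVELS[i + 1]
            return (i_hi - i_lo) / (hi - lo) * (concentration - lo) + i_lo
    return float(IAQI_LEVELS[-1])  # above the last breakpoint: cap at the top level


# Hypothetical rows in the Air Quality Data schema (not taken from the dataset).
sample = io.StringIO(
    "001001,2014-05-01 00:00:00,55,80,40,1.2,30,10\n"
    "001001,2014-05-01 01:00:00,NULL,85,NULL,1.1,28,9\n"
)
for row in csv.reader(sample):
    pm25 = parse_value(row[2])
    print(row[0], row[1], pm25, pm25_iaqi(pm25))
```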
Meteorological Data
ID | Time | Weather | Temperature | Pressure | Humidity | Wind Speed | Wind Direction
Weather Forecast Data
ID | Forecast Time | Future Time | Temporal Granularity | Weather | Up temperature | Bottom Temperature | Wind Level | Wind Direc
001,2015-04-30 07:00:00,2015-04-30 08:00:00,3,1,28,21,3.5,3
001,2015-04-30 18:00:00,2015-04-30 20:00:00,3,14,25,22,3.5,3
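The two example rows can be parsed with the standard library as in the sketch below. The dictionary keys are my own naming for the columns above, and the last key assumes the truncated header refers to wind direction. The derived `lead_hours` field, the gap between when a forecast was issued and the time it refers to, is an illustrative addition rather than part of the dataset.

```python
import csv
import io
from datetime import datetime

# Key names chosen for illustration, following the schema line above.
COLUMNS = [
    "id", "forecast_time", "future_time", "temporal_granularity", "weather",
    "up_temperature", "bottom_temperature", "wind_level", "wind_direction",
]
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# The two example rows shown above.
rows = io.StringIO(
    "001,2015-04-30 07:00:00,2015-04-30 08:00:00,3,1,28,21,3.5,3\n"
    "001,2015-04-30 18:00:00,2015-04-30 20:00:00,3,14,25,22,3.5,3\n"
)

records = []
for fields in csv.reader(rows):
    record = dict(zip(COLUMNS, fields))
    issued = datetime.strptime(record["forecast_time"], TIME_FORMAT)
    target = datetime.strptime(record["future_time"], TIME_FORMAT)
    # Hours between issuing the forecast and the time it refers to.
    record["lead_hours"] = (target - issued).total_seconds() / 3600
    records.append(record)

for record in records:
    print(record["id"], record["future_time"], record["lead_hours"])
```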
The temporal predictor models the trend of air quality of a station based on four types of data:
- the AQIs of the past h hours at the station
- the local meteorology at the current time 𝑡𝑐
- time of day and day of the week
- the weather forecasts of the time interval we are going to predict.
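The four groups listed above can be combined into a single input vector for the linear regression. The sketch below is only one way to do that: the function name `temporal_features`, the one-hot encoding of hour and weekday, and every value in the example call are assumptions, not details taken from the paper.

```python
import numpy as np


def temporal_features(past_aqi, meteorology, hour, weekday, forecast):
    """Concatenate the four input groups into one feature vector.

    past_aqi: AQIs of the past h hours at the station
    meteorology: local readings at the current time
    hour (0-23) and weekday (0-6): one-hot encoded here (an assumption)
    forecast: forecast values for the interval being predicted
    """
    hour_one_hot = np.zeros(24)
    hour_one_hot[hour] = 1.0
    weekday_one_hot = np.zeros(7)
    weekday_one_hot[weekday] = 1.0
    return np.concatenate(
        [past_aqi, meteorology, hour_one_hot, weekday_one_hot, forecast]
    )


# Made-up values: h = 3 past AQIs, 4 meteorological readings, 3 forecast fields.
features = temporal_features(
    past_aqi=[80.0, 85.0, 90.0],
    meteorology=[21.0, 1012.0, 60.0, 3.5],
    hour=7,
    weekday=3,
    forecast=[28.0, 21.0, 3.5],
)
print(features.shape)
```

Each station and time step yields one such row, and stacking the rows gives the X matrix used in the regression sketches earlier.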
Forecasting Fine-Grained Air Quality Based on Big Data
“Chapter 10, Multivariate regression – Section 10.1, Introduction”, Methods of Multivariate Analysis
What is multinominal logistic regression model?
Use attribute and target matrices for TensorFlow Linear Regression Python
Multiple Linear Regression Model by using Tensorflow
Treating quantity as constant in TensorFlow
How to implement multivariate linear stochastic gradient descent algorithm in tensorflow?