Reinforcement Learning for Data Analysis and Data Interpretation

In this blog post, we will design a Reinforcement Learning application and compare the performance of Dense (fully connected) layers.
The goal is to build an easy-to-implement, effective, and time-efficient Reinforcement Learning DQN (Deep Q-Network) model out of Dense layers. Along the way we will compare performance measures for different optimizers and loss functions, and based on these measurements and the fine-tuning we do, try to find out which configuration is right for us and our dataset.
There is some terminology we need to know before reading the code below:
Keras: One of the most widely used libraries in machine learning today. Keras is an open-source, high-level deep learning API (running on top of TensorFlow) that makes building and training neural networks faster and easier.
Dense: The standard fully connected layer of a neural network, which we will use as our hidden layer. Each neuron in a Dense layer receives input from every neuron in the previous layer and passes its output to every neuron in the next layer.
DQN: A neural network that trains itself through a punishment/reward mechanism. Based on these punishments and rewards, the network tries to reach the most effective answer by adjusting the weights of its connections. In effect, it learns a mapping from states to action values (a policy) that the agent (the name given to the neural network we’ve built) can follow to maximize its reward in the long run.
Let's expand on DQN a little more. An agent-training application might take a raw state/action matrix as input; the first layer can abstract actions and encode states, the second layer can generate and encode combinations of states, the third layer can encode an action applied to a state, and the fourth layer can predict the expected outcome of that action in the current state.
This feedback loop means that the neural network we have created tries to improve the success of each next decision by feeding the result of every decision back into the following one, so that it can ultimately offer us the best result.
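To make this loop concrete, here is a minimal sketch of the core DQN update on a single transition. It is illustrative only: q_net, the transition variables, and the discount factor gamma are assumptions for this sketch, not code from this post.

import numpy as np

def dqn_update(q_net, state, action, reward, next_state, done, gamma=0.99):
    # hypothetical single-transition DQN step; q_net outputs one Q-value per action
    q_values = q_net.predict(state[np.newaxis], verbose=0)[0]
    # Bellman target: immediate reward plus discounted best Q-value of the next state
    target = reward
    if not done:
        target += gamma * np.max(q_net.predict(next_state[np.newaxis], verbose=0)[0])
    # only the chosen action's value is pushed toward the target
    q_values[action] = target
    q_net.fit(state[np.newaxis], q_values[np.newaxis], verbose=0)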
I ran my code using Google Colab. I got the dataset from Kaggle. The name of the dataset is “unemployment_data_us”, and it provides us with unemployment percentages of certain groups by year.
First, let’s import the libraries we will use:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import MSE
from tensorflow.keras.activations import softmax, sigmoid, relu, elu
from tensorflow.keras.optimizers import Adam
%matplotlib inline
You can easily see what parameters each of these functions takes and how it behaves from the TensorFlow link at the bottom of the blog.
I use a library called “Pandas” to read the dataset.
dataset = pd.read_csv('unemployment_data_us.csv')
dataset.head()
Before we start working on and manipulating the data, there are certain features we need to uncover about the data. First of all, I would like to see the size of the data:
dataset.size
Result: 1848
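Note that .size counts every cell (rows × columns). If we also want the row/column breakdown, .shape gives it; this quick check is not in the original notebook but is often handy:

dataset.shape  # returns (number of rows, number of columns)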
There are NaN (missing) values in the data. These are not useful to us, so we need to replace them using a feature of Pandas.
dataset.fillna(value=0, inplace=True)
The first argument is the value I want to fill the NaNs with, and the second (inplace=True) updates the dataset in place instead of returning a copy. I could also use a fill method rather than a value; this time I proceed with a value because it works better for me.
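For comparison, a method-based fill would look roughly like this (a sketch, not what the post uses); forward fill carries the last valid value down each column instead of writing 0:

dataset.fillna(method='ffill', inplace=True)  # propagate the previous valid value into each NaN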
dataset.info()
Here we can clearly see that all NaN data is gone.
Another frequently used Pandas feature is the .corr() method, which lists the correlations between the numeric columns.
dataset.describe()
With this method, we can access other useful information that describes the data statistically. It is most meaningful when the data consists of numerical columns.
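Since .corr() was mentioned above but its output is not shown here, this is roughly how it would be called (a sketch):

dataset.corr()  # pairwise correlation matrix of the numeric columns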
Since the data evaluates male, female, and ethnic values separately, I create another ‘total’ column and fill it with the mean of the male and female data.
dataset['total'] = dataset[['Men', 'Women']].mean(axis=1)
First, I want to separate this data into X (the features we predict from) and y (the target). When we (humans) think, we can easily choose the data we want to look at and build an interpretation in our head that we can understand. However, we need to express that choice in code; in other words, we need to tell the code which parts of the data our statistics should be built on.
X = dataset.iloc[:, 4:6].values
y = dataset.iloc[:, -1].values
X, y
The definitions are set as follows:
X: high school graduates and their GPAs
y: unemployment rates
First of all, I want to do classification. So what is KNN, or k-NN (the k-nearest neighbors algorithm)? It is a non-parametric supervised learning classifier that uses proximity to classify or predict the grouping of an individual data point. In other words, it is an algorithm for seeing which group a point belongs to when we put the data on a plot, and it is a simple, elegant way to make the jumble of numbers in the data more understandable for people.
There are several ways to measure this. As the name suggests, we are looking for the nearest neighbors, i.e. the point and/or points on the plot closest to the point in question. So how do we calculate that closeness? The simplest measure is Euclidean distance:
We must apply this to all the other data points relative to the chosen point. But classification typically does not work with a continuous target, so we need to tweak y (the target data) a little. Our target should be something like yes/no, 0/1, or pass/fail, and currently it is not. So we use sklearn's LabelEncoder(); otherwise, the data will not fit the model.
from sklearn import preprocessing
from sklearn import utils

lab = preprocessing.LabelEncoder()
y_transformed = lab.fit_transform(y)
Now there's one last thing left before we feed the data to the final model: splitting X and y into training and test sets. We are lucky that scikit-learn already provides a method for this.
X_train, X_test, y_train, y_test = train_test_split(X, y_transformed, test_size=0.2, random_state=0)
Initially, let’s assume we don’t know anything and go by making minor changes to the method inputs.
I define the k value of KNN (n_neighbors) as 15; later, we will come to how to find the optimal k.
from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier(n_neighbors=15)
classifier.fit(X_train, y_train)
classifier.predict([[30, 8000]])
Now that we’ve completed the model, let’s compare the test data with the predicted data:
y_pred = classifier.predict(X_test)
y_pred, y_test
Based on the results here, we can easily say that the loss of the model is very high. Let's also examine the accuracy of the model:
from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred) * 100
str(round(accuracy, 2))
The result is approximately 11.11%. Now let's calculate the optimal k value and review the results again.
For this, we will need a few methods we will write ourselves and some adjustments to the data.
from sklearn.model_selection import cross_val_score

# creating list of K for KNN
k_list = list(range(1, 50, 2))
# creating list of cv scores
cv_scores = []
# perform 7-fold cross validation
for k in k_list:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_train, y_train, cv=7, scoring='accuracy')
    cv_scores.append(scores.mean())
Here we take the list of k values and the scores we received and visualize them with the code below.
MSE = [1 - x for x in cv_scores]  # misclassification error for each k

plt.figure(figsize=(15, 10))
plt.title('The optimal number of neighbors', fontsize=20, fontweight='bold')
plt.xlabel('Number of Neighbors K', fontsize=15)
plt.ylabel('Misclassification Error', fontsize=15)
sns.set_style("whitegrid")
plt.plot(k_list, MSE)
plt.show()
k_list[MSE.index(min(MSE))]
Result: 3
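The refitting step itself is not shown in the post, but it would look roughly like this (a minimal sketch; my_y_pred is simply the variable name that the accuracy check further below uses):

classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(X_train, y_train)
my_y_pred = classifier.predict(X_test)  # predictions on the test set with the optimal k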
Now, if we repeat the same steps with n_neighbors set to 3 (as in the sketch above), the estimated test values are as follows:
array([21, 68, 38, 12, 9, 46, 0, 64, 68, 54, 65, 68, 0, 6, 68, 66, 12, 2, 16, 57, 20, 1, 1, 63, 32, 56, 58])
Let's review the accuracy of these results:
accuracy = accuracy_score(y_test, my_y_pred) * 100
str(round(accuracy, 3))
The result is 72.815, so we can say that this is a more successful model.
So far, we have seen how we can interpret the data using statistical methods. Let’s see how we can do this by creating an artificial neural network.
First, I create a new notebook and import the libraries. I will not go over that again because the steps are more or less the same up to the train_test_split() above. We just need to apply some different manipulations to the data here.
dataset.drop('Date', axis=1, inplace=True)  # delete date column

# create a dictionary containing each month
month = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
         'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
dataset['Month'] = dataset['Month'].map(month)

dataset['Day'] = 1  # add day column to make it easier to format
dataset['Date'] = pd.to_datetime(dataset[['Year', 'Month', 'Day']])
dataset.set_index('Date', inplace=True)
dataset.sort_index(inplace=True)  # sort data by date
First, let's create a simple sequential neural network. We will use the TensorFlow (Keras) library for this.
model = Sequential()
model.add(Dense(units=16, activation='relu'))
model.add(Dense(units=16, activation='relu'))
model.add(Dense(units=1, activation='sigmoid'))

model.compile(optimizer=Adam(learning_rate=0.1), loss=MSE, metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=64, epochs=100)
An activation function is the function a neuron applies to its weighted input to produce its output. It introduces non-linearity: typically it returns a small value for small inputs and a larger value once the input exceeds a threshold.
Optimizers are algorithms or methods that adjust neural network attributes, such as the weights and the learning rate, in order to reduce the loss. In other words, optimizers solve the optimization problem of minimizing the loss function.
I first set the learning rate of the optimizer to 0.1, the activation function of the hidden layers to relu (rectified linear unit), and the activation function of the output layer to sigmoid.
relu:
In short, this method returns negative values as 0 and positive ones as themselves.
sigmoid:
This function squashes values into a range between 0 and 1.
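To make the two activations concrete, here is a tiny NumPy sketch of both (illustrative only, not from the original post):

import numpy as np

def relu(x):
    # negative inputs become 0, positive inputs pass through unchanged
    return np.maximum(0, x)

def sigmoid(x):
    # squashes any real number into the (0, 1) range
    return 1 / (1 + np.exp(-x))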
When I used this setup, I could not reduce the loss of the model in either 50 or 100 iterations; it bounced between very large values, such as 830 and 1500. I tried tweaking the batch_size first, and when the result did not change, I wanted to play with the number of neurons. However, increasing the number of parameters only made the results worse and the loss larger.
Actually, there is no other way but to modify the model a bit to find the right model for our dataset in the first place.
The number of parameters of the current model is sufficient for the size of our dataset.
model.summary()
This gives us the parameter information of the model being trained.
After long trials, I made the model like this:
model = Sequential()
model.add(Dense(units=6, activation='relu'))
model.add(Dense(units=6, activation='relu'))
model.add(Dense(units=1, activation='elu'))

model.compile(optimizer=Adam(learning_rate=0.001), loss=MSE, metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=16, epochs=50)
When I reduced the number of neurons in the layers and changed the activation function of the output layer to elu, I got better results. Likewise, I had to lower the learning_rate a little.
Let’s see more on elu:
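The elu formula was shown as an image in the original post; roughly, elu passes positive inputs through unchanged and maps negative inputs onto a smooth curve that approaches -α, which in NumPy looks like this:

import numpy as np

def elu(x, alpha=1.0):
    # positive inputs pass through; negative inputs become alpha * (e^x - 1), approaching -alpha
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))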
sigmoid did not suit us well because it only outputs values between 0 and 1, while our target data consists of larger numbers. Instead of changing the activation function, normalizing y (the target data) is also an option here; when working with larger neural networks, normalization can make the calculations faster and more helpful for some tasks.
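If we went the normalization route instead, it could look roughly like this with scikit-learn's MinMaxScaler (a sketch of the alternative, not what the post actually does):

from sklearn.preprocessing import MinMaxScaler

y_scaler = MinMaxScaler()  # maps the target into the [0, 1] range that sigmoid can produce
y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1))
y_test_scaled = y_scaler.transform(y_test.reshape(-1, 1))
# predictions can later be mapped back with y_scaler.inverse_transform(...)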
After bringing the model to this shape, the loss started to give lower values; I saw values between ~2 and ~1.
Finally, it’s time to test the model with our test data and compare the results, as in the previous KNN.
y_pred = model.predict(X_test)
y_pred, y_test
As you can see, the predicted results are closer this time, so our model tends to make more successful predictions. Let’s calculate accuracy again:
accuracy = accuracy_score(y_test, my_y_pred) * 100
str(round(accuracy, 3))
Result: 87.720
In summary, we covered model training and comparison for the Dense neuron type in a DQN-style setup. The most important thing was to set the right parameters and to decide μ (the learning_rate), the coefficient that controls how quickly the network converges. One way to check that the layers are working is to compare loss values; we can apply the same evaluation by taking the loss at each q-value calculation and estimating according to those values.
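As a closing illustration of the optimizer/loss comparison described at the start, here is roughly how such a sweep could be organized (a sketch that assumes X_train and y_train are already prepared as above; the particular optimizers and losses chosen here are my own, not the post's):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam, RMSprop, SGD

optimizers = {'adam': Adam, 'rmsprop': RMSprop, 'sgd': SGD}
losses = ['mse', 'mae']

results = {}
for opt_name, opt_cls in optimizers.items():
    for loss_name in losses:
        model = Sequential([
            Dense(6, activation='relu'),
            Dense(6, activation='relu'),
            Dense(1, activation='elu'),
        ])
        # a fresh optimizer instance per model, same learning rate everywhere
        model.compile(optimizer=opt_cls(learning_rate=0.001), loss=loss_name)
        history = model.fit(X_train, y_train, batch_size=16, epochs=50, verbose=0)
        # record the final training loss so the combinations can be compared side by side
        results[(opt_name, loss_name)] = history.history['loss'][-1]

print(results)

Keep in mind that losses computed with different loss functions are not directly comparable; in practice you would evaluate every variant with the same metric, for example MSE on a held-out test set.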
Resources:
https://keras.io/api/layers/core_layers/dense/
https://www.kaggle.com/datasets/aniruddhasshirahatti/us-unemployment-dataset-2010-2020
https://scikit-learn.org/stable/index.html
https://www.tensorflow.org/api_docs/python/tf
https://pandas.pydata.org/pandas-docs/stable/index.html

Author: Ersin Çebi
Date Published: Oct 24, 2022
