
R Code For Finding Faults In A Router – Network Analytics

2017-05-25    |    Admin

In this tutorial, we will present a few simple yet effective methods you can use to build a basic analytics tool in R to identify congestion in a router.

At our company Redeem Systems, we have deep expertise in engineering and networking as well as in analytics. Drawing on that networking expertise, we were able to identify problems in the domain and see where analytics could help. The code discussed in this blog is just a very simple, primitive illustration of what can be achieved with networking and engineering analytics.

That expertise was of great help in identifying the problems engineers face while designing a router and the issues they need to fix. In this example, we are going to go over how to identify congestion in a router, given its system log (syslog) file.

Prerequisites


Programming Language: R

R is a very powerful statistical computing tool. What makes R good for analytics is that most analysis can be done using its vast collection of libraries.

Programming IDE (optional but preferable; you can also use any text editor for coding): RStudio

Download the installer for your OS from the link below

Once you have R installed, you will need a few libraries before you can get started with the analysis. So fire up your favorite editor, or open RStudio and create a new R script:
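If you haven't installed these two packages before, install them once from CRAN before loading them:

```r
# One-time installation of the packages used in this tutorial
install.packages(c("reshape", "stringr"))
```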

library(reshape)
library(stringr)

The library reshape provides a variety of methods for reshaping data prior to analysis.

There are four main families of functions in stringr: character manipulation, whitespace tools, locale-sensitive operations, and pattern matching. It is the pattern-matching family that we will rely on below.
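As a quick, self-contained taste of the pattern-matching family (the log line here is made up for illustration, not taken from the actual router log):

```r
library(stringr)

# A made-up syslog-style line for illustration
line <- "Feb  3 09:14:22 router1 kernel: interface cleared"

# Collapse runs of whitespace, then split into at most 6 fields
clean  <- str_replace_all(line, "\\s+", " ")
fields <- str_split_fixed(clean, " ", 6)

fields[1, 6]   # the free-text message part: "interface cleared"
```

This whitespace-collapse-then-split pattern is exactly what we will apply to the whole log file below.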


They say that most of the analysis is done once you have pre-processed your data and made it ready for analysis. OK, I don't know who said that, so let's assume I said it. In this case it is entirely true, because most of the work here is in pre-processing: we are doing root-cause analytics, so there is no model to train. This is the most primitive use case; I'll also mention a few more complex problems we are working on.

First of all, we need the data for the analysis. It is a syslog file obtained from one of our routers, completely unstructured, in the form of a .log file. I'll give the download link below. But what we are really going to focus on is how to process any unstructured data, so you can apply the same approach to other data sets. The reason I'm using this particular file is that I want to explain a router-related use case.

Data Processing:


So now we are going to process the data file and convert it into a data frame. A data frame is like a table containing the data, with all the parameters as columns. First we load the required libraries, then load the data we need for the analysis.

library(reshape)
library(stringr)

#Loading the data and converting it into a data frame
#When you run the code, file.choose() will ask you to select the file
#quote and comment.char are disabled so log punctuation is not misparsed
data <- read.table(file.choose(), header=FALSE, sep="\t", quote="", comment.char="")
data <- as.data.frame(data)

Now we have to eliminate the first few lines of the file, since they contain redundant header information that is not needed for the analysis. We keep only the lines that carry actual log entries. In this case, data is only available for the month of February, so we use grepl to keep the lines of the log that contain the value Feb.

You can use the same approach if you want to do a month-wise analysis. We also remove the extra whitespace within each line.

#Removing the first few lines: keep only rows containing "Feb"
#(fix: filter the data frame's column, not the data() function)
data_cleaned <- data$V1[grepl("Feb", data$V1, ignore.case=TRUE)]

#Removing the extra white spaces within lines
data_cleaned <- gsub("\\s+", " ", data_cleaned)

data_cleaned <- as.data.frame(data_cleaned)

#Parsing the date (month and day; the year defaults to the current year)
Date <- as.Date(data_cleaned$data_cleaned, "%b %d")

#Splitting each log line into 6 columns
data_final <- str_split_fixed(data_cleaned$data_cleaned, " ", 6)
data_final <- as.data.frame(data_final)   #fix: convert the split matrix, not data_cleaned

Data Insights:


Now that the data has been completely processed, we can identify the points at which the syslog defence was cleared:

#Syslog defence was cleared at these points
#(fix: filter data_final, not the log() function)
data_protection_cleared <- subset(data_final, grepl("cleared", data_final$V6, ignore.case=TRUE))

Finally, here is the complete code in one place. Having walked through it step by step above, you should be able to follow what's going on.

#Complete Code

library(reshape)
library(stringr)

#Loading the data and converting it into a data frame
#When you run the code, file.choose() will ask you to select the file
data <- read.table(file.choose(), header=FALSE, sep="\t", quote="", comment.char="")
data <- as.data.frame(data)

#Removing the first few lines: keep only rows containing "Feb"
data_cleaned <- data$V1[grepl("Feb", data$V1, ignore.case=TRUE)]

#Removing the extra white spaces within lines
data_cleaned <- gsub("\\s+", " ", data_cleaned)

data_cleaned <- as.data.frame(data_cleaned)

#Parsing the date (month and day; the year defaults to the current year)
Date <- as.Date(data_cleaned$data_cleaned, "%b %d")

#Splitting each log line into 6 columns
data_final <- str_split_fixed(data_cleaned$data_cleaned, " ", 6)
data_final <- as.data.frame(data_final)

#Syslog defence was approved at these points
data_protection_approved <- subset(data_final, grepl("approved", data_final$V6, ignore.case=TRUE))

Please note that this code is specific to this log file. Suppose you have a system that generates the same kind of log every day: you can store all this data and find out at which points anomalies occur.
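As a minimal sketch of that idea, here is a hedged example that counts events per day and flags unusually busy days. The small data frame stands in for the parsed `data_final` from above (assuming V1 = month, V2 = day, V6 = message); the rows are made up for illustration:

```r
# Hypothetical parsed log lines (stand-ins for data_final columns)
log_df <- data.frame(
  V1 = c("Feb", "Feb", "Feb", "Feb", "Feb"),
  V2 = c("1", "1", "1", "2", "3"),
  V6 = c("link down", "link up", "queue overflow", "link down", "queue overflow"),
  stringsAsFactors = FALSE
)

# Count events per day
daily  <- table(paste(log_df$V1, log_df$V2))
counts <- as.numeric(daily)

# Flag days whose event count is well above the average
threshold      <- mean(counts) + sd(counts)
anomalous_days <- names(daily)[counts > threshold]
print(anomalous_days)
```

On real data you would replace `log_df` with `data_final` and tune the threshold to your log volume.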

Similarly, you can build more complex analytics applications by collecting data over months or years. Some of the things you can do with that data are:

1) You can use this historic data to forecast congestion based on day, date, month of the year, weekly trends and more. From this you can derive a lot of insights; there are many applications of forecasting analytics alone.

2) You can use this data to find faults, and to learn the patterns that precede them, so that faults can be predicted and prevented before they occur.
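To give a flavour of point 1, here is a sketch using base R's arima() on a made-up series of daily congestion counts with a weekly rhythm (the numbers are illustrative, not from the router log):

```r
# Illustrative daily congestion-event counts for four weeks
counts <- c(12, 9, 7, 8, 10, 15, 18,
            11, 8, 6, 9, 11, 16, 19,
            13, 9, 8, 8, 12, 17, 20,
            12, 10, 7, 9, 11, 16, 18)

# Treat the counts as a time series with weekly seasonality
congestion <- ts(counts, frequency = 7)

# Fit a simple seasonal ARIMA model and forecast the next week
fit <- arima(congestion, order = c(1, 0, 0),
             seasonal = list(order = c(1, 0, 0), period = 7))
next_week <- predict(fit, n.ahead = 7)
print(round(next_week$pred, 1))
```

In practice you would build `counts` by aggregating the parsed log over dates, and pick the model order by inspecting the data rather than hard-coding it.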

