OPENSTACK LOG FILE AUTOMATION DUPLICATION
MUHAMMAD NADZMI B MOHD ZAMERI
BACHELOR OF COMPUTER SCIENCE (COMPUTER NETWORK SECURITY) WITH HONOURS
BTBL 16043903
TABLE OF CONTENTS
ABSTRACT
Chapter 1
1.1 Introduction
1.2 Background
1.3 Problem Statement
1.4 Objective
1.5 Project Method
1.6 Limitation Of Work
Chapter 2
2.1 Introduction
2.2 Cloud Computing
2.3 Virtual Machine
2.4 Openstack
2.5 Log File
2.7 Summary Of Openstack And Log File System Research Paper
Chapter 3
3.1 Introduction
3.2 Flowchart
3.3 Openstack Installation And Configuration
3.3.1 Installing Centos 7 In A Virtual Machine
3.3.2 Installing Openstack
3.3.3 Configuring Openstack System
3.3.4 Openstack System Flow Design
REFERENCE
ABSTRACT
OpenStack Log File Duplication aims to reduce the complexity of the log file system in the OpenStack dashboard while providing an option to back up the log file of each OpenStack node to Swift. In this project we focus on creating backups to make OpenStack safer to use, and on centralising the OpenStack log file system into one large log file that an average user can read easily.
CHAPTER 1
1.1 Introduction
In this chapter we discuss the background of OpenStack, the problem statement, the objectives, the project method and the limitations of the work. In the Background section we describe the history of the OpenStack software and project, how it came about and the kind of services it provides. In the Problem Statement section we discuss the limitations of OpenStack and the inconvenience they cause users, which leads to the Objective section, where we state the criteria we want to achieve while developing this project. After that we describe the project method, that is, how the system will be installed, modified and tested. Lastly, we discuss the limitations of developing the system.
1.2 Background
The internet has become an essential part of everyday life in this day and age, with everyone and everything connected to it. As such, the growth of cloud computing services must not be taken lightly, as their importance grows daily in both personal and public use for completing and storing everyday activities. One such service is an open source Infrastructure-as-a-Service (IaaS) platform named OpenStack. OpenStack started in 2010 and was managed by the OpenStack Foundation. It began as a joint project between the National Aeronautics and Space Administration (NASA) and Rackspace Hosting, intended to help organisations by offering cloud computing services that can run on ordinary hardware. Because OpenStack is a cloud computing service, it is important to keep track of your data. This shows the importance of log files, as they record the data flow and authentication events that happen in the system. With that comes a heavy reliance on log files in security and data management, since we search the log files whenever there is suspicious activity on our cloud. Despite this reliance, there is no alternative way to recover if a specific log file is corrupted or lost. In addition, log files are hard for a normal user to understand at first. The proposed log file automation duplication system should, in theory, make it harder to lose a log file because of the duplicates it keeps, and make log files easier to read by letting us sort and visualise their data.
1.3 Problem statement
Nowadays, log files are an essential part of a computer network environment, because they let us view the changes made to our system in depth, which helps us secure and debug it. As such, many problems occur when a log file is lost or corrupted: there is no easy way to retrieve a lost log file, so the data stored in it is also lost.
Secondly, log file data is too complicated, as a long history is compiled and saved in each log file. This makes it hard for an average user to read the history of the system from a specific log file. Lastly, log files are separated rather than centralised for every node in a system. Because the logs are split into smaller parts, we may have to check several log files to understand one problem: to understand a problem in log file A we may have to search log file B, which in turn points to an error in log file C. As a result, investigating log files becomes time-consuming and tedious.
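One way to shorten the chase from log file A to B to C is to follow a single request through every file at once. The sketch below is a minimal, hypothetical helper: it assumes each entry carries an OpenStack-style request ID (`req-` followed by a UUID), which OpenStack services commonly write into their log lines, and that the node logs have already been read into memory. The name `trace_request` and the log names in the docstring are illustrative, not part of OpenStack.

```python
import re
from typing import Dict, List

# Assumed request-ID format: "req-" followed by a lower-case UUID, as commonly
# used by OpenStack services to tag the entries belonging to one API request.
REQUEST_ID = re.compile(
    r"req-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def trace_request(logs: Dict[str, List[str]], request_id: str) -> List[str]:
    """Collect every entry, across all node logs, that carries request_id.

    `logs` maps a log name (e.g. "controller/nova-api.log") to its lines.
    Each returned entry is prefixed with the name of the log it came from.
    """
    hits: List[str] = []
    for name, lines in logs.items():
        for line in lines:
            # Compare whole IDs extracted from the line, not raw substrings.
            if request_id in REQUEST_ID.findall(line):
                hits.append(f"{name}: {line}")
    return hits
```

Because the matches are returned in file order and tagged with their source, the chain "A points to B points to C" can be read top to bottom in one list instead of opening each file by hand.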
1.4 Objective
1. To develop a log file duplication mechanism so that, when a log file is corrupted or lost, there is a way to obtain the previous and current backups of the log file.
2. To design a simplified OpenStack dashboard that summarises log file data so that an average user can read it easily.
3. To test whether the system improves usability and productivity by centralising the log files obtained from the services.
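Objective 2 depends on turning raw log lines into something an average user can scan. A minimal sketch of one possible summary, counting entries per service and severity level, is shown below. The line layout it parses is an assumption modelled on OpenStack's default log format (date, time with milliseconds, process ID, level, logger name, message), and `summarise` is a hypothetical name.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable

# Assumed entry layout, modelled on OpenStack's default log format:
# "<date> <time>.<ms> <pid> <LEVEL> <logger> <message>"
ENTRY = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) (?P<msg>.*)$"
)


def summarise(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count log entries per service and level, e.g. {"nova": {"ERROR": 3}}."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        match = ENTRY.match(line)
        if match is None:
            continue  # continuation lines such as traceback frames
        service = match["logger"].split(".")[0]  # "nova.compute.manager" -> "nova"
        summary[service][match["level"]] += 1
    return dict(summary)
```

A dashboard panel could then show a small table such as "nova: 120 INFO, 3 ERROR" instead of the raw file, leaving the full lines available for users who need them.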
1.5 Project Method
The project method is divided into three parts: the installation, the modification of the OpenStack system, and the testing and implementation of the system. The first part, installation, involves setting up and configuring the CentOS 7 operating system so that OpenStack can access and use the computer as a server that provides the online cloud storage. The second part is to modify the existing OpenStack modules to receive data and log files, make an exact duplicate of that data, and store it for use in the dashboard.
The last part is the testing and implementation of the system, in which we set it up in a real use-case environment and monitor its usability and its performance in helping users recover log files that have been lost through corruption or other means.
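The recovery step of the testing phase can be made concrete with a checksum. The sketch below assumes that a SHA-256 digest is recorded whenever a backup is taken, and that it is applied to closed (rotated) log files such as `nova-api.log.1`, which no longer change; a live log keeps growing, so its digest would never match a snapshot. The helper names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks rather than all at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_if_needed(primary: Path, backup: Path, expected_sha256: str) -> bool:
    """Restore a closed (rotated) log file from its backup if lost or corrupted.

    `expected_sha256` is the digest recorded when the backup was taken.
    Returns True if the file was restored, False if the primary was intact.
    Raises ValueError if the backup itself fails the check.
    """
    if primary.exists() and sha256_of(primary) == expected_sha256:
        return False  # primary is intact; nothing to do
    if sha256_of(backup) != expected_sha256:
        raise ValueError(f"backup {backup} does not match the recorded checksum")
    shutil.copy2(backup, primary)  # copy2 also keeps the file timestamps
    return True
```

Checking the backup before copying matters: restoring from a copy that is itself damaged would silently replace one corrupted file with another.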
1.6 Limitation of Work
The project has several limitations that cannot be avoided:
I. Computer hardware performance
The hardware of a particular computer determines how fast and efficiently an OpenStack system runs. An older computer runs much more slowly than a new one, giving the system less speed to process, store and record all of the data running in it. It should also be noted that a typical desktop CPU is slower than a server CPU, making it slower to encrypt and decrypt data.
II. Internet connection bandwidth
The internet is a crucial part of the OpenStack system: as cloud computing software, OpenStack relies on the network for almost everything it does. As a result, internet speed plays a crucial part in transferring and updating data in the OpenStack system. Because internet speed differs from place to place, we may find it difficult to run the system when we access it from an area where the connection is too slow.
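A quick calculation shows how much bandwidth matters for log backups. For a hypothetical 500 MB backup, the ideal transfer time is 40 s at 100 Mbit/s, 400 s at 10 Mbit/s and 4,000 s (about 67 minutes) at 1 Mbit/s. The sketch below is a lower bound only: it ignores protocol overhead, latency and congestion, and the backup size is an illustrative assumption.

```python
def transfer_seconds(size_bytes: int, bandwidth_mbps: float) -> float:
    """Ideal time in seconds to send size_bytes over a bandwidth_mbps (Mbit/s) link.

    Protocol overhead, latency and congestion are ignored, so real
    transfers will take longer than this lower bound.
    """
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth must be positive")
    return size_bytes * 8 / (bandwidth_mbps * 1_000_000)


if __name__ == "__main__":
    backup_size = 500_000_000  # a hypothetical 500 MB log backup
    for mbps in (100, 10, 1):
        print(f"{mbps:>3} Mbit/s -> {transfer_seconds(backup_size, mbps):,.0f} s")
```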
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
In this chapter we explain some of the components and terms used in the previous chapter, as well as some that will be used later. These terms are cloud computing, virtual machine, log file and OpenStack, which are explained in more depth to deepen our understanding of how, what and why we are using them in the current project.
2.2 Cloud Computing
Cloud computing is a widely and broadly used term nowadays that is commonly associated with the growth of the internet over the past few decades. This is because cloud computing services are usually accessible from anywhere, at any time, through the internet. Cloud computing provides high-level services, usually hosted on the internet (the "cloud"), that management can provision and adjust with minimal effort, because cloud computing is a large pool of configurable computer system resources shared to achieve economies of scale. This decreases the time many organisations need to set up their businesses and also reduces the maintenance cost of the system, as they only need to maintain one large core system rather than many smaller ones.
2.3 Virtual Machine
A virtual machine (VM) is emulation software that simulates, or virtualises, a separate computer within a host environment and can perform most computing tasks as if it were a separate computer. A VM, also known as a guest, is created on the main computer, which is known as the host because it hosts the VM in its environment. VMs can run most of the programs and applications that a normal computer can, although some tasks are performed differently; since the outcome is mostly the same, most people overlook this. VMs are divided into two categories according to their use and to how closely they correspond to a real computer; a VM cannot perform tasks beyond what the underlying physical computer can do.
The first category is the system virtual machine, which substitutes for a normal computer and can simulate a whole new environment within the VM. This allows multiple different logical environments to be installed and used separately on a single physical computer. We will be using this type of VM in our current project. The second category is the process virtual machine, which is designed to run a single computer program in a platform-independent environment.
2.4 Openstack
OpenStack is cloud computing software that is mostly deployed as infrastructure as a service (IaaS) and is used to build and manage public or private cloud platforms. It is a free, open source software platform backed by a substantial number of companies and countless community members working to improve it. As cloud computing software, OpenStack naturally provides its own VMs, which lets users manage and modify their cloud environment easily. OpenStack comprises seven main components: Nova, Swift, Cinder, Neutron, Horizon, Keystone and Glance. Each component has its own task and use in operating the OpenStack software.
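Because each of these components keeps its own logs, a first practical step towards one centralised log file is simply finding them. The sketch below assumes the layout of a typical CentOS/RDO installation, where most services write to `/var/log/<service>/*.log`; the service list and the function name `find_service_logs` are illustrative assumptions to be checked on the actual nodes.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Assumption: as in a typical CentOS/RDO installation, each of these services
# writes its logs to /var/log/<service>/*.log. Swift and Horizon are left out
# because their logs are commonly routed elsewhere (Swift through syslog,
# Horizon through the web server), so they would need separate handling.
DEFAULT_SERVICES = ("nova", "keystone", "glance", "cinder", "neutron")


def find_service_logs(
    root: Path = Path("/var/log"),
    services: Iterable[str] = DEFAULT_SERVICES,
) -> Dict[str, List[Path]]:
    """Map each service name to the sorted *.log files under root/<service>."""
    found: Dict[str, List[Path]] = {}
    for service in services:
        directory = root / service
        if directory.is_dir():  # skip services not installed on this node
            found[service] = sorted(directory.glob("*.log"))
    return found
```

Running this on every node gives the list of files that the duplication and merging steps described in Chapter 3 would need to cover.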
Figure 2.1: Openstack Components
The first component is Horizon, the dashboard of OpenStack. It is the graphical user interface (GUI) and the first component users see when they open OpenStack. Horizon gives users web-based access to the other OpenStack components, which it reaches through their application programming interfaces (APIs), and provides administrators with a way to manage the cloud. The second is Nova, the brain or main compute engine of OpenStack. Nova is tasked with a crucial part of OpenStack: deploying and managing the large number of VM instances in an OpenStack cloud.
Next is the Keystone component, the identity and authentication service of OpenStack. It is tasked with providing and listing the identities of all users in OpenStack and storing their permissions, that is, which components each user is allowed to use and modify.
The fourth component of OpenStack is Swift, an object storage module tasked with storing objects and files in the OpenStack system. After that we have Glance, the image service for OpenStack. Glance provides images, that is, saved virtual hard disks, which OpenStack can use, for example, to launch a new VM.
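Since the project stores its log backups on Swift, the upload step can be sketched as follows. This is an assumption-laden outline, not the project's implementation: `conn` is expected to behave like python-swiftclient's `Connection`, which offers `put_container()` and `put_object()`, while the container name `log-backups`, the `<hostname>/<file>.<UTC timestamp>` object naming and the function name are hypothetical choices.

```python
import socket
from datetime import datetime, timezone
from pathlib import Path


def backup_to_swift(conn, log_path: Path, container: str = "log-backups") -> str:
    """Upload one log file to Swift and return the object name that was used.

    `conn` is expected to offer put_container() and put_object() as
    python-swiftclient's Connection does. The container name and the
    "<hostname>/<file>.<UTC timestamp>" naming scheme are illustrative.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    object_name = f"{socket.gethostname()}/{log_path.name}.{stamp}"
    conn.put_container(container)  # PUT on an existing container is harmless
    with log_path.open("rb") as f:
        conn.put_object(container, object_name, contents=f,
                        content_type="text/plain")
    return object_name
```

In a real deployment `conn` would typically be a `swiftclient.client.Connection` created with the Keystone authentication URL and credentials. Prefixing each object with the node's hostname keeps backups from different nodes apart, and the timestamp keeps older copies instead of overwriting them.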
The sixth component is Neutron, the networking component of OpenStack. Neutron helps the VMs in OpenStack communicate with each other easily and quickly. Finally, Cinder is the block storage of OpenStack. Cinder also provides storage services like Swift, but in a different way: Swift distributes objects across many storage nodes, whereas Cinder uses a more traditional approach, providing block storage volumes that are attached to instances and accessed directly, where the speed of accessing the data is the priority.
2.5 Log File
In an IT environment, a log file is crucial in developing, maintaining and securing a system or application. A log file records the modifications and activities that have occurred in a system. As such, a log file lists the changes a user makes to the system, which is important because it can provide a clue or evidence that a change has happened. A log file can also help us debug a system when it encounters an error. The act of keeping a log file is called logging, and the saved log file can be used when an error occurs. Log files can be divided into many categories to improve their readability; examples include event logs, transaction logs, message logs and many more.
2.7 Summary of Openstack and log file system Research Paper
The table below shows a summary of the literature review related to OpenStack and log file systems.
Title Of Paper: LOG FILE MANAGEMENT TOOLS
Author & Year: Alan Gatto, Dean Cottle, Oleg Fylypenko, Shivakumar Gurusiddappa, Kevin Haselhuhn, Greg Hollis, Luis Lamprea, Sergey Aleksin, Gaurav Kumar, Narendra Datar, Michael Pougnet, Poras Bharucha, Brett Dale (19-12-2017)
Background: When providing technical support for computer systems, support specialists often use log files generated by various components of the systems to diagnose technical issues. Such logs are generally stored in various locations scattered across various computers (e.g., servers) of the software system and across the file systems of those computers, thereby complicating collecting those logs from a customer computer system installation.

Title Of Paper: METHODS AND SYSTEMS TO DETECT ANOMALIES IN COMPUTER SYSTEM BEHAVIOR BASED ON LOG-FILE SAMPLING
Author & Year: Darren Brown, Junyuan LIN, Nicholas Kushmerick (30-10-2018)
Background: Methods and systems that detect computer system anomalies based on log file sampling are described. Computers systems generate log files that record various types of operating system and software run events in event messages. For each computer system, a sample of event messages are collected in a first time interval and a sample of event messages are collected in a recent second time interval. Methods calculate a difference between the event messages collected in the first and second time intervals. When the difference is greater than a threshold, an alert is generated. The process of repeatedly collecting a sample of event messages in a recent time interval, calculating a difference between the event messages collected in the recent and previous time intervals, comparing the difference to the threshold, and generating an alert when the threshold is violated may be executed for each computer system of a cluster of computer systems.

Title Of Paper: SYSTEM, METHOD, AND COMPUTER READABLE MEDIA FOR IDENTIFYING A USER-INITIATED LOG FILE RECORD IN A LOG FILE
Author & Year: Danny Yen-Fu Chen, David A. Cox, Sheryl S. Kinstler, Fabian F. Morgan (03-01-2017)
Background: A system, a method, and a computer readable media for identifying a user-initiated log file record in a log file are provided. The log file has a user-initiated log file record and a repeating pattern of log file records automatically generated by a software program. The system allows a user to identify first and second timestamp values corresponding to first and second times which identify a time interval of interest in the log file. The system further analyzes the log file to identify the user-initiated log file record having a timestamp value between the first and second timestamp values. The system further identifies the repeating pattern of log file records in the log file.

Title Of Paper: OPENSTACK AND SOFTWARE-DEFINED NETWORKING: THE ENORMOUS POTENTIAL OF OPEN SOURCE SOFTWARE COLLABORATION
Author & Year: Hoai Le (September 2017)
Background: Throughout the theoretical part, cloud computing, OpenStack architecture, OpenStack core services, Software-Defined Networking architecture, and SDN-related technologies were researched. The outcome indicated that OpenStack and Software-Defined Networking could play well together. It also showed why people favored OpenStack, and why it has become one of the fastest growing open source communities.

Title Of Paper: DISTRIBUTED LOG ANALYSIS ON THE CLOUD USING MAPREDUCE
Author & Year: Galip Aydin, Ibrahim Riza Hallac (10-02-2018)
Background: In this paper we describe our work on designing a web based, distributed data analysis system based on the popular MapReduce framework deployed on a small cloud; developed specifically for analyzing web server logs. The log analysis system consists of several cluster nodes, it splits the large log files on a distributed file system and quickly processes them using MapReduce programming model. The cluster is created using an open source cloud infrastructure, which allows us to easily expand the computational power by adding new nodes. This gives us the ability to automatically resize the cluster according to the data analysis requirements. We implemented MapReduce programs for basic log analysis needs like frequency analysis, error detection, busy hour detection etc. as well as more complex analyses which require running several jobs. The system can automatically identify and analyze several web server log types such as Apache, IIS, Squid etc. We use open source projects for creating the cloud infrastructure and running MapReduce jobs.
CHAPTER 3
METHODOLOGY
3.1 INTRODUCTION
In this chapter I report the methodology proposed by other researchers and how their research helps in building and improving the present framework. I also present the framework and system model, the flowchart and the approach I take in carrying out the project. Selecting the methodology best suited to developing this project is crucial to its outcome, as choosing an incorrect methodology may hinder the project timeline and ultimately cause the project to be delayed or discontinued. This is because developers rely on the project timeline to guide their work, and the wrong methodology disrupts that schedule, which unintentionally affects the project.
3.2 FLOWCHART
Figure 3.0 Flowchart of Openstack log file automation duplication
Figure 3.0 shows an overall flowchart of configuring the log file system in an OpenStack environment. The first step is to run a virtual machine and install the CentOS 7 operating system, preferably the minimal edition, on it. Secondly, we install the OpenStack framework on that machine and then configure the existing log file system and the OpenStack dashboard so that we can implement a duplication task for the log file system and possibly improve the OpenStack dashboard.
3.3 OPENSTACK INSTALLATION AND CONFIGURATION
3.3.1 INSTALLING CENTOS 7 IN A VIRTUAL MACHINE
Installing CentOS 7 in a virtual machine is a straightforward process. In this project we use VMware to simulate a secondary instance on our computer and install CentOS 7 Minimal on it. Installing CentOS requires an up-to-date CentOS 7 ISO and some tweaks to the instance configuration. These range from setting the instance's connection type to bridged, to allowing the VM to use, preferably, more than 20 GB of storage so that OpenStack can run smoothly.
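The 20 GB guideline can be checked from inside the new CentOS VM before installing OpenStack. The sketch below uses Python's standard `shutil.disk_usage`; the threshold is the chapter's suggestion, measured here in decimal gigabytes, and `disk_is_large_enough` is a hypothetical helper name.

```python
import shutil

# The chapter suggests giving the VM more than 20 GB of storage.
MIN_DISK_GB = 20


def disk_is_large_enough(path: str = "/", min_gb: float = MIN_DISK_GB) -> bool:
    """True if the filesystem holding `path` has a total size of at least min_gb GB."""
    total_bytes = shutil.disk_usage(path).total
    return total_bytes >= min_gb * 1_000_000_000  # decimal gigabytes


if __name__ == "__main__":
    if disk_is_large_enough():
        print("root filesystem is large enough")
    else:
        print("root filesystem is smaller than the suggested size for OpenStack")
```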
3.3.2 INSTALLING OPENSTACK
The first part of installing OpenStack on CentOS is to check the VM's IP address and whether it is on the same network as the host computer. We must also stop, disable and remove unneeded services such as NetworkManager and the firewall, because OpenStack has its own network management, which does not work well with them. After that, we set up the hostname of our system and synchronise the server time with ntpdate.
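The IP check in this first part can be expressed with Python's standard `ipaddress` module. With a bridged adapter the VM should receive its own address on the host's LAN, so the two addresses should share a network while still being different. The /24 prefix used as the default is an assumption typical of small lab networks, and `same_network` is a hypothetical name.

```python
import ipaddress


def same_network(vm_ip: str, host_ip: str, prefix_len: int = 24) -> bool:
    """True if host_ip lies in the network of vm_ip for the given prefix length.

    With a bridged adapter the VM should get its own address on the host's
    LAN: same network, different address. /24 is only an assumed default.
    """
    vm_network = ipaddress.ip_interface(f"{vm_ip}/{prefix_len}").network
    return ipaddress.ip_address(host_ip) in vm_network
```

For example, a VM at 192.168.1.20 and a host at 192.168.1.10 share the 192.168.1.0/24 network, whereas a host at 10.0.0.5 would suggest the VM is still on a NAT adapter rather than a bridge.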
The second part is to find and install missing repositories, such as the RPM distribution of OpenStack, from which we install the CentOS release of OpenStack, and then update all of the system's packages. After that we install OpenStack with Packstack, an OpenStack utility that uses Puppet modules to help deploy the OpenStack modules. Before doing so, we configure the admin password, SSL, server passwords and other settings in the Packstack answer file.
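Editing the answer file by hand is error-prone, so the overrides can also be applied with a short script. The sketch below is hypothetical: it assumes the file was produced by `packstack --gen-answer-file=<file>` and keeps its settings in a `[general]` section, and the example keys `CONFIG_KEYSTONE_ADMIN_PW` and `CONFIG_HORIZON_SSL` should be checked against the generated file. Comments in the original file are not preserved.

```python
import configparser
import io
from typing import Dict


def apply_answers(answer_text: str, overrides: Dict[str, str]) -> str:
    """Return a copy of a Packstack answer file with selected keys overridden.

    Assumes the settings live in a [general] section, as in files produced
    by `packstack --gen-answer-file`. Comments are not preserved.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep CONFIG_* keys upper-case
    parser.read_string(answer_text)
    for key, value in overrides.items():
        parser["general"][key] = value
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

After writing the result back to disk, the installation in the third part would be started with `packstack --answer-file=<file>`.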
The third part is to start the OpenStack installation using the Packstack answer file configured earlier, which installs OpenStack automatically. Finally, after the installation, we can access the OpenStack dashboard from a remote host.
3.3.3 CONFIGURING OPENSTACK SYSTEM
Each OpenStack service is configured in its own configuration file. To configure an OpenStack service, we open the configuration directory of that service and edit its configuration file manually. For example, the Nova configuration file is located in the /etc/nova directory and is named nova.conf. In this file we will configure Nova's logging, adding settings that make the log file duplicate itself as a backup. The other part is the dashboard, which we will try to simplify to make it more usable for normal users.
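As we understand it, the oslo.log library used by Nova provides a `log_config_append` option in nova.conf that points to a standard Python logging configuration file (when it is set, the other logging options are ignored). One possible way to make Nova's log "duplicate itself" is therefore to give the logger a second file handler. The sketch below shows that idea with Python's own `logging` module rather than a real nova.conf; the function name, the `backup/` sub-directory and the simplified line format are illustrative assumptions.

```python
import logging
import logging.config
from pathlib import Path

# Simplified form of the default OpenStack (oslo.log) line layout.
LINE_FORMAT = "%(asctime)s.%(msecs)03d %(process)d %(levelname)s %(name)s %(message)s"


def configure_duplicated_logging(log_dir: Path, name: str = "nova") -> logging.Logger:
    """Attach two file handlers to logger `name`: the main log and a backup copy.

    Every record is written to <log_dir>/<name>.log and to
    <log_dir>/backup/<name>.log at the moment it is logged.
    """
    backup_dir = log_dir / "backup"
    backup_dir.mkdir(parents=True, exist_ok=True)  # also creates log_dir
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "plain": {"format": LINE_FORMAT, "datefmt": "%Y-%m-%d %H:%M:%S"},
        },
        "handlers": {
            "main": {
                "class": "logging.FileHandler",
                "filename": str(log_dir / f"{name}.log"),
                "formatter": "plain",
            },
            "backup": {
                "class": "logging.FileHandler",
                "filename": str(backup_dir / f"{name}.log"),
                "formatter": "plain",
            },
        },
        "loggers": {name: {"handlers": ["main", "backup"], "level": "INFO"}},
    })
    return logging.getLogger(name)
```

One design caveat: two copies on the same disk do not survive the loss of that disk, so this write-time duplicate is best paired with an off-node copy, such as the Swift backups described in the abstract.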
3.3.4 OPENSTACK SYSTEM FLOW DESIGN
The design of the OpenStack system flow in this project is fairly simple. OpenStack log files are stored on their individual nodes, and their configuration is based on each node's configuration. We will therefore design a system that duplicates a log file whenever an entry is recorded and stores the copy in a backup folder. In addition, we will merge the current data of the log files into one centralised log file whose data can all be viewed in the OpenStack dashboard.
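The merging half of this flow can be sketched as a k-way merge: because each node's log is already in time order, Python's `heapq.merge` can interleave them into one timeline without loading and sorting everything at once. The sketch assumes every entry starts with a fixed-width `YYYY-MM-DD HH:MM:SS.mmm` timestamp, as in OpenStack's default format, so comparing those characters as text orders entries by time; the function names and the `[source]` tag are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

TIMESTAMP_WIDTH = len("2018-12-05 10:00:00.000")  # 23 characters


def _tagged(name: str, lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (timestamp, source name, line) so heapq.merge orders by time first."""
    for line in lines:
        yield line[:TIMESTAMP_WIDTH], name, line.rstrip("\n")


def merge_logs(sources: List[Tuple[str, Iterable[str]]]) -> Iterator[str]:
    """Merge per-node logs, each already in time order, into one timeline.

    Assumes every entry starts with a fixed-width "YYYY-MM-DD HH:MM:SS.mmm"
    timestamp, so comparing those characters as text orders entries by time.
    Lines without a timestamp (e.g. traceback frames) are not handled here.
    """
    streams = [_tagged(name, lines) for name, lines in sources]
    for _timestamp, name, line in heapq.merge(*streams):
        yield f"[{name}] {line}"
```

Writing the merged lines to a single file would produce the centralised log that the dashboard could read, with each entry still showing which node it came from.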
REFERENCE
Hoai Le. (2017). OpenStack and Software Defined Networking: The Enormous Potential of Open Source Software Collaboration.
Ranger, S. (2018, December 13). What is cloud computing? Everything you need to know about the cloud, explained. ZDNet. Retrieved from https://www.zdnet.com/article/what-is-cloud-computing-everything-you-need-to-know-from-public-and-private-cloud-to-software-as-a/
Chandan Kumar. (2018). Cloud-based Log Analyzer: 8 Cloud-based Log Analyzer for IT Operational Insights.
Ben Silverman. (2017). How to explain OpenStack to a complete newcomer.
Miao He, Jin Feng Li, Chang Rui Ren, Bing Shao, Ming Xie, Tian Zhi Zhao. (2017). Generating Important Values from a Variety of Server Log Files.
OpenStack. (2018, December 5). Introduction to OpenStack. Retrieved from https://docs.openstack.org/security-guide/introduction/introduction-to-openstack.html
OpenStack Community. (2018, December 5). Networking Services security best practices. Retrieved from https://docs.openstack.org/security-guide/networking/securing-services.html