7/29/2019 Linux DM Lab Manual
1/68
Contents

S.No  Topic
1.  List of Linux Programs
2.  List of Data Mining Programs
3.  Week 1 (Programs 1-4)
4.  Week 2 (Programs 5-7)
5.  Week 3 (Programs 8-10)
6.  Week 4 (Programs 11-12)
7.  Week 5 (Programs 13-15)
8.  Week 6 (Programs 16-18)
9.  Week 7 (Programs 19-22)
10. Week 8 (Programs 23-24)
11. Week 9 (Programs 25-26)
12. Week 10 (Programs 27-28)
13. Listing the categorical attributes and the real-valued attributes separately
14. Rules for identifying attributes
15. Training a decision tree
16. Test on classification of a decision tree
17. Testing on the training set
18. Using cross-validation for training
19. Significance of attributes in a decision tree
20. Trying generation of decision trees with various numbers of nodes
21. Differences in results using a decision tree and cross-validation on a data set
22. Decision trees
23. Reduced-error pruning for training decision trees using cross-validation
24. Converting a decision tree into "if-then-else" rules
List of Linux Programs
1. Write a shell script that accepts a file name and starting and ending line numbers as arguments, and displays all the lines between the given line numbers.
2. Write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it.
3. Write a shell script that displays a list of all the files in the current directory to which the user has read, write and execute permissions.
4. Write a shell script that receives any number of file names as arguments, checks whether every argument supplied is a file or a directory, and reports accordingly. Whenever the argument is a file, the number of lines in it is also reported.
5. Write a shell script that accepts a list of file names as its arguments, and counts and reports the occurrence of each word that is present in the first argument file in the other argument files.
6. Write a shell script to list all of the directory files in a directory.
7. Write a shell script to find the factorial of a given integer.
8. Write an awk script to count the number of lines in a file that do not contain vowels.
9. Write an awk script to find the number of characters, words and lines in a file.
10. Write a C program that makes a copy of a file using standard I/O and system calls.
11. Implement in C the following UNIX commands using system calls: A. cat B. ls C. mv
12. Write a program that takes one or more file/directory names as command-line input and reports the following information on each file: A. file type, B. number of links, C. time of last access, D. read, write and execute permissions.
13. Write a C program to emulate the UNIX ls -l command.
14. Write a C program to list, for every file in a directory, its inode number and file name.
15. Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1.
16. Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.
17. Write a C program to create a zombie process.
18. Write a C program that illustrates how an orphan is created.
19. Write a C program that illustrates how to execute two commands concurrently with a command pipe. Ex: ls -l | sort
20. Write C programs that illustrate communication between two unrelated processes using a named pipe.
21. Write a C program to create a message queue with read and write permissions and write 3 messages to it with different priority numbers.
22. Write a C program that receives the messages (from the message queue specified in (21)) and displays them.
23. Write a C program to allow cooperating processes to lock a resource for exclusive use, using a) semaphores, b) the flock or lockf system calls.
24. Write a C program that illustrates suspending and resuming processes using signals.
25. Write a C program that implements a producer-consumer system with two processes (using semaphores).
26. Write client and server programs (in C) for interaction between server and client processes using Unix domain sockets.
27. Write client and server programs (in C) for interaction between server and client processes using Internet domain sockets.
28. Write a C program that illustrates two processes communicating using shared memory.
List of Data Mining Programs
S.No. Task Description
1. List all the categorical (or nominal) attributes and the real-valued attributes separately.
2. What attributes do you think might be crucial in making the credit assessment? Come up with some simple rules in plain English using your selected attributes.
3. One type of model that you can create is a decision tree. Train a decision tree using the complete dataset as the training data. Report the model obtained after training.
4. Suppose you use your above model trained on the complete dataset, and classify credit good/bad for each of the examples in the dataset. What % of examples can you classify correctly? (This is also called testing on the training set.) Why do you think you cannot get 100% training accuracy?
5. Is testing on the training set as you did above a good idea? Why or why not?
6. One approach for solving the problem encountered in the previous question is using cross-validation. Describe briefly what cross-validation is. Train a decision tree again using cross-validation and report your results. Does your accuracy increase/decrease? Why? (10 marks)
7. Check to see if the data shows a bias against "foreign workers" (attribute 20) or "personal-status" (attribute 9). One way to do this (perhaps rather simple-minded) is to remove these attributes from the dataset and see if the decision tree created in those cases is significantly different from the full-dataset case which you have already done. To remove an attribute you can use the Preprocess tab in Weka's GUI Explorer. Did removing these attributes have any significant effect? Discuss.
8. Another question might be, do you really need to input so many attributes to get good results? Maybe only a few would do. For example, you could try just having attributes 2, 3, 5, 7, 10, 17 (and 21, the class attribute, naturally). Try out some combinations. (You had removed two attributes in problem 7. Remember to reload the arff data file to get all the attributes initially before you start selecting the ones you want.)
9. Sometimes, the cost of rejecting an applicant who actually has good credit (case 1) might be higher than accepting an applicant who has bad credit (case 2). Instead of counting the misclassifications equally in both cases, give a higher cost to the first case (say cost 5) and a lower cost to the second case. You can do this by using a cost matrix in Weka. Train your decision tree again and report the decision tree and cross-validation results. Are they significantly different from the results obtained in problem 6 (using equal cost)?
10. Do you think it is a good idea to prefer simple decision trees instead of long, complex decision trees? How does the complexity of a decision tree relate to the bias of the model?
11. You can make your decision trees simpler by pruning the nodes. One approach is to use reduced-error pruning. Explain this idea briefly. Try reduced-error pruning for training your decision trees using cross-validation (you can do this in Weka) and report the decision tree you obtain. Also report your accuracy using the pruned model. Does your accuracy increase?
12. (Extra Credit): How can you convert a decision tree into "if-then-else" rules? Make up your own small decision tree consisting of 2-3 levels and convert it into a set of rules. There also exist different classifiers that output the model in the form of rules; one such classifier in Weka is rules.PART. Train this model and report the set of rules obtained. Sometimes just one attribute can be good enough in making the decision, yes, just one! Can you predict what attribute that might be in this dataset? The OneR classifier uses a single attribute to make decisions (it chooses the attribute based on minimum error). Report the rule obtained by training a OneR classifier.
Week 1
1. Write a shell script that accepts a file name, starting and ending line numbers as
arguments and displays all the lines between the given line numbers.
Aim: To write a shell script that accepts a file name, starting and ending line numbers as
arguments and displays all the lines between the given line numbers.
Script:
if [ $# -ne 3 ]
then
echo "Error : Invalid number of arguments."
exit
fi
if [ $2 -gt $3 ]
then
echo "Error : Invalid range value."
exit
fi
l=`expr $3 - $2 + 1`
cat $1 | tail +$2 | head -$l
Output:
$sh 11b.sh test 5 7
abc 1234
def 5678
ghi 91011
Description :
head command : This command displays lines from the beginning of one or more
files. By default it displays the first 10 lines of a file.
head [ -count ] filename
tail command : This command displays the last few lines of a file. By default it
displays the last 10 lines of a file.
tail [ +/-start ] filename
start is a starting line number.
tail -5 filename : displays the last 5 lines of the file.
tail +5 filename : displays all the lines, beginning from line number 5 to the end of the file.
2. Write a shell script that deletes all lines containing a specified word in one or more
files supplied as arguments to it.
Aim: To write a shell script that deletes all lines containing a specified word in one or
more files supplied as arguments to it.
Script:
clear
if [ $# -eq 0 ]
then
echo no arguments passed
exit
fi
echo the contents before deleting
for i in $*
do
echo $i
cat $i
done
echo enter the word to be deleted
read word
for i in $*
do
grep -vi "$word" $i > temp
mv temp $i
echo after deleting
cat $i
done
Output:
$ sh 8b.sh test1
the contents before deleting
test1
hello
hello
bangalore
mysore city
enter the word to be deleted
city
after deleting
hello
hello
Bangalore
$ sh 8b.sh
no arguments passed
3. Write a shell script that displays a list of all the files in the current directory to
which the user has read, write and execute permissions.
Aim: To write a shell script that displays a list of all the files in the current directory to
which the user has read, write and execute permissions.
Script:
echo "enter the directory name"
read dir
if [ -d $dir ]
then
cd $dir
ls > f
exec < f
while read line
do
if [ -f $line ]
then
if [ -r $line -a -w $line -a -x $line ]
then
echo "$line has all permissions"
else
echo "$line does not have all permissions"
fi
fi
done
fi
4. Write a shell script that receives any number of file names as arguments, checks whether
every argument supplied is a file or a directory, and reports accordingly. Whenever the
argument is a file, the number of lines in it is also reported.
Aim: To write a shell script that receives any number of file names as arguments, checks
whether every argument supplied is a file or a directory, and reports accordingly.
Script:
for x in $*
do
if [ -f $x ]
then
echo " $x is a file "
echo " no of lines in the file are "
wc -l $x
elif [ -d $x ]
then
echo " $x is a directory "
else
echo " enter valid filename or directory name "
fi
done
Week 2
5. Write a shell script that accepts a list of file names as its arguments, counts and
reports the occurrence of each word that is present in the first argument file on
other argument files.
Aim : To write a shell script that accepts a list of file names as its arguments, counts
and reports the occurrence of each word that is present in the first argument file on
other argument files.
Script:
if [ $# -ne 2 ]
then
echo "Error : Invalid number of arguments."
exit
fi
str=`cat $1 | tr '\n' ' '`
for a in $str
do
echo "Word = $a, Count = `grep -c "$a" $2`"
done
Output :
$ cat test
hello AEC
$ cat test1
hello AEC
hello AEC
hello
$ sh 1.sh test test1
Word = hello, Count = 3
Word = AEC, Count = 2
6. Write a shell script to list all of the directory files in a directory.
Script:
# !/bin/bash
echo "enter directory name"
read dir
if [ -d $dir ]
then
echo "list of files in the directory"
ls $dir
else
echo "enter proper directory name"
fi
Output:
Enter directory name
AEC
List of files in the directory
CSE.txt
ECE.txt
7. Write a shell script to find factorial of a given integer.
Script:
# !/bin/bash
echo "enter a number"
read num
n=$num
fact=1
while [ $num -ge 1 ]
do
fact=`echo $fact \* $num | bc`
let num--
done
echo "factorial of $n is $fact"
Output:
Enter a number
5
Factorial of 5 is 120
Week 3
8. Write an awk script to count the number of lines in a file that do not contain
vowels.
9. Write an awk script to find the number of characters, words and lines in a file.
10. Write a C program that makes a copy of a file using standard I/O and system calls.
Aim : To write an awk script to find the number of characters, words and lines in a file.
Script:
BEGIN{ print "record \t characters \t words" }
#BODY section
{
len=length($0)
total_len+=len
print(NR,":\t",len,":\t",NF,$0)
words+=NF
}
END{
print("\n total")
print("characters :\t" total_len)
print("words :\t" words)
print("lines :\t" NR)
}
Week 4
11. Implement in C the following UNIX commands using System calls
A. cat B. ls C. mv
12. Write a program that takes one or more file/directory names as command-line input and reports the following information on the file.
A. File type. B. Number of links.
C. Time of last access. D. Read, Write and Execute permissions.
AIM: Implement in C the cat Unix command using system calls
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#define BUFSIZE 1
int main(int argc, char **argv)
{
    int fd1;
    int n;
    char buf;
    fd1 = open(argv[1], O_RDONLY);
    printf("Welcome to AEC\n");
    while ((n = read(fd1, &buf, BUFSIZE)) > 0)
    {
        printf("%c", buf); /* or write(1, &buf, 1); */
    }
    return (0);
}
AIM: Implement in C the following ls Unix command using system calls
Algorithm:
1. Start.
2. open directory using opendir( ) system call.
3. read the directory using readdir( ) system call.
4. print d->d_name and d->d_ino.
5. repeat above step until end of directory.
6. End
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/param.h>
#include <dirent.h>
#define FALSE 0
#define TRUE 1
char pathname[MAXPATHLEN];
int file_select(const struct dirent *entry);
main()
{
    int count, i;
    struct dirent **files;
    if (getwd(pathname) == NULL)
    {
        printf("Error getting path\n");
        exit(0);
    }
    printf("Current Working Directory = %s\n", pathname);
    count = scandir(pathname, &files, file_select, alphasort);
    if (count <= 0)
    {
        printf("No files in this directory\n");
        exit(0);
    }
    printf("Number of files = %d\n", count);
    for (i = 0; i < count; ++i)
        printf("%s\n", files[i]->d_name);
}
int file_select(const struct dirent *entry)
{
    if ((strcmp(entry->d_name, ".") == 0) || (strcmp(entry->d_name, "..") == 0))
        return (FALSE);
    else
        return (TRUE);
}
AIM: Implement in C the Unix command mv using system calls
Algorithm:
1. Start.
2. Open the existing file, and create a new file, using the open() system call.
3. Read the contents from the existing file using the read() system call.
4. Write these contents into the new file using the write() system call.
5. Repeat the above two steps until end of file.
6. Close both files using the close() system call.
7. Delete the existing file using the unlink() system call.
8. End.
Program:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
int main(int argc, char **argv)
{
    int fd1, fd2;
    int n;
    char buf[1024];
    fd1 = open(argv[1], O_RDONLY);
    fd2 = creat(argv[2], S_IRUSR | S_IWUSR);
    while ((n = read(fd1, buf, sizeof(buf))) > 0)
        write(fd2, buf, n);
    close(fd1);
    close(fd2);
    unlink(argv[1]);
    printf("file is moved\n");
    return (0);
}
Week 5
13. Write a C program to emulate the UNIX ls -l command.
ALGORITHM :
Step 1: Include the header files necessary for manipulating directories.
Step 2: Declare and initialize the required objects.
Step 3: Read the directory name from the user.
Step 4: Open the directory using the opendir() system call and report an error if the directory is not
available.
Step 5: Read an entry from the directory.
Step 6: Display the directory entry, i.e., the name of the file or subdirectory.
Step 7: Repeat steps 5 and 6 until all the entries have been read.
/* 1. Simulation of ls command */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <dirent.h>
main()
{
    char dirname[10];
    DIR *p;
    struct dirent *d;
    printf("Enter directory name ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL)
    {
        perror("Cannot find dir.");
        exit(-1);
    }
    while ((d = readdir(p)) != NULL)
        printf("%s\n", d->d_name);
}
SAMPLE OUTPUT:
enter directory name iii
...
f2
14. Write a C program to list, for every file in a directory, its inode number and file
name.
15. Write a C program that demonstrates redirection of standard output to a file.
Ex: ls > f1.
Description:
An inode number points to an inode. An inode is a data structure that stores
the following information about a file:
Size of file
Device ID
User ID of the file
Group ID of the file
The file mode information and access privileges for owner, group and others
File protection flags
The timestamps for file creation, modification etc.
Link counter to determine the number of hard links
Pointers to the blocks storing the file's contents
Week 6
16. Write a C program to create a child process and allow the parent to display
parent and the child to display child on the screen.
#include <stdio.h>
#include <unistd.h>
main()
{
    int childpid;
    if ((childpid = fork()) > 0)
    {
        printf("parent process\n");
    }
    else
        printf("child process\n");
}
17. Write a C program to create a Zombie process.
If the child terminates before the parent process and the parent has not yet called
wait(), the terminated child is called a zombie process.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
main()
{
    int childpid;
    if ((childpid = fork()) == 0)
    {
        printf("child process\n");
        exit(0);
    }
    else
    {
        sleep(100); /* the child has exited but has not been waited for: it is a zombie */
        printf("parent process\n");
    }
}
18. Write a C program that illustrates how an orphan is created.
#include <stdio.h>
#include <unistd.h>
main()
{
    int id;
    printf("Before fork()\n");
    id = fork();
    if (id == 0)
    {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child : %d\n", getppid());
        printf("child prints 1 item :\n");
        sleep(25);
        printf("child prints 2 item :\n");
    }
    else
    {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc : %d\n", getppid());
    }
    printf("After fork()\n");
}
Week 7
19. Write a C program that illustrates how to execute two commands concurrently with
a command pipe.
Ex: ls -l | sort
AIM: Implementing Pipes
DESCRIPTION:
A pipe is created by calling a pipe() function.
int pipe(int filedesc[2]);
It returns a pair of file descriptors filedesc[0] is open for reading and filedesc[1] is
open for writing. This function returns a 0 if ok & -1 on error.
ALGORITHM:
The following is the simple algorithm for creating, writing to and reading from a
pipe.
1) Create a pipe through a pipe() function call.
2) Use write() function to write the data into the pipe. The syntax is as follows
write(int [],ip_string,size);
int [] filedescriptor variable, in this case if int filedesc[2] is the variable, then
use the filedesc[1] as the first parameter.
ip_string The string to be written in the pipe.
Size buffer size for storing the input
3) Use read() function to read the data that has been written to the pipe.
The syntax is as follows
read(int [], char,size);
PROGRAM:
#include <stdio.h>
#include <string.h>
main()
{
    int pipe1[2], pipe2[2], childpid;
    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) == -1)
    {
        printf("cannot fork");
    }
    else if (childpid > 0)
    {
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *)0) != childpid);
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    }
    else
    {
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}
client(int readfd, int writefd)
{
    int n;
    char buff[1024];
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n - 1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data read error");
}
server(int readfd, int writefd)
{
    int n, fd;
    char buff[1024];
    n = read(readfd, buff, 1024); /* read the file name from the client */
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0)
        printf("file open error");
    else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}
20. Write C programs that illustrate communication between two unrelated processes
using named pipe.
AIM: Implementing IPC using a FIFO (or) named pipe.
DESCRIPTION:
Another kind of IPC is the FIFO (First In First Out), sometimes also called a
named pipe. It is like a pipe, except that it has a name. Here the name is that of a file
that multiple processes can open(), read and write to. A FIFO is created using the
mknod() system call. The syntax is as follows:
int mknod(char *pathname, int mode, int dev);
The pathname is a normal Unix pathname, and this is the name of the FIFO.
The mode argument specifies the file access mode. The dev value is ignored for
a FIFO.
Once a FIFO is created, it must be opened for reading (or) writing using either the
open system call, or one of the standard I/O open functions - fopen, or freopen.
ALGORITHM:
The following is the simple algorithm for creating, writing to and reading from
a
FIFO.
1) Create a fifo through mknod() function call.
2) Use write() function to write the data into the fifo. The syntax is as follows
write(int [],ip_string,size);
int [] filedescriptor variable, in this case if int filedesc[2] is the variable, then
use the filedesc[1] as the first parameter.
ip_string The string to be written in the fifo.
Size buffer size for storing the input
3) Use read() function to read the data that has been written to the fifo.
The syntax is as follows
read(int [], char,size);
PROGRAM:
#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
main()
{
    int childpid, wfd, rfd;
    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
    {
        printf("cannot fork");
    }
    else if (childpid > 0)
    {
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *)0) != childpid);
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    }
    else
    {
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}
client(int readfd, int writefd)
{
    int n;
    char buff[1024];
    printf("enter a file name ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n - 1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data read error");
}
server(int readfd, int writefd)
{
    int n, fd;
    char buff[1024];
    n = read(readfd, buff, 1024); /* file name sent by the client */
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0)
        printf("file open error");
    else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}
21. Write a C program to create a message queue with read and write permissions to
write 3 messages to it with different priority numbers.
22. Write a C program that receives the messages (from the above message queue as
specified in (21)) and displays them.
Aim: To create a message queue
DESCRIPTION:
Message passing between processes is part of the operating system and is done through a
message queue, where messages are stored in the kernel and associated with a message queue
identifier (msqid). Processes read and write messages to an arbitrary queue in such a way
that a process writes a message to a queue and exits, and another process reads it at a later time.
ALGORITHM:
Before defining a structure, the ipc_perm structure should be defined, which is done by
including the following files:
#include <sys/types.h>
#include <sys/ipc.h>
A structure of information is maintained by the kernel; it should contain the following:
struct msqid_ds {
    struct ipc_perm msg_perm; /* operation permission */
    struct msg *msg_first;    /* ptr to first msg on queue */
    struct msg *msg_last;     /* ptr to last msg on queue */
    ushort msg_cbytes;        /* current bytes on queue */
    ushort msg_qnum;          /* current no of msgs on queue */
    ushort msg_qbytes;        /* max no of bytes on queue */
    ushort msg_lspid;         /* pid of last msgsnd */
    ushort msg_lrpid;         /* pid of last msgrcv */
    time_t msg_stime;         /* time of last msg snd */
    time_t msg_rtime;         /* time of last msg rcv */
    time_t msg_ctime;         /* time of last msg ctl */
};
To create a new message queue, or to access an existing one, the msgget() function is used.
Syntax:
int msgget(key_t key, int msgflag);
Msg flag values:
Num val   Symb value   Description
0400      MSG_R        Read by owner
0200      MSG_W        Write by owner
0040      MSG_R>>3     Read by group
0020      MSG_W>>3     Write by group
msgget returns the msqid, or -1 on error.
1. To put a message on the queue the msgsnd() function is used.
Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);
msqid is the message queue id, a unique id.
msgbuf is the actual content to send, a pointer to a structure which contains the following:
struct msgbuf
{
    long mtype;    /* message type > 0 */
    char mtext[1]; /* data */
};
length is the size of the message in bytes.
flag can be IPC_NOWAIT, which allows the system call to return immediately when there
is no room on the queue; when this is specified, msgsnd returns -1 if there is no room on
the queue. Otherwise the flag can be specified as 0.
2. To receive a message the msgrcv() function is used.
Syntax:
int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);
*ptr is a pointer to the structure where the received message is to be stored.
length is the size to be received and stored in the pointer area.
flag can be MSG_NOERROR: without it, an error is returned if length is not large enough
to receive the message; with it, a data portion greater than length is truncated and returned.
3. A variety of control operations on the message queue can be done through the msgctl() function:
int msgctl(int msqid, int cmd, struct msqid_ds *buff);
IPC_RMID in cmd is given to remove a message queue from the system.
Let us create a header file msgq.h with the following in it:
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <errno.h>
extern int errno;
#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666
Server operation algorithm:
#include "msgq.h"
main()
{
    int readid, writeid;
    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        printf("server: cannot get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        printf("server: cannot get message queue 2");
}
Week 8
23. Write a C program to allow cooperating processes to lock a resource for exclusive use,
using a) semaphores, b) the flock or lockf system calls.
24. Write a C program that illustrates suspending and resuming processes using
signals.
23. a) AIM: C program that illustrate file locking using semaphores
PROGRAM:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};
int main(void)
{
    key_t key;
    int semid;
    union semun arg;
    if ((key = ftok("semdemo.c", 'j')) == -1)
    {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1)
    {
        perror("semget");
        exit(1);
    }
    arg.val = 1; /* initialise the semaphore to 1: resource free */
    if (semctl(semid, 0, SETVAL, arg) == -1)
    {
        perror("semctl");
        exit(1);
    }
    return 0;
}
OUTPUT:
(no output on success; on failure the corresponding perror message,
e.g. "semget: ..." or "semctl: ...", is printed)
Week 9
25. Write a C program that implements a producer-consumer system with two
processes. (using Semaphores).
26. Write client and server programs (in C) for interaction between server and client
processes using Unix domain sockets.
Algorithm:
1. Start
2. create semaphore using semget( ) system call
3. if successful it returns positive value
4. create two new processes
5. first process will produce
6. until first process produces second process cannot consume
7. End.
Source code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#define NUM_LOOPS 2
int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1)
    {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id %d\n", sem_set_id);
    child_pid = fork();
    switch (child_pid)
    {
    case -1:
        perror("fork");
        exit(1);
    case 0: /* child: consumer */
        for (i = 0; i < NUM_LOOPS; i++)
        {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1; /* wait until the producer raises the semaphore */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("consumer: %d\n", i);
        }
        break;
    default: /* parent: producer */
        for (i = 0; i < NUM_LOOPS; i++)
        {
            printf("producer: %d\n", i);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1; /* signal the consumer */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            sleep(1);
        }
    }
    return 0;
}
Week 10
27. Write client and server programs (in C) for interaction between server and client
processes using Internet domain sockets.
28. Write a C program that illustrates two processes communicating using shared
memory.
DESCRIPTION:
Shared memory is an efficient means of passing data between programs. One
program will create a memory portion which other processes (if permitted) can access.
The problem with the pipes, FIFOs and message queues is that for two processes to
exchange information, the information has to go through the kernel. Shared memory provides
a way around this by letting two or more processes share a memory segment.
In shared memory concept if one process is reading into some shared memory, for
example, other processes must wait for the read to finish before processing the data.
A process creates a shared memory segment using shmget(). The original owner of a
shared memory segment can assign ownership to another user with shmctl(). It can also
revoke this assignment. Other processes with proper permission can perform various control
functions on the shared memory segment using shmctl(). Once created, a shared segment can
be attached to a process address space using shmat(). It can be detached using shmdt() (see
shmop()). The attaching process must have the appropriate permissions for shmat(). Once
attached, the process can read or write to the segment, as allowed by the permission requested
in the attach operation. A shared segment can be attached multiple times by the same process.
A shared memory segment is described by a control structure with a unique ID that points to
an area of physical memory. The identifier of the segment is called the shmid. The structure
definition for the shared memory segment control structures and prototypes can be found in
<sys/shm.h>.
shmget() is used to obtain access to a shared memory segment. It is prototyped by:
int shmget(key_t key, size_t size, int shmflg);
The key argument is an access value associated with the shared memory segment ID. The size
argument is the size in bytes of the requested shared memory. The shmflg argument specifies
the initial access permissions and creation control flags.
When the call succeeds, it returns the shared memory segment ID. This call is also used to get
the ID of an existing shared segment (from a process requesting sharing of some existing
memory portion).
The following code illustrates shmget():
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
...
key_t key;    /* key to be passed to shmget() */
int shmflg;   /* shmflg to be passed to shmget() */
int shmid;    /* return value from shmget() */
int size;     /* size to be passed to shmget() */
...
key = ...
size = ...
shmflg = ...
if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}
...
Controlling a Shared Memory Segment
shmctl() is used to alter the permissions and other characteristics of a shared memory
segment. It is prototyped as follows:
int shmctl(int shmid, int cmd, struct shmid_ds *buf);
The process must have an effective user ID of owner, creator or superuser to perform this
command. The cmd argument is one of the following control commands:
SHM_LOCK
-- Lock the specified shared memory segment in memory. The
process must have the effective ID of superuser to perform this
command.
SHM_UNLOCK
-- Unlock the shared memory segment. The process must have the
effective ID of superuser to perform this command.
IPC_STAT
-- Return the status information contained in the control structure
and place it in the buffer pointed to by buf. The process must have
read permission on the segment to perform this command.
IPC_SET
-- Set the effective user and group identification and access
permissions. The process must have an effective ID of owner,
creator or superuser to perform this command.
IPC_RMID
-- Remove the shared memory segment.
The buf argument is a structure of type struct shmid_ds, which is defined in
<sys/shm.h>.
The following code illustrates shmctl():
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
...
int cmd;      /* command code for shmctl() */
int shmid;    /* segment ID */
int rtrn;     /* return value from shmctl() */
struct shmid_ds shmid_ds;  /* shared memory data structure to hold results */
...
shmid = ...
cmd = ...
if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}
...
Attaching and Detaching a Shared Memory Segment
shmat() and shmdt() are used to attach and detach shared memory segments. They are
prototyped as follows:
void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);
shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid
shmid. shmdt() detaches the shared memory segment located at the address indicated by
shmaddr. The following code illustrates calls to shmat() and shmdt():
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#define MAXnap 4       /* maximum number of concurrent attaches */
static struct state {  /* Internal record of attached segments. */
    int shmid;         /* shmid of attached segment */
    char *shmaddr;     /* attach point */
    int shmflg;        /* flags used on attach */
} ap[MAXnap];          /* State of currently attached segments. */
int nap;               /* Number of currently attached segments. */
...
char *addr;            /* address work variable */
register int i;        /* work area */
register struct state *p;  /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...
p->shmaddr = ...
p->shmflg = ...
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n",
        p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr) *p = ap[--nap];
}
...
Algorithm:
1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a non-negative segment ID.
4. Attach the created shared memory using the shmat() system call.
5. Write to the shared memory through the attached address.
6. Read the contents back from the shared memory through the attached address.
7. Detach the segment with shmdt() and End.
Source Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    /* derive a key; the path given to ftok() is arbitrary here */
    if ((key = ftok(".", 'R')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    /* attach to the segment to get a pointer to it */
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    /* run once with an argument to write, then without to read it back */
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    } else
        printf("segment contains: \"%s\"\n", data);
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}
Input: # ./a.out swarupa
Output:
writing to segment: "swarupa"
Data Mining Lab
Credit Risk Assessment
Description: The business of banks is making loans. Assessing the creditworthiness
of an applicant is of crucial importance. You have to develop a system to help a loan
officer decide whether the credit of a customer is good or bad. A bank's business
rules regarding loans must consider two opposing factors. On the one hand, a bank
wants to make as many loans as possible. Interest on these loans is the bank's profit
source. On the other hand, a bank cannot afford to make too many bad loans. Too
many bad loans could lead to the collapse of the bank. The bank's loan policy must
involve a compromise: not too strict, and not too lenient.
To do the assignment, you first and foremost need some knowledge about the world
of credit. You can acquire such knowledge in a number of ways.
1. Knowledge Engineering. Find a loan officer who is willing to talk. Interview her and
try to represent her knowledge in the form of production rules.
2. Books. Find some training manuals for loan officers or perhaps a suitable textbook on
finance. Translate this knowledge from text form to production rule form.
3. Common sense. Imagine yourself as a loan officer and make up reasonable rules
which can be used to judge the creditworthiness of a loan applicant.
4. Case histories. Find records of actual cases where competent loan officers correctly
judged when, and when not, to approve a loan application.
The German Credit Data:
Actual historical credit data is not always easy to come by because of confidentiality
rules. Here is one such dataset: the (original) Excel spreadsheet version of the German credit
data (download from the web).
In spite of the fact that the data is German, you should probably make use of it for this
assignment (unless you really can consult a real loan officer!).
A few notes on the German dataset:
DM stands for Deutsche Mark, the unit of currency, worth about 90 cents
Canadian (but looks and acts like a quarter).
Owns_telephone. German phone rates are much higher than in Canada, so fewer
people own telephones.
Foreign_worker. There are millions of these in Germany (many from Turkey). It
is very hard to get German citizenship if you were not born of German parents.
There are 20 attributes used in judging a loan applicant. The goal is to classify
the applicant into one of two categories, good or bad.
Subtasks : (Turn in your answers to the following tasks)
Laboratory Manual For Data Mining
EXPERIMENT-1
Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka
mining tool.
Tools/ Apparatus: Weka mining tool.
Procedure:
1) Open the Weka GUI Chooser.
2) Select EXPLORER present in Applications.
3) Select Preprocess Tab.
4) Go to OPEN file and browse the file that is already stored in the system bank.csv.
5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute.
Sample Output:
EXPERIMENT-2
Aim: To identify the rules with some of the important attributes (a) manually and (b) using Weka.
Tools/ Apparatus: Weka mining tool.
Theory:
Association rule mining is defined as follows: Let I = {i1, i2, ..., in} be a set of n binary
attributes called items. Let D be a set of transactions called the database. Each transaction
in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as
an implication of the form X => Y where X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short,
itemsets) X and Y are called the antecedent (left-hand side or LHS) and the consequent
(right-hand side or RHS) of the rule respectively.
To illustrate the concepts, we use a small example from the supermarket domain.
The set of items is I = {milk, bread, butter, beer} and a small database containing the items
(1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right.
An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk
and bread are bought, customers also buy butter.
Note: this example is extremely small. In practical applications, a rule needs a support of
several hundred transactions before it can be considered statistically significant, and datasets
often contain thousands or millions of transactions.
To select interesting rules from the set of all possible rules, constraints on various measures of
significance and interest can be used. The best-known constraints are minimum thresholds on
support and confidence. The support supp(X) of an itemset X is defined as the proportion of
transactions in the data set which contain the itemset. In the example database, the itemset
{milk, bread} has a support of 2 / 5 = 0.4 since it occurs in 40% of all transactions (2 out of 5
transactions).
The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example,
the rule {milk, bread} => {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which
means that for 50% of the transactions containing milk and bread the rule is correct.
Confidence can be interpreted as an estimate of the probability P(Y|X), the probability of
finding the RHS of the rule in transactions under the condition that these transactions also
contain the LHS.
ALGORITHM:
Association rule mining is to find out association rules that satisfy the predefined minimum
support and confidence from a given database. The problem is usually decomposed into two
subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in
the database; those itemsets are called frequent or large itemsets. The second problem is to
generate association rules from those large itemsets with the constraint of minimal confidence.
Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset
are generated in the following way: the first rule is {I1, I2, ..., Ik-1} => {Ik}; by checking the
confidence this rule can be determined as interesting or not. Then other rules are generated by
deleting the last item in the antecedent and inserting it into the consequent; the confidences of
the new rules are then checked to determine their interestingness. This process iterates until
the antecedent becomes empty. Since the second subproblem is quite straightforward, most of
the research focuses on the first subproblem. The Apriori algorithm finds the frequent itemsets
Lk in database D.
Find frequent itemset Lk-1.
Join Step.
o Ck is generated by joining Lk-1 with itself.
Prune Step.
o Any (k-1)-itemset that is not frequent cannot be a subset of a
frequent k-itemset, hence should be removed.
Where (Ck: candidate itemset of size k)
(Lk: frequent itemset of size k)
Apriori Pseudocode
Apriori(T, ε)
    L1 <- {large 1-itemsets}
    k <- 2
    while Lk-1 is not empty
        Ck <- candidates generated from Lk-1
        for each transaction t in database T do
            increment the count of all candidates in Ck that are contained in t
        Lk <- candidates in Ck with support >= ε
        k <- k + 1
    return the union of all Lk
EXPERIMENT-3
Aim: To create a decision tree for the given data set using the J48 (C4.5) algorithm in the Weka
mining tool.
Tools/ Apparatus: Weka mining tool.
Procedure:
1) Open the Weka GUI Chooser.
2) Select EXPLORER present in Applications.
3) Select Preprocess Tab.
4) Go to OPEN file and browse the file that is already stored in the system bank.csv.
5) Go to Classify tab.
6) Here the C4.5 algorithm has been chosen (implemented as J48 in Weka); it can be selected by
clicking the Choose button.
7) Select trees > J48.
8) Select Test options: Use training set.
9) If needed, select attributes.
10) Click Start.
11) Now we can see the output details in the Classifier output.
12) Right-click on the result list and select the visualize tree option.
Sample output:
The decision tree constructed by using the implemented C4.5 algorithm.
http://en.wikipedia.org/wiki/C4.5_algorithm
EXPERIMENT-4
Aim: To find the percentage of examples that are classified correctly by using the above created
decision tree model, i.e., testing on the training set.
Tools/ Apparatus: Weka mining tool.
Theory:
A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is
unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to
be an apple if it is red, round, and about 4" in diameter. Even though these features depend on the
existence of the other features, a naive Bayes classifier considers all of these properties to
independently contribute to the probability that this fruit is an apple.
An advantage of the naive Bayes classifier is that it requires a small amount of training data to
estimate the parameters (means and variances of the variables) necessary for classification. Because
independent variables are assumed, only the variances of the variables for each class need to be
determined and not the entire covariance matrix.
The naive Bayes probabilistic model:
The probability model for a classifier is a conditional model
P(C | F1, ..., Fn)
over a dependent class variable C with a small number of outcomes or classes, conditional on
several feature variables F1 through Fn. The problem is that if the number of features n is large
or when a feature can take on a large number of values, then basing such a model on probability
tables is infeasible. We therefore reformulate the model to make it more tractable.
Using Bayes' theorem, we write
P(C | F1, ..., Fn) = [ p(C) p(F1, ..., Fn | C) ] / p(F1, ..., Fn)
In plain English the above equation can be written as
posterior = (prior × likelihood) / evidence
In practice we are only interested in the numerator of that fraction, since the denominator does not
depend on C and the values of the features Fi are given, so that the denominator is effectively
constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be
rewritten as follows, using repeated applications of the definition of conditional probability:
p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn | C)
= p(C) p(F1 | C) p(F2, ..., Fn | C, F1)
= p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)
= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ... p(Fn | C, F1, ..., Fn-1)
Now the "naive" conditional independence assumptions come into play: assume that each feature Fi
is conditionally independent of every other feature Fj for j ≠ i.
This means that p(Fi | C, Fj) = p(Fi | C),
and so the joint model can be expressed as
p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ... p(Fn | C) = p(C) ∏i p(Fi | C)
This means that under the above independence assumptions, the conditional distribution over the
class variable C can be expressed like this:
p(C | F1, ..., Fn) = (1/Z) p(C) ∏i p(Fi | C)
where Z is a scaling factor dependent only on F1, ..., Fn, i.e., a constant if the values of the feature
variables are known.
Models of this form are much more manageable, since they factor into a so-called class prior p(C)
and independent probability distributions p(Fi | C). If there are k classes and if a model for each
p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model
has (k - 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli
variables as features) are common, and so the total number of parameters of the naive Bayes model
is 2n + 1, where n is the number of binary features used for prediction.
Bayes' theorem: P(h | D) = [ P(D | h) P(h) ] / P(D)
P(h): prior probability of hypothesis h
P(D): prior probability of training data D
P(h | D): probability of h given D
P(D | h): probability of D given h
EXPERIMENT-5
Aim: To check whether testing on a separate test set is a good idea.
Tools/ Apparatus: Weka mining tool.
Procedure:
1) In Test options, select the Supplied test set radio button.
2) Click Set.
3) Choose the file which contains records that were not in the training set we used to create the
model.
4) Click Start. (WEKA will run this test data set through the model we already created.)
5) Compare the output results with that of the 4th experiment.
Sample output:
This can be experienced through the different problem solutions while doing practice.
The important numbers to focus on here are the numbers next to the "Correctly Classified Instances"
(92.3 percent) and the "Incorrectly Classified Instances" (7.6 percent). Other important numbers are
in the "ROC Area" column, in the first row (the 0.936). Finally, the "Confusion Matrix" shows the
number of false positives and false negatives. The false positives are 29, and the false negatives are
17 in this matrix.
Based on our accuracy rate of 92.3 percent, we say that upon initial analysis, this is a good model.
One final step in validating our classification tree is to run our test set through the model and
ensure that the accuracy of the model does not drop.
Comparing the "Correctly Classified Instances" from this test set with the "Correctly Classified
Instances" from the training set, we see the accuracy of the model, which indicates that the model
will not break down with unknown data, or when future data is applied to it.
EXPERIMENT-6
Aim: To create a Decision tree by cross validation training data set using Weka mining tool.
Tools/ Apparatus: Weka mining tool.
Theory:
Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive
model which maps observations about an item to conclusions about the item's target value. In these
tree structures, leaves represent classifications and branches represent conjunctions of features that
lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly
represent decisions and decision making. In data mining, a decision tree describes data but not
decisions; rather, the resulting classification tree can be an input for decision making. This page deals
with decision trees in data mining.
Decision tree learning is a common method used in data mining. The goal is to create a model that
predicts the value of a target variable based on several input variables. Each interior node corresponds
to one of the input variables; there are edges to children for each of the possible values of that input
variable. Each leaf represents a value of the target variable given the values of the input variables
represented by the path from the root to the leaf.
A tree can be "learned" by splitting the source set into subsets based on an attribute value test. This
process is repeated on each derived subset in a recursive manner called recursive partitioning. The
recursion is completed when the subset at a node all has the same value of the target variable, or when
splitting no longer adds value to the predictions.
In data mining, trees can be described also as the combination of mathematical and computational
techniques to aid the description, categorisation and generalization of a given set of data.
Data comes in records of the form:
(x, y) = (x1, x2, x3..., xk, y)
The dependent variable, Y, is the target variable that we are trying to understand, classify or
generalise. The vector x is composed of the input variables, x1, x2, x3 etc., that are used for that task.
Procedure:
1) Given the Bank database for mining.
2) Use the Weka GUI Chooser.
3) Select EXPLORER present in Applications.
4) Select Preprocess Tab.
5) Go to OPEN file and browse the file that is already stored in the system bank.csv.
6) Go to Classify tab.
7) Choose Classifier > trees.
8) Select J48.
9) Select Test options: Cross-validation.
10) Set Folds, e.g., 10.
11) If needed, select attributes.
12) Now click Start.
13) Now we can see the output details in the Classifier output.
14) Compare the output results with that of the 4th experiment.
15) Check whether the accuracy increased or decreased.
Sample output:
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 539 89.8333 %
Incorrectly Classified Instances 61 10.1667 %
Kappa statistic 0.7942
Mean absolute error 0.167
Root mean squared error 0.305
Relative absolute error 33.6511 %
Root relative squared error 61.2344 %
Total Number of Instances 600
=== Detailed Accuracy By Class ===
EXPERIMENT-8
Aim: To select some attributes from the GUI Explorer, perform classification and see the effect using
Weka mining tool.
Tools/ Apparatus: Weka mining tool.
Procedure:
1) Given the Bank database for mining.
2) Use the Weka GUI Chooser.
3) Select EXPLORER present in Applications.
4) Select Preprocess Tab.
5) Go to OPEN file and browse the file that is already stored in the system bank.csv.
6) Select from the attributes list the attributes which are to be removed, and remove them. With this
step only the attributes necessary for classification are left in the attributes panel.
7) Then go to the Classify tab.
8) Choose Classifier > trees.
9) Select J48.
10) Select Test options: Use training set.
11) If needed, select attributes.
12) Now click Start.
13) Now we can see the output details in the Classifier output.
14) Right-click on the result list and select the visualize tree option.
15) Compare the output results with that of the 4th experiment.
16) Check whether the accuracy increased or decreased.
17) Check whether removing these attributes has any significant effect.
Sample output:
EXPERIMENT-9
Aim: To create a decision tree by cross-validating the training data set, after changing the cost matrix,
in the Weka mining tool.
Tools/ Apparatus: Weka mining tool.
Procedure:
1) Given the Bank database for mining.
2) Use the Weka GUI Chooser.
3) Select EXPLORER present in Applications.
4) Select Preprocess Tab.
5) Go to OPEN file and browse the file that is already stored in the system bank.csv.
6) Go to Classify tab.
7) Choose Classifier > trees.
8) Select J48.
9) Select Test options: Use training set.
10) Click on More options.
11) Select cost-sensitive evaluation and click on the Set button.
12) Set the matrix values and click on Resize. Then close the window.
13) Click OK.
14) Click Start.
15) We can see the output details in the Classifier output.
16) Select Test options: Cross-validation.
17) Set Folds, e.g., 10.
18) If needed, select attributes.
19) Now click Start.
20) Now we can see the output details in the Classifier output.
21) Compare the results of the 15th and 20th steps.
22) Compare the results with that of experiment 6.
Sample output:
Tools/ Apparatus: Weka mining tool.
Procedure:
This will be based on the attribute set, and the requirement of relationship among attribute we want to
study. This can be viewed based on the database and user requirement.
EXPERIMENT-11
Aim: To create a decision tree using the pruned mode and reduced-error pruning, and to show the
accuracy for the cross-validation trained data set, using the Weka mining tool.
Tools/ Apparatus: Weka mining tool.
Theory :
Reduced-error pruning
Each node of the (over-fit) tree is examined for pruning
A node is pruned (removed) only if the resulting pruned tree
performs no worse than the original over the validation set
Pruning a node consists of
Removing the sub-tree rooted at the pruned node
Making the pruned node a leaf node
Assigning the pruned node the most common classification of the training instances attached to that
node
Pruning nodes iteratively
Always select a node whose removal most increases the DT accuracy over the validation set
Stop when further pruning decreases the DT accuracy over the validation set
IF (children = yes) AND (income > 30000)
THEN (car = yes)
Procedure:
1) Given the Bank database for mining.
2) Use the Weka GUI Chooser.
3) Select EXPLORER present in Applications.
4) Select Preprocess Tab.
5) Go to OPEN file and browse the file that is already stored in the system bank.csv.
6) Select some of the attributes from the attributes list.
7) Go to the Classify tab.
8) Choose Classifier > trees.
9) Select NBTree, i.e., the Naive Bayesian tree.
10) Select Test options Use training set
11) Right-click on the text box beside the Choose button and select Show properties.
12) Now change the unpruned mode from false to true.
13) Change the reduced-error pruning percentage as needed.
14) If needed, select attributes.
15) Now click Start.
16) Now we can see the output details in the Classifier output.
17) Right-click on the result list and select the visualize tree option.
Sample output:
OneR
PART