Gaussian Fuzzy Number for STR-DNA Similarity Calculation...

Research Article

Gaussian Fuzzy Number for STR-DNA Similarity Calculation Involving Familial and Tribal Relationships

Hindawi, Advances in Bioinformatics, Volume 2018, Article ID 8602513, 13 pages, https://doi.org/10.1155/2018/8602513

Maria Susan Anggreainy,1 M. Rahmat Widyanto,1 Belawati H. Widjaja,1 and Nurtami Soedarsono2

1 Faculty of Computer Science, Universitas Indonesia, Depok Campus, West Java 16424, Indonesia
2 Faculty of Dentistry, Universitas Indonesia, Salemba Campus, Jakarta 10430, Indonesia

Correspondence should be addressed to Maria Susan Anggreainy; [email protected]

Received 18 March 2018; Accepted 9 July 2018; Published 29 July 2018

Academic Editor: Florentino Fdez-Riverola

Copyright © 2018 Maria Susan Anggreainy et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We performed locus similarity calculation by measuring the fuzzy intersection between an individual locus and a reference locus and then performed CODIS STR-DNA similarity calculation. The fuzzy intersection calculation enables a CODIS STR-DNA similarity calculation that is more robust to the imprecision caused by noise from the PCR machine. We also proposed the shifted convoluted Gaussian fuzzy number (SCGFN) and the Gaussian fuzzy number (GFN) to represent each locus value, as an improvement on the triangular fuzzy number (TFN) used in previous research. Compared to TFN, GFN represents the uncertainty of locus information more realistically because the distribution is assumed to be Gaussian. The original GFN is then convoluted with the distribution of the locus information of a certain ethnic group to produce the new SCGFN, which represents ethnic information better than the original GFN. Experiments were done for the following cases: people with family relationships, people of the same tribe, and certain tribal populations. The statistical test with analysis of variance (ANOVA) shows a difference in similarity between SCGFN, GFN, and TFN at a 95% confidence level. The Tukey method in ANOVA shows that SCGFN yields a higher similarity, which means that it performs better than the GFN and TFN methods. The proposed method enables a CODIS STR-DNA similarity calculation that is more robust to noise and performs better for familial and tribal relationships.

1. Introduction

Genetics is the study of genes, genetic variation, and heredity in living organisms. Population genetics is a part of evolutionary biology and a subfield of genetics that deals with genetic differences within and between populations [1]. Variations in traits among human populations represent genetic differences that can be inherited from generation to generation. Population genetics studies genetic variation in a population; it involves the examination and modeling of changes in the frequency of genes and alleles in populations over time and space [2]. Population genetics gives us the opportunity to step back and observe patterns of genetic change over time. Comparing populations with one another can reveal how external factors trigger the evolution of a trait and can help map variants associated with various traits within the population. Population genetics is another way of looking at DNA that can generate insight with the potential to benefit everyone. Many of the genes found in a population are polymorphic; that is, they occur in a number of different alleles. Mathematical models are used to investigate and predict the occurrence of specific alleles or combinations of alleles in the population; the focus is on comparing groups, populations, or species, not individuals.

A population is a group of individuals with the same characteristics (species) that live in the same place and can reproduce with each other; evolution also works through populations [3]. Geneticists, on the other hand, view the population as a means or container for the exchange of alleles owned by its individual members. The dynamics of allele frequencies in a population are of major concern in the study of population genetics.



DNA regions with short repeat units (usually 2-6 base pairs in length) are called Short Tandem Repeats (STRs). STRs are found surrounding the chromosomal centromere. STRs have proven to have several benefits that make them especially suitable for human identification [4]. STRs have become popular DNA markers because they are easily amplified by polymerase chain reaction (PCR) without the problem of differential amplification; that is, the PCR products for STRs are generally similar in amount, making analysis easier. An individual inherits one copy of an STR from each parent, and the two copies may or may not have similar repeat sizes. The number of repeats in STR markers can be highly variable among individuals, which makes these STRs effective for human identification purposes [5].

Beginning in 1996, the FBI Laboratory launched a nationwide forensic science effort to establish core STR loci for inclusion within the national database known as CODIS (Combined DNA Index System). The 13 CODIS loci are CSF1PO, FGA, TH01, TPOX, VWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, and D21S11. These loci are nationally and internationally recognized as the standard for human identification. The DNA STR markers used in this research were 15 loci: the 13 CODIS loci plus two additional loci, D19S433 and D2S1338, which provide an extensive and powerful STR testing battery if required [6, 7].

A person's DNA profile can be matched against another person's DNA profile by similarity. DNA profiles play an important role in solving problems related to paternity and other family relationships [8, 9]. The method is legally accepted for proving kinship or family ties, identifying unknown bodies of war or natural disaster victims, and studying human populations [10, 11].

In previous research, M. R. Widyanto et al. [12-14] measured the similarity between two alleles with a triangular fuzzy number. Although this approach is quite sufficient, the statistical information on the ethnicity of the two profiles is lost. To overcome this problem, this research employs novel methods for measuring similarity between tribes that give better results than the previous method. We propose the shifted convoluted Gaussian fuzzy number (SCGFN) and the Gaussian fuzzy number (GFN) to represent each locus value, as an improvement on the triangular fuzzy number (TFN) used in previous research.

The proposed research method is described in Section 2. Experimental results for the three methods are shown in Section 3. Statistical analyses and comparison tests are summarized in Section 4.

2. Proposed Research Method

2.1. Fuzzy Sets. Fuzzy sets serve as a basis for the theory of possibility. A fuzzy set $A$ in $X$ is formally defined as follows [15]:

\[ A = \{ (x, \mu_A(x)) \mid x \in X \}, \tag{1} \]

where $X$ is the universe of discourse and $\mu_A(x)$ is the membership degree of $x$ in $A$. When fuzzy set theory was presented, researchers considered decision-making one of the most attractive application fields of that theory [16].

2.2. Measurement of Similarity Values of Two-Individual STR-DNA. To find the STR-DNA similarity of two individuals (as shown in Figure 1), for each locus the STR-DNA value of allele 1 of the individual is compared with the STR-DNA value of allele 1 of the reference, and the STR-DNA value of allele 2 of the individual is compared with the STR-DNA value of allele 2 of the reference. We then find the intersection point value of the two alleles for each locus and calculate the average of the similarity values over all loci.

2.3. Two-Individual Matching Evidence versus Reference with TFN Similarity. A triangular fuzzy number (TFN) $\alpha$ can be defined by a triplet $(a_1, a_2, a_3)$. The triangular fuzzy number is used to represent the uncertainty resulting from the imprecision of the polymerase chain reaction (PCR) machine. The membership function $\mu_a(x)$ is [17]

\[ \mu_a(x) = \begin{cases} 0, & x < a_1 \text{ or } a_3 < x, \\ \dfrac{x - a_1}{a_2 - a_1}, & a_1 \le x \le a_2, \\ \dfrac{x - a_3}{a_2 - a_3}, & a_2 \le x \le a_3, \end{cases} \tag{2} \]

where $0 \le a_1 \le a_2 \le a_3 \le 1$; $a_1$ and $a_3$ stand for the lower and upper values of the support of $\alpha$, respectively, and $a_2$ stands for the modal value. The value of every DNA locus is represented by a triangular fuzzy number whose fuzziness value is set to 0.4 through experiments and whose center is the value of the corresponding locus. The similarity value between an allele of the DNA profile evidence and the DNA profile reference is given by

\[ t = \frac{a_3 - a_1}{2 (a_3 - a_2)}, \tag{3} \]

where the value of the first allele < the value of the second allele, $t$ is the intersection of the two alleles, $a_2$ is the STR-DNA value of the first allele, $a_3$ is $a_2 + 0.2$, and $a_1$ is the STR-DNA value of the second allele minus 0.2.

If the calculation yields a negative $t$, then $t$ is set to zero, because a negative value means that the two STR values do not intersect; $t$ therefore stays in the interval $[0, 1]$.

The following example shows the geometric calculation of the intersection points of the individual and the reference (as shown in Figure 2) for the D8S1179 locus, where allele 1 has STR-DNA = 13 in the individual and STR-DNA = 13.3 in the reference, and allele 2 has STR-DNA = 14 in the individual and STR-DNA = 14.1 in the reference.
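The example can be checked numerically. The following is a minimal Python sketch of formula (3), assuming the half-width of 0.2 stated above; the function name `tfn_similarity` is illustrative and not taken from the original work, and the two values are sorted internally so that the first is the smaller one, as the formula requires.

```python
def tfn_similarity(allele_a, allele_b, half_width=0.2):
    """Height of the intersection of two triangular fuzzy numbers, formula (3).

    Assumption: both triangles share the same half-width (0.2, that is, a
    fuzziness of 0.4) and are centered on the allele values.
    """
    first, second = sorted((allele_a, allele_b))
    a2 = first                 # modal value of the lower triangle
    a3 = a2 + half_width       # upper bound of the lower triangle
    a1 = second - half_width   # lower bound of the upper triangle
    t = (a3 - a1) / (2.0 * (a3 - a2))
    # A negative t means no intersection, so the result is clipped to [0, 1].
    return min(1.0, max(0.0, t))


allele_1 = tfn_similarity(13.0, 13.3)   # individual vs reference, allele 1
allele_2 = tfn_similarity(14.0, 14.1)   # individual vs reference, allele 2
locus_similarity = (allele_1 + allele_2) / 2.0
print(round(allele_1, 2), round(allele_2, 2), round(locus_similarity, 2))
# prints: 0.25 0.75 0.5
```

Under these assumptions the D8S1179 locus in the example has a similarity of about 0.5, the mean of the two allele intersections.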

The similarity between two DNA profiles is thus calculated as the average of the similarity values over all loci, that is, the arithmetic mean, expressed as

\[ t_i = \frac{\sum_{j=1}^{N} \mu(x_i, y_j)}{N}, \tag{4} \]

Figure 1: Calculation flow for the similarity of two individuals. Flowchart steps: start; read the allele values of a locus (the first and second allele values of the individual and of the reference); calculate the intersection point of the similarity value for all loci; once all loci have been counted, take the average of the locus similarity values; end.

Figure 2: The fuzzy triangular number (membership degree from 0 to 1 plotted against STR-DNA values from 12.8 to 14.2).

where $t_i$ is the similarity value between the DNA profile of the $i$th individual and the DNA profile reference, $x_i$ is a vector of the individual DNA profile, and $y_j$ is a vector of the reference DNA profile. The vectors $x_i$ and $y_j$ are $N$-dimensional ($N \in \mathbb{N}$) and consist of the values of the 15 loci without amelogenin, as used by the Federal Bureau of Investigation (FBI).
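Formula (4) is a plain arithmetic mean over the per-locus similarity values. A minimal Python sketch follows, assuming a profile is stored as a dictionary from locus name to a pair of allele values. That layout, the names `allele_similarity` and `profile_similarity`, and the sample profiles are illustrative assumptions, not the data format of the original system. The per-allele similarity uses the closed form 1 - d/0.4, which is what formula (3) reduces to for TFNs of half-width 0.2 whose centers are a distance d apart.

```python
def allele_similarity(x, y, half_width=0.2):
    """TFN intersection height for two allele values, clipped to [0, 1]."""
    return max(0.0, 1.0 - abs(x - y) / (2.0 * half_width))


def profile_similarity(individual, reference):
    """Mean of the per-locus similarities over the loci of the individual."""
    locus_values = []
    for locus, (ind_1, ind_2) in individual.items():
        ref_1, ref_2 = reference[locus]
        # Allele 1 is compared with allele 1, allele 2 with allele 2.
        per_locus = (allele_similarity(ind_1, ref_1)
                     + allele_similarity(ind_2, ref_2)) / 2.0
        locus_values.append(per_locus)
    return sum(locus_values) / len(locus_values)


# Hypothetical two-locus profiles; a real profile would have 15 loci.
individual = {"D8S1179": (13.0, 14.0), "TH01": (6.0, 9.0)}
reference = {"D8S1179": (13.3, 14.1), "TH01": (6.0, 9.0)}
print(profile_similarity(individual, reference))
```

With these made-up profiles the first locus contributes about 0.5 and the second contributes 1, so the profile similarity is about 0.75.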

2.4. Two-Individual Matching Evidence versus Reference with GFN Similarity. To improve the capability of locus matching, we propose GFN (Gaussian fuzzy number) similarity. Compared to the traditional triangular fuzzy number (TFN), GFN represents the uncertainty of locus information more realistically because the distribution is assumed to be Gaussian. The Gaussian fuzzy function transforms the original values into a normal distribution. The midpoint of the normal distribution defines the ideal definition for the set and is assigned a membership of 1, with the remaining input values decreasing in membership as they move away from the midpoint in both the positive and negative directions. The input values decrease in membership from the midpoint until they reach a point where the values are too far from the ideal definition to be in the set and are therefore assigned zeros.

Figure 3: Fuzzy Gaussian similarity (membership degree from 0 to 1 plotted against STR-DNA values from 0 to 40).

The fuzzy Gaussian function is given below [18]:

\[ f(x) = a\, e^{-(x - \mu_f)^2 / (2 \sigma_f^2)}. \tag{5} \]

A Gaussian membership function is completely determined by two parameters, $\mu$ and $\sigma$: $\mu$ represents the center of the membership function (the peak of the curve), and $\sigma$ determines its width.
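As a small illustration of formula (5), the sketch below evaluates a Gaussian membership function in Python. The function name `gaussian_membership` and the sample values (center 13, sigma 1) are assumptions chosen for the illustration.

```python
import math


def gaussian_membership(x, mu, sigma, height=1.0):
    """Formula (5): membership degree of x in a Gaussian fuzzy number."""
    return height * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# Membership decreases symmetrically as x moves away from the center mu.
for x in (13.0, 13.5, 14.0, 16.0):
    print(x, round(gaussian_membership(x, mu=13.0, sigma=1.0), 4))
```

The membership is 1 at the center and decays on both sides, with a larger sigma giving a wider curve.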

The intersection of two Gaussian functions is as follows [19]:

\[ V = \begin{cases} 1, & \text{if } \mu_2 \le \mu_1, \\ \exp\left[ -\left( \dfrac{\mu_1 - \mu_2}{\sigma_1 + \sigma_2} \right)^2 \right], & \text{if } \mu_2 < \mu_1, \end{cases} \tag{6} \]

where $\mu_1$ is the STR-DNA value of the individual, $\mu_2$ is the STR-DNA value of the reference, $\sigma_1$ is the sigma value of the individual, and $\sigma_2$ is the sigma value of the reference.

The following example shows the Gaussian intersection points of the individual and the reference (as shown in Figure 3) for the D8S1179 locus, where allele 1 has STR-DNA = 13 in the individual and STR-DNA = 13.3 in the reference, and allele 2 has STR-DNA = 14 in the individual and STR-DNA = 14.1 in the reference.
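A hedged Python sketch of this example follows. It applies only the exponential branch of formula (6), which already evaluates to 1 when the two means coincide, and it assumes a sigma of 1 for both the individual and the reference; the text specifies that value only for individual fuzzy Gaussians in Section 2.7, so using it for the reference here is an assumption. The name `gfn_similarity` is illustrative.

```python
import math


def gfn_similarity(mu_1, mu_2, sigma_1=1.0, sigma_2=1.0):
    """Exponential branch of formula (6) for two Gaussian fuzzy numbers."""
    return math.exp(-(((mu_1 - mu_2) / (sigma_1 + sigma_2)) ** 2))


allele_1 = gfn_similarity(13.0, 13.3)   # individual vs reference, allele 1
allele_2 = gfn_similarity(14.0, 14.1)   # individual vs reference, allele 2
locus_similarity = (allele_1 + allele_2) / 2.0
print(round(allele_1, 4), round(allele_2, 4), round(locus_similarity, 4))
```

Because the Gaussian curves overlap far more than the narrow triangles do, the same pair of alleles scores higher here than with the TFN intersection.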

2.5. Two-Individual Matching Evidence versus Reference with SCGFN Similarity. To improve the capability of locus matching when ethnic information is involved, we propose SCGFN (shifted convoluted Gaussian fuzzy number) similarity. The original Gaussian fuzzy number (GFN) is convoluted with the distribution of the locus information of a certain ethnic group. The new SCGFN therefore represents ethnic information better than the original GFN.

The convolution function is a multiplication of the fuzzy Gaussian function of the individual locus and the fuzzy Gaussian approximation of the population locus. The fuzzy Gaussian approximation of the population locus takes as its mean the STR-DNA value with the highest population density and as its deviation the standard deviation of the particular population. The fuzzy Gaussian of the individual locus takes as its mean the STR-DNA value of the individual locus, with a standard deviation equal to the mean value minus 2.

Convolution is a mathematical operation on two functions ($f$ and $g$) that produces a third function, typically viewed as a modified version of one of the original functions; it gives the integral of the pointwise multiplication of the two functions as a function of the amount by which one of the original functions is translated [20]. The function $f$ is obtained from the individual, and the function $g$ is derived from the reference:

\[ f(x) = a\, e^{-(x - \mu_f)^2 / (2 \sigma_f^2)} \tag{7} \]

and

\[ g(x) = a\, e^{-(x - \mu_g)^2 / (2 \sigma_g^2)}. \tag{8} \]

The convolution operator is [21]

\[ P_{f \otimes g}(x) = F^{-1}\left[ F(f(x))\, F(g(x)) \right] = a\, e^{-(x - (\mu_f + \mu_g))^2 / (2 (\sigma_f^2 + \sigma_g^2))}, \tag{9} \]

where $a$ is the height of the fuzzy number ($a = 1$). The mean is

\[ \mu_{f \otimes g}(x) = \mu_f + \mu_g, \tag{10} \]

and the standard deviation is

\[ \sigma_{f \otimes g}(x) = \sqrt{\sigma_f^2 + \sigma_g^2}. \tag{11} \]
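Formulas (10) and (11) can be sanity-checked numerically. The Python sketch below approximates the convolution integral with a Riemann sum and compares its shape, rescaled to height 1 as in formula (9), with a Gaussian whose mean is the sum of the means and whose variance is the sum of the variances. The function names, the integration step, and the sample parameters are assumptions made for this check.

```python
import math


def gaussian(x, mu, sigma):
    """Unit-height Gaussian, as in formulas (7) and (8) with a = 1."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def convolve_at(x, mu_f, sigma_f, mu_g, sigma_g, step=0.01, span=8.0):
    """Riemann-sum approximation of (f * g)(x) = integral of f(t) g(x - t) dt."""
    lo = mu_f - span * sigma_f            # f is negligible outside +/- span sigmas
    n = int(2.0 * span * sigma_f / step)
    total = 0.0
    for k in range(n + 1):
        t = lo + k * step
        total += gaussian(t, mu_f, sigma_f) * gaussian(x - t, mu_g, sigma_g)
    return total * step


def convolved_parameters(mu_f, sigma_f, mu_g, sigma_g):
    """Formulas (10) and (11): mean and standard deviation of the result."""
    return mu_f + mu_g, math.sqrt(sigma_f ** 2 + sigma_g ** 2)


mu, sigma = convolved_parameters(1.0, 0.5, 2.0, 1.2)
peak = convolve_at(mu, 1.0, 0.5, 2.0, 1.2)
for x in (mu - 1.0, mu, mu + 0.7):
    numeric = convolve_at(x, 1.0, 0.5, 2.0, 1.2) / peak   # rescaled to height 1
    print(round(numeric, 6), round(gaussian(x, mu, sigma), 6))
```

The two printed columns should agree closely. Note that the raw convolution of two unit-height Gaussians does not itself have height 1, which is why the sketch rescales by the peak before comparing.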

The convoluted fuzzy number replaces the original fuzzy number as the representation of the individual locus value. To involve a stronger tribal relationship in the individual similarity calculation, the convoluted Gaussian fuzzy number is then shifted toward the fuzzy number of the tribal population reference. The new shifted convoluted fuzzy number is called the shifted convoluted Gaussian fuzzy number (SCGFN). This SCGFN is a new representation of the individual locus value. The mean value of the SCGFN is obtained by comparing the mean value of the individual fuzzy Gaussian ($\mu_i$) with the mean value of the fuzzy Gaussian approximation of the population locus ($\mu_{ip}$). The convoluted fuzzy number is then shifted toward the fuzzy Gaussian approximation of the locus population of a certain tribe.

Figure 4: Tribal inference system architecture design (the STR-DNA loci 1, 2, 3, ..., n of an individual are compared with tribes A, B, and C to find the highest relative tribal value).

The algorithm for shifting the SCGFN is given below:

if ($\mu_{ip} < \mu_i$)
    $\mu_{scgfn} = \mu_i - 0.02\, |\mu_{ip} - \mu_i|$
elseif ($\mu_{ip} > \mu_i$)
    $\mu_{scgfn} = \mu_i + 0.02\, |\mu_{ip} - \mu_i|$
else
    $\mu_{scgfn} = \mu_i$
end
(12)

The standard deviation of the SCGFN is the sum of the standard deviation of the individual fuzzy Gaussian number ($\sigma_i$) and the standard deviation of the fuzzy Gaussian approximation of the population locus of a certain tribe ($\sigma_{ip}$). The formula for the SCGFN standard deviation is

\[ \sigma_{scgfn} = \sigma_{ip} + \sigma_i. \tag{13} \]
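Algorithm (12) and formula (13) translate directly into a few lines of Python. The sketch below is illustrative: the function name `scgfn_parameters` and the sample values are assumptions, while the shift factor 0.02 is the one given in the algorithm.

```python
def scgfn_parameters(mu_i, sigma_i, mu_ip, sigma_ip, shift=0.02):
    """Mean from algorithm (12) and standard deviation from formula (13)."""
    gap = abs(mu_ip - mu_i)
    if mu_ip < mu_i:
        mu_scgfn = mu_i - shift * gap   # population mean is lower: shift down
    elif mu_ip > mu_i:
        mu_scgfn = mu_i + shift * gap   # population mean is higher: shift up
    else:
        mu_scgfn = mu_i                 # means coincide: no shift
    sigma_scgfn = sigma_ip + sigma_i
    return mu_scgfn, sigma_scgfn


# Hypothetical individual (mean 13, sigma 1) and population (mean 16, sigma 1.5).
print(scgfn_parameters(mu_i=13.0, sigma_i=1.0, mu_ip=16.0, sigma_ip=1.5))
```

The three branches are equivalent to moving the individual mean 2% of the way toward the population mean, so the shift is always in the direction of the tribal reference.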

2.6. Measuring Tribal Relative Value from a DNA Profile. In general, the tribal inference system finds the tribal similarity by calculating the average of the intersection values between the individual and the tribal population in the database over the 15 loci. The tribe with the highest similarity to the individual profile is selected as the ethnic estimate of the profile. The workflow can be seen in Figure 4.

Tribal matching with the triangular fuzzy number is obtained from the intersection of the individual triangular fuzzy number and the triangular fuzzy approximation of the population. For each tribal population, the mean of the intersection values between the individual triangular fuzzy number and the triangular fuzzy population approximation over the loci is calculated; the ethnic population of the individual is the tribe with the greatest mean value. The individual triangular fuzzy number is obtained by using formula (3). The triangular fuzzy population approximation is also obtained by using formula (3), where $a_2$ is the STR-DNA value with the largest population density, $a_3$ is $a_2 + 0.2$, and $a_1$ is $a_2 - 0.2$.

An example is shown in Table 1. The output for population A is at locus D3S1358, allele 1. The STR-DNA values 13, 14, 15, 16, 17, and 18 occur in 3, 10, 27, 31, 8, and 1 individuals, respectively. Because STR-DNA value 16 has the largest number of individuals, 31, it can be concluded that $a_1 = 15.8$, $a_2 = 16$, and $a_3 = 16.2$.

Table 1: The results of the output on population A (locus D3S1358, allele 1).

STR-DNA    Number of individuals
13         3
14         10
15         27
16         31
17         8
18         1
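The Table 1 example can be reproduced with a short Python sketch that picks the most frequent STR-DNA value as the modal point and places the other two points 0.2 on either side. The function name `population_tfn` and the dictionary layout are illustrative assumptions.

```python
def population_tfn(counts, half_width=0.2):
    """Triangular approximation (a1, a2, a3) centered on the most frequent value."""
    a2 = max(counts, key=counts.get)   # STR-DNA value with the most individuals
    return a2 - half_width, a2, a2 + half_width


# Table 1: population A, locus D3S1358, allele 1.
d3s1358_allele_1 = {13: 3, 14: 10, 15: 27, 16: 31, 17: 8, 18: 1}
a1, a2, a3 = population_tfn(d3s1358_allele_1)
print(round(a1, 1), a2, round(a3, 1))   # prints: 15.8 16 16.2
```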

2.7. Tribal Matching with GFN. Tribal matching with Gaussian fuzzy similarity is obtained from the intersection of the individual fuzzy Gaussian with the fuzzy Gaussian approximation of the population. The individual fuzzy Gaussian is obtained by using formula (7), where $\mu_f$ is the individual STR-DNA value and $\sigma_f = 1$. The population fuzzy Gaussian is obtained by using formula (7), where $\mu_f$ is the mean of the distribution of the number of individuals and $\sigma_f$ is the standard deviation of the distribution of the number of individuals. The standard deviation is calculated as follows:

\[ s = \sqrt{\frac{n \sum x_i^2 - \left( \sum x_i \right)^2}{n (n - 1)}}, \tag{14} \]

where $n$ is the number of STR-DNA values and $x_i$ is the number of individuals of a locus population.
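Formula (14) is the usual computational form of the sample standard deviation. The Python sketch below implements it literally and compares the result with `statistics.stdev` from the standard library. Applying it to the Table 1 counts follows the definition of $x_i$ given above, and the function name `sample_std` is illustrative.

```python
import math
import statistics


def sample_std(values):
    """Formula (14): sqrt((n * sum(x^2) - (sum x)^2) / (n * (n - 1)))."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(v * v for v in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


counts = [3, 10, 27, 31, 8, 1]   # Table 1: individuals per STR-DNA value
print(sample_std(counts), statistics.stdev(counts))
```

The two printed values should agree up to floating-point rounding, since both divide by n - 1.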

2.8. Tribal Matching with SCGFN. The shifted convoluted Gaussian fuzzy number (SCGFN) replaces the individual fuzzy number. The SCGFN is a new individual locus representation that carries stronger ethnic information. Tribal matching with SCGFN calculates the intersection similarity between the corresponding SCGFN and the population's fuzzy number, as in tribal matching with GFN. The maximum value over all tribes is then sought; the tribe with the maximum value is taken as the individual's tribe.
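The matching step can be sketched in Python as follows. The sketch assumes that each locus of the individual and of each tribe is summarized by a (mean, sigma) pair, uses the exponential branch of formula (6) as the intersection, and selects the tribe with the highest mean similarity. The names `gaussian_intersection` and `infer_tribe` and all numbers are hypothetical and are not data from the experiments.

```python
import math


def gaussian_intersection(mu_1, sigma_1, mu_2, sigma_2):
    """Exponential branch of formula (6)."""
    return math.exp(-(((mu_1 - mu_2) / (sigma_1 + sigma_2)) ** 2))


def infer_tribe(individual, tribes):
    """Return (best_tribe, scores); scores maps tribe to mean locus similarity."""
    scores = {}
    for tribe, population in tribes.items():
        per_locus = [
            gaussian_intersection(mu, sigma, *population[locus])
            for locus, (mu, sigma) in individual.items()
        ]
        scores[tribe] = sum(per_locus) / len(per_locus)
    return max(scores, key=scores.get), scores


# Hypothetical (mean, sigma) pairs per locus.
individual = {"D3S1358": (16.0, 1.0), "TH01": (7.0, 1.0)}
tribes = {
    "A": {"D3S1358": (16.0, 1.2), "TH01": (7.0, 1.1)},
    "B": {"D3S1358": (15.0, 1.0), "TH01": (9.0, 1.3)},
}
best, scores = infer_tribe(individual, tribes)
print(best, scores)
```

For SCGFN matching, the (mean, sigma) pairs of the individual would be the shifted mean from algorithm (12) and the summed deviation from formula (13).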

3. Experimental Results

The DNA profile data entered into the database system are PCR-based DNA identification profiles consisting of 15 loci, excluding amelogenin, with two alleles for each locus. The DNA profiles used as input to the tribal inference system are Indonesian DNA profiles, 240 in total, comprising the Java (A), Malay (B), Mentawai (C), and Toraja (D) tribes. The experiments were done using Matlab R2016b. The experiment was conducted with four cases.

Table 2: Example results of individual similarity with family reference.

No  Individual 1 ID  Relationship  Individual 2 ID  Relationship  SCGFN     TFN       GFN
1   0800103          mother        0800101          child         0.938856  0.525     0.798565
2   08006002         mother        0800603          child         0.822547  0.383333  0.684361
3   0800401          mother        0800403          child         0.923285  0.433333  0.747206
4   0800903          mother        0800902          child         0.931517  0.5       0.862006
5   08002F           mother        08002C           child         0.856002  0.333333  0.614684
6   08002M           father        08002C           child         0.838193  0.533333  0.738193
7   0800901          father        0800902          child         0.872816  0.566667  0.736653
8   0800102          father        0800101          child         0.850089  0.383333  0.703589
9   0800402          father        0800403          child         0.848397  0.533333  0.781071
10  08006001         father        0800603          child         0.818216  0.516667  0.739065

Figure 5: Examples from the same-person similarity calculation (panels (a) to (d)).

3.1. Same Person. Across the 240 data, experiments comparing a person with the same person gave a similarity value equal to 1 with SCGFN, GFN, and TFN. Figure 5 shows examples in which the individual identity and the reference entered are the same person and the similarity obtained is 1:

(a) Input individual id MT021 and reference id number MT021
(b) Input individual id TRJ12 and reference id number TRJ12
(c) Input individual id JT11 and reference id number JT11
(d) Input individual id MW135 and reference id number

3.2. People Who Have Family Relationships. The DNA profiles were tested against both biological parents, father and mother. From the experiment, the average similarity of an individual with a family reference is 87% for SCGFN, 39% for TFN, and 74% for GFN. Table 2 shows ten instances of the similarity of an individual with a reference who is the mother or the father.

3.3. People Who Belong to the Same Tribe. From the experiment, the average similarity of two individuals from the same tribe is 89.6% for SCGFN, 21% for TFN, and 65.14% for GFN. Table 3 shows ten instances of the similarity of two individuals from the same tribe.

3.4. Certain Tribal Populations. The population data consist of four tribes, where the number of people in tribe A is 80, the number of people in tribe B is 100, the number of people in tribe C is 20, and the number of people in tribe D is 40. Figure 6 shows an example of the tribal population similarity calculation for the individual identity JT19. From Figure 6 it can be seen that SCGFN, GFN, and TFN all report the tribe of JT11 as tribe A.

Table 4 shows the similarity values with a certain tribal population for SCGFN, TFN, and GFN. From 80 experiments on tribe A, the average tribal population similarity was 79% with fuzzy convolution, 45% with fuzzy triangular, and 71% with fuzzy Gaussian. The averages for population B over 100 experiments were 81% with fuzzy convolution, 46% with fuzzy triangular, and 76% with fuzzy Gaussian. The averages for population C over 40 experiments were 80% with fuzzy convolution, 45% with fuzzy triangular, and 73% with fuzzy Gaussian. The averages for population D over 20 experiments were 84% with fuzzy convolution, 64% with fuzzy triangular, and 79% with fuzzy Gaussian.

Table 3: Example results of two individuals in the same tribe.

No  Individual 1 ID  Tribe  Individual 2 ID  Tribe  SCGFN     TFN       GFN
1   MT097            C      MT104            C      0.944695  0.85      0.911584
2   TRJ19            D      TRJ22            D      0.934938  0.6       0.847523
3   JT11             A      JT13             A      0.903104  0.6       0.746172
4   MW132            B      MW135            B      0.869267  0.216667  0.694363
5   JT11             A      JT12             A      0.856067  0.333333  0.669071
6   JT11             A      JT17             A      0.890797  0.233333  0.609616
7   MT021            A      MT028            A      0.924174  0.333333  0.689898
8   MT019            B      MT036            B      0.873376  0.283333  0.607031
9   MW128            B      MW123            B      0.886405  0.366667  0.671862
10  MT017            C      MT018            C      0.878135  0.333333  0.66973

Table 4: The result of similarity values with a certain tribal population.

Tribe  SCGFN        TFN         GFN
A      0.79         0.45        0.714639375
B      0.81         0.46        0.7615544
C      0.8071442    0.44999925  0.73734225
D      0.846345667  0.64        0.794088667

Figure 6: Example of the calculation for a certain tribal population.

4. Analysis of Statistical and Comparison Tests

Statistical tests were used to analyze the test results. To determine whether the average STR-DNA similarity values of the three methods used in this study differ, ANOVA and comparative tests were used. The ANOVA test is interpreted as follows: if the result shows that H0 fails to be rejected (no difference), then no post hoc test is done. Conversely, if the result indicates that H0 is rejected (there is a difference), then a post hoc test should be performed. To explain clearly why SCGFN is better than GFN and TFN, a statistical analysis and a comparison test with the Tukey method in ANOVA are provided. Statistical tests were performed using Minitab 16 for 3 cases.

4.1. Individuals Who Have Family Relationships. To find out whether the methods differ, we used ANOVA and comparative tests. In the one-way ANOVA test there is only one independent variable; in this case the independent variable is the individuals who have family relationships. The analysis of variance summarized in Algorithm 1 gave a P value ≤ 0.001. With a significance level (α) = 0.05, p < α means that H0 is rejected, so it can be concluded that there is a difference between the three methods. To determine which method is better than the others, the Tukey method was applied as a further test.
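For readers who want to see how the F statistic of a one-way ANOVA is formed, the following Python sketch computes it from scratch for three groups. It uses the ten example rows of Table 2 as input; these rows are examples and need not be the sample analyzed in Minitab, so the resulting F value is not expected to reproduce the one in Algorithm 1. The function name `one_way_anova` is illustrative.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: spread of the group means.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread inside each group.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# The ten example rows of Table 2.
scgfn = [0.938856, 0.822547, 0.923285, 0.931517, 0.856002,
         0.838193, 0.872816, 0.850089, 0.848397, 0.818216]
tfn = [0.525, 0.383333, 0.433333, 0.5, 0.333333,
       0.533333, 0.566667, 0.383333, 0.533333, 0.516667]
gfn = [0.798565, 0.684361, 0.747206, 0.862006, 0.614684,
       0.738193, 0.736653, 0.703589, 0.781071, 0.739065]
f_stat, df_between, df_within = one_way_anova([scgfn, tfn, gfn])
print(round(f_stat, 2), df_between, df_within)
```

With three methods and ten observations each, the degrees of freedom are 2 and 27, matching the DF column of the ANOVA table.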

In Algorithm 2 it can be seen that

(i) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative;
(ii) GFN < SCGFN, because the confidence interval does not contain zero and its center is negative;
(iii) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

It can then be concluded that TFN < GFN < SCGFN. From the similarity results of the three methods and the boxplot in Figure 7, it can be seen that the SCGFN method used in this research has a higher similarity value than the GFN and TFN methods.
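The reading rule used in items (i) to (iii) can be written as a short Python sketch. The interval bounds are the lower and upper limits reported in Algorithm 2; the function name `read_interval` and the label strings are illustrative.

```python
def read_interval(lower, upper):
    """Interpret a confidence interval for the difference 'second minus first'."""
    if lower > 0:
        return "second > first"
    if upper < 0:
        return "second < first"
    return "no significant difference"   # the interval contains zero


# (label, lower, upper) taken from Algorithm 2.
intervals = [
    ("TFN minus SCGFN", -0.49007, -0.34033),
    ("GFN minus SCGFN", -0.20433, -0.05458),
    ("GFN minus TFN", 0.21087, 0.36062),
]
for label, lower, upper in intervals:
    print(label, "->", read_interval(lower, upper))
```

All three intervals exclude zero, which is what supports the ordering TFN < GFN < SCGFN.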

Source  DF  SS       MS       F      P
Factor  2   0.90267  0.45133  99.19  0.0005
Error   27  0.12286  0.00455
Total   29  1.02553

S = 0.06746   R-Sq = 88.02%   R-Sq(adj) = 87.13%

Algorithm 1: ANOVA test results for individuals who have family relationships.

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons
Individual confidence level = 98.04%

SCGFN subtracted from:
      Lower     Center    Upper
TFN   -0.49007  -0.41520  -0.34033
GFN   -0.20433  -0.12945  -0.05458

TFN subtracted from:
      Lower     Center    Upper
GFN   0.21087   0.28575   0.36062

Algorithm 2: Comparison test with the Tukey method.

Figure 7: Boxplot of SCGFN, TFN, and GFN for people who have family relationships.

4.2. Individuals Who Belong to the Same Tribe. The analysis of variance summarized in Algorithm 3 gave a P value ≤ 0.001. With a significance level (α) = 0.05, or a confidence level of 95%, p < α means that H0 is rejected, so it can be concluded that there is a difference between the three methods. To determine which method is better than the others, the Tukey method was applied as a further test.

In Algorithm 4 it can be seen that

(i) TFN < SCGFN, because the interval does not contain zero and its center is negative;

(ii) GFN < SCGFN, because the interval does not contain zero and its center is negative;

(iii) GFN > TFN, because the interval does not contain zero and its center is positive.

Figure 8: Boxplot of SCGFN, TFN, and GFN for individuals who belong to the same tribe (vertical axis: Data, 0.2 to 1.0).

It can therefore be concluded that TFN < GFN < SCGFN. From the results of the similarity test with the three methods and the boxplot in Figure 8, it can be seen that the SCGFN method used in this research yields higher similarity values than the GFN and TFN methods.

4.3. Certain Tribal Populations. Testing is done with 720 data points, that is, 240 data points for each of the 3 methods. The summary of the analysis of variance for A, B, C, and D in Algorithm 5 gives a P value ≤ 0.001. Compared with the significance level


Source   DF   SS       MS       F       P
Factor    2   1.1783   0.5891   34.24   0.0005
Error    27   0.4646   0.0172
Total    29   1.6429

S = 0.1312   R-Sq = 71.72%   R-Sq(adj) = 69.63%

Algorithm 3: ANOVA test results for individuals of the same tribe.

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons
Individual confidence level = 98.04%

SCGFN subtracted from:
        Lower     Center    Upper
TFN   -0.6267   -0.4811   -0.3355
GFN   -0.3300   -0.1844   -0.0388

TFN subtracted from:
        Lower     Center    Upper
GFN    0.1511    0.2967    0.4423

Algorithm 4: Comparison test with Tukey method.

($\alpha$) = 0.05, we obtain $p < \alpha$, which means H0 is rejected, so it can be concluded that there is a difference between the three methods.

The experiment was conducted with 3 methods, where SCGFN is method 1, TFN is method 2, and GFN is method 3. To determine which method is better than the others, each tribe is tested further. The comparison of the three methods for tribal population A in Algorithm 6 can be explained as follows:

(1) TFN < SCGFN, because the interval does not contain zero and its center is negative;

(2) GFN < SCGFN, because the interval does not contain zero and its center is negative;

(3) GFN > TFN, because the interval does not contain zero and its center is positive.

So it can be concluded that TFN < GFN < SCGFN, which means SCGFN is better than GFN and GFN is better than TFN.

The comparison of the three methods for tribal population B in Algorithm 7 can be explained as follows:

(1) TFN < SCGFN, because the interval does not contain zero and its center is negative;

(2) GFN = SCGFN, because the interval contains zero;

(3) GFN > TFN, because the interval does not contain zero and its center is positive.

So it can be concluded that TFN < (GFN = SCGFN), which means that the SCGFN and GFN methods are equal and both are better than TFN.

The comparison of the three methods for tribal population C in Algorithm 8 can be explained as follows:

(1) TFN < SCGFN, because the interval does not contain zero and its center is negative;

(2) GFN < SCGFN, because the interval does not contain zero and its center is negative;

(3) GFN > TFN, because the interval does not contain zero and its center is positive.

So it can be concluded that TFN < GFN < SCGFN, which means SCGFN is better than GFN and GFN is better than TFN.

The comparison of the three methods for tribal population D in Algorithm 9 can be explained as follows:

(1) TFN < SCGFN, because the interval does not contain zero and its center is negative;

(2) GFN = SCGFN, because the interval contains zero;


Analysis of Variance for A, using Adjusted SS for Tests

Source    DF   Seq SS   Adj SS   Adj MS   F        P
Method     2   7.6061   7.6061   3.8030   438.56   0.0005
Error    251   2.1766   2.1766   0.0087
Total    253   9.7827

S = 0.0931218   R-Sq = 77.75%   R-Sq(adj) = 77.57%

Analysis of Variance for B, using Adjusted SS for Tests

Source    DF   Seq SS    Adj SS   Adj MS   F        P
Method     2    7.9691   7.9691   3.9845   357.37   0.0005
Error    251    2.7986   2.7986   0.0111
Total    253   10.7677

S = 0.105592   R-Sq = 74.01%   R-Sq(adj) = 73.80%

Analysis of Variance for C, using Adjusted SS for Tests

Source    DF   Seq SS    Adj SS   Adj MS   F        P
Method     2    8.7934   8.7934   4.3967   444.01   0.0005
Error    251    2.4855   2.4855   0.0099
Total    253   11.2788

S = 0.0995100   R-Sq = 77.96%   R-Sq(adj) = 77.79%

Analysis of Variance for D, using Adjusted SS for Tests

Source    DF   Seq SS    Adj SS   Adj MS   F        P
Method     2    7.6350   7.6350   3.8175   328.52   0.0005
Error    251    2.9167   2.9167   0.0116
Total    253   10.5517

S = 0.107798   R-Sq = 72.36%   R-Sq(adj) = 72.14%

Algorithm 5: ANOVA test results for a particular tribe.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable A
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method    Lower     Center    Upper
2       -0.4276   -0.3940   -0.3605
3       -0.0957   -0.0622   -0.0286

Method = 2 subtracted from:
Method    Lower     Center    Upper
3        0.2984    0.3319    0.3653

Algorithm 6: ANOVA test results on tribal population A.


Tukey 95.0% Simultaneous Confidence Intervals
Response Variable B
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method    Lower     Center    Upper
2       -0.4301   -0.3921   -0.3541
3       -0.0738   -0.0358    0.0023

Method = 2 subtracted from:
Method    Lower     Center    Upper
3        0.3184    0.3563    0.3942

Algorithm 7: ANOVA test results on tribal population B.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable C
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method    Lower     Center    Upper
2       -0.4482   -0.4123   -0.3765
3       -0.0745   -0.0386   -0.0028

Method = 2 subtracted from:
Method    Lower     Center    Upper
3        0.3380    0.3737    0.4094

Algorithm 8: ANOVA test results on tribal population C.

(3) GFN > TFN, because the interval does not contain zero and its center is positive.

So it can be concluded that TFN < (GFN = SCGFN), which means that the SCGFN and GFN methods are equal and both are better than TFN.

5. Conclusions

In this research, experiments were conducted to find a better method for obtaining higher individual similarity values and stronger tribal properties. To improve the capability of locus matching, SCGFN and GFN have been proposed; both measure the similarity between two alleles with fuzzy numbers. Experiments were done for the following cases: people with family relationships, people of the same tribe, and certain tribal populations. In these three cases, ANOVA shows a difference in similarity between SCGFN, GFN, and TFN at a confidence level of 95%. In the cases of people with family relationships and people of the same tribe, the Tukey method in ANOVA shows that SCGFN yields a higher similarity, which means it is better than the GFN and TFN methods. In the case of certain tribal populations, the Tukey method in ANOVA shows that in population A and population C SCGFN is better than GFN and TFN, whereas in population B and population D SCGFN is equal to GFN and better than TFN. The proposed method enables a CODIS STR-DNA similarity calculation that is more robust to noise and performs better in similarity calculations involving familial and tribal relationships.


Tukey 95.0% Simultaneous Confidence Intervals
Response Variable D
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method    Lower     Center    Upper
2       -0.4176   -0.3787   -0.3399
3       -0.0624   -0.0236    0.0152

Method = 2 subtracted from:
Method    Lower     Center    Upper
3        0.3164    0.3551    0.3938

Algorithm 9: ANOVA test results on tribal population D.

Data Availability

The STR-DNA data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This paper is sponsored by the Indexed Scientific Publication for Student Doctor's Final Project (Hibah Doktor) 2018 of Universitas Indonesia [0588SKRUI2018].

References

[1] S. K. Sheppard, D. S. Guttman, and J. R. Fitzgerald, "Population genomics of bacterial host adaptation," Nature Reviews Genetics.

[2] Emery and Rimoin's Principles and Practice of Medical Genetics, Elsevier, 2013.

[3] A. Neil and Campbell, Campbell Reece, Pearson Benjamin Cummings, 8th edition, 2008.

[4] M. S. H. Abu Halima, L. P. Bernal, and F. A. Sharif, "Genetic variation of 15 autosomal short tandem repeat (STR) loci in the Palestinian population of Gaza Strip," Legal Medicine, vol. 11, no. 4, pp. 203-204, 2009.

[5] M. G. Patel, Jignal Shaikh, and D. Marjadi, "Forensic Conception: DNA Typing of FTA Spotted Samples," Journal of Applied Biology Biotechnology, vol. 2, no. 04, pp. 021–029, 2014.

[6] S. J. Venables, R. Daniel, S. D. Sarre et al., "Allele frequency data for 15 autosomal STR loci in eight Indonesian subpopulations," Forensic Science International: Genetics, vol. 20, pp. 45–52, 2016.

[7] Khaleda Parven, Forensic use of DNA information: human rights, privacy and other challenges, University of Wollongong Thesis Collections, 2012.

[8] B. Derbyshire, Sharing DNA Profiles and Finger Prints Across the EU requires further safeguards, GeneWatch UK, 2015.

[9] A. Holobinko, "Theoretical and Methodological Approaches to Understanding Human Migration Patterns and their Utility in Forensic Human Identification Cases," Societies, vol. 2, no. 2, pp. 42–62, 2012.

[10] D. Ricke, A. Shcherbina, N. Chiu et al., "Sherlock's Toolkit: A forensic DNA analysis system," in Proceedings of the IEEE International Symposium on Technologies for Homeland Security, HST 2015, USA, April 2015.

[11] A. L. Lowe, A. Urquhart, L. A. Foreman, and I. W. Evett, "Inferring ethnic origin by means of an STR profile," Forensic Science International, vol. 119, no. 1, pp. 17–22, 2001.

[12] M. Rahmat Widyanto, "Various defuzzification methods on DNA similarity matching using fuzzy inference system," Journal of Advanced Computational Intelligence and Intelligent Informatics, vol. 14, no. 3, 2010.

[13] R. N. Hartono, M. R. Widyanto, and N. Soedarsono, "Fuzzy logic system for DNA profile matching with embedded ethnic inference," in Proceedings of the 2010 2nd International Conference on Advances in Computing, Control and Telecommunication Technologies, ACT 2010, pp. 69–73, Indonesia, December 2010.

[14] M. R. Widyanto, R. N. Hartono, and N. Soedarsono, "A Novel Human STR Similarity Method using Cascade Statistical Fuzzy Rules with Tribal Information Inference," International Journal of Electrical and Computer Engineering (IJECE), vol. 6, no. 6, p. 3103, 2016.

[15] L. A. Zadeh, "Fuzzy sets," Information and Computation, vol. 8, pp. 338–353, 1965.

[16] S. Iwamoto, K. Tsurusaki, and T. Fujita, "Conditional decision-making in fuzzy environment," Journal of the Operations Research Society of Japan, vol. 42, no. 2, pp. 198–218, 1999.

[17] Liyuan Zhang, Xuanhua Xu, and Li Tao, "Some Similarity Measures for Triangular Fuzzy Number and Their Applications in Multiple Criteria Group Decision-Making," Journal of Applied Mathematics, vol. 2013, pp. 1–7, 2013.

[18] K. Kundu, "Image Denoising using Patch based Processing with Fuzzy Gaussian Membership Function," International Journal of Computer Applications, vol. 118, no. 12, pp. 0975–8887, 2015. Department of Computer Science & Engineering, Govt. College of Engineering & Textile Technology, Serampore, Hooghl.

[19] H. A. Hefny, H. M. Elsayed, and H. F. Aly, "Fuzzy multi-criteria decision making model for different scenarios of electrical power generation in Egypt," Egyptian Informatics Journal, vol. 14, no. 2, pp. 125–133, 2013.

[20] B. Akshay, B. Akash, and P. Avril, "Convolution and application of convolution," International Journal of Innovative Research In Technology, vol. 6, pp. 2349–6002, 2014.

[21] P. A. Bromiley, Products and Convolutions of Gaussian Probability Density Function, vol. 14, Imaging Sciences Research Group, Institute of Population Health, School of Medicine, University of Manchester, Manchester, 2014.



DNA regions with short repeat units (usually 2-6 base pairs in length) are called Short Tandem Repeats (STRs). STRs are found surrounding the chromosomal centromere. STRs have proven to have several benefits that make them especially suitable for human identification [4]. STRs have become popular DNA markers because they are easily amplified by polymerase chain reaction (PCR) without the problem of differential amplification; that is, the PCR products for STRs are generally similar in amount, making analysis easier. An individual inherits one copy of an STR from each parent, which may or may not have similar repeat sizes. The number of repeats in STR markers can be highly variable among individuals, which makes these STRs effective for human identification purposes [5].

Beginning in 1996, the FBI Laboratory launched a nationwide forensic science effort to establish core STR loci for inclusion within the national database known as CODIS (Combined DNA Index System). The 13 CODIS loci are CSF1PO, FGA, TH01, TPOX, VWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, and D21S11. These loci are nationally and internationally recognized as the standard for human identification. The DNA STR markers used in this research were 15 loci: the CODIS loci with two additional loci, i.e., D19S433 and D2S1338, which provide an extensive and powerful STR testing battery if required [6, 7].

A person's DNA profile can be matched for similarity against another person's DNA profile. DNA profiles play an important role in solving problems related to paternity and other family relationships [8, 9]. This method is legally used to prove the validity of the kinship or family ties of a person, to identify unknown bodies of war or natural disaster victims, and to study human populations [10, 11].

In previous research it has been noted that, although the methods of M. R. Widyanto et al. [12–14] are quite sufficient for measuring the similarity between two alleles with triangular fuzzy numbers, the statistical information on the ethnicity of the two profiles is lost. To overcome this problem, this research employs novel methods to measure similarity between tribes that give better results than the previous method. We propose the shifted convoluted Gaussian fuzzy number (SCGFN) and the Gaussian fuzzy number (GFN) to represent each locus value as an improvement on the triangular fuzzy number (TFN) used in previous research.

The research method is proposed in Section 2. Experimental results for the three methods are shown in Section 3. Analyses of statistical and comparison tests are summarized in Section 4.

2. Proposed Research Method

2.1. Fuzzy Sets. Fuzzy sets serve as a basis for the theory of possibility. A fuzzy set $A$ in $X$ is formally defined as follows [15]:

$$A = \{(x, \mu_A(x)) \mid x \in X\} \tag{1}$$

where $X$ is the universe of discourse and $\mu_A(x)$ is the membership degree of $x$ in $A$. When fuzzy set theory was presented, researchers considered decision-making as one of the most attractive application fields of that theory [16].

2.2. Measurement of Similarity Values of Two-Individual STR-DNA. The STR-DNA similarity of two individuals (as shown in Figure 1) is calculated by comparing, at each locus, the STR-DNA value of allele 1 of the individual with the STR-DNA value of allele 1 of the reference, and the STR-DNA value of allele 2 of the individual with the STR-DNA value of allele 2 of the reference. We then find the intersection point value of the two alleles for each locus and calculate the average of the similarity values over all loci.

2.3. Two-Individual Matching Evidence versus Reference with TFN Similarity. A triangular fuzzy number (TFN) $a$ can be defined by a triplet $(a_1, a_2, a_3)$. The triangular fuzzy number is used to represent the uncertainty resulting from the imprecision of the polymerase chain reaction (PCR) machine. The membership function $\mu_a(x)$ is [17]

$$\mu_a(x) = \begin{cases} 0, & x < a_1 \text{ or } a_3 < x \\ \dfrac{x - a_1}{a_2 - a_1}, & a_1 \le x \le a_2 \\ \dfrac{x - a_3}{a_2 - a_3}, & a_2 \le x \le a_3 \end{cases} \tag{2}$$

where $0 \le a_1 \le a_2 \le a_3 \le 1$; $a_1$ and $a_3$ stand for the lower and upper values of the support of $a$, respectively, and $a_2$ stands for the modal value. The value of every DNA locus is represented by a triangular fuzzy number, where the fuzziness value is set to 0.4 through experiments and the center of the fuzziness is the value of the corresponding locus. The similarity value between an allele of the DNA profile evidence and the DNA profile reference is given by

$$t = \frac{a_3 - a_1}{2(a_3 - a_2)} \tag{3}$$

where the value of the first allele is less than the value of the second allele and $t$ is the intersection of the two alleles; $a_2$ is the STR-DNA value of the first allele, $a_3$ is $a_2 + 0.2$, and $a_1$ is the STR-DNA value of the second allele minus 0.2.

If the calculation gives a negative value of $t$, then $t$ is set to zero, because a negative value means that there is no intersection between the two STR values; the values of $t$ therefore stay in the interval [0, 1].

The following example shows the geometric calculation of the intersection points of the individual and the reference (as shown in Figure 2) for the D8S1179 locus, where the value of allele 1 is 13 in the individual STR-DNA and 13.3 in the reference STR-DNA, and the value of allele 2 is 14 in the individual STR-DNA and 14.1 in the reference STR-DNA.
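The D8S1179 example can be worked through with a minimal sketch of formula (3). The function names are hypothetical, the half-width of 0.2 follows from the fuzziness of 0.4 stated in the text, and sorting the two values so that the smaller one plays the role of the first allele is an assumption made here to satisfy the condition attached to formula (3).

```python
def tfn_allele_similarity(value_a, value_b, half_width=0.2):
    """Intersection height t of two triangular fuzzy numbers, formula (3)."""
    # Assumption: the smaller value is treated as the "first" allele.
    first, second = sorted((value_a, value_b))
    a2 = first                 # modal value of the first allele
    a3 = a2 + half_width       # upper support of the first allele
    a1 = second - half_width   # lower support of the second allele
    t = (a3 - a1) / (2.0 * (a3 - a2))
    # Negative t means no intersection; keep t inside [0, 1].
    return max(0.0, min(1.0, t))


def tfn_locus_similarity(individual, reference):
    """Average the allele-wise similarities of one locus."""
    pairs = list(zip(individual, reference))
    return sum(tfn_allele_similarity(x, y) for x, y in pairs) / len(pairs)


# D8S1179: individual alleles (13, 14), reference alleles (13.3, 14.1).
t_allele_1 = tfn_allele_similarity(13.0, 13.3)
t_allele_2 = tfn_allele_similarity(14.0, 14.1)
t_locus = tfn_locus_similarity((13.0, 14.0), (13.3, 14.1))
print(round(t_allele_1, 3), round(t_allele_2, 3), round(t_locus, 3))
```

With these inputs the two alleles overlap at heights of about 0.25 and 0.75, so the locus similarity is about 0.5; alleles that differ by 0.4 or more give zero.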

The similarity between two DNA profiles is thus calculated as the arithmetic mean of the similarity values over all loci, which is expressed as

$$t_i = \frac{\sum_{j=1}^{N} \mu(x_i, y_j)}{N} \tag{4}$$


Figure 1: Calculation flow of the similarity of two individuals. (Flowchart: start; read the allele values of a locus, that is, the first and second allele values of the individual and of the reference; calculate the intersection point of the similarity value for all loci; once all loci are counted, compute the average similarity value over the loci; end.)

Figure 2: The fuzzy triangular number (membership from 0 to 1 against STR-DNA values from 12.8 to 14.2, for alleles 1 and 2).

where $t_i$ is the value of similarity between the individual DNA profile and the reference DNA profile of the $i$th individual, $x_i$ is a vector of the individual DNA profile, and $y_j$ is a vector of the reference DNA profile. The vectors $x_i$ and $y_j$ are $N$-dimensional ($N \in \mathbb{N}$) vectors consisting of the values of the 15 loci without amelogenin, as used by the Federal Bureau of Investigation (FBI).

2.4. Two-Individual Matching Evidence versus Reference with GFN Similarity. To improve the capability of locus matching, we propose GFN (Gaussian fuzzy number) similarity.

Compared to the traditional triangular fuzzy number (TFN), GFN is more realistic for representing the uncertainty of locus information because the distribution is assumed to be Gaussian. The Gaussian fuzzy function transforms the original values into a normal distribution. The midpoint of the normal distribution defines the ideal definition for the set and is assigned a membership of 1, with the remaining input values decreasing in membership as they move away from the midpoint in both the positive and negative directions. The input values decrease in membership from the midpoint until they reach a point where the values have moved too far from the ideal definition and are definitely not in


Figure 3: Fuzzy Gaussian similarity (membership from 0 to 1 against STR-DNA values from 0 to 40, for alleles 1 and 2).

the set and are therefore assigned zeros. The fuzzy Gaussian function is given below [18]:

$$f(x) = a e^{-(x - \mu_f)^2 / (2\sigma_f^2)} \tag{5}$$

A Gaussian membership function is completely determined by two parameters, $\mu$ and $\sigma$: $\mu$ represents the center of the membership function (the peak of the curve) and $\sigma$ determines the width of the membership function.

The intersection of two Gaussian functions is as follows [19]:

$$V = \begin{cases} 1, & \text{if } \mu_2 \le \mu_1 \\ \exp\left[-\left(\dfrac{\mu_1 - \mu_2}{\sigma_1 + \sigma_2}\right)^2\right], & \text{if } \mu_2 > \mu_1 \end{cases} \tag{6}$$

where $\mu_1$ is the STR-DNA value of the individual, $\mu_2$ is the STR-DNA value of the reference, $\sigma_1$ is the sigma value of the individual, and $\sigma_2$ is the sigma value of the reference.

The following example shows the Gaussian intersection points of the individual and the reference (as shown in Figure 3) for the D8S1179 locus, where the value of allele 1 is 13 in the individual STR-DNA and 13.3 in the reference STR-DNA, and the value of allele 2 is 14 in the individual STR-DNA and 14.1 in the reference STR-DNA.
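A minimal sketch of formula (6) applied to this example is given below. It rests on two assumptions that are not stated at this point in the text: the exponential branch is taken to apply when the reference center exceeds the individual center, and both sigma values are set to 1 (the value Section 2.7 gives for an individual). The function name is hypothetical.

```python
import math


def gfn_intersection(mu_ind, sigma_ind, mu_ref, sigma_ref):
    """Intersection value V of two Gaussian fuzzy numbers, formula (6)."""
    if mu_ref <= mu_ind:
        return 1.0
    # Assumed reading: this branch covers mu_ref > mu_ind.
    z = (mu_ind - mu_ref) / (sigma_ind + sigma_ref)
    return math.exp(-(z * z))


# D8S1179: individual alleles (13, 14), reference alleles (13.3, 14.1).
v_allele_1 = gfn_intersection(13.0, 1.0, 13.3, 1.0)
v_allele_2 = gfn_intersection(14.0, 1.0, 14.1, 1.0)
v_locus = (v_allele_1 + v_allele_2) / 2.0
print(v_allele_1, v_allele_2, v_locus)
```

Because the Gaussian tails never reach zero inside the allele range, small allele differences are penalized far less sharply than with the triangular fuzzy number.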

2.5. Two-Individual Matching Evidence versus Reference with SCGFN Similarity. To improve the capability of locus matching in which ethnic information is involved, we propose SCGFN (shifted convoluted Gaussian fuzzy number) similarity. The original Gaussian fuzzy number (GFN) is convoluted with the distribution of the locus information of a certain ethnic group. The new SCGFN therefore represents ethnic information better than the original GFN.

The convolution function is a multiplication of the fuzzy Gaussian function of the individual locus and the fuzzy Gaussian approximation of the population locus. The fuzzy Gaussian approximation of the population locus is obtained by taking as its mean the STR-DNA value with the highest population density and as its deviation the deviation of the particular population. The fuzzy Gaussian of an individual locus is obtained by taking as its mean the STR-DNA value of the individual locus, with the standard deviation being the mean value minus 2.

Convolution is a mathematical operation on two functions ($f$ and $g$) that produces a third function, typically viewed as a modified version of one of the original functions, giving the integral of the pointwise multiplication of the two functions as a function of the amount that one of the original functions is translated [20]. The function $f$ is obtained from the individual and the function $g$ is derived from the reference.

$$f(x) = a e^{-(x - \mu_f)^2 / (2\sigma_f^2)} \tag{7}$$

and

$$g(x) = a e^{-(x - \mu_g)^2 / (2\sigma_g^2)}. \tag{8}$$

The convolution operator is [21]

$$P_{f \otimes g}(x) = F^{-1}\left[F(f(x))\, F(g(x))\right] = a e^{-(x - (\mu_f + \mu_g))^2 / (2(\sigma_f^2 + \sigma_g^2))} \tag{9}$$

where $a = 1$ is the height of the fuzzy number. The mean is

$$\mu_{f \otimes g} = \mu_f + \mu_g \tag{10}$$

and the standard deviation is as follows:

$$\sigma_{f \otimes g} = \sqrt{\sigma_f^2 + \sigma_g^2}. \tag{11}$$
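The parameters of the convoluted Gaussian in formulas (10) and (11) can be computed directly, as in the sketch below. The function names and the numeric inputs are hypothetical illustrations; the height is kept at 1, as stated for formula (9).

```python
import math


def convolve_gaussian_params(mu_f, sigma_f, mu_g, sigma_g):
    """Mean and standard deviation of the convoluted Gaussian, (10) and (11)."""
    mu = mu_f + mu_g
    sigma = math.sqrt(sigma_f ** 2 + sigma_g ** 2)
    return mu, sigma


def gaussian_membership(x, mu, sigma, a=1.0):
    """Gaussian membership value a * exp(-(x - mu)^2 / (2 sigma^2))."""
    return a * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# Hypothetical inputs chosen so that the result is easy to check by hand.
mu_conv, sigma_conv = convolve_gaussian_params(13.0, 3.0, 16.0, 4.0)
peak = gaussian_membership(mu_conv, mu_conv, sigma_conv)
print(mu_conv, sigma_conv, peak)
```

The means add and the variances add, so the convoluted number is always at least as wide as the wider of its two inputs.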

The convoluted fuzzy number replaces the original fuzzy number as the representation value of an individual locus. To obtain a stronger tribal relationship in the individual similarity calculation, the convoluted Gaussian fuzzy number is then shifted toward the fuzzy number of the tribal population reference. The new shifted convoluted fuzzy number is called the shifted convoluted Gaussian fuzzy number (SCGFN). This SCGFN is a new representation value of an individual locus. The mean value of SCGFN is obtained by comparing the mean value of the individual fuzzy Gaussian ($\mu_i$) with the mean value of the fuzzy Gaussian approximation of the locus population ($\mu_{ip}$). The convoluted fuzzy number is then shifted toward the fuzzy Gaussian approximation of the locus


Figure 4: Tribal inference system architecture design. (Diagram: the STR-DNA loci 1, 2, 3, ..., n of an individual are compared with tribes A, B, and C to find the highest relative tribal value.)

population of a certain tribe. The algorithm for shifting SCGFN is given below:

$$\mu_{scgfn} = \begin{cases} \mu_i - 0.02\,|\mu_{ip} - \mu_i|, & \text{if } \mu_{ip} < \mu_i \\ \mu_i + 0.02\,|\mu_{ip} - \mu_i|, & \text{if } \mu_{ip} > \mu_i \\ \mu_i, & \text{otherwise} \end{cases} \tag{12}$$

The standard deviation of SCGFN is the sum of the standard deviation of the individual fuzzy Gaussian number ($\sigma_i$) and the standard deviation of the fuzzy Gaussian approximation of the population locus of a certain tribe ($\sigma_{ip}$). The formula for the SCGFN standard deviation is

$$\sigma_{scgfn} = \sigma_{ip} + \sigma_i \tag{13}$$
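The shifting rule (12) and the standard deviation (13) can be combined in a short sketch. The function name and the example inputs are hypothetical; the factor 0.02 is the shift coefficient given in (12).

```python
def scgfn_parameters(mu_i, sigma_i, mu_ip, sigma_ip, shift=0.02):
    """Mean and standard deviation of the SCGFN, formulas (12) and (13).

    mu_i, sigma_i   : individual fuzzy Gaussian
    mu_ip, sigma_ip : fuzzy Gaussian approximation of the population locus
    """
    gap = abs(mu_ip - mu_i)
    if mu_ip < mu_i:
        mu_scgfn = mu_i - shift * gap   # move down toward the population mean
    elif mu_ip > mu_i:
        mu_scgfn = mu_i + shift * gap   # move up toward the population mean
    else:
        mu_scgfn = mu_i                 # already centered on the population
    sigma_scgfn = sigma_ip + sigma_i
    return mu_scgfn, sigma_scgfn


# Hypothetical locus: individual value 13, population mode 16.
mu_s, sigma_s = scgfn_parameters(13.0, 1.0, 16.0, 2.0)
print(mu_s, sigma_s)
```

The shift moves the individual mean by 2% of its distance to the population mean, so the individual value still dominates while the curve widens to reflect the population spread.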

2.6. Measuring Tribal Relative Value from a DNA Profile. In general, the workflow of the tribal inference system is to find the tribal similarity value by calculating the average of the intersection point values between the individual and the tribal population in the database over the 15 loci. The tribe with the highest similarity value to the individual profile is selected as the ethnic estimation of the profile. The workflow can be seen in Figure 4.

Tribal matching with triangular fuzzy numbers is obtained from the intersection of the individual triangular fuzzy number and the triangular fuzzy approximation of the population. For each tribal population, the mean intersection value of the individual triangular fuzzy number and the triangular fuzzy population approximation is calculated over the loci, and the ethnic population of the individual is determined as the tribal population with the greatest mean value. The triangular fuzzy number of an individual is obtained by using formula (3). The triangular fuzzy

Table 1: The results of the output on population A (locus D3S1358, allele 1).

STR-DNA   Number of individuals
13         3
14        10
15        27
16        31
17         8
18         1

population approximation is also obtained by using formula (3), where $a_2$ is the STR-DNA value with the largest population density, $a_3$ is $a_2 + 0.2$, and $a_1$ is $a_2 - 0.2$.

An example is shown in Table 1. The output for population A is at locus D3S1358, allele 1. The STR-DNA values 13, 14, 15, 16, 17, and 18 are carried by 3, 10, 27, 31, 8, and 1 individuals, respectively. As STR-DNA value 16 has the largest number of individuals, at 31, it can be concluded that $a_1 = 15.8$, $a_2 = 16$, and $a_3 = 16.2$.
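The Table 1 example can be reproduced with a small sketch that picks the most frequent STR-DNA value as the modal value and places the support 0.2 on either side. The function and variable names are hypothetical.

```python
def population_tfn(allele_counts, half_width=0.2):
    """Triangular fuzzy approximation (a1, a2, a3) of a population locus.

    allele_counts maps each STR-DNA value to the number of individuals
    that carry it; the most frequent value becomes the modal value a2.
    """
    a2 = max(allele_counts, key=allele_counts.get)
    return a2 - half_width, a2, a2 + half_width


# Population A, locus D3S1358, allele 1 (Table 1).
counts_d3s1358 = {13: 3, 14: 10, 15: 27, 16: 31, 17: 8, 18: 1}
a1, a2, a3 = population_tfn(counts_d3s1358)
print(round(a1, 1), a2, round(a3, 1))
```

If two values tie for the highest count, `max` returns the first one encountered; the text does not say how ties are resolved, so that behavior is an artifact of this sketch.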

2.7. Tribal Matching with GFN. Tribal matching with Gaussian fuzzy similarity is obtained from the intersection of the individual fuzzy Gaussian with the fuzzy Gaussian approximation of the population. The Gaussian fuzzy number of an individual is obtained by using formula (7), where $\mu_f$ is the individual STR-DNA value and $\sigma_f = 1$. The Gaussian fuzzy number of the population is obtained by using formula (7), where $\mu_f$ is the mean of the distribution of the number of individuals and $\sigma_f$ is the standard deviation of the distribution of the number of individuals. The standard deviation is calculated as follows:

$$s = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}} \tag{14}$$

where $n$ is the number of STR-DNA values and $x_i$ is the number of individuals of a locus population.
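Formula (14) is the computational form of the sample standard deviation, which the sketch below checks against Python's `statistics.stdev`. Using the Table 1 counts as the input $x_i$ is an assumption made for illustration, and the function name is hypothetical.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation in the computational form of formula (14)."""
    n = len(values)
    sum_of_squares = sum(v * v for v in values)
    square_of_sum = sum(values) ** 2
    return math.sqrt((n * sum_of_squares - square_of_sum) / (n * (n - 1)))


# Number of individuals per STR-DNA value (Table 1, assumed as the x_i).
individual_counts = [3, 10, 27, 31, 8, 1]
s = sample_std(individual_counts)
print(s, statistics.stdev(individual_counts))
```

The two printed values agree up to floating-point rounding, which confirms that (14) is the usual $n - 1$ estimator.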

2.8. Tribal Matching with SCGFN. The shifted convoluted Gaussian fuzzy number (SCGFN) replaces the individual fuzzy number. SCGFN is a new individual locus representation with increasingly strong ethnicity. Tribal matching with SCGFN calculates the similarity intersection value between the corresponding SCGFN and the population's fuzzy number, as in tribal matching with GFN. The maximum value over all tribes is then sought; the tribe with the maximum value is taken as the individual's tribe.
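The final selection step, averaging the per-locus similarity for each tribe and taking the maximum, can be sketched as follows. The function name and the per-locus values are hypothetical placeholders; in practice each list would hold one similarity value per locus computed with TFN, GFN, or SCGFN.

```python
def infer_tribe(similarity_by_tribe):
    """Return the tribe with the highest mean per-locus similarity.

    similarity_by_tribe maps a tribe label to a list of per-locus
    similarity values between the individual and that tribe's population.
    """
    mean_by_tribe = {
        tribe: sum(values) / len(values)
        for tribe, values in similarity_by_tribe.items()
    }
    best_tribe = max(mean_by_tribe, key=mean_by_tribe.get)
    return best_tribe, mean_by_tribe


# Hypothetical per-locus similarities (three loci shown for brevity).
example = {
    "A": [0.91, 0.84, 0.88],
    "B": [0.72, 0.80, 0.69],
    "C": [0.65, 0.70, 0.74],
    "D": [0.60, 0.58, 0.71],
}
tribe, means = infer_tribe(example)
print(tribe, round(means[tribe], 3))
```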

3. Experimental Results

The DNA profile data entered into the database system are PCR-based DNA identification profiles consisting of 15 loci excluding amelogenin, with two alleles for each locus. The DNA profiles used as input to the tribal inference system are Indonesian DNA, with a total of 240 DNA data comprising the Java (A), Malay (B), Mentawai (C), and Toraja


Table 2: Example results of individual similarity with family reference.

No   Individual 1 ID   Relationship   Individual 2 ID   Relationship   SCGFN      TFN        GFN
1    0800103           mother         0800101           child          0.938856   0.525      0.798565
2    08006002          mother         0800603           child          0.822547   0.383333   0.684361
3    0800401           mother         0800403           child          0.923285   0.433333   0.747206
4    0800903           mother         0800902           child          0.931517   0.5        0.862006
5    08002F            mother         08002C            child          0.856002   0.333333   0.614684
6    08002M            father         08002C            child          0.838193   0.533333   0.738193
7    0800901           father         0800902           child          0.872816   0.566667   0.736653
8    0800102           father         0800101           child          0.850089   0.383333   0.703589
9    0800402           father         0800403           child          0.848397   0.533333   0.781071
10   08006001          father         0800603           child          0.818216   0.516667   0.739065

Figure 5: Examples from the same person similarity calculation (panels (a) to (d)).

(D) tribes. The experiments were done using Matlab R2016b. The experiments were conducted for four cases.

3.1. Same Person. From the 240 pieces of data, experiments were conducted in which the individual and the reference are the same person; with SCGFN, GFN, and TFN the similarity values are all equal to 1. Figure 5 shows examples in which the individual identity and the reference entered are the same person and the similarity obtained is 1:

(a) Input individual id MT021 and reference id number MT021

(b) Input individual id TRJ12 and reference id number TRJ12

(c) Input individual id JT11 and reference id number JT11

(d) Input individual id MW135 and reference id number

3.2. People Who Have Family Relationships. The DNA profiles tested are those of both biological parents, father and mother. From the experiment, the average similarity of an individual with a family reference is 87% for SCGFN, 39% for TFN, and 74% for GFN. Table 2 shows ten instances of the similarity of an individual with a reference who is the mother or father.

3.3. People Who Belong to the Same Tribe. From the experiment, the average similarity of two individuals who belong to the same tribe is 89.6% for SCGFN, 21% for TFN, and 65.14% for GFN. Table 3 shows ten instances of the similarity of two individuals from the same tribe.

3.4. Certain Tribal Populations. The population data consist of four tribes: tribe A has 80 people, tribe B has 100 people, tribe C has 20 people, and tribe D has 40 people. Figure 6 shows an example of the tribal population similarity calculation for the individual with ID JT19. From Figure 6 it can be seen that SCGFN, GFN, and TFN all report the tribe of JT11 as tribe A.

Table 4 shows the similarity values with a certain tribal population for SCGFN, TFN, and GFN. Over the 80 experiments on tribe A, the average similarity to the tribal population was 79% with fuzzy convolution (SCGFN), 45% with fuzzy triangular (TFN), and 71% with fuzzy Gaussian (GFN). Over the 100 experiments on population B, the averages were 81% with fuzzy convolution, 46% with fuzzy triangular, and 76% with fuzzy Gaussian. Over the 40 experiments on population C, the averages were 80% with fuzzy convolution, 45% with fuzzy triangular, and 73% with fuzzy Gaussian. Over the 20 experiments on population D, the averages were 84% with fuzzy convolution, 64% with fuzzy triangular, and 79% with fuzzy Gaussian.

Table 3: Example results of two individuals in the same tribe.

No  Individual 1 ID  Tribe  Individual 2 ID  Tribe  SCGFN     TFN       GFN
1   MT097            C      MT104            C      0.944695  0.85      0.911584
2   TRJ19            D      TRJ22            D      0.934938  0.6       0.847523
3   JT11             A      JT13             A      0.903104  0.6       0.746172
4   MW132            B      MW135            B      0.869267  0.216667  0.694363
5   JT11             A      JT12             A      0.856067  0.333333  0.669071
6   JT11             A      JT17             A      0.890797  0.233333  0.609616
7   MT021            A      MT028            A      0.924174  0.333333  0.689898
8   MT019            B      MT036            B      0.873376  0.283333  0.607031
9   MW128            B      MW123            B      0.886405  0.366667  0.671862
10  MT017            C      MT018            C      0.878135  0.333333  0.66973

Table 4: The result of similarity values with a certain tribal population.

Tribe  SCGFN        TFN         GFN
A      0.79         0.45        0.714639375
B      0.81         0.46        0.7615544
C      0.8071442    0.44999925  0.73734225
D      0.846345667  0.64        0.794088667

Figure 6: Example of the similarity calculation against a certain tribal population.

4. Analysis of Statistical and Comparison Tests

Statistical tests were used to analyze the test results. To determine whether the average STR-DNA similarity values of the three methods used in this study differ, ANOVA and a comparison test were used. The ANOVA result is interpreted as follows: if the test fails to reject H0 (no difference), no post hoc test is done; conversely, if H0 is rejected (there is a difference), a post hoc test should be performed. To explain clearly why SCGFN is better than GFN and TFN, a statistical analysis and a comparison test with the Tukey method in ANOVA are provided. The statistical tests were performed using Minitab 16 for three cases.

4.1. Individuals Who Have Family Relationships. To find out whether the methods differ, we used ANOVA and comparison tests. In the one-way ANOVA test there is only one independent variable; in this case, the independent variable is individuals who have family relationships. The summary of the analysis of variance in Algorithm 1 gives a P value ≤ 0.001. Compared with the significance level α = 0.05, p < α, which means H0 is rejected, so it can be concluded that there is a difference between the three methods. To determine which method is better than the others, the methods are further tested with the Tukey method.
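The entries of a one-way ANOVA summary are tied together by simple arithmetic, which can be checked by hand. The sketch below is only an illustration: it reads the Algorithm 1 entries as SS(Factor) = 0.90267 with 2 degrees of freedom and SS(Error) = 0.12286 with 27 degrees of freedom (the decimal placement is an assumption of this sketch), and the function name is hypothetical. It does not compute the P value, which requires the F distribution.

```python
import math

def one_way_anova_summary(ss_factor, df_factor, ss_error, df_error):
    """Derive MS, F, S and R-Sq from the sums of squares of a one-way ANOVA table."""
    ms_factor = ss_factor / df_factor      # mean square between methods
    ms_error = ss_error / df_error         # mean square within methods
    ss_total = ss_factor + ss_error
    return {
        "MS_factor": ms_factor,
        "MS_error": ms_error,
        "F": ms_factor / ms_error,         # test statistic
        "S": math.sqrt(ms_error),          # pooled standard deviation
        "R_sq_percent": 100.0 * ss_factor / ss_total,
    }

# Values read from Algorithm 1 (decimal placement assumed).
summary = one_way_anova_summary(0.90267, 2, 0.12286, 27)
for name, value in summary.items():
    print(f"{name}: {value:.5f}")
```

Under this reading, the derived F, S, and R-Sq values agree with the figures printed in Algorithm 1 to the displayed precision.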

In Algorithm 2 it can be seen that

(i) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(ii) GFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(iii) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

It can therefore be concluded that TFN < GFN < SCGFN. From the similarity results of the three methods and the boxplot in Figure 7, it can be seen that the SCGFN method used in this research gives higher similarity values than the GFN and TFN methods.
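The reading rule used for the Tukey output (an interval that excludes zero indicates a difference, and the sign of its center gives the direction) can be written down compactly. The sketch below is a hedged illustration with a hypothetical function name; the interval bounds are read from Algorithm 2 and, for the "contains zero" case, from the GFN row of Algorithm 7, with the decimal placement assumed.

```python
def compare_from_interval(lower, upper):
    """Interpret a Tukey interval for (row method minus subtracted method).

    Returns "<" if the whole interval is below zero, ">" if it is above zero,
    and "=" if the interval contains zero (no significant difference).
    """
    if upper < 0:
        return "<"
    if lower > 0:
        return ">"
    return "="

# (row method, subtracted method, lower bound, upper bound)
intervals = [
    ("TFN", "SCGFN", -0.49007, -0.34033),   # Algorithm 2
    ("GFN", "SCGFN", -0.20433, -0.05458),   # Algorithm 2
    ("GFN", "TFN", 0.21087, 0.36062),       # Algorithm 2
    ("GFN", "SCGFN", -0.0738, 0.0023),      # Algorithm 7, tribe B
]

results = [(row, compare_from_interval(lo, hi), ref) for row, ref, lo, hi in intervals]
for row, relation, ref in results:
    print(row, relation, ref)
```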

Source  DF  SS       MS       F      P
Factor   2  0.90267  0.45133  99.19  0.0005
Error   27  0.12286  0.00455
Total   29  1.02553

S = 0.06746   R-Sq = 88.02%   R-Sq(adj) = 87.13%

Algorithm 1: ANOVA test results for individuals who have family relationships.

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons
Individual confidence level = 98.04%

SCGFN subtracted from:
      Lower     Center    Upper
TFN   -0.49007  -0.41520  -0.34033
GFN   -0.20433  -0.12945  -0.05458

TFN subtracted from:
      Lower     Center    Upper
GFN    0.21087   0.28575   0.36062

Algorithm 2: Comparison test with the Tukey method.

Figure 7: Boxplot of SCGFN, TFN, and GFN for people who have family relationships (vertical axis: Data, 0.3 to 1.0).

4.2. Individuals Who Belong to the Same Tribe. The summary of the analysis of variance in Algorithm 3 gives a P value ≤ 0.001. Compared with the significance level α = 0.05 (a confidence level of 95%), p < α, which means H0 is rejected, so it can be concluded that there is a difference between the three methods. To determine which method is better than the others, the methods are further tested with the Tukey method.

In Algorithm 4 it can be seen that

(i) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(ii) GFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(iii) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

It can therefore be concluded that TFN < GFN < SCGFN. From the similarity results of the three methods and the boxplot in Figure 8, it can be seen that the SCGFN method used in this research gives higher similarity values than the GFN and TFN methods.

Source  DF  SS      MS      F      P
Factor   2  1.1783  0.5891  34.24  0.0005
Error   27  0.4646  0.0172
Total   29  1.6429

S = 0.1312   R-Sq = 71.72%   R-Sq(adj) = 69.63%

Algorithm 3: ANOVA test results for individuals of the same tribe.

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons
Individual confidence level = 98.04%

SCGFN subtracted from:
      Lower    Center   Upper
TFN   -0.6267  -0.4811  -0.3355
GFN   -0.3300  -0.1844  -0.0388

TFN subtracted from:
      Lower    Center   Upper
GFN    0.1511   0.2967   0.4423

Algorithm 4: Comparison test with the Tukey method.

Figure 8: Boxplot of SCGFN, TFN, and GFN for individuals who belong to the same tribe (vertical axis: Data, 0.2 to 1.0).

4.3. Certain Tribal Populations. Testing was done with 720 data items, that is, 240 data items for each of the 3 methods. The summaries of the analysis of variance for tribes A, B, C, and D in Algorithm 5 give P values ≤ 0.001. Compared with the significance level α = 0.05, p < α, which means H0 is rejected, so it can be concluded that there is a difference between the three methods.

The experiment was conducted with 3 methods, where SCGFN is method 1, TFN is method 2, and GFN is method 3. To determine which method is better than the others, each tribe is tested further. In Algorithm 6, the comparison of the three methods for tribal population A can be explained as follows:

(1) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative.

(2) GFN < SCGFN, because the confidence interval does not contain zero and its center is negative.

(3) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

So it can be concluded that TFN < GFN < SCGFN, which means SCGFN is better than GFN and GFN is better than TFN.

In Algorithm 7, the comparison of the three methods for tribal population B can be explained as follows:

(1) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative.

(2) GFN = SCGFN, because the confidence interval contains zero.

(3) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

So it can be concluded that TFN < (GFN = SCGFN), which means that the SCGFN and GFN methods are equal and both are better than TFN.

In Algorithm 8, the comparison of the three methods for tribal population C can be explained as follows:

(1) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative.

(2) GFN < SCGFN, because the confidence interval does not contain zero and its center is negative.

(3) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

So it can be concluded that TFN < GFN < SCGFN, which means SCGFN is better than GFN and GFN is better than TFN.

In Algorithm 9, the comparison of the three methods for tribal population D can be explained as follows:

(1) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative.

(2) GFN = SCGFN, because the confidence interval contains zero.

(3) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

So it can be concluded that TFN < (GFN = SCGFN), which means that the SCGFN and GFN methods are equal and both are better than TFN.

Analysis of Variance for A, using Adjusted SS for Tests
Source  DF   Seq SS  Adj SS  Adj MS  F       P
Method    2  7.6061  7.6061  3.8030  438.56  0.0005
Error   251  2.1766  2.1766  0.0087
Total   253  9.7827
S = 0.0931218   R-Sq = 77.75%   R-Sq(adj) = 77.57%

Analysis of Variance for B, using Adjusted SS for Tests
Source  DF   Seq SS   Adj SS  Adj MS  F       P
Method    2   7.9691  7.9691  3.9845  357.37  0.0005
Error   251   2.7986  2.7986  0.0111
Total   253  10.7677
S = 0.105592   R-Sq = 74.01%   R-Sq(adj) = 73.80%

Analysis of Variance for C, using Adjusted SS for Tests
Source  DF   Seq SS   Adj SS  Adj MS  F       P
Method    2   8.7934  8.7934  4.3967  444.01  0.0005
Error   251   2.4855  2.4855  0.0099
Total   253  11.2788
S = 0.0995100   R-Sq = 77.96%   R-Sq(adj) = 77.79%

Analysis of Variance for D, using Adjusted SS for Tests
Source  DF   Seq SS   Adj SS  Adj MS  F       P
Method    2   7.6350  7.6350  3.8175  328.52  0.0005
Error   251   2.9167  2.9167  0.0116
Total   253  10.5517
S = 0.107798   R-Sq = 72.36%   R-Sq(adj) = 72.14%

Algorithm 5: ANOVA test results for a particular tribe.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable A
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method  Lower    Center   Upper
2       -0.4276  -0.3940  -0.3605
3       -0.0957  -0.0622  -0.0286

Method = 2 subtracted from:
Method  Lower    Center   Upper
3        0.2984   0.3319   0.3653

Algorithm 6: ANOVA test results on tribal population A.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable B
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method  Lower    Center   Upper
2       -0.4301  -0.3921  -0.3541
3       -0.0738  -0.0358   0.0023

Method = 2 subtracted from:
Method  Lower    Center   Upper
3        0.3184   0.3563   0.3942

Algorithm 7: ANOVA test results on tribal population B.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable C
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method  Lower    Center   Upper
2       -0.4482  -0.4123  -0.3765
3       -0.0745  -0.0386  -0.0028

Method = 2 subtracted from:
Method  Lower    Center   Upper
3        0.3380   0.3737   0.4094

Algorithm 8: ANOVA test results on tribal population C.

5. Conclusions

In this research, experiments were conducted to find a better method for obtaining higher individual similarity values and stronger tribal properties. To improve the capability of locus matching, SCGFN and GFN have been proposed; they measure the fuzzy-number similarity between two allele values. Experiments were done for the following cases: people with family relationships, people of the same tribe, and certain tribal populations. In these three cases, ANOVA shows a difference in similarity between SCGFN, GFN, and TFN at a confidence level of 95%. For people with family relationships and for people of the same tribe, the Tukey method in ANOVA shows that SCGFN yields a higher similarity, which means it performs better than the GFN and TFN methods. For certain tribal populations, the Tukey method in ANOVA shows that in populations A and C SCGFN is better than GFN and TFN, whereas in populations B and D SCGFN is equal to GFN and better than TFN. The proposed method enables a CODIS STR-DNA similarity calculation that is more robust to noise and performs better when familial and tribal relationships are involved.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable D
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method  Lower    Center   Upper
2       -0.4176  -0.3787  -0.3399
3       -0.0624  -0.0236   0.0152

Method = 2 subtracted from:
Method  Lower    Center   Upper
3        0.3164   0.3551   0.3938

Algorithm 9: ANOVA test results on tribal population D.

Data Availability

The STR-DNA data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This paper is sponsored by the Indexed Scientific Publication for Student Doctor's Final Project (Hibah Doktor) 2018 of Universitas Indonesia [0588SKRUI2018].

References

[1] S. K. Sheppard, D. S. Guttman, and J. R. Fitzgerald, "Population genomics of bacterial host adaptation," Nature Reviews Genetics.

[2] Emery and Rimoin's Principles and Practice of Medical Genetics, Elsevier, 2013.

[3] A. Neil and Campbell, Campbell Reece, Pearson Benjamin Cummings, 8th edition, 2008.

[4] M. S. H. Abu Halima, L. P. Bernal, and F. A. Sharif, "Genetic variation of 15 autosomal short tandem repeat (STR) loci in the Palestinian population of Gaza Strip," Legal Medicine, vol. 11, no. 4, pp. 203-204, 2009.

[5] M. G. Patel, Jignal Shaikh, and D. Marjadi, "Forensic Conception: DNA Typing of FTA Spotted Samples," Journal of Applied Biology & Biotechnology, vol. 2, no. 04, pp. 021–029, 2014.

[6] S. J. Venables, R. Daniel, S. D. Sarre et al., "Allele frequency data for 15 autosomal STR loci in eight Indonesian subpopulations," Forensic Science International: Genetics, vol. 20, pp. 45–52, 2016.

[7] Khaleda Parven, Forensic use of DNA information: human rights, privacy and other challenges, University of Wollongong Thesis Collections, 2012.

[8] B. Derbyshire, Sharing DNA Profiles and Finger Prints Across the EU requires further safeguards, GeneWatch UK, 2015.

[9] A. Holobinko, "Theoretical and Methodological Approaches to Understanding Human Migration Patterns and their Utility in Forensic Human Identification Cases," Societies, vol. 2, no. 2, pp. 42–62, 2012.

[10] D. Ricke, A. Shcherbina, N. Chiu et al., "Sherlock's Toolkit: A forensic DNA analysis system," in Proceedings of the IEEE International Symposium on Technologies for Homeland Security, HST 2015, USA, April 2015.

[11] A. L. Lowe, A. Urquhart, L. A. Foreman, and I. W. Evett, "Inferring ethnic origin by means of an STR profile," Forensic Science International, vol. 119, no. 1, pp. 17–22, 2001.

[12] M. Rahmat Widyanto, "Various defuzzification methods on DNA similarity matching using fuzzy inference system," Journal of Advanced Computational Intelligence and Intelligent Informatics, vol. 14, no. 3, 2010.

[13] R. N. Hartono, M. R. Widyanto, and N. Soedarsono, "Fuzzy logic system for DNA profile matching with embedded ethnic inference," in Proceedings of the 2010 2nd International Conference on Advances in Computing, Control and Telecommunication Technologies, ACT 2010, pp. 69–73, Indonesia, December 2010.

[14] M. R. Widyanto, R. N. Hartono, and N. Soedarsono, "A Novel Human STR Similarity Method using Cascade Statistical Fuzzy Rules with Tribal Information Inference," International Journal of Electrical and Computer Engineering (IJECE), vol. 6, no. 6, p. 3103, 2016.

[15] L. A. Zadeh, "Fuzzy sets," Information and Computation, vol. 8, pp. 338–353, 1965.

[16] S. Iwamoto, K. Tsurusaki, and T. Fujita, "Conditional decision-making in fuzzy environment," Journal of the Operations Research Society of Japan, vol. 42, no. 2, pp. 198–218, 1999.

[17] Liyuan Zhang, Xuanhua Xu, and Li Tao, "Some Similarity Measures for Triangular Fuzzy Number and Their Applications in Multiple Criteria Group Decision-Making," Journal of Applied Mathematics, vol. 2013, pp. 1–7, 2013.

[18] K. Kundu, "Image Denoising using Patch based Processing with Fuzzy Gaussian Membership Function," International Journal of Computer Applications, vol. 118, no. 12, pp. 0975–8887, 2015, Department of Computer Science & Engineering, Govt. College of Engineering & Textile Technology, Serampore, Hooghl.

[19] H. A. Hefny, H. M. Elsayed, and H. F. Aly, "Fuzzy multi-criteria decision making model for different scenarios of electrical power generation in Egypt," Egyptian Informatics Journal, vol. 14, no. 2, pp. 125–133, 2013.

[20] B. Akshay, B. Akash, and P. Avril, "Convolution and application of convolution," International Journal of Innovative Research In Technology, vol. 6, pp. 2349–6002, 2014.

[21] P. A. Bromiley, Products and Convolutions of Gaussian Probability Density Function, vol. 14, Imaging Sciences Research Group, Institute of Population Health, School of Medicine, University of Manchester, Manchester, 2014.


Figure 1: Calculation flow of the similarity of two individuals. Flowchart steps: start; read the allele mark values of a locus (the first and second allele values of the individual and of the reference); calculate the intersection point of the similarity value for all loci; once all loci have been counted, take the average of the similarity values of each locus; end.

Figure 2: The triangular fuzzy number (vertical axis: membership from 0 to 1; horizontal axis: allele value from 12.8 to 14.2).

where $t_i$ is the value of similarity between the individual DNA profile and the reference DNA profile of the $i$th individual, $x_i$ is the vector of the individual DNA profile, and $y_j$ is the vector of the reference DNA profile. The vectors $x_i$ and $y_j$ are $N$-dimensional ($N \in \mathbb{N}$) vectors consisting of the values of 15 loci, without amelogenin, as used by the Federal Bureau of Investigation (FBI).

2.4. Two-Individual Matching, Evidence versus Reference, with GFN Similarity. To improve the capability of locus matching, we propose GFN (Gaussian fuzzy number) similarity. Compared to the traditional triangular fuzzy number (TFN), GFN represents the uncertainty of locus information more realistically because the distribution is assumed to be Gaussian. The Gaussian fuzzy function transforms the original values into a normal distribution. The midpoint of the normal distribution defines the ideal definition for the set and is assigned a membership of 1; the remaining input values decrease in membership as they move away from the midpoint in both the positive and negative directions, until they reach a point where the values are too far from the ideal definition, are definitely not in the set, and are therefore assigned zero. The fuzzy Gaussian function is given below [18]:

$$f(x) = a\,e^{-\frac{(x-\mu_f)^2}{2\sigma_f^2}} \qquad (5)$$

A Gaussian membership function is completely determined by two parameters, $\mu$ and $\sigma$: $\mu$ represents the center of the membership function (the peak of the curve), and $\sigma$ determines the width of the membership function.

The intersection of two Gaussian functions is as follows [19]:

$$V = \begin{cases} 1 & \text{if } \mu_2 \le \mu_1 \\ \exp\left[-\left(\dfrac{\mu_1-\mu_2}{\sigma_1+\sigma_2}\right)^2\right] & \text{if } \mu_2 > \mu_1 \end{cases} \qquad (6)$$

where $\mu_1$ is the STR-DNA value of the individual, $\mu_2$ is the STR-DNA value of the reference, $\sigma_1$ is the sigma value of the individual, and $\sigma_2$ is the sigma value of the reference.

The following example (Figure 3) shows the Gaussian intersection points of the individual and the reference for the D8S1179 locus, where allele one has the individual STR-DNA value 13 and the reference STR-DNA value 13.3, and allele two has the individual STR-DNA value 14 and the reference STR-DNA value 14.1.

Figure 3: Fuzzy Gaussian similarity (vertical axis: membership from 0 to 1; horizontal axis from 0 to 40).
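As a worked illustration of formulas (5) and (6), the sketch below evaluates the exponential term of formula (6) for the D8S1179 example. It is a minimal sketch under stated assumptions, not the authors' MATLAB implementation: the function names are hypothetical, sigma = 1 is assumed for both the individual and the reference, the exponential term is applied whenever the two centers differ (regardless of which one is larger), and the per-locus value is taken as the mean over the two alleles.

```python
import math

def gaussian_membership(x, mu, sigma, a=1.0):
    """Formula (5): Gaussian membership of x for center mu and width sigma."""
    return a * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def gaussian_intersection(mu1, sigma1, mu2, sigma2):
    """Exponential term of formula (6); equals 1 when the two centers coincide."""
    return math.exp(-(((mu1 - mu2) / (sigma1 + sigma2)) ** 2))

# D8S1179 example from the text; sigma = 1 is an assumption of this sketch.
SIGMA = 1.0
allele_pairs = [(13.0, 13.3), (14.0, 14.1)]  # (individual, reference)

allele_scores = [
    gaussian_intersection(individual, SIGMA, reference, SIGMA)
    for individual, reference in allele_pairs
]
# Assumed aggregation: the locus similarity is the mean over its two alleles.
locus_similarity = sum(allele_scores) / len(allele_scores)

print([round(v, 4) for v in allele_scores], round(locus_similarity, 4))
```

With these assumed widths, both alleles score close to 1, since the individual and reference values differ by only 0.3 and 0.1 repeat units.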

2.5. Two-Individual Matching, Evidence versus Reference, with SCGFN Similarity. To improve the capability of locus matching when ethnic information is involved, we propose SCGFN (shifted convoluted Gaussian fuzzy number) similarity. The original Gaussian fuzzy number (GFN) is convoluted with the distribution of the locus information of a certain ethnic group. The new SCGFN therefore represents ethnic information better than the original GFN.

The convolution function is a multiplication of the individual fuzzy Gaussian locus function and the fuzzy Gaussian approximation of the population locus. The fuzzy Gaussian approximation of the population locus takes as its mean the STR-DNA value with the highest population density, together with the deviation of the particular population. The fuzzy Gaussian of the individual locus takes as its mean the STR-DNA value of the individual locus, with a standard deviation equal to the mean value minus 2.

The convolution is a mathematical operation on two functions (f and g) that produces a third function, typically viewed as a modified version of one of the original functions; it gives the integral of the pointwise multiplication of the two functions as a function of the amount by which one of the original functions is translated [20]. The function f is obtained from the individual and the function g is derived from the reference:

$$f(x) = a\,e^{-\frac{(x-\mu_f)^2}{2\sigma_f^2}} \qquad (7)$$

and

$$g(x) = a\,e^{-\frac{(x-\mu_g)^2}{2\sigma_g^2}} \qquad (8)$$

The convolution operator is [21]

$$P_{f \otimes g}(x) = F^{-1}\left[F(f(x))\,F(g(x))\right] = a\,e^{-\frac{(x-(\mu_f+\mu_g))^2}{2(\sigma_f^2+\sigma_g^2)}} \qquad (9)$$

where $a$ is the height of the fuzzy number, $a = 1$. The mean is

$$\mu_{f \otimes g}(x) = \mu_f + \mu_g \qquad (10)$$

and the standard deviation is as follows:

$$\sigma_{f \otimes g}(x) = \sqrt{\sigma_f^2 + \sigma_g^2} \qquad (11)$$
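Formulas (9) to (11) can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the function names, the example parameters (13, 1.0) and (16, 1.5), and the integration grid are all hypothetical. Because the raw convolution integral of two unit-height Gaussians does not itself have height 1, the check compares shapes after dividing by the peak value, which corresponds to setting a = 1 in formula (9).

```python
import math

def gaussian(x, mu, sigma, a=1.0):
    """Gaussian of the form used in formulas (7) and (8)."""
    return a * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def convolved_params(mu_f, sigma_f, mu_g, sigma_g):
    """Formulas (10) and (11): mean and standard deviation of the convolution."""
    return mu_f + mu_g, math.sqrt(sigma_f ** 2 + sigma_g ** 2)

def numeric_convolution(x, mu_f, sigma_f, mu_g, sigma_g,
                        lo=-60.0, hi=60.0, steps=12000):
    """Riemann-sum approximation of the integral of f(t) * g(x - t) over t."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        t = lo + k * h
        total += gaussian(t, mu_f, sigma_f) * gaussian(x - t, mu_g, sigma_g)
    return total * h

# Hypothetical example parameters.
MU_F, SIGMA_F, MU_G, SIGMA_G = 13.0, 1.0, 16.0, 1.5
mu, sigma = convolved_params(MU_F, SIGMA_F, MU_G, SIGMA_G)
peak = numeric_convolution(mu, MU_F, SIGMA_F, MU_G, SIGMA_G)

for x in (27.0, 30.0, 31.5):
    numeric = numeric_convolution(x, MU_F, SIGMA_F, MU_G, SIGMA_G) / peak
    closed_form = gaussian(x, mu, sigma)  # formula (9) with a = 1
    print(x, round(numeric, 6), round(closed_form, 6))
```

The two printed columns should agree closely, which reflects the standard result that the convolution of two Gaussians is again Gaussian with added means and added variances.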

The convoluted fuzzy number replaces the fuzzy number as the representation value of the individual locus. To obtain a stronger tribal relationship in the individual similarity calculation, the convoluted Gaussian fuzzy number is then shifted toward the fuzzy number of the tribal population reference. The new shifted convoluted fuzzy number is called the shifted convoluted Gaussian fuzzy number (SCGFN), and it is a new representation value of the individual locus. The mean value of SCGFN is obtained by comparing the mean value of the individual fuzzy Gaussian ($\mu_i$) with the mean value of the fuzzy Gaussian approximation of the locus population ($\mu_{ip}$); the convoluted fuzzy number is then shifted toward the fuzzy Gaussian approximation of the locus population of a certain tribe. The algorithm for shifting SCGFN is given below:

    if ($\mu_{ip} < \mu_i$)
        $\mu_{scgfn} = \mu_i - 0.02\,|\mu_{ip} - \mu_i|$
    elseif ($\mu_{ip} > \mu_i$)
        $\mu_{scgfn} = \mu_i + 0.02\,|\mu_{ip} - \mu_i|$
    else
        $\mu_{scgfn} = \mu_i$
    end                                                    (12)

The standard deviation of SCGFN is the sum of the standard deviation of the individual fuzzy Gaussian number ($\sigma_i$) and the standard deviation of the fuzzy Gaussian approximation of the population locus of a certain tribe ($\sigma_{ip}$). The formula for the SCGFN standard deviation is

$$\sigma_{scgfn} = \sigma_{ip} + \sigma_i \qquad (13)$$

Figure 4: Tribal inference system architecture design. The STR-DNA loci (1, 2, 3, ..., n) of an individual are compared with tribes A, B, and C to find the highest relative tribal value.
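Algorithm (12) and formula (13) can be sketched in a few lines. This is a hedged illustration, not the authors' code: the function name and the example values (individual allele 15 with sigma 1.0, population peak 16 with sigma 1.2) are hypothetical, and the shift factor is read as 0.02.

```python
def scgfn_parameters(mu_i, sigma_i, mu_ip, sigma_ip, shift=0.02):
    """Return (mu_scgfn, sigma_scgfn) following algorithm (12) and formula (13)."""
    gap = abs(mu_ip - mu_i)
    if mu_ip < mu_i:
        mu_scgfn = mu_i - shift * gap   # population peak lies below: shift down
    elif mu_ip > mu_i:
        mu_scgfn = mu_i + shift * gap   # population peak lies above: shift up
    else:
        mu_scgfn = mu_i                 # already aligned: no shift
    sigma_scgfn = sigma_ip + sigma_i    # formula (13)
    return mu_scgfn, sigma_scgfn

# Hypothetical example values.
mu_up, sigma_up = scgfn_parameters(mu_i=15.0, sigma_i=1.0, mu_ip=16.0, sigma_ip=1.2)
mu_down, _ = scgfn_parameters(mu_i=15.0, sigma_i=1.0, mu_ip=13.0, sigma_ip=1.2)
print(mu_up, sigma_up, mu_down)
```

In both branches the mean moves 2% of the gap toward the population peak, so the individual's own allele value still dominates the SCGFN center.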

2.6. Measuring Tribal Relative Value from a DNA Profile. In general, the workflow of the tribal inference system is to find the tribal similarity value, calculated as the average of the intersection values between the individual and the tribal population in the database over the 15 loci. The tribe with the highest similarity value to the individual profile is selected as the ethnic estimate of the profile. The workflow is shown in Figure 4.

Tribal matching with the triangular fuzzy number is obtained from the intersection of the individual triangular fuzzy number and the triangular fuzzy approximation of the population. For each tribal population, the mean intersection value of the individual triangular fuzzy number and the triangular fuzzy population approximation is calculated over the loci, and the ethnic population of the individual is the tribal population with the greatest mean value. The individual triangular fuzzy number is obtained by using formula (3). The triangular fuzzy population approximation is also obtained by using formula (3), where a2 is the STR-DNA value with the largest population density, a3 is a2 + 0.2, and a1 is a2 - 0.2.

An example is shown in Table 1. The output for population A is at locus D3S1358, allele 1. The STR-DNA values 13, 14, 15, 16, 17, and 18 are carried by 3, 10, 27, 31, 8, and 1 individuals, respectively. Since the STR-DNA value 16 has the largest number of individuals (31), it follows that a1 = 15.8, a2 = 16, and a3 = 16.2.

Table 1: The results of the output on population A (locus D3S1358, allele 1).

STR-DNA  Number of individuals
13       3
14       10
15       27
16       31
17       8
18       1
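The Table 1 example can be reproduced with a short sketch. The function name is hypothetical, and the spread of 0.2 on each side of the most frequent value is read from the text above; this is an illustration only.

```python
def triangular_population_approximation(counts, spread=0.2):
    """Return (a1, a2, a3), where a2 is the most frequent STR-DNA value."""
    a2 = max(counts, key=counts.get)  # STR-DNA value with the most individuals
    return a2 - spread, a2, a2 + spread

# Table 1: population A, locus D3S1358, allele 1 (STR-DNA value -> individuals).
table_1 = {13: 3, 14: 10, 15: 27, 16: 31, 17: 8, 18: 1}

a1, a2, a3 = triangular_population_approximation(table_1)
print(round(a1, 1), a2, round(a3, 1))
```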

2.7. Tribal Matching with GFN. Tribal matching with Gaussian fuzzy similarity is obtained from the intersection of the individual fuzzy Gaussian with the fuzzy Gaussian approximation of the population. The individual Gaussian fuzzy number is obtained by using formula (7), where $\mu_f$ is the individual STR-DNA value and $\sigma_f = 1$. The population Gaussian fuzzy number is obtained by using formula (7), where $\mu_f$ is the mean of the distribution of the number of individuals and $\sigma_f$ is the standard deviation of the distribution of the number of individuals. The standard deviation is calculated as follows:

$$s = \sqrt{\frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)}} \qquad (14)$$

where $n$ is the number of STR-DNA values and $x_i$ is the number of individuals of a locus population.
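Formula (14) is the usual computational form of the sample standard deviation. The sketch below applies it literally to the counts of Table 1, taking n as the number of STR-DNA values and x_i as the number of individuals per value, as defined above; the function name is hypothetical, and the comparison with `statistics.stdev` from the Python standard library is only a sanity check.

```python
import math
import statistics

def sample_std(values):
    """Formula (14): sqrt((n * sum(x^2) - (sum x)^2) / (n * (n - 1)))."""
    n = len(values)
    sum_of_squares = sum(v * v for v in values)
    square_of_sum = sum(values) ** 2
    return math.sqrt((n * sum_of_squares - square_of_sum) / (n * (n - 1)))

# Number of individuals per STR-DNA value, taken from Table 1.
counts = [3, 10, 27, 31, 8, 1]

s = sample_std(counts)
print(round(s, 3), round(statistics.stdev(counts), 3))
```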

2.8. Tribal Matching with SCGFN. The shifted convoluted Gaussian fuzzy number (SCGFN) replaces the individual fuzzy number; it is a new representation of the individual locus that carries increasingly strong ethnic information. Tribal matching with SCGFN calculates the intersection similarity value between the corresponding SCGFN and the population's fuzzy number, as in tribal matching with GFN. The maximum value over all tribes is then sought, and the tribe with the maximum value is taken as the individual's tribe.
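The selection step described above (average the per-locus intersection values for each tribe, then take the maximum) can be sketched as follows. Everything in this block is an assumption made for illustration: the function names, the two-locus profiles, and all (mu, sigma) values are invented, and the intersection uses the exponential term of formula (6) applied symmetrically. A real profile would use 15 loci with two alleles each.

```python
import math

def gaussian_intersection(mu1, sigma1, mu2, sigma2):
    """Exponential term of formula (6), applied symmetrically."""
    return math.exp(-(((mu1 - mu2) / (sigma1 + sigma2)) ** 2))

def infer_tribe(individual, populations):
    """Pick the tribe with the highest mean per-locus intersection value.

    individual:  list of (mu, sigma) pairs, one per locus
    populations: dict mapping tribe name to a list of (mu, sigma) pairs
    """
    scores = {}
    for tribe, loci in populations.items():
        values = [
            gaussian_intersection(mu_i, s_i, mu_p, s_p)
            for (mu_i, s_i), (mu_p, s_p) in zip(individual, loci)
        ]
        scores[tribe] = sum(values) / len(values)
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical two-locus data, for illustration only.
individual = [(16.0, 2.0), (13.0, 2.0)]
populations = {
    "A": [(16.0, 1.0), (13.0, 1.0)],
    "B": [(15.0, 1.0), (14.0, 1.0)],
    "C": [(17.0, 1.5), (12.0, 1.5)],
}

best_tribe, tribe_scores = infer_tribe(individual, populations)
print(best_tribe, {t: round(v, 4) for t, v in tribe_scores.items()})
```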

3. Experimental Results

The DNA profile data entered into the database system are PCR-based DNA identification profiles consisting of 15 loci, excluding amelogenin, with two alleles for each locus. The DNA profiles used as input to the tribal inference system are Indonesian DNA profiles, 240 in total, comprising the Java (A), Malay (B), Mentawai (C), and Toraja (D) tribes. The experiments were carried out in MATLAB R2016b and covered four cases.

31 Same Person From 240 pieces of data experiments wereconducted with the same people with SCGFN GFN andTFN the similarity values are equal 1 Figure 5 is an exampleof whether the individual identity and reference entered arethe same person and the result of the similarity obtained is 1

(a) Input individual id MT021 and reference id numberMT021

(b) Input individual id TRJ12 and reference id numberTRJ12

(c) Input individual id JT11 and reference id number JT11(d) Input individual id MW135 and reference id number

32 People Who Have Family Relationships DNA profilestested in both biological parents are father and mother Fromthe experiment the average individual similarity with familyreference is obtained SCGFN 87 TFN 39 and GFN 74Table 2 shows ten instances of the result of the similarity ofan individual with a reference being a mother or father

33 People Who Belong to the Same Tribe From the exper-iment the average similarity of two individuals who havethe same tribe is obtained SCGFN 896 TFN 21 andGFN 6514 Table 3 shows ten instances of the result of thesimilarity of two individuals from the same tribe

34 Certain Tribal Populations Population data consists offour tribes where the number of people in tribe A is 80 thenumber of people in tribe B is 100 the number of peoplein tribe C is 20 and the number of people in tribe D is 40Figure 6 shows an example of the tribal population similaritycalculation of the individual identity JT19 From Figure 6 itcan be seen that SCGFN GFN and TFN displaying the tribeof JT11 are tribe A

Table 4 shows the similarity values with a certain tribal population for SCGFN, TFN, and GFN. From 80 experiments on tribe A, the average similarity to the tribe population was 79% with fuzzy convolution (SCGFN), 45% with fuzzy triangular (TFN), and 71% with fuzzy Gaussian (GFN). The averages for population B over 100 experiments were 81% with fuzzy convolution, 46% with fuzzy triangular, and 76% with fuzzy Gaussian. The averages for population C over 40 experiments were 80% with fuzzy convolution, 45% with fuzzy triangular, and 73% with fuzzy Gaussian. The averages for population D over 20 experiments were 84% with fuzzy convolution, 64% with fuzzy triangular, and 79% with fuzzy Gaussian.

Table 3: Example results of two individuals in the same tribe.

No.  Individual 1 Id  Tribe  Individual 2 Id  Tribe  SCGFN     TFN       GFN
1    MT097            C      MT104            C      0.944695  0.85      0.911584
2    TRJ19            D      TRJ22            D      0.934938  0.6       0.847523
3    JT11             A      JT13             A      0.903104  0.6       0.746172
4    MW132            B      MW135            B      0.869267  0.216667  0.694363
5    JT11             A      JT12             A      0.856067  0.333333  0.669071
6    JT11             A      JT17             A      0.890797  0.233333  0.609616
7    MT021            A      MT028            A      0.924174  0.333333  0.689898
8    MT019            B      MT036            B      0.873376  0.283333  0.607031
9    MW128            B      MW123            B      0.886405  0.366667  0.671862
10   MT017            C      MT018            C      0.878135  0.333333  0.66973

Table 4: The result of similarity values with a certain tribal population.

Tribe  SCGFN        TFN         GFN
A      0.79         0.45        0.714639375
B      0.81         0.46        0.7615544
C      0.8071442    0.44999925  0.73734225
D      0.846345667  0.64        0.794088667

Figure 6: Example of the similarity calculation for a certain tribal population.

4. Analysis of Statistical and Comparison Tests

Statistical tests were used to analyze the test results. To assess the difference in the average STR-DNA similarity value among the three methods used in this study, ANOVA and a comparison test were used. The ANOVA test is interpreted as follows: if the test fails to reject H0 (no difference), then no post hoc test is performed; conversely, if H0 is rejected (there is a difference), then a post hoc test should be performed. To explain clearly why SCGFN is better than GFN and TFN, a statistical analysis and a comparison test with the Tukey method in ANOVA are provided. Statistical tests were performed using Minitab 16 for 3 cases.

4.1. Individuals Who Have Family Relationships. To find out whether the methods differ, we used ANOVA and comparison tests. In the one-way ANOVA test, there is only one independent variable; for this case, the independent variable is taken to be individuals who have family relationships. The summary of the analysis of variance in Algorithm 1 gives a P value ≤ 0.001. Compared with the significance level α = 0.05, p < α, which means that H0 is rejected, so it can be concluded that there is a difference among the three methods. To determine which method is better than the others, the Tukey method is then applied.
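The one-way ANOVA decision described here can be sketched in a few lines of Python. This is only an illustration, not the Minitab 16 procedure used in the study: the function name `one_way_anova` is hypothetical, and the input is the ten example rows of Table 2 (with decimal points restored), so the output is not claimed to reproduce Algorithm 1 exactly. With three methods and ten rows, the degrees of freedom are 2 and 27, as in Algorithm 1.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_stat) for a one-way layout."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by the factor (here: the similarity method).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Residual variation inside each group.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


# Ten example rows from Table 2 (assumed decimal restoration of the printed values).
scgfn = [0.938856, 0.822547, 0.923285, 0.931517, 0.856002,
         0.838193, 0.872816, 0.850089, 0.848397, 0.818216]
tfn = [0.525, 0.383333, 0.433333, 0.5, 0.333333,
       0.533333, 0.566667, 0.383333, 0.533333, 0.516667]
gfn = [0.798565, 0.684361, 0.747206, 0.862006, 0.614684,
       0.738193, 0.736653, 0.703589, 0.781071, 0.739065]

ss_b, ss_w, df_b, df_w, f_stat = one_way_anova([scgfn, tfn, gfn])
print(f"df = ({df_b}, {df_w}), F = {f_stat:.2f}")
```

A large F relative to the critical value for (2, 27) degrees of freedom corresponds to rejecting H0, which is the condition under which the Tukey comparison is carried out.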

In Algorithm 2, it can be seen that

(i) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(ii) GFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(iii) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

It can then be concluded that TFN < GFN < SCGFN. From the results of the similarity test with the three methods and the boxplot in Figure 7, it can be seen that the SCGFN method used in this research has a higher similarity value than the GFN and TFN methods.
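The reading rule applied to each Tukey interval (no zero inside and a negative center means "less", no zero inside and a positive center means "greater", zero inside means no significant difference) can be written as a small Python helper. The function name `compare_from_interval` is hypothetical, and the interval bounds are the Algorithm 2 values with decimal points restored, which is an assumption about the printed output.

```python
def compare_from_interval(lower, upper):
    """Classify a simultaneous confidence interval for (method X minus method Y)."""
    if upper < 0:
        return "less"      # whole interval below zero: X < Y
    if lower > 0:
        return "greater"   # whole interval above zero: X > Y
    return "no significant difference"  # interval contains zero


# (X, Y): (lower, upper) for X minus Y, taken from Algorithm 2 (assumed decimals).
intervals = {
    ("TFN", "SCGFN"): (-0.49007, -0.34033),
    ("GFN", "SCGFN"): (-0.20433, -0.05458),
    ("GFN", "TFN"): (0.21087, 0.36062),
}

for (x, y), (lower, upper) in intervals.items():
    print(f"{x} vs {y}: {compare_from_interval(lower, upper)}")
```

The same helper returns "no significant difference" for an interval such as (-0.0738, 0.0023), which straddles zero.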

Source   DF   SS        MS        F      P
Factor    2   0.90267   0.45133   99.19  0.0005
Error    27   0.12286   0.00455
Total    29   1.02553

S = 0.06746   R-Sq = 88.02%   R-Sq(adj) = 87.13%

Algorithm 1: ANOVA individual test results of those who have family relationship.

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons
Individual confidence level = 98.04%

SCGFN subtracted from:
       Lower      Center     Upper
TFN    -0.49007   -0.41520   -0.34033
GFN    -0.20433   -0.12945   -0.05458

TFN subtracted from:
       Lower      Center     Upper
GFN    0.21087    0.28575    0.36062

Algorithm 2: Comparison test with Tukey method.

Figure 7: Boxplot of SCGFN, TFN, and GFN for people who have family relationships.

4.2. Individuals Who Belong to the Same Tribe. The summary of the analysis of variance in Algorithm 3 gives a P value ≤ 0.001. Compared with the significance level α = 0.05 (confidence level 95%), p < α, which means that H0 is rejected, so it can be concluded that there is a difference among the three methods. To determine which method is better than the others, the Tukey method is then applied.

In Algorithm 4, it can be seen that

(i) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(ii) GFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(iii) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

It can then be concluded that TFN < GFN < SCGFN. From the results of the similarity test with the three methods and the boxplot in Figure 8, it can be seen that the SCGFN method used in this research has a higher similarity value than the GFN and TFN methods.

Figure 8: Boxplot of SCGFN, TFN, and GFN for individuals who belong to the same tribe.

Source   DF   SS       MS       F      P
Factor    2   1.1783   0.5891   34.24  0.0005
Error    27   0.4646   0.0172
Total    29   1.6429

S = 0.1312   R-Sq = 71.72%   R-Sq(adj) = 69.63%

Algorithm 3: Individual test results with the same tribe using ANOVA.

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons
Individual confidence level = 98.04%

SCGFN subtracted from:
       Lower     Center    Upper
TFN    -0.6267   -0.4811   -0.3355
GFN    -0.3300   -0.1844   -0.0388

TFN subtracted from:
       Lower     Center    Upper
GFN    0.1511    0.2967    0.4423

Algorithm 4: Comparison test with Tukey method.

4.3. Certain Tribal Populations. Testing was done with 720 data, that is, 240 data with 3 methods. The summary of the analysis of variance for A, B, C, and D in Algorithm 5 gives a P value ≤ 0.001. Compared with the significance level α = 0.05, p < α, which means that H0 is rejected, so it can be concluded that there is a difference among the three methods.

The experiment was conducted with 3 methods, where SCGFN is method 1, TFN is method 2, and GFN is method 3. To determine which method is better than the others, each tribe is tested further. In Algorithm 6, the comparison of the three methods for tribal population A can be explained as follows:

(1) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(2) GFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(3) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

So it can be concluded that TFN < GFN < SCGFN, which means that SCGFN is better than GFN and GFN is better than TFN.

In Algorithm 7, the comparison of the three methods for tribal population B can be explained as follows:

(1) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(2) GFN = SCGFN, because the confidence interval contains zero;

(3) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

So it can be concluded that TFN < (GFN = SCGFN), which means that the SCGFN and GFN methods are equal and both are better than TFN.

In Algorithm 8, the comparison of the three methods for tribal population C can be explained as follows:

(1) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(2) GFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(3) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

So it can be concluded that TFN < GFN < SCGFN, which means that SCGFN is better than GFN and GFN is better than TFN.

In Algorithm 9, the comparison of the three methods for tribal population D can be explained as follows:

(1) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(2) GFN = SCGFN, because the confidence interval contains zero;

(3) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

So it can be concluded that TFN < (GFN = SCGFN), which means that the SCGFN and GFN methods are equal and both are better than TFN.

Analysis of Variance for A, using Adjusted SS for Tests

Source   DF    Seq SS    Adj SS    Adj MS   F       P
Method     2   7.6061    7.6061    3.8030   438.56  0.0005
Error    251   2.1766    2.1766    0.0087
Total    253   9.7827

S = 0.0931218   R-Sq = 77.75%   R-Sq(adj) = 77.57%

Analysis of Variance for B, using Adjusted SS for Tests

Source   DF    Seq SS    Adj SS    Adj MS   F       P
Method     2   7.9691    7.9691    3.9845   357.37  0.0005
Error    251   2.7986    2.7986    0.0111
Total    253   10.7677

S = 0.105592   R-Sq = 74.01%   R-Sq(adj) = 73.80%

Analysis of Variance for C, using Adjusted SS for Tests

Source   DF    Seq SS    Adj SS    Adj MS   F       P
Method     2   8.7934    8.7934    4.3967   444.01  0.0005
Error    251   2.4855    2.4855    0.0099
Total    253   11.2788

S = 0.0995100   R-Sq = 77.96%   R-Sq(adj) = 77.79%

Analysis of Variance for D, using Adjusted SS for Tests

Source   DF    Seq SS    Adj SS    Adj MS   F       P
Method     2   7.6350    7.6350    3.8175   328.52  0.0005
Error    251   2.9167    2.9167    0.0116
Total    253   10.5517

S = 0.107798   R-Sq = 72.36%   R-Sq(adj) = 72.14%

Algorithm 5: ANOVA test results for a particular tribe.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable A
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method   Lower     Center    Upper
2        -0.4276   -0.3940   -0.3605
3        -0.0957   -0.0622   -0.0286

Method = 2 subtracted from:
Method   Lower     Center    Upper
3        0.2984    0.3319    0.3653

Algorithm 6: ANOVA test results on tribal population A.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable B
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method   Lower     Center    Upper
2        -0.4301   -0.3921   -0.3541
3        -0.0738   -0.0358   0.0023

Method = 2 subtracted from:
Method   Lower     Center    Upper
3        0.3184    0.3563    0.3942

Algorithm 7: ANOVA test results on tribal population B.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable C
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method   Lower     Center    Upper
2        -0.4482   -0.4123   -0.3765
3        -0.0745   -0.0386   -0.0028

Method = 2 subtracted from:
Method   Lower     Center    Upper
3        0.3380    0.3737    0.4094

Algorithm 8: ANOVA test results on tribal population C.

5. Conclusions

In this research, experiments were conducted to find a better method for obtaining higher individual similarity values and stronger tribal properties. To improve the capability of locus matching, SCGFN and GFN have been proposed. Both compute a fuzzy number similarity between the sizes of the two alleles. Experiments were done for the following cases: people with family relationships, people of the same tribe, and certain tribal populations. In these three cases, ANOVA shows a difference in similarity among SCGFN, GFN, and TFN at a confidence level of 95%. In the cases of people with family relationships and people of the same tribe, the Tukey method in ANOVA shows that SCGFN yields a higher similarity, which means that it is better than the GFN and TFN methods. In the case of certain tribal populations, the Tukey method in ANOVA shows that, in populations A and C, SCGFN is better than GFN and TFN, whereas in populations B and D, SCGFN is equal to GFN and better than TFN. The proposed method enables a CODIS STR-DNA similarity calculation that is more robust to noise and performs better for similarity calculations involving familial and tribal relationships.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable D
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method   Lower     Center    Upper
2        -0.4176   -0.3787   -0.3399
3        -0.0624   -0.0236   0.0152

Method = 2 subtracted from:
Method   Lower     Center    Upper
3        0.3164    0.3551    0.3938

Algorithm 9: ANOVA test results on tribal population D.

Data Availability

The STR-DNA data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This paper is sponsored by the Indexed Scientific Publication for Student Doctor's Final Project (Hibah Doktor) 2018 of Universitas Indonesia [0588SKRUI2018].

References

[1] S. K. Sheppard, D. S. Guttman, and J. R. Fitzgerald, "Population genomics of bacterial host adaptation," Nature Reviews Genetics.

[2] Emery and Rimoin's Principles and Practice of Medical Genetics, Elsevier, 2013.

[3] A. Neil and Campbell, Campbell Reece, Pearson Benjamin Cummings, 8th edition, 2008.

[4] M. S. H. Abu Halima, L. P. Bernal, and F. A. Sharif, "Genetic variation of 15 autosomal short tandem repeat (STR) loci in the Palestinian population of Gaza Strip," Legal Medicine, vol. 11, no. 4, pp. 203-204, 2009.

[5] M. G. Patel, Jignal Shaikh, and D. Marjadi, "Forensic Conception DNA Typing of FTA Spotted Samples," Journal of Applied Biology & Biotechnology, vol. 2, no. 04, pp. 021–029, 2014.

[6] S. J. Venables, R. Daniel, S. D. Sarre et al., "Allele frequency data for 15 autosomal STR loci in eight Indonesian subpopulations," Forensic Science International: Genetics, vol. 20, pp. 45–52, 2016.

[7] Khaleda Parven, Forensic use of DNA information: human rights, privacy and other challenges, University of Wollongong Thesis Collections, 2012.

[8] B. Derbyshire, Sharing DNA Profiles and Finger Prints Across the EU requires further safeguards, GeneWatch UK, 2015.

[9] A. Holobinko, "Theoretical and Methodological Approaches to Understanding Human Migration Patterns and their Utility in Forensic Human Identification Cases," Societies, vol. 2, no. 2, pp. 42–62, 2012.

[10] D. Ricke, A. Shcherbina, N. Chiu et al., "Sherlock's Toolkit: A forensic DNA analysis system," in Proceedings of the IEEE International Symposium on Technologies for Homeland Security, HST 2015, USA, April 2015.

[11] A. L. Lowe, A. Urquhart, L. A. Foreman, and I. W. Evett, "Inferring ethnic origin by means of an STR profile," Forensic Science International, vol. 119, no. 1, pp. 17–22, 2001.

[12] M. Rahmat Widyanto, "Various defuzzification methods on DNA similarity matching using fuzzy inference system," Journal of Advanced Computational Intelligence and Intelligent Informatics, vol. 14, no. 3, 2010.

[13] R. N. Hartono, M. R. Widyanto, and N. Soedarsono, "Fuzzy logic system for DNA profile matching with embedded ethnic inference," in Proceedings of the 2010 2nd International Conference on Advances in Computing, Control and Telecommunication Technologies, ACT 2010, pp. 69–73, Indonesia, December 2010.

[14] M. R. Widyanto, R. N. Hartono, and N. Soedarsono, "A Novel Human STR Similarity Method using Cascade Statistical Fuzzy Rules with Tribal Information Inference," International Journal of Electrical and Computer Engineering (IJECE), vol. 6, no. 6, p. 3103, 2016.

[15] L. A. Zadeh, "Fuzzy sets," Information and Computation, vol. 8, pp. 338–353, 1965.

[16] S. Iwamoto, K. Tsurusaki, and T. Fujita, "Conditional decision-making in fuzzy environment," Journal of the Operations Research Society of Japan, vol. 42, no. 2, pp. 198–218, 1999.

[17] Liyuan Zhang, Xuanhua Xu, and Li Tao, "Some Similarity Measures for Triangular Fuzzy Number and Their Applications in Multiple Criteria Group Decision-Making," Journal of Applied Mathematics, vol. 2013, pp. 1–7, 2013.

[18] K. Kundu, "Image Denoising using Patch based Processing with Fuzzy Gaussian Membership Function," International Journal of Computer Applications, vol. 118, no. 12, pp. 0975–8887, 2015, Department of Computer Science & Engineering, Govt. College of Engineering & Textile Technology, Serampore, Hooghl.

[19] H. A. Hefny, H. M. Elsayed, and H. F. Aly, "Fuzzy multi-criteria decision making model for different scenarios of electrical power generation in Egypt," Egyptian Informatics Journal, vol. 14, no. 2, pp. 125–133, 2013.

[20] B. Akshay, B. Akash, and P. Avril, "Convolution and application of convolution," International Journal of Innovative Research In Technology, vol. 6, pp. 2349–6002, 2014.

[21] P. A. Bromiley, Products and Convolutions of Gaussian Probability Density Function, vol. 14, Imaging Sciences Research Group, Institute of Population Health, School of Medicine, University of Manchester, Manchester, 2014.



Figure 3: Fuzzy Gaussian similarity.

the set and are therefore assigned zeros. The fuzzy Gaussian function is given below [18]:

\[ f(x) = a\,e^{-\frac{(x-\mu_f)^2}{2\sigma_f^2}} \tag{5} \]

A Gaussian membership function is completely determined by two parameters, μ and σ: μ represents the center of the membership function (the peak of the curve), and σ determines the width of the membership function.

The intersection of two Gaussian functions is as follows [19]:

\[ V = \begin{cases} 1, & \text{if } \mu_2 \le \mu_1, \\ \exp\left[-\left(\dfrac{\mu_1-\mu_2}{\sigma_1+\sigma_2}\right)^2\right], & \text{if } \mu_2 > \mu_1, \end{cases} \tag{6} \]

where μ_1 is the STR-DNA value of the individual, μ_2 is the STR-DNA value of the reference, σ_1 is the sigma value of the individual, and σ_2 is the sigma value of the reference.

The following example (Figure 3) shows the Gaussian intersection points of the individual and the reference at the D8S1179 locus, where the first allele value is 13 for the individual STR-DNA and 13.3 for the reference STR-DNA, and the second allele value is 14 for the individual STR-DNA and 14.1 for the reference STR-DNA.
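As a rough numerical illustration of Eq. (6) for this D8S1179 example, the Python sketch below evaluates the intersection value for both alleles. Several things here are assumptions rather than statements from the paper: the function name `gaussian_intersection` is hypothetical, the second case of Eq. (6) is read as μ_2 > μ_1 so that the two cases are mutually exclusive, and both sigma values are set to 1 (the text gives σ = 1 only for the individual in the GFN tribal matching).

```python
import math


def gaussian_intersection(mu1, sigma1, mu2, sigma2):
    """Intersection value V of Eq. (6); mu1/sigma1: individual, mu2/sigma2: reference."""
    if mu2 <= mu1:
        return 1.0
    # Assumed reading of the second case: applies when mu2 > mu1.
    return math.exp(-(((mu1 - mu2) / (sigma1 + sigma2)) ** 2))


SIGMA = 1.0  # assumed sigma for both individual and reference

# D8S1179 example: allele 1 is 13 vs 13.3, allele 2 is 14 vs 14.1.
v_allele1 = gaussian_intersection(13.0, SIGMA, 13.3, SIGMA)
v_allele2 = gaussian_intersection(14.0, SIGMA, 14.1, SIGMA)
print(v_allele1, v_allele2)
```

Under these assumptions, the allele whose reference value lies closer to the individual value (14 versus 14.1) receives the higher intersection value.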

2.5. Two-Individual Matching Evidence versus Reference with SCGFN Similarity. To improve the capability of locus matching when ethnic information is involved, we propose shifted convoluted Gaussian fuzzy number (SCGFN) similarity. The original Gaussian fuzzy number (GFN) is convoluted with the distribution of certain ethnic locus information. Therefore, the new SCGFN represents ethnic information better than the original GFN.

The convolution function is a multiplication of the individual fuzzy Gaussian locus function and the fuzzy Gaussian approximation of the population locus. The fuzzy Gaussian approximation of the population locus takes, as its mean, the STR-DNA value with the highest population density and, as its deviation, the deviation of the particular population. The fuzzy Gaussian of the individual locus takes the STR-DNA value of the individual locus as its mean, with the standard deviation value being the mean value minus 2.

The convolution is a mathematical operation on two functions (f and g) that produces a third function, typically viewed as a modified version of one of the original functions; it gives the integral of the pointwise multiplication of the two functions as a function of the amount by which one of the original functions is translated [20]. The function f is obtained from the individual, and the function g is derived from the reference.

\[ f(x) = a\,e^{-\frac{(x-\mu_f)^2}{2\sigma_f^2}} \tag{7} \]

and

\[ g(x) = a\,e^{-\frac{(x-\mu_g)^2}{2\sigma_g^2}}. \tag{8} \]

The convolution operator is [21]

\[ P_{f\otimes g}(x) = F^{-1}\left[F(f(x))\,F(g(x))\right] = a\,e^{-\frac{(x-(\mu_f+\mu_g))^2}{2(\sigma_f^2+\sigma_g^2)}}, \tag{9} \]

where a = 1 is the height of the fuzzy number. The mean is

\[ \mu_{f\otimes g}(x) = \mu_f + \mu_g, \tag{10} \]

and the standard deviation is as follows:

\[ \sigma_{f\otimes g}(x) = \sqrt{\sigma_f^2 + \sigma_g^2}. \tag{11} \]
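Eqs. (10) and (11) can be checked numerically. The Python sketch below convolves two sampled Gaussians on a grid and compares the mean and spread of the result with the closed-form values. The function names, the example parameters (μ_f = 1.0, σ_f = 0.5, μ_g = 2.0, σ_g = 1.2), and the grid are all hypothetical choices made for illustration; only the mean and the standard deviation are compared, since the height is fixed to a = 1 in Eq. (9).

```python
import math


def gaussian(x, mu, sigma, a=1.0):
    """Gaussian membership function of Eqs. (7) and (8) with height a."""
    return a * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def convolved_parameters(mu_f, sigma_f, mu_g, sigma_g):
    """Closed-form mean and standard deviation from Eqs. (10) and (11)."""
    return mu_f + mu_g, math.sqrt(sigma_f ** 2 + sigma_g ** 2)


def numeric_convolution_moments(mu_f, sigma_f, mu_g, sigma_g, lo=-10.0, hi=10.0, n=401):
    """Mean and standard deviation of the discrete convolution of f and g."""
    dx = (hi - lo) / (n - 1)
    xs = [lo + i * dx for i in range(n)]
    f = [gaussian(x, mu_f, sigma_f) for x in xs]
    g = [gaussian(x, mu_g, sigma_g) for x in xs]
    conv = [0.0] * (2 * n - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            conv[i + j] += fi * gj
    # The k-th sample of the full convolution sits at x = 2*lo + k*dx.
    xc = [2.0 * lo + k * dx for k in range(2 * n - 1)]
    total = sum(conv)
    mean = sum(x * c for x, c in zip(xc, conv)) / total
    var = sum((x - mean) ** 2 * c for x, c in zip(xc, conv)) / total
    return mean, math.sqrt(var)


mu_closed, sigma_closed = convolved_parameters(1.0, 0.5, 2.0, 1.2)
mu_num, sigma_num = numeric_convolution_moments(1.0, 0.5, 2.0, 1.2)
print(mu_closed, sigma_closed)
print(mu_num, sigma_num)
```

The two printed pairs should agree closely, which reflects the fact that means add and variances add under convolution of Gaussians.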

The convoluted fuzzy number replaces the fuzzy number as the representation value of the individual locus. To involve a stronger tribal relationship in the individual similarity calculation, the convoluted Gaussian fuzzy number is then shifted toward the fuzzy number of the tribal population reference. The new shifted convoluted fuzzy number is called the shifted convoluted Gaussian fuzzy number (SCGFN). This SCGFN is a new representation value of the individual locus. The mean value of SCGFN is obtained by comparing the mean value of the individual fuzzy Gaussian (μ_i) with the mean value of the fuzzy Gaussian approximation of the locus population (μ_ip). The convoluted fuzzy number is then shifted toward the fuzzy Gaussian approximation of the locus population of a certain tribe. The algorithm for shifting SCGFN is as follows:

if (μ_ip < μ_i)
    μ_scgfn = μ_i − 0.02 |μ_ip − μ_i|
elseif (μ_ip > μ_i)
    μ_scgfn = μ_i + 0.02 |μ_ip − μ_i|
else
    μ_scgfn = μ_i
end                                        (12)

Figure 4: Tribal inference system architecture design.

The standard deviation value of SCGFN is the sum of the standard deviation value of the individual fuzzy Gaussian number (σ_i) and the standard deviation value of the fuzzy Gaussian approximation of the population locus of a certain tribe (σ_ip). The formula for the SCGFN standard deviation is as follows:

\[ \sigma_{scgfn} = \sigma_{ip} + \sigma_i \tag{13} \]
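Taken together, the shifting rule (12) and formula (13) map the individual and population parameters to the SCGFN parameters. A minimal Python sketch follows; the function name `scgfn_parameters`, the keyword `step`, and the example numbers are hypothetical, and the step factor 0.02 assumes that the constant printed in (12) lost its decimal point in the same way as other values in the text.

```python
def scgfn_parameters(mu_i, sigma_i, mu_ip, sigma_ip, step=0.02):
    """Return (mu_scgfn, sigma_scgfn) following rule (12) and formula (13)."""
    distance = abs(mu_ip - mu_i)
    if mu_ip < mu_i:
        mu_scgfn = mu_i - step * distance   # shift down, toward the population mean
    elif mu_ip > mu_i:
        mu_scgfn = mu_i + step * distance   # shift up, toward the population mean
    else:
        mu_scgfn = mu_i                     # already aligned, no shift
    sigma_scgfn = sigma_ip + sigma_i        # formula (13)
    return mu_scgfn, sigma_scgfn


# Hypothetical example: individual allele 15 (sigma 1), population peak 16 (sigma 1.5).
mu_s, sigma_s = scgfn_parameters(15.0, 1.0, 16.0, 1.5)
print(mu_s, sigma_s)
```

In this example the mean moves slightly from 15 toward 16, while the spread widens to the sum of the two standard deviations.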

2.6. Measuring Tribal Relative Value from a DNA Profile. In general, the workflow of the tribal inference system is to find the tribal similarity value, which is calculated as the average, over the 15 loci in the database, of the intersection values between the individual and the tribal population. The tribe with the highest similarity value to the individual profile is selected as the ethnic estimation of the profile. The workflow process can be seen in Figure 4.

Tribal matching with the triangular fuzzy number is obtained from the intersection of the individual triangular fuzzy number and the triangular fuzzy population approximation. For each tribal population, the mean intersection value of the individual fuzzy triangular and the fuzzy triangular population approximation over the loci is calculated, and the ethnic population of the individual is determined as the tribal population with the greatest mean value. The individual triangular fuzzy number is obtained by using formula (3). The fuzzy triangular population approximation is also obtained by using formula (3), where a2 is the STR-DNA value with the largest population density, a3 is a2 + 0.2, and a1 is a2 − 0.2.

An example is shown in Table 1. The output for population A is at locus D3S1358 and on allele 1. The STR-DNA values 13, 14, 15, 16, 17, and 18 are carried by 3, 10, 27, 31, 8, and 1 individuals, respectively. As STR-DNA value 16 has the largest number of individuals, 31, it can be concluded that a1 = 15.8, a2 = 16, and a3 = 16.2.

Table 1: The results of the output on population A (locus D3S1358, allele 1).

STR-DNA   Number of Individuals
13        3
14        10
15        27
16        31
17        8
18        1
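The derivation of a1, a2, and a3 from the allele counts in Table 1 can be sketched in Python as follows. The function name `triangular_population_approximation` and the parameter name `half_width` are hypothetical; the half width of 0.2 follows the text above.

```python
def triangular_population_approximation(counts, half_width=0.2):
    """Return (a1, a2, a3), where a2 is the STR-DNA value carried by the most individuals."""
    a2 = max(counts, key=counts.get)
    return a2 - half_width, a2, a2 + half_width


# Table 1: locus D3S1358, allele 1, population A (STR-DNA value -> number of individuals).
table1 = {13: 3, 14: 10, 15: 27, 16: 31, 17: 8, 18: 1}

a1, a2, a3 = triangular_population_approximation(table1)
print(a1, a2, a3)
```

If two STR-DNA values were tied for the largest count, `max` would return the first one encountered; the text does not say how such ties are handled, so that behaviour is an arbitrary choice of this sketch.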

2.7. Tribal Matching with GFN. Tribal matching with Gaussian fuzzy similarity is obtained from the individual fuzzy Gaussian and the fuzzy Gaussian population approximation. The individual Gaussian fuzzy number is obtained by using formula (7), where μ_f is the individual STR-DNA value and σ_f = 1. The population Gaussian fuzzy number is obtained by using formula (7), where μ_f is the mean of the distribution of the number of individuals and σ_f is the standard deviation of the distribution of the number of individuals. The standard deviation is calculated as follows:

\[ s = \sqrt{\frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)}} \tag{14} \]

where n is the number of STR-DNA values and x_i is the number of individuals of a locus population.
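For illustration, formula (14) can be evaluated on the Table 1 counts and cross-checked against the sample standard deviation from Python's standard library. The function name `sample_std` is hypothetical, and using the Table 1 counts as the x_i is an assumption based on the definitions of n and x_i given above.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation computed with the shortcut form of formula (14)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(v * v for v in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


# Number of individuals per STR-DNA value (Table 1, locus D3S1358, allele 1).
counts = [3, 10, 27, 31, 8, 1]

s = sample_std(counts)
print(s, statistics.stdev(counts))
```

Both printed values should coincide, since formula (14) is an algebraic rearrangement of the usual n − 1 sample standard deviation.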

2.8. Tribal Matching with SCGFN. The shifted convoluted Gaussian fuzzy number (SCGFN) replaces the individual fuzzy number. SCGFN is a new individual locus representation with increasingly strong ethnicity. Tribal matching with SCGFN calculates the similarity (intersection) value between the corresponding SCGFN and the population's fuzzy number, as in tribal matching with GFN. The maximum value over all tribes is then sought; the tribe with the maximum value is taken as the individual's tribe.
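The final selection step, averaging the per-locus intersection values for each tribe and taking the tribe with the maximum average, can be sketched in Python as below. The function name `infer_tribe` is hypothetical, and the similarity numbers are made-up placeholders covering three loci only, not data from the study (the system described here uses 15 loci).

```python
def infer_tribe(locus_similarities):
    """Return (best_tribe, averages) given tribe -> list of per-locus intersection values."""
    averages = {
        tribe: sum(values) / len(values)
        for tribe, values in locus_similarities.items()
    }
    best_tribe = max(averages, key=averages.get)
    return best_tribe, averages


# Hypothetical per-locus similarity values, for illustration only.
example = {
    "A": [0.91, 0.85, 0.88],
    "B": [0.74, 0.80, 0.69],
    "C": [0.65, 0.72, 0.70],
    "D": [0.81, 0.77, 0.79],
}

best, averages = infer_tribe(example)
print(best, averages)
```

The per-locus values would come from the intersection of each SCGFN (or GFN, or TFN) with the corresponding population fuzzy number; the selection logic is the same for all three representations.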

3. Experimental Results

The DNA profile data entered into the database system are PCR-based DNA identification profiles consisting of 15 loci, excluding amelogenin, with two alleles for each locus. The DNA profiles used as input to the tribal inference system are Indonesian DNA data, with a total of 240 DNA data comprising the Java (A), Malay (B), Mentawai (C), and Toraja (D) tribes. The experiments were done using Matlab R2016b. The experiments were conducted for four cases.

Table 2: Example results of individual similarity with family reference.

No.  Individual 1 Id  Relationship  Individual 2 Id  Relationship  SCGFN     TFN       GFN
1    0800103          mother        0800101          child         0.938856  0.525     0.798565
2    08006002         mother        0800603          child         0.822547  0.383333  0.684361
3    0800401          mother        0800403          child         0.923285  0.433333  0.747206
4    0800903          mother        0800902          child         0.931517  0.5       0.862006
5    08002F           mother        08002C           child         0.856002  0.333333  0.614684
6    08002M           father        08002C           child         0.838193  0.533333  0.738193
7    0800901          father        0800902          child         0.872816  0.566667  0.736653
8    0800102          father        0800101          child         0.850089  0.383333  0.703589
9    0800402          father        0800403          child         0.848397  0.533333  0.781071
10   08006001         father        0800603          child         0.818216  0.516667  0.739065

Figure 5: Examples of the same-person similarity calculation, panels (a) to (d).

31 Same Person From 240 pieces of data experiments wereconducted with the same people with SCGFN GFN andTFN the similarity values are equal 1 Figure 5 is an exampleof whether the individual identity and reference entered arethe same person and the result of the similarity obtained is 1

(a) Input individual id MT021 and reference id numberMT021

(b) Input individual id TRJ12 and reference id numberTRJ12

(c) Input individual id JT11 and reference id number JT11(d) Input individual id MW135 and reference id number

32 People Who Have Family Relationships DNA profilestested in both biological parents are father and mother Fromthe experiment the average individual similarity with familyreference is obtained SCGFN 87 TFN 39 and GFN 74Table 2 shows ten instances of the result of the similarity ofan individual with a reference being a mother or father

33 People Who Belong to the Same Tribe From the exper-iment the average similarity of two individuals who havethe same tribe is obtained SCGFN 896 TFN 21 andGFN 6514 Table 3 shows ten instances of the result of thesimilarity of two individuals from the same tribe

34 Certain Tribal Populations Population data consists offour tribes where the number of people in tribe A is 80 thenumber of people in tribe B is 100 the number of peoplein tribe C is 20 and the number of people in tribe D is 40Figure 6 shows an example of the tribal population similaritycalculation of the individual identity JT19 From Figure 6 itcan be seen that SCGFN GFN and TFN displaying the tribeof JT11 are tribe A

Table 4 shows the result of similarity values with a certaintribal population with SCGFN TFN and GFN From 80experiments on the A tribe the average tribe population wasfound to be 79 with fuzzy convolution 45 with fuzzytriangular and 71 with fuzzy Gaussian The average B pop-ulation of 100 experiments were 81 with fuzzy convolution46 with fuzzy triangular and 76 with fuzzy GaussianTheaverage C population of 40 experiments were 80 with fuzzyconvolution 45 with fuzzy triangular and 73 with fuzzy

Advances in Bioinformatics 7

Table 3 Example results of two individuals in the same tribe

No Individu1 Id Tribe Individu 2 Id Tribe SCGFN TFN GFN1 MT097 C MT104 C 0944695 085 09115842 TRJ19 D TRJ22 D 0934938 06 08475233 JT11 A JT13 A 0903104 06 07461724 MW132 B MW135 B 0869267 0216667 06943635 JT11 A JT12 A 0856067 0333333 06690716 JT11 A JT17 A 0890797 0233333 06096167 MT021 A MT028 A 0924174 0333333 06898988 MT019 B MT036 B 0873376 0283333 06070319 MW128 B MW123 B 0886405 0366667 067186210 MT017 C MT018 C 0878135 0333333 066973

Table 4 The result of similarity values with a certain tribal population

Tribe SCGFN TFN GFNA 079 045 0714639375B 081 046 07615544C 08071442 044999925 073734225D 0846345667 064 0794088667

Figure 6 Examples of tribal population certain

Gaussian The average D population of 20 experiments were84 with fuzzy convolution 64 with fuzzy triangular and79 with fuzzy Gaussian

4 Analysis of Statistical and Comparison Tests

To perform analysis of the test results that have been donestatistical tests were used To know the difference of averagevalue of STR-DNA similarity value of the three methodsused in this study ANOVA and comparative test were usedInterpretation of ANOVA test is that if the test results showthat H0 failed to be rejected (no difference) then post hoctest is not done Conversely if the test results indicate H0 isrejected (there is a difference) then a post hoc advanced testshould be performed To give a clear explanationwhy SCGFNis better than GFN and TFN analysis of statistical andcomparison test with Tukey method in ANOVA is providedStatistical tests were performed using Minitab 16 Statisticaltests were performed for 3 cases

41 Individuals Who Have Family Relationships To find outthe different methods used in this method we used ANOVAand comparative tests In the one-way ANOVA test there isonly one independent variable for this case as independent

variables are individuals who have family relationships Thesummary of variance analysis in Algorithm 1 was obtainedP value le 0001 Associated with the level of significance(120572) = 005 obtained p lt 120572 means H0 is rejected so it canbe concluded that there is a difference between the threemethods To determinewhichmethod is better than the othermethod it is further tested by the Tukey method

In Algorithm 2 it can be seen that

(i) TFN < SCGFN, because the interval does not contain zero and its center is negative;
(ii) GFN < SCGFN, because the interval does not contain zero and its center is negative;
(iii) GFN > TFN, because the interval does not contain zero and its center is positive.

It can then be concluded that TFN < GFN < SCGFN. From the similarity results of the three methods and the boxplot in Figure 7, it can be seen that the SCGFN method used in this research yields higher similarity values than the GFN and TFN methods.
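The reading rule used in the list above (an interval that excludes zero indicates a significant difference, and the sign of its center gives the direction) can be sketched in a few lines of Python. This is only an illustration: the `TukeyInterval` type and the `interpret` function are hypothetical names introduced here, and the numbers are copied from the Tukey output in Algorithm 2.

```python
from typing import NamedTuple


class TukeyInterval(NamedTuple):
    """One pairwise row: interval for mean(level) - mean(baseline)."""
    level: str
    baseline: str
    lower: float
    center: float
    upper: float


def interpret(ci: TukeyInterval) -> str:
    """Zero inside the interval means no significant difference;
    otherwise the sign of the center gives the ordering."""
    if ci.lower <= 0.0 <= ci.upper:
        return f"{ci.level} = {ci.baseline}"
    relation = "<" if ci.center < 0 else ">"
    return f"{ci.level} {relation} {ci.baseline}"


# Values copied from Algorithm 2 (individuals with family relationships).
algorithm_2 = [
    TukeyInterval("TFN", "SCGFN", -0.49007, -0.41520, -0.34033),
    TukeyInterval("GFN", "SCGFN", -0.20433, -0.12945, -0.05458),
    TukeyInterval("GFN", "TFN", 0.21087, 0.28575, 0.36062),
]

results = [interpret(ci) for ci in algorithm_2]
for line in results:
    print(line)
```

The same function applies unchanged to the per-tribe Tukey outputs, where an interval that straddles zero is reported as equality of the two methods.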

4.2. Individuals Who Belong to the Same Tribe. The summary of the analysis of variance in Algorithm 3 gives a P value ≤


Source  DF       SS       MS      F       P
Factor   2  0.90267  0.45133  99.19  0.0005
Error   27  0.12286  0.00455
Total   29  1.02553

S = 0.06746   R-Sq = 88.02%   R-Sq(adj) = 87.13%

Algorithm 1: ANOVA test results for individuals who have family relationships.

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons

Individual confidence level = 98.04%

SCGFN subtracted from:
        Lower    Center     Upper
TFN  -0.49007  -0.41520  -0.34033
GFN  -0.20433  -0.12945  -0.05458

TFN subtracted from:
        Lower    Center     Upper
GFN   0.21087   0.28575   0.36062

Algorithm 2: Comparison test with the Tukey method.

Figure 7: Boxplot of the SCGFN, TFN, and GFN similarity values (vertical axis: Data) for people who have family relationships.

0.001. With a significance level (α) = 0.05, or a confidence level of 95%, p < α, so H0 is rejected and it can be concluded that there is a difference between the three methods. To determine which method is better than the others, the results are tested further with the Tukey method.

In Algorithm 4 it can be seen that

(i) TFN < SCGFN, because the interval does not contain zero and its center is negative;

(ii) GFN < SCGFN, because the interval does not contain zero and its center is negative;

(iii) GFN > TFN, because the interval does not contain zero and its center is positive.

Figure 8: Boxplot of the SCGFN, TFN, and GFN similarity values (vertical axis: Data) for individuals who belong to the same tribe.

It can then be concluded that TFN < GFN < SCGFN. From the similarity results of the three methods and the boxplot in Figure 8, it can be seen that the SCGFN method used in this research yields higher similarity values than the GFN and TFN methods.
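As a cross-check of the same-tribe analysis, the one-way ANOVA quantities can be recomputed by hand. The sketch below assumes that the ten pairs listed in Table 3 are the sample that was analyzed (this is consistent with the 27 error degrees of freedom in Algorithm 3, but it is an assumption); `one_way_anova` is a hypothetical helper written for illustration, not the Minitab procedure used in the study.

```python
# Similarity values copied from Table 3 (ten same-tribe pairs).
scgfn = [0.944695, 0.934938, 0.903104, 0.869267, 0.856067,
         0.890797, 0.924174, 0.873376, 0.886405, 0.878135]
tfn = [0.85, 0.6, 0.6, 0.216667, 0.333333,
       0.233333, 0.333333, 0.283333, 0.366667, 0.333333]
gfn = [0.911584, 0.847523, 0.746172, 0.694363, 0.669071,
       0.609616, 0.689898, 0.607031, 0.671862, 0.66973]


def one_way_anova(groups):
    """Return (SS_factor, SS_error, F) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: group size times squared distance of
    # each group mean from the grand mean.
    ss_factor = sum(len(g) * (m - grand_mean) ** 2
                    for g, m in zip(groups, means))
    # Within-group variation: squared distance of each value from
    # its own group mean.
    ss_error = sum((v - m) ** 2
                   for g, m in zip(groups, means) for v in g)
    df_factor = len(groups) - 1
    df_error = len(all_values) - len(groups)
    f_stat = (ss_factor / df_factor) / (ss_error / df_error)
    return ss_factor, ss_error, f_stat


ss_factor, ss_error, f_stat = one_way_anova([scgfn, tfn, gfn])
print(f"SS factor = {ss_factor:.4f}, SS error = {ss_error:.4f}, F = {f_stat:.2f}")
```

Under that assumption the sums of squares and the F statistic agree with the Factor and Error rows of Algorithm 3 to the printed precision.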

4.3. Certain Tribal Populations. Testing was done with 720 data, that is, 240 data for each of the 3 methods. The summary of the analysis of variance for A, B, C, and D in Algorithm 5 gives a P value ≤ 0.001. With a significance level


Source  DF      SS      MS      F       P
Factor   2  1.1783  0.5891  34.24  0.0005
Error   27  0.4646  0.0172
Total   29  1.6429

S = 0.1312   R-Sq = 71.72%   R-Sq(adj) = 69.63%

Algorithm 3: ANOVA test results for individuals of the same tribe.

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons

Individual confidence level = 98.04%

SCGFN subtracted from:
       Lower   Center    Upper
TFN  -0.6267  -0.4811  -0.3355
GFN  -0.3300  -0.1844  -0.0388

TFN subtracted from:
       Lower   Center    Upper
GFN   0.1511   0.2967   0.4423

Algorithm 4: Comparison test with the Tukey method.

(α) = 0.05, p < α, so H0 is rejected and it can be concluded that there is a difference between the three methods.
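The rows of an ANOVA table are tied together by two standard relations, F = (SS_factor / DF_factor) / (SS_error / DF_error) and R-Sq = SS_factor / SS_total. As a small consistency sketch, the following Python recomputes both from the sums of squares printed in Algorithm 5; the dictionary layout and the `derive` helper are illustrative choices, and the inputs are the rounded values from the listing, so agreement is only expected up to rounding.

```python
DF_FACTOR = 2
DF_ERROR = 251

# tribe -> (SS factor, SS error, reported F, reported R-Sq in percent),
# copied from Algorithm 5.
algorithm_5 = {
    "A": (7.6061, 2.1766, 438.56, 77.75),
    "B": (7.9691, 2.7986, 357.37, 74.01),
    "C": (8.7934, 2.4855, 444.01, 77.96),
    "D": (7.6350, 2.9167, 328.52, 72.36),
}


def derive(ss_factor, ss_error):
    """Recompute F and R-Sq (percent) from the two sums of squares."""
    f_stat = (ss_factor / DF_FACTOR) / (ss_error / DF_ERROR)
    r_sq = 100.0 * ss_factor / (ss_factor + ss_error)
    return f_stat, r_sq


derived = {tribe: derive(ss_f, ss_e)
           for tribe, (ss_f, ss_e, _, _) in algorithm_5.items()}

for tribe, (f_stat, r_sq) in derived.items():
    _, _, reported_f, reported_r_sq = algorithm_5[tribe]
    print(f"{tribe}: F = {f_stat:.2f} (reported {reported_f}), "
          f"R-Sq = {r_sq:.2f}% (reported {reported_r_sq}%)")
```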

The experiment was conducted with 3 methods, where SCGFN is method 1, TFN is method 2, and GFN is method 3. To determine which method is better than the others, each tribe is tested further. In Algorithm 6, the comparison of the three methods for tribal population A can be explained as follows:

(1) TFN < SCGFN, because the interval does not contain zero and its center is negative.

(2) GFN < SCGFN, because the interval does not contain zero and its center is negative.

(3) GFN > TFN, because the interval does not contain zero and its center is positive.

So it can be concluded that TFN < GFN < SCGFN, which means that SCGFN is better than GFN and GFN is better than TFN.

In Algorithm 7, the comparison of the three methods for tribal population B can be explained as follows:

(1) TFN < SCGFN, because the interval does not contain zero and its center is negative.

(2) GFN = SCGFN, because the interval contains zero.

(3) GFN > TFN, because the interval does not contain zero and its center is positive.

So it can be concluded that TFN < (GFN = SCGFN), which means that the SCGFN and GFN methods are equal and both are better than TFN.

In Algorithm 8, the comparison of the three methods for tribal population C can be explained as follows:

(1) TFN < SCGFN, because the interval does not contain zero and its center is negative.

(2) GFN < SCGFN, because the interval does not contain zero and its center is negative.

(3) GFN > TFN, because the interval does not contain zero and its center is positive.

So it can be concluded that TFN < GFN < SCGFN, which means that SCGFN is better than GFN and GFN is better than TFN.

In Algorithm 9, the comparison of the three methods for tribal population D can be explained as follows:

(1) TFN < SCGFN, because the interval does not contain zero and its center is negative.

(2) GFN = SCGFN, because the interval contains zero.


Analysis of Variance for A, using Adjusted SS for Tests

Source   DF   Seq SS   Adj SS  Adj MS       F       P
Method    2   7.6061   7.6061  3.8030  438.56  0.0005
Error   251   2.1766   2.1766  0.0087
Total   253   9.7827

S = 0.0931218   R-Sq = 77.75%   R-Sq(adj) = 77.57%

Analysis of Variance for B, using Adjusted SS for Tests

Source   DF   Seq SS   Adj SS  Adj MS       F       P
Method    2   7.9691   7.9691  3.9845  357.37  0.0005
Error   251   2.7986   2.7986  0.0111
Total   253  10.7677

S = 0.105592   R-Sq = 74.01%   R-Sq(adj) = 73.80%

Analysis of Variance for C, using Adjusted SS for Tests

Source   DF   Seq SS   Adj SS  Adj MS       F       P
Method    2   8.7934   8.7934  4.3967  444.01  0.0005
Error   251   2.4855   2.4855  0.0099
Total   253  11.2788

S = 0.0995100   R-Sq = 77.96%   R-Sq(adj) = 77.79%

Analysis of Variance for D, using Adjusted SS for Tests

Source   DF   Seq SS   Adj SS  Adj MS       F       P
Method    2   7.6350   7.6350  3.8175  328.52  0.0005
Error   251   2.9167   2.9167  0.0116
Total   253  10.5517

S = 0.107798   R-Sq = 72.36%   R-Sq(adj) = 72.14%

Algorithm 5: ANOVA test results for a particular tribe.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable A
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method    Lower   Center    Upper
2       -0.4276  -0.3940  -0.3605
3       -0.0957  -0.0622  -0.0286

Method = 2 subtracted from:
Method    Lower   Center    Upper
3        0.2984   0.3319   0.3653

Algorithm 6: ANOVA test results on tribal population A.


Tukey 95.0% Simultaneous Confidence Intervals
Response Variable B
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method    Lower   Center    Upper
2       -0.4301  -0.3921  -0.3541
3       -0.0738  -0.0358   0.0023

Method = 2 subtracted from:
Method    Lower   Center    Upper
3        0.3184   0.3563   0.3942

Algorithm 7: ANOVA test results on tribal population B.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable C
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method    Lower   Center    Upper
2       -0.4482  -0.4123  -0.3765
3       -0.0745  -0.0386  -0.0028

Method = 2 subtracted from:
Method    Lower   Center    Upper
3        0.3380   0.3737   0.4094

Algorithm 8: ANOVA test results on tribal population C.

(3) GFN > TFN, because the interval does not contain zero and its center is positive.

So it can be concluded that TFN < (GFN = SCGFN), which means that the SCGFN and GFN methods are equal and both are better than TFN.

5. Conclusions

In this research, experiments were conducted to find a better method for obtaining higher individual similarity values and stronger tribal properties. To improve the capability of locus matching, SCGFN and GFN have been proposed; they compute the fuzzy number similarity between the sizes of two alleles. Experiments were done for the following cases: people with family relationships, people of the same tribe, and certain tribal populations. In these three cases, ANOVA shows a difference in similarity between SCGFN, GFN, and TFN at a confidence level of 95%. In the case of people with family relationships and the case of people of the same tribe, the Tukey method in ANOVA shows that SCGFN yields a higher similarity, which means it is better than the GFN and TFN methods. In the case of certain tribal populations, the Tukey method in ANOVA shows that SCGFN is better than GFN and TFN in populations A and C, whereas in populations B and D SCGFN is equal to GFN and better than TFN. The proposed method enables a CODIS STR-DNA similarity calculation that is more robust to noise and performs better in similarity calculations involving familial and tribal relationships.


Tukey 95.0% Simultaneous Confidence Intervals
Response Variable D
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method    Lower   Center    Upper
2       -0.4176  -0.3787  -0.3399
3       -0.0624  -0.0236   0.0152

Method = 2 subtracted from:
Method    Lower   Center    Upper
3        0.3164   0.3551   0.3938

Algorithm 9: ANOVA test results on tribal population D.

Data Availability

The STR-DNA data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This paper is sponsored by the Indexed Scientific Publication for Student Doctor's Final Project (Hibah Doktor) 2018 of Universitas Indonesia [0588SKRUI2018].

References

[1] S. K. Sheppard, D. S. Guttman, and J. R. Fitzgerald, "Population genomics of bacterial host adaptation," Nature Reviews Genetics.

[2] Emery and Rimoin's Principles and Practice of Medical Genetics, Elsevier, 2013.

[3] A. Neil and Campbell, Campbell Reece, Pearson Benjamin Cummings, 8th edition, 2008.

[4] M. S. H. Abu Halima, L. P. Bernal, and F. A. Sharif, "Genetic variation of 15 autosomal short tandem repeat (STR) loci in the Palestinian population of Gaza Strip," Legal Medicine, vol. 11, no. 4, pp. 203-204, 2009.

[5] M. G. Patel, Jignal Shaikh, and D. Marjadi, "Forensic Conception DNA Typing of FTA Spotted Samples," Journal of Applied Biology Biotechnology, vol. 2, no. 04, pp. 021–029, 2014.

[6] S. J. Venables, R. Daniel, S. D. Sarre et al., "Allele frequency data for 15 autosomal STR loci in eight Indonesian subpopulations," Forensic Science International: Genetics, vol. 20, pp. 45–52, 2016.

[7] Khaleda Parven, Forensic use of DNA information: human rights, privacy and other challenges, University of Wollongong Thesis Collections, 2012.

[8] B. Derbyshire, Sharing DNA Profiles and Finger Prints Across the EU requires further safeguards, GeneWatch UK, 2015.

[9] A. Holobinko, "Theoretical and Methodological Approaches to Understanding Human Migration Patterns and their Utility in Forensic Human Identification Cases," Societies, vol. 2, no. 2, pp. 42–62, 2012.

[10] D. Ricke, A. Shcherbina, N. Chiu et al., "Sherlock's Toolkit: A forensic DNA analysis system," in Proceedings of the IEEE International Symposium on Technologies for Homeland Security, HST 2015, USA, April 2015.

[11] A. L. Lowe, A. Urquhart, L. A. Foreman, and I. W. Evett, "Inferring ethnic origin by means of an STR profile," Forensic Science International, vol. 119, no. 1, pp. 17–22, 2001.

[12] M. Rahmat Widyanto, "Various defuzzification methods on DNA similarity matching using fuzzy inference system," Journal of Advanced Computational Intelligence and Intelligent Informatics, vol. 14, no. 3, 2010.

[13] R. N. Hartono, M. R. Widyanto, and N. Soedarsono, "Fuzzy logic system for DNA profile matching with embedded ethnic inference," in Proceedings of the 2010 2nd International Conference on Advances in Computing, Control and Telecommunication Technologies, ACT 2010, pp. 69–73, Indonesia, December 2010.

[14] M. R. Widyanto, R. N. Hartono, and N. Soedarsono, "A Novel Human STR Similarity Method using Cascade Statistical Fuzzy Rules with Tribal Information Inference," International Journal of Electrical and Computer Engineering (IJECE), vol. 6, no. 6, p. 3103, 2016.

[15] L. A. Zadeh, "Fuzzy sets," Information and Computation, vol. 8, pp. 338–353, 1965.

[16] S. Iwamoto, K. Tsurusaki, and T. Fujita, "Conditional decision-making in fuzzy environment," Journal of the Operations Research Society of Japan, vol. 42, no. 2, pp. 198–218, 1999.

[17] Liyuan Zhang, Xuanhua Xu, and Li Tao, "Some Similarity Measures for Triangular Fuzzy Number and Their Applications in Multiple Criteria Group Decision-Making," Journal of Applied Mathematics, vol. 2013, pp. 1–7, 2013.

[18] K. Kundu, "Image Denoising using Patch based Processing with Fuzzy Gaussian Membership Function," International Journal of Computer Applications, vol. 118, no. 12, pp. 0975–8887, 2015, Department of Computer Science & Engineering, Govt. College of Engineering & Textile Technology, Serampore, Hooghl.

[19] H. A. Hefny, H. M. Elsayed, and H. F. Aly, "Fuzzy multi-criteria decision making model for different scenarios of electrical power generation in Egypt," Egyptian Informatics Journal, vol. 14, no. 2, pp. 125–133, 2013.

[20] B. Akshay, B. Akash, and P. Avril, "Convolution and application of convolution," International Journal of Innovative Research In Technology, vol. 6, pp. 2349–6002, 2014.

[21] P. A. Bromiley, Products and Convolutions of Gaussian Probability Density Function, vol. 14, Imaging Sciences Research Group, Institute of Population Health, School of Medicine, University of Manchester, Manchester, 2014.


Figure 4: Tribal inference system architecture design. (Diagram labels: STR-DNA loci 1, 2, 3, ..., n of the individual; tribes A, B, C; finding the highest relative tribal values.)

population of a certain tribe. The algorithm for shifting the SCGFN is given below:

$$
\mu_{scgfn} =
\begin{cases}
\mu_i - 0.02\,\lvert \mu_{ip} - \mu_i \rvert, & \text{if } \mu_{ip} < \mu_i, \\
\mu_i + 0.02\,\lvert \mu_{ip} - \mu_i \rvert, & \text{if } \mu_{ip} > \mu_i, \\
\mu_i, & \text{otherwise.}
\end{cases}
\tag{12}
$$

The standard deviation of the SCGFN is the sum of the standard deviation of the individual Gaussian fuzzy number ($\sigma_i$) and the standard deviation of the fuzzy Gaussian approximation of the population locus of a certain tribe ($\sigma_{ip}$). The SCGFN standard deviation is given by

$$
\sigma_{scgfn} = \sigma_{ip} + \sigma_i \tag{13}
$$
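A minimal Python sketch of formulas (12) and (13) follows. The function name `scgfn_parameters`, its argument names, and the sample numbers are hypothetical and chosen only for illustration; the shift factor 0.02 and the rule of adding the two standard deviations are taken from the text.

```python
def scgfn_parameters(mu_i, sigma_i, mu_ip, sigma_ip, step=0.02):
    """Return (mu_scgfn, sigma_scgfn) for one locus.

    mu_i, sigma_i   : mean and standard deviation of the individual GFN
    mu_ip, sigma_ip : mean and standard deviation of the population GFN
    step            : shift factor from formula (12)
    """
    gap = abs(mu_ip - mu_i)
    if mu_ip < mu_i:
        mu_scgfn = mu_i - step * gap   # shift down toward the population
    elif mu_ip > mu_i:
        mu_scgfn = mu_i + step * gap   # shift up toward the population
    else:
        mu_scgfn = mu_i                # already aligned, no shift
    sigma_scgfn = sigma_ip + sigma_i   # formula (13)
    return mu_scgfn, sigma_scgfn


# Hypothetical locus: individual allele 14 (sigma 1), population mean 16
# with standard deviation 1.5.
mu, sigma = scgfn_parameters(14.0, 1.0, 16.0, 1.5)
print(mu, sigma)
```

In every branch the individual mean moves 2% of the way toward the population mean, so the three cases are equivalent to the single expression mu_i + 0.02 * (mu_ip - mu_i).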

2.6. Measuring Tribal Relative Value from a DNA Profile. In general, the workflow of the tribal inference system is to find the tribal similarity value, which is calculated as the average of the intersection values between the individual's values and the tribal population values in the database over the 15 loci. The tribe with the highest similarity value to the individual profile is selected as the ethnic estimate of the profile. The workflow is shown in Figure 4.
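The selection step described above (average the per-locus intersection values for each tribe, then pick the tribe with the highest average) can be sketched as follows. The function `infer_tribe` and the sample values are hypothetical; the example uses only three loci per tribe for brevity, whereas the system described here uses 15, and the per-locus intersection values are assumed to have been computed already.

```python
def infer_tribe(locus_similarity):
    """locus_similarity maps tribe -> list of per-locus intersection values.

    Returns the tribe with the highest mean value and all tribal values.
    """
    tribal_value = {
        tribe: sum(values) / len(values)
        for tribe, values in locus_similarity.items()
    }
    best_tribe = max(tribal_value, key=tribal_value.get)
    return best_tribe, tribal_value


# Hypothetical per-locus intersection values (three loci only).
example = {
    "A": [0.9, 0.8, 0.7],
    "B": [0.6, 0.7, 0.5],
    "C": [0.4, 0.9, 0.6],
}

best_tribe, tribal_value = infer_tribe(example)
print(best_tribe, tribal_value)
```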

Tribal matching with the triangular fuzzy number is obtained from the intersection of the individual triangular fuzzy number and the triangular fuzzy population approximation. For each tribal population, the mean of the intersection values over the loci is calculated, and the ethnic population of the individual is the tribe with the greatest mean value. The individual triangular fuzzy number is obtained using formula (3). The fuzzy triangular

Table 1: The results of the output on population A.

Locus: D3S1358, Allele 1

STR-DNA  Number of individuals
13       3
14       10
15       27
16       31
17       8
18       1

population approximation is also obtained using formula (3), where a2 is the STR-DNA value with the highest population density, a3 is a2 + 0.2, and a1 is a2 - 0.2.

An example is shown in Table 1. The output for population A is at locus D3S1358, allele 1. The STR-DNA values 13, 14, 15, 16, 17, and 18 occur in 3, 10, 27, 31, 8, and 1 individuals, respectively. Because STR-DNA value 16 has the largest number of individuals, 31, it follows that a1 = 15.8, a2 = 16, and a3 = 16.2.
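The worked example can be reproduced with a short Python sketch. The helper `population_tfn` is a hypothetical name; it takes the allele counts of Table 1, picks the most frequent STR-DNA value as a2, and places a1 and a3 at a distance of 0.2 on either side, as described in the text.

```python
def population_tfn(counts, spread=0.2):
    """counts maps STR-DNA value -> number of individuals.

    Returns (a1, a2, a3) of the triangular population approximation.
    """
    a2 = max(counts, key=counts.get)  # STR-DNA value with the most individuals
    return a2 - spread, a2, a2 + spread


# Counts copied from Table 1 (population A, locus D3S1358, allele 1).
table_1 = {13: 3, 14: 10, 15: 27, 16: 31, 17: 8, 18: 1}

a1, a2, a3 = population_tfn(table_1)
print(f"a1 = {a1:.1f}, a2 = {a2}, a3 = {a3:.1f}")
```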

2.7. Tribal Matching with GFN. Tribal matching with Gaussian fuzzy similarity is obtained from the intersection of the individual fuzzy Gaussian with the fuzzy Gaussian population approximation. The individual Gaussian fuzzy number is obtained using formula (7), where $\mu_f$ is the individual STR-DNA value and $\sigma_f = 1$. The population Gaussian fuzzy number is obtained using formula (7), where $\mu_f$ is the mean of the distribution of the number of individuals and $\sigma_f$ is the standard deviation of that distribution. The standard deviation is calculated as follows:

$$
s = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n\,(n - 1)}} \tag{14}
$$

where $n$ is the number of STR-DNA values and $x_i$ is the number of individuals of a locus population.
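Formula (14) is the computational form of the sample standard deviation. The sketch below evaluates it in Python and compares the result with `statistics.stdev` from the standard library. Using the Table 1 counts as the x_i values is an assumption made for illustration, following the definition of x_i given above; `sample_std` is a hypothetical helper name.

```python
import math
import statistics


def sample_std(x):
    """Sample standard deviation in the computational form of formula (14)."""
    n = len(x)
    sum_x = sum(x)
    sum_x2 = sum(v * v for v in x)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


# Number of individuals per STR-DNA value, copied from Table 1.
counts = [3, 10, 27, 31, 8, 1]

s = sample_std(counts)
print(f"formula (14): {s:.4f}, statistics.stdev: {statistics.stdev(counts):.4f}")
```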

2.8. Tribal Matching with SCGFN. The shifted convoluted Gaussian fuzzy number (SCGFN) replaces the individual fuzzy number. The SCGFN is a new individual locus value with a stronger ethnic component. Tribal matching with SCGFN calculates the intersection similarity between the corresponding SCGFN and the population's fuzzy number, as in tribal matching with GFN. The maximum value over all tribes is then sought; the tribe with the maximum value is taken as the individual's tribe.

3. Experimental Results

The DNA profile data entered into the database system are PCR-based DNA identification profiles consisting of 15 loci, excluding amelogenin, with two alleles for each locus. The DNA profiles used as input to the tribal inference system are Indonesian DNA, with a total of 240 DNA data comprising the Java (A), Malay (B), Mentawai (C), and Toraja


Table 2: Example results of individual similarity with family reference.

No  Individual 1 ID  Relationship  Individual 2 ID  Relationship  SCGFN     TFN       GFN
1   0800103          mother        0800101          child         0.938856  0.525     0.798565
2   08006002         mother        0800603          child         0.822547  0.383333  0.684361
3   0800401          mother        0800403          child         0.923285  0.433333  0.747206
4   0800903          mother        0800902          child         0.931517  0.5       0.862006
5   08002F           mother        08002C           child         0.856002  0.333333  0.614684
6   08002M           father        08002C           child         0.838193  0.533333  0.738193
7   0800901          father        0800902          child         0.872816  0.566667  0.736653
8   0800102          father        0800101          child         0.850089  0.383333  0.703589
9   0800402          father        0800403          child         0.848397  0.533333  0.781071
10  08006001         father        0800603          child         0.818216  0.516667  0.739065

Figure 5: Examples of the similarity calculation for the same person (panels (a) to (d)).

(D) tribes. The experiments were carried out using Matlab R2016b. The experiment was conducted with four cases.

3.1. Same Person. Using the 240 pieces of data, experiments were conducted in which the individual and the reference are the same person; with SCGFN, GFN, and TFN the similarity values are all equal to 1. Figure 5 shows examples in which the individual identity and the reference entered are the same person and the resulting similarity is 1:

(a) Input individual ID MT021 and reference ID number MT021

(b) Input individual ID TRJ12 and reference ID number TRJ12

(c) Input individual ID JT11 and reference ID number JT11

(d) Input individual ID MW135 and reference ID number

3.2. People Who Have Family Relationships. The DNA profiles were tested against both biological parents, that is, father and mother. From the experiment, the average individual similarity with a family reference is 87% for SCGFN, 39% for TFN, and 74% for GFN. Table 2 shows ten instances of the similarity of an individual with a reference who is the mother or father.

3.3. People Who Belong to the Same Tribe. From the experiment, the average similarity of two individuals from the same tribe is 89.6% for SCGFN, 21% for TFN, and 65.14% for GFN. Table 3 shows ten instances of the similarity of two individuals from the same tribe.

3.4. Certain Tribal Populations. The population data consist of four tribes: tribe A has 80 people, tribe B has 100 people, tribe C has 20 people, and tribe D has 40 people. Figure 6 shows an example of the tribal population similarity calculation for the individual with identity JT19. From Figure 6 it can be seen that SCGFN, GFN, and TFN all identify the tribe of JT11 as tribe A.

Table 4 shows the result of similarity values with a certain tribal population with SCGFN, TFN, and GFN. From 80 experiments on tribe A, the average tribal population similarity was found to be 79% with fuzzy convolution, 45% with fuzzy triangular, and 71% with fuzzy Gaussian. The average for population B over 100 experiments was 81% with fuzzy convolution, 46% with fuzzy triangular, and 76% with fuzzy Gaussian. The average for population C over 40 experiments was 80% with fuzzy convolution, 45% with fuzzy triangular, and 73% with fuzzy Gaussian. The average for population D over 20 experiments was 84% with fuzzy convolution, 64% with fuzzy triangular, and 79% with fuzzy Gaussian.


12 Advances in Bioinformatics

Tukey 950 Simultaneous Confidence Intervals

Response Variable D

All Pairwise Comparisons among Levels of Metode

Metode = 1 subtracted from

Metode Lower Center Upper -------+---------+---------+---------

2 -04176 -03787 -03399 (- )

3 -00624 -00236 00152 ( -)

-------+---------+---------+---------

-025 000 025

Metode = 2 subtracted from

Metode Lower Center Upper -------+---------+---------+---------

3 03164 03551 03938 ( -)

-------+---------+---------+---------

-025 000 025

Algorithm 9 ANOVA test results on tribal population D

Data Availability

The STR-DNA data used to support the findings of this studyare available from the corresponding author upon request

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This paper is sponsored by the Indexed Scientific Publicationfor Student Doctorrsquos Final Project (Hibah Doktor) 2018 ofUniversitas Indonesia [0588SKRUI2018]References

[1] S K Sheppard D S Guttman and J R Fitzgerald ldquoPopulationgenomics of bacterial host adaptationrdquoNature ReviewsGenetics

[2] Emery and Rimoinrsquos Principles and Practice of Medical GeneticsElsevier 2013

[3] A Neil and Campbell Campbell Reece Pearson BenjaminCummings 8th edition 2008

[4] M S H Abu Halima L P Bernal and F A Sharif ldquoGeneticvariation of 15 autosomal short tandem repeat (STR) loci in thePalestinian population of Gaza Striprdquo LegalMedicine vol 11 no4 pp 203-204 2009

[5] M G Patel Jignal Shaikh and D Marjadi ldquoForensic Concep-tion DNA Typing of FTA Spotted Samplesrdquo Journal of AppliedBiology Biotechnology Vol vol 2 no 04 pp 021ndash029 2014

[6] S J Venables R Daniel S D Sarre et al ldquoAllele frequency datafor 15 autosomal STR loci in eight Indonesian subpopulationsrdquoForensic Science International Genetics vol 20 pp 45ndash52 2016

[7] Khaleda Parven Forensic use of DNA information human rightsprivacy and other challenges University of Wollongong ThesisCollections 7 Khaleda Parven Forensic use ofDNA informationhuman rights 2012

[8] B Derbyshire Sharing DNA Profiles and Finger Prints Across theEU requires further safeguards GeneWatch UK 2015

[9] A Holobinko ldquoTheoretical and Methodological Approaches toUnderstanding Human Migration Patterns and their Utility inForensic Human Identification Casesrdquo Societies vol 2 no 2 pp42ndash62 2012

[10] D Ricke A Shcherbina N Chiu et al ldquoSherlockrsquos ToolkitA forensic DNA analysis systemrdquo in Proceedings of the IEEEInternational SymposiumonTechnologies forHomeland SecurityHST 2015 USA April 2015

[11] A L Lowe A Urquhart L A Foreman and I W EvettldquoInferring ethnic origin by means of an STR profilerdquo ForensicScience International vol 119 no 1 pp 17ndash22 2001

[12] M Rahmat Widyanto ldquoVarious defuzzification methods onDNA similaritymatching suing fuzzy inference systemrdquo Journalof Advanced Computational Intelligence Intelligent Informaticsvol 14 no 3 2010

[13] R N Hartono M R Widyanto and N Soedarsono ldquoFuzzylogic system for DNA profile matching with embedded ethnicinferencerdquo in Proceedings of the 2010 2nd International Confer-ence onAdvances in Computing Control and TelecommunicationTechnologies ACT 2010 pp 69ndash73 Indonesia December 2010

[14] M R Widyanto R N Hartono and N Soedarsono ldquoA NovelHuman STR Similarity Method using Cascade Statistical FuzzyRules with Tribal Information Inferencerdquo International Journalof Electrical and Computer Engineering (IJECE) vol 6 no 6 p3103 2016

[15] L A Zadeh ldquoFuzzy setsrdquo Information and Computation vol 8pp 338ndash353 1965

[16] S Iwamoto K Tsurusaki and T Fujita ldquoConditional decision-making in fuzzy environmentrdquo Journal of the OperationsResearch Society of Japan vol 42 no 2 pp 198ndash218 1999

[17] Liyuan Zhang Xuanhua Xu and Li Tao ldquoSome Similarity Mea-sures for Triangular Fuzzy Number and Their Applications inMultiple Criteria Group Decision-Makingrdquo Journal of AppliedMathematics vol 2013 pp 1ndash7 2013

[18] K Kundu ldquoImage Denoising using Patch based Processing withFuzzy Gaussian Membership Functionrdquo International Journalof Computer Applications vol 118 no 12 pp 0975ndash8887 2015

Advances in Bioinformatics 13

Department of Computer Science amp Engineering Govt Collegeof Engineering amp Textile Technology Serampore Hooghl

[19] H A Hefny H M Elsayed and H F Aly ldquoFuzzy multi-criteriadecision making model for different scenarios of electricalpower generation in Egyptrdquo Egyptian Informatics Journal vol14 no 2 pp 125ndash133 2013

[20] B Akshay B Akash and P Avril ldquoConvolution and applicationof convolutionrdquo International Journal of Innovative Research InTechnology vol 6 pp 2349ndash6002 2014

[21] P A Bromiley Products and Convolutions of Gaussian Probabil-ity Density Function vol 14 Imaging Sciences Research GroupInstitute of Population Health School of Medicine Universityof Manchester Manchester 2014

Hindawiwwwhindawicom

International Journal of

Volume 2018

Zoology

Hindawiwwwhindawicom Volume 2018

Anatomy Research International

PeptidesInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Journal of Parasitology Research

GenomicsInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Neuroscience Journal

Hindawiwwwhindawicom Volume 2018

BioMed Research International

Cell BiologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

ArchaeaHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Genetics Research International

Hindawiwwwhindawicom Volume 2018

Advances in

Virolog y Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

International Journal of

MicrobiologyHindawiwwwhindawicom

Nucleic AcidsJournal of

Volume 2018

Submit your manuscripts atwwwhindawicom


Table 2: Example results of individual similarity with family reference.

No.  Individual 1 ID  Relationship  Individual 2 ID  Relationship  SCGFN     TFN       GFN
1    0800103          mother        0800101          child         0.938856  0.525     0.798565
2    08006002         mother        0800603          child         0.822547  0.383333  0.684361
3    0800401          mother        0800403          child         0.923285  0.433333  0.747206
4    0800903          mother        0800902          child         0.931517  0.5       0.862006
5    08002F           mother        08002C           child         0.856002  0.333333  0.614684
6    08002M           father        08002C           child         0.838193  0.533333  0.738193
7    0800901          father        0800902          child         0.872816  0.566667  0.736653
8    0800102          father        0800101          child         0.850089  0.383333  0.703589
9    0800402          father        0800403          child         0.848397  0.533333  0.781071
10   08006001         father        0800603          child         0.818216  0.516667  0.739065

Figure 5: Examples of the same-person similarity calculation, panels (a) to (d).

(D) tribes. The experiments were carried out in MATLAB R2016b and covered four cases.

3.1. Same Person. From the 240 data records, experiments were conducted in which the individual and the reference are the same person; with SCGFN, GFN, and TFN the similarity values all equal 1. Figure 5 shows examples in which the individual ID and the reference ID entered belong to the same person, so the similarity obtained is 1:

(a) Input individual ID MT021 and reference ID number MT021

(b) Input individual ID TRJ12 and reference ID number TRJ12

(c) Input individual ID JT11 and reference ID number JT11

(d) Input individual ID MW135 and reference ID number
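Why identical profiles give a similarity of exactly 1 can be illustrated with a small sketch. It assumes, purely for illustration, that a locus value is modeled by a Gaussian membership function and that similarity is the height of the fuzzy intersection (the maximum over x of the minimum of the two memberships). The allele values, the spread `sigma`, and the function names are hypothetical, and the formula used in the experiments may differ.

```python
import math


def gaussian_membership(x, center, sigma):
    """Membership degree of x in a Gaussian fuzzy number centered at `center`."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def intersection_height(c1, c2, sigma, points=2001):
    """Height of the min-intersection of two Gaussian fuzzy numbers.

    Assumption: both fuzzy numbers share the same spread `sigma`.
    The maximum is searched on a regular grid covering both centers.
    """
    lo = min(c1, c2) - 4.0 * sigma
    hi = max(c1, c2) + 4.0 * sigma
    step = (hi - lo) / (points - 1)
    best = 0.0
    for i in range(points):
        x = lo + i * step
        degree = min(gaussian_membership(x, c1, sigma),
                     gaussian_membership(x, c2, sigma))
        best = max(best, degree)
    return best


# Hypothetical allele sizes: the same value twice, then two different values.
same = intersection_height(12.0, 12.0, 0.5)
different = intersection_height(12.0, 13.0, 0.5)
print(same, different)
```

Under these assumptions, two identical locus values overlap completely, so the intersection reaches the full membership degree of 1, while any difference between the centers lowers the height of the overlap.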

3.2. People Who Have Family Relationships. The DNA profiles were tested against both biological parents, that is, father and mother. In this experiment, the average similarity between an individual and a family reference was 87% with SCGFN, 39% with TFN, and 74% with GFN. Table 2 shows ten instances of the similarity between an individual and a reference who is the mother or father.

3.3. People Who Belong to the Same Tribe. In this experiment, the average similarity between two individuals of the same tribe was 89.6% with SCGFN, 21% with TFN, and 65.14% with GFN. Table 3 shows ten instances of the similarity between two individuals from the same tribe.

Table 3: Example results of two individuals in the same tribe.

No.  Individual 1 ID  Tribe  Individual 2 ID  Tribe  SCGFN     TFN       GFN
1    MT097            C      MT104            C      0.944695  0.85      0.911584
2    TRJ19            D      TRJ22            D      0.934938  0.6       0.847523
3    JT11             A      JT13             A      0.903104  0.6       0.746172
4    MW132            B      MW135            B      0.869267  0.216667  0.694363
5    JT11             A      JT12             A      0.856067  0.333333  0.669071
6    JT11             A      JT17             A      0.890797  0.233333  0.609616
7    MT021            A      MT028            A      0.924174  0.333333  0.689898
8    MT019            B      MT036            B      0.873376  0.283333  0.607031
9    MW128            B      MW123            B      0.886405  0.366667  0.671862
10   MT017            C      MT018            C      0.878135  0.333333  0.66973

3.4. Certain Tribal Populations. The population data consist of four tribes: tribe A has 80 people, tribe B has 100 people, tribe C has 20 people, and tribe D has 40 people. Figure 6 shows an example of the tribal population similarity calculation for the individual identity JT19. From Figure 6 it can be seen that SCGFN, GFN, and TFN all report the tribe of JT11 as tribe A.

Table 4 shows the similarity values with a certain tribal population for SCGFN, TFN, and GFN. In the 80 experiments on tribe A, the average similarity with the tribal population was 79% with fuzzy convolution (SCGFN), 45% with fuzzy triangular (TFN), and 71% with fuzzy Gaussian (GFN). The averages for population B over 100 experiments were 81% with fuzzy convolution, 46% with fuzzy triangular, and 76% with fuzzy Gaussian. The averages for population C over 40 experiments were 80% with fuzzy convolution, 45% with fuzzy triangular, and 73% with fuzzy Gaussian. The averages for population D over 20 experiments were 84% with fuzzy convolution, 64% with fuzzy triangular, and 79% with fuzzy Gaussian.

Table 4: Similarity values with a certain tribal population.

Tribe  SCGFN        TFN         GFN
A      0.79         0.45        0.714639375
B      0.81         0.46        0.7615544
C      0.8071442    0.44999925  0.73734225
D      0.846345667  0.64        0.794088667

Figure 6: Example of a similarity calculation against a certain tribal population.

4. Analysis of Statistical and Comparison Tests

Statistical tests were used to analyze the experimental results. To determine whether the average STR-DNA similarity values of the three methods differ, ANOVA and a comparison test were used. The ANOVA result is interpreted as follows: if the test fails to reject H0 (no difference), no post hoc test is performed; conversely, if H0 is rejected (there is a difference), a post hoc test should be performed. To explain clearly why SCGFN is better than GFN and TFN, the statistical analysis and the comparison test with the Tukey method in ANOVA are provided. Statistical tests were performed using Minitab 16 for three cases.
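The quantities in a one-way ANOVA summary are tied together by a few identities, which the following sketch illustrates. It uses the sums of squares and degrees of freedom reported in Algorithm 1 for the family-relationship case (0.90267 with 2 DF for the factor, 0.12286 with 27 DF for the error); the function names are hypothetical, and the P value is taken as an input rather than computed, since deriving it needs the F distribution.

```python
import math


def anova_summary(ss_factor, df_factor, ss_error, df_error):
    """Derive mean squares, F, S, and R-Sq from sums of squares and DF."""
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return {
        "MS_factor": ms_factor,
        "MS_error": ms_error,
        "F": ms_factor / ms_error,          # test statistic
        "S": math.sqrt(ms_error),           # pooled standard deviation
        "R_sq_percent": 100.0 * ss_factor / (ss_factor + ss_error),
    }


def needs_post_hoc(p_value, alpha=0.05):
    """Decision rule from the text: run a post hoc test only if H0 is rejected."""
    return p_value < alpha


family = anova_summary(0.90267, 2, 0.12286, 27)
print(round(family["F"], 2), round(family["S"], 5), round(family["R_sq_percent"], 2))
print(needs_post_hoc(0.0005))
```

The derived F, S, and R-Sq agree with the values printed in Algorithm 1 to the reported precision, which is a quick way to check a transcribed ANOVA table before moving on to the Tukey comparison.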

4.1. Individuals Who Have Family Relationships. To find out whether the three methods differ, we used ANOVA and comparison tests. In the one-way ANOVA test there is only one independent variable; in this case the independent variable is individuals who have family relationships. The summary of the analysis of variance in Algorithm 1 gives a P value ≤ 0.001. At the significance level α = 0.05, p < α, which means H0 is rejected, so it can be concluded that there is a difference between the three methods. To determine which method is better than the others, the methods are further compared using the Tukey method.

In Algorithm 2 it can be seen that

(i) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(ii) GFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(iii) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

It can then be concluded that TFN < GFN < SCGFN. From the similarity results of the three methods and the boxplot in Figure 7, it can be seen that the SCGFN method used in this research has higher similarity values than the GFN and TFN methods.
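The reading of the Tukey intervals used above can be written down as a small rule. The sketch below is only an illustration of that rule: the function name is hypothetical, and the interval bounds are the lower and upper limits reported in Algorithm 2, where each interval is for "second method minus first method".

```python
def compare_from_interval(lower, upper):
    """Classify a Tukey simultaneous confidence interval for (B minus A).

    Returns "<" if B is significantly lower than A (interval entirely below
    zero), ">" if B is significantly higher (entirely above zero), and "="
    if the interval contains zero (no significant difference).
    """
    if upper < 0:
        return "<"
    if lower > 0:
        return ">"
    return "="


# Bounds as reported in Algorithm 2.
tfn_vs_scgfn = compare_from_interval(-0.49007, -0.34033)  # TFN minus SCGFN
gfn_vs_scgfn = compare_from_interval(-0.20433, -0.05458)  # GFN minus SCGFN
gfn_vs_tfn = compare_from_interval(0.21087, 0.36062)      # GFN minus TFN
print("TFN", tfn_vs_scgfn, "SCGFN;", "GFN", gfn_vs_scgfn, "SCGFN;", "GFN", gfn_vs_tfn, "TFN")
```

The same rule covers the tie case that appears later for tribal populations: an interval such as (-0.0738, 0.0023), reported for GFN minus SCGFN in population B, contains zero and is therefore read as GFN = SCGFN.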

Source  DF  SS       MS       F      P
Factor   2  0.90267  0.45133  99.19  0.0005
Error   27  0.12286  0.00455
Total   29  1.02553

S = 0.06746   R-Sq = 88.02%   R-Sq(adj) = 87.13%

Algorithm 1: ANOVA test results for individuals who have family relationships.

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons
Individual confidence level = 98.04%

SCGFN subtracted from:
      Lower     Center    Upper
TFN   -0.49007  -0.41520  -0.34033
GFN   -0.20433  -0.12945  -0.05458

TFN subtracted from:
      Lower     Center    Upper
GFN    0.21087   0.28575   0.36062

Algorithm 2: Comparison test with the Tukey method.

Figure 7: Boxplot of SCGFN, TFN, and GFN for people who have family relationships.

4.2. Individuals Who Belong to the Same Tribe. The summary of the analysis of variance in Algorithm 3 gives a P value ≤ 0.001. At the significance level α = 0.05 (confidence level 95%), p < α, which means H0 is rejected, so it can be concluded that there is a difference between the three methods. To determine which method is better than the others, the methods are further compared using the Tukey method.

In Algorithm 4 it can be seen that

(i) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(ii) GFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(iii) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

It can then be concluded that TFN < GFN < SCGFN. From the similarity results of the three methods and the boxplot in Figure 8, it can be seen that the SCGFN method used in this research has higher similarity values than the GFN and TFN methods.

Figure 8: Boxplot of SCGFN, TFN, and GFN for individuals who belong to the same tribe.

Source  DF  SS      MS      F      P
Factor   2  1.1783  0.5891  34.24  0.0005
Error   27  0.4646  0.0172
Total   29  1.6429

S = 0.1312   R-Sq = 71.72%   R-Sq(adj) = 69.63%

Algorithm 3: ANOVA test results for individuals of the same tribe.

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons
Individual confidence level = 98.04%

SCGFN subtracted from:
      Lower    Center   Upper
TFN   -0.6267  -0.4811  -0.3355
GFN   -0.3300  -0.1844  -0.0388

TFN subtracted from:
      Lower    Center   Upper
GFN    0.1511   0.2967   0.4423

Algorithm 4: Comparison test with the Tukey method.

4.3. Certain Tribal Populations. Testing was done with 720 data points, that is, 240 data records with three methods. The summaries of the analysis of variance for A, B, C, and D in Algorithm 5 each give a P value ≤ 0.001. At the significance level α = 0.05, p < α, which means H0 is rejected, so it can be concluded that there is a difference between the three methods.

The experiment was conducted with three methods, where SCGFN is method 1, TFN is method 2, and GFN is method 3. To determine which method is better than the others, each tribe is tested further.

From Algorithm 6, the comparison of the three methods in tribal population A can be explained as follows:

(1) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(2) GFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(3) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

So it can be concluded that TFN < GFN < SCGFN, which means SCGFN is better than GFN and GFN is better than TFN.

From Algorithm 7, the comparison of the three methods in tribal population B can be explained as follows:

(1) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(2) GFN = SCGFN, because the confidence interval contains zero;

(3) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

So it can be concluded that TFN < (GFN = SCGFN), which means that the SCGFN and GFN methods are equal and both are better than TFN.

From Algorithm 8, the comparison of the three methods in tribal population C can be explained as follows:

(1) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(2) GFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(3) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

So it can be concluded that TFN < GFN < SCGFN, which means SCGFN is better than GFN and GFN is better than TFN.

From Algorithm 9, the comparison of the three methods in tribal population D can be explained as follows:

(1) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative;

(2) GFN = SCGFN, because the confidence interval contains zero;

(3) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

So it can be concluded that TFN < (GFN = SCGFN), which means that the SCGFN and GFN methods are equal and both are better than TFN.

Analysis of Variance for A, using Adjusted SS for Tests

Source   DF  Seq SS  Adj SS  Adj MS  F       P
Method    2  7.6061  7.6061  3.8030  438.56  0.0005
Error   251  2.1766  2.1766  0.0087
Total   253  9.7827

S = 0.0931218   R-Sq = 77.75%   R-Sq(adj) = 77.57%

Analysis of Variance for B, using Adjusted SS for Tests

Source   DF  Seq SS   Adj SS  Adj MS  F       P
Method    2  7.9691   7.9691  3.9845  357.37  0.0005
Error   251  2.7986   2.7986  0.0111
Total   253  10.7677

S = 0.105592   R-Sq = 74.01%   R-Sq(adj) = 73.80%

Analysis of Variance for C, using Adjusted SS for Tests

Source   DF  Seq SS   Adj SS  Adj MS  F       P
Method    2  8.7934   8.7934  4.3967  444.01  0.0005
Error   251  2.4855   2.4855  0.0099
Total   253  11.2788

S = 0.0995100   R-Sq = 77.96%   R-Sq(adj) = 77.79%

Analysis of Variance for D, using Adjusted SS for Tests

Source   DF  Seq SS   Adj SS  Adj MS  F       P
Method    2  7.6350   7.6350  3.8175  328.52  0.0005
Error   251  2.9167   2.9167  0.0116
Total   253  10.5517

S = 0.107798   R-Sq = 72.36%   R-Sq(adj) = 72.14%

Algorithm 5: ANOVA test results for a particular tribe.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable A
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method  Lower    Center   Upper
2       -0.4276  -0.3940  -0.3605
3       -0.0957  -0.0622  -0.0286

Method = 2 subtracted from:
Method  Lower    Center   Upper
3        0.2984   0.3319   0.3653

Algorithm 6: ANOVA test results on tribal population A.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable B
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method  Lower    Center   Upper
2       -0.4301  -0.3921  -0.3541
3       -0.0738  -0.0358   0.0023

Method = 2 subtracted from:
Method  Lower    Center   Upper
3        0.3184   0.3563   0.3942

Algorithm 7: ANOVA test results on tribal population B.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable C
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method  Lower    Center   Upper
2       -0.4482  -0.4123  -0.3765
3       -0.0745  -0.0386  -0.0028

Method = 2 subtracted from:
Method  Lower    Center   Upper
3        0.3380   0.3737   0.4094

Algorithm 8: ANOVA test results on tribal population C.

5. Conclusions

In this research, experiments were conducted to find a better method for obtaining higher individual similarity values and stronger tribal properties. To improve the capability of locus matching, SCGFN and GFN have been proposed. The method computes a fuzzy number similarity between the sizes of two alleles. Experiments were done for the following cases: people with family relationships, people of the same tribe, and certain tribal populations. In these three cases, ANOVA shows a difference in similarity between SCGFN, GFN, and TFN at a confidence level of 95%. In the case of people with family relationships and the case of people of the same tribe, the Tukey method in ANOVA shows that SCGFN yields a higher similarity, which means it is better than the GFN and TFN methods. In the case of certain tribal populations, the Tukey method in ANOVA shows that in population A and population C SCGFN is better than GFN and TFN, whereas in population B and population D SCGFN is equal to GFN and better than TFN. The proposed method enables a CODIS STR-DNA similarity calculation that is more robust to noise and performs better in similarity calculations involving familial and tribal relationships.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable D
All Pairwise Comparisons among Levels of Method

Method = 1 subtracted from:
Method  Lower    Center   Upper
2       -0.4176  -0.3787  -0.3399
3       -0.0624  -0.0236   0.0152

Method = 2 subtracted from:
Method  Lower    Center   Upper
3        0.3164   0.3551   0.3938

Algorithm 9: ANOVA test results on tribal population D.

Data Availability

The STR-DNA data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This paper is sponsored by the Indexed Scientific Publication for Student Doctor's Final Project (Hibah Doktor) 2018 of Universitas Indonesia [0588SKRUI2018].

References

[1] S. K. Sheppard, D. S. Guttman, and J. R. Fitzgerald, "Population genomics of bacterial host adaptation," Nature Reviews Genetics.

[2] Emery and Rimoin's Principles and Practice of Medical Genetics, Elsevier, 2013.

[3] A. Neil and Campbell, Campbell Reece, Pearson Benjamin Cummings, 8th edition, 2008.

[4] M. S. H. Abu Halima, L. P. Bernal, and F. A. Sharif, "Genetic variation of 15 autosomal short tandem repeat (STR) loci in the Palestinian population of Gaza Strip," Legal Medicine, vol. 11, no. 4, pp. 203-204, 2009.

[5] M. G. Patel, Jignal Shaikh, and D. Marjadi, "Forensic Conception DNA Typing of FTA Spotted Samples," Journal of Applied Biology Biotechnology, vol. 2, no. 04, pp. 021–029, 2014.

[6] S. J. Venables, R. Daniel, S. D. Sarre et al., "Allele frequency data for 15 autosomal STR loci in eight Indonesian subpopulations," Forensic Science International: Genetics, vol. 20, pp. 45–52, 2016.

[7] Khaleda Parven, Forensic use of DNA information: human rights, privacy and other challenges, University of Wollongong Thesis Collections, 2012.

[8] B. Derbyshire, Sharing DNA Profiles and Finger Prints Across the EU requires further safeguards, GeneWatch UK, 2015.

[9] A. Holobinko, "Theoretical and Methodological Approaches to Understanding Human Migration Patterns and their Utility in Forensic Human Identification Cases," Societies, vol. 2, no. 2, pp. 42–62, 2012.

[10] D. Ricke, A. Shcherbina, N. Chiu et al., "Sherlock's Toolkit: A forensic DNA analysis system," in Proceedings of the IEEE International Symposium on Technologies for Homeland Security, HST 2015, USA, April 2015.

[11] A. L. Lowe, A. Urquhart, L. A. Foreman, and I. W. Evett, "Inferring ethnic origin by means of an STR profile," Forensic Science International, vol. 119, no. 1, pp. 17–22, 2001.

[12] M. Rahmat Widyanto, "Various defuzzification methods on DNA similarity matching using fuzzy inference system," Journal of Advanced Computational Intelligence and Intelligent Informatics, vol. 14, no. 3, 2010.

[13] R. N. Hartono, M. R. Widyanto, and N. Soedarsono, "Fuzzy logic system for DNA profile matching with embedded ethnic inference," in Proceedings of the 2010 2nd International Conference on Advances in Computing, Control and Telecommunication Technologies, ACT 2010, pp. 69–73, Indonesia, December 2010.

[14] M. R. Widyanto, R. N. Hartono, and N. Soedarsono, "A Novel Human STR Similarity Method using Cascade Statistical Fuzzy Rules with Tribal Information Inference," International Journal of Electrical and Computer Engineering (IJECE), vol. 6, no. 6, p. 3103, 2016.

[15] L. A. Zadeh, "Fuzzy sets," Information and Computation, vol. 8, pp. 338–353, 1965.

[16] S. Iwamoto, K. Tsurusaki, and T. Fujita, "Conditional decision-making in fuzzy environment," Journal of the Operations Research Society of Japan, vol. 42, no. 2, pp. 198–218, 1999.

[17] Liyuan Zhang, Xuanhua Xu, and Li Tao, "Some Similarity Measures for Triangular Fuzzy Number and Their Applications in Multiple Criteria Group Decision-Making," Journal of Applied Mathematics, vol. 2013, pp. 1–7, 2013.

[18] K. Kundu, "Image Denoising using Patch based Processing with Fuzzy Gaussian Membership Function," International Journal of Computer Applications, vol. 118, no. 12, pp. 0975–8887, 2015, Department of Computer Science & Engineering, Govt. College of Engineering & Textile Technology, Serampore, Hooghl.

[19] H. A. Hefny, H. M. Elsayed, and H. F. Aly, "Fuzzy multi-criteria decision making model for different scenarios of electrical power generation in Egypt," Egyptian Informatics Journal, vol. 14, no. 2, pp. 125–133, 2013.

[20] B. Akshay, B. Akash, and P. Avril, "Convolution and application of convolution," International Journal of Innovative Research In Technology, vol. 6, pp. 2349–6002, 2014.

[21] P. A. Bromiley, Products and Convolutions of Gaussian Probability Density Function, vol. 14, Imaging Sciences Research Group, Institute of Population Health, School of Medicine, University of Manchester, Manchester, 2014.


Page 7: Gaussian Fuzzy Number for STR-DNA Similarity Calculation ...downloads.hindawi.com/journals/abi/2018/8602513.pdf · Gaussian Fuzzy Number for STR-DNA Similarity Calculation Involving

Advances in Bioinformatics 7

Table 3 Example results of two individuals in the same tribe

No Individu1 Id Tribe Individu 2 Id Tribe SCGFN TFN GFN1 MT097 C MT104 C 0944695 085 09115842 TRJ19 D TRJ22 D 0934938 06 08475233 JT11 A JT13 A 0903104 06 07461724 MW132 B MW135 B 0869267 0216667 06943635 JT11 A JT12 A 0856067 0333333 06690716 JT11 A JT17 A 0890797 0233333 06096167 MT021 A MT028 A 0924174 0333333 06898988 MT019 B MT036 B 0873376 0283333 06070319 MW128 B MW123 B 0886405 0366667 067186210 MT017 C MT018 C 0878135 0333333 066973

Table 4 The result of similarity values with a certain tribal population

Tribe SCGFN TFN GFNA 079 045 0714639375B 081 046 07615544C 08071442 044999925 073734225D 0846345667 064 0794088667

Figure 6 Examples of tribal population certain

Gaussian The average D population of 20 experiments were84 with fuzzy convolution 64 with fuzzy triangular and79 with fuzzy Gaussian

4 Analysis of Statistical and Comparison Tests

To perform analysis of the test results that have been donestatistical tests were used To know the difference of averagevalue of STR-DNA similarity value of the three methodsused in this study ANOVA and comparative test were usedInterpretation of ANOVA test is that if the test results showthat H0 failed to be rejected (no difference) then post hoctest is not done Conversely if the test results indicate H0 isrejected (there is a difference) then a post hoc advanced testshould be performed To give a clear explanationwhy SCGFNis better than GFN and TFN analysis of statistical andcomparison test with Tukey method in ANOVA is providedStatistical tests were performed using Minitab 16 Statisticaltests were performed for 3 cases

41 Individuals Who Have Family Relationships To find outthe different methods used in this method we used ANOVAand comparative tests In the one-way ANOVA test there isonly one independent variable for this case as independent

variables are individuals who have family relationships Thesummary of variance analysis in Algorithm 1 was obtainedP value le 0001 Associated with the level of significance(120572) = 005 obtained p lt 120572 means H0 is rejected so it canbe concluded that there is a difference between the threemethods To determinewhichmethod is better than the othermethod it is further tested by the Tukey method

In Algorithm 2, it can be seen that

(i) TFN < SCGFN, because the interval does not contain zero and its center is negative;

(ii) GFN < SCGFN, because the interval does not contain zero and its center is negative;

(iii) GFN > TFN, because the interval does not contain zero and its center is positive.

Then it can be concluded that TFN < GFN < SCGFN. From the results of the similarity test with the three methods and the boxplot in Figure 7, it can be seen that the SCGFN method used in this research yields higher similarity values than the GFN and TFN methods.
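The reading rule used above (an interval entirely below zero, entirely above zero, or containing zero) can be sketched as a small helper. This is an illustrative sketch, not the authors' code; the function name `compare_from_interval` is hypothetical, and the bounds are the ones printed in Algorithm 2.

```python
def compare_from_interval(lower, upper):
    """Classify a Tukey confidence interval for the difference mean(B) - mean(A).

    Returns "<" if B is significantly smaller than A, ">" if it is
    significantly larger, and "=" if the interval contains zero.
    """
    if upper < 0:
        return "<"
    if lower > 0:
        return ">"
    return "="

# (B, A, lower, upper): interval for mean(B) - mean(A), as printed in Algorithm 2.
intervals = [
    ("TFN", "SCGFN", -0.49007, -0.34033),
    ("GFN", "SCGFN", -0.20433, -0.05458),
    ("GFN", "TFN", 0.21087, 0.36062),
]

results = [f"{b} {compare_from_interval(lo, hi)} {a}" for b, a, lo, hi in intervals]
print(results)
```

The three pairwise verdicts combine into the ordering TFN < GFN < SCGFN stated in the text.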

    Source   DF   SS        MS        F       P
    Factor    2   0.90267   0.45133   99.19   0.0005
    Error    27   0.12286   0.00455
    Total    29   1.02553

    S = 0.06746   R-Sq = 88.02%   R-Sq(adj) = 87.13%

Algorithm 1: ANOVA individual test results of those who have family relationship.

    Tukey 95% Simultaneous Confidence Intervals
    All Pairwise Comparisons
    Individual confidence level = 98.04%

    SCGFN subtracted from:
           Lower      Center     Upper
    TFN   -0.49007   -0.41520   -0.34033
    GFN   -0.20433   -0.12945   -0.05458

    TFN subtracted from:
           Lower      Center     Upper
    GFN    0.21087    0.28575    0.36062

Algorithm 2: Comparison test with Tukey method.

Figure 7: Boxplot of SCGFN, TFN, and GFN for people who have family relationships.

4.2. Individuals Who Belong to the Same Tribe. The summary of the variance analysis in Algorithm 3 gives a P value ≤ 0.001. Compared with the significance level α = 0.05 (a 95% confidence level), p < α, so H0 is rejected, and it can be concluded that there is a difference between the three methods. To determine which method is better than the others, the Tukey method is applied as a further test.

In Algorithm 4, it can be seen that

(i) TFN < SCGFN, because the interval does not contain zero and its center is negative;

(ii) GFN < SCGFN, because the interval does not contain zero and its center is negative;

(iii) GFN > TFN, because the interval does not contain zero and its center is positive.

Figure 8: Boxplot of SCGFN, TFN, and GFN for individuals who belong to the same tribe.

Then it can be concluded that TFN < GFN < SCGFN. From the results of the similarity test with the three methods and the boxplot in Figure 8, it can be seen that the SCGFN method used in this research yields higher similarity values than the GFN and TFN methods.
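The ANOVA and Tukey figures for this case can be approximately reproduced from the ten same-tribe pairs listed in Table 3. The following is a minimal sketch under stated assumptions, not the Minitab procedure itself: the decimal values are read from Table 3, the helper names `one_way_anova` and `tukey_interval` are hypothetical, and the studentized range quantile `Q_CRIT = 3.51` (3 groups, 27 error degrees of freedom, 95% level) is an approximate table value assumed here rather than taken from the paper.

```python
import math

# Similarity values of the ten same-tribe pairs, as read from Table 3.
scgfn = [0.944695, 0.934938, 0.903104, 0.869267, 0.856067,
         0.890797, 0.924174, 0.873376, 0.886405, 0.878135]
tfn = [0.85, 0.6, 0.6, 0.216667, 0.333333,
       0.233333, 0.333333, 0.283333, 0.366667, 0.333333]
gfn = [0.911584, 0.847523, 0.746172, 0.694363, 0.669071,
       0.609616, 0.689898, 0.607031, 0.671862, 0.66973]

def mean(xs):
    return sum(xs) / len(xs)

def one_way_anova(groups):
    """Return (ss_factor, ss_error, f_stat, ms_error) for a one-way layout."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    # Between-group and within-group sums of squares.
    ss_factor = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_factor = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_error = ss_error / df_error
    f_stat = (ss_factor / df_factor) / ms_error
    return ss_factor, ss_error, f_stat, ms_error

ss_factor, ss_error, f_stat, ms_error = one_way_anova([scgfn, tfn, gfn])

# Assumed approximate studentized range quantile q(0.05; k = 3, df = 27).
Q_CRIT = 3.51
n = len(scgfn)  # all three groups have the same size
half_width = Q_CRIT * math.sqrt(ms_error / n)

def tukey_interval(b, a):
    """Interval (lower, center, upper) for mean(b) - mean(a), i.e. 'a subtracted from b'."""
    center = mean(b) - mean(a)
    return center - half_width, center, center + half_width

print(f"SS factor = {ss_factor:.4f}, SS error = {ss_error:.4f}, F = {f_stat:.2f}")
print("TFN - SCGFN:", [round(v, 4) for v in tukey_interval(tfn, scgfn)])
print("GFN - SCGFN:", [round(v, 4) for v in tukey_interval(gfn, scgfn)])
print("GFN - TFN:  ", [round(v, 4) for v in tukey_interval(gfn, tfn)])
```

Under these assumptions the sums of squares, the F value, and the interval centers should agree with Algorithms 3 and 4 to the printed precision; the interval bounds may differ slightly because the quantile is rounded.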

    Source   DF   SS       MS       F       P
    Factor    2   1.1783   0.5891   34.24   0.0005
    Error    27   0.4646   0.0172
    Total    29   1.6429

    S = 0.1312   R-Sq = 71.72%   R-Sq(adj) = 69.63%

Algorithm 3: Individual test results with the same tribe using ANOVA.

    Tukey 95% Simultaneous Confidence Intervals
    All Pairwise Comparisons
    Individual confidence level = 98.04%

    SCGFN subtracted from:
           Lower     Center    Upper
    TFN   -0.6267   -0.4811   -0.3355
    GFN   -0.3300   -0.1844   -0.0388

    TFN subtracted from:
           Lower     Center    Upper
    GFN    0.1511    0.2967    0.4423

Algorithm 4: Comparison test with Tukey method.

4.3. Certain Tribal Populations. Testing was done with 720 data, that is, 240 data with 3 methods. The summary of the variance analysis for A, B, C, and D in Algorithm 5 gives a P value ≤ 0.001. Compared with the significance level α = 0.05, p < α, so H0 is rejected, and it can be concluded that there is a difference between the three methods.

The experiment was conducted with three methods, where SCGFN is method 1, TFN is method 2, and GFN is method 3. To determine which method is better than the others, each tribe is tested further. In Algorithm 6, the comparison of the three methods in the ANOVA test results for tribal population A can be explained as follows:

(1) TFN < SCGFN, because the interval does not contain zero and its center is negative;

(2) GFN < SCGFN, because the interval does not contain zero and its center is negative;

(3) GFN > TFN, because the interval does not contain zero and its center is positive.

So it can be concluded that TFN < GFN < SCGFN, which means SCGFN is better than GFN and GFN is better than TFN.

In Algorithm 7, the comparison of the three methods in tribal population B can be explained as follows:

(1) TFN < SCGFN, because the interval does not contain zero and its center is negative;

(2) GFN = SCGFN, because the interval contains zero;

(3) GFN > TFN, because the interval does not contain zero and its center is positive.

So it can be concluded that TFN < (GFN = SCGFN), which means that the SCGFN and GFN methods are equal and both are better than TFN.

In Algorithm 8, the comparison of the three methods in tribal population C can be explained as follows:

(1) TFN < SCGFN, because the interval does not contain zero and its center is negative;

(2) GFN < SCGFN, because the interval does not contain zero and its center is negative;

(3) GFN > TFN, because the interval does not contain zero and its center is positive.

So it can be concluded that TFN < GFN < SCGFN, which means SCGFN is better than GFN and GFN is better than TFN.

In Algorithm 9, the comparison of the three methods in tribal population D can be explained as follows:

(1) TFN < SCGFN, because the interval does not contain zero and its center is negative;

(2) GFN = SCGFN, because the interval contains zero;

(3) GFN > TFN, because the interval does not contain zero and its center is positive.

So it can be concluded that TFN < (GFN = SCGFN), which means that the SCGFN and GFN methods are equal and both are better than TFN.

    Analysis of Variance for A, using Adjusted SS for Tests
    Source    DF   Seq SS   Adj SS   Adj MS   F        P
    Metode     2   7.6061   7.6061   3.8030   438.56   0.0005
    Error    251   2.1766   2.1766   0.0087
    Total    253   9.7827
    S = 0.0931218   R-Sq = 77.75%   R-Sq(adj) = 77.57%

    Analysis of Variance for B, using Adjusted SS for Tests
    Source    DF   Seq SS    Adj SS   Adj MS   F        P
    Metode     2    7.9691   7.9691   3.9845   357.37   0.0005
    Error    251    2.7986   2.7986   0.0111
    Total    253   10.7677
    S = 0.105592   R-Sq = 74.01%   R-Sq(adj) = 73.80%

    Analysis of Variance for C, using Adjusted SS for Tests
    Source    DF   Seq SS    Adj SS   Adj MS   F        P
    Metode     2    8.7934   8.7934   4.3967   444.01   0.0005
    Error    251    2.4855   2.4855   0.0099
    Total    253   11.2788
    S = 0.0995100   R-Sq = 77.96%   R-Sq(adj) = 77.79%

    Analysis of Variance for D, using Adjusted SS for Tests
    Source    DF   Seq SS    Adj SS   Adj MS   F        P
    Metode     2    7.6350   7.6350   3.8175   328.52   0.0005
    Error    251    2.9167   2.9167   0.0116
    Total    253   10.5517
    S = 0.107798   R-Sq = 72.36%   R-Sq(adj) = 72.14%

Algorithm 5: ANOVA test results for a particular tribe.

    Tukey 95.0% Simultaneous Confidence Intervals
    Response Variable A
    All Pairwise Comparisons among Levels of Metode

    Metode = 1 subtracted from:
    Metode   Lower     Center    Upper
    2       -0.4276   -0.3940   -0.3605
    3       -0.0957   -0.0622   -0.0286

    Metode = 2 subtracted from:
    Metode   Lower     Center    Upper
    3        0.2984    0.3319    0.3653

Algorithm 6: ANOVA test results on tribal population A.

    Tukey 95.0% Simultaneous Confidence Intervals
    Response Variable B
    All Pairwise Comparisons among Levels of Metode

    Method = 1 subtracted from:
    Method   Lower     Center    Upper
    2       -0.4301   -0.3921   -0.3541
    3       -0.0738   -0.0358    0.0023

    Method = 2 subtracted from:
    Method   Lower     Center    Upper
    3        0.3184    0.3563    0.3942

Algorithm 7: ANOVA test results on tribal population B.

    Tukey 95.0% Simultaneous Confidence Intervals
    Response Variable C
    All Pairwise Comparisons among Levels of Metode

    Method = 1 subtracted from:
    Method   Lower     Center    Upper
    2       -0.4482   -0.4123   -0.3765
    3       -0.0745   -0.0386   -0.0028

    Method = 2 subtracted from:
    Method   Lower     Center    Upper
    3        0.3380    0.3737    0.4094

Algorithm 8: ANOVA test results on tribal population C.
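The four per-tribe conclusions can be re-derived mechanically from the interval bounds printed in Algorithms 6 to 9. The sketch below is illustrative only; the function name `rank_methods` and the data layout are hypothetical, and it only distinguishes the two patterns that occur in this section.

```python
# Tukey interval bounds (lower, upper) per tribe, as printed in Algorithms 6 to 9.
# "TFN-SCGFN" is method 2 minus method 1, "GFN-SCGFN" is method 3 minus method 1,
# and "GFN-TFN" is method 3 minus method 2.
tukey_bounds = {
    "A": {"TFN-SCGFN": (-0.4276, -0.3605), "GFN-SCGFN": (-0.0957, -0.0286), "GFN-TFN": (0.2984, 0.3653)},
    "B": {"TFN-SCGFN": (-0.4301, -0.3541), "GFN-SCGFN": (-0.0738, 0.0023), "GFN-TFN": (0.3184, 0.3942)},
    "C": {"TFN-SCGFN": (-0.4482, -0.3765), "GFN-SCGFN": (-0.0745, -0.0028), "GFN-TFN": (0.3380, 0.4094)},
    "D": {"TFN-SCGFN": (-0.4176, -0.3399), "GFN-SCGFN": (-0.0624, 0.0152), "GFN-TFN": (0.3164, 0.3938)},
}

def contains_zero(bounds):
    lower, upper = bounds
    return lower <= 0.0 <= upper

def rank_methods(bounds):
    """Summarize one tribe; only handles the case where TFN is significantly lowest."""
    tfn_below_scgfn = bounds["TFN-SCGFN"][1] < 0
    gfn_above_tfn = bounds["GFN-TFN"][0] > 0
    if not (tfn_below_scgfn and gfn_above_tfn):
        raise ValueError("pattern not covered by this sketch")
    if contains_zero(bounds["GFN-SCGFN"]):
        return "TFN < (GFN = SCGFN)"
    if bounds["GFN-SCGFN"][1] < 0:
        return "TFN < GFN < SCGFN"
    raise ValueError("pattern not covered by this sketch")

summary = {tribe: rank_methods(b) for tribe, b in tukey_bounds.items()}
for tribe, verdict in summary.items():
    print(tribe, verdict)
```

Populations A and C give the strict ordering, while in B and D the GFN minus SCGFN interval straddles zero, which is why those two methods are reported as equal there.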

5. Conclusions

In this research, experiments were conducted to find a better method for obtaining higher individual similarity values and stronger tribal properties. To improve the capability of locus matching, SCGFN and GFN have been proposed. The method computes the fuzzy number similarity between the sizes of two alleles. Experiments were done for the following cases: people with family relationships, people of the same tribe, and certain tribal populations. In these three cases, ANOVA shows a difference in similarity between SCGFN, GFN, and TFN at a confidence level of 95%. In the case of people with family relationships and the case of people of the same tribe, the Tukey method in ANOVA shows that SCGFN yields a higher similarity, which means it is better than the GFN and TFN methods. In the case of certain tribal populations, the Tukey method in ANOVA shows that, in populations A and C, SCGFN is better than GFN and TFN, whereas in populations B and D, SCGFN is equal to GFN and better than TFN. The proposed method enables a CODIS STR-DNA similarity calculation that is more robust to noise and performs better for similarity calculations involving familial and tribal relationships.

    Tukey 95.0% Simultaneous Confidence Intervals
    Response Variable D
    All Pairwise Comparisons among Levels of Metode

    Metode = 1 subtracted from:
    Metode   Lower     Center    Upper
    2       -0.4176   -0.3787   -0.3399
    3       -0.0624   -0.0236    0.0152

    Metode = 2 subtracted from:
    Metode   Lower     Center    Upper
    3        0.3164    0.3551    0.3938

Algorithm 9: ANOVA test results on tribal population D.

Data Availability

The STR-DNA data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This paper is sponsored by the Indexed Scientific Publication for Student Doctor's Final Project (Hibah Doktor) 2018 of Universitas Indonesia [0588SKRUI2018].


Page 8: Gaussian Fuzzy Number for STR-DNA Similarity Calculation ...downloads.hindawi.com/journals/abi/2018/8602513.pdf · Gaussian Fuzzy Number for STR-DNA Similarity Calculation Involving

8 Advances in Bioinformatics

Source DF SS MS F P

Factor 2 090267 045133 9919 00005

Error 27 012286 000455

Total 29 102553

S = 006746 R-Sq = 8802 R-Sq(adj) = 8713

Algorithm 1 ANOVA individual test results of those who have family relationship

Tukey 95 Simultaneous Confidence Intervals

All Pairwise Comparisons

Individual confidence level = 9804

SCGFN subtracted from

Lower Center Upper +---------+---------+---------+---------

TFN -049007 -041520 -034033 (-- --)

GFN -020433 -012945 -005458 (-- --)

+---------+---------+---------+---------

-050 -025 000 025

TFN subtracted from

Lower Center Upper +---------+---------+---------+---------

GFN 021087 028575 036062 (-- --)

+---------+---------+---------+---------

-050 -025 000 025

Algorithm 2 Comparison test with Tukey method

GFNTFNSCGFN

1009080706050403

Dat

a

Boxplot of SCGFN TFN GFN

Figure 7 Boxplot people who have family relationships

0001 Associated with the level of significance (120572) = 005 orconfidence level 95 obtained p lt 120572 means H0 is rejectedso it can be concluded that there is a difference between thethreemethods To determine whichmethod is better than theother method it is further tested by the Tukey method

In Algorithm 4 it can be seen that

(i) TFN lt SCGFN because it does not contain zero andcenter negative

(ii) GFN lt SCGFN because it does not contain zero andcenter negative

(iii) GFN gt TFN because it does not load zero and centerpositive

GFNTFNSCGFN

100908070605040302

Dat

a

Boxplot of SCGFN TFN GFN

Figure 8 Boxplot Individuals who belong to the same tribeComparison test with Tukey method

Then it can be concluded that TFN lt GFN lt SCGFN Fromthe result of similarity test with three methods and boxplotobtained in Figure 8 it can be seen that SCGFNmethod usedin this research has higher similarity value in comparisonwith GFN and TFN method

43 Certain Tribal Populations Testing is donewith 720 datathat is 240 data with 3 methods The summary of varianceanalysis for A B C and D in Algorithm 5 was obtainedP value le 0001 Associated with the level of significance

Advances in Bioinformatics 9

Source DF SS MS F P

Factor 2 11783 05891 3424 00005

Error 27 04646 00172

Total 29 16429

S = 01312 R-Sq = 7172 R-Sq(adj) = 6963

Algorithm 3 Individual test results with the same tribe using ANOVA

Tukey 95 Simultaneous Confidence Intervals

All Pairwise Comparisons

Individual confidence level = 9804

SCGFN subtracted from

Lower Center Upper -+---------+---------+---------+--------

TFN -06267 -04811 -03355 (---- ----)

GFN -03300 -01844 -00388 (---- ----)

-+---------+---------+---------+--------

-060 -030 000 030

TFN subtracted from

Lower Center Upper -+---------+---------+---------+--------

GFN 01511 02967 04423 (---- ----)

-+---------+---------+---------+--------

-060 -030 000 030

Algorithm 4 Comparison test with Tukey method

(120572) = 005 obtained p lt 120572 means H0 is rejected so it canbe concluded that there is a difference between the threemethods

The experiment was conducted with 3 methods wherethe method of SCGFN is method 1 TFN method is method2 and GFN is method 3 To determine which method isbetter than other method then each tribe is tested furtherIn Algorithm 6 the comparison of three methods in ANOVAtest results in tribal population A can be explained

(1) TFN lt SCGFN because it does not load zero andcenter negative

(2) GFN lt SCGFN because it does not load zero andcenter negative

(3) GFN gt TFN because it does not contain zero andpositive center

So it can be concluded that TFN lt GFN lt SCGFN whichmeans SCGFN is better than GFN and GFN is better thanTFN

In Algorithm 7 the comparison of three methods in thetribal population B can be explained

(1) TFN lt SCGFN because it does not load zero andcenter negative

(2) GFN = SCGFN because it loads zero

(3) GFN gt TFN because it does not contain zero andpositive center

So it can be concluded that TFN lt (GFN = SCGFN) whichmeans that SCGFN and GFN methods are equal and betterthan TFN

In Algorithm 8 the comparison of 3 methods in tribalpopulation C can be explained

(1) TFN lt SCGFN because it does not load zero andcenter negative

(2) GFN lt SCGFN because it does not load zero andcenter negative

(3) GFN gt TFN because it does not contain zero andpositive center

So it can be concluded that TFN lt GFN lt SCGFN whichmeans SCGFN is better than GFN and GFN is better thanTFN

In Algorithm 9 the comparison of three methods in thetribal population D can be explained

(1) TFN lt SCGFN because it does not load zero andcenter negative

(2) GFN = SCGFN because it loads zero

10 Advances in Bioinformatics

Analysis of Variance for A using Adjusted SS for Tests

Source DF Seq SS Adj SS Adj MS F P

Metode 2 76061 76061 38030 43856 00005

Error 251 21766 21766 00087

Total 253 97827

S = 00931218 R-Sq = 7775 R-Sq(adj) = 7757

Analysis of Variance for B using Adjusted SS for Tests

Source DF Seq SS Adj SS Adj MS F P

Metode 2 79691 79691 39845 35737 00005

Error 251 27986 27986 00111

Total 253 107677

S = 0105592 R-Sq = 7401 R-Sq(adj) = 7380

Analysis of Variance for C using Adjusted SS for Tests

Source DF Seq SS Adj SS Adj MS F P

Metode 2 87934 87934 43967 44401 00005

Error 251 24855 24855 00099

Total 253 112788

S = 00995100 R-Sq = 7796 R-Sq(adj) = 7779

Analysis of Variance for D using Adjusted SS for Tests

Source DF Seq SS Adj SS Adj MS F P

Metode 2 76350 76350 38175 32852 00005

Error 251 29167 29167 00116

Total 253 105517

S = 0107798 R-Sq = 7236 R-Sq(adj) = 7214

Algorithm 5 ANOVA test results for a particular tribe

Tukey 950 Simultaneous Confidence Intervals

Response Variable A

All Pairwise Comparisons among Levels of Metode

Metode = 1 subtracted from

Metode Lower Center Upper -------+---------+---------+---------

2 -04276 -03940 -03605 ( -)

3 -00957 -00622 -00286 (- )

-------+---------+---------+---------

-025 000 025

Metode = 2 subtracted from

Metode Lower Center Upper -------+---------+---------+---------

3 02984 03319 03653 ( -)

-------+---------+---------+---------

-025 000 025

Algorithm 6 ANOVA test results on tribal population A

Advances in Bioinformatics 11

Tukey 950 Simultaneous Confidence Intervals

Response Variable B

All Pairwise Comparisons among Levels of Metode

Method = 1 subtracted from

Method Lower Center Upper -------+---------+---------+---------

2 -04301 -03921 -03541 ( -)

3 -00738 -00358 00023 (- )

-------+---------+---------+---------

-025 000 025

Method = 2 subtracted from

Method Lower Center Upper -------+---------+---------+---------

3 03184 03563 03942 ( -)

-------+---------+---------+---------

-025 000 025

Algorithm 7 ANOVA test results on tribal population B

Tukey 950 Simultaneous Confidence Intervals

Response Variable C

All Pairwise Comparisons among Levels of Metode

Method = 1 subtracted from

Method Lower Center Upper --------+---------+---------+--------

2 -04482 -04123 -03765 (- )

3 -00745 -00386 -00028 ( -)

--------+---------+---------+--------

-025 000 025

Method = 2 subtracted from

Method Lower Center Upper --------+---------+---------+--------

3 03380 03737 04094 ( )

--------+---------+---------+--------

-025 000 025

Algorithm 8 ANOVA test results on tribal population C

(3) GFN gt TFN because it does not contain zero andpositive center

So it can be concluded that TFN lt (GFN = SCGFN) whichmeans that SCGFN and GFN methods are equal and betterthan TFN

5 Conclusions

In this research the experiments were conducted to find abetter method to obtain higher individual similarity valuesand to find stronger tribal properties To improve the capabil-ity of locus matching SCGFN and GFN have been proposedIt performed fuzzy number similarity of the size between thetwo alleles Experiments were done for the following cases

people with family relationships people of the same tribeand certain tribal populations In these three cases ANOVAshows the difference in similarity between SCGFN GFN andTFNwith a significant level of 95 In the case of people withfamily relationship and the case of people of the same tribewith Tukey method in ANOVA shows that SCGFN yieldsa higher similarity which means better than the GFN andTFN methods While in the case of certain tribal populationwith Tukey method in ANOVA shows in population A andpopulation C SCGFN better than GFN and TFN whereas inpopulation B and population D SCGFN is equal to GFN andbetter than TFNThe proposedmethod enables CODIS STR-DNA similarity calculation which is more robust to noiseand performed better CODIS similarity calculation involvingfamilial and tribal relationships

12 Advances in Bioinformatics

Tukey 950 Simultaneous Confidence Intervals

Response Variable D

All Pairwise Comparisons among Levels of Metode

Metode = 1 subtracted from

Metode Lower Center Upper -------+---------+---------+---------

2 -04176 -03787 -03399 (- )

3 -00624 -00236 00152 ( -)

-------+---------+---------+---------

-025 000 025

Metode = 2 subtracted from

Metode Lower Center Upper -------+---------+---------+---------

3 03164 03551 03938 ( -)

-------+---------+---------+---------

-025 000 025

Algorithm 9 ANOVA test results on tribal population D

Data Availability

The STR-DNA data used to support the findings of this studyare available from the corresponding author upon request

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This paper is sponsored by the Indexed Scientific Publicationfor Student Doctorrsquos Final Project (Hibah Doktor) 2018 ofUniversitas Indonesia [0588SKRUI2018]References

[1] S K Sheppard D S Guttman and J R Fitzgerald ldquoPopulationgenomics of bacterial host adaptationrdquoNature ReviewsGenetics

[2] Emery and Rimoinrsquos Principles and Practice of Medical GeneticsElsevier 2013

[3] A Neil and Campbell Campbell Reece Pearson BenjaminCummings 8th edition 2008

[4] M S H Abu Halima L P Bernal and F A Sharif ldquoGeneticvariation of 15 autosomal short tandem repeat (STR) loci in thePalestinian population of Gaza Striprdquo LegalMedicine vol 11 no4 pp 203-204 2009

[5] M G Patel Jignal Shaikh and D Marjadi ldquoForensic Concep-tion DNA Typing of FTA Spotted Samplesrdquo Journal of AppliedBiology Biotechnology Vol vol 2 no 04 pp 021ndash029 2014

[6] S J Venables R Daniel S D Sarre et al ldquoAllele frequency datafor 15 autosomal STR loci in eight Indonesian subpopulationsrdquoForensic Science International Genetics vol 20 pp 45ndash52 2016

[7] Khaleda Parven Forensic use of DNA information human rightsprivacy and other challenges University of Wollongong ThesisCollections 7 Khaleda Parven Forensic use ofDNA informationhuman rights 2012

[8] B Derbyshire Sharing DNA Profiles and Finger Prints Across theEU requires further safeguards GeneWatch UK 2015

[9] A Holobinko ldquoTheoretical and Methodological Approaches toUnderstanding Human Migration Patterns and their Utility inForensic Human Identification Casesrdquo Societies vol 2 no 2 pp42ndash62 2012

[10] D Ricke A Shcherbina N Chiu et al ldquoSherlockrsquos ToolkitA forensic DNA analysis systemrdquo in Proceedings of the IEEEInternational SymposiumonTechnologies forHomeland SecurityHST 2015 USA April 2015

[11] A L Lowe A Urquhart L A Foreman and I W EvettldquoInferring ethnic origin by means of an STR profilerdquo ForensicScience International vol 119 no 1 pp 17ndash22 2001

[12] M Rahmat Widyanto ldquoVarious defuzzification methods onDNA similaritymatching suing fuzzy inference systemrdquo Journalof Advanced Computational Intelligence Intelligent Informaticsvol 14 no 3 2010

[13] R N Hartono M R Widyanto and N Soedarsono ldquoFuzzylogic system for DNA profile matching with embedded ethnicinferencerdquo in Proceedings of the 2010 2nd International Confer-ence onAdvances in Computing Control and TelecommunicationTechnologies ACT 2010 pp 69ndash73 Indonesia December 2010

[14] M R Widyanto R N Hartono and N Soedarsono ldquoA NovelHuman STR Similarity Method using Cascade Statistical FuzzyRules with Tribal Information Inferencerdquo International Journalof Electrical and Computer Engineering (IJECE) vol 6 no 6 p3103 2016

[15] L A Zadeh ldquoFuzzy setsrdquo Information and Computation vol 8pp 338ndash353 1965

[16] S Iwamoto K Tsurusaki and T Fujita ldquoConditional decision-making in fuzzy environmentrdquo Journal of the OperationsResearch Society of Japan vol 42 no 2 pp 198ndash218 1999

[17] Liyuan Zhang Xuanhua Xu and Li Tao ldquoSome Similarity Mea-sures for Triangular Fuzzy Number and Their Applications inMultiple Criteria Group Decision-Makingrdquo Journal of AppliedMathematics vol 2013 pp 1ndash7 2013

[18] K Kundu ldquoImage Denoising using Patch based Processing withFuzzy Gaussian Membership Functionrdquo International Journalof Computer Applications vol 118 no 12 pp 0975ndash8887 2015

Advances in Bioinformatics 13

Department of Computer Science amp Engineering Govt Collegeof Engineering amp Textile Technology Serampore Hooghl

[19] H A Hefny H M Elsayed and H F Aly ldquoFuzzy multi-criteriadecision making model for different scenarios of electricalpower generation in Egyptrdquo Egyptian Informatics Journal vol14 no 2 pp 125ndash133 2013

[20] B Akshay B Akash and P Avril ldquoConvolution and applicationof convolutionrdquo International Journal of Innovative Research InTechnology vol 6 pp 2349ndash6002 2014

[21] P A Bromiley Products and Convolutions of Gaussian Probabil-ity Density Function vol 14 Imaging Sciences Research GroupInstitute of Population Health School of Medicine Universityof Manchester Manchester 2014

Hindawiwwwhindawicom

International Journal of

Volume 2018

Zoology

Hindawiwwwhindawicom Volume 2018

Anatomy Research International

PeptidesInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Journal of Parasitology Research

GenomicsInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Neuroscience Journal

Hindawiwwwhindawicom Volume 2018

BioMed Research International

Cell BiologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

ArchaeaHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Genetics Research International

Hindawiwwwhindawicom Volume 2018

Advances in

Virolog y Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

International Journal of

MicrobiologyHindawiwwwhindawicom

Nucleic AcidsJournal of

Volume 2018

Submit your manuscripts atwwwhindawicom

Page 9: Gaussian Fuzzy Number for STR-DNA Similarity Calculation ...downloads.hindawi.com/journals/abi/2018/8602513.pdf · Gaussian Fuzzy Number for STR-DNA Similarity Calculation Involving

Advances in Bioinformatics 9

Source DF SS MS F P

Factor 2 11783 05891 3424 00005

Error 27 04646 00172

Total 29 16429

S = 01312 R-Sq = 7172 R-Sq(adj) = 6963

Algorithm 3 Individual test results with the same tribe using ANOVA

Tukey 95 Simultaneous Confidence Intervals

All Pairwise Comparisons

Individual confidence level = 9804

SCGFN subtracted from

Lower Center Upper -+---------+---------+---------+--------

TFN -06267 -04811 -03355 (---- ----)

GFN -03300 -01844 -00388 (---- ----)

-+---------+---------+---------+--------

-060 -030 000 030

TFN subtracted from

Lower Center Upper -+---------+---------+---------+--------

GFN 01511 02967 04423 (---- ----)

-+---------+---------+---------+--------

-060 -030 000 030

Algorithm 4 Comparison test with Tukey method

(120572) = 005 obtained p lt 120572 means H0 is rejected so it canbe concluded that there is a difference between the threemethods

The experiment was conducted with three methods: SCGFN is method 1, TFN is method 2, and GFN is method 3. To determine which method is better than the others, each tribe is tested further. In Algorithm 6, the comparison of the three methods in the ANOVA test results for tribal population A can be explained as follows:

(1) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative.

(2) GFN < SCGFN, because the confidence interval does not contain zero and its center is negative.

(3) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

So it can be concluded that TFN < GFN < SCGFN, which means that SCGFN is better than GFN and GFN is better than TFN.

In Algorithm 7, the comparison of the three methods in tribal population B can be explained as follows:

(1) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative.

(2) GFN = SCGFN, because the confidence interval contains zero.

(3) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

So it can be concluded that TFN < (GFN = SCGFN), which means that the SCGFN and GFN methods are equal and both are better than TFN.
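The reading rule applied to the Tukey intervals can be sketched in a few lines. This is a minimal illustration, not the authors' software: the function name `compare_methods` and the dictionary layout are assumptions, while the interval bounds are copied from the Tukey output for populations A and B. When an interval excludes zero, the sign of its center is already determined by its bounds, so only the bounds are checked.

```python
def compare_methods(lower, upper):
    """Interpret a Tukey interval for the difference (first - second)."""
    if upper < 0:
        return "<"   # interval entirely below zero: first scores lower
    if lower > 0:
        return ">"   # interval entirely above zero: first scores higher
    return "="       # interval contains zero: no significant difference


# (lower, upper) bounds; each key (x, y) means the interval is for x minus y.
intervals = {
    "A": {
        ("TFN", "SCGFN"): (-0.4276, -0.3605),
        ("GFN", "SCGFN"): (-0.0957, -0.0286),
        ("GFN", "TFN"): (0.2984, 0.3653),
    },
    "B": {
        ("TFN", "SCGFN"): (-0.4301, -0.3541),
        ("GFN", "SCGFN"): (-0.0738, 0.0023),
        ("GFN", "TFN"): (0.3184, 0.3942),
    },
}

results = {
    population: {pair: compare_methods(*bounds) for pair, bounds in pairs.items()}
    for population, pairs in intervals.items()
}

for population, pairs in results.items():
    for (first, second), sign in pairs.items():
        print(f"Population {population}: {first} {sign} {second}")
```

The same function applies unchanged to the intervals reported for populations C and D.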

In Algorithm 8, the comparison of the three methods in tribal population C can be explained as follows:

(1) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative.

(2) GFN < SCGFN, because the confidence interval does not contain zero and its center is negative.

(3) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

So it can be concluded that TFN < GFN < SCGFN, which means that SCGFN is better than GFN and GFN is better than TFN.

In Algorithm 9, the comparison of the three methods in tribal population D can be explained as follows:

(1) TFN < SCGFN, because the confidence interval does not contain zero and its center is negative.

(2) GFN = SCGFN, because the confidence interval contains zero.


Analysis of Variance for A, using Adjusted SS for Tests

Source   DF    Seq SS    Adj SS   Adj MS   F        P
Method     2   7.6061    7.6061   3.8030   438.56   0.0005
Error    251   2.1766    2.1766   0.0087
Total    253   9.7827

S = 0.0931218   R-Sq = 77.75%   R-Sq(adj) = 77.57%

Analysis of Variance for B, using Adjusted SS for Tests

Source   DF    Seq SS    Adj SS   Adj MS   F        P
Method     2   7.9691    7.9691   3.9845   357.37   0.0005
Error    251   2.7986    2.7986   0.0111
Total    253   10.7677

S = 0.105592   R-Sq = 74.01%   R-Sq(adj) = 73.80%

Analysis of Variance for C, using Adjusted SS for Tests

Source   DF    Seq SS    Adj SS   Adj MS   F        P
Method     2   8.7934    8.7934   4.3967   444.01   0.0005
Error    251   2.4855    2.4855   0.0099
Total    253   11.2788

S = 0.0995100   R-Sq = 77.96%   R-Sq(adj) = 77.79%

Analysis of Variance for D, using Adjusted SS for Tests

Source   DF    Seq SS    Adj SS   Adj MS   F        P
Method     2   7.6350    7.6350   3.8175   328.52   0.0005
Error    251   2.9167    2.9167   0.0116
Total    253   10.5517

S = 0.107798   R-Sq = 72.36%   R-Sq(adj) = 72.14%

Algorithm 5: ANOVA test results for a particular tribe.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable A
All Pairwise Comparisons among Levels of Method

Method = 1 (SCGFN) subtracted from:
Method     Lower     Center    Upper
2 (TFN)    -0.4276   -0.3940   -0.3605
3 (GFN)    -0.0957   -0.0622   -0.0286

Method = 2 (TFN) subtracted from:
Method     Lower     Center    Upper
3 (GFN)     0.2984    0.3319    0.3653

Algorithm 6: ANOVA test results on tribal population A.


Tukey 95.0% Simultaneous Confidence Intervals
Response Variable B
All Pairwise Comparisons among Levels of Method

Method = 1 (SCGFN) subtracted from:
Method     Lower     Center    Upper
2 (TFN)    -0.4301   -0.3921   -0.3541
3 (GFN)    -0.0738   -0.0358    0.0023

Method = 2 (TFN) subtracted from:
Method     Lower     Center    Upper
3 (GFN)     0.3184    0.3563    0.3942

Algorithm 7: ANOVA test results on tribal population B.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable C
All Pairwise Comparisons among Levels of Method

Method = 1 (SCGFN) subtracted from:
Method     Lower     Center    Upper
2 (TFN)    -0.4482   -0.4123   -0.3765
3 (GFN)    -0.0745   -0.0386   -0.0028

Method = 2 (TFN) subtracted from:
Method     Lower     Center    Upper
3 (GFN)     0.3380    0.3737    0.4094

Algorithm 8: ANOVA test results on tribal population C.

(3) GFN > TFN, because the confidence interval does not contain zero and its center is positive.

So it can be concluded that TFN < (GFN = SCGFN), which means that the SCGFN and GFN methods are equal and both are better than TFN.

5. Conclusions

In this research, experiments were conducted to find a better method for obtaining higher individual similarity values and stronger tribal properties. To improve the capability of locus matching, SCGFN and GFN have been proposed. They measure the fuzzy number similarity between the sizes of two alleles. Experiments were done for the following cases: people with family relationships, people of the same tribe, and certain tribal populations. In these three cases, ANOVA shows a difference in similarity between SCGFN, GFN, and TFN at the 95% confidence level. In the case of people with family relationships and the case of people of the same tribe, the Tukey method in ANOVA shows that SCGFN yields a higher similarity, which means that it is better than the GFN and TFN methods. In the case of certain tribal populations, the Tukey method in ANOVA shows that in populations A and C SCGFN is better than GFN and TFN, whereas in populations B and D SCGFN is equal to GFN and better than TFN. The proposed method enables a CODIS STR-DNA similarity calculation that is more robust to noise and performs better in similarity calculations involving familial and tribal relationships.
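To make the idea of fuzzy similarity between two allele values concrete, the following is a minimal sketch under stated assumptions, not the paper's exact formulation. It assumes each allele is a Gaussian fuzzy number with a shared spread `sigma`, and it takes the similarity to be the peak height of the min-intersection of the two membership functions. The names `gaussian_membership` and `intersection_height`, the value `sigma=0.5`, and the allele values are all hypothetical.

```python
import math


def gaussian_membership(x, center, sigma):
    """Membership degree of x in a Gaussian fuzzy number (assumed form)."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def intersection_height(center_a, center_b, sigma, steps=2001):
    """Peak of min(mu_a, mu_b), found by scanning a grid of x values."""
    lo = min(center_a, center_b) - 4.0 * sigma
    hi = max(center_a, center_b) + 4.0 * sigma
    best = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        overlap = min(
            gaussian_membership(x, center_a, sigma),
            gaussian_membership(x, center_b, sigma),
        )
        best = max(best, overlap)
    return best


# Hypothetical allele values: identical, one repeat apart, four repeats apart.
same = intersection_height(12.0, 12.0, sigma=0.5)
adjacent = intersection_height(12.0, 13.0, sigma=0.5)
distant = intersection_height(12.0, 16.0, sigma=0.5)
print(f"same={same:.4f} adjacent={adjacent:.4f} distant={distant:.6f}")
```

With equal spreads the peak lies at the midpoint of the two centers, so the similarity decays smoothly as the alleles move apart rather than dropping to zero at a fixed width as a triangular membership function would.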


Tukey 95.0% Simultaneous Confidence Intervals
Response Variable D
All Pairwise Comparisons among Levels of Method

Method = 1 (SCGFN) subtracted from:
Method     Lower     Center    Upper
2 (TFN)    -0.4176   -0.3787   -0.3399
3 (GFN)    -0.0624   -0.0236    0.0152

Method = 2 (TFN) subtracted from:
Method     Lower     Center    Upper
3 (GFN)     0.3164    0.3551    0.3938

Algorithm 9: ANOVA test results on tribal population D.

Data Availability

The STR-DNA data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This paper is sponsored by the Indexed Scientific Publication for Student Doctor's Final Project (Hibah Doktor) 2018 of Universitas Indonesia [0588SKRUI2018].

References

[1] S. K. Sheppard, D. S. Guttman, and J. R. Fitzgerald, "Population genomics of bacterial host adaptation," Nature Reviews Genetics.

[2] Emery and Rimoin's Principles and Practice of Medical Genetics, Elsevier, 2013.

[3] A. Neil and Campbell, Campbell Reece, Pearson Benjamin Cummings, 8th edition, 2008.

[4] M. S. H. Abu Halima, L. P. Bernal, and F. A. Sharif, "Genetic variation of 15 autosomal short tandem repeat (STR) loci in the Palestinian population of Gaza Strip," Legal Medicine, vol. 11, no. 4, pp. 203-204, 2009.

[5] M. G. Patel, Jignal Shaikh, and D. Marjadi, "Forensic Conception DNA Typing of FTA Spotted Samples," Journal of Applied Biology Biotechnology, vol. 2, no. 04, pp. 021–029, 2014.

[6] S. J. Venables, R. Daniel, S. D. Sarre et al., "Allele frequency data for 15 autosomal STR loci in eight Indonesian subpopulations," Forensic Science International: Genetics, vol. 20, pp. 45–52, 2016.

[7] Khaleda Parven, Forensic use of DNA information: human rights, privacy and other challenges, University of Wollongong Thesis Collections, 2012.

[8] B. Derbyshire, Sharing DNA Profiles and Finger Prints Across the EU requires further safeguards, GeneWatch UK, 2015.

[9] A. Holobinko, "Theoretical and Methodological Approaches to Understanding Human Migration Patterns and their Utility in Forensic Human Identification Cases," Societies, vol. 2, no. 2, pp. 42–62, 2012.

[10] D. Ricke, A. Shcherbina, N. Chiu et al., "Sherlock's Toolkit: A forensic DNA analysis system," in Proceedings of the IEEE International Symposium on Technologies for Homeland Security, HST 2015, USA, April 2015.

[11] A. L. Lowe, A. Urquhart, L. A. Foreman, and I. W. Evett, "Inferring ethnic origin by means of an STR profile," Forensic Science International, vol. 119, no. 1, pp. 17–22, 2001.

[12] M. Rahmat Widyanto, "Various defuzzification methods on DNA similarity matching using fuzzy inference system," Journal of Advanced Computational Intelligence Intelligent Informatics, vol. 14, no. 3, 2010.

[13] R. N. Hartono, M. R. Widyanto, and N. Soedarsono, "Fuzzy logic system for DNA profile matching with embedded ethnic inference," in Proceedings of the 2010 2nd International Conference on Advances in Computing, Control and Telecommunication Technologies, ACT 2010, pp. 69–73, Indonesia, December 2010.

[14] M. R. Widyanto, R. N. Hartono, and N. Soedarsono, "A Novel Human STR Similarity Method using Cascade Statistical Fuzzy Rules with Tribal Information Inference," International Journal of Electrical and Computer Engineering (IJECE), vol. 6, no. 6, p. 3103, 2016.

[15] L. A. Zadeh, "Fuzzy sets," Information and Computation, vol. 8, pp. 338–353, 1965.

[16] S. Iwamoto, K. Tsurusaki, and T. Fujita, "Conditional decision-making in fuzzy environment," Journal of the Operations Research Society of Japan, vol. 42, no. 2, pp. 198–218, 1999.

[17] Liyuan Zhang, Xuanhua Xu, and Li Tao, "Some Similarity Measures for Triangular Fuzzy Number and Their Applications in Multiple Criteria Group Decision-Making," Journal of Applied Mathematics, vol. 2013, pp. 1–7, 2013.

[18] K. Kundu, "Image Denoising using Patch based Processing with Fuzzy Gaussian Membership Function," International Journal of Computer Applications, vol. 118, no. 12, pp. 0975–8887, 2015, Department of Computer Science & Engineering, Govt. College of Engineering & Textile Technology, Serampore, Hooghl.

[19] H. A. Hefny, H. M. Elsayed, and H. F. Aly, "Fuzzy multi-criteria decision making model for different scenarios of electrical power generation in Egypt," Egyptian Informatics Journal, vol. 14, no. 2, pp. 125–133, 2013.

[20] B. Akshay, B. Akash, and P. Avril, "Convolution and application of convolution," International Journal of Innovative Research In Technology, vol. 6, pp. 2349–6002, 2014.

[21] P. A. Bromiley, Products and Convolutions of Gaussian Probability Density Function, vol. 14, Imaging Sciences Research Group, Institute of Population Health, School of Medicine, University of Manchester, Manchester, 2014.

12 Advances in Bioinformatics

Tukey 950 Simultaneous Confidence Intervals

Response Variable D

All Pairwise Comparisons among Levels of Metode

Metode = 1 subtracted from

Metode Lower Center Upper -------+---------+---------+---------

2 -04176 -03787 -03399 (- )

3 -00624 -00236 00152 ( -)

-------+---------+---------+---------

-025 000 025

Metode = 2 subtracted from

Metode Lower Center Upper -------+---------+---------+---------

3 03164 03551 03938 ( -)

-------+---------+---------+---------

-025 000 025

Algorithm 9 ANOVA test results on tribal population D

Data Availability

The STR-DNA data used to support the findings of this studyare available from the corresponding author upon request

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This paper is sponsored by the Indexed Scientific Publication for Student Doctor's Final Project (Hibah Doktor) 2018 of Universitas Indonesia [0588SKRUI2018].

References

[1] S. K. Sheppard, D. S. Guttman, and J. R. Fitzgerald, "Population genomics of bacterial host adaptation," Nature Reviews Genetics.

[2] Emery and Rimoin's Principles and Practice of Medical Genetics, Elsevier, 2013.

[3] A. Neil and Campbell, Campbell Reece, Pearson Benjamin Cummings, 8th edition, 2008.

[4] M. S. H. Abu Halima, L. P. Bernal, and F. A. Sharif, "Genetic variation of 15 autosomal short tandem repeat (STR) loci in the Palestinian population of Gaza Strip," Legal Medicine, vol. 11, no. 4, pp. 203–204, 2009.

[5] M. G. Patel, Jignal Shaikh, and D. Marjadi, "Forensic Conception: DNA Typing of FTA Spotted Samples," Journal of Applied Biology Biotechnology, vol. 2, no. 04, pp. 021–029, 2014.

[6] S. J. Venables, R. Daniel, S. D. Sarre et al., "Allele frequency data for 15 autosomal STR loci in eight Indonesian subpopulations," Forensic Science International: Genetics, vol. 20, pp. 45–52, 2016.

[7] Khaleda Parven, Forensic use of DNA information: human rights, privacy and other challenges, University of Wollongong Thesis Collections, 2012.

[8] B. Derbyshire, Sharing DNA Profiles and Finger Prints Across the EU requires further safeguards, GeneWatch UK, 2015.

[9] A. Holobinko, "Theoretical and Methodological Approaches to Understanding Human Migration Patterns and their Utility in Forensic Human Identification Cases," Societies, vol. 2, no. 2, pp. 42–62, 2012.

[10] D. Ricke, A. Shcherbina, N. Chiu et al., "Sherlock's Toolkit: A forensic DNA analysis system," in Proceedings of the IEEE International Symposium on Technologies for Homeland Security, HST 2015, USA, April 2015.

[11] A. L. Lowe, A. Urquhart, L. A. Foreman, and I. W. Evett, "Inferring ethnic origin by means of an STR profile," Forensic Science International, vol. 119, no. 1, pp. 17–22, 2001.

[12] M. Rahmat Widyanto, "Various defuzzification methods on DNA similarity matching using fuzzy inference system," Journal of Advanced Computational Intelligence and Intelligent Informatics, vol. 14, no. 3, 2010.

[13] R. N. Hartono, M. R. Widyanto, and N. Soedarsono, "Fuzzy logic system for DNA profile matching with embedded ethnic inference," in Proceedings of the 2010 2nd International Conference on Advances in Computing, Control and Telecommunication Technologies, ACT 2010, pp. 69–73, Indonesia, December 2010.

[14] M. R. Widyanto, R. N. Hartono, and N. Soedarsono, "A Novel Human STR Similarity Method using Cascade Statistical Fuzzy Rules with Tribal Information Inference," International Journal of Electrical and Computer Engineering (IJECE), vol. 6, no. 6, p. 3103, 2016.

[15] L. A. Zadeh, "Fuzzy sets," Information and Computation, vol. 8, pp. 338–353, 1965.

[16] S. Iwamoto, K. Tsurusaki, and T. Fujita, "Conditional decision-making in fuzzy environment," Journal of the Operations Research Society of Japan, vol. 42, no. 2, pp. 198–218, 1999.

[17] Liyuan Zhang, Xuanhua Xu, and Li Tao, "Some Similarity Measures for Triangular Fuzzy Number and Their Applications in Multiple Criteria Group Decision-Making," Journal of Applied Mathematics, vol. 2013, pp. 1–7, 2013.

[18] K. Kundu, "Image Denoising using Patch based Processing with Fuzzy Gaussian Membership Function," International Journal of Computer Applications, vol. 118, no. 12, pp. 0975–8887, 2015, Department of Computer Science & Engineering, Govt. College of Engineering & Textile Technology, Serampore, Hooghl.

[19] H. A. Hefny, H. M. Elsayed, and H. F. Aly, "Fuzzy multi-criteria decision making model for different scenarios of electrical power generation in Egypt," Egyptian Informatics Journal, vol. 14, no. 2, pp. 125–133, 2013.

[20] B. Akshay, B. Akash, and P. Avril, "Convolution and application of convolution," International Journal of Innovative Research in Technology, vol. 6, pp. 2349–6002, 2014.

[21] P. A. Bromiley, Products and Convolutions of Gaussian Probability Density Function, vol. 14, Imaging Sciences Research Group, Institute of Population Health, School of Medicine, University of Manchester, Manchester, 2014.
