Journal of Pharmacological and Toxicological Methods. Kim et al., J. Pharmacol... · 2016. 8....


Journal of Pharmacological and Toxicological Methods 78 (2016) 76–84

Contents lists available at ScienceDirect

Journal of Pharmacological and Toxicological Methods

journal homepage: www.elsevier.com/locate/jpharmtox

Original article

Predictive capacity of a non-radioisotopic local lymph node assay using flow cytometry, LLNA:BrdU–FCM: Comparison of a cutoff approach and inferential statistics

Da-eun Kim a,1, Hyeri Yang a,1, Won-Hee Jang b, Kyoung-Mi Jung b, Miyoung Park b, Jin Kyu Choi b, Mi-Sook Jung c, Eun-Young Jeon c, Yong Heo d, Kyung-Wook Yeo d, Ji-Hoon Jo d, Jung Eun Park d, Soo Jung Sohn e, Tae Sung Kim e, Il Young Ahn e, Tae-Cheon Jeong f, Kyung-Min Lim a,⁎, SeungJin Bae a,⁎

a College of Pharmacy, Ewha Womans University, Republic of Korea
b Medical Beauty Research Division, Amorepacific Corp. R&D Center, Republic of Korea
c Pharmacology Efficacy Team, Biotoxtech Co., Ltd., Republic of Korea
d Department of Occupational Health, College of Natural Sciences, Catholic University of Daegu, Republic of Korea
e National Institute of Food and Drug Safety Evaluation, Ministry of Food and Drug Safety, Republic of Korea
f College of Pharmacy, Yeungnam University, Gyeongsan 38541, Republic of Korea

⁎ Corresponding authors at: College of Pharmacy, Ewha Womans University, 52 Ewhayeodae-gil, Seodaemun-gu, Seoul 120-808, Republic of Korea.

E-mail addresses: [email protected] (K.-M. Lim), [email protected] (S. Bae).

1 These authors contributed equally to this work.

http://dx.doi.org/10.1016/j.vascn.2015.12.001

1056-8719/© 2015 Elsevier Inc. All rights reserved.

Article info

Article history:
Received 13 August 2015
Received in revised form 20 October 2015
Accepted 2 December 2015
Available online 4 December 2015

Keywords:
Local lymph node assay
LLNA:BrdU–FCM
Prediction model
Skin sensitization
Predictive capacity
Descriptive and inferential statistics

Abstract

In order for a novel test method to be applied for regulatory purposes, its reliability and relevance, i.e., reproducibility and predictive capacity, must be demonstrated. Here, we examine the predictive capacity of a novel non-radioisotopic local lymph node assay, LLNA:BrdU–FCM (5-bromo-2′-deoxyuridine-flow cytometry), with a cutoff approach and inferential statistics as a prediction model. 22 reference substances in OECD TG429 were tested with a concurrent positive control, hexylcinnamaldehyde 25% (PC), and the stimulation index (SI) representing the fold increase in lymph node cells over the vehicle control was obtained. The optimal cutoff SI (2.7 ≤ cutoff < 3.5), with respect to predictive capacity, was obtained by a receiver operating characteristic curve, which produced 90.9% accuracy for the 22 substances. To address the inter-test variability in responsiveness, SI values standardized with PC were employed to obtain the optimal percentage cutoff (42.6 ≤ cutoff < 57.3% of PC), which produced 86.4% accuracy. A test substance may be diagnosed as a sensitizer if a statistically significant increase in SI is elicited. The parametric one-sided t-test and non-parametric Wilcoxon rank-sum test produced 77.3% accuracy. Similarly, a test substance could be defined as a sensitizer if the SI means of the vehicle control, and of the low, middle, and high concentrations were statistically significantly different, which was tested using ANOVA or Kruskal–Wallis, with post hoc analysis, Dunnett, or DSCF (Dwass–Steel–Critchlow–Fligner), respectively, depending on the equal variance test, producing 81.8% accuracy. The absolute SI-based cutoff approach produced the best predictive capacity, however the discordant decisions between prediction models need to be examined further.

© 2015 Elsevier Inc. All rights reserved.

1. Introduction

A non-radioisotopic local lymph node assay, LLNA:BrdU–FCM, employing 5-bromo-2′-deoxyuridine (BrdU) and flow cytometry (Jung et al., 2010, 2012) instead of the ³H-thymidine and scintillation counting of the traditional radioisotopic LLNA (OECD, 2010a), was developed as a replacement (refinement of the 3Rs) to promote the phase-out of the conventional skin sensitization test using guinea pigs (OECD, 1992). LLNA:BrdU–FCM may provide additional advantages over other non-radioisotopic LLNAs, including high sensitivity and the capacity to accommodate multiple endpoints such as ex vivo cytokine release, cell subtyping, and surface marker expression. However, this method still needs to be validated as a generic version of LLNA for its reliability and relevance, i.e., reproducibility and predictive capacity. The OECD TG429 performance standard (PS) delineates the standard of reliability and relevance required to demonstrate equivalence to the traditional LLNA, so as to save the effort and resources needed to develop novel analogous methods.

Previously, we reported that the inter- and intra-laboratory reproducibility of LLNA:BrdU–FCM satisfies the standard provided by OECD TG429 PS (Yang et al., 2015); however, the predictive capacity is yet to be evaluated. For the assessment of the predictive capacity, OECD TG429 PS


Table 1
SI values (mean ± SD) for 22 reference substances obtained with LLNA:BrdU–FCM.

| No. | Substance | CAS No. | Vendor | Vehicle | Conc. (%) | NC | L | M | H | PC | MAX SI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 5-Chloro-2-methyl-4-isothiazolin-3-one (CMI)/2-methyl-4-isothiazolin-3-one (MI) | 26172-55-4/2682-20-4 | Rohm and Haas | DMF | 2.5, 5, 10 | 1.00 ± 0.51 | 7.29 ± 2.39 | 19.72 ± 4.95 | 14.34 ± 3.55 | 10.77 ± 1.34 | 19.72 |
| 2 | DNCB | 97-00-7 | Aldrich | AOO | 0.1, 0.25, 0.5 | 1.00 ± 0.35 | 4.40 ± 1.16 | 18.42 ± 4.75 | 25.28 ± 12.87 | 12.23 ± 3.94 | 25.28 |
| 3 | 4-Phenylenediamine | 106-50-3 | Sigma | DMSO | 0.5, 1, 2.5 | 1.00 ± 0.20 | 4.39 ± 1.32 | 6.39 ± 1.89 | 12.28 ± 5.89 | 8.54 ± 2.01 | 12.28 |
| 4 | Cobalt chloride | 7646-79-9 | Sigma-Aldrich | DMF | 0.25, 0.5, 1.0 | 1.00 ± 0.51 | 10.56 ± 2.80 | 19.51 ± 5.00 | 24.99 ± 2.70 | 10.77 ± 1.34 | 24.99 |
| 5 | Isoeugenol | 97-54-1 | Aldrich | AOO | 5, 10, 25 | 1.00 ± 0.38 | 1.74 ± 0.86 | 3.51 ± 2.02 | 11.16 ± 3.41 | 8.06 ± 5.80 | 11.16 |
| 6 | 2-Mercaptobenzothiazole | 149-30-4 | Aldrich | DMF | 25, 50, 100 | 1.00 ± 0.58 | 1.43 ± 0.17 | 1.10 ± 0.52 | 0.83 ± 0.41 | 4.35 ± 1.95 | 1.43 |
| 7 | Citral | 5392-40-5 | Aldrich | AOO | 10, 25, 50 | 1.00 ± 0.32 | 2.53 ± 1.21 | 6.46 ± 0.78 | 11.06 ± 1.66 | 5.73 ± 4.63 | 11.06 |
| 8 | HCA | 101-86-0 | Aldrich | AOO | 5, 10, 25 | 1.00 ± 0.35 | 1.22 ± 0.41 | 3.08 ± 1.64 | 9.55 ± 4.76 | 12.23 ± 3.94 | 9.55 |
| 9 | Eugenol | 97-53-0 | Fluka | AOO | 5, 10, 25 | 1.00 ± 0.70 | 2.44 ± 0.77 | 5.56 ± 2.32 | 21.13 ± 5.31 | 8.30 ± 3.80 | 21.13 |
| 10 | Phenyl benzoate | 93-99-2 | Aldrich | DMF | 5, 10, 25 | 1.00 ± 0.47 | 4.14 ± 1.90 | 12.50 ± 4.16 | 7.83 ± 2.77 | 21.07 ± 9.12 | 12.50 |
| 11 | Cinnamic alcohol | 104-54-1 | Aldrich | AOO | 25, 50, 100 | 1.00 ± 0.58 | 2.28 ± 0.85 | 4.66 ± 1.05 | 4.28 ± 1.98 | 15.83 ± 5.45 | 4.66 |
| 12 | Imidazolidinyl urea | 39236-46-9 | Aldrich | DMF | 10, 25, 50 | 1.00 ± 0.46 | 1.78 ± 0.60 | 2.99 ± 0.85 | 3.52 ± 1.09 | 6.12 ± 1.51 | 3.52 |
| 13 | Methyl methacrylate | 80-62-6 | Aldrich | AOO | 25, 50, 100 | 1.00 ± 0.58 | 1.63 ± 0.46 | 0.93 ± 0.23 | 0.86 ± 0.28 | 15.83 ± 5.45 | 1.63 |
| 14 | Chlorobenzene | 108-90-7 | Sigma-Aldrich | AOO | 10, 25, 50 | 1.00 ± 0.17 | 0.66 ± 0.28 | 1.27 ± 0.54 | 2.55 ± 0.84 | 26.59 ± 9.06 | 2.55 |
| 15 | Isopropanol | 67-63-0 | Sigma-Aldrich | AOO | 2.5, 5, 10 | 1.00 ± 0.17 | 1.20 ± 0.29 | 0.91 ± 0.72 | 1.30 ± 0.58 | 26.59 ± 9.06 | 1.30 |
| 16 | Lactic acid | 50-21-5 | Fluka | DMF | 5, 10, 25 | 1.00 ± 0.25 | 1.48 ± 0.71 | 1.46 ± 0.28 | 1.51 ± 0.41 | 7.12 ± 4.78 | 1.51 |
| 17 | Methyl salicylate | 119-36-8 | Sigma-Aldrich | AOO | 25, 50, 100 | 1.00 ± 0.21 | 1.91 ± 0.28 | 2.66 ± 0.67 | 1.91 ± 0.72 | 4.90 ± 1.01 | 2.66 |
| 18 | Salicylic acid | 69-72-7 | Sigma | DMF | 1, 2.5, 5 | 1.00 ± 0.70 | 1.86 ± 0.54 | 2.38 ± 1.39 | 1.81 ± 0.47 | 8.54 ± 2.01 | 2.38 |
| 19 | Sodium lauryl sulfate | 151-21-3 | Sigma | AOO | 5, 10, 25 | 1.00 ± 0.31 | 4.75 ± 2.29 | 6.47 ± 1.94 | 4.71 ± 0.83 | 3.68 ± 0.34 | 6.47 |
| 20 | Ethylene glycol dimethacrylate | 97-90-5 | Aldrich | DMSO | 25, 50, 100 | 1.00 ± 0.34 | 2.02 ± 0.64 | 2.75 ± 0.99 | 4.06 ± 1.34 | 6.15 ± 2.06 | 4.06 |
| 21 | Xylene | 1330-20-7 | Sigma-Aldrich | AOO | 25, 50, 100 | 1.00 ± 0.28 | 1.45 ± 0.26 | 2.44 ± 0.31 | 4.19 ± 0.82 | 4.13 ± 0.60 | 4.19 |
| 22 | Nickel chloride | 7718-54-9 | Aldrich | DMSO | 0.001, 0.0025, 0.005 | 1.00 ± 0.15 | 1.11 ± 0.26 | 1.19 ± 0.27 | 1.19 ± 0.34 | 5.37 ± 1.19 | 1.19 |

SI, stimulation index; MAX, maximum mean SI value among treated groups; NC, negative control; PC, positive control (hexylcinnamaldehyde, 25%); test substance treated with L, low; M, middle; and H, high concentrations. SI values were obtained using the LLNA:BrdU–FCM method.


advises the testing of 18 reference substances composed of 13 sensitizers and 5 non-sensitizers. The target predictive capacity is 100% sensitivity, 100% specificity, and 100% accuracy, which demands the correct classification of all reference substances. Traditional LLNA (Gerberick, Ryan, Dearman, & Kimber, 2007; Kimber & Weisenberger, 1989) and its analogous methods (Idehara, Yamagishi, Yamashita, & Ito, 2008; Kojima et al., 2011) measure the proliferation of lymph node cells, key event No. 4 of the adverse outcome pathway for skin sensitization initiated by covalent binding to proteins (OECD, 2014). The stimulation index (SI), which represents the extent of proliferation of lymph node cells (LNCs) (Kimber, Dearman, Scholes, & Basketter, 1994), is the raw data produced by LLNA.

The criterion commonly used in LLNAs to classify a substance as a sensitizer or a non-sensitizer, namely the prediction model, is the SI-threshold method (Basketter et al., 1999), in which a test article is deemed a sensitizer when the SI exceeds a certain cutoff and a non-sensitizer otherwise (Idehara et al., 2008; Kenneth, 1998; Scholes et al., 1992). The SI threshold, or cutoff, is chosen so that the predictive capacity is maximized on the receiver operating characteristic (ROC) curve (Hanley & McNeil, 1982). The ECt value, which is the estimated concentration of a test substance eliciting the threshold SI, varies with the threshold, i.e., the cutoff. Thus, adjusting the cutoff level affects inter- and intra-lab reproducibility. Accordingly, the final cutoff is decided with consideration of both predictive capacity and reproducibility. The cutoff for traditional LLNA is 3.0, while the non-radioisotopic analogous methods, LLNA:DA and LLNA:BrdU–ELISA, employ 1.8 (OECD, 2010b) and 1.6 (OECD, 2010c), respectively. This reflects the fact that the optimum cutoff may vary with the sensitivity of the detection method for LNC proliferation; therefore, a large reference data set is necessary to determine the ideal cutoff value. Furthermore, the cutoff approach does not consider the intrinsic variance of SI, and consequently, substances with borderline sensitization potency may be misclassified.

As with other animal tests, intrinsic variation of animals often leads to substantial inter-test variability in the overall responsiveness to sensitizers. Moreover, in contrast to traditional LLNA, LLNA:BrdU–FCM does not use entire pooled LNCs, but only a small aliquot, which may add to the variability of SI values. To reduce the inter-test variability or to take the variance of the data into consideration, SI values may be standardized with those of a concurrent positive control, 25% hexylcinnamaldehyde (HCA), or tested by inferential statistical methods that consider the variation (standard deviation) within a single test and examine whether the test substance elicited statistically significant increases in LNC proliferation over the vehicle control (Ehling et al., 2005b; Kojima et al., 2011; Omori et al., 2008; Omori & Sozu, 2007).

We conducted LLNA:BrdU–FCM in a coded fashion for 18 obligatory reference substances and 4 optional substances listed in OECD TG429 PS. With the obtained SI values, we examined its predictive capacity by comparison with GHS classification, employing a conventional cutoff approach and inferential statistics as a prediction model, namely the decision criteria to diagnose a sensitizer, in an attempt to find the best prediction model to maximize the use of LLNA:BrdU–FCM.

Fig. 1. SI value of 22 substances (N = 4–5). (a) Sensitizers (1–8), (b) sensitizers (9–13) and false positives (19–21), (c) non-sensitizers (14–18) and a false negative (22), (d) ROC curve with max SI values for 18 obligatory substances.


2. Materials & methods

2.1. Chemicals and reagents

A total of 22 reference substances were selected from OECD TG429 (OECD, 2010a) to include 13 sensitizers, 5 non-sensitizers, and 4 optional substances consisting of 3 false positives and 1 false negative. The details and tested concentrations of the 22 reference substances are presented in Table 1. The experiments were carried out in a coded fashion, i.e., the identity of the test substance was blinded. In the preliminary test, the vehicle was examined and selected from among AOO, DMF, MEK, and DMSO (in the order of testing), such that the maximum solubility was assured without precipitation. Test concentrations were free from systemic toxicity and/or excessive local skin irritation, which was confirmed in the preliminary test with a concentration range starting from the maximum soluble concentration as advised in OECD TG429. According to OECD TG429, ‘excessive local skin irritation’ is defined as ‘erythema score ≥3 and/or an increase in ear thickness of ≥25% on any day of measurement’. 5-Bromo-2′-deoxyuridine (BrdU, St. Louis, MO, USA) was dissolved in phosphate-buffered saline (PBS) at a concentration of 10 mg/mL.

2.2. Animal and experimental protocol

Both the animal care and study protocols employed were in accordance with the Institutional Animal Care and Use Committee (IACUC) of participating laboratories (AmorePacific Co., Biotoxtech Co., Catholic University of Daegu and NIFDS). This study was performed according to the method of Jung et al. (2010, 2012) with minor modifications. Groups of BALB/c mice (N = 4 or 5) were treated with 25 μL test substances dissolved in vehicle, or vehicle alone, on the back of both ears daily for 3 consecutive days (days 1–3). On day 5, the mice were intraperitoneally injected with BrdU and sacrificed after one day. Following the sacrifice by CO2 asphyxiation, auricular lymph nodes were isolated, weighed, and underwent lymphocyte preparation. After bilateral auricular lymph nodes were pooled on an individual basis, lymph node cells (LNCs) were prepared by disaggregation through a 70 μm mesh (BD Biosciences, Franklin Lakes, NJ) in 1 mL PBS. The LNCs were counted using a hemocytometer after staining with trypan blue. The LNCs (2 × 10⁶ cells/mL) were washed once by centrifugation (300 × g) for 5 min with PBS, and resuspended for the fixation and permeabilization step according to the instruction manual of the BrdU Flow kit (BD Pharmingen™, Franklin Lakes, NJ). LNCs were subsequently permeabilized using Cytoperm plus buffer containing 10% DMSO. After DNA denaturation by incubation with DNase for 1 h, the LNCs were washed, and incubated with an FITC-conjugated anti-BrdU antibody for 20 min at RT in the dark at a dilution of 1:50. Cells were washed once more and then resuspended in 20 mL 7-AAD solution to label the DNA. 10,000 7-AAD-expressing cells were gated, and the number of cells expressing BrdU was analyzed using a BD FACS Calibur™ system. The stimulation index (SI) value was obtained by calculating the ratio of the mean number of LNCs with incorporated BrdU from mice in each of the test substance dose groups to those in the vehicle control group. The 22 substances were divided among, and evaluated by, the participating laboratories. SI data obtained in compliance with the finalized protocol were analyzed.

Table 2
Decision based on provisional cutoff value 3.0 and optimal cutoff, 2.79.

| No. | Substance | LLNA ref. (OECD TG) +/– | LLNA ref. (OECD TG) EC3 | Group | Max SI | Cutoff 3.0 | Cutoff 2.7 | EC2.7 |
|---|---|---|---|---|---|---|---|---|
| 1 | 5-Chloro-2-methyl-4-isothiazolin-3-one (CMI)/2-methyl-4-isothiazolin-3-one (MI) | + | 0.009 | M | 19.72 | + | + | *0.919 |
| 2 | DNCB | + | 0.049 | H | 25.28 | + | + | 0.015 |
| 3 | 4-Phenylenediamine | + | 0.11 | H | 12.28 | + | + | 0.068 |
| 4 | Cobalt chloride | + | 0.6 | H | 24.99 | + | + | *0.063 |
| 5 | Isoeugenol | + | 1.5 | H | 11.16 | + | + | 8.222 |
| 6 | 2-Mercaptobenzothiazole | + | 1.7 | L | 1.43 | – | – | NA |
| 7 | Citral | + | 9.2 | H | 11.06 | + | + | 9.396 |
| 8 | HCA | + | 9.7 | H | 9.55 | + | + | 8.772 |
| 9 | Eugenol | + | 10.1 | H | 21.13 | + | + | 6.019 |
| 10 | Phenyl benzoate | + | 13.6 | M | 12.50 | + | + | *4.238 |
| 11 | Cinnamic alcohol | + | 21 | M | 4.66 | + | + | 10.895 |
| 12 | Imidazolidinyl urea | + | 24 | H | 3.52 | + | + | 26.820 |
| 13 | Methyl methacrylate | + | 90 | L | 1.63 | – | – | NA |
| 14 | Chlorobenzene | – | NA | H | 2.55 | – | – | NA |
| 15 | Isopropanol | – | NA | H | 1.30 | – | – | NA |
| 16 | Lactic acid | – | NA | H | 1.51 | – | – | NA |
| 17 | Methyl salicylate | – | NA | M | 2.66 | – | – | NA |
| 18 | Salicylic acid | – | NA | M | 2.38 | – | – | NA |
| 19 | Sodium lauryl sulfate | + | 8.1 | M | 6.47 | + | + | *7.668 |
| 20 | Ethylene glycol dimethacrylate | + | 28 | H | 4.06 | + | + | 49.298 |
| 21 | Xylene | + | 95.8 | H | 4.19 | + | + | 58.499 |
| 22 | Nickel chloride | – | NA | H | 1.19 | – | – | NA |

Test substance treated with L, low; M, middle; and H, high concentrations. Decision: + stands for the positive and − stands for the negative decision (shaded). * indicates the case when EC2.7 was obtained assuming y-intercept is 1.0.

2.3. Statistical analysis

2.3.1. Optimal cutoff values using absolute SI values

The receiver operating characteristic (ROC) curve, which illustrates sensitivity against false positive rate, has been frequently used to obtain optimal cutoff values in diagnostic tests (Hanley & McNeil, 1982). The 18 obligatory chemicals were used to plot a ROC curve. Of the SI values at the different concentrations (low/middle/high), the maximum (highest) values were used to obtain the ROC curve, since the


Fig. 2. Standardized SI values of 22 substances (N = 4–5). (a) Sensitizers (1–8), (b) sensitizers (9–13) and false positives (19–21), (c) non-sensitizers (14–18) and a false negative (22), (d) ROC curve with max % SI for 18 obligatory substances.

Table 3
Decision based on cut-off for standardized SI value based on PC, HCA 25%.

Max % SI = (Max SI – NC SI)/(PC SI – NC SI) × 100

| No. | Substance | LLNA ref. (OECD TG) | LLNA:BrdU–FCM (2.7) | Group | Max % SI | Cutoff 42.6 |
|---|---|---|---|---|---|---|
| 1 | 5-Chloro-2-methyl-4-isothiazolin-3-one (CMI)/2-methyl-4-isothiazolin-3-one (MI) | + | + | M | 191.56 | + |
| 2 | DNCB | + | + | H | 216.14 | + |
| 3 | 4-Phenylenediamine | + | + | H | 149.64 | + |
| 4 | Cobalt chloride | + | + | H | 245.48 | + |
| 5 | Isoeugenol | + | + | H | 143.82 | + |
| 6 | 2-Mercaptobenzothiazole | + | – | L | 12.75 | – |
| 7 | Citral | + | + | H | 212.65 | + |
| 8 | HCA | + | + | H | 76.09 | + |
| 9 | Eugenol | + | + | H | 275.59 | + |
| 10 | Phenyl benzoate | + | + | M | 57.31 | + |
| 11 | Cinnamic alcohol | + | + | M | 24.66 | – |
| 12 | Imidazolidinyl urea | + | + | H | 49.24 | + |
| 13 | Methyl methacrylate | + | – | L | 4.26 | – |
| 14 | Chlorobenzene | – | – | H | 6.04 | – |
| 15 | Isopropanol | – | – | H | 1.18 | – |
| 16 | Lactic acid | – | – | H | 8.30 | – |
| 17 | Methyl salicylate | – | – | M | 42.58 | – |
| 18 | Salicylic acid | – | – | M | 18.30 | – |
| 19 | Sodium lauryl sulfate | + | + | M | 204.32 | + |
| 20 | Ethylene glycol dimethacrylate | + | + | H | 59.48 | + |
| 21 | Xylene | + | + | H | 101.95 | + |
| 22 | Nickel chloride | – | – | H | 4.38 | – |

SI, stimulation index; NC, negative control; PC, positive control; MAX % SI, maximum mean SI % among groups treated with test substance; + stands for the positive and − stands for the negative decision (shaded).


test substance is defined as positive (a sensitizer) if the maximum SI value is greater than the cutoff value, and as negative (a non-sensitizer) otherwise (OECD, 2010a).
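The cutoff search described above can be illustrated with a minimal sketch. It assumes the maximum mean SI values of the 18 obligatory substances as listed in Table 1, applies the "greater than the cutoff" rule, and scans a hypothetical grid of candidate cutoffs (1.0 to 5.0 in steps of 0.1). Choosing the cutoff by the number of correct calls is an assumption made here for illustration; the text does not state which ROC criterion the authors optimized.

```python
# Maximum mean SI per substance, transcribed from Table 1
# (substances 1-13 are reference sensitizers, 14-18 reference non-sensitizers).
SENSITIZER_MAX_SI = [19.72, 25.28, 12.28, 24.99, 11.16, 1.43, 11.06,
                     9.55, 21.13, 12.50, 4.66, 3.52, 1.63]
NON_SENSITIZER_MAX_SI = [2.55, 1.30, 1.51, 2.66, 2.38]


def roc_point(cutoff):
    """Return (false positive rate, sensitivity) for one candidate cutoff."""
    true_pos = sum(si > cutoff for si in SENSITIZER_MAX_SI)
    false_pos = sum(si > cutoff for si in NON_SENSITIZER_MAX_SI)
    return (false_pos / len(NON_SENSITIZER_MAX_SI),
            true_pos / len(SENSITIZER_MAX_SI))


def correct_calls(cutoff):
    """Count substances whose call agrees with the reference classification."""
    true_pos = sum(si > cutoff for si in SENSITIZER_MAX_SI)
    true_neg = sum(si <= cutoff for si in NON_SENSITIZER_MAX_SI)
    return true_pos + true_neg


# Hypothetical grid of candidate cutoffs (an assumption, not from the paper).
grid = [k / 10 for k in range(10, 51)]
best = max(correct_calls(c) for c in grid)
best_cutoffs = [c for c in grid if correct_calls(c) == best]

print("best number of correct calls:", best, "of 18")
print("cutoffs reaching it:", best_cutoffs)
print("ROC point at 3.0 (FPR, sensitivity):", roc_point(3.0))
```

On this grid the plateau of best-performing cutoffs is bounded by the largest non-sensitizer value (2.66) and the smallest correctly classified sensitizer value (3.52), so its exact end points depend on the rounding and inequality convention used.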

2.3.2. Optimal cutoff values using standardized SI values

To reduce the inter-test variation, SI values were standardized with that of the corresponding concurrent positive control, as can be frequently seen elsewhere (Bennett & Briggs, 2011). The maximum SI value of each chemical was standardized as follows:

Standardized SI (%) = (SI value of the chemical − SI value of the negative control) × 100 / (SI value of the positive control − SI value of the negative control)

The standardized SI values were then used to obtain the optimal cutoff values based on the ROC curve.
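As a small worked example of the standardization formula, the sketch below recomputes the percentage SI for two substances from the rounded means in Table 1. The function name `standardized_si` is illustrative only, and the default negative-control SI of 1.0 reflects the fact that the vehicle control mean is 1.00 by construction.

```python
def standardized_si(max_si, pc_si, nc_si=1.0):
    """Express a substance's SI as a percentage of the positive-control response."""
    return (max_si - nc_si) * 100 / (pc_si - nc_si)


# (maximum mean SI, concurrent PC SI) taken from Table 1.
examples = {
    "Phenyl benzoate": (12.50, 21.07),
    "Methyl salicylate": (2.66, 4.90),
}

for name, (max_si, pc_si) in examples.items():
    print(f"{name}: {standardized_si(max_si, pc_si):.1f}% of PC")
```

The results agree with the Max % SI column of Table 3 to within rounding, since the published means are themselves rounded to two decimals.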

2.3.3. Diagnosis based on inferential statistics: one-sided t-test/Wilcoxon rank sum test

In the cutoff approach, SI values obtained from 4 to 5 animals were averaged, and the mean was used as the criterion to classify the test substance as a sensitizer or not. Therefore, the variance in SI values, which was substantial, was not taken into account in the decision. To overcome this limitation, inferential statistics were additionally employed. Since the t-test considers the standard deviation (variation) as well as the mean of the values, it indicates whether the results are statistically significantly different (Pagano & Gauvreau, 2000). We tested whether the mean maximum SI value was statistically significantly larger than that of the vehicle control; if the p-value was less than 0.05 (alpha), the corresponding chemical was diagnosed as positive (a sensitizer), and if not, it was deemed negative (a non-sensitizer). Since the sample size was small (N = 4 or 5 per chemical), a non-parametric approach (Wilcoxon rank sum test) was also considered (Wilcoxon, 1945). When the results from the parametric (t-test) and non-parametric approaches diverged, the parametric result was used when the normality assumption (checked with the Kolmogorov–Smirnov test (Shapiro, Wilk, & Chen, 1968)) held; if it failed, the non-parametric result was used (Hothorn, 2014; Na, Yang, Bae, & Lim, 2014; Shapiro et al., 1968).

2.3.4. Diagnosis based on inferential statistics: one-way Analysis of Variance (ANOVA) or Kruskal–Wallis test

The OECD guidelines consider the maximum mean SI value among the tested dose groups, yet this approach ignores the variance of SIs within a group. To adjust for the group variance, we employed ANOVA and tested whether the means of the negative control and of the low, middle, and high concentrations were statistically significantly different at the 5% level. If the p-value was larger than 0.05, the chemical was diagnosed as a non-sensitizer; if not, we conducted Tukey's post-hoc analysis to identify whether the significance originated from a difference compared with the negative control, and if that was the case, the chemical was defined as a sensitizer (Douglas & Michael, 1991). On the other hand, if the ANOVA suggested statistical significance, yet the post-hoc analysis suggested that the significance was not between the negative control and the test material, then it was defined as negative.

The homoscedasticity assumption was tested by Levene's test (Brown & Forsythe, 1974); if it failed, then the non-parametric Kruskal–Wallis test result was primarily considered (Brown & Forsythe, 1974). If that test was significant (p < 0.05), then the non-parametric Dwass–Steel–Critchlow–Fligner method (DSCF) was applied (Critchlow & Fligner, 1991). DSCF was used to make all possible pairwise comparisons between groups. All statistical analyses were conducted using SAS version 9.3 (SAS Inc., Cary, NC, USA).

3. Results

The stimulation index (SI) values for 22 reference substances (18 obligatory and 4 optional) at 3 different doses, the range of which was determined from preliminary tests to exclude severely irritating doses (data not shown), were obtained by employing LLNA:BrdU–FCM. As shown in Table 1, the majority of sensitizers, other than methyl methacrylate and 2-mercaptobenzothiazole, displayed dose-dependent increases in mean SIs, which exceeded the provisional cutoff value, 3.0. For non-sensitizers, only minimal increases in SI, all below 3.0, were observed, and dose dependency was rarely noticeable. Incidentally, SI values of the concurrent positive control, 25% HCA (PC), showed a highly variable range (3.68–26.59), raising concern about intrinsically varied responsiveness of LNC proliferation to sensitizers from experiment to experiment, as measured by BrdU–FCM (Fig. 1a–c).

Table 4
Summary of one-sided t-test or Wilcoxon rank sum test (comparison of the group with MAX mean SI vs NC).

| No. | Substance | LLNA ref. (OECD TG) | LLNA:BrdU–FCM (2.7) | Group | T-test | Wilcoxon | Decision |
|---|---|---|---|---|---|---|---|
| 1 | 5-Chloro-2-methyl-4-isothiazolin-3-one (CMI)/2-methyl-4-isothiazolin-3-one (MI) | + | + | M | 0.0000 | 0.004* | + |
| 2 | DNCB | + | + | H | 0.0015 | 0.004 | + |
| 3 | 4-Phenylenediamine | + | + | H | 0.0014 | 0.004 | + |
| 4 | Cobalt chloride | + | + | H | 0.0000 | 0.004 | + |
| 5 | Isoeugenol | + | + | H | 0.0001 | 0.004 | + |
| 6 | 2-Mercaptobenzothiazole | + | – | L | 0.0757 | 0.075 | – |
| 7 | Citral | + | + | H | 0.0000 | 0.004 | + |
| 8 | HCA | + | + | H | 0.0020 | 0.004 | + |
| 9 | Eugenol | + | + | H | 0.0000 | 0.004 | + |
| 10 | Phenyl benzoate | + | + | M | 0.0001 | 0.004 | + |
| 11 | Cinnamic alcohol | + | + | M | 0.0001 | 0.004 | + |
| 12 | Imidazolidinyl urea | + | + | H | 0.0007 | 0.004 | + |
| 13 | Methyl methacrylate | + | – | L | 0.0455 | 0.048 | + |
| 14 | Chlorobenzene | – | – | H | 0.0019 | 0.004 | + |
| 15 | Isopropanol | – | – | H | 0.1486 | 0.274 | – |
| 16 | Lactic acid | – | – | H | 0.0226 | 0.048 | + |
| 17 | Methyl salicylate | – | – | M | 0.0004 | 0.004 | + |
| 18 | Salicylic acid | – | – | M | 0.0455 | 0.056 | + |
| 19 | Sodium lauryl sulfate | + | + | M | 0.0001 | 0.004 | + |
| 20 | Ethylene glycol dimethacrylate | + | + | H | 0.0006 | 0.004 | + |
| 21 | Xylene | + | + | H | 0.0000 | 0.004 | + |
| 22 | Nickel chloride | – | – | H | 0.1405 | 0.226 | – |

L, low; M, middle; H, high. Decision: + stands for the positive and − stands for the negative decision (shaded). T-test, one-sided t-test; Wilcoxon, Wilcoxon rank sum test. * normality assumption failed by Kolmogorov–Smirnov test.

To determine the optimal cutoff SI value for the classification of sensitizers, a ROC curve was established with the maximum mean SIs among the 3 doses of the 18 obligatory test substances (as shown in Fig. 1d). As a result, a cutoff value of 2.7 to 3.5 (2.7 ≤ cutoff < 3.5) exhibited the best performance in all aspects of sensitivity, specificity, and accuracy. Incidentally, this range included the provisional cutoff, 3.0, and a sensitivity of 87.5%, a specificity of 100%, and an accuracy of 90.9% were achieved for all 22 substances (Table 2).
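The reported sensitivity, specificity, and accuracy can be checked from the tabulated data. The sketch below is one way to do so, assuming the maximum mean SI values from Table 1, the reference LLNA calls from Table 2 as the ground truth, and a cutoff of 2.7; the function name `performance` is illustrative only.

```python
# (maximum mean SI, reference LLNA call) for substances 1-22,
# transcribed from Tables 1 and 2. True means LLNA-positive in the reference.
DATA = [
    (19.72, True), (25.28, True), (12.28, True), (24.99, True),
    (11.16, True), (1.43, True), (11.06, True), (9.55, True),
    (21.13, True), (12.50, True), (4.66, True), (3.52, True),
    (1.63, True),
    (2.55, False), (1.30, False), (1.51, False), (2.66, False),
    (2.38, False),
    (6.47, True), (4.06, True), (4.19, True), (1.19, False),
]


def performance(data, cutoff):
    """Return (sensitivity, specificity, accuracy) for an SI cutoff."""
    tp = fn = tn = fp = 0
    for max_si, reference_positive in data:
        predicted_positive = max_si > cutoff
        if reference_positive and predicted_positive:
            tp += 1
        elif reference_positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(data)


sensitivity, specificity, accuracy = performance(DATA, cutoff=2.7)
print(f"sensitivity {sensitivity:.1%}, specificity {specificity:.1%}, "
      f"accuracy {accuracy:.1%}")
```

With these inputs, 14 of 16 reference positives and all 6 reference negatives are classified correctly, which corresponds to the percentages quoted in the text.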

Table 3 shows the SI values standardized with those of the PC (Fig. 2a–c), which has been widely used as a calibrant for LLNA (Dearman et al., 2001). With the percentage SI data, a ROC was constructed to obtain the optimum cutoff percentage SI value (Fig. 2d). As a result, a different cutoff value was suggested with respect to the sensitivity, specificity, and accuracy, as depicted in Table 3. A lower percentage SI was preferable for sensitivity, while a higher SI range was superior for specificity. However, the overall performance, which was 86.4% accuracy for the 22 substances, was not better than that of the cutoff approach based on the absolute SI.

The results of inferential statistics are shown in Tables 4 and 5. As shown in Table 4, a one-sided t-test and a Wilcoxon test yielded comparable results, with the exception of salicylic acid, which the t-test classified as a sensitizer whereas the Wilcoxon test did not. However, the normality assumption did not fail, indicating that the decision could be based on the t-test. Based upon these tests, the sensitivity was improved (87.5% → 93.8%), but the specificity and accuracy were worse than those of the original cutoff method based on absolute SI. Table 5 shows the results of a one-way ANOVA (parametric) or Kruskal–Wallis (non-parametric) test, which examined the statistically significant difference between the negative control and the treatment groups of the test material. The decisions by ANOVA and Kruskal–Wallis were in agreement, and based on this approach, the sensitivity was unchanged (87.5%); nevertheless, the specificity (66.7%) and accuracy (81.8%) were worse than with the SI cutoff method.
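
As an illustration of the non-parametric comparison, the sketch below computes an exact one-sided Wilcoxon rank sum p-value by enumerating all possible rank assignments, using only the Python standard library. The per-animal SI values, the group size of five, and the function names are hypothetical; the statistical software and settings actually used in the study are not assumed here.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_p_greater(control, treated):
    """Exact one-sided p-value for `treated` being shifted above `control`."""
    ranks = midranks(list(control) + list(treated))
    observed = sum(ranks[len(control):])
    # Under the null hypothesis every choice of len(treated) ranks is equally likely.
    sums = [sum(c) for c in combinations(ranks, len(treated))]
    return sum(s >= observed - 1e-9 for s in sums) / len(sums)


# Hypothetical per-animal SI values (not study data).
vehicle = [0.8, 0.9, 1.0, 1.1, 1.2]
treated = [1.3, 1.4, 1.5, 1.6, 1.7]  # mean SI 1.5, well below a cutoff of 2.7

p = rank_sum_p_greater(vehicle, treated)
print(f"one-sided exact p = {p:.4f}")
```

In this made-up example every treated value exceeds every vehicle value, so the test is significant even though the mean SI stays far below the cutoff; a significant result does not by itself require a large fold increase.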

4. Discussion

Here, we conducted LLNA:BrdU–FCM for the 22 reference substances and investigated the conventional cutoff approach and inferential statistics for the prediction model of LLNA:BrdU–FCM, comparing the predictive capacity (Table 6). For the cutoff approach, through the establishment of a ROC curve, the optimal cutoff SI was determined to be 2.7 ≤ cutoff < 3.5, which produced 90.9% accuracy for the 22 substances. SI values standardized based on that of 25% HCA (PC), to account for the inter-test variability in responsiveness, were analyzed to obtain an optimal cutoff SI percentage of the PC (42.6% ≤ cutoff < 57.3% of PC), producing 86.4% accuracy. Parametric and non-parametric tests determined the statistically significant increase in SI values of the treated groups with the maximum mean SI, producing 77.3% accuracy. Similarly, ANOVA or the Kruskal–Wallis test with post hoc analysis (Dunnett or DSCF, respectively), used to test the statistically significant difference from the vehicle control, yielded 81.8% accuracy for the 22 substances. Overall, the predictive capacity of LLNA:BrdU–FCM was the best using the cutoff approach based on the absolute SI.

LLNA and its analogous methods have been developed to identify skin sensitizers based on the proliferation of lymph node cells (LNCs), key event No. 4 of the adverse outcome pathway for skin sensitization (OECD, 2014). Accordingly, it would be ideal to classify a test substance as a sensitizer when it elicits biologically relevant and statistically significant increases in LNC proliferation (Ehling et al., 2005a; Hothorn & Hasler, 2008; Hothorn & Vohr, 2010). For the first criterion, i.e., biologically relevant increases in LNC proliferation, the cutoff approach may be appropriate. A sensitizer would elicit LNC proliferation, and if it exceeded a certain level, namely the cutoff, the current approach and diagnosis based on the absolute SI value seem to fit nicely with the idea of being a sensitizer. However, the extent of the biologically relevant increase in LNC proliferation is controversial. Ehling et al. suggested that a 1.5-fold increase in the LNC count is relevant for the determination of a sensitizer (Ehling et al., 2005a), which produced 95% accuracy vs. the concurrent LLNA assay (Basketter et al., 2012). Thymidine uptake, ATP content, and BrdU incorporation employed in LLNA or analogous methods represent the proliferation of LNCs; however, the sensitivities of these readouts differ substantially. The cutoff value for LLNA is 3.0, while LLNA:DA and LLNA:BrdU–ELISA employ 1.8 (OECD, 2010b) and 1.6 (OECD, 2010c), respectively, whereas 2.7 was determined as an optimal cutoff for LLNA:BrdU–FCM. Incidentally, these cutoff values were obtained empirically, based on a ROC curve with SI data from reference substances (Basketter et al., 1999). ROC analysis of traditional LLNA vs. guinea pig or human data suggested 3.6 or 3.4, respectively, rather than the current 3.0. Therefore, close examination and comparison of LLNA and the analogous methods may be enlightening with respect to gaining insight into the biologically relevant increases in LNC proliferation and the sensitivity of the respective readouts.

A cutoff value of 2.7 was determined as optimal based on the ROC curve analysis, which maximized accuracy. The ROC curve helps to elucidate an optimal cutoff for the best sensitivity, specificity, or accuracy. Previously, we examined the reproducibility of LLNA:BrdU–FCM in view of the ECt of the positive controls HCA and DNCB, as stated in OECD TG429 PS (OECD, 2010a). Threshold values of 2.5, 2.6, and 2.7 successfully accommodated the ECt values of HCA and DNCB into the acceptable range for LLNA:BrdU–FCM, as suggested by the PS of OECD TG429 (Yang et al., 2015). Collectively, 2.7 satisfies both the best reproducibility and the best accuracy. However, since the cutoff approach considers only the averaged SI values and ignores the intrinsic variance of multiple values in a group, the second criterion, i.e., statistically significant increases in LNC proliferation, was not satisfied for some of the substances deemed to be sensitizers based on the cutoff approach.

Inferential statistics examines whether the difference between two or more groups occurs by chance, by comparing the test statistics with the pre-determined level of significance. Specifically, the inferential statistics suggest whether the difference between the vehicle control (i.e., negative control) and the test material is statistically significant; considering that the SI value of a sensitizer should be significantly larger than that of the vehicle control, this approach is consistent with the definition of a sensitizer, especially the one-sided t-test/Wilcoxon rank sum test. However, our analysis illustrates that this approach is actually inferior to the conventional cutoff approach with respect to accuracy (86.4% vs. 90.9%). More specifically, the conventional cutoff approach performs better in terms of specificity (100% vs. 33.3%), yet the sensitivity of a one-sided t-test/Wilcoxon rank sum test (93.8%) is superior to that of the conventional cutoff approach (87.5%). It is not surprising that the cutoff approach has better specificity than the inferential statistics, since the inferential statistics consider the difference between the groups in the context of variance; in other words, if the SI values of the test materials or the vehicle control have a small variance, which was the case for weak sensitizers or non-sensitizers, then the inter-group differences are more likely to be statistically significant, regardless of the mean values. Thus, the inferential statistics approach had a higher false positive fraction than the conventional approach (chemicals #14 and #17; Table 5). Since specificity and the false positive fraction add up to 1, a higher false positive fraction results in lower specificity, which is shown in our analyses. Nevertheless, the one-sided t-test/Wilcoxon approach has higher sensitivity than the cutoff approach;


Table 5. Summary of ANOVA or Kruskal–Wallis with post hoc analysis.

| No. | Substance | LLNA ref. (OECD TG) | LLNA:BrdU–FCM (2.7) | Levene's test | ANOVA | Dunnett | Kruskal–Wallis | DSCF | Decision |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 5-Chloro-2-methyl-4-isothiazolin-3-one (CMI)/2-methyl-4-isothiazolin-3-one (MI) | + | + | 0.2880 | <.0001 | L, M, H | 0.0007 | L, M, H | + |
| 2 | DNCB | + | + | 0.1146 | 0.0001 | M, H | 0.0018 | L, M, H | + |
| 3 | 4-Phenylenediamine | + | + | 0.0541 | 0.0003 | M, H | 0.0014 | L, M, H | + |
| 4 | Cobalt chloride | + | + | 0.1320 | <.0001 | L, M, H | 0.0007 | L, M, H | + |
| 5 | Isoeugenol | + | + | 0.1334 | <.0001 | H | 0.0014 | M, H | + |
| 6 | 2-Mercaptobenzothiazole | + | – | 0.4394 | 0.2274 | – | 0.1405 | – | – |
| 7 | Citral | + | + | 0.0329 | <.0001 | M, H | 0.0005 | L, M, H | + |
| 8 | HCA | + | + | 0.0183 | 0.0002 | H | 0.0024 | H | + |
| 9 | Eugenol | + | + | 0.0704 | <.0001 | H | 0.0006 | M, H | + |
| 10 | Phenyl benzoate | + | + | 0.1236 | <.0001 | M, H | 0.0011 | L, M, H | + |
| 11 | Cinnamic alcohol | + | + | 0.0472 | 0.0007 | M, H | 0.0045 | M | + |
| 12 | Imidazolidinyl urea | + | + | 0.4496 | 0.0004 | M, H | 0.0031 | M, H | + |
| 13 | Methyl methacrylate | + | – | 0.4117 | 0.0327 | – | 0.0533 | – | – |
| 14 | Chlorobenzene | – | – | 0.0025 | 0.0002 | H | 0.0056 | H | + |
| 15 | Isopropanol | – | – | 0.2595 | 0.5818 | – | 0.4515 | – | – |
| 16 | Lactic acid | – | – | 0.3085 | 0.2635 | – | 0.1438 | – | – |
| 17 | Methyl salicylate | – | – | 0.1666 | 0.0014 | M | 0.0083 | L, M | + |
| 18 | Salicylic acid | – | – | 0.1272 | 0.1185 | – | 0.1316 | – | – |
| 19 | Sodium lauryl sulfate | + | + | 0.0877 | 0.0004 | L, M, H | 0.0054 | L, M, H | + |
| 20 | Ethylene glycol dimethacrylate | + | + | 0.1679 | 0.0006 | H | 0.0043 | H | + |
| 21 | Xylene | + | + | 0.0013 | <.0001 | M, H | 0.0007 | M, H | + |
| 22 | Nickel chloride | – | – | 0.2660 | 0.6319 | – | 0.5968 | – | – |

L, low; M, middle; H, high; DSCF, Dwass, Steel, Critchlow–Fligner method; Decision, + stands for the positive and – stands for the negative decision (shaded).


chemical #13, which was deemed a non-sensitizer based on its SI value (1.5), was significantly different from the SI value of the vehicle control, and thus was classified as a sensitizer based on the one-sided t-test/Wilcoxon approach. Although the small sample size deters further generalization, it is an interesting point to note, suggesting that when specificity or accuracy counts more than sensitivity, the current cutoff approach could be relied upon more, yet when sensitivity matters more, inferential statistics should be used.

Table 6. Overall summary of prediction models for LLNA:BrdU–FCM.

| No. | Substance | LLNA ref. | SI cutoff 3.0 | SI cutoff 2.7 | Max SI % | Decision (42.6) | t-test/Wilcoxon group | t-test/Wilcoxon decision | ANOVA/Kruskal–Wallis decision |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 5-Chloro-2-methyl-4-isothiazolin-3-one (CMI)/2-methyl-4-isothiazolin-3-one (MI) | + | + | + | 191.56 | + | Middle | + | + |
| 2 | DNCB | + | + | + | 216.14 | + | High | + | + |
| 3 | 4-Phenylenediamine | + | + | + | 149.64 | + | High | + | + |
| 4 | Cobalt chloride | + | + | + | 245.48 | + | High | + | + |
| 5 | Isoeugenol | + | + | + | 143.82 | + | High | + | + |
| 6 | 2-Mercaptobenzothiazole | + | – | – | 12.75 | – | Low | – | – |
| 7 | Citral | + | + | + | 212.65 | + | High | + | + |
| 8 | HCA | + | + | + | 76.09 | + | High | + | + |
| 9 | Eugenol | + | + | + | 275.59 | + | High | + | + |
| 10 | Phenyl benzoate | + | + | + | 57.31 | + | Middle | + | + |
| 11 | Cinnamic alcohol | + | + | + | 24.66 | – | Middle | + | + |
| 12 | Imidazolidinyl urea | + | + | + | 49.24 | + | High | + | + |
| 13 | Methyl methacrylate | + | – | – | 4.26 | – | Low | + | – |
| 14 | Chlorobenzene | – | – | – | 6.04 | – | High | + | + |
| 15 | Isopropanol | – | – | – | 1.18 | – | High | – | – |
| 16 | Lactic acid | – | – | – | 8.30 | – | High | + | – |
| 17 | Methyl salicylate | – | – | – | 42.58 | – | Middle | + | + |
| 18 | Salicylic acid | – | – | – | 18.30 | – | Middle | + | – |
| 19 | Sodium lauryl sulfate | + | + | + | 204.32 | + | Middle | + | + |
| 20 | Ethylene glycol dimethacrylate | + | + | + | 59.48 | + | High | + | + |
| 21 | Xylene | + | + | + | 101.95 | + | High | + | + |
| 22 | Nickel chloride | – | – | – | 4.38 | – | High | – | – |

| Measure | SI cutoff 3.0 | SI cutoff 2.7 | Max SI % (42.6) | t-test/Wilcoxon | ANOVA/Kruskal–Wallis |
|---|---|---|---|---|---|
| Sensitivity | 87.5% (14/16) | 87.5% (14/16) | 81.3% (13/16) | 93.8% (15/16) | 87.5% (14/16) |
| Specificity | 100% (6/6) | 100% (6/6) | 100% (6/6) | 33.3% (2/6) | 66.7% (4/6) |
| Accuracy | 90.9% (20/22) | 90.9% (20/22) | 86.4% (19/22) | 77.3% (17/22) | 81.8% (18/22) |

Max SI %, (Max SI – NC SI)/(PC SI – NC SI) × 100; t-test/Wilcoxon, NC vs. Max SI group. Shaded, decision different from LLNA ref.; t-test, one-sided t-test; Wilcoxon, Wilcoxon rank sum test; SI, stimulation index; NC, negative control; PC, positive control.
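
The summary rows of Table 6 can be re-derived from the per-substance decisions. The sketch below is a minimal check, not the authors' analysis code: it records, for each prediction model, the substance numbers that Table 6 lists as negative (all others are positive) and counts agreement with the LLNA reference. The variable and function names are illustrative.

```python
SUBSTANCES = set(range(1, 23))

# Substances called negative in each column of Table 6; all others are positive.
REFERENCE_NEG = {14, 15, 16, 17, 18, 22}
MODEL_NEG = {
    "SI cutoff (3.0 or 2.7)": {6, 13, 14, 15, 16, 17, 18, 22},
    "Max SI % (42.6)": {6, 11, 13, 14, 15, 16, 17, 18, 22},
    "t-test/Wilcoxon": {6, 15, 22},
    "ANOVA/Kruskal-Wallis": {6, 13, 15, 16, 18, 22},
}


def true_calls(model_neg):
    """Return (true positives, true negatives) against the LLNA reference."""
    reference_pos = SUBSTANCES - REFERENCE_NEG
    true_pos = len(reference_pos - model_neg)
    true_neg = len(REFERENCE_NEG & model_neg)
    return true_pos, true_neg


n_pos = len(SUBSTANCES - REFERENCE_NEG)  # 16 reference sensitizers
n_neg = len(REFERENCE_NEG)               # 6 reference non-sensitizers
for name, negatives in MODEL_NEG.items():
    tp, tn = true_calls(negatives)
    # Sensitivity = tp/n_pos, specificity = tn/n_neg, accuracy = (tp+tn)/22.
    print(f"{name:24s} sensitivity {tp}/{n_pos}  specificity {tn}/{n_neg}  "
          f"accuracy {tp + tn}/{len(SUBSTANCES)}")
```

Working with counts rather than rounded percentages keeps the comparison exact; the false positive fraction for each model is simply (n_neg − tn)/n_neg.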


Depending on the concurrence and discordance of the classifications based on the cutoff approach and inferential statistics, substances fall into three categories: substances determined as sensitizers by both criteria (14/16), those determined as non-sensitizers by both criteria (4/6), and those with a discordant decision. Substances with a concurrent decision satisfy the definition of a sensitizer that we suggested above, namely, a substance that elicits a biologically relevant and statistically significant increase in SI. In the same context, a discrepant decision indicates that the substance could not satisfy both criteria, from which we can infer that the decision must be made more cautiously. Discrepant decisions between the cutoff approach and inferential statistics were found for 5 substances (methyl methacrylate, chlorobenzene, lactic acid, methyl salicylate, and salicylic acid; chemicals #13, 14, 16, 17, and 18). The extent of the increase in SI in a low dose group of the sensitizer methyl methacrylate (1.63 ± 0.46) was lower than the cutoff 2.7; however, it was statistically different from the vehicle control with a marginal p value, ~0.04. Chlorobenzene, a non-sensitizer, elicited a marked increase in SI up to 2.55 ± 0.84, although it did not exceed the cutoff, 2.7. A concentration-dependent increase in SI was clearly noted for chlorobenzene, suggesting that its sensitizing potential must be examined in more detail. Lactic acid and salicylic acid, non-sensitizers, exhibited SI values lower than the cutoff 2.7 (1.51 and 2.38); however, they barely reached statistical significance (p < 0.05). Methyl salicylate increased the SI up to 2.66 ± 0.67 at the middle dose, which was barely lower than the cutoff but achieved statistical significance (p < 0.05). Nevertheless, dose-dependence could not be found, suggesting that their sensitizing potential needs to be revisited to assess possible factors causing false positive results.

The 5 discordant substances described above and 2-mercaptobenzothiazole (which all the prediction models unanimously deemed a non-sensitizer) have also been pointed out as problematic in a previous study (Basketter et al., 2012), since LLNA and LNCC produced different results from OECD PS, indicating that they may be borderline sensitizers or non-sensitizers in LLNA and its analogous methods. Indeed, Basketter et al. raised concerns over the inappropriate selection of reference substances listed in OECD PS, due to the ambiguous points concerning the sensitization potential of the substances described above. It was proposed that rather than using a cutoff approach, the ECt value, which considers dose-linearity, may be more relevant, a point with which our study was in agreement.
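
To show how an ECt value takes the dose-response into account, the following sketch estimates the concentration at which the mean SI crosses a threshold by linear interpolation between the two flanking doses, the interpolation commonly used for EC3 in LLNA. That ECt is computed in exactly this way here is an assumption; the dose-response values and the function name `ect` are hypothetical.

```python
def ect(doses, mean_sis, threshold):
    """Linearly interpolate the concentration at which mean SI crosses `threshold`.

    `doses` must be in ascending order. Returns None when no pair of adjacent
    doses brackets the threshold.
    """
    points = list(zip(doses, mean_sis))
    for (c, d), (a, b) in zip(points, points[1:]):
        # (c, d): dose and SI just below the threshold; (a, b): at or above it.
        if d < threshold <= b:
            return c + (threshold - d) / (b - d) * (a - c)
    return None


# Hypothetical dose-response (concentration in %, mean SI); not study data.
doses = [5.0, 10.0, 25.0]
mean_sis = [1.5, 2.5, 4.5]

print(ect(doses, mean_sis, threshold=2.7))  # crossing lies between 10% and 25%
print(ect(doses, mean_sis, threshold=5.0))  # threshold never reached, so None
```

Unlike a decision based only on the maximum mean SI, this estimate depends on where along the dose range the threshold is crossed, so two substances with the same maximum SI can yield different ECt values.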

In conclusion, we compared the cutoff approach and inferential statistics as prediction models for a non-radioisotopic local lymph node assay using flow cytometry, LLNA:BrdU–FCM, and demonstrated that each criterion has its own merits in the classification of sensitizers. Ideally, substances that meet both criteria may be true sensitizers. Although LLNA:BrdU–FCM failed to identify all 22 reference substances as stated in OECD TG429 PS, it successfully classified 18 of the 22, based on the cutoff approach, that have no confounding issues, suggesting that its predictive capacity is comparable to those of traditional LLNA or other analogous methods. Further tests with additional reference substances (hopefully those with reproducible conclusions for LLNA and its analogs) will fully demonstrate its use and limitations.

Acknowledgments

This research was supported by a grant (13172MFDS987) from the Ministry of Food and Drug Safety of Korea.

References

Basketter, D. A., Lea, L. J., Cooper, K., Stocks, J., Dickens, A., Pate, I., ... Kimber, I. (1999). Threshold for classification as a skin sensitizer in the local lymph node assay: A statistical evaluation. Food and Chemical Toxicology, 37, 1167–1174.

Basketter, D., Kolle, S. N., Schrage, A., Honarvar, N., Gamer, A. O., van Ravenzwaay, B., & Landsiedel, R. (2012). Experience with local lymph node assay performance standards using standard radioactivity and nonradioactive cell count measurements. Journal of Applied Toxicology, 32, 590–596.

Bennett, J. O., & Briggs, W. L. (2011). Using and understanding mathematics: A quantitative reasoning approach. Pearson.

Brown, M. B., & Forsythe, A. B. (1974). Robust tests for the equality of variances. Journal of the American Statistical Association, 69, 364–367.

Critchlow, D. E., & Fligner, M. A. (1991). On distribution-free multiple comparisons in the one-way analysis of variance. Communications in Statistics-Theory and Methods, 20, 127–139.

Dearman, R. J., Wright, Z. M., Basketter, D. A., Ryan, C. A., Gerberick, G. F., & Kimber, I. (2001). The suitability of hexyl cinnamic aldehyde as a calibrant for the murine local lymph node assay. Contact Dermatitis, 44, 357–361.

Douglas, C. E., & Michael, F. A. (1991). On distribution-free multiple comparisons in the one-way analysis of variance. Communications in Statistics-Theory and Methods, 20, 127–139.

Ehling, G., Hecht, M., Heusener, A., Huesler, J., Gamer, A. O., van Loveren, H., ... Vohr, H. W. (2005a). An European inter-laboratory validation of alternative endpoints of the murine local lymph node assay: 2nd round. Toxicology, 212, 69–79.

Ehling, G., Hecht, M., Heusener, A., Huesler, J., Gamer, A. O., van Loveren, H., ... Vohr, H. W. (2005b). An European inter-laboratory validation of alternative endpoints of the murine local lymph node assay: First round. Toxicology, 212, 60–68.

Gerberick, G. F., Ryan, C. A., Dearman, R. J., & Kimber, I. (2007). Local lymph node assay (LLNA) for detection of sensitization capacity of chemicals. Methods, 41, 54–60.

Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143, 29–36.

Hothorn, L. A. (2014). Statistical evaluation of toxicological bioassays – A review. Toxicology Research, 3, 418–432.

Hothorn, L. A., & Hasler, M. (2008). Proof of hazard and proof of safety in toxicological studies using simultaneous confidence intervals for differences and ratios to control. Journal of Biopharmaceutical Statistics, 18, 915–933.

Hothorn, L. A., & Vohr, H. -W. (2010). Statistical evaluation of the Local Lymph Node Assay. Regulatory Toxicology and Pharmacology, 56, 352–356.

Idehara, K., Yamagishi, G., Yamashita, K., & Ito, M. (2008). Characterization and evaluation of a modified local lymph node assay using ATP content as a non-radio isotopic endpoint. Journal of Pharmacological and Toxicological Methods, 58, 1–10.

Jung, K. M., Bae, I. H., Kim, B. H., Kim, W. K., Chung, J. H., Park, Y. H., & Lim, K. M. (2010). Comparison of flow cytometry and immunohistochemistry in non-radioisotopic murine lymph node assay using bromodeoxyuridine. Toxicology Letters, 192, 229–237.

Jung, K. -M., Jang, W. -H., Lee, Y. -K., Yum, Y. N., Sohn, S., Kim, B. -H., ... Lim, K. -M. (2012). B cell increases and ex vivo IL-2 production as secondary endpoints for the detection of sensitizers in non-radioisotopic local lymph node assay using flow cytometry. Toxicology Letters, 209, 255–263.

Kenneth, I. K. J.H. R. J. D. G. F. G. C.A. R. D. A. B. L. L. R. V.H. G. S. L. S. E. L. (1998). Assessment of the skin sensitization potential of topical medicaments using the local lymph node assay: An interlaboratory evaluation. Journal of Toxicology and Environmental Health. Part A, 53, 563–579.

Kimber, I., & Weisenberger, C. (1989). A murine local lymph node assay for the identification of contact allergens. Archives of Toxicology, 63, 274–282.

Kimber, I., Dearman, R. J., Scholes, E. W., & Basketter, D. A. (1994). The local lymph node assay: Developments and applications. Toxicology, 93, 13–31.

Kojima, H., Takeyoshi, M., Sozu, T., Awogi, T., Arima, K., Idehara, K., ... Omori, T. (2011). Inter-laboratory validation of the modified murine local lymph node assay based on 5-bromo-2′-deoxyuridine incorporation. Journal of Applied Toxicology, 31, 63–74.

Na, J., Yang, H., Bae, S., & Lim, K. M. (2014). Analysis of statistical methods currently used in toxicology journals. Toxicological Research, 30, 185–192.

OECD (1992). In OECD (Ed.), Test no. 406: Skin sensitisation. Paris: OECD Publishing.

OECD (2010a). In OECD (Ed.), Test no. 429: Skin sensitisation: Local lymph node assay. Paris: OECD Publishing.

OECD (2010b). In OECD (Ed.), Test no. 442A: Skin sensitization: Local lymph node assay: DA. Paris: OECD Publishing.

OECD (2010c). In OECD (Ed.), Test no. 442B: Skin sensitization: Local lymph node assay: BrdU-ELISA. Paris: OECD Publishing.

OECD (2014). The adverse outcome pathway for skin sensitisation initiated by covalent binding to proteins. OECD Publishing.

Omori, T., & Sozu, T. (2007). Variance of the stimulation index for the local lymph node assay. AATEX, 12, 212–217.

Omori, T., Idehara, K., Kojima, H., Sozu, T., Arima, K., Goto, H., ... Kanazawa, Y. (2008). Interlaboratory validation of the modified murine local lymph node assay based on adenosine triphosphate measurement. Journal of Pharmacological and Toxicological Methods, 58, 11–26.

Pagano, M., & Gauvreau, K. (2000). Principles of biostatistics. Duxbury Thompson Learning.

Scholes, E., Basketter, D., Sarll, A., Kimber, I., Evans, C., Miller, K., ... Waite, S. (1992). The local lymph node assay: Results of a final inter-laboratory validation under field conditions. Journal of Applied Toxicology, 12, 217–222.

Shapiro, S. S., Wilk, M. B., & Chen, H. J. (1968). A comparative study of various tests for normality. Journal of the American Statistical Association, 63, 1343–1372.

Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1, 80–83.

Yang, H., Na, J., Jang, W. H., Jung, M. S., Jeon, J. Y., Heo, Y., ... Bae, S. (2015). Appraisal of within- and between-laboratory reproducibility of non-radioisotopic local lymph node assay using flow cytometry, LLNA:BrdU–FCM: Comparison of OECD TG429 performance standard and statistical evaluation. Toxicology Letters, 234, 172–179.