Solubility Modeling With a Nonrandom Two-Liquid Segment Activity Coefficient Model



Chau-Chyun Chen* and Yuhua Song

Aspen Technology, Inc., Ten Canal Park, Cambridge, Massachusetts 02141

A segment contribution activity coefficient model, derived from the polymer nonrandom two-liquid model, is proposed for fast, qualitative estimation of the solubilities of organic nonelectrolytes in common solvents. Conceptually, the approach suggests that one account for the liquid nonideality of mixtures of complex pharmaceutical molecules and small solvent molecules in terms of interactions between three pairwise interacting conceptual segments: hydrophobic segment, hydrophilic segment, and polar segment. In practice, these conceptual segments become the molecular descriptors used to represent the molecular surface characteristics of each solute and solvent molecule. The treatment results in component-specific molecular parameters: hydrophobicity X, polarity Y, and hydrophilicity Z. Once the molecular parameters are identified from experimental data for common solvents and solute molecules, the model offers a simple and practical thermodynamic framework to estimate solubilities and to perform other phase equilibrium calculations in support of pharmaceutical process design.

Introduction

Solvent selection is a critical task in the chemical synthesis and recipe development phase of the pharmaceutical and agricultural chemical industries.1-3 The choice of solvents directly impacts the reaction rates, extraction efficiency, crystallization yield, etc. Proper solvent selection results in faster product separation and purification, reduced solvent emission and less waste, higher yield, lower overall cost, and better production processes.

Solubility is a key property of concern in solvent selection because pharmaceutical product isolation is often done through crystallization at reduced temperature and/or with the addition of antisolvent. Solubility data for new drug molecules and their precursors in candidate solvents rarely exist. Although limited solubility experiments are performed for a few solvents as part of process development practice, the experimental task can multiply rapidly when one considers the choices of solvents and solvent-antisolvent mixtures, the effect of temperature, the impact of impurities, the possibility of multiple polymorphs, etc. As a result, solvent selection is largely dictated by researchers' preferences or prior experience.

Existing solubility estimation techniques are best represented by the Hansen model,4 the UNIFAC group contribution model,5 and perhaps the Abraham solvation model.6 Of the three, Hansen and UNIFAC are activity coefficient models that can be used for the estimation of solubilities in pure solvents and in solvent mixtures. Other popular activity coefficient models, such as van Laar, Wilson, nonrandom two-liquid (NRTL), or UNIQUAC, are not practical because their use requires the determination of binary interaction parameters from phase equilibrium data for each of the solute-solvent and solvent-solvent binary mixtures. Solute-solvent phase equilibrium data are rarely available to support the use of these activity coefficient models in pharmaceutical process design.

The Hansen model is a correlative model. It requires experimental solubility data from which component-specific solubility parameters can be determined for the solutes. The UNIFAC model is a predictive model that requires only chemical structure information for the solutes and solvents. Unfortunately, although these models have shown limited utility for solubility estimation of chemicals with molecular weights in the low 100s g/mol, prior investigators2 have found that, because of assumptions inherent in Hansen and UNIFAC, they are inadequate for estimating solubilities of large, complex organic molecules with molecular weights in the range of 200-600 g/mol. UNIFAC fails for systems with large complex molecules for which either the UNIFAC functional groups are undefined or the functional group additivity rule becomes invalid. Additionally, neither Hansen nor UNIFAC is applicable to electrolyte solutes, a major concern for the pharmaceutical industry because organic electrolytes account for the majority of drug compounds.

Recent developments in computational chemistry have yielded COSMO-RS7 and COSMO-SAC,8 predictive models that represent promising alternatives to UNIFAC. Like UNIFAC, the current COSMO-RS-type models are not applicable to electrolyte solutes.

In this paper, we present the NRTL segment activity coefficient (NRTL-SAC) model as the thermodynamic framework for solubility modeling. The NRTL-SAC model is based on the polymer NRTL model,9 a derivative of the original NRTL model of Renon and Prausnitz.10 NRTL is one of the most successful molecular thermodynamic models in the chemical industry. The model and its derivatives have been widely used to correlate and extrapolate phase behaviors of highly nonideal systems with chemicals, electrolytes, oligomers, polymers, surfactants, etc.9,11 We show that the NRTL-SAC model provides a simple and practical thermodynamic framework for chemists and engineers to perform solubility modeling in support of their pharmaceutical process design. While this paper focuses on modeling solubilities of organic nonelectrolytes, future work will extend the model to organic electrolytes.

* To whom correspondence should be addressed. Tel.: (617) 949-1202. Fax: (617) 949-1030. E-mail: chauchyun.chen@aspentech.com.

Ind. Eng. Chem. Res. 2004, 43, 8354-8362. DOI 10.1021/ie049463u. © 2004 American Chemical Society. Published on Web 12/15/2004.

Solubility Modeling

The solubility of a solid organic nonelectrolyte can be described by eq 1 (refs 1 and 2), which holds for $T \le T_m$. In eq 1, $x_I^{\mathrm{SAT}}$ is the mole fraction of the solute I dissolved in the solvent phase at saturation, $\Delta_{\mathrm{fus}}S$ is the entropy of fusion of the solute (related to the enthalpy of fusion by eq 2), $\gamma_I^{\mathrm{SAT}}$ is the activity coefficient of the solute in the solution at saturation, R is the gas constant, T is the temperature, and $T_m$ is the melting point of the solute. Given a polymorph, $\Delta_{\mathrm{fus}}S$ and $T_m$ are fixed. At a fixed temperature, the solubility is then a function only of the activity coefficient of the solute in the solution. Clearly, the activity coefficient of the solute in the solution plays a key role in determining the solubility.

Equation 1 is a simplified expression for solubility. It ignores the contributions due to the difference between solid and liquid heat capacities at the melting point and due to the pressure correction. When the values of $\Delta_{\mathrm{fus}}S$ and $T_m$ are not available, the solubility product constant, $K_{sp}$, can be introduced into eq 1 as an adjustable parameter for data regression (eq 3).

$K_{sp}$ corresponds to the ideal solubility of the solute.
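Because the activity coefficient in eq 1 itself depends on the saturation composition, eq 1 generally has to be solved iteratively. The following is a minimal Python sketch of that calculation, not the authors' implementation. The function names, the damped successive-substitution scheme, and the numerical inputs used in the usage note are all assumptions made for illustration; `ln_gamma` stands for any activity coefficient model supplied by the caller.

```python
import math

R_GAS = 8.314  # gas constant, J/(mol K)


def ln_ideal_solubility(delta_fus_s, t_melt, temp):
    """Ideal part of eq 1, (dfusS/R)(1 - Tm/T); by eq 3 this equals ln Ksp.

    delta_fus_s is in J/(mol K); t_melt and temp are in K.
    """
    if temp > t_melt:
        raise ValueError("eq 1 applies only for T <= Tm")
    return delta_fus_s / R_GAS * (1.0 - t_melt / temp)


def solve_solubility(ln_ksp, ln_gamma, damping=0.5, tol=1e-12, max_iter=500):
    """Solve ln x = ln Ksp - ln gamma(x) (eq 3) for the mole fraction x.

    ln_gamma is a caller-supplied function x -> ln(gamma) of the solute.
    Damped successive substitution is an illustrative choice of solver.
    """
    ln_x = ln_ksp  # start from the ideal solubility
    for _ in range(max_iter):
        new_ln_x = ln_ksp - ln_gamma(math.exp(ln_x))
        if abs(new_ln_x - ln_x) < tol:
            return math.exp(new_ln_x)
        ln_x += damping * (new_ln_x - ln_x)
    raise RuntimeError("solubility iteration did not converge")
```

As a usage example with hypothetical inputs, `solve_solubility(ln_ideal_solubility(60.0, 400.0, 300.0), lambda x: 1.5 * (1.0 - x) ** 2)` combines an assumed entropy of fusion and melting point with a placeholder one-parameter activity coefficient expression; in practice `ln_gamma` would be the NRTL-SAC model described in the next section.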

NRTL Segment Activity Coefficient Model

The proposed NRTL segment activity coefficient model builds on the segment contribution concept that was first incorporated into the polymer NRTL model9 for systems with oligomers and polymers. In NRTL-SAC, the activity coefficient expression is written in two parts (eq 4), where $\gamma_I^{C}$ and $\gamma_I^{R}$ are the combinatorial and residual contributions to the activity coefficient of component I. The residual part, $\gamma_I^{R}$, is set equal to the local composition (lc) interaction contribution, $\gamma_I^{lc}$, of the polymer NRTL model (eq 5).

We then compute the segment activity coefficient, $\Gamma_m$, from the NRTL equation (eqs 6-9), where i, j, k, m, and m′ are the segment-based species indices, I and J are the component indices, $x_j$ is the segment-based mole fraction of segment species j, $x_J$ is the mole fraction of component J, $r_{m,I}$ is the number of segment species m contained in component I, $\Gamma_m^{lc}$ is the activity coefficient of segment species m, and $\Gamma_m^{lc,I}$ is the activity coefficient of segment species m contained only in component I. G and τ in eqs 6 and 7 are local binary quantities related to each other by the NRTL nonrandomness factor parameter α (eq 10).

Equation 5 is a general form for the local composition interaction contribution to activity coefficients of components in the NRTL-SAC model. For monosegment solvent components (S), eq 5 can be simplified and reduced to the classical NRTL model: eq 5 takes the form of eq 11, with $r_{m,S} = 1$ (eq 12) and $\ln \Gamma_m^{lc,S} = 0$ (eq 13), which leads to eq 14.

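$$\ln x_I^{\mathrm{SAT}} = \frac{\Delta_{\mathrm{fus}}S}{R}\left(1 - \frac{T_m}{T}\right) - \ln \gamma_I^{\mathrm{SAT}} \tag{1}$$

$$\Delta_{\mathrm{fus}}S = \Delta_{\mathrm{fus}}H / T_m \tag{2}$$

$$\ln K_{sp} = \ln x_I^{\mathrm{SAT}} + \ln \gamma_I^{\mathrm{SAT}} \tag{3}$$

$$\ln \gamma_I = \ln \gamma_I^{C} + \ln \gamma_I^{R} \tag{4}$$

$$\ln \gamma_I^{R} = \ln \gamma_I^{lc} = \sum_m r_{m,I}\left[\ln \Gamma_m^{lc} - \ln \Gamma_m^{lc,I}\right] \tag{5}$$

$$\ln \Gamma_m^{lc} = \frac{\sum_j x_j G_{jm} \tau_{jm}}{\sum_k x_k G_{km}} + \sum_{m'} \frac{x_{m'} G_{mm'}}{\sum_k x_k G_{km'}} \left(\tau_{mm'} - \frac{\sum_j x_j G_{jm'} \tau_{jm'}}{\sum_k x_k G_{km'}}\right) \tag{6}$$

$$\ln \Gamma_m^{lc,I} = \frac{\sum_j x_{j,I} G_{jm} \tau_{jm}}{\sum_k x_{k,I} G_{km}} + \sum_{m'} \frac{x_{m',I} G_{mm'}}{\sum_k x_{k,I} G_{km'}} \left(\tau_{mm'} - \frac{\sum_j x_{j,I} G_{jm'} \tau_{jm'}}{\sum_k x_{k,I} G_{km'}}\right) \tag{7}$$

$$x_j = \frac{\sum_J x_J r_{j,J}}{\sum_I \sum_i x_I r_{i,I}} \tag{8}$$

$$x_{j,I} = \frac{r_{j,I}}{\sum_i r_{i,I}} \tag{9}$$

$$G = \exp(-\alpha \tau) \tag{10}$$

$$\ln \gamma_{I=S}^{lc} = \sum_m r_{m,S}\left[\ln \Gamma_m^{lc} - \ln \Gamma_m^{lc,S}\right] \tag{11}$$

$$r_{m,S} = 1 \tag{12}$$

$$\ln \Gamma_m^{lc,S} = 0 \tag{13}$$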


Equation 14 is the same as the classical NRTL model.10
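The local composition term of eqs 5-10 can be written compactly in code. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the function name, the argument layout (nested lists indexed as `r[I][m]`, `tau[j][m]`, `alpha[j][m]`), and the use of plain Python lists are choices made here for illustration.

```python
import math


def ln_gamma_lc(x, r, tau, alpha):
    """Local composition contribution ln(gamma_I^lc) of eq 5 for each component.

    x[I]        mole fraction of component I
    r[I][m]     number of segments of species m in component I
    tau[j][m]   segment-segment parameter tau_jm, with tau[m][m] = 0
    alpha[j][m] nonrandomness factor for the pair (j, m), symmetric
    """
    n_comp, n_seg = len(x), len(tau)
    # eq 10: G_jm = exp(-alpha_jm * tau_jm)
    G = [[math.exp(-alpha[j][m] * tau[j][m]) for m in range(n_seg)]
         for j in range(n_seg)]

    def ln_segment_gamma(xs):
        """Eq 6 (mixture) or eq 7 (pure component) for segment fractions xs."""
        S = [sum(xs[k] * G[k][m] for k in range(n_seg)) for m in range(n_seg)]
        A = [sum(xs[j] * G[j][m] * tau[j][m] for j in range(n_seg))
             for m in range(n_seg)]
        out = []
        for m in range(n_seg):
            val = A[m] / S[m]
            for mp in range(n_seg):
                val += xs[mp] * G[m][mp] / S[mp] * (tau[m][mp] - A[mp] / S[mp])
            out.append(val)
        return out

    # eq 8: segment-based mole fractions in the mixture
    total = sum(x[I] * r[I][i] for I in range(n_comp) for i in range(n_seg))
    xs_mix = [sum(x[J] * r[J][j] for J in range(n_comp)) / total
              for j in range(n_seg)]
    ln_g_mix = ln_segment_gamma(xs_mix)

    result = []
    for I in range(n_comp):
        r_total = sum(r[I])
        xs_pure = [r[I][j] / r_total for j in range(n_seg)]  # eq 9
        ln_g_pure = ln_segment_gamma(xs_pure)
        # eq 5: sum over segments of r_mI * (ln Gamma_m - ln Gamma_m,I)
        result.append(sum(r[I][m] * (ln_g_mix[m] - ln_g_pure[m])
                          for m in range(n_seg)))
    return result
```

When every component consists of a single segment (`r` is an identity matrix), the pure-component term vanishes as in eq 13 and the function returns the classical NRTL activity coefficients, which gives a convenient check on any implementation.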

The combinatorial part, $\gamma_I^{C}$, is calculated from the Flory-Huggins term (eq 17), where $r_I$ and $\phi_I$ are the total segment number and the segment mole fraction of component I, respectively (eqs 18 and 19).
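The Flory-Huggins term is simple enough to evaluate directly. The sketch below is an illustration under stated assumptions rather than the authors' code: the function name and argument layout are hypothetical, and the ratio $\phi_I/x_I$ is evaluated as $r_I/\sum_J r_J x_J$ (which follows from eq 19) so that the expression stays finite at infinite dilution.

```python
import math


def ln_gamma_combinatorial(x, r_total):
    """Flory-Huggins combinatorial term (eq 17) for every component.

    x[I]       mole fraction of component I
    r_total[I] total segment number r_I of component I (eq 18)
    """
    denom = sum(r_j * x_j for r_j, x_j in zip(r_total, x))
    # From eq 19: phi_J / r_J = x_J / denom, so the sum over J is sum(x) / denom.
    phi_over_r_sum = sum(x) / denom
    result = []
    for r_i in r_total:
        phi_over_x = r_i / denom  # phi_I / x_I from eq 19
        result.append(math.log(phi_over_x) + 1.0 - r_i * phi_over_r_sum)
    return result
```

For components of equal size the term vanishes, and for unequal sizes it is never positive, so in this model the combinatorial part can only lower the activity coefficient.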

Conceptual Segment Contribution Concept. The essence of NRTL-SAC resides in its use of the conceptual segment contribution concept. While UNIFAC decomposes molecules into a large set of predefined functional groups based on the chemical structure, NRTL-SAC maps molecules into a few predefined conceptual segments, or molecular descriptors, based on expressed characteristics of molecular interactions in solutions. Specifically, for each solute and solvent molecule, NRTL-SAC describes its effective surface interactions in terms of three types of conceptual segments: hydrophobic segment, polar segment, and hydrophilic segment. Equivalent numbers of the conceptual segments for each molecule are measures of the effective surface areas of the molecule that exhibit surface interaction characteristics of hydrophobicity (X), polarity (Y), and hydrophilicity (Z). These molecular measures, i.e., X, Y, and Z, are to be determined not from the molecular structure but from the interaction characteristics of the molecules in solution as expressed in their experimental phase equilibrium data.

The pairwise segment-segment interaction characteristics of these conceptual segments are represented by their corresponding binary NRTL parameters. The determination of these binary NRTL parameters is discussed in the next section. Given the NRTL parameters for the pairwise segment-segment interactions and the molecular measures (X, Y, and Z) for the molecules, we apply eqs 4-9 to compute activity coefficients for the segments and the molecules in solution. In other words, the phase behavior of the mixtures is accounted for based on the segment compositions of the molecules and their pairwise segment-segment interactions.

The conceptual segment contribution approach represents a practical alternative to the UNIFAC functional group contribution approach. This approach is suitable for use in the industrial practice of carrying out measurements for a few selected solvents and then using a model to quickly predict other solvents or solvent mixtures and to generate a list of suitable solvent systems. The NRTL-SAC model aims to provide such a thermodynamic framework. With NRTL-SAC, available experimental data are used to identify molecular parameters for the solutes, and the model is used to extrapolate to other solvent systems that are also described in terms of the same set of molecular descriptors.

Conceptual Segments and NRTL Binary Parameters. Three conceptual segments are initially identified for nonelectrolyte molecules: hydrophobic segment, polar segment, and hydrophilic segment. Additional conceptual segments may be introduced when we expand the scope to cover organic electrolytes, charged molecules, zwitterions, etc. To enhance the usability of NRTL-SAC, the choice of conceptual segments is meant to be a minimal set rather than a comprehensive set. These conceptual segments are chosen to simulate the interaction characteristics of representative molecular surfaces that significantly contribute to the liquid-phase nonideality of real molecules. Here the hydrophilic segment simulates polar molecular surfaces that are "hydrogen bond donor or acceptor." As such, it represents molecular surfaces with the tendency to form a hydrogen bond. The hydrophobic segment simulates molecular surfaces with an aversion to forming a hydrogen bond. The polar segment simulates polar molecular surfaces that are "electron pair donor or acceptor." While the hydrophobic and hydrophilic segments have strong and clear physical meanings and unique contributions to the liquid-phase nonideality, in our drive to minimize the number of conceptual segments and for practical purposes, we lumped all other surface interactions into the "polar" segment.

With the conceptual segments identified, real molecules are then selected as reference molecules for the conceptual segments, and available phase equilibrium data of these reference molecules are used to identify NRTL binary parameters for the conceptual segments. In choosing the reference molecules, we prefer those molecules with distinct molecular characteristics (i.e., hydrophobic, hydrophilic, or polar) and with abundant, publicly available phase equilibrium data.

$$\ln \gamma_{I=S}^{lc} = \frac{\sum_j x_j G_{jS} \tau_{jS}}{\sum_k x_k G_{kS}} + \sum_m \frac{x_m G_{Sm}}{\sum_k x_k G_{km}} \left(\tau_{Sm} - \frac{\sum_j x_j G_{jm} \tau_{jm}}{\sum_k x_k G_{km}}\right) \tag{14}$$

$$G_{jS} = \exp(-\alpha_{jS} \tau_{jS}) \tag{15}$$

$$G_{Sj} = \exp(-\alpha_{jS} \tau_{Sj}) \tag{16}$$

$$\ln \gamma_I^{C} = \ln \frac{\phi_I}{x_I} + 1 - r_I \sum_J \frac{\phi_J}{r_J} \tag{17}$$

$$r_I = \sum_i r_{i,I} \tag{18}$$

$$\phi_I = \frac{r_I x_I}{\sum_J r_J x_J} \tag{19}$$

We focus our study on the 59 solvents reviewed for use in pharmaceutical process design by the International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH).12 We also consider water, triethylamine, and n-octanol in this study because they are used extensively in pharmaceutical processes. Additional solvents can be considered in the future. Table 1 shows these 62 solvents and their molecular characteristics. Hydrocarbon solvents (aliphatic or aromatic), halogenated hydrocarbons, and ethers are mainly hydrophobic. Ketones, esters, and amides are both hydrophobic and polar. Alcohols, glycols, and amines may have both substantial hydrophilicity and hydrophobicity. Acids are "complex" molecules, exhibiting hydrophilicity, polarity, and hydrophobicity. Also shown in Table 1 are the available NRTL binary parameters for various solvent-water and solvent-hexane binary systems. We obtained these binary parameters by fitting the available data compiled by DECHEMA for phase equilibrium at or around room temperature. We deliberately ignore the temperature dependency of these parameters because they are reported here only to illustrate ranges of values for these binary parameters.

Table 1 shows that all hydrophobic solvents (1) exhibit similar repulsive interactions with water (2), and both τ12 and τ21 are large positive values for the solvent-water binaries. When the hydrophobic solvents also carry significant hydrophilic or polar characteristics, we see that τ12 becomes negative while τ21 retains a large positive value.

Table 1. NRTL Binary Parameters for Common Solvents in Pharmaceutical Process Design

Columns: solvent (component 1) | τ12(a), τ21(a), τ12(b), τ21(b), τ12(c), τ21(c) | solvent characteristics. Blank cells of the original table are not marked here, so rows with fewer than six values list only the reported values in column order.

acetic acid | 1.365 0.797 2.445 -1.108 | complex
acetone | 0.880 0.935 0.806 1.244 | polar
acetonitrile | 1.834 1.643 0.707 1.787 | polar
anisole | | hydrophobic
benzene | 1.490 -0.614 3.692 5.977 | hydrophobic
1-butanol | -0.113 2.639 0.269 2.870 -2.157 5.843 | hydrophobic/hydrophilic
2-butanol | -0.165 2.149 -0.168 3.021 -1.539 5.083 | hydrophobic/hydrophilic
n-butyl acetate | 1.430 2.131 | hydrophobic/polar
methyl tert-butyl ether | -0.148 0.368 1.534 4.263 | hydrophobic
carbon tetrachloride | 1.309 -0.850 5.314 7.369 | hydrophobic
chlorobenzene | 0.884 -0.194 4.013 7.026 | hydrophobic
chloroform | 1.121 -0.424 3.587 4.954 | hydrophobic
cumene | | hydrophobic
cyclohexane | -0.824 1.054 6.012 9.519 | hydrophobic
1,2-dichloroethane | 1.576 -0.138 3.207 4.284 2.833 4.783 | hydrophobic
1,1-dichloroethylene | | hydrophobic
1,2-dichloroethylene | | hydrophobic
dichloromethane | 0.589 0.325 1.983 3.828 | polar
1,2-dimethoxyethane | 0.450 1.952 | polar
N,N-dimethylacetamide | -0.564 1.109 | polar
N,N-dimethylformamide | 1.245 1.636 -1.167 2.044 | polar
dimethyl sulfoxide | -2.139 0.955 | polar
1,4-dioxane | 1.246 0.097 1.003 1.010 | polar
ethanol | 0.533 2.192 -0.024 1.597 | hydrophobic/hydrophilic
2-ethoxyethanol | -0.319 2.560 -1.593 1.853 | hydrophobic/hydrophilic
ethyl acetate | 0.771 0.190 0.508 3.828 | hydrophobic/polar
ethylene glycol | 1.380 -1.660 | hydrophilic
diethyl ether | -0.940 1.400 1.612 3.103 | hydrophobic
ethyl formate | | polar
formamide | | complex
formic acid | -0.340 -1.202 | complex
n-heptane | -0.414 0.398 | hydrophobic
n-hexane | 6.547 10.949 6.547 10.949 | hydrophobic
isobutyl acetate | | polar
isopropyl acetate | | polar
methanol | 1.478 1.155 0.103 0.396 | hydrophobic/hydrophilic
2-methoxyethanol | 1.389 -0.566 | hydrophobic/hydrophilic
methyl acetate | 0.715 2.751 | polar
3-methyl-1-butanol | 0.062 2.374 -0.042 3.029 -0.598 5.680 | hydrophobic/hydrophilic
methyl butyl ketone | | hydrophobic/polar
methylcyclohexane | 1.412 -1.054 | polar
methyl ethyl ketone | -0.036 1.273 0.823 2.128 -0.769 3.883 | hydrophobic/polar
methyl isobutyl ketone | 0.977 4.868 | hydrophobic/polar
isobutyl alcohol | 0.021 2.027 0.592 2.702 -1.479 5.269 | hydrophobic/hydrophilic
N-methyl-2-pyrrolidone | -0.583 3.270 -0.235 0.437 | hydrophobic
nitromethane | 1.968 2.556 | polar
n-pentane | 0.496 -0.523 | hydrophobic
1-pentanol | -0.320 2.567 -0.029 3.583 | hydrophobic/hydrophilic
1-propanol | 0.049 2.558 0.197 2.541 | hydrophobic/hydrophilic
isopropyl alcohol | 0.657 1.099 0.079 2.032 | hydrophobic/hydrophilic
n-propyl acetate | 1.409 2.571 | hydrophobic/polar
pyridine | -0.665 1.664 -0.990 3.146 | polar
sulfolane | 1.045 0.396 | polar
tetrahydrofuran | 0.631 1.981 1.773 0.563 | polar
1,2,3,4-tetrahydronaphthalene | 1.134 -0.631 | hydrophobic
toluene | -0.869 1.292 4.241 7.224 | hydrophobic
1,1,1-trichloroethane | 0.535 -0.197 | hydrophobic
trichloroethylene | 1.026 -0.560 | hydrophobic
m-xylene | | hydrophobic
water | 10.949 6.547 | hydrophilic
triethylamine | -0.908 1.285 1.200 1.763 -0.169 4.997 | hydrophobic/polar
1-octanol | -0.888 3.153 0.301 8.939 | hydrophobic/hydrophilic

(a) NRTL binary τ parameters for various solvent-hexane systems. The NRTL nonrandomness factor parameter, α, is fixed at a constant value of 0.2. In these binary systems, the solvent is component 1 and hexane is component 2. The τ's were determined from available VLE and LLE data.
(b) NRTL binary τ parameters for various solvent-water systems. The NRTL nonrandomness factor parameter, α, is fixed at a constant value of 0.3. In these binary systems, the solvent is component 1 and water is component 2. The τ's were determined from available VLE data.
(c) NRTL binary τ parameters for various solvent-water systems. The NRTL nonrandomness factor parameter, α, is fixed at a constant value of 0.2. In these binary systems, the solvent is component 1 and water is component 2. The τ's were determined from available LLE data.

Interestingly, we see similar repulsive, but weaker, interactions between the polar solvent (1) and hexane (2), a representative hydrophobic solvent. Both τ12 and τ21 are small but positive values for the solvent-hexane binaries. On the other hand, the interactions between hydrophobic solvents and hexane are weak, and the corresponding NRTL binary parameters are around or less than unity, characteristic of nearly ideal solutions.

The interactions between polar solvents (1) and water (2) are more subtle. While all τ21 are positive, τ12 can be positive or negative. Apparently, different polar molecules exhibit different interactions, some repulsive and others attractive, with hydrophilic molecules. For example, acetonitrile and acetone are hydrogen bond acceptors, and they form hydrogen bonds with water. Both τ12 and τ21 are positive for the acetone-water and acetonitrile-water binaries. In contrast, dimethyl sulfoxide is a compound with excellent solvation capacity and a high dielectric constant (48.75 at 25 °C); τ12 is negative and τ21 is positive for the dimethyl sulfoxide-water binary.

Hexane and water are the obvious choices as the reference molecules for the hydrophobic and hydrophilic segments, respectively. The selection of the reference molecule for the polar segment requires attention to the wide variations of interactions between polar molecules and water. Ultimately, we choose acetonitrile as a representative of polar molecules, and we introduce a mechanism to tune the way we characterize the polar segment.

The chosen values for the NRTL binary interaction parameters, α and τ, for the three conceptual segments are summarized in Table 2. As mentioned earlier, we ignore the temperature dependency of the binary parameters. The binary parameters for the hydrophobic segment X (1)-hydrophilic segment Z (2) are determined from available liquid-liquid equilibrium (LLE) data of the hexane-water binary mixture (see Table 1). We fix α at 0.2 because it is the customary value of α for systems that exhibit liquid-liquid separation. Here both τ12 and τ21 are large positive values (6.547 and 10.950). They highlight the strong repulsive nature of the interactions between the hydrophobic and hydrophilic segments. The binary parameters for the hydrophobic segment X (1)-polar segment Y (2) are determined from available LLE data of the hexane-acetonitrile binary mixture (see Table 1). Again, we fix α at 0.2. Both τ12 and τ21 are small positive values (1.643 and 1.834). They highlight the weak repulsive nature of the interactions between the hydrophobic and polar segments.

The binary parameters for the polar segment Y (1)-hydrophilic segment Z (2) are determined from available vapor-liquid equilibrium (VLE) data of the acetonitrile-water binary mixture (see Table 1). We fix α at 0.3 for the hydrophilic segment-polar segment pair because this binary does not exhibit liquid-liquid separation. We fix τ21 at a positive value (1.787), and we allow τ12 to vary between -2 and +2 to reflect the fact that the interaction between the polar molecule and water can be negative or positive, as shown in Table 1. In practice, this is achieved by allowing for two types of polar segments, Y- and Y+. For the Y- polar segment, the values of τ12 and τ21 are -2 and +1.787, respectively. For the Y+ polar segment, they are 2 and 1.787, respectively. Note that both the Y- and Y+ polar segments exhibit the same repulsive interactions with hydrophobic segments as those discussed in the previous paragraph. Also, an ideal solution is assumed for the Y- polar segment and Y+ polar segment binary, i.e., τ12 = τ21 = 0.

We understand that the treatment above is somewhat arbitrary and reflects only our own limited molecular insights at this time. However, the treatment is designed to capture the general trends of the NRTL binary parameters that we have observed for systems with hydrophobic, polar, and hydrophilic molecules. Further investigation may bring improved treatments.

Molecular Parameters for Solvents. The application of NRTL-SAC requires an extensive databank of molecular parameters for common solvents used in the industry. As mentioned earlier, we focus on the common solvents used in the pharmaceutical industry.12 For each solvent, there can be up to four molecular parameters, i.e., X, Y-, Y+, and Z. Because these molecular parameters represent certain pairwise surface interaction characteristics, often only one or two molecular parameters are needed for most solvents. For example, alkanes are hydrophobic and are well represented with hydrophobicity, X, alone. Alcohols are hybrids of hydrophobic and hydrophilic segments and are primarily represented with X and Z. Ketones, esters, and ethers are polar molecules with varying degrees of hydrophobic content. They are well represented by X and the Y's.

Determination of solvent molecular parameters involves regression of available experimental VLE or LLE data for binary systems of the solvent and the above-mentioned reference molecules (i.e., hexane, acetonitrile, and water) or their substitutes. Solvent molecular parameters X, Y-, Y+, and Z are the adjustable parameters in the regression. If binary data are lacking for the solvent with the reference molecules, data for other binaries may be used as long as the molecular parameters for the substitute reference molecules are already identified.

Table 3 lists the molecular parameters identified for the 62 solvents. We used the VLE or LLE data taken at or around room temperature and available in the DECHEMA database. Among the ICH solvents, because of the lack of sufficient experimental binary phase equilibrium data, we are less comfortable with the molecular parameters identified for anisole, cumene, 1,2-dichloroethylene, 1,2-dimethoxyethane, N,N-dimethylacetamide, dimethyl sulfoxide, ethyl formate, isobutyl acetate, isopropyl acetate, methyl butyl ketone, tetralin, and trichloroethylene. In fact, we were not able to locate any public data for methyl butyl ketone (2-hexanone) and, therefore, its molecular parameters were set to be the same as those for methyl isobutyl ketone.

The NRTL-SAC model with the molecular parametersdoes qualitatively capture the interaction characteristicsof the solvent mixtures and the resulting phase equi-librium behavior. As an example, Figures 1-3 show thebinary phase diagrams for the water-1,4-dioxane-octanol system. We compare the predictions from theNRTL model with the binary parameters in Table 1 tothe predictions from the NRTL-SAC model with themolecular parameters in Table 3. The predictions withthe NRTL-SAC model are broadly consistent with the

Table 2. NRTL Binary Parameters for ConceptualSegments in NRTL-SAC

segment 1 X X Y- Y+ Xsegment 2 Y- Z Z Z Y+τ12 1.643 6.547 -2.000 2.000 1.643τ21 1.834 10.949 1.787 1.787 1.834R12 ) R21 0.2 0.2 0.3 0.3 0.2


calculations from the NRTL model that are generally understood to represent experimental data within engineering accuracy.

Table 3. NRTL-SAC Molecular Parameters for Common Solvents

| solvent name | X, Y-, Y+, Z (entries present, in column order) |
|---|---|
| acetic acid | 0.045 0.164 0.157 0.217 |
| acetone | 0.131 0.109 0.513 |
| acetonitrile | 0.018 0.131 0.883 |
| anisole | 0.722 |
| benzene | 0.607 0.190 |
| 1-butanol | 0.414 0.007 0.485 |
| 2-butanol | 0.335 0.082 0.355 |
| n-butyl acetate | 0.317 0.030 0.330 |
| methyl tert-butyl ether | 1.040 0.219 0.172 |
| carbon tetrachloride | 0.718 0.141 |
| chlorobenzene | 0.710 0.424 |
| chloroform | 0.278 0.039 |
| cumene | 1.208 0.541 |
| cyclohexane | 0.892 |
| 1,2-dichloroethane | 0.394 0.691 |
| 1,1-dichloroethylene | 0.529 0.208 |
| 1,2-dichloroethylene | 0.188 0.832 |
| dichloromethane | 0.321 1.262 |
| 1,2-dimethoxyethane | 0.081 0.194 0.858 |
| N,N-dimethylacetamide | 0.067 0.030 0.157 |
| N,N-dimethylformamide | 0.073 0.564 0.372 |
| dimethyl sulfoxide | 0.532 2.890 |
| 1,4-dioxane | 0.154 0.086 0.401 |
| ethanol | 0.256 0.081 0.507 |
| 2-ethoxyethanol | 0.071 0.318 0.237 |
| ethyl acetate | 0.322 0.049 0.421 |
| ethylene glycol | 0.141 0.338 |
| diethyl ether | 0.448 0.041 0.165 |
| ethyl formate | 0.257 0.280 |
| formamide | 0.089 0.341 0.252 |
| formic acid | 0.707 2.470 |
| n-heptane | 1.340 |
| n-hexane | 1.000 |
| isobutyl acetate | 1.660 0.108 |
| isopropyl acetate | 0.552 0.154 0.498 |
| methanol | 0.088 0.149 0.027 0.562 |
| 2-methoxyethanol | 0.052 0.043 0.251 0.560 |
| methyl acetate | 0.236 0.337 |
| 3-methyl-1-butanol | 0.419 0.538 0.314 |
| methyl butyl ketone | 0.673 0.224 0.469 |
| methylcyclohexane | 1.162 0.251 |
| methyl ethyl ketone | 0.247 0.036 0.480 |
| methyl isobutyl ketone | 0.673 0.224 0.469 |
| isobutyl alcohol | 0.566 0.067 0.485 |
| N-methyl-2-pyrrolidone | 0.197 0.322 0.305 |
| nitromethane | 0.025 1.216 |
| n-pentane | 0.898 |
| 1-pentanol | 0.474 0.223 0.426 0.248 |
| 1-propanol | 0.375 0.030 0.511 |
| isopropyl alcohol | 0.351 0.070 0.003 0.353 |
| n-propyl acetate | 0.514 0.134 0.587 |
| pyridine | 0.205 0.135 0.174 |
| sulfolane | 0.210 0.457 |
| tetrahydrofuran | 0.235 0.040 0.320 |
| 1,2,3,4-tetrahydronaphthalene | 0.443 0.555 |
| toluene | 0.604 0.304 |
| 1,1,1-trichloroethane | 0.548 0.287 |
| trichloroethylene | 0.426 0.285 |
| m-xylene | 0.758 0.021 0.316 |
| water | 1.000 |
| triethylamine | 0.557 0.105 |
| 1-octanol | 0.766 0.032 0.624 0.335 |

Figure 1. Txy phase diagram for a water-1,4-dioxane mixture at atmospheric pressure.

Figure 2. Txxy phase diagram for a water-octanol mixture at atmospheric pressure.

Figure 3. Txy phase diagram for an octanol-1,4-dioxane mixture at atmospheric pressure.

Model Applications

To test the usability of NRTL-SAC with solubility modeling of pharmaceuticals, we apply the model to aspirin with the room-temperature solubility data compiled by Frank et al.² Of the 23 solvents in Frank et al.’s compilation, we focus on the 14 solvents for which we have molecular parameters available in Table 3. We first fit the aspirin solubility data for all 14 solvents. The regression results are shown in Table 4 and Figure 4. The regressed molecular parameters for aspirin are given in Table 5. With NRTL-SAC, the root-mean-square (rms) error in ln x, $\left[\sum_{i}^{N}\left(\ln x_i^{\mathrm{exp}} - \ln x_i^{\mathrm{cal}}\right)^2/N\right]^{1/2}$, for the fit is 0.506 (here x is the solubility of the solute, i.e., mole fraction, and N is the number of data used in the correlations). Acetic acid, a strong proton donor, is the outlier in this case. With acetic acid removed, the rms error in ln x for the fit drops significantly to 0.362. While there is room for further optimization of NRTL-SAC including both molecular descriptors and parameters, the results are considered to be very satisfactory.
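The rms error in ln x quoted above can be checked directly from the literature and NRTL-SAC (14-solvent fit) columns of Table 4. The short Python sketch below is illustrative only. The function name `rms_ln_error` is hypothetical, and the second number assumes that acetic acid is simply dropped from the sum without refitting the parameters, which the text does not state explicitly.

```python
import math

# (solvent, literature x, NRTL-SAC x from the 14-solvent fit); mole fractions, Table 4
TABLE4 = [
    ("methanol", 8.053e-2, 7.950e-2),
    ("acetone", 1.163e-1, 1.084e-1),
    ("ethanol", 6.007e-2, 3.907e-2),
    ("1,4-dioxane", 1.029e-1, 1.130e-1),
    ("acetic acid", 4.347e-2, 1.709e-1),
    ("methyl ethyl ketone", 5.174e-2, 4.838e-2),
    ("2-propanol", 5.924e-2, 3.257e-2),
    ("isoamyl alcohol", 5.155e-2, 4.552e-2),
    ("chloroform", 4.057e-2, 4.547e-2),
    ("diethyl ether", 2.119e-2, 1.127e-2),
    ("n-octanol", 2.186e-2, 2.491e-2),
    ("1,2-dichloroethane", 1.670e-2, 1.352e-2),
    ("1,1,1-trichloroethane", 3.706e-3, 2.743e-3),
    ("cyclohexane", 2.335e-5, 4.962e-5),
]


def rms_ln_error(pairs):
    """Root-mean-square of (ln x_exp - ln x_cal) over (x_exp, x_cal) pairs."""
    total = sum((math.log(x_exp) - math.log(x_cal)) ** 2 for x_exp, x_cal in pairs)
    return math.sqrt(total / len(pairs))


rms_all = rms_ln_error([(e, c) for _, e, c in TABLE4])
# Assumption: the outlier is removed from the statistic without refitting.
rms_no_acid = rms_ln_error([(e, c) for name, e, c in TABLE4 if name != "acetic acid"])

print(f"rms error in ln x, all 14 solvents:     {rms_all:.3f}")
print(f"rms error in ln x, acetic acid removed: {rms_no_acid:.3f}")
```

Because the error is measured in ln x, a solvent in which aspirin is nearly insoluble (cyclohexane) contributes on the same footing as one in which it is highly soluble (acetone).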

To test the predictive capability of NRTL-SAC, we also fit the aspirin solubility data using only four representative solvents (i.e., acetone for the polar solvent, cyclohexane and chloroform for the hydrophobic solvents, and methanol for the hydrophilic solvent) and then use the identified molecular parameters to estimate the aspirin solubilities in the other 10 solvents. As shown in Table 5, the molecular parameters for aspirin only change slightly. Likewise, the rms error in ln x for all 14 solvents only increases slightly from 0.506 to 0.533. The comparison of experimental data vs computed solubilities is given in Figure 5, which shows a quality of fit similar to that shown in Figure 4. In other words, these molecular parameters are found to be relatively independent of the number of solvents used as long as proper representative solvents (hydrophobic, hydrophilic, and polar) are included. This study with aspirin and other similar studies suggest that the

Table 4. Solubility of Aspirin at Room Temperatureᵃ

| solvent | literature data (wt %) | literature data (mole fraction) | NRTL-SAC (mole fraction) | NRTL-SAC, four solventsᵇ (mole fraction) | UNIFAC (mole fraction) | Hansen (mole fraction) |
|---|---|---|---|---|---|---|
| methanol | 33 | 8.053 × 10⁻² | 7.950 × 10⁻² | 8.053 × 10⁻² | 7.722 × 10⁻² | 4.256 × 10⁻² |
| acetone | 29 | 1.163 × 10⁻¹ | 1.084 × 10⁻¹ | 1.163 × 10⁻¹ | 8.782 × 10⁻² | 7.892 × 10⁻² |
| ethanol | 20 | 6.007 × 10⁻² | 3.907 × 10⁻² | 3.208 × 10⁻² | 1.606 × 10⁻² | 4.643 × 10⁻² |
| 1,4-dioxane | 19 | 1.029 × 10⁻¹ | 1.130 × 10⁻¹ | 1.204 × 10⁻¹ | 5.699 × 10⁻² | 1.997 × 10⁻² |
| acetic acid | 12 | 4.347 × 10⁻² | 1.709 × 10⁻¹ | 1.670 × 10⁻¹ | 9.522 × 10⁻² | 9.053 × 10⁻² |
| methyl ethyl ketone | 12 | 5.174 × 10⁻² | 4.838 × 10⁻² | 5.016 × 10⁻² | 6.596 × 10⁻² | 5.642 × 10⁻² |
| 2-propanol | 10 | 5.924 × 10⁻² | 3.257 × 10⁻² | 2.903 × 10⁻² | 2.897 × 10⁻² | 7.174 × 10⁻² |
| isoamyl alcohol | 10 | 5.155 × 10⁻² | 4.552 × 10⁻² | 4.195 × 10⁻² | 1.490 × 10⁻² | 5.155 × 10⁻² |
| chloroform | 6 | 4.057 × 10⁻² | 4.547 × 10⁻² | 4.057 × 10⁻² | 9.735 × 10⁻² | 3.369 × 10⁻² |
| diethyl ether | 5 | 2.119 × 10⁻² | 1.127 × 10⁻² | 9.081 × 10⁻³ | 1.685 × 10⁻² | 2.558 × 10⁻² |
| n-octanol | 3 | 2.186 × 10⁻² | 2.491 × 10⁻² | 2.015 × 10⁻² | 1.453 × 10⁻² | 3.664 × 10⁻² |
| 1,2-dichloroethane | 3 | 1.670 × 10⁻² | 1.352 × 10⁻² | 1.232 × 10⁻² | 3.969 × 10⁻² | 2.809 × 10⁻² |
| 1,1,1-trichloroethane | 0.5 | 3.706 × 10⁻³ | 2.743 × 10⁻³ | 2.001 × 10⁻³ | 3.750 × 10⁻² | 2.238 × 10⁻² |
| cyclohexane | 0.005 | 2.335 × 10⁻⁵ | 4.962 × 10⁻⁵ | 2.335 × 10⁻⁵ | 9.351 × 10⁻⁴ | 4.695 × 10⁻³ |

ᵃ Literature data, UNIFAC prediction results, and Hansen correlation results are taken from Frank et al.² ᵇ The four representative solvents are acetone, cyclohexane, methanol, and chloroform.
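The wt % and mole fraction columns of Table 4 are related by a simple mass balance. The Python sketch below illustrates the conversion under two assumptions: wt % is taken as grams of aspirin per 100 g of saturated solution, and the solvent molar masses are standard handbook values that are not quoted in this paper. The function name is hypothetical.

```python
MW_ASPIRIN = 180.16  # g/mol, as listed in Table 5


def weight_percent_to_mole_fraction(wt_percent, mw_solute, mw_solvent):
    """Convert solute wt % (g per 100 g solution) to solute mole fraction."""
    n_solute = wt_percent / mw_solute
    n_solvent = (100.0 - wt_percent) / mw_solvent
    return n_solute / (n_solute + n_solvent)


# Solvent molar masses (g/mol) are standard values, assumed here.
x_methanol = weight_percent_to_mole_fraction(33.0, MW_ASPIRIN, 32.04)
x_acetone = weight_percent_to_mole_fraction(29.0, MW_ASPIRIN, 58.08)

print(f"methanol: x = {x_methanol:.4f}")  # Table 4 lists 8.053e-2
print(f"acetone:  x = {x_acetone:.4f}")   # Table 4 lists 1.163e-1
```

The two results agree with the tabulated mole fractions to about three significant figures, which supports the assumed reading of the wt % column.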

Table 5. NRTL-SAC Molecular Parameters for Solutes

| solute | MW | no. of solvents | T (K) | X, Y-, Y+, Z (entries present, in column order) | ln Ksp | rms error in ln x |
|---|---|---|---|---|---|---|
| aspirin | 180.16 | 14 | 298.15 | 0.103 1.160 0.777 | -2.630 | 0.506 |
| aspirin | 180.16 | 4 | 298.15 | 0.039 1.372 0.799 | -2.582 | 0.533ᵉ |
| p-aminobenzoic acid | 137.14 | 7 | 298.15 | 0.218 0.681 1.935 0.760 | -2.861 | 0.284 |
| benzoic acid | 122.12 | 7 | 298.15 | 0.524 0.089 0.450 0.405 | -1.540 | 0.160 |
| camphor | 152.23 | 7 | 298.15 | 0.604 0.124 0.478 | -0.593 | 0.092 |
| ephedrine | 165.23 | 7 | 298.15 | 0.458 0.068 0.193 | -0.296 | 0.067 |
| lidocaine | 234.33 | 7 | 298.15 | 0.698 0.596 0.293 0.172 | -0.978 | 0.027 |
| methylparaben | 152.14 | 7 | 298.15 | 0.479 0.484 1.218 0.683 | -2.103 | 0.120 |
| testosterone | 288.41 | 7 | 298.15 | 1.051 0.771 0.233 0.669 | -3.797 | 0.334 |
| theophylline | 180.18 | 7 | 298.15 | 0.757 1.208 0.341 | -6.110 | 0.661 |
| estriol | 288.38 | 9ᵃ | 298.15 | 0.853 0.291 1.928 | -7.652 | 0.608 |
| estrone | 270.37 | 12 | 298.15 | 0.499 0.679 1.521 0.196 | -6.531 | 0.519 |
| morphine | 285.34 | 6 | 308.15 | 0.773 1.811 | -4.658 | 1.007 |
| piroxicam | 331.35 | 14ᵇ | 298.15 | 0.665 1.803 0.169 | -7.656 | 0.665 |
| hydrocortisone | 362.46 | 11ᶜ | 298.15 | 0.401 0.970 1.248 0.611 | -6.697 | 0.334 |
| haloperidol | 375.86 | 13ᵈ | 298.15 | 0.827 0.131 | -4.398 | 0.311 |

ᵃ With THF excluded. ᵇ With 1,2-dichloroethane, chloroform, diethyl ether, and DMF excluded. ᶜ With hexane excluded. ᵈ With chloroform and DMF excluded. ᵉ 14 solvents.
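The ln Ksp column of Table 5 can be combined with the computed solubilities of Table 4 to back out the activity coefficient that the model implies in each saturated solution. The sketch below assumes the standard solid-liquid equilibrium relation ln x_sat = ln Ksp - ln γ_sat, which is not restated in this section. The function name `implied_activity_coefficient` is hypothetical.

```python
import math

LN_KSP_ASPIRIN = -2.630  # 14-solvent fit, Table 5

# NRTL-SAC computed solubilities (mole fraction, 14-solvent fit), Table 4
X_CALC = {
    "methanol": 7.950e-2,
    "acetone": 1.084e-1,
    "cyclohexane": 4.962e-5,
}


def implied_activity_coefficient(ln_ksp, x_sat):
    """gamma_sat from the assumed relation ln x_sat = ln Ksp - ln gamma_sat."""
    return math.exp(ln_ksp) / x_sat


# With gamma = 1 the relation reduces to the ideal solubility x = Ksp.
x_ideal = math.exp(LN_KSP_ASPIRIN)
gammas = {s: implied_activity_coefficient(LN_KSP_ASPIRIN, x) for s, x in X_CALC.items()}

print(f"ideal solubility: {x_ideal:.4f}")
for solvent, gamma in gammas.items():
    print(f"{solvent}: gamma_sat = {gamma:.3g}")
```

Under this reading, aspirin is close to ideal in methanol, shows negative deviation in acetone (γ below 1), and shows strong positive deviation in cyclohexane (γ above 1000), which is what spreads the solubilities over several orders of magnitude.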

Figure 4. NRTL-SAC results for aspirin solubility at 298.15 K. Solubility data² for all 14 solvents are fit simultaneously with NRTL-SAC.

Figure 5. NRTL-SAC results for aspirin solubility at 298.15 K. Solubility data² for 4 solvents are fit with NRTL-SAC, while the other 10 are predicted.


NRTL-SAC molecular descriptors are good representations of molecular surface interaction characteristics and that the solvent molecules used to identify molecular parameters for the solute can be thought of as molecular “sensors” used to elucidate the surface interaction characteristics of the solute molecule in solution. These molecular “sensors” probe and express the solute-solvent interactions in terms of binary phase equilibrium data, i.e., solubility.

Note that, during the data regression, all experimental solubility data, regardless of their order of magnitude, were assigned a standard deviation of 20%.
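One way to read this weighting is sketched below. It is an interpretation, not the authors' stated objective function: it assumes a least-squares objective Q (a hypothetical symbol) in which each residual is scaled by its standard deviation, taken as 20% of the measured value. Under that assumption, a constant relative standard deviation makes the fit nearly equivalent to an unweighted fit in ln x, so that solubilities of order 10⁻¹ and 10⁻⁵ carry equal weight.

```latex
Q = \sum_{i=1}^{N} \left( \frac{x_i^{\mathrm{cal}} - x_i^{\mathrm{exp}}}{\sigma_i} \right)^{2},
\qquad \sigma_i = 0.20\, x_i^{\mathrm{exp}}
\quad\Longrightarrow\quad
Q \approx \frac{1}{0.20^{2}} \sum_{i=1}^{N}
\left( \ln x_i^{\mathrm{cal}} - \ln x_i^{\mathrm{exp}} \right)^{2}
\quad \text{for small relative deviations.}
```

The approximation uses (x_cal - x_exp)/x_exp ≈ ln(x_cal/x_exp), which holds when the computed and measured values are close.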

In addition to the experimental data and the NRTL-SAC results for aspirin at room temperature, Table 4 also includes the UNIFAC prediction results and the Hansen correlation results reported by Frank et al.² To be sure of the UNIFAC predictions and the Hansen correlations, we duplicated Frank’s results. With UNIFAC and Hansen, the rms errors in ln x for the 14 solvents are 1.352 and 1.600, respectively. Figures 6 and 7 show the comparisons of experimental data and computed solubilities with UNIFAC and Hansen. The outliers could be attributed to either “poor” experimental data or “poor” model representations. Given that the NRTL-SAC results are clearly superior to those of UNIFAC and Hansen, the results illustrate the relative inability of UNIFAC and Hansen to capture solvent effects on the solubility of aspirin.
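To see where the UNIFAC and Hansen errors come from, the per-solvent deviations in ln x can be computed from Table 4. The following Python sketch is illustrative only, and the helper name `summarize` is hypothetical. It reports the rms error and the solvent with the largest absolute deviation for each method.

```python
import math

# (solvent, literature x, UNIFAC x, Hansen x); mole fractions from Table 4
TABLE4 = [
    ("methanol", 8.053e-2, 7.722e-2, 4.256e-2),
    ("acetone", 1.163e-1, 8.782e-2, 7.892e-2),
    ("ethanol", 6.007e-2, 1.606e-2, 4.643e-2),
    ("1,4-dioxane", 1.029e-1, 5.699e-2, 1.997e-2),
    ("acetic acid", 4.347e-2, 9.522e-2, 9.053e-2),
    ("methyl ethyl ketone", 5.174e-2, 6.596e-2, 5.642e-2),
    ("2-propanol", 5.924e-2, 2.897e-2, 7.174e-2),
    ("isoamyl alcohol", 5.155e-2, 1.490e-2, 5.155e-2),
    ("chloroform", 4.057e-2, 9.735e-2, 3.369e-2),
    ("diethyl ether", 2.119e-2, 1.685e-2, 2.558e-2),
    ("n-octanol", 2.186e-2, 1.453e-2, 3.664e-2),
    ("1,2-dichloroethane", 1.670e-2, 3.969e-2, 2.809e-2),
    ("1,1,1-trichloroethane", 3.706e-3, 3.750e-2, 2.238e-2),
    ("cyclohexane", 2.335e-5, 9.351e-4, 4.695e-3),
]


def summarize(column):
    """Return (rms error in ln x, worst solvent) for model column 2 or 3."""
    dev = {row[0]: math.log(row[1]) - math.log(row[column]) for row in TABLE4}
    rms = math.sqrt(sum(d * d for d in dev.values()) / len(dev))
    worst = max(dev, key=lambda name: abs(dev[name]))
    return rms, worst


rms_unifac, worst_unifac = summarize(2)
rms_hansen, worst_hansen = summarize(3)
print(f"UNIFAC: rms = {rms_unifac:.3f}, largest deviation in {worst_unifac}")
print(f"Hansen: rms = {rms_hansen:.3f}, largest deviation in {worst_hansen}")
```

For both methods the largest deviation occurs in the solvent with the lowest aspirin solubility, where the computed values exceed the literature value by one to two orders of magnitude.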

The data compilation of Marrero and Abildskov¹³ provides a good source of public solubility data for large, complex chemicals. To further test NRTL-SAC, we first extract and test the solubility data for the eight molecules reported by Lin and Nash.¹⁴ We also test the model against six additional molecules with sizable solubility data sets.

We apply the model with the solvents that are included in Table 3. The molecular parameters determined for the solutes and the rms errors in ln x for the fits are summarized in Table 5. A good representation of the solubility data is obtained. The average rms error in ln x for the 14 solutes (aspirin excluded) summarized in Table 5 is 0.37. It corresponds to ±45% accuracy in solubility predictions. Certainly, the quality of the fit reflects both the effectiveness of NRTL-SAC and the quality of the molecular parameters identified from the limited available experimental data for the solvents.
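The average quoted above, and its translation into a percentage, can be reproduced from the last column of Table 5. The Python sketch below is illustrative, and the variable names are hypothetical. It converts the average error in ln x to a multiplicative factor exp(±0.37) on x.

```python
import math

# rms error in ln x for the 14 solutes of Table 5 (both aspirin rows excluded)
RMS_LN_X = {
    "p-aminobenzoic acid": 0.284, "benzoic acid": 0.160, "camphor": 0.092,
    "ephedrine": 0.067, "lidocaine": 0.027, "methylparaben": 0.120,
    "testosterone": 0.334, "theophylline": 0.661, "estriol": 0.608,
    "estrone": 0.519, "morphine": 1.007, "piroxicam": 0.665,
    "hydrocortisone": 0.334, "haloperidol": 0.311,
}

average = sum(RMS_LN_X.values()) / len(RMS_LN_X)

# An error of +/- average in ln x is a factor exp(+/- average) in x itself.
overprediction = math.exp(average) - 1.0         # relative error on the high side
underprediction = 1.0 - math.exp(-average)       # relative error on the low side

print(f"average rms error in ln x: {average:.2f}")
print(f"high side: +{100 * overprediction:.0f}%")
print(f"low side:  -{100 * underprediction:.0f}%")
```

The quoted 45% is the high-side value. Because the error is symmetric in ln x rather than in x, the corresponding low-side error is smaller, about 31%.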

Conclusions

The NRTL-SAC model is a practical thermodynamic framework for solubility modeling in pharmaceutical process design. The model requires only component-specific molecular parameters that represent the surface interaction characteristics of the molecules. For solute molecules, these parameters are identified from solubility measurements of the solute in a few representative solvents, i.e., hydrophobic, hydrophilic, and polar solvents. The model is a useful tool for qualitative correlation and prediction of phase behavior, i.e., solubility, of systems with large, complex pharmaceutical solutes in common solvents.

Acknowledgment

The authors are grateful to Hsien-Hsin Tung, Daniel E. Bakken, Christopher Rentsch, and Jose E. Tabora of Merck for their critical evaluation of NRTL-SAC, UNIFAC, and Hansen models for solubility modeling of Merck compounds in solvents and solvent mixtures. We also thank Prof. John Prausnitz for his warm encouragement and insightful critiques on the manuscript.

Figure 6. UNIFAC results for aspirin solubility at 298.15 K. Solubility data² for all 14 solvents are predicted with UNIFAC.

Figure 7. Hansen correlation results for aspirin solubility at 298.15 K. Solubility data² for all 14 solvents are fit simultaneously with Hansen.

Literature Cited

(1) Gupta, A.; Gupta, S.; Groves, F. R., Jr.; McLaughlin, E. Correlation of Solid-Liquid and Vapor-Liquid Equilibrium Data for Polynuclear Aromatic Compounds. Fluid Phase Equilib. 1991, 64, 201.

(2) Frank, T. C.; Downey, J. R.; Gupta, S. K. Quickly Screen Solvents for Organic Solids. Chem. Eng. Prog. 1999, Dec, 41.

(3) Kolar, P.; Shen, J.-W.; Tsuboi, A.; Ishikawa, T. Solvent Selection for Pharmaceuticals. Fluid Phase Equilib. 2002, 194-197, 771.

(4) Hansen, C. M. Hansen Solubility Parameters: A User’s Handbook; CRC Press: Boca Raton, FL, 2000.

(5) Fredenslund, A.; Jones, R. L.; Prausnitz, J. M. Group-Contribution Estimation of Activity Coefficients in Nonideal Liquid Mixtures. AIChE J. 1975, 21, 1086.

(6) Acree, W. E., Jr.; Abraham, M. H. Solubility Predictions for Crystalline Nonelectrolyte Solutes Dissolved in Organic Solvents Based upon the Abraham General Solvation Model. Can. J. Chem. 2001, 79, 1466.

(7) Klamt, A.; Eckert, F. COSMO-RS: a Novel and Efficient Method for the a Priori Prediction of Thermophysical Data of Liquids. Fluid Phase Equilib. 2000, 172, 43.

(8) Lin, S.-T.; Sandler, S. I. A Priori Phase Equilibrium Prediction from a Segment Contribution Solvation Model. Ind. Eng. Chem. Res. 2002, 41, 899.

(9) Chen, C.-C. A Segment-Based Local Composition Model for the Gibbs Energy of Polymer Solutions. Fluid Phase Equilib. 1993, 83, 301.

(10) Renon, H.; Prausnitz, J. M. Local Compositions in Thermodynamic Excess Functions for Liquid Mixtures. AIChE J. 1968, 14, 135.

(11) Chen, C.-C.; Song, Y. Generalized Electrolyte NRTL Model for Mixed-Solvent Electrolyte Systems. AIChE J. 2004, 50, 1928.

(12) ICH Steering Committee, ICH Harmonised Tripartite Guideline, Impurities: Guideline for Residual Solvents, Q3C. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, 1997 (http://www.ich.org).

(13) Marrero, J.; Abildskov, J. Solubility and Related Properties of Large Complex Chemicals, Part 1: Organic Solutes Ranging from C4 to C40. Chemistry Data Series XV; DECHEMA: Frankfurt/Main, Germany, 2003.

(14) Lin, H.-M.; Nash, R. A. An Experimental Method for Determining the Hildebrand Solubility Parameter of Organic Electrolytes. J. Pharm. Sci. 1993, 82, 1018.

Received for review June 18, 2004

Revised manuscript received September 29, 2004

Accepted October 18, 2004

IE049463U

8362 Ind. Eng. Chem. Res., Vol. 43, No. 26, 2004