Pivotal Greenplum Text · GPText runs on Red Hat Enterprise Linux or CentOS 5.x, 6.x, or 7.x....
Pivotal Greenplum® Text
Version 2.4.0
User Guide
Rev: 01
© 2018 Pivotal Software, Inc.
Table of Contents
Pivotal® Greenplum® Text 2.4.0 Documentation
Pivotal® GPText 2.4.0 Release Notes
Installing GPText
Upgrading GPText
Introduction to Pivotal GPText
Administering GPText
GPText High Availability
GPText Best Practices
Troubleshooting Hadoop Connection Problems
Using Pivotal GPText
Working With GPText Indexes
Querying GPText Indexes
Customizing GPText Indexes
Working With GPText External Indexes
GPText Function Reference
GPText Management Utilities
GPText and Solr Data Type Mappings
GPText Schema Tables
GPText Configuration Parameters
© Copyright Pivotal Software, Inc, 2013-2018
Pivotal® Greenplum® Text 2.4.0 Documentation
GPText Documentation PDF
Pivotal GPText 2.4.0 Release Notes
Installing Pivotal GPText
Upgrading Pivotal GPText
Using Pivotal GPText
GPText References

Additional Resources
Pivotal Greenplum Database
Apache Solr Web Site
Apache MADlib
Pivotal® GPText 2.4.0 Release Notes
This document contains release information for Pivotal GPText 2.4.0.
Released: June 2018

About Pivotal GPText
Pivotal GPText joins the Greenplum Database massively parallel-processing database server with Apache SolrCloud enterprise search and the Apache MADlib Analytics Library to provide large-scale analytics processing and business decision support. GPText includes free text search as well as support for text analysis.
GPText includes the following features:
The GPText database schema provides in-database access to Apache Solr indexing and searching
Build indexes with database data or external documents and search with the GPText API
Custom tokenizers for international text and social media text
A Universal Query Processor that accepts queries with mixed syntax from supported Solr query processors
Faceted search results
Term highlighting in results
Greater emphasis on high availability
The GPText management utility suite includes command-line utilities to perform the following tasks:
Start, stop, and monitor ZooKeeper and GPText nodes
Configure GPText nodes and indexes
Add and delete replicas for index shards
Back up and restore GPText indexes
Recover a GPText node
Expand the GPText cluster by adding GPText nodes

Prerequisites
Installing GPText also installs Apache SolrCloud and, optionally, Apache ZooKeeper.
Following are GPText installation prerequisites.
GPText runs on Red Hat Enterprise Linux 5.x, 6.x, and 7.x.
Install and configure your Greenplum Database system, version 4.3.6 or higher. See the Pivotal Greenplum Database Installation Guide at https://gpdb.docs.pivotal.io.
Install Java JRE 1.8.x and add the bin directory to the PATH on all hosts in the cluster. GPText is tested with Oracle Java 1.8 and OpenJDK 1.8.
Ensure that nc (netcat) is installed on all Greenplum cluster hosts (sudo yum install nc).
Installing lsof on all cluster hosts is recommended (sudo yum install lsof).
GPText cannot be installed onto a shared NFS mount.
GPText nodes can be installed on the Greenplum Database cluster hosts alongside the Greenplum segments or on additional, non-database hosts accessible on the Greenplum cluster network. All hosts participating in the GPText system must have the same operating system and configuration and have passwordless ssh access for the gpadmin user. See the Pivotal Greenplum Database Installation Guide for instructions to configure hosts.
If you plan to place GPText nodes on the Greenplum Database segment hosts, ensure that you reserve memory for GPText use when you configure Greenplum Database. To determine the memory to set aside for GPText, multiply the number of GPText nodes to create on each Greenplum segment host by the JVM maximum size. Subtract this memory from the physical RAM when calculating the value for the Greenplum Database gp_vmem_protect_limit server configuration parameter. See the Greenplum Database server configuration parameter gp_vmem_protect_limit in the Greenplum Database Reference Guide for recommended memory calculation formulas or visit the GPDB Virtual Memory Calculator website.
Apache Solr requires a ZooKeeper cluster with at least three nodes (five nodes recommended). You can install a "binding" ZooKeeper cluster with GPText on the Greenplum cluster hosts, or you can use an existing ZooKeeper cluster. When deployed alongside Greenplum Database segments, ZooKeeper performance can be affected under heavy database load. For best performance, install a ZooKeeper cluster on separate hosts with network connectivity to the Greenplum network.

New Features and Enhancements in GPText 2.4.0
GPText 2.4.0 allows adding documents stored in an authenticated FTP server to a GPText external index. This enhancement includes changes to add support for the ftp external document source type to the following GPText utility and functions:
gptext.external upload command-line utility
gptext.external_login() function
gptext.external_logout() function
gptext.index_external() function
gptext.index_external_dir() function

Known Issues
Following are known issues in GPText. Workarounds are provided when available.

Wildcards in GPText Search Options
Solr does not return all fields when the fl Solr search option contains a wildcard that matches field names. For example, given a table with columns contenta and contentb, specifying fl=contenta,contentb,sum(1,1) correctly returns three fields. Specifying fl=cont*,sum(1,1) correctly returns contenta and contentb, but omits the pseudo-field sum(1,1).
Specifying a wildcard to match all fields (fl=*,sum(1,1)) also omits the pseudo-field.

Index Load Failure After Configuration File Error
If Solr fails to load an index because of a configuration file error, and then the index is dropped without first correcting the configuration file error, the index cannot be recreated until GPText is restarted. This can happen if you edit managed-schema or solrconfig.xml and introduce an XML syntax error or a typo in configuration values.
Workaround:
1. When an index fails to load, check the Solr log to find the cause.
2. If the cause is a configuration file error, such as invalid XML, use the gptext-config utility to edit the file and fix the error. Dropping the index without first correcting the error is not recommended.
3. If you have dropped an index that failed to load without first correcting the cause of the failure, you must restart GPText before you can recreate the index. Run gptext-start -r to restart GPText.

Startup Failure with Large Numbers of Indexes
When there is a large number of Solr cores, SolrCloud can fail to restart successfully, with error messages indicating failure to elect leaders for shards. This is a known Solr issue; see https://issues.apache.org/jira/browse/SOLR-5990 in the Apache Solr Jira for an example. Because of this issue, avoid designing GPText applications that create large numbers of indexes, shards, and replicas. The number of cores you can create before you observe this behavior is hardware dependent, so you should test to determine your system's limits. You can create and successfully operate a larger number of indexes than can be restarted successfully later, so be sure to test restarting GPText to determine a practical limit.
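
When planning against this limit, it can help to estimate how many Solr cores a design would create. The following is a rough sketch only. It assumes, as described in the GPText architecture overview, one shard per Greenplum segment and one replica set per shard sized by the replication factor, with each shard replica hosted as one Solr core. All values are hypothetical placeholders, and the practical limit still has to be found by testing restarts on your own hardware.

```shell
# Hypothetical planning figures; substitute values from your own system.
index_count=50          # GPText indexes the application would create
segment_count=32        # Greenplum segments (assumed: one shard per segment)
replication_factor=2    # replicas maintained for each shard

# Each shard replica is one Solr core.
cores_per_index=$(( segment_count * replication_factor ))
total_cores=$(( index_count * cores_per_index ))

echo "Cores per index: ${cores_per_index}"
echo "Total Solr cores: ${total_cores}"
```

Halving either the number of indexes or the replication factor halves the core count, which is often the simplest lever when restarts begin to fail.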

Setting GPText Configuration Parameters Without First Setting custom_variable_classes
If the custom_variable_classes Greenplum Database server configuration parameter does not include the value "gptext", attempting to set a GPText configuration parameter returns an error message, for example:
    mydb-# set gptext.replication_factor=4;
    WARNING: Please logon again to make GUC setting take effect. (GucValue.h:301)
    WARNING: Please logon again to make GUC setting take effect. (GucValue.h:301)
    ERROR: unrecognized configuration parameter "gptext.replication_factor"
In GPText 2.0, in addition to the error message, the value of the configuration parameter persisted in ZooKeeper is zero, replacing the previous value of the parameter.
    mydb-# show gptext.replication_factor;
     gptext.replication_factor
    ----------------------------
     0
Beginning with GPText 2.1, the error message is still generated; however, the value saved in ZooKeeper is the value specified in the set command, 4 in the preceding example.
To prevent the error message, before setting any GPText configuration parameters, use the gpconfig command-line utility to set the custom_variable_classes configuration parameter:
    $ gpconfig -c custom_variable_classes -v 'gptext'

Installing GPText

Prerequisites
The GPText installation includes the installation of Apache SolrCloud and, optionally, Apache ZooKeeper.
If you are installing a new GPText release into an existing GPText system, follow the instructions in Upgrading GPText instead.
Following are GPText installation prerequisites.
Install and configure your Greenplum Database system, version 4.3.6 or higher. See the Pivotal Greenplum Database Installation Guide at https://gpdb.docs.pivotal.io.
GPText runs on Red Hat Enterprise Linux or CentOS 5.x, 6.x, or 7.x.
GPText cannot be installed onto a shared NFS mount.
Install a JRE 1.8.x on all hosts in the cluster.
Ensure that nc (netcat) is installed on all Greenplum cluster hosts (yum install nc).
Installing lsof on all cluster hosts is recommended (sudo yum install lsof).
GPText nodes can be installed on the Greenplum Database cluster hosts alongside the Greenplum segments or on additional, non-database hosts accessible on the Greenplum cluster network. All hosts participating in the GPText system must have the same operating system and configuration and have passwordless ssh access for the gpadmin user. See the Pivotal Greenplum Database Installation Guide for instructions to configure hosts.
If you plan to place GPText nodes on the Greenplum Database segment hosts, ensure that you reserve memory for GPText use when you configure Greenplum Database. To determine the memory to set aside for GPText, multiply the number of GPText nodes to create on each Greenplum segment host by the JVM maximum size. Subtract this memory from the physical RAM when calculating the value for the Greenplum Database gp_vmem_protect_limit server configuration parameter. See the Greenplum Database server configuration parameter gp_vmem_protect_limit in the Greenplum Database Reference Guide for recommended memory calculation formulas or visit the GPDB Virtual Memory Calculator website.
Apache Solr requires a ZooKeeper cluster with at least three nodes. You can install a "binding" ZooKeeper cluster with GPText on the Greenplum cluster hosts, or you can use an existing ZooKeeper cluster. When deployed alongside Greenplum Database segments, ZooKeeper performance can be affected under heavy database load. For best performance, install a ZooKeeper cluster with at least three nodes (five nodes recommended) on separate hosts with network connectivity to the Greenplum network.
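
The memory reservation rule in the prerequisites above (GPText nodes per segment host multiplied by the JVM maximum size) can be sketched as shell arithmetic. This is an illustration under stated assumptions, not a sizing recommendation: the node count, heap size, and RAM figure are hypothetical, and the result only supplies the reduced RAM figure to feed into the gp_vmem_protect_limit formulas in the Greenplum Database Reference Guide.

```shell
# Hypothetical values; adjust for your segment hosts.
gptext_nodes_per_host=2   # GPText nodes planned on each segment host
jvm_max_mb=2048           # JVM maximum size per node in MB (the -Xmx value in JAVA_OPTS)
physical_ram_mb=65536     # physical RAM on each segment host in MB

# Memory to set aside for GPText on each host.
gptext_reserved_mb=$(( gptext_nodes_per_host * jvm_max_mb ))

# RAM figure to use in place of physical RAM when computing gp_vmem_protect_limit.
ram_for_greenplum_mb=$(( physical_ram_mb - gptext_reserved_mb ))

echo "Reserve for GPText: ${gptext_reserved_mb} MB"
echo "RAM available to the gp_vmem_protect_limit calculation: ${ram_for_greenplum_mb} MB"
```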

Install the GPText Binary Distribution
1. On the Greenplum master host, extract the GPText distribution file, a compressed tar archive. For example:
    cd /home/gpadmin
    tar xvfz greenplum-text-release-rhel5_x86_64.tar.gz
The release directory contains an installation configuration file, gptext_install_config, and the GPText installation binary, which has a name similar to greenplum-text-<version>-<platform>.bin, for example, greenplum-text-2.2.0-rhel6_x86_64.bin.
2. If necessary, grant execute permission to the GPText binary. For example:
    chmod +x /home/gpadmin/greenplum-text-2.1.0-rhel5_x86_64.bin
3. If you are installing GPText in a directory that is only accessible to root, for example /usr/local, perform these steps:
a. Create the installation directory as root and change the ownership to the gpadmin user.
b. To install to a directory where the user may or may not have write permissions:
Use gpssh to create a directory with the same file path on all hosts (mdw, smdw, and the segment hosts sdw1, sdw2, and so on). For example:
    /usr/local/<gptext-version>
As root, set the file permissions and owner. For example:
    # chmod 775 /usr/local/<gptext-version>
    # chown gpadmin:gpadmin /usr/local/<gptext-version>
4. Edit the gptext_install_config file to set parameters for the installation. See Set Installation Parameters for details.
5. Run the GPText installation binary as gpadmin on the master server:
    ./gptext-<version>.bin -c <gptext_install_config>
6. Accept the Pivotal license agreement.

Optional Two-Part GPText Installation
You can run the GPText installation in two parts by following these steps.
1. Prepare GPText installation directories as described in steps 1 through 3 in Install the GPText Binary Distribution.
2. Run the GPText installation binary as gpadmin on the master server:
    ./gptext-<version>.bin -b
Note that the -c <gptext_install_config> option is omitted.
3. Source the GPText environment script in the GPText installation directory:
    source <gptext-install-dir>/greenplum-text_path.sh
4. Edit the gptext_install_config file to set parameters for the GPText installation. See Set Installation Parameters for details.
5. Deploy the GPText cluster with the following command:
    gptext-deploy -c <gptext_install_config>

Set Installation Parameters
A GPText configuration file named gptext_install_config contains parameters to configure the GPText installation. Edit the file and set the parameters as described in the following table.

Table 1. GPText installation parameters
Each entry lists the parameter, its description, and an example.

GPTEXT_HOSTS
Description: An array of host names on which to install GPText, or use the constant "ALLSEGHOSTS" to install GPText on all Greenplum Database segment hosts. GPText hosts must be passwordless ssh-accessible by the gpadmin user from all other hosts in the Greenplum cluster.
Example:
    declare -a GPTEXT_HOSTS=(gptext_host1 gptext_host2 gptext_host3)
    GPTEXT_HOSTS="ALLSEGHOSTS"
The GPTEXT_HOSTS and DATA_DIRECTORY installation parameters determine the number of GPText nodes that are deployed.
The number of directories included in the DATA_DIRECTORY array is the number of GPText nodes that are created per host.
The GPTEXT_HOSTS parameter determines the number of hosts. If set to the constant "ALLSEGHOSTS", the number of GPText node hosts is the same as the number of Greenplum segment hosts. If GPTEXT_HOSTS is set to an array of host names, the length of the array is the number of GPText node hosts.
The maximum number of GPText nodes is the number of Greenplum Database primary segments. The best practice recommendation is to deploy fewer GPText nodes with more memory rather than to divide the memory available to GPText among the maximum number of GPText nodes allowed. For example, if there are eight primary segments per host in the Greenplum Database cluster, the maximum number of GPText nodes per host is eight, but you should test with two or four GPText nodes per host, adjusting the JAVA_OPTS installation parameter to divide the memory reserved for GPText among them.

DATA_DIRECTORY
Description: An array of directory paths where GPText data directories are to be created. The number of directories in the array determines the number of GPText nodes that will be created on each physical host. If GPTEXT_HOSTS lists multiple interfaces per host, the GPText nodes are spread evenly across the interface addresses.
Example:
    declare -a DATA_DIRECTORY=(/data/primary /data/primary)

JAVA_OPTS
Description: Sets the minimum and maximum memory each SolrCloud JVM can use.
Example:
    JAVA_OPTS="-Xms1024M -Xmx2048M"

GPTEXT_PORT_BASE, GP_MAX_PORT_LIMIT
Description: Set a range of port numbers available to GPText nodes. GPText finds unused ports in the specified range.
Example:
    GPTEXT_PORT_BASE=18983
    GP_MAX_PORT_LIMIT=28983

ZOO_CLUSTER
Description: Whether to deploy a GPText binding ZooKeeper cluster or use an existing ZooKeeper cluster. If set to "BINDING", the installation deploys a ZooKeeper cluster. To use an existing ZooKeeper cluster, set this parameter to a list of ZooKeeper nodes in the format "host1:port,host2:port,host3:port".
Example:
    ZOO_CLUSTER="BINDING"

ZOO_HOSTS
Description: If ZOO_CLUSTER is set to "BINDING", this parameter is an array of the hosts where the ZooKeeper nodes are to be installed. The array must contain 3, 5, or 7 host names, for example ZOO_HOSTS=(sdw1 sdw2 sdw3 sdw4 sdw5). If you are using a single host for ZooKeeper, specify it multiple times, for example, ZOO_HOSTS=(sdw1 sdw1 sdw1).
Example:
    declare -a ZOO_HOSTS=(localhost localhost localhost localhost localhost)

ZOO_DATA_DIR
Description: The ZooKeeper data directory, required when ZOO_CLUSTER is set to "BINDING".
Example:
    ZOO_DATA_DIR="/data/master/"

ZOO_GPTXTNODE
Description: The node path in ZooKeeper for GPText. This parameter is required whether ZOO_CLUSTER is set to "BINDING" or a list of hosts.
Example:
    ZOO_GPTXTNODE="gptext"

ZOO_PORT_BASE, ZOO_MAX_PORT_LIMIT
Description: A range of port numbers to use for the ZooKeeper cluster. Unused ports are allocated from within this range. The range must contain at least 4000 port numbers.
Example:
    ZOO_PORT_BASE=2188
    ZOO_MAX_PORT_LIMIT=12188

GPTEXT_JAVA_HOME
Description: The home directory of the Java installation to run for ZooKeeper and Solr processes. If not set, the JRE specified in the PATH and JAVA_HOME environment variables will be used.
Example:
    GPTEXT_JAVA_HOME=/usr/java/jdk1.8.0_131
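
The relationships among the installation parameters can be checked with a few lines of shell before running the installer. The sketch below is not part of GPText. It uses space-separated lists in place of the bash arrays in gptext_install_config so that it runs in any POSIX shell, the variable names and the count_words helper are hypothetical, and it assumes the size of the ZooKeeper port range is the limit minus the base.

```shell
# Hypothetical values mirroring the examples in Table 1.
gptext_hosts="gptext_host1 gptext_host2 gptext_host3"
data_directories="/data/primary /data/primary"
zoo_hosts="sdw1 sdw2 sdw3 sdw4 sdw5"
zoo_port_base=2188
zoo_max_port_limit=12188

# count_words LIST: print the number of whitespace-separated items in LIST.
count_words() { set -- $1; echo $#; }

# Total GPText nodes = number of hosts x number of data directories per host.
host_count=$(count_words "$gptext_hosts")
nodes_per_host=$(count_words "$data_directories")
total_nodes=$(( host_count * nodes_per_host ))
echo "GPText nodes: ${host_count} hosts x ${nodes_per_host} per host = ${total_nodes}"

# ZOO_HOSTS must list 3, 5, or 7 host names.
zoo_count=$(count_words "$zoo_hosts")
case "$zoo_count" in
  3|5|7) zoo_hosts_ok=yes ;;
  *)     zoo_hosts_ok=no ;;
esac
echo "ZOO_HOSTS entries: ${zoo_count} (valid: ${zoo_hosts_ok})"

# The ZooKeeper port range must contain at least 4000 port numbers.
zoo_port_span=$(( zoo_max_port_limit - zoo_port_base ))
if [ "$zoo_port_span" -ge 4000 ]; then
  zoo_ports_ok=yes
else
  zoo_ports_ok=no
fi
echo "ZooKeeper port range size: ${zoo_port_span} (valid: ${zoo_ports_ok})"
```

Compare total_nodes with the number of Greenplum primary segments, which is the documented maximum, and prefer a smaller node count with a larger JAVA_OPTS heap per node.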

Starting GPText
First, make sure the GPText command-line utilities are in your path by sourcing the Greenplum Database and GPText environment scripts. It is important to source the GPText environment script each time you source the Greenplum Database script. For example:
    source /usr/local/greenplum-db-<version>/greenplum_path.sh
    source /usr/local/greenplum-text-<version>/greenplum-text_path.sh
To use GPText in a database, you must first use the gptext-installsql management utility to install the GPText user-defined functions and other objects in the database:
    gptext-installsql database [database2 ...]
The GPText objects are created in the gptext schema.
The ZooKeeper cluster must be running before you start GPText. If you installed a bound ZooKeeper cluster, start it with the zkManager command-line utility.
    $ zkManager start
Start GPText with the gptext-start utility.
    $ gptext-start

Configure Greenplum Database
GPText configuration parameters are saved in ZooKeeper. You can, however, view and set GPText configuration parameters in a Greenplum Database session using the SHOW and SET commands. This requires adding the GPText custom variable class to the Greenplum Database custom_variable_classes configuration parameter.
The custom_variable_classes configuration parameter is a comma-separated list of class names. It is unset by default. To see if any custom variable classes have already been configured, run this gpconfig command at the command line.
    gpconfig -s custom_variable_classes
If no custom variable classes have been set, set the parameter with the following command.
    gpconfig -c custom_variable_classes -v 'gptext'
For example:
    [gpadmin@gpsne ~]$ gpconfig -c custom_variable_classes -v 'gptext'
    20171029:12:29:11:028199 gpconfig:gpsne:gpadmin-[INFO]:-completed successfully
If other classes have been configured, add gptext to the existing list, separated by a comma.
Run gpstop -u to have Greenplum Database reload the configuration file.
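
When other classes are already configured, the new value for custom_variable_classes has to keep them and add gptext once. The helper below is a hypothetical sketch of that string handling and is not a GPText or Greenplum utility. It assumes the existing value is a comma-separated list without spaces, and the class names plr and pljava are only sample input. The gpconfig and gpstop calls are left as comments because they change a live system.

```shell
# add_gptext_class CURRENT: print CURRENT with "gptext" appended,
# unless the list is empty or already contains it.
add_gptext_class() {
  current="$1"
  case ",${current}," in
    *,gptext,*) printf '%s\n' "$current" ;;          # already present
    ,,)         printf '%s\n' "gptext" ;;            # nothing configured yet
    *)          printf '%s\n' "${current},gptext" ;; # append to existing list
  esac
}

# Sample existing value, as it might be reported by: gpconfig -s custom_variable_classes
existing="plr,pljava"
new_value=$(add_gptext_class "$existing")
echo "New custom_variable_classes value: ${new_value}"

# To apply the value on a real system (not run here):
#   gpconfig -c custom_variable_classes -v "$new_value"
#   gpstop -u
```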
When you want to view or set GPText configuration parameters, first execute the gptext.version() function to load the GPText configuration parameters into the session.
    =# SELECT gptext.version();
                version
    --------------------------------
     Greenplum Text Analytics 2.1.2
    (1 row)

    =# SHOW gptext.idx_delim;
     gptext.idx_delim
    ------------------
     ,
    (1 row)
See Setting GPText Configuration Parameters for more about GPText configuration parameters.

Uninstalling GPText
To uninstall GPText, run the gptext-uninstall utility. You must have superuser permissions on all databases with GPText schemas to run gptext-uninstall.
gptext-uninstall runs only if there is at least one database with a GPText schema.
Execute:
    gptext-uninstall

Upgrading GPText
Upgrading a GPText system to a new GPText release installs the new GPText software release on all hosts in the Greenplum cluster and then upgrades the GPText system.

Upgrading GPText and Greenplum Database at the Same Time
If you are upgrading to new releases of Greenplum Database and GPText at the same time, follow these steps:
1. Complete the Greenplum Database upgrade first and ensure the database is operational.
2. Run the GPText gptext-migrator utility to migrate your current GPText system to the newly upgraded Greenplum Database system.
3. Ensure that the current version of GPText works with the new Greenplum Database version.
4. Proceed with the GPText upgrade.

Upgrading a GPText Release
Upgrading a GPText release is a two-part process: install the new software release on the Greenplum cluster hosts and then upgrade the existing GPText system. The GPText installer performs the first part, installing the new software. The gptext-upgrade utility performs the second part, upgrading the current GPText system to the new version.
The GPText installer detects an existing GPText system and, after installing the new software release, offers to run the gptext-upgrade utility for you. If you choose to upgrade the GPText system later, you can run the gptext-upgrade utility yourself.
All upgrade tasks are executed on the Greenplum master host as the gpadmin user. The gpadmin user must have write permission in the directory where the new GPText release is to be installed, /usr/local/greenplum-text-<release>-<version> by default.
The Greenplum Database, ZooKeeper, and GPText clusters must be running. The procedure stops and restarts GPText during the upgrade.
Follow these steps:
1. Download the new GPText release for your platform from Pivotal Network.
2. Extract the release package.
    $ tar xfz greenplum-text-<version>-<platform>.tar.gz
3. Make sure that ZooKeeper and GPText are running.
    $ gptext-state
4. Run the GPText installer.
    $ ./greenplum-text-<version>-<platform>.bin
5. The installer prompts you to accept the Pivotal license agreement and to choose and create the installation directory.
6. The installer verifies the environment to ensure that prerequisites are present, such as Python and Java. If any problems are discovered, the installer outputs an error message and stops. Correct the problem identified by the message and run the installer again.
7. After the new software has been installed on the Greenplum cluster, the installer looks for an existing GPText installation. If an existing GPText system is found, the installer asks if you wish to upgrade GPText directly.
If you answer yes, the installer runs the gptext-upgrade script. The gptext-upgrade utility validates the environment to ensure it can complete the upgrade, then executes the upgrade and restarts the GPText system. If any problems are discovered, gptext-upgrade outputs a message and quits. Fix the indicated problems and run the gptext-upgrade utility (at <NEW_GPTEXTHOME>/bin/gptext-upgrade) to complete the GPText system upgrade.
When upgrading GPText, you do not specify an installation configuration file as you do for the initial GPText installation.
If you answer no, you must run the gptext-upgrade script after the installer completes. See the gptext-upgrade utility reference for instructions.
Important: If you answer no or if gptext-upgrade quits without upgrading your software, follow these steps to re-run gptext-upgrade at a later time:
a. Source the greenplum-text_path.sh script in the old GPText installation directory. For example:
    $ source /usr/local/greenplum-text-<old-version>/greenplum-text_path.sh
b. Run the gptext-upgrade command from the new GPText installation directory:
    $ /usr/local/greenplum-text-<new-version>/bin/gptext-upgrade
8. After the upgrade has completed, source the greenplum-text_path.sh script in the new GPText release directory and run gptext-state healthcheck to verify the GPText system:
    $ source /usr/local/greenplum-text-<version>/greenplum-text_path.sh
    $ gptext-state healthcheck

Introduction to Pivotal GPText
Pivotal GPText enables processing mass quantities of raw text data (such as social media feeds or e-mail databases) into mission-critical information that guides business and project decisions. GPText joins the Greenplum Database massively parallel-processing database server with Apache SolrCloud enterprise search and the MADlib Analytics Library to provide large-scale analytics processing and business decision support. GPText includes free text search as well as support for text analysis. GPText supports business decision making by offering:
Multiple kinds of data: GPText supports both semi-structured and unstructured data searches, which broadens the kinds of information you can find.
Less schema dependence: GPText does not require static schemas to successfully locate information; schemas can change or be quite simple and still return targeted results.
Text analytics: GPText supports analysis of text data with machine learning algorithms. The MADlib analytics library is integrated with Greenplum Database and is available for use with GPText.
This chapter contains the following topics:
GPText System Architecture
GPText Sample Use Case
GPText Workflow
Text Analysis

GPText System Architecture
GPText combines a Greenplum Database cluster with an Apache SolrCloud cluster. Greenplum Database segments and GPText nodes can be deployed on the same hosts or on different hosts with network connectivity.
The following figure shows the process architecture of the combined Greenplum Database and Apache Solr clusters. The figure shows four cluster nodes with four Greenplum segments and four Solr instances deployed on each. An Apache ZooKeeper service manages the SolrCloud cluster. Because ZooKeeper is most efficient with an odd number of servers, ZooKeeper nodes are deployed on three of the four hosts. Greenplum Database users access SolrCloud services via GPText user-defined functions installed in Greenplum databases and command-line utilities.
The figure omits the Greenplum master host, secondary master, and mirror segments for the Greenplum primary segments.
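
The preference for an odd number of ZooKeeper servers follows from majority arithmetic: an ensemble stays available only while more than half of its servers can communicate. The sketch below illustrates that rule with two hypothetical helper functions. It is general ZooKeeper quorum arithmetic rather than anything specific to GPText.

```shell
# quorum N: smallest majority of an ensemble with N servers.
quorum() { echo $(( $1 / 2 + 1 )); }

# tolerated_failures N: servers that can fail while a majority remains.
tolerated_failures() { echo $(( ($1 - 1) / 2 )); }

for n in 3 4 5; do
  echo "servers=${n} quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

A fourth server raises the quorum from two to three without raising the number of failures the ensemble can survive, which is why the architecture described above places ZooKeeper on three of the four hosts.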
The Greenplum segments, Solr instances, and ZooKeeper nodes may all be deployed on separate hosts on the same network, depending on application and performance requirements.
The following sections describe how GPText integrates SolrCloud with Greenplum Database and how the two clusters work together to provide parallel text search capabilities in Greenplum Database and maintain high availability.

Greenplum Database Cluster
A Greenplum Database cluster consists of the following components:
A master database instance, executing on a dedicated host, conventionally named mdw. (Not illustrated)
A secondary master instance, on a host conventionally named smdw, acting as a warm standby for the master instance. (Not illustrated)
An array of database primary segment instances and mirrors deployed on segment hosts, by convention sdw1 through sdwn. A segment instance is an independent Postgres database process managing a portion of the distributed data. Each segment has a mirror (not illustrated) on another host in the cluster to provide uninterrupted service in case of a segment or segment host failure. The number of primary segments per host is determined by the hardware configuration (the number and type of processor cores, the amount of physical RAM, local storage capacity, and network capacity) as well as availability and performance requirements.
The Greenplum master instance coordinates the work of the segment instances. Optimal performance of a Greenplum Database cluster requires that all segment hosts be configured identically with the same number of primary and mirror segments on each, and with the database data distributed evenly among the segment instances. The full capacity of the database cluster is utilized when every segment host performs an equal amount of work.

Apache SolrCloud
Apache Solr is a server providing access to Apache Lucene full-text indexes. Apache SolrCloud is a highly available, fault-tolerant cluster of Apache Solr servers. The term GPText cluster is another way to refer to a SolrCloud cluster deployed by GPText for use with a Greenplum Database system.
A SolrCloud cluster consists of the following components:
An Apache ZooKeeper cluster to manage the SolrCloud cluster. SolrCloud uses ZooKeeper to manage server configuration and to coordinate the cluster's activities. GPText can install a ZooKeeper cluster that is bound to the GPText cluster, or it can share an existing ZooKeeper cluster. If GPText installs the ZooKeeper cluster, it can be managed using GPText functions and utilities. The ZooKeeper cluster can be deployed on Greenplum Database cluster hosts or, for best performance, on separate hosts accessible to the Greenplum Database cluster.
Multiple SolrCloud server instances deployed on the Greenplum segment hosts or on other hosts on the same network. Each instance is a JVM process running a Solr server. SolrCloud instances use local storage, which may be the same local storage volumes that store Greenplum Database data. The number of SolrCloud instances per host can be the same as the number of Greenplum primary segments per host, but this is not a requirement. The number of instances to execute per host is specified during GPText installation.
GPText provides document indexing and search capabilities for Greenplum Database by adding user-defined functions (UDFs) that access Solr APIs from within database queries.
GPText UDFs perform the following tasks:
create and manage GPText indexes
insert documents into indexes from database tables or, for GPText external indexes, from documents stored outside of Greenplum Database
search indexes
There are also GPText UDFs and command-line utilities to configure, monitor, and manage the SolrCloud cluster and to manage replicas, SolrCloud's high-availability mechanism. (More on replicas in the next section.)

Parallelism in GPText Indexing and Searching
SolrCloud distributes document indexes in slices called shards. With GPText, the number of shards for an index is the same as the number of Greenplum segments, so each Greenplum segment operates on an equal portion of the index. Each shard is managed by a SolrCloud instance and the shards are distributed evenly among the SolrCloud instances. The SolrCloud instance and Greenplum segment are not required to be on the same host.

High Availability for GPText Indexes
SolrCloud provides high availability by maintaining replicas of shards and providing automatic failover if a shard fails or becomes unavailable. One replica of each shard is the lead replica and any changes to it are applied to the other replicas. The replication factor, which determines the number of replicas to maintain for each shard, is set when the index is created. Replicas may also be added or dropped later using GPText UDFs or command-line utilities.
ZooKeeper determines the locations of shard replicas among the Solr nodes and hosts. When adding a replica using a GPText UDF or command-line utility, the new shard replica can be explicitly placed on a SolrCloud instance.

GPText Sample Use Case
Forensic financial analysts need to locate communications among corporate executives that point to financial malfeasance in their firm. The analysts use the following workflow:
1. Load the email records into a Greenplum database.
2. Create a Solr index of the email records.
3. Run queries that look for text strings and their authors.
4. Refine the queries until they pair a dummy company name with the top three or four executives corresponding about suspect offshore financial transactions. With this data, the analysts can focus the investigation on specific individuals rather than the thousands of authors in the initial data sample.

GPText Workflow
GPText works with Greenplum Database and Apache SolrCloud to store and index big data for information retrieval (query) purposes. High-level workflows include data loading and indexing, and data querying.
This topic describes the following workflows:
Data Loading and Indexing Workflow
Querying Data Workflow

Data Loading and Indexing Workflow
The following diagram shows the GPText workflow for loading and indexing data.
All client interaction with the system is through the Greenplum master instance.
1. Load data into your Greenplum Database system. Create a database table to hold data and then add the data to the table. Greenplum provides parallel data loading utilities and protocols that help to transform and load external data in various formats and from various sources. For details, see the Greenplum Database Administrator Guide, at http://gpdb.docs.pivotal.io.
2. Create an empty GPText index. Use the gptext.create_index() user-defined function (UDF) to create an empty GPText index for the table. Each Greenplum segment will manage a slice of the index, called a shard. SolrCloud creates multiple replicas for each shard, distributed among the Solr instances, and chooses a lead replica for the Greenplum segment to operate upon. Solr manages replication between the replicas.
3. Populate the index with data from the database table. Use the gptext.index() UDF to add data to the index. This UDF works by dispatching a SQL query to execute on each Greenplum segment. The segments execute the query and add the results to their shards using Solr APIs.
4. Commit changes to the index. Commit changes to the GPText index by calling the gptext.commit_index() UDF. Until the changes are committed, queries executed on the index cannot access any data added to the index with gptext.index(). If needed, uncommitted changes can be rolled back. SolrCloud replicates changes committed to the lead replica to the shards' non-lead replicas.

Querying Data Workflow
The following diagram shows the high-level GPText query process workflow:
1. A user submits a SQL query designed to search the indexed data. A GPText search query is a SQL SELECT statement on a GPText search UDF that contains full-text search expressions.
2. The Greenplum master dispatches the query to the Greenplum segments.
3. Each segment executes the query, using the Solr API to search its index shard. SolrCloud executes the search query on the lead replica for the shard.
4. The Greenplum segments return the results of the search query to the Greenplum master.
5. The Greenplum master aggregates the results from all segments and returns them to the client.

Text Analysis
GPText enables analysis of Solr indexes with Apache MADlib, an open source library for scalable in-database analytics. MADlib provides data-parallel implementations of mathematical, statistical, and machine learning methods for structured and unstructured data. You can use GPText to perform a variety of MADlib analyses.
Learn more about Apache MADlib at http://madlib.apache.org. A gppkg package for MADlib is available on the Pivotal network at http://network.pivotal.io.
Administering GPText
GPText administration includes security considerations, monitoring Solr index statistics, managing and monitoring ZooKeeper, and troubleshooting.
Changing GPText Server Configuration Parameters
Configuration parameters used with GPText are built into GPText with default values. You can change the values for these parameters by setting the new values in a Greenplum Database session. The new values are stored in ZooKeeper. GPText indexes use the values of configuration parameters when they are created. Changing configuration parameters affects new indexes, but does not affect existing indexes.
See GPText Configuration Parameters for a complete list of configuration parameters.
A one-time Greenplum Database configuration change is needed for Greenplum Database to allow setting and displaying GPText configuration variables. As the gpadmin user, enter the following commands in a shell:
$ gpconfig -c custom_variable_classes -v 'gptext'
$ gpstop -u
Then connect to a database that contains the GPText schema and execute the gptext.version() function to expose the GPText configuration variables:
=# select * from gptext.version();
Change the values of GPText configuration variables using the SET command in a session with a database that contains the GPText schema. The following example sets values for three configuration parameters in a psql session:
=# set gptext.idx_buffer_size=10485760;
SET
=# set gptext.idx_delim='|';
SET
=# set gptext.extension_factor=5;
SET
You can view the current value of a configuration parameter that you have set using the SHOW command:
=# show gptext.idx_delim;
 gptext.idx_delim
------------------
 |
(1 row)
Security and GPText Indexes
GPText security is based on Greenplum Database security. Your privileges to execute GPText functions depend on your privileges for the database table that is the source for the index. For example, if you have SELECT privileges for a table in the Greenplum Database database, then you have SELECT privileges for an index generated from that table.
Executing GPText functions requires one of OWNER, SELECT, INSERT, UPDATE, or DELETE privileges, depending on the function. The OWNER is the person who created the table and has all privileges. See the Greenplum Database Administrator Guide for information about setting privileges.
ZooKeeper Administration
Apache ZooKeeper enables coordination between the Apache Solr and Pivotal GPText distributed processes through a shared namespace that resembles a file system. In ZooKeeper, a node (called a znode) can contain data, like a file, and can have child znodes, like a directory. ZooKeeper replicates data between multiple instances deployed as a cluster to provide a highly available, fault-tolerant service. Both Solr and GPText store configuration files and share status by writing data to ZooKeeper znodes. GPText stores information in the /gptext znode. The configuration files for a GPText index are in the /gptext/configs/<index-name> znode.
The number of ZooKeeper instances in the cluster determines how many ZooKeeper node failures the cluster can tolerate and still remain active. The service remains available as long as a clear majority of the non-failed nodes are able to communicate with each other. To tolerate a failure of n nodes the cluster must have 2n+1 nodes. A cluster of five nodes, for example, can tolerate two failed nodes.
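The majority rule above reduces to simple arithmetic. The following Python sketch is illustrative only; the function names are hypothetical and are not part of GPText or ZooKeeper.

```python
def nodes_required(failures: int) -> int:
    """Smallest cluster size that still has a majority after `failures` nodes fail."""
    if failures < 0:
        raise ValueError("failures must be non-negative")
    return 2 * failures + 1


def failures_tolerated(cluster_size: int) -> int:
    """Largest number of failed nodes that leaves a strict majority running."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return (cluster_size - 1) // 2


if __name__ == "__main__":
    for size in (3, 4, 5, 7):
        print(f"{size} nodes tolerate {failures_tolerated(size)} failure(s)")
```

Note that a four-node cluster tolerates only one failure, the same as a three-node cluster, which is why an even node count adds no fault tolerance.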
ZooKeeper is very fast for read requests because it stores data in memory. If ZooKeeper begins to swap memory to disk, Solr and GPText performance will suffer and failures could occur, so it is critical to allocate sufficient memory to the ZooKeeper Java processes. To avoid ZooKeeper instances competing with Greenplum Database segments for memory, you should deploy the ZooKeeper instances and Greenplum Database segments on different hosts. The ZooKeeper and Greenplum Database hosts must be on the same network and accessible with passwordless SSH by the gpadmin user. You can use the Greenplum Database gpssh-exkeys utility to share SSH keys between ZooKeeper and Greenplum Database hosts.
You must start the ZooKeeper cluster before you start GPText. When you start GPText, the Solr nodes each load the replicas for indexes they manage. With large numbers of indexes, shards, and replicas, starting up the cluster can generate a very high, atypical load on ZooKeeper. It can take a long time to get all indexes loaded and some ZooKeeper requests may time out waiting for responses. Using the gptext-start --slow_start option starts Solr nodes one at a time, providing a more ordered start-up and limiting the number of concurrent ZooKeeper requests.
The GPText command-line utility zkManager can be used to monitor the ZooKeeper cluster. If the ZooKeeper cluster is bound to GPText, you can also start and stop the cluster using zkManager.
Checking ZooKeeper Status
Use the zkManager utility from the command line to check the ZooKeeper cluster status. The utility lists the hosts, ports, latency, and follower/leader mode for each ZooKeeper instance. If a node is down, its mode is listed as Down.
To check the ZooKeeper cluster status, run the zkManager state command.
$ zkManager state
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-Execute zookeeper state process.
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-Check zookeeper cluster state ...
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-Host port Latency min/avg/max Mode
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-gpdb 2189 0/0/22 follower
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-gpdb 2190 0/0/29 leader
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-gpdb 2188 0/0/27 follower
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-Done.
In a database session, you can use the gptext.zookeeper_hosts() function to list the ZooKeeper hosts.
=# SELECT * FROM gptext.zookeeper_hosts();
  host  | port
--------+------
 gpdb51 | 2188
 gpdb51 | 2189
 gpdb51 | 2190
(3 rows)
Starting and Stopping the ZooKeeper Cluster
If the ZooKeeper cluster was installed by the GPText installer, the zkManager utility can start or stop the ZooKeeper cluster. To start the cluster, run the zkManager start command.
$ zkManager start
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-Execute zookeeper start process
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:------------------------------------------------
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-Starting Zookeeper:
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:------------------------------------------------
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-Host Zookeeper Dir
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-gpdb /data/master/zoo0
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-gpdb /data/master/zoo1
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-gpdb /data/master/zoo2
20171016:16:14:48:017845 zkManager:gpdb:gpadmin-[INFO]:-Check zookeeper cluster state ...
20171016:16:14:53:017845 zkManager:gpdb:gpadmin-[INFO]:-Done.
To stop ZooKeeper, run the zkManager stop command.
$ zkManager stop
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-Execute zookeeper stop process.
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:------------------------------------------------
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-Stop Zookeeper:
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:------------------------------------------------
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-Host Zookeeper Dir
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-gpdb /data/master/zoo0
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-gpdb /data/master/zoo1
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-gpdb /data/master/zoo2
20171016:16:14:09:016499 zkManager:gpdb:gpadmin-[INFO]:-Done.
See the zkManager reference for more information.
Checking SolrCloud Status
You can check the status of the SolrCloud cluster and indexes by running the gptext-state utility from the command line.
To check the state of the GPText nodes and each index, run the gptext-state utility with the -D (--details) option:
gptext-state -D
This command reports the status of the GPText nodes and status of each GPText index.
Run gptext-state list to view just the indexes.
The gptext-state healthcheck command checks the GPText configuration files, the index status, required disk space, user privileges, and index and database consistency. By default, the required disk space check passes if there is at least 20% disk free. You can set a different disk free threshold using the --disk_free option. For example:
[gpadmin@gpdb-sandbox ~]$ gptext-state healthcheck --disk_free=25
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Execute healthcheck on GPText cluster!
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Check GPText config files ...
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Check GPText index status ...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for required disk space ...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for required user privileges ...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for indexes and database consistency ...
20160629:15:45:27:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:27:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Done.
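To see what a disk-free threshold means in practice, the following Python sketch approximates such a check on a single host. It is an assumption about the general idea, not the actual gptext-state implementation; the function names and the choice of path are hypothetical.

```python
import shutil


def percent_free(free_bytes: int, total_bytes: int) -> float:
    """Free space as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def passes_disk_check(path: str, threshold_percent: float = 20.0) -> bool:
    """True if the file system holding `path` has at least the threshold free.

    The 20% default mirrors the default described for the healthcheck above.
    """
    usage = shutil.disk_usage(path)
    return percent_free(usage.free, usage.total) >= threshold_percent


if __name__ == "__main__":
    # "." stands in for a GPText data directory (hypothetical choice).
    print(passes_disk_check(".", threshold_percent=25.0))
```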
See the gptext-state utility reference for additional options.
Recovering GPText Nodes
Use the gptext-recover utility to recover down GPText nodes, for example after a failed Greenplum Database segment host is recovered.
With no arguments, the gptext-recover utility discovers down GPText nodes and restarts them.
With the -f (or --force) option, if a GPText node cannot be restarted and no shards are down, the node is deleted and created again on the same host. Missing replicas are added and the failed node and failed replicas are removed.
The -H (--new_hosts) option allows recreating down GPText nodes on new hosts that replace failed hosts. The down GPText nodes are deleted and recreated on the new hosts. The argument to the -H option is a comma-separated list of the new hosts that are to replace the failed hosts. The number of new hosts must match the number of failed hosts. If shards are down, the utility advises reindexing. If only some replicas are down, it recreates the replicas on the new hosts and updates gptext.conf.
The -r option recovers replicas, but does not attempt to recover any down nodes.
Note: Before recovering GPText nodes on newly added hosts, ensure that the following GPText prerequisites have been installed on the host:
Java 1.8
Python 2.6
The Linux lsof utility
Viewing Solr Index Statistics
You can view Solr index statistics by running the gptext-state utility from the command line.
To list all GPText indexes, enter the following command at the command line:
gptext-state list
A command line that retrieves all statistics for an index:
gptext-state --index wikipedia.public.articles
A command line that retrieves the number of documents in an index:
gptext-state --index wikipedia.public.articles --stats_columns=num_docs
A command line that retrieves num_docs and the index size:
gptext-state --index wikipedia.public.articles --stats_columns num_docs,size
Backing Up and Restoring GPText Indexes
With the gptext-backup management utility, you can back up a GPText index so that, if needed, you can quickly recover from a failure. The backup can be restored to the same GPText system or to another system with the same number of Greenplum Database segments.
The gptext-backup management utility backs up an index and its configuration files to either a shared file system, which must be mounted on and writable by each host in the Greenplum Database cluster, or to local storage on the Greenplum Database master and segment hosts.
Backing Up to a Shared File System
To back up on a shared file system, use the -p (--path) command-line option to specify the location of a directory on the mounted file system and the -n (--name) option to provide a name for the backup. Specify the index to back up with the -i (--index) option.
$ gptext-backup -i <index-name> -p <path> -n <backup-name>
The gptext-backup utility then checks that:
the GPText cluster is up
the shared file system is valid
the backup name specified with the -n option does not already exist in the directory specified with the -p option
The utility creates the new directory and then saves one copy of each index shard to that directory, along with the index's configuration files from ZooKeeper.
To save the configuration files only, with no data, add the -c (--backup_conf) command-line option.
To restore an index from a shared file system, use the gptext-restore management utility. The GPText system you restore to must be on a Greenplum Database cluster with the same number of segments. The database and schema for the index must be present.
The -i (--index) option specifies the name of the GPText index that will be restored. If the index exists, you must first drop it with the gptext.drop_index() user-defined function.
The -p (--path) option specifies the location of the directory containing the backup files, that is, the directory that gptext-backup created on the shared file system.
$ gptext-restore -i <index-name> -p <path>
You can add the -c option to restore only the configuration files to ZooKeeper and create an empty GPText index, without restoring any saved index data.
Backing Up to Local Storage
To back up to local storage on the Greenplum Database cluster, add the local keyword to the gptext-backup command line.
A local GPText backup has a unique name constructed by appending a timestamp to the index name. You do not use the -n option with local backups.
$ gptext-backup local -i <index-name>
On the master host, in the master data directory by default, the backup utility saves a JSON file with backup metadata and a directory containing the index's configuration files from ZooKeeper.
The utility backs up each index shard on the Greenplum Database segment host with the GPText node that manages the shard's lead replica. By default, the shard backup files are saved in a segment data directory.
The gptext-backup command output reports the locations of all backup files.
You can add the -p (--path) option to the gptext-backup command to specify a local directory where the backup will be saved. The directory must be present on every Greenplum Database host and must be writeable by the gpadmin user.
$ gptext-backup local -i <index-name> -p <path>
The backup files will be saved in the specified directory on each host instead of in the Greenplum Database master and segment data directories.
To restore a backup saved to local storage, add the local keyword to the gptext-restore command line and specify the path to the backup directory on the master host.
$ gptext-restore local -p <path>
The <path> is the full path to the directory the gptext-backup command created on the master host, including the timestamp, for example $MASTER_DATA_DIRECTORY/demo.twitter.message_2018-05-08T15:32:21.397779.
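When scripting restores, it can help to recover the index name and backup time from a local backup directory name. The Python sketch below assumes the name has the form shown in the example above, that is, the index name, an underscore, and an ISO-style timestamp. The function name is hypothetical and the format is inferred from that single example.

```python
import os
from datetime import datetime


def split_backup_name(backup_path: str):
    """Split '<index-name>_<timestamp>' into (index_name, datetime).

    Splits on the last underscore, because an index name may itself
    contain underscores while the assumed timestamp format does not.
    """
    base = os.path.basename(backup_path.rstrip("/"))
    index_name, sep, stamp = base.rpartition("_")
    if not sep or not index_name:
        raise ValueError(f"not a timestamped backup name: {base!r}")
    return index_name, datetime.fromisoformat(stamp)


if __name__ == "__main__":
    name, taken_at = split_backup_name(
        "/data/master/demo.twitter.message_2018-05-08T15:32:21.397779"
    )
    print(name, taken_at.isoformat())
```

The /data/master prefix in the usage example is only a placeholder for the master data directory.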
See gptext-backup for syntax and examples for running gptext-backup. See gptext-restore for syntax and examples for running gptext-restore.
Expanding the GPText Cluster
The gptext-expand management utility adds GPText nodes to the cluster. There are two ways to add nodes:
Add GPText nodes to existing hosts in the cluster. This option increases the number of GPText nodes on each host.
Add GPText nodes to new hosts added by using the Greenplum Database gpexpand management utility to expand the Greenplum Database system.
Adding GPText Nodes to Existing Segment Hosts
To add nodes to existing segment hosts, run the gptext-expand utility with a command like the following:
gptext-expand -e -p /data1/nodes,/data2/nodes
This example adds two GPText nodes to each host.
The -e (--existing) option specifies that nodes are to be added to existing hosts.
The -p (--expand_paths) option provides a list of directories where the new nodes' data directories are to be created. These should be the same directories that contain the Greenplum Database segment data directories and existing GPText data directories. The number of directories in the list is the number of new nodes that are added.
A directory can be repeated in the directory list multiple times to increase the number of new GPText nodes to create. For example, if there is currently one GPText node per host in the /data1/nodes directory, you could add three nodes with a command like the following:
gptext-expand -e -p /data1/nodes,/data2/nodes,/data2/nodes
This adds one node to the /data1/nodes directory and two nodes to the /data2/nodes directory so there are two GPText nodes in each directory.
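The effect of a repeated directory in the -p list can be checked before running the command. This Python sketch simply counts list entries per directory; the function name and the dictionary describing the existing layout are hypothetical and only model the rule described above.

```python
from collections import Counter


def nodes_per_directory(existing: dict, expand_paths: str) -> dict:
    """Return the node count per directory after an expansion.

    `existing` maps directory -> current node count on a host;
    `expand_paths` is the comma-separated value passed to -p.
    Each occurrence of a directory in the list adds one node.
    """
    counts = Counter(existing)
    counts.update(path for path in expand_paths.split(",") if path)
    return dict(counts)


if __name__ == "__main__":
    layout = nodes_per_directory(
        {"/data1/nodes": 1},
        "/data1/nodes,/data2/nodes,/data2/nodes",
    )
    print(layout)
```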
Adding GPText nodes affects new indexes, but not existing indexes. Replicas for new indexes will be distributed across all of the nodes, including both old nodes and the newly created nodes. Replicas for indexes that existed before running gptext-expand are not automatically moved. Rebalancing existing replicas requires reindexing.
Adding GPText Nodes to New Hosts
Check that the following GPText prerequisites are installed on each new host added to the Greenplum Database cluster:
Java 1.8
Python 2.6 or greater
Linux lsof utility
New hosts must be reachable by all hosts in the GPText cluster, including existing hosts and the new hosts you are adding.
After expanding the Greenplum Database cluster with the gpexpand management utility, call gptext-expand with the -H (--new_hosts) option and a list of the new hosts on which to install GPText:
gptext-expand -H newhost1,newhost2
The gptext-expand utility installs GPText binaries on the new hosts and then creates new GPText nodes on the new hosts.
Expanding a Greenplum Database cluster increases the number of segments, so the number of GPText index shards for existing indexes must be increased to equal the new number of segments. This requires reindexing all existing documents. Newly created indexes will automatically be distributed among the new shards.
Troubleshooting
GPText errors are of the following types:
Solr errors
gptext errors
Most of the Solr errors are self-explanatory.
gptext errors are caused by misuse of a function or utility. They provide a message that tells you when you have used an incorrect function or argument.
Monitoring Logs
You can examine the Greenplum Database and Solr logs for more information if errors occur. Greenplum Database logs reside in:
segment-directory/pg_log
Solr logs reside in:
<GPDB path>/solr/logs
Determining Segment Status with gptext-state
Use the gptext-state utility to determine if any primary or mirror segments are down. See gptext-state in the GPText Management Utilities Reference.
GPText High Availability
The GPText high availability feature ensures that you can continue working with GPText indexes as long as each shard in the index has at least one working replica.
A GPText index has one shard for each Greenplum segment, so there is a one-to-one correspondence between Greenplum segments and GPText index shards. The shard managed by a Greenplum segment is an index of the documents that are managed by that segment.
The GPText high availability mechanism is to maintain multiple copies, or replicas, of the shard. The ZooKeeper service that manages SolrCloud chooses a GPText instance (SolrCloud node) for each replica to ensure even distribution and high availability. For each shard, one replica is elected leader and the Greenplum segment associated with the shard operates on this leader replica. The GPText instance managing the lead replica may or may not be on another Greenplum host, so indexing and searching operations are passed over the Greenplum cluster's interconnect network. SolrCloud replicates changes made to the leader replica to the remaining replicas.
The following figure illustrates the relationships between Greenplum segments and GPText index shards and replicas. The leader replica for each shard is shown in green and the followers are gray.
The number of replicas to create for each shard, the replication factor, is a SolrCloud property. By default, GPText starts SolrCloud with a replication factor of three. The replication factor for each individual index is the value of the SolrCloud replication factor when the index is created. Changing the replication factor does not alter the replication factor for existing indexes.
Greenplum Segment or Host Failure
If a Greenplum primary segment fails and its mirror is activated, GPText functions and utilities continue to access the leader replica. No intervention is needed.
If a host in the cluster fails, both Greenplum and GPText are affected. Mirrors for the Greenplum primary segments located on the failed host are activated on other hosts. SolrCloud elects a new leader replica for affected shards. Because Greenplum segment mirrors and GPText shard replicas are distributed throughout the cluster, a single host failure should not prevent the cluster from continuing to operate. The performance of database queries and indexing operations will be affected until the failed host is recovered and the cluster is brought back into balance.
ZooKeeper Cluster Availability
SolrCloud is dependent on a working, available ZooKeeper cluster. For ZooKeeper to be active, a majority of the ZooKeeper cluster nodes must be up and able to communicate with each other. A ZooKeeper cluster with three nodes can continue to operate if one of the nodes fails, since two is a majority of three. To tolerate two failed nodes, the cluster must have at least five nodes so that the working nodes remaining after the failure are still a majority. To tolerate n node failures, then, a ZooKeeper cluster must have 2n+1 nodes. This is why ZooKeeper clusters usually have an odd number of nodes.
The best practice for a high-availability GPText cluster is a ZooKeeper cluster with five or seven nodes so that the cluster can tolerate two or three failed nodes.
Managing GPText Cluster Health
GPText document indexing and searching services remain available as long as each shard of an index has at least one working replica. To ensure availability in the event of a failure, it is important to monitor the status of the cluster and ensure that all of the index shard replicas are healthy. You can monitor the SolrCloud cluster and indexes using the SolrCloud Dashboard or using GPText functions and management utilities. Access the SolrCloud Dashboard with a web browser on any GPText instance with a URL such as http://sdw3:18983/solr. (The port numbers for GPText instances are set with the GPTEXT_PORT_BASE parameter in the installation parameters file at installation time.)
Refer to the Apache SolrCloud documentation for help using the SolrCloud Dashboard.
Monitoring the Cluster with GPText
The GPText gptext-state management utility allows you to query the state of the GPText cluster and indexes. You can also use gptext.index_status() to view the status of all indexes or a specified index.
To see the GPText cluster state run the gptext-state command-line utility with the -d option to specify a database that has the GPText schema installed.
gptext-state -d mydb
The utility reports any GPText nodes that are down and lists the status of every GPText index. For each index, the database name, index name, and status are reported. The status column contains “Green”, “Yellow”, or “Red”:
Green – all replicas for all shards are healthy
Yellow – all shards have at least one healthy replica but at least one replica is down
Red – no replicas are available for at least one index shard
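The three status values follow directly from the health of each shard's replicas. The Python sketch below restates those rules as code; it is not how gptext-state is implemented, and the function name and input shape (a mapping from shard name to a list of replica health flags) are hypothetical.

```python
from typing import Mapping, Sequence


def classify_index(shards: Mapping[str, Sequence[bool]]) -> str:
    """Return "Green", "Yellow", or "Red" for one index.

    `shards` maps a shard name to the health of each of its replicas
    (True = healthy, False = down).
    """
    # Red: at least one shard has no healthy replica at all.
    if any(not any(replicas) for replicas in shards.values()):
        return "Red"
    # Green: every replica of every shard is healthy.
    if all(all(replicas) for replicas in shards.values()):
        return "Green"
    # Otherwise every shard is still served, but some replica is down.
    return "Yellow"


if __name__ == "__main__":
    print(classify_index({"shard0": [True, True], "shard1": [True, False]}))
```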
To see the distribution of index shards and replicas in the GPText cluster, execute this SQL statement.
SELECT index_name, shard_name, replica_name, node_name FROM gptext.index_summary() ORDER BY node_name;
To list all GPText indexes, run the gptext-state list command.
gptext-state list -d mydb
The gptext-state healthcheck command checks the health of the cluster. The -f flag specifies the percentage of available disk space required to report a healthy cluster. The default is 10.
gptext-state healthcheck -f 20 -d mydb
See gptext-state in the Management Utilities reference for help with additional gptext-state options.
The gptext.index_status() user-defined function reports the status of all GPText indexes or a specified index.
SELECT * FROM gptext.index_status();
Specify an index name to report only the status of that index.
SELECT * FROM gptext.index_status('demo.twitter.message');
Adding and Dropping Replicas
The gptext-replica utility adds or drops a replica of a single index shard. Use the gptext.add_replica() and gptext.delete_replica() user-defined functions to perform the same tasks from within the database.
If a replica of a shard fails, use gptext-replica to add a new replica and then drop the failed replica to bring the index back to “Green” status.
gptext-replica add -i mydb.public.messages -s shard3
Here is the equivalent, using the gptext.add_replica() function:
SELECT * FROM gptext.add_replica('mydb.public.messages', 'shard3');
ZooKeeper determines where the replica will be located, but you can also specify the node where the replica is created:
gptext-replica add -i mydb.public.messages -s shard3 -n sdw3
In the gptext.add_replica() function, add the node name as a third argument.
To drop a replica, run gptext-replica drop with the name of the index, the name of the shard, and the name of the replica. You can find the name of the replica by calling gptext.index_status(index_name). The name is in the format core_noden. An optional -o flag specifies that the replica is to be deleted only if it is down.
gptext-replica drop -i mydb.public.messages -s shard3 -r core_node4 -o
Here is the equivalent of the above command using the gptext.delete_replica() user-defined function.
SELECT * FROM gptext.delete_replica('mydb.public.messages', 'shard3', 'core_node4', true);
GPText Best Practices
Each GPText/Apache Solr node is a Java Virtual Machine (JVM) process and is allocated memory at startup. The maximum amount of memory the JVM will use is set with the -Xmx parameter on the Java command line. Performance problems and out of memory failures can occur when the nodes have insufficient memory.
Other performance problems can result from resource contention between the Greenplum Database, Solr, and ZooKeeper clusters.
This topic discusses GPText use cases that stress Solr JVM memory in different ways and the best practices for preventing or alleviating performance problems from insufficient JVM memory and other causes.
Indexing Large Numbers of Documents
Indexing documents consumes data in Solr JVM memory. When the index is committed, parts of the memory are released, but some data remains in memory to support fast search. By default, Solr performs an automatic soft commit when 1,000,000 documents are indexed or 20 minutes (1,200,000 milliseconds) have passed. A soft commit pushes documents from memory to the index, freeing JVM memory. A soft commit also makes the documents visible in searches. A soft commit does not, however, make the index updates durable; it is still necessary to commit the index with the gptext.commit_index() user-defined function.
You can configure an index to perform a more frequent automatic soft commit by editing the solrconfig.xml file for the index:
$ gptext-config edit -f solrconfig.xml -i <db>.<schema>.<index-name>
The <autoSoftCommit> element is a child of the <updateHandler> element. Edit the <maxDocs> and <maxTime> values to reduce the time between automatic commits. For example, the following settings perform an automatic soft commit every 100,000 documents or 10 minutes.
<autoSoftCommit>
  <maxDocs>100000</maxDocs>
  <maxTime>600000</maxTime>
</autoSoftCommit>
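Because <maxTime> is expressed in milliseconds, it is easy to get the value wrong by a factor of 1,000 or 60. The following Python sketch generates the element from a document count and a time in minutes. It is a convenience illustration only; the function name is hypothetical, and the generated element still has to be placed inside <updateHandler> in solrconfig.xml by hand.

```python
import xml.etree.ElementTree as ET


def auto_soft_commit_element(max_docs: int, max_minutes: float) -> str:
    """Build an <autoSoftCommit> element as a string.

    Converts minutes to the milliseconds that <maxTime> expects.
    """
    if max_docs <= 0 or max_minutes <= 0:
        raise ValueError("max_docs and max_minutes must be positive")
    element = ET.Element("autoSoftCommit")
    ET.SubElement(element, "maxDocs").text = str(max_docs)
    ET.SubElement(element, "maxTime").text = str(int(max_minutes * 60 * 1000))
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    # Every 100,000 documents or 10 minutes, as in the example above.
    print(auto_soft_commit_element(100000, 10))
```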
Indexing Very Large Documents
Indexing very large documents can use a large amount of JVM memory. To manage this, you can set the gptext.idx_buffer_size configuration parameter to reduce the size of the indexing buffer.
See Changing GPText Server Configuration Parameters for instructions to change configuration parameter values.
Determining the Number of GPText Nodes to Deploy
A GPText node is a Solr instance managed by GPText. The nodes can be deployed on the Greenplum Database cluster hosts or on separate hosts accessible to the Greenplum Database cluster. The number of nodes is configured during GPText installation.
The maximum recommended number of GPText nodes you can deploy is the number of Greenplum Database primary segments. However, the best practice recommendation is to deploy fewer GPText nodes with more memory rather than to divide the memory available to GPText among the maximum number of GPText nodes. Use the JAVA_OPTS installation parameter to set memory size for GPText nodes.
A single GPText node per host can easily handle several indexes. Each additional node consumes additional CPU and memory resources, so it is desirable to limit the number of nodes per host. For most GPText installations, a single GPText node per host is sufficient.
If the JVM has a very large amount of memory, however, garbage collection can cause long pauses while the JVM reorganizes memory. Also, the JVM employs a memory address optimization that cannot be used when JVM memory exceeds 32GB, so at more than 32GB, a GPText node loses capacity and performance. Therefore, no GPText node should have more than 32GB of memory.
For example, if you have 48GB memory available for GPText per host, you should deploy two GPText nodes with 24GB memory. If you have 128GB available, you should deploy at least four JVMs, and more if garbage collection becomes a problem.
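The sizing rule in the examples above can be written as a small calculation: use the fewest nodes that keep each JVM at or below 32GB. The Python sketch below is one reading of that guidance, not a GPText tool; the function name is hypothetical, and it returns only the minimum node count, which you may need to raise if garbage collection pauses become a problem.

```python
import math

MAX_NODE_MEMORY_GB = 32  # upper bound per GPText node stated above


def plan_nodes(memory_per_host_gb: float):
    """Return (minimum node count, memory per node in GB) for one host."""
    if memory_per_host_gb <= 0:
        raise ValueError("memory_per_host_gb must be positive")
    nodes = math.ceil(memory_per_host_gb / MAX_NODE_MEMORY_GB)
    return nodes, memory_per_host_gb / nodes


if __name__ == "__main__":
    for memory in (16, 48, 128):
        nodes, per_node = plan_nodes(memory)
        print(f"{memory}GB per host: {nodes} node(s) with {per_node:g}GB each")
```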
Configure Maximum JVM Heap Size
Each Solr core file consumes JVM heap memory. Adding more indexes increases JVM swapping and garbage collection frequency so that it takes longer to create indexes and to load the core files when GPText is started. If you continue to create indexes without increasing the JVM heap, an out of memory error will eventually occur.
Monitor performance at startup and during index creation and increase the JVM size when you begin to see degraded performance. You can also use tools such as jconsole, included with the Java Development Kit, to monitor Java heap usage. If garbage collections are occurring too frequently and freeing too little memory, JVM heap should be increased.
The JVM size is initially configured during GPText installation by setting the JAVA_OPTS parameter in the installation configuration file. After installation, use the gptext-config jvm command to increase the JVM heap size. For example, this gptext-config jvm command sets the JVM maximum heap option to 4GB:
$ gptext-config jvm -o "-Xmx4096M"
Manage Indexing and Search Loads
With high indexing or search load, JVM garbage collection pauses can cause the Solr overseer queue to back up. For a heavily loaded GPText system, you can prevent some performance problems by scheduling document indexing for times when search activity is low.
Terms Queries and Out of Memory Errors
The gptext.terms() function retrieves term vectors from documents that match a query. An out of memory error may occur if the documents are large, or if the query matches a large number of documents on each node. Other factors can contribute to out of memory errors when running a gptext.terms() query, including the maximum memory available to the Solr nodes (-Xmx value in JAVA_OPTS) and concurrent queries.
If you experience out of memory errors with gptext.terms() you can set a lower value for the term_batch_size GPText configuration variable. The default value is 1000. For example, you could try running the failing query with term_batch_size set to 500. Lowering the value may prevent out of memory errors, but performance of terms queries can be affected.
See GPText Configuration Parameters for help setting GPText configuration parameters.
Configure File System Caching for ZooKeeper
Good Solr performance is dependent on fast response for ZooKeeper requests. ZooKeeper performs best when its database is cached so it does not have to go to disk for lookups. If you find that ZooKeeper JVMs have frequent disk accesses, look for ways to improve file caching or move ZooKeeper disks to faster storage.
The ZooKeeper zkClientTimeout parameter is the length of time a client is allowed to go without communicating with ZooKeeper before its session expires.
Troubleshooting Hadoop Connection Problems
This section describes Hadoop-related problems and potential solutions to these issues.
DataNode Access Errors
You may experience Hadoop access errors with GPText if any DataNodes in the Hadoop cluster reside in a multi-homed network. GPText uses an external IP address to access the HDFS NameNode. GPText encounters an error when the NameNode provides an internal IP address for a DataNode. In this situation, additional configuration is required to configure GPText to perform its own DNS resolution of DataNode hostnames.
Perform the following procedure to explicitly configure DNS resolution of DataNode hostnames:
1. Locate a local copy of the Hadoop authentication configuration directory that you previously uploaded to ZooKeeper. For example, if the directory is located at /home/gpadmin/auths/hdfs_conf:
$ cd /home/gpadmin/auths/hdfs_conf
$ ls
core-site.xml hdfs-site.xml user.txt
2. Open hdfs-site.xml in the editor of your choice. For example:
$ vi hdfs-site.xml
3. Add the following property block to the file, and then save the file and exit:
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
This property allows GPText hosts to perform their own DNS resolution of HDFS DataNode hostnames.
4. Re-upload the modified configuration to ZooKeeper. For example, if the hdfs_conf directory includes the authentication configuration files for a Hadoop cluster with <config_name> hdfs_bill_auth:
$ cd ..
$ gptext-external upload -t hdfs -c hdfs_bill_auth -p hdfs_conf
5. Determine the hostname-to-IP address mapping for all DataNodes, and add the associated entries into the /etc/hosts file on all GPText client hosts.
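For the final step, the /etc/hosts entries can be generated from a hostname-to-IP mapping rather than typed by hand on each host. The Python sketch below only formats and validates the lines. The function name, hostnames, and addresses are hypothetical placeholders; you must supply the real DataNode mapping and append the output to /etc/hosts yourself.

```python
import ipaddress


def hosts_entries(mapping: dict) -> str:
    """Format hostname -> IP pairs as /etc/hosts lines ("<ip> <hostname>")."""
    lines = []
    for hostname, ip in sorted(mapping.items()):
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        lines.append(f"{ip}\t{hostname}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder DataNode names and addresses, for illustration only.
    print(hosts_entries({
        "datanode1.example.com": "192.0.2.11",
        "datanode2.example.com": "192.0.2.12",
    }))
```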
Kerberos-Related Errors
The following problems are specific to Hadoop clusters secured with Kerberos.
Clock Skew
A login attempt to a Hadoop cluster secured with Kerberos will fail if clock skew between GPText client hosts and the Kerberos KDC host is too great. In this situation, you may see the following error in the Solr log:
java.io.IOException caused by a KrbException noting “Clock skew too great”
To resolve this situation, ensure that the clocks on the Kerberos KDC host and GPText client hosts are synchronized.
Timeout Errors
A login attempt to a Hadoop cluster secured with Kerberos may fail with timeout errors when the kdc and admin_server settings in the krb5.conf file are specified with a hostname, and the GPText client hosts cannot resolve the hostname. In this situation, you may see one of the following errors in the Solr log:
org.apache.solr.common.SolrException: Failed to login HDFS message caused by a java.io.IOException specifying javax.security.auth.login.LoginException: Receive timed out
java.nio.channels.UnresolvedAddressException with SocketIOWithTimeout referenced in the stack trace
In this situation, you may choose either of the following:
Update the Kerberos krb5.conf file to specify the kdc and admin_server settings using IP addresses. Or
Update all GPText hosts to perform their own DNS resolution of the Kerberos KDC server.
If you choose to update the krb5.conf file:
1. Locate a local copy of the Hadoop Kerberos authentication configuration directory that you previously uploaded to ZooKeeper. For example, if the directory is located at /home/gpadmin/auths/hdfs_kerb_conf:
$ cd /home/gpadmin/auths/hdfs_kerb_conf
$ ls
core-site.xml hdfs-site.xml keytab krb5.conf user.txt
2. Open krb5.conf in the editor of your choice. For example:
$ vi krb5.conf
3. Replace the KERBEROS block attributes with their equivalent IP addresses and then save the file and exit. For example:
[realms]
  KERBEROS = {
    kdc = <kdc_ipaddress>
    admin_server = <admin_server_ipaddress>
  }
4. Re-upload the modified configuration to ZooKeeper. For example, if the directory named hdfs_kerb_conf includes the authentication configuration files for a Hadoop cluster defined with the <config_name> hdfs_kerb_auth:
$ cd ..
$ gptext-external upload -t hdfs -c hdfs_kerb_auth -p hdfs_kerb_conf
Alternatively, if you choose to configure the GPText hosts to perform their own DNS resolution of the Kerberos KDC server, add an entry for the KDC hostname-to-IP address mapping to the /etc/hosts file on all GPText client hosts.
Using Pivotal GPText
Introduction to Pivotal GPText
Working With GPText Indexes
Querying GPText Indexes
Customizing GPText Indexes
Working With GPText External Indexes
Administering GPText
GPText High Availability
GPText Best Practices
Troubleshooting Hadoop Connection Problems
Working With GPText Indexes
Indexing prepares documents for text analysis and fast query processing. This topic shows you how to create GPText indexes and add documents from Greenplum Database tables to them, and how to maintain and customize indexes for your own applications.
For help indexing and searching documents stored outside of Greenplum Database, see Working With GPText External Indexes.
Setting Up the Sample Database
The examples in this documentation work with a demo database containing three database tables, called wikipedia.articles, twitter.message, and store.products. If you want to run the examples yourself, follow the instructions in this section to set up the demo database.
1. Log in to the Greenplum Database master as the gpadmin user and create the demo database.
    $ createdb demo
2. Open an interactive shell for executing queries in the demo database.
    $ psql demo
3. Create the articles table in the wikipedia schema with the following statements.
    CREATE SCHEMA wikipedia;
    CREATE TABLE wikipedia.articles (
        id int8 primary key,
        date_time timestamptz,
        title text,
        content text,
        refs text
    ) DISTRIBUTED BY (id);
4. Create the message table in the twitter schema with the following statements.
    CREATE SCHEMA twitter;
    CREATE TABLE twitter.message (
        id bigint,
        message_id bigint,
        spam boolean,
        created_at timestamp without time zone,
        source text,
        retweeted boolean,
        favorited boolean,
        truncated boolean,
        in_reply_to_screen_name text,
        in_reply_to_user_id bigint,
        author_id bigint,
        author_name text,
        author_screen_name text,
        author_lang text,
        author_url text,
        author_description text,
        author_listed_count integer,
        author_statuses_count integer,
        author_followers_count integer,
        author_friends_count integer,
        author_created_at timestamp without time zone,
        author_location text,
        author_verified boolean,
        message_url text,
        message_text text
    )
    DISTRIBUTED BY (id)
    PARTITION BY RANGE (created_at)
    ( START (DATE '2011-08-01') INCLUSIVE
      END (DATE '2011-12-01') EXCLUSIVE
      EVERY (INTERVAL '1 month') );
    CREATE INDEX id_idx ON twitter.message USING btree (id);
5. Create the store.products table with these statements.
    CREATE SCHEMA store;
    CREATE TABLE store.products (
        id bigint,
        title text,
        category varchar(32),
        brand varchar(32),
        price float
    ) DISTRIBUTED BY (id);
6. Download test data for the three tables here. Right-click the link, save the file, and then copy it to the gpadmin user’s home directory.
7. Extract the data files with this tar command.
    $ tar xvfz gptext-demo-data.tgz
8. Load the wikipedia data into the wikipedia.articles table using the psql \COPY metacommand.
    \COPY wikipedia.articles FROM '/home/gpadmin/demo/articles.csv' HEADER CSV;
The articles table now contains text from 23 Wikipedia articles.
9. Load the twitter data into the twitter.message table using the following psql \COPY metacommand.
    \COPY twitter.message FROM '/home/gpadmin/demo/twitter.csv' CSV;
The message table now contains 1730 tweets from August to October, 2011.
10. Load the products data into the store.products table with the following psql \COPY metacommand.
    \COPY store.products FROM '/home/gpadmin/demo/products.csv' HEADER CSV;
The products table now contains 50 rows. This table is used to demonstrate faceted search queries. See Creating Faceted Search Queries.
Setting up the GPText Command-line Environment
To work with GPText indexes, you must first set up your environment and add the GPText schema to the database containing the documents (Greenplum Database data) you want to index.
To set the environment, log in as the gpadmin user and source the Greenplum Database and GPText environment scripts. The Greenplum Database environment must be set before you source the GPText environment script. For example, if both Greenplum Database and GPText are installed in the /usr/local/ directory, enter these commands:
    $ source /usr/local/greenplum-db-<version>/greenplum_path.sh
    $ source /usr/local/greenplum-text-<version>/greenplum-text_path.sh
With the environment now set, you can access the GPText command-line utilities.
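Because the order of the two scripts matters, it can help to wrap them in a small function, for example in the gpadmin user's shell profile. The sketch below is an assumption-laden convenience, not part of GPText: the function name `load_gptext_env` is hypothetical, and the installation directories are passed in as arguments because the actual `<version>` paths are site-specific.

```shell
# Source the Greenplum Database script first, then the GPText script.
# Usage: load_gptext_env <greenplum-db install dir> <greenplum-text install dir>
load_gptext_env() {
    gpdb_script="$1/greenplum_path.sh"
    gptext_script="$2/greenplum-text_path.sh"
    # Refuse to source anything unless both scripts are readable.
    for script in "$gpdb_script" "$gptext_script"; do
        if [ ! -r "$script" ]; then
            echo "cannot read $script" >&2
            return 1
        fi
    done
    # Order matters: Greenplum Database before GPText.
    . "$gpdb_script" && . "$gptext_script"
}

# Example (substitute your installed version directories):
#   load_gptext_env /usr/local/greenplum-db-<version> /usr/local/greenplum-text-<version>
```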
Adding the GPText Schema to a Database
Use the gptext-installsql utility to add the GPText schema to databases containing data you want to index with GPText. You perform this task one time for each database. In this example, the gptext schema is installed into the demo database.
    $ gptext-installsql demo
The gptext schema provides user-defined types, tables, views, and functions for GPText. This schema is reserved for GPText. If you create any new objects in the gptext schema, they will be lost when you reinstall the schema or upgrade GPText.
Creating GPText Indexes and Indexing Data
The general steps for creating a GPText index and indexing documents are:
1. Create an empty Solr index
2. Customize the index (optional)
3. Populate the index
4. Commit the index
After you complete these steps, you can create and execute a search query or implement machine learning algorithms. Searching GPText indexes is described in the Querying GPText Indexes topic.
The following steps are completed by executing SQL commands and GPText functions in the database. Refer to the GPText Function Reference for details about the GPText functions described in the following examples.
Create an empty GPText index
A GPText index is an Apache Solr collection containing documents added from a Greenplum Database table. There can be one GPText index per Greenplum Database table. Each row in the database table is a document that can be added to the GPText index.
If the database table is partitioned, there is one GPText index for all partitions. You must specify the root table name when creating the index and adding documents to it. GPText provides search semantics that enable searching partitions efficiently.
A GPText external index is a Solr index for documents that are located outside of Greenplum Database. GPText provides user-defined functions to create external indexes and insert documents into them. See Working With GPText External Indexes.
The gptext.create_index() function creates a new GPText index. This function has two signatures:
    gptext.create_index(<schema_name>, <table_name>, <id_col_name>, <def_search_col_name> [, <if_check_id_uniqueness>])
or
    gptext.create_index(<schema_name>, <table_name>, <p_columns>, <p_types>, <id_col_name>, <def_search_col_name> [, <if_check_id_uniqueness>])
The <schema_name> and <table_name> arguments specify the database table that contains the source documents.
The <id_col_name> argument is the name of the table column that contains a unique identifier for each row. The <id_col_name> column can be of type int4, int8, varchar, text, or uuid.
The <def_search_col_name> argument is the name of the table column that contains the content you want to search by default. For example, if you want to index and search just the content column, you can use the first signature and specify the content column name in the <def_search_col_name> argument.
The final, optional argument, <if_check_id_uniqueness>, is a Boolean argument. When true, the default, attempting to add a document with an ID that already exists in the index generates an error. If you set the argument to false, you can add documents with the same ID, but when you search the index all documents with the same ID are returned.
The following command creates an index for the twitter.message table, with the id column as the unique ID field and the message_text column as the default search column:
    =# SELECT * FROM gptext.create_index('twitter', 'message', 'id', 'message_text');
To verify that the demo.twitter.message index was created, call gptext.index_status():
    =# SELECT * FROM gptext.index_status('demo.twitter.message');
     content_id |      index_name      | shard_name | shard_state | replica_name | replica_state |                 core                 |     node_name     |         base_url         | is_leader | partitioned | external_index
    ------------+----------------------+------------+-------------+--------------+---------------+--------------------------------------+-------------------+--------------------------+-----------+-------------+----------------
              0 | demo.twitter.message | shard0     | active      | core_node2   | active        | demo.twitter.message_shard0_replica1 | gpdb51:18984_solr | http://gpdb51:18984/solr | f         | t           | f
              0 | demo.twitter.message | shard0     | active      | core_node3   | active        | demo.twitter.message_shard0_replica2 | gpdb51:18983_solr | http://gpdb51:18983/solr | t         | t           | f
              1 | demo.twitter.message | shard1     | active      | core_node1   | active        | demo.twitter.message_shard1_replica1 | gpdb51:18984_solr | http://gpdb51:18984/solr | t         | t           | f
              1 | demo.twitter.message | shard1     | active      | core_node4   | active        | demo.twitter.message_shard1_replica2 | gpdb51:18983_solr | http://gpdb51:18983/solr | f         | t           | f
    (4 rows)
This example executed on a Greenplum Database cluster with two primary segments. Two shards were created, one for each segment, and each shard has two replicas. The replicas are named core_node1 through core_node4.
You can also run the gptext-state -D command-line utility to verify the index was created. See the gptext-state reference for details.
The GPText index for the demo.twitter.message table is configured, by default, to index all columns in the twitter.message database table. You can write search queries that contain criteria using any column in the table.
If you want to index and search a subset of the table columns, you can use the second gptext.create_index() signature, specifying the columns to index in the <p_columns> argument and the data types of those columns in the <p_types> argument. The <p_columns> and <p_types> arguments are text arrays. The id column name and default search column name must be included in the arrays.
Use the second gptext.create_index() signature to create an index for the wikipedia.articles table. This index will allow you to search on the title, content, and refs columns. Note that the id column and default search column are still specified in separate arguments following the <p_columns> and <p_types> arrays.
    =# SELECT * FROM gptext.create_index('wikipedia', 'articles', '{id, title, content, refs}', '{long, text_intl, text_intl, text_intl}', 'id', 'content', true);
    INFO:  Created index demo.wikipedia.articles
     create_index
    --------------
     t
    (1 row)
Because the date_time column was omitted from the <p_columns> and <p_types> arrays, it will not be possible to search the wikipedia.articles index on date with the GPText search functions.
Customize the index (optional)
Creating a GPText index generates a set of configuration files for the index. Before you add documents to the index, you can customize the configuration files to change the way data is indexed and stored. You can customize an index later, after you have added documents to it, but you must then reindex the data to take advantage of your customizations.
One common customization is to remap data types for some database columns. In the managed-schema configuration file for an index, GPText maps the data types for each field from the Greenplum Database type to an equivalent Solr data type. GPText applies default mappings (see GPText and Solr Data Type Mappings), but your index may be more effective if you use a different mapping for some fields.
The twitter.message table, for example, has a message_text text column that contains tweets. By default, GPText maps text columns to the Solr text_intl (international text) type. The GPText text_sm (social media text) type is a better mapping for a text column that contains social media idioms such as emoticons.
Follow these steps to remap the message_text field to the text_sm type.
1. Use the gptext-config utility to edit the managed-schema file for the demo.twitter.message index.
    $ gptext-config edit -i demo.twitter.message -f managed-schema
The managed-schema file loads in a text editor (normally vi).
2. Find the <field> element for the message_text field.
    <field name="message_text" stored="false" type="text_intl" indexed="true"/>
3. Change the type attribute from text_intl to text_sm.
    <field name="message_text" stored="false" type="text_sm" indexed="true"/>
4. Save the file and exit the editor.
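The same substitution can be expressed as a sed command, which may be convenient if you keep a copy of managed-schema in a local file for review or testing. This is only a sketch of the text edit from steps 2 and 3: the function name `set_field_type` is hypothetical, it assumes each <field> element sits on a single line, and it does not load the result into GPText (that is what gptext-config edit does).

```shell
# Set the type attribute of one <field> element in a local managed-schema copy.
# Usage: set_field_type <schema file> <field name> <new type>
set_field_type() {
    schema=$1
    field=$2
    new_type=$3
    tmp=$(mktemp) || return 1
    # Only lines holding the named <field> element are changed.
    sed -E "/<field [^>]*name=\"${field}\"/ s/ type=\"[^\"]*\"/ type=\"${new_type}\"/" \
        "$schema" > "$tmp" || { rm -f "$tmp"; return 1; }
    cat "$tmp" > "$schema"
    rm -f "$tmp"
}

# Example (local copy; the file name is a placeholder):
#   set_field_type ./managed-schema message_text text_sm
```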
There are many other ways to customize a GPText index. For example, you can omit fields from the index by changing the indexed attribute of the <field> element to false, store the contents of the field in the index by changing the stored attribute to true, or use gptext-config to edit the stopwords.txt file to specify additional words to ignore when indexing.
See Customizing GPText Indexes to learn how data type mapping determines how Solr analyzes and indexes field contents and for more ways to customize GPText indexes.
Populate the index
To populate the index, use the table function gptext.index(), which has the following syntax:
    SELECT * FROM gptext.index(TABLE(SELECT * FROM <table_name>), <index_name>);
To index all rows in the twitter.message table, execute this command:
    =# SELECT * FROM gptext.index(TABLE(SELECT * FROM twitter.message), 'demo.twitter.message');
     dbid | num_docs
    ------+----------
        2 |      892
        3 |      838
    (2 rows)
This command indexes the rows in the wikipedia.articles table.
    =# SELECT * FROM gptext.index(TABLE(SELECT * FROM wikipedia.articles), 'demo.wikipedia.articles');
     dbid | num_docs
    ------+----------
        3 |       11
        2 |       12
    (2 rows)
The results of this command show that 23 documents from two segments were added to the index.
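The total is simply the sum of the per-segment num_docs values (11 + 12 = 23). When scripting an indexing run, the total can be computed from psql output. The sketch below assumes psql is run with the -A (unaligned) and -t (tuples only) options so that each row arrives as `dbid|num_docs`; the function name `sum_num_docs` is hypothetical.

```shell
# Sum the num_docs column from unaligned, tuples-only psql output ("dbid|num_docs" rows).
sum_num_docs() {
    awk -F'|' 'NF == 2 { total += $2 } END { print total + 0 }'
}

# Example (requires a running GPText cluster; note that this indexes the table):
#   psql -At demo -c "SELECT * FROM gptext.index(TABLE(SELECT * FROM wikipedia.articles), 'demo.wikipedia.articles')" | sum_num_docs
```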
The first argument of the gptext.index() function is a “table-valued expression.” TABLE(SELECT * FROM wikipedia.articles) creates a table-valued expression from the articles table, using the table function TABLE.
You can choose the data to index or update by changing the inner select list in the query to select the rows you want to index. When adding new documents to an existing index, for example, specify a WHERE clause in the inner SELECT of the gptext.index() call to choose only the new rows to index.
The inner SELECT statement could also be a query on a different table with the same structure, or a result set constructed with an arbitrarily complex join, provided the columns specified in the gptext.create_index() function are present in the results. If you index data from a source other than the table used to create the index, be sure the distribution key for the result set matches the distribution key of the base table. The Greenplum Database SELECT statement has a SCATTER BY clause that you can use to specify the distribution key for the results from a query. See Specifying a distribution policy with SCATTER BY for more about the distribution policy and GPText indexes.
Commit the index
After you create and populate an index, you commit the index using gptext.commit_index(<index_name>).
This example commits the documents added to the indexes in the previous example.
    =# SELECT * FROM gptext.commit_index('demo.twitter.message');
     commit_index
    --------------
     t
    (1 row)
    =# SELECT * FROM gptext.commit_index('demo.wikipedia.articles');
     commit_index
    --------------
     t
    (1 row)
The gptext.commit_index() function commits any new data added to or deleted from the index since the last commit.
Managing GPText Indexes
GPText provides command-line utilities and functions you can use to perform these GPText management tasks:
Configuring an index
Optimizing an index
Specifying a distribution policy with SCATTER BY
Deleting from an index
Dropping an index
Adding a field to an index
Dropping a field from an index
Listing all indexes
Configuring an index
You can modify your indexing behavior globally by using the gptext-config utility to edit a set of index configuration files. The files you can edit with gptext-config are:
solrconfig.xml – Contains most of the parameters for configuring Solr itself (see http://wiki.apache.org/solr/SolrConfigXml).
managed-schema – Defines the analyzer chains that Solr uses for the different types of search fields (see Text Analyzer Chains).
stopwords.txt – Lists words you want to eliminate from the final index.
protwords.txt – Lists protected words that you do not want to be modified by the analyzer chain. For example, iPhone.
synonyms.txt – Lists words that you want replaced by synonyms in the analyzer chain.
elevate.xml – Configures Solr query elevation, which moves specific documents to the top of the results for specific queries.
emoticons.txt – Defines emoticons for the text_sm social media analyzer chain (see The emoticons.txt file).
You can also use gptext-config to move files.
Optimizing an index
The function gptext.optimize_index(<index_name>, <max_segments>) merges all segments into a small number of segments (<max_segments>) for increased efficiency.
Example:
    =# SELECT * FROM gptext.optimize_index('demo.wikipedia.articles', 10);
     optimize_index
    ----------------
     t
    (1 row)
Specifying a distribution policy with SCATTER BY
The first parameter of gptext.index() is a table-valued expression, such as TABLE(SELECT * FROM wikipedia.articles). The query in this parameter must have the same distribution policy as the table you are indexing so that documents added to the index are associated with the correct Greenplum Database segments. Some queries, however, have no distribution policy or they have a different distribution policy. This could happen if the query is a join of two or more tables or a query on an intermediate (staging) table that is distributed differently than the base table for the index.
To specify a distribution policy for a query result set, the Greenplum Database SELECT statement has a “SCATTER BY” clause.
    TABLE(SELECT * FROM wikipedia.articles SCATTER BY <distrib_id>)
where <distrib_id> is the same distribution key used to distribute the base table for the index.
Deleting from an index
You can delete from an index using a query with the function gptext.delete(<index_name>, <query>). This deletes from the index all documents that match the search query. To delete all documents, use the query '*:*'.
After a successful deletion, execute gptext.commit_index(<index_name>) to commit the change.
This example deletes all documents containing "toxin" in the default search field.
    =# SELECT * FROM gptext.delete('demo.wikipedia.articles', 'toxin');
     delete
    --------
     t
    (1 row)
    SELECT * FROM gptext.commit_index('demo.wikipedia.articles');
This example deletes all documents from the index:
    SELECT * FROM gptext.delete('demo.wikipedia.articles', '*:*');
Be sure to commit changes to the index after deleting documents.
    SELECT * FROM gptext.commit_index('demo.wikipedia.articles');
Dropping an index
You can completely remove an index with the gptext.drop_index(<index_name>) function.
Example:
    SELECT * FROM gptext.drop_index('demo.wikipedia.articles');
Adding a field to an index
You can add a field to an existing index using the gptext.add_field() function. For example, you can add a field to the index after a column is added to the underlying database table, or you can add a field to index a column that was not specified when the index was created.
GPText maps the Greenplum Database field type to an equivalent Solr data type automatically. See GPText and Solr Data Type Mappings for a table of data type mappings.
    CREATE TABLE wikipedia.myarticles (
        id int8 primary key,
        date_time timestamptz,
        title text,
        content text,
        refs text
    ) DISTRIBUTED BY (id);
    SELECT * FROM gptext.create_index('wikipedia', 'myarticles', 'id', 'content', true);
    ... populate the index ...
    SELECT * FROM gptext.commit_index('demo.wikipedia.myarticles');
    ALTER TABLE wikipedia.myarticles ADD notes text;
    SELECT * FROM gptext.add_field('demo.wikipedia.myarticles', 'notes', false, false);
    SELECT * FROM gptext.reload_index('demo.wikipedia.myarticles');
Adding a field to a GPText index requires the base table to be available. If you drop the table after creating the index, you cannot add fields to the index.
Dropping a field from an index
You can drop a field from an existing index with the gptext.drop_field() function. After you have dropped fields, call gptext.reload_index() to reload the index.
Example:
    SELECT * FROM gptext.drop_field('demo.wikipedia.myarticles', 'notes');
    SELECT * FROM gptext.reload_index('demo.wikipedia.myarticles');
Listing all indexes
You can list all indexes in the GPText cluster using the gptext-state command-line utility. For example:
    $ gptext-state -D
    20170822:10:11:23:029752 gptext-state:gpsne:gpadmin-[INFO]:-Execute GPText state ...
    20170822:10:11:23:029752 gptext-state:gpsne:gpadmin-[INFO]:-Check zookeeper cluster state ...
    20170822:10:11:23:029752 gptext-state:gpsne:gpadmin-[INFO]:-Check GPText cluster status...
    20170822:10:11:23:029752 gptext-state:gpsne:gpadmin-[INFO]:-Current GPText Version: 2.1.2
    20170822:10:11:24:029752 gptext-state:gpsne:gpadmin-[INFO]:-All nodes are up and running.
    20170822:10:11:24:029752 gptext-state:gpsne:gpadmin-[INFO]:------------------------------------------------
    20170822:10:11:24:029752 gptext-state:gpsne:gpadmin-[INFO]:-Index state details.
    20170822:10:11:24:029752 gptext-state:gpsne:gpadmin-[INFO]:------------------------------------------------
    20170822:10:11:24:029752 gptext-state:gpsne:gpadmin-[INFO]:-   database    index name                state
    20170822:10:11:24:029752 gptext-state:gpsne:gpadmin-[INFO]:-   wikipedia   demo.wikipedia.articles   Green
    20170822:10:11:28:029752 gptext-state:gpsne:gpadmin-[INFO]:-Done.
Storing Field Content in an Index
Solr can store the contents of columns in the index so that results of a search on the index can include the column contents. This makes it unnecessary to join the search query results with the original table. You can even store the contents of database columns that are not indexed and return that content with search results. GPText returns the additional field content in a buffer added to the search results. Individual fields can be retrieved from this buffer using the gptext.gptext_retrieve_field(), gptext.gptext_retrieve_field_int(), and gptext.gptext_retrieve_field_float() functions.
One design pattern is to store content for all of a table’s columns in the GPText index so the database table can then be truncated or dropped. Additional documents can be added to the GPText index later by inserting them into the truncated table, or into a temporary table with the same structure, and then adding them to the index with the gptext.index() function.
To enable storing content in a GPText index, you must edit the managed-schema file for the index. The <field> element for each field has a stored attribute, which defaults to false, except for the unique id field.
Follow these steps to configure the demo.wikipedia.articles index to store content for the title, content, and refs columns.
1. Log in to the master as gpadmin and use gptext-config to edit the managed-schema file.
    $ gptext-config edit -i demo.wikipedia.articles -f managed-schema
2. Find the <field> elements for the columns you want to store in the index. Note that <field> elements with names beginning with an underscore are internal fields and should not be modified. The “title”, “content”, and “refs” fields in this example are indexed, but not stored.
    <field name="__temp_field" type="intl_text" indexed="true" stored="false" multiValued="true"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <field name="id" stored="true" type="long" indexed="true"/>
    <field name="__pk" stored="true" indexed="true" type="long"/>
    <field name="title" stored="false" type="text" indexed="true"/>
    <field name="content" stored="false" type="text" indexed="true"/>
    <field name="refs" stored="false" type="text" indexed="true"/>
3. For each field you want to store in the index, change the stored attribute from "false" to "true".
    <field name="title" stored="true" type="text" indexed="true"/>
    <field name="content" stored="true" type="text" indexed="true"/>
    <field name="refs" stored="true" type="text" indexed="true"/>
4. Save the file and, if any documents were already added to the index, reindex the table. See Retrieving Stored Field Content for information about retrieving the stored content with GPText query results.
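When several fields need the same change, the edit in step 3 can be scripted against a local copy of managed-schema. The following sketch is illustrative and not a GPText utility: the function name `store_fields` is hypothetical, each <field> element is assumed to occupy one line, and names beginning with an underscore are skipped because internal fields should not be modified.

```shell
# Set stored="true" on the named <field> elements of a local managed-schema copy.
# Usage: store_fields <schema file> <field name>...
store_fields() {
    schema=$1
    shift
    for field in "$@"; do
        # Skip internal fields, whose names begin with an underscore.
        case $field in
            _*) echo "skipping internal field $field" >&2; continue ;;
        esac
        tmp=$(mktemp) || return 1
        sed -E "/<field [^>]*name=\"${field}\"/ s/ stored=\"[^\"]*\"/ stored=\"true\"/" \
            "$schema" > "$tmp" || { rm -f "$tmp"; return 1; }
        cat "$tmp" > "$schema"
        rm -f "$tmp"
    done
}

# Example (local copy; the file name is a placeholder):
#   store_fields ./managed-schema title content refs
```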
For more about the contents of the managed-schema file and additional ways to customize GPText indexes, see Customizing GPText Indexes.
Creating a GPText index for a Greenplum Database partitioned table
Creating a GPText index for a partitioned Greenplum Database table using gptext.create_index() is the same as creating an index for a non-partitioned table. You must supply the name of the root partition, however; if you attempt to create a GPText index for a child partition, the gptext.create_index() function issues an error message.
GPText recognizes a partitioned table and adds a __partition field to the index. Then when you add documents to the index, GPText saves the child partition table name in the __partition field. You can use the __partition field to create GPText queries that search and filter by partition.
Unlike Greenplum Database, which manages child partitions as separate database tables, GPText does not create a separate Solr collection for each database partition because the larger number of Solr cores could adversely affect the capacity and performance of the Solr cluster.
The twitter.message table created in the Setting Up the Sample Database section is a partitioned table. See Searching Partitioned Tables for examples of searching partitions.
Adding and dropping partitions from GPText indexes
You can add new partitions to, and drop partitions from, Greenplum Database partitioned tables. If you have created a GPText index on a partitioned table, when you add or drop partitions in the base database table, you must perform a parallel GPText index operation.
When a new partition is added, the partition can be indexed once the data is in place. You can select rows directly from the newly added child partition table to index the data. First, use the gptext.partition_status() function to find the names of child partition tables.
    =# SELECT * FROM gptext.partition_status('demo.twitter.message');
               partition_name           |    inherits_name     | level |                                                                    cons
    ------------------------------------+----------------------+-------+--------------------------------------------------------------------------------------------------------------------------------------------
     demo.twitter.message_1_prt_1       | demo.twitter.message |     1 | ((created_at >= '2011-08-01 00:00:00'::timestamp without time zone) AND (created_at < '2011-09-01 00:00:00'::timestamp without time zone))
     demo.twitter.message_1_prt_2       | demo.twitter.message |     1 | ((created_at >= '2011-09-01 00:00:00'::timestamp without time zone) AND (created_at < '2011-10-01 00:00:00'::timestamp without time zone))
     demo.twitter.message_1_prt_3       | demo.twitter.message |     1 | ((created_at >= '2011-10-01 00:00:00'::timestamp without time zone) AND (created_at < '2011-11-01 00:00:00'::timestamp without time zone))
     demo.twitter.message_1_prt_4       | demo.twitter.message |     1 | ((created_at >= '2011-11-01 00:00:00'::timestamp without time zone) AND (created_at < '2011-12-01 00:00:00'::timestamp without time zone))
     demo.twitter.message_1_prt_dec2011 | demo.twitter.message |     1 | ((created_at >= '2011-12-01 00:00:00'::timestamp without time zone) AND (created_at < '2112-01-01 00:00:00'::timestamp without time zone))
    (5 rows)
In the example above, a new partition with the name twitter.message_1_prt_dec2011 was added to the twitter.message table. The following statements add the data from the new partition to the GPText index and commit the changes.
    =# SELECT * FROM gptext.index(TABLE(SELECT * FROM twitter.message_1_prt_dec2011), 'demo.twitter.message');
     dbid | num_docs
    ------+----------
        3 |      109
        2 |      128
    (2 rows)
    =# SELECT * FROM gptext.commit_index('demo.twitter.message');
     commit_index
    --------------
     t
    (1 row)
The name of the new child partition table (excluding the database and schema names) is saved in the __partition field in the index.
When a partition is deleted from a partitioned table, the data from the partition can be deleted from the GPText index by specifying the partition name in the <query> argument of the gptext.delete() function. Be sure to commit the index after deleting the partition.
    =# SELECT * FROM gptext.delete('demo.twitter.message', '__partition:message_1_prt_dec2011');
     delete
    --------
     t
    (1 row)
    =# SELECT * FROM gptext.commit_index('demo.twitter.message');
     commit_index
    --------------
     t
    (1 row)
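Because the __partition field holds the child table name without the database and schema names, the value to search for can be derived from a qualified name such as those returned by gptext.partition_status(). A minimal sketch, with the hypothetical function name `partition_query`:

```shell
# Print the Solr query that selects one child partition in a GPText index.
# Usage: partition_query <qualified child partition table name>
partition_query() {
    # Strip everything up to the last dot (the database and schema names).
    printf '__partition:%s\n' "${1##*.}"
}

# Example:
#   partition_query demo.twitter.message_1_prt_dec2011
```

The printed string can then be passed as the query argument of gptext.delete(), as in the example above.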
Querying GPText Indexes
To retrieve data, you submit a query that performs a search based on criteria that you specify. Simple queries return straightforward results. You can use the default query parser, or specify a different query parser at query time.
Creating a Simple Search Query
After a Solr index is committed, you can run queries with the gptext.search() function, which has this syntax:
    gptext.search(<src_table>, <index_name>, <search_query>, <filter_queries> [, <options>])
The <search_query> argument is a text value that contains a Solr query. The <filter_queries> argument is an array of queries that restrict the set of documents to search.
The default Solr Standard Query Parser has a rich query syntax that includes wildcard characters, Boolean operators, proximity and range searches, and fuzzy searches. See The Standard Query Parser at the Solr website for examples.
Solr has additional query processors that you can specify in the <search_query> argument to access additional features. The GPText Universal Query Parser, gptextqp, allows queries that mix features from all of the supported query parsers.
See Selecting a Query Parser for a list of the supported query parsers and how to request them in your queries. See Using the Universal Query Parser for examples using the GPText Universal Query Parser.
The following sections show how to use the gptext.search() function, including example queries that demonstrate Solr search features.
An AND search example with top 5 results
This search finds documents in the wikipedia.articles index that contain both search terms “solar” and “battery”. The 'rows=5' argument is a Solr option that specifies the top 5 results are to be returned from each segment. In a Greenplum Database cluster with two segments, this query returns up to 10 rows.
    =# SELECT a.id, a.date_time, a.title, q.score
       FROM wikipedia.articles a,
            gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.wikipedia.articles', 'solar AND battery', null, 'rows=5') q
       WHERE q.id::int8 = a.id
       ORDER BY score DESC;
        id    |       date_time        |        title        |   score
    ----------+------------------------+---------------------+-----------
     13690575 | 2017-08-24 02:34:00-05 | Solar power         | 2.7128658
      2008322 | 2017-08-05 02:09:00-05 | Vehicle-to-grid     | 2.5810153
      4711003 | 2017-08-10 18:56:00-05 | Osmotic power       | 2.2073007
        25784 | 2017-08-26 07:10:00-05 | Renewable energy    | 2.1295567
       213555 | 2017-08-27 12:48:00-05 | Solar updraft tower | 2.0210648
        27743 | 2017-08-20 15:56:00-05 | Solar energy        | 1.6916461
       608623 | 2017-08-27 03:56:00-05 | Ethanol fuel        | 1.4619896
    (7 rows)
See Solr options for more about Solr options.
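When search statements are generated from a script, the index name and Solr query end up inside SQL string literals, so any single quote in them has to be doubled. The sketch below prints a gptext.search() statement modeled on the example above. The function name `gptext_search_sql` is hypothetical, only single quotes are escaped (backslash handling depends on server settings and is not addressed), and the statement selects just the id and score columns used in this topic.

```shell
# Print a gptext.search() statement for an index, a Solr query, and a per-segment row limit.
# Usage: gptext_search_sql <index name> <solr query> <rows>
gptext_search_sql() {
    # Double any single quotes so the values are safe inside SQL string literals.
    index=$(printf '%s' "$1" | sed "s/'/''/g")
    query=$(printf '%s' "$2" | sed "s/'/''/g")
    rows=$3
    # Accept only digits for the rows option.
    case $rows in
        ''|*[!0-9]*) echo "rows must be a non-negative integer" >&2; return 1 ;;
    esac
    printf "SELECT q.id, q.score FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), '%s', '%s', null, 'rows=%s') q ORDER BY q.score DESC;\n" \
        "$index" "$query" "$rows"
}

# Example (requires a running GPText cluster):
#   gptext_search_sql demo.wikipedia.articles 'solar AND battery' 5 | psql demo
```

Because rows applies to each segment, the statement can return up to rows multiplied by the number of segments.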
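An OR search example with top 5 results
By using the OR keyword, this search matches more documents than the AND example. The rows=5 Solr option limits the number of rows returned from each segment, so on a cluster with two segments the query returns at most 10 rows.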
    =# SELECT a.id, a.date_time, a.title, q.score
       FROM wikipedia.articles a,
            gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.wikipedia.articles', 'solar OR battery', null, 'rows=5') q
       WHERE q.id::int8 = a.id
       ORDER BY score DESC;
       id    |       date_time        |        title        |   score
    ---------+------------------------+---------------------+-----------
     2008322 | 2017-08-05 02:09:00-05 | Vehicle-to-grid     | 2.5810153
       25784 | 2017-08-26 07:10:00-05 | Renewable energy    | 2.1295567
     2120798 | 2017-01-28 00:59:00-06 | Lithium economy     | 2.0416002
      213555 | 2017-08-27 12:48:00-05 | Solar updraft tower | 2.0210648
       27743 | 2017-08-20 15:56:00-05 | Solar energy        | 1.6916461
      608623 | 2017-08-27 03:56:00-05 | Ethanol fuel        | 1.4619896
      533423 | 2017-08-28 00:52:00-05 | Solar water heating | 1.0239072
     2988035 | 2017-03-12 06:39:00-05 | Vortex engine       | 0.9519546
      113728 | 2017-08-15 09:59:00-05 | Geothermal energy   | 0.6801035
       55017 | 2017-08-28 19:24:00-05 | Fusion power        | 0.6432224
    (10 rows)
Search non-default fields
A GPText index has a default search column, specified when the index is created with the gptext.create_index() function. If you have included additional columns to index, you can reference them in your queries. This query searches for documents with the word “solar” in the title column.
    =# SELECT a.id, a.date_time, a.title, q.score
       FROM wikipedia.articles a,
            gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.wikipedia.articles', 'title:solar', null, null) q
       WHERE q.id::int8 = a.id
       ORDER BY score DESC;
        id    |       date_time        |        title        |   score
    ----------+------------------------+---------------------+-----------
     13690575 | 2017-08-24 02:34:00-05 | Solar power         | 1.6547729
        27743 | 2017-08-20 15:56:00-05 | Solar energy        | 1.6547729
       533423 | 2017-08-28 00:52:00-05 | Solar water heating | 1.1132113
       213555 | 2017-08-27 12:48:00-05 | Solar updraft tower | 1.1132113
    (4 rows)
This example finds documents where the title column matches “Solar power” or “Solar energy”.
    =# SELECT a.id, a.date_time, a.title, q.score
       FROM wikipedia.articles a,
            gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.wikipedia.articles', 'title:(solar AND (power OR energy))', null, null) q
       WHERE q.id::int8 = a.id;
        id    |       date_time        |    title     |   score
    ----------+------------------------+--------------+-----------
        27743 | 2017-08-20 15:56:00-05 | Solar energy | 3.3095458
     13690575 | 2017-08-24 02:34:00-05 | Solar power  | 2.9718256
    (2 rows)
This example searches for articles that have “photosynthesis” in the content column but that do not have “solar” in the title column.
    =# SELECT a.id, a.date_time, a.title, q.score
       FROM wikipedia.articles a,
            gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.wikipedia.articles', 'photosynthesis and -title:solar', null, null) q
       WHERE q.id::int8 = a.id
       ORDER BY score DESC;
        id    |       date_time        |      title       |   score
    ----------+------------------------+------------------+-----------
        25784 | 2017-08-26 07:10:00-05 | Renewable energy | 2.9720955
     53716476 | 2017-08-28 20:40:00-05 | Seaweed fuel     | 1.4240221
     14205946 | 2017-08-28 08:46:00-05 | Algae fuel       | 1.3022419
       608623 | 2017-08-27 03:56:00-05 | Ethanol fuel     | 0.7614042
    (4 rows)
Filtering search results
A filter query applies filters to the results returned by the query. The <filter_queries> argument of the gptext.search() function is an array, so you can apply multiple filters to the search results.
The following example finds articles that have the word “nuclear” in the content column and then applies two filter queries to remove articles that have “solar” in the title column and articles that do not have “power” in the title column.
    =# SELECT a.id, a.date_time, a.title, q.score
       FROM wikipedia.articles a,
            gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.wikipedia.articles', 'nuclear', '{-title:solar, title:power}', null) q
       WHERE q.id::int8 = a.id
       ORDER BY score DESC;
        id    |       date_time        |      title       |   score
    ----------+------------------------+------------------+------------
     14090587 | 2017-08-14 14:00:00-05 | Low-carbon power |  1.1897897
        55017 | 2017-08-28 19:24:00-05 | Fusion power     |  1.1753609
     13021878 | 2017-08-09 05:03:00-05 | Geothermal power | 0.99499804
    (3 rows)
The following example searches the demo.twitter.message index for messages that contain the text “iphone” and either “hate” or “love”, and filters for authors who specified English as the language in their Twitter profile.
=#SELECTt.id,q.score,t.author_screen_name,t.message_textFROMtwitter.messaget,gptext.search(TABLE(SELECT*FROMtwitter.message),'demo.twitter.message','(iphoneAND(hateORlove))','{author_lang:en}','rows=5')qWHEREt.id=q.id::int4ORDERBYscoreDESC;id|score|author_screen_name|message_text----------+-----------+--------------------+------------------------------------------------------------------------------------------------------------19424811|3.446217|kennediiscool|Ihate
:iPhones:20663075|2.9209785|Hi_imMac|RT@indigoFKNvanity:IhatetheautocorrectoniPhones!!!!!!!!!20042822|2.9209785|renadrian|@KDMC23ohhhh!!!IhateIphoneTalk!20759274|2.5128412|SteLala|Droppedfrutopiaon
:Myphone...#ciaowaterdamageIhateiPhones.19416451|2.1448703|ShayFknShay|I'minlovewithmynewiPhone(:20350436|2.102924|mahhnamestj|Iabsolutelylovehowfastthisphoneworks.LovetheiPhone.19284329|1.9478481|popolvuhplaya|#nowplayingonmyiPhone:DaftPunk-"DigitalLove"19714120|1.9478481|BipolarBearApp|@ayee_Eddy2011Ilovepancakestoo!#iPhone#app20257190|1.6903389|alasco|Lovemy#iphone-onlyproblemnow?Iwantan#Ipad!20473459|1.379696|ArniBella|ilovemyiphone4butI'mexcitedtoseewhattheiphone5hastooffer#gadgets#iphone#apple#technology(10rows)
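Because <filter_queries> is passed as a PostgreSQL array literal, a client program has to assemble a string such as '{-title:solar,title:power}'. The following is a minimal Python sketch of one way to do that. The helper name pg_text_array is hypothetical and not part of GPText; the quoting rules follow the general PostgreSQL array text format.

```python
def pg_text_array(items):
    """Render a list of strings as a PostgreSQL text-array literal.

    Hypothetical helper for illustration; not part of GPText.
    """
    def quote(item):
        # Elements need double quotes when they are empty, spell NULL,
        # or contain whitespace or characters special to the array syntax.
        needs_quotes = (
            item == ""
            or item.upper() == "NULL"
            or any(ch.isspace() or ch in ',{}"\\' for ch in item)
        )
        if not needs_quotes:
            return item
        escaped = item.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'

    return "{" + ",".join(quote(item) for item in items) + "}"


# The two filters from the "nuclear" example above.
filters = pg_text_array(["-title:solar", "title:power"])
print(filters)
```

The resulting string is best passed to the database as a bound query parameter rather than spliced into the SQL text, so that SQL-level quoting is handled by the driver.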
Creating Faceted Search Queries
Faceting breaks query results into multiple categories with a count of the number of documents in the index for each category. There are three GPText faceted search functions:
gptext.faceted_field_search() – the categories are the values of one or more fields in the GPText index.
gptext.faceted_query_search() – the categories are a list of search queries.
gptext.faceted_range_search() – the categories are a list of ranges calculated from a minimum value, a maximum value, and the size of the range (gap).
The examples in this section use the store.products table. See Setting Up the Demo Database for commands to create and load data into this table.
After the table is created and the data loaded, create the GPText index, index the data, and then commit the index as shown in this example.
=# SELECT * FROM gptext.create_index('store', 'products', '{id, title, category, brand, price}',
       '{int, text_intl, string, string, float}', 'id', 'title');
=# SELECT * FROM gptext.index(TABLE(SELECT * FROM store.products), 'demo.store.products');
 dbid | num_docs
------+----------
    2 |       25
    3 |       25
(2 rows)
=# SELECT * FROM gptext.commit_index('demo.store.products');
 commit_index
--------------
 t
(1 row)
Faceting on Fields
With the gptext.faceted_field_search() function, the categories are values of one or more fields in the index. Here is the syntax for the gptext.faceted_field_search() function:
gptext.faceted_field_search(<index_name>, <query>, <filter_queries>, <facet_fields>, <facet_limit>, <minimum> [, <options>])
<index_name> is the name of the GPText index with fields to facet.
<query> is a search query that selects the set of documents to be faceted. To facet all documents in the index, specify '*:*'.
<filter_queries> is an array of queries that filter documents from the set returned by the <query>, or null if none. Only documents that match all queries in the list are included in the counts.
<facet_fields> is an array of index fields to facet.
<facet_limit> is the maximum number of results to report for any one category. Use -1 to report all results.
<minimum> is the minimum number of results a category must have in order to be included in the results.
This example facets all documents in the demo.store.products index on the category field.
=# SELECT * FROM gptext.faceted_field_search('demo.store.products', '*:*', null, '{category}', -1, 1);
 field_name | field_value  | value_count
------------+--------------+-------------
 category   | Pot          |          11
 category   | Desktops     |          10
 category   | Tablets      |           8
 category   | Monitors     |           7
 category   | Tent         |           6
 category   | Luggage      |           5
 category   | Sleeping Bag |           3
(7 rows)
This example facets all documents on two fields, category and brand. Only facets with a count of 2 or more are included in the results.
=# SELECT * FROM gptext.faceted_field_search('demo.store.products', '*:*', null, '{category, brand}', -1, 2);
 field_name |  field_value   | value_count
------------+----------------+-------------
 brand      | ASUS           |           7
 brand      | Dell           |           5
 brand      | HP             |           4
 brand      | Samsung        |           4
 brand      | Apple          |           2
 brand      | Utopia Kitchen |           2
 brand      | Big Agnes      |           2
 brand      | Yaheetech      |           2
 brand      | Kelty          |           2
 brand      | Huawei         |           2
 category   | Pot            |          11
 category   | Desktops       |          10
 category   | Tablets        |           8
 category   | Monitors       |           7
 category   | Tent           |           6
 category   | Luggage        |           5
 category   | Sleeping Bag   |           3
(17 rows)
The next example uses a filter query to facet the brand field for just the 10 documents with category “Desktops”.
=# SELECT * FROM gptext.faceted_field_search('demo.store.products', '*:*', '{category:Desktops}', '{brand}', -1, 1);
 field_name | field_value | value_count
------------+-------------+-------------
 brand      | Dell        |           5
 brand      | ASUS        |           3
 brand      | HP          |           2
(3 rows)
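As a rough mental model of field faceting, the counting can be imitated in a few lines of Python. This is only a sketch of the idea, not how Solr computes facets. It reads <facet_limit> as a cap on the number of values reported per field, in the manner of Solr's facet.limit parameter, which is an assumption; facet_field and the sample documents are hypothetical.

```python
from collections import Counter


def facet_field(docs, field, facet_limit=-1, minimum=0):
    """Count documents per distinct value of one field.

    docs        -- iterable of dicts standing in for indexed documents
    facet_limit -- maximum number of values to report, -1 for all (assumed meaning)
    minimum     -- smallest count a value needs in order to be reported
    """
    counts = Counter(doc[field] for doc in docs if field in doc)
    # most_common() orders by descending count, like the example output above.
    rows = [(value, n) for value, n in counts.most_common() if n >= minimum]
    return rows if facet_limit < 0 else rows[:facet_limit]


docs = (
    [{"category": "Pot"}] * 3
    + [{"category": "Tent"}] * 2
    + [{"category": "Luggage"}]
)
print(facet_field(docs, "category", facet_limit=-1, minimum=2))
```

Raising minimum drops sparse categories; applying a filter query first corresponds to shrinking docs before counting.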
Faceting on search queries
With the gptext.faceted_query_search() function, the categories are GPText search queries. The counts report the number of documents that match each search query. Here is the syntax for the gptext.faceted_query_search() function:
gptext.faceted_query_search(<index_name>, <query>, <filter_queries>, <facet_queries>);
<index_name> is the name of the GPText index with fields to facet.
<query> is a search query that selects the set of documents to be faceted. To facet all documents in the index, specify '*:*'.
<filter_queries> is an array of queries that filter documents from the set returned by the <query>, or null if none. Only documents that match all queries in the list are included in the counts.
<facet_queries> is an array of search queries. Each query in the array is a category in the results.
This example reports the number of documents that contain “windows”, “intel”, and both “windows” and “intel” in the default search column (title).
=# SELECT * FROM gptext.faceted_query_search('demo.store.products', '*:*', null, '{windows, intel, windows AND intel}');
    query_name     | value_count
-------------------+-------------
 intel             |           7
 windows           |           4
 windows AND intel |           2
(3 rows)
The facet queries in this example are Solr range queries that define four custom ranges over the price field.
=# SELECT * FROM gptext.faceted_query_search('demo.store.products', '*:*', null,
       '{price:[* TO 200], price:[201 TO 250], price:[251 TO 300], price:[301 TO *]}');
     query_name     | value_count
--------------------+-------------
 price:[201 TO 250] |           2
 price:[251 TO 300] |           2
 price:[301 TO *]   |          11
 price:[* TO 200]   |          35
(4 rows)
Faceting on Ranges
The gptext.faceted_range_search() function facets a single field in the GPText index into ranges specified with start, end, and gap values. The faceted field must be a numeric type.
gptext.faceted_range_search(<index_name>, <query>, <filter_queries>, <field_name>, <range_start>, <range_end>, <range_gap> [, <options>])
<index_name> is the name of the GPText index with fields to facet.
<query> is a search query that selects the set of documents to be faceted. To facet all documents in the index, specify '*:*'.
<filter_queries> is an array of queries that filter documents from the set returned by the <query>, or null if none. Only documents that match all queries in the list are included in the results.
<field_name> is the name of the field to facet. The field must have numeric content. The calculated ranges will have the same data type as the field.
<range_start> is the smallest value of the first range category.
<range_end> is the highest value of the top range.
<range_gap> is the size of each range category.
<options> is an optional string containing Solr query options.
This range search example facets the price field into ranges between 0 and 1200 with a gap of 100. The range_value column in the results is a text value, so the ORDER BY clause casts the value to a float type.
=# SELECT * FROM gptext.faceted_range_search('demo.store.products', '*:*', null, 'price', '0', '1200', '100')
   ORDER BY range_value::float;
 field_name | range_value | value_count
------------+-------------+-------------
 price      | 0.0         |          23
 price      | 100.0       |          12
 price      | 200.0       |           4
 price      | 300.0       |           6
 price      | 400.0       |           0
 price      | 500.0       |           1
 price      | 600.0       |           1
 price      | 700.0       |           1
 price      | 800.0       |           0
 price      | 900.0       |           1
 price      | 1000.0      |           0
 price      | 1100.0      |           1
(12 rows)
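The way start, end, and gap values turn into range categories can be sketched in Python as follows. This is an illustration only. It assumes each range includes its lower bound and excludes its upper bound, which is Solr's default for range facets; facet_range_counts and the sample prices are hypothetical.

```python
import math


def facet_range_counts(values, start, end, gap):
    """Bucket numeric values into [start, end) in steps of gap.

    Returns (range_lower_bound, count) pairs, one per bucket, including
    empty buckets, labelled by lower bound as in the range_value column.
    """
    n_buckets = math.ceil((end - start) / gap)
    counts = [0] * n_buckets
    for value in values:
        if start <= value < end:  # lower-inclusive, upper-exclusive (assumed)
            counts[int((value - start) // gap)] += 1
    return [(float(start + i * gap), counts[i]) for i in range(n_buckets)]


prices = [5, 99.99, 100, 250, 1199, 1200, -1]
for lower, count in facet_range_counts(prices, 0, 1200, 100):
    print(lower, count)
```

With start 0, end 1200, and gap 100 the sketch yields twelve buckets labelled 0.0 through 1100.0, the same labels that appear in the example output.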
Highlighting Search Terms in Query Results
Highlighting inserts markup tags before and after each occurrence of the search terms in a query. For example, if the search term is “iphone”, each occurrence of “iphone” in the field is marked up:
<em>iphone</em>
You can change the default markup strings from <em> and </em> by setting the gptext.hl_pre_tag and gptext.hl_post_tag server configuration options.
There are two ways to highlight search terms, depending on whether the field to be marked up is stored in the GPText index.
If the field is indexed, but not stored, you must join the search results with the database table and use the gptext.highlight() function to apply markup tags to the column data.
If the field is indexed and stored, Solr can apply the markup tags and return the marked-up field in the results of the search query. This is the same way highlighting works for GPText external indexes. (See Highlighting External Index Search Results.) Using this method with regular GPText indexes requires modifying the solrconfig.xml configuration file for the index.
Highlighting Terms with gptext.highlight()
To use gptext.highlight(), the index must have been created with terms enabled for the columns that are to be highlighted. Use gptext.enable_terms() to enable term vectors and then reindex the data if it was already indexed. See gptext.enable_terms() in the GPText Function Reference.
This example enables terms for the message_text field in the demo.twitter.message index, reindexes the data, and commits the changes to the index:
=# SELECT * FROM gptext.enable_terms('demo.twitter.message', 'message_text');
=# SELECT * FROM gptext.index(TABLE(SELECT * FROM twitter.message), 'demo.twitter.message');
=# SELECT * FROM gptext.commit_index('demo.twitter.message');
The gptext.highlight() syntax is:
gptext.highlight(<column_data>, <column_name>, <offsets>)
The <column_data> argument contains the text data that will be marked up with highlighting tags.
The <column_name> argument is the name of the corresponding table column.
The <offsets> argument is a GPText hstore type that contains key-value pairs that specify the locations of the search term in the text data. This value is constructed by the gptext.search() function when highlighting is enabled. The key contains the column name and the value is a comma-separated list of offsets where the data appears.
To enable highlighting in a gptext.search() query, add the hl and hl.fl options in the <options> argument:
hl=true&hl.fl=<field1>,<field2>
Setting the hl=true option enables highlighting for the search. The hl.fl option specifies a list of the field names to highlight.
This example returns up to five rows from each segment with the text “iphone” highlighted in the message_text field.
=# SELECT t.id, gptext.highlight(t.message_text, 'message_text', s.hs)
   FROM twitter.message t,
        gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.twitter.message',
                      '{!gptextqp}iphone', null, 'rows=5&hl=true&hl.fl=message_text') s
   WHERE t.id = s.id::int8;
    id    | highlight
----------+---------------------------------------------------------------------
 20473459 | i love my iphone4 but I'm excited to see what the iphone5 has to offer #gadgets #<em>iphone</em> #apple #technology
 19424811 | I hate
          : <em>iPhones</em>:
 20663075 | RT @indigoFKNvanity: I hate the autocorrect on <em>iPhones</em>!!!!!!!!!
 20350436 | I absolutely love how fast this phone works. Love the <em>iPhone</em>.
 20042822 | @KDMC23 ohhhh!!! I hate <em>Iphone</em> Talk!
 19714120 | @ayee_Eddy2011 I love pancakes too! #<em>iPhone</em> #app
 19284329 | #nowplaying on my <em>iPhone</em>: Daft Punk - "Digital Love"
 19416451 | I'm in love with my new <em>iPhone</em> (:
 20257190 | Love my #<em>iphone</em> - only problem now? I want an #Ipad!
 20759274 | Dropped frutopia on
          : My phone... #ciaowaterdamage I hate <em>iPhones</em>.
(10 rows)
Warning: Highlighting adds overhead to the query, including index space, indexing time, and search time.
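To make the role of term offsets concrete, here is a minimal Python sketch of applying markup at known positions in a string. It is not the gptext.highlight() implementation. The (start, end) character pairs are an assumed layout chosen for illustration; the guide does not spell out the exact encoding of the hstore value, and apply_highlight is a hypothetical name.

```python
def apply_highlight(text, spans, pre="<em>", post="</em>"):
    """Wrap each (start, end) character span of text in pre/post tags.

    Spans are assumed not to overlap; end is exclusive.
    """
    pieces, cursor = [], 0
    for start, end in sorted(spans):
        pieces.append(text[cursor:start])               # untouched text before the match
        pieces.append(pre + text[start:end] + post)     # the marked-up match
        cursor = end
    pieces.append(text[cursor:])                        # trailing text after the last match
    return "".join(pieces)


print(apply_highlight("I hate iPhones.", [(7, 14)]))
```

The pre and post arguments play the role of the gptext.hl_pre_tag and gptext.hl_post_tag settings mentioned earlier.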
Highlighting Terms in Stored Fields
If the field to be highlighted is stored in the index, Solr can return the field in the search results with markup tags applied. The gptext.highlight() function is not used and it is not necessary to enable terms for the field. This is the default behavior for GPText external indexes, but for regular GPText indexes you must enable it by editing the solrconfig.xml configuration file for the index.
1. Use the gptext-config utility to open the solrconfig.xml configuration file for the index in the editor.
$ gptext-config edit -i demo.twitter.message -f solrconfig.xml
2. Search for <!-- Search Components --> and add the following element.
<searchComponent class="solr.HighlightComponent" name="highlight"/>
3. Search for <requestHandler name="/select" class="solr.SearchHandler">. In the <arr name="components"> child element, change <str>termoffsets</str> to <str>highlight</str>. The complete <requestHandler> entry should be:
<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">message_text</str>
  </lst>
  <arr name="components">
    <str>query</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
    <str>stats</str>
    <str>debug</str>
  </arr>
</requestHandler>
4. Save your changes.
5. Update the field definitions in the managed-schema configuration file to store the fields that will be highlighted. See Storing Additional Fields in an Index for instructions. Be sure to reindex the data after changing storage options.
The following query searches the message_text field for messages containing the text “iphone” and highlights “iphone” in the text returned in the hs column.
=# SELECT * FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.twitter.message',
       '{!gptextqp}iphone', null, 'rows=5&hl=true&hl.fl=message_text');
    id    |   score   | hs | rf
----------+-----------+----+----
 19284329 | 0.8176138 | {"columnValue":[{"name":"message_text","value":"#nowplaying on my \u003cem\u003eiPhone\u003c/em\u003e: Daft Punk - \"Digital Love\""}]} |
 19416451 | 0.9003142 | {"columnValue":[{"name":"message_text","value":"I'm in love with my new \u003cem\u003eiPhone\u003c/em\u003e (:"}]} |
 19424811 | 1.0051261 | {"columnValue":[{"name":"message_text","value":"I hate\n\u003cem\u003eiPhones\u003c/em\u003e:"}]} |
 20042822 | 0.8519347 | {"columnValue":[{"name":"message_text","value":"I hate \u003cem\u003eIphone\u003c/em\u003e Talk!"}]} |
(4 rows)
You can use the gptext.gptext_retrieve_field() function to extract the highlighted text from the columnValue array in the hs column. Compare the previous results to the results from this query.
=# SELECT id, score, gptext.gptext_retrieve_field(hs, 'message_text') message_text
   FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.twitter.message',
                      '{!gptextqp}iphone', null, 'rows=5&hl=true&hl.fl=message_text');
    id    |   score    | message_text
----------+------------+---------------------------------------------------------------------
 19424811 |  1.0051261 | I hate
                       : <em>iPhones</em>:
 20042822 |  0.8519347 | I hate <em>Iphone</em> Talk!
 20350436 |  0.7387052 | Love the <em>iPhone</em>.
 20473459 | 0.59349346 | i love my iphone4 but I'm excited to see what the iphone5 has to offer #gadgets #<em>iphone</em> #apple #technology
 20663075 |  0.8519347 | RT @indigoFKNvanity: I hate the autocorrect on <em>iPhones</em>!!!!!!!!!
 19284329 |  0.8176138 | #nowplaying on my <em>iPhone</em>: Daft Punk - "Digital Love"
 19416451 |  0.9003142 | I'm in love with my new <em>iPhone</em> (:
 19714120 |  0.8176138 | #<em>iPhone</em> #app
 20257190 |  0.7095236 | Love my #<em>iphone</em> - only problem now? I want an #Ipad!
 20759274 |  0.7095236 | #ciaowaterdamage I hate <em>iPhones</em>.
(10 rows)
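If a client application receives the raw hs column instead of calling gptext.gptext_retrieve_field() in SQL, the JSON value can be unpacked on the client side. The sketch below assumes the hs value has the {"columnValue":[{"name":...,"value":...}]} shape shown in the output above; highlighted_field is a hypothetical helper name.

```python
import json


def highlighted_field(hs, field):
    """Return the marked-up text for field from an hs JSON string, or None."""
    if not hs:
        return None
    for entry in json.loads(hs).get("columnValue", []):
        if entry.get("name") == field:
            # json.loads has already turned \u003c and \u003e into < and >.
            return entry.get("value")
    return None


# Raw string so the \u003c escapes reach the JSON parser unchanged.
hs = r'{"columnValue":[{"name":"message_text","value":"I hate \u003cem\u003eIphone\u003c/em\u003e Talk!"}]}'
print(highlighted_field(hs, "message_text"))
```

Returning None for a missing field or an empty hs value is a design choice of this sketch, made so that callers can treat both cases like a SQL NULL.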
Searching Partitioned Tables
A GPText index for a partitioned Greenplum Database table has a __partition field that contains the name of the child partition. When you query the index, you can use the __partition field to restrict the partitions to search.
Search all partitions in an index by calling gptext.search() with the root partition name:
=# SELECT * FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.twitter.message',
       '{!gptextqp}blackberry', null, null);
    id     |   score   | hs | rf
-----------+-----------+----+----
  71559892 |  5.670539 |    |
 127444971 | 5.1496587 |    |
 127024083 | 5.1496587 |    |
  65596365 | 4.4688635 |    |
  79177658 | 4.4688635 |    |
  78934938 | 4.4688635 |    |
 111566417 | 4.4688635 |    |
  65058966 | 3.5941496 |    |
  92240815 |  5.212467 |    |
  38424415 |  4.730712 |    |
  96811329 |  4.730712 |    |
 146782767 |  4.730712 |    |
  41409575 | 4.1019597 |    |
 104198393 | 4.1019597 |    |
  86943734 | 3.2956126 |    |
  89120464 | 3.2956126 |    |
 153181836 | 3.2956126 |    |
 139227011 | 3.2956126 |    |
  20664699 | 2.8236253 |    |
(19 rows)
You can search a single partition by calling gptext.search() with the child partition name. Use the gptext.partition_status(<index_name>) function to see the partition names. For example:
=# SELECT partition_name, level FROM gptext.partition_status('demo.twitter.message');
        partition_name        | level
------------------------------+-------
 demo.twitter.message_1_prt_1 |     1
 demo.twitter.message_1_prt_2 |     1
 demo.twitter.message_1_prt_3 |     1
 demo.twitter.message_1_prt_4 |     1
(4 rows)
This example searches only the demo.twitter.message_1_prt_3 partition:
=# SELECT * FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.twitter.message_1_prt_3',
       '{!gptextqp}blackberry', null);
    id     |   score
-----------+-----------
  71559892 |  5.670539
  79177658 | 4.4688635
  78934938 | 4.4688635
 111566417 | 4.4688635
  92240815 |  5.212467
  96811329 |  4.730712
 104198393 | 4.1019597
  86943734 | 3.2956126
  89120464 | 3.2956126
(9 rows)
You can also specify a partition name or a range of partitions in the filter queries argument of the gptext.search() function. This example searches the partitions between message_1_prt_2 and message_1_prt_4.
=# SELECT * FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.twitter.message',
       'android', '{__partition:[message_1_prt_2 TO message_1_prt_4]}');
    id     |   score
-----------+-----------
  42474603 |  5.770868
  95666225 |  5.670539
  68701747 | 4.4688635
  56900818 | 4.4688635
 111566417 | 4.4688635
 120764432 | 4.4688635
 115326522 | 4.4688635
  67269000 | 3.5941496
  99959486 |  6.413594
 104293903 | 3.1360807
(10 rows)
Retrieving Stored Field Content
A GPText index does not, by default, store the contents of database columns in the index, with the exception of the unique id column. When you search the index, you must join the search results with the original database table on the id column in order to access other table columns.
You can configure a GPText index to store the content of fields when documents are indexed. The additional stored fields can be returned with the search results so that it is unnecessary to join with the original database table. For some applications, you can even delete data from the database table or drop the table after the data has been added to the index.
Retrieve the additional field values in a GPText search by specifying a list of fields in the gptext.search() options argument. In this example, the demo.wikipedia.articles index has been configured to store the content, title, and refs fields, in addition to the id field. See Storing Field Content in an Index for instructions to edit the managed-schema file to store these additional fields. In the options argument, the Solr fl parameter requests that the contents of the id and title fields be included in the results.
=# SELECT * FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.wikipedia.articles',
       '+grid +economy', null, 'fl=id,title&rows=2');
   id    |   score   | hs | rf
---------+-----------+----+------------------------------------------------------------------------------------------------------------
  533423 | 2.4593863 |    | column_value { name: "id" value: "533423" } column_value { name: "title" value: "Solar water heating" }
 7906908 | 2.0646634 |    | column_value { name: "id" value: "7906908" } column_value { name: "title" value: "Biomass" }
   27743 |  1.823319 |    | column_value { name: "id" value: "27743" } column_value { name: "title" value: "Solar energy" }
  113728 | 1.2235354 |    | column_value { name: "id" value: "113728" } column_value { name: "title" value: "Geothermal energy" }
(4 rows)
To retrieve all fields stored in the index, use the * wildcard for the field list: 'fl=*'.
In the results, the requested fields are packed into a field named rf that is added to the results. The rf field is a text value containing a structure with the following format:
column_value { name: "<field1_name>" value: "<field1_value>" } [column_value { name: "<field2_name>" value: "<field2_value>" }] ...
The GPText function gptext.gptext_retrieve_field(rf, <column_name>) retrieves a single field value by name from this structure as a text value. GPText provides variations to retrieve the field values as int or float values. If the specified field name does not exist in the rf structure, the function returns NULL.
This example shows how you can use the gptext.gptext_retrieve*() functions to unpack search results into separate result columns.
=# SELECT score,
          gptext.gptext_retrieve_field_int(rf, 'id') id,
          gptext.gptext_retrieve_field(rf, 'title') title,
          substring(gptext.gptext_retrieve_field(rf, 'content'), 1, 15) content
   FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.wikipedia.articles',
                      '+grid +economy', null, 'fl=*');
   score   |    id    |        title        |     content
-----------+----------+---------------------+-----------------
 2.4593863 |   533423 | Solar water heating | '''Solar water
 2.0646634 |  7906908 | Biomass             | '''Biomass''' i
 2.0444229 | 13690575 | Solar power         | '''Solar power'
  1.823319 |    27743 | Solar energy        | '''Solar energy
 1.2235354 |   113728 | Geothermal energy   | '''Geothermal e
 1.0890164 | 14205946 | Algae fuel          | '''Algae fuel''
(6 rows)
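For clients that fetch the raw rf text, the column_value structure can also be unpacked outside the database. The following Python sketch mirrors the behavior described for gptext.gptext_retrieve_field(), returning None where the SQL function returns NULL. The exact whitespace inside the rf value is an assumption, so the pattern tolerates both spaced and compact forms; retrieve_field is a hypothetical name, and escape sequences inside values are returned as-is rather than decoded.

```python
import re

# One column_value { name: "..." value: "..." } entry; whitespace is optional.
_COLUMN_VALUE = re.compile(
    r'column_value\s*\{\s*name:\s*"((?:[^"\\]|\\.)*)"\s*'
    r'value:\s*"((?:[^"\\]|\\.)*)"\s*\}'
)


def retrieve_field(rf, name):
    """Return the value stored under name in an rf string, or None."""
    for field_name, value in _COLUMN_VALUE.findall(rf or ""):
        if field_name == name:
            return value
    return None


rf = ('column_value { name: "id" value: "533423" } '
      'column_value { name: "title" value: "Solar water heating" }')
print(int(retrieve_field(rf, "id")), retrieve_field(rf, "title"))
```

Converting with int() or float() on the client corresponds to the int and float variations of the retrieve function.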
Selecting a Query Parser
When you submit a query, Solr processes the query using a query parser. There are several Solr query parsers with different capabilities. For example, the ComplexPhraseQueryParser can parse wildcards, and the SurroundQueryParser supports span queries, which find words in the vicinity of a search term in a document.
GPText supports these query parsers:
QParserPlugin, the default GPText query parser. QParserPlugin is a superset of the LuceneQParserPlugin, Solr’s native Lucene query parser. QParserPlugin is a general purpose query parser with broad capabilities. QParserPlugin does not support span queries and handles operator precedence in an unintuitive manner. The support for field selection is also rather weak. See http://wiki.apache.org/solr/SolrQuerySyntax .
ComplexPhraseQueryParser supports wildcards, ORs, ranges, and fuzzies inside phrase queries. See https://issues.apache.org/jira/browse/SOLR-1604 .
DisMax (or eDisMax) handles operator precedence in an intuitive manner and is well-suited for user queries since it is similar to popular search engines on the web. See Using the DisMax and Extended DisMax Query Parsers.
SurroundQueryParser supports the family of span queries. See Proximity Search Queries and Surround Query Parser in the Apache Solr Reference Guide.
gptextqp, the GPText Universal Query Parser, can use all of the above query parsers in combination. See Using the Universal Query Parser for more information.
You can specify the query parser to use at query time by setting the Solr defType option in the options argument of the search function or by setting the type as a Solr LocalParam embedded in the query.
This query specifies the dismax query parser in the options argument of the gptext.search() function:
=# SELECT a.title, q.score
   FROM wikipedia.articles a,
        gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.wikipedia.articles',
                      '+hydroelectric -solar', null, 'defType=dismax') q
   WHERE a.id = q.id::int8;
         title          |   score
------------------------+-----------
 Forward osmosis        | 0.9552469
 Liquid nitrogen engine | 1.0126935
(2 rows)
The default query parser is specified in the requestHandler definitions in solrconfig.xml. You can edit solrconfig.xml with the management utility command gptext-config edit.
The following query uses the ComplexPhraseQueryParser, setting the type parameter in a Solr LocalParam.
=# SELECT a.title, q.score
   FROM wikipedia.articles a,
        gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.wikipedia.articles',
                      '{!type=complexphrase}sequester AND carbon', null, null) q
   WHERE a.id = q.id::int8;
  title  |  score
---------+---------
 Biomass | 3.83572
(1 row)
In the LocalParam, the type= specifier can be omitted because type is the default parameter:
'{!complexphrase}sequester AND carbon'
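When queries are assembled by a program, the LocalParam prefix can be generated rather than typed by hand. The following Python sketch builds the short {!parser ...} form shown above. The helper names local_params and with_parser are hypothetical; values are always double-quoted so that parameters containing spaces stay intact.

```python
def local_params(parser, **params):
    """Build a Solr LocalParams prefix such as {!edismax qf="title refs"}."""
    parts = [parser]  # the bare parser name is shorthand for type=<parser>
    for key, value in params.items():
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{key}="{escaped}"')
    return "{!" + " ".join(parts) + "}"


def with_parser(query, parser, **params):
    """Prefix a query string with the LocalParams for the chosen parser."""
    return local_params(parser, **params) + query


print(with_parser("sequester AND carbon", "complexphrase"))
print(with_parser("iphone", "gptextqp"))
```

The returned string is what goes into the query argument of gptext.search(); choosing the parser through the defType option in the options argument is the alternative described earlier.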
Proximity Search Queries
Proximity search queries find documents that have search terms within a specified distance. The distance is measured as the number of term moves that would be needed to make the terms adjacent.
With the standard query parser, the terms to match are placed in quotes and the distance between them is specified by adding a tilde ~ and an integer after the closing quote. The following search query finds documents with the terms “solar” and “fossil” within five terms of each other.
=# SELECT t.id, s.score, t.title
   FROM wikipedia.articles t,
        gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.wikipedia.articles',
                      '"solar fossil"~5', null, null) s
   WHERE s.id::int8 = t.id;
    id    |   score    |      title
----------+------------+------------------
    25784 |  0.4855828 | Renewable energy
 14090587 | 0.30585092 | Low-carbon power
 13690575 | 0.62667537 | Solar power
(3 rows)
The search terms inside the quotes can appear in either order. However, if the terms occur in the opposite order in the document, the distance between them is one greater than if the terms occur in the specified order.
The Surround query parser allows ordered and unordered proximity searches. The W operator specifies an ordered search and the N operator specifies an unordered search. The maximum distance between the terms is specified by prefixing the W or N operator with an integer, for example 3W.
The proximity query can be written with prefix or infix notation.
Prefix notation: '{!surround}3W(solar,fossil)'
Infix notation: '{!surround}solar 3W fossil'
Here are some proximity query examples using the Surround query parser.
'{!surround} title:2w(solar, heat)'
Searches the title field for the terms “solar” and “heat” within two terms, and in the specified order. This query uses prefix notation. The N and W operators are not case-sensitive.
'{!surround} title:heat 2N solar'
Searches the title field for the terms “heat” and “solar” within two terms, in any order. This query uses infix notation.
'{!surround} title: W(solar, heat)'
Searches the title field for adjacent terms “solar” and “heat”. The default distance is 1, so 1W can be abbreviated to W.
The Surround query parser does not analyze query text like the other query parsers. GPText indexes are by default built with lowercase and stemming filters, for example, so surround queries containing capital letters or unstemmed terms will return no results.
The demo.wikipedia.articles index contains a document with the title “Solar water heating”. The following example search, however, cannot find it.
=# SELECT t.id, s.score, t.title
   FROM wikipedia.articles t,
        gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.wikipedia.articles',
                      '{!surround}title:2w(Solar, heating)', null, null) s
   WHERE s.id::int8 = t.id;
 id | score | title
----+-------+-------
(0 rows)
When you rewrite the query to use only lowercase characters and remove the suffix from “heating”, the document is found.
=# SELECT t.id, s.score, t.title
   FROM wikipedia.articles t,
        gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.wikipedia.articles',
                      '{!surround}title:2w(solar, heat)', null, null) s
   WHERE s.id::int8 = t.id;
   id   |   score   |        title
--------+-----------+---------------------
 533423 | 1.5434089 | Solar water heating
(1 row)
An easy way to avoid this limitation is to use the GPText Universal Query Parser, which does analyze the query text and also supports the Surround query parser’s proximity syntax.
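The difference between the ordered W operator and the unordered N operator can be illustrated with a small Python sketch that works on a list of already-analyzed tokens. This is a simplification for intuition only, not the Surround parser's implementation; the function name within is hypothetical, and a distance of 1 is taken to mean adjacent terms, as in the W example above.

```python
def within(tokens, first, second, distance=1, ordered=True):
    """Report whether second occurs within distance positions of first.

    ordered=True  imitates nW: second must follow first.
    ordered=False imitates nN: either order is accepted.
    """
    first_positions = [i for i, tok in enumerate(tokens) if tok == first]
    second_positions = [i for i, tok in enumerate(tokens) if tok == second]
    for i in first_positions:
        for j in second_positions:
            delta = j - i
            if ordered and 0 < delta <= distance:
                return True
            if not ordered and 0 < abs(delta) <= distance:
                return True
    return False


# "Solar water heating" after lowercasing and stemming (assumed index terms).
title_terms = ["solar", "water", "heat"]
print(within(title_terms, "solar", "heat", distance=2, ordered=True))
print(within(title_terms, "heat", "solar", distance=2, ordered=True))
print(within(title_terms, "heat", "solar", distance=2, ordered=False))
```

Because the comparison is an exact match against index terms, the unanalyzed query terms “Solar” and “heating” find nothing in this sketch either, which is the same effect as the zero-row search above.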
Using the Universal Query Parser
With the GPText Universal Query Parser, you can perform searches using features from any of the other supported query parsers, combined into one search string. Invoke the Universal Query Parser by setting the Solr type parameter in a Solr LocalParam with this format:
'{!gptextqp}<search_query>'
The search query in the following example includes syntax from three query parsers:
sea* – Complex query with wildcard
2W – Proximity query requesting a maximum of two words distance between the terms “sea*” and “oil” or “fuel”
oil OR fuel – Solr Standard Query Processor
=# SELECT a.title, q.score
   FROM wikipedia.articles a,
        gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.wikipedia.articles',
                      '{!gptextqp}sea* 2W (oil OR fuel)', null, null) q
   WHERE a.id = q.id::int8;
    title     |   score
--------------+-----------
 Seaweed fuel | 55.250305
(1 row)
In the following example, title:n(power,geothermal) specifies that the terms “power” and “geothermal” in the title field must be adjacent, but they can occur in either order.
=# SELECT a.title, q.score
   FROM wikipedia.articles a,
        gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.wikipedia.articles',
                      '{!gptextqp}title:n(power,geothermal)', null, null) q
   WHERE a.id = q.id::int8;
      title       |  score
------------------+---------
 Geothermal power | 2.05577
(1 row)
This query uses the fuzzy search operator ~ to find articles with titles containing a term similar to “lethiam” and a complex query that finds articles with “ocean” and “wind” in the content.
=# SELECT t.id, score, title
   FROM wikipedia.articles t,
        gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.wikipedia.articles',
                      '{!gptextqp}title:lethiam~ OR content:(ocean AND wind)', null, null) s
   WHERE t.id = s.id::int8;
    id    |   score    |       title
----------+------------+-------------------
  2120798 |  1.3326647 | Lithium economy
  4711003 |  2.6328268 | Osmotic power
    25784 |  3.3899183 | Renewable energy
    55017 | 0.95579207 | Fusion power
   113728 |  1.3909805 | Geothermal energy
    27743 |   2.114852 | Solar energy
 13690575 |  1.4488393 | Solar power
(7 rows)
Using the DisMax and Extended DisMax Query Parsers
The DisMax query parser supports a subset of the Solr Standard Query Parser syntax. It is useful for queries from end users who are familiar with common search systems, such as Google search. It supports quoted phrases, AND and OR operators, and + and - operators. The Extended DisMax query parser improves upon the DisMax query parser, supporting the full Standard query parser syntax.
The DisMax and Extended DisMax query parser behaviors can be customized at query time by setting parameters in the Solr options argument of the gptext.search() function or as local parameters in the query text. See DisMax Parameters and Extended DisMax Parameters for details. One useful parameter is the qf (query fields) parameter, which specifies a list of fields to search. Using this parameter avoids having to write a query that searches each field individually. For example, instead of writing this query:
'content:nuclear OR title:nuclear OR links:nuclear'
you can write:
'{!edismax qf="content title links"}nuclear'
The following example queries illustrate features of the DisMax and Extended DisMax query parsers.
'{!dismax} +nuclear reactor'
Finds documents containing the term “nuclear” and, optionally, the term “reactor”.
'{!dismax} +"nuclear reactor"'
Finds documents containing the phrase “nuclear reactor”.
'{!dismax} +solar -reactor'
Finds documents containing the term “solar” but not the term “reactor”.
'{!edismax qf="title refs"} solar'
Finds documents with the term “solar” in the title or refs fields.
'{!edismax qf="title"} (solar or renewable) and energy'
Finds the documents with titles “Solar energy” and “Renewable energy”.
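The saving that the qf parameter offers grows with the number of fields, which is easy to see when both query forms are generated side by side. The Python sketch below produces the per-field query and its Extended DisMax counterpart; both function names are hypothetical.

```python
def per_field_query(term, fields):
    """Spell out one field:term clause per field, joined with OR."""
    return " OR ".join(f"{field}:{term}" for field in fields)


def edismax_query(term, fields):
    """Express the same field list once through the qf local parameter."""
    return '{!edismax qf="' + " ".join(fields) + '"}' + term


fields = ["content", "title", "links"]
print(per_field_query("nuclear", fields))
print(edismax_query("nuclear", fields))
```

The two forms should match the same documents for a single term, but the scores are not expected to be identical, because DisMax-style parsers combine per-field scores differently from a plain OR of clauses.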
Customizing GPText Indexes
GPText saves configuration files for an index in the ZooKeeper /gptext/configs/<index_name> znode, for example /gptext/configs/demo.twitter.message. The configuration files are copied from the $GPTXTHOME/share/gp_index_template/conf directory and modified with information passed in the gptext.create_index() function arguments and the Greenplum Database table definition.
After an index has been created, you can modify the index’s configuration files using the gptext-config command-line utility. You can also edit the template files in the $GPTXTHOME/share/gp_index_template/conf directory so that any new index you create has your customizations.
If you choose to customize the template files in the $GPTXTHOME/share/gp_index_template/conf directory, you should first back up the files so that you can restore the default versions if necessary.
Editing GPText Index Configuration Files
You can edit the index configuration files saved in ZooKeeper using the gptext-config command-line utility with the edit option. You provide the name of the index and the name of the configuration file you want to modify. To edit the managed-schema file for the demo.twitter.message index, for example:
$ gptext-config edit -i demo.twitter.message -f managed-schema
The utility loads the file into an editor, vi by default. You can specify a different editor with the -e option. This command uses the nano editor to edit the stopwords.txt file.
$ gptext-config edit -i demo.twitter.message -f stopwords.txt -e nano
You can use the gptext-config upload command to upload a local configuration file to ZooKeeper. This example uploads a local configuration file named protwords.custom to ZooKeeper, overwriting the existing protwords.txt file.
$ gptext-config upload -i demo.twitter.message -l protwords.custom -f protwords.txt
20171011:11:24:59:030178 gptext-config:gpdb:gpadmin-[INFO]:-Execute GPText config.
20171011:11:25:00:030178 gptext-config:gpdb:gpadmin-[INFO]:-Check zookeeper cluster state ...
20171011:11:25:00:030178 gptext-config:gpdb:gpadmin-[INFO]:-Upload file protwords.custom to zookeeper ...
20171011:11:25:01:030178 gptext-config:gpdb:gpadmin-[INFO]:-Reloading configuration ...
20171011:11:25:02:030178 gptext-config:gpdb:gpadmin-[INFO]:-Modifications to protwords.txt require that all data be reindexed.
20171011:11:25:02:030178 gptext-config:gpdb:gpadmin-[INFO]:-Done.
Use the gptext-config append command to append a local text file to an existing configuration file. For example, you could create an additional list of stop words in a local file stopwords.add and append them to the stopwords.txt file.
$ gptext-config append -i demo.twitter.message -l stopwords.add -f stopwords.txt
20171010:09:52:59:019764 gptext-config:gpdb:gpadmin-[INFO]:-Execute GPText config.
20171010:09:53:00:019764 gptext-config:gpdb:gpadmin-[INFO]:-Check zookeeper cluster state ...
20171010:09:53:00:019764 gptext-config:gpdb:gpadmin-[INFO]:-Creating temporary copy of stopwords.txt ...
20171010:09:53:01:019764 gptext-config:gpdb:gpadmin-[INFO]:-Appending contents of stopwords.add to stopwords.txt
20171010:09:53:01:019764 gptext-config:gpdb:gpadmin-[INFO]:-Backing up stopwords.txt for index demo.twitter.message ...
20171010:09:53:03:019764 gptext-config:gpdb:gpadmin-[INFO]:-Reloading configuration ...
20171010:09:53:22:019764 gptext-config:gpdb:gpadmin-[INFO]:-Modifications to stopwords.txt require that all data be reindexed.
20171010:09:53:22:019764 gptext-config:gpdb:gpadmin-[INFO]:-Done.
See the gptext-config command reference for gptext-config command-line options and for descriptions of the files you can edit with gptext-config.
The managed-schema File
The main configuration file for an index is the managed-schema file. The managed-schema file is an XML file containing definitions for the fields, field types, and analyzer chains that define the contents and behavior of a GPText index.
A field (<field> XML element) maps a Greenplum Database table column to a field in the GPText index.
A field type (<fieldType> XML element) assigns Solr Java classes and analyzer chains that handle a data type to a field.
An analyzer chain (<analyzer> XML element) is a container element that specifies the Java classes that tokenize and filter the content of a field that is to be indexed. An <analyzer> element is a child of a <fieldType> element.
Note: When editing XML files such as managed-schema, be sure that you save a valid XML document. Invalid XML syntax will cause Solr errors and prevent access to your index.
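The advice to save a valid XML document can be checked mechanically before an edited file is uploaded. The following is a minimal Python sketch, not part of GPText; the function name check_well_formed and the two sample schema strings are made up for illustration. It tests only XML well-formedness, not whether Solr accepts the schema.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


# Hypothetical, heavily shortened schema fragments.
good = ('<schema name="demo">'
        '<field name="description" type="text_intl" indexed="true" stored="false"/>'
        '</schema>')
bad = ('<schema name="demo">'
       '<field name="description" type="text_intl">'   # <field> is never closed
       '</schema>')

print(check_well_formed(good))    # well-formed
print(check_well_formed(bad)[0])  # not well-formed
```

A check like this catches unclosed tags and mismatched quotes; errors in attribute values or class names still surface only when Solr reloads the configuration.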
In addition to the managed-schema file, the Solr configuration files for an index include text files that contain lists of words to treat specially when indexing data, localization files, character set collation maps used for sorting, and a Solr server configuration file.
The following sections provide an overview of the contents of the managed-schema file and the relationships between the XML elements that define fields, field types, and analyzers. By editing the managed-schema file, you can specify at the field level how Solr indexes and stores Greenplum Database data.
For detailed documentation of the contents of the managed-schema file, refer to the comments in the file or to the Apache SolrCloud documentation.
Field Elements
GPText adds field elements to the managed-schema file for columns included when the index was created with the gptext.create_index() function. This example is the definition for a text field named description:
    <field name="description" stored="false" type="text_intl" indexed="true"/>
The name attribute is the name of the database column. If the column name is not a valid Solr field name, it is altered to conform.
The stored attribute determines if the content of the field will be stored in the index. If the field is stored in the index, GPText search results can return the content of the field. If the field is not stored, retrieving the field content requires a SQL join.
The type attribute maps the Greenplum Database type to a Solr type, defined in the same file with a <fieldType> element.
The indexed attribute determines whether the field content will be indexed.
The <field> element can have additional attributes used with some types. See the comment after the <fields> element for a complete list of attributes.
Field Types
The type attribute of the <field> element is mapped to the name attribute of a <fieldType> element in the managed-schema file. The <fieldType> element determines how Solr parses and stores a field in the index.
The class attribute maps the field type to a Solr Java class that recognizes and processes the data type. Solr includes many base field types. See GPText and Solr Data Type Mappings for a mapping of Solr types to Greenplum Database types.
You can map a field to a different Solr type by changing the field’s type attribute. For example, to use the GPText social media text analyzer chain, you can change the type of a text field from text_intl to text_sm.
If you have a custom type, you can add a new field type by implementing Solr Java type interfaces, or you can specify an existing base type and customize it with an analyzer chain, as described in the next section.
Analyzer Chains
An analyzer examines the contents of a field or search query phrase and returns a stream of tokens used to index the field or search the index. The <analyzer> element is a child of a <fieldType> element that specifies how text will be tokenized and processed before it is indexed or applied to a search.
An <analyzer> can be of type index or query. Different chains can be defined for indexing and querying operations by adding a type attribute to the <analyzer> element. If no type attribute appears, the chain is applied to both field text that is to be indexed and query text that searches the index.
Field analysis begins with a <tokenizer> that divides the contents of a field into tokens. In Latin-based text documents, the tokens are words or terms. In Chinese, Japanese, and Korean (CJK) documents, the tokens are characters.
The tokenizer can be followed by one or more <filter> elements, which are applied in succession. Filters refine the token stream, for example, by removing unnecessary terms (“a”, “an”, “the”), converting term formats, or performing other actions so that only important, relevant terms are indexed and searched. Each filter operates on the output of the tokenizer or filter that precedes it. Solr includes many tokenizers and filters that allow analyzer chains to process different character sets, languages, and transformations. See Analyzers, Tokenizers and Filters - The full list for a comprehensive list.
Field types are assigned analyzers in an index’s managed-schema file. The following example shows the Solr text field type specification:
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>
An analyzer has only one tokenizer, solr.WhitespaceTokenizerFactory in this example. The tokenizer can be followed by one or more filters executed in succession.
Each filter operates on the output of the tokenizer or filter that precedes it. For example, the solr.StopFilterFactory filter removes unnecessary terms (“a”, “an”, “the”) from the stream of tokens. The words to filter out of the stream are listed in the stopwords.txt configuration file. You can edit the stopwords.txt file with the gptext-config utility to change the list of words excluded from the index.
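The way a tokenizer feeds a succession of filters can be pictured with a small Python sketch. It is a simplified model for illustration only, not Solr code: the function names are invented, and the STOPWORDS set stands in for the contents of stopwords.txt.

```python
STOPWORDS = {"a", "an", "the"}  # stand-in for the words listed in stopwords.txt


def whitespace_tokenizer(text):
    """Split text into tokens on runs of whitespace."""
    return text.split()


def stop_filter(tokens):
    """Drop stop words, ignoring case."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def lowercase_filter(tokens):
    """Convert every token to lowercase."""
    return [t.lower() for t in tokens]


def analyze(text, tokenizer, filters):
    """Run one tokenizer, then each filter on the output of the previous stage."""
    tokens = tokenizer(text)
    for apply_filter in filters:
        tokens = apply_filter(tokens)
    return tokens


print(analyze("The Index stores an Analyzer chain",
              whitespace_tokenizer, [stop_filter, lowercase_filter]))
# prints ['index', 'stores', 'analyzer', 'chain']
```

Because each stage sees only what the previous stage emitted, the order of the filter elements in a chain matters.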
There are separate analyzer types for index and query operations. The query analyzer chain in this example includes a solr.SynonymFilterFactory that looks up each token in a file synonyms.txt and, if found, returns the synonym in place of the token.
The analyzer chain can include a “stemmer”, solr.PorterStemFilterFactory in this example. The stemmer employs an algorithm to change words to their “stems”. For example, “confidential”, “confidentiality”, and “confidentis” are all stemmed to “confidenti”. Using a stemmer can dramatically reduce the size of the index, but users executing searches should be aware that some search expressions will not work as expected because of stemming. For example, searching with a wildcard such as "confidential*" will return no matches because the words were stemmed to “confidenti” during indexing. Without a wildcard, the word in the search expression is also stemmed and therefore the search succeeds.
GPText Text Analyzer Chains
In addition to the text analyzer chains Solr provides, GPText provides the following text analyzer chains:
text_intl, the International Text Analyzer
text_sm, the Social Media Text Analyzer
text_intl, the International Text Analyzer
text_intl is the default GPText analyzer. It is a multiple-language text analyzer for text fields. It handles Latin-based words and Chinese, Japanese, and Korean (CJK) characters.
text_intl processes documents as follows.
1. Separates CJK characters from other language text.
2. Identifies currency tokens or symbols that were ignored in the first pass.
3. For any CJK characters, generates a bigram for the CJK character and, for Korean characters only, preserves the original word.
Note that CJK and non-CJK text are treated as separate tokens. Preserving the original Korean word increases the number of tokens in a document.
Following is the definition from the Solr managed-schema template.
    <fieldType autoGeneratePhraseQueries="true" class="solr.TextField" name="text_intl" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="com.emc.solr.analysis.worldlexer.WorldLexerTokenizerFactory"/>
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="com.emc.solr.analysis.worldlexer.WorldLexerBigramFilterFactory" han="true" hiragana="true" katakana="true" hangul="true"/>
        <filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="com.emc.solr.analysis.worldlexer.WorldLexerTokenizerFactory"/>
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="com.emc.solr.analysis.worldlexer.WorldLexerBigramFilterFactory" han="true" hiragana="true" katakana="true" hangul="true"/>
        <filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>
Following are the analysis steps for text_intl.
1. The analyzer chain for indexing begins with a tokenizer called WorldLexerTokenizerFactory. This tokenizer handles most modern languages. It separates CJK characters from other language text and identifies any currency tokens or symbols.
2. The solr.CJKWidthFilterFactory filter normalizes the CJK characters based on character width.
3. The solr.LowerCaseFilterFactory filter changes all letters to lowercase.
4. The WorldLexerBigramFilterFactory filter generates a bigram for any CJK characters, leaves any non-CJK characters intact, and preserves original Korean-language words. Set the han, hiragana, katakana, and hangul attributes to "true" to generate bigrams for all supported CJK languages.
5. The solr.StopFilterFactory removes common words, such as “a”, “an”, and “the”, which are listed in the stopwords.txt configuration file (see To configure an index). If there are no words in the stopwords.txt file, no words are removed.
6. The solr.KeywordMarkerFilterFactory marks the English words to protect from stemming, using the words listed in the protwords.txt configuration file (see To configure an index). If protwords.txt does not contain a list of words, all words in the document are stemmed.
7. The final filter is the stemmer, in this case solr.PorterStemFilterFactory, a fast stemmer for the English language.
Note: The text_intl analyzer chain for querying is the same as the text analyzer chain for indexing.
An analyzer chain, text, is included in GPText’s Solr managed-schema and is based on Solr’s default analyzer chain. Because its tokenizer splits on white space, text cannot process CJK languages: white space is meaningless for CJK languages. Best practice is to use the text_intl analyzer.
For information about using an analyzer chain other than the default, see Using the text_sm Social Media Analyzer.
GPText Language Processing
The root-level tokenizer, WorldLexerTokenizerFactory, tokenizes international languages, including CJK languages. WorldLexerTokenizerFactory tokenizes languages based on their Unicode points and, for Latin-based languages, white space.
Note: Unicode is the encoding for all text in the Greenplum Database.
The following are sample input to, and output from, GPText. Each line in the output corresponds to a term.
English and CJK input:
₩10대부분english자선단체는.
English and CJK output:
₩10
대부분
대부
부분
english
자선
단체는
단체
체는
Bulgarian input:
Състав на Парламента: вж. протоколи
Bulgarian output:
състав
на
парламента
вж
протоколиа
Danish input:
Genoptagelse af sessionen
Danish output:
genoptagelse
af
sessionen
text_intl Filters
The text_intl analyzer uses the following filters:
The CJKWidthFilterFactory normalizes width differences in CJK characters. This filter folds full-width ASCII variants into the equivalent basic Latin characters and half-width Katakana variants into the equivalent full-width Kana.
The WorldLexerBigramFilterFactory filter forms bigrams (pairs) of CJK terms that are generated from WorldLexerTokenizerFactory. This filter does not modify non-CJK text. WorldLexerBigramFilterFactory accepts attributes that guide the creation of bigrams for CJK scripts. For example, if the input contains HANGUL script but the hangul attribute is set to false, this filter will not create bigrams for that script. To ensure that WorldLexerBigramFilterFactory creates bigrams as required, set the CJK attributes han, hiragana, katakana, and hangul to true.
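The bigram behavior described for CJK terms, including keeping the original Korean word, can be approximated in a few lines of Python. This is a sketch for intuition only and not the WorldLexerBigramFilterFactory implementation; the function names are invented, and Hangul is detected simply by membership in the Unicode Hangul Syllables block.

```python
def is_hangul(ch):
    """True for characters in the Hangul Syllables block (U+AC00 to U+D7A3)."""
    return "\uac00" <= ch <= "\ud7a3"


def bigrams(word):
    """Overlapping two-character pairs of word."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def cjk_terms(word):
    """Bigrams of a CJK term; Korean words longer than one bigram are also kept whole."""
    pairs = bigrams(word) or [word]  # a single character is kept as is
    if len(word) > 2 and all(is_hangul(ch) for ch in word):
        return [word] + pairs
    return pairs


for term in ("대부분", "자선", "단체는"):
    print(term, "->", cjk_terms(term))
```

For the Korean terms in the sample input, the sketch yields the original word followed by its bigrams, which matches the terms listed in the English and CJK output.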
text_sm, the Social Media Text Analyzer
The GPText text_sm text analyzer analyzes text from sources such as social media feeds. text_sm consists of a tokenizer and two filters. To configure the text_sm text analyzer, use the gptext-config utility to edit the managed-schema file. See To use the text_sm Social Media Analyzer for details.
text_sm normalizes emoticons: it replaces emoticons with text using the emoticons.txt configuration file. For example, it replaces a happy face emoticon, :-), with the text “happy”.
The following is the definition from the Solr managed-schema template.
    <fieldType autoGeneratePhraseQueries="true" class="solr.TextField" name="text_sm" positionIncrementGap="100" termVectors="true" termPositions="true" termOffsets="true">
      <analyzer type="index">
        <tokenizer class="com.emc.solr.analysis.text_sm.twitter.TwitterTokenizerFactory" delimiter="\t" emoticons="emoticons.txt"/>
        <!-- Case insensitive stop word removal. Add enablePositionIncrements=true in both the index and query analyzers to leave a 'gap' for more accurate phrase queries. -->
        <filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="com.emc.solr.analysis.text_sm.twitter.EmoticonsClassifierFilterFactory" delimiter="\t" emoticons="emoticons.txt"/>
        <filter class="com.emc.solr.analysis.text_sm.twitter.TwitterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="com.emc.solr.analysis.text_sm.twitter.TwitterTokenizerFactory" delimiter="\t" emoticons="emoticons.txt"/>
        <filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="com.emc.solr.analysis.text_sm.twitter.EmoticonsClassifierFilterFactory" delimiter="\t" emoticons="emoticons.txt"/>
        <filter class="com.emc.solr.analysis.text_sm.twitter.TwitterStemFilterFactory"/>
      </analyzer>
    </fieldType>
The Twitter Tokenizer
The Twitter tokenizer extends the English language tokenizer, solr.WhitespaceTokenizerFactory, to recognize the following elements as terms.
Emoticons
Hyperlinks
Hashtag keywords (for example, #keyword)
User references (for example, @username)
Numbers
Floating point numbers
Numbers including commas (for example, 10,000)
Time expressions (for example, 9:30)
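To make the list above concrete, the following Python sketch recognizes the same kinds of elements with a single regular expression. It is an illustration only and does not reproduce the TwitterTokenizerFactory rules; the patterns, the group names, and the tiny emoticon pattern are assumptions chosen for the example.

```python
import re

# More specific alternatives come first so that, for example,
# "9:30" is matched as a time before it can be split into numbers.
TOKEN_PATTERN = re.compile(
    r"""
    (?P<hyperlink>https?://\S+)
  | (?P<hashtag>\#\w+)
  | (?P<user>@\w+)
  | (?P<emoticon>[:;]-?[)(])
  | (?P<time>\d{1,2}:\d{2})
  | (?P<number>\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)
  | (?P<word>\w+)
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return (kind, term) pairs for every element recognized in text."""
    return [(m.lastgroup, m.group()) for m in TOKEN_PATTERN.finditer(text)]


for kind, term in tokenize("@bill see #gptext at http://example.com 9:30 :-) 10,000 3.14"):
    print(kind, term)
```

A plain white space tokenizer would also keep these strings together, but it could not tell a hashtag from a word; classifying the elements is what lets later filters skip stemming for them.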
The text_sm filters
com.emc.solr.analysis.socialmedia.twitter.EmoticonsClassifierFilterFactory classifies emoticons as happy, sad, or wink. It is based on the emoticons.txt file (one of the files you can edit with gptext-config), and is intended for future use, such as in sentiment analysis.
The TwitterStemFilterFactory
com.emc.solr.analysis.socialmedia.twitter.TwitterStemFilterFactory extends the solr.PorterStemFilterFactory class to bypass stemming of the social media patterns recognized by the twitter.TwitterTokenizerFactory.
The emoticons.txt file
This file contains lists of emoticons for “happy,” “sad,” and “wink.” They are separated by a tab by default. You can change the separation to any character or string by changing the value of delimiter in the social media analyzer chain. The following is a sample line from the text_sm analyzer chain:
    <filter class="com.emc.solr.analysis.text_sm.twitter.EmoticonsClassifierFilterFactory" delimiter="\t" emoticons="emoticons.txt"/>
Using the text_sm Social Media Analyzer
The Solr managed-schema file created for an index specifies an analyzer to use to index each field. The default analyzer for text fields is text_intl. To specify the text_sm social media analyzer, you use the gptext-config utility to modify the Solr managed-schema for your index.
The steps are:
1. Create an index using gptext.create_index().
2. Use the gptext-config utility to edit the managed-schema file created for the index:
    gptext-config edit -f managed-schema -i <index_name>
The managed-schema file contains a <field> element for each text field. For example:
    <field name="message_text" stored="false" type="text_intl" indexed="true"/>
The type attribute specifies the analyzer to use. text_intl is the default analyzer.
3. For each text field that should use the GPText social media analyzer, change the type attribute of its <field> element to text_sm:
    <field name="text_search_col" indexed="true" stored="false" type="text_sm"/>
4. Save the managed-schema file.
Using Multiple Analyzer Chains
If you want to index a field using two different analyzer chains simultaneously, you can do this:
Create a new empty index. Then use the gptext-config utility to add a new field to the index that is a copy of the field you are interested in, but with a different name and analyzer chain.
Let us assume that your index, as initially created, includes a field to index named mytext. Also assume that this field will be indexed using the default international analyzer (text_intl).
You want to add a new field to the index’s managed-schema that is a copy of mytext and that will be indexed with a different analyzer (say the text_sm analyzer). To do so, follow these steps:
1. Create an empty index with gptext.create_index().
2. Open the index’s managed-schema file for editing with gptext-config.
3. Add a <field> in the managed-schema for a new field that will use a different analyzer chain. For example:
    <field indexed="true" name="mytext2" stored="false" type="text_sm"/>
By defining the type of this new field to be text_sm, it will be indexed using the social media analyzer rather than the default text_intl.
4. Add a <copyField> in managed-schema to copy the original field to the new field. For example:
    <copyField dest="mytext2" source="mytext"/>
5. Index and commit as you normally would.
The database column mytext is now in the index twice with two different analyzer chains. One field is mytext, which uses the default international analyzer chain, and the other is the newly created mytext2, which uses the social media analyzer chain.
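The two schema changes in steps 3 and 4 can be pictured with a short Python sketch that applies them to a schema string. This is an illustration under assumptions, not a GPText tool: the SCHEMA string is a made-up, heavily shortened stand-in for a real managed-schema, and the helper name add_copy_of_field is hypothetical. ElementTree discards XML comments when parsing, so a real managed-schema is better edited by hand with gptext-config.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened schema with the single field from the example.
SCHEMA = """<schema name="demo" version="1.6">
  <field name="mytext" type="text_intl" indexed="true" stored="false"/>
</schema>"""


def add_copy_of_field(schema_xml, source, dest, dest_type):
    """Add a <field> named dest of type dest_type and a <copyField> from source to dest."""
    root = ET.fromstring(schema_xml)
    names = {field.get("name") for field in root.iter("field")}
    if source not in names:
        raise ValueError("source field %r is not defined" % source)
    if dest in names:
        raise ValueError("destination field %r already exists" % dest)
    ET.SubElement(root, "field", {"name": dest, "type": dest_type,
                                  "indexed": "true", "stored": "false"})
    ET.SubElement(root, "copyField", {"source": source, "dest": dest})
    return ET.tostring(root, encoding="unicode")


print(add_copy_of_field(SCHEMA, "mytext", "mytext2", "text_sm"))
```

The checks on the field names mirror the two mistakes that are easy to make by hand: copying from a field that does not exist and reusing a name that is already defined.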
Using Different Analyzer Chains for Individual Fields
You can use different analyzers for individual fields by editing the managed-schema configuration file. For example, if one field contains English text and another contains Chinese language text, you can specify different analyzers for the two fields.
Example
You have a table named email_tbl with the following definition:
    create table email_tbl (
        id bigint,
        english_content text,
        chinese_content text,
        timestamp date,
        username text,
        age int,
        ...) -- additional columns that are not indexed
You want to index the six columns shown: id, english_content, chinese_content, timestamp, username, and age.
For the column english_content, you want to use the English language analyzer called “text_en” for the text segmentation.
For the column chinese_content, you want to use the international language analyzer named “text_intl”.
Here are steps to implement this example:
1. Create the GPText index for the table.
    SELECT * FROM gptext.create_index('public', 'email_tbl', 'id', 'english_content');
2. Modify the analyzer for each column in managed-schema.
    $ gptext-config edit -i db.public.email_tbl -f managed-schema
3. Find the element for the english_content field.
    <field name="english_content" type="*" indexed="true" stored="true"/>
Change the type attribute to text_en.
    <field name="english_content" type="text_en" indexed="true" stored="true"/>
4. Find the element for the chinese_content field.
    <field name="chinese_content" type="*" indexed="true" stored="true"/>
Change the type attribute to text_intl.
    <field name="chinese_content" type="text_intl" indexed="true" stored="true"/>
5. Index the table.
    SELECT * FROM gptext.index(TABLE(SELECT id, english_content, chinese_content, timestamp, username, age FROM email_tbl), 'db.public.email_tbl');
6. Commit the index.
    SELECT * FROM gptext.commit_index('db.public.email_tbl');
The field types text_en and text_intl are defined in <fieldType> entries in the managed-schema file and then referenced in the type attribute of the <field> element.
You can define a custom field type by adding a <fieldType> entry with custom analyzers and then setting the field’s type attribute to the name of the custom field type. For example, the following “text_customize” field type is a copy of the “text_en” field type entry with the synonym filter commented out in the index analyzer. This custom field type will apply the synonym filter to queries, but not to the index.
    <fieldType name="text_customize" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
A field type can also be customized by adding analyzers as child elements of the <field> element:
    <field name="english_content" type="text" indexed="true" stored="false">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </field>
Working With GPText External Indexes
A GPText external index is an Apache Solr index you create in Greenplum Database to index and search documents that reside outside of Greenplum Database. External documents can be of many types, for example, PDF, Microsoft Word, XML, or HTML. Solr recognizes document types automatically, using code included from the Apache Tika project.
GPText supports indexing external documents that are accessible by URL with an HTTP GET request. GPText also supports indexing external documents stored in Hadoop, and provides functions and utilities to specify required Hadoop configuration and authentication information.
To add external documents to an index, you supply GPText with a list of URLs in an array or as a SQL SELECT statement. The URL becomes the unique id field in the Solr index.
How GPText External Indexes Differ From Regular GPText Indexes
External indexes exist entirely in Solr; there is no associated database table in Greenplum Database. Because of this, the index name does not follow the database.schema.table pattern required for regular GPText indexes. You can choose any name for an external index, but it must not contain periods. You can access a GPText external index from any database in the Greenplum Database system that has the GPText schema installed.
GPText provides the following alternate functions for working with external indexes:
gptext.create_index_external() – create an external index.
gptext.index_external() – add documents to the external index.
gptext.search_external() – search an external index.
gptext.highlight_external() – return fragments of documents with matching search terms highlighted with markup tags.
The distribution policy for a regular GPText index is the same as the underlying Greenplum Database table, so that segments manage the same GPText table data as the Solr index shard. A GPText external index also has one shard per segment, but the documents are distributed among the segments using Solr compositeId routing, which allows Solr to choose the shard for a document. See Shards and Indexing Data in SolrCloud.
A regular GPText index only indexes and stores the database table columns you specify. A GPText external index stores and indexes the textual content of the file, as well as metadata fields that are members of the document type.
When an external document is added to the index, the content of the document is saved in the content field. The content field is stored in the index but it is not indexed.
GPText copies the following fields to the text field, the default search field, which is indexed but not stored.
title
author
description
keywords
content
content_type
resourcename
url
To search the document content, therefore, search the text field, but to retrieve or highlight document contents, use the content field.
The following common metadata fields are indexed and stored:
title
subject
description
comments
author
keywords
category
resourcename
url
content_type
last_modified
links
Read about how Solr and Tika (the “Solr Cell” framework) extract and index document text and metadata at Uploading Data with Solr Cell using Apache Tika. See a list of supported document types at Supported Document Formats.
A dynamic field named meta_* is also indexed and stored. This is a multi-valued field where Solr stores document-type-specific metadata. In search results, this field is returned as a JSON-formatted columnValue string. You can extract individual metadata by name using the gptext.gptext_retrieve_field() function.
Search results for external indexes include all fields saved with the documents, including all metadata fields. You can use the Solr field list option (fl=<field-list>) to limit the fields returned. You can also use SELECT <field-list> FROM gptext.search_external() to limit the fields returned, but it is more efficient to filter out the fields in Solr with the fl option than in the database session.
Authenticating with an External Document Source
The information in this section is applicable only to external document sources that require authentication.
If the external document source that you want to index requires authentication, you must provide the authentication configuration to GPText. You must also use GPText functions to explicitly log in to the external document source before indexing, and log out of the source after indexing completes.
Note: Authenticating is not required for searching an external document.
GPText currently supports authenticating to FTP servers and to native and Kerberized Hadoop clusters.
Uploading a Configuration to ZooKeeper
Before you use GPText to index an external document source that requires authentication, you must upload configuration information to ZooKeeper. Use the gptext-external upload command to upload this information:
    gptext-external upload -t <type> -p <config_dir> -c <config_name>
This table describes the options to the gptext-external upload command:
Option          Description
<type>          The type of the external document source. The supported <type>s are ftp and hdfs.
<config_dir>    The path to a directory that contains the configuration files. The configuration information that you provide in this directory will depend on the external document source <type>.
<config_name>   The name that you assign to the configuration information. You will provide this name when you log in to the external document source.
Note: Retain a local copy of <config_dir>. Should you need to update the configuration, you must edit a local copy of the file(s) and re-upload.
Configuring and Uploading FTP Authentication
You can add documents from an FTP server that requires authentication to a GPText external index. To authenticate with the server, create a configuration directory and add a file to it named login.txt. Add three lines to the login.txt file:
The name of the user to log in to the FTP server.
The password for the FTP user, in clear text.
The maximum number of FTP connections allowed from GPText.
To upload configuration information for an authenticated FTP server:
1. Create a directory for the authentication configuration.
    $ mkdir ftp_config
2. Create the login.txt file.
    $ touch ftp_config/login.txt
3. Add the FTP user name, password, and the maximum number of FTP connections to create to the login.txt file on separate lines. For example:
    $ echo "bill" > ftp_config/login.txt
    $ echo "changeme" >> ftp_config/login.txt
    $ echo "10" >> ftp_config/login.txt
4. Upload the configuration directory to ZooKeeper using the gptext-external upload command.
    $ gptext-external upload -t ftp -p ./ftp_config -c ftp_bill_auth
This command maps the login.txt file in the ftp_config/ directory to the name ftp_bill_auth.
The password is base64-encoded when stored in ZooKeeper. To protect the password, delete the login.txt file after you have uploaded the configuration to ZooKeeper.
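Because login.txt holds a clear-text password, it can help to create the file with owner-only permissions from the start. The following Python sketch writes the three lines described above; the function name write_ftp_login is hypothetical, the credentials are the placeholder values from the example, and a temporary directory stands in for ftp_config.

```python
import os
import tempfile


def write_ftp_login(config_dir, user, password, max_connections):
    """Write login.txt with one value per line: user, password, connection limit."""
    os.makedirs(config_dir, exist_ok=True)
    path = os.path.join(config_dir, "login.txt")
    # Mode 0o600 asks for owner-only access when the file is newly created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write("%s\n%s\n%s\n" % (user, password, max_connections))
    return path


with tempfile.TemporaryDirectory() as workdir:
    login_path = write_ftp_login(os.path.join(workdir, "ftp_config"), "bill", "changeme", 10)
    with open(login_path) as handle:
        print(handle.read().splitlines())
    # After running gptext-external upload, remove the clear-text file.
    os.remove(login_path)
```

The temporary directory is deleted at the end of the with block, so nothing is left on disk in this demonstration.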
Configuring and Uploading Hadoop Authentication
When you access a Hadoop external document source, <config_dir> must include the following configuration files for <type> hdfs:
The core-site.xml and hdfs-site.xml configuration files from the Hadoop server.
A file named user.txt. This file contains a single line identifying the Hadoop user name to use for authentication. If Kerberos is enabled in the Hadoop cluster, the user name in user.txt must identify the Kerberos principal for the user.
If the Hadoop cluster is secured with Kerberos, also include the user’s keytab file and the krb5.conf file for the Kerberos realm.
For example, to upload configuration information for a Hadoop external document store:
1. Create a directory for the authentication configuration files. For example:
    $ mkdir hdfs_conf
2. Copy the core-site.xml and hdfs-site.xml configuration files from the Hadoop server to the configuration directory. The location of these files will differ for different Hadoop distributions. For example:
    $ scp hdfsuser@hdfsnamenode:/etc/hadoop/conf/core-site.xml hdfs_conf/
    $ scp hdfsuser@hdfsnamenode:/etc/hadoop/conf/hdfs-site.xml hdfs_conf/
3. Construct the user.txt file. For example, if the Hadoop user name is bill:
    $ touch hdfs_conf/user.txt
    $ echo "bill" > hdfs_conf/user.txt
4. Upload the Hadoop authentication configuration files for user bill to ZooKeeper. For example:
    $ gptext-external upload -t hdfs -p ./hdfs_conf -c hdfs_bill_auth
This command maps the configuration information you provided in the hdfs_conf/ directory to the name hdfs_bill_auth.
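Before uploading, it can be useful to confirm that the configuration directory contains every file listed above. The Python sketch below is a convenience check, not a GPText utility. The function name is hypothetical, and because the guide does not name the keytab file, the check assumes only that it ends in .keytab.

```python
import os
import tempfile

REQUIRED_FILES = ("core-site.xml", "hdfs-site.xml", "user.txt")


def missing_hdfs_config_files(config_dir, kerberos=False):
    """Return the names of required files that are absent from config_dir."""
    present = set(os.listdir(config_dir))
    required = list(REQUIRED_FILES)
    if kerberos:
        required.append("krb5.conf")
    missing = [name for name in required if name not in present]
    # Assumption: the user's keytab file is recognizable by a .keytab suffix.
    if kerberos and not any(name.endswith(".keytab") for name in present):
        missing.append("<user>.keytab")
    return missing


with tempfile.TemporaryDirectory() as conf_dir:
    for name in ("core-site.xml", "user.txt"):
        open(os.path.join(conf_dir, name), "w").close()
    print(missing_hdfs_config_files(conf_dir))
    print(missing_hdfs_config_files(conf_dir, kerberos=True))
```

An empty list means the directory holds at least the files this section asks for; it says nothing about whether their contents are correct.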
Logging In/Out of the External Document Source
Prior to indexing, you must explicitly log in to an external document source that requires authentication. Use the gptext.external_login() function for this purpose:
    gptext.external_login('<type>', '<type>://<url>', '<config_name>')
The table below describes the arguments to the gptext.external_login() function:
Argument         Description
<type>           The type of the external document source. The <type> value may be ftp or hdfs.
<type>://<url>   The URL of the external document source <type>.
<config_name>    The <config_name> you provided when you uploaded the authentication configuration with gptext-external upload.
When you invoke the gptext.external_login() function, GPText logs you in to the external document source as the user identified in the configuration that you uploaded under <config_name>.
For example, to log in to a Hadoop document source using the hdfs_bill_auth authentication configuration you uploaded in the prior section:
    SELECT * FROM gptext.external_login('hdfs', 'hdfs://<namenode_host_or_ip>:<hdfs_port>', 'hdfs_bill_auth');
The command to log in to an FTP server is similar.
    SELECT * FROM gptext.external_login('ftp', 'ftp://<ftpserver_host_or_ip>:<ftp_port>', 'ftp_bill_auth');
It is not necessary to include :<ftp_port> in the URL if the server uses the default FTP port 21.
Note: You can log in to only one GPText external document source at a time. You must explicitly log out before you can log in to another external document source.
To log out of an external document source, use the gptext.external_logout('<type>') function. For example, to log out of the Hadoop cluster that you are currently logged in to:
    SELECT * FROM gptext.external_logout('hdfs');
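Since only one external document source can be logged in at a time, client code benefits from pairing every login with a logout, even when indexing fails. The Python sketch below shows one way to do that with a context manager. It is an assumption-laden illustration: external_session is a hypothetical helper, execute stands for any callable that runs a SQL statement with parameters (for example, the execute method of a database cursor that uses %s placeholders), and the host name and port in the demonstration are made up.

```python
from contextlib import contextmanager


@contextmanager
def external_session(execute, source_type, url, config_name):
    """Log in to an external document source and always log out afterwards."""
    execute("SELECT * FROM gptext.external_login(%s, %s, %s)",
            (source_type, url, config_name))
    try:
        yield
    finally:
        execute("SELECT * FROM gptext.external_logout(%s)", (source_type,))


# Demonstration with a stand-in that records statements instead of running them.
calls = []


def fake_execute(sql, params):
    calls.append((sql, params))


with external_session(fake_execute, "hdfs",
                      "hdfs://namenode.example.com:8020", "hdfs_bill_auth"):
    calls.append(("-- index documents here", ()))

for sql, params in calls:
    print(sql, params)
```

The finally clause is the point of the design: the logout statement is issued whether the body completes or raises.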
Troubleshooting Authenticated Document Stores
If you run into problems logging in to or accessing documents in an authenticated Hadoop external document store, refer to Troubleshooting Hadoop Connection Problems.
Creating External Indexes
Use the gptext.create_index_external() function to create an external index.
This example creates an external index named gptext-docs.
    =# SELECT * FROM gptext.create_index_external('gptext-docs');
An external index does not have a corresponding Greenplum Database table, so the index name does not follow the database.schema.table pattern required for regular GPText indexes. The only restriction is that the name for an external index must not contain periods.
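The naming rule can be expressed as two small checks. The Python sketch below is illustrative only; both function names are invented, and the checks cover just the two rules stated in this guide (three period-separated parts for a regular index, no periods for an external index), not any other restrictions Solr may impose on names.

```python
def is_regular_index_name(name):
    """True if name has the database.schema.table shape of a regular GPText index."""
    parts = name.split(".")
    return len(parts) == 3 and all(parts)


def is_valid_external_index_name(name):
    """True if name is non-empty and contains no periods."""
    return bool(name) and "." not in name


for candidate in ("gptext-docs", "demo.twitter.message", "docs.v2"):
    print(candidate,
          is_valid_external_index_name(candidate),
          is_regular_index_name(candidate))
```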
Adding Documents to an External Index
To add external documents to an external index, supply a list of URLs where Solr can retrieve the documents to the gptext.index_external() function. URLs may be specified either in an array or as a SQL result set.
A hash of the URL is the document’s ID in the index. If a URL has already been added to the index, the file is not reindexed. If you add two identical files retrieved from different URLs, both files are added to the index.
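The consequence of keying documents by a hash of the URL can be seen in a small Python model. This is a sketch of the stated behavior, not of GPText internals: the guide does not say which hash function is used, so SHA-1 is an arbitrary stand-in, and the dictionary, function names, and example URLs are all hypothetical.

```python
import hashlib


def doc_id(url):
    """Document ID derived from the URL (SHA-1 is an assumption for illustration)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def add_urls(index, urls):
    """Add each URL not already present; return how many documents were added."""
    added = 0
    for url in urls:
        key = doc_id(url)
        if key not in index:  # same URL seen before: not reindexed
            index[key] = url
            added += 1
    return added


index = {}
print(add_urls(index, ["http://example.com/a.pdf", "http://example.com/a.pdf"]))
print(add_urls(index, ["http://mirror.example.com/a.pdf"]))
print(len(index))
```

The second call adds the mirror copy even if the file is byte-for-byte identical, because only the URL, not the content, determines the ID.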
This example adds a single PDF document, specified in an array, to the gptext-docs index.
    =# SELECT * FROM gptext.index_external('{http://gptext.docs.pivotal.io/archives/GPText-docs-213.pdf}', 'gptext-docs');
     dbid | num_docs
    ------+----------
        3 |        0
        2 |        1
    (2 rows)
    =# SELECT * FROM gptext.commit_index('gptext-docs');
     commit_index
    --------------
     t
    (1 row)
This example adds several HTML documents by selecting URLs from a database table.
=# DROP TABLE IF EXISTS gptext_html_docs;
=# CREATE TABLE gptext_html_docs (
       id bigint,
       url text)
   DISTRIBUTED BY (id);
CREATE TABLE
=# INSERT INTO gptext_html_docs VALUES
       (1, 'http://gptext.docs.pivotal.io/latest/topics/administering.html'),
       (2, 'http://gptext.docs.pivotal.io/latest/topics/ext-indexes.html'),
       (3, 'http://gptext.docs.pivotal.io/latest/topics/function_ref.html'),
       (4, 'http://gptext.docs.pivotal.io/latest/topics/guc_ref.html'),
       (5, 'http://gptext.docs.pivotal.io/latest/topics/ha.html'),
       (6, 'http://gptext.docs.pivotal.io/latest/topics/index.html'),
       (7, 'http://gptext.docs.pivotal.io/latest/topics/indexes.html'),
       (8, 'http://gptext.docs.pivotal.io/latest/topics/intro.html'),
       (9, 'http://gptext.docs.pivotal.io/latest/topics/managed-schema.html'),
       (10, 'http://gptext.docs.pivotal.io/latest/topics/performance.html'),
       (11, 'http://gptext.docs.pivotal.io/latest/topics/queries.html'),
       (12, 'http://gptext.docs.pivotal.io/latest/topics/type_ref.html'),
       (13, 'http://gptext.docs.pivotal.io/latest/topics/upgrading.html'),
       (14, 'http://gptext.docs.pivotal.io/latest/topics/utility_ref.html'),
       (15, 'http://gptext.docs.pivotal.io/latest/topics/installing.html');
INSERT 0 15
=# SELECT * FROM gptext.index_external(
       TABLE(SELECT url FROM gptext_html_docs),
       'gptext-docs');
 dbid | num_docs
------+----------
    3 |        6
    2 |        8
(2 rows)

=# SELECT * FROM gptext.commit_index('gptext-docs');
 commit_index
--------------
 t
(1 row)
To add documents from an external document source that requires authentication, such as hdfs or an ftp server, log in to the external system with the gptext.external_login() function before you add the documents. With an authenticated document source, you can add all documents in a directory using the gptext.index_external_dir() function. See the gptext.index_external_dir() function reference for an example.
Searching GPText External Indexes
You can search GPText external indexes with the standard gptext.search() function or with the gptext.search_external() function. The difference is that the gptext.search() function returns just the id, score, hs, and rf columns, while the gptext.search_external() function by default also includes all of the content and metadata stored in the external index. You can use the Solr fl (field list) option with either function to set the actual fields that are included in the results.
Searching with gptext.search()
This simple gptext.search() example searches for “Solr” in the title field of the gptext-docs external index.
=# SELECT * FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'gptext-docs', 'title:Solr', null, null);
                            id                             |   score   | hs | rf
-----------------------------------------------------------+-----------+----+----
 http://gptext.docs.pivotal.io/latest/topics/type_ref.html | 0.9745732 |    |
(1 row)

To see the title of the document that matched the search, you must request the field with an fl option.
=# SELECT * FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'gptext-docs', 'title:Solr', null, 'fl=title');
                            id                             |   score   | hs | rf
-----------------------------------------------------------+-----------+----+------------------------------------------------------------------------------------------------------------
 http://gptext.docs.pivotal.io/latest/topics/type_ref.html | 0.9745732 |    | {"columnValue":[{"name":"title","value":"GPText and Solr Data Type Mappings |\nPivotal GPText Docs"}]}
(1 row)
The title field specified in the field list of the Solr options argument is returned in the rf column in a JSON document. If you want to return the title in its own result column, you can use the gptext.gptext_retrieve_field() function to extract the text from the JSON document. The expanded display (\x on) psql option in the following examples makes the results easier to read.
=# \x on
Expanded display is on.
demo=# SELECT id, score, gptext.gptext_retrieve_field(rf, 'title') title
       FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'gptext-docs', 'title:Solr', null, 'fl=title');
-[ RECORD 1 ]----------------------------------------------------
id    | http://gptext.docs.pivotal.io/latest/topics/type_ref.html
score | 0.9745732
title | GPText and Solr Data Type Mappings |
      | Pivotal GPText Docs
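If a client program receives the raw rf column instead, the same extraction can be done outside the database. The Python sketch below is a hypothetical helper, not GPText code: it assumes only the JSON shape visible in the output above (a columnValue array of name/value objects) and approximates what gptext.gptext_retrieve_field() returns for a single field.

```python
import json


def retrieve_field(rf_json, field_name):
    """Return the value stored under field_name in an rf-style JSON document.

    Returns None when the column is empty or the field is not present.
    """
    if not rf_json:
        return None
    doc = json.loads(rf_json)
    for entry in doc.get("columnValue", []):
        if entry.get("name") == field_name:
            return entry.get("value")
    return None


# Sample rf value copied from the search output above.
sample_rf = (
    '{"columnValue":[{"name":"title",'
    '"value":"GPText and Solr Data Type Mappings |\\nPivotal GPText Docs"}]}'
)

title = retrieve_field(sample_rf, "title")
print(title)
```

Extracting the field in SQL with gptext.gptext_retrieve_field() is usually simpler; a client-side helper is mainly useful when the rf column has already been fetched as text.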
Searching with gptext.search_external()
The gptext.search_external() function, by default, returns a standard set of metadata fields and the content of the document. Depending on the content type of the document, gptext.search_external() returns additional metadata as a JSON document in the meta column.
The following example search returns all fields stored in the gptext-docs index for the document with the word “Installing” in the title field. The content and meta column values in the example results are truncated.
=# SELECT * FROM gptext.search_external(TABLE(SELECT 1 SCATTER BY 1), 'gptext-docs', 'title:Installing', null, null);
-[ RECORD 1 ]-+----------------------------------------------------------------------
id            | http://gptext.docs.pivotal.io/latest/topics/installing.html
title         | Installing GPText |
              | Pivotal GPText Docs
subject       |
description   |
comments      |
author        |
keywords      |
category      |
resourcename  |
url           |
content_type  | text/html; charset=UTF-8
last_modified |
links         |
sha256        | F1182EE7D993CB494CAB8480DA47EA2F82DE8F7DCCC4E76745B6FA5FD7E73FC8
content       | ...
score         | 1.4449482
meta          | {"columnValue":[{"name":"meta_a","value":"..."},{"name":"meta_content_encoding","value":"UTF-8"},{"name":"meta_dc_title","value":"Installing GPText |\nPivotal GPText Docs"},{"name":"meta_div","value":"..."},{"name":"meta_form","value":"application/x-www-form-urlencoded,get,/search"},...
You usually want only a subset of the fields in the index. You can specify the fields you want in the SELECT clause or by adding the fl Solr option in the options argument of the gptext.search_external() function. Even if you list the desired fields in the SELECT clause, specifying a field list in the options argument is more efficient because it reduces the amount of data Solr transfers to Greenplum Database.
This example searches for HTML documents that have the word “Indexes” in the title field. A filter query chooses documents with “html” in the content_type field. The field list in the options argument contains just the title field. The id, score, and meta fields are always included in search results.
=# SELECT id, title, score
   FROM gptext.search_external(TABLE(SELECT 1 SCATTER BY 1), 'gptext-docs', 'title:indexes', '{content_type:*html*}', 'fl=title');
                               id                                |                 title                  |   score
-----------------------------------------------------------------+----------------------------------------+-----------
 http://gptext.docs.pivotal.io/latest/topics/ext-indexes.html    | Working With GPText External Indexes | | 1.1593812
                                                                 : Pivotal GPText Docs
 http://gptext.docs.pivotal.io/latest/topics/managed-schema.html | Customizing GPText Indexes |           | 1.1191859
                                                                 : Pivotal GPText Docs
 http://gptext.docs.pivotal.io/latest/topics/indexes.html        | Working With GPText Indexes |          | 0.8013617
                                                                 : Pivotal GPText Docs
 http://gptext.docs.pivotal.io/latest/topics/queries.html        | Querying GPText Indexes |              | 0.8013617
                                                                 : Pivotal GPText Docs
(4 rows)
Highlighting External Index Search Results
Solr highlighting includes fragments of documents that match a search query in the search results, with the query terms highlighted with markup tags. Fragments are also called snippets or passages.
Highlighting with GPText external indexes is a different process than highlighting with regular GPText indexes. Because the text and all metadata of external documents are stored in an external index, the markup tags can be applied in Solr before returning search results to Greenplum Database. With regular indexes, highlighting can be performed only for fields with terms enabled, and then search results must be joined with the database table so that the gptext.highlight() function can insert the markup tags into the text. You can, however, configure a regular GPText index so that you store the fields in the index and perform highlighting in Solr. This requires editing the index’s solrconfig.xml and managed-schema configuration files. See Highlighting Terms in Stored Fields for steps to enable this configuration.
Solr highlighting is performed by a search handler called a HighlightComponent, configured in the managed-schema configuration file. Solr provides highlighters that work somewhat differently and have different configurable options. GPText uses the Unified Highlighter by default. See Highlighting at the Apache Solr website to learn more about Solr highlighting and the Unified Highlighter.
You can enable highlighting for GPText external indexes in the Solr options argument of a gptext.search() query. Using this method, the highlighted text is returned in a result column named hs, which contains a JSON-formatted array of highlighted fragments. You can access the fragments using the gptext.gptext_retrieve_field() function.
In addition, GPText provides the gptext.highlight_external() function, which unpacks highlighted fragments in the search results into separate columns in the Greenplum Database search result set.
First, let’s look at the results of a search query with highlighting enabled using the Solr options argument in the gptext.search() function. This statement searches the gptext-docs external index for documents containing the term “apache”. The Solr options are:
hl=true – enables highlighting.
hl.fl=content – the content field will be highlighted.
rows=1 – returns just one document per segment.
=# SELECT * FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'gptext-docs', 'apache', '{content_type:*html*}', 'hl=true&hl.fl=content&rows=1');
-[ RECORD 1 ]--------------------------------------------------------------------------------
id    | http://gptext.docs.pivotal.io/latest/topics/ha.html
score | 0.4548784
hs    | {"columnValue":[{"name":"content","value":"Refer to the \u003cem\u003eApache\u003c/em\u003e SolrCloud documentation for help using the SolrCloud Dashboard.\n\n\n"}]}
rf    |
-[ RECORD 2 ]--------------------------------------------------------------------------------
id    | http://gptext.docs.pivotal.io/latest/topics/function_ref.html
score | 0.05978464
hs    | {"columnValue":[{"name":"content","value":"Remarks\n\n\nWhen you add an external document to the index, \u003cem\u003eApache\u003c/em\u003e Tika extracts a core set of metadata from the document, the columns listed in the Return type section."}]}
rf    |
-[ RECORD 3 ]--------------------------------------------------------------------------------
id    | http://gptext.docs.pivotal.io/latest/topics/ext-indexes.html
score | 1.2426406
hs    | {"columnValue":[{"name":"content","value":"Solr recognizes document types automatically, using code included from the \u003cem\u003eApache\u003c/em\u003e Tika project.\n\n\n"}]}
rf    |
-[ RECORD 4 ]--------------------------------------------------------------------------------
id    | http://gptext.docs.pivotal.io/latest/topics/administering.html
score | 0.8155949
hs    | {"columnValue":[{"name":"content","value":"ZooKeeper Administration\n\n\n\u003cem\u003eApache\u003c/em\u003e ZooKeeper enables coordination between the \u003cem\u003eApache\u003c/em\u003e Solr and Pivotal GPText distributed processes through a shared namespace that resembles a file system."}]}
rf    |
In this example the hs column has a single fragment from each of the returned documents. You can use the hl.snippets and hl.fragsize Solr options to set, respectively, the maximum number of fragments to return and the approximate number of characters in each fragment. Other options you can use to control how the Unified Highlighter chooses fragments are hl.bs.type and hl.maxAnalyzedChars. The hl.bs.type option specifies how the highlighter breaks the text into fragments. The default is SENTENCE. Other valid choices are SEPARATOR, WORD, CHARACTER, LINE, or WHOLE. The hl.maxAnalyzedChars option, default 51200, is the maximum number of characters to analyze for highlighting.
See Highlighting in the Solr documentation for tables of options you can set and their default values.
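The Solr options argument is a single string of name=value pairs joined with &, as in the examples in this section. When a client builds queries programmatically, a small helper can assemble that string. The Python sketch below is illustrative: solr_options is a hypothetical helper, and it assumes the values need no escaping, which holds for the simple options shown here.

```python
def solr_options(options):
    """Join Solr option names and values into a name=value&name=value string.

    Assumes values contain no characters that need escaping.
    """
    parts = []
    for name, value in options.items():
        if isinstance(value, bool):
            # Solr expects lowercase true/false.
            value = "true" if value else "false"
        parts.append(f"{name}={value}")
    return "&".join(parts)


highlight_options = solr_options({
    "hl": True,          # enable highlighting
    "hl.fl": "content",  # field to highlight
    "hl.snippets": 3,    # maximum number of fragments
    "hl.fragsize": 75,   # approximate characters per fragment
    "rows": 1,           # one document per segment
})
print(highlight_options)
```

The resulting string can be passed as the last argument of gptext.search() or gptext.search_external().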
For an external index, Solr returns the highlighted fragments in a columnValue array in the hs result column. You can use the gptext.gptext_retrieve_field() function in the SELECT list to extract the fragments from the array.
=# SELECT id, score, gptext.gptext_retrieve_field(hs, 'content') AS content
   FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'gptext-docs', 'apache', '{content_type:*html*}', 'hl=true&hl.fl=content&hl.snippets=3&hl.fragsize=75&rows=1');
-[ RECORD 1 ]--------------------------------------------------------------------------------
id      | http://gptext.docs.pivotal.io/latest/topics/function_ref.html
score   | 0.05978464
content | Remarks
        |
        |
        | When you add an external document to the index, <em>Apache</em> Tika extracts a core set of metadata from the document, the columns listed in the Return type section.
-[ RECORD 2 ]--------------------------------------------------------------------------------
id      | http://gptext.docs.pivotal.io/latest/topics/ext-indexes.html
score   | 1.2426406
content | Solr recognizes document types automatically, using code included from the <em>Apache</em> Tika project.
        |
        |
        | ,See Highlighting at the <em>Apache</em> Solr website to learn more about Solr highlighting and the Unified Highlighter.
        |
        |
        | ,This statement searches the gptext-docs external index for documents containing the term “<em>apache</em>”.
-[ RECORD 3 ]--------------------------------------------------------------------------------
id      | http://gptext.docs.pivotal.io/latest/topics/administering.html
score   | 0.8155949
content | ZooKeeper Administration
        |
        |
        | <em>Apache</em> ZooKeeper enables coordination between the <em>Apache</em> Solr and Pivotal GPText distributed processes through a shared namespace that resembles a file system.
-[ RECORD 4 ]--------------------------------------------------------------------------------
id      | http://gptext.docs.pivotal.io/latest/topics/ha.html
score   | 0.4548784
content | Refer to the <em>Apache</em> SolrCloud documentation for help using the SolrCloud Dashboard.
        |
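The raw hs column carries the markup tags as JSON Unicode escapes (\u003cem\u003e), while the extracted text shows them as <em> tags. The Python sketch below illustrates that decoding on the client side and pulls out the highlighted terms. It is an assumption-laden illustration: highlighted_terms is a hypothetical helper, the sample value is copied from the single-fragment output earlier in this section, and the <em> tag is simply the markup seen in these results.

```python
import json
import re

# Sample hs value copied from the single-fragment search output in this section.
sample_hs = (
    '{"columnValue":[{"name":"content","value":'
    '"Refer to the \\u003cem\\u003eApache\\u003c/em\\u003e SolrCloud '
    'documentation for help using the SolrCloud Dashboard.\\n\\n\\n"}]}'
)


def highlighted_terms(hs_json, field_name):
    """Return the terms wrapped in <em> tags for one field of an hs document."""
    doc = json.loads(hs_json)  # JSON decoding turns \u003c and \u003e into < and >
    terms = []
    for entry in doc.get("columnValue", []):
        if entry.get("name") == field_name:
            terms.extend(re.findall(r"<em>(.*?)</em>", entry.get("value", "")))
    return terms


fragment = json.loads(sample_hs)["columnValue"][0]["value"]
terms = highlighted_terms(sample_hs, "content")
print(fragment.strip())
print(terms)
```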
GPText found four documents (one for each segment) containing the string “apache” and extracted from each document a fragment of the content field containing the term.
GPText Function Reference
The following functions are available in Pivotal GPText.
Indexing
gptext.create_index() – creates an empty index.
gptext.create_index_external() – creates an index for external documents.
gptext.index() – populates an index.
gptext.index_external() – adds documents to an external index.
gptext.index_external_dir() – adds all documents in a directory in an external document source to a GPText external index.
gptext.commit_index() – finalizes index operations.
gptext.enable_terms() – enables term vectors and positions to allow extracting terms and their positions from text fields.
gptext.recreate_error_table() – recreates the error table that records errors that occur while adding documents to an external index.
Authenticating with External Document Sources
gptext.external_login() – logs in to an external document store that requires authentication.
gptext.external_logout() – logs out of an external document store.
Modifying or Deleting an Index
gptext.add_field() – adds a field to an index.
gptext.delete() – deletes documents matching a search query.
gptext.drop_field() – deletes a field from an index.
gptext.drop_index() – deletes an index.
Search
gptext.search() – searches an index.
gptext.search_count() – returns the number of documents that match a search.
gptext.search_external() – searches a GPText external index.
gptext.gptext_retrieve_field() – extracts a single field from the rf search result column as text.
gptext.gptext_retrieve_field_int() – extracts a single field from the rf search result column and converts it to an integer.
gptext.gptext_retrieve_field_float() – extracts a single field from the rf search result column and converts it to a float.
gptext.highlight() – returns search results with the search term highlighted.
gptext.highlight_external() – applies highlighting to search results from external indexes.
Faceted Search
gptext.faceted_field_search() – search, faceted by fields.
gptext.faceted_query_search() – search, faceted by queries.
gptext.faceted_range_search() – search, faceted by defined ranges.
Working With Terms
gptext.enable_terms() – enables term vectors and positions.
gptext.terms() – gets the term vectors for the indexed documents in a Solr index for the specified field.
Configuration and Monitoring
gptext.cluster_status() – shows status of indexes managed by the GPText cluster.
gptext.config_append() – appends the contents of a local file to a ZooKeeper configuration file for an index.
gptext.config_delete() – deletes an index configuration file from ZooKeeper.
gptext.config_get() – displays the contents of a ZooKeeper index configuration file.
gptext.config_list() – lists ZooKeeper configuration files and directories for an index.
gptext.config_upload() – uploads an index configuration file to ZooKeeper.
gptext.index_size() – shows the number of documents indexed and total disk space used for GPText indexes.
gptext.index_status() – shows status of replicas for an index or for all indexes.
gptext.live_nodes() – lists active Solr nodes.
gptext.partition_status() – lists partitioned indexes and child partitions.
gptext.reload_index() – reloads Solr configuration files.
gptext.version() – returns the version of the GPText installation.
gptext.zookeeper_hosts() – returns a list of the ZooKeeper host names and ports.
High Availability
gptext.add_replica() – adds a replica of an index shard.
gptext.delete_replica() – deletes a replica of an index shard.
General Purpose Functions
gptext.count_t() – counts the number of rows in a table.
Privileges
Your privileges to execute the GPText functions depend on your Greenplum Database privileges for the table from which the index is generated. For example, if you have SELECT privileges for a table in the Greenplum database, you have SELECT privileges for an index generated from that table.
Executing index functions requires one of OWNER, SELECT, INSERT, UPDATE, or DELETE privileges, depending on the function. The OWNER is the person who created the table and has all privileges. See the Security section of the GPText User’s Guide for information about setting privileges.
The Privileges required section for each of the GPText functions specifies the privileges required to execute that function.
Usage
The gptext functions in this section must be executed as SQL queries in the form:
SELECT * FROM gptext.function();
The examples in this document use a Greenplum database named demo set up as follows:
A table named articles in the wikipedia schema.
A table named message in the twitter schema.
See Setting Up the Sample Database for details about these tables.
Indexing
Indexing functions create, set up, populate, and finalize (commit) Solr indexes.
gptext.create_index()
Creates an empty Solr index.
Syntax
gptext.create_index(<schema_name>, <table_name>, <id_col_name>, <def_search_col_name> [, <if_check_id_uniqueness>])
or
gptext.create_index(<schema_name>, <table_name>, <p_columns>, <p_types>, <id_col_name>, <def_search_col_name> [, <if_check_id_uniqueness>])
Parameters
<schema_name>
The name of the schema in the Greenplum database.
<table_name>
The name of the table in the Greenplum database. If the table is partitioned this must be the name of the root table.
<p_columns>
A text array containing the names of the table columns to index. If <p_columns> and <p_types> are omitted, all table columns are indexed.
The columns must be valid columns in the table. The columns identified by the <id_col_name> and <def_search_col_name> must be included in the array.
If the <p_columns> parameter is supplied, the <p_types> parameter must also be supplied. The sizes of the <p_columns> and <p_types> arrays must be the same.
<p_types>
A text array containing the Solr data types of the columns in the <p_columns> array.
Text types can be mapped to the name of an analyzer chain, for example <text_intl>, <text_sm>, or any type defined in the <managed_schema>. See Map Greenplum Database Data Types to Solr Data Types for equivalent Solr data types for other Greenplum types.
The <p_types> parameter must be supplied if the <p_columns> parameter is supplied.
<id_col_name>
The name of a column in <table_name> that is unique for each row. The column must be of type int4, int8, varchar, text, or uuid.
<def_search_col_name>
The name of the default column to search in <table_name>, if no other column is named in a query.
<if_check_id_uniqueness>
Optional. A Boolean value. The default is true. Set to false to index a table with a non-unique ID field.
Return type
boolean
Privileges required
Only the OWNER can execute this function.
Remarks
A GPText index is a Solr collection.
The contents of the <id_col_name> column should, in most cases, be a unique ID for each row. It must be of type int4, int8, varchar, or text.
If the <if_check_id_uniqueness> argument is true, the default, a document with an ID matching an existing ID cannot be added to the index.
If the <if_check_id_uniqueness> argument is false, documents with duplicate IDs are allowed to be added to the index. The content of other fields may or may not be the same as existing documents with the same ID. When a query returns multiple documents with the same ID, it is the user’s responsibility to anticipate and handle the multiple documents. For example, a table could have a revision column that is incremented when a new version of a document is added to the index, allowing queries that omit all but the most recent version from search results.
The name of the index created has the format:
<database_name>.<schema_name>.<table_name>
When the table is partitioned, the GPText index created for the table will contain records for all partitions. If you specify the name of a subpartition table in this function, an error is returned. The index records for documents added to the index have a __partition field containing the name of the child partition table. See Searching Partitioned Tables for syntax to search by partitions.
Populate the new index with gptext.index().
The number of replicas for each shard is determined when the index is created. It is the value of the gptext.replication_factor server configuration parameter, 2 by default.
If the gptext.failover_factor server configuration parameter is set, gptext.create_index() fails if the ratio of the number of GPText nodes that are up to the total number of GPText nodes is less than the gptext.failover_factor value (from 0.0 to 1.0). Index shards can only be created on active GPText nodes, so the gptext.failover_factor parameter prevents overloading the active GPText nodes when too many nodes are down.
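The failover check reduces to a single ratio comparison. The Python sketch below restates the rule for illustration only; create_index_allowed is a hypothetical function, not GPText code, and the node counts are made-up inputs.

```python
def create_index_allowed(nodes_up, nodes_total, failover_factor):
    """Apply the stated rule: creation fails when up/total is below the factor."""
    if nodes_total <= 0:
        raise ValueError("nodes_total must be positive")
    if not 0.0 <= failover_factor <= 1.0:
        raise ValueError("failover_factor must be between 0.0 and 1.0")
    return nodes_up / nodes_total >= failover_factor


# Hypothetical cluster of four GPText nodes with a failover factor of 0.5.
print(create_index_allowed(3, 4, 0.5))  # three of four nodes up
print(create_index_allowed(1, 4, 0.5))  # only one of four nodes up
```

With a factor of 0.5 on four nodes, index creation proceeds while at least two nodes are up and fails once fewer than two remain.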
To index a partitioned table, specify the name of the root table. The gptext.index() function returns an error if you specify the name of a child partition table.
Examples
Create an index, demo.wikipedia.articles, with content as the default search field.
=# SELECT * FROM gptext.create_index('wikipedia', 'articles', 'id', 'content');
Create an index, demo.wikipedia.articles, with content as the default search field. Index the id, content, and title fields.
=# SELECT * FROM gptext.create_index('wikipedia', 'articles', '{"id", "content", "title"}', '{"long", "text", "text"}', 'id', 'content');
gptext.create_index_external()
Creates an empty Solr index for external documents.
Syntax
gptext.create_index_external(<index_name>)
Parameters
<index_name>
The name of the index to create. The name cannot contain periods (.).
Notes
A GPText external index is a Solr index for documents external to Greenplum Database, for example, PDF, Microsoft Word, XML, and HTML files. Unlike regular GPText indexes, external indexes are not associated with a Greenplum Database table, but they can be searched with GPText search functions.
Example
=# SELECT * FROM gptext.create_index_external('gptext-docs');
gptext.enable_terms()
Enables term vectors and positions to allow extracting terms and their positions from fields of data type text.
Syntax
gptext.enable_terms(<index_name>, <field_name>)
Parameters
<index_name>
The name of the index for which you want to enable terms.
<field_name>
The name of the field for which you want to enable terms.
Return type
boolean
Privileges required
Only the OWNER can execute this function.
Remarks
Solr can mark terms and their positions in documents when indexing. This capability is disabled by default. Use gptext.enable_terms() to enable the capability.
Call gptext.enable_terms() for each field where you want to enable terms.
After calling this function, you must index or re-index with gptext.index().
Examples
=# SELECT * FROM gptext.enable_terms('demo.twitter.message', 'message_text');
WARNING:  Enable terms for field: message_text of index: demo.twitter.message successfully. Reindex data needed.
 enable_terms
--------------
 t
(1 row)

=# SELECT * FROM gptext.index(TABLE(SELECT * FROM twitter.message), 'demo.twitter.message');
 dbid | num_docs
------+----------
    3 |      947
    2 |     1020
(2 rows)

=# SELECT * FROM gptext.commit_index('demo.twitter.message');
 commit_index
--------------
 t
(1 row)
gptext.index()
Populates an index by indexing data in a table.
Syntax
gptext.index(TABLE(SELECT * FROM <table_name>), <index_name>)
Parameters
TABLE(SELECT * FROM <table_name>)
The table to be indexed, with data type anytable.
<index_name>
Name of the index that was created with gptext.create_index() and is to be populated.
Return type
SETOF dbid INT, num_docs BIGINT
where dbid is the dbid of the segment that the documents were sent to, and num_docs is the number of documents that were indexed.
Privileges required
You must have the INSERT or UPDATE privilege to execute this function.
Remarks
<index_name> must have been created with gptext.create_index().
The arguments to the gptext.index() function must be expressions. For example, TABLE(SELECT * FROM articles) creates a “table-valued expression” from the articles table, using the table function TABLE.
You can selectively index or update by changing the inner select list in the query.
After successfully indexing, you must commit the index with gptext.commit_index(<index_name>).
The output is a two-column table with dbid (the Greenplum segment ID) and num_docs (the number of documents added to the index for that segment) as the columns.
Note: Be careful about distribution policies:
The first parameter of gptext.index() is TABLE(SELECT * FROM <table_name>). The query in this parameter should have the same distribution policy as the table you are indexing. There are two cases where the query will not have the same distribution policy:
1. Your query is a join of two tables.
2. You are indexing an intermediate table (staging table) that is distributed differently than the final table.
When the distribution policies differ, you must specify "SCATTER BY" for the query like this:
TABLE(SELECT * FROM messages SCATTER BY <distrib_id>),
where <distrib_id> is the distribution id used when you created your primary/final table.
Example
=# SELECT * FROM gptext.index(TABLE(SELECT * FROM wikipedia.articles), 'demo.wikipedia.articles');
 dbid | num_docs
------+----------
    3 |        6
    2 |        5
(2 rows)
gptext.index_external()
Adds documents stored outside of Greenplum Database to a GPText external index.
Syntax
gptext.index_external(<url-list>, <index-name>)
Parameters
<url-list>
A list of URLs for documents to add to the GPText external index. The URLs may be expressed as an array or as a table-valued expression.
<index-name>
The name of the index to which the documents are to be added.
Remarks
If the document cannot be retrieved at the given URL, or an error occurs while indexing the document, GPText inserts a row in the gptext.error_table table. You can use gptext.recreate_error_table() to create an empty error table before you call gptext.index().
The value of the GPText custom server parameter gptext.idx_segment_error_limit (default 10) is the number of errors that can occur on any one segment before the indexing operation is canceled.
When adding a document to an external index, GPText calculates a 256-bit hash on the contents of the document. The hash is stored as a 64-byte hexadecimal value in the sha256 field. If you later add a document with a URL matching an existing document in the index, the new document is only added to the index if the newly calculated hash differs from the current value in the sha256 field.
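The re-indexing rule can be pictured as a comparison of content hashes keyed by URL. The Python sketch below only illustrates that rule and is not GPText's implementation: it assumes the hash is a standard SHA-256 digest written as uppercase hexadecimal (as in the sha256 value shown in the search output earlier in this guide), and it models the index as a plain dictionary. The names content_hash and should_add are hypothetical.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """Return the SHA-256 digest of the content as 64 uppercase hex characters."""
    return hashlib.sha256(content).hexdigest().upper()


def should_add(index: dict, url: str, content: bytes) -> bool:
    """Add or refresh a document only when its content hash has changed.

    `index` maps a URL to the sha256 value recorded for it (a stand-in
    for the real external index).
    """
    new_hash = content_hash(content)
    if index.get(url) == new_hash:
        return False  # same URL, same content: nothing to do
    index[url] = new_hash
    return True


index = {}
url = "http://example.com/doc.html"  # placeholder URL for illustration
print(should_add(index, url, b"first version"))   # new URL
print(should_add(index, url, b"first version"))   # unchanged content
print(should_add(index, url, b"second version"))  # changed content
```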
Examples
This example adds a single PDF document to the index gptext-docs.
=# SELECT * FROM gptext.index_external(
       '{http://gptext.docs.pivotal.io/archives/GPText-docs-213.pdf}',
       'gptext-docs');
 dbid | num_docs
------+----------
    3 |        0
    2 |        1
(2 rows)
This example adds multiple HTML documents to the gptext-docs external index by selecting URLs from a database table. Errors will be logged in the gptext.gptext_docs_errors table.
=# DROP TABLE IF EXISTS gptext_html_docs;
=# CREATE TABLE gptext_html_docs (
       id bigint,
       url text)
   DISTRIBUTED BY (id);
CREATE TABLE
=# INSERT INTO gptext_html_docs VALUES
       (1, 'http://gptext.docs.pivotal.io/latest/topics/administering.html'),
       (2, 'http://gptext.docs.pivotal.io/latest/topics/ext-indexes.html'),
       (3, 'http://gptext.docs.pivotal.io/latest/topics/function_ref.html'),
       (4, 'http://gptext.docs.pivotal.io/latest/topics/guc_ref.html'),
       (5, 'http://gptext.docs.pivotal.io/latest/topics/ha.html'),
       (6, 'http://gptext.docs.pivotal.io/latest/topics/index.html'),
       (7, 'http://gptext.docs.pivotal.io/latest/topics/indexes.html'),
       (8, 'http://gptext.docs.pivotal.io/latest/topics/intro.html'),
       (9, 'http://gptext.docs.pivotal.io/latest/topics/managed-schema.html'),
       (10, 'http://gptext.docs.pivotal.io/latest/topics/performance.html'),
       (11, 'http://gptext.docs.pivotal.io/latest/topics/queries.html'),
       (12, 'http://gptext.docs.pivotal.io/latest/topics/type_ref.html'),
       (13, 'http://gptext.docs.pivotal.io/latest/topics/upgrading.html'),
       (14, 'http://gptext.docs.pivotal.io/latest/topics/utility_ref.html'),
       (15, 'http://gptext.docs.pivotal.io/latest/topics/installing.html');
INSERT 0 15
=# SELECT * FROM gptext.index_external(
       TABLE(SELECT url FROM gptext_html_docs),
       'gptext-docs', 'gptext_docs_errors');
 dbid | num_docs
------+----------
    3 |        6
    2 |        8
(2 rows)

=# SELECT * FROM gptext.gptext_docs_errors;
-[ RECORD 1 ]-------------------------------------------------------------------------------------------
error_time | 2000-01-01 00:25:11.282769
index_name | gptext-docs
sqlcmd     |
errmsg     | Code: RUNTIME_ERROR, Message: 'http://gptext.docs.pivotal.io/210/topics/ext-indexes.html.'
rawdata    | http://gptext.docs.pivotal.io/latest/topics/ext-indexes.html
rawbytes   |

=# SELECT * FROM gptext.commit_index('gptext-docs');
 commit_index
--------------
 t
(1 row)
gptext.index_external_dir()
Adds all documents in a directory in an external document source to a GPText external index.
Syntax
gptext.index_external_dir(<directory_url>, <index_name>)
Parameters
<directory_url>
The URL for the directory with documents to add to the index.
<index_name>
The name of the GPText external index to which the documents are to be added.
Remarks
The gptext.index_external_dir() function adds the documents in a directory and its subdirectories to a GPText external index.
Log in to the document source using the gptext.external_login() function before you call the gptext.index_external_dir() function.
If you specify a file instead of a directory, an error is added to the gptext.error_table table.
The ID for each file added to the index is the URL for the file in the external document source.
The Apache Tika library discovers the content_type for each file.
The user who logs in to the external document source must have read permissions on the directory. The gptext.index_external_dir() function adds to the index only those documents and subdirectories that the user has permission to read.
Example
This example adds documents from an hdfs store to the GPText webdocs external index.
=# SELECT * FROM gptext.external_login('hdfs', 'hdfs://myhadoop:9000', 'myhadoop');
 external_login
----------------
 t
(1 row)

=# SELECT * FROM gptext.index_external_dir('hdfs://myhadoop:9000/gptext_web/public/230/', 'webdocs');
 num_docs
----------
       37
(1 row)

=# SELECT * FROM gptext.commit_index('webdocs');
 commit_index
--------------
 t
(1 row)

=# SELECT * FROM gptext.external_logout('hdfs');
 external_logout
-----------------
 t
(1 row)
gptext.commit_index()
Finishes an index operation. The results of an indexing operation are not available until this function is called for the index.
Syntax
gptext.commit_index(<index_name>)
Parameters
<index_name>
The name of the index to commit. If the table is partitioned this must be the name of the root table.
Return type
boolean
Privileges required
You must have the INSERT, UPDATE, or DELETE privilege to execute this function.
Remarks
Must be called after gptext.index() and gptext.delete().
Example
=# SELECT * FROM gptext.commit_index('demo.wikipedia.articles');
 commit_index
--------------
 t
(1 row)
gptext.recreate_error_table()
Drops and recreates the gptext.error_table database table.
Syntax
gptext.recreate_error_table()
Return type
boolean
Remarks
If an error occurs while accessing a document to add to a GPText external index, GPText adds a record to the gptext.error_table table. See gptext.error_table for a description of this table. Users should not drop or modify the table.
Rows added to the gptext.error_table table remain until you use the gptext.recreate_error_table() function to create a new empty table.
If you attempt to execute gptext.recreate_error_table() while the table is in use for an indexing operation, a warning is raised and the function returns false without recreating the table.
Examples
=# SELECT gptext.recreate_error_table();
 recreate_error_table
----------------------
 t
(1 row)
Authenticating with External Document Sources
To add documents from a document source that requires authentication, such as Hadoop hdfs or an ftp server, log in before adding documents to a GPText index and log out when done.
gptext.external_login()
Logs in to an external document source before adding documents from the source to a GPText external index.
Syntax
gptext.external_login(<type>, <url>, <config-name>)
Parameters
<type>
Identifies the type of the external document source. The types 'ftp' and 'hdfs' are supported. The type is not case-sensitive.
<url>
The URL of the external document source.
<config-name>
The name of the configuration uploaded with the gptext-external upload utility command.
Remarks
You can log in to only one external document source at a time.
Use the gptext-external list command to list the configurations that have been uploaded.
Example
Log in to an hdfs file system using the myhdfs configuration.
=# SELECT * FROM gptext.external_login('HDFS', 'hdfs://198.51.100.23:19000', 'myhdfs');
Log in to an ftp server using the myftp configuration.
=# SELECT gptext.external_login('ftp', 'ftp://198.51.100.23', 'myftp');
gptext.external_logout()
Logs out of an external document source after adding documents from the source to a GPText external index.
Syntax
gptext.external_logout(<type>)
Parameters
<type>
Identifies the type of the external document source. The supported types are 'ftp' and 'hdfs'. The type is not case-sensitive.
Remarks
You can log in to only one external document source at a time. To index documents from another source, you must first call gptext.external_logout() and then log in to the new source with gptext.external_login().
Example
Log out of an HDFS file system.
=# SELECT * FROM gptext.external_logout('HDFS');
Log out of an FTP server.
=# SELECT gptext.external_logout('ftp');
Modifying or Deleting an Index
You can change an index by adding or dropping fields, reverting an index to its previous state, or deleting the index.
gptext.add_field()
Adds a field to your schema if the field was added to the database after the index was created.
Syntax
gptext.add_field(<index_name>, <field_name>[, <is_default_search_col>[, <if_enable_terms>]])
Parameters
<index_name>
The name of the index to which you want to add the field. If the table is partitioned, this must be the name of the root table.
<field_name>
The name of the field to be indexed.
<is_default_search_col>
Optional. Boolean value. Specifies whether this field becomes the default search column (field).
<if_enable_terms>
Optional. Boolean value. Enables terms support on this field when it is added to the GPText index.
Return type
SETOF boolean
Privileges required
Only the OWNER can execute this function.
Remarks
Call this function for each field you add.
Before and after you add one or more fields, reload the Solr configuration files with gptext.reload_index(). The initial reload_index() call is required because of Solr 4.0 behavior and may not be required in subsequent versions.
After you add one or more fields, you should reindex the table and commit the index with gptext.commit_index().
Example
Adds the field external_links to the index, reloading the index configuration before and after the change, then commits the index.
=# ALTER TABLE wikipedia.articles ADD external_links text;
ALTER TABLE
=# SELECT * FROM gptext.reload_index('demo.wikipedia.articles');
 reload_index
--------------
 t
(1 row)
=# SELECT * FROM gptext.add_field('demo.wikipedia.articles', 'external_links', false, false);
INFO:  Add field: external_links for index: demo.wikipedia.articles
 add_field
-----------
 t
(1 row)
=# SELECT * FROM gptext.reload_index('demo.wikipedia.articles');
 reload_index
--------------
 t
(1 row)
=# SELECT * FROM gptext.commit_index('demo.wikipedia.articles');
 commit_index
--------------
 t
(1 row)
gptext.delete()
Deletes all documents that match the search query.
Syntax
gptext.delete(<index_name>, <query>)
Parameters
<index_name>
The name of the index.
<query>
Documents matching this query will be deleted. To delete all documents use the query '*' or '*:*'.
Return type
boolean
Privileges required
You must have the DELETE privilege to execute this function.
Remarks
After a successful delete, commit the index using gptext.commit_index(<index_name>).
Examples
Delete all documents containing the word "unverified" in the default search field:
=# SELECT * FROM gptext.delete('demo.wikipedia.articles', 'unverified');
 delete
--------
 t
(1 row)
=# SELECT * FROM gptext.commit_index('demo.wikipedia.articles');
 commit_index
--------------
 t
(1 row)
Delete all documents from the index:
=# SELECT * FROM gptext.delete('demo.wikipedia.articles', '*:*');
 delete
--------
 t
(1 row)
=# SELECT * FROM gptext.commit_index('demo.wikipedia.articles');
 commit_index
--------------
 t
(1 row)
gptext.drop_field()
Removes a field from your schema.
Syntax
gptext.drop_field(<index_name>, <field_name>)
Parameters
<index_name>
The name of the index from which to drop the field. If the table is partitioned, this must be the name of the root table.
<field_name>
The name of the field to drop.
Return type
boolean
Privileges required
Only the OWNER can execute this function.
Remarks
Call this function for each field you drop.
Before and after dropping one or more fields, you must reload the Solr configuration files with gptext.reload_index(), then commit the index with gptext.commit_index().
The column __partition in indexes for partitioned database tables cannot be dropped.
The initial reload_index() is required by Solr 4.0 behavior and may not be necessary in subsequent versions.
Example
Drops the field external_links from the index.
=# SELECT * FROM gptext.reload_index('demo.wikipedia.articles');
 reload_index
--------------
 t
(1 row)
=# SELECT * FROM gptext.drop_field('demo.wikipedia.articles', 'external_links');
INFO:  Drop field: external_links for index: demo.wikipedia.articles
 drop_field
------------
 t
(1 row)
=# SELECT * FROM gptext.reload_index('demo.wikipedia.articles');
 reload_index
--------------
 t
(1 row)
=# SELECT * FROM gptext.commit_index('demo.wikipedia.articles');
 commit_index
--------------
 t
(1 row)
gptext.drop_index()
Removes an index.
Syntax
gptext.drop_index(<index_name>)
Parameters
<index_name>
The name of the index to drop. If the database table is partitioned, this must be the name of the root table.
Return type
boolean
Privileges required
Only the OWNER can execute this function.
Remarks
A dropped index cannot be recovered.
Example
=# SELECT * FROM gptext.drop_index('demo.wikipedia.articles');
 drop_index
------------
 t
(1 row)
Search
Search functions enable querying an index.
Changing the query parser at query time
When using the search functions, you can change the query parser used by Solr at query time. A different query parser may be required, depending on the nature of the query. See Using Advanced Querying Options for a list of the query parsers GPText supports.
To change the query parser at query time, use the defType Solr option with the gptext.search() function.
To change the query parser for any search function at query time, use the Solr localParams syntax, replacing the <query> term with '{!type=edismax}<query>'.
With the GPText Universal Query Parser, you can use features from any of the other supported query parsers in a single query. To use the Universal Query Parser, replace the <query> term with '{!gptextqp}<query>'. See Using the Universal Query Parser for information and examples.
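The localParams prefixes described above are plain string prefixes on the query text, so client code can add them before passing the query to a search function. The following Python sketch is illustrative only: the helper name with_query_parser is hypothetical and is not part of GPText.

```python
def with_query_parser(query: str, parser: str = "edismax") -> str:
    """Prefix a Solr query with a localParams parser selector.

    Hypothetical client-side helper, not a GPText function. It builds the
    two prefix forms shown in the text: '{!type=<parser>}<query>' and
    '{!gptextqp}<query>' for the Universal Query Parser.
    """
    if parser == "gptextqp":
        return "{!gptextqp}" + query
    return "{!type=" + parser + "}" + query


# The resulting string is what you would pass as the <search_query> argument.
edismax_query = with_query_parser("solar energy")
universal_query = with_query_parser("iphone", "gptextqp")
```

The returned strings still need to be quoted or bound as a parameter when they are placed in a SQL statement.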
gptext.search()
Searches an index.
Syntax
gptext.search(<src_table>, <index_name>, <search_query>, <filter_queries>[, <options>])
Parameters
<src_table>
Specifies a SELECT statement on an existing, indexed table on which to perform the search. The <src_table> parameter is an anytable data type, specified in this format:
TABLE(SELECT * FROM <src_table>)
<index_name>
The name of the index to search. If the database table is partitioned, you can specify the name of a sub-partition table to search.
<search_query>
Text value containing a Solr text search query.
<filter_queries>
A text array of filter queries, if any. If none, set this parameter to null.
<options>
An optional ampersand-delimited list of Solr query parameters. See Solr options.
Return type
SETOF gptext.search_scored_result
This is a composite type with the following columns:
Column            Type
id                text
score             double precision
hs (conditional)  gptexthstore
rf (conditional)  text
The id column is returned as text, even if the <id_col> specified in the gptext.create_index() function is an integer type. If you order results by id or join search results with the original table on id, you must cast the returned id column to the correct integer type in your query. For example, the following search query casts the id returned by the search query to an INT8 type to join with the numeric id column in the wikipedia.articles table and to sort the results numerically. However, the id column in the results is a text value and is therefore displayed left-justified.
SELECT s.id, s.score, a.title
FROM wikipedia.articles a,
     gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.wikipedia.articles', '*:*', null) s
WHERE s.id::INT8 = a.id
ORDER BY s.id::INT8;
 id       | score | title
----------+-------+---------------------------
 25784    |     1 | Renewable energy
 27743    |     1 | Solar energy
 54838    |     1 | Biogas
 55017    |     1 | Fusion power
 65499    |     1 | Soil salinity
 113728   |     1 | Geothermal energy
 213555   |     1 | Solar updraft tower
 533423   |     1 | Solar water heating
 608623   |     1 | Ethanol fuel
 855056   |     1 | Forward osmosis
 2008322  |     1 | Vehicle-to-grid
 2120798  |     1 | Lithium economy
 2988035  |     1 | Vortex engine
 4711003  |     1 | Osmotic power
 7906908  |     1 | Biomass
 13021878 |     1 | Geothermal power
 13690575 |     1 | Solar power
 14090587 |     1 | Low-carbon power
 14205946 |     1 | Algae fuel
 18965585 |     1 | Pressure-retarded osmosis
 22391303 |     1 | Liquid nitrogen engine
 26787212 |     1 | Reverse electrodialysis
 53716476 |     1 | Seaweed fuel
(23 rows)
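To see why the cast matters, compare text ordering with numeric ordering. The Python sketch below is an illustration only: it does not call GPText, it just sorts a few of the id values from the output above as strings and as integers.

```python
# A few id values from the example output, held as text the way
# gptext.search() returns them.
ids_as_text = ["25784", "113728", "2008322", "54838"]

# Sorting text compares character by character, so "113728" sorts
# before "25784" because "1" < "2".
text_order = sorted(ids_as_text)

# Converting to integers first (the analogue of id::INT8 in SQL)
# gives the numeric order.
numeric_order = sorted(ids_as_text, key=int)

print(text_order)
print(numeric_order)
```

The same difference applies to ORDER BY s.id versus ORDER BY s.id::INT8 in the SQL query.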
If the <options> parameter is included in gptext.search(), the result includes the offsets column. This column contains key-value pairs, where the key is the column name and the value is a comma-separated list of offsets to locations where the search term occurs. This data is used by the gptext.highlight() function to add highlighting tags to the column data. If highlighting is not enabled with the 'hl=true' option, the offsets column is NULL.
If the fl option is included in the <options> parameter to specify additional fields to add to the result, the rf column contains the additional fields in a formatted text value. The gptext.gptext_retrieve_field() function can be used to extract a single field value from the rf column. There are variants of the gptext.gptext_retrieve_field() function to retrieve integer and float values from the rf column value.
Privileges required
You must have the SELECT privilege to execute this function.
Solr options
Solr queries allow the following optional refinements, specified as an ampersand-delimited list in the options parameter.
defType
The name of the query parser to use for this query.
Example: defType=edismax
rows
The maximum number of rows to return per segment. If omitted, all rows are returned.
Example: rows=100 returns 100 rows per segment, or all rows if there are fewer than 100.
sort
Sorts on a field or score in ascending or descending order.
Examples:
sort=score desc (default if no sort defined)
sort=date_time asc
sort=date_time asc score desc sorts on date_time ascending, then on score descending
start
The number of the first record to return.
Examples:
start=0 default: returned records start with record 0
start=25 returned records start with record 25
hl
Enable highlighting.
Example: hl=true
hl.fl
Comma-separated list of field names to consider when highlighting.
Examples:
hl.fl=message_text
hl.fl=title,content
fl
Comma-separated list of fields to include in search results. The fields must have been set to stored=true in the managed-schema for the index.
Example: fl=title,refs
Remarks
The output includes a table with columns id (the ID named in gptext.create_index()) and score (the tf-idf score). A column named offsets is included if highlighting is specified in the options parameter. A column named rf is included when a list of additional fields to include is specified in the <options> parameter.
To change the query parser at query time, specify the defType option in the options parameter list. For example, setting the options parameter to 'rows=100&defType=edismax' limits the output to 100 rows per segment and changes the query parser to edismax.
The TABLE query is planned and affects the estimate for gptext.search(), but does not execute. For example, if your query includes
gptext.search(TABLE(SELECT * FROM t), ...)
the query planner estimates the number of results as the number of rows in t. This can cause the query planner to ignore the use of an index scan. Use a query like TABLE(SELECT 1 SCATTER BY 1) to avoid this issue.
If you do not specify options, gptext.search() returns all rows.
The options separator has changed from comma to ampersand (&) in order to support highlighting. If you do not use highlighting, you can revert to using the comma separator by setting the gptext.search_param_separator to ','.
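Because the <options> argument is a single delimited string, it can be convenient to assemble it from name-value pairs in client code. The following Python sketch is illustrative: build_options is a hypothetical helper, not a GPText function, and it does no escaping, so it assumes that option values do not contain the separator character.

```python
def build_options(params: dict, separator: str = "&") -> str:
    """Join Solr option name/value pairs into one options string.

    Hypothetical helper. The default separator is the ampersand described
    above; pass "," only if gptext.search_param_separator has been set
    to ','. Values are assumed not to contain the separator.
    """
    return separator.join(f"{name}={value}" for name, value in params.items())


# Reproduces the options string used in the remark above.
parser_options = build_options({"rows": 100, "defType": "edismax"})

# An options string that enables highlighting on one field.
highlight_options = build_options(
    {"rows": 5, "hl": "true", "hl.fl": "message_text"}
)
```

The resulting string is passed as the last argument of gptext.search(), for example 'rows=100&defType=edismax'.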
The Solr option rows specifies the maximum number of rows to return per segment. For example, if you have four segments, 'rows=100' returns at most a total of 400 rows. To limit the number of rows returned for an entire query, set a LIMIT in the SQL query. For example, the following query returns at most 20 rows:
=# SELECT t.id, q.score, t.message_text
   FROM twitter.message t,
        gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.twitter.message', '*:*', null, null) q
   WHERE t.id = q.id::int8
   LIMIT 20;
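The interaction between the per-segment rows option and the SQL LIMIT is simple arithmetic. The Python sketch below only illustrates that upper bound; the function name is hypothetical and the calculation assumes every segment can supply its full rows quota.

```python
from typing import Optional


def max_rows_returned(segments: int, rows_per_segment: int,
                      sql_limit: Optional[int] = None) -> int:
    """Upper bound on the rows a search query can return.

    rows_per_segment models the Solr 'rows' option, which applies to each
    segment separately. sql_limit models a LIMIT clause, which applies to
    the whole query. Hypothetical helper for illustration only.
    """
    upper_bound = segments * rows_per_segment
    if sql_limit is None:
        return upper_bound
    return min(upper_bound, sql_limit)


# Four segments with rows=100: at most 400 rows in total.
without_limit = max_rows_returned(4, 100)

# The same search with LIMIT 20 in the SQL query: at most 20 rows.
with_limit = max_rows_returned(4, 100, sql_limit=20)
```

The actual row count can be lower when segments hold fewer matching documents than the rows value.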
The gptexthstore type is a limited form of the Postgres hstore type. It only has the hstore input and output functions implemented, as gptext_hstore_in and gptext_hstore_out.
Examples
Runs a GPText query that looks for Wikipedia articles that contain the term "optimization", and joins the results to the original Greenplum Database articles table:
=# SELECT a.id, a.title, q.score
   FROM wikipedia.articles a,
        gptext.search(TABLE(SELECT * FROM wikipedia.articles), 'demo.wikipedia.articles', 'optimization', null) q
   WHERE a.id = q.id::int8
   ORDER BY score DESC;
    id    |           title           |   score
----------+---------------------------+------------
   213555 | Solar updraft tower       |  1.5528862
 18965585 | Pressure-retarded osmosis | 0.89540845
  7906908 | Biomass                   |  0.8692636
    25784 | Renewable energy          |  0.7473389
   533423 | Solar water heating       |  0.7186527
   608623 | Ethanol fuel              |  0.6943706
    27743 | Solar energy              |  0.6943706
  2008322 | Vehicle-to-grid           |  0.6352812
    55017 | Fusion power              |  0.6347449
 14205946 | Algae fuel                |  0.6347449
 13690575 | Solar power               | 0.58286035
(11 rows)
Returns 5 rows from each segment with the text "iphone" highlighted in the message_text column. This example requires that you have enabled terms on the message_text field in the demo.twitter.message table. See the example in the gptext.enable_terms() reference.
=# SELECT t.id, gptext.highlight(t.message_text, 'message_text', s.hs) message
   FROM twitter.message t,
        gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'demo.twitter.message', '{!gptextqp}iphone', null, 'rows=5&hl=true&hl.fl=message_text') s
   WHERE t.id = s.id::int8;
    id    | message
----------+---------------------------------------------------------------------------------------------------------------------
 19714120 | @ayee_Eddy2011 I love pancakes too! #<em>iPhone</em> #app
 19284329 | #nowplaying on my <em>iPhone</em>: Daft Punk - "Digital Love"
 19416451 | I'm in love with my new <em>iPhone</em>(:
 20257190 | Love my #<em>iphone</em> - only problem now? I want an #Ipad!
 20759274 | Dropped frutopia on : My phone... #ciaowaterdamage I hate <em>iPhones</em>.
 20473459 | i love my iphone4 but I'm excited to see what the iphone5 has to offer #gadgets #<em>iphone</em> #apple #technology
 19424811 | I hate : <em>iPhones</em>:
 20663075 | RT @indigoFKNvanity: I hate the autocorrect on <em>iPhones</em>!!!!!!!!!
 20350436 | I absolutely love how fast this phone works. Love the <em>iPhone</em>.
 20042822 | @KDMC23 ohhhh!!! I hate <em>Iphone</em> Talk!
(10 rows)
gptext.search_count()
Returns the number of documents that match the search query.
Syntax
gptext.search_count(<index_name>, <search_query>, <filter_queries>[, <options>])
Parameters
<index_name>
The name of the index.
<search_query>
The search query.
<filter_queries>
A comma-delimited array of filter queries, if any. If none, set this parameter to null.
<options>
An optional ampersand-delimited list of Solr query parameters. See Solr options.
Return type
bigint
Privileges required
You must have the SELECT privilege to execute this function.
Example
=# SELECT * FROM gptext.search_count('demo.wikipedia.articles', 'bubble', null);
 count
-------
     3
(1 row)
gptext.search_external()
Searches a GPText external index.
Syntax
gptext.search_external(<table-exp>, <index_name>, <search_query>, <filter_queries>[, <options>])
Parameters
<table-exp>
A table-valued expression. Because external indexes are not associated with a database table, this parameter is ignored. An expression like the following is sufficient:
TABLE(SELECT 1 SCATTER BY 1)
<index_name>
The name of the GPText external index to search.
<search_query>
Text value containing a Solr text search query.
<filter_queries>
A text array of filter queries, if any. If none, set this parameter to null.
<options>
An optional ampersand-delimited list of Solr query parameters. See Solr options.
Return type
SETOF gptext.search_external_result
This type has the following columns:
Column         Type
id             text
title          text
subject        text
description    text
comments       text
author         text
keywords       text
category       text
resourcename   text
url            text
content_type   text
last_modified  text
links          text
sha256         text
content        text
score          double precision
meta           text
The last column, meta, is present only if the optional <options> argument is included in the search.
Remarks
When you add an external document to the index, Apache Tika extracts a core set of metadata from the document, the columns listed in the Return type section. If any of these core metadata values are not present or do not exist in the document type, the value of the column in the result row is null.
If the <options> argument is supplied, the results contain an additional text column named meta. The meta column contains additional document-type-specific metadata. You can use the gptext.gptext_retrieve_field() function and its variants to extract individual metadata values by name from the meta column.
If the <options> argument contains the fl=<list> Solr option, Solr returns values only for the columns included in <list> and the id, score, and meta columns. Other columns in the result set will have null values. It is more efficient to filter out columns in Solr than to retrieve all columns from Solr and then choose a subset of columns in the SQL SELECT statement.
Examples
Finds HTML documents containing the term "facet".
=# \x on
=# SELECT id, title
   FROM gptext.search_external(TABLE(SELECT 1 SCATTER BY 1), 'gptext-docs', 'facet', '{content_type:*html*}');
-[ RECORD 1 ]--------------------------------------------------------
id    | http://gptext.docs.pivotal.io/latest/topics/function_ref.html
title | GPText Function Reference || Pivotal GPText Docs
-[ RECORD 2 ]--------------------------------------------------------
id    | http://gptext.docs.pivotal.io/latest/topics/queries.html
title | Querying GPText Indexes || Pivotal GPText Docs
-[ RECORD 3 ]--------------------------------------------------------
id    | http://gptext.docs.pivotal.io/latest/topics/guc_ref.html
title | GPText Configuration Parameters || Pivotal GPText Docs
Although just two columns are in this result set, the data Greenplum Database receives from Solr includes all core metadata fields, including the content field, which contains the full text of the document. The next example shows how to limit the data transferred from Solr to Greenplum Database with the Solr options argument.
This example lists the id, title, sha256, and score columns from the core metadata and extracts meta_creation_date from the additional metadata supplied for PDF documents in the gptext-docs external index. The fl=title,sha256 Solr option prevents Solr from transferring unneeded fields from the index to Greenplum Database. (The id and score columns are always transferred.)
=# \x on
=# SELECT id, title, sha256, score, gptext.gptext_retrieve_field(meta, 'meta_creation_date') created
   FROM gptext.search_external(TABLE(SELECT 1 SCATTER BY 1), 'gptext-docs', '*:*', '{content_type:*pdf*}', 'fl=title,sha256');
-[ RECORD 1 ]-------------------------------------------------------------
id      | http://gptext.docs.pivotal.io/archives/GPText-docs-213.pdf
title   | Pivotal GPText 2.1.3 Documentation | Pivotal GPText Docs
sha256  | 2E063DF5037B9ACC6E180681AE6838077BC5F7A362B4A1E67D9D8FF3E4DD7F3D
score   | 1
created | 2017-09-22T17:22:54Z, 2017-09-22T17:22:54Z
-[ RECORD 2 ]-------------------------------------------------------------
id      | http://gpdb.docs.pivotal.io/latest/pdf/GPDB510Docs.pdf
title   | Version 5.1.0
sha256  | AF0B71D032C99A6BE817E1FA2FB774EB7B4D47D75A755ABF54F4F60FEBB92FF7
score   | 1
created | 2017-10-31T19:12:41Z, 2017-10-31T19:12:41Z
gptext.gptext_retrieve_field()
Retrieves a single field from the rf or meta search result column as a text value.
Syntax
gptext.gptext_retrieve_field(rf | meta, <field_name>)
Parameters
rf | meta
The name of the column in which GPText returns fields. This is rf for search results from regular GPText indexes and meta for search results from GPText external indexes.
<field_name>
The name of the field to retrieve.
Remarks
The fl=<field_list> Solr search option is added to the <options> parameter of the gptext.search() function to request additional stored fields. The additional fields are returned in the results in a column named rf (meta for external indexes). This column value has a format like the following:
column_value { name: "_version" value: "1544714234398507008" } column_value { name: "revision" value: "9.70" } column_value { name: "author" value: "jdough" }
The gptext.gptext_retrieve_field() function extracts the value for a single specified field and returns it as a text value. If there is no field with the specified name in the rf column, it returns NULL.
Storing additional fields in an index requires editing managed-schema to specify the fields that should be stored. See Storing Additional Fields in an Index for instructions.
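The rf and meta format shown above is regular enough to parse outside the database, for example when a client receives the raw column value. The following Python sketch mimics the lookup that gptext.gptext_retrieve_field() is described as performing. It is an illustration, not the GPText implementation: the function name and regular expression are assumptions, and it assumes that field names and values contain no double quotes.

```python
import re
from typing import Optional

# Matches one 'column_value { name: "..." value: "..." }' entry.
# Whitespace between tokens is optional.
_FIELD_PATTERN = re.compile(
    r'column_value\s*\{\s*name:\s*"([^"]*)"\s*value:\s*"([^"]*)"\s*\}'
)


def retrieve_field(column_value: Optional[str], field_name: str) -> Optional[str]:
    """Return the text value of field_name, or None if it is not present."""
    if column_value is None:
        return None
    for name, value in _FIELD_PATTERN.findall(column_value):
        if name == field_name:
            return value
    return None


# The sample value from the remark above.
rf = (
    'column_value { name: "_version" value: "1544714234398507008" } '
    'column_value { name: "revision" value: "9.70" } '
    'column_value { name: "author" value: "jdough" }'
)

author = retrieve_field(rf, "author")
missing = retrieve_field(rf, "no_such_field")

# Analogues of the _int and _float variants: convert the extracted text.
version_as_int = int(retrieve_field(rf, "_version"))
revision_as_float = float(retrieve_field(rf, "revision"))
```

Inside SQL, prefer the gptext.gptext_retrieve_field() functions themselves; this sketch is only for handling the value on the client.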
gptext.gptext_retrieve_field_int()
Retrieves a single field from the rf or meta search result column as an integer value.
Syntax
gptext.gptext_retrieve_field_int(rf | meta, <field_name>)
Parameters
rf | meta
The name of the column containing fields to be retrieved. For regular GPText indexes, it is rf. For GPText external indexes, it is meta.
<field_name>
The name of the integer field to retrieve.
Remarks
The gptext.gptext_retrieve_field_int() function is the same as the gptext.gptext_retrieve_field() function, except that the extracted field value is converted to an integer value.
gptext.gptext_retrieve_field_float()
Retrieves a single field from the rf or meta search result column as a float value.
Syntax
gptext.gptext_retrieve_field_float(rf | meta, <field_name>)
Parameters
rf | meta
The name of the column containing fields to be retrieved. For regular GPText indexes, it is rf. For GPText external indexes, it is meta.
<field_name>
The name of the float field to retrieve.
Remarks
The gptext.gptext_retrieve_field_float() function is the same as the gptext.gptext_retrieve_field() function, except that the extracted field value is converted to a float value.
gptext.highlight()
Highlights terms by inserting markup tags into data.
Syntax
gptext.highlight(<column_data>, <column_name>, <offsets>)
Parameters
<column_data>
The text data from the table that is to be tagged with highlighting tags.
<column_name>
The name of the corresponding column from the table.
<offsets>
A gptext hstore value that contains key-value pairs that indicate the locations of the text to highlight within the column data. See Remarks for information about the gptext hstore data type.
Prerequisite
To use highlighting, term vectors must be enabled before creating the index. To enable term vectors, call gptext.enable_terms() for each field where you want to enable terms, then index or re-index with gptext.index().
Remarks
The offsets parameter is a gptexthstore, where each key is a column name and the value is a comma-separated list of offsets into the column data. This hstore is constructed by gptext.search() when highlighting is enabled in the <options> parameter.
Following is an example of the offsets hstore content:
"field1"=>"0:5,9:14", "field2"=>"13:20"
gptext.highlight() will insert two sets of tags into the field1 data and one set into the field2 data at the indicated offsets.
The gptexthstore type is a limited form of the Postgres hstore type. It has only the hstore input and output functions implemented, as gptext_hstore_in and gptext_hstore_out.
The highlight tags are defined by the gptext.hl_pre_tag and gptext.hl_post_tag server configuration parameters. Their default values are <em> and </em>, respectively.
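The following Python sketch illustrates how offset pairs such as "0:5,9:14" could translate into inserted tags. It is not the GPText implementation. It assumes that each pair is start:end in characters with the end position exclusive, which the text above does not state, and it models the hstore as a plain dictionary. The function name and sample text are hypothetical.

```python
def highlight(column_data: str, column_name: str, offsets: dict,
              pre_tag: str = "<em>", post_tag: str = "</em>") -> str:
    """Wrap the spans listed for column_name in pre_tag and post_tag.

    offsets maps a column name to a string such as "0:5,9:14".
    Assumption: each pair is start:end, end exclusive, in characters.
    """
    spans = offsets.get(column_name, "")
    if not spans:
        return column_data
    pairs = [tuple(int(part) for part in span.split(":"))
             for span in spans.split(",")]
    # Work from the last span to the first so that earlier offsets
    # remain valid after tags are inserted.
    for start, end in sorted(pairs, reverse=True):
        column_data = (column_data[:start] + pre_tag
                       + column_data[start:end] + post_tag
                       + column_data[end:])
    return column_data


# "solar" occupies characters 0-4 and 9-13 of this sample text.
offsets = {"field1": "0:5,9:14"}
tagged = highlight("solar or solar power", "field1", offsets)
print(tagged)
```

The pre_tag and post_tag defaults mirror the documented defaults of gptext.hl_pre_tag and gptext.hl_post_tag.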
Example
gptext.highlight_external()
Highlights terms in search results from external indexes by inserting markup tags.
Syntax
gptext.highlight_external(<table_exp>, <index>, <search_query>, <filter_queries>[, <options>])
Parameters
<table_exp>
A table expression, ignored for external indexes. An expression such as TABLE(SELECT 1 SCATTER BY 1) is sufficient.
<index>
Name of the index containing data to highlight.
<search_query>
Text value containing a Solr text search query.
<filter_queries>
A text array of filter queries, if any. If none, set this parameter to null.
<options>
An optional ampersand-delimited list of Solr query parameters. See Solr options.
Remarks
The gptext.highlight_external() function searches a GPText external index and encloses the search terms in markup tags in the returned results.
Example
Search for and highlight the terms "zookeeper" and "solr" in HTML documents.
=# SELECT id, content
   FROM gptext.highlight_external(TABLE(SELECT 1 SCATTER BY 1), 'gptext-docs', '{!gptextqp}zookeeper AND solr', '{content_type:*html*}', 'rows=2');
-[ RECORD 1 ]-------------------------------------------------------------------------------------------------------------------------
id      | http://gptext.docs.pivotal.io/latest/topics/administering.html
content | includes security considerations, monitoring <em>Solr</em> index statistics, managing and monitoring <em>ZooKeeper</em>
-[ RECORD 2 ]-------------------------------------------------------------------------------------------------------------------------
id      | http://gptext.docs.pivotal.io/latest/topics/performance.html
content | problems can result from resource contention between the Greenplum Database, <em>Solr</em>, and <em>ZooKeeper</em> clusters
-[ RECORD 3 ]-------------------------------------------------------------------------------------------------------------------------
id      | http://gptext.docs.pivotal.io/latest/topics/ha.html
content | /topics/utility_ref.html GPText Management Utilities | rect/210/topics/type_ref.html GPText and <em>Solr</em>
-[ RECORD 4 ]-------------------------------------------------------------------------------------------------------------------------
id      | http://gptext.docs.pivotal.io/latest/topics/indexes.html
content | /topics/utility_ref.html GPText Management Utilities | rect/210/topics/type_ref.html GPText and <em>Solr</em>
Faceted Search
Faceting breaks up a search result into multiple categories, showing counts for each.
gptext.faceted_field_search()
The faceted_field_search() function breaks search results into field name categories.
Syntax
gptext.faceted_field_search(<index_name>, <query>, <filter_queries>, <facet_fields>, <facet_limit>, <minimum>[, <options>])
Parameters
<index_name>
The name of the index.
<query>
Query statement. Use *:* to query for all results.
<filter_queries>
A text array of filter queries, if any. If none, set this parameter to null.
<facet_fields>
An array of field names to facet. Use Greenplum Database array notation.
<facet_limit>
Maximum number of results to be returned for each aggregation (facet).
<minimum>
Minimum number of results required before an aggregation (facet) will be returned. Enter 0 to return all facets.
<options>
An optional ampersand-delimited list of Solr query parameters. See Solr options.
Return type
SETOF gptext.facet_field_result
This is a composite type with the following columns:
Column       Type
field_name   text
field_value  text
value_count  bigint
Privileges required
You must have the SELECT privilege to execute this function.
Remarks
None.
Examples
Facet all tweets on the spam and truncated fields.
=# SELECT * FROM gptext.faceted_field_search('demo.twitter.message', '*:*', null, '{spam,truncated}', 2, 0);
 field_name | field_value | value_count
------------+-------------+-------------
 spam       | true        |        1730
 truncated  | false       |        1705
 truncated  | true        |          25
(3 rows)
Facet all tweets on author_id, returning at most five authors, each with at least two tweets.
=# SELECT * FROM gptext.faceted_field_search('demo.twitter.message', '*:*', null, '{author_id}', 5, 2);
 field_name | field_value | value_count
------------+-------------+-------------
 author_id  | 102185050   |           9
 author_id  | 202305785   |           2
 author_id  | 64111799    |           2
 author_id  | 45326213    |           2
 author_id  | 195035308   |           2
(5 rows)
gptext.faceted_query_search()
The faceted_query_search() function breaks search results into categories defined by queries that you provide.
Syntax
gptext.faceted_query_search(<index_name>, <query>, <filter_queries>, <facet_queries>[, <options>])
Parameters
<index_name>
The name of the index.
<query>
Query statement. Use *:* to query for all results.
<filter_queries>
A text array of filter queries, if any. If none, set this parameter to null.
<facet_queries>
Type: text[]. Required. An array of facet queries.
<options>
An optional ampersand-delimited list of Solr query parameters. See Solr options.
Return type
SETOF gptext.facet_query_result
This is a composite type with the following columns:
Column       Type
query_name   text
value_count  bigint
Privileges required
You must have the SELECT privilege to execute this function.
Remarks
None.
Example
This example uses Solr queries to divide Twitter authors into three classes based on number of followers.
=# SELECT * FROM gptext.faceted_query_search('demo.twitter.message', '*:*', null, '{author_followers_count:[0 TO 5],author_followers_count:[6 TO 10],author_followers_count:[11 TO *]}');
            query_name            | value_count
----------------------------------+-------------
 author_followers_count:[0 TO 5]  |          39
 author_followers_count:[11 TO *] |        1632
 author_followers_count:[6 TO 10] |          36
(3 rows)
gptext.faceted_range_search()
The faceted_range_search() function breaks search results into range categories over a numeric or date field, with ranges defined by the <range_start>, <range_end>, and <range_gap> arguments.
Syntax
gptext.faceted_range_search(<index_name>, <query>, <filter_queries>, <field_name>, <range_start>, <range_end>, <range_gap>[, <options>])
Parameters
<index_name>
The name of the index.
<query>
Query statement. Use *:* to query for all results.
<filter_queries>
A text array of filter queries, if any. If none, set this parameter to null.
<field_name>
The name of the field on which to facet.
<range_start>
Beginning of the range.
<range_end>
End of the range.
<range_gap>
Size of range increment, a text value.
<options>
An optional ampersand-delimited list of Solr query parameters. See Solr options.
Return type
SETOF gptext.facet_range_result
This is a composite type with the following columns:
Column       Type
field_name   text
range_value  text
value_count  bigint
Privileges required
You must have the SELECT privilege to execute this function.
Example
Facet on the date range from midnight August 1, 2011 to midnight November 1, 2011, with a 7-day gap.
=# SELECT * FROM gptext.faceted_range_search('demo.twitter.message', '*:*', null, 'created_at', '2011-08-01T00:00:00Z', '2011-11-01T00:00:00Z', '+7DAY');
 field_name |     range_value      | value_count
------------+----------------------+-------------
 created_at | 2011-08-01T00:00:00Z |           0
 created_at | 2011-08-08T00:00:00Z |           0
 created_at | 2011-08-15T00:00:00Z |           0
 created_at | 2011-08-22T00:00:00Z |          52
 created_at | 2011-08-29T00:00:00Z |         189
 created_at | 2011-09-05T00:00:00Z |         545
 created_at | 2011-09-12T00:00:00Z |           0
 created_at | 2011-09-19T00:00:00Z |         109
 created_at | 2011-09-26T00:00:00Z |          69
 created_at | 2011-10-03T00:00:00Z |          59
 created_at | 2011-10-10T00:00:00Z |         206
 created_at | 2011-10-17T00:00:00Z |         147
 created_at | 2011-10-24T00:00:00Z |         112
 created_at | 2011-10-31T00:00:00Z |          94
(14 rows)
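The range_value column in the output above lists the start of each range. The Python sketch below reproduces those start values from the same start, end, and gap. It is a client-side illustration with a hypothetical function name. It does not model how Solr counts documents or how it treats the final, shorter range.

```python
from datetime import datetime, timedelta


def range_starts(range_start: datetime, range_end: datetime,
                 range_gap: timedelta) -> list:
    """List the start of every range from range_start up to range_end."""
    starts = []
    current = range_start
    while current < range_end:
        starts.append(current)
        current += range_gap
    return starts


# Same arguments as the example: August 1 to November 1, 2011, 7-day gap.
buckets = range_starts(datetime(2011, 8, 1), datetime(2011, 11, 1),
                       timedelta(days=7))

for bucket in buckets:
    print(bucket.strftime("%Y-%m-%dT%H:%M:%SZ"))
```

The loop yields 14 start values, from 2011-08-01 to 2011-10-31, matching the 14 rows in the example output.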
Working with Terms
gptext.terms()
Gets the term vectors for the indexed documents in a Solr index for the specified field. You can use gptext.terms() to create tables.
Syntax
gptext.terms(<src_table>, <index_name>, <field_name>, <search_query>, <filter_queries>[, <options>])
Parameters
<src_table>
An anytable value that specifies a SELECT statement on an existing, indexed table on which to perform the search. Specify in the format:
TABLE(SELECT * FROM <src_table>)
<index_name>
The name of the index to query for terms.
<field_name>
The name of the field to query for terms.
<search_query>
A query that the field must match.
<filter_queries>
A comma-delimited array of filter queries, if any. If none, set this parameter to null.
<options>
An optional, comma-delimited list of Solr query parameters.
Return type
SETOF gptext.term_info
This is a composite type with the following columns:
Column     Type
id         text
term       text
positions  integer[]
Privileges required
You must have the SELECT privilege to execute this function.
Remarks
To enable using gptext.terms(), execute the GPText function gptext.enable_terms(), then reindex with gptext.index().
The TABLE query is planned and affects the estimate of gptext.terms(), but does not execute. For example, if your query includes:
gptext.terms(TABLE(SELECT * FROM t), ...)
the query planner estimates the number of results as the number of rows in t. This can cause the query planner to ignore the use of an index scan. Use a query like TABLE(SELECT 1 SCATTER BY 1) to avoid this issue.
Examples
This example creates a terms table.
=# CREATE TABLE twitter.terms AS
   SELECT * FROM gptext.terms(TABLE(SELECT * FROM twitter.message SCATTER BY 1), 'demo.twitter.message', 'message_text', 'iphone', null)
   DISTRIBUTED BY (id);
SELECT 5385
Configuration and Monitoring
Index configuration and monitoring functions enable managing indexes, tracking index statistics, checking status of index segments, and ensuring that index contents are current.
gptext.cluster_status()
Shows Solr cluster status.
Syntax
gptext.cluster_status()
Return type
SETOF gptext.cluster_status_result
This is a composite type with the following columns:
Column               Type
index_name           text
max_shards_per_node  integer
router               text
replication_factor   integer
auto_add_replicas    boolean
znode_version        integer
config_name          text
partitioned          boolean
Example
=# SELECT * FROM gptext.cluster_status();
       index_name        | max_shards_per_node |  router  | replication_factor | auto_add_replicas | znode_version |       config_name       | partitioned
-------------------------+---------------------+----------+--------------------+-------------------+---------------+-------------------------+-------------
 demo.twitter.message    |                   4 | implicit |                  2 | f                 |             8 | demo.twitter.message    | t
 demo.wikipedia.articles |                   4 | implicit |                  2 | f                 |             8 | demo.wikipedia.articles | f
(2 rows)
gptext.config_append()
Appends the contents of a local file to a ZooKeeper index configuration file.
Syntax
gptext.config_append(<index_name>, <local_config_file>[, <index_config_file>])
Parameters
<index_name>
The name of the index to configure.
<local_config_file>
The path and file name of a local file that you will append to the index configuration file.
<index_config_file>
Optional. The name of the ZooKeeper configuration file to which you will append the local file. If you omit this parameter, the function appends the local file to a file of the same name that resides in the top-level ZooKeeper directory.
Return type
boolean
Example
Append the local file /home/gpadmin/stopwords.add to the top-level ZooKeeper file stopwords.txt for index demo.wikipedia.articles:
=# SELECT * FROM gptext.config_append('demo.wikipedia.articles', '/home/gpadmin/stopwords.add', 'stopwords.txt');
 config_append
---------------
 t
(1 row)
gptext.config_delete()
Deletes an index configuration file from ZooKeeper.
Syntax
gptext.config_delete(<index_name>, <index_config_file>)
Parameters
<index_name>
The name of the index that has the file to delete.
<index_config_file>
The ZooKeeper configuration file to delete. Include the path if the file does not reside at the top-level directory.
Return Type
boolean
Example
Delete the file named stopwords.add from the top-level configuration directory for the index demo.wikipedia.articles:
=# select * from gptext.config_delete('demo.wikipedia.articles', 'stopwords.add');
 config_delete
---------------
 t
(1 row)
gptext.config_get()
Displays the contents of a ZooKeeper index configuration file.
Syntax
gptext.config_get(<index_name>, <index_config_file>)
Parameters
<index_name>
The name of the index that has the file you want to display.
<index_config_file>
The ZooKeeper configuration file to display. Include the path if the file does not reside at the top-level ZooKeeper directory for the index.
Return Type
text
Example
Display the contents of synonyms.txt for the index demo.wikipedia.articles:
=# select * from gptext.config_get('demo.wikipedia.articles', 'synonyms.txt');
                                 config_get
----------------------------------------------------------------------------
 # The ASF licenses this file to You under the Apache License, Version 2.0
 # (the "License"); you may not use this file except in compliance with
 # the License. You may obtain a copy of the License at
 #
 # http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.

 #-----------------------------------------------------------------------
 #some test synonym mappings unlikely to appear in real input text
 aaa => aaaa
 bbb => bbbb1 bbbb2
 ccc => cccc1,cccc2
 a\=>a => b\=>b
 a\,a => b\,b
 fooaaa,baraaa,bazaaa

 # Some synonym groups specific to this example
 GB,gib,gigabyte,gigabytes
 MB,mib,megabyte,megabytes
 Television, Televisions, TV, TVs
 #notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming
 #after us won't split it into two words.

 # Synonym mappings can be used for spelling correction too
 pixima => pixma

(1 row)
gptext.config_list()
Lists the ZooKeeper configuration files and directories for an index.
Syntax
gptext.config_list(<index_name>, [<index_config_path>,] <is_recursive>)
Parameters
<index_name>
The name of the index that has the files and directories you want to list.
<index_config_path>
Optional. A specific directory in the ZooKeeper configuration that you want to list. Omit this option to list configuration files and directories in the top-level directory.
<is_recursive>
Optional. A boolean value that determines whether the function recursively lists files and directories that are present in subdirectories.
Return Type
SETOF text
Examples
List ZooKeeper configuration files and directories only in the top-level directory for the index demo.wikipedia.articles:
=# select * from gptext.config_list('demo.wikipedia.articles', false);
         config_list
-----------------------------
 currency.xml
 mapping-FoldToASCII.txt
 managed-schema
 protwords.txt
 scripts.conf
 synonyms.txt
 managed_schema
 stopwords.txt
 velocity
 admin-extra.html
 aggconfig.xml
 emoticons.txt
 solrconfig.xml
 elevate.xml
 xslt
 mapping-ISOLatin1Accent.txt
 spellings.txt
 lang
(18 rows)
List ZooKeeper configuration files in the ZooKeeper lang subdirectory for demo.wikipedia.articles:
=# select * from gptext.config_list('demo.wikipedia.articles', 'lang', false);
       config_list
--------------------------
 lang/contractions_it.txt
 lang/contractions_ca.txt
 lang/stemdict_nl.txt
 lang/stopwords_hy.txt
 lang/stopwords_no.txt
 lang/stopwords_id.txt
 [...]
(39 rows)
List all configuration files and directories for demo.wikipedia.articles:
=# select * from gptext.config_list('demo.wikipedia.articles', true);
           config_list
----------------------------------
 currency.xml
 mapping-FoldToASCII.txt
 managed-schema
 protwords.txt
 scripts.conf
 synonyms.txt
 managed_schema
 stopwords.txt
 velocity
 velocity/doc.vm
 velocity/suggest.vm
 velocity/hit.vm
 [...]
(86 rows)
gptext.config_upload()
Uploads an index configuration file to ZooKeeper, replacing any existing file of the same name.
Syntax
gptext.config_upload(<index_name>, <local_config_file>[, <index_config_file>])
Parameters
<index_name>
The name of the index to configure.
<local_config_file>
The path and file name of a local file that you want to upload to ZooKeeper for the index. The function uploads this file to a file of the same name in the top-level ZooKeeper directory for the index, unless you include the <index_config_file> option to change the path or file name.
<index_config_file>
Optional. The destination path for the file in ZooKeeper. If you omit this parameter, the function uploads the local file to the top-level ZooKeeper directory for the index.
Return Type
boolean
Examples
Upload the local file /home/gpadmin/stopwords.txt to ZooKeeper, overwriting the existing stopwords.txt file for the index demo.wikipedia.articles:
=# select * from gptext.config_upload('demo.wikipedia.articles', '/home/gpadmin/stopwords.txt');
 config_upload
---------------
 t
(1 row)
Upload the local file /home/gpadmin/stopwords_japanese.txt to ZooKeeper, overwriting the file lang/stopwords_ja.txt for the index demo.wikipedia.articles:
=# select * from gptext.config_upload('demo.wikipedia.articles', '/home/gpadmin/stopwords_japanese.txt', 'lang/stopwords_ja.txt');
 config_upload
---------------
 t
(1 row)
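The defaulting rule for <index_config_file>, which gptext.config_append() and gptext.config_upload() share, can be sketched in a few lines of Python. This is only an illustration of the rule as described in the parameter text; the helper name zookeeper_destination is hypothetical and is not part of GPText.

```python
import posixpath


def zookeeper_destination(local_config_file, index_config_file=None):
    """Return the ZooKeeper-relative file an upload or append would target.

    Hypothetical helper: it mirrors the documented rule, it does not call GPText.
    """
    if index_config_file:
        # An explicit destination may include a subdirectory such as lang/.
        return index_config_file
    # No destination given: same file name, top-level directory for the index.
    return posixpath.basename(local_config_file)


print(zookeeper_destination("/home/gpadmin/stopwords.txt"))
print(zookeeper_destination("/home/gpadmin/stopwords_japanese.txt", "lang/stopwords_ja.txt"))
```

The two calls correspond to the two upload examples: the first resolves to stopwords.txt, the second to lang/stopwords_ja.txt.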
gptext.index_size()
Shows the number of documents indexed and total disk space used for GPText indexes.
Syntax
gptext.index_size([<index_name>])
Parameters
<index_name>
The name of the index. Optional. Returns sizes for all indexes if no index is specified.
Return Type
SETOF gptext.index_size_result
This is a composite type with the following columns:
Column Type
index_name text
num_docs integer
size_in_bytes bigint
Examples
=# SELECT * FROM gptext.index_size();
       index_name        | num_docs | size_in_bytes
-------------------------+----------+---------------
 demo.wikipedia.articles |       23 |        500515
 demo.twitter.message    |     1730 |        767118
 gptext-docs             |       16 |        618231
(3 rows)
=# SELECT * FROM gptext.index_size('demo.wikipedia.articles');
       index_name        | num_docs | size_in_bytes
-------------------------+----------+---------------
 demo.wikipedia.articles |       23 |        500515
(1 row)
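Rows returned by gptext.index_size() are easy to aggregate on the client. The following Python sketch assumes the rows have already been fetched as (index_name, num_docs, size_in_bytes) tuples by whatever database driver you use; the function name summarize_index_sizes is hypothetical.

```python
def summarize_index_sizes(rows):
    """Aggregate a list of (index_name, num_docs, size_in_bytes) tuples."""
    total_docs = sum(row[1] for row in rows)
    total_bytes = sum(row[2] for row in rows)
    # Name of the index that uses the most disk space, or None for no rows.
    largest = max(rows, key=lambda row: row[2])[0] if rows else None
    return {"total_docs": total_docs, "total_bytes": total_bytes, "largest_index": largest}


# Rows copied from the first example above.
rows = [
    ("demo.wikipedia.articles", 23, 500515),
    ("demo.twitter.message", 1730, 767118),
    ("gptext-docs", 16, 618231),
]
print(summarize_index_sizes(rows))
```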
gptext.index_status()
Shows status of replicas for a specified index or for all indexes.
Syntax
gptext.index_status([<index_name>])
Parameters
<index_name>
The name of the index. Optional. Returns status for all indexes if no index is specified.
Return Type
SETOF gptext.index_status_result
This is a composite type with the following columns:
Column Type
content_id smallint
index_name text
shard_name text
shard_state text
replica_name text
replica_state text
core text
node_name text
base_url text
is_leader boolean
partitioned boolean
external_index boolean
Examples
1. Show status for a single index.
=# SELECT * FROM gptext.index_status('demo.wikipedia.articles');
 content_id |       index_name        | shard_name | shard_state | replica_name | replica_state |                  core                   |     node_name     |         base_url         | is_leader | partitioned | external_index
------------+-------------------------+------------+-------------+--------------+---------------+-----------------------------------------+-------------------+--------------------------+-----------+-------------+----------------
          0 | demo.wikipedia.articles | shard0     | active      | core_node1   | active        | demo.wikipedia.articles_shard0_replica2 | gpdb51:18983_solr | http://gpdb51:18983/solr | t         | f           | f
          0 | demo.wikipedia.articles | shard0     | active      | core_node4   | active        | demo.wikipedia.articles_shard0_replica1 | gpdb51:18984_solr | http://gpdb51:18984/solr | f         | f           | f
          1 | demo.wikipedia.articles | shard1     | active      | core_node2   | active        | demo.wikipedia.articles_shard1_replica1 | gpdb51:18984_solr | http://gpdb51:18984/solr | f         | f           | f
          1 | demo.wikipedia.articles | shard1     | active      | core_node3   | active        | demo.wikipedia.articles_shard1_replica2 | gpdb51:18983_solr | http://gpdb51:18983/solr | t         | f           | f
(4 rows)
2. Show status for all GPText indexes.
=# SELECT * FROM gptext.index_status();
 content_id |       index_name        | shard_name | shard_state | replica_name | replica_state |                  core                   |     node_name     |         base_url         | is_leader | partitioned | external_index
------------+-------------------------+------------+-------------+--------------+---------------+-----------------------------------------+-------------------+--------------------------+-----------+-------------+----------------
          0 | demo.store.products     | shard0     | active      | core_node1   | active        | demo.store.products_shard0_replica1     | gpdb51:18984_solr | http://gpdb51:18984/solr | t         | f           | f
          0 | demo.store.products     | shard0     | active      | core_node2   | active        | demo.store.products_shard0_replica2     | gpdb51:18983_solr | http://gpdb51:18983/solr | f         | f           | f
          1 | demo.store.products     | shard1     | active      | core_node3   | active        | demo.store.products_shard1_replica2     | gpdb51:18983_solr | http://gpdb51:18983/solr | f         | f           | f
          1 | demo.store.products     | shard1     | active      | core_node4   | active        | demo.store.products_shard1_replica1     | gpdb51:18984_solr | http://gpdb51:18984/solr | t         | f           | f
          0 | demo.twitter.message    | shard0     | active      | core_node2   | active        | demo.twitter.message_shard0_replica1    | gpdb51:18984_solr | http://gpdb51:18984/solr | f         | t           | f
          0 | demo.twitter.message    | shard0     | active      | core_node3   | active        | demo.twitter.message_shard0_replica2    | gpdb51:18983_solr | http://gpdb51:18983/solr | t         | t           | f
          1 | demo.twitter.message    | shard1     | active      | core_node1   | active        | demo.twitter.message_shard1_replica1    | gpdb51:18984_solr | http://gpdb51:18984/solr | t         | t           | f
          1 | demo.twitter.message    | shard1     | active      | core_node4   | active        | demo.twitter.message_shard1_replica2    | gpdb51:18983_solr | http://gpdb51:18983/solr | f         | t           | f
          0 | demo.wikipedia.articles | shard0     | active      | core_node1   | active        | demo.wikipedia.articles_shard0_replica2 | gpdb51:18983_solr | http://gpdb51:18983/solr | t         | f           | f
          0 | demo.wikipedia.articles | shard0     | active      | core_node4   | active        | demo.wikipedia.articles_shard0_replica1 | gpdb51:18984_solr | http://gpdb51:18984/solr | f         | f           | f
          1 | demo.wikipedia.articles | shard1     | active      | core_node2   | active        | demo.wikipedia.articles_shard1_replica1 | gpdb51:18984_solr | http://gpdb51:18984/solr | f         | f           | f
          1 | demo.wikipedia.articles | shard1     | active      | core_node3   | active        | demo.wikipedia.articles_shard1_replica2 | gpdb51:18983_solr | http://gpdb51:18983/solr | t         | f           | f
          1 | gptext-docs             | shard1     | active      | core_node2   | active        | gptext-docs_shard1_replica1             | gpdb51:18983_solr | http://gpdb51:18983/solr | t         | f           | t
          1 | gptext-docs             | shard1     | active      | core_node3   | active        | gptext-docs_shard1_replica2             | gpdb51:18984_solr | http://gpdb51:18984/solr | f         | f           | t
          2 | gptext-docs             | shard2     | active      | core_node1   | active        | gptext-docs_shard2_replica1             | gpdb51:18983_solr | http://gpdb51:18983/solr | t         | f           | t
          2 | gptext-docs             | shard2     | active      | core_node4   | active        | gptext-docs_shard2_replica2             | gpdb51:18984_solr | http://gpdb51:18984/solr | f         | f           | t
(16 rows)
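With many indexes the gptext.index_status() output gets long, so it can help to filter it programmatically. The Python sketch below assumes each row has been fetched as a dict keyed by the column names of gptext.index_status_result. The function name find_problem_shards is hypothetical, and any replica_state other than active (for example down) is an assumption, since the examples above show only active replicas.

```python
from collections import defaultdict


def find_problem_shards(rows):
    """Report replicas that are not active and shards with no active leader.

    rows: list of dicts keyed by gptext.index_status() column names.
    """
    active_leaders = defaultdict(int)
    shards = set()
    inactive = []
    for row in rows:
        key = (row["index_name"], row["shard_name"])
        shards.add(key)
        if row["replica_state"] != "active":
            inactive.append((row["index_name"], row["shard_name"], row["replica_name"]))
        elif row["is_leader"]:
            active_leaders[key] += 1
    no_leader = sorted(key for key in shards if active_leaders[key] == 0)
    return {"inactive_replicas": inactive, "shards_without_active_leader": no_leader}


# Hypothetical input: the leader replica of shard1 is not active.
rows = [
    {"index_name": "demo.wikipedia.articles", "shard_name": "shard0",
     "replica_name": "core_node1", "replica_state": "active", "is_leader": True},
    {"index_name": "demo.wikipedia.articles", "shard_name": "shard1",
     "replica_name": "core_node3", "replica_state": "down", "is_leader": True},
]
print(find_problem_shards(rows))
```

The replica names reported this way are in the core_nodeX form that gptext.delete_replica() expects.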
gptext.live_nodes()
Lists active Solr nodes and their up or down state.
Syntax
gptext.live_nodes()
Return Type
SETOF gptext.live_nodes_result
This is a composite type with the following columns:
Column Type
host text
port bigint
data_dir text
status text
Example
=# SELECT * FROM gptext.live_nodes();
  host  | port  |      data_dir       | status
--------+-------+---------------------+--------
 gpdb51 | 18983 | /data/gpdata1/solr0 | u
 gpdb51 | 18984 | /data/gpdata2/solr0 | u
(2 rows)
Remarks
The status column can be u (up) or d (down).
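A monitoring script can use the status column to pick out nodes that need attention, for example before running gptext-recover. This Python sketch assumes the rows were fetched as (host, port, data_dir, status) tuples; the function name down_nodes is hypothetical.

```python
def down_nodes(rows):
    """Return (host, port) for every node whose status is 'd' (down)."""
    return [(host, port) for host, port, _data_dir, status in rows if status == "d"]


# Hypothetical input: the second node is reported as down.
rows = [
    ("gpdb51", 18983, "/data/gpdata1/solr0", "u"),
    ("gpdb51", 18984, "/data/gpdata2/solr0", "d"),
]
print(down_nodes(rows))
```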
gptext.partition_status()
Lists indexes on partitioned tables or child partition names in the current Greenplum database.
Syntax
gptext.partition_status([<index_name>])
Parameters
<index_name>
Optional. Returns partition status for all indexes if no index is specified.
Return Type
SETOF gptext.partition_status_result
This is a composite type with the following columns:
Column Type
partition_name text
inherits_name text
level integer
cons text
Example
List partition status for an index.
=# SELECT partition_name, inherits_name, level FROM gptext.partition_status('demo.twitter.message');
        partition_name        |    inherits_name     | level
------------------------------+----------------------+-------
 demo.twitter.message_1_prt_1 | demo.twitter.message |     1
 demo.twitter.message_1_prt_2 | demo.twitter.message |     1
 demo.twitter.message_1_prt_3 | demo.twitter.message |     1
 demo.twitter.message_1_prt_4 | demo.twitter.message |     1
(4 rows)
Remarks
The gptext.partition_status() function can only list the index partitions for tables in the current Greenplum database.
gptext.reload_index()
Reloads Solr configuration files if they have been modified.
Syntax
gptext.reload_index(<index_name>)
Parameters
<index_name>
Optional. The name of the index for which to reload the configuration files.
Return type
boolean
Privileges required
Only the OWNER can execute this function.
Remarks
None.
Example
=# SELECT * FROM gptext.reload_index('demo.wikipedia.articles');
 reload_index
--------------
 t
(1 row)
gptext.version()
Returns the version of your GPText installation.
Syntax
SELECT * FROM gptext.version()
Parameters
None.
Return type
text
Privileges required
You do not need any privileges to execute this function.
Example
=# SELECT * FROM gptext.version();
            version
--------------------------------
 Greenplum Text Analytics 2.1.3
(1 row)
gptext.zookeeper_hosts()
Returns a list of ZooKeeper hosts and ports.
Syntax
gptext.zookeeper_hosts()
Return type
text
Remarks
This function returns a comma-separated list of ZooKeeper nodes in the format <host-name>:<port>.
Example
=# SELECT * FROM gptext.zookeeper_hosts();
  host  | port
--------+------
 gpdb51 | 2188
 gpdb51 | 2189
 gpdb51 | 2190
(3 rows)
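When the ZooKeeper nodes arrive as the comma-separated <host-name>:<port> string described in the Remarks, a client may want them as separate host and port values. The following Python sketch shows one way to split the string; the function name parse_zookeeper_hosts and the sample input are illustrative assumptions built from the hosts in the example.

```python
def parse_zookeeper_hosts(hosts):
    """Split 'host:port,host:port' into a list of (host, port) tuples."""
    nodes = []
    for item in hosts.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last colon so the port is always the final field.
        host, _, port = item.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError("expected <host-name>:<port>, got %r" % item)
        nodes.append((host, int(port)))
    return nodes


print(parse_zookeeper_hosts("gpdb51:2188,gpdb51:2189,gpdb51:2190"))
```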
High Availability
gptext.add_replica()
Adds a replica of an index shard.
Syntax
gptext.add_replica(<index_name>, <shard_name>[, <node_name>])
Parameters
<index_name>
Name of the index. If the index is for a partitioned database table, this must be the name of the root table.
<shard_name>
Name of the shard to replicate.
<node_name>
Name of the node where the replica is to be added. Optional. If omitted, SolrCloud chooses the node.
Return type
boolean
Remarks
This function is used by the GPText management utility gptext-replica add.
The value of the gptext.replication_factor configuration parameter when an index is created determines how many replicas are created for each shard. In a Greenplum system, there are the same number of shards as there are Greenplum segments. The number of replicas created for a new index is the number of segments times the value of the gptext.replication_factor configuration parameter, 2 by default. The replicas are distributed evenly among the live GPText nodes.
Replicas consume space on the host where they are created, so they are usually only created to replace a replica that has failed or become unavailable or to relocate a replica to another GPText instance. When adding replicas, you should maintain equal distribution of replicas among the GPText nodes and avoid placing multiple replicas for the same shard on the same host.
The total number of replicas for an index that can be placed on each GPText node is set when the index is created. (In Solr, this is the MaxShardsPerNode parameter.) GPText sets this limit by calculating the number of replicas to create per node and adding an additional factor, specified in the gptext.extension_factor server configuration parameter. This parameter can be set between 0 and 10; the default value is 2. Since the limit is set when the index is created, it is recommended to set the gptext.extension_factor parameter to a higher number to allow new replicas to be created when necessary.
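The arithmetic in these remarks can be worked through with a short Python sketch. The replica count follows the text directly. The per-node limit is an approximation: the text does not say how GPText rounds when replicas do not divide evenly among nodes, so the ceiling division used here is an assumption, and both function names are hypothetical.

```python
import math


def total_replicas(num_segments, replication_factor=2):
    """Replicas created for a new index: segments times gptext.replication_factor."""
    return num_segments * replication_factor


def max_shards_per_node(num_segments, num_nodes, replication_factor=2, extension_factor=2):
    """Approximate per-node replica limit that is fixed when the index is created."""
    if not 0 <= extension_factor <= 10:
        raise ValueError("gptext.extension_factor must be between 0 and 10")
    # Assumption: an uneven split is rounded up before the extension factor is added.
    per_node = math.ceil(total_replicas(num_segments, replication_factor) / num_nodes)
    return per_node + extension_factor


# Two segments on two GPText nodes with the default settings.
print(total_replicas(2))
print(max_shards_per_node(2, 2))
```

Under these assumptions, two segments on two nodes give 4 replicas and a per-node limit of 4, which leaves room for two additional replicas on each node.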
Example
=# SELECT * FROM gptext.add_replica('demo.wikipedia.articles', 'shard1');
 success |                core_name
---------+-----------------------------------------
 t       | demo.wikipedia.articles_shard1_replica3
(1 row)
gptext.delete_replica()
Deletes a named replica from the specified index and shard.
Syntax
gptext.delete_replica(<index_name>, <shard_name>, <replica_name>[, <only_if_down>])
Parameters
<index_name>
Name of the index.
<shard_name>
Name of the shard that contains the replica to delete.
<replica_name>
Name of the replica to remove.
<only_if_down>
Optional. When true, no action is taken if the replica is active. Default is false.
Return type
boolean
Remarks
Use the gptext.index_status() function to find the name of the replica to drop. Names are in the format core_nodeX, where X is a number.
This function is called from the gptext-replica drop management utility.
Examples
1. Delete the core_node5 replica if it is down.
=# SELECT * FROM gptext.delete_replica('demo.wikipedia.articles', 'shard1', 'core_node5', true);
ERROR: Delete replica failed: Attempted to remove replica: demo.wikipedia.articles/shard1/core_node5 with onlyIfDown='true', but state is 'active'.
2. Delete the core_node5 replica even if it is active.
=# SELECT * FROM gptext.delete_replica('demo.wikipedia.articles', 'shard1', 'core_node5');
 success
---------
 t
(1 row)
General Purpose Functions
gptext.count_t()
Counts the number of records in a table.
Syntax
gptext.count_t(<table_name>)
Parameters
<table_name>
Name of the table for which to count records.
Return type
integer
Privileges required
You need SELECT privileges on <table_name> to execute this function.
Example
=# SELECT * FROM gptext.count_t('demo.wikipedia.articles');
 count_t
---------
      23
(1 row)
GPText Management Utilities
Management utilities are GPText command-line utilities that are used to manage the GPText cluster. The utilities must be run on the Greenplum master as the gpadmin user.
To ensure the GPText command-line utilities can be found on the path, source the Greenplum Database and GPText environment scripts. The Greenplum Database environment must be set before you source the GPText environment script. For example, if both Greenplum Database and GPText are installed in the /usr/local/ directory, enter these commands:
$ source /usr/local/greenplum-db-<version>/greenplum_path.sh
$ source /usr/local/greenplum-text-<version>/greenplum-text_path.sh
Help
To get help for a utility, specify the flag -h or --help. A short help message displays with a list of parameters.
Debugging
To get verbose output for debugging a utility, specify the flag -v or --verbose.
GPText Utilities
gptext-backup – backs up a GPText index to a shared file system.
gptext-config – performs GPText configuration tasks.
gptext-expand – adds new GPText nodes to existing hosts in the cluster.
gptext-external – manages configurations for external data sources.
gptext-installsql – installs or removes the gptext schema and user-defined functions in Greenplum databases.
gptext-migrator – installs the GPText binaries into an upgraded Greenplum Database system.
gptext-recover – restarts GPText nodes that are down.
gptext-replica – adds or drops a replica of an index shard.
gptext-restore – restores a GPText index from a backup on a shared file system.
gptext-start – starts or restarts the GPText cluster.
gptext-state – displays the state of the GPText cluster and indexes.
gptext-stop – shuts down the GPText cluster.
gptext-uninstall – uninstalls GPText, including data and installed files, and ZooKeeper nodes if they were installed with the GPText installer.
gptext-upgrade – upgrades a GPText system to a new GPText version.
zkManager – checks the ZooKeeper cluster state. If ZooKeeper was installed with GPText, zkManager can start or stop the ZooKeeper cluster.
gptext-backup
Backs up a GPText index to local storage on the Greenplum Database cluster or to a shared file system.
Syntax
gptext-backup -h
gptext-backup local [-p <path>] -i <index> [-v]
gptext-backup -p <path> -i <index> [-n <name>] [-v]
gptext-backup -c -p <path> -i <index> [-v]
Parameters
Parameter    Description
-h, --help    Displays a usage message and exits.
-c, --backup_conf    Back up configuration files only. The -c option cannot be used with the local keyword or the -n option.
local    Save backup to local storage.
-p <path>, --path <path>    If the local keyword is included, this is the directory where the utility creates the backup. If no path is specified, backup files are created in the Greenplum Database master and segment data directories. The backup name and locations of the backup files are reported in the output of the command.
    If the local keyword is omitted, this is the path on the shared file system where the backup will be saved. The file system must be accessible from all hosts in the cluster and it must be readable and writable by the gpadmin user.
-i <index_name>, --index <index_name>    The name of the GPText index to back up.
-n <backup_name>, --name <backup_name>    A name for the backup on the shared file system. The -n option cannot be used with the local keyword or the -c option.
Notes
Back up an index so that you can restore it to a different GPText system or to avoid having to reindex if the existing index becomes corrupted.
A full GPText index backup includes index configuration files from ZooKeeper and index data up to the last transaction committed with the gptext.commit_index() function. Each index shard is backed up separately.
You can optionally back up just the index configuration files using the -c option.
You can back up an index to a shared file system or to local Greenplum Database cluster storage.
Back Up to Local Greenplum Database Cluster Storage
Use the gptext-backup local command to back up a GPText index to local storage.
For local backups, the utility generates a backup name in the format <index-name>_<timestamp>, for example demo.wikipedia.articles_2018-05-07T17:13:32.064427. The --name option is not allowed with local backups.
On the master host, gptext-backup creates a JSON file, <backup-name>.json, containing metadata about the backup, and a directory, <backup-name>, containing copies of the ZooKeeper configuration files for the index. The default backup directory on the master host is the Greenplum Database master data directory, specified by the MASTER_DATA_DIRECTORY environment variable.
gptext-backup local writes a backup file for each index shard on the host with the GPText node managing the lead replica for the shard. These files have names in the format snapshot.<index-name>_shard<n>_<timestamp>, for example snapshot.demo.wikipedia.articles_shard1_2018-05-07T17:13:32.064427. By default, these files are saved in the segment data directories.
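The naming formats above can be reproduced when a script needs to locate local backup files. The Python sketch below builds names in both formats. The timestamp layout is inferred from the example names, and the function names are hypothetical; gptext-backup generates the real names itself.

```python
from datetime import datetime

# Timestamp layout inferred from the example 2018-05-07T17:13:32.064427.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def local_backup_name(index_name, when):
    """Build a name in the format <index-name>_<timestamp>."""
    return "%s_%s" % (index_name, when.strftime(TIMESTAMP_FORMAT))


def shard_snapshot_name(index_name, shard_number, when):
    """Build a name in the format snapshot.<index-name>_shard<n>_<timestamp>."""
    return "snapshot.%s_shard%d_%s" % (index_name, shard_number, when.strftime(TIMESTAMP_FORMAT))


when = datetime(2018, 5, 7, 17, 13, 32, 64427)
print(local_backup_name("demo.wikipedia.articles", when))
print(shard_snapshot_name("demo.wikipedia.articles", 1, when))
```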
You can specify a backup directory for the backup with the --path (-p) option. The directory must exist on all hosts in the cluster and be writable by the gpadmin user.
This example backs up the demo.wikipedia.articles index to the default backup locations.
$ gptext-backup local -i demo.wikipedia.articles
20180504:12:35:07:006126 gptext-backup:mdw:gpadmin-[INFO]:-Execute GPText cluster backup.
20180504:12:35:08:006126 gptext-backup:mdw:gpadmin-[INFO]:-Check zookeeper cluster state ...
20180504:12:35:10:006126 gptext-backup:mdw:gpadmin-[INFO]:-Check status of index demo.wikipedia.articles ...
20180504:12:35:10:006126 gptext-backup:mdw:gpadmin-[INFO]:-Executing backup ...
20180504:12:35:10:006126 gptext-backup:mdw:gpadmin-[INFO]:-Processing ......
20180504:12:35:11:006126 gptext-backup:mdw:gpadmin-[INFO]:-Recording metadata of index "demo.wikipedia.articles" into "/data/gpmaster/gpseg-1/demo.wikipedia.articles_2018-05-04T12:35:10.594013.json"
20180504:12:35:11:006126 gptext-backup:mdw:gpadmin-[INFO]:-Backing up configuration of index "demo.wikipedia.articles" into "/data/gpmaster/gpseg-1/demo.wikipedia.articles_2018-05-04T12:35:10.594013"
20180504:12:35:12:006126 gptext-backup:mdw:gpadmin-[INFO]:-Backup "demo.wikipedia.articles_shard*_2018-05-04T12:35:10.594013" is located in "/data/gpdata1/primary" on each host
20180504:12:35:12:006126 gptext-backup:mdw:gpadmin-[INFO]:-Done.
Back Up to a Shared File System
If you back up to a shared file system, the shared file system must be mounted on all hosts with GPText nodes and must be writable by the gpadmin user. The file system could be, for example, an NFS mount or an SSH server with sshfs support. The file system must be configured and accessible before you execute the gptext-backup utility, and it must accept connections from each host in the cluster.
The gptext-backup utility creates a new subdirectory at the specified path with the backup name specified. The command fails if the directory already exists.
When the backup is complete, the backup directory contains the following:
backup.info
A text file containing three comma-separated strings: the database name, schema name, and index name for the index that was backed up.
backup.properties
A text file with properties that describe the backup, such as the date and time the backup started, the name of the backup, and the names of the Solr collection and collection configuration.
zk_backup
A directory containing the following files:
collection_state.json – a JSON file describing the status of the Solr collection.
configs/<collection-name>/ – a directory containing copies of the Solr configuration files stored in ZooKeeper for the index, for example managed-schema, solrconfig.xml, protwords.txt, stopwords.txt.
snapshot.shard0 … snapshot.shardN
A directory for each shard, with the files containing content of the shard.
If the backup fails (for example, if there is insufficient disk space), an error message is displayed, but the backup directory is not removed. Be sure to remove the backup directory before restarting the backup.
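Because a failed backup leaves its directory behind, a quick structural check of the backup directory can be useful before you rely on it. The Python sketch below looks only for the top-level entries listed above and splits the three fields of backup.info. It does not validate the backup contents, and the function names are hypothetical.

```python
import os

# Top-level entries that the text above says a completed backup contains.
EXPECTED_ENTRIES = ("backup.info", "backup.properties", "zk_backup")


def parse_backup_info(text):
    """Split backup.info into (database name, schema name, index name)."""
    parts = [part.strip() for part in text.strip().split(",")]
    if len(parts) != 3:
        raise ValueError("expected three comma-separated strings, got %d" % len(parts))
    return tuple(parts)


def missing_backup_entries(backup_dir):
    """Return the expected entries that are absent from backup_dir."""
    present = set(os.listdir(backup_dir))
    missing = [name for name in EXPECTED_ENTRIES if name not in present]
    # At least one snapshot.shard<N> directory should exist.
    if not any(name.startswith("snapshot.shard") for name in present):
        missing.append("snapshot.shard*")
    return missing
```

For the shared file system example in this section, the call would be missing_backup_entries("/mnt/nas/twitter"); an empty list means every expected entry is present.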
The following example backs up the demo.twitter.message index to the twitter subdirectory on the /mnt/nas share.
$ gptext-backup -i demo.twitter.message -p /mnt/nas -n twitter
20180508:16:34:02:027794 gptext-backup:mdw:gpadmin-[INFO]:-Execute GPText cluster backup.
20180508:16:34:03:027794 gptext-backup:mdw:gpadmin-[INFO]:-Check zookeeper cluster state ...
20180508:16:34:03:027794 gptext-backup:mdw:gpadmin-[INFO]:-Validate shared file system.
20180508:16:34:06:027794 gptext-backup:mdw:gpadmin-[INFO]:-Backup index: demo.twitter.message, into shared FS '/mnt/nas', as name: twitter.
20180508:16:34:06:027794 gptext-backup:mdw:gpadmin-[INFO]:-Processing .......
20180508:16:34:08:027794 gptext-backup:mdw:gpadmin-[INFO]:-Index backup successfully.
20180508:16:34:08:027794 gptext-backup:mdw:gpadmin-[INFO]:-Done.
Back Up Configuration Files Only
The gptext-backup option -c (--backup_conf) creates a backup of just the GPText index configuration files from ZooKeeper. You can use the -p (--path) option to specify the directory where the backup is to be created. If you omit the -p option, the backup is created in the master data directory ($MASTER_DATA_DIRECTORY). The configuration files are saved in a directory with a name in the format <index-name>_<timestamp>.
This example creates a backup of the configuration files for the demo.wikipedia.articles index. The files are saved in the default location. The backup name, which you will need to restore the backup, is reported in the output.
$ gptext-backup -c -i demo.wikipedia.articles
20180508:17:17:26:000781 gptext-backup:mdw:gpadmin-[INFO]:-Execute GPText cluster backup.
20180508:17:17:27:000781 gptext-backup:mdw:gpadmin-[INFO]:-Check zookeeper cluster state ...
20180508:17:17:29:000781 gptext-backup:mdw:gpadmin-[INFO]:-Recording metadata of index "demo.wikipedia.articles" into "/data/gpmaster/gpseg-1/demo.wikipedia.articles_2018-05-08T17:17:29.197515.json"
20180508:17:17:29:000781 gptext-backup:mdw:gpadmin-[INFO]:-Backing up configuration of index "demo.wikipedia.articles" into "/data/gpmaster/gpseg-1/demo.wikipedia.articles_2018-05-08T17:17:29.197515"
20180508:17:17:30:000781 gptext-backup:mdw:gpadmin-[INFO]:-Done.
gptext-config
Performs GPText configuration tasks:
Edit, append, upload, or list configuration files in ZooKeeper
Revert configuration file changes in ZooKeeper
Edit JVM configuration options
Upload jar files to the GPText home directory
Syntax
gptext-config -h | --help
gptext-config edit -f <file_name> -i <index_name> [-r] [-e]
gptext-config list -i <index_name> [--recursive]
gptext-config upload -l <path/local_file_name> -f <path/zookeeper_file_name> [-i <index_name>]
gptext-config append -l <local_append_file> -f <file_name> -i <index_name>
gptext-config jar -l <path/jar_file>
gptext-config jvm -o <jvm_options>
Parameters
The -f parameter is optional with gptext-config append and gptext-config upload. If you omit -f with the append command, then the local file is appended to a file of the same name in the top-level ZooKeeper directory for the index. If you omit -f with the upload command, then the utility uploads the local file to a file of the same name in the top-level ZooKeeper directory for the index.
Parameter    Description
-h, --help    Displays a usage message and exits.
-i <index-name>, --index=<index-name>    Name of the index. If the index is for a partitioned table, you must specify the root partition name.
-f <filename>, --file=<filename>    The name of a ZooKeeper configuration file to edit, append, or upload. The -i option must be included to specify the index. The following files are supported:
    solrconfig.xml – Contains most of the parameters for configuring Solr itself (see Configuring solrconfig.xml at the Apache Solr website).
    schema.xml – Defines the analyzer chains that Solr uses for various different types of search fields (see Setting up Text Analyzer Chains).
    stopwords.txt – Lists words you want to eliminate from the final index. You can also edit language-specific stopwords by specifying a file name in the format stopwords_language_code.txt, where language_code is a two-character code such as en, fr, or es.
    protwords.txt – Lists protected words that you do not want to be modified by the analyzer chain. For example, iPhone.
    synonyms.txt – Lists words that you want replaced by synonyms in the analyzer chain.
    emoticons.txt – Defines emoticons for the text_sm social media analyzer chain (see gptext-start).
    currency.txt – Defines exchange rates between one currency and another (see Working with Currencies and Exchange Rates at the Apache Solr website).
    jar_file – the name of a jar file to upload to <GPText_Install_Directory>/lib/.
-e <command>, --editor=<command>    Editor to use. Choices are any editor that takes a file name on the command line as a parameter. For example, vi, vim, emacs, nano, etc. If absent, vi is used.
-l <filename>    The full path of a local file to append or upload to a ZooKeeper configuration file.
    gptext-config append appends the named file to a configuration file and distributes the resulting files. This uses the -f and -i parameters. -f names the configuration file to which you want to append the file named (including local path) with the -l parameter.
    gptext-config upload uploads the specified local file to ZooKeeper. Specify the destination ZooKeeper file name with the -f option and specify the index name with the -i option. If you omit the -i option you must supply the full path to the file in ZooKeeper with the -f option, for example -f /gptext/configs/demo.wikipedia.articles/managed-schema.
    When used with the gptext-config jar command, -l must specify a local jar file to upload to <GPText_Install_Directory>/lib/.
--recursive    Optional. Use with the gptext-config list command to recursively list all configuration files and directories available in subdirectories. By default, gptext-config list displays only those index configuration files and directory names in the top-level ZooKeeper directory for an index.
-r, --revert <filename>    Revert the named ZooKeeper file to the previous version.
-o "<JVM_Options>"    Modifies JVM options. To ensure that the JVMs are restarted after changing JVM options, restart the GPText cluster using the gptext-stop and gptext-start utilities.
Notes
Use the gptext-config utility to modify, add, or list configuration files for a specified index.
Never edit the template configuration files. If you do, every index you create after editing the templates will be created with your modified versions. Use the gptext-config utility to ensure that you are editing the configuration files for your index, rather than the template configuration files.
gptext-config automatically reindexes after modifying files if the configuration changes require it.
If you use the -f (--file) parameter to edit one of the index configuration files, GPText automatically places the edited file in its proper directory.
To move an index configuration file from the local file system to the index configuration directory in all of the segments, use the upload command and specify the local file with the -l option and the destination ZooKeeper file with the -f (--file) option.
Examples
1. Edit the managed-schema file in index demo.wikipedia.articles, using the vi editor:
gptext-config edit -f managed-schema -i demo.wikipedia.articles -e vi
2. Append the local file stopwords.add to stopwords.txt in index demo.wikipedia.articles:
gptext-config append -l stopwords.add -f stopwords.txt -i demo.wikipedia.articles
3. Revert file managed-schema in index demo.wikipedia.articles after editing it.
gptext-config edit -f managed-schema -i demo.wikipedia.articles -r
4. Upload the local file custom.txt to the ZooKeeper file custom.conf in index demo.wikipedia.articles:
gptext-config upload -l custom.txt -f custom.conf -i demo.wikipedia.articles
5. List all available configuration files for the index demo.wikipedia.articles:
gptext-config list -i demo.wikipedia.articles --recursive
6. Upload the jar file text.jar to the lib directory in the GPText home directory:
gptext-config jar -l text.jar
7. Set JVM options:
gptext-config jvm -o "-Xms256M -Xmx400M"
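When these commands are driven from a script, building each one as an argument list avoids shell-quoting mistakes, particularly for the JVM options string that contains spaces. The Python sketch below only constructs and prints the command lines from examples 2 and 7; it does not run them. The helper names are hypothetical, and the flags are the ones shown in the Syntax section.

```python
import shlex


def config_append_command(local_file, zk_file, index_name):
    """Argument list for: gptext-config append -l <local> -f <file> -i <index>."""
    return ["gptext-config", "append", "-l", local_file, "-f", zk_file, "-i", index_name]


def config_jvm_command(jvm_options):
    """Argument list for: gptext-config jvm -o <jvm_options>."""
    # The whole option string is a single argument, so no extra quoting is needed.
    return ["gptext-config", "jvm", "-o", jvm_options]


print(shlex.join(config_append_command("stopwords.add", "stopwords.txt", "demo.wikipedia.articles")))
print(shlex.join(config_jvm_command("-Xms256M -Xmx400M")))
```

A list built this way can be passed directly to subprocess.run() on the Greenplum master once the GPText environment script has been sourced.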
gptext-expand
Expands a GPText cluster by adding new GPText nodes to existing hosts in a GPText cluster or to hosts added by the Greenplum Database gpexpand management utility. Replicas for indexes created after the new GPText nodes are added will be distributed across the new and existing nodes. Documents must be reindexed to rebalance replicas on existing hosts or, after expanding the Greenplum cluster, to redistribute the index to new shards.
Syntax
gptext-expand -h
gptext-expand -e -p <paths> [-d <database>] [-v]
gptext-expand -H <new-hosts> [-d <database>] [-v]
Parameters
Parameter Description
-h
--help Displaysausagemessageandexits.
-e
--existingAddsGPTextnodestoexistinghostsintheGPTextcluster.The`-p`optionmustalsobesuppliedtospecifythedatadirectoriesforthenewnodes.
-p
--expand_paths
SpecifiespathstodirectorieswherethenewGPTextnodes’datadirectoriesaretobecreated.ThesedirectoriesshouldbeparalleltotheGreenplumDatabasesegmentdatadirectories.Ifthereismorethanonedirectory,placetheminacomma-delimitedlist,forexample-p /data1/nodes,/data1/nodes,/data2/nodes .Requiredwhenexpandingonexistinghosts.
-H
--new_hostsSpecifiesthenewhostsonwhichGPTextistobeinstalled.Placemultiplehostnamesinacomma-delimitedlist,forexample -H host1,host2,host3 .SeeNotesforrequirementsfornewhosts.
-d
--database
SpecifiesthenameofadatabasecontainingGPTextschema.Ifthe`gptext-expand`utilityfailstofindadatabasecontainingtheGPTextschemabecausetheusercannotaccessadatabase,usethisoptiontospecifyanaccessibledatabasethatcontainstheGPTextschema.
-v
--verbose Displaysdebugoutput.
Notes
The -p and -d options cannot be used together.
When new hosts are added to the Greenplum Database cluster, ensure that the following GPText prerequisites are installed before running gpexpand:
Java 1.8
Python 2.6 or greater
Linux lsof utility
All hosts in the cluster must be able to reach the new and existing hosts.
Existing replicas are not automatically redistributed. To rebalance replicas among the expanded GPText cluster, you must reindex.
When expanding to new hosts, you must reindex to redistribute the index among existing and new shards.
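The comma-delimited list that the -p option expects can be assembled in the shell. The following is a minimal sketch and not part of GPText: the helper names join_by_comma and expand_existing_hosts are hypothetical, the directories are placeholders, and the function only prints the gptext-expand command so that it can be reviewed before it is run.

```shell
#!/usr/bin/env bash
# Join all arguments with commas, e.g. "a b c" -> "a,b,c".
join_by_comma() {
  local IFS=,
  echo "$*"
}

# Print (do not run) a gptext-expand command that passes the listed
# directories to -p. Only the documented -e and -p options are used.
expand_existing_hosts() {
  echo gptext-expand -e -p "$(join_by_comma "$@")"
}

# Placeholder directories, matching the -p example in the parameter table.
expand_existing_hosts /data1/nodes /data1/nodes /data2/nodes
```

Printing the command first keeps the sketch safe to run on any host; paste the printed line into a terminal on the master host to perform the expansion.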
gptext-external
Manages configurations in ZooKeeper for external document sources.
Syntax
gptext-external -h
gptext-external upload -t <type> -c <config-name> -p <config-dir> [-d <database>] [-v]
gptext-external list -t <type> [-d <database>] [-v]
gptext-external delete -t <type> -c <config-name> [-d <database>] [-v]
Parameters
Parameter  Description
-h, --help  Displays a usage message and exits.
-t, --type  Specifies the type of the external document source. The only types currently supported are 'ftp' and 'hdfs'.
-c, --cluster  A name for this external document source configuration. Use this name to reference the configuration in the gptext.external_login() function.
-p, --path  Specifies the path to a directory containing the configuration files to upload to ZooKeeper. See the Notes section for a list of the required configuration files.
-d, --database  Specifies the name of a database containing the GPText schema. If the gptext-external utility fails to find a database containing the GPText schema because the user cannot access a database, use this option to specify an accessible database that contains the GPText schema.
-v, --verbose  Displays debug output.
Notes
To index documents stored in an external document source that requires authentication, such as a Hadoop file system (hdfs) or ftp server, you first upload the configuration and authentication files GPText needs to connect to the document source. Assemble the files in a local directory and then use the gptext-external upload command to upload the contents of the directory to ZooKeeper.
hdfs
For hdfs connections, create a local directory containing the following files.
The core-site.xml and hdfs-site.xml configuration files from the Hadoop server.
A file named user.txt. This file contains a single line identifying the user name to use to log in to Hadoop. The user must have read permission in hdfs for the documents you want to index. If Kerberos is enabled in the Hadoop cluster, the user name is the name of the Kerberos principal for the user.
If the Hadoop cluster is secured with Kerberos, also include the user's keytab file and the krb5.conf file for the Kerberos realm.
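Before uploading, it can help to confirm that the local directory holds the files listed above. The following is a minimal sketch under stated assumptions: the function name check_hdfs_config_dir is hypothetical, and the Kerberos files are not checked because the keytab file name depends on the site.

```shell
#!/usr/bin/env bash
# Return 0 if the directory holds the three files required for an hdfs
# configuration; otherwise print what is wrong on stderr and return 1.
check_hdfs_config_dir() {
  local dir="$1" file status=0
  for file in core-site.xml hdfs-site.xml user.txt; do
    if [ ! -f "$dir/$file" ]; then
      echo "missing: $dir/$file" >&2
      status=1
    fi
  done
  # user.txt must contain a single line with the Hadoop user name.
  if [ -f "$dir/user.txt" ] && [ "$(grep -c '' "$dir/user.txt")" -ne 1 ]; then
    echo "user.txt must contain exactly one line" >&2
    status=1
  fi
  return "$status"
}

# Example (placeholder directory):
#   check_hdfs_config_dir ./hdfs_conf && echo "ready to upload"
```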
ftp
For ftp connections, create a local directory containing a single file, login.txt.
The login.txt file has three lines:
Line 1: The name of the user to log in to the ftp server.
Line 2: The user's password. Enter the clear-text password in this file. The password is base64-encoded when GPText stores it in ZooKeeper.
Line 3: The maximum number of connections to create to the FTP server, an integer. If this line is omitted, each GPText node connects to the FTP server. If the number of connections exceeds the server's maximum allowed connections, GPText FTP connections will fail.
Upload the configuration files with the gptext-external upload command:
$ gptext-external upload -t <type> -c <config-name> -p <config-dir>
You can delete the login.txt file after you upload the configuration directory to protect the ftp user's password.
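The ftp steps above can be sketched as a small shell helper. This is an illustration only: the function name make_ftp_config_dir, the directory ./ftp_conf, and the sample credentials are hypothetical, and restricting the file permissions is an added precaution rather than a documented GPText requirement.

```shell
#!/usr/bin/env bash
# Create a private directory containing the three-line login.txt file.
# Usage: make_ftp_config_dir <config-dir> <user> <password> <max-connections>
make_ftp_config_dir() {
  local dir="$1" user="$2" password="$3" max_conn="$4"
  mkdir -p "$dir" && chmod 700 "$dir" || return 1
  # Line 1: user, line 2: clear-text password, line 3: maximum connections.
  printf '%s\n%s\n%s\n' "$user" "$password" "$max_conn" > "$dir/login.txt"
  chmod 600 "$dir/login.txt"
}

# Example with placeholder values; the upload and cleanup are shown as comments:
#   make_ftp_config_dir ./ftp_conf ftpuser 'secret' 4
#   gptext-external upload -t ftp -c <config-name> -p ./ftp_conf
#   rm ./ftp_conf/login.txt
```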
To make changes to configuration files, edit the file in the local directory and upload it again with the same command.
Run gptext-external list -t <type> to list configurations of the specified type.
Run gptext-external delete -t <type> -c <config-name> to remove the configuration from ZooKeeper.
gptext-installsql
Installs or removes the gptext schema and user-defined functions in databases.
Syntax
gptext-installsql -h
gptext-installsql [-c] [-v] <db_name> [<db2_name> ...]
Parameters
Parameter  Description
-c, --clean  Removes the gptext schema and UDFs from the specified databases.
-h, --help  Displays a usage message and exits.
-v, --verbose  Displays debug output.
Notes
The gptext schema is reserved for use by GPText. The gptext-installsql utility drops and recreates the schema. If you add any database objects to the schema they will be lost when you reinstall the schema or upgrade the GPText system.
The gptext schema cannot be installed in the system databases postgres, template0, or template1.
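A wrapper script can filter out the system databases before calling the utility. The sketch below is illustrative: the function names is_system_db and installsql_command are hypothetical, and the command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Return 0 if the name is one of the system databases named above.
is_system_db() {
  case "$1" in
    postgres|template0|template1) return 0 ;;
    *) return 1 ;;
  esac
}

# Print one gptext-installsql command covering the allowed databases.
# System databases are reported on stderr and skipped.
installsql_command() {
  local db targets=""
  for db in "$@"; do
    if is_system_db "$db"; then
      echo "skipping system database: $db" >&2
    else
      targets="$targets $db"
    fi
  done
  if [ -n "$targets" ]; then
    echo "gptext-installsql$targets"
  else
    return 1
  fi
}

# Example: installsql_command gpadmin postgres demo
```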
Examples
1. Install GPText UDFs in the gpadmin and demo databases.
$ gptext-installsql gpadmin demo
20170927:11:06:11:024015 gptext-installsql:gpdb:gpadmin-[INFO]:-Install GPText udf...
20170927:11:06:11:024015 gptext-installsql:gpdb:gpadmin-[INFO]:-Creating 'gptext' schema and UDFs in database gpadmin...
20170927:11:06:11:024015 gptext-installsql:gpdb:gpadmin-[INFO]:-Creating 'gptext' schema and UDFs in database demo...
20170927:11:06:12:024015 gptext-installsql:gpdb:gpadmin-[INFO]:-Validating gptext installation
20170927:11:06:12:024015 gptext-installsql:gpdb:gpadmin-[INFO]:-Done.
2. Delete GPText UDFs in database gpadmin.
$ gptext-installsql --clean gpadmin
20170927:11:10:34:024325 gptext-installsql:gpdb:gpadmin-[INFO]:-Clean GPText udf...
20170927:11:10:35:024325 gptext-installsql:gpdb:gpadmin-[INFO]:-Connecting to database gpadmin
20170927:11:10:35:024325 gptext-installsql:gpdb:gpadmin-[INFO]:-Dropping 'gptext' schema and UDFs...
20170927:11:10:35:024325 gptext-installsql:gpdb:gpadmin-[INFO]:-Validating clean operation
20170927:11:10:35:024325 gptext-installsql:gpdb:gpadmin-[INFO]:-Done.
gptext-migrator
Migrates the current GPText system into an upgraded Greenplum Database cluster.
Syntax
gptext-migrator [-h|--help]
gptext-migrator [-v|--verbose]
Notes
The gptext-migrator utility relocates the current GPText system to a new Greenplum Database release.
The utility determines the destination Greenplum Database release from the environment. If the GPText system has already been migrated, or if the destination Greenplum release is unsupported, gptext-migrator outputs a message and quits.
If you are upgrading GPText and Greenplum Database at the same time, complete the Greenplum Database upgrade first, and then use gptext-migrator to add the current GPText version to the new Greenplum Database installation. Finally, use gptext-upgrade to upgrade the system to the new GPText version.
gptext-recover
Recovers GPText nodes.
Syntax
gptext-recover -h
gptext-recover -f [-v]
gptext-recover -H <new-host1>,<new-host2>,... [-v]
gptext-recover -r [-v]
Parameters
Parameter  Description
-h, --help  Displays a usage message and exits.
-f, --force  Forces recovery for any GPText nodes that are down. If the node is unrecoverable, deletes the node, creates a new node, and recreates replicas.
-H, --new-hosts  Recovers down nodes on new hosts, for example "host1,host2". The number of new hosts must be equal to the number of failed hosts.
-r, --index_replicas  Recovers replicas, but does not recover any down nodes.
-v, --verbose  Displays debug output.
Notes
The -f and -H options cannot be used at the same time.
If shards are down, gptext-recover advises you to reindex.
If no shards are down, gptext-recover restores any replicas that are down.
If any GPText nodes recovered using the -f or -H options fail to start, the replicas cannot be recovered. If this should happen, resolve the startup problem with the newly created nodes, and then recover the replicas using the gptext-recover -r option. It is important to recover replicas when all GPText nodes are healthy so that replicas will be distributed evenly among the nodes.
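Because -f and -H are mutually exclusive, a wrapper that takes a single recovery mode cannot combine them by accident. The following sketch is an assumption-laden illustration: the function recover_command and its mode names are hypothetical, and it only prints the command to run.

```shell
#!/usr/bin/env bash
# Print the gptext-recover command for exactly one recovery mode.
#   recover_command force        -> recover down nodes in place (-f)
#   recover_command hosts h1,h2  -> recover down nodes on new hosts (-H)
#   recover_command replicas     -> recover replicas only (-r)
recover_command() {
  case "$1" in
    force)
      echo "gptext-recover -f" ;;
    hosts)
      if [ -z "$2" ]; then
        echo "hosts mode needs a comma-delimited host list" >&2
        return 1
      fi
      echo "gptext-recover -H $2" ;;
    replicas)
      echo "gptext-recover -r" ;;
    *)
      echo "unknown mode: $1" >&2
      return 1 ;;
  esac
}

# Example: recover_command hosts host1,host2
```

If the recovered nodes fail to start, fix the startup problem first and then run the command printed by the replicas mode, as described in the notes above.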
gptext-replica
Add or delete a replica for an index shard.
Syntax
gptext-replica -h
gptext-replica add -i <index-name> -s <shard> [-n <node>]
gptext-replica drop -i <index-name> -s <shard> -r <replica> [-o]
Parameters
Parameter  Description
-h, --help  Displays a usage message and exits.
-i <index-name>, --index=<index-name>  Name of the index.
-f <filename>, --file=<filename>  The name of a file to edit, append, or upload. The -i option must be included to specify the index. The following files are supported:
solrconfig.xml – Contains most of the parameters for configuring Solr itself (see SolrConfigXml).
schema.xml – Defines the analyzer chains that Solr uses for various different types of search fields (see Setting up Text Analyzer Chains).
stopwords.txt – Lists words you want to eliminate from the final index. You can also edit language-specific stopwords by specifying a file name in the format stopwords_language_code.txt, where language_code is a two-character code such as en, fr, or es.
protwords.txt – Lists protected words that you do not want to be modified by the analyzer chain. For example, iPhone.
synonyms.txt – Lists words that you want replaced by synonyms in the analyzer chain.
emoticons.txt – Defines emoticons for the text_sm social media analyzer chain. See gptext-start.
currency.txt – Defines exchange rates between one currency and another (see Working with Currencies and Exchange Rates at the Apache Solr website).
jar_file – The name of a jar file to upload to <GPText_Install_Directory>/lib/.
-e <command>, --editor=<command>  Editor to use. Choices are any editor that takes a file name on the command line as a parameter, for example vi, vim, emacs, or nano. If absent, vi is used.
-a <filename>, --append=<filename>  Appends a named file to a configuration file and distributes the resulting files. Requires the -f and -i parameters. -f names the configuration file to which you want to append the file named (including local path) with the -a parameter.
-r, --revert <filename>  Reverts the named file to its previous version.
-i <index>, --index=<index>  Required. The name of the index.
-s <shard>, --shard=<shard>  Required. The name of the shard to add a replica to.
-n <node>, --node=<node>  Optional. The node where the replica is to be added.
-r <replica>, --replica=<replica>  Required for the drop command only. The name of the replica to drop.
-o, --onlyifdown  Optional. Used only with the drop command. Drops the replica only if it is down.
Notes
To find the name of a replica to drop, check gptext.index_status(). The name is core_nodeX, where X is a number.
Examples
1. Add a replica for index demo.wikipedia.articles in shard shard0, on node node1.
gptext-replica add -i demo.wikipedia.articles -s shard0 -n node1
2. Drop the replica named core_node3 for index demo.wikipedia.articles in shard shard0 if the replica is down.
gptext-replica drop -i demo.wikipedia.articles -s shard0 -r core_node3 -o
gptext-restore
Restore a GPText index from a backup saved to local storage on the Greenplum Database cluster or to a shared file system mounted on all Greenplum Database cluster hosts.
Syntax
gptext-restore -h
gptext-restore -c -i <index_name> -p <path> [-v]
gptext-restore local -p <backup-name> [-v]
Parameters
Parameter  Description
-h, --help  Displays a usage message and exits.
-c  Restores the index configuration and creates an empty index.
local  Restores an index that was backed up to local GPText cluster storage. If the local keyword is not included, the index is restored from a shared file system mounted on all hosts.
-p <path>, --path <path>  The path to the backup directory on each host.
Notes
Use the gptext-restore utility to restore a GPText index backup. You can restore the backup to a new GPText system, or you can restore the backup to the same system in order to recover from a corrupted GPText index. With the -c option, you can restore the configuration files and create an empty index without restoring the index data from the backup.
The index you are restoring must not exist. The gptext-restore utility creates a new index and reloads the backed up data into it. If you are restoring in order to repair a corrupted index, you must first delete the existing index with the gptext.drop_index() function. If the index you want to restore exists, gptext-restore outputs an error and quits.
Restore From Local GPText Cluster Storage
Use the gptext-restore local command to restore a GPText index from local storage. Supply the path to the backup directory on the master host using the --path (-p) option. The argument to the --path option is the path to the backup directory that was created with gptext-backup, including the timestamp.
The following example restores a backup that was created using this gptext-backup command: gptext-backup local -i demo.store.products -p gptext-backups
$ gptext-restore local -p gptext-backups/demo.store.products_2018-05-11T10\:17\:54.493448
20180511:11:04:31:026221 gptext-restore:mdw:gpadmin-[INFO]:-Execute GPText cluster restore.
20180511:11:04:32:026221 gptext-restore:mdw:gpadmin-[INFO]:-Check zookeeper cluster state...
20180511:11:04:32:026221 gptext-restore:mdw:gpadmin-[INFO]:-Reading metadata from file /home/gpadmin/gptext-backups/demo.store.products_2018-05-11T10:17:54.493448.json...
20180511:11:04:32:026221 gptext-restore:mdw:gpadmin-[INFO]:-Executing restore...
20180511:11:04:32:026221 gptext-restore:mdw:gpadmin-[INFO]:-Creating index demo.store.products...
20180511:11:04:35:026221 gptext-restore:mdw:gpadmin-[INFO]:-Add replica into shard shard2 for index demo.store.products.
20180511:11:04:35:026221 gptext-restore:mdw:gpadmin-[INFO]:-Processing......
20180511:11:04:37:026221 gptext-restore:mdw:gpadmin-[INFO]:-The replica is added, data recovering....
20180511:11:04:37:026221 gptext-restore:mdw:gpadmin-[INFO]:-Data recovered, replica becomes active....
20180511:11:04:38:026221 gptext-restore:mdw:gpadmin-[INFO]:-Restoring replica demo.store.products_shard2_replica1 from backup demo.store.products_shard2_2018-05-11T10:17:54.493448...
20180511:11:04:38:026221 gptext-restore:mdw:gpadmin-[INFO]:-Add replica into shard shard3 for index demo.store.products.
20180511:11:04:38:026221 gptext-restore:mdw:gpadmin-[INFO]:-Processing......
20180511:11:04:40:026221 gptext-restore:mdw:gpadmin-[INFO]:-The replica is added, data recovering....
20180511:11:04:40:026221 gptext-restore:mdw:gpadmin-[INFO]:-Data recovered, replica becomes active....
20180511:11:04:41:026221 gptext-restore:mdw:gpadmin-[INFO]:-Restoring replica demo.store.products_shard3_replica1 from backup demo.store.products_shard3_2018-05-11T10:17:54.493448...
20180511:11:04:41:026221 gptext-restore:mdw:gpadmin-[INFO]:-Add replica into shard shard0 for index demo.store.products.
20180511:11:04:41:026221 gptext-restore:mdw:gpadmin-[INFO]:-Processing......
20180511:11:04:43:026221 gptext-restore:mdw:gpadmin-[INFO]:-The replica is added, data recovering....
20180511:11:04:43:026221 gptext-restore:mdw:gpadmin-[INFO]:-Data recovered, replica becomes active....
20180511:11:04:44:026221 gptext-restore:mdw:gpadmin-[INFO]:-Restoring replica demo.store.products_shard0_replica1 from backup demo.store.products_shard0_2018-05-11T10:17:54.493448...
20180511:11:04:44:026221 gptext-restore:mdw:gpadmin-[INFO]:-Add replica into shard shard1 for index demo.store.products.
20180511:11:04:44:026221 gptext-restore:mdw:gpadmin-[INFO]:-Processing......
20180511:11:04:46:026221 gptext-restore:mdw:gpadmin-[INFO]:-The replica is added, data recovering....
20180511:11:04:47:026221 gptext-restore:mdw:gpadmin-[INFO]:-Data recovered, replica becomes active....
20180511:11:04:47:026221 gptext-restore:mdw:gpadmin-[INFO]:-Restoring replica demo.store.products_shard1_replica1 from backup demo.store.products_shard1_2018-05-11T10:17:54.493448...
20180511:11:04:47:026221 gptext-restore:mdw:gpadmin-[INFO]:-Processing......
20180511:11:04:48:026221 gptext-restore:mdw:gpadmin-[INFO]:-Adding 1 replica(s) to demo.store.products_shard2...
20180511:11:04:48:026221 gptext-restore:mdw:gpadmin-[INFO]:-Add replica into shard shard2 for index demo.store.products.
20180511:11:04:48:026221 gptext-restore:mdw:gpadmin-[INFO]:-Processing......
20180511:11:04:51:026221 gptext-restore:mdw:gpadmin-[INFO]:-The replica is added, data recovering........
20180511:11:04:55:026221 gptext-restore:mdw:gpadmin-[INFO]:-Data recovered, replica becomes active....
20180511:11:04:55:026221 gptext-restore:mdw:gpadmin-[INFO]:-Adding 1 replica(s) to demo.store.products_shard3...
20180511:11:04:55:026221 gptext-restore:mdw:gpadmin-[INFO]:-Add replica into shard shard3 for index demo.store.products.
20180511:11:04:55:026221 gptext-restore:mdw:gpadmin-[INFO]:-Processing......
20180511:11:04:57:026221 gptext-restore:mdw:gpadmin-[INFO]:-The replica is added, data recovering........
20180511:11:05:01:026221 gptext-restore:mdw:gpadmin-[INFO]:-Data recovered, replica becomes active....
20180511:11:05:02:026221 gptext-restore:mdw:gpadmin-[INFO]:-Adding 1 replica(s) to demo.store.products_shard0...
20180511:11:05:02:026221 gptext-restore:mdw:gpadmin-[INFO]:-Add replica into shard shard0 for index demo.store.products.
20180511:11:05:02:026221 gptext-restore:mdw:gpadmin-[INFO]:-Processing......
20180511:11:05:05:026221 gptext-restore:mdw:gpadmin-[INFO]:-The replica is added, data recovering........
20180511:11:05:09:026221 gptext-restore:mdw:gpadmin-[INFO]:-Data recovered, replica becomes active....
20180511:11:05:09:026221 gptext-restore:mdw:gpadmin-[INFO]:-Adding 1 replica(s) to demo.store.products_shard1...
20180511:11:05:09:026221 gptext-restore:mdw:gpadmin-[INFO]:-Add replica into shard shard1 for index demo.store.products.
20180511:11:05:09:026221 gptext-restore:mdw:gpadmin-[INFO]:-Processing......
20180511:11:05:12:026221 gptext-restore:mdw:gpadmin-[INFO]:-The replica is added, data recovering........
20180511:11:05:16:026221 gptext-restore:mdw:gpadmin-[INFO]:-Data recovered, replica becomes active....
20180511:11:05:16:026221 gptext-restore:mdw:gpadmin-[INFO]:-Done.
This example restores the configuration files and creates the GPText index without reloading the data. Note that the local keyword is omitted.
$ gptext-restore -c -p gptext-backups/demo.store.products_2018-05-11T10\:17\:54.493448
20180511:11:16:50:028171 gptext-restore:mdw:gpadmin-[INFO]:-Execute GPText cluster restore.
20180511:11:16:51:028171 gptext-restore:mdw:gpadmin-[INFO]:-Check zookeeper cluster state...
20180511:11:16:51:028171 gptext-restore:mdw:gpadmin-[INFO]:-Reading metadata from file /home/gpadmin/gptext-backups/demo.store.products_2018-05-11T10:17:54.493448.json...
20180511:11:16:51:028171 gptext-restore:mdw:gpadmin-[INFO]:-Executing restore...
20180511:11:16:51:028171 gptext-restore:mdw:gpadmin-[INFO]:-Creating index demo.store.products...
20180511:11:16:59:028171 gptext-restore:mdw:gpadmin-[INFO]:-Done.
Restore From a Shared File System
Use the --path option to restore a backup from a shared file system. The shared file system must be mounted on all hosts with GPText nodes and must be readable by the gpadmin user. Each host in the cluster must be able to access the file system.
The GPText index to restore must not already exist.
The following example restores the demo.twitter.message index from a shared file system mounted on each host at /mnt/nas. The backup was created with the name twitter, so the backup files were saved in the /mnt/nas/twitter directory.
$ gptext-restore --path /mnt/nas/twitter
20180510:17:22:46:008054 gptext-restore:mdw:gpadmin-[INFO]:-Execute GPText cluster restore.
20180510:17:22:48:008054 gptext-restore:mdw:gpadmin-[INFO]:-Check zookeeper cluster state...
20180510:17:22:48:008054 gptext-restore:mdw:gpadmin-[INFO]:-Validate shared file system.
20180510:17:22:50:008054 gptext-restore:mdw:gpadmin-[INFO]:-Restore index: demo.twitter.message, from shared FS '/mnt/nas', backup name: twitter.
20180510:17:22:50:008054 gptext-restore:mdw:gpadmin-[INFO]:-Processing.......................
20180510:17:23:10:008054 gptext-restore:mdw:gpadmin-[INFO]:-Checking leader replicas of collection demo.twitter.message..........
20180510:17:23:16:008054 gptext-restore:mdw:gpadmin-[INFO]:-Validate replica state........
20180510:17:23:19:008054 gptext-restore:mdw:gpadmin-[INFO]:-Index restore successfully.
20180510:17:23:19:008054 gptext-restore:mdw:gpadmin-[INFO]:-Done.
gptext-start
Starts or restarts the GPText cluster.
Syntax
gptext-start -h
gptext-start [-r] [-s] [-v]
Parameters
Parameter  Description
-h, --help  Displays a usage message and exits.
-r, --restart  Restarts the GPText cluster.
-s, --slow_start  Restarts the GPText cluster by starting nodes one at a time.
-v, --verbose  Displays debug output.
Notes
The gptext-start -r command calls the solr restart command to stop and restart all of the Solr instances in the cluster. The GPText utility determines if the processes are running before it completes, but it cannot verify that all of the Solr processes were stopped. If it is important to be certain that Solr processes were stopped, for example if you have changed the JVM options, use gptext-stop followed by gptext-start instead of gptext-start -r.
The -s ( --slow-start ) option is recommended if you have a large number of indexes. By default, when a Solr cluster starts, all of the cluster's nodes are started at once. With a large number of indexes, the number of initial ZooKeeper requests can result in timeout errors and possibly prevent the cluster from starting in a clean state. With the -s option, GPText performs a rolling start, starting nodes one at a time, to reduce ZooKeeper contention and allow a more stable startup. If you have more than 50 indexes and do not specify the -s option, gptext-start displays a warning message and requires you to confirm. With the -s option, gptext-start does not return until all nodes have been started; without the -s option, the gptext-start command returns immediately.
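The stop-then-start sequence recommended above can be wrapped in a small function. This is a sketch, not a GPText utility: the function name full_restart and the RUN variable are hypothetical. By default the function only prints the commands (a dry run).

```shell
#!/usr/bin/env bash
# RUN is a prefix for every command. The default "echo" prints the
# commands; set RUN to an empty string to execute them for real.
RUN=${RUN-echo}

# Stop the cluster, then start it. Arguments (for example -s for a
# rolling start) are passed through to gptext-start.
full_restart() {
  $RUN gptext-stop && $RUN gptext-start "$@"
}

# Example (dry run): full_restart -s
```

Using gptext-stop followed by gptext-start, rather than gptext-start -r, follows the note above about being certain that the Solr processes were stopped.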
Examples
1. Start the GPText cluster.
gptext-start
2. Restart the GPText cluster.
gptext-start -r
gptext-state
Displays the state of the GPText cluster and GPText indexes.
Syntax
gptext-state -h
gptext-state [-d <db-name>] [-D] [-v]
gptext-state -i <index-name> [-d <db-name>] [-c <col1,...>] [-v]
gptext-state list [-d <db-name>] [-v]
gptext-state healthcheck [-d <db-name>] [-f <percent>] [-v]
gptext-state stats [-i <index-name>]
Parameters
Parameter  Description
-h, --help  Displays a usage message and exits.
-d <db-name>, --database=<db-name>  The name of a database containing the GPText schema. gptext-state searches all databases for the functions it needs to run. If the user does not have access permission to the database it begins with, it fails. In this case, use the --database= parameter to specify an accessible database to search.
-D, --details  Lists the status for each GPText index. When omitted, gptext-state lists counts of the numbers of indexes with Green, Yellow, and Red statuses.
-i <index-name>, --index=<index-name>  The name of an index. Displays statistics for the specified index. If the <index-name> is a root or child partition, displays any parent or child partitions. This option cannot be used with the list or healthcheck subcommands.
-c <column-list>, --stats_columns=<  Used with the -i or --index option, specifies a comma-separated list of statistics to display. The list may contain replication_factor, max_shards_per_node, num_docs, and size_in_bytes. If no -c or --stats_columns option is supplied, all four statistics are displayed.
-f <diskfree>, --disk_free=<diskfree>  Used with the healthcheck command, specifies the percentage disk free required per host to report a healthy GPText cluster. The default is 10.
Notes
All parameters are optional, except that -i ( --index ) is required when you specify -c ( --stats_columns ).
If you specify a subpartition name with the -i option, gptext-state displays the name of the parent table or partition from which the partition inherits. If you specify the name of a table or partition with child partitions, gptext-state lists them.
When executed with no arguments, gptext-state displays the GPText version and counts of indexes in the Green, Yellow, and Red states.
A Green state means that all shards and replicas are healthy.
A Yellow state means that all shards are available, but one or more replicas are down.
A Red state means that one or more shards are down.
With the -D ( --details ) option specified, gptext-state lists all GPText indexes with the columns database, index_name, and state. The state column displays the status of the index as Green, Yellow, or Red.
If any index has a Yellow or Red status, gptext-state returns a non-zero value.
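The exit status described above makes gptext-state usable from monitoring scripts. The following sketch assumes only that behavior: the function name check_state and its messages are hypothetical, and a non-zero status may also mean that the command itself could not run.

```shell
#!/usr/bin/env bash
# Run a status command quietly and report on its exit status.
# Usage: check_state <command> [args...]
check_state() {
  if "$@" >/dev/null 2>&1; then
    echo "OK: no Yellow or Red indexes reported"
  else
    echo "WARNING: an index is Yellow or Red, or the check itself failed"
    return 1
  fi
}

# Example (requires a GPText installation):
#   check_state gptext-state -d demo
```

Taking the command as arguments keeps the helper testable with any program and lets the caller add options such as -d.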
Examples
1. Show the GPText cluster state.
$ gptext-state
20161216:14:01:32:029224 gptext-state:gpsne:gpadmin-[INFO]:-Check zookeeper cluster state...
20161216:14:01:32:029224 gptext-state:gpsne:gpadmin-[INFO]:-Check GPText cluster status...
20161216:14:01:33:029224 gptext-state:gpsne:gpadmin-[INFO]:-Current GPText Version: 2.0.0
20161216:14:01:33:029224 gptext-state:gpsne:gpadmin-[INFO]:-All nodes are up and running.
20161216:14:01:34:029224 gptext-state:gpsne:gpadmin-[INFO]:------------------------------------------------
20161216:14:01:34:029224 gptext-state:gpsne:gpadmin-[INFO]:-Index state.
20161216:14:01:34:029224 gptext-state:gpsne:gpadmin-[INFO]:------------------------------------------------
20161216:14:01:34:029224 gptext-state:gpsne:gpadmin-[INFO]:-state   index count
20161216:14:01:34:029224 gptext-state:gpsne:gpadmin-[INFO]:-Green   4
2. Show the GPText cluster state with details, specifying demo as a database containing the GPText schema.
$ gptext-state -D -d demo
20170929:15:18:21:000872 gptext-state:gpdb:gpadmin-[INFO]:-Execute GPText state...
20170929:15:18:21:000872 gptext-state:gpdb:gpadmin-[INFO]:-Check zookeeper cluster state...
20170929:15:18:21:000872 gptext-state:gpdb:gpadmin-[INFO]:-Check GPText cluster status...
20170929:15:18:21:000872 gptext-state:gpdb:gpadmin-[INFO]:-Current GPText Version: 2.1.3
20170929:15:18:21:000872 gptext-state:gpdb:gpadmin-[INFO]:-All nodes are up and running.
20170929:15:18:22:000872 gptext-state:gpdb:gpadmin-[INFO]:------------------------------------------------
20170929:15:18:22:000872 gptext-state:gpdb:gpadmin-[INFO]:-Index state details.
20170929:15:18:22:000872 gptext-state:gpdb:gpadmin-[INFO]:------------------------------------------------
20170929:15:18:22:000872 gptext-state:gpdb:gpadmin-[INFO]:-database   index name                state
20170929:15:18:22:000872 gptext-state:gpdb:gpadmin-[INFO]:-demo       demo.twitter.message      Green
20170929:15:18:22:000872 gptext-state:gpdb:gpadmin-[INFO]:-demo       demo.wikipedia.articles   Green
20170929:15:18:22:000872 gptext-state:gpdb:gpadmin-[INFO]:-Done.
3. Show replication_factor and num_docs statistics for the GPText index demo.wikipedia.articles. Specify demo as the database with the GPText schema.
$ gptext-state -i demo.wikipedia.articles -c replication_factor,num_docs -d demo
20170927:13:00:31:030421 gptext-state:gpdb:gpadmin-[INFO]:-Execute GPText state...
20170927:13:00:31:030421 gptext-state:gpdb:gpadmin-[INFO]:-Check zookeeper cluster state...
20170927:13:00:31:030421 gptext-state:gpdb:gpadmin-[INFO]:-Check GPText cluster statistics...
20170927:13:00:33:030421 gptext-state:gpdb:gpadmin-[INFO]:-Replicas Up: 5
20170927:13:00:33:030421 gptext-state:gpdb:gpadmin-[INFO]:------------------------------------------------
20170927:13:00:33:030421 gptext-state:gpdb:gpadmin-[INFO]:-Index demo.wikipedia.articles statistics.
20170927:13:00:33:030421 gptext-state:gpdb:gpadmin-[INFO]:------------------------------------------------
20170927:13:00:33:030421 gptext-state:gpdb:gpadmin-[INFO]:-replication_factor   num_docs
20170927:13:00:33:030421 gptext-state:gpdb:gpadmin-[INFO]:-2                    23
20170927:13:00:33:030421 gptext-state:gpdb:gpadmin-[INFO]:-Done.
4. List all indexes.
$ gptext-state list
20170929:15:19:02:001023 gptext-state:gpdb:gpadmin-[INFO]:-Execute GPText state...
20170929:15:19:03:001023 gptext-state:gpdb:gpadmin-[INFO]:-Check zookeeper cluster state...
20170929:15:19:03:001023 gptext-state:gpdb:gpadmin-[INFO]:----------------------------------------------------------
20170929:15:19:03:001023 gptext-state:gpdb:gpadmin-[INFO]:-Index list
20170929:15:19:03:001023 gptext-state:gpdb:gpadmin-[INFO]:----------------------------------------------------------
20170929:15:19:03:001023 gptext-state:gpdb:gpadmin-[INFO]:-demo.twitter.message
20170929:15:19:03:001023 gptext-state:gpdb:gpadmin-[INFO]:-demo.wikipedia.articles
20170929:15:19:03:001023 gptext-state:gpdb:gpadmin-[INFO]:-Done.
5. Perform a health check with a 20% free disk requirement.
$ gptext-state healthcheck -f 20 -d demo
20170927:13:03:53:030843 gptext-state:gpdb:gpadmin-[INFO]:-Execute GPText state...
20170927:13:03:53:030843 gptext-state:gpdb:gpadmin-[INFO]:-Check zookeeper cluster state...
20170927:13:03:53:030843 gptext-state:gpdb:gpadmin-[INFO]:-Execute healthcheck on GPText cluster!
20170927:13:03:53:030843 gptext-state:gpdb:gpadmin-[INFO]:-Check GPText binary and utilities version match...
20170927:13:03:53:030843 gptext-state:gpdb:gpadmin-[INFO]:-GOOD
20170927:13:03:53:030843 gptext-state:gpdb:gpadmin-[INFO]:-Check GPText config files...
20170927:13:03:55:030843 gptext-state:gpdb:gpadmin-[INFO]:-GOOD
20170927:13:03:55:030843 gptext-state:gpdb:gpadmin-[INFO]:-Check GPText index status...
20170927:13:03:55:030843 gptext-state:gpdb:gpadmin-[INFO]:-GOOD
20170927:13:03:55:030843 gptext-state:gpdb:gpadmin-[INFO]:-Checking for required disk space...
20170927:13:03:56:030843 gptext-state:gpdb:gpadmin-[INFO]:-GOOD
20170927:13:03:56:030843 gptext-state:gpdb:gpadmin-[INFO]:-Checking for required user privileges...
20170927:13:03:57:030843 gptext-state:gpdb:gpadmin-[INFO]:-GOOD
20170927:13:03:57:030843 gptext-state:gpdb:gpadmin-[INFO]:-Checking for indexes and database consistency...
20170927:13:03:58:030843 gptext-state:gpdb:gpadmin-[INFO]:-GOOD
20170927:13:03:58:030843 gptext-state:gpdb:gpadmin-[INFO]:-Done.
6. Check the status of a partitioned table.
$ gptext-state -i demo.twitter.message
20170929:15:19:46:001205 gptext-state:gpdb:gpadmin-[INFO]:-Execute GPText state...
20170929:15:19:46:001205 gptext-state:gpdb:gpadmin-[INFO]:-Check zookeeper cluster state...
20170929:15:19:46:001205 gptext-state:gpdb:gpadmin-[INFO]:-Check GPText cluster statistics...
20170929:15:19:47:001205 gptext-state:gpdb:gpadmin-[INFO]:-Replicas Up: 4
20170929:15:19:47:001205 gptext-state:gpdb:gpadmin-[INFO]:------------------------------------------------
20170929:15:19:47:001205 gptext-state:gpdb:gpadmin-[INFO]:-Index demo.twitter.message statistics.
20170929:15:19:47:001205 gptext-state:gpdb:gpadmin-[INFO]:------------------------------------------------
20170929:15:19:47:001205 gptext-state:gpdb:gpadmin-[INFO]:-replication_factor   max_shards_per_node   num_docs   size in bytes
20170929:15:19:47:001205 gptext-state:gpdb:gpadmin-[INFO]:-2                    4                     1730       687494
20170929:15:19:47:001205 gptext-state:gpdb:gpadmin-[INFO]:-Child partition indexes:
20170929:15:19:47:001205 gptext-state:gpdb:gpadmin-[INFO]:-demo.twitter.message_1_prt_1
20170929:15:19:47:001205 gptext-state:gpdb:gpadmin-[INFO]:-demo.twitter.message_1_prt_2
20170929:15:19:47:001205 gptext-state:gpdb:gpadmin-[INFO]:-demo.twitter.message_1_prt_3
20170929:15:19:47:001205 gptext-state:gpdb:gpadmin-[INFO]:-demo.twitter.message_1_prt_4
20170929:15:19:47:001205 gptext-state:gpdb:gpadmin-[INFO]:-Done.
7. List statistics for all GPText indexes.
20180119:10:26:37:004525gptext-state:gpdb51:gpadmin-[INFO]:-ExecuteGPTextstate...20180119:10:26:37:004525gptext-state:gpdb51:gpadmin-[INFO]:-Checkzookeeperclusterstate...20180119:10:26:38:004525gptext-state:gpdb51:gpadmin-[INFO]:------------------------------------------------20180119:10:26:38:004525gptext-state:gpdb51:gpadmin-[INFO]:-IndexStatistics.20180119:10:26:38:004525gptext-state:gpdb51:gpadmin-[INFO]:------------------------------------------------20180119:10:26:38:004525gptext-state:gpdb51:gpadmin-[INFO]:-indexnamenum_docssizeinbytes20180119:10:26:38:004525gptext-state:gpdb51:gpadmin-[INFO]:-demo.store.products501574620180119:10:26:38:004525gptext-state:gpdb51:gpadmin-[INFO]:-demo.twitter.message1730107821920180119:10:26:38:004525gptext-state:gpdb51:gpadmin-[INFO]:-demo.wikipedia.articles2350051620180119:10:26:38:004525gptext-state:gpdb51:gpadmin-[INFO]:-gptext-docs1657266820180119:10:26:38:004525gptext-state:gpdb51:gpadmin-[INFO]:-Done.
gptext-stop
Stop the GPText cluster nodes.
Syntax
gptext-stop -h
gptext-stop [-v] [-f]
Parameters
Parameter  Description
-h, --help  Displays a usage message and exits.
-v, --verbose  Displays debug output.
-f, --force  Forcefully stops all Solr processes.
Examples
1. Stop the GPText cluster.
gptext-stop
2. Force stop the GPText cluster.
gptext-stop -f
gptext-uninstall
Uninstalls GPText, including data and installed files. Uninstalls ZooKeeper nodes if they were installed with the GPText installer.
Stops any running GPText instances.
Deletes all Solr directories in segment directories.
Deletes the installation directory.
Removes all GPText schemas and indexes from all databases.
Uninstalls ZooKeeper if it was installed with the GPText installer.
Syntax
gptext-uninstall -h|--help
gptext-uninstall [-v|--verbose]
Parameters
Parameter  Description
-h, --help  Displays a usage message and exits.
-v, --verbose  Displays debug output.
Notes
To use gptext-uninstall, you must have superuser permissions on all databases with GPText schemas.
gptext-uninstall runs only if there is at least one database with a GPText schema.
Examples
1. Uninstall GPText.
gptext-uninstall

gptext-upgrade
Upgrades the current GPText system to a new GPText release.
Syntax
gptext-upgrade [-h | --help]
gptext-upgrade [-f <upgrade_file> | --file=<upgrade_file>] [-c | --base_check] [-v | --verbose]
Parameter                                   Description
-h, --help                                  Displays a usage message and exits.
-f <upgrade_file>, --file <upgrade_file>    Provides the path to the upgrade file. The default upgrade file is $GPPERFMONHOME/share/upgrade.yaml.
-c, --base_check                            By default, gptext-upgrade checks that the GPText environment can be upgraded and reports any items that must be corrected before upgrading. When the -c or --base_check option is supplied, the environment check is omitted.
-v, --verbose                               Displays debug output when executing the command.
Notes
The upgrade_file is a YAML-formatted script defining actions to upgrade a GPText system from a previous release to the current release. The file is not intended to be edited by users. If the upgrade_file does not contain support for the previous GPText release, gptext-upgrade outputs an error message and exits.

zkManager
Checks the ZooKeeper cluster state. If ZooKeeper was installed with GPText, zkManager can start or stop the ZooKeeper cluster.
Syntax
zkManager [-h | --help]
zkManager state [-v | --verbose]
zkManager start [-v | --verbose]
zkManager stop [-v | --verbose] [-f | --force]
Parameters
Parameter        Description
-h, --help       Display a usage message and quit.
-f, --force      When used with the stop command, performs a forced stop.
-v, --verbose    Displays debug output when executing the command.
Notes
The zkManager start and zkManager stop commands are only available if the ZooKeeper cluster was installed by the GPText installer.
By default, all gptext-* utilities check the ZooKeeper cluster state. If the cluster is not healthy, the ZooKeeper state information is displayed to warn the user.
The nc (netcat) command must be installed on the master host. Run nc in a terminal to ensure the command is installed.
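The nc requirement can also be checked from a script. The following Python sketch looks for an executable on the current user's PATH. The helper name command_available is hypothetical, and the check does not confirm that nc behaves as the GPText utilities expect.

```python
import shutil


def command_available(name):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    # Run this on the master host as the user that runs the GPText utilities.
    print("nc installed:", command_available("nc"))
```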
Examples
1. Start the ZooKeeper cluster, if ZooKeeper was installed by the GPText binary:
zkManager start
2. Stop the ZooKeeper cluster, if ZooKeeper was installed by the GPText binary:
zkManager stop
3. Force stop the ZooKeeper cluster, if ZooKeeper was installed by the GPText binary:
zkManager stop -f
4. Check the state of the ZooKeeper cluster:
$ zkManager state
20160603:14:17:01:307386 zkManager:gpdb-sandbox:gpadmin-[INFO]:-Execute zookeeper state process.
20160603:14:17:01:307386 zkManager:gpdb-sandbox:gpadmin-[INFO]:-Host port Latency min/avg/max Mode
20160603:14:17:01:307386 zkManager:gpdb-sandbox:gpadmin-[INFO]:-gpdb-sandbox.localdomain 2188 0/0/17 follower
20160603:14:17:01:307386 zkManager:gpdb-sandbox:gpadmin-[INFO]:-gpdb-sandbox.localdomain 2189 0/0/17 leader
20160603:14:17:01:307386 zkManager:gpdb-sandbox:gpadmin-[INFO]:-gpdb-sandbox.localdomain 2190 0/0/70 follower
20160603:14:17:06:307386 zkManager:gpdb-sandbox:gpadmin-[INFO]:-Done.
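The state report lists one row per ZooKeeper node with its host, port, latency (min/avg/max), and mode. The following Python sketch reads those rows and finds the leader. It assumes the message part of each line (the text after the log prefix) has four whitespace-separated fields as in the sample; ZkNode, parse_state_rows, and find_leader are illustrative names, not GPText APIs.

```python
from typing import NamedTuple


class ZkNode(NamedTuple):
    host: str
    port: int
    latency_min: int
    latency_avg: int
    latency_max: int
    mode: str


def parse_state_rows(messages):
    """Keep only messages that look like 'host port min/avg/max mode'."""
    nodes = []
    for message in messages:
        fields = message.split()
        if len(fields) != 4:
            continue  # header, banner, or "Done." line
        host, port, latency, mode = fields
        parts = latency.split("/")
        if not port.isdigit() or len(parts) != 3:
            continue
        if not all(part.isdigit() for part in parts):
            continue
        nodes.append(ZkNode(host, int(port), *map(int, parts), mode))
    return nodes


def find_leader(nodes):
    """Return the first node whose mode is 'leader', or None."""
    for node in nodes:
        if node.mode == "leader":
            return node
    return None


# Message text taken from the sample output above, log prefix removed.
SAMPLE = [
    "Execute zookeeper state process.",
    "Host port Latency min/avg/max Mode",
    "gpdb-sandbox.localdomain 2188 0/0/17 follower",
    "gpdb-sandbox.localdomain 2189 0/0/17 leader",
    "gpdb-sandbox.localdomain 2190 0/0/70 follower",
    "Done.",
]

if __name__ == "__main__":
    print(find_leader(parse_state_rows(SAMPLE)))
```

Rows that do not fit the assumed layout are skipped, so the header and the surrounding status messages do not need special handling.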

GPText and Solr Data Type Mappings
The following table maps Greenplum Database data types to Solr data types.
If a Greenplum Database data type is not listed, it is a text type in Solr.
If a Greenplum Database data type is an array, it is mapped to a multi-value type in Solr. For example, INT[] maps to a multi-value int Solr field.
Greenplum Database Type    Solr Type
bigint long
bit string
bool boolean
bytea binary
char string
date tdate
float4 float
float8 double
int int
int2 int
int4 int
int8 long
interval string
money string
name string
numeric double
point point
smallint int
text text
time string
timestamp tdate
timestamptz tdate
timetz string
uuid uuid
varbit string
varchar text
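The mapping rules above (listed types, the text fallback, and arrays as multi-value fields) can be expressed as a small lookup. The following Python sketch copies the table into a dictionary. The function name solr_type_for is hypothetical, and type modifiers such as varchar(64) are not handled.

```python
# Greenplum Database type -> Solr type, copied from the table above.
GPDB_TO_SOLR = {
    "bigint": "long", "bit": "string", "bool": "boolean",
    "bytea": "binary", "char": "string", "date": "tdate",
    "float4": "float", "float8": "double", "int": "int",
    "int2": "int", "int4": "int", "int8": "long",
    "interval": "string", "money": "string", "name": "string",
    "numeric": "double", "point": "point", "smallint": "int",
    "text": "text", "time": "string", "timestamp": "tdate",
    "timestamptz": "tdate", "timetz": "string", "uuid": "uuid",
    "varbit": "string", "varchar": "text",
}


def solr_type_for(gpdb_type):
    """Return (solr_type, multi_valued) for a Greenplum type name."""
    name = gpdb_type.strip().lower()
    # An array type such as INT[] maps to a multi-value Solr field.
    multi_valued = name.endswith("[]")
    if multi_valued:
        name = name[:-2].strip()
    # Types that are not listed fall back to the Solr text type.
    return GPDB_TO_SOLR.get(name, "text"), multi_valued


if __name__ == "__main__":
    for type_name in ("INT[]", "timestamptz", "xml"):
        print(type_name, solr_type_for(type_name))
```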

GPText Schema Tables
The gptext schema includes tables that GPText uses to manage the GPText cluster and to log GPText activities.

gptext.admin_history
GPText writes a record to the gptext.admin_history table when the following actions occur:
create or drop a GPText index
add or drop a field in a GPText index
back up or restore a GPText index
add or remove a ZooKeeper role on a GPText node
Column    Type                           Description
time      timestamp without time zone    The time the action occurred.
user      character varying(64)          The name of the Greenplum Database role that performed the action.
action    text                           A text message describing the action.

gptext.gptext_envs
The gptext.gptext_envs table is an external table containing rows with values for GPText environment variables. Currently, the only GPText environment variable is $GPTXTHOME, which is the GPText installation directory. The source for the rows in this table is the CSV file $MASTER_DATA_DIRECTORY/gptxtenvs.conf.
Column     Type    Description
envname    text    The name of an environment variable.
value      text    The value of the environment variable.
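Because the table is backed by a CSV file, its contents can also be read outside the database. The following Python sketch parses CSV text into a dictionary. It assumes each row holds exactly the two columns described above, envname then value, with no header row. The function name parse_envs and the sample path are hypothetical.

```python
import csv
import io


def parse_envs(csv_text):
    """Map envname -> value from two-column CSV text (assumed layout)."""
    envs = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        envs[row[0].strip()] = row[1].strip()
    return envs


if __name__ == "__main__":
    # Hypothetical file content; the real installation path will differ.
    sample = "GPTXTHOME,/path/to/gptext\n"
    print(parse_envs(sample))
```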

gptext.error_table
GPText writes a record in the gptext.error_table when a request to add a document to a GPText external index fails. Rows remain in the table until you call gptext.recreate_error_table to drop and recreate the table.
Column        Type                           Description
error_time    timestamp without time zone    The time the error occurred.
index_name    text                           The name of the external index.
sqlcmd        text                           Text of the SQL statement, if any.
errmsg        text                           The message text of the error that occurred.
rawdata       text                           Data associated with the error, for example the document URL.
rawbytes      bytea                          Binary data associated with the error, if any.

gptext.solr_instances
The gptext.solr_instances table is an external table with a row for each Solr instance.
Column        Type       Description
id            integer    Unique id for the Solr instance.
host          text       Name of the host where the instance is running.
port          integer    Port number of the Solr instance.
solrdir       text       Path to the Solr instance’s data directory.
zoocluster    text       A list of ZooKeeper nodes.

gptext.zoo_cluster
The gptext.zoo_cluster table is an external table with one row for each ZooKeeper node.
Column            Type       Description
id                integer    The unique id of the ZooKeeper node.
host              text       Name of the host where the ZooKeeper node is running.
port              integer    Port number of the ZooKeeper instance.
data_directory    text       Path to the ZooKeeper node’s data directory.

GPText Configuration Parameters
GPText configuration parameters can be overridden by setting a new value in a Greenplum Database session. Changes made to configuration parameters only affect future GPText operations; existing indexes use the parameter values that were set when they were created.
See Changing GPText Server Configuration Parameters for information about changing configuration parameters and examples.
The following table lists the GPText configuration parameters with their defaults and value constraints.
Parameter                Minimum  Maximum   Default           Description
admin_timeout            30       INT_MAX   3600              Timeout, in seconds, for admin requests (create_index, etc.).
commit_timeout           30       INT_MAX   3600              Timeout, in seconds, for prepare commit and commit operations.
delete_timeout           30       INT_MAX   3600              Timeout, in seconds, for delete requests.
extension_factor         0        10        2                 Maximum number of replicas that can be added for an index per GPText node after the index is created.
facet_timeout            30       INT_MAX   3600              Timeout, in seconds, for faceting queries.
failover_factor          0.0      1.0       0.8               Minimum ratio of Solr nodes that must be up in order to create a new index (Solr Nodes Up / Total Solr Nodes).
hl_post_tag                                 '</em>'           Markup that gptext.highlight() inserts after terms in search results.
hl_pre_tag                                  '<em>'            Markup that gptext.highlight() inserts before terms in search results.
idx_buffer_size          4096     67108864  134217728         Size of indexing buffer in bytes.
idx_delim                                   ',' (comma)       Delimiter to use during indexing.
idx_encapsulator                            '"' (quote)       The character optionally used to surround values to preserve characters such as the CSV separator or whitespace.
idx_escape                                  '\\' (backslash)  Escape character to use for indexing.
index_timeout            30       INT_MAX   3600              Timeout, in seconds, for receiving response to indexing operation.
optimize_timeout         30       INT_MAX   3600              Timeout, in seconds, for optimize operations.
ping_timeout             30       INT_MAX   120               Timeout, in seconds, for ping requests.
replication_factor       0        10        2                 The number of replicas per shard for a newly created index.
replication_timeout      30       INT_MAX   43200             Timeout, in seconds, for replication operations (backup, restore).
rollback_timeout         30       INT_MAX   3600              Timeout, in seconds, for rollback operations.
search_batch_size        1        INT_MAX   2500000           Batch size for search requests.
search_buffer_size       4096     67108864  16777216          Buffer size for search results, in bytes.
search_param_separator                      '&'               Delimiter to use in the options parameter of the gptext.search() UDF.
search_post_buffer_size  512      4194304   4096              Post buffer size for search, in bytes.
search_timeout           30       INT_MAX   600               Timeout, in seconds, for searches.
stats_timeout            30       INT_MAX   600               Timeout, in seconds, for obtaining statistics.
idx_segment_error_limit           INT_MAX   10                Limit for indexing errors per segment. If this value is exceeded on any segment, the indexing operation is stopped.
terms_batch_size         1        INT_MAX   1000              Batch size for terms operations.
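The value constraints above lend themselves to a pre-check before a new value is set in a session. The following Python sketch covers a few of the parameters. It reads the first two values in each row as a minimum and a maximum, takes INT_MAX to be the 32-bit signed maximum, and treats the failover_factor comparison as inclusive; all three are assumptions. The names check_value and can_create_index are illustrative.

```python
INT_MAX = 2147483647  # assumption: 32-bit signed integer maximum

# (minimum, maximum) for a subset of parameters, read from the table above.
RANGES = {
    "admin_timeout": (30, INT_MAX),
    "extension_factor": (0, 10),
    "failover_factor": (0.0, 1.0),
    "ping_timeout": (30, INT_MAX),
    "replication_factor": (0, 10),
    "search_batch_size": (1, INT_MAX),
    "search_buffer_size": (4096, 67108864),
    "search_post_buffer_size": (512, 4194304),
    "search_timeout": (30, INT_MAX),
}


def check_value(name, value):
    """Return True if value lies within the listed range for name."""
    low, high = RANGES[name]
    return low <= value <= high


def can_create_index(nodes_up, total_nodes, failover_factor=0.8):
    """Compare Solr Nodes Up / Total Solr Nodes with failover_factor."""
    if total_nodes <= 0:
        return False
    return nodes_up / total_nodes >= failover_factor


if __name__ == "__main__":
    print(check_value("search_timeout", 600))
    print(check_value("search_timeout", 10))
    print(can_create_index(7, 10))
```

With the default failover_factor of 0.8, a cluster with 7 of 10 Solr nodes up has a ratio of 0.7 and falls below the threshold.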