Post on 22-Jan-2018
Predictive Analytics
Integrating Permit Information,
Vessel Monitoring, and Fishery
Observer Programs
Carlos Rivero
Southeast Fisheries Science Center
2
Data Systems Overview
PIMS (Permit Information Management System)
• Southeast Regional Office (St. Petersburg, FL)
• PostgreSQL Database Management System
VMS (Vessel Monitoring System)
• Office of Law Enforcement (Silver Spring, MD)
• Oracle Database Management System
Gulf Shrimp Observer Program
• Southeast Fisheries Science Center (Galveston, TX)
• Microsoft Access
GIS (Geospatial Information System)
• Multiple Sources
• Various Storage Formats (Shapefiles, Grids, Excel files, MS Access
databases, Oracle, JPEG images, Binary & ASCII Raster)
3
Permit Information Management
PIMS (Permit Information Management System)
• PostgreSQL Database Management System
• 146 Transactional Tables
• Access via a local replicated database (Disaster Recovery Server)
• MS Access configured to read and extract data (ODBC)
• Migrated data tables to Oracle 11g RDBMS
• Primary Tables Include:
1. TBL_REQMIT (Permits and Requests)
2. TBL_VESSELS (Vessel Characteristics)
3. TBL_FISHERY_TYPE (Fishing Industry)
4. TBL_REQMIT_STATUS (Permit Status)
4
Vessel Monitoring System
VMS
• Oracle RDBMS
• 108 Transactional Tables
• Access via a local replicated database and directly via DBLink
• Receive nightly updates (~80,000 records) with a 4-day lag
• FMC_POS is the primary table of interest, containing:
• ID
• LAT_LON (SDO_GEOMETRY)
• UTC_DATE
• COURSE
• SPEED
• TRACK (SDO_GEOMETRY)
• RADIO (VESSEL IDENTIFIER)
5
Gulf Shrimp and Reef Fish
Observer Program
Observer Data
• Microsoft Access RDBMS
• Data manipulated to create TRIPS and TOWS tables:
• The TRIPS table documents when the trips started and ended. This information is used to extract the locations from the warehouse.
• The TOWS table identifies when trawling is occurring, which is the target variable. This is used to assign this behavior to the locations previously extracted.
TRIPS                     TOWS
Vessel Official Number    Vessel Official Number
Trip Number               Trip Number
Trip Start                Tow Number
Trip End                  Time In
Number of Days            Time Out
Number of Tows/Sets       Location
6
Geospatial Information System
Bathymetry
• High Resolution Coastal
• Low Resolution Global
Distance from Shore
Direction to Shore
Speed
7
Geospatial Data Warehouse
1. Assign a fishery permit to each VMS location (matched on Vessel_ID and Date).
2. Spatially join bathymetry, distance from shore, and direction to shore to each VMS location (Raster Cell Value).
3. Organize facts and dimensions based on the data warehouse design.
4. Populate a materialized view containing the relevant data elements in one master table.
5. Identify which locations pertain to each observer trip. Assign the target variable (FISHING) a value of 1 for each location within the TIME_IN and TIME_OUT window. All others receive 0.
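The labeling rule in step 5 can be sketched in a few lines of Python. This is only an illustration, not the warehouse implementation: the function name `label_fishing`, the dictionary keys, and the sample records are hypothetical, and the sketch assumes the TIME_IN and TIME_OUT boundaries are inclusive, which the slides do not state.

```python
from datetime import datetime

def label_fishing(positions, tows):
    """Return copies of the positions with a 'fishing' flag added.

    A position gets 1 when it belongs to the same vessel as a tow and its
    timestamp falls inside that tow's TIME_IN..TIME_OUT window (boundaries
    assumed inclusive); every other position gets 0.
    """
    labeled = []
    for pos in positions:
        fishing = 0
        for tow in tows:
            same_vessel = tow["vessel"] == pos["vessel"]
            in_window = tow["time_in"] <= pos["utc_date"] <= tow["time_out"]
            if same_vessel and in_window:
                fishing = 1
                break
        labeled.append({**pos, "fishing": fishing})
    return labeled

# Hypothetical sample data: one observed tow and three VMS positions.
tows = [
    {"vessel": "V1",
     "time_in": datetime(2017, 6, 1, 8, 0),
     "time_out": datetime(2017, 6, 1, 12, 0)},
]
positions = [
    {"vessel": "V1", "utc_date": datetime(2017, 6, 1, 7, 0)},   # before the tow
    {"vessel": "V1", "utc_date": datetime(2017, 6, 1, 9, 30)},  # during the tow
    {"vessel": "V2", "utc_date": datetime(2017, 6, 1, 9, 30)},  # different vessel
]

labeled = label_fishing(positions, tows)
print([p["fishing"] for p in labeled])  # prints [0, 1, 0]
```

In a real warehouse this would more likely be a time-range join inside the database; the nested loop here only makes the rule explicit.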
8
Distribution of VMS Locations
9
Bathymetry by Fishery Code
10
Distance From Shore by Fishery Code
11
Vessel Speed (knots) by Fishery Code
12
Suspected Fishing Locations
(Using Speed & Bathymetry as Primary Criteria)
13
Predictive Analytics
1. Upload training data for Shrimp (trawling) and import into SAS Enterprise Miner.
2. Partition the data into training and validation segments based on their original distributions:
BEHAVIOR       VALUE    SHRIMP
FISHING        1        43.69%
NOT FISHING    0        56.31%
3. Develop models, Regression and Decision Tree, to predict fishing behavior. The Auto-Neural Network model was not selected for this project because the resulting variable coefficients must be interpretable.
4. Compare the models to determine which is the most effective at predicting fishing behavior.
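Partitioning "based on their original distributions" is a stratified split: each class is divided separately so both segments keep roughly the 43.69% / 56.31% balance. The sketch below is a minimal Python illustration, not the SAS partition node. The function names, the toy records, and the 50/50 train fraction are assumptions (the event classification counts later in the deck total 3,520 records in each role, which is consistent with an even split).

```python
import random

def stratified_split(records, label_key, train_fraction, seed=0):
    """Split records into train/validation lists, class by class,
    so each list keeps approximately the original class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[label_key], []).append(rec)

    train, valid = [], []
    for group in by_class.values():
        shuffled = group[:]          # copy so the input is not reordered
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        valid.extend(shuffled[cut:])
    return train, valid

def fishing_share(rows):
    """Fraction of rows labeled FISHING = 1."""
    return sum(r["fishing"] for r in rows) / len(rows)

# Toy data with the class balance reported on the slide (43.69% fishing).
records = [{"id": i, "fishing": 1 if i < 4369 else 0} for i in range(10000)]
train, valid = stratified_split(records, "fishing", train_fraction=0.5)

print(len(train), len(valid))
print(round(fishing_share(train), 4), round(fishing_share(valid), 4))
```

Both shares should land within a fraction of a percentage point of 0.4369, which is the point of stratifying rather than splitting at random.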
14
Model Pathway
Additional data were not scored because of the relatively high
misclassification rate of the regression model (0.38551). The
decision tree model had a similar misclassification rate
(0.38636). The model must be refined before it is applied in
an operational context.
15
Trawling Regression Model
1. The regression model established that the following variables were most useful in predicting shrimp trawling behavior:
Parameter        DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq    Standard Estimate
Intercept        1     -2.7513     0.6571            17.53              <0.0001       0.064
ADW              1     -0.5844     0.0574            103.75             <0.0001       0.557
Bathymetry       1     -0.00663    0.00105           40.12              <0.0001       -0.1403
Freezer          1     0.3899      0.0584            44.64              <0.0001       1.477
Fuel Capacity    1     -0.00004    7.32E-6           23.12              <0.0001       -0.1666
Gross Weight     1     -0.00490    0.00236           4.31               0.0378        -0.0630
Longitude        1     -0.0355     0.00643           30.44              <0.0001       -0.1276
RS               1     0.2542      0.0922            7.60               0.0058        1.289
Steel Hull       1     -0.1832     0.0590            9.64               0.0019        0.833
WRK              1     0.7395      0.1003            54.38              <0.0001       2.095
16
Trawling Regression Equation
-2.7513
+ (-0.5844*ADW)
+ (-0.00663*Bathymetry)
+ (0.3899*Freezer)
+ (-0.00004*Fuel Capacity)
+ (-0.0049*Gross Weight)
+ (-0.0355*Longitude)
+ (0.2542*RS)
+ (-0.1832*Steel Hull)
+ (0.7395*WRK)
Variable                    Influence
Intercept                   Negative
ADW Permit                  Negative
Depth (neg. meters)         Positive
Freezer (Y/N)               Positive
Fuel Capacity               Negative
Gross Weight                Negative
Longitude (neg. degrees)    Positive
RS Permit                   Positive
Steel Hull                  Negative
WRK Permit                  Positive
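To show how the equation could be used for scoring, the sketch below evaluates the linear predictor and passes it through the logistic function. Several things here are assumptions rather than facts from the slides: that the model is a logistic regression (suggested by the binary target and the Wald chi-square statistics), that permit and vessel flags are coded 0/1 (SAS may code class inputs differently, which would change the result), and the feature names and example vessel, which are hypothetical.

```python
import math

# Coefficients copied from the Trawling Regression Equation slide.
INTERCEPT = -2.7513
COEFFICIENTS = {
    "adw": -0.5844,            # ADW permit flag (0/1 coding assumed)
    "bathymetry": -0.00663,    # depth in negative meters
    "freezer": 0.3899,         # freezer on board (0/1 coding assumed)
    "fuel_capacity": -0.00004,
    "gross_weight": -0.0049,
    "longitude": -0.0355,      # negative degrees
    "rs": 0.2542,              # RS permit flag (0/1 coding assumed)
    "steel_hull": -0.1832,     # steel hull flag (0/1 coding assumed)
    "wrk": 0.7395,             # WRK permit flag (0/1 coding assumed)
}

def linear_predictor(features):
    """Intercept plus the sum of coefficient * value; missing inputs count as 0."""
    return INTERCEPT + sum(
        coef * features.get(name, 0.0) for name, coef in COEFFICIENTS.items()
    )

def fishing_probability(features):
    """Map the linear predictor to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(features)))

# Hypothetical vessel position: 30 m deep at 90 degrees west, freezer vessel
# holding a WRK permit. The values are illustrative only.
example = {
    "bathymetry": -30.0,
    "longitude": -90.0,
    "freezer": 1,
    "wrk": 1,
    "fuel_capacity": 10000.0,
    "gross_weight": 80.0,
}
print(round(fishing_probability(example), 3))
```

The signs behave as the influence table describes: because depth and longitude are stored as negative numbers, their negative coefficients raise the predicted probability as the water gets deeper or the position moves west.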
17
Trawling Regression Fit Statistics
18
Trawling Regression Iteration Plot
19
Trawling Decision Tree
20
Decision Tree Explained
1. If LATITUDE is >= 35.165, there is a 60.7% chance that the vessel is fishing.
2. If LATITUDE is < 35.165, there is a 40.1% chance that the vessel is fishing.
3. If LATITUDE is < 35.165 and LONGITUDE is < -81.045, there is a 44.7% chance that the vessel is fishing. Furthermore, if the vessel has a KM permit, there is a 67.3% chance that the vessel is fishing, as opposed to a 43.4% chance if the vessel does not have a KM permit.
4. If LATITUDE is < 35.165 and LONGITUDE is >= -81.045, there is a 32.1% chance that the vessel is fishing. If the NET_WEIGHT of the vessel is less than 69.5 tons, there is a 41.5% chance that the vessel is fishing. In addition, if the vessel’s speed is >= 0.105 knots, there is a 47.2% chance that it is fishing. If the speed is < 0.105 knots, the LONGITUDE must be greater than -79.955 degrees for there to be an 83.3% chance that the vessel is fishing.
5. On the other hand, if the NET_WEIGHT of the vessel is >= 69.5 tons, there is a 24.3% chance that the vessel is fishing. In addition, if the HOLD_CAPACITY of the vessel is less than 14,000 pounds, there is a 52.0% chance that the vessel is fishing. Furthermore, if the DISTANCE to the closest shore is < 7,394 meters, there is a 100% chance that the vessel is fishing, as opposed to a 40.0% chance if the distance is greater than or equal to 7,394 meters.
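The rules above read naturally as nested conditions. The sketch below is one possible reading in Python, not an export of the SAS tree: the function and argument names are hypothetical, a longitude exactly on a split value is assumed to fall in the right-hand branch, and two leaves that the slide does not report (slow vessels west of -79.955, and heavy vessels with a hold capacity of 14,000 pounds or more) fall back to their parent node's rate as a stand-in.

```python
def tree_fishing_probability(lat, lon, has_km_permit, net_weight_tons,
                             speed_knots, hold_capacity_lb, distance_m):
    """Return the fishing rate of the tree node a VMS location falls into."""
    if lat >= 35.165:
        return 0.607

    # LATITUDE < 35.165 (node rate 0.401)
    if lon < -81.045:
        # Node rate 0.447, refined by the KM permit
        return 0.673 if has_km_permit else 0.434

    # LONGITUDE >= -81.045 (node rate 0.321)
    if net_weight_tons < 69.5:
        # Node rate 0.415
        if speed_knots >= 0.105:
            return 0.472
        if lon > -79.955:
            return 0.833
        return 0.415  # leaf not reported on the slide: parent rate used (assumption)

    # NET_WEIGHT >= 69.5 tons (node rate 0.243)
    if hold_capacity_lb < 14000:
        # Node rate 0.520, refined by distance to the closest shore
        return 1.0 if distance_m < 7394 else 0.400
    return 0.243  # leaf not reported on the slide: parent rate used (assumption)

# Hypothetical location: small, nearly stationary vessel east of -79.955.
print(tree_fishing_probability(lat=30.0, lon=-79.5, has_km_permit=False,
                               net_weight_tons=40.0, speed_knots=0.0,
                               hold_capacity_lb=5000, distance_m=20000))
```

Writing the tree this way also makes the gaps visible: any location routed to a fallback line is one the slide gives no leaf rate for.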
21
Trawling Decision Tree Fit Statistics
22
Trawling Decision Tree Iteration Plot
23
Trawling Decision Tree
Important Variables
24
Model Comparisons
(Event Classification)
Model            Role          False Negative    True Negative    False Positive    True Positive
Decision Tree    Train         1079              1742             240               459
Decision Tree    Validation    1084              1706             276               454
Regression       Train         1033              1688             294               505
Regression       Validation    1033              1658             324               505
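The event counts are enough to recompute the misclassification rates by hand: the rate is the share of false negatives plus false positives among all records. The short Python sketch below does that arithmetic; the helper names are hypothetical, and sensitivity is added only as an extra view the slides do not report.

```python
# (false negative, true negative, false positive, true positive) per model and role,
# copied from the event classification table.
COUNTS = {
    ("Decision Tree", "Train"): (1079, 1742, 240, 459),
    ("Decision Tree", "Validation"): (1084, 1706, 276, 454),
    ("Regression", "Train"): (1033, 1688, 294, 505),
    ("Regression", "Validation"): (1033, 1658, 324, 505),
}

def misclassification_rate(fn, tn, fp, tp):
    """Share of records whose predicted class differs from the observed class."""
    return (fn + fp) / (fn + tn + fp + tp)

def sensitivity(fn, tn, fp, tp):
    """Share of observed fishing locations that the model labels as fishing."""
    return tp / (tp + fn)

for (model, role), counts in COUNTS.items():
    print(f"{model:13s} {role:10s} "
          f"misclassification={misclassification_rate(*counts):.5f} "
          f"sensitivity={sensitivity(*counts):.4f}")
```

The validation rows reproduce the rates reported in the deck: (1033 + 324) / 3520 = 0.38551 for the regression and (1084 + 276) / 3520 = 0.38636 for the decision tree. Most of the error in both models comes from false negatives, that is, fishing locations predicted as not fishing.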
25
Model Comparisons
(Event Classification)
[Bar chart: counts of True Positive, False Positive, True Negative, and False Negative events for Decision-Train, Decision-Validate, Regression-Train, and Regression-Validate; vertical axis from 0 to 4000]
26
Model Selection
(Validation Misclassification Rate)
Selected Model    Model            Misclassification Rate    Average Squared Error
Y                 Regression       0.38551                   0.22852
N                 Decision Tree    0.38636                   0.22573
27
Next Steps
1. Develop observer data warehouse
2. Link VMS/Permit and Observer data warehouses
3. Use the observer data to determine fishing vs. non-fishing locations for all programs (pelagics, reef fish, shrimp, sharks)
4. Develop, test, and validate program-specific models
5. Incorporate model output into operational scoring routine
6. Use validated models to quantify fishing effort