Azure Cognitive Search

Author: Saptarshi Bhattacharya, DNA Microsoft Practice

An overview

Contents

What is Azure Cognitive Search (ACS)?

What are Capabilities & Advantages of ACS?

Search-as-a-Service

Where to use ACS

Further scenarios where ACS can come handy

ACS Competitors

SLA

Security

ACS Limitations

Typical Workflow

Creating Service and Indexes from Azure Portal (Testing Purpose)

    Understanding Index and Indexer

    Understanding Text Cognitive Skills and Image Skills

References

What is Azure Cognitive Search (ACS)?

Azure Cognitive Search is a cloud search service with built-in AI capabilities that enrich all types of information, so that relevant content can be identified and explored at scale. Its rich feature set, including instant scaling, AI integrations, and complete code flexibility, makes it worth considering beyond its initial developer focus. As a platform, Azure Cognitive Search is an excellent tool in the toolbox for building complete AI-driven search solutions. Depending on the enterprise needs of the business, it may be a better platform than Microsoft 365 or Microsoft Search.

What are Capabilities & Advantages of ACS?

1. An Azure Cognitive Search index can hold billions of items, whereas Office 365 search handles hundreds of millions.

2. Direct control of relevance ranking, query processing, and indexing.

3. Generous limits on document size and number of items indexed; the exact limits depend on the pricing tier (see ACS Limitations).

4. Overcomes limits of SharePoint Online search (such as files over 10 GB or large amounts of data outside of Office 365).

5. Azure Cognitive Search is much faster than SharePoint search, both in indexing and query performance.

6. Sophisticated search capabilities, including real-time indexing, geo-search, pattern search, and search on machine data.

7. Built-in integration with Cognitive Services, e.g. language detection, image tagging, and named entity extraction.

8. Azure Cognitive Search is a fully managed service, so scaling it up or down is far easier than with SharePoint. It is also easier to set up than Elasticsearch or SharePoint.

Search-as-a-Service

Azure Cognitive Search (ACS) uses artificial intelligence (AI) to extract additional metadata from images, blobs, and other unstructured data, and it works well for both structured and unstructured data. In the past, we needed to set up a separate search farm to fulfill the search requirements of a web application. Because ACS is a Microsoft cloud service, we do not need to set up any servers or be search experts.

Where to use ACS

Requirements such as control and openness of the solution, scalability to potentially billions of records, or specific latency and query performance needs all point to Azure Cognitive Search as a key platform to leverage. We see such use cases emerging more and more, and with them a rapid adoption of Azure Cognitive Search as the key enabling engine of enterprise search solutions.

Most businesses hold a large amount of unstructured data: handwritten documents, forms, emails, PowerPoint presentations, and Word documents. For handwritten documents, even if we scan and digitize them, how can we make the content searchable? If we have images, drawings, and picture data, how do we extract the text content and make it searchable? If we have many handwritten documents, we can scan them and upload them to Azure Blob Storage containers in an organized fashion. Azure Cognitive Search can then import the documents from the blob containers and create the search indexes. The diagram below shows the paper document flow.

Further scenarios where ACS can come handy

• The local file share holds many documents and is running out of space. Example: if the organization stores documents on a file server, we can index those documents using ACS and provide a good search experience, so users do not have to rely on Windows Explorer search. We can design a web application UI that searches the ACS indexes.

• The customer already has data in the cloud, for example in Azure Blob Storage, Azure SQL Database, or Azure Cosmos DB. ACS can easily connect to these stores and create indexes on them.

• International companies have documents in many languages. Out of the box, ACS can translate indexed content into many different languages, so we can show search results in a different language as well.

• The client needs to apply AI to business documents.

• Documents are lacking metadata. Example: documents that have only a Title as metadata can be searched only by Title. ACS can extract key phrases from the documents, and we can search on those key phrases as well.

ACS Competitors

The table below is an independent analysis of the biggest competitors. It is not an official Microsoft position (there is no formal battle card), but it can help with future studies and discussions.

Other Microsoft Search Options

Here are other Microsoft Search products or features.

SLA

The Azure Cognitive Search SLA is 99.9% availability for index query requests when a Search Service Instance is configured with two or more replicas, and for index update requests when it is configured with three or more replicas. No SLA is provided for the Free tier. A Search Service Instance is an Azure Cognitive Search service instance containing one or more search indexes. A replica is a copy of a search index within a Search Service Instance. This is a key point to address with clients.

Service level agreements (SLAs) for Azure Cognitive Search are targeted at query operations and at index updates that consist of adding, updating, or deleting documents. The Basic tier tops out at one partition and three replicas. If we want the flexibility to respond immediately to fluctuations in demand for both indexing and query throughput, we should consider one of the Standard tiers.
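The replica thresholds described above can be captured in a small helper. This is a minimal sketch in Python; the function name `sla_coverage` and the tier strings are illustrative assumptions and are not part of any Azure SDK.

```python
def sla_coverage(tier: str, replicas: int) -> dict:
    """Report which request types the 99.9% SLA covers.

    Encodes only the rules stated in the text: no SLA on the Free tier,
    queries need two or more replicas, index updates need three or more.
    """
    if replicas < 1:
        raise ValueError("a search service always has at least one replica")
    if tier.strip().lower() == "free":
        return {"queries": False, "index_updates": False}
    return {"queries": replicas >= 2, "index_updates": replicas >= 3}


if __name__ == "__main__":
    # Compare coverage for one, two, and three replicas on a paid tier.
    for count in (1, 2, 3):
        print(count, sla_coverage("standard", count))
```

A check like this can be useful in a sizing conversation with a client: a two-replica service is covered for queries only, so a workload that also needs guaranteed index updates has to budget for a third replica.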

Security

Azure Cognitive Search has the following security features:

• GDPR, Standard Azure OST (Online Service Terms)

• Security filters for trimming results (documents must include a field specifying which groups have access)

• Role-based access controls (RBAC)

• Filter content based on user identity

• Standards compliance: ISO 27001, SOC 2, HIPAA

• Encrypted transmission and storage

• Multi-tenant scenarios: index per tenant or service per tenant
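The security filter approach in the list above can be illustrated with a filter-string builder. This is a sketch under stated assumptions: the field name `group_ids` and the function name are hypothetical, and the index is assumed to store the allowed groups in a filterable collection-of-strings field. The filter uses the OData `search.in` function with an explicit comma delimiter.

```python
from typing import Sequence


def build_security_filter(user_groups: Sequence[str], field: str = "group_ids") -> str:
    """Build an OData filter that keeps only documents whose `field`
    collection contains at least one of the caller's groups."""
    if not user_groups:
        # Refuse to build a filter that could silently expose everything.
        raise ValueError("caller belongs to no groups")
    for group in user_groups:
        if "," in group:
            raise ValueError(f"group id may not contain a comma: {group!r}")
    # Single quotes are escaped by doubling them in OData string literals.
    value_list = ",".join(g.replace("'", "''") for g in user_groups)
    # The third argument tells search.in to split the list on commas only.
    return f"{field}/any(g: search.in(g, '{value_list}', ','))"


if __name__ == "__main__":
    print(build_security_filter(["hr", "finance"]))
```

The resulting string would be passed as the filter of every query issued on behalf of that user, so trimming happens inside the search service rather than in the application after results are returned.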

ACS Limitations

Some important service limits are listed below. Details can be found at https://docs.microsoft.com/en-us/azure/search/search-limits-quotas-capacity

Typical Workflow

• Provision service: Create or provision an Azure Cognitive Search service from the portal or with PowerShell.

• Create an index: An index is a container for data; think of it as a “table”. It has a schema, CORS options, and search options. Create it in the portal or during app initialization.

• Index data: There are two ways to populate an index with data. The first option is to push the data into the index manually using the Azure Cognitive Search REST API or .NET SDK. The second option is to point an indexer at a supported data source and let Azure Cognitive Search pull in the data automatically on a schedule.

• Search an index: When submitting search requests to Azure Cognitive Search, we can use simple search options, and we can filter, sort, project, and page over results. We can handle spelling mistakes, phonetics, and regular expressions, and there are options for working with search and suggestions. These query parameters give deeper control over the full-text search experience.
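The push and search steps above can be sketched as plain request payloads for the REST API. This is an illustrative sketch, not a full client: the helper names are hypothetical, the `api-version` value is an assumption to replace with the version in use, and no HTTP call is made (a real request also needs an `api-key` header).

```python
from typing import Iterable, Optional, Sequence

API_VERSION = "2020-06-30"  # assumption: substitute the api-version you target


def docs_url(service: str, index: str, operation: str) -> str:
    """URL for a document operation: "index" pushes data, "search" queries it."""
    return (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/docs/{operation}?api-version={API_VERSION}"
    )


def push_batch(documents: Iterable[dict], action: str = "mergeOrUpload") -> dict:
    """Wrap documents in the batch envelope expected when pushing data."""
    return {"value": [{"@search.action": action, **doc} for doc in documents]}


def search_body(
    text: str,
    filter_expr: Optional[str] = None,
    order_by: Sequence[str] = (),
    select: Sequence[str] = (),
    top: int = 10,
    skip: int = 0,
    full_syntax: bool = False,
) -> dict:
    """Build the JSON body for a search request (filter, sort, project, page)."""
    body = {"search": text, "top": top, "skip": skip, "count": True}
    if full_syntax:
        # Full Lucene syntax enables fuzzy ("hotl~") and regex ("/[mh]otel/") terms.
        body["queryType"] = "full"
    if filter_expr:
        body["filter"] = filter_expr
    if order_by:
        body["orderby"] = ",".join(order_by)
    if select:
        body["select"] = ",".join(select)
    return body
```

Posting the result of `push_batch` to `docs_url(service, index, "index")` adds or updates documents, and posting the result of `search_body` to `docs_url(service, index, "search")` runs a query. Paging is done by increasing `skip` in steps of `top`.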

Creating Service and Indexes from Azure Portal (Testing Purpose)

The diagram below shows the simple flow in the Azure portal.

Create ACS

Azure Cognitive Search is a standalone resource used to plug a search experience into custom apps. Azure Cognitive Search integrates easily with other Azure services, with apps on network servers, or with software running on other cloud platforms.

1. Choose ACS and create a resource group in Azure Portal

2. Name the Service: In Instance Details, provide a service name in the URL field. The name is part of the URL endpoint against which API calls are issued: https://your-service-name.search.windows.net.

Service name requirements:

• It must be unique within the search.windows.net namespace

• It must be between 2 and 60 characters in length

• It must use only lowercase letters, digits, or dashes ("-")

• It must not use dashes ("-") in the first 2 characters or as the last single character

• It must not use consecutive dashes ("--") anywhere
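The naming rules above can be checked before a service is provisioned. The following is a minimal sketch; the function names are hypothetical, and uniqueness within the search.windows.net namespace cannot be verified offline, so it is not checked here.

```python
import re
from typing import List


def service_name_problems(name: str) -> List[str]:
    """Return the naming rules that `name` violates (empty list if none)."""
    problems = []
    if not 2 <= len(name) <= 60:
        problems.append("must be between 2 and 60 characters long")
    if not re.fullmatch(r"[a-z0-9-]+", name):
        problems.append("may contain only lowercase letters, digits, or dashes")
    if "-" in name[:2]:
        problems.append("no dash in the first 2 characters")
    if name.endswith("-"):
        problems.append("no dash as the last character")
    if "--" in name:
        problems.append("no consecutive dashes")
    return problems


def endpoint_for(name: str) -> str:
    """Return the URL endpoint for a valid service name."""
    problems = service_name_problems(name)
    if problems:
        raise ValueError(f"invalid service name {name!r}: " + "; ".join(problems))
    return f"https://{name}.search.windows.net"
```

Returning a list of problems rather than a single boolean makes it possible to show the user every violated rule at once.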

3. Choose Location: We can minimize or avoid bandwidth charges by choosing the same location for multiple services. For example, if we are indexing data provided by another Azure service (Azure Storage, Azure Cosmos DB, Azure SQL Database), creating our Azure Cognitive Search service in the same region avoids bandwidth charges (there are no charges for outbound data when services are in the same region).

4. Choose Pricing Tier (SKU): Azure Cognitive Search is currently offered in multiple pricing tiers: Free, Basic, and Standard. Each tier has its own capacity and limits.

*Please refer to https://azure.microsoft.com/en-us/pricing/details/search/ for the latest updates.

5. Scaling: ACS can be scaled along two dimensions:

• Partitions allow the service to store and search through more documents.

• Replicas allow the service to handle a higher load of search queries.
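Capacity is billed in search units, where the number of search units is replicas multiplied by partitions. The sketch below computes that product and rejects combinations a tier does not allow. The Basic limits come from the SLA section of this document; the Free and Standard limits and the overall cap of 36 search units are assumptions that should be verified against the service limits page.

```python
# (max_replicas, max_partitions) per tier. The Basic row comes from this
# document; the Free and Standard rows are assumptions to verify.
TIER_LIMITS = {"free": (1, 1), "basic": (3, 1), "standard": (12, 12)}
MAX_SEARCH_UNITS = 36  # assumed overall cap per service


def search_units(tier: str, replicas: int, partitions: int) -> int:
    """Return replicas x partitions after checking the tier's limits."""
    key = tier.strip().lower()
    if key not in TIER_LIMITS:
        raise ValueError(f"unknown tier: {tier!r}")
    max_replicas, max_partitions = TIER_LIMITS[key]
    if not 1 <= replicas <= max_replicas:
        raise ValueError(f"{tier} allows 1 to {max_replicas} replicas")
    if not 1 <= partitions <= max_partitions:
        raise ValueError(f"{tier} allows 1 to {max_partitions} partitions")
    units = replicas * partitions
    if units > MAX_SEARCH_UNITS:
        raise ValueError(f"{units} search units exceeds the cap of {MAX_SEARCH_UNITS}")
    return units
```

For example, three replicas and two partitions on a Standard tier come to six search units, so adding a replica to handle more queries costs more when the index is already spread over several partitions.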

Using ACS

Once the service is created, the following steps quickly set up search.

• Step 1: Start with documents (unstructured text) such as PDF, HTML, DOCX, emails, and PPTX files. Upload the content to Azure Blob Storage and then, in ACS, import the data from Azure Blob Storage.

• Step 2: Select the option to apply cognitive skills.

• Step 3: Define an index (structure) to store the output (raw content and the name-value pairs generated in Step 2).

• Step 4: Create an indexer. The indexer fills the index fields with data.

• Step 5: Search the index by using Search explorer in the Azure portal. A quick UI can be developed with https://github.com/jj09/AzSearch.js
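The portal wizard in the steps above creates a data source and an indexer behind the scenes. As a hedged sketch of what the equivalent REST definitions could look like, the helper below builds the two request bodies for a blob container. All resource names are hypothetical, the `api-version` is an assumption, and the index (and optional skillset) referenced by name must already exist before the indexer is created.

```python
API_VERSION = "2020-06-30"  # assumption: substitute the api-version you target


def pipeline_requests(service, connection_string, container, index_name,
                      skillset_name=None, interval="PT2H"):
    """Return (method, url, body) tuples for the data source and the indexer."""
    base = f"https://{service}.search.windows.net"
    data_source = {
        "name": f"{index_name}-blob-ds",
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }
    indexer = {
        "name": f"{index_name}-indexer",
        "dataSourceName": data_source["name"],
        "targetIndexName": index_name,
        # ISO 8601 duration: re-run the indexer every two hours by default.
        "schedule": {"interval": interval},
        "parameters": {"configuration": {"dataToExtract": "contentAndMetadata"}},
    }
    if skillset_name:
        indexer["skillsetName"] = skillset_name

    def put(collection, body):
        url = f"{base}/{collection}/{body['name']}?api-version={API_VERSION}"
        return ("PUT", url, body)

    # Order matters: the indexer refers to the data source by name.
    return [put("datasources", data_source), put("indexers", indexer)]
```

Sending these requests (with an admin `api-key` header) after the index has been defined reproduces Steps 1 and 4 without the portal, which is convenient once the setup has to be repeated across environments.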

Understanding Index and Indexer

A search index is like an empty table with fields. To search the data, we first need to decide which fields should be searchable. Once the fields are decided, how do we populate them with data? The search indexer pulls the data from the source and fills the search index, so that we can search on it. It is very quick to define a search index and create an indexer from the Azure portal in ACS. In ACS, a search index is defined as a JSON object.
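To make the "empty table with fields" idea concrete, here is a minimal sketch of an index definition built as a Python dictionary that serializes to the JSON the service expects. The index name and field names are hypothetical. The check that an index has exactly one key field of type `Edm.String` reflects a service rule.

```python
import json


def field(name, edm_type="Edm.String", **attributes):
    """One index field; attributes such as searchable or filterable are optional."""
    return {"name": name, "type": edm_type, **attributes}


def build_index(name, fields):
    """Assemble an index definition after checking the key-field rule."""
    keys = [f for f in fields if f.get("key")]
    if len(keys) != 1:
        raise ValueError("an index needs exactly one key field")
    if keys[0]["type"] != "Edm.String":
        raise ValueError("the key field must be of type Edm.String")
    return {"name": name, "fields": list(fields)}


# Hypothetical schema for documents imported from blob storage.
DOCS_INDEX = build_index("docs-index", [
    field("id", key=True),
    field("content", searchable=True),
    field("language", filterable=True, facetable=True),
    field("keyPhrases", "Collection(Edm.String)", searchable=True, filterable=True),
    field("lastModified", "Edm.DateTimeOffset", filterable=True, sortable=True),
])

if __name__ == "__main__":
    print(json.dumps(DOCS_INDEX, indent=2))
```

Each attribute controls how a field may be used in queries: only fields marked searchable take part in full-text search, while filterable, sortable, and facetable fields support filters, ordering, and navigation counts.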

Understanding Text Cognitive Skills and Image Skills

Out of the box, text cognitive skills in ACS can extract people's names, organization names, location names, and key phrases from the data or documents. Text cognitive skills can also detect the language and translate the results into different languages.
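A skillset chains these text skills together. The sketch below assembles one as a Python dictionary, assuming the built-in skill types for language detection, entity recognition, key phrase extraction, and translation. The skillset name, the target names, and the choice of the V3 entity recognition skill are assumptions for illustration, and a real skillset usually also needs a Cognitive Services resource attached.

```python
def skill(odata_type, inputs, outputs, context="/document", **params):
    """Build one skill definition from input-source and output-target mappings."""
    return {
        "@odata.type": odata_type,
        "context": context,
        "inputs": [{"name": n, "source": s} for n, s in inputs.items()],
        "outputs": [{"name": n, "targetName": t} for n, t in outputs.items()],
        **params,
    }


def text_skillset(name, translate_to="en"):
    """Skillset that detects language, extracts entities and key phrases, and translates."""
    text = {"text": "/document/content"}
    # The detected language feeds the skills that are language-aware.
    with_language = {**text, "languageCode": "/document/language"}
    return {
        "name": name,
        "description": "Text enrichment sketch",
        "skills": [
            skill("#Microsoft.Skills.Text.LanguageDetectionSkill",
                  text, {"languageCode": "language"}),
            skill("#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
                  with_language,
                  {"persons": "people", "organizations": "organizations",
                   "locations": "locations"},
                  categories=["Person", "Organization", "Location"]),
            skill("#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
                  with_language, {"keyPhrases": "keyPhrases"}),
            skill("#Microsoft.Skills.Text.TranslationSkill",
                  text, {"translatedText": "translatedContent"},
                  defaultToLanguageCode=translate_to),
        ],
    }
```

Each output target name (for example `keyPhrases`) becomes a node in the enriched document, which the indexer can then map to an index field of the same name.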

Image skills can generate tags and captions from images and can also identify celebrities.

The JSON search index below is an example of an image cognitive skill.
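As a hedged illustration (a hypothetical sketch, not the author's original JSON), the definitions below show how an image analysis skill, the index fields that receive its output, and the indexer's output field mappings could fit together. The field names and target names are assumptions, the source paths should be verified against the skill reference, and the indexer is assumed to be configured to generate normalized images.

```python
# Image analysis skill: tags, captions (description), and celebrity details.
IMAGE_SKILL = {
    "@odata.type": "#Microsoft.Skills.Vision.ImageAnalysisSkill",
    "context": "/document/normalized_images/*",
    "visualFeatures": ["tags", "description", "categories"],
    "details": ["celebrities"],
    "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
    "outputs": [
        {"name": "tags", "targetName": "imageTags"},
        {"name": "description", "targetName": "imageCaption"},
        {"name": "categories", "targetName": "imageCategories"},
    ],
}

# Index fields that hold the flattened skill output (hypothetical names).
IMAGE_INDEX_FIELDS = [
    {"name": name, "type": "Collection(Edm.String)",
     "searchable": True, "filterable": True, "facetable": True}
    for name in ("imageTags", "imageCaptions", "imageCelebrities")
]

# Indexer mappings from the enriched document tree to the index fields.
_ROOT = "/document/normalized_images/*"
OUTPUT_FIELD_MAPPINGS = [
    {"sourceFieldName": f"{_ROOT}/imageTags/*/name",
     "targetFieldName": "imageTags"},
    {"sourceFieldName": f"{_ROOT}/imageCaption/captions/*/text",
     "targetFieldName": "imageCaptions"},
    {"sourceFieldName": f"{_ROOT}/imageCategories/*/detail/celebrities/*/name",
     "targetFieldName": "imageCelebrities"},
]


def unmapped_targets(mappings, fields):
    """Return mapping targets that have no matching index field."""
    known = {f["name"] for f in fields}
    return [m["targetFieldName"] for m in mappings if m["targetFieldName"] not in known]
```

Running `unmapped_targets(OUTPUT_FIELD_MAPPINGS, IMAGE_INDEX_FIELDS)` before deployment is a cheap way to catch a mapping that points at a field the index does not define.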

References:

1. https://docs.microsoft.com/en-us/azure/search/search-create-service-portal

2. https://dzone.com/articles/cognitive-search-azure-search-with-ai