FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO

Distributed and scalable architecture for SAF-T processing and analysis

Daniel Silva Reis

Mestrado Integrado em Engenharia Informática e Computação

Supervisor: António Miguel Pontes Pimenta Monteiro

July 28, 2018

Abstract

In 2008, Portugal adopted SAF-T (Standard Audit File for Tax). SAF-T is a guideline from the OECD (Organization for Economic Cooperation and Development) for the electronic interchange (essentially an XML file) of reliable accounting, sales and distribution data within organizations or with regulators (Tax Authorities, Chartered Accountants, Central Banks, Statistics Institutes).

Petapilot developed Colbi to give companies an easy way to check their financial “health” and to ensure compliance with the tax authority, internal audit and other regulatory entities using this standard. Colbi is a product built to handle and provide this compliance. The main issue is related to the automatic scalability and provisioning of the platform. The platform becomes slower as the number of companies and the size of SAF-T documents increase. It must be able to respond to all submitted documents (sometimes with billions of transactions) and be always available. The service level is very important, since a delay in submitting a company’s information to the tax authorities can result in fines.

The main goal is to redesign Colbi’s architecture so that it can scale horizontally and vertically. For now, a single machine is responsible for all the processing and control of the files. Therefore, the performance of the platform will always depend on the processing capacity of that machine. The current system architecture cannot scale up or down to adapt the computational resources to the throughput and data input.

As part of the reengineering of the current platform, we will find ways to parallelize the system by distributing tasks, aiming for a microservice-oriented architecture. The platform must be able to scale by automatically allocating computational resources, either increasing or reducing the number of machines needed for the processing, always considering a deploy-on-demand scenario. To evaluate the results, the execution times of the two architectures will be compared directly, both in a real-world scenario and in an extreme scenario of multiple file analysis. In this way, it will be possible to observe the behavior and response of the new architecture.

Resumo

In 2008, Portugal adopted SAF-T (Standard Audit File for Tax). SAF-T is a guideline from the OECD (Organization for Economic Cooperation and Development) for the electronic interchange (essentially an XML file) of reliable accounting, sales and distribution data within organizations or with regulators (Tax Authorities, Accountants, Central Banks, Statistics Institutes).

Petapilot developed Colbi to give companies an easy way to check their financial situation and ensure compliance with the tax authority through the analysis of the SAF-T file. Colbi is a product developed to handle and provide this compliance. The main problem is related to the automatic scalability and provisioning of the platform. The platform is becoming slower due to the increase in the number of companies and in the size of SAF-T documents. It must be able to respond to all submitted documents (often with billions of transactions) and be always available. The service level is extremely important, since a delay in submitting a company’s information to the tax authorities can result in fines.

The main goal of the project is to redesign Colbi’s architecture so that it can scale horizontally and vertically. For now, a single machine is responsible for all the processing and control of the files. Therefore, the performance of the platform will always depend on the processing capacity of a single machine. The current system architecture cannot scale and adapt the computational resources to the throughput and data input.

As part of the reengineering of the current platform, ways were found to parallelize the system by distributing tasks, aiming for a microservice-oriented architecture. The platform was developed to allocate computational resources automatically, either increasing or reducing the number of machines available for processing, always considering a possible deploy-on-demand scenario. To evaluate the results, the execution times of the two architectures will be compared directly, both in a real-world scenario and in an extreme scenario of analysis and submission of multiple files. In this way, it will be possible to observe the behavior and response of the new architecture.

Acknowledgements

First of all, I would like to thank António Pimenta Monteiro for supervising this thesis, with all his valuable input and help when I needed it most. Secondly, I would like to thank the CEO of Petapilot, Valter Pinho, for the challenge, as well as for all the help he gave me in accomplishing it. He always believed in me and in my ability to take this project forward. For all of this, I cannot thank him enough for the commitment and dedication that made this dissertation not only a more reachable goal, but also an objective of his own.

To the entire Petapilot team, I want to acknowledge the valuable help as well. A special mention must be made of Diogo Bastos and Daniel Carvalho, who helped me move this dissertation forward and never let me down when I most needed them.

I am deeply grateful to my grandparents, parents and sister Inês for always supporting me in the most difficult moments. They dedicated their whole lives to prioritizing my education and future over theirs. From the bottom of my heart, thank you for supporting me unconditionally.

A special thanks to my girlfriend Maria Guedes for being a fundamental part of my life and for always motivating me to be a better and happier person.

To all my close friends who over the years have helped me grow as a person and become who I am today: Domingos Alexandrino Fernandes, Guilherme Pinto, Luís Duarte, Diogo Moura, Flávio Couto, Pedro Castro, Miguel Botelho, David Baião, João Silva and Sérgio Domingues.

To all these people, the staff at FEUP and particularly those involved in my master’s degree: my sincerest thanks.

Daniel Silva Reis

“I’d rather attempt to do something great and fail than to attempt to do nothing and succeed.”

Robert H. Schuller

Contents

1 Introduction
    1.1 Context
    1.2 Problem
    1.3 Motivations and Objectives
    1.4 Challenges
    1.5 Structure and Planning

2 Microservice Architecture
    2.1 Enterprise Application Architecture
        2.1.1 Monolithic Architecture
        2.1.2 Service-Oriented Architecture
        2.1.3 Microservices Architecture
    2.2 Monolithic Architecture vs Microservices
    2.3 Service-Oriented Architecture vs Microservices
    2.4 Migration from Monolithic
    2.5 Microservices Showcase
        2.5.1 SoundCloud
        2.5.2 Gilt
    2.6 Technology Overview
        2.6.1 RabbitMQ
        2.6.2 Apache Ignite
        2.6.3 Docker
    2.7 Conclusions

3 Colbi Architecture
    3.1 General Architecture Overview
    3.2 Colbi Core Pipeline
    3.3 Architecture and Pipeline Review
    3.4 Possible Optimizations

4 Scalable Colbi Architecture
    4.1 Microservice Components
    4.2 Architecture Description
    4.3 Fault Tolerance
    4.4 Scalability
    4.5 Optimizations

5 Colbi Exchanger
    5.1 Overview
    5.2 Message Exchange
    5.3 Microservice Implementation
    5.4 Configuration
    5.5 Fault Tolerance

6 Colbi Cache
    6.1 Design
    6.2 Implementation
    6.3 Locking System
    6.4 Fault Tolerance

7 Implementation Tests and Results
    7.1 Evaluation functions
    7.2 Environment Setup
    7.3 Benchmark Definition
        7.3.1 Phase One
        7.3.2 Phase Two
    7.4 Conclusion

8 Conclusions and Future Work
    8.1 Conclusion and Expected Results
    8.2 Future Work

References

List of Figures

2.1 Classic Monolithic Architecture
2.2 SOA Architecture
2.3 Microservices Architecture [Ara]
2.4 Monoliths vs Microservices [Mar]
2.5 Organization in a monolithic application 1 [Mar]
2.6 Organization in a microservices application 2 [Mar]
2.7 Data management comparison [Mar]
2.8 SoundCloud Monolithic Architecture [Phia]
2.9 Component view of SoundCloud’s monolithic architecture [Phia]
2.10 SoundCloud component’s isolation architecture change 1 [Phia]
2.11 SoundCloud component’s isolation architecture change 2 [Phia]
2.12 SoundCloud component’s isolation architecture change 3 [Phia]
2.13 Gilt flash-sales chart
2.14 Gilt Architectural Evolution [Eme]
2.15 RabbitMQ Standard Message Flow
2.16 Apache Ignite Cache

3.1 Colbi Architecture
3.2 Colbi Architecture
3.3 Flow Pipeline execution loop

4.1 Colbi Scalable and Distributed Architecture
4.2 Colbi New Flow Pipeline
4.3 Sequential Rules Execution
4.4 Parallel Rules execution 1
4.5 Parallel Rules execution 2
4.6 Parallel Rules execution 3
4.7 Parallel Rules execution 4

5.1 Colbi Exchange Call types
5.2 Message Class
5.3 Fault Tolerance with Colbi Exchanger

6.1 Colbi Cache

7.1 Total processing time of a single file
7.2 Phase 1, Test 1 - The processing time of each flow on both architectures with a small file
7.3 Phase 1, Test 1 - The processing time of each flow on both architectures with a medium file
7.4 Phase 1, Test 1 - The processing time of each flow on both architectures with a large file
7.5 Phase 1, Test 2 - Average processing time of flows on both architectures with small files
7.6 Phase 1, Test 2 - Average processing time of flows on both architectures with medium files
7.7 Phase 1, Test 2 - Average processing time of flows on both architectures with large files
7.8 Phase 1, Test 3 - Average processing time of flows on both architectures with small files
7.9 Phase 1, Test 3 - Average processing time of flows on both architectures with medium files
7.10 Phase 1, Test 3 - Average processing time of flows on both architectures with large files
7.11 Phase 2, Test 1 - Processing time of flows on both architectures with a small file
7.12 Phase 2, Test 1 - Processing time of flows on both architectures with a medium file
7.13 Phase 2, Test 1 - Processing time of flows on both architectures with a large file
7.14 Phase 2, Test 2 - Processing time of flows on both architectures with small files
7.15 Phase 2, Test 2 - Processing time of flows on both architectures with medium files
7.16 Phase 2, Test 2 - Processing time of flows on both architectures with large files
7.17 Phase 2, Test 3 - Processing time of flows on both architectures with small files
7.18 Phase 2, Test 3 - Processing time of flows on both architectures with medium files
7.19 Phase 2, Test 3 - Processing time of flows on both architectures with large files
7.20 Overall System Performance Gain
7.21 Architectures Global Processing Time - 1 File
7.22 Architectures Global Processing Time - 81 File
7.23 Architectures Global Processing Time - 243 File

List of Tables

2.1 SOA vs MSA
2.2 Comparison between enterprise architectural styles

7.1 File types and sizes
7.2 Number of files per type and size for the load tests
7.3 Phase 1, Test 1 - Small file flow and delay times
7.4 Phase 1, Test 1 - Medium file flow and delay times
7.5 Phase 1, Test 1 - Large file flow and delay times
7.6 Phase 1, Test 2 - Small files average flow and delay times
7.7 Phase 1, Test 2 - Medium files average flow and delay times
7.8 Phase 1, Test 2 - Large files average flow and delay times
7.9 Phase 1, Test 3 - Small files average flow and delay times
7.10 Phase 1, Test 3 - Medium files average flow and delay times
7.11 Phase 1, Test 3 - Large files average flow and delay times
7.12 Phase 2, Test 1 - Small file flow and delay time
7.13 Phase 2, Test 1 - Medium file flow and delay time
7.14 Phase 2, Test 1 - Large file flow and delay time
7.15 Phase 2, Test 2 - Small file average flow and delay times
7.16 Phase 2, Test 2 - Medium file average flow and delay times
7.17 Phase 2, Test 2 - Large file average flow and delay times
7.18 Phase 2, Test 3 - Small file average flow and delay times
7.19 Phase 2, Test 3 - Medium file average flow and delay times
7.20 Phase 2, Test 3 - Large file average flow and delay times

Abbreviations

AMQP      Advanced Message Queuing Protocol
API       Application Programming Interface
AWS       Amazon Web Services
COLBI     Collaborative Business Intelligence
CSV       Comma-separated values
ESB       Enterprise Service Bus
HTTP      Hypertext Transfer Protocol
JSON      JavaScript Object Notation
JVM       Java Virtual Machine
KPI       Key Performance Indicator
LE        Large Enterprises
LOSA      Lots of Small Applications
MSA       Microservices Architecture
NFS       Network File System
OECD      Organization for Economic Cooperation and Development
SaaS      Software as a Service
SAF-T     Standard Audit File for Tax Purposes
SAFT-PT   Standard Audit File for Tax Purposes - Portugal Version
SLA       Service-level Agreement
SMEs      Small to Medium Enterprises
SOA       Service-Oriented Architecture
SQL       Structured Query Language
VP        Vice President
XML       eXtensible Markup Language
YAML      Yet Another Markup Language
WWW       World Wide Web

Chapter 1

Introduction

1.1 Context

The Standard Audit File for Tax Purposes (SAF-T) is an electronic file format for tax purposes. It is an XML document regulated by international standards, defined by the OECD and adopted in several European countries.

It allows the collection of an organization’s periodic fiscal data, such as tax, commercial, financial and accounting data. Every company that carries out commercial transactions is obliged to communicate its monthly invoicing to the Tax Authority [Jas].

SAF-T has been designed to give auditors access to data in an easily readable format for substantive testing of system controls and data, using proprietary audit software, as part of a methodology that provides increased effectiveness and productivity in computer-assisted audit. It is intended to be suitable for use by businesses and their auditors across the scale from SMEs to LEs, with multiple branches and locations, although there may be some differences in its application. With the SAF-T file, it is possible to know whether a business has paid the correct tax at the right time, in accordance with tax legislation. It facilitates the extraction and processing of information and simplifies procedures, avoiding the need for auditors to specialize in the various systems [For10, Jas].

The Portuguese SAFT-PT mainly covers accounting transactions and sales transactions (two file formats, plus a third that integrates the two data sets). Since 2008, the communication of this file to the Tax Authority has been mandatory, because it facilitates the inspections and audits of companies as well as the fight against tax fraud [Jas].
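Since a SAF-T file is an XML document, its sales data can be read with any standard XML parser. The following is a minimal sketch in Python, assuming a heavily simplified fragment without an XML namespace. The element names loosely follow the Portuguese schema and should be treated as assumptions, and the company, invoice numbers and amounts are invented for illustration.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical, heavily simplified SAF-T-like fragment (no namespace, few fields).
SAMPLE = """\
<AuditFile>
  <Header>
    <CompanyName>Example Lda</CompanyName>
    <FiscalYear>2018</FiscalYear>
  </Header>
  <SourceDocuments>
    <SalesInvoices>
      <Invoice>
        <InvoiceNo>FT A/1</InvoiceNo>
        <DocumentTotals><GrossTotal>123.00</GrossTotal></DocumentTotals>
      </Invoice>
      <Invoice>
        <InvoiceNo>FT A/2</InvoiceNo>
        <DocumentTotals><GrossTotal>61.50</GrossTotal></DocumentTotals>
      </Invoice>
    </SalesInvoices>
  </SourceDocuments>
</AuditFile>"""


def summarize_sales(xml_text: str) -> dict:
    """Return the company name, the invoice count and the summed gross total."""
    root = ET.fromstring(xml_text)
    invoices = root.findall("./SourceDocuments/SalesInvoices/Invoice")
    # Decimal avoids binary floating-point rounding on monetary amounts.
    gross_total = sum(
        (Decimal(inv.findtext("DocumentTotals/GrossTotal")) for inv in invoices),
        Decimal("0"),
    )
    return {
        "company": root.findtext("Header/CompanyName"),
        "invoice_count": len(invoices),
        "gross_total": gross_total,
    }


if __name__ == "__main__":
    print(summarize_sales(SAMPLE))
```

Loading the whole document into memory is only reasonable for small files. For very large files, a streaming parser such as `xml.etree.ElementTree.iterparse` would likely be a better fit.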

In Portugal, PetaPilot (a tech startup company) emerged from the vision of creating business technology with a high potential for internationalization. The main goal of this company is the development of technological products and platforms for the analysis of data of high variety and volume, with a focus on Business Intelligence, Big Data, Cloud Computing and fraud detection solutions, operating in the corporate, governmental and institutional markets [Sta].

“Our team has grown motivated by the challenge to provide to the companies, efficient and innovative ways to use their data and information. Today, we deliver products and solutions to the companies, government, regulators and institutions to analyze their data, providing better decisions, infer fraud, diminish risk and ensure compliance. We use Big Data and high-end technologies on our platforms to process data at a massive scale.”

Valter Pinho, CEO

The main product of PetaPilot is Colbi (Collaborative Business Intelligence and Audit). Colbi is an analytical tool for commercial and financial information that serves a number of sectors, ranging from government financial regulation to consultancy companies, industry, distribution and services. Organizations in Portugal and abroad, such as the Portuguese and Lithuanian Tax Authorities, Sheraton, Volkswagen, Omya, NOS, Saint-Gobain, BDO, Delta, Barbosa and Almeida (BA Vidro) and SUCH, among others, use this solution for data analysis, decision support, audit inference and fraud detection [Pet].

Because of its importance in the industry, this platform needs to be efficient, scalable and always available to the companies that use it. Building a platform with these characteristics is difficult and a constant challenge. Distributed architectures such as SOA (Service-Oriented Architecture) and microservices address some of these challenges, but companies need to tailor these solutions to their own services in order to achieve levels of excellence [Ima].

1.2 Problem

Colbi’s current architecture imposes some limitations on the platform’s scalability. The platform was built on a monolithic architecture, in which all the functionality of the application runs in a single process. In the long term, this becomes a problem, because Colbi has to be always available and respond to all user requests in a timely manner. As more and more companies use the platform, the load on the machine grows in both the number and the size of the files. As a result, the machine is getting slower and struggles to run all the processing and analysis required by customers. With such an architecture, we can only scale Colbi horizontally by running several application instances behind a load balancer, but this is not a durable solution for the company’s needs [Mar].

1.3 Motivations and Objectives

Migrating a monolithic architecture to microservices is not an easy task. It requires an overview of how the system operates, as well as of all its components and dependencies. This work aims to implement a very current and highly successful architecture in the industry. Building a fast and efficient system through the distribution and parallelization of tasks is an attractive and constant challenge for the company, since the migration of a monolithic application to a microservice-oriented architecture brings both advantages and disadvantages.

1.4 Challenges

Petapilot is a fast-growing company that needs to adapt rapidly to market needs. The implementation of a new architecture is a goal with many challenges, not only for developers but also for the company. Colbi is a stable product that has gained maturity over time through the implementation of new functionality. The product is deployed at several customers, either in the cloud or on-premises. It is therefore always necessary to ensure that the SLAs that have been defined are never broken. In addition, any development in the product must ensure compatibility with previous versions, and any structural change must guarantee data consistency. Therefore, all changes made to the business rules must be thoroughly analyzed and tested before the product is considered ready to be distributed. Finally, the location and environment where the product is installed are bound by rules and regulations, so the product must be easy to integrate and deploy in the target environment while complying with those rules.

1.5 Structure and Planning

This document begins by presenting a literature review in chapter 2, followed by a more detailed overview of the current platform architecture in chapter 3, where the work unfolds. Chapter 4 specifies the new architecture and explains how it behaves in the system. Chapters 5 and 6 detail the operation and implementation of the two libraries that support the architecture developed in chapter 4. Chapter 7 covers the set of tests performed on the architecture, as well as their results. Finally, chapter 8 presents the conclusions of the work and identifies possible future work.

Chapter 2

Microservice Architecture

Microservices is a buzzword and a fast-moving topic, although neither the idea nor the term itself is new in the industry. What makes it relevant are the different experiences of people all over the world, along with the emergence of new technologies [Sam]. This is having a profound effect on how microservices are used. Businesses are no longer interested in developing large applications to manage their end-to-end business functions, as they were a few years ago. They would rather opt for quick and agile applications, which also cost them less money [Lok]. Implementations of microservices have roots in complex-adaptive theory, service design, technology evolution, domain-driven design, dependency thinking, promise theory, and other backgrounds. They all come together to allow the people of an organization to truly exhibit agile, responsive, learning behaviors to stay competitive in a fast-evolving business world [Pos].

“"Microservices" became the hot term in 2014, attracting lots of attention as a new way to think about structuring applications. I’d come across this style several years earlier, talking with my contacts both in ThoughtWorks and beyond. It’s a style that many good people find is an effective way to work with a significant class of systems. But to gain any benefit from microservice thinking, you have to understand what it is, how to do it, and why you should usually do something else.”

Martin Fowler

2.1 Enterprise Application Architecture

To understand the full benefits of migrating to a microservice-oriented architecture, it is important to understand how enterprise business applications have evolved over time. Although microservices are a good solution and the future of enterprise applications, they are not the only solution. The type and scale of the project must be taken into account to determine which architectural pattern is the most useful for it.

2.1.1 Monolithic Architecture

Monolithic applications are built as a single unit that bundles together all the functionality needed by the application. At the architectural level, it is the simplest form of architecture because it involves fewer actors than other architectural styles [Ale]. Normally, a tiered approach is taken, with a back-end store, middle-tier business logic, and a front-end user interface [Micb].

Figure 2.1: Classic Monolithic Architecture

Figure 2.1 shows a classic example of this type of architectural implementation. This kind of monolithic approach is very common in organizations; some of them enjoy good enough results, whereas others encounter limitations [Ces]. Many things have changed over the last few years. Some developers designed their applications in this model because the tools and infrastructure made SOAs too difficult to build, and they did not see the need until the application grew. More recently, however, developers are building distributed applications that are designed for the cloud and driven by the business [Ant, Ces].

2.1.2 Service-Oriented Architecture

Service-oriented architecture (SOA) is an approach that emerged from the need of modern enterprises to respond effectively and quickly to today’s ever more competitive and global markets. It is an architectural style in which applications consist of autonomous, interoperable, and reusable services, usually implemented as Web services. Software resources are packaged as "services", which are well-defined and self-contained modules that provide standard business functionality and are independent of the state or context of other services. Services can assume different roles depending on the context in which they are used. The two main roles in SOA are the service provider and the service consumer. The service provider defines a service description and publishes it to a client (or service discovery agency), through which the service description, with the service capabilities, is advertised and made discoverable. The service consumer (or service requester) discovers a service (endpoint) and retrieves the service description directly from the service. Services can act in both roles if they are intermediaries that route and process messages, or if they are service compositions that need to call other services to complete some given sub-tasks.
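The provider and consumer roles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ServiceRegistry`, `ServiceDescription` and `vat_in_cents` are hypothetical names, the registry is an in-memory stand-in for a service discovery agency, the endpoint is a plain function rather than a Web service, and the 23% rate is an example value. Here the consumer obtains the description from the registry, which is one of several possible discovery arrangements.

```python
from typing import Callable, Dict, NamedTuple


class ServiceDescription(NamedTuple):
    """What a provider advertises: a name, its capabilities and an endpoint."""
    name: str
    capabilities: str
    endpoint: Callable[[int], int]  # in-process stand-in for a network endpoint


class ServiceRegistry:
    """Hypothetical in-memory stand-in for a service discovery agency."""

    def __init__(self) -> None:
        self._descriptions: Dict[str, ServiceDescription] = {}

    def publish(self, description: ServiceDescription) -> None:
        # Provider side: advertise the service so that consumers can find it.
        self._descriptions[description.name] = description

    def discover(self, name: str) -> ServiceDescription:
        # Consumer side: look up a service by name (KeyError if unknown).
        return self._descriptions[name]


def vat_in_cents(amount_in_cents: int) -> int:
    """Provider logic: VAT at an illustrative 23% rate, using integer cents."""
    return amount_in_cents * 23 // 100


registry = ServiceRegistry()
registry.publish(
    ServiceDescription("vat", "Computes VAT for an amount in cents", vat_in_cents)
)

# The consumer knows only the service name, not the provider's implementation.
description = registry.discover("vat")
vat = description.endpoint(10_000)
```

The consumer depends on the published description rather than on the provider's code, which is the loose coupling that SOA aims for.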

When comparing SOA with monolithic architectures, it is common for units of processing logic to be solution-agnostic. This allows loose coupling and reusability of services. Processing in SOA is also highly distributed, and services can be spread across servers as required. This helps in dealing with different system performance demands. Service communication can be asynchronous, which promotes the stateless and autonomous nature of the services. On the other hand, SOA suffers some performance overhead due to the introduction of layers of data processing. As a consequence of the distributed nature of SOA, application security becomes more complicated than in the monolithic architecture. There are also other pitfalls when adopting SOA, most of them related to a limited understanding of what SOA is and how to use it correctly to accomplish given objectives [PV].

Figure 2.2: SOA Architecture

In figure 2.2, we can observe a well-known implementation of SOA named Enterprise Service Bus (ESB). The ESB acts as the communication center in SOA, providing an integration layer for services. Its major benefit is the reduced dependence between services, since they only communicate with the service bus, which is responsible for forwarding requests to the right destination. This is very useful when a system consists of a large number of services, where managing point-to-point connections would become a nightmare. On the other hand, communication through the ESB may introduce overhead on the service calls, which may eventually lead to a bottleneck [PV, Ima].
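The forwarding role of the bus can be sketched as follows. This is a toy, in-process illustration under stated assumptions, not the API of any real ESB product: the `ServiceBus` class, the `tax` and `billing` services and the 23% rate are hypothetical.

```python
class ServiceBus:
    """Toy in-process bus: services register handlers, callers address them by name."""

    def __init__(self):
        self._handlers = {}
        self.forwarded = 0  # counts calls, since every call passes through the bus

    def register(self, service_name, handler):
        self._handlers[service_name] = handler

    def send(self, service_name, payload):
        if service_name not in self._handlers:
            raise LookupError(f"no service registered as {service_name!r}")
        self.forwarded += 1
        return self._handlers[service_name](payload)


bus = ServiceBus()
bus.register("tax", lambda amount: round(amount * 0.23, 2))
# The billing service depends only on the bus, not on the tax service itself.
bus.register("billing", lambda amount: amount + bus.send("tax", amount))

total = bus.send("billing", 100.0)
print(total, bus.forwarded)
```

The `forwarded` counter makes the trade-off visible: services stay decoupled from each other, but every request, including the nested one, goes through the single bus.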

2.1.3 Microservices Architecture

The microservices architecture style is built around business functionality rather than around technology layers such as UI (User Interface), middleware and database. It is an approach that emphasizes the development of a single application as a suite of small services, each running in its own process and communicating through lightweight mechanisms, often HTTP resource APIs. A microservice is characterized as a self-contained unit that represents one module's end-to-end functionality. As we can observe in figure 2.3, one application can be a collection of one or more modules. Consequently, a set of microservices working together represents an application. It is recommended to have a database for each service (where the business logic is). As services are built around business capabilities, they are independently deployable by fully automated deployment machinery. These services can be written in different programming languages and use different data storage technologies. This is possible because there is a bare minimum of centralized management of these services [Mar, Ara].

Figure 2.3: Microservices Architecture [Ara]

Some of the main challenges of the microservices approach are deciding when it makes sense to use it and how to partition the application into microservices. Developers need to understand the system requirements and whether the system may benefit from service decentralization, because developing elaborate distributed architectures will slow down development and introduce complexity. Developing complex applications is inherently difficult. Opinions differ: some recommend starting with the monolithic approach, while others advise against it. In any case, since the monolithic architecture is the basis for the microservices architecture, it is important to understand how the monolithic architecture works [Chr, Ant].

2.2 Monolithic Architecture vs Microservices

After briefly introducing the monolithic and microservices architectures in sections 2.1.1 and 2.1.3 respectively, it is now important to review the differences between these two architectures in a more thorough and careful manner, highlighting their advantages and disadvantages.

As reviewed in section 2.1.1, a monolithic architecture consists of a single application layer that supports the user interface, business rules and data manipulation [Mica]. In figure 2.1, we saw a classic example of a monolithic application: an enterprise application built in three parts, namely a client-side user interface, a database and a server-side application. The server-side application handles client requests, executes the business logic, retrieves and updates data in the database, and selects and populates the views to be sent to the client. This is a monolithic application, since there is only a single logical executable. Any change to the system requires building and deploying a new version of the server-side application.

The monolithic architecture is the most natural approach for developers to build such a system. It is easy to use language features to divide the application into classes, functions and namespaces, keeping all the logic for handling a request in a single process. Application testing also becomes easier, because developers can test the application on their laptops and use a deployment pipeline to make sure that changes are properly tested and deployed into production. Monolithic applications can be successful, but we can only scale them horizontally by running multiple instances behind a load-balancer. Over time this becomes frustrating, especially as more and more applications are deployed to the cloud, because a change in a small part of the application requires a rebuild and deployment of the whole monolith. It also becomes very hard to keep a good modular structure, which makes it harder to keep changes that should affect only one module within that module. In order to scale, we need to scale the entire application instead of only the parts that need it.



Figure 2.4: Monoliths vs Microservices [Mar]

Because services are independently deployable and scalable, each service has a firm boundary, which allows different services to be written in different programming languages and managed by different teams.

The microservices architecture style is defined by a set of features that differentiate it from the

monolithic architecture and highlight its benefits [Mar]:

• Componentization via Services
• Organized around Business Capabilities
• Products not Projects
• Smart endpoints and dumb pipes
• Decentralized Governance
• Decentralized Data Management
• Infrastructure Automation
• Design for failure

Componentization via Services

Developers normally want to build reusable code. They achieve this by developing common libraries as part of their software, which are then integrated into microservices. The componentization of the application results from breaking it down into smaller services. Unlike libraries, services used as components are independently deployable. With this, we do not need to redeploy the entire application when a change is made.



Organized around Business Capabilities

When a large application is split into parts, the division normally tends to follow the organization's structure, in line with Conway's Law.

“ Any organization that designs a system (defined broadly) will produce a design

whose structure is a copy of the organization’s communication structure.”

Melvyn Conway, 1967

This law can be illustrated with a common organization. Normally, an organization has a set of teams split by technology layer, such as the UI team, the business logic team and the database team. This kind of approach is common when developing a monolithic application.

Figure 2.5: Organization in a monolithic application 1 [Mar]

With a microservices architecture, the organization is different. The large application is divided among teams organized around business capabilities rather than technology layers. Each team is cross-functional, including the full range of skills necessary to implement a complete software solution.



Figure 2.6: Organization in a microservices application 2 [Mar]

Products not Projects

Another common characteristic of microservices is the way applications are developed. With a monolithic application, the usual model is one in which the main goal is to develop a piece of software. Once the software is developed, the product is delivered and the team that developed it is disbanded. In comparison, microservices follow a model in which each team should own its product over its full lifetime.

Smart endpoints and dumb pipes

Applications built from microservices aim to be as decoupled and as cohesive as possible, following the principle of smart endpoints and dumb pipes. In monolithic applications, by contrast, components execute in-process and communicate via method invocation or function calls. Smart endpoints and dumb pipes means that each service owns its domain logic: it receives a request, applies that logic and produces a response. These interactions are choreographed using RESTful APIs.

Decentralized Governance

With a monolithic architecture comes the tendency to focus on just one technology platform, due to centralized governance. Microservices make it possible to choose the best solution for each service and each different problem, thus decentralizing the system's governance.

Decentralized Data Management

Decentralized data management means that the data models differ between systems. In a monolithic architecture it is common to use a single database with many tables. This database persists all the data and sometimes some of the application's logic. Microservices, in contrast, each have their own persistent database, and sometimes a completely different database system.



Figure 2.7: Data management comparison [Mar]

Microservices should manage data with eventual consistency, using transactionless coordination between services, whereas monolithic architectures typically rely on strong consistency enforced by transactions.

Continuous Delivery and Infrastructure Automation

Continuous delivery and infrastructure automation benefit the services by making it easier to deploy an application to production in small increments of change. This lowers the cost of integrating new changes into a production environment compared to iterative methods.

Design for failure

As we have seen, microservices, in contrast to the monolithic architecture, divide the application into services and components. This implies that the software must be able to tolerate the failure of individual services. Because services may fail at any time, it is important to implement mechanisms that detect failures as soon as possible and, if the situation allows it, automatically restore the affected services. There is a constant need for monitoring and logging setups for each individual service. That is why it is important to design the application with stability patterns such as Timeouts, Circuit Breakers and Bulkheads [Mar].
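To make one of these stability patterns concrete, the following is a minimal sketch of a Circuit Breaker. It is an illustration under stated assumptions rather than the implementation of any particular library: the class name, the `max_failures` and `reset_after` parameters and the injected `clock` are hypothetical choices. After a number of consecutive failures the breaker "opens" and rejects calls immediately; once a waiting period has passed, it lets a single trial call through.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the service."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to wait before a trial call
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, failing fast")
            # Half-open: the wait is over, so one trial call is let through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.max_failures:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


# Usage with a fake clock so that the example does not need to sleep.
now = [0.0]
breaker = CircuitBreaker(max_failures=2, reset_after=10.0, clock=lambda: now[0])


def flaky():
    raise ConnectionError("service down")


for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

try:
    breaker.call(flaky)
    state = "closed"
except CircuitOpenError:
    state = "open"
print(state)
```

Failing fast while the circuit is open keeps callers from piling up on a service that is already down, which is the behaviour the pattern is meant to provide.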



2.3 Service-Oriented Architecture vs Microservices

After introducing the service-oriented architecture in section 2.1.2, it is now important to review the differences between this architecture and microservices. Both architectures rely on services as their main component. Thus, services can be developed in various technologies, which brings technology diversity into the development team [Ste].

Developers must deal with the complexity of the architecture and of a distributed system. They must implement the inter-service communication mechanism, either between microservices (for example, when a message queue is used in a microservices architecture) or between the ESB and the services [Ima]. Table 2.1 presents some of the differences between the two architectures.

SOA | MSA
Built on the idea of "share-as-much-as-possible" architecture approach | Built on the idea of "share-as-little-as-possible" architecture approach
More importance on business functionality reuse | More importance on the concept of "bounded context"
Common governance and standards | Relaxed governance, with more focus on people collaboration and freedom of choice
Uses enterprise service bus (ESB) for communication | Uses less elaborate and simple messaging system
Supports multiple message protocols | Uses lightweight protocols such as HTTP/REST & AMQP
Common platform for all services deployed to it | Application Servers not really used. Platforms such as Node.JS could be used
Multi-threaded with more overheads to handle I/O | Single-threaded usually with use of Event Loop features for non-locking I/O handling
Use of containers (Docker, Linux Containers) less popular | Containers work very well in MSA
Maximizes application service reusability | More focused on decoupling
Uses traditional relational databases more often | Uses modern, non-relational databases
A systematic change requires modifying the monolith | A systematic change is to create a new service
DevOps / Continuous Delivery is becoming popular, but not yet mainstream | Strong focus on DevOps / Continuous Delivery

Table 2.1: SOA vs MSA

2.4 Migration from Monolithic

There are some challenges that organizations will face when attempting to implement a microser-

vices architecture at scale.



Before embarking, it is critical that everyone has a common understanding of what a microservices ecosystem is. A microservices ecosystem is a platform of services, each encapsulating a business capability. A business capability represents what a business does in a particular domain to fulfill its objectives and responsibilities.

Each microservice exposes an API that developers can discover and use in a self-service manner. Microservices have independent life cycles: developers can build, test and release each microservice independently.

The microservices ecosystem enforces an organizational structure of autonomous, long-standing teams, each responsible for one or more services. Usually, in this kind of organization, there is more freedom in development. A microservice can be structured and developed using different languages, individual infrastructure and custom scripts. This can become a problem, because the organization may end up with a huge system in which there are a thousand ways to do every single thing. It may end up with hundreds or thousands of services, some of which are running, most of which are maintained, and some of which are forgotten about.

Contrary to general perception and ‘micro’ in microservices, the size of each service matters

least and may vary depending on the operational maturity of the organization [TC ][Mar].

In order to better understand the differences between the architectures, and whether migrating from one to another is justifiable, table 2.2 presents a comparison between them.

Monolith | SOA + ESB | Microservices
Single large application | Several applications sharing services | Small autonomous services
Single deployment unit | Multiple units depending on each other | Independently deployable units
Limited clustering possibilities | Distributed deployment | Distributed deployment
Homogeneous technologies | Heterogeneous technologies | Heterogeneous technologies
Shared data storage | Shared data storage | Independent data storage
Single point of failure | Single point of failure (ESB) | Resilient to failures
In-memory function calls | Remote calls (through ESB) | Lightweight remote calls
Single large team | Multiple teams with shared knowledge | Independent teams owning full lifecycle

Table 2.2: Comparison between enterprise architectural styles



2.5 Microservices Showcase

This section gives an overview of some well-known companies that went through a process of architecture migration. It covers the reasons that led these companies to change their architecture, the obstacles and challenges they faced, and the results of the migration.

2.5.1 SoundCloud

SoundCloud is an online audio distribution platform that enables its users to upload, record, promote, and share their originally created sounds. The platform was a monolithic Ruby on Rails application. Phil Calçado, the author of the account summarized here, worked at SoundCloud during the transition period. According to his testimony, the main reason for the company to carry out this migration was productivity rather than purely technical matters. When he joined the company, he was integrated into the backend team, the so-called App team. This team was responsible for the monolithic Ruby on Rails application and included everything in the Rails app, including the user interface. Another team was responsible for a single-page JavaScript web application. Following the standard practice of the time, it was built as a regular client of the public API, which was implemented in the Rails monolith. The two teams were really isolated, and their only communication was during meetings or through issue trackers and IRC. If any member of the two teams had been asked to describe the development process, the answer would have been something like this:

1. If there is a feature idea, someone writes a couple of paragraphs, draws some mockups and discusses it with the team.

2. Designers shape up the user experience.

3. The code is written.

4. After a small amount of testing, the feature is deployed.

But during this process, there was a lot of frustration in the air. Managers and partners complained that they could never get anything done on time, and engineers and developers complained that they were overworked. During his time at the company, Phil Calçado was able to understand the flow of development and introduce important changes that increased the efficiency of the process and left managers, partners, developers and designers happier. The new process reduced development time and brought everyone involved in developing new features closer together, which reduced development time even further. During this process improvement, questions such as the following were raised:

• Why the need for Pull Requests?
Because years of experience have shown that people often make silly mistakes, push the change live and take the whole platform down for hours.

• Why do people make mistakes so often?
The code base is too complex. It is hard to keep everything in mind.

• Why is the code base so complex?
SoundCloud started as a very simple website. Over time it grew into a large platform with a lot of features, various client applications, very different types of users, sync and async workflows, and huge scale. The code base implements and contains the many components of a now complex system.

• Why the need for a single code base to implement the many components?
The monolith already has a good deployment process and tooling, has an architecture that is battle-tested against peak performance and DDoS, is easy to scale horizontally, etc.

• Why can't we have economies of scale for multiple, smaller systems?
Uhm...

The fifth question took a bit longer to answer. After collecting experiences from peers and running a survey, it was concluded that there were two alternative answers:

(a) Why not economies of scale for multiple, smaller systems?
It is not that it is not possible. The thing is that it will not be as efficient as keeping everything in one code base. Instead, we should build better tooling and testing around the monolith and its developer usability.

(b) Why not economies of scale for multiple, smaller systems?
It is possible, but we will need to do some experimentation to find out what tooling and support we need. Also, depending on how many separate systems are built, we will need to think about economies of scale as well.

Neither of these approaches sounded obviously right or wrong. The biggest question was how much effort each approach would require. Money and resources were not a problem, but they did not have enough people or time to invest in anything big-bang. A strategy that could be implemented incrementally, while delivering value from the very beginning, was mandatory. Teams had always thought that the back-end system was as simple as the one in figure 2.8.

Figure 2.8: SoundCloud Monolithic Architecture [Phia]

With this mindset, it seems obvious to implement this big box as a single monolithic instance. But after a more detailed analysis, it became clear that the system was not as simple as the one pictured above. Opening that black box shows that the system was more like the (very simplified) one in figure 2.9.



Figure 2.9: Component view of SoundCloud’s monolithic architecture [Phia]

The system was not just a simple website; it was a platform with several components. Each of them had its own owners and stakeholders, and an independent life cycle. For example, the subscriptions module was built only once and would only be modified when the payment gateway asked them to change something. Notifications and several other modules related to growth and retention, on the other hand, would change daily because of the increase in users and content. The components also had different service-level expectations. It would not be a problem if notifications stopped working for an hour, but a five-minute outage in the playback module would be enough to hit their metrics hard. While exploring option (a), they came to the conclusion that the only way to make the monolith work would be to make the components explicit, both in the code and in the deployment architecture.

At the code level, they needed to make sure that a change to a single feature could be developed in relative isolation, without requiring them to touch code from other components. They needed to be sure that changes would not introduce bugs or change the runtime behavior of the system. This is an old problem in the industry, and they knew that they had to turn their implicit components into explicit Bounded Contexts 1 and make sure they knew which modules could depend on which others.

Implemented using Rails engines and various other tools, the result would look like figure 2.10.

Figure 2.10: SoundCloud component’s isolation architecture change 1 [Phia]

1 Bounded Context is a central pattern in Domain-Driven Design (DDD). It is the focus of DDD's strategic design section, which is all about dealing with large models and teams. DDD deals with large models by dividing them into different Bounded Contexts and being explicit about their interrelationships.



On the deployment side, they would need to make sure that a feature could be deployed in isolation. Pushing a change to a module to production should not require a new deployment of related modules, and if such a deployment went bad and production broke, the only feature affected would be the new one. To implement this, they thought of continuing to deploy the same artifact to all servers, but using a load-balancer to ensure that each group of servers was responsible for only one feature, isolating any problems with that feature from the other servers (figure 2.11).
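The load-balancing idea described above can be sketched in a few lines. This is only an illustration of the routing principle, not SoundCloud's actual configuration: the path prefixes, the server names and the round-robin policy are assumptions made for the example.

```python
import itertools

# Hypothetical feature pools: every server runs the same monolith artifact,
# but each pool only receives the requests for one feature.
POOLS = {
    "/notifications": ["app-n1", "app-n2"],
    "/playback": ["app-p1", "app-p2", "app-p3"],
}
DEFAULT_POOL = ["app-d1"]

_round_robin = {prefix: itertools.cycle(servers) for prefix, servers in POOLS.items()}
_default = itertools.cycle(DEFAULT_POOL)


def route(path):
    """Pick the next server of the pool that owns the request path."""
    for prefix, servers in _round_robin.items():
        if path == prefix or path.startswith(prefix + "/"):
            return next(servers)
    return next(_default)


print([route("/playback/42") for _ in range(4)])
print(route("/notifications"), route("/about"))
```

With this kind of routing, a faulty release of the notifications code can only take down the servers in the notifications pool, while playback keeps being served by its own pool.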


Making this work would not be easy. Even though the approach above does not require any departure from the existing stack of technologies and tools, these changes would bring their own risks and issues. Even if everything went smoothly, the current code of the monolith would need to be refactored. Their code had suffered a lot during the past years. They still needed to upgrade from Rails 2.x to 3, which by itself is a great effort [Phia]. Those considerations led them to reconsider option (b). The team thought it would not look too different (figure 2.12).

In the end, they were at least able to benefit from the approach from day zero. Any new project they intended to build would become a greenfield project, and the delay introduced by pull requests would no longer be necessary. They gave it a try and eventually built everything required for their first monetisation project as a service, isolated from the monolith. The project introduced several big features and a complete revamp of their subscription model, and it was delivered ahead of its deadlines. The experience was so good that they decided to keep applying this architecture to anything new they built. Their first services were built using Clojure and JRuby, and they eventually moved to Scala and Finagle [Phia, Phib].




2.5.2 Gilt

In 2015, at the Craft Conference, Adrian Trenaman (VP of engineering at Gilt.com) talked about the architectural evolution of Gilt.com from a monolithic architecture to a cloud-based microservice platform using Scala, Docker and AWS. Gilt is an online shopping and lifestyle website based in the United States that has successfully evolved its application architecture. The company specializes in flash sales of luxury brands and lifestyle goods. Due to the nature of flash sales, traffic on the website surges in the fifteen minutes before the sales start and then rapidly drops over the next two hours, before returning to a low baseline. As a result, the risk of the application failing depends largely on the time of day [Dan].

Figure 2.13: Gilt flash-sales chart

“Our customers are like a herd of bison that basically stampede the site every day at

12pm. It’s our own self-imposed denial of service attack, every day...”

Adrian Trenaman, VP of engineering

The Gilt.com website was built in 2007 as a monolithic Ruby on Rails application with a PostgreSQL database.

As traffic increased, a memcached layer was added, and some business capabilities of the website were moved to a series of batch processing jobs in an attempt to improve stability. Over the following four years, with traffic constantly increasing, the monolithic application came under stress, and any crash on the server caused a complete failure of the website and of the supporting business applications.

In 2011, Gilt.com introduced the Java programming language and the JVM (Java Virtual Machine) into its application stack, and services based around business functionality began to be extracted from the original monolithic architecture. During this process, the dependencies on the original single database were not extracted, as there were other parts of the application that could benefit more from greater investment.



Figure 2.14: Gilt Architectural Evolution [Eme]

Adrian Trenaman described the Gilt architecture during 2011 as 'large, loosely-typed JSON/HTTP services', with data exchanged across service boundaries as a coarse-grained key/value map. With the company rapidly evolving and innovating, the development team accidentally created a new Java-based monolith in their "Swift" view service, which quickly became a bottleneck. The result was a codebase in which 'some parts people cared about, and some they did not'. Gilt needed to reorganize its teams around strategic initiatives (the so-called Inverse Conway Maneuver 2) with the main goal of getting code into production quickly. Even without an explicit architect role, a microservice-based architecture called 'Lots of Small Applications (LOSA)' emerged, driven by Gilt's engineering culture and values. Goals and key performance indicators (KPIs) were set for each team working on an initiative, and many other initiatives were started, resulting in the creation of 156 microservices by 2015.

When Scala running on the JVM was introduced into the company's technology stack, the number of microservices grew. At this point, the average service at Gilt consisted of 2000 lines of code and 5 source files, running on three instances in production. Between 2011 and 2015, Gilt decided to 'lift and shift' the legacy application to AWS and began deploying new microservices onto this platform. The vast majority of the services at Gilt were running on AWS EC2 t2.micro instances 3. This kind of instance has relatively little compute power, but does offer 'burstable performance'.

2 Inverse Conway Maneuver: Conway's Law asserts that organizations are constrained to produce application designs which are copies of their communication structures. This often leads to unintended friction points. The 'Inverse Conway Maneuver' recommends evolving your team and organizational structure to promote your desired architecture. Ideally your technology architecture will display isomorphism with your business architecture [Tho].

3 T2 instances are Burstable Performance Instances that provide a baseline level of CPU performance with the ability to burst above the baseline. T2 instances receive CPU Credits continuously at a set rate depending on the instance size, accumulating CPU Credits when they are idle, and consuming CPU credits when they are active. T2 instances are a good choice for a variety of general-purpose workloads including micro-services, low-latency interactive applications, small and medium databases, virtual desktops, development, build and stage environments, code repositories, and

Gilt was very positive about the microservice architecture, as it gave their organization some

of the following benefits [Dan, Eme] :

• Faster delivery of code into production, due to fewer dependencies between teams

• Support for multiple technologies, languages and frameworks

• Graceful degradation of service

• Easy innovation through 'disposable code': it is easy to fail and move on

But there were also a series of challenges with the implementation of this new microservice-

based LOSA architecture:

• Maintaining multiple staging environments across multiple teams and services is hard - Gilt

believe that testing in production is the best solution, for example, using ’dark canaries’

• Defining ownership of services is difficult - At Gilt, teams and departments own and maintain their services

• Deployment should be automated - Using Docker and AWS

• Lightweight APIs must be defined - Gilt have standardised on REST-style APIs, and are

developing ’apidoc’, which they are labelling as ’an AVRO for REST’

• Staying compliant while giving engineers full autonomy in production is challenging -

Gilt have developed ’really smart alerting’ within their ’continuous audit vault enterprise

(CAVE)’ application

• Managing the I/O explosion requires effort - some inter-service calls may be redundant,

and this is still a concern for the Gilt technical team. For example, loops are not currently

automatically detected.

• Reporting over multiple service databases is difficult - Gilt are working on using real-time

event queues to feed events into a data lake. This is currently implemented using Amazon’s

Kinesis and S3 services.

With a large monolithic service, we need to scale everything together. As we have seen, Gilt.com was unable to deal with the load being placed on it. By splitting out the core parts of its system, Gilt was able to deal with the massive traffic oscillations, and today it has more than 450 microservices, each of them running on separate machines. When this is combined with on-demand provisioning systems like those provided by AWS, it is even possible to scale on demand only the pieces of the system that need it. Following this kind of approach, it is also possible to control system costs more effectively. It is not often that an architectural approach can be so closely correlated with an almost immediate cost saving [Sam].

Docker can help in other ways, for example by moving application components into containers, which can help in the implementation of microservices. In addition, this technology helps to distribute and ship containers, either in a local environment or in the cloud.

product prototypes. The t2.micro is the lowest-cost general purpose instance type, and Free Tier eligible.



2.6 Technology Overview

2.6.1 RabbitMQ

RabbitMQ is an open source message broker that implements the AMQP protocol, an application layer protocol specification for asynchronous messaging [Joe09]. In RabbitMQ there are two kinds of applications interacting with the messaging system: producers and consumers. Producers are those that send (publish) messages to a broker, and consumers are those that receive messages from the broker. Usually, these programs run on different machines, and RabbitMQ acts as communication middleware between them [Bae17]. Producers and consumers communicate through an exchange, which can be the default exchange or one defined in the settings. Exchanges work as a channel through which multiple message queues are bridged between producers and consumers.
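The producer, broker and consumer roles can be illustrated with a small in-memory model. This sketch does not use a RabbitMQ client library and does not talk to a real broker; the `Broker` class and the `saft.files` queue name are hypothetical. It only mimics the behaviour of RabbitMQ's default exchange, which delivers a message to the queue whose name equals the message's routing key.

```python
from collections import deque


class Broker:
    """In-memory stand-in for a broker that has only the default exchange."""

    def __init__(self):
        self._queues = {}

    def declare_queue(self, name):
        self._queues.setdefault(name, deque())

    def publish(self, routing_key, body):
        # Default exchange: deliver to the queue named like the routing key.
        queue = self._queues.get(routing_key)
        if queue is None:
            return False  # no matching queue, so the message is dropped
        queue.append(body)
        return True

    def consume(self, queue_name):
        # Hand over the oldest message, or None when the queue is empty.
        queue = self._queues[queue_name]
        return queue.popleft() if queue else None


broker = Broker()
broker.declare_queue("saft.files")

# Producer side: publish two messages.
broker.publish("saft.files", "file-001.xml")
broker.publish("saft.files", "file-002.xml")

# Consumer side: receive them in FIFO order.
received = [broker.consume("saft.files"), broker.consume("saft.files")]
print(received)
```

Because the producer only knows the broker and a routing key, the consumer can run on a different machine or be replaced without any change on the producer side.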

Messages are not published directly to a queue; instead, the producer sends messages to an exchange. Exchanges are message routing agents, defined per virtual host within RabbitMQ. An exchange accepts messages from the producer application and routes them to message queues with the help of header attributes, bindings, and routing keys. A binding is a "link" that is set up to bind a queue to an exchange. The routing key is a message attribute that the exchange may look at when deciding how to route the message to queues (depending on the exchange type). Exchanges, connections, and queues can be configured upon creation with parameters such as durable, temporary, and auto-delete. Durable exchanges survive server restarts and last until they are explicitly deleted. Temporary exchanges exist until RabbitMQ is shut down. Auto-deleted exchanges are removed once the last bound object is unbound from the exchange. In RabbitMQ, there are four different types of exchange, which route messages differently using different parameters and binding setups [Lov15].

1. Direct Exchange: A direct exchange delivers messages to queues based on a message

routing key. The routing key is a message attribute added into the message header by the

producer. The routing key can be seen as an "address" that the exchange is using to decide

how to route the message. A message goes to the queue(s) whose binding key exactly

matches the routing key of the message.

2. Topic Exchange: Topic exchanges route messages to queues based on wildcard matches

between the routing key and something called the routing pattern specified by the queue

binding. Messages are routed to one or many queues based on a matching between a mes-

sage routing key and this pattern.

3. Fanout Exchange: The fanout copies and routes a received message to all queues that are

bound to it regardless of routing keys or pattern matching as with direct and topic exchanges.

Keys provided will simply be ignored.



4. Headers Exchange: Headers exchanges route based on arguments containing headers and optional values. They are very similar to topic exchanges, but route based on header values instead of routing keys. A message is considered matching if the value of the header equals the value specified upon binding.

Clients can create their own exchanges or use the predefined default exchanges, the exchanges

created when the server starts for the first time.
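The routing decisions of the direct, topic and fanout exchange types described above can be sketched in a few lines of Java. This is only an illustration of the matching rules, not RabbitMQ's implementation: the class and method names are hypothetical, each queue is assumed to have a single binding key, and headers exchanges are left out. In AMQP topic patterns, keys are dot-separated words, `*` stands for exactly one word and `#` for zero or more words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of exchange routing rules; not the RabbitMQ client API.
final class ExchangeRoutingSketch {

    // Direct exchange: the binding key must equal the routing key exactly.
    static boolean directMatches(String bindingKey, String routingKey) {
        return bindingKey.equals(routingKey);
    }

    // Topic exchange: '*' matches exactly one word, '#' matches zero or more words.
    static boolean topicMatches(String pattern, String routingKey) {
        return match(pattern.split("\\."), 0, routingKey.split("\\."), 0);
    }

    private static boolean match(String[] p, int i, String[] k, int j) {
        if (i == p.length) return j == k.length;
        if (p[i].equals("#")) {
            // '#' may absorb zero, one, or many words of the routing key.
            for (int skip = j; skip <= k.length; skip++) {
                if (match(p, i + 1, k, skip)) return true;
            }
            return false;
        }
        if (j == k.length) return false;
        if (p[i].equals("*") || p[i].equals(k[j])) return match(p, i + 1, k, j + 1);
        return false;
    }

    // Returns the queues that would receive a message.
    // bindings maps queue name -> binding key (one binding per queue, an assumption).
    static List<String> route(String type, Map<String, String> bindings, String routingKey) {
        List<String> targets = new ArrayList<>();
        for (Map.Entry<String, String> binding : bindings.entrySet()) {
            boolean hit;
            if (type.equals("fanout")) {
                hit = true; // fanout ignores keys entirely
            } else if (type.equals("topic")) {
                hit = topicMatches(binding.getValue(), routingKey);
            } else {
                hit = directMatches(binding.getValue(), routingKey);
            }
            if (hit) targets.add(binding.getKey());
        }
        return targets;
    }
}
```

For example, a queue bound with the pattern `saft.*.new` would receive a message published with the key `saft.pt.new`, but not one published with `saft.pt.old.new`.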

Figure 2.15: RabbitMQ Standard Message Flow

The standard RabbitMQ message flow represented in the figure 2.15 works as follows:

1. The producer publishes a message to the exchange.

2. The exchange receives the message and is now responsible for the routing of the message.

3. A binding has to be set up between the queue and the exchange. In this case, there are bindings from the exchange to two different queues. The exchange routes the message into the queues.

4. The messages stay in the queue until they are handled by a consumer.

5. The consumer handles the message.



2.6.2 Apache Ignite

Apache Ignite is an in-memory computing platform that is durable, strongly consistent, and highly available, with powerful SQL, key-value, and processing APIs. Data in Ignite is either partitioned or replicated across a cluster of multiple nodes (figure 2.16). The option to asynchronously propagate data to the persistence layer is an added advantage. Additionally, the ability to integrate with a variety of databases makes Ignite an easy choice for developers who need database caching. This provides scalability and adds resiliency to the system. Ignite automatically controls how data is partitioned; however, users can plug in their own distribution (affinity) functions and collocate various pieces of data together for efficiency. Ignite provides a feature-rich key-value API that is JCache (JSR-107) compliant and supports Java, C++, and .NET [Dmi17, Pra].

It is frequently integrated into third-party software or SaaS solutions whose business models require the highest levels of performance and scalability, so that they can deliver an optimal user experience or meet SLAs for web-scale applications or data-intensive Internet of Things applications [Ale17].

The Apache Ignite native persistence uses a new durable memory architecture that allows storing and processing the most frequently accessed data and indexes both in memory and on disk. As mentioned earlier, it evenly distributes the data across a cluster of computers in either a partitioned or a replicated manner [Sha17].

Figure 2.16: Apache Ignite Cache



2.6.3 Docker

Docker is a well-known and trendy technology in the industry, closely related to microservices. It is an open source platform for developing, shipping and running applications. Docker enables the separation of applications from the infrastructure in order to deliver software quickly. With Docker, it is possible to manage the infrastructure in the same way we manage the applications. Using Docker’s methodologies for shipping, testing and deploying code quickly, it is possible to significantly reduce the delay between writing code and running it in production.

With Docker, it is possible to run an application in an isolated environment called a container. This isolation and security allow us to run several containers simultaneously on a given host. Containers are lightweight because they do not need the extra load of a hypervisor, but run directly within the host machine’s kernel. This allows us to run more containers on a given hardware combination than if we were using virtual machines. It is even possible to run Docker containers within host machines that are actually virtual machines [Doc].

2.7 Conclusions

The literature review is a key part of the project. It provided not only a broad view of the project, but also an understanding of how robust solutions have been developed for similar problems. This study also helped to understand the current state of microservice-based architectures and how companies are responding to them. More and more companies make use of new technologies capable of promoting the stability and performance of their products. Companies such as SoundCloud and Gilt have had to migrate their architectures to respond to market needs. The comparison between the different existing architectures was quite important to understand their strengths as well as their most common advantages and uses. The overview of Docker was also important, since Docker and microservices are often tightly connected because of Docker’s properties such as isolation, lightweight communication, and easy deployment capabilities.


Chapter 3

Colbi Architecture

Although Colbi’s architecture is quite simple, the transformation processes and flows that a SAF-T file goes through have some complexity. This complexity is mainly due to business rules inherent to the product. Therefore, it is necessary to give an overview of all these flows in order to understand how some of them could be parallelized and distributed in a new architecture.

3.1 General Architecture Overview

Figure 3.1: Colbi Architecture

Figure 3.1 represents the architecture of the product that is currently in production. The Ruby on Rails server handles all the interaction between the client and the product. Whenever a new file is submitted, the server is responsible for placing it in Colbi’s file system and registering it in the databases. Once the upload is complete, the file is ready to be consumed by the Colbi Core application. Colbi Core performs the entire flow of XML structural validation and transformation of the file into a more intelligible format. It then analyses the data by applying fiscal, accounting and auditing rules. At every step, the data is stored in both the Colbi databases and the file system. These flows are discussed in more detail in section 3.2.

The Colbi file system is an NFS mount that is accessible by both the Ruby server and Colbi Core. It consists of three folders: the Inbox stores the files that are ready to be consumed by Colbi Core, the Outbox stores intermediate files in CSV format, and the Processed Data folder stores the original files submitted by the clients.

3.2 Colbi Core Pipeline

Figure 3.2: Colbi Architecture

During its analysis and processing, Colbi Core follows the flow pipeline of figure 3.2. From the submission of a SAF-T file to the final product, which corresponds to information with value, the file goes through eight different flows:

• START: In this flow, the Inbox folder is actively polled for SAF-T files to process. When files are found, they are parsed and transformed into CSV documents, which are stored in the Outbox folder. After the XML file has been transformed into CSV, the original file is stored in the Processed Data folder.

• SETUP: Injection of the previously generated CSV documents into the database.

• KPI FILE: A set of rules is applied to the previously injected CSV data. The main purpose of these rules is to measure and evaluate how effectively a company is achieving key business objectives.

• RULES FILE: Very similar to the KPI FILE flow, the biggest difference being the type of rules it applies. In this flow, fiscal, accounting, auditing and business-specific rules are applied in order to find fraud or inconsistent data.

• MERGE: Previously created and stored data is combined into a global repository. The repository holds all the client information and previously generated data from older SAF-T submissions.

• KPI REPO: Global client KPIs are calculated using all the information present in the repository. This gives the client a global overview of the business objectives.

• RULES REPO: Rules identical to those of the RULES FILE flow are applied, but this time on the global repository.

• FINAL: File processing finishes and all the client views are updated.

3.3 Architecture and Pipeline Review

From section 3.1, it is possible to observe that, with the current architecture, the platform tends to become slower due to its monolithic base. As the number of companies and SAF-T file submissions tends to increase, the single machine responsible for this processing cannot be scaled horizontally to withstand the demand. This problem is fundamentally due to how the pipeline of section 3.2 is triggered and to the underuse of machine resources. It is of considerable relevance because, if the platform is down, companies can be forced to delay submitting their files to tax authorities, or to delay an urgent decision or internal analysis, which may eventually lead to profit loss. Therefore, the performance of the platform cannot depend on the processing capacity of a single machine. When analyzing how the flows are triggered, it was observed that a master thread is responsible for starting each of the flows, as can be seen in figure 3.3.

Figure 3.3: Flow Pipeline execution loop


The ’for’ cycle is the basis of the entire product and runs uninterruptedly over the lifetime of the application. At each iteration of the cycle, the current flow attempts to gather data to process through calls to the database. If there is no data to process, the cycle proceeds to the next flow. If there is data to process, the flow begins processing it through the set of actions and tasks associated with it. These tasks and actions are defined in YAML files, as shown in listing 3.1.

The execution of the actions is purely sequential, and when it ends, the cycle immediately moves on to its next iteration. Tasks are dispatched to another thread that executes them asynchronously from the rest of the flows. In spite of this asynchronism, a file only moves on to the next flow once the tasks and actions associated with its current flow have finished. As a result, during the execution of the various flows, other flows that are ready to be executed are delayed because the cycle is trapped in the actions of another flow.

    name: import_workflow
    tasks:
      - name: parse
        task: Parse
        events:
          success:
            - name: merge
              task: Merge
              parameters:
                adapter: oracle
              events:
                success:
                  - name: calculate_kpi
                    task: KPI
                  - name: calculate_trial_balance
                    task: TrialBalance

Listing 3.1: Set of tasks of a flow defined in a Yaml file

Regarding tasks, there are times when they can be performed concurrently. Whenever two or

more tasks of an associated flow are at the same level, they are performed concurrently in order

to optimize the flow run time. Given the way in which actions and tasks are executed, it is not

possible to perform great optimizations because of business logic inherent to the product.
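The ordering implied by listing 3.1 (parse, then merge, then two tasks at the same level running concurrently) can be sketched with Java's `CompletableFuture`. This is an illustrative sketch only, not Colbi's task engine: the class name is hypothetical and each task is reduced to a log entry.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: sequential tasks followed by two same-level tasks in parallel.
final class TaskLevelsSketch {

    static List<String> runWorkflow() {
        List<String> log = new CopyOnWriteArrayList<>();

        // "merge" is the success event of "parse", so it runs strictly after it.
        CompletableFuture<Void> merge = CompletableFuture
                .runAsync(() -> log.add("parse"))
                .thenRun(() -> log.add("merge"));

        // Both tasks hang off the success of "merge" and may run concurrently.
        CompletableFuture<Void> kpi =
                merge.thenRunAsync(() -> log.add("calculate_kpi"));
        CompletableFuture<Void> trialBalance =
                merge.thenRunAsync(() -> log.add("calculate_trial_balance"));

        // The flow is only finished when every task at the last level has ended.
        CompletableFuture.allOf(kpi, trialBalance).join();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runWorkflow());
    }
}
```

The first two entries of the returned log are always parse and merge; the relative order of the last two depends on scheduling.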

3.4 Possible Optimizations

After analyzing the entire architecture and flow pipeline, it was concluded that, although most flows, tasks, and actions are fairly optimized, some flows can run in parallel to improve product performance. The execution of Rules and KPI, either at the file level or in the repository, does not present any kind of dependency, so they can be executed in parallel. The SETUP flow can be optimized as well, since CSV loading can be performed in parallel. A more detailed explanation of how to optimize these flows is presented in section 4.5 of chapter 4.

At the trigger level, the way the flows are triggered by the ’for’ cycle is not the most efficient for this type of application. This type of trigger causes delays not only in the processing time of a file, but also, at a global level, in files that are already running in other flows.


Chapter 4

Scalable Colbi Architecture

In order to solve Colbi’s problems of scalability and fault tolerance, a new architecture based on microservices was envisioned, designed and implemented. Each microservice has a specific responsibility and communicates with the others through messages. Since all business logic is distributed over a set of flows, each responsible for a specific part of the processing, each flow was defined as a microservice. Figure 4.1 represents the new architecture and the various components that constitute it.

The biggest difference between this architecture and the original one is at the level of the units capable of performing processing. It is possible to have several machines performing the same type of work, either in a monolithic structure where the flows execute sequentially or in a distributed environment where flows execute in parallel.

4.1 Microservice Components

As mentioned earlier, each microservice is an isolated processing unit specialized in performing a

particular behavior. In this new implementation, each microservice consists of three main compo-

nents:

• Colbi Exchanger: component developed for the project that will be discussed in greater detail in chapter 5. Its main purpose is to enable communication between microservices through the exchange of messages.

• Colbi Cache: component developed for the project and responsible for storing the overall processing states of a file. This component is discussed in more detail in chapter 6.

• Logic Implementation: component derived from the originally implemented product that contains a set of functions and classes. These classes encapsulate all the business logic of the flow/behavior of each microservice.



4.2 Architecture Description

As in the originally developed product, the client submits the SAF-T file through the online platform. As stated in chapter 3, it is an application developed in Ruby on Rails that holds the frontend interface and uploads new files to a shared storage system.

Once the file is uploaded, a message with the submission metadata is sent to a RabbitMQ cluster, which redirects it to a specific microservice. As there are several microservices deployed in the system, a message is redirected only to microservices that are configured to receive such messages. While processing the message, the microservice performs a set of operations according to the business rules. Read and write operations are carried out on the databases, and the file is transformed into other formats.

When the microservice finishes processing the message, it performs two operations: it saves the processing state of the file in Colbi Cache, and it sends a message back to the RabbitMQ cluster. Since there can be multiple microservices responsible for performing the same type of flow, the RabbitMQ cluster performs a round-robin distribution in order to balance the load between them. This entire process is repeated throughout the file’s processing flow until its processing is completed.
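The round-robin distribution mentioned above is performed by the broker itself, so no application code is needed for it. Purely to illustrate the idea, the following sketch shows how messages could be spread evenly over the microservices consuming the same queue; the class name and the consumer labels are hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of round-robin dispatch among consumers of one queue.
final class RoundRobinSketch {
    private final List<String> consumers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSketch(List<String> consumers) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("at least one consumer is required");
        }
        this.consumers = List.copyOf(consumers);
    }

    // Returns the consumer that should receive the next message.
    String dispatch() {
        // floorMod keeps the index valid even if the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), consumers.size());
        return consumers.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSketch queue = new RoundRobinSketch(List.of("merge-1", "merge-2"));
        for (int i = 0; i < 4; i++) {
            System.out.println("message " + i + " -> " + queue.dispatch());
        }
    }
}
```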

Figure 4.1: Colbi Scalable and Distributed Architecture



4.3 Fault Tolerance

In case of unexpected failures, the new architecture is prepared to resist and not lose vital processing information.

Since communication between microservices is done through messages, a message always remains in the system and is only consumed at the end of the microservice’s processing. Because there can be several components of the same type in the system, when one of them fails, the RabbitMQ cluster detects this failure (through a heartbeat protocol) and forwards the message to another microservice capable of performing the processing.

The overall processing state of a file is also fault tolerant. Because Colbi Cache behaves as a shared cache, the processing state of a file is always shared among all microservices in the system, so the state of the file is never lost.

If there is a message in the system that cannot be processed, the microservice rejects the message and returns it to the cluster. If there is another microservice in the cluster capable of processing the message, it tries to consume it. After a certain number of attempts, defined in the microservice settings, the message is deleted from the system and the client is informed that the submitted file has failed to process.
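The retry policy described in the last paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Colbi code: the class, the `Handler` interface and the `Outcome` values are hypothetical, and the attempt counter is assumed to travel with the message.

```java
// Hypothetical sketch of "retry up to N times, then discard and report".
final class RedeliverySketch {

    enum Outcome { ACKED, REQUEUED, DISCARDED }

    static final class Message {
        final String body;
        int attempts; // number of times processing has been tried so far

        Message(String body) {
            this.body = body;
        }
    }

    interface Handler {
        void handle(Message message) throws Exception;
    }

    // Tries to process the message once and decides what happens to it next.
    static Outcome process(Message message, Handler handler, int maxTries) {
        message.attempts++;
        try {
            handler.handle(message);
            return Outcome.ACKED; // processed: the message can leave the system
        } catch (Exception failure) {
            // Below the limit the message goes back to the cluster;
            // at the limit it is dropped and the client would be notified.
            return message.attempts >= maxTries ? Outcome.DISCARDED : Outcome.REQUEUED;
        }
    }
}
```

With a limit of 3, a message whose handler always fails is requeued twice and discarded on the third attempt.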

4.4 Scalability

The original architecture could only be scaled vertically. In this new implementation, scaling can be not only vertical but also horizontal.

Since all the flows are now encapsulated in microservices, it is possible to instantiate several microservices of the same type at any moment, considering a performance ratio between the number of web-services and the machines’ resources.

The implementation of each microservice is dynamic and configurable. A microservice can run only one flow, several flows of the same type, or several different flows. It is highly configurable, so it can be adjusted point by point according to each installation’s needs.

At the development level, whenever it is necessary to change a microservice, either by replacing it with another one or by changing its flow, this can be done without great effort, because each microservice is implemented against an interface defined in the Colbi Exchanger. Chapter 5 covers this topic in more detail. Changing a microservice does have some impact on the system, simply because the microservice needs to be redeployed and restarted, which may cause a small delay in the overall processing of the files.

4.5 Optimizations

During the implementation of the new architecture, in the process of code isolation, it was found that some flows could be optimized. These optimizations are based on the introduction of parallelism in the system in order to take advantage of the available computational resources. The areas where it was possible to introduce optimizations were the pipeline of flows, the Setup flow and the Rules flow.

Figure 4.2: Colbi New Flow Pipeline

At the level of the flow pipeline, it was found that some flows could be executed concurrently because they have no dependencies between them, nor are they constrained by the business rules of the product. The flows that could be parallelized were KPI_FILE, RULES_FILE, KPI_REPO, and RULES_REPO. Figure 4.2 represents the new flow of the application. It is largely identical to the original flow specified in section 3.2.

In the Setup flow, the loading of the generated entities from the CSV files can be done in parallel. Since each CSV file is related to a unique table, loading several files at the same time will not cause any concurrency conflict in the database. As the load is performed on the database side, it can be accelerated by running several entity-loading processes.

After the SETUP flow, two different microservices can be triggered at the same time. This happens because a message is directed to two different microservices.

Now, some microservices are able to perform the flows associated with the processing of a file in a parallel and distributed way. When the KPI_FILE and RULES_FILE microservices end their processing, the last one to complete accesses the global processing state of the file and checks whether all the dependencies are complete. If everything is complete, it sends a message to the Merge flow, which then starts its processing. Running the two flows in parallel is possible because they do not interfere with each other: they select data from the SAF-T tables (which does not lock the tables) and then insert each of the records into their own results table.

After Merge, the KPI_REPO and RULES_REPO microservices can also run in parallel. Again, a message is redirected from the Merge flow to the two microservices. The last one to finish processing verifies that all dependencies are completed and, if so, starts the next flow.
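The "last one to finish triggers the next flow" check described above can be sketched as an atomic update on a shared state map. In the sketch below a `ConcurrentHashMap` stands in for the distributed Colbi Cache; the class and method names are hypothetical and only illustrate the synchronization idea.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: deciding which parallel flow triggers the next one.
final class FlowJoinSketch {

    // fileId -> flows already finished (stand-in for the shared cache).
    private final ConcurrentHashMap<String, Set<String>> finished = new ConcurrentHashMap<>();

    // Records that `flow` finished for `fileId`. Returns true only for the call
    // that completes the `required` set, i.e. the caller that must send the
    // message to the next flow (for example Merge).
    boolean finishAndCheck(String fileId, String flow, Set<String> required) {
        boolean[] trigger = {false};
        // compute() runs atomically per key, so two flows finishing at the
        // same moment cannot both observe "everything is complete".
        finished.compute(fileId, (id, flows) -> {
            Set<String> state = (flows == null) ? new HashSet<>() : flows;
            boolean added = state.add(flow);
            trigger[0] = added && required.contains(flow) && state.containsAll(required);
            return state;
        });
        return trigger[0];
    }
}
```

Because the check and the update happen in one atomic step, exactly one of KPI_FILE and RULES_FILE would see the complete set and start the next flow.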

Finally, regarding the File and Repo rules, it was found that the way the rules were implemented could be improved. A rule is a SQL statement whose primary purpose is to collect database results so that processing can be applied over them. Listing 4.1 shows an example of these rules.

    rule_id: "CMP019"
    rule_type: "20_FILE_INT"
    level: 2
    severity: 1
    adapter: oracle
    description: "Checks for commercial documents with repeated line identifiers"
    content:
      commands:
        -
          command: >
            SELECT
              i.invoice_no,
              il.line_number,
              i.import_id AS import_id,
              i.row_key AS uid,
              i.merge_key AS uid2
            FROM #T{RULE_FILE_2} il
            JOIN #EM{Invoice} i
              ON il.invoice_id = i.row_key
            WHERE
              i.import_id = #{colbi.import_id}

    title: "Checks for commercial documents with repeated line identifiers"
    applies_to: "SaftFile"
    success_message: "No commercial documents were detected with repeated lines."
    insuccess_message: "Commercial document '@invoice_no' contains the line '@line_number' with repeated identifier."
    applies_to_version:
      - "PT_1.03_01"
      - "PT_1.04_01"
    applies_to_data_types: ['E', 'F', 'I', 'P', 'S']
    applies_to_partial_file: false
    dependencies:
      - "RULE_FILE_2"
    reference_type: "Saft::Invoice"

Listing 4.1: Example structure of a rule



During the flow of the Rules, there is an algorithm that orders the rules according to their

dependencies. After this ordering, the rules are executed sequentially as in the figure 4.3.

Figure 4.3: Sequential Rules Execution

This process was improved by executing the rules in parallel according to their dependencies. The following example shows how this execution is done.

When the rules are loaded, a dependency tree containing all the rules and the dependencies between them is built in memory, as shown in figure 4.4. This is possible because a dependency has its own dependencies, according to the levels of the dependency tree.

Figure 4.4: Parallel Rules execution 1

All the rules without dependencies are executed automatically. This execution is performed by a thread pool that limits the maximum number of rules able to run concurrently. This pool exists in order not to overload the system with an unbounded number of threads. All other rules that can be executed but do not have the resources to do so wait on the thread pool for their turn to execute, as in figure 4.5.

When a rule ends its processing, it retrieves all the rules that directly depend on it and checks whether any of them can be executed. If there are rules with all their dependencies processed, it automatically starts processing those rules. Otherwise, if it cannot process more rules, it simply terminates and releases the resources it was holding. This is useful because rules that were awaiting execution can then start, as figure 4.6 shows.


This process is repeated until all rules are executed.
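The parallel rule execution described above can be sketched with a fixed thread pool and a counter of unmet dependencies per rule. This is a minimal sketch, not the implementation used in Colbi: the class name is hypothetical, a rule's SQL execution is replaced by a log entry, and the dependency map is assumed to be acyclic and to list every rule as a key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of dependency-driven rule execution on a bounded pool.
final class ParallelRulesSketch {

    // deps maps each rule id to the ids of the rules it depends on.
    // Returns the order in which the rules started executing.
    static List<String> run(Map<String, Set<String>> deps, int poolSize) {
        Map<String, AtomicInteger> pending = new HashMap<>();   // unmet dependencies
        Map<String, List<String>> dependents = new HashMap<>(); // reverse edges
        for (Map.Entry<String, Set<String>> e : deps.entrySet()) {
            pending.put(e.getKey(), new AtomicInteger(e.getValue().size()));
            for (String dependency : e.getValue()) {
                dependents.computeIfAbsent(dependency, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        List<String> order = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch allDone = new CountDownLatch(deps.size());
        // Rules without dependencies are submitted immediately; the pool queues
        // those that exceed the number of available threads.
        for (String rule : deps.keySet()) {
            if (deps.get(rule).isEmpty()) submit(rule, pool, pending, dependents, order, allDone);
        }
        try {
            allDone.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for rules", e);
        } finally {
            pool.shutdown();
        }
        return order;
    }

    private static void submit(String rule, ExecutorService pool,
                               Map<String, AtomicInteger> pending,
                               Map<String, List<String>> dependents,
                               List<String> order, CountDownLatch allDone) {
        pool.execute(() -> {
            order.add(rule); // placeholder for executing the rule's SQL statement
            // A finished rule starts every dependent whose dependencies are all done.
            for (String next : dependents.getOrDefault(rule, List.of())) {
                if (pending.get(next).decrementAndGet() == 0) {
                    submit(next, pool, pending, dependents, order, allDone);
                }
            }
            allDone.countDown();
        });
    }
}
```

Each dependent is submitted by whichever of its dependencies finishes last, which mirrors the behavior described for figures 4.5 and 4.6.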



Chapter 5

Colbi Exchanger

The entire implementation of the new architecture was developed around this library. It was designed to be usable in other kinds of applications, and not only for this use case (SAF-T processing).

It is structured in a way that allows a product to be scaled horizontally in a simple way. A monolithic architecture can be quickly ported to a distributed, microservice-based architecture. This migration is achieved relatively quickly because the library simplifies and encapsulates the whole process of communication between microservices. All this is possible using only configuration.

5.1 Overview

In order to take advantage of the capabilities of the library, the developer only needs to isolate the code to be executed and specify the microservices to communicate with. The library supports two types of calls, synchronous and asynchronous, shown in figure 5.1.

Figure 5.1: Colbi Exchange Call types


In synchronous calls between microservices, the microservice making the request opens a message queue in order to receive the response. After opening this queue, it sends the request and blocks waiting for the response. As soon as the microservice responsible for executing the request finishes processing the message, it sends a response to the message queue previously created by the requesting microservice. In asynchronous calls, the microservice only posts its request to a queue and continues its own processing. These requests are executed through the class ColbiExClient, as code snippet 5.1 demonstrates.

    public class Main {
        public static void main(String[] args) {
            // Arguments: component name, broker host, username, password
            ColbiExClient client = new ColbiExClient("TEST_COMPONENT", "192.168.40.100", "teste", "123");
            // Arguments: exchange name, routing key, message body, timeout
            MessageResponse messageResponse = client.makeSyncRequest("exchange1", "#", "Just a test", 10000);
            client.close();
        }
    }

Listing 5.1: Colbi Exchanger Client
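The synchronous call pattern described in this section (open a private reply queue, send the request, block until the reply arrives or a timeout expires) can be sketched with in-memory queues. This is not the code of ColbiExClient, whose internals are not shown here: the class, the `Request` type and the echo reply are hypothetical, and `BlockingQueue` stands in for the broker's queues.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a request/reply call over queues.
final class SyncCallSketch {

    static final class Request {
        final String body;
        final BlockingQueue<String> replyTo; // reply queue opened by the caller

        Request(String body, BlockingQueue<String> replyTo) {
            this.body = body;
            this.replyTo = replyTo;
        }
    }

    // Caller side: returns the reply, or null if none arrives within the timeout.
    static String syncRequest(BlockingQueue<Request> requests, String body, long timeoutMs) {
        BlockingQueue<String> replyQueue = new ArrayBlockingQueue<>(1);
        try {
            requests.put(new Request(body, replyQueue));
            return replyQueue.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Responder side: takes one request, processes it, and answers on the
    // reply queue that travelled with the request.
    static void serveOne(BlockingQueue<Request> requests) {
        try {
            Request request = requests.take();
            request.replyTo.put("echo: " + request.body);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

An asynchronous call corresponds to the `requests.put(...)` step alone: the caller posts the request and continues without waiting on a reply queue.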

5.2 Message Exchange

As previously mentioned, all communication in Colbi Exchanger is through the use of messages.

These messages are serialized Java objects that contain all the information necessary to execute

the request as demonstrated in figure 5.2.

Figure 5.2: Message Class

5.3 Microservice Implementation

When implementing a microservice, the developer needs to implement a Java interface and override its method with the behavior the microservice should execute, as shown in code snippet 5.2.


    public class Microservice1 implements IFlow {
        @Override
        public MessageResponse executeFlow(Message message) {
            // TODO: implement the microservice behaviour and return its response
            throw new UnsupportedOperationException("Not implemented yet");
        }
    }

Listing 5.2: Implementation of Microservice

5.4 Configuration

As previously mentioned, with the use of Colbi Exchanger, each microservice is something highly

changeable and configurable. In this way, it is necessary to understand all the configurations

involved in a microservice and the meaning of each of these properties.

    name: "Machine C1"
    type: "C1"
    flowsPath: "flows"
    exchanges:
      - name: "NEW_FILE_EXCHANGE"
        durable: true
        supportReply: false
        rabbitAuth:
          host: "192.168.1.7"
          username: "teste"
          password: "testepass"
        maxMessages: 5
        redeliveredMaxTries: 3
        flow: "Start"
        queue:
          name: "accept_new_file"
          bindingKeys:
            - "NewFile"
          durable: true
          exclusive: false
          autoDelete: false
    ...

Listing 5.3: Colbi Microservice configuration

The configuration in listing 5.3 describes a machine named "Machine C1" that is responsible for processing messages of type "C1". These two properties are only descriptive and have no influence on the behavior of the microservice. The flows that the microservice implements are coded in the package path (flowsPath) "flows" of the project. A microservice can implement multiple flows if necessary; for that purpose, several exchanges through which the microservice communicates can be specified. This microservice implements only one flow. The flow communicates through the exchange named "NEW_FILE_EXCHANGE", which is persisted in the cluster (durable). The communication is performed with the cluster at host "192.168.1.7", with the user "teste" and the password "testepass". The flow can process at most 5 messages in parallel, and it tries to process a message no more than 3 times. The flow is implemented in the class "Start", which contains all the business logic. It receives messages from the queue named "accept_new_file" whose key is "NewFile".

All these properties have specific meaning and must be configured according to the need of

the product to be implemented.

• name: the name of the microservice.

• type: the machine component type. It is a purely descriptive property of the behavior it implements.

• flowsPath: the project package that contains the behaviors that the microservice implements.

• exchanges: all the behaviors that the microservice implements, and the queues through which it sends the messages.

– name: name of the exchange through which the microservice carries out the communication.

– durable: whether the exchange should survive a machine restart.

– supportReply: whether the exchange channel should support synchronous message replies.

– rabbitAuth.host: host to which the microservice connects to receive messages.

– rabbitAuth.username: username used to connect to the cluster.

– rabbitAuth.password: password of the user used to connect to the host.

– maxMessages: maximum number of messages that the microservice can handle at the same time.

– redeliveredMaxTries: number of attempts before the message is treated as an error.

– flow: name of the class responsible for processing the messages.

– queue.name: name of the queue to which the microservice connects and from which it receives messages.

– queue.bindingKeys: the keys of the messages that the microservice can accept.

– queue.durable: whether the queue survives a machine restart and keeps the unprocessed messages stored on disk.

– queue.exclusive: whether the queue is deleted when the connection that declared it is closed.

– queue.autoDelete: whether the server deletes the queue when it is no longer in use.


5.5 Fault Tolerance

When properly used, the library is prepared to withstand unexpected system failures. If a microservice receives a message and cannot process it, it rejects the message and puts it back in the queue so that another microservice can process it. The number of times a message may be attempted is defined in the microservice configuration (redeliveredMaxTries), and each message stores the number of times it has been processed. When this count exceeds the configured number of attempts, the message is permanently rejected and deleted from the system, and the user is shown that there was an error processing the file due to incorrect data.
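The bounded-retry policy described above can be sketched as follows. This is a minimal in-memory simulation under stated assumptions, not the Colbi Exchanger code: the names RetrySketch, Message, Outcome and deliver are hypothetical, an ArrayDeque stands in for the RabbitMQ queue, and the choice to give up once the attempt count reaches maxTries is an assumption about how redeliveredMaxTries is compared.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

/** In-memory sketch of the bounded-retry policy; all names are illustrative. */
final class RetrySketch {

    /** A message that carries the number of processing attempts made so far. */
    static final class Message {
        final String body;
        int tries = 0;

        Message(String body) {
            this.body = body;
        }
    }

    enum Outcome { PROCESSED, DISCARDED }

    /**
     * Delivers the message until the handler succeeds or the number of
     * attempts reaches maxTries (the role played by redeliveredMaxTries).
     */
    static Outcome deliver(Message message, int maxTries, Predicate<Message> handler) {
        Deque<Message> queue = new ArrayDeque<>(); // stand-in for the broker queue
        queue.add(message);
        while (!queue.isEmpty()) {
            Message current = queue.poll();
            current.tries++;
            if (handler.test(current)) {
                return Outcome.PROCESSED;          // processed and removed
            }
            if (current.tries >= maxTries) {
                return Outcome.DISCARDED;          // rejected for good
            }
            queue.add(current);                    // rejected and put back in the queue
        }
        throw new IllegalStateException("the queue emptied without a result");
    }

    public static void main(String[] args) {
        Message message = new Message("NewFile");
        // A handler that only succeeds on the third attempt.
        Outcome outcome = deliver(message, 5, m -> m.tries == 3);
        System.out.println(outcome + " after " + message.tries + " tries");
    }
}
```

In this sketch the attempt counter travels with the message, mirroring the statement that each message stores how many times it was processed.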

The library is also prepared to withstand both physical and network failures. When a message is ready to be processed, the RabbitMQ broker forwards it to an available microservice. The message is not deleted from the system at that point; it is only marked as being consumed. Only at the end of the microservice's processing is the message marked as completed and deleted from the system. If processing fails, the message changes from the being-consumed state back to the ready-to-process state and is then forwarded to a microservice that is ready to consume it. Figure 5.3 demonstrates this fault-tolerant execution.
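The message states described above can be read as a small state machine. The sketch below is only an illustration of that reading: the class MessageLifecycle, its State values and its method names are hypothetical and do not come from the library or from RabbitMQ.

```java
/** Sketch of the message life cycle described above; names are illustrative. */
final class MessageLifecycle {

    enum State { READY, CONSUMING, COMPLETED }

    private State state = State.READY;

    State state() {
        return state;
    }

    /** The broker forwards the message; it stays stored and is only marked. */
    void deliver() {
        require(State.READY);
        state = State.CONSUMING;
    }

    /** The microservice finished: the message may now be deleted. */
    void complete() {
        require(State.CONSUMING);
        state = State.COMPLETED;
    }

    /** Processing failed or the consumer was lost: the message is deliverable again. */
    void fail() {
        require(State.CONSUMING);
        state = State.READY;
    }

    private void require(State expected) {
        if (state != expected) {
            throw new IllegalStateException("expected " + expected + " but was " + state);
        }
    }

    public static void main(String[] args) {
        MessageLifecycle message = new MessageLifecycle();
        message.deliver();   // a first microservice picks the message up
        message.fail();      // its machine is switched off during processing
        message.deliver();   // a second microservice picks the message up
        message.complete();
        System.out.println(message.state());
    }
}
```

The point of the sketch is that a message never leaves the system while it is in the CONSUMING state, so a lost consumer cannot lose the message.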

Figure 5.3: Fault Tolerance with Colbi Exchanger


Chapter 6

Colbi Cache

Colbi-Cache is a library designed to support product optimizations. As described previously, the

processing of a file is composed of a pipeline with multiple flows. Due to the introduction of the

new pipeline as defined in chapter 4 section 4.2, it is necessary that the flows are now synchronized.

This synchronization is achieved through a distributed cache between all microservices sharing

the processing states of the files. When a microservice wants to verify that all dependencies of the

flows associated with the file are complete, it accesses this cache and retrieves the file states.

Because microservices can access the cache concurrently, access is guaranteed to be served in order of arrival. If several microservices need to retrieve the states of the same file, they must do so in a synchronized way.
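The dependency check mentioned above can be sketched as follows. This is a minimal sketch under assumptions: the class DependencyCheck, the FlowStatus values and the use of a plain Map for the cached file states are hypothetical, and the assumption that MERGE depends on KPI_FILE and RULES_FILE is made only for the usage example.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a dependency check over cached file states; names are illustrative. */
final class DependencyCheck {

    /** Hypothetical processing states of a flow. */
    enum FlowStatus { RUNNING, COMPLETED, FAILED }

    /**
     * Returns true only if every dependency is recorded as COMPLETED.
     * A flow with no recorded state counts as not completed.
     */
    static boolean canTriggerFlow(Map<String, FlowStatus> fileStates, String[] dependencies) {
        for (String dependency : dependencies) {
            if (fileStates.get(dependency) != FlowStatus.COMPLETED) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, FlowStatus> states = new HashMap<>();
        states.put("KPI_FILE", FlowStatus.COMPLETED);
        states.put("RULES_FILE", FlowStatus.RUNNING);
        // Assumed dependencies of the next flow, for illustration only.
        String[] dependencies = {"KPI_FILE", "RULES_FILE"};
        System.out.println(canTriggerFlow(states, dependencies));
    }
}
```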

6.1 Design

This library is fundamental not only for optimizing flow pipelines but also for maintaining data coherence. As previously stated, a submitted file must follow a pipeline made up of all the processes that transform raw data into valuable data. Files submitted by different users do not restrict each other's pipeline execution. The biggest problem comes from clients submitting multiple files within the same organization.

Within the product, a customer has a company, an organization, and a fiscal year associated with the processing. When multiple files are submitted for the same company with the same fiscal year, various business rules inherent to Colbi prevent them from running concurrently. This is because the flows from Merge to Final use all the data in the customer's repository (delete, insert and update database operations), and that data refers to the customer's fiscal year, company and organization. The integrity of the data must therefore be ensured while it is in use, so multiple files from the same company, organization and fiscal year must execute their flow pipelines sequentially. In practice, all files can execute the first flow (the Start flow, the only one that does not depend on database operations and repository integrity), but only one can execute the remaining flows of the pipeline. The other files have to wait their turn, which can only begin when the previous pipeline has finished.
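The rule above (one running pipeline per company, organization and fiscal year, with the other files on hold) can be sketched as a small gate. This is an illustration only: PipelineGate, key, tryEnter and leave are hypothetical names, files are identified by plain strings, and local maps stand in for the distributed cache.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Sketch of the "one pipeline per company, organization and fiscal year" rule. */
final class PipelineGate {

    private final Map<String, String> running = new HashMap<>();       // key -> file holding the turn
    private final Map<String, Queue<String>> onHold = new HashMap<>(); // key -> files waiting, in arrival order

    static String key(String company, String organization, int fiscalYear) {
        return company + "|" + organization + "|" + fiscalYear;
    }

    /** Called after the Start flow. Returns true if the file may continue immediately. */
    synchronized boolean tryEnter(String key, String fileId) {
        if (!running.containsKey(key)) {
            running.put(key, fileId);
            return true;
        }
        onHold.computeIfAbsent(key, k -> new ArrayDeque<>()).add(fileId);
        return false;
    }

    /** Called when a pipeline finishes. Returns the next file to trigger, or null if none waits. */
    synchronized String leave(String key, String fileId) {
        if (!fileId.equals(running.get(key))) {
            throw new IllegalStateException(fileId + " does not hold the turn for " + key);
        }
        Queue<String> waiting = onHold.get(key);
        String next = (waiting == null) ? null : waiting.poll();
        if (next == null) {
            running.remove(key);
            onHold.remove(key);
        } else {
            running.put(key, next);   // the turn passes directly to the next file
        }
        return next;
    }
}
```

Files with different keys never block each other in this sketch, which matches the statement that files from different users run without restrictions.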

6.2 Implementation

The business rules presented in section 6.1 of this chapter are encapsulated in a set of methods implemented for Colbi. Listing 6.1 of this chapter presents these methods and their purpose in the product.

/**
 * Adds a new run status associated with a file
 * @param processFileInfo File that is being processed, along with all its information
 * @param colbiFileState The current flow making the request
 * @param flowStatus the processing status of the operation to save
 */
public void addNewFileState(ProcessFileInfo processFileInfo, ColbiFileState colbiFileState, FlowStatus flowStatus)

/**
 * Checks if a file waiting to be processed can start
 * @param processFileInfo File that is being processed, along with all its information
 * @return boolean representing if it is possible to start the file processing or not
 */
public boolean canStartNextFileOnHold(ProcessFileInfo processFileInfo)

/**
 * Retrieves the next file from the same company, organization and fiscal year if one exists.
 * @param processFileInfo File that is being processed, along with all its information
 * @return if file exists, return it. Otherwise return null
 */
public ProcessFileInfo getNextFileToTrigger(ProcessFileInfo processFileInfo)

/**
 * Deletes a ProcessFileInfo from cache.
 * @param processFileInfo the ProcessFileInfo we want to delete from cache.
 * @return if deleted, return true. Otherwise return false
 */
public boolean deleteFile(ProcessFileInfo processFileInfo)

/**
 * Checks if a flow can trigger the next one in its pipeline
 * @param processFileInfo File that is being processed, along with all its information
 * @return boolean representing if it is possible to start next flow or not
 */
public boolean canTriggerFlow(ProcessFileInfo processFileInfo, String[] dependencies, ColbiFileState colbiFileState)

/**
 * Retrieves a unique key representing the cached file.
 * @param processFileInfo File that is being processed, along with all its information
 * @return string representing the generated unique key
 */
private String getKey(ProcessFileInfo processFileInfo)

Listing 6.1: Colbi Cache methods implementation

6.3 Locking System

To ensure synchronization between flows and business rules, the library, built on top of Apache Ignite's features, has an architecture similar to the one in figure 6.1.

Figure 6.1: Colbi Cache

When a microservice accesses its cache, it starts the operation by trying to acquire a lock on the key of the file concerned. This key is obtained through the getKey() method implemented in Colbi Cache, as specified in listing 6.1 of this chapter. If it gets the lock on that key, the requested operation is executed and, at the end of its execution, the result is returned to the microservice. As the result is returned, the previously locked key is released. If the microservice cannot get the lock, it waits until the existing lock is released.
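The acquire, execute, release sequence described above can be sketched with standard JDK locks. This is a stand-in under assumptions, not the library's code: a local ReentrantLock per key replaces the distributed lock that the data grid would provide, and the names KeyedLockSketch, withLock and hammer are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Sketch of per-key locking around a cache operation; names are illustrative. */
final class KeyedLockSketch {

    private final ConcurrentMap<String, Lock> locks = new ConcurrentHashMap<>();

    /** Runs the operation while holding the lock for the given key. */
    <T> T withLock(String key, Supplier<T> operation) {
        Lock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();                 // waits until the current holder releases the key
        try {
            return operation.get();  // the requested operation runs under the lock
        } finally {
            lock.unlock();           // the key is released as the result is returned
        }
    }

    /** Demo: several threads update one unsynchronized counter under the same key. */
    static int hammer(int threads, int incrementsPerThread) {
        KeyedLockSketch cache = new KeyedLockSketch();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    cache.withLock("company|organization|2018", () -> ++counter[0]);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 1000));
    }
}
```

Because every update to the counter happens under the same key lock, no increment is lost even though the counter itself is not thread-safe.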


6.4 Fault Tolerance

Regarding fault tolerance, since this cache is implemented on an in-memory data grid distributed across the microservices, information is not lost when a single microservice fails. When a microservice writes to its cache, the changes are replicated across all microservices, so if one of them fails the information remains intact. In addition, in-memory data storage can be combined with local disk persistence, so that the information is maintained even in case of extreme failures.


Chapter 7

Implementation Tests and Results

The tests chosen are load tests that evaluate the performance of the new architecture described in chapter 4 and relate it to that of the previous one, described in chapter 3. Since the architectural modifications did not change the business rules, both architectures produce the same output. The tests mainly address the effect of the optimizations on response time and the introduction of unexpected physical machine failures. To evaluate the results of the implementation, a set of tests and metrics was developed, focusing on the following points:

1. Global response time of a single file input

2. Response time of the machine under an extreme load of multiple files

3. Physical machine failure

In the following sections, these tests are specified in more detail, as well as the environment in

which they were produced.

7.1 Evaluation functions

Since the processing of a file in the system consists of a set of flows, the processing time of a file corresponds to the sum of the waiting and execution times of all the flows it traverses.

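\[
\mathit{FlowDelay}(i) =
\begin{cases}
\mathit{FlowBegin}(i) - \mathit{FlowEnd}(i-1), & i > 1 \\
\mathit{FlowBegin}(i) - \mathit{FileUploaded}(i), & i = 1
\end{cases}
\]

\[
\mathit{FlowTime}(i) = \mathit{FlowEnd}(i) - \mathit{FlowBegin}(i)
\]

\[
\mathit{TotalTime} = \sum_{i=1}^{\infty} \bigl( \mathit{FlowDelay}(i) + \mathit{FlowTime}(i) \bigr)
\]

where i corresponds to the flow.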


To make the formulas and the reasoning behind them easier to follow, figure 7.1 of this chapter represents the process that a file goes through during its processing.

Figure 7.1: Total processing time of a single file

When a file is submitted or one of its flows ends, there may be a wait time before it continues processing. This waiting time corresponds to FlowDelay, and it happens because the system may already be overloaded and unable to accept more processing.

While a file is being processed, the time a flow takes to complete is the time at which the flow ended (FlowEnd) minus the time at which it began (FlowBegin). This calculation is represented by the variable FlowTime.

Finally, the total processing time of a file (TotalTime) corresponds to the sum of the delays and execution times of all the flows it has gone through.
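The three evaluation functions can be sketched in code as follows. This is a minimal sketch, assuming that timestamps are given as numbers in a common unit and that flows are indexed from 0 instead of 1; the class name FlowTimes and the array-based representation are hypothetical.

```java
/** Sketch of FlowDelay, FlowTime and TotalTime; the representation is illustrative. */
final class FlowTimes {

    /** Waiting time before flow i starts (i is 0-based here, so i == 0 is the first flow). */
    static double flowDelay(double fileUploaded, double[] begin, double[] end, int i) {
        return i == 0 ? begin[0] - fileUploaded : begin[i] - end[i - 1];
    }

    /** Execution time of flow i. */
    static double flowTime(double[] begin, double[] end, int i) {
        return end[i] - begin[i];
    }

    /** Sum of the delay and execution time of every flow the file went through. */
    static double totalTime(double fileUploaded, double[] begin, double[] end) {
        double total = 0.0;
        for (int i = 0; i < begin.length; i++) {
            total += flowDelay(fileUploaded, begin, end, i) + flowTime(begin, end, i);
        }
        return total;
    }

    public static void main(String[] args) {
        double[] begin = {1.0, 4.0};
        double[] end = {3.0, 9.0};
        System.out.println(totalTime(0.0, begin, end));
    }
}
```

When flows run strictly one after another, the sum telescopes: every FlowEnd(i−1) and FlowBegin(i) cancels, leaving the end of the last flow minus the upload time.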

7.2 Environment Setup

For the system benchmark, six machines with exactly the same characteristics were used: one machine hosting the two databases together with the RabbitMQ broker, one machine with the front-end, and four processing machines.

Due to internal restrictions at the company, it was not possible to test the architecture in a production environment, because major submissions to the tax authority were ongoing. Restarting the machine at that time of the month would delay file processing and was not advisable considering the response times defined in the SLAs (as described in chapter 1 section 1.4) between Petapilot and the Portuguese Tax Authority. Therefore, only a Quality environment was available.

In each of the machines, eight microservices were created, and each defines a flow. That is,

each of the machines has the necessary microservices so that a file can be processed. It can be said

that each of the machines is a replica of the others.

7.3 Benchmark Definition

The tests were divided into two phases and each phase has three tests. The difference between the

first and second phases concerns the introduction of fault tolerance. In the first phase, the tests run

normally while in the second, one of the machines is turned off during processing.



In each phase, the first test only examines the behavior of the system when a single file is submitted. The remaining two tests put the system under a considerable load. Given that Colbi works with simultaneous file submissions, it was interesting to compare the behavior of the two architectures under load with files of different sizes and types. This makes it possible to simulate the real behavior of the system in a production environment very closely.

Table 7.1 of this chapter shows the different sizes and types of test files, and table 7.2 shows the two load tests performed, each a combination of files of different types and sizes. The first test is not shown because, as previously stated, it evaluates the performance of a single file only.

File Size
Small (S)     0.02 MB
Medium (M)    40 MB
Large (L)     200 MB

File Type
Sales Invoices (F)
Accounting Data (C)
Integrated (I)

Table 7.1: File types and sizes

                  Integrated (I)          Sales Invoices (F)      Accounting Data (C)
Phase    Test     Small   Medium  Large   Small   Medium  Large   Small   Medium  Large   Total Files
Phase 1  Test 2   9       9       9       9       9       9       9       9       9       81
Phase 1  Test 3   27      27      27      27      27      27      27      27      27      243
Phase 2  Test 2   9       9       9       9       9       9       9       9       9       81
Phase 2  Test 3   27      27      27      27      27      27      27      27      27      243

Table 7.2: Number of files per type and size for the load tests

In sections 7.3.1 and 7.3.2 of this chapter, all values that resulted from the tests were recorded.



7.3.1 Phase One

The values shown in the charts and tables of tests one and two are the averages for each flow. The charts contain the execution times of the various flows, comparing the two architectures as file size increases. The tables split the total time of each flow into flow time and delay time.

7.3.1.1 Test Case 1

In this first test, only one file of each type and size was submitted. All these submissions were

isolated and not carried out in parallel.

Figure 7.2: Phase 1, Test 1 - The processing time of each flow on both architectures with a small file

Flows and Delays  Integrated (I)          Sales Invoices (F)      Accounting Data (C)
                  Orig. Arch. New Arch.   Orig. Arch. New Arch.   Orig. Arch. New Arch.
DELAY 1           0           0,14        0           0,13        0           0,15
START             1,28        1,11        0,97        1,02        0,6         0,55
DELAY 2           0           0,13        0           0,13        0           0,14
SETUP             2,41        1,94        1,24        0,88        0,63        0,42
DELAY 3           0           0,14        0           0,14        0           0,17
KPI_FILE          2,32        2,41        1,37        1,31        1,02        1,04
DELAY 4           0           0,12        0           0,15        0           0,12
RULES_FILE        4,66        3,85        3,56        2,86        1,78        0,96
DELAY 5           0           0,15        0           0,12        0           0,15
MERGE             7,87        7,76        5,78        5,74        3,04        3,17
DELAY 6           0           0,13        0           0,17        0           0,13
KPI_REPO          1,21        1,21        0,89        0,96        0,76        0,88
DELAY 7           0           0,14        0           0,13        0           0,15
RULES_REPO        2,24        1,67        2,03        1,59        1,41        1,09
DELAY 8           0           0,12        0           0,14        0           0,17
FINAL             1,33        1,37        0,94        0,91        0,87        0,92
TOTAL TIME        23,32       22,39       16,78       16,38       10,11       10,21
REAL TIME         23,32       18,77       16,78       14,11       10,11       8,37

Table 7.3: Phase 1, Test 1 - Small file flow and delay times

Figure 7.3: Phase 1, Test 1 - The processing time of each flow on both architectures with a medium file

Flows and Delays  Integrated (I)          Sales Invoices (F)      Accounting Data (C)
                  Orig. Arch. New Arch.   Orig. Arch. New Arch.   Orig. Arch. New Arch.
DELAY 1           0           0,15        0           0,13        0           0,15
START             5,44        5,58        4,67        4,89        3,04        3,23
DELAY 2           0           0,14        0           0,13        0           0,14
SETUP             12,21       7,45        5,85        2,67        3,21        1,89
DELAY 3           0           0,15        0           0,14        0           0,17
KPI_FILE          11,03       10,98       6,57        6,23        5,16        5,23
DELAY 4           0           0,15        0           0,15        0           0,12
RULES_FILE        21,36       12,86       16,81       9,65        9,05        4,98
DELAY 5           0           0,16        0           0,12        0           0,15
MERGE             34,45       34,56       27,78       28,04       15,44       16,02
DELAY 6           0           0,17        0           0,17        0           0,13
KPI_REPO          6,04        5,89        3,93        3,96        3,86        3,95
DELAY 7           0           0,12        0           0,13        0           0,15
RULES_REPO        10,96       6,12        8,97        5,07        7,16        4,02
DELAY 8           0           0,13        0           0,14        0           0,17
FINAL             10,36       10,29       4,56        4,66        4,41        4,53
TOTAL TIME        111,85      94,9        79,14       66,28       51,33       45,03
REAL TIME         111,85      78,03       79,14       56,09       51,33       36,1

Table 7.4: Phase 1, Test 1 - Medium file flow and delay times

Figure 7.4: Phase 1, Test 1 - The processing time of each flow on both architectures with a large file

Flows and Delays  Integrated (I)          Sales Invoices (F)      Accounting Data (C)
                  Orig. Arch. New Arch.   Orig. Arch. New Arch.   Orig. Arch. New Arch.
DELAY 1           0           0,16        0           0,14        0           0,16
START             22,12       22,23       19,22       19,85       13,82       14,05
DELAY 2           0           0,17        0           0,12        0           0,15
SETUP             49,51       25,14       24,65       13,64       14,32       8,63
DELAY 3           0           0,12        0           0,13        0           0,13
KPI_FILE          45,08       44,86       26,55       26,26       23,51       23,42
DELAY 4           0           0,14        0           0,15        0           0,12
RULES_FILE        86,51       61,25       70,64       44,69       41,32       23,53
DELAY 5           0           0,14        0           0,17        0           0,14
MERGE             141,87      142,09      115,57      116,23      69,68       68,96
DELAY 6           0           0,15        0           0,13        0           0,13
KPI_REPO          24,47       24,27       17,41       17,45       17,27       17,15
DELAY 7           0           0,16        0           0,14        0           0,17
RULES_REPO        44,86       24,79       41,06       25,63       32,68       18,52
DELAY 8           0           0,14        0           0,14        0           0,14
FINAL             42,32       42,36       18,95       18,88       19,06       19,23
TOTAL TIME        456,74      388,17      334,05      283,75      231,66      194,63
REAL TIME         456,74      319,04      334,05      240,04      231,66      154,06

Table 7.5: Phase 1, Test 1 - Large file flow and delay times



7.3.1.2 Test Case 2

In this second test, eighty-one files (as defined in table 7.2 of this section) are executed simultaneously. It is a much more realistic scenario that allows us to better understand Colbi's response to an increased load on the system.

Figure 7.5: Phase 1, Test 2 - Average processing time of flows on both architectures with small files

Flows and Delays  Integrated (I)          Sales Invoices (F)      Accounting Data (C)
                  Orig. Arch. New Arch.   Orig. Arch. New Arch.   Orig. Arch. New Arch.
DELAY 1           0,11        0,18        0,08        0,16        0,13        0,18
START             1,42        1,36        1,09        1,13        0,66        0,73
DELAY 2           0,11        0,19        0,09        0,14        0,07        0,15
SETUP             2,67        2,14        1,35        0,98        0,71        0,59
DELAY 3           0,07        0,17        0,13        0,15        0,09        0,17
KPI_FILE          2,65        2,59        1,52        1,47        1,12        1,14
DELAY 4           0,12        0,15        0,13        0,17        0,11        0,14
RULES_FILE        5,21        4,11        3,98        2,96        1,98        1,14
DELAY 5           0,13        0,2         0,1         0,19        0,1         0,17
MERGE             8,72        8,53        6,43        6,35        3,38        3,27
DELAY 6           0,13        0,17        0,07        0,16        0,12        0,13
KPI_REPO          1,31        1,35        0,98        1,06        0,87        0,79
DELAY 7           0,1         0,18        0,11        0,17        0,09        0,19
RULES_REPO        2,37        1,76        2,24        1,63        1,57        1,13
DELAY 8           0,07        0,18        0,06        0,17        0,06        0,17
FINAL             1,53        1,59        1,06        0,99        1,12        1,23
TOTAL TIME        26,72       24,85       19,42       17,88       12,18       11,32
REAL TIME         26,72       20,91       19,42       15,35       12,18       9,39

Table 7.6: Phase 1, Test 2 - Small files average flow and delay times

Figure 7.6: Phase 1, Test 2 - Average processing time of flows on both architectures with medium files

Flows and Delays  Integrated (I)          Sales Invoices (F)      Accounting Data (C)
                  Orig. Arch. New Arch.   Orig. Arch. New Arch.   Orig. Arch. New Arch.
DELAY 1           0,09        0,16        0,09        0,18        0,12        0,15
START             6,12        6,23        5,18        5,23        3,04        3,18
DELAY 2           0,06        0,14        0,13        0,19        0,12        0,16
SETUP             13,58       7,61        6,49        3,56        3,21        2,31
DELAY 3           0,11        0,17        0,08        0,19        0,06        0,16
KPI_FILE          12,26       12,14       7,21        7,16        5,16        4,98
DELAY 4           0,06        0,17        0,08        0,21        0,08        0,2
RULES_FILE        23,76       15,63       18,66       14,22       9,05        7,23
DELAY 5           0,11        0,13        0,12        0,17        0,12        0,18
MERGE             38,38       38,45       30,84       31,12       15,44       16,27
DELAY 6           0,08        0,17        0,09        0,13        0,07        0,21
KPI_REPO          6,73        6,86        4,36        4,35        3,86        4,02
DELAY 7           0,09        0,17        0,13        0,15        0,08        0,18
RULES_REPO        11,89       5,12        9,96        5,36        7,16        5,99
DELAY 8           0,08        0,14        0,07        0,22        0,07        0,15
FINAL             11,44       11,98       5,15        5,14        4,41        4,62
TOTAL TIME        124,84      105,27      88,64       77,58       52,05       49,99
REAL TIME         124,84      88,01       88,64       66,07       52,05       35,99

Table 7.7: Phase 1, Test 2 - Medium files average flow and delay times

Figure 7.7: Phase 1, Test 2 - Average processing time of flows on both architectures with large files


Table 7.8: Phase 1, Test 2 - Large files average flow and delay times



7.3.1.3 Test Case 3

The third test, as said earlier, is the one that best simulates Colbi's processing in the real world. The load is more intense: two hundred and forty three files of different sizes and types, as defined in table 7.2 of this section.

Figure 7.8: Phase 1, Test 3 - Average processing time of flows on both architectures with small files


Table 7.9: Phase 1, Test 3 - Small files average flow and delay times

Figure 7.9: Phase 1, Test 3 - Average processing time of flows on both architectures with medium files


Table 7.10: Phase 1, Test 3 - Medium files average flow and delay times

Figure 7.10: Phase 1, Test 3 - Average processing time of flows on both architectures with large files


Table 7.11: Phase 1, Test 3 - Large files average flow and delay times



7.3.2 Phase Two

The values shown in the charts and tables of tests one and two are the averages for each flow. The bar charts show a direct comparison between the two architectures as file size increases. There is also a new column containing the execution of the new architecture with an unexpected failure, which corresponds to the shutdown of one processing machine. The tables split the total flow time into flow time and flow delay time. Some cells are highlighted; they correspond to the flow in which the failure occurred.

7.3.2.1 Test Case 1

In this test, only one file of each type and size was submitted. It was only performed on the

new architecture because the original one could not support unexpected failures. Again, these

submissions were isolated and not carried out in parallel.

Figure 7.11: Phase 2, Test 1 - Processing time of flows on both architectures with a small file

Flows and Delays  Integrated (I)      Sales Invoices (F)  Accounting Data (C)
DELAY 1           0,26                0,14                0,14
START             2,53                1,02                0,62
DELAY 2           0,13                0,25                0,13
SETUP             2,52                1,93                0,41
DELAY 3           0,13                0,12                0,14
KPI_FILE          2,56                1,25                1,08
DELAY 4           0,17                0,16                0,12
RULES_FILE        4,35                3,46                0,86
DELAY 5           0,14                0,14                0,31
MERGE             7,45                5,61                3,32
DELAY 6           0,12                0,12                0,14
KPI_REPO          1,35                1,03                0,94
DELAY 7           0,14                0,14                0,13
RULES_REPO        2,43                1,97                1,12
DELAY 8           0,15                0,12                0,14
FINAL             1,43                0,87                1,01
TOTAL TIME        25,86               18,33               10,61
REAL TIME         21,95               16,05               8,81

Table 7.12: Phase 2, Test 1 - Small file flow and delay time



Figure 7.12: Phase 2, Test 1 - Processing time of flows on both architectures with a medium file


Table 7.13: Phase 2, Test 1 - Medium file flow and delay time



Figure 7.13: Phase 2, Test 1 - Processing time of flows on both architectures with a large file


Table 7.14: Phase 2, Test 1 - Large file flow and delay time



7.3.2.2 Test Case 2

In this second test, eighty-one files (as defined in table 7.2 of this section) are executed simultaneously with an unexpected failure. It is a much more realistic scenario that allows us to better understand Colbi's response to an increased load on the system and to unexpected failures.

Figure 7.14: Phase 2, Test 2 - Processing time of flows on both architectures with small files


Table 7.15: Phase 2, Test 2 - Small file average flow and delay times



Figure 7.15: Phase 2, Test 2 - Processing time of flows on both architectures with medium files

Flows and Delays  Integrated (I)      Sales Invoices (F)  Accounting Data (C)
DELAY 1           0,15                0,23                0,14
START             6,12                5,66                3,17
DELAY 2           0,17                0,17                0,13
SETUP             7,71                3,48                2,45
DELAY 3           0,21                0,14                0,19
KPI_FILE          12,56               7,36                4,77
DELAY 4           0,25                0,15                0,21
RULES_FILE        15,88               14,75               7,12
DELAY 5           0,28                0,14                0,15
MERGE             41,96               30,98               15,99
DELAY 6           0,17                0,17                0,36
KPI_REPO          7,02                4,56                6,69
DELAY 7           0,14                0,19                0,16
RULES_REPO        5,08                5,23                6,04
DELAY 8           0,15                0,25                0,17
FINAL             11,99               6,54                4,49
TOTAL TIME        109,84              80                  52,23
REAL TIME         91,85               67,77               41,07

Table 7.16: Phase 2, Test 2 - Medium file average flow and delay times



Figure 7.16: Phase 2, Test 2 - Processing time of flows on both architectures with large files

Flows and Delays  Integrated (I)      Sales Invoices (F)  Accounting Data (C)
DELAY 1           0,15                0,24                0,13
START             27,89               26,35               15,39
DELAY 2           0,17                0,19                0,25
SETUP             38,89               16,23               9,89
DELAY 3           0,18                0,31                0,17
KPI_FILE          20,56               28,98               26,64
DELAY 4           0,14                0,21                0,14
RULES_FILE        77,65               54,69               25,12
DELAY 5           0,26                0,14                0,22
MERGE             157,98              126,56              76,41
DELAY 6           0,24                0,15                0,13
KPI_REPO          27,05               20,15               18,97
DELAY 7           0,27                0,13                0,19
RULES_REPO        40,25               25,02               21,55
DELAY 8           0,14                0,13                0,21
FINAL             47,02               23,23               21,89
TOTAL TIME        438,84              322,71              217,3
REAL TIME         390,81              273,12              172,94

Table 7.17: Phase 2, Test 2 - Large file average flow and delay times



7.3.2.3 Test Case 3

The third test, as said earlier, is the one that best simulates Colbi's processing in the real world. The load is more intense: two hundred and forty three files of different sizes and types, as defined in table 7.2 of this section, with the introduction of unexpected failures.

Figure 7.17: Phase 2, Test 3 - Processing time of flows on both architectures with small files


Table 7.18: Phase 2, Test 3 - Small file average flow and delay times



Figure 7.18: Phase 2, Test 3 - Processing time of flows on both architectures with medium files


Table 7.19: Phase 2, Test 3 - Medium file average flow and delay times



Figure 7.19: Phase 2, Test 3 - Processing time of flows on both architectures with large files


Table 7.20: Phase 2, Test 3 - Large file average flow and delay times



7.4 Conclusion

In both phases, processing with the new architecture was always faster, although the improvement is less pronounced for small files. These improvements are mainly due to optimizations in the workflow of the flows.

The most noticeable improvement comes from the parallelization of Rules and KPI, as stated in chapter 4 section 4.5. Since these flows can now run in parallel, the time it takes to execute the two flows corresponds to the execution time of the slower one. The optimization of SETUP also brought improvements, especially as file size grows, because more entities can be loaded into the database in parallel.
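The effect of running KPI and Rules in parallel can be checked against the reported numbers. In table 7.3, for the Integrated small file on the new architecture, the REAL TIME row is consistent with taking the TOTAL TIME and counting only the slower flow of each parallel pair. The sketch below reproduces that arithmetic; the class and method names are hypothetical, and the reading of REAL TIME is inferred from the table values, not from a stated definition.

```java
import java.util.Locale;

/** Worked check of the parallel KPI/Rules saving; names are illustrative. */
final class ParallelSaving {

    static double sum(double... values) {
        double total = 0.0;
        for (double value : values) {
            total += value;
        }
        return total;
    }

    /**
     * Elapsed time when each KPI/Rules pair runs in parallel: the pair costs
     * max(kpi, rules) instead of kpi + rules, which saves min(kpi, rules).
     */
    static double elapsed(double sequentialTotal, double kpiFile, double rulesFile,
                          double kpiRepo, double rulesRepo) {
        return sequentialTotal - Math.min(kpiFile, rulesFile) - Math.min(kpiRepo, rulesRepo);
    }

    public static void main(String[] args) {
        // Table 7.3, Integrated (I), new architecture: the eight delays and the eight flows.
        double delays = sum(0.14, 0.13, 0.14, 0.12, 0.15, 0.13, 0.14, 0.12);
        double flows = sum(1.11, 1.94, 2.41, 3.85, 7.76, 1.21, 1.67, 1.37);
        double total = delays + flows;
        double real = elapsed(total, 2.41, 3.85, 1.21, 1.67);
        System.out.println(String.format(Locale.ROOT, "%.2f %.2f", total, real)); // prints "22.39 18.77"
    }
}
```

The same relation holds for the Sales Invoices column of table 7.3 (16,38 − 1,31 − 0,96 = 14,11).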

Regarding the delay between flows, the old architecture shows no delay when a single file is submitted: since it is sequential, there is no wait time to start another flow. In the new architecture, there is a delay associated with network latency and the RabbitMQ broker. As the results show, the flow delay in the old architecture grows as the number of files increases. This is because, as explained in chapter 3 section 3.3, flows are launched and executed sequentially, so even if the machine could execute the next flow, it always has to wait for the previous flows that are still running.

The optimization of RULES_FILE and RULES_REPO is especially useful for medium and large files. Since all rules now run according to their dependencies, as described in chapter 4 section 4.5, the processing time of the rules decreases.

Regarding fault tolerance, the second-phase tests showed that file processing time increases, but not very significantly. The increase depends on where the failure occurred. If the failure occurs in MERGE, the flow has stored its progress in the database, so when a microservice reprocesses the flow it resumes where it stopped. In START and SETUP, however, a failed flow has to be redone from the beginning regardless of how far it had progressed, which penalizes processing time more significantly.
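The difference between a flow that resumes and a flow that restarts can be illustrated with a simple count of executed steps. This is only an illustrative model, assuming a flow made of equal steps and a single failure; the class CheckpointSketch and its parameters are hypothetical.

```java
/** Illustrative cost model of one failure in a resumable or non-resumable flow. */
final class CheckpointSketch {

    /**
     * Total number of steps executed when a flow of totalSteps steps fails once,
     * after completedBeforeFailure steps have finished.
     */
    static int stepsExecuted(int totalSteps, int completedBeforeFailure, boolean resumable) {
        // A resumable flow continues from its stored progress; otherwise it starts over.
        int restartFrom = resumable ? completedBeforeFailure : 0;
        return completedBeforeFailure + (totalSteps - restartFrom);
    }

    public static void main(String[] args) {
        System.out.println(stepsExecuted(10, 6, true));   // resumes: no step is repeated
        System.out.println(stepsExecuted(10, 6, false));  // restarts: the first 6 steps run twice
    }
}
```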

Comparing the overall performance of the new architecture with the old one, as shown in chart 7.20 of this section, reveals quite significant improvements. The improvement was most noticeable in the test with two hundred and forty three files (the closest approximation to Colbi's current load). This is a good indicator of the architecture's performance potential: it can be more than forty percent faster than the old one. Charts 7.21, 7.22 and 7.23 of this section compare the processing time of the old architecture, the new architecture, and the new architecture with failures.



Figure 7.20: Overall System Performance Gain

Figure 7.21: Architectures Global Processing Time - 1 File






Chapter 8

Conclusions and Future Work

8.1 Conclusion and Expected Results

The result of this work was very positive. All the support libraries were successfully implemented, with room for future development. For example, Colbi Exchanger makes it possible to integrate even more flows into the architecture. In the short term, this helps achieve one of the company's most wanted goals: bringing together the development environments of all the countries where it operates into a unified product. In addition, the implemented architecture can withstand unexpected failures, and some parts of the system were optimized. Regarding performance, chapter 7 demonstrated that even under a considerable load (in a scenario similar to the real one) the system performed well and fast.

This new architecture will be a major step forward for the company: it brings greater system stability and makes it possible to progress further in the international market, since new services can be implemented quickly to accommodate different types of files. Unfortunately, the system could not be integrated into the production environments, owing to time constraints and to the challenges stated in chapter 1 section 1.4.

The following was achieved and optimized with the new implementation:

• Optimization 1: Reduce the time between flows by eliminating the 'for' cycle responsible for running each flow sequentially.

• Optimization 2: Replace the sequential loading of documents generated by the Setup flow with parallel loading.

• Optimization 3: Improve the flow pipeline through parallel execution of flows without dependencies.

• Architecture Fault Tolerance: The architecture can now fail and recover from failures by continuing its processing. For the user, this has a processing time cost that worsens slightly depending on the flow in which processing stopped. This is because some flows, when stopped, have to run again from the start in order to maintain data integrity. Others serialize each step of the flow to the database, so they can resume where they stopped.

• Architecture Scalability: Colbi can now scale gracefully. The number of microservices needed can be configured and instantiated.

With regard to the maintenance and continuous development of the platform, given the microservice nature of the architecture, the company may undergo some restructuring in the way it develops. This can facilitate development: teams are currently separated according to their role in the company, whereas with microservices they can develop features faster and integrate them quickly into the system. Over time, problems may arise with the versioning of microservices, because whenever a microservice is developed or changed its version must be tracked, and the system must still be able to handle files that require older flows. This problem will need to be addressed carefully if the company decides to adopt the developed architecture.

8.2 Future Work

Colbi is available in several countries such as Portugal, Lithuania and Poland. For each of these environments there is a local installation with the business rules specific to that environment. As the company's main goal is to unify the product into one, the new architecture is an enabler of this goal. So far, only the Portuguese environment has been implemented in the new architecture; the remaining environments can now be integrated into a single installation.

It is now interesting to think about porting the new architecture to the cloud. This would require creating the Docker configurations for each of the microservices, as well as the deployment policies of each microservice. Each microservice already includes a small Dropwizard REST server with a health function for basic monitoring. For now, it only checks whether a microservice is up and running. This could be improved by implementing more advanced monitoring functions that reflect more accurately the load that each microservice is subjected to.

For the Colbi Exchanger library, the algorithm that performs the message distribution can be redesigned. At the moment, messages are distributed through Round Robin, as it was the first approach for the first implementation. However, a better algorithm could distribute messages taking other useful metrics into consideration. For example, an algorithm could use the load of each microservice and the files' metadata, applying machine learning techniques to predict the time and resources that a file may use. This could improve performance and lead to a more balanced work distribution.
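The contrast between the current Round Robin distribution and a load-aware alternative can be sketched as follows. This is a hedged illustration of the idea only: the class DistributionSketch, the use of in-flight message counts as the load metric, and both method names are assumptions, and the sketch does not include the predictive component mentioned above.

```java
/** Sketch contrasting Round Robin with a load-aware choice; names are illustrative. */
final class DistributionSketch {

    private int next = 0;

    /** Round Robin: consumers are chosen in a fixed circular order, ignoring their load. */
    int roundRobin(int consumerCount) {
        int chosen = next;
        next = (next + 1) % consumerCount;
        return chosen;
    }

    /** Load-aware alternative: choose the consumer with the fewest in-flight messages. */
    static int leastLoaded(int[] inFlight) {
        int best = 0;
        for (int i = 1; i < inFlight.length; i++) {
            if (inFlight[i] < inFlight[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        DistributionSketch distribution = new DistributionSketch();
        int[] inFlight = {5, 1, 3};
        System.out.println(distribution.roundRobin(inFlight.length)); // first consumer, whatever its load
        System.out.println(leastLoaded(inFlight));                    // the consumer with the lowest load
    }
}
```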

78

