Automating the Quality
![Page 1: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/1.jpg)
Dejan Vukmirović
Belgrade, 2016
Automating the Quality
![Page 2: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/2.jpg)
A bit of context…
Ticketmaster
Global ticket sales and distribution company. A cliché, but the global leader in its line of business. Large IT operation. Engineering HQs in Los Angeles and London. More than 150 platforms/products. Both legacy stuff and edge technologies.
Bakson
Belgrade-based IT company. Ticketmaster’s development centre. Currently around 50 people, engineering only. Mainly Java projects. Strong in the local Scala community.
![Page 3: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/3.jpg)
A bit of context…
Why emphasise the “quality”?
![Page 4: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/4.jpg)
A bit of context…
Because of the business people
Each high-priority production bug (Business Disruption) can be directly linked to, and measured in, money loss.
Bug? Fans can’t purchase the tickets.
Bug? Fans can’t enter the venue.
![Page 5: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/5.jpg)
A bit of context…
Because of the fans
Adele’s entire European tour sold out in two days, in less than 15 minutes per day.
Huge success. But…
![Page 6: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/6.jpg)
A bit of context…
![Page 7: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/7.jpg)
A bit of context…
How can DevOps help teams?
And how to move “there”?
![Page 8: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/8.jpg)
A bit of context…
DevOps Maturity Model
Company-wide initiative. Assessed by Gartner. 18 categories: “Deployment”, “Support”, etc.
Products are required to “move” through the matrix. Progress is constantly evaluated.
Additional benefits: standardisation, guidance.
Last phase targets: Canary release, Chaos Monkey, etc.
![Page 9: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/9.jpg)
Public API
HTTP service. Not RESTful.
Close to 100 endpoints/actions.
2 years live in production.
Development + QA team size = 10 people
![Page 10: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/10.jpg)
Public API
Distributed architecture (microservices).
Java stack.
Storages: relational, NoSQL, search engines…
APIGEE as management layer.
Each microservice has its own source code repository.
![Page 11: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/11.jpg)
Issues list
1. Long-lasting regression campaign
- A week of testing once release development is completed.
2. Late integration with clients
- Code only goes to the shared environment after the entire release is developed.
3. Non-standardised deploy procedures
- A variety of tools, or even manual steps. Procedures differ from environment to environment.
4. Difficult to pinpoint the root cause of broken functionality
- Automated tests run on the entire release, and clients also test only the entire release build.
![Page 12: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/12.jpg)
The goal
Start automation on feature completion (code pushed to repository):
Run Unit Tests -> Do Static Code Analysis -> Build & Save Package -> Deploy -> Check Service Status -> Run Integration Tests -> Send Reports
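A minimal sketch of the flow above, assuming each stage is a callable that returns True on success. The stage names mirror the slide; `run_pipeline` and the lambda stages are hypothetical stand-ins, not the actual Jenkins/Rundeck configuration. The returned list plays the role of what "Send Reports" would send.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order; after the first failure, mark the rest as skipped."""
    report = []
    failed = False
    for name, action in stages:
        if failed:
            report.append((name, "skipped"))
            continue
        ok = action()
        report.append((name, "ok" if ok else "failed"))
        failed = not ok
    return report


# Hypothetical stand-ins for the real steps; here the status check fails.
stages = [
    ("run unit tests", lambda: True),
    ("static code analysis", lambda: True),
    ("build and save package", lambda: True),
    ("deploy", lambda: True),
    ("check service status", lambda: False),
    ("run integration tests", lambda: True),
]

for name, status in run_pipeline(stages):
    print(f"{name}: {status}")
```

Stopping at the first failure keeps the feedback tied to a single step, which is the point of triggering the flow per feature rather than per release.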
![Page 13: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/13.jpg)
Tool - GitLab
Git repository management tool.
Many additional features: code review, continuous integration, deploy…
On premise or SaaS. Free and commercial editions.
It is the first point in our flows: upon code push, webhooks trigger the next tool in the flow.
(Note: our first AWS-based service is using the CI built into GitLab. But that is WiP.)
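As an illustration of the webhook step, the sketch below decides whether a push event should start the next tool. It relies on the `object_kind` and `ref` fields that GitLab includes in push event payloads; the function name and the choice to react only to `develop` are assumptions made for the example.

```python
import json

TRIGGER_BRANCH = "develop"  # assumption: only this branch starts the flow


def should_trigger(payload: dict, branch: str = TRIGGER_BRANCH) -> bool:
    """Return True when the event is a push to the given branch."""
    return (
        payload.get("object_kind") == "push"
        and payload.get("ref") == f"refs/heads/{branch}"
    )


# Trimmed-down push event body; real payloads carry many more fields.
body = json.loads('{"object_kind": "push", "ref": "refs/heads/develop"}')
print(should_trigger(body))
```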
![Page 14: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/14.jpg)
Continuous Integration
Start automation on feature completion (code pushed to repository):
Run Unit Tests -> Do Static Code Analysis -> Build & Save Package -> Deploy -> Check Service Status -> Run Integration Tests -> Send Reports
Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build, allowing teams to detect problems early.
- Martin Fowler
![Page 15: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/15.jpg)
Tool - Jenkins
Automation server. Gets additional power from the numerous plugins available.
Open source. Available only on premise.
The main unit is a “job”.
In TM, jobs can be created only through the code repository. Creation via the GUI is disabled.
Two configuration XML files are part of the application code.
Reasoning:
- (distributed) versioning
- easy to restore in case of issues with the Jenkins server
- easy to migrate between Jenkins instances
![Page 16: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/16.jpg)
• JENKINS Screenshot
• Demo ?
![Page 17: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/17.jpg)
Tool - SonarQube
Platform for continuous inspection of code quality.
More than 20 programming languages are covered.
Open source. Available only on premise.
Some TM teams fail the Jenkins job on code quality violations.
The API team reviews reports once per Sprint/Release.
FindBugs is used as a plugin.
![Page 18: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/18.jpg)
![Page 19: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/19.jpg)
![Page 20: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/20.jpg)
Tool - Nexus
Artifact repository.
Free or commercial. Available only on premise.
Supports multiple platforms out of the box (Java, npm, Docker…).
The TM instance is locked against manual upload of artifacts. Only Jenkins instances can upload, through a predefined Release plugin.
Supports the release process: only “promoted” artifacts are available for Production deploys.
The API team reviews reports once per Sprint/Release.
![Page 21: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/21.jpg)
![Page 22: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/22.jpg)
CI is completed
When to run integration tests?
Where to run integration tests?
![Page 23: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/23.jpg)
GitFlow
Branching model, introduced by Vincent Driessen and popularised by Atlassian.
A feature/task is merged to “develop” on completion (as per the “Definition of Done”).
A “release” branch is created on demand.
“Release” is merged to “master” when ready for production.
This helps answer “When?”: on merge to “develop”.
![Page 24: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/24.jpg)
Where?
The major problem…
…is not developing tests
…it’s not creating environments
…it’s not even about automating the whole thing.
IT’S ALWAYS DATA.
![Page 25: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/25.jpg)
Data setup
Because you (usually) can’t control data in your dependencies.
Use mocks
- Easier to develop initially.
- Difficult to maintain: you have to track the evolution of dependencies.
- Allows easier setup of testing environments.
Permanent data sets
- Allows testing in a “real” environment.
- Difficult to develop initially.
- Easier to maintain, since the owners of your data will have to migrate it together with the rest of their data.
We decided to go with permanent sets!
There is a creation tool available on TM backends.
![Page 26: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/26.jpg)
API environments
[Diagram: environments DEVs, QAs, TPI, CAP, Stage, Production(s)]
Stage and Production have SLAs defined.
Mapping to Gitflow: “develop” -> TPI, “release” -> Stage, “master” -> Prod
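The branch-to-environment mapping above can be written down as a small lookup. This is only a sketch: the function name is hypothetical, and matching `release/<version>` by prefix is an assumption about how release branches are named.

```python
from typing import Optional


def target_environment(branch: str) -> Optional[str]:
    """Return the environment a branch deploys to, or None if it has none."""
    if branch == "develop":
        return "TPI"
    # Assumption: release branches are named "release" or "release/<version>".
    if branch == "release" or branch.startswith("release/"):
        return "Stage"
    if branch == "master":
        return "Prod"
    return None  # feature branches are not deployed by this mapping


for name in ("develop", "release/2.1", "master", "feature/seat-map"):
    print(name, "->", target_environment(name))
```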
![Page 27: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/27.jpg)
Where?
Each service (should) have its own integration tests.
Test everything!!!
But for the API it is crucial that “everything works” on the Gateway.
![Page 28: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/28.jpg)
Tool - Rundeck
Tool for runbook automation and execution of arbitrary management tasks.
Open source. Available only on premise.
Is Rundeck even needed if you already use Jenkins?
- “Rundeck is made for Operations and knows about the details of your environments.”
- “Jenkins is fundamentally not a deployment tool, although it can be used like one.”
![Page 29: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/29.jpg)
QA framework
Separate project. Own source-code repo.
Implemented in Java. Maven project.
Uses standard HTTP clients and Java testing libs (JUnit, TestNG).
Used for functional testing.
Blackbox testing of our services (no DB access, log checks…)
Smoke suite: ~1,000 tests, ~5 mins to execute
Regression suite: ~10,000 tests, ~35 mins to execute
Every feature or bug we ever had is included in the regression suite.
We are constantly supporting 2 API versions, with test suites covering both.
![Page 30: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/30.jpg)
Implementation issues…
1. Limit the Jenkins “job” to be executed only from “develop”
- Branching out for a new feature results in an identical copy of the Jenkins XML configs.
- Jenkins plugins have limited support for conditional execution in some phases.
2. QA “job” to be triggered only by the service’s “develop”
- Another set of conditionals/variables to be set/passed between jobs.
3. Know which services are involved in a feature
- The only way to cover all cases/features is to always deploy and test all services.
![Page 31: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/31.jpg)
Try with job chaining
Standard Jenkins "freestyle" jobs support simple sequential task execution.
Doesn’t work in our case.
- Git triggers would result in service restarts while test execution is active.
An additional idea was to introduce another branch, so that the entire flow would not be triggered from “develop”.
- Additional work/thinking required from developers.
- Where to place the “signal” that would trigger the entire flow?
![Page 32: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/32.jpg)
Try with plugins
The “closest” to what we need was found in the “JobFanIn” plugin.
This plugin provides a watch on upstream projects to trigger downstream projects once all upstream projects are successfully build.
Doesn’t work in our case.
- Impossible to predict which services a feature will reside on.
![Page 33: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/33.jpg)
Step back. Rethink.
Do we really need to deploy and test everything, always?
Does this approach actually fit a microservices architecture?
LET’S SIMPLIFY.
For each service, only deploy and test the service itself.
Yes, developers will need to do additional thinking when finishing a feature that spans multiple services.
![Page 34: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/34.jpg)
Testing agreement
On merge to develop (as per GitFlow).
Deploy to live environment - TPI.
Use permanent data sets.
Each microservice (and the gateway) will have an accompanying QA framework. It is executed upon service deploy.
If a feature spans multiple microservices, it is up to the developers to sequence the testing.
![Page 35: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/35.jpg)
![Page 36: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/36.jpg)
![Page 37: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/37.jpg)
The goal
Start automation on feature completion (code pushed to repository):
Run Unit Tests -> Do Static Code Analysis -> Build & Save Package -> Deploy -> Check Service Status -> Run Integration Tests -> Send Reports
![Page 38: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/38.jpg)
Deploy validation
Via healthchecks. Internally exposed HTTP endpoints that provide summary of dependencies’ and internal statuses.
Every product must implement this TM standard.
Response must be quick. The healthcheck status is composed by a background job.
Healthchecks are used in monitoring, and by load balancers.
Rundeck/Jenkins will fail the job if the healthcheck is negative.
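A sketch of how a deploy job might gate on the healthcheck. The talk does not describe the TM healthcheck format, so treating HTTP 200 as "positive" is an assumption, as are the function names, the retry counts and the URL in the comment.

```python
import time
import urllib.request
from typing import Callable


def http_healthcheck(url: str, timeout_s: float = 2.0) -> bool:
    """Assumption: a 200 response means healthy; errors and timeouts do not."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return False


def wait_until_healthy(check: Callable[[], bool], attempts: int = 5,
                       delay_s: float = 2.0) -> bool:
    """Poll the check; return True on the first positive result."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False


# In a job this could be wired up as (hypothetical URL):
#   ok = wait_until_healthy(lambda: http_healthcheck("http://service.internal/healthcheck"))
#   sys.exit(0 if ok else 1)
# Here a fake check that turns healthy on the third poll stands in for the service.
answers = iter([False, False, True])
print(wait_until_healthy(lambda: next(answers), delay_s=0))
```

Retrying for a short window gives a freshly deployed service time to start before the job is failed.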
![Page 39: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/39.jpg)
![Page 40: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/40.jpg)
The Question!
Is what we are doing Continuous Delivery?
Or maybe Continuous Deployment?
![Page 41: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/41.jpg)
CD vs CD
Continuous Delivery is about keeping your application in a state where it is always able to deploy into production.
Continuous Deployment is actually deploying every change into production, every day or more frequently.
- Martin Fowler
![Page 42: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/42.jpg)
CD vs CD
Why not all the way to Production?
We (API) are only the half-product.
- Vanja Radaković (Product Manager)
Even if all tests on the API pass, that doesn’t mean no functionality is broken on our clients.
Between the moment a release is ready and its actual deploy to Production, we “sit” for a week in the Stage env, waiting for sign-off from major clients.
[Diagram: environments DEVs, QAs, TPI, Stage, Production(s)]
![Page 43: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/43.jpg)
Automating the security
Veracode is a platform for application security scanning.
Commercial. Available only as SaaS.
We have added a branch that (via GitLab and Jenkins) automatically uploads artifacts to Veracode.
Because the scan takes a long time, it is not included in the regular flow on feature completion.
There are company-wide defined policies.
We review the status once per Sprint/Release.
![Page 44: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/44.jpg)
![Page 45: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/45.jpg)
Performance testing (WiP)
Running on a dedicated environment.
Same topology (num. of servers) and data size as in production.
Our production data is imported on demand.
All backend dependencies are mocked, because provisioning data is difficult.
The TPI environment we use for functional testing contains inconsistent and not-big-enough data.
Mocks are based on or logs from production.
API mocking tool - WireMock.
What if you need to mock something other than an HTTP API, like storage? Rethink your architecture.
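To make the "mocks based on production logs" idea concrete, the sketch below turns one logged request/response pair into a WireMock stub mapping. The `request`/`response` structure with `method`, `urlPath`, `status`, `headers` and `jsonBody` is WireMock's JSON stub format; the shape of the log entry and the function name are assumptions, since the talk does not show the real log layout.

```python
import json


def stub_from_log(entry: dict) -> dict:
    """Build a WireMock stub mapping from one logged request/response pair."""
    return {
        "request": {"method": entry["method"], "urlPath": entry["path"]},
        "response": {
            "status": entry["status"],
            "headers": {"Content-Type": "application/json"},
            "jsonBody": entry["body"],
        },
    }


# Hypothetical log entry; field names are illustrative only.
logged = {"method": "GET", "path": "/events/42", "status": 200,
          "body": {"id": 42, "name": "Sample event"}}

print(json.dumps(stub_from_log(logged), indent=2))
```

The resulting JSON can be placed in WireMock's `mappings` directory or registered through its admin API.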
![Page 46: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/46.jpg)
Tool - Gatling
Load testing framework.
Open source.
Supports code written in Scala or Java.
Can be executed from command line.
Easy to integrate with Jenkins using the official plugin.
![Page 47: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/47.jpg)
![Page 48: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/48.jpg)
![Page 49: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/49.jpg)
Performance testing ideas
Automate in a way similar to security scanning - new branch.
Jenkins to build.
Rundeck to deploy.
Gatling to execute tests.
Bonus: attach APM tooling that would provide insights during testing.
Currently evaluating New Relic and Ruxit.
![Page 50: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/50.jpg)
Logging
Company standards to separate logs:
• application log
• payload log (inbound/outbound)
• performance log
Only application logs are indexed.
Others are available on servers for N days (depending on retention policy).
A unique “Correlation ID” allows tracking of requests through multiple services and all types of log files.
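A minimal sketch of the Correlation ID idea using Python's standard `logging` module. The header name `X-Correlation-ID`, the logger name and the log format are assumptions; the talk does not specify how the ID is carried between services.

```python
import contextvars
import logging
import uuid

# Holds the ID of the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


def incoming_correlation_id(headers: dict) -> str:
    """Reuse the caller's ID when present, otherwise start a new one."""
    return headers.get("X-Correlation-ID") or uuid.uuid4().hex


handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(correlation_id)s %(name)s %(message)s"))
handler.addFilter(CorrelationIdFilter())

app_log = logging.getLogger("application")
app_log.addHandler(handler)
app_log.setLevel(logging.INFO)

# Simulate one inbound request that already carries an ID from upstream.
correlation_id.set(incoming_correlation_id({"X-Correlation-ID": "abc123"}))
app_log.info("order lookup started")
```

Attaching the same filter to the payload and performance loggers, and forwarding the header on outbound calls, is what lets one ID tie a request together across services and log types.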
![Page 51: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/51.jpg)
Tool - Splunk
Platform for operational intelligence. Much more than log aggregation (searching, monitoring and visualization).
On premise or SaaS. Free and commercial editions.
Our dashboards: relationships between HTTP errors (not application errors) and clients.
Our alerting: on detected deviation/increase in volume of errors.
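The alerting rule itself is not shown in the talk. As a plain-Python illustration (not Splunk's search language or its built-in mechanism), one simple way to flag a deviation is to compare the current error count with a baseline built from recent intervals. The three-sigma threshold and all names below are assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_deviation(history: Sequence[int], current: int, sigmas: float = 3.0) -> bool:
    """Flag the current count when it exceeds mean + sigmas * stddev of history."""
    baseline = mean(history)
    spread = pstdev(history)
    return current > baseline + sigmas * spread


# Hypothetical HTTP error counts per interval for one client.
recent = [10, 12, 9, 11, 10]
print(is_deviation(recent, 12))
print(is_deviation(recent, 40))
```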
![Page 52: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/52.jpg)
![Page 53: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/53.jpg)
![Page 54: Automating the Quality](https://reader037.fdocuments.us/reader037/viewer/2022103010/5871d1751a28ab423c8b5bc5/html5/thumbnails/54.jpg)
Benefits we (Dev team) got
1. Automation on “feature completion”
- Less thinking for developers.
- Quicker test and feedback cycles.
2. Using the same tools for all environments
- Feeling very comfortable during production deploys.
3. Visibility of changes and metrics
- Being able to react quickly. Or even take preemptive actions.
4. Company initiatives as guidance
- No need to “reinvent the wheel”.
- Shared knowledge. Contributing to solutions.