Giuliana Benedetti - Can Magento handle 1M products?


About me

• Project Manager @ Webformat

• Magento and TYPO3 projects

• Requirements analysis

• Planning of development and support activities

What’s in the menu?

• Huge catalog

• What we did

• What we are doing

Once upon a time...

• The project began as a migration from a proprietary platform to Magento 1 Community

• Shoes and accessories e-commerce

• We developed the integration with their management software, which handled product master data, warehouse stock and order master data

• Integration with Amazon and eBay

Products Database

• The original products database counted around 150k products

• Configurable products
    • On average, 10 simple products for each configurable

• Virtual products

Continued Growth

• In just one year, the catalog stored in Magento grew to 700K products

• 66k configurable products

Challenges faced

Alignment between catalog and management software

Updating warehouse

Reindexing

Generating images

Server response time

Backoffice operations

Third-party module integration

Export of feed for Google shopping & Co.

Marketplace synchronization

Disk space

Updating the catalog (1/2)

• Initially 150k products; this is what we planned:
    • Massive initial import
    • Frequent updates during the day via web service

• When the catalog started growing, the data exchange volumes via web service became unsustainable, and the exchange procedure had to be redesigned.

Updating the catalog (2/2)

• Today we have 700k products
    • Based on Magmi and CSV file exchange (product master data)
    • Nightly update: only the DIFF is loaded
    • Full catalog update only in exceptional cases

• The client accepted that new products are published with a one-day delay
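The nightly DIFF can be pictured as a comparison between yesterday's and today's full product export, keeping only new or changed rows for Magmi to import. A minimal sketch, assuming both exports are CSV files with the same header and `sku` as the key column; `diff_exports` is a hypothetical helper, not the project's actual procedure.

```python
import csv
import io


def diff_exports(previous_csv: str, current_csv: str, key: str = "sku") -> str:
    """Return a CSV holding only the rows that are new or changed since the previous export."""
    # Index yesterday's export by SKU so each lookup is a dictionary access.
    previous = {row[key]: row for row in csv.DictReader(io.StringIO(previous_csv))}
    reader = csv.DictReader(io.StringIO(current_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Keep the row if the SKU is new or any column value differs from yesterday.
        if previous.get(row[key]) != row:
            writer.writerow(row)
    return out.getvalue()
```

Note that this sketch only detects additions and changes; SKUs that disappear from the export would need a separate check.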

Warehouse update (1/2)

• No warehouse dedicated exclusively to the web shop

• Stock is shared with the physical shops

• It is not possible to update stock only at night and use that figure throughout the day

• Frequent updates are needed

Warehouse update (2/2)

• Every 15 minutes, the DIFF is loaded from the management software

• Stock updates only

• Via CSV files written directly to the database (Magmi)
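The 15-minute stock DIFF can be sketched as a comparison between the quantities sent last time and the current ones, producing a stock-only CSV. A sketch, assuming the management software's data is available as a SKU-to-quantity mapping; `sku`, `qty` and `is_in_stock` are Magento 1 attribute codes that, to our understanding, Magmi accepts as column names, while `stock_diff_csv` and the snapshot mechanism are hypothetical.

```python
import csv
import io


def stock_diff_csv(last_sent: dict[str, int], current: dict[str, int]) -> str:
    """Build a stock-only CSV containing just the SKUs whose quantity changed."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sku", "qty", "is_in_stock"])
    for sku in sorted(current):
        qty = current[sku]
        if last_sent.get(sku) != qty:
            # Flag the item as out of stock when no units are left.
            writer.writerow([sku, qty, 1 if qty > 0 else 0])
    return out.getvalue()
```

Keeping the file limited to changed SKUs is what makes a 15-minute cycle affordable: the import size depends on sales activity, not on catalog size.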

Reindex (1/2)

• The bigger the catalog, the slower the reindex

• Initially, the reindex was launched after each update (every 15 minutes)

• After a while, the reindex became too time-consuming: a new update cycle would start while the previous cycle's reindex was still running.

Reindex (2/2)

• Solution:
    • All reindexes have been disabled, except for the stock reindex
    • All reindexes are now performed after the nightly import

• Today a full reindex takes around 75 minutes and puts a heavy load on the database
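The schedule above can be expressed with Magento 1's `shell/indexer.php` script. A sketch, assuming it is run from the Magento root; `--mode-manual all`, `--reindex <code>`, `reindexall` and the `cataloginventory_stock` indexer code come from Magento 1's indexer shell, while the helper and the phase names are hypothetical.

```python
import subprocess

INDEXER = ["php", "shell/indexer.php"]


def indexer_commands(phase: str) -> list[list[str]]:
    """Return the indexer.php invocations for one phase of the schedule."""
    if phase == "setup":
        # Switch every index to "Manual Update" so imports do not trigger reindexes.
        return [INDEXER + ["--mode-manual", "all"]]
    if phase == "stock_update":
        # After each 15-minute stock import, rebuild only the stock index.
        return [INDEXER + ["--reindex", "cataloginventory_stock"]]
    if phase == "nightly":
        # After the nightly catalog import, rebuild every index.
        return [INDEXER + ["reindexall"]]
    raise ValueError(f"unknown phase: {phase}")


def run(phase: str, magento_root: str) -> None:
    """Execute the commands for a phase, stopping at the first failure."""
    for cmd in indexer_commands(phase):
        subprocess.run(cmd, cwd=magento_root, check=True)
```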

Catalog_url_rewrite (1/2)

• Magento 1 has a critical weakness in the URL rewrite process:
    • All product URLs are rewritten, including simple products that are «Not Visible Individually» and exist only to be associated with a configurable product

• With a 700k product catalog, this meant:
    • Creating millions of rows in the catalog_url_rewrite table
    • A URL rewrite process that took hours to complete
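A back-of-the-envelope estimate shows how quickly the rows add up. As a rough approximation, assume one rewrite per product URL plus one per product-category path, for each store view. The store-view and category counts below are hypothetical, not measured values.

```python
def estimated_rewrite_rows(products: int, store_views: int, categories_per_product: float) -> int:
    """Rough estimate: one product URL plus one URL per category path, per store view."""
    return round(products * store_views * (1 + categories_per_product))


# Hypothetical: 1 store view, 2 categories per product.
all_products = estimated_rewrite_rows(700_000, 1, 2)      # every product gets rewrites
visible_only = estimated_rewrite_rows(66_000, 1, 2)       # only the configurables
```

Under these assumptions, rewriting every product yields about 2.1 million rows, while rewriting only the 66K configurables yields about 198,000: roughly a tenfold reduction.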

Catalog_url_rewrite (2/2)

• A patch was installed to skip URL generation for simple products that are not visible individually

• Module Dnd_Patchindexurl: https://www.magentocommerce.com/magento-connect/dn-d-patch-index-url-1.html

• Now the reindex process takes around 20 minutes

Images generation (1/2)

• One of the main problems we had to face was product thumbnail generation, done by ImageMagick

• Every day, hundreds of products are published

• We found that the frontend CPUs were often stressed by ImageMagick processes and by database write operations

Images generation (2/2)

• The solution was to generate the thumbnails during the bulk import, so that ImageMagick runs as part of the import procedure

• At night, the images are generated and saved on a dedicated server, without interfering with user navigation

• Today around 881K images are stored
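Generating thumbnails as part of the import can be sketched as a batch step that runs ImageMagick's `convert` for each new image and size, skipping thumbnails that already exist. A sketch, assuming a hypothetical `thumbs/<size>/` layout and hypothetical sizes; it does not reproduce Magento's own image cache path scheme.

```python
import subprocess
from pathlib import Path

# Hypothetical thumbnail sizes (width x height in pixels).
SIZES = ["135x135", "265x265"]


def thumbnail_jobs(images: list[Path], source_root: Path, thumb_root: Path) -> list[list[str]]:
    """List the ImageMagick commands still needed, skipping thumbnails that already exist."""
    jobs = []
    for image in images:
        relative = image.relative_to(source_root)
        for size in SIZES:
            target = thumb_root / size / relative
            if not target.exists():
                # -resize WxH fits the image inside the box, keeping its aspect ratio.
                jobs.append(["convert", str(image), "-resize", size, str(target)])
    return jobs


def run_jobs(jobs: list[list[str]]) -> None:
    """Run the commands one by one, creating target directories first."""
    for cmd in jobs:
        Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

Skipping existing thumbnails keeps the nightly run proportional to the new products published that day rather than to the 881K stored images.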

Server response time

• With such a huge catalog, some categories contain hundreds of products

• The first load of these pages (when not cached) is slow

• We enabled caching with Redis and Varnish

• Not enough: the first load was still too heavy

Solutions 1/2

• Moved the cache clearing process to the night

• At 8 in the morning, however, site navigation started to suffer, because the pages had not been cached again yet

• We scheduled a job to pre-cache all the critical pages

• Minimized cache invalidation
    • Clear the cache only for products whose stock quantity was updated via WS
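The pre-cache job can be sketched as a small crawler that requests a list of critical URLs after the nightly cache flush, so that the first morning visitors hit a warm cache. A sketch, assuming the URL list is maintained by hand; the fetch function is injectable so the logic can be tested without network access, and the worker count is an arbitrary example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable
from urllib.request import urlopen


def fetch_status(url: str) -> int:
    """Request a page so the cache stores it; return the HTTP status code."""
    with urlopen(url, timeout=60) as response:
        response.read()
        return response.status


def warm_cache(urls: list[str], fetch: Callable[[str], int] = fetch_status, workers: int = 4) -> dict[str, int]:
    """Fetch every critical URL with limited concurrency; map each URL to its status."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))
```

The concurrency limit is the main design choice: enough parallelism to finish before 8 a.m., but not so much that the warm-up itself overloads the backend it is trying to protect.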

Solutions 2/2

• Client training on handling cache clearing better

• Minimized the number of filters in layered navigation
    • Each filter increases the reindex time and the number of uncached page combinations
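The effect of each extra filter is multiplicative. If a category has filters with n1, n2, ... options and a visitor can pick at most one value per filter (or none), the number of distinct filtered pages per category is the product of (n + 1) over all filters. A sketch that ignores sorting, pagination and multi-select filters, which only increase the count; the option counts in the test are hypothetical.

```python
from math import prod


def filter_combinations(options_per_filter: list[int]) -> int:
    """Distinct layered-navigation states per category: each filter is unset or set to one option."""
    return prod(n + 1 for n in options_per_filter)
```

For example, three filters with 12, 8 and 5 options give 13 × 9 × 6 = 702 cacheable variants of a single category page; dropping the 5-option filter brings that down to 117.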

Backoffice operations

• Initially, all catalog update activities were performed from the Magento backoffice

• Problems:
    • Frequent reindexes
    • Frequent cache updates
    • Server load (backoffice product grid filters are CPU-intensive and put load on MySQL)
    • Common operations were slowed down
    • Several backoffice users were often working concurrently

Solutions

• First, a new backoffice server was introduced
    • This did not solve the MySQL load problem, nor the reindex and re-caching issues

• We introduced a new catalog management process based on an Excel file
    • This made the staff managing product master data more efficient

• Bulk Excel file import every 3 days via FTP

• Categories still handled from backoffice

Third party modules integration

• Critical point

• Not all the modules found in the Marketplace are developed optimally:
    • They «simply» load the product collection without pagination
    • They execute nested queries
    • They loop over collections, unnecessarily loading every product
    • …

• A big profiling and optimization work was needed
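The usual fix for modules that load whole collections is to process products in fixed-size batches. Magento 1 collections offer `setPageSize()` and `setCurPage()` for this, which page with LIMIT/OFFSET. The sketch below swaps in keyset pagination (`WHERE entity_id > last_id`), a variant that avoids ever-larger OFFSET scans on big tables. An in-memory SQLite table stands in for the catalog, and the table layout is hypothetical.

```python
import sqlite3
from typing import Iterator


def iter_products(conn: sqlite3.Connection, batch_size: int = 1000) -> Iterator[tuple]:
    """Yield products in id order, holding at most one batch in memory at a time."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT entity_id, sku FROM products WHERE entity_id > ? ORDER BY entity_id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        # Continue after the last id seen instead of using a growing OFFSET.
        last_id = rows[-1][0]
```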

Feed export (Google Shopping & Co.) 1/2

• As the catalog grew, the feed export time increased as well

• In the very beginning, the exports were handled by a Magento module

Feed export (Google Shopping & Co.) 2/2

• Solution steps:
    • The module was replaced with ad hoc, highly optimized procedures
    • The export jobs run on the backoffice server at night, so they do not load the frontend
    • A MySQL slave was introduced as the data source, so the exports do not load the master and, as a consequence, the website
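An ad hoc export procedure of this kind can be sketched as a streaming query that writes the feed row by row, so memory use stays flat as the catalog grows. A sketch, assuming a simplified, hypothetical `feed_source` table (a real Magento EAV schema needs more joins) and a tab-separated feed using Google Shopping's `id`, `title`, `link`, `price` and `availability` attributes. In production the connection would point at the MySQL slave; an in-memory SQLite database stands in for it here.

```python
import csv
import sqlite3
from typing import TextIO

FIELDS = ["id", "title", "link", "price", "availability"]


def export_feed(replica: sqlite3.Connection, out: TextIO, currency: str = "EUR") -> int:
    """Stream the product feed as TSV from the replica; return the number of rows written."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    count = 0
    cursor = replica.execute("SELECT sku, name, url, price, qty FROM feed_source ORDER BY sku")
    for sku, name, url, price, qty in cursor:  # rows are fetched as the loop advances
        availability = "in_stock" if qty > 0 else "out_of_stock"
        writer.writerow([sku, name, url, f"{price:.2f} {currency}", availability])
        count += 1
    return count
```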

Marketplace sync

• We are using M2E Pro

• Client side: full check of EAN codes

• Tech side: handling the automatic synchronization process
    • An automatic full synchronization is too heavy. When should we synchronize?
    • What should we synchronize?
    • Magmi

Disk space (1/2)

• Well, here we are: even if disk space is quite cheap, using too much of it is not convenient.

• Very heavy data exchange logs
    • Frequent data exchanges with huge amounts of data

• Log files were growing fast

• Hourly log rotation was enabled

• Logs are archived after a few days

Disk space (2/2)

• A high, continuously growing number of images

• Huge feed export files

• Huge CSV import files

• …

• Solutions applied:
    • Constant monitoring
    • Automatic procedures to clean up logs, old images, expired feeds, etc.
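The automatic cleanup can be sketched as a per-directory retention policy that deletes files older than a given number of days. A sketch: the directories and retention periods are hypothetical examples, not the project's actual values, and product images are deliberately left out, since deleting them safely requires checking that no product still references them.

```python
import time
from pathlib import Path
from typing import Optional

# Hypothetical retention policy: directory (relative to base) -> maximum file age in days.
RETENTION_DAYS = {
    "var/log/archive": 14,
    "var/export/feeds": 3,
    "var/import/done": 7,
}


def cleanup(base: Path, policy: dict[str, int], now: Optional[float] = None) -> list[Path]:
    """Delete files older than their directory's retention period; return the deleted paths."""
    now = time.time() if now is None else now
    deleted = []
    for directory, days in policy.items():
        root = base / directory
        if not root.is_dir():
            continue  # skip directories that do not exist on this server
        cutoff = now - days * 86400
        # Materialize the listing first so nothing is deleted while the walk is in progress.
        for path in list(root.rglob("*")):
            if path.is_file() and path.stat().st_mtime < cutoff:
                path.unlink()
                deleted.append(path)
    return deleted
```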

Challenges to be faced

Elasticsearch integration

Growing catalog, up to 1M products

More sales, more page views

Magento 2 migration

Elasticsearch

• For two reasons:
    • Improve the search functionality offered to the client
    • Minimize the load produced by Magento's internal search engine

• Critical issues to be faced:
    • Catalog indexing time
    • Index only configurable products?
    • What about the sizes?
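One possible answer to the last two questions, not necessarily the one the project will adopt, is to index only configurable products and fold the sizes of their simple children into the parent document. Searches can then still filter on available sizes without one document per simple product. A sketch of building such a document; the field names are hypothetical and no particular Elasticsearch client or mapping is assumed.

```python
def configurable_document(parent: dict, children: list[dict]) -> dict:
    """Build one search document per configurable, listing the sizes of its in-stock children."""
    in_stock = [child for child in children if child["qty"] > 0]
    return {
        "sku": parent["sku"],
        "name": parent["name"],
        # Sorted, de-duplicated sizes so the document is stable across reindexes.
        "sizes_available": sorted({child["size"] for child in in_stock}),
        "in_stock": bool(in_stock),
    }
```

The trade-off is that stock changes on a simple product now require reindexing its parent document, which matters with stock updates every 15 minutes.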

1M products

• Expected growth: in 1 year we’ll have 1M products

• At the moment we are performing tests with fake products

• We haven't detected other critical aspects so far
    • For now, we only had to further optimize the data exchange and feed generation procedures
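Fake products for the 1M tests can be generated rather than exported. A sketch that writes configurable products, each followed by its simple children, using the average of 10 simples per configurable mentioned earlier. The column set is a reduced, hypothetical one; a real Magmi configurable import needs additional columns depending on the plugins enabled. The `visibility` values (4 = "Catalog, Search", 1 = "Not Visible Individually") are Magento's.

```python
import csv
from typing import Iterator, TextIO

SIZES = ["36", "37", "38", "39", "40", "41", "42", "43", "44", "45"]  # 10 simples per configurable


def fake_rows(configurables: int) -> Iterator[list[str]]:
    """Yield one configurable row followed by its simple children, for each fake product."""
    for i in range(1, configurables + 1):
        parent = f"FAKE-{i:07d}"
        yield [parent, "configurable", f"Fake product {i}", "", "4"]
        for size in SIZES:
            yield [f"{parent}-{size}", "simple", f"Fake product {i} {size}", size, "1"]


def write_fake_catalog(out: TextIO, configurables: int) -> None:
    """Write the header and all generated rows as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sku", "type", "name", "size", "visibility"])
    writer.writerows(fake_rows(configurables))
```

With 11 rows per configurable, `write_fake_catalog(f, 90_910)` produces 1,000,010 product rows, just over the 1M target.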

More sales, more page views

• As sessions increase, so does the number of uncached page views
    • Pre-caching extension
    • Increasing the Varnish cache TTL
    • Minimizing the number of products in categories and the filters used

• As sales increase, products go out of stock more frequently
    • To be evaluated: the impact of new reindex and re-caching policies on the client

What if..?

• We’re planning with the client a Magento 2 migration

• We started our tests by migrating the current Magento 1 environment (700K products) to a Magento 2 installation

• We collected the results and are still running further tests

HW specs

All tests were run on a VirtualBox VM with Linux Ubuntu 16.04.1 LTS, 8 GB RAM, 1 x 2.60 GHz CPU

The LAMP stack featured PHP 5.6, Apache 2.4.18 and MySQL (client version 14.14)

Migration was performed from Magento 1.9.2.2 to Magento 2.1.3

Magento 2 migration (1/4)

• DB migration time: 1h 20'

• BE performance (' = minutes, '' = seconds):

BE operation            Magento 1 (with cache)    Magento 2 (with cache)
Access to catalog       almost 5'                 7''
Access to product       3''                       10''
Access to categories    7''                       6''
Product searching       1' 5''                    3''

Magento 2 migration (2/4)

• FE performance for catalog browsing:

FE operation                     Magento 1 (with cache)    Magento 2 (with cache)
Catalog browsing / categories    30''                      7''

Magento 2 migration – Reindex Times (3/4)

                      M1             M2
Total reindex time    2h 55' 11''    2h 53' 47''

Magento 2 migration (4/4)

• We had some issues with the catalog full-text search reindex (Magento 2)

• We had to apply a patch: https://github.com/magento/magento2/issues/5146

• Without the patch, the catalog full-text search reindex takes around 2 hours; with the patch applied it took around 1 hour, so the total reindex times are quite comparable


Catalog URL rewrite

• M1 with Dnd_Patchindexurl module: 00:14:34

• M1 without Dnd_Patchindexurl module: 01:03:50

• M2: no catalog URL rewrite reindex; URL rewrites are generated when the product is saved

Tools

Xdebug

New Relic

AOE Profiler

Conclusions

Yes, we can!

• It’s possible, but not without effort

• Large initial analysis

• Special attention to optimization processes

• What about Magento 2?

Q & A

• Giuliana Benedetti

• WEBFORMAT srl - www.webformat.com