To demonstrate the optimization ideas better, we have prepared two Information Retrieval datasets.
It is worth mentioning that Elasticsearch is designed by default for much larger collections (millions of documents). However, we found the limited SQUAD version faster to experiment on while still generalizing well.
SQUAD paragraphs come from Wikipedia, so the text is concise and well written, and is not likely to contain errors. Meanwhile, the SWIFT UI benchmark consists of texts from recorded speech samples – it is more vivid, less concrete, but still grammatically correct. Moreover, it is rich in technical, software engineering-oriented vocabulary.

For validation of the Information Retrieval task, the MRR (mean reciprocal rank) or MAP (mean average precision) are usually used. We also use them on a daily basis; however, for the purpose of this article and to simplify the interpretation of outcomes, we have chosen metrics that are much more straightforward – the ratio of answered questions within the top N hits: hits@10, hits@5, hits@3, hits@1. For implementation details, see our NeuroSYS GitHub repository, where you can find other articles, and our MAGDA library.
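To make the metric concrete, here is a minimal sketch of how hits@N can be computed, assuming each question has exactly one relevant document and the search results are already ranked lists of document ids (the function and variable names are illustrative, not taken from our repository):

```python
def hits_at_n(ranked_results, relevant_ids, n=10):
    """Fraction of questions whose relevant document appears in the top-n hits."""
    answered = sum(
        1 for ranking, relevant in zip(ranked_results, relevant_ids)
        if relevant in ranking[:n]
    )
    return answered / len(relevant_ids)

# Example: three questions, the relevant document is found for two of them.
rankings = [[4, 7, 1], [9, 2, 5], [3, 8, 6]]
gold = [7, 11, 3]
print(hits_at_n(rankings, gold, n=3))  # 0.666...
```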
As described in the previous article, we can use a multitude of different analyzers to perform standard NLP preprocessing operations on indexed texts. As you may recall, analyzers are by and large a combination of tokenizers and filters and are used for storing terms in the index in an optimally searchable form. Hence, experimenting with filters and tokenizers should probably be the first step you take towards optimizing your engine's performance.
To confirm the above statement, we present validation results of applying different analyzers to the limited SQUAD documents. Depending on the operations performed, the effectiveness of the search varies significantly.
We provide the results of experiments carried out using around 50 analyzers on the limited SQUAD sorted by hits@10. The table is collapsed for readability purposes; however, feel free to take a look at the full results and code on our GitHub.

Based on our observations of multiple datasets, we present the following conclusions about analyzers, which, we hope, will be helpful during your optimization process. Please bear in mind that these tips may not apply to all language domains, but we still highly recommend trying them out by yourselves on your datasets. Here is what we came up with:
It is also worth noting that the default standard analyzer, which consists of a standard tokenizer, lowercase, and stop-words filters, usually works quite well as it is. Nevertheless, we were frequently able to outperform it on multiple datasets by experimenting with other operations.
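As a starting point for such experiments, the sketch below shows how a custom analyzer (a tokenizer plus a chain of filters) might be attached to a text field. The index name, field name, and filter choice are illustrative, and we assume a locally running Elasticsearch 7.x with the official Python client:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Custom analyzer: standard tokenizer followed by lowercasing,
# stop-word removal and Porter stemming.
es.indices.create(
    index="squad_custom",
    body={
        "settings": {
            "analysis": {
                "analyzer": {
                    "my_english": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "stop", "porter_stem"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {"text": {"type": "text", "analyzer": "my_english"}}
        },
    },
)
```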
As we know, Elasticsearch uses Lucene indices for sharding, which works in favor of time efficiency but can also give you a headache if you are not aware of it. One of the surprises is that Elasticsearch carries out score calculation separately for each shard. This can affect search performance if too many shards are used, and as a consequence, the results can turn out to be non-deterministic between indexations.
Inverse Document Frequency is an integral part of BM25 and is calculated for each term within each shard, since sharding puts documents into separate buckets. Therefore, the more shards we have, the more the search score may differ for particular terms.
Nevertheless, it is possible to force Elasticsearch to calculate the BM25 score for all shards together, treating them as if they were a single, big index. However, it affects the search time considerably. If you care less about search time and more about consistency and reproducibility, consider using Distributed Frequency Search (DFS). It gathers the frequencies from all shards before scoring, regardless of their number.
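In practice, DFS is requested per query through the search_type parameter; a minimal sketch (the index and field names are illustrative):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# dfs_query_then_fetch first collects term/document frequencies from all shards,
# so BM25 scores are computed as if the shards formed one big index.
response = es.search(
    index="squad_custom",
    search_type="dfs_query_then_fetch",
    body={"query": {"match": {"text": "who invented the telephone"}}},
)
print(response["hits"]["hits"][0]["_score"])
```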
We have presented the accuracy of the Information Retrieval task in the table below. Note: our intention was to focus on the accuracy of the results, not on how fast we managed to acquire them.

It can be clearly seen that the accuracy fluctuates when changing the number of shards. It can also be noted that the number of shards does not affect the scores when using DFS.
However, with a large enough dataset, the impact of shards diminishes. The more documents in an index, the more the IDF components of BM25 even out across shards.

In the table above, you can observe that the impact of the shards (the relative difference between DFS and non-DFS scores) decreases as more documents are indexed. Hence, the problem is less painful when working with more extensive collections of texts. However, in such a case, it is more probable that we would require more shards for time performance. When it comes to smaller indices, we recommend keeping the number of shards at the default value of one and not worrying too much about the shard effect.
BM25 is a well-established scoring algorithm that performs great in many cases. However, if you would like to try out other algorithms and see how well they do in your language domain, Elasticsearch allows you to choose from a couple of implemented functions or to define your own if needed.
Even though we do not recommend starting optimization by changing the scoring algorithm, the possibility remains open. We would like to present results on SQUAD 10k with the use of the following functions:


As you can see, in the case of the limited SQUAD, BM25 turned out to be the best-performing scoring function. However, when it comes to SWIFT UI, slightly better results can be obtained using the alternative similarity scores, depending on the metric we care about.
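If you would like to try one of the alternatives, a similarity can be declared in the index settings and referenced per field. Below is a sketch using LM Dirichlet as an example; the index name, field name, and parameter value are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Declare an LM Dirichlet similarity and apply it to the "text" field.
es.indices.create(
    index="squad_lm_dirichlet",
    body={
        "settings": {
            "index": {
                "similarity": {"my_lm": {"type": "LMDirichlet", "mu": 2000}}
            }
        },
        "mappings": {
            "properties": {"text": {"type": "text", "similarity": "my_lm"}}
        },
    },
)
```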
Staying on the scoring topic, there are a couple of parameters within the BM25 algorithm whose values can be changed. However, as with choosing other scoring functions, we again do not recommend changing the parameters as the first step of optimization.
The default values of these parameters are k1 = 1.2 and b = 0.75.
They usually perform best across multiple benchmarks, which we’ve confirmed as well in our tests on SQUAD.
Keep in mind that despite the defaults being considered most universal, it doesn’t mean you should ignore other options. For example, in the case of the SWIFT UI dataset, other values performed better by 2% on the top 10 hits.

In this case, the default parameters again turned out to be the best for SQUAD, while SWIFT UI would benefit more from other values.
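Should you want to experiment with these values anyway, a custom BM25 similarity with overridden k1 and b can be defined the same way as any other similarity; a sketch with illustrative values:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# BM25 with custom parameters instead of the defaults (k1=1.2, b=0.75).
es.indices.create(
    index="swift_ui_tuned",
    body={
        "settings": {
            "index": {
                "similarity": {"tuned_bm25": {"type": "BM25", "k1": 1.6, "b": 0.6}}
            }
        },
        "mappings": {
            "properties": {"text": {"type": "text", "similarity": "tuned_bm25"}}
        },
    },
)
```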
As already mentioned, there are plenty of NLP techniques with which text can be enriched. We would like to show you what happens when we decide to add synonyms or other word derivatives, like phonemes.
For the implementation details, we once again encourage you to have a glimpse at our repository.
Wondering how to make our documents more verbose or easier to query, we may try to extend the wording used in document descriptions. However, this must be done with great care. Blindly adding more words to documents may lead to a loss of their meaning, especially when it comes to longer texts.
It is possible to automatically extend our inverted index with additional words, using synonyms from the WordNet synsets. Elasticsearch has a built-in synonyms filter that allows for an easy integration.
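A sketch of such a setup is shown below – one filter loads WordNet synonyms from a prolog file, the other uses a small, manually curated list; the file path and the example synonym entry are illustrative, and only one filter would normally be used at a time:

```python
# Analysis settings with two synonym filters: an automatic WordNet-based one
# and a manually curated one.
synonym_settings = {
    "analysis": {
        "filter": {
            "wordnet_synonyms": {
                "type": "synonym",
                "format": "wordnet",
                "synonyms_path": "analysis/wn_s.pl",  # WordNet prolog file on the node
            },
            "manual_synonyms": {
                "type": "synonym",
                "synonyms": ["crm, customer relationship management"],
            },
        },
        "analyzer": {
            "synonym_analyzer": {
                "tokenizer": "standard",
                "filter": ["lowercase", "wordnet_synonyms"],
            }
        },
    }
}
```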
Below, we’ve presented search results on both SQUAD and SWIFT UI datasets with and without the use of all available synonyms.


As can be seen, using automatic, blindly added synonyms reduced the performance drastically. With thousands of additional words, documents’ representations get overpopulated; thus they lose their original meaning. Those redundant synonyms may not only fail to improve documents’ descriptiveness, but may also harm already meaningful texts.

The number of terms in the SWIFT UI dataset more than tripled when synonyms were used. This has very negative consequences for the BM25 algorithm. Remember that the algorithm penalizes lengthy texts; hence, documents that were previously short and descriptive may now land significantly lower on your search results page.
Of course, using synonyms may not always be a poor idea, but it might require some actual manual work.
Our intention was to create a simulation with certain business entities to which one can refer in search queries in many different ways. Below you can see the results.

Search performance improves with the use of manually added synonyms. Even though the experiment was carried out on a rather small sample, we hope that it illustrates the concept well – you can benefit from adding some meaningful word equivalents if you have proper domain knowledge. The process is time-consuming and can hardly be automated; however, we believe it is often worth the invested time and effort.
It should be noted that, when working with ASR (automatic speech recognition) transcriptions, many words can be recognized incorrectly. They are often subject to numerous errors in transcription since some phrases and words sound alike. It might also happen that non-native speakers may mispronounce the words. For example:

To use a phonetic tokenizer, a special plugin must be installed on the Elasticsearch node.
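The plugin in question is analysis-phonetic; a sketch of installing it and defining a phonetic analyzer follows (the encoder choice and index/field names are illustrative):

```python
# Install the plugin on the node first (shell command, run once):
#   bin/elasticsearch-plugin install analysis-phonetic

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="swift_ui_phonetic",
    body={
        "settings": {
            "analysis": {
                "filter": {
                    "my_metaphone": {"type": "phonetic", "encoder": "metaphone"}
                },
                "analyzer": {
                    "phonetic_analyzer": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_metaphone"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "text_phonetic": {"type": "text", "analyzer": "phonetic_analyzer"},
            }
        },
    },
)
```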
The sentence “Tom Hanks is a good actor as he loves playing” is represented as:
and

We've come to the conclusion that using phonemes instead of the original text does not yield much of an improvement in the case of high-quality, non-ASR datasets like SQUAD. However, indexing phonemes and the original text in separate fields, and searching by both of them, slightly increased the performance. In the case of SWIFT UI, the quality of transcriptions is surprisingly good even though the text comes from ASR, so the phonetic tokenizer does not help much there either.
Note: It might be a good idea to use phonetic tokenizers when working with more corrupted transcriptions, when the text is prone to typos and errors.
You might come up with the idea of adding extra fields to the index and expect them to boost search performance. In Data Science this is called feature engineering – the ability to derive and create more valuable and informative features from available attributes. So, why not try deriving new features from text and index them in parallel as separate fields?
In this little experiment, we wanted to verify whether the above idea makes sense in Elasticsearch, and how to achieve it. We've tested it by:
Note: The named entities, as well as keywords, are excerpts already present in the text that were extracted into separate fields. In contrast, lemmas are additionally processed words; they provide more information than is available in the original text.
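As an illustration of one such derived field, the sketch below stores word lemmas next to the original text. We use spaCy here purely as an example lemmatizer (it is not part of Elasticsearch), and the index and field names are made up:

```python
import spacy
from elasticsearch import Elasticsearch

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed
es = Elasticsearch("http://localhost:9200")

def index_with_lemmas(doc_id, title, text, index="squad_enriched"):
    """Index the original text together with a derived field of word lemmas."""
    lemmas = " ".join(token.lemma_ for token in nlp(text) if token.is_alpha)
    es.index(
        index=index,
        id=doc_id,
        body={"title": title, "text": text, "lemmas": lemmas},
    )
```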

While we were conducting the experiments, we discovered that, in this case, keywords and NERs did not improve the IR performance. On the contrary, word lemmatization seemed to provide a significant boost.
As a side note, we have not compared lemmatization with stemming in this experiment. It's worth mentioning that lemmatization is usually much trickier and can perform slightly worse than stemming. For English, stemming is usually enough; however, in the case of other languages, simply cutting off suffixes will not suffice.
Based on our experience, we can also say that indexing parts of the original text without modifications, and putting them into separate fields, doesn’t provide much improvement. In fact, BM25 does just fine with keywords or Named Entities left in the original text, and thanks to the algorithm’s formula, it knows which words are more important than others, so there is no need to index them separately.
In short, it seems that fields providing some extra information (such as text title) or containing additionally processed, meaningful phrases (like word lemmas) can improve search accuracy.
Last but not least, there are numerous options for creating queries. Not only can we change the query type, but we can also boost individual fields in an index. Next to analyzer usage, we highly recommend experimenting with this step, as it usually improves the results.
We have conducted a small experiment in which we tested the following types of Elastic multi-match queries – best_fields, most_fields, cross_fields – on a set of indexed fields.
In parallel, we boosted each field from the default value of 1.0 up to 2.0 in increments of 0.25.
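A single run of this experiment boils down to a query along these lines (field names, boost values, and query text are illustrative):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# cross_fields multi-match with the title field boosted to 1.5.
query = {
    "multi_match": {
        "query": "how do plants produce oxygen",
        "type": "cross_fields",
        "fields": ["title^1.5", "text", "lemmas"],
        "operator": "or",
    }
}
response = es.search(index="squad_enriched", body={"query": query}, size=10)
```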

As shown above, the results on the SQUAD dataset, despite it being limited, indicate that cross_fields queries provided the best results. It should also be noted that boosting the title field was a good choice, as in most cases it already contained important and descriptive information about the whole document. We've also observed that boosting only the keywords or NER fields gives the worst results.
However, as often happens, there is no single clear and universal choice. When experimenting with SWIFT UI, we found that the title field is less important in this case, as it is often missing or contains gibberish. As for the query type, while cross_fields usually appears at the top, there are plenty of best_fields queries with very similar performance. In both cases, most_fields queries usually place somewhere in the middle.
Keep in mind that it all will most likely come down to analysis per dataset, as each of them is different, and other rules may apply. Feel free to use our code, plug in your dataset and find out what works best for you.
Compared to deep learning Information Retrieval models, full-text search still performs pretty well in plenty of use cases. Elasticsearch is a great and popular tool, so you might be tempted to start using it right away. However, we encourage you to read up at least a bit upfront and then try to optimize your search performance. This way you will avoid falling into a wrong-usage hole and struggling to climb out of it.
We highly recommend beginning with analyzers and query optimization. By utilizing the ready-to-use NLP mechanisms in Elastic, you can significantly improve your search results. Only then proceed further with more sophisticated or experimental ideas like scoring functions, synonyms, or additional fields.
Remember, it is crucial to apply methods appropriate to the nature of your data and to use a reliable validation procedure, adapted to the given problem. In this subject, there is no “one size fits all” solution.
Do you happen to hear about custom web applications at every turn? Are there really no other apps in the tank? Let's find out together what the fuss is all about.
It might seem we’re taking it backward, but somehow it looks simpler to start this way. So, firstly, let’s rule out two types of non-custom web apps.
Off-the-shelf web applications are the ones that you buy as finished, ready-made products. Usually, you can label them with your brand or integrate them with your digital product. But, by and large, you can't modify them, or only to a limited extent – not to mention adding new functionalities, which is way beyond the scope. So, instead, you take the app as it is served.
Ready-made apps are sold to many companies in the same form, often in a subscription-based SaaS model. And it comes with a price.
Yet another type of web application is a customized or customizable app. These products can be personalized according to the client’s needs but usually aren’t built from scratch. Our learning management system Samelane is a good example. It comes as a ready-made package but we also can, and often do, customize it for particular clients, adding dedicated features and functionalities.
Just to flag it up, sometimes building a custom app would be overkill – mainly if you're working on a low-scale product or an app analogous to many others on the market, or you have a low budget and prefer to pay a monthly fee rather than spend loads on development. What is more, an off-the-shelf app is ready and can be used in no time.
There are out-of-the-box solutions that can serve your purpose, such as ready-made CRM and CMS systems, e-commerce engines, booking systems, and chats. Hence, it might be more economical to incorporate them into your app instead of developing a new one.
And even when we need a custom app because the off-the-shelf option shows notable shortcomings, it's worth calculating whether it wouldn't be more cost-effective to brush them off. Sometimes adjusting internal processes is more efficient than building a brand-new dedicated app. Thus, a custom web application development strategy would help you decide which option fits the bill.

Let us dissect the frog here and assess each of the term components. Then, based on that, you will be able to appraise every digital product.
Custom
It means the product is built according to the unique requirements and business goals, which translate into features, design, user experience, etc.
Web
It means that the product is accessed via the Internet, through a browser, so users don't have to download, update, or configure it to enjoy all the features.
App
The product provides particular functionalities and two-way interaction, unlike informational websites that mainly present data.
Summing up, a custom web app is a unique digital product that can be accessed via the Internet, providing its users with functionalities and allowing for some interaction.
Custom web apps are highly desirable when you need:
Custom development allows you to craft products like no other. You’re totally free to choose or design its functionalities and UX/UI. Provided, of course, that your unique selling point depends on them.
Building a custom product, you’re free to add new features or resources (such as cloud storage) along with your company growth. It is less costly than upgrading the off-the-shelf app licenses. Also, you can integrate your app with other systems in the future, and you’re ready for that.
An app built from scratch frees you from external providers and the changes they introduce, among others in pricing. Not only that, but you can also opt for an on-premise installation when your security policies prohibit cloud solutions.
If you intend to build the next digital product sooner or later, with at least partially similar functionalities, your backend code will be fully reusable. As a consequence, the development will be faster and more cost-effective.
Similarly to tailored suits, building a custom digital product is more time-consuming and costly (in the short run). But at the same time, it fits your needs better, and the initial cost pays off later.

It's time to discuss the stages and methods that lead you from the grand idea to the final product. Just as Rome wasn't built in a day, a custom web application won't be either – in fact, there is no such thing as a final product in the case of apps. You most certainly will add functionalities and improve the app with time, based on actual user data. You also have to keep tabs on the market and competitor activity to keep up and introduce adjustments. There is no time to rest on your laurels.
First things first, you need a bright product idea. You don’t want to build a run-of-the-mill product, do you? Thus, it has to be an informed decision on how to position yourself in the market. To do so, you will need to:
So now you know your competition and have a concrete idea of what you want to build. The next step is to put it into action. What you need is a plan that includes:
Now it’s time to work on the visual aspects of your app which translates to UX/UI design. The techniques worth mentioning here are:
Not to delve into too much detail here; your development team can start working on the app code when the design part is ready. At this stage, it is finally brought to life.
Although testing should be done regularly within the development phase, before the product is launched, you should verify if it does the job, as simple as that. Thus, your quality assurance (QA) team has to test the outcome thoroughly in terms of:
Remember that the devil is in the detail, so don’t turn a blind eye to any bug! Do not fear, though, because we’ve got an article on tests ready for you.
When it comes to deployment, it’s time to work on the following matters:
And here we go, you’re ready to push the boat out!
Here’s a quick recap on the topic of custom web apps. They are unique products accessed via the Internet that do more than just inform the users. Particular functionalities of the apps allow for interaction. The custom app’s main benefits are scalability, freedom to develop it as you wish, and the independence they grant you.
However, if building a custom web app gives you the willies, then defer to the experts. We're always here to help, whatever stage of app building you're currently at. We would gladly discuss your project during our free consultations. Just let us know!
During our work in NeuroSYS, we've dealt with a variety of problems in Natural Language Processing, including Information Retrieval. We have mainly focused on deep learning models based on Transformers. However, Elasticsearch has often served us as a great baseline. We have been using this search engine extensively; thus, we would like to share our findings with you.
But why should you read this if you can go straight to the Elasticsearch documentation? Don't get us wrong – the documentation is an excellent source of information that we rely on every day. However, as documentation does, it has to be thorough and include every bit of information on what the tool has to offer.
Instead, we will focus more on NLP and practical aspects of Elasticsearch. We’ve also decided to split this article into two parts:
Even if you are more interested in the latter, we still strongly encourage you to read the introduction.
In the following five steps, we reveal what we find to be the most important to start experimenting with your search results quality improvement.
Elasticsearch is a search engine, used by millions for finding query results in no time. Elastic has many applications; however, we will mainly focus on aspects most crucial for us in Natural language processing – the functionality of so-called full-text search.
Note: This article concentrates on the seventh version of Elasticsearch; as of writing this article, a more recent version 8 has already been released, which comes with some additional features.
But wait, isn’t a commonly used database designed to store and search for information quickly? Do we really need Elastic or any other search engine? Well yes and no. Databases are great for fast and frequent inserts, updates, or deletes, unlike Data Warehouses or Elasticsearch.
Yes, that’s right, Elasticsearch is not a good choice when it comes to endless inserts. It’s often recommended to treat Elastic as “once built, never modified again.” It is mainly due to the way inverted indices work – they are optimized for search, not modification.
Besides, databases and Elastic differ in their use case for searching. Let's use an example for better illustration: imagine you run a library and have plenty of books in your collection. Each book can have numerous properties associated with it, for example the title, text, author, ISBN (a unique book identifier), etc., which all have to be stored somewhere, most probably in some sort of database.

When trying to find a particular book by a given author in a query, this search is likely to be fast – probably even faster if you create a database index on this field. The index is then saved on a disk in a sorted manner, which speeds up the lookup process significantly.
But what if you wanted to find all books containing a certain text fragment? In a database, we would probably reach for the SQL LIKE statement, possibly with some % wildcards.
Soon, further questions come along:
You can probably see how problematic dealing with the more complex search is when using SQL-like queries and standard databases. That’s the exact use case for a search engine.
In short, if you want to search by ISBN, title or author, go ahead and use the database. However, if you intend to search for documents based on passages in a long text, at the same time focusing on the relevance of words, a search engine, Elasticsearch in particular, would be a better choice.
Elasticsearch manages to deal with matching queries and documents’ texts through a multitude of various query types that we’ll expand further on. However, its most important feature is an inverted index, created on terms coming from tokenized and preprocessed original texts.

The inverted index can be thought of as a dictionary: we look up some word and get a matching description. So what it basically is, is a mapping from single words to whole documents.
Given the previous example of a book, we would create an inverted index by taking the keywords from a book's content, or the ones that describe it best, and map them as a set/vector, which from now on would represent that book.
So normally, when querying, we would have to go through each database row and check for some condition. Instead, we can break the query up into a tokenized representation (a vector of tokens) and only compare this vector to the vectors of tokens already stored in our index. Thanks to that, we can also easily implement a scoring mechanism to measure how relatable each object is to the query.
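A toy example of the idea, with plain Python structures standing in for Lucene's much more sophisticated machinery:

```python
from collections import defaultdict

documents = {
    1: "Bitcoin mining center",
    2: "Bitcoin mining cluster",
    3: "Library of old books",
}

# Build the inverted index: term -> set of document ids containing it.
inverted_index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.lower().split():
        inverted_index[term].add(doc_id)

# Querying means intersecting posting lists instead of scanning every document.
query_terms = "bitcoin mining".lower().split()
matching_docs = set.intersection(*(inverted_index[term] for term in query_terms))
print(matching_docs)  # {1, 2}
```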
As a side note, it is also worth adding that each Elasticsearch cluster comprises many indices, which in turn contain many shards, also called Apache Lucene indices. In practice, it uses several of these shards at once to subset the data for faster querying.
Elasticsearch is a wonderful tool; however, as with many tools, when used incorrectly it can cause as many problems as it actually solves. What we would like you to take away from this article is that Elasticsearch is not a database but a search engine and should be treated as such. That means: don't treat it as the only data storage you have. There are multiple reasons for that, but we think the most important ones are:

In re 1)
Don't pollute the search engine with data you don't intend to use for searching. We know how databases grow and schemas change with time. New data gets added, causing more complex structures to form. Elasticsearch won't be fine with it; therefore, a separate database from which you can link some additional information to your search results might be a good idea. Besides, additional data may also influence the search results, as you will find out in the section on BM25.
In re 2)
Inverted indices are costly to create and modify. New entries in Elasticsearch enforce changes in the inverted index. The creators of Elastic have thought of that as well: instead of rebuilding the whole index every time an update happens (e.g. 10 times a second), a separate small Lucene index is created (the lower-level mechanism Elastic builds on). It is then merged (reindex operation) with the main one. The process takes place every second by default, but it also needs some time to complete. It takes even more time when dealing with more replicas and shards.

Any extra data will cause the process to take even longer. For this reason, you should only keep important search data in your indices. Besides, don't expect the data to be immediately available, as Elastic is not ACID-compliant – it is more like a NoSQL datastore that focuses mainly on BASE properties.
The terms stored in the index influence the scoring mechanism. BM25 is the default scoring/relevance algorithm in Elasticsearch, a successor to TF-IDF. We will not dive into the math too much here, as it would take up an entirety of the article. However, we will pick the most important parts and try to give you a basic understanding of how it works.
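For reference, the standard form of the BM25 score for a query Q = (q_1, ..., q_n) against a document D is (the notation in the original figure may differ slightly):

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i)\,
\frac{f(q_i, D)\,(k_1 + 1)}
     {f(q_i, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
```

Here f(q_i, D) is the frequency of term q_i in document D, |D| is the document length, avgdl is the average document length in the collection, and k_1 and b are free parameters.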
The equation might be a little confusing at first, but it becomes pretty intuitive when looking at each component separately.
For example, consider the sentence: "Elasticsearch is a cool search engine designed for fast querying."
If we tokenized this sentence, we would expect words like Elasticsearch, search, engine, querying to be more valuable than is, a, cool, designed, for, fast, as the latter contribute less to the essence of the sentence.
Probably the first question you'd need to raise when thinking of optimization is how the texts are preprocessed and represented within your inverted index. There are many ready-to-use concepts in Elasticsearch taken from Natural Language Processing. They are encapsulated within so-called analyzers, which change continuous text into separate terms that are indexed instead. In layman's terms, an analyzer is a combination of a Tokenizer, which divides the text into tokens (terms), and a collection of Filters, which do additional processing.
We can use built-in Analyzers provided by Elastic, or define our own. In order to create a custom one, we should determine which tokenizer we’d like to use and provide a set of filters.
We can apply three possible analyzer types to a given field; they vary based on how and when they process text:
Usually, there is no point in applying a search analyzer different from an indexing analyzer. Additionally, if you would like to test them yourself, it can be easily done via the built-in API or directly from the library in the language of your choice.
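For example, the _analyze API returns the terms a given analyzer would produce; a minimal sketch with the built-in english analyzer:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

result = es.indices.analyze(
    body={
        "analyzer": "english",
        "text": "Tom Hanks is a good actor, as he loves playing.",
    }
)
print([token["token"] for token in result["tokens"]])
# Illustrative output (stems and stop-word removal depend on the analyzer/version):
# ['tom', 'hank', 'good', 'actor', 'love', 'plai']
```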
The built-in analyzers should be able to cover the most often used operations applied during indexing. If needed, you can use analyzers that are explicitly made for a specific language, called Language analyzers.
Despite their name, Filters not only perform token selection but are also responsible for a multitude of common NLP preprocessing tasks. They can be used for a number of operations, such as:
They cannot, however, perform lemmatization. Below, we've listed some of the most common ones; if you're interested in the complete list of available filters, you can find it here.
They aim to divide the text into tokens according to a selected strategy, for example:
In the diagram below, we present some exemplary analyzers and their results on the sentence “Tom Hanks is a good actor, as he loves playing.”

Each tokenizer operates differently, so pick the one that works best for your data. However, a standard analyzer is usually a good fit for many scenarios.
Elasticsearch enables a variety of different query types. The basic distinction we can make is whether we care about the relevance score or not. With this in mind, we have two contexts to choose from:
So, use a query context to tell how closely documents match the query and a filter context to filter out unmatched documents that will not be considered when calculating the score.
Even though we’ve already stated that we will mainly focus on text queries, it’s essential to at least understand the basics of Bool queries since match queries boil down to them. The most significant aspect is the operator we decide to use. When creating queries, we would often like to use logical expressions like AND, OR, NOR. They are available in Elasticsearch DSL (domain specific language) as must, should, and must_not, respectively. Using them we can easily describe the required logical relationships.
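A minimal example of how these operators combine (the field name and terms are illustrative):

```python
# Documents must mention "search engine", should preferably mention
# "elasticsearch", and must not mention "relational database".
bool_query = {
    "bool": {
        "must": [{"match": {"text": "search engine"}}],
        "should": [{"match": {"text": "elasticsearch"}}],
        "must_not": [{"match": {"text": "relational database"}}],
    }
}
# Passed to the client as: es.search(index="my_index", body={"query": bool_query})
```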
Full-text queries are the ones we are most interested in, since they are ideal for fields containing text on which an analyzer has been applied. It is worth noting that during a search, the query text will be processed with the same analyzer that was used for indexing the queried field.
There are several types of FTS queries:
We will now focus on explaining Match based queries in more detail, as we find them versatile enough to do everything we need while being pretty quick to write and modify.
Match query – this is a standard for full-text searches, where each query is analyzed the same way as the field it is matched against. We find the following parameters to be the most important ones:
Match phrase query – it’s a variation of match query where all terms must appear in the queried field, in the same order, next to each other. The sequence can be modified a bit when using an analyzer that removes stopwords.
Match prefix query – it converts the last term in the query into a prefix term, which acts as a term followed by a wildcard. There are two types of this query:

When using a match phrase prefix query, “Bitcoin mining c” would be matched with both documents “Bitcoin mining center”, as well as “Bitcoin mining cluster”, since the first two words form a phrase, while the last one is considered as a prefix.
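Expressed as a query body, the example above would look roughly like this (the field name is illustrative):

```python
# The last term ("c") is treated as a prefix, so both
# "Bitcoin mining center" and "Bitcoin mining cluster" match.
prefix_query = {
    "match_phrase_prefix": {
        "text": {"query": "Bitcoin mining c"}
    }
}
```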
Combined fields query – allows for searching through multiple fields as if they were combined into a single one. Clarity is a huge advantage of the combined fields query: when created, it is converted to a boolean query that uses the chosen logical operators. However, there is one important requirement – all queried fields must use the same analyzer.
The disadvantage of this query is the increased search time, as it must combine fields on the fly. That's why it might be wiser to use copy_to when indexing documents.

Copy_to allows for creating separate fields that combine data from other fields, which translates into no additional overhead during searches.
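A sketch of such a mapping – title and text are copied at index time into a single combined field that can then be queried directly (the field names are illustrative):

```python
combined_mapping = {
    "properties": {
        "title": {"type": "text", "copy_to": "all_content"},
        "text": {"type": "text", "copy_to": "all_content"},
        # The combined field is built while indexing, so searching it
        # adds no overhead at query time.
        "all_content": {"type": "text"},
    }
}
```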
Multi match query – it differs from combined fields, since it enables querying multiple fields that have different analyzers applied, or are even of different types. The most important parameter is the type of the query:
Note: best_fields and most_fields are treated as FIELD centric, meaning that matches in a query are applied per field instead of per term. For example, query “Search Engine” with operator AND means that all terms must be present in a single field, which might not be our intention.
We would also like to highlight that queries can be boosted. We use this feature extensively on a daily basis.
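A multi-match query with per-field boosts along these lines would produce the effect described below (the query text is illustrative):

```python
boosted_query = {
    "multi_match": {
        "query": "nineteenth century novels",
        "fields": ["Title^2", "Author^4", "Description"],
    }
}
```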

This query would multiply the score for the Title field by 2 and the Author field by 4, while the Description score remains unboosted. Boost can be an integer or a floating-point number; values greater than 1.0 increase a field's contribution to the score, while values between 0 and 1.0 decrease it.
To sum up, we’ve presented five steps we find crucial to start working with Elastic. We’ve discussed what Elasticsearch is, and what it isn’t, and how you’re supposed to and not supposed to use it. We’ve also described the scoring mechanism and various types of queries and analyzers.
We are confident that the knowledge collected in this article is essential to start optimizing your search results. The article was intended as an introduction to some key concepts, but also as a foundation for the next one, in which we will provide you with examples of what is worth experimenting with, and will share the code.
We hope that this blog post gave you some insight into how different search mechanisms work, and that you've learned something new or handy which one day you might find useful in your projects.
Knowing your users inside out is the best starting point for building physical and digital products. Before you think about its awe-inspiring features and dazzling looks, it's good to learn if your customers actually need them. User research will help you discover that by gathering insights into their motivations and behaviour. Only with this knowledge will you be able to build desirable products and services, and guide users best via app design and copy.
User research is popular in UI design, UX design, and UX writing. When planning your study, always remember to set clear objectives and determine the available resources, so you don't bite off more than you can chew.
After a short introduction, we’re good to go!
User research is a meticulous study of customers held to understand their needs, problems, and motivations. The study aims to create the best products, in terms of design and usability, which in our case applies to web and mobile applications.
The tool is particularly helpful when product owners, together with their teams, have to make tough decisions. After the research is done, they can do it based on insight and information rather than a personal conviction or lucky guess.
What is important to mention here is that user research is a methodological and structured approach that has to follow certain research principles. Therefore, asking your friends and colleagues how they like your app can't, by any means, be called user research!

Before we dive deeper into specific types and methods, and before you decide on taking a particular approach, we recommend you to think over the following issues:
What do I want to find out? So you won’t spend time and money on research that doesn’t bring you any closer to your goals. A good example would be: What do users need first and foremost in a smart home app?
Do I have the capacity to conduct research? Your study will undoubtedly result in vast amounts of data, documents, and various kinds of files. Thus you will need enough resources not only to conduct the research but also to organise data and analyse it. Since it is a structured task, you’ll also need a person responsible for the project.
Do I know how to tackle legal matters? Taking into account GDPR and other regulations, user data is extremely sensitive. You can’t take the issue lightly when working with real people, gathering their personal information. No matter whether you analyse data collectively or record interviews with particular users, you have to obtain their consent. You might also need to sign NDAs with interviewees if they’re testing your prototypes. To do it by the book, consult a lawyer specialising in these particular issues in a given country.
Do I have relevant experience? Educate yourself and involve researchers from your team. You might also consider collaborating with a research company or freelance researchers who will take the project off your shoulders, or at least support you in particular areas, such as choosing the most appropriate method.
There are different user research categorizations regarding data types (what), the way something is done (how), and when the research fits in the project's timeline.
One of the most well-known categorizations focuses on the type of data that is collected:
The second categorization refers to the way data is collected:
The third categorization we wanted to specify focuses on when the research is conducted:

You may come across a number of user research methods, but we will only focus on those we exercise ourselves.
Competitor analysis is one of the most common, simple, and inexpensive (in comparison to other) research methods. We can’t imagine considering entirely new products or adding significant features without identifying and analysing companies that sell similar solutions. Knowing what the competition offers and how they present their products and services, you can decide in an informed way how to position yourself in the market.
As we’ve mentioned before, a competitor analysis has to be conducted in a structured way. You set identical variables according to which an analysis of each company is conducted. Thus, the best way to do this is through a table, be it a Google Spreadsheet, Microsoft Excel, or any other tool of your choice.
When analysing the data gathered, you can highlight similarities and differences between you and your competition, and spot the things that seem to stand out from the crowd. Watch out, though! The method poses a risk of copying others, which is never a good choice – after all, why would customers choose you then?
This method involves observing your current or potential customers in their natural environment and real-life situations. You watch what they do and say, either incognito or overtly. This way, you can follow them using your application: what they click first, how they proceed, what problems they have, how they solve them, etc. Apart from their behaviour (user experience research), you observe what they say when interacting with your product or how they act in a general situation.
To structure your study correctly, you need to prepare an observation guide covering all crucial aspects of your customer's journey. Watch out for legal issues here, especially if you observe people covertly or conduct observations in public spaces.
Focus groups are small assemblies, usually no more than 12 people, gathered in one place. The study aims to stimulate a discussion between the participants. A structured debate, to be precise: with questions prepared beforehand and space where everyone has a voice.
This fantastic user research tool allows you to ask pertinent questions, see how people react to your product, figure out their needs, and gather ideas for new features. Unfortunately, since a lot happens during focus group sessions, it’s easy to dive into it and not see the wood for the trees. Thus, you have to focus on what you want to find out at all times. A user research plan will help you with that.
During user interviews, you have a chance to converse with your current and potential customers and ask them relevant questions. These one-to-one sessions have the potential to provide you with the deepest insights. For a vast majority of people, it is easier to talk about their habits, values, needs and dreams but also ask questions about the product in an informal, private discussion where they don’t feel judged by others.
So as not to distract your interviewees by taking notes, request their permission (in writing) to record the sessions. And as always, remember about the interview structure, so that after multiple sessions you can gather the results, compare them, and draw conclusions. Lastly, pick your interviewees wisely, so they represent your customers as adequately as possible in terms of gender, age, income, etc.
Surveys are a poorer version of user interviews, but they are also less costly and quicker to carry out. In addition, they allow you to ask personal questions similar to those asked during interviews. However, there is little or no room (especially in online surveys) to follow up on the answers and deepen your knowledge.
On the one hand, your surveyees don’t feel judged and can respond more sincerely; on the other, they might feel less motivated to spend time filling in your forms and might do it half-heartedly. Thus, try to keep your surveys brief and the questions clear.
When working on a smart home application, one of our tasks was redesigning its interface. The key aim was to improve its usability.
Having redesigned the IoT app, we wanted to verify if the new design met our and the client’s expectations (validating research), including:
We ran thorough UX tests that included:
Interviewees were to complete tasks using interactive app mockups. Thanks to this particular research method, we were able to observe their decision-making process, the difficulties they ran into, and the way they reacted to task solving.
We have examined the following areas:
When you want to fully understand your target group and draw the most accurate conclusions, the best idea is to combine the above methods. Gather both numerical and non-numerical data, analyse what your competition does, and ask real users what they think, especially when arriving at irreversible decisions based on the research findings. And one more thing: consider incentives, small prizes for your research participants, adequate to the time they devoted.
We’re living in the glory days of startups.
There’s a startup for nearly every problem that needs to be solved. These small but fierce companies play a big role in the business ecosystem. One of the startups’ greatest strengths is inducing market competition and stimulating innovation, leading to economic development. No wonder there are countless venture capital funds and investors looking for emerging companies with high growth potential to devote their money to.
But let’s put aside inspirational pitches, and instead of extolling virtues – get back to business. The key question is – how to invest in startups? It’s not an easy one, so we’ll take it slow.
The three above-mentioned don’t necessarily need to be mutually exclusive, but let’s not get ahead of ourselves.
Are the concerns about startup credibility even justified? Or are stories about unfortunate investments just urban legends?
Most probably you’ve heard about some “breakthrough” companies: one offering complex blood testing from a single drop despite not having the actual technology for it, another providing an innovative $400 home appliance to squeeze juice from premade packets that could just as well be squeezed by hand, or smart cups so smart they could recognize the liquid inside them and count the times they were refilled.
The above-mentioned visionaries are just some of the most flagrant cases, but there are many more examples of products that just weren’t worth it. There’s nothing wrong with someone trying their luck with a product or service, unless it involves deceiving the investors and/or the public.
Aside from the question being a bit of an oxymoron, of course. We don’t believe in unicorns when it comes to business. We believe in integrity and engaging in trusted, proven undertakings.

So, you’re serious about investing in tech startups and looking for an IT startup to add to your investor portfolio? Or do you already keep an eye on something? Assuming the emerging company operates in areas you’re familiar with, your risk is smaller. If the startup you’d like to invest in is from the medical field, its assessment requires experts with a medical background. The same goes for other areas, typically entailing at least some general understanding.
When you’re looking for a startup to invest in, don’t follow the hype or a temporary fashion. Sure, if it’s digital solutions that you fancy, the field is dynamically changing. Still, your hard-earned money should rather be put to good use after some analysis and consideration.
Hard questions need to be asked. A startup pitch is fun and catchy, crafted to allure and stun. What we’re looking for is a down-to-earth, merit-based, and rational evaluation of what is really going to happen and what the facts behind the idea are.
If the startup of your interest plans to conquer the IT field, your envoy should be someone familiar with digital technology. IT suffers from a lot of hype and buzzwords, but when you take a closer look at the actual stack and capacity – not everything is as advertised.
One of the common exaggerations is calling everything artificial intelligence. You might think that AI is everywhere. From your fridge and car through municipal bins and vending machines to all sorts of business processes. The trick is, often it isn’t AI at all.
Many service providers use ordinary statistics and data analysis – if it’s sufficient and works for their product, good for them. However, labeling their offering with the most buzzing names, calling it BIG DATA and ARTIFICIAL INTELLIGENCE, when there’s no evidence of any advanced algorithms, is no different from false advertising of miracle diets or rejuvenating cosmetics with mysterious ingredients that in the end turn out to be ordinary vaseline. Maybe not so ordinary, since it’s packed in a fancy wrapper and advertised by a popular celebrity. Still, it’s a shell product – there’s not much behind all that glitter and great promises. Someone purchasing it for the promised spectacular results and extraordinary effectiveness would feel highly disappointed in the end, after discovering it’s not what they paid for. Marketing, promotion, storytelling, and all other bells and whistles did their job right, but for the wrong cause.
What we’re saying is that overpaying for, e.g., a cosmetic product by $20 can be a letdown, but misinvesting in a shell startup can be – you guessed it – a major disenchantment. When you’re an investor on the lookout for a company to entrust your funds to, there must be actual technology and know-how behind the marketing magic.
Aren’t you much of a tech expert yourself? Consider a technical audit. Before splurging on that new, innovative, disruptive technology, send your emissary to ask around and verify the facts.
Or: objection! We’ll leave the choice to individual auditors. The thing is that a technical assessment is vital for a tech startup investment. Don’t let anyone pull the wool over your eyes, saying “it’s too complicated”, “you wouldn’t get it”, “we’ll explain later, now we need the money to develop the solution”, etc. Technologies too complex to understand don’t emerge suddenly; most probably you’ve already heard about something similar and comprehend at least the general idea.
While fireworks can work wonders in marketing, when it comes to spending large amounts, we need the startup to lay their cards on the table. It’s not that uncommon for the loudest, most attention-grabbing advertisements to cover the weakest ideas. Some good ideas, products, and services are quiet. The best way to invest in startups is to know what’s working under the hood. Startup investment opportunities require some time and consideration before you decide to go all in.
A few topics to address before investing in startup companies.
Checking feasibility requires determining the viability, profitability, and practicality of the breakthrough idea. Has the startup analyzed all available data, conducted market research, and prepared projected income statements? In short, do they know where they stand? Sustainable development of business ideas calls for proper preparation and delivering tangible data for assessment. How to check it? Ask for Proof of Concept (PoC) and/or Minimum Viable Product (MVP) or subsequent “M’s” – MMP, MMF, MMR, MSP, etc. Delving deeper into preliminary product versions allows investors to see through the honeyed words. When you’re about to invest millions, it better really be artificial intelligence as promised. And not a bunch of apprentices working in the back, pretending to be the advertised algorithms.
There’s even been a startup that hired actors and rented a lab to set up a believable show for investors’ visit at “their site”. After all, maybe those apprentices aren’t the worst that could happen? Still, that’s not what investors sign up for when spending their money.
There are plenty of tutorials for startup founders and serial entrepreneurs that advise not to learn to code when building a startup. While this may work for non-technical founders and new companies aiming for other market fields, when it comes to tech startups – code is king. Can the startup handle the technical risk of their idea? Can the architecture be built and work as meant to? Is the code behind their project adequate to the advertised potential?
A common sin of startups is, again, those ill-fated apprentices or students assigned to write code. Code that, once the project is about to be commercialized, requires immediate rewriting to present any value in terms of further development, maintenance, or simply ensuring stability and responsiveness for users.
Do the startup founders have a growth strategy for their product or service? Is the idea developed well enough to work in real-life conditions? Can their product handle an increasing workload, or is it sufficient only for test purposes? Investing in an idea that only looks good when the business model assumes an extensive user base is a risky move. Startup assessment requires checking everything that could go wrong, and failing to deliver on promises is a major sin to eliminate.
We’re a software company with over 12 years of experience and an extensive portfolio of executed projects. You’re here not to read our bragging, so if you’d like to learn more about our expertise, check the case studies tab. We may not be a startup ourselves, but having two of our own (Nsflow and Samelane), we know the tech field inside and out, meaning our auditors can recognize shams on the spot and help you with a business startup investment you won’t regret. We may not tell you outright where to invest in startups, but we’re positive about recognizing tech companies to invest in safely.
Don’t leave your business’s future to chance. Sure, honest mistakes happen even in the most proven and reliable cooperations. But a stitch in time saves nine, so if you have even the slightest doubt about whether a startup you’d like to invest in seems a tad off – an audit won’t hurt.
Scrum and Kanban are the most popular frameworks that belong to the same Agile family. Whereas Scrum likes rituals, clear roles and rules, Kanban is more of a free spirit, known for its pretty face (and effectiveness, too!). Which style feels closer to your heart? Which would you like to form close bonds with? If you are on the fence between the two, we’re here to give you a hand.
Scrum and Kanban are both Agile frameworks, so they share a lot of features. Thus, before we can dig deeper into them one at a time, we need to stop for a while to discuss what Agile really is.
The answer will depend on whom you talk to. Product owners, developers, business analysts, or CEOs might perceive it in a different way. They might refer to Agile as a philosophy, mindset, way of thinking, or, more down to earth, a methodology.
Agile followers live by four, let’s call them, commandments. Here they are:
The chief reason why Agile was created and became so popular is that the traditional methodologies, such as Waterfall, deliver value at the end of the project. Taking into consideration that it takes months or years to build digital products, waiting till the end of the way is definitely too long. Therefore, Agile focuses on delivering value faster, in smaller increments. This way you can test solutions, adjust, improve, deliver MVPs, get user feedback, start earning, gain funding, and so on. And on top of that, Agile welcomes changes with open arms, because they typically lead to improvements.
Agile is often contrasted with the traditional Waterfall methodology. The latter is a linear approach, meaning that you and your team can’t move to the next project phase unless you have completed the tasks from the previous one. It’s also difficult to go back once something is done.
In Waterfall, you have to identify most of the requirements, analyse them, design, develop, implement a solution, and finally test if it all works. If you proceed step by step, you deliver value and get customer feedback really late. The problem is that if you decide to make some changes while already being in the last two phases of the project, it will take a lot of time and work. Basically, you need to go back to square one. Another thing that may happen is that the requirements have been understood differently by the client and the contractor/development team. Due to the nature of this linear methodology, you can make this discovery only at the end of the project. Waterfall doesn’t like changes.

Scrum is far and away the most popular Agile framework. In fact, when companies say they work in Agile, in most cases they mean Scrum.
Scrum cherishes roles and ceremonies, of which sprints come first: time-boxes wherein other events take place. What makes it highly effective is its transparency. All roles, responsibilities, and meetings are clearly defined, and everyone knows what other team members are working on at a given moment. If any disagreement arises, the team discusses the problem and resolves it TOGETHER.
Roles and their responsibilities in Scrum are clearly defined:
Sprints are the essence of Scrum. A single sprint takes from 1 to 4 weeks. It consists of a variety of Scrum ceremonies and events which include:
On top of that, there are other terms that you will come across in Scrum: user stories, team velocity, Scrum poker, product backlog, product increment, the definition of ready, and the definition of done. But we won’t delve deeper into the terminology, as we can refer you to some of our more detailed articles on the subject:

Kanban is the second most popular Agile framework, after Scrum. It is known best for its visual aspect, the Kanban board, which helps to understand workflows easily.
Kanban is a continuous process; there are no time-boxes or fixed events. Of course, you can have daily stand-ups and retros, but you don’t have to – it depends entirely on you. The key metrics in Kanban are time-based: lead time and cycle time.
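To make those two metrics concrete, here is a minimal sketch of how they could be computed from task timestamps. The dates, field names, and the helper function are purely illustrative, not taken from any particular board tool.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Hypothetical task records; the timestamps are made up for illustration.
tasks = [
    {"created": "2023-03-01 09:00", "started": "2023-03-02 10:00", "done": "2023-03-03 16:00"},
    {"created": "2023-03-01 11:00", "started": "2023-03-01 13:00", "done": "2023-03-02 09:30"},
]

def hours_between(earlier: str, later: str) -> float:
    """Return the number of hours between two timestamps."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return delta.total_seconds() / 3600

for number, task in enumerate(tasks, start=1):
    lead_time = hours_between(task["created"], task["done"])    # request -> delivery
    cycle_time = hours_between(task["started"], task["done"])   # work started -> delivery
    print(f"Task {number}: lead time {lead_time:.1f} h, cycle time {cycle_time:.1f} h")
```

Lead time tells you how long a request waits from the moment it appears on the board until delivery, while cycle time covers only the stretch when the team is actively working on it – which is why Kanban teams keep an eye on both.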
Roles in Kanban aren’t defined; team members comply with their organisational roles. Also, they aren’t assigned tasks – they simply pick cards from the board depending on their skills, talents, or what they feel like doing at the moment.
In Kanban, there’s much room for companies and teams to lay down their own rules and policies on how to manage things. The key is to make the policies explicit and known to everyone concerned.
The project board shows the status of work in progress, so one look at it should give you an idea of how everything is going. Kanban cards contain information on tasks, and they are grouped into three areas: to do, doing, and done. Usually, their hierarchy is set from top to bottom, beginning with the highest priority. Team members pick their tasks and, as time goes by, move them between the three sections of the board.
Kanban boards can be physical, arranged with sticky notes, but online ones have become more popular. The preference for digital Kanban boards stems from hybrid and remote work, which requires dispersed teams to collaborate closely. Online Kanban boards can be created with a variety of well-known apps, such as Trello, Jira, or YouTrack (the Agile Boards function), which we use at NeuroSYS.
Kanban concentrates on task completion. Too many tasks marked as in progress might indicate that the work is not proceeding or that tasks were put on hold. That is why Kanban limits WIP, the work in progress. A good practice to keep focus and get things done is to set WIP limits on your online Kanban board.

Being Agile frameworks, Scrum and Kanban have a lot in common, such as task estimation and a focus on delivering value in no time. But now it’s time to wrap your head around their differences.
The difference between Scrum and Kanban lies in a variety of aspects, more or less fundamental:
| AREA | SCRUM | KANBAN |
|------|-------|--------|
| Structure | Structured | Less structured |
| Time-boxes | Sprints | Fluid cycles, no set breaks |
| Retrospectives | After every sprint | When it makes sense |
| Tasks | Assigned to the team | Picked by the team |
| Roles | Specified | Non-specified |
| Teams | Cross-functional | Cross-functional or specialised |
| Metrics | Velocity | Cycle time |
And is an apple better than a pear? This is a similar type of question.
Or, as our Managing Director would answer, IT DEPENDS. It depends on your organisation, team composition and its members’ experience. Naturally, personal preferences play an important role as well.
The fact is that organisations which have already started their Agile journey have, in most cases, begun with Scrum. The framework offers structure and a set of rules that are helpful, especially at the beginning. Starting straight away with Kanban might feel more like throwing yourself in at the deep end. But it doesn’t have to be this way. Kanban can be a made-to-measure approach, particularly when work increments aren’t linear and the project has to pick up speed first before it can be monitored and managed.
Generally, if matters like cyclical delivery of increments, tools for work planning, customer engagement, transparency, and retrospectives are important to you, you should go for Scrum. Meanwhile, Kanban works perfectly during maintenance periods when the system goes through end-to-end tests or is streamlined, or technical debt is being paid off. In situations where work is hard to plan, Kanban is a perfect match.
To give you food for thought: Kanban doesn’t have built-in retrospective mechanisms, so it is sometimes difficult to give the team and clients a sense of purpose and success. Scrum secures that thanks to cyclical events and clear sprint goals.
For those who are still undecided or like both options equally, there is something in between, a framework called Scrumban. It is a blend of Scrum and Kanban, taking the best practices out of each. For example, in Scrumban you use the Kanban board but also have mandatory daily meetings.
As you can see, it isn’t a black-and-white choice to make. We can’t categorise projects as Scrum- and Kanban-prone just that easily. What we can suggest here is to use Scrum or Kanban as logic dictates, taking into account the above-mentioned benefits but also the limitations.
We can’t praise the Agile methodology enough. No matter whether you choose Scrum or Kanban, Agile’s focus will be on software quality, effectiveness, constant improvement, great results, and trust in people. Simple as that, but most importantly – it works.
Bob’s your uncle, we’ve reached our destination. But what a journey it was, right? At the end of the day, the choice of the framework is yours, though we hope we’ve managed to help you out. If you’re still in two minds about it, let us know. We can give you a helping hand during free consultations.
When thinking about futuristic cars, pop culture made us yearn for the incredible KITT known from the Knight Rider series, vehicles spiked with useful gadgets used by James Bond, and the unforgettable DMC DeLorean from the Back to The Future movies.
For many petrolheads, applying new technology in the automotive industry is unnecessary, as, in their opinion, car design and performance reached their peak in the 80s and 90s. Many drivers, however, admit that vehicle evolution shouldn’t stop at headlight wipers.
So, Hollywood magic aside, how does today’s technology change cars and what is there to come for the industry? Are we already cruising in cars of tomorrow?
Probably the most visible technology shaping the industry is the shift towards electric vehicles. Since hybrid and electric cars are becoming more and more competitive, their market share will continue to grow. And since we’re over the fact that we won’t be driving flying cars or befriending KITT anytime soon, let’s break down what the automotive transformation has already changed – for the better.
Just a few decades ago driverless cars seemed like pure sci-fi, yet here we are, driving around hands-free and minding our own business while the autopilot keeps its eyes on the road. Since autonomous cars are capable of sensing the environment and responding immediately to encountered obstacles and occurring events, a human driver is not necessary anymore. The human doesn’t even need to be in the car! Or rather, that’s how the manufacturers wish it worked, but we’re not there. Yet. What have we come to – passenger-less cars, who would have guessed?

Since the future of the automotive industry most probably lies in electric, self-driving cars, more and more manufacturers are entering the race. It was Tesla that stirred the general public’s imagination, but the Texas-based innovative manufacturer is not a lonely driver anymore. The biggest European and Asian car moguls have decided to integrate driving assistants as well, enabling their clients to drive without holding the wheel and enhancing competition.
Autonomous cars’ relative, the parking assistant, makes vehicles more attractive by simplifying drivers’ lives. While still enjoying full control over the automobile on the road, drivers can choose not to park on their own. It’s not just placing the vehicle in any random free spot in the lot – the system will remember drivers’ preferences and notify them once they absent-mindedly pass by their favorite place.
Each manufacturer’s system has a different name, but the working principle is similar. Self-parking cars use integrated cameras and sensors to prevent collisions and properly maneuver, keeping the vehicle on the right track. Parallel or perpendicular parking? Not a problem. A sneaky curb or other cars standing in the way? The self-parking system will find a way.
Biometrics have the power to enhance car users’ experience. No more manually adjusting the seat and steering wheel after each switch between drivers – the vehicle will remember their preferences. Once it recognizes a familiar face, the system automatically adjusts each regulated piece inside the car, including temperature and map settings.
Applying digital assessment to biological features is not only about convenience. With the growth of automotive biometrics, vehicle security can be increased. Employing a system that remembers unique physical traits (facial recognition, fingerprints) allows drivers to fully embrace the possibilities of keyless ignition, keyless door opening, and surveillance. User-focused digital solutions also monitor drivers’ health: whether it’s fever, excessive fatigue, sleepiness, or drowsiness, the vehicle recognizes potential hazards in road traffic. Using automotive biometrics contributes to better alertness during driving and improved security.
Digital models, representing physical assets in 3D, allow designers to try out assumptions prior to/instead of using traditional measures. Advanced software gathers sensor and inspection data, configuration details, and other bits of information. Digital twins mirror the appearance and behavior of the entire car or its components.
Industrial companies, including car manufacturers, value the potential digital twins carry. 3D representations streamline the design and production process, contributing to better performance of the vehicle and reducing costs on the manufacturers’ side. From car design to predictive maintenance to boosting sales with digitally created models, the twin technology is becoming one of the most popular software solutions in modern car manufacturing.
GANs (generative adversarial networks) are a class of machine learning algorithms used to create images based on provided picture sets. The automotive industry uses GANs in generative design to boost additive manufacturing with AI. By employing GANs and coupling them with in-depth data analysis and 3D printing, car manufacturers can achieve results previously impossible to obtain using traditional methods. One of the opportunities lies in injection molds, allowing producers to create unusual shapes and constructions, opening new ways for the increasingly desired customization.
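For readers curious what “a generator competing with a discriminator” actually looks like, here is a deliberately tiny, self-contained PyTorch sketch of the adversarial idea. The toy dimensions, random stand-in data, and two-layer networks are our own illustration; real generative-design pipelines are far larger and train on actual image sets.

```python
import torch
import torch.nn as nn

# Toy sizes for illustration only.
LATENT_DIM, IMG_DIM = 64, 28 * 28

# Generator: turns random noise into a fake "image" (a flat vector here).
generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 128), nn.ReLU(),
    nn.Linear(128, IMG_DIM), nn.Tanh(),
)
# Discriminator: estimates the probability that its input is a real image.
discriminator = nn.Sequential(
    nn.Linear(IMG_DIM, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)

loss_fn = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def training_step(real_images: torch.Tensor) -> None:
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Train the discriminator to tell real images from generated ones.
    fake_images = generator(torch.randn(batch, LATENT_DIM)).detach()
    d_loss = loss_fn(discriminator(real_images), real_labels) + \
             loss_fn(discriminator(fake_images), fake_labels)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Train the generator to fool the discriminator.
    g_loss = loss_fn(discriminator(generator(torch.randn(batch, LATENT_DIM))), real_labels)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# Example call with random tensors standing in for a real dataset.
training_step(torch.randn(16, IMG_DIM))
```

The two networks improve by playing against each other: the better the discriminator gets at spotting fakes, the harder the generator has to work to produce convincing images, which is where the novel shapes used in generative design come from.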
Car manufacturing giants like BMW employ artificial intelligence in their production lines. Companies entrust AI with quality control, as even the most meticulous workers are prone to fatigue that can result in errors. Algorithms, by contrast, can work error-free, 24/7. In the Bavarian manufacturer’s plants, the car assembly process takes 30 hours on average. From the floor plate to a complete vehicle, production generates extensive data sets, useful in improving the cycle.
For instance, the plant marks all metal sheets with lasers. The engraved codes allow for processing-stage tracking, aggregating details and parameters. As a result, the factory can cut down on the necessary inspections, as algorithms signal the need for part replacement, unburdening staff from constantly monitoring machinery condition. The manufacturing plant employs digital tools to supervise dust levels in the paint shop, test car key calibration, and perform other tasks.
New technology in the automotive industry doesn’t end with digital solutions applied to vehicles per se. Answering the changing needs of users (and manufacturers, too) calls for employing top-notch tools.
Digital transformation in the automotive industry includes software solutions aimed at service improvement. More and more people living in big cities depart from owning cars in favor of alternative options. When a car is necessary, shared mobility companies give a helping hand, providing ready-to-drive vehicles to be used only when they’re needed. Repairs, check-ups, car insurance? Users don’t need to bother with these aspects, as the service provider takes care of it.

Shared mobility solutions entail managing extensive data sets to understand their customers’ behavior, forecast the demand for vehicles, plan their distribution across desired areas, and, as a result, enhance customer experience and satisfaction. Digital tools are there to analyze data, visualize it, and put it to good use, for example, parking shared vehicles at the right time and in the right place once such demand is recognized.
Employee training and knowledge retention in case of generational change is a vital matter across numerous industrial sectors. What is special about the automotive industry is the sudden need to train not only employees replacing the retiring generations, but also a new workforce specializing in electric vehicles (EV).
Even in the most advanced manufacturing plants, processes can be mundane or troublesome, burdening staff with excessive workload. On-site employee skill-building can be streamlined by adopting augmented reality solutions. As a result, staff undergo standardized training, and the process can be shortened and use fewer resources, e.g. trainers’ time. AR allows for precisely guided, step-by-step courses, overseeing results, and gathering data for future reference.
There are no indications that the car industry will turn back from its chosen path. The sector faces various challenges, and the automotive digital transformation is most probably the best shot companies can take at future-proofing their operations.
While not every challenge can be addressed directly with digital solutions, modern technology drives the automotive industry. From autonomous vehicles, through digital twins and predictive maintenance, to customized services, new technology in the automotive industry will continue to deliver the futuristic cars we get to ride, not just watch on the silver screen.
The undeniable impact of the newest technologies on cars has already reached our homes.
With over 50 countries manufacturing and assembling vehicles and millions of cars getting into the market annually, the range of possibilities to improve with digital technology will only grow.
Are you looking for a digital transformation partner for your automotive company? Let’s have a chat and see where we can get together with the help of technology.
* If it is not yet a broken record for you, it’s time to catch up! Below you’ll find our other articles on digital transformation that will lay solid foundations for today’s topic.
Just to clarify things: digital solutions don’t equal artificial intelligence by default. They can, but they don’t have to. To put it flatly, there is no need to look for AI solutions just for the sake of it. Sometimes simply switching widely used tools to e-tools will do the trick. However, in a lot of cases, artificial intelligence is the way to push the envelope and expand your business.
Before we get the bit between our teeth, let’s spell one thing out:
How to tell AI-powered solutions from the rest?
The easiest way to find out is to determine whether they aim to mimic intelligent human behaviour and solve problems unsolvable for traditional algorithms, the way people would. Also, through data processing and analysis, AI algorithms should be able to learn over time and get better at what they do.
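As a rough illustration of that distinction, compare a hand-written rule with a component that learns from labelled examples. The spam-filter scenario, the toy features, and the data below are entirely made up for the sake of the example; only the contrast between the two approaches matters.

```python
from sklearn.linear_model import LogisticRegression

# A fixed, hand-written rule behaves the same forever, no matter how much data it sees.
def rule_based_spam_filter(message: str) -> bool:
    return "free money" in message.lower()

# A learning component, by contrast, derives its behaviour from labelled examples.
texts = ["free money now", "meeting at 10", "win free money", "project status update"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam (illustrative data)

def featurize(text: str):
    # Trivial bag-of-words features: presence of "free" and "money".
    t = text.lower()
    return [int("free" in t), int("money" in t)]

model = LogisticRegression().fit([featurize(t) for t in texts], labels)
print(model.predict([featurize("claim your free money")]))  # learned decision, not a hard-coded one
```

The rule never changes; the model’s behaviour changes as soon as you retrain it on new examples, which is, in a nutshell, the “learning” part that separates AI-powered solutions from traditional algorithms.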
Now it’s time to put all the above into practice and show you AI-based digital transformation in action. To organise it neatly, we’ve divided the topic into five areas.

We use artificial intelligence to detect, recognize, and identify the contents of photos and videos. Depending on the business needs and areas to be digitalized, AI focuses on:
We’d like to point out that computer vision is widely used in manufacturing quality control, in algorithms that don’t use AI at all. Computer vision with AI is needed in cases where conventional CV can’t figure it out, such as telling air bubbles from bacteria colonies grown on Petri dishes.
With natural language processing (NLP) algorithms, digital systems can identify, understand, and analyse human language. We would like to flag up the fact that it is still one of the most challenging areas of AI and the systems don’t work perfectly. However, the new Generative Pre-trained Transformer 3 (GPT-3) seems to do the trick.
With NLP, we can speed up a lot of tasks, such as:
Every day, your business gathers a mass of data: on your customers and their journey, operations, employee effectiveness, etc. Data science aims at uncovering intricate patterns that can help businesses to improve their processes, and eventually grow. The areas worth mentioning are:
Similarly to the case of computer vision, we need to emphasise that not all data science mechanisms use artificial intelligence by definition. DS involves a lot of conventional statistics before it needs to reach for AI-based algorithms.
You can use predictive modelling to forecast events, customer behaviour, or market changes. Instead of analysing historical and current internal/external data manually, algorithms can do that effectively, speedily, and, most importantly, in real-time. A couple of usage examples:
Sound identification algorithms might seem less spectacular, and their use limited, compared to the above examples. Still, you can use them successfully in process digitalization:
As proven with the numerous examples above, artificial intelligence plays a significant role in digital transformation. It takes operations, customer support, and daily work to a whole new level and makes businesses immune, or at least prepared, to unexpected events. Want to try AI for yourself? We’ll be happy to help (so make sure to contact us – we’ll take you for a test drive!).
On many occasions before we’ve mentioned that – in the perfect world – the transformation…

There’s no time to waste in the global market – grab your processes and get in the car, we’re going to transform.
We’ve already focused our attention on reducing costs thanks to digital solutions, and this time we’d like to show you how to improve process effectiveness. With digital transformation, of course.
It may require a bit more than just a hunch to identify the right area for improvement. Among tools helpful in assessing them we can list:
Business operations generate a lot of data. Applying statistical and/or logical techniques allows us to evaluate the bigger picture emerging from it. Tools like operational surveys, process mapping, and cause analyses enable the precise identification of bottlenecks and trouble spots.
The examination of a company’s reports and books can give unambiguous answers on areas for improvement, potential pain points, and risks. Audit results should allow for preparing a strategy on the necessary process improvement steps and prioritization of particular stages.
Business shouldn’t change things just for the sake of changing them; that’s why indicators are necessary. These metrics measure the performance of the investigated processes and help assess the results of the actions taken.
Benchmarks are reference points against which the taken measures and their results are compared. Depending on needs and particular processes, benchmarks can apply to the competition, industry standards, and trends. Setting reference points helps assess the performance and identify further deficiencies to address.
Inefficient processes are often rooted in similar causes, including fear of innovation, the force of habit, and the consequent attachment to outdated solutions.
Faulty processes are an Achilles’ heel of the organization. While temporary setbacks may happen everywhere, once a problem persists, permanent damage to the company’s operational efficiency may occur.
Improving efficiency with digital solutions has many faces. Sometimes, it may mean as much as introducing better channels of communication. In an organization where employees spend too much time writing emails, making phone calls, or sitting in meetings, the process can be streamlined with digital communication tools.
In another case, process effectiveness can be improved with automation. When operations require staff to perform repetitive tasks, ceding work to technology can free the workforce to focus on other, more demanding activities. This doesn’t apply only to fields like industrial manufacturing, which are more associated with the newest technologies. Administrative and office tasks can equally well be automated. Printing, scanning, archiving extensive binders full of files? That’s a waste not just of paper and storage space, but of the most valuable resource – time.
As the office case shows, improving processes doesn’t need to mean full-blown digitization from day one. Digital transformation can be successfully handled in stages. Let’s take invoice processing as an example.

This way, effectiveness can be measured, personnel has time to adapt to new processes, and the structure functions smoothly without disruptions.
Usually, there are a few possible scenarios. Either the company has traditional processes and requires digitalization, or some operational areas are already digital (or partially digital) but unsuited to the needs. Or the process on the client’s side needs both digitalization and optimization. While each situation requires different solutions, the cooperation proceeds similarly in all of them.
Starting with the overarching question (how could you improve a process with digital transformation?), we begin with an analysis of the company’s operations. Only after getting to know its needs and requirements do we choose, together, the processes for digitalization. While drafting a strategy, we agree on the subsequent stages to follow during its implementation. As digital transformation is not an all-or-nothing undertaking, we decide on MVPs and the criteria for their assessment.
Before we introduce improvements into the company’s structure, it’s time to engage its personnel. Changes not only need to be announced; the staff should also – and in many cases can – be involved in the process. While data gives valuable insights into core operations, asking daily users for their feedback and ideas on what to improve can shorten the time needed to come up with a working solution.
Without knowing where we’re heading, it would be hard to decide if we’ve made it there; thus, measuring the digital transformation requires adequate metrics. This is also the time to consider A/B testing. Split testing of two or more variants helps us assess which version performs better. If the existing method falls short compared to the available alternatives, it’s time to consider improvements. Each improvement can undergo A/B testing again until the right solution is in place.
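If you’d like to see what “assessing which variant performs better” can boil down to, here is a minimal sketch of a two-proportion z-test. The conversion numbers are invented for illustration, and the 5% significance threshold is just the usual convention rather than a rule we impose.

```python
from math import sqrt
from statistics import NormalDist

# Illustrative numbers only: conversions observed for two variants of a digitalized process.
conversions_a, visitors_a = 120, 2400   # variant A: current process
conversions_b, visitors_b = 156, 2400   # variant B: improved process

p_a = conversions_a / visitors_a
p_b = conversions_b / visitors_b
p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)

# Two-proportion z-test: is the difference between the variants statistically significant?
standard_error = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
z = (p_b - p_a) / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"A: {p_a:.1%}, B: {p_b:.1%}, z = {z:.2f}, p-value = {p_value:.4f}")
# With the conventional 5% threshold, roll out variant B only if p_value < 0.05.
```

In practice, you would also settle on the metric and the sample size before the test starts, so the comparison isn’t skewed by peeking at the results too early.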
Digitalization doesn’t end with taking the nearest technology and throwing it into the middle of a working structure. Changes resulting from the transformation can take more time and thus require proper management. We say this to emphasize that most technologies can’t be treated as miracle cures and left to do wonders unsupervised. It’s the other way around. Once the digital enhancements aimed at improving process effectiveness are ready for operation, it’s time to observe, measure, and, if necessary, improve the solution. The strategy for improving processes can change in the course of action and should therefore be kept under review.
Not necessarily, no. Beginnings may be challenging, especially for organizations that have been functioning traditionally. It does however get better with time. What we wholeheartedly advise is to engage in the transformation process with moderation and avoid a hype-driven revolution.
Digitalizing a single, standalone area within the enterprise is a good starting point. Turning the whole structure upside down will, in most cases, cause chaos. Instead, minor changes can bring significant improvements – when carried out properly. In addition, going for the low-hanging fruit can be a great start to the transformation. Identifying processes that are easily digitalized and produce excellent results will encourage the company to move on to the next stages.
Some distinguish digital transformation from digital improvement, but when it comes to tangible results, we won’t argue about semantics. True, some changes may not seem spectacular when viewed from the outside, yet they bring satisfactory results. Even if a digital enhancement seems too isolated to count as a full-blown transformation, it’s the result that matters.

Change is always burdened with risk, and so is the implementation of new solutions. Proper preparation is necessary, not only in terms of the process itself but also of the staff. What needs to be emphasized is that introducing technology into operations shouldn’t be feared by personnel. Actions ceded to digital solutions will unburden the workforce and allow resources to be reallocated within the company. As a result, staff can take on more demanding tasks instead of continuing repetitive work.
Digital transformation is not an extreme home makeover. It doesn’t happen overnight, giving instant and spectacular results. But let us tell you something – it’s better this way. Improving process effectiveness with digital transformation requires attention, analysis, and expertise, and as such – brings tangible, lasting results.
A well-thought-out and well-informed change is certainly worth a shot in the dynamically changing world. Do you want to learn more? You can drop us a line and book your one-hour free consultation, where we’ll delve into your needs and what our team can do to be the digital transformation partner you need. Up for some more reads? See what we have in store for you and find out more about whether your company is ready for digital transformation, what digital transformation actually is, and how it turns out in particular industries, like the healthcare field.