Articles – NeuroSYS: AI & Custom Software Development
https://dev.neurosys.com

Elasticsearch – search optimization ideas 2
https://dev.neurosys.com/blog/elasticsearch-search-optimization
Fri, 07 Oct 2022

In the previous article on Elasticsearch, we laid out essential facts about the engine’s mechanism and its components. This time, we would like to share some ideas you may come across, or want to experiment with, to boost your search performance. For the purposes of this article, we’ve conducted experiments to illustrate the concepts and provide you with ready-to-use code. Furthermore, based on our previous research experience in the NLP domain, we will also explain what we found most beneficial when optimizing a search engine.

Experiments setup

To demonstrate the optimization ideas better, we have prepared two Information Retrieval datasets.

  1. The first one is based on SQUAD 2.0 – a Question Answering benchmark that, when adjusted properly, can also be used to validate Information Retrieval (IR). We’ve extracted 10,000 random documents and 1,000 related questions, treating each SQUAD paragraph as a separate index document.

It is worth mentioning that Elasticsearch is designed by default for much larger collections (millions of documents). However, we found the limited SQUAD version faster to compute and well-generalizing.

  2. The second benchmark is custom-prepared and much smaller. We used it to show how Elasticsearch behaves on smaller indices and how the same query types act differently on other datasets. The dataset is based on transcriptions of Stanford University’s SWIFT UI course lectures, which can be found here. We’ve split the first seven lectures into 185 smaller documents of 25 sentences each, with a five-sentence overlap. We have also prepared 184 questions with multiple possible answers.

SQUAD paragraphs come from Wikipedia, so the text is concise, well-written, and unlikely to contain errors. Meanwhile, the SWIFT UI benchmark consists of texts from recorded speech – it is more vivid and less concrete, but still grammatically correct. Moreover, it is rich in technical, software-engineering-oriented vocabulary.

Comparison of two custom benchmarks used for purposes of this article

For validation of the Information Retrieval task, the MRR (mean reciprocal rank) or MAP (mean average precision) metrics are usually used. We use them on a daily basis as well; however, for the purposes of this article, to simplify the interpretation of outcomes, we have chosen more straightforward ones – the ratio of answered questions within the top N hits: hits@10, hits@5, hits@3, hits@1. For implementation details, see our NeuroSYS GitHub repository, where you can also find our MAGDA library and code for other articles.
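As a minimal sketch of the metric (not the exact evaluation code from the repository), hits@k can be computed as follows, assuming each query comes with the set of IDs of documents that answer it:

```python
def hits_at_k(results, relevant, k):
    """Fraction of queries answered within the top-k retrieved documents.

    results:  list of ranked document-ID lists, one per query
    relevant: list of sets of relevant document IDs, one per query
    """
    answered = sum(
        1 for ranked, rel in zip(results, relevant)
        if any(doc_id in rel for doc_id in ranked[:k])
    )
    return answered / len(results)

# Example: 2 queries; the first is answered at rank 1, the second at rank 3.
results = [["d1", "d7", "d4"], ["d9", "d2", "d5"]]
relevant = [{"d1"}, {"d5"}]
print(hits_at_k(results, relevant, 1))  # 0.5
print(hits_at_k(results, relevant, 3))  # 1.0
```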

Idea 1: Check the impact of analyzers on IR performance

As described in the previous article, we can use a multitude of different analyzers to perform standard NLP preprocessing operations on indexed texts. As you may recall, analyzers are by and large a combination of tokenizers and filters, used to store terms in the index in an optimally searchable form. Hence, experimenting with filters and tokenizers should probably be the first step towards optimizing your engine’s performance.

To confirm the above statement, we present validation results of applying different analyzers to the limited SQUAD documents. Depending on the operations performed, the effectiveness of the search varies significantly.

We provide the results of experiments carried out using around 50 analyzers on the limited SQUAD sorted by hits@10. The table is collapsed for readability purposes; however, feel free to take a look at the full results and code on our GitHub.

Performance on limited SQUAD dataset with the use of different filters and tokenizers. Full table can be found on GitHub

Based on our observations of multiple datasets, we present the following conclusions about analyzers, which, we hope, will be helpful during your optimization process. Please bear in mind that these tips may not apply to all language domains, but we still highly recommend trying them out by yourselves on your datasets. Here is what we came up with:

  • stemming provides significant improvements,
  • stopwords removal doesn’t improve performance too much,
  • the standard tokenizer usually performs best, while the whitespace tokenizer does the worst,
  • usage of shingles (word n-grams) does not provide much improvement, while char n-grams can even decrease the performance,
  • simultaneous stemming while keeping the original words in the same field performs worse than bare stemming,
  • Porter stemming outperforms Lovins algorithm.

It is also worth noting that the default standard analyzer, which consists of a standard tokenizer, lowercase, and stop-words filters, usually works quite well as it is. Nevertheless, we were frequently able to outperform it on multiple datasets by experimenting with other operations.
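For illustration, here is a sketch of how a custom analyzer combining the standard tokenizer with lowercasing and Porter stemming could be defined at index creation. It uses the official Python client (v8-style API); the index, analyzer, and field names are our own:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

settings = {
    "analysis": {
        "analyzer": {
            "stemmed_analyzer": {            # arbitrary name
                "type": "custom",
                "tokenizer": "standard",      # built-in standard tokenizer
                "filter": ["lowercase", "porter_stem"],
            }
        }
    }
}
mappings = {
    # Index the main text field with the custom analyzer.
    "properties": {"text": {"type": "text", "analyzer": "stemmed_analyzer"}}
}
es.indices.create(index="squad_stemmed", settings=settings, mappings=mappings)

# The _analyze API shows which terms would actually land in the inverted index:
tokens = es.indices.analyze(
    index="squad_stemmed",
    analyzer="stemmed_analyzer",
    text="Running searches efficiently",
)
print([t["token"] for t in tokens["tokens"]])  # e.g. ['run', 'search', 'effici']
```

The _analyze endpoint is particularly handy here: it lets you inspect what each analyzer variant does to your texts before you commit to reindexing anything.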

Idea 2: Dig deeper into the scoring mechanism

As we know, Elasticsearch uses Lucene indices for sharding, which works in favor of time efficiency but can also give you a headache if you are not aware of it. One of the surprises is that Elasticsearch calculates scores separately for each shard. This can affect the search results if too many shards are used; in consequence, the results can turn out to be non-deterministic between indexations.

Inverse Document Frequency is an integral part of BM25 and is calculated per term within each shard, since sharding puts documents into separate buckets. Therefore, the more shards we have, the more the search score may differ for particular terms.

Nevertheless, it is possible to force Elasticsearch to calculate the BM25 score across all shards together, treating them as if they were a single, big index – at a considerable cost in search time. If you care more about consistency and reproducibility than about speed, consider using Distributed Frequency Search (DFS), which aggregates the BM25 statistics globally, regardless of the number of shards.
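In the API, this corresponds to the dfs_query_then_fetch search type, set per request rather than per index. A sketch, assuming the `es` client and index from the earlier example:

```python
# dfs_query_then_fetch first gathers global term statistics from all shards,
# then scores documents against them, so scores no longer depend on how
# documents happen to be distributed (at the cost of an extra round trip).
response = es.search(
    index="squad_stemmed",
    query={"match": {"text": "who wrote the declaration of independence"}},
    search_type="dfs_query_then_fetch",
)
print([hit["_id"] for hit in response["hits"]["hits"]])
```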

We have presented the accuracy of the Information Retrieval task in the table below. Note: our intention was to focus on the accuracy of the results, not on how fast we managed to acquire them.

SWIFT UI – the impact of shard count and Distributed Frequency Search

It can be clearly seen that the accuracy fluctuates when changing the number of shards. It can also be noted that the number of shards does not affect the scores when using DFS.

However, with a large enough dataset, the impact of shards lessens. The more documents in an index, the more the IDF part of BM25 becomes normalized across shards.

Impact of shard count for different index sizes

In the table above, you can observe that the impact of the shards (the relative difference between DFS and non-DFS scores) decreases as more documents are indexed. Hence, the problem is less painful when working with more extensive collections of texts – although in such cases we are also more likely to need more shards for time performance. When it comes to smaller indices, we recommend keeping the shard count at the default value of one and not worrying about the shard effect too much.

Idea 3: Check the impact of different scoring functions

BM25 is a well-established scoring algorithm that performs great in many cases. However, if you would like to try out other algorithms and see how well they do in your language domain, Elasticsearch allows you to choose from a couple of implemented functions or to define your own if needed.

Even though we do not recommend starting optimization by changing the scoring algorithm, the possibility remains open. We would like to present results on SQUAD 10k with the use of the following functions:

  • Okapi BM25 (default),
  • DFR (divergence from randomness),
  • DFI (divergence from independence),
  • IB (information-based),
  • LM Dirichlet,
  • LM Jelinek Mercer,
  • a custom TFIDF implemented as a scripted similarity function.
Impact of similarity function on limited SQUAD dataset – sorted by hits@10
Impact of similarity function on SWIFT UI dataset – sorted by hits@10

As you can see in the case of the limited SQUAD, the BM25 turned out to be the best-performing scoring function. However, when it comes to SWIFT UI, slightly better results can be obtained using the alternative similarity scores, depending on the metric we care about.
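For reference, a non-default similarity is declared in the index settings and attached to a field in the mapping. A sketch with illustrative parameter values, assuming the `es` client from before:

```python
settings = {
    "index": {
        "similarity": {
            "my_dirichlet": {            # arbitrary name
                "type": "LMDirichlet",   # one of the built-in similarity types
                "mu": 2000,              # illustrative value
            }
        }
    }
}
mappings = {
    "properties": {"text": {"type": "text", "similarity": "my_dirichlet"}}
}
es.indices.create(index="squad_lm", settings=settings, mappings=mappings)
```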

Idea 4: Tune Okapi BM25 parameters

Staying on the scoring topic, there are a couple of BM25 parameters whose values can be changed. However, as with choosing other scoring functions, we do not recommend changing these parameters as the first step of optimization.

The parameters and their default values are:

  • b = 0.75 – term frequency normalization coefficient based on the document length,
  • k1 = 1.2 – term frequency non-linear normalization coefficient.

They usually perform best across multiple benchmarks, which we’ve confirmed as well in our tests on SQUAD.

Keep in mind that despite the defaults being considered most universal, it doesn’t mean you should ignore other options. For example, in the case of the SWIFT UI dataset, other values performed better by 2% on the top 10 hits.

Impact of BM25 parameters on SWIFT UI sorted by hits@10 – a complete table to be found on GitHub

In this case, the default parameters turned out to be again the best for SQUAD, while SWIFT UI would benefit more from other ones.
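Tuning k1 and b is again just a matter of declaring a custom similarity in the index settings; the values below are illustrative, not a recommendation:

```python
settings = {
    "index": {
        "similarity": {
            "tuned_bm25": {      # arbitrary name
                "type": "BM25",
                "k1": 1.6,       # illustrative value (default: 1.2)
                "b": 0.6,        # illustrative value (default: 0.75)
            }
        }
    }
}
mappings = {"properties": {"text": {"type": "text", "similarity": "tuned_bm25"}}}
es.indices.create(index="squad_bm25_tuned", settings=settings, mappings=mappings)
```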

Idea 5: Add extra data to your index with custom filters

As already mentioned, there are plenty of NLP techniques with which text can be enriched. We would like to show you what happens when we decide to add synonyms or other word derivatives, like phonemes.

For the implementation details, we once again encourage you to have a glimpse at our repository.

Synonyms

When wondering how to make our documents more descriptive or easier to query, we may try to extend the wording used in them. However, this must be done with great care. Blindly adding more words to documents may dilute their meaning, especially when it comes to longer texts.

Automatic – WordNet synonyms

It is possible to automatically extend our inverted index with additional words, using synonyms from the WordNet synsets. Elasticsearch has a built-in synonyms filter that allows for easy integration.
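A sketch of both variants – WordNet synonyms loaded from a file on the Elasticsearch node, and a small hand-curated list; the file path and synonym pairs are placeholders:

```python
settings = {
    "analysis": {
        "filter": {
            # Variant 1: WordNet synsets loaded from a file on the ES node.
            "wordnet_synonyms": {
                "type": "synonym",
                "format": "wordnet",
                "synonyms_path": "analysis/wn_s.pl",  # placeholder path
            },
            # Variant 2: a small, hand-curated list of domain synonyms.
            "domain_synonyms": {
                "type": "synonym",
                "synonyms": ["swiftui, swift ui", "ner, named entity"],
            },
        },
        "analyzer": {
            "synonym_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "domain_synonyms"],
            }
        },
    }
}
es.indices.create(index="squad_syn", settings=settings)
```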

Below, we’ve presented search results on both SQUAD and SWIFT UI datasets with and without the use of all available synonyms.

SQUAD – the impact of using all WordNet synonyms
SWIFT UI – the impact of using all WordNet synonyms

As can be seen, using automatic, blindly added synonyms reduced the performance drastically. With thousands of additional words, documents’ representations get overpopulated; thus they lose their original meaning. Those redundant synonyms may not only fail to improve documents’ descriptiveness, but may also harm already meaningful texts.

The impact of using the WordNet synonyms analyzer on terms count

The number of terms in the SWIFT UI dataset more than tripled when synonyms were used. This has very negative consequences for the BM25 algorithm. Remember that the algorithm penalizes lengthy texts, hence documents that were previously short and descriptive may now rank significantly lower in your search results.

Meaningful synonyms

Of course, using synonyms is not always a poor idea, but it might require some actual manual work.

  1. Firstly, using spaCy, we’ve extracted 50 different Named Entities from the Swift programming language domain used in the SWIFT UI dataset.
  2. Secondly, we’ve found synonyms for them manually. As our simulation does not require the use of actual, existing words, we have simply used random ones as substitutes for the business entities.
  3. Finally, we have replaced occurrences of the Named Entities in questions with the selected word equivalents from the previous step, and added a list of the synonyms to the index with the synonym_analyzer.

Our intention was to create a simulation with certain business entities to which one can refer in search queries in many different ways. Below you can see the results.

Performance impact of using synonyms of business entities 

Search performance improves with the use of manually added synonyms. Even though the experiment was carried out on a fairly small sample, we hope it illustrates the concept well – you can benefit from adding some meaningful word equivalents if you have proper domain knowledge. The process is time-consuming and can hardly be automated; however, we believe it is often worth the invested time and effort.

Impact of phonemes

It should be noted that, when working with ASR (automatic speech recognition) transcriptions, many words can be recognized incorrectly. Transcriptions are subject to numerous errors since some phrases and words sound alike, and non-native speakers may additionally mispronounce words.

To use a phonetic tokenizer, a special plugin (analysis-phonetic) must be installed on the Elasticsearch node.

The sentence “Tom Hanks is a good actor as he loves playing” is represented as:

  • [‘TM’, ‘HNKS’, ‘IS’, ‘A’, ‘KT’, ‘AKTR’, ‘AS’, ‘H’, ‘LFS’, ‘PLYN’], when using Metaphone phonetic tokenizer,

and

  • [‘TM’, ‘tom’, ‘HNKS’, ‘hanks’, ‘IS’, ‘is’, ‘A’, ‘a’, ‘KT’, ‘good’, ‘AKTR’, ‘actor’, ‘AS’, ‘as’, ‘H’, ‘he’, ‘LFS’, ‘loves’, ‘PLYN’, ‘playing’], when using both phonemes and original words simultaneously.
Results of phonemes filter usage on SQUAD sorted by hits@10 – a complete table can be found on GitHub 

We’ve come to the conclusion that using phonemes instead of the original text does not yield much improvement on high-quality, non-ASR datasets like SQUAD. However, indexing phonemes and the original text in separate fields, and searching by both of them, slightly increased the performance. In the case of SWIFT UI, the quality of the transcriptions is surprisingly good even though the text comes from ASR, so the phonetic tokenizer does not help there either.

Note: It might be a good idea to use phonetic tokenizers when working with more corrupted transcriptions, when the text is prone to typos and errors.
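For completeness, a sketch of a Metaphone analyzer built on the plugin mentioned earlier (it requires analysis-phonetic to be installed on the node; replace=False keeps the original words next to their phonemes):

```python
settings = {
    "analysis": {
        "filter": {
            "metaphone_filter": {
                "type": "phonetic",
                "encoder": "metaphone",
                "replace": False,  # emit phonemes alongside the original tokens
            }
        },
        "analyzer": {
            "phonetic_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "metaphone_filter"],
            }
        },
    }
}
es.indices.create(index="squad_phonetic", settings=settings)
```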

Idea 6: Add extra fields to your index

You might come up with the idea of adding extra fields to the index and expecting them to boost search performance. In Data Science this is called feature engineering – the ability to derive more valuable and informative features from the available attributes. So, why not try deriving new features from text and indexing them in parallel as separate fields?

In this little experiment, we wanted to check whether the above idea makes sense in Elasticsearch, and how to achieve it. We’ve tested it by indexing keywords, Named Entities (NERs), and word lemmas as separate fields alongside the original text.

Note: The named entities, as well as keywords, are excerpts already existing in the text, only extracted to separate fields. In contrast, lemmas are additionally processed words; they provide more information than is available in the original text.

Additional fields indexing impact on limited SQUAD performance. Results are sorted by hits@10, multi-match query with cross-fields sub strategy used. More results can be found on GitHub

While we were conducting the experiments, we discovered that, in this case, keywords and NERs did not improve the IR performance. On the contrary, word lemmatization seemed to provide a significant boost.

As a side note, we have not compared lemmatization with stemming in this experiment. It’s worth mentioning that lemmatization is usually much trickier and can perform slightly worse than stemming. For English, stemming is usually enough; however, in the case of other languages, simply cutting off suffixes will not suffice.

Based on our experience, we can also say that indexing parts of the original text without modifications, and putting them into separate fields, doesn’t provide much improvement. In fact, BM25 does just fine with keywords or Named Entities left in the original text, and thanks to the algorithm’s formula, it knows which words are more important than others, so there is no need to index them separately.

In short, it seems that fields providing some extra information (such as text title) or containing additionally processed, meaningful phrases (like word lemmas) can improve search accuracy. 
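A sketch of how such a lemma field could be produced and indexed alongside the original text, assuming spaCy’s small English model is installed (field and index names are our own):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def index_with_lemmas(es, index, doc_id, title, text):
    """Index the original text together with a derived lemma field."""
    lemmas = " ".join(tok.lemma_ for tok in nlp(text) if not tok.is_punct)
    es.index(
        index=index,
        id=doc_id,
        document={"title": title, "text": text, "lemmas": lemmas},
    )

index_with_lemmas(
    es, "squad_fields", "1",
    "Normans",
    "The Normans were the people who gave their name to Normandy.",
)
```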

Idea 7: Optimize the query

Last but not least, there are numerous options for creating queries. Not only can we change the query type, but we can also boost individual fields in an index. Next to analyzer tuning, we highly recommend experimenting with this step, as it usually improves the results.

We have conducted a small experiment in which we tested the following types of Elastic multi-match queries – best_fields, most_fields, cross_fields – on the fields:

  1. text – original text,
  2. title – the title of the document, only if provided,
  3. keywords – taken from KeyBERT,
  4. NERs – done via Huggingface 🤗 Transformers,
  5. lemmas – extracted by spaCy.

Alongside, we have boosted each field, varying its weight from the default value of 1.0 up to 2.0 in increments of 0.25.
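In query terms, a single combination from this grid could look as follows (the boost values after ^ are illustrative):

```python
response = es.search(
    index="squad_fields",
    query={
        "multi_match": {
            "query": "who conquered england in 1066",
            "type": "cross_fields",  # or "best_fields" / "most_fields"
            "fields": ["text", "title^1.5", "lemmas^1.25", "keywords", "ners"],
        }
    },
)
```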

Usage results of different multi-match query subtype and fields weighing on limited SQUAD sorted by hits@10 – a complete table can be found on GitHub

As shown above, the results on the SQUAD dataset, despite its limited size, indicate that queries of the cross_fields type provided the best results. What should also be noted is that boosting the title field was a good choice, as in most cases it already contained important and descriptive data about the whole document. We’ve also observed that boosting only the keywords or NER fields gives the worst results.

However, as often happens, there is no single, universal choice. When experimenting with SWIFT UI, we found that the title field is less important in this case, as it is often missing or contains gibberish. Also, when it comes to the query type, while cross_fields usually appears at the top, there are plenty of best_fields queries with very similar performance. In both cases, most_fields queries usually land somewhere in the middle.

Keep in mind that it will most likely all come down to a per-dataset analysis, as each dataset is different and other rules may apply. Feel free to use our code, plug in your dataset, and find out what works best for you.

Conclusion

Compared to deep learning Information Retrieval models, full-text search still performs very well in plenty of use cases. Elasticsearch is a great and popular tool, so you might be tempted to start using it right away. However, we encourage you to read up a bit first and then optimize your search performance deliberately. This way, you will avoid digging yourself into a wrong-usage hole and then struggling to climb out of it.

We highly recommend beginning with analyzers and query optimization. By utilizing the ready-to-use NLP mechanisms in Elastic, you can significantly improve your search results. Only then should you proceed to more sophisticated or experimental ideas like scoring functions, synonyms, or additional fields.

Remember, it is crucial to apply methods appropriate to the nature of your data and to use a reliable validation procedure, adapted to the given problem. In this subject, there is no “one size fits all” solution.

What is a custom web application? And what it certainly is not
https://dev.neurosys.com/blog/what-is-custom-web-application-development
Mon, 19 Sep 2022

    Do you happen to hear about custom web applications at every turn? Are there really no other apps in the tank? Let’s find out together what the fuss is all about. 

    What is not a custom web application?

    It might seem we’re taking it backward, but somehow it looks simpler to start this way. So, firstly, let’s rule out two types of non-custom web apps. 

    Off-the-shelf apps

    Off-the-shelf web applications are the ones that you buy as finished, ready-made products. Usually, you can label them with your brand or integrate them with your digital product. But, by and large, you can’t modify them – or only to a limited extent – not to mention adding new functionalities, which is way beyond the scope. Instead, you take the app as it is served.

    Ready-made apps are sold to many companies in the same form, often in a subscription-based SaaS model. And it comes with a price. 

    Customized web app

    Yet another type of web application is a customized or customizable app. These products can be personalized according to the client’s needs but usually aren’t built from scratch. Our learning management system Samelane is a good example. It comes as a ready-made package but we also can, and often do, customize it for particular clients, adding dedicated features and functionalities. 

    When is the off-the-shelf better? 

    Just to flag it up: sometimes building a custom app would be beating a dead horse – mainly if you’re working on a low-scale product, an app analogous to many others on the market, or if you have a low budget and prefer to pay a monthly fee rather than spend loads on development. What is more, an off-the-shelf app is ready and can be used in no time.

    There are out-of-the-box solutions that can serve your purpose, such as ready-made CRM and CMS systems, e-commerce engines, booking systems, and chats. Hence, it might be more economical to incorporate them into your app instead of developing a new one. 

    And even when we need a custom app because the off-the-shelf option demonstrates notable gaps, it’s worth calculating whether it wouldn’t be more cost-effective to simply live with them. Sometimes adjusting internal processes is more efficient than building a brand-new dedicated app. Thus, a custom web application development strategy will help you decide which option fits the bill.

    Web application types

    The definition of a custom web app

    Let us dissect the frog here and assess each of the term’s components. Then, based on that, you will be able to appraise every digital product.

    Custom 

    It means the product is built according to the unique requirements and business goals,  which translate into features, design, user experience, etc. 

    Web 

    It means that the product is accessed via the Internet, through a browser, so users don’t have to download, update, or configure it to enjoy all the features.

    App 

    The product provides particular functionalities and two-way interaction, unlike informational websites, which mainly present data.

    Summing up, a custom web app is a unique digital product accessed via the Internet that provides its users with functionalities and allows for interaction.

    The benefits of building a custom product 

    Custom web apps are highly desirable when you need:

    Unique features 

    Custom development allows you to craft products like no other. You’re totally free to choose or design its functionalities and UX/UI. Provided, of course, that your unique selling point depends on them.

    Scalability

    Building a custom product, you’re free to add new features or resources (such as cloud storage) along with your company growth. It is less costly than upgrading the off-the-shelf app licenses. Also, you can integrate your app with other systems in the future, and you’re ready for that.  

    Independence 

    An app built from scratch frees you from external providers and the changes they introduce, among others in pricing. Not only that – you can also opt for an on-premise installation when your security policies prohibit cloud solutions.

    Reusability

    If you intend to build the next digital product sooner or later, with at least partially similar functionalities, your backend code will be fully reusable. As a consequence, the development will be faster and more cost-effective.

    As with tailored suits, building a custom digital product is more time-consuming and costly (in the short run). But at the same time, it fits your needs better, and the initial cost pays off later.

    The benefits of a custom web app

    Custom web application development techniques

    It’s time to discuss the stages and methods that lead you from the grand idea to the final product. Just as Rome wasn’t built in a day, a custom web application won’t be either; in fact, there is no such thing as a final product in the case of apps. You will most certainly add functionalities and improve the app with time, based on actual user data. You also have to keep tabs on the market and competitors’ activity to keep up and introduce adjustments. There is no time to rest on your laurels.

    Ideation

    First things first, you need a bright product idea. You don’t want to build a run-of-the-mill product, do you? Thus, it has to be an informed decision on how to position yourself in the market. To do so, you will need to: 

    • Generate the idea: source and develop a clear vision of your product goals and main functionalities.
    • Conduct market research: prepare competition analysis and assess their strengths and weaknesses. 
    • Carry out user research: collect user demographics, needs, and problems your product will solve.
    • Map out a business strategy: list a business model, business priorities, short-term and long-term goals, and a plan for how to achieve them. You might feel like a kid in a candy store when it comes to a multitude of possibilities, but you need to keep to your priorities – business strategy is always there to help you. 

    Planning

    So now you know your competition and have a concrete idea of what you want to build. The next step is to put it into action. What you need is a plan that includes: 

    • Setting milestones.
    • Thinking about what your MVP should look like and what comes next after building it.
    • Specifying the architecture.
    • Deciding on a tech stack: tools, frameworks, and programming languages.
    • Defining your web app’s functionalities and workflows.
    • Detailed planning of the next sprints if you work in agile.

    Design

    Now it’s time to work on the visual aspects of your app which translates to UX/UI design. The techniques worth mentioning here are: 

    • Wireframing 
    • Prototyping 
    • Creating mockups
    • Branding: fonts, colours, button groups, general look & feel 
    • User and usability testing

    Development 

    Not to delve into too much detail here; your development team can start working on the app code when the design part is ready. At this stage, it is finally brought to life. 

    Tests and deploy 

    Although testing should be done regularly within the development phase, before the product is launched, you should verify if it does the job, as simple as that. Thus, your quality assurance (QA) team has to test the outcome thoroughly in terms of: 

    • Code quality 
    • Usability 
    • Performance, including stress testing
    • Security 

    Remember that the devil is in the detail, so don’t turn a blind eye to any bug! Do not fear, though, because we’ve got an article on tests ready for you. 

    When it comes to deployment, it’s time to work on the following matters: 

    • Where the app is hosted.
    • How the production, test, and development environments are built.
    • How installations and updates will work.
    • Monitoring and maintenance.
    • Crash scenarios.

    And here we go, you’re ready to push the boat out! 

    Wrap-up 

    Here’s a quick recap on the topic of custom web apps. They are unique products accessed via the Internet that do more than just inform users – their functionalities allow for interaction. A custom app’s main benefits are scalability, the freedom to develop it as you wish, and the independence it grants you.

    However, if building a custom web app gives you the willies, defer to the experts then. We’re always here to help, whatever stage of the app building you’re currently at. We would gladly discuss your project during our free consultations. Just let us know!

Elasticsearch – introduction to key concepts
https://dev.neurosys.com/blog/elasticsearch-introduction
Fri, 16 Sep 2022

      The ambition behind this article

      During our work at NeuroSYS, we’ve dealt with a variety of problems in Natural Language Processing, including Information Retrieval. We have mainly focused on deep learning models based on Transformers; however, Elasticsearch has often served us as a great baseline. We have been using this search engine extensively; thus, we would like to share our findings with you.

      But why should you read this if you can go straight to the Elasticsearch documentation? Don’t get us wrong – the documentation is an excellent source of information that we rely on every day. However, as documentation does, it needs to be thorough and include every bit of information on what the tool has to offer.

      Instead, we will focus more on NLP and practical aspects of Elasticsearch. We’ve also decided to split this article into two parts:

      1. Introductory part
        1. explaining main concepts,
        2. pointing out what we consider to be the most important,
        3. identifying the non-obvious things that might lead to errors or improper usage,
      2. Experimental part
        1. providing ready-to-use code,
        2. proposing some tips on optimization,
        3. presenting results of using different strategies.

      Even if you are more interested in the latter, we still strongly encourage you to read the introduction.

      In the following five steps, we reveal what we find to be the most important to start experimenting with your search results quality improvement.

      Step 1: Understand what Elasticsearch is, and what a search engine is

      Elasticsearch is a search engine used by millions to find query results in no time. Elastic has many applications; however, we will mainly focus on the aspect most crucial for us in Natural Language Processing – the functionality of so-called full-text search.

      Note: This article concentrates on the seventh major version of Elasticsearch; as of this writing, a more recent version 8 has already been released and comes with some additional features.

      Database vs. Elasticsearch

      But wait, isn’t a commonly used database designed to store and search for information quickly? Do we really need Elastic or any other search engine? Well, yes and no. Databases are great for fast and frequent inserts, updates, or deletes – unlike Data Warehouses or Elasticsearch.

      Elasticsearch vs Database, the difference

      Yes, that’s right, Elasticsearch is not a good choice when it comes to endless inserts. It’s often recommended to treat Elastic as “once built, never modified again.” It is mainly due to the way inverted indices work – they are optimized for search, not modification.

      Besides, databases and Elastic differ in their use case for searching. Let’s use an example for better illustration: imagine you run a library and have plenty of books in your collection. Each book can have numerous properties associated with it – for example the title, text, author, ISBN (a unique book identifier), etc. – which all have to be stored somewhere, most probably in some sort of database.

      Databases in Elasticsearch – use case

      When trying to find a particular book by a given author, such a query is likely to be fast – probably even faster if you create a database index on this field, as the data is then saved on disk in a sorted manner, which speeds up the lookup significantly.

      But what if you wanted to find all books containing a certain text fragment? In a database, we would probably reach for the SQL LIKE statement, possibly with some % wildcards.

      Soon, further questions come along:

      • What if you want to order the rows by how closely the text relates to what you queried for?
      • What if you have two fields for e.g. title and text, that you would like to include in your search?
      • What if you don’t want to search for the entire phrase but divide the query into separate words and accept hits containing only some of them? 
      • What if you want to reject the commonly occurring words in the language and consider only the relevant parts of your query?

      You can probably see how problematic dealing with the more complex search is when using SQL-like queries and standard databases. That’s the exact use case for a search engine.

      In short, if you want to search by ISBN, title or author, go ahead and use the database. However, if you intend to search for documents based on passages in a long text, at the same time focusing on the relevance of words, a search engine, Elasticsearch in particular, would be a better choice.
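      As a toy contrast (the table, index, and field names are made up, and an Elasticsearch Python client instance `es` is assumed), the two approaches might look like this:

```python
# Database approach: substring matching, no relevance ranking.
#   SELECT * FROM books WHERE text LIKE '%search engine%';

# Search-engine approach: tokenized matching with relevance scoring.
response = es.search(
    index="books",
    query={"match": {"text": "search engine"}},  # matches either term, ranks hits
)
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```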

      Elasticsearch deals with matching queries to documents’ texts through a multitude of query types, which we’ll expand on further. However, its most important feature is the inverted index, built on terms coming from tokenized and preprocessed original texts.

      Elasticsearch query types

      The inverted index can be thought of as a dictionary: we look up a word and get a matching description. In essence, it is a mapping from single words to whole documents.

      Given the previous example of a book, we would create an inverted index by taking the key words from the book’s content – or the ones that describe it best – and mapping them as a set/vector, which from then on would represent that book.

      So normally, when querying, we would have to go through each database row and check for some condition. Instead, we can break the query up into a tokenized representation (a vector of tokens) and simply compare this vector against the token vectors already stored. Thanks to that, we can also easily implement a scoring mechanism to measure how relevant each object is to the query.
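      A toy illustration of the idea in plain Python – deliberately simplified, as real engines also store positions, frequencies, and much more:

```python
from collections import defaultdict

docs = {
    1: "elasticsearch is a search engine",
    2: "databases are great for frequent inserts",
}

# Build the inverted index: term -> set of document IDs containing it.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        inverted[term].add(doc_id)

# Querying becomes a lookup plus a trivial overlap score.
query = "search engine inserts".split()
scores = defaultdict(int)
for term in query:
    for doc_id in inverted.get(term, ()):
        scores[doc_id] += 1

print(sorted(scores.items(), key=lambda kv: -kv[1]))  # [(1, 2), (2, 1)]
```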

      As a side note, it is also worth adding that each Elasticsearch cluster comprises many indices, which in turn contain many shards, also called Apache Lucene indices. In practice, Elasticsearch uses several of these shards at once to subset the data for faster querying.

      Elasticsearch clusters search engine

      Step 2: Understand when not to use Elasticsearch

      Elasticsearch is a wonderful tool; however, as with many tools, when used incorrectly it can cause as many problems as it solves. What we would like you to take away from this article is that Elasticsearch is not a database but a search engine and should be treated as such – meaning, don’t treat it as the only data storage you have. There are multiple reasons for that, but we think the most important ones are:

      1. Search engines should only care about the data that they actually use for searches. 
      2. When using search engines, you should avoid frequent updates and inserts.
      Elasticsearch search engine meme

      In re 1)

      Don’t pollute the search engine with stuff you don’t intend to use for searching. We know how databases grow and schemas change with time; new data gets added, causing more complex structures to form. Elasticsearch won’t be fine with that – therefore, a separate database, from which you can link additional information to your search results, might be a good idea. Besides, additional data may also influence the search results, as you will find out in the section on BM25.

      In re 2)

      Inverted indices are costly to create and modify, and new entries in Elasticsearch enforce changes in the inverted index. The creators of Elastic have thought of that as well: instead of rebuilding the whole index every time an update happens (e.g. 10 times a second), a separate small Lucene index – the lower-level mechanism Elastic builds on – is created and then merged with the main one. By default this happens every second, but the merge itself also needs time to complete, and it takes even more time when dealing with more replicas and shards.


Any extra data will make the process take even longer. For this reason, you should keep only essential search data in your indices. Also, don't expect the data to be immediately available: Elastic is not ACID compliant; it is more like a NoSQL datastore that focuses mainly on BASE properties.

      Step 3: Understand the scoring mechanism

      Okapi BM25

The terms stored in the index influence the scoring mechanism. BM25, the successor to TF-IDF, is the default scoring/relevance algorithm in Elasticsearch. We will not dive too deeply into the math here, as it would take up an article of its own. However, we will pick out the most important parts and try to give you a basic understanding of how it works.

$$\mathrm{score}(D,Q)=\sum_{i=1}^{n}\mathrm{IDF}(q_i)\cdot\frac{f(q_i,D)\cdot(k_1+1)}{f(q_i,D)+k_1\cdot\left(1-b+b\cdot\frac{\mathrm{fieldLen}}{\mathrm{avgFieldLen}}\right)}$$

      The equation might be a little confusing at first, but it becomes pretty intuitive when looking at each component separately.

• The first component is IDF(qi) – if you are comfortable with IDF (inverse document frequency), this one will look familiar. qi stands for each term from a query. Essentially, it penalizes terms that appear in many documents, based on how many documents contain them. We would rather take into account only the most descriptive words in a query and discard the rest.

      For example:

Elasticsearch is a cool search engine designed for fast querying

If we tokenized this sentence, we would expect words like Elasticsearch, search, engine, and querying to be more valuable than is, a, cool, designed, for, and fast, as the latter contribute less to the essence of the sentence.

• Another relevant factor is f(qi, D), the frequency of the term qi within the document D for which the score is being computed. Intuitively, the higher the frequency of query terms within a particular document, the more relevant that document is.
• Last but not least is the fieldLen/avgFieldLen ratio. It compares the length of a given document with the average length of all stored documents. Since it sits in the denominator, the score decreases as the document grows longer, and vice versa. So if you are seeing more short results than long ones, it is simply because this factor boosts shorter texts (see the toy implementation below).
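For intuition, here is a toy Python re-implementation that wires the three components together (this is our own sketch, not the engine's code; k1 = 1.2 and b = 0.75 are the defaults documented for Elasticsearch):

```python
import math

def bm25(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Toy BM25 over pre-tokenized documents (lists of terms)."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        containing = sum(1 for d in corpus if term in d)
        # IDF: penalizes terms that occur in many documents.
        idf = math.log(1 + (n_docs - containing + 0.5) / (containing + 0.5))
        tf = doc.count(term)  # f(qi, D)
        # Term-frequency saturation with length normalization
        # (the fieldLen/avgFieldLen ratio described above).
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
    return score

corpus = [["search", "engine"], ["database", "stores", "rows"]]
print(bm25(["search"], corpus[0], corpus))  # the matching document scores > 0
```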

      Step 4: Understand the mechanism of text pre-processing

      Analyzers

Probably the first question you need to ask when thinking about optimization is how the texts are preprocessed and represented within your inverted index. Elasticsearch offers many ready-to-use concepts taken from Natural Language Processing. They are encapsulated within so-called analyzers, which turn continuous text into the separate terms that get indexed. In layman's terms, an analyzer is a Tokenizer, which divides the text into tokens (terms), plus a collection of Filters, which do additional processing.

      We can use built-in Analyzers provided by Elastic, or define our own. In order to create a custom one, we should determine which tokenizer we’d like to use and provide a set of filters.
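As a sketch, a custom analyzer is declared in the index settings and then referenced from a field mapping. The index name, analyzer name, and filter choices below are our own illustrations, and the keyword-argument style assumes a recent (8.x) official Python client:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumes a local cluster

es.indices.create(
    index="articles",
    settings={
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",  # how the text is split into tokens
                    "filter": ["lowercase", "stop", "porter_stem"],  # extra processing
                }
            }
        }
    },
    mappings={
        "properties": {
            "content": {"type": "text", "analyzer": "my_analyzer"},
        }
    },
)
```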

We can apply three types of analyzers to a given field, which vary based on how and when they process text:

      • indexing analyzer – used during the document indexing phase,
• search analyzer – used to map query terms during a search, so they can be compared to the terms indexed in a field. Note: if we don't explicitly define a search analyzer, the indexing analyzer for that field is used by default,
      • search quote analyzer – used for strict search of full phrases

Usually, there is no point in applying a search analyzer different from the indexing analyzer. Additionally, if you would like to test analyzers yourself, it can easily be done via the built-in API or directly from the client library in the language of your choice.
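For example, a hedged sketch of such a test against the built-in _analyze API, reusing the index and analyzer defined above:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.indices.analyze(
    index="articles",        # the index that holds the custom analyzer
    analyzer="my_analyzer",
    text="Elasticsearch is a search engine",
)
print([token["token"] for token in resp["tokens"]])
```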

The built-in analyzers should cover the most commonly used indexing operations. If needed, you can also use analyzers made explicitly for a specific language, called Language analyzers.

      Filters

Despite their name, Filters not only perform token selection but are also responsible for a multitude of common NLP preprocessing tasks, such as:

      • stemming,
      • stopwords filtering,
      • lower/upper casing,
      • n-grams creation on chars or words.

However, they cannot perform lemmatization. Below, we've listed some of the most common filters; if you're interested, the complete list is available in the official Elasticsearch documentation.

      • shingle – creates n-grams of words,
      • n-gram – creates n-grams of characters,
      • stop-words – removes stopwords,
      • stemmer (Porter/Lovins) – performs stemming according to the Porter/Lovins algorithm,
      • remove_duplicate – removes duplicate tokens.

      Tokenizers

      They aim to divide the text into tokens according to a selected strategy, for example:

• standard_tokenizer – removes punctuation and breaks text on word boundaries,
• letter_tokenizer – breaks text on each non-letter character,
• whitespace_tokenizer – breaks text on any whitespace,
• pattern_tokenizer – breaks text on a specified delimiter, e.g., a semicolon or comma.

      In the diagram below, we present some exemplary analyzers and their results on the sentence “Tom Hanks is a good actor, as he loves playing.”

[Diagram: exemplary analyzers and the tokens they produce for the sentence above]

Each tokenizer operates differently, so pick the one that works best for your data; the standard analyzer, however, is usually a good fit for many scenarios.

      Step 5: Understand different types of queries

      Queries

Elasticsearch enables a variety of query types. The basic distinction we can make is whether we care about the relevance score or not. With this in mind, we have two contexts to choose from:

• query context – calculates the score: whether the document matches a query and how good the match is,
• filter context – does not calculate the score; it only determines whether a document matches the query.

      So, use a query context to tell how closely documents match the query and a filter context to filter out unmatched documents that will not be considered when calculating the score.
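In sketch form (the field names are illustrative), the two contexts can be combined in a single bool query; the dict below would be passed as the query argument of a search call:

```python
query = {
    "bool": {
        # Query context: contributes to the relevance score.
        "must": [{"match": {"content": "search engine"}}],
        # Filter context: a yes/no check, not scored (and cacheable).
        "filter": [{"term": {"language": "en"}}],
    }
}
```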

      Bool query

Even though we've already stated that we will mainly focus on text queries, it's essential to understand at least the basics of bool queries, since match queries boil down to them. The most significant aspect is the operator we decide to use. When creating queries, we often want to use logical expressions like AND, OR, and NOT. They are available in the Elasticsearch DSL (domain-specific language) as must, should, and must_not, respectively. Using them, we can easily describe the required logical relationships (see the sketch below).
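A minimal sketch of that mapping (field names are illustrative):

```python
query = {
    "bool": {
        "must": [{"match": {"title": "elasticsearch"}}],  # AND: required
        "should": [{"match": {"tags": "tutorial"}}],      # OR: optional, raises the score
        "must_not": [{"match": {"status": "draft"}}],     # NOT: excluded
    }
}
```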

      Full text queries

      These are the ones we are most interested in, since they are ideal for fields containing text on which an analyzer has been applied. It is worth noting that when querying each field during a search, the query text will also be processed with the same analyzer used for indexing the field.

      There are several types of FTS queries:

• intervals – uses rules for matching terms and allows for ordering them. This query excels at proximity searches: we are able to define an interval (hence the name) in which to look for terms. It is useful especially when we know that the searched terms will not necessarily occur together but may appear within a predefined distance of each other, or, on the contrary, when we want them to stay close together.
• combined_fields – allows for querying multiple fields and treating them as if they were combined, for example when querying for a first name and a last name that we want to be paired.
• query_string – a lower-level query syntax. It allows for creating complex queries using operators like AND, OR, and NOT, as well as querying multiple fields or using additions like wildcard operators.
• simple_query_string – a higher-level wrapper for query_string that is more end-user friendly.
      • match – “the go-to” choice for FTS, the subtypes are:
        • match_phrase – designed for “exact phrases” and word proximity matching,
  • multi_match – a match type that allows for querying multiple fields in a preferred manner.

      We will now focus on explaining Match based queries in more detail, as we find them versatile enough to do everything we need while being pretty quick to write and modify.

Match query – the standard for full-text searches, where the query is analyzed the same way as the field it is matched against. We find the following parameters to be the most important ones (a query sketch follows the list):

• Fuzziness – when searching for phrases, users can make typos. Fuzziness makes it possible to deal with such spelling errors quickly while still searching for similar words. It defines the accepted error rate for each word, interpreted as the Levenshtein edit distance. Fuzziness is an optional parameter and can take values such as 0, 1, 2, or AUTO. We recommend keeping the parameter at AUTO, since it automatically adjusts how many errors can be made per word depending on its length. The error distance is 0, 1, and 2 for word lengths of up to 2, 3-5, and over 5 characters, respectively. Note that if you decide to use synonyms in a field, fuzziness can no longer be used.
      • Operator – as mentioned above, a boolean query is constructed based on the analyzed search text. This parameter defines which operator AND or OR will be used, and defaults to OR. For example, the text “Super Bitcoin mining” for OR operator is constructed as “Super OR Bitcoin OR mining,” while for AND, it is built as “Super AND Bitcoin AND mining.” 
• Minimum_should_match – defines how many of the terms in the boolean query must match for a document to be accepted. It is quite versatile, as it accepts integers, percentages, or even combinations of the two.
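A sketch of a match query combining the three parameters (the field name and values are illustrative):

```python
query = {
    "match": {
        "content": {
            "query": "Super Bitcoin mining",
            "operator": "or",                # builds "Super OR Bitcoin OR mining"
            "fuzziness": "AUTO",             # tolerated typos scale with word length
            "minimum_should_match": "75%",   # at least 75% of the terms must match
        }
    }
}
```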

Match phrase query – a variation of the match query where all terms must appear in the queried field, in the same order, next to each other. The sequence can be relaxed a bit when using an analyzer that removes stopwords.

      Match prefix query – it converts the last term in the query into a prefix term, which acts as a term followed by a wildcard. There are two types of this query:

      • Match boolean prefix – from the terms a boolean query is constructed,
      • Match phrase prefix – the terms are treated as a phrase; they need to be in specific order.

      When using a match phrase prefix query, “Bitcoin mining c” would be matched with both documents “Bitcoin mining center”, as well as “Bitcoin mining cluster”, since the first two words form a phrase, while the last one is considered as a prefix.
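As a sketch, the example above maps onto the following query (the field name is illustrative):

```python
query = {
    "match_phrase_prefix": {
        "content": {"query": "Bitcoin mining c"}  # the last term acts as a prefix
    }
}
```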

Combined fields query – allows searching through multiple fields as if they were combined into a single one. Clarity is a huge advantage of the combined fields query: under the hood, it is converted to a boolean query that uses the chosen logical operators. However, there is one important requirement: all queried fields must share the same analyzer.

The disadvantage of this query is increased search time, as it must combine fields on the fly. That's why it might be wiser to use copy_to when indexing documents.


Copy_to allows for creating a separate field that combines data from other fields, which translates into no additional overhead during searches.
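A sketch of copy_to at mapping time (index and field names are illustrative): both name fields are copied into full_name, which can then be queried as a single field with no extra work at search time:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="people",
    mappings={
        "properties": {
            "first_name": {"type": "text", "copy_to": "full_name"},
            "last_name": {"type": "text", "copy_to": "full_name"},
            "full_name": {"type": "text"},  # populated automatically at index time
        }
    },
)
```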

Multi match query – it differs from combined fields in that it enables querying multiple fields with different analyzers applied, or even fields of different types. The most important parameter is the type of the query:

• best_fields – the default value; it calculates the score in each of the specified fields and takes the best one. Useful when we expect the answer to appear in one of the given fields rather than spread across several of them.
• most_fields – works best when the same text can be found in different fields. Different analyzers might be used on those fields: one can apply stemming and synonyms, another n-grams, and a third the original text. The relevance score combines all fields' scores and is then divided by the number of matches in each field.

Note: best_fields and most_fields are FIELD-centric, meaning that query operators are applied per field instead of per term. For example, the query "Search Engine" with the AND operator means that all terms must be present in a single field, which might not be our intention.

• cross_fields – is TERM-centric and a good choice when we expect the answer to be spread across multiple fields, such as when querying for a first and last name, which we expect to find in different fields. Compared to most_fields and best_fields, where all terms MUST be found in the same field, here each term MUST be found in at least one field. One more nice thing about cross_fields is that it can group fields with the same analyzer and calculate scores on those groups instead (see the sketch below). More details can be found in the official documentation.
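A sketch of a term-centric cross_fields query (the fields are illustrative):

```python
query = {
    "multi_match": {
        "query": "Jon Snow",
        "type": "cross_fields",
        "fields": ["first_name", "last_name"],
        "operator": "and",  # each term must appear in at least one of the fields
    }
}
```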

      Boosting

      We would also like to highlight that queries can be boosted. We use this feature extensively on a daily basis. 
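The boosted query discussed below can be sketched like this (the ^ syntax multiplies a field's score; the query text is illustrative):

```python
query = {
    "multi_match": {
        "query": "elasticsearch optimization",
        "fields": ["Title^2", "Author^4", "Description"],  # boost Title x2, Author x4
    }
}
```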

This query would multiply the score for the Title field by 2 and the Author field by 4, while the Description score remains unboosted. Boost can be an integer or a floating-point number; values greater than 1.0 increase the relevance score, while values between 0 and 1.0 decrease it.

      Conclusion

To sum up, we've presented the five steps we find crucial to start working with Elastic. We've discussed what Elasticsearch is and what it isn't, and how you should and shouldn't use it. We've also described the scoring mechanism and various types of queries and analyzers.

      We are confident that the knowledge collected in this article is essential to start optimizing your search results. The article was intended as an introduction to some key concepts, but also as a foundation for the next one, in which we will provide you with examples of what is worth experimenting with, and will share the code.

We hope that this blog post gave you some insight into how different search mechanisms work, and that you've learned something new or handy that you might one day find useful in your projects.

      What is user research? Overview of types and methods https://dev.neurosys.com/blog/user-research Tue, 06 Sep 2022 11:26:36 +0000 https://dev.neurosys.com/?post_type=article&p=14319


Knowing your users inside out is the best starting point for building physical and digital products. Before you think about awe-inspiring features and dazzling looks, it's good to learn whether your customers actually need them. User research will help you discover that by gathering insights into their motivations and behaviour. Only with this knowledge will you be able to build desirable products and services and best guide users via app design and copy.

User research is popular in UI design, UX design, and UX writing. When planning your study, always remember to set clear objectives and determine available resources, so you don't bite off more than you can chew.

        After a short introduction, we’re good to go! 

        What is user research?

User research is a meticulous study of customers conducted to understand their needs, problems, and motivations. The study aims to create the best products in terms of design and usability, which in our case means web and mobile applications.

The tool is particularly helpful when product owners, together with their teams, have to make tough decisions. Once the research is done, they can decide based on insight and information rather than personal conviction or a lucky guess.

        What is key in user research?

What is important to mention here is that user research is a methodological, structured approach that has to follow certain research principles. Simply asking your friends and colleagues how they like your app can't be called user research!

        Questions to ask yourself

Before we dive deeper into specific types and methods, and before you decide on a particular approach, we recommend thinking over the following issues:

What do I want to find out? So you won't spend time and money on research that doesn't bring you any closer to your goals. A good example would be: what do users need first and foremost in a smart home app?

        Do I have the capacity to conduct research? Your study will undoubtedly result in vast amounts of data, documents, and various kinds of files. Thus you will need enough resources not only to conduct the research but also to organise data and analyse it. Since it is a structured task, you’ll also need a person responsible for the project.

        Do I know how to tackle legal matters? Taking into account GDPR and other regulations, user data is extremely sensitive. You can’t take the issue lightly when working with real people, gathering their personal information. No matter whether you analyse data collectively or record interviews with particular users, you have to obtain their consent. You might also need to sign NDAs with interviewees if they’re testing your prototypes. To do it by the book, consult a lawyer specialising in these particular issues in a given country. 

Do I have relevant experience? Educate yourself and involve researchers from your team. You might also consider collaborating with a research company or freelance researchers who will take the project off your shoulders, or at least support you in particular areas, such as choosing the most appropriate method.

        Types of user research

There are different user research categorizations regarding the type of data collected (what), the way it is collected (how), and when the research fits in the project's timeline.

        What? 

        One of the most well-known categorizations focuses on the type of data that is collected: 

        • Qualitative research
          You collect and analyse non-numerical data to understand opinions, experiences, and broader concepts.
        • Quantitative research
          You collect and analyse numerical data to discover correlations and formulate hypotheses.

        How?

        The second categorization refers to the way data is collected:

        • Primary research
          You gather information on your own.
        • Secondary research 
          You analyse data collected by others, such as statistics, books, and articles.

        When? 

        The third categorization we wanted to specify focuses on when the research is conducted: 

        • Exploratory research
          You carry it out at the beginning of the process before building a product or a feature. 
        • Validating research
          You assess if what you’ve built actually works. 

        Five user research methods

        You may come across a number of user research methods, but we will only focus on those we exercise ourselves. 

        Competitor analysis

Competitor analysis is one of the most common, simple, and inexpensive (in comparison to other) research methods. We can't imagine considering entirely new products or adding significant features without identifying and analysing companies that sell similar solutions. Knowing what the competition offers and how they present their products and services, you can decide in an informed way how to position yourself in the market.

        As we’ve mentioned before, a competitor analysis has to be conducted in a structured way. You set identical variables according to which an analysis of each company is conducted. Thus, the best way to do this is through a table, be it a Google Spreadsheet, Microsoft Excel, or any other tool of your choice.

When analysing the gathered data, you can highlight similarities and differences between your offering and the competition's, and spot the things that stand out from the crowd. Watch out, though! The method poses a risk of copying others, which is never a good choice, because why would customers choose you then?

        Target group observation 

        This method involves observing your current or potential customers in their natural environment and real-life situations. You watch what they do and say, either incognito or overtly. This way, you can follow them using your application: what they click first, how they proceed, what problems they have, how they solve them, etc. Apart from their behaviour (user experience research), you observe what they say when interacting with your product or how they act in a general situation.

To structure your study correctly, you need to prepare an observation guide covering all the crucial aspects of your customer's journey. Watch out for legal issues here, especially if you observe people covertly or conduct observations in public spaces.

        Focus groups 

        Focus groups are small assemblies, usually no more than 12 people, gathered in one place. The study aims to stimulate a discussion between the participants. A structured debate, to be precise: with questions prepared beforehand and space where everyone has a voice.

This fantastic user research tool allows you to ask pertinent questions, see how people react to your product, figure out their needs, and gather ideas for new features. Unfortunately, since a lot happens during focus group sessions, it's easy to dive in and not see the wood for the trees. Thus, you have to keep in mind what you want to find out at all times. A user research plan will help you with that.

        User interviews

During user interviews, you have a chance to converse with your current and potential customers and ask them relevant questions. These one-to-one sessions have the potential to provide you with the deepest insights. For the vast majority of people, it is easier to talk about their habits, values, needs, and dreams, and to answer questions about the product, in an informal, private discussion where they don't feel judged by others.

So as not to distract your interviewees by taking notes, request their permission (in writing) to record the sessions. And as always, remember to keep the interviews structured, so that after multiple sessions you can gather the results, compare them, and draw conclusions. Lastly, pick your interviewees wisely, so they represent your customers as adequately as possible in terms of gender, age, income, etc.

        Surveys

Surveys are a poorer cousin of user interviews, but they are also less costly and quicker to carry out. They allow you to ask personal questions similar to those asked during interviews. However, there is little or no room (especially in online surveys) to respond to the answers and deepen the knowledge.

On the one hand, your surveyees don't feel judged and can respond more sincerely; on the other, they might feel less motivated to spend time filling in your forms and might do it by halves. Thus, try to keep your surveys brief and the questions clear.

        User research examples

        When working on a smart home application, one of our tasks was redesigning its interface. The key aim was to improve its usability. 

        Research objectives 

        Having redesigned the IoT app, we wanted to verify if the new design met our and the client’s expectations (validating research), including: 

        • Learnability. Is the new interface user-friendly? How long does it take to complete simple tasks when users encounter the app for the first time? 
        • Effectiveness. How do the users deal with the application? 
        • Memorability. Is the application designed in such a way that the users can easily handle the navigation flow? 
• Mistakes. How many mistakes do the users make? How often do they occur? How can users put them right?
        • Satisfaction. Is it gratifying to use the app? Are there any areas that can be improved to make it more user-friendly? 

        Test types 

We ran thorough UX tests that included:

        • Individual In-depth Interviews (IDI)
        • Tasks performed by users in real-time

        Interviewees were to complete tasks using interactive app mockups. Thanks to this particular research method, we were able to observe their decision-making process, the difficulties they ran into, and the way they reacted to task solving. 

        Areas verified

        We have examined the following areas: 

        • App navigation: whether the new app is easy to navigate, 
        • Information architecture: whether the information structure is logical and it’s easy to find what’s needed, 
        • The influence of the app design on its usability: whether users feel comfortable and confident interacting with the app, 
        • Copy: whether the names and instructions are understandable and helpful, 
        • Abstraction level: whether the icons and other visual elements are clear and users know what they mean, 
        • App overload: whether users aren’t lost in the multitude of widgets, 
        • UX design: whether all the elements have been designed following the golden User Experience rules.  

        In-depth user research means combining methods

When you want to fully understand your target group and draw the most accurate conclusions, the best idea is to combine the above methods. Gather both numerical and non-numerical data, analyse what your competition does, and ask real users what they think, especially before arriving at irreversible decisions based on the research findings. One more thing: consider incentives, small prizes for your research participants, adequate to the time they devote.

        How to invest in tech startups? https://dev.neurosys.com/blog/invest-tech-startups Tue, 16 Aug 2022 12:28:38 +0000 https://dev.neurosys.com/?post_type=article&p=14169 How to tell if your tech startup of choice is legit?  

        We’re living in the glory days of startups. 

There's a startup for nearly every problem that needs to be solved. These small but fierce companies play a big role in the business ecosystem. One of the startups' greatest strengths is inducing market competition and stimulating innovation, leading to economic development. No wonder there are countless venture capital funds and investors looking for emerging companies with high growth potential to devote their money to.

        But let’s put aside inspirational pitches, and instead of extolling virtues – get back to business. The key question is – how to invest in startups? It’s not an easy one, so we’ll take it slow.

        The good, the bad, and the wisely advertised

The three above-mentioned don't necessarily need to be mutually exclusive, but let's not get ahead of ourselves.

        Are the concerns about startup credibility even justified? Or are stories about unfortunate investments just urban legends?

Most probably you've heard about some breakthrough companies: one offering complex blood testing from a single drop despite not having the actual technology for it; one providing an innovative $400 home appliance that squeezes juice from premade packets, in a revolutionary way, when the juice could just as well be squeezed by hand; or smart cups so smart they could recognize the liquid inside them and count the times they were refilled.

The above-mentioned visionaries are just some of the most flagrant cases, but there are many more examples of products that just weren't worth it. There's nothing wrong with someone trying their luck with a product or service, unless it involves deceiving the investors and/or the public.

        How to recognize the real unicorn? Or: Which startup is best to invest in?

Aside from the question being an oxymoron, of course. We don't believe in unicorns when it comes to business. We believe in integrity and engaging in trusted, proven undertakings.

So, you're serious about investing in tech startups and looking for an IT startup to add to your investor portfolio? Or are you already keeping an eye on something? Assuming the emerging company operates in areas you're familiar with, your risk is smaller. If the startup you'd like to invest in is from the medical field, its assessment requires experts with a medical background. The same goes for other areas, each typically requiring at least some general understanding.

        When you’re looking for a startup to invest in, don’t follow the hype or a temporary fashion. Sure, if it’s digital solutions that you fancy, the field is dynamically changing. Still, your hard-earned money should rather be put to good use after some analysis and consideration.

Hard questions need to be asked. A startup pitch is fun and catchy, crafted to allure and stun. What we're looking for is a down-to-earth, merit-based, rational evaluation of what is really going to happen and what the facts behind the idea are.

If the startup of your interest plans to conquer the IT field, your envoy should be someone familiar with digital technology. IT suffers from a lot of hype and buzzwords, but when you take a closer look at the actual stack and capacity, not everything is as advertised.

        Not all that glitters is gold

        One of the common exaggerations is calling everything artificial intelligence. You might think that AI is everywhere. From your fridge and car through municipal bins and vending machines to all sorts of business processes. The trick is, often it isn’t AI at all. 

        Many service providers use ordinary statistics and data analysis – if it’s sufficient and works for their product, good for them. However, labeling their offering with the most buzzing names, calling it BIG DATA and ARTIFICIAL INTELLIGENCE, when there’s no evidence of any advanced algorithms, is no different from false advertising of miracle diets or rejuvenating cosmetics with mysterious ingredients that in the end turn out to be ordinary vaseline. Maybe not so ordinary, since it’s packed in a fancy wrapper and advertised by a popular celebrity. Still, it’s a shell product – there’s not much behind all that glitter and great promises. Someone purchasing it for the promised spectacular results and extraordinary effectiveness would feel highly disappointed in the end, after discovering it’s not what they paid for. Marketing, promotion, storytelling, and all other bells and whistles did their job right, but for the wrong cause. 

What we're saying is that overpaying $20 for a cosmetic product can be a letdown, but misinvesting in a shell startup can be, you guessed it, a major disenchantment. When you're an investor on the lookout for a company to entrust your funds to, there must be actual technology and know-how behind the marketing magic.

Not much of a tech expert yourself? Consider a technical audit. Before splurging on that new, innovative, disruptive technology, send your emissary to ask around and verify the facts.

        Time to say: Check!

Or: objection! We'll leave the choice to individual auditors. The point is that a technical assessment is vital for a tech startup investment. Don't let anyone pull the wool over your eyes by saying "it's too complicated", "you wouldn't get it", or "we'll explain later, now we need the money to develop the solution". Technologies too complex to understand don't emerge suddenly; most probably you've already heard about something similar and comprehend at least the general idea.

While fireworks can work wonders in marketing, when it comes to spending large amounts, we need the startup to lay its cards on the table. It's not uncommon for the loudest, most attention-grabbing advertisements to cover the weakest ideas; some good ideas, products, and services are quiet. The best way to invest in startups is to know what's working under the hood. Startup investment opportunities require some time and consideration before you decide to go all in.

        The things to verify:

        A few topics to address before investing in startup companies.

        Feasibility of the idea

Checking feasibility requires determining the viability, profitability, and practicality of the breakthrough idea. Has the startup analyzed all available data, conducted market research, and prepared projected income statements? In short, do they know where they stand? Sustainable development of a business idea calls for proper preparation and tangible data for assessment. How to check it? Ask for a Proof of Concept (PoC) and/or a Minimum Viable Product (MVP), or the subsequent "M's": MMP, MMF, MMR, MSP, etc. Delving into preliminary product versions allows investors to see through the honeyed words. When you're about to invest millions, it had better really be artificial intelligence as promised, and not a bunch of apprentices working in the back, pretending to be the advertised algorithms.

There's even been a startup that hired actors and rented a lab to stage a believable show for the investors' visit to "their site". After all, maybe those apprentices aren't the worst that could happen? Still, that's not what investors sign up for when spending their money.

        A working code

There are plenty of tutorials for startup founders and serial entrepreneurs advising against learning to code when building a startup. While this may work for non-technical founders and companies aiming at other market fields, when it comes to tech startups, code is king. Can the startup handle the technical risk of its idea? Can the architecture be built and work as intended? Is the code behind the project adequate to the advertised potential?

A common sin of startups is, again, relying on those ill-fated apprentices or students to write code: code that, once the project is about to be commercialized, requires immediate rewriting to be of any value for further development, maintenance, or simply for ensuring stability and responsiveness for users.

        Potential for delivering the promised results

Do the startup founders have a growth strategy for their product or service? Is the idea developed well enough to work in real-life conditions? Can their product handle an increasing workload, or is it sufficient only for test purposes? Investing in an idea that only looks good when the business model assumes an extensive user base is a risky move. Startup assessment requires checking everything that could go wrong, and not being able to deliver on promises is a major sin to eliminate.

        In case you’d wonder, why should you choose us?

We're a software company with over 12 years of experience and an extensive portfolio of executed projects. You're not here to read our bragging, so if you'd like to learn more about our expertise, check the case studies tab. We may not be a startup ourselves, but having two of our own (Nsflow and Samelane), we know the tech field inside and out, meaning our auditors can recognize shams on the spot and help you with a business startup investment you won't regret. We may not tell you outright where to invest, but we're confident about recognizing tech companies you can invest in safely.

        The takeaway

Don't leave your business's future to chance. Sure, honest mistakes happen even in the most proven and reliable partnerships. But a stitch in time saves nine, so if you have even the slightest doubt about a startup you'd like to invest in, an audit won't hurt.

        Scrum vs Kanban, what’s the difference? Which one to choose? https://dev.neurosys.com/blog/scrum-vs-kanban Fri, 12 Aug 2022 12:13:00 +0000 https://dev.neurosys.com/?post_type=article&p=14147


          Scrum and Kanban are the most popular frameworks that belong to the same Agile family. Whereas Scrum likes rituals, clear roles and rules, Kanban is more of a free spirit, known for its pretty face (and effectiveness, too!). Which style feels closer to your heart? Which would you like to form close bonds with? If you are on the fence between the two, we’re here to give you a hand.

          What is Agile?

Scrum and Kanban are both Agile frameworks, so they share a lot of features. Thus, before we dig deeper into each of them, we need to stop for a while to discuss what Agile really is.

The answer will depend on whom you ask. Product owners, developers, business analysts, and CEOs might perceive it differently. They might refer to Agile as a philosophy, a mindset, a way of thinking, or, more down to earth, a methodology.

          Four Agile values 

          Agile followers live by four, let’s call them, commandments. Here they are: 

          1. Individuals and interactions over processes and tools, so that you invest more time and effort in face-to-face communication.
          2. Working software over comprehensive documentation, so that you focus on project deliverables instead of creating lengthy papers. 
          3. Customer collaboration over contract negotiation, so that customer success is your guideline along the way and you’re not blindly following the initial deals. 
          4. Responding to change over following a plan, so that you are ready (and willing) to adjust to whatever happens.

The chief reason why Agile was created and became so popular is that traditional methodologies, such as Waterfall, deliver value only at the end of the project. Considering that it takes months or years to build digital products, waiting until the very end is definitely too long. Therefore, Agile focuses on delivering value faster, in smaller increments. This way you can test solutions, adjust, improve, deliver MVPs, get user feedback, start earning, gain funding, and so on. On top of that, Agile welcomes changes with open arms, because they typically lead to improvements.

          Waterfall approach 

Agile is often contrasted with the traditional Waterfall methodology. The latter, linear approach means that you and your team can't move to the next project phase unless you have completed the tasks from the previous one. It's also difficult to go back once something is done.

In Waterfall, you have to identify most of the requirements, analyse them, design, develop, and implement a solution, and finally test if it all works. Proceeding step by step, you deliver value and get customer feedback really late. The problem is that if you decide to make changes while already in the last two phases of the project, it will take a lot of time and work; basically, you need to go back to square one. Another thing that may happen is that the requirements have been understood differently by the client and the contractor/development team. Due to the nature of this linear methodology, you may only make this discovery at the end of the project. Waterfall doesn't like changes.

[Diagram: the Waterfall approach]

          Scrum methodology

Scrum is far and away the most popular Agile framework. In fact, when companies say they work in Agile, in most cases they mean Scrum.

Scrum cherishes roles and ceremonies, of which sprints come first: time-boxes wherein other events take place. What makes it highly effective is its transparency. All roles, responsibilities, and meetings are clearly defined, and everyone knows what other team members are working on at any given moment. If any disagreement arises, the team discusses the problem and resolves it together.

          Scrum roles 

          Roles and their responsibilities in Scrum are clearly defined: 

          • Product Owner: is responsible for product vision. S/he gets in touch with a client, understands their needs and project challenges. Based on this knowledge, Product Owner creates user stories and identifies priorities for the team.
          • Scrum Master: assists the team, helps in their daily work and Scrum ceremonies, and removes project roadblocks. S/he is neither a project manager nor a team member. 
          • Scrum Team: is responsible for technical aspects and project execution. Everyone that works on the project belongs to this category, which means not only developers but also business analysts or UI designers, etc. 

          Ceremonies and events 

          Sprints are the essence of Scrum. A single sprint takes from 1 to 4 weeks. It consists of a variety of Scrum ceremonies and events which include:  

          • Daily standup meetings. Dailies are the 15-minute meetings that take place every day at the same time and place. At the meeting, every team member answers three questions: what they did yesterday, what they’re going to do today, and if there are any obstacles in fulfilling their tasks.
          • Sprint planning meetings. The team thoroughly plans what they can deliver to the client in a given timeframe, that includes not only development but also testing, so the feature is ready to go live. 
          • Sprint review meetings. These take place at the end of every sprint when the team, product owner, and client discuss the progress and other issues to consider during the next sprint. It’s rather an informal meeting. 
          • Sprint retrospectives. During retros, the team discusses the last sprint in terms of what they did and didn’t do well, and what should be improved in the future based on lessons learned. The meeting should be treated as a safe space for everyone to share their thoughts, so there’s no room for blaming or criticising anyone. 
          • Backlog refinement. This meeting resembles workshops. Its aim is to add more details to the backlog once a sprint is underway. 

On top of that, there are other terms that you will come across in Scrum: user stories, team velocity, Scrum poker, product backlog, product increment, the definition of ready, and the definition of done. But we won't delve deeper into the terminology, as we can refer you to some of our more detailed articles on the subject:

[Diagram: sprints in Scrum]

          Kanban methodology

Kanban is the second most popular Agile framework after Scrum. It is best known for its visual aspect, the Kanban board, which helps you understand workflows easily.

Kanban is a continuous process; there are no time-boxes or fixed events. Of course, you can have daily stand-ups and retros, but you don't have to; it depends entirely on you. The key metrics in Kanban are time-based: lead time and cycle time.

          Roles and tasks in Kanban

Roles in Kanban aren't defined; team members keep their organisational roles. Also, they aren't assigned tasks; they simply pick cards from the board depending on their skills, talents, or what they feel like doing at the moment.

          In Kanban, there’s much room for companies and teams to lay down their own rules and policies on how to manage things. The key is to make the policies explicit and known to everyone concerned. 

          Kanban board 

The project board shows the status of work in progress, so one look at it should give you an idea of how everything is going. Kanban cards contain information on tasks, and they are grouped into three areas: to do, doing, and done. Usually, they are ordered from top to bottom, beginning with the highest priority. Team members pick their tasks and, as time goes by, move them between the three sections of the board.

Kanban boards can be physical, arranged with sticky notes, but online boards have become more popular. The preference for digital Kanban boards stems from hybrid and remote work, which requires dispersed teams to collaborate closely. Online Kanban boards can be created with a variety of well-known apps, such as Trello, Jira, or YouTrack (the Agile Boards function), which we use at NeuroSYS.

          WIP limits

Kanban concentrates on task completion. Too many tasks marked as in progress might indicate that the work is not proceeding or that tasks were put on hold. That is why Kanban limits WIP, work in progress. A good practice to keep focus and get things done is to set WIP limits on your online Kanban board.

[Diagram: a Kanban board]

          What is the difference between Scrum and Kanban?

Being Agile frameworks, Scrum and Kanban have a lot in common, such as task estimation and a focus on delivering value fast. But now it's time to get your arms around their differences.

          The difference between Scrum and Kanban lies in a variety of aspects, more or less fundamental:  

          1. Structure: Scrum is highly structured. All roles and meetings, including their duration, are clearly defined. Kanban is fluid and way less structured. There are no set roles, sprints, and meetings (if you don’t need them). 
          2. Time-boxes: In Scrum, your work is divided into 1-4-week sprints. That’s the essence of the framework. With Kanban, the work cycles are fluid, you move from the to do tasks to the done section without clear breaks. 
          3. Retrospectives: In Scrum, the discussion on what worked, what didn’t, and why, takes place after every sprint and constitutes its inherent part. In Kanban, you organise a meeting whenever you feel it would be good to talk things through. 
4. Tasks: Scrum tasks are assigned to specific team members. In Kanban, it is the team members who pick tasks for themselves and take ownership of the assignments.
          5. Roles: In Scrum, you’ll find team members with set roles. In Kanban there are no specific roles defined. Team members comply with their organisational roles.
          6. Teams: Scrum teams are cross-functional, they have all the competencies needed to carry out their tasks. In contrast, Kanban teams can be specialised, such as teams of testers or engineers.
          7. Metrics: The key metric in Scrum is velocity. It reflects the number of story points that are delivered in each sprint. In Kanban, the key metric is cycle time, which is the time that passes between the beginning of the task and its completion. 
Area           | Scrum                | Kanban
Structure      | Structured           | Less structured
Time-boxes     | Sprints              | Fluid cycles, no set breaks
Retrospectives | After every sprint   | When it makes sense
Tasks          | Assigned to the team | Picked by the team
Roles          | Specified            | Non-specified
Teams          | Cross-functional     | Cross-functional or specialised
Metrics        | Velocity             | Cycle time

The difference between Scrum and Kanban

          Is Kanban better than Scrum?

          And is an apple better than a pear? This is a similar type of question. 

Or, as our Managing Director would answer: IT DEPENDS. It depends on your organisation, team composition, and its members' experience. Naturally, personal preferences play an important role as well.

The fact is that organisations which have already started their Agile journey have, in most cases, begun with Scrum. The framework offers structure and a set of rules that are helpful, especially at the beginning. Starting straight away with Kanban might feel more like throwing yourself in at the deep end. But it doesn't have to be this way. Kanban can be a made-to-measure approach, particularly when the increment of work isn't linear and the project has to pick up speed before it can be monitored and managed.

Generally, if matters like the cyclical delivery of increments, tools for work planning, customer engagement, transparency, and retrospectives are important to you, you should go for Scrum. Meanwhile, Kanban works perfectly during maintenance periods, when the system goes through end-to-end tests or is streamlined, or when technical debt is being paid off. In situations where work is hard to plan, Kanban is a perfect match.

To give you food for thought: Kanban doesn't have built-in retrospective mechanisms, so it is sometimes difficult to give the team and clients a sense of purpose and success. Scrum secures that thanks to cyclical events and clear sprint goals.

          For those who are still undecided or like both options equally, there is something in between, a framework called Scrumban. It is a blend of Scrum and Kanban, taking the best practices out of each. For example, in Scrumban you use the Kanban board but also have mandatory daily meetings. 

As you can see, it isn't a black-and-white choice. We can't categorise projects as Scrum- or Kanban-prone that easily. What we can suggest is to use Scrum or Kanban as logic dictates, taking into account the above-mentioned benefits as well as limitations.

          Scrum vs Kanban wrap-up

We can't praise the Agile methodology enough. Whether you choose Scrum or Kanban, Agile's focus will be on software quality, effectiveness, constant improvement, great results, and trust in people. Simple as that, and, most importantly, it works.

          Bob’s your uncle, we’ve reached our destination. But what a journey it was, right? At the end of the day, the choice of the framework is yours, though we hope we’ve managed to help you out. If you’re still in two minds about it, let us know. We can give you a helping hand during free consultations.

          Digital transformation in the automotive industry https://dev.neurosys.com/blog/digital-transformation-automotive Tue, 02 Aug 2022 14:01:22 +0000 https://dev.neurosys.com/?post_type=article&p=14033 The automotive industry is one of the most dynamically moving (You see what we did there, don’t you?) market fields. It’s also the second most data-driven sector. Each decade brings further enhancements, spectacular changes, and new solutions, some of which are quickly discarded while others continue to shape the cars of the future. 

          Is the future already here?

          When thinking about futuristic cars, pop culture made us yearn for the incredible KITT known from the Knight Rider series, vehicles spiked with useful gadgets used by James Bond, and the unforgettable DMC DeLorean from the Back to The Future movies. 

          How has technology changed cars?

For many petrolheads, applying new technology in the automotive industry is unnecessary, as, in their opinion, car design and performance reached their peak in the 80s and 90s. Many drivers, however, admit that vehicle evolution shouldn't have stopped at headlight wipers.

          So, Hollywood magic aside, how does today’s technology change cars and what is there to come for the industry? Are we already cruising in cars of tomorrow? 

Probably the most visible technology shaping the industry is the shift towards electric vehicles. Since hybrid and electric cars are becoming more and more competitive, their market share will continue to grow. Now that we're over the fact that we won't be driving flying cars or befriending KITT anytime soon, let's break down what the automotive transformation has already changed – for the better.

          Autonomous vehicles

Just a few decades ago, driverless cars seemed like pure sci-fi, yet here we are, driving around hands-free and minding our own business while the autopilot keeps its eyes on the road. Since autonomous cars are capable of sensing the environment and responding immediately to encountered obstacles and events, a human driver is not necessary anymore. The human doesn't even need to be in the car! Or rather, that's how the manufacturers wish it worked, but we're not there. Yet. What have we come to: passenger-less cars, who would have guessed?

Since the future of the automotive industry most probably lies in electric, self-driving cars, more and more manufacturers are entering the race. It was Tesla that stirred the general public's imagination, but the Texas-based innovator is not a lone driver anymore. The biggest European and Asian car moguls have decided to integrate driving assistants that let their clients drive without holding the wheel, enhancing competition.

          Self-parking systems

The autonomous car's relative, the parking assistant, makes vehicles more attractive by simplifying drivers' lives. While still enjoying full control over the automobile on the road, drivers can choose not to park on their own. It's not just placing the vehicle in any random free spot in the lot: the system will remember drivers' preferences and notify them once they absent-mindedly pass by their favorite place.

          Each manufacturer’s system has a different name, but the working principle is similar. Self-parking cars use integrated cameras and sensors to prevent collisions and properly maneuver, keeping the vehicle on the right track. Parallel or perpendicular parking? Not a problem. A sneaky curb or other cars standing in the way? The self-parking system will find a way. 

          Biometry

Biometrics has the power to enhance car users' experience. No more manually adjusting the seat and steering wheel after each switch between drivers; the vehicle will remember their preferences. Once it sees a familiar face, the system automatically adjusts every adjustable element inside the car, including temperature and map settings.

Applying digital assessment to biological features is not only about convenience. With the growth of automotive biometrics, vehicle security can be increased. Employing a system that remembers unique physical traits (facial recognition, fingerprints) allows drivers to fully embrace keyless ignition, keyless door opening, and surveillance. User-focused digital solutions also monitor drivers' health: whether it's fever, excessive fatigue, sleepiness, or drowsiness, the vehicle recognizes potential hazards in road traffic. Using automotive biometrics contributes to better alertness while driving and improved security.

          Digital twins

Digital models representing physical assets in 3D allow designers to try out assumptions prior to, or instead of, using traditional measures. Advanced software gathers sensor and inspection data, configuration details, and other bits of information. Digital twins mirror the appearance and behavior of an entire car or its components.

          Industrial companies, including car manufacturers, value the potential digital twins carry. 3D representations streamline the design and production process, contributing to better performance of the vehicle and reducing costs on the manufacturers’ side. From car design to predictive maintenance to boosting sales with digitally created models, the twin technology is becoming one of the most popular software solutions in modern car manufacturing. 

          Generative Adversarial Networks (GAN)

GANs (generative adversarial networks) are a class of machine learning models, built from competing neural networks, used to create images based on provided picture sets. The automotive industry uses GANs in generative design to boost additive manufacturing with AI. Employing GANs and coupling them with in-depth data analysis and 3D printing, car manufacturers can achieve results previously impossible to obtain with traditional methods. One of the opportunities is injection molds, allowing producers to create unusual shapes and constructions, opening new ways for the increasingly desired customization.

          Quality assurance

Car manufacturing giants like BMW employ artificial intelligence on their production lines. Companies entrust AI with quality control, as even the most meticulous workers are prone to fatigue that can result in errors, whereas algorithms can work error-free, 24/7. In the Bavarian manufacturer's plants, the car assembly process takes 30 hours on average. From the floor plate to a complete vehicle, production generates extensive data sets useful in improving the cycle.

For instance, the plant marks all metal sheets with lasers. The engraved codes allow tracking of the processing stages, aggregating details and parameters. As a result, the factory can cut down on the necessary inspections, as algorithms signal the need for part replacement, unburdening staff from constantly monitoring the machinery's condition. The plant also employs digital tools to supervise dust levels in the paint shop, test car key calibration, and perform other tasks.

New technology in the automotive industry doesn't end with digital solutions applied to vehicles per se. Answering the changing needs of users (and of manufacturers, too) calls for employing top-notch tools.

          Shared mobility

Digital transformation in the automotive industry includes software solutions aimed at service improvement. More and more people living in big cities are giving up car ownership in favor of alternative options. When a car is necessary, shared mobility companies lend a helping hand, providing ready-to-drive vehicles to be used only when needed. Repairs, check-ups, car insurance? Users don't need to bother with these aspects, as the service provider takes care of them.

Digital transformation in shared mobility solutions

Shared mobility providers must manage extensive data sets to understand their customers' behavior, forecast the demand for vehicles, plan their distribution across desired areas, and, as a result, enhance customer experience and satisfaction. Digital tools are there to analyze the data, visualize it, and put it to good use, for example by parking shared vehicles at the right time and in the right place once demand is recognized.
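A minimal sketch of the kind of analysis involved, on made-up trip data: aggregating pickups by zone and hour of day points to where and when cars should be repositioned. A real system would work on far larger logs and proper forecasting models.

```python
import pandas as pd

# Hypothetical trip log: where and when shared cars were picked up
trips = pd.DataFrame({
    "zone": ["center", "center", "airport", "center", "airport", "suburbs"],
    "start": pd.to_datetime([
        "2022-08-01 08:05", "2022-08-01 08:40", "2022-08-01 09:10",
        "2022-08-02 08:15", "2022-08-02 17:45", "2022-08-02 22:30",
    ]),
})

# Demand per (zone, hour of day) -> where to reposition cars, and when
demand = trips.groupby(["zone", trips["start"].dt.hour]).size()
print(demand.sort_values(ascending=False).head(3))
```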

          Knowledge retention & management 

Employee training and knowledge retention in the face of generational change is a vital matter across numerous industrial sectors. What is special about the automotive industry is the sudden need to train not only employees replacing the retiring generations, but also a new workforce specializing in electric vehicles (EV).

Even in the most advanced manufacturing plants, processes can be mundane or troublesome, burdening staff with excessive workload. On-site employee skill-building can be streamlined by adopting augmented reality solutions. As a result, staff undergoes standardized training, and the process can be shortened and consume fewer resources, e.g. trainers' time. AR allows precisely guided, step-by-step courses, overseeing results, and gathering data for future reference.

          Will the top technology trends in the automotive industry grow?

There is no indication that the car industry will turn back from its chosen path. The sector faces various challenges, and digital transformation is most probably the best shot companies can take at future-proofing their operations.

          Pain points in the industry for technology to solve:

          • Knowledge retention, mitigating the generational change effects 
          • Training a whole new generation of employees such as EV technicians
          • Improving the manufacturing process
          • Changes in customer behavior, preferences, and expectations
          • Restrictive regulations
          • High competition
          • Availability of materials and components

While not every challenge can be addressed directly with digital solutions, modern technology drives the automotive industry forward. From autonomous vehicles, through digital twins and predictive maintenance, to customized services, new technology in the automotive industry will keep delivering futuristic cars we can actually ride, not just watch on the silver screen.

          The undeniable impact of the newest technologies on cars has already reached our homes. 

          With over 50 countries manufacturing and assembling vehicles and millions of cars getting into the market annually, the range of possibilities to improve with digital technology will only grow. 

          Are you looking for a digital transformation partner for your automotive company? Let’s have a chat and see where we can get together with the help of technology.

Artificial intelligence does the trick in digital transformation
https://dev.neurosys.com/blog/artificial-intelligence-in-digital-transformation
Tue, 26 Jul 2022 09:25:24 +0000

Digital transformation, our old chestnut, huh?* You might be thinking about leaving the page right now, but hold your horses: today we'll present it to you from a brand new perspective. That perspective is, yes, you've guessed it right, artificial intelligence, the apple of our eye and something we have full confidence in.

* If it is not yet a broken record for you, it's time to catch up! Below you'll find our other articles on digital transformation that will lay solid foundations for today's topic.

          Why is AI in digital transformation important?

Just to clarify: digital solutions don't equal artificial intelligence by default. They can, but they don't have to. To put it flatly, there is no need to look for AI solutions just for the sake of it. Sometimes simply switching widely used tools to e-tools will do the trick. However, in a lot of cases, artificial intelligence is the way to push the envelope and expand your business.

          Examples of artificial intelligence solutions in digital transformation

Before we get the bit between our teeth, let's spell one thing out:

          How to tell AI-powered solutions from the rest?  

The easiest way to find out is to determine whether they aim to mimic intelligent human behaviour and to solve, the way people would, problems unsolvable for traditional algorithms. Also, through data processing and analysis, AI algorithms should be able to learn over time and get better at what they do.

          Now it’s time to put all the above into practice and show you AI-based digital transformation in action. To organise it neatly, we’ve divided the topic into five areas.

          Computer vision 

          We use artificial intelligence to detect, recognize, and identify the contents of photos and videos. Depending on the business needs and areas to be digitalized, AI focuses on:

• people and faces, as in the case of entrance authentication, identifying workplace bottlenecks, or determining whether employees wear protective equipment
          • places, e.g. localising your workers, creating self-driving industrial vehicles, locating parcels in logistics, improving workstation ergonomics (you can delve into the topic in visual place recognition and VPR Part 2)
          • objects – machinery automation (machines gaining sight), healthcare (disease diagnosis based on X-rays), pharma process automation (see our project on bacterial colony identification and counting), advanced quality control, e.g. elimination of impurities in the production processes, soil and crop monitoring for more adequate watering or fertilisation
          • text – invoice and contract automation, including optical character recognition (OCR); digitalization of all documentation and other sources (paperless factory being a thing nowadays)

We'd like to point out that computer vision is widely used in manufacturing quality control even in algorithms that don't use AI at all. AI-based computer vision is needed where conventional CV can't cope, such as telling air bubbles from bacterial colonies grown on Petri dishes.
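As a taste of AI-based computer vision, the sketch below runs an off-the-shelf ImageNet classifier from torchvision (version 0.13 or newer) on a single photo. The file name is a placeholder, and a production quality-control model would of course be trained on domain-specific images rather than ImageNet classes.

```python
import torch
from torchvision import models
from PIL import Image

# Load a small pretrained classifier and its matching preprocessing pipeline
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights).eval()
preprocess = weights.transforms()

img = Image.open("inspection_photo.jpg").convert("RGB")  # placeholder path
batch = preprocess(img).unsqueeze(0)

with torch.no_grad():
    probs = model(batch).softmax(dim=1)
top = probs.argmax(dim=1).item()
print(weights.meta["categories"][top], f"{probs[0, top].item():.2f}")
```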

          Natural language processing

          With natural language processing (NLP) algorithms, digital systems can identify, understand, and analyse human language. We would like to flag up the fact that it is still one of the most challenging areas of AI and the systems don’t work perfectly. However, the new Generative Pre-trained Transformer 3 (GPT-3) seems to do the trick. 

          With NLP, we can speed up a lot of tasks, such as: 

• customer service – AI-powered chatbots answer the most common inquiries, while sentiment analysis makes it possible to detect the most sensitive cases that need an immediate reaction (see the sketch after this list)
          • customer profiling offering tailored solutions automatically (increasing the chances for your offer to be accepted)  
          • semantic search helping employees to look for information in company files 
          • classification of documents and client/patient/contractor data
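Here is the sentiment analysis sketch promised above, using the Hugging Face transformers pipeline with its default English sentiment model. The tickets and the escalation threshold are illustrative.

```python
from transformers import pipeline

# Downloads a default English sentiment model on first use
classifier = pipeline("sentiment-analysis")

tickets = [
    "Thanks, the replacement part arrived quickly!",
    "This is the third time my order was lost. Unacceptable.",
]
for ticket, result in zip(tickets, classifier(tickets)):
    # Route clearly angry customers to a human agent first
    if result["label"] == "NEGATIVE" and result["score"] > 0.9:
        print("ESCALATE:", ticket)
```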

          Data science

          Every day, your business gathers a mass of data: on your customers and their journey, operations, employee effectiveness, etc. Data science aims at uncovering intricate patterns that can help businesses to improve their processes, and eventually grow. The areas worth mentioning are: 

• forecasting – route planning in logistics, order management, forecasting interest in particular products at a given time, e.g. at Christmas or during the holiday season (a minimal sketch follows this list)
          • risk reduction – risk analysis, predictive maintenance in manufacturing
          • operation efficiency improvement – bottleneck identification, resource management, waste reduction 
          • recommender systems in e-commerce and well-targeted, more effective marketing 
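As a minimal illustration of demand forecasting, the sketch below fits a straight-line trend to synthetic daily order counts and projects two weeks ahead. Real systems would account for seasonality and reach for proper time-series models; this only shows the shape of the task.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical daily order counts for the last 8 weeks (trend + noise)
days = pd.date_range("2022-05-01", periods=56, freq="D")
orders = pd.Series(100 + 0.5 * np.arange(56) + rng.normal(0, 5, 56), index=days)

# Fit a linear trend and project it 14 days ahead
t = np.arange(len(orders))
slope, intercept = np.polyfit(t, orders.values, 1)
future_t = np.arange(len(orders), len(orders) + 14)
forecast = pd.Series(intercept + slope * future_t,
                     index=pd.date_range(days[-1] + pd.Timedelta(days=1), periods=14))
print(forecast.round(1).head())
```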

          Similarly to the case of computer vision, we need to emphasise that not all data science mechanisms use artificial intelligence by definition. DS involves a lot of conventional statistics before it needs to reach for AI-based algorithms.

          Predictive modelling

You can use predictive modelling to forecast events, customer behaviour, or market changes. Instead of analysing historical and current internal/external data manually, algorithms can do it effectively, speedily, and, most importantly, in real time. A couple of usage examples:

          • sales volume prediction – for more effective production or store/hotel/restaurant service demand planning
• risk calculation – commonly used in banking (among others, in fraud detection), the insurance industry, manufacturing (predictive maintenance; see the sketch below), or healthcare for analysing patients' medical records
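The predictive maintenance sketch referenced above, on synthetic sensor data: a logistic regression estimating the probability of imminent failure from vibration and temperature readings. The features, thresholds, and labelling rule are all made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic sensor features: [vibration_rms, bearing_temp_C]
X = rng.normal([1.0, 60.0], [0.3, 8.0], size=(500, 2))
# Toy label: failures become likely when vibration and temperature both run high
y = ((X[:, 0] > 1.2) & (X[:, 1] > 65)).astype(int)

model = LogisticRegression().fit(X, y)
# Estimated probability that a machine showing these readings fails soon
print(model.predict_proba([[1.4, 72.0]])[0, 1])
```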

          Sound recognition

Sound identification algorithms might seem less spectacular, and their use more limited, compared to the above examples. Still, you can use them successfully in process digitalization:

• surveillance and monitoring – systems immediately detect the sound of breaking glass or other unusual sounds, also identifying faulty machinery (a minimal detection sketch follows this list)
          • voice-controlled devices and machines in manufacturing, pharma, and healthcare, which do not require taking the gloves off
          • automatic transcription and voice dictation converting your calls and meetings into text
          • assisting employees and customers with disabilities such as vision impairment
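The detection sketch mentioned in the list above: flagging a loud anomaly (a simulated "bang") in an audio signal by thresholding short-time RMS energy, with plain NumPy and synthetic data. Production systems use learned acoustic models, but the principle of spotting deviations from the usual sound profile is the same.

```python
import numpy as np

SR = 16_000                       # sample rate in Hz
rng = np.random.default_rng(1)

# One second of quiet shop-floor hum with a short loud "bang" injected
signal = 0.01 * rng.standard_normal(SR)
signal[8_000:8_400] += 0.5 * rng.standard_normal(400)

# Short-time RMS energy over 25 ms windows
win = int(0.025 * SR)
frames = signal[: len(signal) // win * win].reshape(-1, win)
rms = np.sqrt((frames ** 2).mean(axis=1))

# Flag windows far above the typical noise floor
threshold = rms.mean() + 5 * rms.std()
for i in np.where(rms > threshold)[0]:
    print(f"Unusual sound at ~{i * win / SR:.2f}s")
```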

As the numerous examples above prove, artificial intelligence plays a significant role in digital transformation. It takes operations, customer support, and daily work to a whole new level and makes businesses immune to, or at least prepared for, unexpected events. Want to try AI for yourself? We'll be happy to help (so make sure to contact us, and we'll take you for a test drive!).

How to improve process effectiveness with digital transformation
https://dev.neurosys.com/blog/digital-transformation-process-improvement
Wed, 20 Jul 2022 09:27:21 +0000

Digital transformation serves particular purposes. What could these purposes be? Since the process of employing the newest technologies in organizations is aimed at reimagining business in the digital era, the goal of said transformation can't be something trivial.

Digital transformation and how it should be done

On many occasions we've mentioned that, in a perfect world, the transformation…

          The transformation should be a process

          There’s no time to waste in the global market – grab your processes and get in the car, we’re going to transform. 

          We’ve already focused our attention on reducing costs thanks to including digital solutions and this time, we’d like to show you how to improve process effectiveness. With digital transformation, of course. 

          How to decide whether an industrial or business process needs improvement?

It may require a bit more than just a hunch to identify the right area for improvement. Among the tools helpful in assessing processes, we can list:

          Analysis

Business operations generate a lot of data. Applying statistical and/or logical techniques allows us to evaluate the bigger picture emerging from it. Tools like operational surveys, process mapping, and root cause analyses enable the precise identification of bottlenecks and trouble spots.

          Audits

          The examination of a company’s reports and books can give unambiguous answers on areas for improvement, potential pain points, and risks. Audit results should allow for preparing a strategy on the necessary process improvement steps and prioritization of particular stages.

          Key Performance Indicators (KPI)

A business shouldn't change things for the sake of changing them, which is why indicators are necessary. These indicators measure the performance of investigated processes and help assess the results of actions taken.

          Benchmarks 

          Benchmarks are reference points against which the taken measures and their results are compared. Depending on needs and particular processes, benchmarks can apply to the competition, industry standards, and trends. Setting reference points helps assess the performance and identify further deficiencies to address.

Which processes can be suspected of lacking effectiveness in the first place?

• Those that worked well when the company was a quarter of its present size
• Those that were sufficient when the telex was the latest fashion
• Those that require far too much attention or energy compared to the value they add
• Those that are unnecessarily done manually

          Inefficient processes are often rooted in similar causes, including fear of innovation, the force of habit, and the consequent attachment to outdated solutions.

          How does digital transformation improve process efficiency?

Faulty processes are the Achilles' heel of an organization. While temporary setbacks may happen everywhere, once a problem persists, permanent damage to the company's operational efficiency may occur.

          Improving efficiency with digital solutions has many faces. Sometimes, it may mean as much as introducing better channels of communication. In an organization where employees spend too much time writing emails, making phone calls, or sitting in meetings, the process can be streamlined with digital communication tools. 

In another case, process effectiveness can be improved with automation. When operations require staff to perform repetitive tasks, ceding that work to technology frees the workforce to focus on other, more demanding activities. This does not apply only to fields like industrial manufacturing, which are more associated with the newest technologies; administrative and office tasks can be automated equally well. Printing, scanning, and archiving extensive binders full of files? That's a waste not just of paper and storage space, but of the most valuable resource: time.

As the office case shows, improving processes doesn't need to mean full-blown digitization from day one. Digital transformation can be successfully handled in stages. Let's take invoice processing as an example.

          Simple and more complex automation

Introducing automation in stages, from simple digitization to more complex, end-to-end processing, means effectiveness can be measured, personnel have time to adapt to new processes, and the structure keeps functioning smoothly, without disruptions.
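To make the staged invoice example tangible: a first, simple stage might just digitize scans into searchable text, with field parsing and approval routing added later. Below is a minimal sketch assuming the pytesseract wrapper and a locally installed Tesseract engine; the file name and the "Total" heuristic are placeholders.

```python
from PIL import Image
import pytesseract  # Python wrapper; requires a local Tesseract install

# Stage one: digitize a scanned invoice into searchable text
text = pytesseract.image_to_string(Image.open("invoice_scan.png"))

# Stage two (later): parse fields, match purchase orders, route approvals
for line in text.splitlines():
    if "Total" in line:
        print(line)
```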

How does it look from a digital service provider's perspective?

Usually, a few scenarios are possible. Either the company has traditional processes and requires digitalization; or some operational areas are already digital (or partially digital) but unsuited to the needs; or the process on the client's side needs both digitalization and optimization. While each situation requires different solutions, the cooperation follows a similar course in all of them.

Starting with the overarching question (how could you improve a process with digital transformation?), we begin with an analysis of the company's operations. Only after getting to know its needs and requirements do we choose, together, the processes for digitalization. While drafting a strategy, we agree on the subsequent stages to follow during implementation. As digital transformation is not an all-or-nothing undertaking, we decide on MVPs and the criteria for their assessment.

Before we introduce improvements into the company's structure, it's time to engage its personnel. Changes not only need to be announced; the staff should, and in many cases can, be involved in the process. While data gives valuable insights into core operations, asking daily users for their feedback and ideas on what to improve can shorten the time needed to come up with a working solution.

Without knowing where we're heading, it would be hard to tell whether we've arrived; measuring digital transformation therefore requires adequate metrics. This is also the time to consider A/B testing. Split testing of two or more variants helps us assess which version performs better. If the existing method falls short of available alternatives, it's time to consider improvements. Each improvement can undergo A/B testing again until the right solution is in place.
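A minimal sketch of how a two-variant split test can be evaluated, assuming statsmodels and made-up completion counts for variants A and B:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts: completed tasks out of attempts for variants A and B
successes = [412, 457]
attempts = [1000, 1000]

z, p = proportions_ztest(successes, attempts)
print(f"z = {z:.2f}, p = {p:.3f}")
if p < 0.05:
    print("The difference is statistically significant; roll out the better variant.")
else:
    print("No clear winner yet; keep collecting data.")
```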

Digitalization doesn't end with taking the nearest technology and throwing it into the middle of a working structure. Changes resulting from the transformation can take time and thus require proper management. We say this to emphasize that most technologies can't be treated as miracle cures and left to do wonders unsupervised; it's quite the other way around. Once the digital enhancements aimed at improving process effectiveness are ready for operation, it's time to observe, measure, and, if necessary, improve the solution. The strategy for improving processes can change in the course of action and should therefore be monitored.

          The first pancake is always spoiled

Not necessarily, no. Beginnings may be challenging, especially for organizations that have been functioning traditionally, but it does get better with time. What we wholeheartedly advise is to approach the transformation process with moderation and avoid a hype-driven revolution.

Digitalizing a single, standalone area within the enterprise is a good starting point; turning the whole structure upside down will, in most cases, cause chaos. Minor changes, when carried out properly, can bring significant improvements. In addition, going for the low-hanging fruit can be a great start: identifying processes that are easily digitalized and produce excellent results will encourage the company to move on to the next stages.

Some distinguish digital transformation from digital improvement, but when it comes to tangible results, we won't argue about semantics. True, some changes may not seem spectacular from the outside, yet they bring satisfactory results. Even if a digital enhancement seems too isolated to count as a full-blown transformation, it's the result that matters.


          Risk of poor process improvement

Change is always burdened with risk, and so is the implementation of new solutions. Proper preparation is necessary, not only of the process itself but also of the staff. What needs emphasizing is that personnel shouldn't fear technology being introduced into operations. Tasks ceded to digital solutions will unburden the workforce and allow resources to be reallocated within the company. As a result, staff can take on more demanding tasks instead of continuing repetitive work.

          The takeaway

Digital transformation is not an extreme home makeover. It doesn't happen overnight or give instant, spectacular results. But let us tell you something: it's better this way. Improving process effectiveness with digital transformation requires attention, analysis, and expertise, and as such brings tangible, lasting results.

A well-thought-out and well-informed change is certainly worth a shot in a dynamically changing world. Do you want to learn more? Drop us a line and book your one-hour free consultation, where we'll dwell on your needs and on what our team can do to be the digital transformation partner you need. Up for some more reads? See what we have in store for you and find out whether your company is ready for digital transformation, what digital transformation actually is, and how it turns out in particular industries, like the healthcare field.
