Debugging UBIK

One of the most complex challenges in any software project is debugging unintended behavior. In UBIK®, every project has an inherent structure, which we can exploit for debugging. Let's find out how.

Quick-fix check list

Many issues can be resolved by going through the following check list.

Check settings and configurations for typos, missing entries and other errors
Restart UBIK® Studio and reconnect to your DB to avoid caching issues
Check whether all plugins were loaded correctly
In case the custom code was changed, or UBIK® was upgraded to a new version:
1. Compile and publish the customizing (F6)
2. Restart the Enterprise Service
3. Restart all web services
In case the data model for the client was changed:
1. Rebuild and publish the ACM meta definitions using the ACM manager
2. Restart all web services
Restart the UBIK® client application to make sure new meta definitions and content are received

A general policy for debugging

Debugging can be approached methodically. Here's a basic plan for debugging software.

Reproduction: Get all available, relevant information about the bug and confirm the problem in a test setup
Inspection: Inspect the actual behavior to understand the cause
Fix: Design and implement a solution
Retest: Test the fix

Debugging a UBIK® project

Reproduction
Inspection
Hypothesizing
Fix: Performance Problems
Fix: Crashes
Fix: Faulty data
Fix: Other misbehavior

Reproduction

Full Test System

To reproduce the problem with UBIK®, you need a test setup. This usually means creating a local copy of the affected database and installing the UBIK® products relevant to the problem. It is important to use the same binaries, plugins and versions as in the system where the problem occurred. Then, we can try to provoke the reported issue in the test setup. This might require getting more information about the issue.

Isolation Testing

If a full test setup is not feasible, isolating a (presumably) faulty part and testing it individually often makes sense.

In UBIK® Studio, there are two tools for this:

Who-Bert Debugging Tool
View Test Tool

Both can be used to test the behavior of UBIK® objects (and custom code) on the server side. With Who-Bert code and manually created test data, you can additionally set up a "mock" or "fake" situation to test the behavior under very specific circumstances. The View Test Tool simulates how the web service assembles data for the client, ignoring the ACM meta definitions (context, scopes, etc.).

Another way to test your plugin code in isolation is to write unit tests, which is strongly encouraged.

Inspection

Once you have a test setup and are able to reproduce the issue, you can inspect what's happening in detail to find out why the problem occurs. This can be done either by debugging with Visual Studio, or by producing diagnostic output in the form of log entries, UBIK® objects and property values, or UI customizing.

Inspect the mobile client

Use the Developer Mode to inspect the currently visible view models and their values.
Inspect the log files of the mobile client, including the web service client log.

Inspect the web services or the Enterprise Service

Inspect the log files of the web service or Enterprise Service.
Modify your plugin or programmatic customizing to output log messages describing the state of your program at critical points.
Modify your plugin or programmatic customizing to write diagnostic UBIK® objects describing the state of your program at critical points.
Use a Who-Bert script to test a specific setup and output log messages to the console.

Hypothesizing

To narrow down the cause of the problem, we can try to formulate an idea of what could have gone wrong. Ideally, we then look for proof by watching the problem happen, but it's always good to know the potential error sources. In general, there are several common types of problems and, from another perspective, a set of common sources for such problems.

Visualizing the architecture and algorithm

In order to come up with a good hypothesis, you must understand the architecture and algorithm at work. This means you have to find out which UBIK® products and modules are involved and how the affected use-case is implemented in the project.

The UBIK platform architecture

Nearly all use-cases in UBIK® projects are related either to the mobile client or to interfacing with third-party systems. Although specific implementations can differ considerably, the general flow of information through the UBIK® modules is almost always similar. If there is a problem, it has to occur in one of the steps below, caused by one of the dependencies listed for that step.

Mobile client

The mobile client requests data from the Content web service, using
- Hardware (network, client hardware)
- Profile settings
- Credentials
The web service establishes a connection to the UBIK Environment, using
- The network
- The web service configuration
- The database
- Injected UBIK® Plugins
- The programmatic customizing
The web service tries to perform the requested action based on
- Hardware (network, server hardware)
- The data model
- The programmatic customizing
- The View configuration
- Content data
The client receives the result and tries to display it depending on
- Hardware (network, server & client hardware)
- The XAML customizing
- The ACM meta definitions
- Content data

Interfacing

Similarly, interfaces to other systems like SAP or Comos usually follow this workflow:

Somebody (or something) configures a UBIK® Enterprise Service task using its web service interface or a configuration file.
An ES run is triggered, most likely using the Windows Task Scheduler.
The ES establishes a connection to the UBIK Environment, using
- The network
- The web service configuration
- The database
- Injected UBIK® Plugins
- The programmatic customizing
The ES tries to perform the requested action, usually based on
- Hardware (network, server hardware)
- The data model
- The programmatic customizing
- The Proxy configuration
- Content data
- The external system

In this case, the UBIK® Proxy mechanism is an additional source of complexity, but there's a separate article for that.

Types of problems

Performance issues

Performance issues can be caused by:

Hardware problems, e.g., slow network or weak devices
Huge loads of data
Inefficient algorithms (with poor scalability)

A combination of the above is not uncommon.

Sometimes, weak hardware is something we cannot change easily. In many cases, we can optimize our algorithm to make it perform well even on weak hardware. In other cases, we can restructure the problem or the data in a way that makes it easier to process. Before we can solve the problem, though, we have to find the cause. For now, this means finding the bottleneck. Is the internet connection too slow, are we working with too much data, is our interface algorithm taking too long, or are we launching too many web requests?
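One way to locate a bottleneck is to measure each stage of the use-case separately. The following is a minimal sketch in Python, used here only as a neutral illustration language; it is not UBIK® API. The stage names and the `timed` helper are hypothetical, and the `time.sleep` calls merely stand in for real work.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> accumulated elapsed seconds


@contextmanager
def timed(stage):
    """Record how long the enclosed block takes under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start


# Hypothetical stages of a use-case; the sleeps are placeholders for real work.
with timed("load data"):
    time.sleep(0.01)
with timed("process"):
    time.sleep(0.05)
with timed("send result"):
    time.sleep(0.01)

# The stage with the largest accumulated time is the first candidate to optimize.
bottleneck = max(timings, key=timings.get)
for stage, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stage}: {seconds * 1000:.1f} ms")
print("slowest stage:", bottleneck)
```

Measuring before optimizing avoids spending effort on a stage that contributes little to the total time.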

Crashes

Crashes are often caused by bugs in the software. In some cases, customizing can crash the app, too. In case of a bug in UBIK®, once you have identified the faulty module and pinned down the reproduction, please notify the development department by creating a support ticket. Until the bug is fixed, you might want to find a workaround to avoid the crash.

If the customizing is at fault, the most likely source of a crash is an unhandled exception, for example a null-reference exception, where the program tries to access a property or function of an object that is actually null.

Missing or erroneous data

Data that doesn't look the way we expect can make our customizing (both on the server and on the client) misbehave. The simplest example is a value being NULL when we assume it isn't, which usually leads to a NullReferenceException (and potentially even a crash) in C#. But there are many other potential problems, for example a value being outside of an expected range, or a value being technically acceptable but semantically wrong. For instance, we could have a document "chair.jpg" that is actually a picture of a table, because the import confused a mapping.

Missing or erroneous data can be caused by:

The data having been imported erroneously from an external system
The data having been entered incorrectly by hand
A misconfiguration transforming the data or making the data unavailable on the client (or other presenting module)
- User rights
- Faulty filters
- Incomplete ACM/View customizing
A bug in the UI customizing, so the data is just hidden or presented incorrectly

Other misbehavior

The app or interface just doesn't behave as expected. The technical concept is sound, the input data looks fine, but the result is wrong. In this case it's a good approach to get a complete list of prerequisites and check one thing after another, separately. Surprisingly often, it's something like a typo in the settings.

If you've been looking for ages and still can't find the error, you probably need some distance from the problem. Take a break, sleep, do something else. Then, try to get an overview and a plan before you get back to debugging. Even the most hopeless problems usually look different after a good night's sleep.

Finally, if you can't find a good hypothesis, that's not a problem. Try to inspect what's actually happening instead, and the hypotheses will follow.

Modules and problem sources

Different types of problems are not the only categories we can think in. UBIK® is a complex ecosystem with multiple products and many modules, and it uses other products and frameworks to do its job. Hence, a problem can have different sources. It can even be caused in one place and, as a consequence, surface in another. Here's a list of potential problem causes (a combination of multiple points is possible):

Infrastructure
- (Network) hardware problem
- Network security restriction
- User rights restriction
Web Service, Studio or Enterprise Service
- A manual step was forgotten (rebuilding the custom code, releasing the ACM meta definitions, restarting the web service, ...)
- Erroneous data (unexpected values provoke the problem)
- Wrong configuration (a configuration file or an object is misconfigured, potentially including ACM and Proxy/IF configuration)
- Plugin code (a standard or customer plugin has a bug)
- Custom code (custom code of meta classes or the custom code library has a bug)
Client App
- Erroneous data (unexpected values provoke the problem)
- Wrong configuration (the profile or a configuration object coming from the server is misconfigured)
- UI customizing (some XAML contains an error)
- Core implementation (the app itself has a bug)

Fix: Performance Problems

If you're in the technical design stage, you've already found out the reason for the performance issues. In case of a hardware or infrastructure bottleneck, you can either try to improve the circumstances or adapt to them by optimizing your solution.

Often, the bottleneck is the network connection or the mobile application. However, optimization is also required if the infrastructure is fine but UBIK® takes too long to process the use-case.

In both scenarios, we want to apply the following measures:

Leverage strengths instead of weak points
Perform as few processing steps as possible
Partition the problem into several smaller problems

Leverage strengths

Usually, the server is strong and fast, the mobile device not so much, and the network is a performance graveyard. If you want to waste as much performance and time as possible, maximize the number of network interactions and shift all the workload to the client application. Conversely, leveraging the strengths in UBIK® means shifting all calculation and preparation to the server and delivering the results to the client as compactly as possible, in one request-response cycle. Often, this means you have to create a new data model on the server to reflect what you want to show on the client, and use programmatic customizing to prepare it. Even if you already have all the data on the server side, it often pays off to restructure it just for the client, so the client can show the data using basic features. So, the rule of thumb is: the less client customizing you need to do, the better.

Minimize processing steps

If an algorithm scales badly, even a supercomputer can be too slow to perform well.

The idea of time complexity in computer science is:

Every step in a program consumes a certain amount of processing time. If we have N steps, and one step takes x seconds, we have x * N seconds in total.
Most programs have loops or recursions (often even loops inside of loops).
This can lead to the situation where the number of steps N is not a fixed number but a (complicated) function of your input data n: N = f(n).
E.g., if you have two loops inside of each other, for each of which you iterate over all n inputs, then N = n * n, because we're doing n iterations in the outer loop and for every one of those, we do n iterations in the inner loop.
In this case, we can say the time complexity of our program is O(n²).

The simple consequence of this is that we should avoid situations where we have a lot of combinations.
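To make the growth tangible, here is a small illustrative sketch in Python (a neutral illustration language, not UBIK® code). The function `count_steps` is hypothetical; it simply counts how often the body of two nested loops over the same input is executed.

```python
def count_steps(items):
    """Count how often the inner loop body runs for two nested loops over items."""
    steps = 0
    for _outer in items:
        for _inner in items:
            steps += 1
    return steps


# Ten times more input leads to a hundred times more steps: N = n * n.
for n in (10, 100, 1000):
    print(n, count_steps(range(n)))
```

With n = 10 the body runs 100 times, with n = 1000 it already runs 1,000,000 times, which is why nested iterations over large data sets deserve attention first.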

Fortunately, there is a way to solve this: caching. Caching means remembering things you already did so you don't have to do them again.

Our goal is to find things we do repeatedly, extract them, do them only once at the beginning, and reuse the result later. In the simplest case, this means you create a variable to store the reusable result. In more complex cases, you can use a data structure that lets you collect and retrieve your data efficiently. Quite often, you can use a hash map or Dictionary, but depending on what you need, other structures can of course be better.
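The following sketch contrasts a nested search with a lookup that is built once and then reused. It is written in Python for illustration only; in C# customizing the same idea would typically use a `Dictionary`. The `customers` and `orders` data and both function names are made up for this example.

```python
# Hypothetical test data: 200 customers and 1000 orders referring to them.
customers = [{"id": i, "name": f"customer-{i}"} for i in range(200)]
orders = [{"customer_id": i % 200} for i in range(1000)]


def names_nested(orders, customers):
    """Roughly n * m steps: scans the customer list again for every order."""
    result = []
    for order in orders:
        for customer in customers:
            if customer["id"] == order["customer_id"]:
                result.append(customer["name"])
                break
    return result


def names_cached(orders, customers):
    """Roughly n + m steps: builds the lookup once, then reuses it per order."""
    name_by_id = {c["id"]: c["name"] for c in customers}
    return [name_by_id[o["customer_id"]] for o in orders]
```

Both functions return the same list; only the number of steps differs, and the gap widens as the data grows.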

Partition the problem

As an engineer, one wants to provide the best solution for a user. Often, this means as few clicks and navigation steps as possible, and all the required information on one page. However, this can be very expensive, because we have to aggregate so much data in one place. Also, there might be an even more user-friendly approach, because sometimes, too much at once isn't the best solution.

I'm going to make up a very abstract and silly example: Consider the requirement that a user wants to choose a pair of shoes to wear, with a showcase video for every shoe. Let's imagine the user has a huge number of shoes, like, thousands. Showing them all at once might be computationally expensive, and it would also be a bit overwhelming. Instead, maybe we can group the shoes by the (overlapping) categories indoor/outdoor and elegant/functional, and by color. The user has to perform a few additional navigation steps, but on the other hand, they have to make that choice anyway. We even help them select a pair of shoes by leading them through the right choices. As a nice side effect, the result consists of far fewer shoes, so it's computationally cheaper to load all the videos. Ideally, the parameters for the filtering can be inferred even without the user entering them explicitly, e.g., by looking at the weather and the user's calendar (sunny weather, hiking trip: probably not the rain boots).

Anyway, in some cases the use-case can be rearranged so the amount of data and information presented to the user at one point in time is smaller.
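The shoe example can be sketched as a simple filtering step that runs before any expensive content is loaded. This is an illustrative Python sketch under simplifying assumptions: the `Shoe` type, its attributes and the `select` helper are hypothetical, and each shoe belongs to exactly one value per category, whereas the categories in the example above may overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shoe:
    name: str
    usage: str  # "indoor" or "outdoor"
    style: str  # "elegant" or "functional"
    color: str


def select(shoes, **criteria):
    """Return only the shoes that match every given attribute=value criterion."""
    return [
        shoe for shoe in shoes
        if all(getattr(shoe, key) == value for key, value in criteria.items())
    ]


shoes = [
    Shoe("rain boot", "outdoor", "functional", "yellow"),
    Shoe("hiking boot", "outdoor", "functional", "brown"),
    Shoe("oxford", "indoor", "elegant", "black"),
    Shoe("slipper", "indoor", "functional", "grey"),
]

# Only the remaining candidates would need their videos loaded.
candidates = select(shoes, usage="outdoor", color="brown")
print([shoe.name for shoe in candidates])
```

Each criterion corresponds to one navigation step for the user and shrinks the set that has to be loaded and displayed.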

Fix: Crashes

As explained in the hypothesizing section, crashes usually happen because of an unhandled exception being thrown by some module.

The basic approach to solve crashes consists of two measures:

Avoid the crash by checking for the problematic circumstance provoking the crash (e.g., check for a null-reference)
Find out where the problem originally comes from. For this reason, also log all relevant details when the above check prevents a crash.

The second point implies that the situation leading to the crash is not the real problem. The real problem is either that the situation shouldn't occur in the first place or that the program cannot deal with that case; maybe it's a buggy dependency or erroneous input data.
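Both measures can be combined in a few lines. The sketch below is in Python for illustration only; `display_name`, the logger name and the dictionary-shaped document are hypothetical, and real customizing would use the logging facility available in its environment.

```python
import logging

log = logging.getLogger("customizing.demo")  # hypothetical logger name


def display_name(document):
    """Return an upper-cased title, tolerating a missing document or title."""
    if document is None or document.get("title") is None:
        # Measure 1: avoid the crash. Measure 2: log enough detail to find
        # out later where the missing value originally came from.
        log.warning("display_name: missing title, document=%r", document)
        return "<untitled>"
    return document["title"].upper()


print(display_name({"title": "chair"}))
print(display_name(None))
```

The fallback value keeps the program running, while the log entry preserves the evidence needed to fix the real cause.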

Fix: Faulty data

For faulty data, we have to find out where it comes from and solve the problem at its source (or as close to it as possible). Nearly always, it's much harder to deal with erroneous data when processing it than where it originates. The reason is that it is much harder to infer the correct data from erroneous data than to prevent the data from being wrong in the first place. Consider the following fictional example: In a project, we import a lot of text, but somehow all line breaks get lost during the import. It's hard to find out afterwards where the line breaks should be. In that case, it's better to fix the corrupting element and repeat the import.

The rule of thumb here is: Don't try to cope with the faulty data when processing or showing it. Instead, fix the problem at the source and repair the data by reimporting.
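One way to catch faulty data close to its source is to validate records during the import and report the rejected ones, instead of compensating for them later. The following Python sketch illustrates the idea; the field names `name` and `weight_kg`, the accepted range and both functions are assumptions made up for this example and are not part of any UBIK® interface.

```python
def validate(record):
    """Return a list of problems found in one imported record (empty if fine)."""
    problems = []
    if not record.get("name"):
        problems.append("name is missing")
    weight = record.get("weight_kg")
    if weight is None:
        problems.append("weight_kg is missing")
    elif not 0 < weight <= 1000:  # assumed plausible range for this example
        problems.append(f"weight_kg out of range: {weight}")
    return problems


def import_records(records):
    """Split records into accepted ones and rejected ones with their problems."""
    accepted, rejected = [], []
    for record in records:
        problems = validate(record)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected


accepted, rejected = import_records([
    {"name": "pump", "weight_kg": 12.5},
    {"name": "", "weight_kg": 3},
    {"name": "valve", "weight_kg": -1},
])
for record, problems in rejected:
    print(record, "->", problems)
```

The list of rejected records points directly at what has to be corrected in the source system or the import mapping before the import is repeated.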

Fix: Other misbehavior

Maybe the issue is a simple typo or wrong setting and you can fix the problem with a simple measure. Since you're reading this, the solution might not be so simple and we have to approach it conceptually.

This depends a lot on the specific problem, but most misbehavior arises because the implementation currently in place is not well-designed architecturally.

Interestingly enough, this in turn is often the result of a bad functional design. That is, if the behavior hasn't been defined clearly, a clean technical solution for it cannot be defined either. The rule of thumb here is to re-evaluate what the target behavior should be (in contrast to what the designated behavior is currently believed to be).

If the designated behavior is clear but the implementation still misbehaves, the implementation probably has to be redesigned for clarity and architectural correctness.