Experimentation key concepts & best practices

Nick Gavriil

Jul 5, 2024 · 5 min read

Updated: Jul 6, 2024

In this post we will outline some good experimentation practices that can get you from 0 experiments per week to a consistent flow of weekly experiments.

Tools & processes

Let's start with the basics. The biggest blockers to experimentation are usually a lack of data, reporting and experimentation infrastructure. If you are behind on data infrastructure, I would start there, as more than 80% of an analyst's time can go to data processing that could be done within the DWH (data warehouse).

In order to better demonstrate the tools and processes needed, we need to first outline the stages of an experiment. A minimal set of stages could be:

Design
Trial
Analysis
Reporting

Businesses will usually have task management software to manage experiments across their lifecycle (e.g. Asana, Notion), AB testing software to run the tests (e.g. Optimizely), either the same AB testing software or an experienced analyst to measure impact, and a BI tool (e.g. Power BI) or presentation software (e.g. Google Slides) for reporting. Alternatively, you can use Causalysis, which lets you do all of the above within the same platform.

Hierarchy of experimental methods

When it comes to experimentation efficiency it's really important to understand your options and when it's best to use each. As with everything in life, it's all about trade-offs.

The "king" of experimentation methods is the Randomized Controlled Trial (RCT). Under this setup, the experiment units are randomly allocated to different groups and each group receives a treatment. While the terminology is borrowed from the medical literature, a treatment can be anything, such as the color of a button or the price of a product. Randomization ensures that, on average, the groups differ only in the treatment they receive, and that is what makes this method very accurate. In addition, RCTs are very efficient, as you can measure the treatment effect of multiple variants at the same time instead of changing one parameter at a time (parallel vs. sequential experimentation).
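As a rough illustration of random allocation and parallel measurement, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the function names `assign_variant` and `effect_vs_control`, the variant names, the simulated outcomes and the use of a normal approximation for the confidence interval. It is not how any particular AB testing tool works.

```python
import hashlib
import random
from statistics import NormalDist, mean, variance

def assign_variant(unit_id: str, experiment: str, variants: list[str]) -> str:
    """Map a unit to a variant via a salted hash: stable across calls, roughly uniform."""
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

def effect_vs_control(control: list[float], treated: list[float], alpha: float = 0.05):
    """Difference in means with a normal-approximation confidence interval."""
    diff = mean(treated) - mean(control)
    se = (variance(treated) / len(treated) + variance(control) / len(control)) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, (diff - z * se, diff + z * se)

# Simulated data (an assumption for illustration): each variant shifts the
# mean outcome by a made-up amount.
random.seed(7)
variants = ["control", "blue_button", "green_button"]
true_shift = {"control": 0.0, "blue_button": 0.5, "green_button": 0.0}
outcomes = {v: [] for v in variants}
for i in range(30_000):
    v = assign_variant(f"user-{i}", "button_color_test", variants)
    outcomes[v].append(random.gauss(10.0 + true_shift[v], 2.0))

# Both variants are measured against the same control in a single trial.
for v in variants[1:]:
    diff, (low, high) = effect_vs_control(outcomes["control"], outcomes[v])
    print(f"{v}: diff={diff:.3f}, CI=({low:.3f}, {high:.3f})")
```

Hashing the unit id together with the experiment name keeps a unit in the same group on every visit while giving different experiments independent splits. A random draw stored in a table would serve the same purpose.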

A less efficient experimentation method is the pre-post method. Here we apply a treatment to everyone in a group and measure the change in that group's outcome. This can be a good alternative when RCTs are not feasible. For example, if you don't have randomization software yet, you'd have to go for a pre-post experiment. Other cases involve situations where it would be unethical or illegal to apply RCTs. For example, if your customers complain because they didn't get the service they paid for, you can't refund some of them and not others. Pre-post experiments can reach good accuracy, but at a higher cost. For example, by changing your price back and forth every two weeks, you will eventually average out the impact of other factors that could have affected any one of the trials.
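The back-and-forth idea can be sketched with a small simulation. The numbers here are invented for illustration (`TRUE_EFFECT`, the baseline of 100 and the size of the period shocks), and the sketch assumes the outside shocks are independent from one period to the next. A steady trend or seasonality would need more care than simple averaging.

```python
import random
from statistics import mean

random.seed(11)
TRUE_EFFECT = -3.0  # assumed effect of the higher price on orders per period

def simulate_period(treated: bool) -> float:
    """Outcome = baseline + period-specific shock + treatment effect (if treated)."""
    shock = random.gauss(0.0, 5.0)  # stands in for campaigns, weather, etc.
    return 100.0 + shock + (TRUE_EFFECT if treated else 0.0)

def pre_post_estimate(n_switches: int) -> float:
    """Alternate untreated and treated periods; average the post-minus-pre differences."""
    diffs = []
    for _ in range(n_switches):
        pre = simulate_period(treated=False)
        post = simulate_period(treated=True)
        diffs.append(post - pre)
    return mean(diffs)

single = pre_post_estimate(1)      # one pre-post comparison: dominated by the shocks
repeated = pre_post_estimate(400)  # many switches: the shocks largely average out
print(f"single switch: {single:.2f}, 400 switches: {repeated:.2f}, truth: {TRUE_EFFECT}")
```

The cost shows up in the number of switches: each extra pair of periods is calendar time that an RCT would not have needed.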

The final option would be natural experiments (observational data). In that case, we rely on exogenous events that have caused variation in the data we are interested in. We then analyze that data, hoping our domain knowledge is enough to include all potential confounding variables. This option requires more complex techniques and usually calls for an experienced analyst in collaboration with a domain expert to perform the analysis.
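To make the role of confounders concrete, here is a minimal sketch on simulated observational data. The confounder (`loyal`), the exposure probabilities and `TRUE_EFFECT` are all assumptions chosen for the example. Stratification is used only because it is one of the simplest adjustment techniques; a real analysis would often call for something richer.

```python
import random
from statistics import mean

random.seed(3)
TRUE_EFFECT = 2.0  # assumed effect of exposure on the outcome

# Each row: (confounder, exposed, outcome). Loyal customers are both more
# likely to be exposed and have a higher outcome regardless of exposure.
rows = []
for _ in range(20_000):
    loyal = random.random() < 0.5
    exposed = random.random() < (0.8 if loyal else 0.2)
    outcome = 10.0 + (5.0 if loyal else 0.0) + (TRUE_EFFECT if exposed else 0.0) + random.gauss(0, 1)
    rows.append((loyal, exposed, outcome))

def diff_in_means(data):
    """Mean outcome of exposed rows minus mean outcome of unexposed rows."""
    treated = [y for _, e, y in data if e]
    control = [y for _, e, y in data if not e]
    return mean(treated) - mean(control)

# Naive comparison mixes the effect of exposure with the effect of loyalty.
naive = diff_in_means(rows)

# Stratify: compare within each confounder level, then weight each stratum
# by its share of the data.
adjusted = 0.0
for level in (True, False):
    stratum = [r for r in rows if r[0] == level]
    adjusted += diff_in_means(stratum) * len(stratum) / len(rows)

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, truth: {TRUE_EFFECT}")
```

The adjustment only works because the confounder was recorded and included, which is exactly where the domain expert comes in.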

After understanding the above, it's clear that whenever AB testing is possible it should be the method of choice. However, you still need to be able to extract knowledge in cases where AB testing is not feasible. A tool like Causalysis can help you analyze any type of experiment, carrying the weight of the technical side as long as you bring the domain expertise.

Experiment Programs

An experiment program at its core is a collection of experiments that share common levers and outcomes.

For example:

pricing and its impact on revenue
paid ads and their impact on revenue

Basically, an experiment program should include experiments where the conclusion of one adds knowledge that can be applied to other experiments of the same program. Additionally, the analysis methods or experiment design and tooling should be very similar. In that sense, an experiment program rewards the experiment owner with domain knowledge and leads to increasing gains in efficiency.

Another way to think about an experiment program is as a finite grid of actions that, once tested, will provide a conclusion about the value of the lever(s) for your metric of choice.
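The grid view can be sketched in a few lines. The levers, their values and the set of already-tested cells below are hypothetical, picked only to show the bookkeeping.

```python
from itertools import product

# Hypothetical levers for a pricing program; names and values are illustrative.
levers = {
    "monthly_price": [9, 12, 15],
    "free_trial_days": [0, 14],
}

# Every combination of lever values is one cell of the program's grid.
grid = [dict(zip(levers, combo)) for combo in product(*levers.values())]

# Cells already covered by past experiments, as (monthly_price, free_trial_days).
tested = {(9, 0), (12, 0)}

remaining = [cell for cell in grid if tuple(cell.values()) not in tested]
print(f"{len(grid)} cells in the program, {len(remaining)} still untested")
```

Keeping the grid explicit makes it easy to see how much of the program is left and when a conclusion about the lever can reasonably be drawn.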

Causalysis is designed around the concept of experiment programs as we've found that it significantly improves the efficiency and knowledge retention of teams that use them.

Experiment Planning

Let's go through a few reasons why experiments fail:

The target audience has not been estimated and the experiment ends up addressing a very small audience with no meaningful impact for the business.
The same experiment has been done in the past without success. The lack of an experiment plan and institutional memory led to the repetition of the same trial.
A vital step of the experiment (that should have been documented in the plan) has been skipped, leading to an error that harms clients.
The arguments behind the hypothesis have no data backing.
There is no target or target metric for the experiment. The owner ends up scanning through the list of metrics for a metric that has been positively impacted even if the metric is irrelevant to the problem.
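The first failure mode above, an audience too small to show a meaningful impact, can be checked before launch with a rough sample size estimate. The sketch below uses the standard normal-approximation formula for comparing two proportions. The baseline rate, target rate and weekly audience are assumed numbers, and `sample_size_per_group` is a name made up for this example.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p_base: float, p_target: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate units per group for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    var_sum = p_base * (1 - p_base) + p_target * (1 - p_target)
    return ceil((z_alpha + z_power) ** 2 * var_sum / (p_target - p_base) ** 2)

# Assumed plan: lift a 10% conversion rate to 12%.
needed = sample_size_per_group(p_base=0.10, p_target=0.12)

weekly_audience = 2_500  # assumed number of eligible units per week
weeks = ceil(2 * needed / weekly_audience)  # two groups: control and treatment
print(f"{needed} units per group, about {weeks} weeks for a two-group test")
```

If the required duration comes out far longer than the business can wait, that is a signal to widen the audience, target a larger effect or pick a different experiment.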

There are many ways an experiment can go wrong, and these losses can really hinder your progress. Keep in mind that an error often takes no effort to make but a lot of effort to correct. Good planning is the backbone of experimentation quality; velocity will come with practice and time. Without quality, though, increasing velocity will only lead to more mistakes and raise the probability of a fatal one.

Causalysis offers a dedicated space for your experiment design, as well as features for experiment reviews and comments, to support high-quality planning for your experiments.

Experiment Sequence Design

There should be a strategy behind the design of root and follow-up experiments. Each program might have a different experiment sequence design. The high-level idea is that experiments should start simple and get more complex only as long as value keeps increasing with each iteration.

So, for example, if you want to experiment on your pricing you would test in the following order:

A single offering
Multiple offerings
Personalized offerings
Add-ons etc.

If you start with an expensive and sophisticated solution that hasn't proven value, you are taking an unnecessary risk and you'd be better off building your experiments incrementally.
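One way to picture an incremental sequence is as a chain of linked experiments with a simple rule for when to stop adding complexity. The sketch below is hypothetical throughout: the `Experiment` class, the `measured_value` figures and the "beat your parent" rule are illustrations, not a description of how Causalysis links experiments.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Experiment:
    name: str
    measured_value: float                  # e.g. revenue uplift, in your own units
    parent: Optional["Experiment"] = None  # the simpler experiment this one follows

def lineage(exp: Experiment) -> list[str]:
    """Names from the root experiment down to this one."""
    names = []
    node: Optional[Experiment] = exp
    while node is not None:
        names.append(node.name)
        node = node.parent
    return names[::-1]

def should_continue(latest: Experiment) -> bool:
    """Add more complexity only if the latest step beat the one before it."""
    return latest.parent is None or latest.measured_value > latest.parent.measured_value

# Made-up results for the pricing sequence described above.
single = Experiment("single offering", 1.0)
multiple = Experiment("multiple offerings", 1.8, parent=single)
personalized = Experiment("personalized offerings", 1.7, parent=multiple)

print(lineage(personalized))
print(should_continue(multiple), should_continue(personalized))
```

In practice the comparison would account for uncertainty in each measurement and for the extra cost of the more complex variant, rather than comparing two point estimates.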

With Causalysis you can link follow-up experiments in order to track their performance holistically and ensure that further additions keep adding value to your business.

Conclusion

Experimentation is fun, and building that muscle really pays dividends for your business. Doing it right is critical to your success, so investing in the right tools and talent can go a long way.