Carl Flottmann | Winter Intern | 19 August 2025
Over June and July in 2025, I was offered the opportunity to complete a four-week internship with Fortian’s Managed Security Services (MSS) team. This was an opportunity for me to immerse myself in a cybersecurity company and learn about their Security Operations Centre (SOC) and all the moving parts that make it run. When I arrived in Melbourne and met the team, they had concocted a selection of projects focusing on real scenarios they had faced in their investigations. At this little cyber tasting bar, I discussed each option with the team, did some research into each, and settled on the project described below.
The SOC team performs comprehensive investigations to provide accurate resolutions to potential incidents, and a key component of these investigations is the analysis of logs from client infrastructure and services. Some logs may not be automatically ingested as part of day-to-day monitoring services, or a client’s SIEM solution may not include all the logs necessary for a comprehensive investigation, meaning they must be obtained and analysed separately. The main challenge is collating all these log sources in a centralised location where we can leverage powerful analytical tools (a simple CTRL+F or grep will not cut it). Many SIEMs do not allow for arbitrary log analysis, and different sources produce different log formats: structured (like CSV or JSON) or unstructured but consistent text. Some data analytics platforms offer powerful analytical capabilities but will only ingest certain structured formats.
We need a centralised location for ad hoc log investigations that leverages the powerful analytical capabilities of a data analytics platform and accepts arbitrary log files of varying formats. The solution must maintain the privacy of client data and provide secure access. Ideally, it will be easy for the team to use, build on formats and infrastructure they already understand, and be cost effective.
The first step was to get friendly with my computer mouse and start setting things up manually before diving into any automation. In my initial brainstorming with the team, we found that Azure Data Explorer (ADX) could be a suitable platform. ADX is a powerful big data analytics tool from Microsoft that is part of the Azure suite, offering various data ingestion methods into clusters of databases and tables, which can be queried using the team’s favourite query language, Kusto Query Language (KQL). Click-happy, I set up an ADX cluster in a sandbox, made a database, and investigated how I could upload my sample log files. ADX has a “Get Data” option, where you can upload data from a few different sources, including local files. It accepts various common file formats (JSON, CSV, TSV, and more), but my log file was in this format:
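The real sample cannot be reproduced here, so the line below is an invented stand-in that follows the same key=value pattern:

    timestamp=2025-06-17T03:42:19Z device=fw01 src_ip=10.0.0.5 dst_ip=172.16.2.10 action=allow user=jsmith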
Whilst consistent (always “key=value”), this format is not supported by ADX. Each line was ingested as a single column, and our KQL queries became a mess of string manipulation. There was also a lot of clicking involved in setting this process up; my mouse is looking a bit worn out, and my keyboard feels unloved.
ADX offers many ways to ingest data. We need a way to standardise the log files into a format it can ingest while minimising the amount of manual setup required.
Logstash is an open-source extensible data processing pipeline that can be downloaded and run as a binary locally. It has a wide range of input and output plugins, meaning it can accept data from a variety of sources (local files, Azure blob storage, TCP/UDP connections, another Logstash instance) and similarly output to a variety of locations (an ADX table, Elasticsearch, an on-disk CSV, another Logstash instance). In between the input and output configurations we can place filters, custom configurations governing how input data is manipulated and modified before being sent to the output. Filters are also offered through a range of plugins, such as parsing of standard structured formats (JSON, CSV, XML), the dissect filter for extracting fields with delimiters, grok for converting unstructured data to fields, or the kv filter for data in the format “key=value” (of particular use for our earlier example). The pipeline looks like this:
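In configuration terms (the file path and “type” value below are placeholders rather than the team’s real settings, and the output is a simple console printer for now), an input block feeds one or more filter blocks, which feed an output block:

    input {
      file {
        path           => "/path/to/sample.log"
        start_position => "beginning"
        sincedb_path   => "/dev/null"     # do not remember read position; always re-read the sample
        type           => "firewall_kv"   # tag used to select the matching filter and output
      }
    }

    filter {
      if [type] == "firewall_kv" {
        kv {
          source      => "message"   # the raw log line
          field_split => " "
          value_split => "="
        }
      }
    }

    output {
      if [type] == "firewall_kv" {
        stdout { codec => rubydebug }   # print parsed events while testing; swapped for the Kusto output later
      }
    }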
Azure maintains and develops an open-source Kusto output plugin for Logstash, allowing data to be sent to an ADX table that has an appropriate JSON ingestion mapping defined on it. This mapping is created with a KQL command and maps JSON data fields to table columns, like so:
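The commands below are illustrative, defining a table and mapping that match the invented sample line from earlier:

    .create table FirewallLogs (Timestamp: datetime, Device: string, SrcIp: string, DstIp: string, Action: string, User: string)

    .create table FirewallLogs ingestion json mapping "FirewallLogsMapping"
    '['
    '  {"column": "Timestamp", "Properties": {"Path": "$.timestamp"}},'
    '  {"column": "Device",    "Properties": {"Path": "$.device"}},'
    '  {"column": "SrcIp",     "Properties": {"Path": "$.src_ip"}},'
    '  {"column": "DstIp",     "Properties": {"Path": "$.dst_ip"}},'
    '  {"column": "Action",    "Properties": {"Path": "$.action"}},'
    '  {"column": "User",      "Properties": {"Path": "$.user"}}'
    ']'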
This means the table columns and JSON mapping must be known and created before running Logstash. From outside the Azure network, Logstash authenticates to ADX using an app registration that has “Ingestor” permissions on the database and/or table and a valid client secret.
After writing a configuration file that takes input from my local log sample and parses it into JSON fields using the kv filter, and after creating an app registration and client secret, I could run Logstash and watch as my bundle of log entries was organised by ADX into query-friendly columns in my table.
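The output block of that configuration looked roughly like the sketch below; the ingest URL, database, table, and mapping names are placeholders, and the credentials are read from environment variables rather than hard-coded:

    output {
      if [type] == "firewall_kv" {
        kusto {
          ingest_url   => "https://ingest-<cluster-name>.<region>.kusto.windows.net/"
          database     => "investigation-001"
          table        => "FirewallLogs"
          json_mapping => "FirewallLogsMapping"
          path         => "/tmp/kusto/%{+YYYY-MM-dd-HH-mm}.txt"   # local staging files used by the plugin
          app_id       => "${LS_APP_ID}"       # app registration (client) ID
          app_key      => "${LS_APP_KEY}"      # client secret
          app_tenant   => "${LS_APP_TENANT}"   # Azure AD tenant ID
        }
      }
    }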
I mentioned before that the table columns and JSON mapping must be known and created before running Logstash. So, to avoid each team member having to write their own custom filter for every log format they encounter, we need readily available table definitions, mappings, and Logstash filters for the team to use.
Logstash allows you to specify a “type” field in the input block and later perform conditional checks on it in subsequent blocks. This field can be used to select the appropriate filter and output for different inputs. The plan then becomes this:
We maintain a team-wide repository containing a collection of filters for different log formats, along with corresponding KQL scripts containing the table and mapping definitions. When a team member encounters a new log format, they write a new filter and set of KQL commands and contribute them to the repository. In the input, a unique “type” string can be specified and later used in a conditional check in each filter, ensuring filters are only applied to logs of that “type”. Logstash also allows you to specify a directory of configuration files and concatenates them together, so provided those conditional checks are in place, a team member only needs to point Logstash at that directory, as sketched below.
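For instance, alongside the key=value filter from earlier, the repository might hold a second file for a CSV-based source (the file name, “type” value, and columns here are purely illustrative):

    # conf.d/20-filter-proxy-csv.conf
    filter {
      if [type] == "proxy_csv" {
        csv {
          source  => "message"
          columns => ["timestamp", "src_ip", "url", "status"]
        }
      }
    }

With every filter and output guarded by its “type” check, the whole directory can then be loaded in one go:

    bin/logstash -f /path/to/conf.d/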
We now have a working solution that can ingest arbitrary log files into ADX, but there are still a few key considerations left. There are a few too many clicks involved in standing up the required Azure infrastructure, yet it is not cost effective to leave it running, as these investigations only happen on demand. There is also the issue of secure access: Logstash requires that client secret to authenticate. How is this key managed? What permissions are needed for different operations within the required infrastructure?
The client secret required by the app registration must be stored securely in a way all team members can access, and must also be regularly rotated to maintain our security posture. Azure offers a secure remote location for such uses, Key Vault, where client secrets, keys, and certificates can be safely stored and accessed based on RBAC permission provisioning. Implementing a mechanism to regularly and securely rotate these keys, however, is no small task. With the assistance of the brainpower of several MSS team members, we devised a potential solution.
An Azure Function App containing some custom code can automatically create the new key in the app registration and copy it into the key vault. This function app is triggered by a cron job running on some regular interval (daily, weekly, monthly) that checks whether the keys in the key vault are close to expiry. For a short period after a new key is created there will be two valid keys, giving users time to switch to the new one before Azure automatically invalidates the old one based on its expiry time. A functionally equivalent alternative to the cron job would be to use Azure Event Grid: the key vault emits an event when a key is close to expiring, the function app subscribes to this event, and it is triggered when the event is emitted.
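We did not get as far as building this function app (more on that below), but a rough sketch of the core rotation logic, written here in Python with every name and threshold a placeholder, could look something like this:

    # Sketch only: assumes the function app's identity has Key Vault secret permissions
    # and Microsoft Graph permission to add credentials to the app registration.
    import datetime
    import os

    import requests
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    VAULT_URL = os.environ["VAULT_URL"]          # e.g. https://<vault-name>.vault.azure.net
    APP_OBJECT_ID = os.environ["APP_OBJECT_ID"]  # object ID of the app registration
    SECRET_NAME = os.environ["SECRET_NAME"]      # name of the client secret in the key vault


    def rotate_if_near_expiry(days_before: int = 7, lifetime_days: int = 90) -> None:
        credential = DefaultAzureCredential()
        secrets = SecretClient(vault_url=VAULT_URL, credential=credential)

        now = datetime.datetime.now(datetime.timezone.utc)
        current = secrets.get_secret(SECRET_NAME)
        expires = current.properties.expires_on
        if expires and expires - now > datetime.timedelta(days=days_before):
            return  # current key is not close to expiry; nothing to do

        # Ask Microsoft Graph to add a new client secret to the app registration.
        expiry = now + datetime.timedelta(days=lifetime_days)
        token = credential.get_token("https://graph.microsoft.com/.default").token
        response = requests.post(
            f"https://graph.microsoft.com/v1.0/applications/{APP_OBJECT_ID}/addPassword",
            headers={"Authorization": f"Bearer {token}"},
            json={"passwordCredential": {"displayName": "logstash-adx", "endDateTime": expiry.isoformat()}},
        )
        response.raise_for_status()
        new_secret = response.json()["secretText"]

        # Store the new secret in the key vault with the same expiry; the old key
        # remains valid until its own expiry, giving users time to switch over.
        secrets.set_secret(SECRET_NAME, new_secret, expires_on=expiry)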
The following diagram demonstrates the data flow. To feed these values into Logstash, I wrote a simple script using Azure CLI that pulls valid keys from the key vault and sets them as environment variables in the current session for Logstash to read from.
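In spirit, the script does something like the following; the vault, secret, and variable names are stand-ins for the real ones, and an authenticated Azure CLI session (az login) is assumed:

    #!/usr/bin/env bash
    # Source this file (". ./set-adx-env.sh") so the variables persist in the current shell,
    # matching the ${...} references in the kusto output block shown earlier.
    VAULT="example-investigations-kv"   # placeholder vault name

    export LS_APP_ID="$(az keyvault secret show --vault-name "$VAULT" --name adx-app-id --query value -o tsv)"
    export LS_APP_KEY="$(az keyvault secret show --vault-name "$VAULT" --name adx-app-secret --query value -o tsv)"
    export LS_APP_TENANT="$(az keyvault secret show --vault-name "$VAULT" --name adx-tenant-id --query value -o tsv)"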
The ability to modify an app registration’s credentials and create secrets in the key vault is part of quite a privileged role in Azure RBAC, which may leave the function app able to perform overly permissive actions. I did not have time to implement this solution, nor to find a way to adequately provision the required permissions for the function app. The choice between a cron job and Event Grid will come down to which consumes fewer resources, and is therefore more cost effective, as they are functionally equivalent.
The main question with our infrastructure is which resources need to be, or can be, ephemeral, and which need to be, or can be, permanent. Our full solution currently looks like this (minus the key rotation):
App registrations can be created and left available at no additional cost, and making them ephemeral would be inconvenient, so they can stay permanent. Similarly, the key vault costs very little to leave running and maintain for our use case (in the order of cents per tens of thousands of transactions), so it can also be permanent. The ADX cluster and its contents are where the bulk of the cost of this solution will come from, so these can be ephemeral resources: the cluster is only created if it does not already exist, and databases are spun up per investigation and torn down when the investigation is closed out. How can we quickly stand up and tear down Azure infrastructure?
Terraform! It is an infrastructure-as-code tool that lets you specify resources in Terraform code files and use the Terraform tooling to create the resources from those specifications. This means that the infrastructure can easily be created and destroyed when needed, and can also be version controlled and updated or modified by the team as this project evolves.
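As a sketch, assuming the azurerm provider is already configured and with resource names, location, and SKU as placeholders, the ephemeral ADX pieces might be declared like this:

    # Ephemeral investigation infrastructure (illustrative names only).
    resource "azurerm_resource_group" "investigations" {
      name     = "rg-mss-investigations"
      location = "australiaeast"
    }

    resource "azurerm_kusto_cluster" "adx" {
      name                = "adxmssinvestigations"
      location            = azurerm_resource_group.investigations.location
      resource_group_name = azurerm_resource_group.investigations.name

      sku {
        name     = "Dev(No SLA)_Standard_D11_v2"   # small development SKU
        capacity = 1
      }
    }

    # One database per investigation: created with "terraform apply" and removed
    # with "terraform destroy" when the investigation is closed out.
    resource "azurerm_kusto_database" "investigation" {
      name                = "investigation-001"
      resource_group_name = azurerm_resource_group.investigations.name
      location            = azurerm_resource_group.investigations.location
      cluster_name        = azurerm_kusto_cluster.adx.name
    }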
At the end of this internship, we now have a design and initial codebase that provides a centralised location for investigations with ADX, leveraging the power of KQL for advanced data analytics. The solution supports ad hoc investigations through Terraform for on-demand infrastructure and local Logstash instances, which are run when required and can ingest log files of varying formats using the team-wide configuration codebase, sending the parsed data straight to ADX.
I have had a great time completing this internship with the team at Fortian, familiarising myself with SOC operations and diving deep into the tools of the trade.
The team was highly supportive and collaborated with me throughout, showing genuine interest in the project and the value it may bring to their investigative capabilities.
Fortian is a great environment for an intern with a passion for cybersecurity, with many experienced security experts from a variety of backgrounds, all happy to chat and discuss things with you.
I thank everyone at Fortian who helped me throughout the internship; it has been a blast!
In my investigations I also tested an alternative infrastructure that ran Logstash inside an Azure Virtual Machine (VM) and authenticated using a Managed Identity (MI). An MI is an identity that can be assigned RBAC permissions in much the same way users of the platform are assigned permissions. They come in two flavours: system-assigned, which is tied to the lifecycle of a single Azure resource, and user-assigned, which is created as a standalone resource and can be shared across multiple resources.
Logstash’s Kusto output plugin offers, as an alternative to an app registration with a client secret, authentication using an MI (system-assigned or user-assigned). To do this, it requires access to the Azure Instance Metadata Service (IMDS), which provides information about currently running VM instances and is only reachable at a non-routable IP address within the Azure network (so only accessible from within the VM). Testing this out, I assigned a user-assigned MI to a VM and gave that same MI Ingestor permissions on the ADX side:
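Roughly, the setup steps looked like the following, with all resource names and IDs as placeholders. First, attach an existing user-assigned MI to the VM with the Azure CLI:

    az vm identity assign \
      --resource-group rg-mss-investigations \
      --name vm-logstash \
      --identities /subscriptions/<sub-id>/resourcegroups/rg-mss-investigations/providers/Microsoft.ManagedIdentity/userAssignedIdentities/mi-logstash

Then grant that identity ingest rights with a KQL command against the target database:

    .add database ['investigation-001'] ingestors ('aadapp=<mi-client-id>;<tenant-id>') 'Logstash managed identity'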
This was successful as well, but the local files had to be uploaded to the virtual machine, which adds an intermediary file transfer step. So instead of managing application secrets, we would need to manage secure access to a virtual machine.