Background
Consolidated Edison (ConEd) serves over 10 million customers with electric, gas, and steam services. In business for over 200 years, the company partners with a vast network of vendors to provide quality service, from customer billing to business operations.
The integration team plays a critical role in managing the transmission and transformation of messages across disparate departments, processing millions of meter reads, hundreds of thousands of customer bills, and other business messages through integration pipelines every single day. These business processes are handled in a hybrid cloud environment, which adds to the complexity of the overall system. Development in these areas requires precision to ensure that the middleware is correct and that upstream and downstream systems are not negatively impacted.
Challenge
When implementing a new customer billing system, the integration team had to create hundreds of new business-critical interfaces. Because so much depended on success, the team wanted to increase their visibility into the system and better manage their failure cases. The volume of data, messages, and systems that the integrations spanned meant that tracking down an issue could require talking to multiple teams, inspecting many different logs from external sources, and sifting through mountains of data to identify the root cause. The integration team needed a faster way to perform root-cause analysis and a way to provide interface execution and failure details to the teams managing the affected external resources. Because of the complexity and size of the system, the integration team also wanted to give all parties more context on the integrations’ behavior so that everyone could make decisions with full information.
Solution
When a large enterprise implements a new system, it can be comparable to asking developers to juggle ten different priorities while constructing the foundation beneath them. IntelliTect was brought in to help ensure everything was accurately documented and aligned throughout the process. IntelliTect architected an event-based logging system to capture and manage the status of all integration interfaces. The goals were to increase visibility into the integrated solution required by the new billing system for both business users and developers, to decrease the integration team’s response time after a failure, and to aid in root-cause analysis.
The initial vision was to leverage one of the client’s existing solutions and stand it up in the center of the new set of integrations. After the first round of requirements gathering and investigation, IntelliTect concluded that, while this was possible, the existing solution’s architecture would not be able to handle the volume the team truly needed. Imagine over 700,000 integration executions happening daily, each needing to log a series of messages and checkpoints that the team wanted to monitor in real time. IntelliTect proposed a completely new architecture, built using cloud resources and industry-tested design patterns with a scale-out model. The new architecture was both more performant and more efficient.
To visualize a business integration’s execution, IntelliTect used the paradigm of a Flight Definition. This paradigm came from the pre-existing client solution, and both the integration team and IntelliTect agreed it was a good representation. To use a real-world analogy, you may be traveling from Spokane, Washington to London, England, with a layover in Chicago, Illinois. The resulting Flight would have the following hops:
- Hop 1: Leave from Spokane
- Hop 2: Arrive at Chicago
- Hop 3: Leave Chicago
- Hop 4: Arrive at London
In the event of a failure, the Flight would track where you were last seen and any information regarding why it failed to complete.
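The Flight paradigm above can be illustrated with a small data model. This is only a minimal sketch: the class names, fields, and status strings (`Hop`, `Flight`, `last_seen`, "Failed", and so on) are assumptions made for illustration and are not taken from the actual system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hop:
    """One checkpoint in a Flight Definition (hypothetical shape)."""
    number: int
    description: str


@dataclass
class Flight:
    """One execution of a Flight Definition, tracking which hops were reached."""
    definition: List[Hop]
    completed: List[int] = field(default_factory=list)
    failure_reason: Optional[str] = None

    def record_hop(self, number: int) -> None:
        self.completed.append(number)

    def fail(self, reason: str) -> None:
        self.failure_reason = reason

    @property
    def last_seen(self) -> Optional[Hop]:
        """The furthest hop reached so far, or None if nothing was logged."""
        if not self.completed:
            return None
        furthest = max(self.completed)
        return next(h for h in self.definition if h.number == furthest)

    @property
    def status(self) -> str:
        # Status names are illustrative assumptions.
        if self.failure_reason is not None:
            return "Failed"
        if len(set(self.completed)) == len(self.definition):
            return "Completed"
        return "In progress"


SPOKANE_TO_LONDON = [
    Hop(1, "Leave from Spokane"),
    Hop(2, "Arrive at Chicago"),
    Hop(3, "Leave Chicago"),
    Hop(4, "Arrive at London"),
]

flight = Flight(definition=SPOKANE_TO_LONDON)
flight.record_hop(1)
flight.record_hop(2)
flight.fail("Connection cancelled in Chicago")  # made-up failure reason
print(flight.status, "- last seen:", flight.last_seen.description)
```

In this sketch, a failed Flight keeps both the last hop that reported in and the reason for the failure, which is the information the paradigm is meant to surface.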
The design of the new message processor was predicated on two major concerns: how the data was getting from the integration to the message processor and how the data was going to be processed once received. With Azure as the customer’s chosen cloud platform, Azure Service Bus was used as the central message broker for the message processor. The event-driven approach was designed to be a publish-subscribe model to ensure that integrations were not impacted by the process of publishing the message to the Azure Service Bus. IntelliTect implemented the following mechanisms to publish messages from any external system:
- Provided a REST endpoint to publish messages
- Designed and developed custom BizTalk Send and Receive pipelines to publish a message directly from BizTalk asynchronously (most commonly used option)
- Created Azure Data Factory pipelines to publish a message asynchronously
Once published to the Azure Service Bus, the messages were consumed by a set of Azure Functions and recorded in an Azure SQL database, initiating the “Message Processing Engine.” The Message Processing Engine was designed as a chain of responsibility handlers, each serving its own function, that together mapped a set of JSON messages to a resulting set of interface runs, or Flights. Each message was first validated and then correlated to a new or existing Flight.
The Processing Engine used an Azure Service Bus trigger to subscribe to the messages dropped on the topic and kick off the chain of responsibility handlers. One major performance improvement came when IntelliTect tested and implemented batch message triggers. Because messages were processed as a collection, they could be sorted and grouped to make the correlation process more efficient, reducing overall processing time.
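As a rough illustration of the chain of responsibility pattern described here, the sketch below links a validation handler to a correlation handler. The handler names, the message fields (`flight_id`, `hop`), and the in-memory dictionary standing in for the database are all assumptions; the production engine ran in Azure Functions against Azure SQL and is not shown.

```python
import json
from typing import Dict, List, Optional


class Handler:
    """Base handler: passes the message to the next link, if there is one."""

    def __init__(self, next_handler: Optional["Handler"] = None) -> None:
        self._next = next_handler

    def handle(self, message: dict, flights: Dict[str, List[int]]) -> str:
        if self._next is not None:
            return self._next.handle(message, flights)
        return "processed"


class ValidationHandler(Handler):
    """Stops the chain when required fields (hypothetical names) are missing."""

    REQUIRED = ("flight_id", "hop")

    def handle(self, message: dict, flights: Dict[str, List[int]]) -> str:
        missing = [key for key in self.REQUIRED if key not in message]
        if missing:
            return "invalid: missing " + ", ".join(missing)
        return super().handle(message, flights)


class CorrelationHandler(Handler):
    """Attaches the message to an existing Flight or starts a new one."""

    def handle(self, message: dict, flights: Dict[str, List[int]]) -> str:
        flights.setdefault(message["flight_id"], []).append(message["hop"])
        return super().handle(message, flights)


# Validate first, then correlate, mirroring the order described above.
chain = ValidationHandler(CorrelationHandler())

flights: Dict[str, List[int]] = {}
raw_messages = [
    '{"flight_id": "F-1", "hop": 1}',
    '{"flight_id": "F-1", "hop": 2}',
    '{"hop": 1}',
]
results = [chain.handle(json.loads(raw), flights) for raw in raw_messages]
print(results)
print(flights)
```

Each handler decides whether to stop or pass the message along, so a new step can be added by inserting another link without changing the existing ones.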
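The benefit of handling messages as a batch can be sketched as follows: sorting and grouping by Flight lets each Flight be looked up once per batch rather than once per message. The field names (`flight_id`, `sequence`, `hop`) and the dictionary used as a store are illustrative assumptions, not the system's actual schema.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, List


def correlate_batch(messages: List[dict], flights: Dict[str, List[int]]) -> int:
    """Correlate a batch of messages to Flights; return the number of lookups."""
    # Sort so that messages for the same Flight are adjacent and in order.
    ordered = sorted(messages, key=itemgetter("flight_id", "sequence"))
    lookups = 0
    for flight_id, group in groupby(ordered, key=itemgetter("flight_id")):
        hops = flights.setdefault(flight_id, [])  # one lookup per Flight
        lookups += 1
        hops.extend(message["hop"] for message in group)
    return lookups


batch = [
    {"flight_id": "F-2", "sequence": 1, "hop": 1},
    {"flight_id": "F-1", "sequence": 2, "hop": 2},
    {"flight_id": "F-1", "sequence": 1, "hop": 1},
    {"flight_id": "F-2", "sequence": 2, "hop": 2},
]
store: Dict[str, List[int]] = {}
lookup_count = correlate_batch(batch, store)
print(lookup_count, store)
```

Here four messages need only two Flight lookups. With a real database behind the store, the same grouping would reduce round trips.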
The final piece of the solution was the “Flight Status Board.” The original design for the dashboard was a simple search screen where users could look at a single Flight at a time. After deploying an early version of the status board, IntelliTect realized the information presented more opportunities. IntelliTect advocated for and designed a more full-featured dashboard that let authenticated users self-serve, reducing both the time it took to identify an issue and the impact on the middleware team. The final dashboard brought a visual aspect to the information and allowed users to search the Flight data, visualize system trends, view the configured reports, and perform administrative functions such as managing invalid or pending messages, configuring notifications, and managing the Flight Definitions. As a result, developers, testers, and business users used the dashboard across all environments, from Development through Production, to verify the behavior of the integrated system.
To help users respond to important issues and to call out specific interfaces that required immediate attention, IntelliTect designed a dynamic email notification system. Notifications could be configured based on the Flight Definition, its status, and even the individual Flight Hop, so that the owner of each system could be alerted when their step completed, successfully or otherwise.
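One way to picture this kind of configurable matching is a rule that pins the Flight Definition and optionally narrows by status or hop. The rule shape, the definition name, and the email addresses below are hypothetical and serve only as an example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NotificationRule:
    """Hypothetical rule: None for status or hop means "match any"."""
    recipient: str
    flight_definition: str
    status: Optional[str] = None
    hop: Optional[int] = None

    def matches(self, definition: str, status: str, hop: int) -> bool:
        return (
            self.flight_definition == definition
            and self.status in (None, status)
            and self.hop in (None, hop)
        )


def recipients_for(
    rules: List[NotificationRule], definition: str, status: str, hop: int
) -> List[str]:
    """Return everyone who should be emailed about this Flight event."""
    return [rule.recipient for rule in rules if rule.matches(definition, status, hop)]


rules = [
    # Alert the billing owners on any failure of this definition.
    NotificationRule("billing-team@example.com", "MeterReadsToBilling", status="Failed"),
    # Alert the meter owners whenever hop 1 reports in, whatever the status.
    NotificationRule("meter-team@example.com", "MeterReadsToBilling", hop=1),
]

on_failure = recipients_for(rules, "MeterReadsToBilling", "Failed", 3)
on_first_hop = recipients_for(rules, "MeterReadsToBilling", "Completed", 1)
print(on_failure, on_first_hop)
```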
To give users a better view of the system landscape, IntelliTect created aggregation screens on the dashboard. These screens used data visualizations to show the Flight data over a selected period of time, with drilldown capabilities into each data point. This feature could be used on a single Flight Definition or across a set of Definitions to get a picture of the entire system. IntelliTect designed these screens to serve as a single stop for checking in on the system.
The final new feature that IntelliTect added to support proactive responses from the integration team was a set of reports sent out at specified time intervals to highlight important recent activity. These reports were sent as emails and could be flagged as important depending on configurable thresholds.
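A threshold-driven report of this kind might be summarized as in the sketch below. The metric (failure rate), the status strings, and the threshold values are assumptions chosen for illustration; the source does not say which measures the real reports used.

```python
from collections import Counter
from typing import Dict, List, Union


def build_report(
    flight_statuses: List[str], failure_rate_threshold: float
) -> Dict[str, Union[int, float, bool]]:
    """Summarize recent Flights and flag the report when failures are high."""
    counts = Counter(flight_statuses)
    total = sum(counts.values())
    failure_rate = counts["Failed"] / total if total else 0.0
    return {
        "total": total,
        "failed": counts["Failed"],
        "failure_rate": failure_rate,
        # Hypothetical rule: important once the rate reaches the threshold.
        "important": failure_rate >= failure_rate_threshold,
    }


recent = ["Completed"] * 8 + ["Failed"] * 2
strict_report = build_report(recent, failure_rate_threshold=0.1)
lenient_report = build_report(recent, failure_rate_threshold=0.5)
print(strict_report["important"], lenient_report["important"])
```

Keeping the threshold as a parameter reflects the configurable nature described above: the same data can produce a routine report for one audience and an important one for another.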
Together, the “Flight Status Board” and “Message Processing Engine” used various Azure services to form a highly available, asynchronous, durable, and distributed engine that aids a central integration team in root-cause analysis and error tracking.
IntelliTect’s expertise gave us real-time visibility into our integrations, drastically reducing response times and improving efficiency. The Flight Status Board is now an essential tool across our organization.
-David Byrne, Department Manager of Enterprise Data and Analytics
Outcome
The solution used standard stage containment and testing at each step of the way, with IntelliTect’s self-driven exploration and new features motivating improvements to the design. The system was put under strenuous testing to ensure it could keep up with the scale that production required. The efficiency of the design, the performance enhancements and testing, and the dynamic scale-out paradigm used in Azure together helped fulfill the requirement of near real-time data while keeping cost low, especially during quiet times in the system.
After implementation, the Flight Status Board and Message Processing Engine provided near real-time data to a central integration team at an enterprise level across their integrated systems. They gave clarity into the status of the team’s interfaces and enabled more accurate and timelier root-cause analysis. The solution was presented to a larger audience within the enterprise and became a must-have for all new integrations. Other internal teams within ConEd reached out to IntelliTect asking for the Flight Status Board to be implemented for their integrations as well, and further teams have since been added to the system to improve their own visibility. That the system is being promoted across the organization speaks to its efficacy.
