Blog #9 — Spotting Emerging I.T. Adoption Trends in the Federal Market

Leif Ulstrup
9 min read · Nov 23, 2020

--

Gaining an Analytics Edge Using Federal Spending Open Data Series

Context

Premise of this Blog Series: Publicly available “open data” sources and open-source technology analytics can provide information to reduce uncertainty when making critical business investment decisions.

Purpose: Demonstrate to executives how easy it is to access and analyze free, open-data sources and motivate them to expect more insight and context from their team when making crucial go/no-go investment decisions on bids and new offerings.

Scope: US Federal Contracting Market.

TL;DR — Analyze gigabytes of US Federal spending data with a few lines of Python code, and uncover competitive insights from the resulting information and analytics.

Previous Blog Post in this series can be found at this link.

Introduction

Many professional services companies in the Federal market have successfully expanded their business by riding an information technology (I.T.) adoption wave. Whether your company is a system integrator, technology specialist, business consultancy, or staff augmentation firm, there are growth and profit opportunities if you can gain a competitive advantage before services around a growing technology become commoditized. Those advantages include:

  • a trusted partnership with the technology vendor,
  • opportunities to develop mission and function-specific add-ons,
  • work plan estimation and risk management insights,
  • the attraction of talent,
  • marketing attention for early wins with customers,
  • valuable past performance, and
  • an opportunity to provide thought leadership in influencing prospective customer adoption of the technology.

How does one spot these growth trends in the Federal market (and declines of older technologies) before the broader market catches on?

Many larger firms in the federal market subscribe to I.T. industry analysts such as Gartner, Forrester, and IDC. Those services provide excellent insights on emerging technologies, the relative strengths of technology companies in a specific segment (e.g., Gartner’s famous Magic Quadrant), and analysis of how commercial organizations are adopting the technologies. They are excellent resources. These services can help develop an inventory of technology types and specific companies to focus your investment strategy.

The purpose of this blog is to demonstrate ways you can use Federal open data sources and open-source software to look for broad market signals and explore Agency-specific buying patterns around emerging technology.

Let’s Get Started with the Analysis

We will use USAspending.gov open data for this analysis (we will use beta.SAM.gov opportunity and award archive text with NLP tools in a future blog).

In previous blogs in this series, I’ve explained how to download USAspending.gov archives.

The Python code for this post can be found at this link — https://github.com/leifulstrup/USAspending_Medium_Blog_Analytics_Series/blob/master/Blog_9b_Simple_Ways_to_Identify_Federal_Technology_Buying_Trends_github.ipynb.

I will restrict my analysis to the Government Fiscal Year (GFY) 2016–2020 period (October 2015 through September 2020). As of this writing, the GFY2020 data is close to complete but is still missing the last ~60 days of DoD data. Keep that in mind when you interpret the charts that follow. By the time you replicate this analysis, you may have the complete GFY2020 files (they should be nearly complete by mid-January 2021) for a more thorough examination of the trends.

See this note on the USAspending.gov site about attribution of D&B data and “D&B Open Data,” which is embedded in the download data and in USAspending.gov website reports — https://www.usaspending.gov/db_info

My first data processing step is to load the GFY2016–2020 USAspending.gov data into my Jupyter notebook, where I will use Python and several open-source add-on packages to analyze it. I will restrict the fields (aka columns) read into memory to a fraction (~14) of the total fields available in each source record (~280). Doing this speeds processing and reduces the memory needed. The GFY2016–GFY2020 data fits into ~3.3GB of memory in a Python pandas dataframe.

Next, read in the USAspending.gov data:
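A minimal sketch of that load step, assuming the yearly archive CSVs have already been downloaded locally. The file pattern and the column subset below are illustrative (the real archives carry ~280 columns and different file names):

```python
# Sketch: load the yearly USAspending.gov archive CSVs, keeping only the
# handful of columns we need. Paths and column list are assumptions based
# on the award archive layout; adjust to match your download.
import glob
import pandas as pd

USECOLS = [
    "action_date",
    "federal_action_obligation",
    "awarding_agency_name",
    "recipient_name",
    "product_or_service_code",
    "award_description",
]

def load_awards(pattern: str) -> pd.DataFrame:
    """Concatenate all yearly archive files matching `pattern`."""
    frames = [
        pd.read_csv(path, usecols=USECOLS, dtype=str)
        for path in sorted(glob.glob(pattern))
    ]
    df = pd.concat(frames, ignore_index=True)
    # Obligations arrive as text; convert to numbers for aggregation.
    df["federal_action_obligation"] = pd.to_numeric(
        df["federal_action_obligation"], errors="coerce"
    )
    return df

# Example (hypothetical path): df = load_awards("data/FY20*_All_Contracts_*.csv")
```

Reading with `usecols` and `dtype=str` up front is what keeps the ~27M records within a few GB of RAM.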

It has ~27 million obligation action records and uses ~3.3GB of RAM.

The next step is to build a master list of popular I.T. software products and brand names, including emerging technology companies. For ease in this example, I will read a list of top 2020 software products (aka platforms) from a market research website.

I supplement that with brands and technologies that I track. Hopefully, many of those will overlap.
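A sketch of assembling that list. The scraped names below are placeholders standing in for the market-research top-2020 list, which isn't reproduced here; the second list represents brands tracked by hand:

```python
# Placeholder for the top-2020 products scraped from a market research site.
scraped_terms = ["Microsoft Azure", "Salesforce", "Splunk", "Tableau", "Zoom"]

# Brands and technologies tracked by hand; some overlap with the scraped list.
tracked_terms = ["UiPath", "Blue Prism", "Automation Anywhere",
                 "WorkFusion", "Pegasystems", "Splunk"]

# Upper-case and de-duplicate so overlapping entries appear only once.
search_terms = sorted({t.upper() for t in scraped_terms + tracked_terms})
```

Upper-casing everything here simplifies the case-insensitive matching in the next step.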

Once we have a list of search terms, we can use lightweight text searching to find matching records in the award_description field. This technique is crude but simple, and there is significant room to improve it (a future blog discussion).

This simple approach will include noise, since technology names commonly have multiple meanings in different contexts. For instance, the term “zoom” refers to the video conferencing technology, appears in the names of other technologies, and can be a verb or adjective describing a feature of various requirements or systems. This approach gives us a broad sweep to spot trends, which then require additional work to refine and validate. I’ll demonstrate the use of NLP tools to improve these searches in future blog posts.

I’ll start this process with a brute-force approach and elementary text searching and add more complexity to refine the analysis.

I’ve restricted the search to Federal product_or_service_code values that start with ‘70’ (the I.T. products category), “D” (I.T. and telecom services and solutions), and “R” (professional services).
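That restriction is a one-line filter, assuming `df` is the awards dataframe loaded earlier:

```python
# Sketch: keep only I.T.-related product/service codes ('70', 'D', 'R').
import pandas as pd

def filter_it_psc(df: pd.DataFrame) -> pd.DataFrame:
    """Keep records whose PSC starts with '70', 'D', or 'R'."""
    psc = df["product_or_service_code"].fillna("")
    # str.match anchors the regex at the start of each code.
    return df[psc.str.match(r"70|D|R")]
```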

This subset is ~4.3M of the ~27.4M records in the complete GFY2016–GFY2020 data source. The code below cycles through the list of I.T. brands and product names and assembles a pandas dataframe to look for trends at a macro level.

The dataframe includes the spending associated with each search term and counts of records that cite it. These are gross estimates. Also, I split up some products by the same company, such as Microsoft’s Azure cloud platform and Amazon’s AWS or “Amazon Web Services,” to see citations specific to those offerings. For a more rigorous analysis, I need to recombine those under the parent company’s name. Here is an example of that:
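A sketch of both steps: the brute-force loop over the term list, plus mapping split product names back to a parent brand. The parent mapping and the `df`/`search_terms` inputs are assumptions carried over from the earlier steps:

```python
# Sketch: for each search term, count matching award descriptions and
# total the associated obligations, then tag each term with a parent brand.
import pandas as pd

# Hypothetical parent-company mapping used to recombine split product names.
PARENT = {"MICROSOFT AZURE": "MICROSOFT",
          "AWS": "AMAZON",
          "AMAZON WEB SERVICES": "AMAZON"}

def term_summary(df: pd.DataFrame, search_terms) -> pd.DataFrame:
    """One row per search term: record count and total obligations."""
    desc = df["award_description"].fillna("").str.upper()
    rows = []
    for term in search_terms:
        hits = desc.str.contains(term.upper(), regex=False)
        rows.append({
            "search_term": term.upper(),
            "count": int(hits.sum()),
            "federal_action_obligation":
                float(df.loc[hits, "federal_action_obligation"].sum()),
        })
    out = pd.DataFrame(rows)
    # Recombine split products (e.g., Azure, AWS) under the parent brand.
    out["parent"] = out["search_term"].map(PARENT).fillna(out["search_term"])
    return out
```

Note that a record mentioning two terms (e.g., Splunk hosted on AWS) is counted under both, which is exactly the double-allocation caveat discussed below.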

Convert to Pivot Table Format for Easier Trend Spotting

The next step is to create pivot table versions of the records to identify trends in mentions (‘count’) and spending (‘federal_action_obligation’) over time and to compute some year-over-year (YoY) ratios. Viewing the records this way makes it easy to sort the data and plot trends. The spending amounts listed are not one-for-one with the products; they only mean those obligation dollars are associated with that term. Where multiple technologies appear in the same text (e.g., Splunk hosted on AWS), I have allocated the same spending to both, which overstates the amounts. The purpose of this stage of analysis is to paint a broad picture that motivates a detailed analysis.
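A sketch of the pivot step. It assumes a `matches` frame with one row per matching record, already tagged with its fiscal year (deriving GFY from action_date is left out here); column names are illustrative:

```python
# Sketch: pivot per-record matches into search_term x fiscal_year and
# compute a year-over-year ratio, e.g. GFY2019 vs GFY2018.
import numpy as np
import pandas as pd

def yoy_pivot(matches: pd.DataFrame, value: str) -> pd.DataFrame:
    pivot = matches.pivot_table(index="search_term", columns="fiscal_year",
                                values=value, aggfunc="sum", fill_value=0)
    ratio = pivot[2019] / pivot[2018]
    # 999.0 stands in where the ratio would otherwise be infinite (zero base).
    pivot["2019_vs_2018_ratio"] = ratio.replace([np.inf, np.nan], 999.0)
    return pivot
```

The same function works for either view: pass `value="count"` for mention trends or the obligation column for spending trends.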

The spending analysis gives one a sense of the market’s magnitude, but the count of references to a product or brand may be an even better signal of buying trends. For instance, sorting the 2019_vs_2018_ratio of counts for various search terms highlights technologies growing in mentions, for most of them relative to a small base (note: the “999.00” is an artificial stand-in where the ratio would otherwise be infinite). These might be early signals of emerging interest in these technologies.

Trends in Count of References to the Technology Name
Trends in Obligations Associated with References to the Technology Name

This table shows some of the faster growing reference counts for the larger brands (at least 100 references) — note that GFY20 is incomplete:

This table shows GFY2019 vs GFY2018 declines in references to the larger, established technologies (greater than 100 references):

The declining references may be erroneous due to the multiple ways a company or its products can be referenced (e.g., Amazon vs Amazon Web Services vs AWS). More thorough analysis is needed to fully understand the trends. Also, a moving average may make more sense, since a large acquisition in one year may not be repeated in the next. These references also include services contracts that cite these technologies, so they are NOT exclusively software (SW) purchases. Analysis of beta.SAM.gov award and opportunity text and attachments is needed to fully understand the trends.

Exploration of the Growing RPA Segment

One of the hot growth areas in enterprise software is Robotic Process Automation (RPA). You can see firms such as UiPath, Blue Prism, Automation Anywhere, and Pegasystems growing in citations.

It is notable that the fourth member of the Gartner Magic Quadrant for RPA, WorkFusion, is missing from the list even though it is part of the search terms list.

Is this an opportunity for a partnership with a Federal systems integrator to introduce them to the market?

Maybe the Federal market is not among WorkFusion’s priorities as it is for its Gartner Magic Quadrant competitors.

This chart shows the growing spending associated with these RPA firms:

Here is a table that shows which Agencies are adopting these RPA tools and how much they are spending in conjunction with references to the tools:
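A sketch of how such a table can be built: cross-tabbing RPA-tool mentions by awarding agency. It assumes the filtered awards frame from the earlier steps; the term list mirrors the brands discussed above:

```python
# Sketch: obligations associated with each RPA tool, broken out by agency.
import pandas as pd

RPA_TERMS = ["UIPATH", "BLUE PRISM", "AUTOMATION ANYWHERE",
             "WORKFUSION", "PEGASYSTEMS"]

def rpa_by_agency(df: pd.DataFrame) -> pd.DataFrame:
    """One column per RPA term, one row per awarding agency."""
    desc = df["award_description"].fillna("").str.upper()
    cols = []
    for term in RPA_TERMS:
        hits = df[desc.str.contains(term, regex=False)]
        cols.append(hits.groupby("awarding_agency_name")
                        ["federal_action_obligation"].sum().rename(term))
    return pd.concat(cols, axis=1).fillna(0.0)
```

A term with no matching records (e.g., WorkFusion here) simply yields a zero column, which is itself informative.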

Here is a view of RPA tool adoption by Agency:

There are other tools in the RPA category. Expanding the list may uncover more growth opportunities in this popular segment of the enterprise I.T. market. You can add products and company names of interest, using the code from this example as a starting point for your own analysis.

Though this approach only provides a rough estimate of the trends for the most popular information technology brands and products, it gives an analyst some ‘signal’ to explore in-depth.

Also, there is ambiguity between the names of technology companies/products and other related concepts (e.g., “Stripe” can refer to the payments company as well as to many other I.T.-associated products, such as magnetic stripe readers). One needs to make full use of the Federal product_or_service_code system to narrow the search further and hope that everything is coded correctly.

In a future blog, I will use NLP tools to explore the beta.SAM.gov opportunity and awards archive to see what else we can learn about how the government describes technologies in award and opportunity announcements, looking for deeper patterns in the requirements that specific technologies satisfy.

Coming Attractions

I will explore more analytical and market topics in future blog posts — including more market-share trend analysis and NLP techniques using USAspending.gov and beta.SAM.gov award and opportunity textual data and mashups with other open data sources.

The code listed in the examples above and more can be found here:

https://github.com/leifulstrup/USAspending_Medium_Blog_Analytics_Series

MIT Open Source License Copyright 2020 Leif C Ulstrup
