Premise of this Blog Series: Publicly available “open data” sources and open-source technology analytics can provide information to reduce uncertainty when making critical business investment decisions.
Purpose: Demonstrate to executives how easy it is to access and analyze free, open-data sources and motivate them to expect more insight and context from their team when making crucial go/no-go investment decisions on bids and new offerings.
Scope: US Federal Contracting Market.
TL;DR — Analyze gigabytes of US Federal spending data with a few lines of Python code and uncover competitive insights from the information.
In earlier Medium.com blog posts in this series, I focused on the mechanics of downloading and preparing USAspending.gov Federal contractor obligations (spending) data using Python and its ecosystem of open-source packages such as pandas and Dask. In this post, I demonstrate how a business analyst and strategist can use this information to analyze the data in ways that provide context for business investment decisions.
This blog uses Federal contractor spending from Government Fiscal Year (GFY) 2010 through GFY2019 and covers these topics:
(1) Relative and absolute changes in expenditures by Federal “Product and Services Code” categories (PSC_Cat).
(2) Analysis of contractor spending on work designated by Federal procurement officers as “IT and Telecommunications” using the criteria “predominant product or service that is being purchased.” Exploring the expansion and contraction of contractor spending at the Department, Agency, and contractor levels.
The Federal government codes every contractor obligation with a designator that indicates the predominant type of work in that contract obligation. Per the instructions in the “Federal Procurement Data System Product and Service Codes Manual”:
“The Product and Service Codes (PSC) Manual provides codes to describe products, services, and research and development (R&D) purchased by the federal government. These codes indicate “WHAT” was bought for each contract action reported in the Federal Procurement Data System (FPDS). “
“These product/service codes are used to record the products and services being purchased by the Federal Government. In many cases, a given contract/task order/purchase order will include more than one product and/or service. In such cases, the “Product or Service Code” data element code should be selected based on the predominant product or service that is being purchased.
For example, a contract for $1000 of lumber and $500 of pipe would be coded under 5510, “Lumber & Related Wood Materials.”
This coding of obligations with the “Product and Service Codes” (“PSC_Cat”) gives an analyst a good starting point to develop market context using the USAspending.gov obligation transaction data. Every obligation transaction is coded with one of the designators from the scheme described here:
Based on guidance from this link.
Here is a sample of the major category codes:
Here is a breakdown of the detailed codes under the PSC_Cat “D” category, “IT and Telecommunications”:
Caveats: When analyzing obligations for a specific product or service, this coding structure paints a broad picture of the spending. To get a more fine-grained picture, you may need to study the original RFP and award documents posted on beta.sam.gov (formerly FBO.gov). Also, note that IT-related spending may be coded in the ‘D’ codes for integrated services and products, while standalone hardware and software purchases can usually be found in PSC_Cat ‘70’. This document shows an even more complex hierarchy relating the detailed PSC categories to topics such as “information technology,” “medical,” “human capital,” etc. For Information Technology (IT), it is not unusual to find significant content in programs coded in the ‘R’ family of services — “SUPPORT SVCS (PROF, ADMIN, MGMT).” Depending on the fidelity needed and the purpose of your analysis, you may need to go beyond the depth of the examples I show below.
Big Picture Context
I usually start my analysis by calculating analytics for the broad outline of the study subject and then dig deeper from multiple angles and levels. This also serves as a sanity check on whether the results make sense.
For instance, I usually calculate GFY annual totals and then reference those later as checksums to confirm that I have not lost something in translation. When I first started working with the USAspending data and the pandas tools, I was not familiar with how pandas handles NaN (blank) values: groupby silently drops rows whose grouping key is NaN, and NaN amounts propagate through element-wise arithmetic, so totals can quietly come up short. You will see in my data preparation the use of pandas fillna(0.0) and fillna(“UNSPECIFIED”) to address this. It is not unusual when formulating pandas selections, slices, and other transformations to mistakenly create the wrong calculation, so liberal use of Python assert statements to check values can be valuable as you refine your code.
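Here is a minimal sketch of that pitfall and the guard against it, using a tiny made-up frame (the column names follow the USAspending.gov schema from the earlier posts in this series; the dollar figures are invented):

```python
import pandas as pd
import numpy as np

# Tiny synthetic obligations frame (illustrative only -- the real
# USAspending.gov extract has ~46 million rows and ~300 columns).
df = pd.DataFrame({
    "PSC_Cat": ["D", "R", np.nan, "D"],
    "federal_action_obligation": [100.0, 50.0, 25.0, np.nan],
})

# Pitfall: groupby silently drops rows whose key is NaN,
# so the $25 obligation vanishes from the grouped total.
naive_total = df.groupby("PSC_Cat")["federal_action_obligation"].sum().sum()

# Fix: fill blank keys and blank amounts before aggregating.
df["PSC_Cat"] = df["PSC_Cat"].fillna("UNSPECIFIED")
df["federal_action_obligation"] = df["federal_action_obligation"].fillna(0.0)
clean_total = df.groupby("PSC_Cat")["federal_action_obligation"].sum().sum()

# Checksums against a known grand total catch silent data loss.
assert naive_total == 150.0   # the NaN-keyed row was dropped
assert clean_total == 175.0   # all obligations accounted for
```

The assert statements at the end play the checksum role described above: if a later transformation drops records, the totals stop matching and the script fails loudly instead of reporting a quietly wrong number.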
For this analysis, I start with a quick calculation and graph plotting the total obligations for each GFY from 2010–2019:
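That first calculation can be sketched in a few lines (synthetic figures here; the column names are assumed from the data-preparation posts earlier in this series):

```python
import pandas as pd

# Illustrative frame; in the real analysis, df holds ~46 million
# obligation records spanning GFY2010-GFY2019.
df = pd.DataFrame({
    "action_date_fiscal_year": [2010, 2010, 2011, 2011],
    "federal_action_obligation": [1.0e9, 2.0e9, 1.5e9, 2.5e9],
})

# Total obligations for each GFY:
annual_totals = (df.groupby("action_date_fiscal_year")
                   ["federal_action_obligation"].sum())

# annual_totals.plot(kind="bar")  # one-line bar chart via matplotlib
```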
Next, I load the reference “Product and Service Code” Excel sheet and use that for subsequent calculations and reports.
A fast and straightforward pandas groupby([‘action_date_fiscal_year’, ‘PSC_Cat’]) and summation of federal obligations (fast even with ~46 million source records) reveals the distribution of spending across the major categories of work:
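The groupby itself looks like this (shown on a synthetic four-row frame so the result is easy to inspect):

```python
import pandas as pd

# Synthetic sample of the ~46M-row extract.
df = pd.DataFrame({
    "action_date_fiscal_year": [2018, 2018, 2019, 2019],
    "PSC_Cat": ["D", "R", "D", "R"],
    "federal_action_obligation": [10.0, 20.0, 12.0, 18.0],
})

# Total obligations for each (GFY, major PSC category) pair:
by_year_cat = (df.groupby(["action_date_fiscal_year", "PSC_Cat"])
                 ["federal_action_obligation"].sum())
```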
A translation to a pivot_table structure, sorted and plotted, shows the relative spending by major PSC_Category and the fraction of total spending for FY19, FY15 (market bottom), and FY10. One can see that the product PSC_Category “1” (10–19), which includes everything from Aircraft and Ships to Ammunition and Weapons, has grown from ~14% of contractor obligations in GFY10 to ~18% in GFY19, while R&D has dropped from ~10.5% to ~8%.
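A sketch of that pivot and share-of-total calculation, on invented figures chosen to mirror the PSC_Cat “1” shares quoted above:

```python
import pandas as pd

# Invented figures: category "1" is ~14% of GFY10 spend, ~18% of GFY19.
df = pd.DataFrame({
    "action_date_fiscal_year": [2010, 2010, 2019, 2019],
    "PSC_Cat": ["1", "A", "1", "A"],
    "federal_action_obligation": [14.0, 86.0, 18.0, 82.0],
})

# Rows = major PSC category, columns = GFY, values = summed obligations.
pivot = df.pivot_table(index="PSC_Cat",
                       columns="action_date_fiscal_year",
                       values="federal_action_obligation",
                       aggfunc="sum", fill_value=0.0)

# Fraction of each GFY's total spend captured by each category:
shares = pivot / pivot.sum(axis=0)
```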
The next plot shows the top six (6) categories over time, with a thick black line for total contractor spending, which bottomed in GFY2015 and recovered by GFY2018 to GFY2010 levels (this graph does not account for inflation between GFY10 and GFY19):
Since I am personally most familiar with the work in PSC_Cat codes ‘D’ (IT) and ‘R’ (Contract Services), I narrow the analysis to those categories and plot that relationship:
We now have context as we dive further into the obligation transactions.
Exploring PSC_Cat “D” — IT and Telecommunications Spending
The first step is to narrow the analysis to a restricted view of the data. I select obligations with df_PSC_Cat_D = df.query(“PSC_Cat == ‘D’”) and restrict to GFY16-GFY19. Extracting that view from the ~46 million records takes ~2 seconds.
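The extraction step can be sketched as follows (synthetic rows; on the full ~46M-record frame this is the operation that runs in about two seconds):

```python
import pandas as pd

# Synthetic stand-in for the full obligations frame.
df = pd.DataFrame({
    "action_date_fiscal_year": [2015, 2016, 2019, 2019],
    "PSC_Cat": ["D", "D", "D", "R"],
    "federal_action_obligation": [1.0, 2.0, 3.0, 4.0],
})

# Restrict to IT/Telecom ('D') obligations...
df_PSC_Cat_D = df.query("PSC_Cat == 'D'")
# ...and then to GFY16-GFY19.
df_PSC_Cat_D = df_PSC_Cat_D[
    df_PSC_Cat_D["action_date_fiscal_year"].between(2016, 2019)]
```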
Next, I use the pandas pivot_table (similar to Excel or Google Sheets Pivot Table functionality) to reorganize the view of the data.
I then calculate the three (3) year Compound-Annual-Growth-Rate (CAGR) (GFY2019 vs. GFY2016). Here are the top ten (10) largest Departments:
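The CAGR arithmetic is simple: over a three-year span, CAGR = (end / start)^(1/3) − 1. A sketch on two illustrative Departments (the dollar figures are loosely based on the VA and DOD numbers discussed below, not exact values):

```python
import pandas as pd

# Annual PSC_Cat 'D' obligations by Department, in $B (illustrative).
dept = pd.DataFrame({2016: [2.7, 10.0],
                     2019: [4.1, 13.4]},
                    index=["VA", "DOD"])

years = 2019 - 2016  # three-year span
dept["CAGR"] = (dept[2019] / dept[2016]) ** (1 / years) - 1

# Rank by GFY19 spend to list the largest Departments first.
top10 = dept.sort_values(2019, ascending=False).head(10)
```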
The GFY16 to GFY19 overall Federal CAGR was ~7.5%, while the growth for PSC_Cat “D” was:
- overall market: 8.9%
- DOD: 10.3%
- Civilian Departments: 7.8%
The Departments with PSC_Cat D spending greater than $50 million/year in GFY19 and higher than the 8.9% CAGR:
The lowest growth Departments:
We can take this analysis to a lower level and inspect where the dollar volume of growth was by Agency inside Departments:
Outside of DoD, the biggest dollar growth from GFY16 to GFY19 came from the VA at #1, the US Census Bureau preparing for the 2020 Decennial, HHS/CMS, and Commerce’s NOAA. VA PSC_Cat D spending grew from ~$2.7B to ~$4.1B, a ~47% increase. The VA’s GFY19 PSC_Cat D spending alone is larger than the GFY19 annual PSC_Cat D spending of every other Department except DoD, HHS, DHS, DOC (due to US Census Bureau Decennial spending), and Treasury!
Some Agencies inside Departments saw significant reductions in PSC_Cat D spending too:
Analysis of Contractors that Benefited from the Increased Spending
With the context at the Department and Agency level, we can explore which contractors were the most successful in capturing PSC_Cat D market share faster than market growth rates from GFY16-GFY19. The PSC_Cat D market three (3) year CAGR was ~8.9% (~20% faster per year than the overall Federal market grew, which is a big difference with compounding growth). For the ten (10) largest recipients of PSC_Cat D contractor obligations in GFY19, this is how they performed:
Note that Perspecta emerged from M&A, and Cerner jumped onto this list very recently. Understanding the baseline for Perspecta requires more analysis. Cerner experienced impressive GFY19 growth: ~$723M vs. ~$490M in GFY18, or ~48% YoY.
How much of the growth was due to M&A, and how much from ‘organic’ expansion of the legacy business? Given the size of these businesses and the amount of M&A over the last several years, answering that requires a deeper dive into the obligation transactions that I won’t do here. It is possible to assess that by looking at the additional fields in the USAspending.gov obligation records.
Which contractors with at least $50M/year (GFY19) in obligations in PSC_Cat D work had the highest PSC_Cat D CAGR?
Which contractors increased their annual prime obligations for PSC_Cat D work the most?
Which contractors with at least $350M/year (GFY19) in PSC_Cat D work obligations had PSC_Cat D CAGR’s that exceeded the 8.9% CAGR benchmark (sorted from high to low)?
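That last screen, a minimum-size threshold combined with a CAGR benchmark, can be sketched like this (the contractor names and figures are hypothetical, invented for illustration):

```python
import pandas as pd

# Hypothetical GFY16/GFY19 PSC_Cat 'D' prime obligations, in $M.
c = pd.DataFrame({2016: [400.0, 300.0, 60.0],
                  2019: [600.0, 350.0, 40.0]},
                 index=["Alpha Corp", "Beta LLC", "Gamma Inc"])

# Three-year CAGR for each contractor.
c["CAGR"] = (c[2019] / c[2016]) ** (1 / 3) - 1

# Keep contractors at >= $350M in GFY19 that beat the 8.9% market
# CAGR benchmark, sorted from high to low.
winners = (c[(c[2019] >= 350.0) & (c["CAGR"] > 0.089)]
             .sort_values("CAGR", ascending=False))
```

Beta LLC clears the $350M bar but grows only ~5% per year, so it drops out; Gamma Inc shrinks and misses the size threshold as well.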
Exporting Data to Share with Colleagues
Once you get the data into the format and record count you need, you can easily export it to CSV or Excel files using the convenient pandas to_csv or to_excel functions. Excel and Google Sheets have row and column limits, so you need to winnow the data down to a size those packages can import first. I frequently use pandas to organize and pre-process the data and then import that smaller dataset into Excel or Google Sheets.
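The export itself is one line per format (the file names here are made up; to_excel also requires the openpyxl package to be installed):

```python
import pandas as pd

# A winnowed-down summary frame, small enough for a spreadsheet
# (Excel tops out at 1,048,576 rows).
summary = pd.DataFrame({"PSC_Cat": ["D", "R"],
                        "GFY19_obligations": [64.0e9, 80.0e9]})

summary.to_csv("psc_summary.csv", index=False)
# summary.to_excel("psc_summary.xlsx", index=False)  # needs openpyxl
```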
The Excel Pivot Table functionality is powerful, easy to use, and fast, and is often an excellent choice for quick analysis and sharing with others. Once you have winnowed tens of millions of rows and ~300 columns down to thousands of rows and tens of columns, you have lots of options.
You can import the data into other popular business intelligence (BI) tools such as Tableau, Microsoft Power BI, Microstrategy, Google Data Studio, etc.
There are also emerging tools that host interactive analytics on a cloud service using Python open-source packages. These let you both share the data and build your own analytics tools on the power of the Python open-source ecosystem. Some of these are:
- holoviz.org on a platform such as Heroku, AWS, Azure, Google Cloud
- Google Colab — I used Colab for my second blog in this series to demonstrate how to access the USAspending.gov APIs directly without having to stage the gigabytes of obligation records on your machine or in a cloud storage directory
I hope this example inspires you to use some of these techniques to explore topics relevant to your business and the strategic and tactical investments you must make. The ~280 columns of the original records include information on contract vehicles used, task order expiration dates, the number of bidders on a contract, set-aside status, “contracting agency” in addition to “funding agency,” and many more fields. Many important market insights can be extracted from this open-data source. You can assess how your business performs relative to other firms and guide your business investment decisions, such as selecting strategic accounts to invest in and significant business development pursuits.
Previous Blog Post in this series: #7 USAspending.gov Analytics — Time Series Forecasting with Facebook’s Open Source Prophet Package
I will explore more analytical and market topics in future blog posts — including more market-share trend analysis and NLP techniques using USAspending.gov data and mashups with other open data sources.
The code listed in the examples above and more can be found here:
MIT Open Source License Copyright 2020 Leif C Ulstrup