Much of Fraym’s work is geospatial, with the majority of the input data arriving in raster format. A raster in this context is simply a grid of pixels, where each pixel tells us something about that location on the earth’s surface. Fraym’s raster products rely on a diverse set of input rasters. These inputs come in inconsistent formats, units, and standards that need to be processed before any machine learning task can begin. This is a labor-intensive process that the Data Science team is responsible for. A recent accomplishment of the team was the creation of a new raster process to help streamline raster data production. This post first explores how we approach designing new features for our existing codebase, then describes how we refactored to manage technical debt.
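To make the standardization problem concrete, here is a minimal sketch (not Fraym's actual code) of what harmonizing two hypothetical input rasters might look like. The data values, the nodata conventions, and the `standardize` helper are all illustrative assumptions:

```python
import numpy as np

# A raster is just a 2D grid of pixel values. These two hypothetical
# inputs cover the same area but use different units and different
# nodata conventions -- a common situation with third-party input data.
rainfall_mm = np.array([[120.0, 95.0], [-9999.0, 110.0]])  # nodata = -9999, millimeters
rainfall_in = np.array([[4.5, 3.9], [4.1, np.nan]])        # nodata = NaN, inches

def standardize(raster, nodata=None, scale=1.0):
    """Illustrative helper: convert a raw input raster to one shared
    convention -- float millimeters, with NaN marking missing pixels."""
    out = raster.astype(float) * scale
    if nodata is not None:
        out[raster == nodata] = np.nan  # mask using the source's own nodata flag
    return out

a = standardize(rainfall_mm, nodata=-9999.0)
b = standardize(rainfall_in, scale=25.4)  # inches -> millimeters
```

Only once every input obeys the same convention can the downstream machine learning steps treat them interchangeably.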
The Design Phase
A major challenge the team faced was designing for flexibility, because we wanted to easily add features based on our users’ experience after deployment. The existing structure of the codebase did not make this task easy: different processes functioned largely independently while sharing a large amount of logic, resulting in code duplication. In response, we identified the need to refactor existing pipelines in two ways: 1. break the complex logic down into steps to add flexibility, and 2. consolidate the logic shared by existing and new processes to add clarity. This flexible design was the outcome of close collaboration between one of our subject matter experts, Ali Filipovic, and Sheng Pei Wang, a Data Engineer and new hire at the time. The contrast in their familiarity with the product, along with their distinct perspectives, let them generate fresh ideas without reinventing the wheel.
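The two refactoring goals above can be sketched in a few lines. This is a hypothetical illustration of the pattern, not Fraym's implementation: each piece of logic becomes a small step with one shared signature, and a pipeline is just an ordered list of steps, so adding a feature later means adding a step rather than duplicating a process:

```python
from typing import Callable
import numpy as np

# A step is any function that takes a raster array and returns one.
Step = Callable[[np.ndarray], np.ndarray]

def run_pipeline(raster: np.ndarray, steps: list[Step]) -> np.ndarray:
    """Shared driver: the consolidated logic lives here once,
    instead of being copied into every process."""
    for step in steps:
        raster = step(raster)
    return raster

def fill_nodata(raster: np.ndarray) -> np.ndarray:
    """Example step: replace missing pixels with the mean of the rest."""
    out = raster.copy()
    out[np.isnan(out)] = np.nanmean(raster)
    return out

def clip_negative(raster: np.ndarray) -> np.ndarray:
    """Example step: physical quantities like rainfall cannot be negative."""
    return np.clip(raster, 0.0, None)

# Different products simply compose different step lists.
result = run_pipeline(np.array([[1.0, np.nan], [-2.0, 4.0]]),
                      [fill_nodata, clip_negative])
```

Because every step obeys the same contract, steps can be reordered, shared across processes, or swapped out without touching the driver.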
The Refactoring Phase
Even with careful task planning, many tasks were interdependent and needed to happen in a specific order to accomplish our goals. To ensure that the team integrated changes quickly, one team member took on the role of primary reviewer and prioritized supporting the others. To accelerate integration further, we democratized task assignment: rather than assigning all tasks at the outset, we let each team member choose their next task as they finished one. This way, the team was able to work through the task ordering in tandem. Additionally, the team encouraged small, tightly scoped tasks. Rather than mandating a size for Pull Requests, the team prioritized changing only one idea at a time. This helped ensure that reviewers could quickly understand what had changed and decreased the idle time per Pull Request. Together, the team delivered the refactoring phase collaboratively, efficiently, and within the planned time frame.
This Workflow Led To Other Desirable Outcomes
- A high velocity of change because team members faced minimal delays caused by interdependencies from other, concurrent tasks.
- A robust final product that combined perspectives of both the designers and the implementer, taking full advantage of our diverse technical backgrounds.
- Close collaboration across the whole team, making the project an exhilarating learning opportunity for all involved.
The Implementation Phase
Careful consideration during the design phase allowed our team to implement the new features easily and gracefully. The new structure was designed so that building the new process introduced minimal code duplication. Since we established the architecture and public interface at the beginning, the team was even able to build and test new functional features in parallel with the refactoring phase.
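This parallelism is worth illustrating. In the hypothetical sketch below (the function and its behavior are assumptions for illustration, not Fraym's code), a new feature is written against the agreed step contract of array in, array out, so it can be built and unit tested on its own before the refactored pipeline that will eventually host it is finished:

```python
import numpy as np

def rescale_to_unit(raster: np.ndarray) -> np.ndarray:
    """Hypothetical new step: rescale pixel values to the [0, 1] range.
    Assumes the raster is not constant-valued (a real step would guard
    against a zero range)."""
    lo, hi = np.nanmin(raster), np.nanmax(raster)
    return (raster - lo) / (hi - lo)

# The step is testable in isolation against the agreed contract,
# with no dependency on the pipeline code being refactored in parallel.
out = rescale_to_unit(np.array([[2.0, 4.0], [6.0, 8.0]]))
```

Once the refactoring lands, a feature developed this way drops into the pipeline unchanged, which is what let the two workstreams proceed at the same time.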
Overall, Fraym is extremely proud of the collaborative environment and amazing talent that allow us to accomplish difficult tasks and continue to strive for excellence. It is our commitment to these values that allows us to remain efficiently and effectively at the cutting edge of the machine learning and geospatial markets. Many thanks to our wonderful and talented team members who undertook this major process and worked together so smoothly to make it happen.