work | Jacob Dunefsky's personal page

Work.

On this page, you may find an overview of some of my work experience in software engineering, computer science research, and more.

Research Intern at IBM Research in Hybrid Cloud

May 2023 to August 2023

During the summer of 2023, I returned to IBM Research as an intern in the Hybrid Cloud division. There, I developed a set of benchmarks that simulated computationally intensive optical proximity correction workloads in automated chip design. I designed and carried out experiments to determine which data-flow and control-flow architectures are most efficient for this type of workload; this entailed deploying the benchmark systems on an OpenShift cluster using Kubernetes.

Developing these benchmarks required, in addition to distributed systems programming skills, the ability to step back and think through the space of possible architectures from a theoretical perspective. Doing so yielded a set of axes along which different possible architectures -- each corresponding to a different sort of workload -- could vary. The result of this was a principled understanding of the tradeoffs incurred by different architectures for different workloads, along with an understanding of which architectures were Pareto-dominated by others.
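As a rough illustration of what "Pareto-dominated" means here (not code from the project itself), the following minimal Python sketch filters a set of candidate architectures down to those that no other candidate beats on every metric. The architecture names, the two metrics, and all of the numbers are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical architecture with two made-up metrics (lower is better)."""
    name: str
    latency_ms: float
    cost: float


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if `a` is no worse than `b` on every metric and strictly better on one."""
    no_worse = a.latency_ms <= b.latency_ms and a.cost <= b.cost
    strictly_better = a.latency_ms < b.latency_ms or a.cost < b.cost
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Illustrative values only; these are not measurements from the benchmarks.
candidates = [
    Candidate("arch_a", latency_ms=10.0, cost=5.0),
    Candidate("arch_b", latency_ms=12.0, cost=3.0),
    Candidate("arch_c", latency_ms=15.0, cost=6.0),  # worse than arch_a on both metrics
]
front = pareto_front(candidates)
```

In this toy example, `arch_c` is dropped because `arch_a` is better on both metrics, while `arch_a` and `arch_b` both remain because each wins on one metric.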

I very much appreciated having the opportunity to work on this project: it both allowed me to get my hands dirty building and deploying distributed systems, and also gave me the chance to engage with more theoretical work.

Research Intern at IBM Research in Hybrid Cloud

May 2022 to August 2022

During the summer of 2022, I was fortunate to be able to work as an intern at IBM Research in the Hybrid Cloud division. My primary project was to develop a debugger for KAR, a runtime framework currently being developed at IBM.

KAR aims to make it easier for developers to build reliable distributed applications. The runtime implements an actor model, in which various components of the application are realized as single-threaded actors which communicate via message-passing. Features that KAR provides include automatic safety-preserving retries of failed message invocations, automatic actor routing, persistent state, reconciliation on node failure, and more.

When developing a KAR application, traditional logging and tracing tools for distributed systems can be used to aid in debugging. But, to our knowledge, none of these tools provides the functionality and experience of a traditional debugger, where one sets breakpoints, pauses the application, and inspects or modifies state in real time. The debugger that I worked on this summer does provide such an experience; we believe that it is the first such debugger for distributed applications.

Features of the debugger include, but are not limited to, the following:

Users can set breakpoints on method invocations (and method returns) based on complex conditions on the arguments/return values of the method.
Breakpoints can have different user-specified behaviors (e.g. pause only the actor that triggered the breakpoint; pause the entire node; pause the entire application).
Users can view information about currently in-flight invocations along with historical invocations. This information can be filtered based on a query language.
Users can single-step through call chains, with similar semantics to those of a traditional debugger.
The debugger can detect deadlocks in the application; it will print out the resource dependency cycle responsible for the deadlock.
The debugger can be deployed easily as a part of a Kubernetes cluster, along with the rest of the application.
The debugger is modular and supports file formats such as JSON, which allows it to be used in shell scripts.
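To make the deadlock-detection feature above more concrete, here is a minimal Python sketch of one standard way to find a dependency cycle: a depth-first search over a "waits-for" graph. This is an illustration of the general technique only; it is not the KAR debugger's implementation, and the graph representation and actor names are assumptions.

```python
from typing import Dict, List, Optional


def find_cycle(waits_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if there is none.

    `waits_for[x]` lists the nodes that `x` is blocked on (a hypothetical format).
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in waits_for.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # `nxt` is already on the current path, so we have closed a cycle.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in waits_for:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical example: three actors blocked on each other, plus one bystander.
graph = {
    "actor_a": ["actor_b"],
    "actor_b": ["actor_c"],
    "actor_c": ["actor_a"],
    "actor_d": ["actor_a"],
}
cycle = find_cycle(graph)
```

Here `cycle` describes the loop from `actor_a` through `actor_b` and `actor_c` back to `actor_a`, which is the kind of information a debugger could print when reporting a deadlock.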

Developing this debugger required crafting an architecture that allowed for a traditional debugger UX, while accounting for the asynchronous flow of events and cloud deployment requirements. The final architecture consists of debugger code in the KAR runtime itself, which speaks to a debugger server running inside of the application cluster. Then, the user sends debugger client commands from their local machine, which are processed by the debugger server.

The debugger is already available as a part of the public KAR GitHub repository. Working on this project -- in addition to exercising my system-building skills, allowing me to engage with distributed systems problems, exposing me to technologies such as Kafka and WebSockets, and helping me practice my ability to design a solid user experience -- also gave me experience in producing an end-to-end product which can be integrated as a part of a larger codebase.

Overall, I am grateful to have been able to work on the KAR debugger, and grateful to my manager and teammates (Edward Epstein, David Grove, Olivier Tardieu) for the help that they provided. I believe that the KAR debugger can embody a new paradigm for debugging distributed applications.

Research Intern with Yale University FLINT Group

February 2021 to September 2021

During my gap semester and into the summer of 2021, I worked as a research intern with the FLINT Group at Yale University, whose overall mission is "to develop a novel and practical programming infrastructure for constructing large-scale certified systems software".

One area in which such certified systems software is incredibly important is the field of autonomous vehicles, since even the slightest bug in software controlling a self-driving car can result in a tragedy. Therefore, the FLINT group is taking steps to utilize their certifiable software framework in the development of self-driving car systems. It was here that I made my contributions as a research intern.

To begin with, the basic self-driving car system had to be developed. I thus implemented an autonomous vehicle controller in Python using OpenCV, which controlled a car simulated with the CARLA simulator. This controller was capable of automatically navigating between two user-chosen points on a highway, following lanes and changing lanes at the appropriate time.

Once this foundation was developed, it was time to add safety guarantees. This came in the form of a safety controller which would be running within an ARM TrustZone Secure World. The safety controller needed to be able to detect when the car was located too close to an obstacle; if such an obstacle was detected, then the safety controller would cut the throttle and deploy the brakes.
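The core rule of such a safety controller can be sketched in a few lines. The Python below is purely illustrative and is not the code that ran in the Secure World; the distance threshold, the `Command` type, and the function name are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold; the real controller's criterion is not stated here.
MIN_SAFE_DISTANCE_M = 5.0


@dataclass
class Command:
    """A simplified vehicle command (assumed ranges: 0.0 to 1.0)."""
    throttle: float
    brake: float


def safety_filter(requested: Command, obstacle_distance_m: float) -> Command:
    """Override the requested command when an obstacle is too close."""
    if obstacle_distance_m < MIN_SAFE_DISTANCE_M:
        # Cut the throttle and apply full brakes.
        return Command(throttle=0.0, brake=1.0)
    # Otherwise, pass the main controller's command through unchanged.
    return requested


override = safety_filter(Command(throttle=0.8, brake=0.0), obstacle_distance_m=2.0)
passthrough = safety_filter(Command(throttle=0.8, brake=0.0), obstacle_distance_m=20.0)
```

The design point this sketch captures is that the safety controller only ever restricts the main controller's output: when no obstacle is nearby, the requested command passes through untouched.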

In addition to the work of writing the safety controller itself, I had to figure out how to get the data from CARLA (the simulator) to the safety controller, and how to get the commands from the safety controller to CARLA. This entailed:

finding a compact data format that allowed the necessary data to be transmitted over a UART cable,
figuring out how to write this data from a Linux userland program to a QEMU emulated UART,
writing driver code to read and write via this emulated UART,
and writing code to make this data available to the userland safety controller.
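As an illustration of the first step above, a compact fixed-size binary frame can be built with Python's standard `struct` module. The field layout below (a one-byte message type followed by two 32-bit floats) is an assumption made for the sake of the example, not the format actually used in the project.

```python
import struct

# Hypothetical layout: little-endian, 1-byte message type, then two 32-bit floats.
# The "<" prefix disables padding, so the frame is exactly 1 + 4 + 4 = 9 bytes.
FRAME_FORMAT = "<Bff"
FRAME_SIZE = struct.calcsize(FRAME_FORMAT)


def encode_frame(msg_type: int, speed_mps: float, obstacle_distance_m: float) -> bytes:
    """Pack one sensor reading into a fixed-size frame suitable for a serial link."""
    return struct.pack(FRAME_FORMAT, msg_type, speed_mps, obstacle_distance_m)


def decode_frame(data: bytes) -> tuple:
    """Unpack a frame produced by `encode_frame`."""
    return struct.unpack(FRAME_FORMAT, data)


frame = encode_frame(1, 12.5, 3.25)
decoded = decode_frame(frame)
```

A fixed-size frame like this keeps the receiving side simple: the driver can read exactly `FRAME_SIZE` bytes at a time without any parsing or delimiters.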

In the end, I was able to go deep into the weeds of writing systems code (even modifying QEMU source in order to fix one particularly nasty bug). I was grateful to have this opportunity to apply, in a research setting, the knowledge I had gained in classes like Operating Systems and Self-Driving Cars.

Backend Developer with Yale Entrepreneurial Society

February 2021 to September 2021

I worked as a backend developer with the "YES Internships" team of Yale Entrepreneurial Society, a team of Yale students who have been developing a website that connects budding entrepreneurs with startups seeking interns. The website is already in use at universities including Yale, Harvard, Princeton, UPenn, and Columbia.

As backend developer, my role was to develop, maintain, and take ownership of APIs written in Python, using AWS and Serverless technology, such as DynamoDB, Cognito, S3, and Lambda. This entailed writing out specs based on the requirements of the other subteams, such as frontend and design, and then implementing those specs.

One of the APIs which I built out was responsible for managing partner organizations from other university campuses. For this API, I wrote extensive documentation, targeted not just towards other developers on the YES Internships team, but also towards non-technical recruiters. I also developed an API which manages recruiters and startups, intended to replace a previous Google Forms-based solution.

During my time with the YES Internships team, I was able not just to experience the fast-paced development culture of a startup, but also to work as a part of a talented team of developers, designers, and managers.

Research Intern at Burke Medical Research Institute

June 2016 to January 2020

Throughout high school and into my first year of college, I developed WellPATH from scratch. WellPATH is an Android application used in clinical trials by Stanford, Johns Hopkins, and Weill-Cornell psychologists to reduce suicidal ideation in elderly patients. This entailed continuously working hand-in-hand with a team of psychologists (non-technical domain experts) in order to translate their requirements into the actual mobile application.

The application, designed to be used in conjunction with a regular therapy routine, walks patients through a series of steps which both monitor the user's mental state and provide strategies that can be used immediately to deal with suicidal ideation. Other features included:

a scheduling system, such that at regular intervals, the app checks in with the patient;
a file browser, used to integrate video messages from the therapists with the app; and
a desktop administrative interface, allowing for usage statistics to be analyzed and configuration to be deployed to multiple devices.

In the end, a paper related to this work was published. Over the course of my time working on WellPATH, I became more skilled in Android development, gained experience in communicating with domain experts, maintained and improved a single project over many years -- and, most importantly, was able to see how technology can be used for good.