Annual Report for 2023

Every year, I get annual reports for some organizations that I’m affiliated with (e.g., the Simons Foundation). The Annual Report summarizes what the organization accomplished through grant-making, in-house research, outreach activities, and so on. The report is both reflective and forward-looking. In the final blog post of this year, I will write a brief annual report for myself.

Harvard Commencement 2023

Technically, I earned my Ph.D. in 2022: I defended and submitted my dissertation in the 2022 calendar year. But I did not attend the 2022 commencement ceremony, as I knew it would take significant effort to have a subset of my extended family attend. In 2023, the Alabi family showed up and it was a glorious way to officially culminate my graduate career. Sometimes, I think the general public underestimates how much support from friends and family is necessary to succeed in academia (especially during the later years of the Ph.D., when the dissertation writing process tends to be more isolating than the earlier years). I am blessed to have a strong support network of friends and family.

NaijaCoder

The NaijaCoder 2023 summer camp took place in Abuja, Nigeria. Alida Monaco and I were the main instructors for the class of 2023. The camp ran for 2 weeks. Every day, we had 6-hour lectures with a lunch break. Despite the intense schedule, the students were really engaged in class 🔥🧠.

The 2023 camp was held on the premises of Lifegate Academy in Abuja. Mr. Anywanwu Ebere is the head of schools; he was pivotal in getting the board of the academy to approve our use of their premises for our activities. We had guest lectures from EducationUSA (from the U.S. Embassy in Nigeria) and GIEVA (Global Integrated Education Volunteers Education), during which they discussed study-abroad opportunities, mostly at U.S. schools. We also wrote up some results of our research on early algorithms education in Nigeria. Check it out: https://arxiv.org/abs/2310.20488 (to appear at SIGCSE 2024).

Planning is underway for the 2024 iteration. Owing to increased demand, we will host the program in Abuja and Lagos. Hopefully, more students from the southern parts of Nigeria can attend the program. There will be more instructors, more participants, and more food. Going forward, we would like to maintain the same rigor as we scale instruction to more participants.

Simons Foundation Junior Fellowship

I am in the middle of my 2nd year as a Junior Fellow in the Simons Society of Fellows. Fellows are expected to attend the weekly dinners. When I’m in town, I go to the dinners. It is always fun to hang out with fellow Junior Fellows and gain wisdom from the Senior Fellows. In March 2023, the Simons Society of Fellows held a retreat at the Ritz Carlton in Sarasota, Florida. For me, the highlight of the trip was going birdwatching!

The Simons Junior Fellowship is a grant, in the applicant’s name, given to an institution in NYC. As such, I am hosted at Columbia University as a post-doc. This year, most of the papers I published centered around data privacy and graph generation algorithms. I have also begun exploring some topics in quantum information. It is nice to have a post-doc that affords me the opportunity to explore interests outside my dissertation topic.

This year, I also spent some time learning from scientists at the Flatiron Institute at the Simons Foundation. I have one ongoing project, with a friend at Flatiron, which I hope to continue in 2024.

2024

Next year, I will continue, at the same pace, with research and my non-profit work. Also, I plan to read more books that are not directly related to my research (e.g., I just started reading “Surely You’re Joking, Mr. Feynman!”). Finally, I’m looking forward to the 2024 Simons Society of Fellows retreat in San Juan, Puerto Rico.

Data Markets for Federated Learning

The Database Community (e.g., see this symposium on data markets) has recently been championing frameworks for data access, search, commodification, manipulation, extraction, refinement, and storage. I heard about data markets from Eugene Wu; it seems like a market area and research opportunity that is ripe for exploration.

In recent work that was presented in person by Jerry at VLDB 2023, we wrote about a data search platform (called Saibot) that satisfies differential privacy. Essentially, the main algorithm is able to identify augmentations (join or union compatible via the group operations +, x) that will lead to highly accurate models (the evaluation objective is the \ell_2 metric but it scales to other objectives as well). This has implications for improving data quality (e.g., perhaps one can identify the right augmentations that will lead to better outcomes) and heterogeneous collaboration of all kinds. We evaluate our algorithms on over 300 datasets and compare to leading alternative mechanisms.
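To make the phrase "union compatible via +" a bit more concrete, here is a minimal sketch of one way such additivity can work. It is not Saibot's actual algorithm, and it omits the differential privacy noise entirely. It only shows that, for a one-dimensional least-squares fit, the sufficient statistics of two datasets can be added to get the statistics of their union, so a candidate union augmentation can be scored without pooling the raw rows. The names Stats, summarize, and fit are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    """Sufficient statistics for fitting y ~ a*x + b by least squares."""
    n: float
    sx: float
    sy: float
    sxx: float
    sxy: float

    def __add__(self, other: "Stats") -> "Stats":
        # The union of two datasets corresponds to adding their statistics.
        return Stats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )


def summarize(rows):
    """Reduce a list of (x, y) pairs to its sufficient statistics."""
    return Stats(
        n=float(len(rows)),
        sx=float(sum(x for x, _ in rows)),
        sy=float(sum(y for _, y in rows)),
        sxx=float(sum(x * x for x, _ in rows)),
        sxy=float(sum(x * y for x, y in rows)),
    )


def fit(stats):
    """Return (slope, intercept) of the least-squares line from the statistics."""
    denom = stats.n * stats.sxx - stats.sx ** 2
    if denom == 0:
        raise ValueError("x values have zero variance; slope is undefined")
    slope = (stats.n * stats.sxy - stats.sx * stats.sy) / denom
    intercept = (stats.sy - slope * stats.sx) / stats.n
    return slope, intercept


# Hypothetical toy data: two providers each hold part of the line y = 2x + 1.
provider_a = [(0, 1), (1, 3)]
provider_b = [(2, 5), (3, 7)]
combined = summarize(provider_a) + summarize(provider_b)
slope, intercept = fit(combined)
```

In this toy setup, each provider would only need to share five numbers rather than its rows. A private version would add calibrated noise to those numbers before sharing, which this sketch does not attempt.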

Opportunities and Challenges in Data Markets

  1. Data Quality and Accuracy: In my opinion, the biggest challenge to the proliferation of data markets is the availability of high-quality data. No amount of analytical sophistication can compensate for a lack of high-quality data. For example, there are certain subgroups in America (e.g., African-American females) that are under-represented in datasets about academia. In fact, most academic departments in the U.S. do not even have any African-American females. So if a social science researcher wishes to study the academic progression of women in academia and observe trends, the researcher cannot make broad claims about departments that do not even have a single Black woman. The researcher must first seek out data sources of higher quality, e.g., by including data from HBCUs (Historically Black Colleges and Universities).
  2. Privacy and Security Concerns: Suppose a hospital has data on patient check-ins, health, characteristics, and disorders. If released, the data could help researchers gain valuable insight about diseases in specific areas. Unfortunately, it is known that exactly releasing aggregate information about individuals (even from datasets that are “anonymized”) could lead to de-anonymization/re-identification attacks. Our work on Saibot provides mechanisms to ensure that data search platforms satisfy certain notions of differential privacy.
  3. Collaboration and Knowledge Sharing: Data markets encourage collaboration between organizations and industries. They facilitate the sharing of knowledge and expertise, breaking down silos (especially within academia) and fostering a culture of collective problem-solving. However, one could ask: how much collaboration—between industries—is needed to solve a problem or achieve a certain level of accuracy for statistical models? This problem needs further study.
  4. Economic Value: Some technology companies (e.g., Netflix and Facebook) derive their value (almost) entirely from having lots of users and interactions on their platforms. Having more specific forms of data (e.g., the data on African-American females) could give companies a competitive advantage in data markets. So having access to data markets can create new revenue streams. I would personally like to see more economic analysis of the value of data markets!
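As a small illustration of the privacy point in item 2, here is a sketch of the classic Laplace mechanism for releasing a noisy count. This is a textbook mechanism, not the one used in Saibot, and the function names, the record layout, and the epsilon value are all hypothetical. The idea is that a counting query changes by at most 1 when one person's record is added or removed, so adding Laplace noise with scale 1/epsilon gives an epsilon-differentially private release.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng):
    """Release a noisy count of the records that satisfy the predicate.

    A count has sensitivity 1 (one record changes it by at most 1),
    so the Laplace noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical hospital check-in records (made-up data for illustration).
rng = random.Random(0)
records = [{"age": age} for age in range(100)]
noisy = private_count(records, lambda r: r["age"] < 30, epsilon=1.0, rng=rng)
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon means a more accurate count and weaker privacy. Each individual release is noisy, but the noise has mean zero.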

Recap: INFORMS 2023 and the Applied Probability Society

I attended INFORMS 2023 (for the first time!), hosted in Phoenix, Arizona 🥵. It was a nice experience overall! I mostly attended the Applied Probability Society sessions during the conference.

About INFORMS

The Institute for Operations Research and the Management Sciences, or INFORMS, is the world’s largest professional society dedicated to operations research and analytics. With a mission to promote the scientific approach to decision-making, INFORMS plays a critical role in connecting researchers, practitioners, and educators, fostering a vibrant community dedicated to advancing related fields: operations research, statistics, computer science, mathematics, and so on. I also learned a fair bit about what the fields of revenue and supply-chain management are about. The 4-day program had 84 tracks, 11 major tutorials, and hundreds of sessions (one of which I chaired).

Applied Probability Society (APS)

The society is “concerned with the application of probability theory to systems that involve random phenomena” and “members include practitioners, educators, and researchers with backgrounds in business, engineering, statistics, mathematics, economics, computer science, and other applied sciences.” I attended the APS business meeting, where the inaugural Blackwell Award was presented (David Blackwell was an INFORMS fellow) and other APS-specific issues were discussed.

APS Session on “Optimization over Probability Distributions”

I chaired a session with the following talks:

1) Abdul Canatar (Flatiron Institute) on “Out-of-Distribution Generalization in Kernel Regression” https://arxiv.org/abs/2106.02261

2) Prayaag Venkat (Harvard) on “Near-optimal fitting of ellipsoids to random points” https://arxiv.org/abs/2208.09493

3) Ellen Vitercik (Stanford) on “Leveraging Reviews: Learning to Price with Buyer and Seller Uncertainty” https://arxiv.org/abs/2302.09700

4) R. Srikant (UIUC) on “Crowdsourcing with Hard and Easy Tasks”

5) Daniel Alabi (Columbia) on “Degree Distribution Identifiability of Stochastic Kronecker Graphs” https://arxiv.org/abs/2310.00171

Until the conference, I hadn’t heard the speakers talk about these specific works. So the APS session was a direct way to learn about what they have been up to recently. Overall, I learned a lot from the conference and I’m looking forward to attending future iterations.