An automated pair programmer

An automated pair programmer: fact or fiction?

Oege de Moor, Alex Graveley, Albert Ziegler – GitHub OCTO. August 31st, 2020

Executive Summary

We evaluate the use of OpenAI’s language models trained on source code, for the specific task of Python code synthesis from natural language descriptions.

Our findings are as follows:

  • Out of 233 hand-crafted programming exercises supplied by 30 GitHub engineers, 93% are successfully solved. The exercises include StackOverflow-type problems involving the use of an unfamiliar package, as well as programming challenges typically used in coding interviews at high-tech companies, and some elementary examples. The success in solving self-contained programming problems demonstrates that an “automated StackOverflow” is around the corner.

  • On 58,391 functions taken from open source repositories, the model achieves a 52.3% success rate in creating, from the original documentation of that function, an alternative implementation that passes the test. Furthermore, we project that with more computational power the success rate increases to 60%. There are obvious ways in which the success rate can be further improved, even with today’s OpenAI models, by providing the model with more context about the repository that contains the function. These results prove that an IDE plugin that helps developers write non-trivial code is not far away.

  • While the OpenAI models are already amazing today, they are further improving at a ferocious pace — just a few weeks ago the success percentage on rewriting arbitrary functions was 43.3%, instead of 52.3%. Where that earlier model needed 150 attempts to find a correct solution, the newer model needs 14. We therefore anticipate applications way beyond mere program synthesis, where developers create new code in an interactive conversation with the model.

We conclude that OpenAI’s technology is poised to change developer tools in fundamental ways. In particular an ‘automated pair programmer’ can be built that puts the collective knowledge of the entire GitHub community at the fingertips of every individual.
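
The evaluation behind these findings can be pictured as a sample-and-test loop: draw candidate implementations for a documented function, run each against the function's test, and count the attempts until one passes. The sketch below is a minimal illustration under stated assumptions, not the harness the memo's authors used. The name `attempts_until_pass`, the `absolute` example, and the fixed list of candidate strings (standing in for model samples) are all hypothetical.

```python
from typing import Callable, Iterable, Optional


def attempts_until_pass(
    candidates: Iterable[str],
    func_name: str,
    test: Callable[[Callable], bool],
) -> Optional[int]:
    """Return the 1-based index of the first candidate that passes `test`, or None."""
    for attempt, source in enumerate(candidates, start=1):
        namespace: dict = {}
        try:
            exec(source, namespace)  # define the candidate function
            if test(namespace[func_name]):
                return attempt
        except Exception:
            # A candidate that does not compile, lacks the function,
            # or raises during the test counts as a failed attempt.
            continue
    return None


# Hypothetical stand-ins for model samples of a function documented as
# "return the absolute value of x".
samples = [
    "def absolute(x):\n    return x",   # wrong for negative inputs
    "def absolute(x):\n    return -x",  # wrong for positive inputs
    "def absolute(x):\n    return x if x >= 0 else -x",
]


def absolute_test(f: Callable) -> bool:
    """Hypothetical unit test for the documented behavior."""
    return f(3) == 3 and f(-4) == 4 and f(0) == 0


print(attempts_until_pass(samples, "absolute", absolute_test))  # prints 3
```

A real harness would run candidates in a sandbox rather than calling `exec` directly, since generated code is untrusted.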

Introduction

An automated pair programmer, which puts the collective knowledge of the entire GitHub community at the fingertips of every individual, would transform the software industry. Many aspiring developers who do not have access to adequate training and advice today would instantaneously become productive…

2 thoughts on “An automated pair programmer”

  1. shinichi Post author

    Generative AI’s Real Killer App Announces Chat and Enterprise Features

    Software development will never be the same

    by BRET KINSELLA

    https://synthedia.substack.com/p/generative-ais-real-killer-app-announces

    GitHub introduced several new updates to its developer “pair programmer,” Copilot. The news includes an upgrade to using OpenAI’s GPT-4 model, which should improve accuracy. However, three announcements stood out.

    1. GitHub Copilot Chat will be generally available in December. It will integrate natural language interactions broadly across Copilot features.

    2. GitHub is rolling out an Enterprise Copilot service in February that will offer new productivity enhancements throughout the software development lifecycle (SDLC) and even customize the solution to the company’s current code base.

    3. GitHub Workspace will streamline issue resolution.

    Notably, in a blog post announcing the updates, GitHub CEO Thomas Dohmke also made a profound statement that seems to have largely gone unnoticed.

    Just as GitHub was founded on Git, today we are re-founded on Copilot. Open source and Git have fundamentally transformed how we build software. It is now evident that AI is ushering in the same sweeping change, and at an exponential pace. In just a short period, GitHub Copilot has expanded and evolved GitHub into the world’s leading AI-powered developer platform.

    Until recently, terms like “code repository,” “version control,” and “collaboration” were the common descriptions for GitHub’s products. “Developer platform” was unusual.

    Copilot is transforming GitHub’s business. While it was an SDLC utility, it is becoming a platform for the entire lifecycle. Pushing that concept further is another announcement from this week: the GitHub Copilot Partner Program. That effort will likely take some time to mature, but it will complement the features already in place and those coming soon.

    GitHub Copilot is also transforming users’ software development practices and expectations. A video shown during the conference keynote highlighted how Accenture has changed software development since embracing Copilot.

    Dabble Lab founder and CEO Steve Tingiris told me, “From our experience, the gains on developer productivity that AI tools like Copilot and ChatGPT provide are so significant that we no longer hire engineers who haven’t embraced them.”

    GitHub Copilot Chat

    Next month, subscribers will get access to GitHub Copilot Chat. This natural language coding assistant will enable users to execute a number of tasks or access useful information. From the blog post:

    Coding is the centerpiece of the software development lifecycle. With GitHub Copilot Chat we’re enabling the rise of natural language as the new universal programming language for every developer on the planet. Whether it’s finding an error, writing unit tests, or helping debug code, Copilot Chat is your AI companion through it all, allowing you to write and understand code using whatever language you speak.

    GitHub found in an earlier analysis that Copilot has a higher impact on productivity for more junior developers. The Chat interface should provide even more leverage for these less-experienced coders and enable everyone to make more complex requests.

    GitHub Copilot Enterprise

    GitHub Copilot Business is the company’s current premium product. It is about augmenting the developer during the software development process. However, Dohmke noted in his blog post that the Enterprise edition will significantly expand the solution feature set:

    Developers often write code only around 2 hours a day, and are bogged down with mundane tasks across the software development lifecycle…Copilot Enterprise allows your teams of developers to quickly get up to speed on your codebase, search through and build documentation, get suggestions based on internal and private code, and quickly review pull requests. Additionally, smart actions, such as the ability to generate pull request summaries, will be available throughout GitHub.

    Enterprise costs about twice as much per user as the current Business subscription package. However, it will offer more personalization to the organization and features throughout the software development lifecycle (SDLC). The code review and documentation summaries will likely justify the added expense. Personalization features, if they work, could become viewed as essential for any organization with a large code base developed or supported by large teams.

    GitHub Copilot Workspace

    Also, along the lines of adding features throughout the SDLC, GitHub will introduce Workspace in 2024. Think of it as an automated issue resolution planning, execution, and testing tool.

    When you open an issue in Copilot Workspace, you’re presented with an automatically proposed plan for how to implement the intended change. Because the workspace is fully editable, you’re able to steer the AI in the exact direction you want, while benefiting from its understanding of the issue’s intent, and your entire codebase. To validate that the change behaves as expected, Copilot Workspace enables you to build, run and test the code. And if it encounters an error, it offers to fix it automatically. Copilot Workspace is like a pair programming session with a partner that knows about every inch of the project, and can follow your lead to make repository-wide changes from the issue to the pull request with the power of AI.

    Even though this feature is still several months away, it will clearly be attractive and reduce the time to launch new products and features. It is also a tool that GitHub’s competitors are unlikely to replicate in the near future. This is a platform advantage.

    GitHub Copilot’s Origin

    During his GitHub Universe keynote address, Dohmke also took the time to review the history of GitHub Copilot. This was not a tool that simply burst onto the scene in 2022. GitHub engineers wrote a memo about it in August 2020. While I don’t believe this document has been shared publicly before, Dohmke quoted from it at length, and I think it is instructive in understanding that the generative AI Cambrian explosion was years in the making. (By the way, if someone would like to forward me the entire memo, please send a DM or email.)

    An automated pair programmer: fact or fiction?

    Oege de Moor, Alex Graveley, Albert Ziegler – GitHub OCTO. August 31st, 2020

    Executive Summary

    We evaluate the use of OpenAI’s language models trained on source code, for the specific task of Python code synthesis from natural language descriptions.

    Our findings are as follows:

    • Out of 233 hand-crafted programming exercises supplied by 30 GitHub engineers, 93% are successfully solved. The exercises include StackOverflow-type problems involving the use of an unfamiliar package, as well as programming challenges typically used in coding interviews at high-tech companies, and some elementary examples. The success in solving self-contained programming problems demonstrates that an “automated StackOverflow” is around the corner.
    • On 58,391 functions taken from open source repositories, the model achieves a 52.3% success rate in creating, from the original documentation of that function, an alternative implementation that passes the test. Furthermore, we project that with more computational power the success rate increases to 60%. There are obvious ways in which the success rate can be further improved, even with today’s OpenAI models, by providing the model with more context about the repository that contains the function. These results prove that an IDE plugin that helps developers write non-trivial code is not far away.
    • While the OpenAI models are already amazing today, they are further improving at a ferocious pace — just a few weeks ago the success percentage on rewriting arbitrary functions was 43.3%, instead of 52.3%. Where that earlier model needed 150 attempts to find a correct solution, the newer model needs 14. We therefore anticipate applications way beyond mere program synthesis, where developers create new code in an interactive conversation with the model.

    We conclude that OpenAI’s technology is poised to change developer tools in fundamental ways. In particular an ‘automated pair programmer’ can be built that puts the collective knowledge of the entire GitHub community at the fingertips of every individual.

    Introduction

    An automated pair programmer, which puts the collective knowledge of the entire GitHub community at the fingertips of every individual, would transform the software industry. Many aspiring developers who do not have access to adequate training and advice today would instantaneously become productive…

    Dohmke commented:

    We took a risk and we built the world’s first at-scale AI pair programmer, a novel tool with a large language model, before the world was ready. And today, with more than one million paid users across 190 countries, GitHub Copilot is the most widely adopted AI developer tool in history. And from this broad-based adoption, we have seen the most stunning evidence of productivity gains since we got rid of punch cards and assembly language.

    Already, Copilot is making developers 55% faster in coding. A 55% productivity gain is the biggest ever experienced in the first year of a novel developer tool.

    It is worth considering a few points illustrated by the memo excerpt and Dohmke’s statement.

    1. GitHub has a runaway success that has delivered a big impact, but it also did not come overnight. GitHub has been working on the Copilot concept for three years. It is still an impressively short time frame to scale. However, it would be a mistake to equate this to products that are new since GPT-3.5 or ChatGPT.

    2. GitHub has a built-in market advantage. The data contained in GitHub’s public repositories alone are invaluable assets for AI model training and fine-tuning and as a knowledge source. In addition, that information is curated and includes crowdsourced metadata that can signal quality.

    3. GitHub Copilot already has a large-scale user base serving millions of developers (free and paid combined) and 37,000 organizations. That usage data can be employed to improve accuracy.

    GitHub Copilot Alternatives

    Tabnine announced a $25 million funding round this week and said it has one million developers, though that appears to be a mix of paid and free users. Tabnine has also not arrived here overnight. The company was founded five years ago, has gone through an acquisition, and more recently expanded to code generation from a background in code completion. It has the backing of Atlassian, which offers reach and access to data, and it also supports a wide range of IDEs and self-hosting.

    The software developer market is clearly large enough for solutions to succeed beyond Copilot. Tabnine has a significant user base and experience serving developers over time. It would likely be a more formidable competitor if Atlassian acquired Tabnine and Stack Overflow and combined the tooling with the giant information repository.

    Amazon CodeWhisperer and Google Codey are two other obvious competitors. Both have access to large user bases, access to training data, and ease of distribution through their respective cloud platforms. Then there is ChatGPT, the conversational bot with access to the same OpenAI Codex originally used to create GitHub Copilot. However, OpenAI has not yet put an application feature wrapper around it. Today, Copilot is the best positioned to take advantage of the shift to AI-enabled coding. The SDLC feature expansion, personalization, automation, and collaboration are the type of solution footprint that adds more value while locking in organizations at the business process level.

    AI Economics

    Satya Nadella, Microsoft’s CEO, said during the fiscal 2023 first-quarter earnings call in October 2022 that GitHub was generating $1 billion in annual recurring revenue. The paid Copilot program was sold on an individual basis to developers at the time for $10 per month or $100 per year. It would have only represented around $10 – $20 million in revenue at that point.

    However, the business edition was introduced in early 2023, and over nine months, it has risen to 37,000 organizations paying $19 per developer per month. The business edition is likely generating north of $150 million in annual recurring revenue today. Earlier this year, GitHub’s Dohmke revealed the company has over 100 million users. New Copilot users become GitHub platform users, which drives even more revenue.

    It is reasonable to assume that GitHub could double its user base in each of the next three years, and some of that will be at the higher $39 per user per month Enterprise price point. That would likely mean revenue from Copilot will grow to about $300 million in the 2024 fiscal year, $600 million for 2025, and over $1 billion in 2026.
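
    The arithmetic behind these estimates can be checked with a short sketch. It uses only figures quoted in this section ($19 per developer per month, 37,000 organizations, roughly $150 million in annual recurring revenue, and annual doubling from about $300 million). The function names are illustrative, and the implied seat counts are back-of-envelope inferences, not reported numbers.

```python
from typing import List


def implied_seats(arr_usd: float, price_per_month: float) -> float:
    """Paid seats needed to produce `arr_usd` per year at a monthly per-seat price."""
    return arr_usd / (price_per_month * 12)


def doubling_projection(start_usd: float, years: int) -> List[float]:
    """Revenue for each year when it doubles annually, starting at `start_usd`."""
    return [start_usd * 2**i for i in range(years)]


seats = implied_seats(150e6, 19)
print(round(seats))                   # 657895 seats implied by $150M ARR
print(round(seats / 37_000, 1))       # 17.8 seats per organization on average
print(doubling_projection(300e6, 3))  # [300000000.0, 600000000.0, 1200000000.0]
```

    Under these assumptions, the third year of doubling lands at $1.2 billion, consistent with the "over $1 billion" estimate for 2026.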

    Of course, there is a cost associated with all of those inference jobs that generate code, summaries, and analysis. The Wall Street Journal reported in early October 2023 that Microsoft is losing an average of $20 per month per user on Copilot and, in extreme cases, $80 per month. Price reductions from OpenAI should reduce inference costs, and the new, higher-priced enterprise product can offset some of those costs with higher monthly revenue.

    Another consideration is that the paying users today are likely skewed toward developers who spend more time coding and, therefore, rack up more inference costs. The enterprise features will likely capture more developers and other SDLC support personnel who don’t use the system as frequently yet still pay for monthly access. It will be interesting to see how the economics ultimately play out, but a fair assumption is that Microsoft sees more value in rapid expansion.
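
    A rough sketch of the per-user economics shows why the higher price tiers matter. It assumes, and this is an assumption rather than something the report states, that the $20 average monthly loss was measured against the $10 individual price, which implies an average serving cost of about $30 per user per month. The tier prices come from this article; the function names are illustrative.

```python
def implied_cost(price: float, loss: float) -> float:
    """Monthly serving cost implied by a price and a per-user loss at that price."""
    return price + loss


def monthly_margin(price: float, cost: float) -> float:
    """Per-user monthly margin; negative means a loss."""
    return price - cost


# Assumption: the reported $20 average loss applies at the $10 individual price.
avg_cost = implied_cost(price=10, loss=20)

tiers = {"Individual": 10, "Business": 19, "Enterprise": 39}
for tier, price in tiers.items():
    print(tier, monthly_margin(price, avg_cost))
# Individual -20
# Business -11
# Enterprise 9
```

    If average inference cost stayed near that level, only the Enterprise tier would cover it. Lower inference prices or lighter-usage subscribers would shift every tier upward.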

    What’s Next

    GitHub made plenty of interesting announcements that will arrive next month and into 2024. And Satya Nadella joined Dohmke on stage and suggested a few items that might arrive in a year’s time. However, there is another feature that seems like a logical addition within the next year.

    Whether through partners or directly from GitHub, you can expect software development learning and training to arrive in Copilot in 2024. GitHub’s data already show that developers look at Copilot as a tool that helps them learn. In this case, it is teaching by writing the code, correcting it, or explaining it. That is valuable ad hoc learning for developers.

    However, Copilot could also offer more structured instruction either in the moment or as part of an intentional learning program. This will obviously benefit developers and their employers if they are upskilling while delivering software products.

    Generative AI provides a number of benefits to users. That is a key reason so many organizations are willing to pay to use the technology. However, if I were to pick one generative AI application that has delivered the most immediate impact, it would be AI-enabled code development. It is unquestionably generative AI’s first killer app. And GitHub is the company that is shaping the solution space.
