Detecting Text Ghostwritten by Large Language Models – The Berkeley Artificial Intelligence Research Blog

Figure: The structure of Ghostbuster, our new state-of-the-art method for detecting AI-generated text.

Large language models like ChatGPT write impressively well—so well, in fact, that they’ve become a problem. Students have begun using these models to ghostwrite assignments, leading some schools to ban ChatGPT. These models are also prone to producing text with factual errors, so wary readers may want to know whether generative AI tools have been used to ghostwrite news articles or other sources before trusting them.

What can teachers and consumers do? Existing tools to detect AI-generated text sometimes do poorly on data that differs from what they were trained on. In addition, if these models falsely classify real human writing as AI-generated, they can jeopardize students whose genuine work is called into question.

Our recent paper introduces Ghostbuster, a state-of-the-art method for detecting AI-generated text. Ghostbuster works by finding the probability of generating each token in a document under several weaker language models, then combining functions based on these probabilities as input to a final classifier. Ghostbuster doesn’t need to know what model was used to generate a document, nor the probability of generating the document under that specific model. This property makes Ghostbuster particularly useful for detecting text potentially generated by an unknown model or a black-box model, such as the popular commercial models ChatGPT and Claude, for which probabilities aren’t available. We’re particularly interested in ensuring that Ghostbuster generalizes well, so we evaluated across a range of ways that text could be generated, including different domains (using newly collected datasets of essays, news, and stories), language models, or prompts.
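
To make the pipeline concrete, here is a minimal sketch of a Ghostbuster-style feature pipeline in Python. It is our illustration, not the paper's code: the per-token probability arrays (`p_weak1`, `p_weak2`) stand in for scores from two weaker language models, and the handful of combining functions below are hand-picked stand-ins for the features the method discovers.

```python
# A minimal sketch of a Ghostbuster-style pipeline (not the authors' code).
# Assumes you can score each token's probability under two weaker LMs;
# here `p_weak1` and `p_weak2` are hypothetical per-token probability arrays.

import numpy as np
from sklearn.linear_model import LogisticRegression

def features(p_weak1: np.ndarray, p_weak2: np.ndarray) -> np.ndarray:
    """Combine per-token probabilities into document-level features."""
    ratio = p_weak1 / (p_weak2 + 1e-9)       # vector op: probability ratio
    return np.array([
        np.log(p_weak1).mean(),              # avg log-prob under model 1
        np.log(p_weak2).mean(),              # avg log-prob under model 2
        ratio.mean(), ratio.max(),           # reductions over the ratio
        np.var(np.log(p_weak1)),             # spread of log-probs
    ])

# Toy training data: per-token probabilities for 100 fake documents.
rng = np.random.default_rng(0)
X = np.stack([features(rng.uniform(0.01, 1, 50), rng.uniform(0.01, 1, 50))
              for _ in range(100)])
y = rng.integers(0, 2, 100)                  # 1 = AI-generated, 0 = human

clf = LogisticRegression().fit(X, y)         # final linear classifier
print(clf.predict_proba(X[:3]))
```

In the actual system, the combining functions are found by a structured search over vector and scalar operations rather than chosen by hand.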

Read more

MIT launches new Music Technology and Computation Graduate Program | MIT News

A new, multidisciplinary MIT graduate program in music technology and computation will feature faculty, labs, and curricula from across the Institute. The program is a collaboration between the Music and Theater Arts Section in the School of Humanities, Arts, and Social Sciences (SHASS) and the School of Engineering. Faculty for the program share appointments between the … Read more

Gemma Scope: helping the safety community shed light on the inner workings of language models

Technologies | Published 31 July 2024 | Authors: Language Model Interpretability team. Announcing a comprehensive, open suite of sparse autoencoders for language model interpretability. To create an artificial intelligence (AI) language model, researchers build a system that learns from vast amounts of data without human guidance. As a result, the inner workings of language models are often … Read more
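
As rough intuition for what these tools do, here is a minimal sparse autoencoder sketch. It follows the generic SAE recipe rather than the Gemma Scope release itself: a model activation is encoded into a wider, mostly-zero feature vector and decoded back.

```python
# A minimal sparse autoencoder (SAE) sketch, written from the general
# recipe, not DeepMind's Gemma Scope code. Weights are random stand-ins.

import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 16, 64                      # SAE widens the activation space

W_enc = rng.standard_normal((d_model, d_sae)) * 0.1
W_dec = rng.standard_normal((d_sae, d_model)) * 0.1
b_enc = np.zeros(d_sae)

def encode(x):
    # ReLU keeps features nonnegative; an L1 sparsity penalty during
    # training (not shown) pushes most of them to exactly zero.
    return np.maximum(x @ W_enc + b_enc, 0.0)

def decode(f):
    # Reconstruct the model activation from the few active features.
    return f @ W_dec

activation = rng.standard_normal(d_model)    # a residual-stream activation
feats = encode(activation)
print("active features:", int((feats > 0).sum()), "of", d_sae)
print("reconstruction error:", float(np.square(decode(feats) - activation).mean()))
```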

Rohit Aggarwal, COO at DecisionNext – Interview Series

Rohit Aggarwal is Chief Operating Officer at DecisionNext, a leading AI platform that enables companies to optimize the buying or selling of commodities at the best possible time and price. He leverages a strong background in supply chain and product management as well as experience directly leading very large teams to execute complex multi-disciplinary projects and … Read more

How Yatter AI Enhances Healthy Food Choices

Introduction: In today’s fast-paced world, making healthy food choices has become more essential than ever. With so many goods on grocery store shelves, it can be difficult to understand nutrition labels. Fortunately, technological advances have created opportunities for innovative solutions to this problem. One such solution is Yatter, an AI tool that simplifies the process … Read more

Researchers from Moore Threads AI Introduce TurboRAG: A Novel AI Approach to Boost RAG Inference Speed

High latency in time-to-first-token (TTFT) is a significant challenge for retrieval-augmented generation (RAG) systems. Existing RAG systems, which concatenate and process multiple retrieved document chunks to create responses, require substantial computation, leading to delays. Repeated computation of key-value (KV) caches for retrieved documents further exacerbates this inefficiency. As a result, RAG systems struggle to meet … Read more
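
The core idea, as we understand it, is to move chunk-level KV computation offline. The sketch below is a schematic illustration, not Moore Threads’ implementation: `encode_chunk` is a stub standing in for a transformer forward pass that returns per-token key/value states.

```python
# A minimal sketch of the TurboRAG idea: precompute KV caches per chunk
# offline, then reuse them at query time. Names here are our stand-ins.

import numpy as np

def encode_chunk(text: str) -> np.ndarray:
    """Stub for a transformer pass; returns fake per-token KV states."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal((len(text.split()), 64))  # (tokens, kv_dim)

# Offline: compute and store the KV cache for every document chunk once.
corpus = {"doc1": "retrieval augmented generation reduces hallucination",
          "doc2": "kv caches avoid recomputing attention states"}
kv_store = {doc_id: encode_chunk(text) for doc_id, text in corpus.items()}

def build_context(query: str, retrieved_ids: list[str]) -> np.ndarray:
    # Online: concatenate precomputed caches instead of re-encoding chunks,
    # so time-to-first-token only pays for encoding the (short) query.
    chunk_kv = np.concatenate([kv_store[i] for i in retrieved_ids])
    query_kv = encode_chunk(query)
    return np.concatenate([chunk_kv, query_kv])  # context fed to decoding

print(build_context("how do kv caches help?", ["doc2"]).shape)
```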

The Shift from Models to Compound AI Systems – The Berkeley Artificial Intelligence Research Blog

AI caught everyone’s attention in 2023 with Large Language Models (LLMs) that can be instructed to perform general tasks, such as translation or coding, just by prompting. This naturally led to an intense focus on models as the primary ingredient in AI application development, with everyone wondering what capabilities new LLMs will bring.
As more developers begin to build using LLMs, however, we believe that this focus is rapidly changing: state-of-the-art AI results are increasingly obtained by compound systems with multiple components, not just monolithic models.

For example, Google’s AlphaCode 2 set state-of-the-art results in programming through a carefully engineered system that uses LLMs to generate up to 1 million possible solutions for a task and then filter down the set. AlphaGeometry, likewise, combines an LLM with a traditional symbolic solver to tackle olympiad problems. In enterprises, our colleagues at Databricks found that 60% of LLM applications use some form of retrieval-augmented generation (RAG), and 30% use multi-step chains.
Even researchers working on traditional language model tasks, who used to report results from a single LLM call, are now reporting results from increasingly complex inference strategies: Microsoft wrote about a chaining strategy that exceeded GPT-4’s accuracy on medical exams by 9%, and Google’s Gemini launch post measured its MMLU benchmark results using a new CoT@32 inference strategy that calls the model 32 times, which raised questions about its comparison to just a single call to GPT-4. This shift to compound systems opens many interesting design questions, but it is also exciting, because it means leading AI results can be achieved through clever engineering, not just scaling up training.
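
As a concrete illustration of the generate-then-filter pattern behind systems like AlphaCode 2 (our sketch, not Google’s code), the following stubs an LLM sampler and keeps only candidates that pass programmatic tests:

```python
# Generate-then-filter compound system, sketched with a stubbed sampler
# so the example runs without API access. `llm` is a hypothetical call.

import random

def llm(prompt: str, temperature: float = 0.8) -> str:
    """Stub standing in for a real LLM sampling call."""
    return random.choice(["return a + b", "return a - b", "return a * b"])

def passes_tests(candidate: str) -> bool:
    """Filter stage: keep only candidates that satisfy known test cases."""
    f = eval(f"lambda a, b: {candidate.removeprefix('return ')}")
    return f(2, 3) == 5 and f(0, 0) == 0

def solve(task: str, n_samples: int = 100) -> str | None:
    # Compound system: many cheap samples, then a hard programmatic filter.
    candidates = {llm(task) for _ in range(n_samples)}   # dedupe samples
    survivors = [c for c in candidates if passes_tests(c)]
    return survivors[0] if survivors else None

print(solve("write the body of add(a, b)"))
```

The filter does the heavy lifting in this design: sampling is cheap and parallel, while the programmatic tests supply a reliability the raw model lacks.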

In this post, we analyze the trend toward compound AI systems and what it means for AI developers. Why are developers building compound systems? Is this paradigm here to stay as models improve? And what are the emerging tools for developing and optimizing such systems—an area that has received far less research than model training? We argue that compound AI systems will likely be the best way to maximize AI results in the future, and might be one of the most impactful trends in AI in 2024.

Read more

Helping robots zero in on the objects that matter | MIT News

Imagine having to straighten up a messy kitchen, starting with a counter littered with sauce packets. If your goal is to wipe the counter clean, you might sweep up the packets as a group. If, however, you wanted to first pick out the mustard packets before throwing the rest away, you would sort more discriminately, … Read more

Mapping the misuse of generative AI

Responsibility & Safety | Published 2 August 2024 | Authors: Nahema Marchal and Rachel Xu. New research analyzes the misuse of multimodal generative AI today, in order to help build safer and more responsible technologies. Generative artificial intelligence (AI) models that can produce image, text, audio, video and more are enabling a new era of creativity and … Read more

The Financial Challenges of Leading in AI: A Look at OpenAI’s Operating Costs

OpenAI is currently facing significant financial challenges. In 2023, it was reported that maintaining its infrastructure and running its flagship product cost OpenAI around $700,000 per day. In 2024, the company’s total spending on inference and training could reach $7 billion, driven by increasing computational demands. This large operational cost highlights … Read more