Introduction to Gemini:

Billed as our largest and most powerful AI model, Gemini is our most flexible model yet – able to run efficiently on everything from data centers to mobile devices. Its state-of-the-art capabilities will significantly enhance the way developers and enterprise customers build and scale with artificial intelligence.

Gemini 1.0 (our first release) is optimized for three different sizes:

Gemini Ultra – Our largest and most capable model, suitable for highly complex tasks.
Gemini Pro – Our best model that scales for a variety of tasks.
Gemini Nano – Our most efficient model for on-device tasks.

State-of-the-art performance

We have been rigorously testing the Gemini model and evaluating its performance on a variety of tasks. From natural image, audio and video understanding to mathematical reasoning, Gemini Ultra's performance exceeds current state-of-the-art results on 30 of 32 widely used academic benchmarks used in large language model (LLM) research and development.

With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), which uses a combination of 57 subjects, including mathematics, physics, history, law, medicine and ethics, to test knowledge and problem-solving skills.

Our new MMLU benchmark method enables Gemini to use its reasoning power to think more carefully before answering difficult questions, resulting in significant improvements over using first impressions alone.
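
The article does not spell out how this works. As a minimal sketch, assuming a consensus-with-fallback scheme of the kind the Gemini technical report describes (sample several reasoning chains, trust the majority answer only when agreement is high, otherwise fall back to the single "first impression" answer), the routing step might look like the following. The function name, the 0.6 threshold, and the sample answers are all hypothetical.

```python
from collections import Counter
from typing import Sequence


def route_answer(sampled_answers: Sequence[str], greedy_answer: str,
                 threshold: float = 0.6) -> str:
    """Return the majority answer if agreement is high enough, else the greedy one.

    sampled_answers: final answers extracted from several sampled reasoning chains.
    greedy_answer:   the single answer produced without sampling.
    threshold:       hypothetical minimum share of votes needed to trust the majority.
    """
    if not sampled_answers:
        return greedy_answer
    top_answer, votes = Counter(sampled_answers).most_common(1)[0]
    agreement = votes / len(sampled_answers)
    return top_answer if agreement >= threshold else greedy_answer


# Hypothetical answers to one multiple-choice question.
confident = route_answer(["B", "B", "B", "C", "B"], greedy_answer="C")
uncertain = route_answer(["A", "B", "C", "D", "B"], greedy_answer="C")
print(confident, uncertain)
```

In the first call four of five samples agree, so the majority answer is used. In the second no answer reaches the threshold, so the greedy answer is kept.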

This chart shows Gemini Ultra's performance compared to GPT-4 on common text benchmarks (API numbers were calculated where reported numbers were missing).

Gemini exceeds state-of-the-art performance on a range of benchmarks including text and coding.

Gemini Ultra also achieved a state-of-the-art score of 59.4% on the new MMMU benchmark, which consists of multimodal tasks that span different domains and require deliberate reasoning.

On the image benchmarks we tested, Gemini Ultra outperformed previous state-of-the-art models without the help of optical character recognition (OCR) systems that extract text from images for further processing. These benchmarks highlight Gemini's native multimodality and show early signs of Gemini's more complex reasoning abilities.

Please see our Gemini technical report for more details.

This chart shows Gemini Ultra's performance on multimodal benchmarks compared to GPT-4V, with previous SOTA models listed where a capability is not supported by GPT-4V.

As the figure shows, Gemini surpasses state-of-the-art performance on a range of multimodal benchmarks.

Next-generation capabilities

Until now, the standard approach to creating multimodal models involved training separate components for different modalities and then stitching them together to roughly mimic some of this functionality. These models can sometimes be good at certain tasks, such as describing images, but struggle with more conceptual and complex reasoning.

We designed Gemini to be natively multimodal, pre-trained on different modalities from the start. We then fine-tuned it with additional multimodal data to further refine its effectiveness. This helps Gemini seamlessly understand and reason about all kinds of inputs from the ground up, far better than existing multimodal models, and its capabilities are state of the art in nearly every domain.

Learn more about Gemini's features and understand how it works.

Complex reasoning

Gemini 1.0's sophisticated multimodal reasoning capabilities help it make sense of complex written and visual information. This makes it uniquely skilled at uncovering knowledge that is hard to discern in large amounts of data.

Its remarkable ability to extract insights from hundreds of thousands of documents by reading, filtering and understanding information will help enable new breakthroughs at digital speed in many fields from science to finance.

Gemini opens up new scientific insights.

Understand text, images, audio, and more
Gemini 1.0 is trained to simultaneously recognize and understand text, images, audio, and more, so it can better understand subtle information and answer questions about complex topics. This makes it particularly good at explaining reasoning in complex subjects like mathematics and physics.

Gemini explains mathematical and physical reasoning.

Advanced coding

Our first version of Gemini can understand, explain and generate high-quality code in the world's most popular programming languages, such as Python, Java, C++ and Go. Its ability to work across languages and reason about complex information makes it one of the world's leading foundation models for coding.

Gemini Ultra performs well on several coding benchmarks, including HumanEval (an important industry standard for evaluating performance on coding tasks) and Natural2Code (our internal held-out dataset, which uses author-generated sources rather than web-based information).

Gemini can also be used as the engine for more advanced coding systems. Two years ago, we launched AlphaCode, the first AI code generation system to reach a competitive level of performance in programming competitions.

Using a specialized version of Gemini, we created AlphaCode 2, a more advanced code generation system that excels at solving competitive programming problems that go beyond coding and involve complex mathematics and theoretical computer science.

Gemini excels at coding and competitive programming.

When evaluated on the same platform as the original AlphaCode, AlphaCode 2 shows a huge improvement, solving nearly twice as many problems. We estimate that it performs better than 85% of competition participants, up from nearly 50% for AlphaCode. When programmers collaborate with AlphaCode 2 by defining certain properties for the code samples to follow, it performs even better.

We’re excited that programmers are increasingly using powerful AI models as collaboration tools to help them reason about problems, propose code designs, and assist with implementation, so they can ship applications faster and design better services.

Please see our AlphaCode 2 technical report for more details.

More reliable, scalable and efficient
We trained Gemini 1.0 at scale on our AI-optimized infrastructure using Google's in-house designed Tensor Processing Units (TPUs) v4 and v5e. We designed it to be our most reliable and scalable model to train, and our most efficient to serve.

On TPUs, Gemini runs significantly faster than earlier, smaller and less capable models. These custom-designed AI accelerators have been at the heart of Google's AI-powered products that serve billions of users, such as Search, YouTube, Gmail, Google Maps, Google Play and Android. They also enable companies around the world to train large-scale AI models cost-effectively.

Today, we are announcing Cloud TPU v5p, the most powerful, efficient, and scalable TPU system to date, designed specifically for training cutting-edge AI models. This next-generation TPU will accelerate the development of Gemini, help developers and enterprise customers train large-scale generative AI models faster, and allow new products and capabilities to reach customers faster.

Built with responsibility and safety at the core

At Google, we're committed to advancing bold and responsible AI in everything we do. Building on Google's AI Principles and the robust safety policies across our products, we are adding new protections to account for Gemini's multimodal capabilities. At every stage of development, we consider potential risks and work to test and mitigate them.

Gemini has the most comprehensive safety evaluations of any Google AI model to date, including for bias and toxicity. We conducted novel research into potential risk areas such as cyber-offense, persuasion and autonomy, and applied Google Research's best-in-class adversarial testing techniques to help identify critical safety issues before Gemini is deployed.

To identify blind spots in our internal assessment methods, we are working with a variety of external experts and partners to stress-test our models on a range of issues.

To diagnose content safety issues during Gemini's training phases and ensure that its output complies with our policies, we use benchmarks such as Real Toxicity Prompts, a set of 100,000 prompts with varying degrees of toxicity pulled from the web, developed by experts at the Allen Institute for AI. More details about this work are coming soon.

To limit harm, we have built dedicated safety classifiers to identify, label and sort out content that involves violence or negative stereotypes. Combined with robust filters, this layered approach is designed to make Gemini safer and more inclusive for everyone. Additionally, we are continuing to address known challenges for models, such as factuality, grounding, attribution and corroboration.

Responsibility and safety will always be central to the development and deployment of our models. This is a long-term commitment that requires building collaboratively, so we are working with industry and the broader ecosystem to define best practices and set safety and security benchmarks through organizations such as MLCommons, the Frontier Model Forum and its AI Safety Fund, and through our Secure AI Framework (SAIF), which is designed to help mitigate security risks specific to AI systems across the public and private sectors. As we develop Gemini, we will continue to collaborate with researchers, governments and civil society groups around the world.

Making Gemini available to the world

Gemini 1.0 is now available on a range of products and platforms:

Gemini Pro in Google products
We're bringing Gemini to billions of people through Google products.

Starting today, Bard will use a fine-tuned version of Gemini Pro for more advanced reasoning, planning, understanding, and more. This is the biggest upgrade to Bard since its launch. It will be available in English in more than 170 countries and territories, and we plan to expand to different modalities and support new languages and locations in the near future.

We're also bringing Gemini to Pixel. Pixel 8 Pro is the first smartphone engineered to run Gemini Nano, which powers new features such as Summarize in the Recorder app and is rolling out in Smart Reply in Gboard, starting with WhatsApp, with more messaging apps to come next year.

In the coming months, Gemini will appear in more of our products and services, such as Search, Ads, Chrome, and Duet AI.

We've begun experimenting with Gemini in Search, where it makes the Search Generative Experience (SGE) faster for users, with a 40% reduction in latency for English in the U.S., alongside improvements in quality.

Building with Gemini

Starting December 13, developers and enterprise customers can access Gemini Pro through the Gemini API in Google AI Studio or Google Cloud Vertex AI.

Google AI Studio is a free, web-based developer tool for quickly prototyping and launching applications with an API key. When a fully managed AI platform is required, Vertex AI allows Gemini to be customized with full data control and adds Google Cloud capabilities for enterprise security, safety, privacy, and data governance and compliance.
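
To make API-key access concrete, here is a minimal sketch that builds (but does not send) a text-generation request using only the Python standard library. The endpoint pattern, the `gemini-pro` model name and the JSON body shape are assumptions based on the public REST interface as documented around launch and may have changed since. `build_generate_request` and `YOUR_API_KEY` are hypothetical placeholders, not part of any official SDK.

```python
import json
import urllib.request

# Assumed REST endpoint pattern; check the current API documentation before use.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_generate_request(prompt: str, api_key: str,
                           model: str = "gemini-pro") -> urllib.request.Request:
    """Build a POST request asking the model to generate text for `prompt`."""
    # Assumed body shape: a list of contents, each holding a list of text parts.
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/{model}:generateContent?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request("Explain TPUs in one sentence.",
                                 api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
# Sending is left out on purpose: urllib.request.urlopen(request) would perform
# the call and return a JSON response to parse.
```

Keeping request construction separate from sending makes the payload easy to inspect and test without a network connection or a real key.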

Android developers can also build with Gemini Nano, our most efficient on-device task model, through AICore, a new system feature available in Android 14, starting with Pixel 8 Pro devices. Sign up to get an early preview of AICore.

Gemini Ultra coming soon

For Gemini Ultra, we are currently completing extensive trust and safety checks, including red-teaming by trusted external parties, and further refining the model using fine-tuning and reinforcement learning from human feedback (RLHF) before making it broadly available.

As part of this process, we will make Gemini Ultra available to select customers, developers, partners, and safety and responsibility experts for early experimentation and feedback before rolling it out to developers and enterprise customers early next year.

Early next year, we'll also launch Bard Advanced, a new cutting-edge AI experience that gives you access to our best models and features, starting with Gemini Ultra.

The Age of Gemini: Unlocking the Future of Innovation

This is an important milestone in the development of artificial intelligence and the beginning of a new era for us at Google, where we will continue to innovate quickly and responsibly improve the capabilities of our models.

We've made great progress on Gemini so far, and we're working to further expand its capabilities in future versions, including advancements in planning and memory, as well as increasing context windows to process more information to provide better responses.

We're excited about the amazing possibilities of a world responsibly empowered by AI: a future of innovation that will enhance creativity, expand knowledge, advance science, and transform the way billions of people live and work around the world.

Reading reference:

https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf

https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf

https://cloud.google.com/vertex-ai

https://deepmind.google/technologies/gemini/

Original article, author: Chief Security Officer, if reprinted, please indicate the source: https://cncso.com/en/google-gemini-ai-mega-model-surpasses-chatgpt-on-all-fronts.html