
AI Weekly Roundup #2: Discover What You May Have Missed this week

New toys: one better than Stable Diffusion, and one better than GPT-4 Vision?

What You May Have Missed this week:

  1. A powerful full-stack solution for AI video: Pika!

  2. An open-source competitor to Stable Diffusion: PixArt-α

  3. CogVLM: an open-source alternative to GPT-4 Vision (even better?)

  4. Screenshot-to-code: a groundbreaking tool

  5. Stable Diffusion: SDXL Turbo

  6. Our latest tutorial

1 Pika's Game-Changing AI Model Set to Revolutionize Filmmaking!

I'm thrilled to share some electrifying news from @pika_labs, a company that's been a cornerstone in our AI filmmaking toolkit for the past three months.

Their latest teaser reveals a groundbreaking new model that promises a seismic shift for AI filmmakers everywhere.

What's new? Pika's upcoming model isn't just an upgrade; it's a revolution in AI-assisted video production. Having extensively used their previous version, which was already state of the art, I can confidently say this new release is set to redefine our expectations.

Behind the Scenes: Pika's journey is as impressive as its technology. With a recent funding of $55 million, valuing the company at $200-250 million, they've shown that big things come in small packages. The most astonishing part? Their team is currently just four incredibly talented individuals, some of whom were students or interns just a few months ago. Their plan to expand to a 20-person team in 2024 only adds to the excitement.

Reflecting on the Impact: The question that lingers in my mind is, "If Pika has achieved such remarkable feats with only four team members, what will they accomplish with $55 million and a 20-person team?" 2024 is shaping up to be an extraordinary year for AI in filmmaking, thanks to Pika's innovation.

Pika's journey is a testament to the power of small teams with big dreams. Their success story is an inspiration for startups and innovators worldwide. I can't wait to see how their new model will transform the landscape of AI-assisted filmmaking.

2 PixArt-α is not just another text-to-image model; it's a revolutionary leap forward.

Developed as a Transformer-based diffusion model, its image generation quality rivals top-tier models like Imagen, SDXL, and even Midjourney, achieving near-commercial standards. But what truly sets it apart?

  1. Innovative Training Strategy: it breaks the training process down into distinct steps, optimizing pixel dependency, text-image alignment, and image aesthetic quality. This methodical approach enhances image quality and semantic control, vital for nuanced and detailed artworks.

  2. Efficient Text-to-Image Transformer: incorporating cross-attention modules, it streamlines the complex class-condition branch, focusing on the effective integration of text conditions into the visual output.

  3. High-Quality Data Utilization: emphasizing concept density, PixArt-α uses a large vision-language model to auto-label dense pseudo-captions, aligning text and image learning more accurately.

The result? A staggering reduction in training time and costs. PixArt-α requires only 10.8% of the training time of its contemporaries like Stable Diffusion v1.5, translating into massive savings and a significant drop in CO2 emissions.

This tight text-image alignment ensures the output is not just artistic but precisely matched to the creator's vision.

In a world where AI-generated art is gaining prominence, PixArt-α stands out for its efficiency, quality, and sustainability. It empowers creators and startups to build their high-quality, low-cost generative models, democratizing access to advanced AI technology.
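To make the efficiency claim concrete, here is a minimal sketch of generating an image with PixArt-α through Hugging Face's diffusers library. The checkpoint name and the `PixArtAlphaPipeline` class are taken from the public PixArt-α release and should be treated as assumptions; the heavy part is guarded so nothing downloads unless you run the file directly.

```python
def pixart_config(prompt: str, steps: int = 20) -> dict:
    """Pipeline call arguments. Note the low step count: PixArt-α's
    decomposed training lets it reach good quality in far fewer steps
    (and far less training compute) than SD v1.5-era models."""
    return {"prompt": prompt, "num_inference_steps": steps}


if __name__ == "__main__":
    # Heavy part: requires a CUDA GPU and downloads the model weights.
    # Checkpoint ID assumed from the PixArt-alpha Hugging Face release.
    import torch
    from diffusers import PixArtAlphaPipeline

    pipe = PixArtAlphaPipeline.from_pretrained(
        "PixArt-alpha/PixArt-XL-2-1024-MS",
        torch_dtype=torch.float16,
    ).to("cuda")

    image = pipe(**pixart_config("a lighthouse at dusk, volumetric light")).images[0]
    image.save("pixart_sample.png")
```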

3 CogVLM: an open-source alternative to GPT-4 Vision (even better!)

CogVLM: The Open-Source AI Revolutionizing How We See and Understand Images

Have you heard about CogVLM? It's an open-source visual language model that's taking the AI world by storm. Imagine a technology that not only understands images but also describes them as accurately as a human might, and sometimes even better. CogVLM does exactly that, and it's accessible to everyone. It's like having the power of advanced AI vision and language understanding right at your fingertips.

What makes CogVLM truly unique is its blend of visual and language processing abilities. It's designed to understand and answer a wide range of questions, making it incredibly versatile. Whether it's for personal use, business applications, or academic research, CogVLM is a tool that brings the future of AI into the present.

Here are some mind-blowing uses for a vision tool like CogVLM:

  1. Education and Training: Creating interactive, visual-based learning tools for subjects like biology, engineering, or art.

  2. Accessible Technology: Assisting visually impaired individuals by describing their surroundings or reading text from images.

  3. Retail Experience: Enhancing online shopping by offering virtual try-on features or suggesting products based on visual preferences.

  4. Art Mentor: Artists could upload their artworks, and CogVLM could analyze the images, providing feedback on composition, use of color, and technique. It could compare the work with a vast database of art to offer constructive criticism or suggest areas for improvement. 

  5. Image annotation: automatically caption large image sets to build better datasets for machine learning (e.g., for Stable Diffusion and other text-to-image and text-to-video tools)

  6. Style Learning and Adaptation: The tool could help artists learn about different art styles by analyzing and explaining the key features of various art movements or famous artworks.
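Use case 5 above (batch-captioning a dataset) is the easiest to sketch in code. The lightweight helper below just gathers image files; the guarded section shows what driving CogVLM might look like, where the model ID, tokenizer, and `build_conversation_input_ids` helper are all assumptions based on the THUDM Hugging Face model card, not verified here.

```python
from pathlib import Path


def collect_images(folder: str, exts: tuple = (".jpg", ".jpeg", ".png")) -> list:
    """Gather image paths for batch captioning, sorted for reproducibility."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in exts)


if __name__ == "__main__":
    # Heavy part: needs a large GPU; model/tokenizer IDs and the custom
    # build_conversation_input_ids helper follow the THUDM model card
    # (trust_remote_code pulls in CogVLM's own modeling code).
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, LlamaTokenizer

    tokenizer = LlamaTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")
    model = AutoModelForCausalLM.from_pretrained(
        "THUDM/cogvlm-chat-hf",
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
    ).to("cuda").eval()

    for path in collect_images("dataset/raw"):
        image = Image.open(path).convert("RGB")
        feats = model.build_conversation_input_ids(
            tokenizer,
            query="Describe this image in detail.",
            history=[],
            images=[image],
        )
        inputs = {
            "input_ids": feats["input_ids"].unsqueeze(0).to("cuda"),
            "token_type_ids": feats["token_type_ids"].unsqueeze(0).to("cuda"),
            "attention_mask": feats["attention_mask"].unsqueeze(0).to("cuda"),
            "images": [[feats["images"][0].to("cuda").to(torch.bfloat16)]],
        }
        out = model.generate(**inputs, max_new_tokens=256)
        caption = tokenizer.decode(out[0], skip_special_tokens=True)
        path.with_suffix(".txt").write_text(caption)  # caption next to image
```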

4 Screenshot-to-code: a groundbreaking tool

This remarkable application is redefining efficiency and creativity in coding. Drop in a screenshot, and it converts the image into clean HTML, Tailwind CSS, and JavaScript code.

What sets it apart? It's the power of GPT-4 Vision and DALL-E 3 at work: GPT-4 Vision generates the code, while DALL-E 3 produces similar-looking placeholder images, a blend of accuracy and aesthetics. Imagine the time and effort saved for developers and designers!

But there's more. The app's functionality extends beyond static images. It allows users to input a URL to clone a live website. And for any minor mistakes or missed sections, the AI can be instructed to update the code accordingly.

The technology behind this is as impressive as its output. With a React/Vite frontend and a FastAPI backend, it's a testament to the potential of open-source collaboration and AI integration in software development.

And with CogVLM and PixArt-α or Stable Diffusion, we could build a fully open-source version of it now!
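The core trick behind screenshot-to-code is simple enough to sketch: send the screenshot to GPT-4 Vision as an inline data URL and ask for HTML back. This is not the tool's actual implementation, just a minimal sketch against the 2023-era OpenAI chat API (model name `gpt-4-vision-preview`, `OPENAI_API_KEY` in the environment are assumptions).

```python
import base64


def to_data_url(png_bytes: bytes) -> str:
    """Encode screenshot bytes as a data URL, the inline image format
    the GPT-4 Vision chat API accepts."""
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")


if __name__ == "__main__":
    # Assumes: `pip install openai` (v1+) and OPENAI_API_KEY set.
    from openai import OpenAI

    client = OpenAI()
    screenshot = open("screenshot.png", "rb").read()
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        max_tokens=3000,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Reproduce this UI as a single HTML file using Tailwind CSS."},
                {"type": "image_url",
                 "image_url": {"url": to_data_url(screenshot)}},
            ],
        }],
    )
    print(response.choices[0].message.content)  # the generated HTML
```

A fully open-source variant would swap the API call for CogVLM (understanding the screenshot) plus a local code-capable LLM.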

5 Stable Diffusion: SDXL Turbo

Big news in the world of artificial intelligence! Stability AI recently unveiled its latest innovation, SDXL Turbo. This cutting-edge model marks a significant leap in AI efficiency, setting a new benchmark in the industry: real-time image generation in a single sampling step.
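What "real-time" means in practice: SDXL Turbo is distilled to sample in one step with guidance disabled. A minimal sketch via diffusers (the `stabilityai/sdxl-turbo` checkpoint and `AutoPipelineForText2Image` usage follow Stability AI's release notes; the heavy part is guarded behind `__main__`):

```python
def turbo_call_kwargs(prompt: str) -> dict:
    """SDXL Turbo's distinguishing call signature: a single inference
    step, with classifier-free guidance turned off (guidance_scale=0.0)."""
    return {"prompt": prompt, "num_inference_steps": 1, "guidance_scale": 0.0}


if __name__ == "__main__":
    # Heavy part: downloads the sdxl-turbo weights and needs a CUDA GPU.
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    image = pipe(**turbo_call_kwargs("a photo of a red fox in snow")).images[0]
    image.save("turbo_sample.png")
```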

6 Our latest tutorial:

Fun moment: the “What if” video (a parody of iconic characters).

AI moves fast – don't get left behind! Let us be your guide.

Follow us on social networks!