CAVE_bg02_edited.jpg

PROJECT

CAVE

Collaborative multi-device audio/video experiences
with AI-assisted composition

My Contribution

Conducted UX research, development, and prototyping

As the lead designer, I worked with the project leader, SW engineers, and a design intern

Duration

Jan–May 2023 (Phase 1)

Project Introduction

Offering the best flexibility and quality for content creation through the use of multiple devices

Phone cameras have greatly improved over the years, especially in how they handle light, colors, and details. Users can enjoy different features like HDR, portrait, 3D, low-light photography, and zoom. These features help users create high-quality content in different situations and environments.

Another growing trend is the use of multiple cameras across smartphones, which gives users more options and perspectives to capture the world around them. We aim to offer the best flexibility and quality for content creation based on user needs and challenges.

Impact
Building an ecosystem for synchronized multi-device video experiences and implementing AI-powered composition

Increased the workflow efficiency of multi-device usage by building an ecosystem.

Enabled content creation that is more flexible and collaborative with minimal effort.

Catering to more users on their everyday devices builds brand advocacy and generates revenue for a competitive advantage.

UX Research

My responsibilities were to define target users and analyze market competitors to set the project's UX goals and scope.

Used generative research methods

Define Users

Brainstormed which people use multiple devices the most, their behaviors, and why they use them

discover Users.jpg

Defined and narrowed down to specific target users

UX research01.png
Understand Users

Conducted a field survey and focus interviews to understand their real pain points and challenges

UX research02.png
Analyze Competitors

Analyzed multi-view streaming competitors to map the market landscape

  • Phone manufacturers with their native apps

  • 3rd party software applications

 

 

My findings (SWOT): 

1. Strength: users already use multiple mobile devices for tasks requiring productivity, creativity, collaboration, and communication across devices.
 

2. Weakness: data-sharing interfaces across devices are unintuitive, and editing and maintaining content consumes a huge amount of time.
 

3. Opportunity: S***** sells the most phones with good cameras in Europe and Asia
 

4. Threat: aligning AI-generated outcomes more accurately with users' intentions.

UX research03.png
Set UX Goal
Build effortless experiences
for multi-device use and flexible content creation

 
  • Build an ecosystem that supports content creation by simplifying the steps to sync multiple devices, streamlining collaboration, and outputting the final content

  • Provide creative freedom and flexibility with minimal effort

Acknowledge the user's intent while leveraging AI-powered editing capabilities
 

  • Ensure the final output aligns with the user's desired narrative or intention while leveraging AI-powered editing.

Define UX Scope
UX scope.jpg
UX Development Process
Set Design Principles

Connect multiple devices through collaborative steps to build an ecosystem for the audio/video experience

  • Connectivity
    Synchronize co-location and data from multiple users' devices for collaboration

  • Capture
    Enable real-time scene and audio sharing among collaborators with aligned capture settings

  • Edit & Composition
    Minimize effort in video editing for the best storytelling

Capture

Real-time sharing & aligned capture settings

Effortless

content creation

Connectivity

Synchronized data

Ecosystem

CAVE

Edit & Composition

Slide15.jpg

Discover Collaborators

  • Multi-user, multi-device connectivity

  • Awareness of device relative position

Capture Together

  • Real-time camera feed sharing

  • Collaboratively capture the moment

AI Composition

  • Raw footage sharing among collaborators

  • Effortless AI-assisted composition

  • Revision based on preferences

Created User Flow
Approach to Problems I & II

  Develop Key features & Prototype

Discover Collaborators

Multi-user, multi-device connectivity

  1. Identify participants from existing contact lists (phone number or email)

  2. Detect nearby presence via a peer-to-peer connection (not cloud-based) that allows collaborative video capture

  3. Create participant groups from frequent connections
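The discovery steps above can be sketched in Python; the hashed-identifier matching and the session-count threshold are illustrative assumptions, not the shipped protocol:

```python
import hashlib
from collections import Counter

def hash_contact(identifier: str) -> str:
    """Advertise only a hash of the phone number / email, never the raw value."""
    return hashlib.sha256(identifier.strip().lower().encode()).hexdigest()

def identify_participants(nearby_hashes, contacts):
    """Match hashes advertised by nearby peers against the local contact list."""
    lookup = {hash_contact(c): c for c in contacts}
    return [lookup[h] for h in nearby_hashes if h in lookup]

def frequent_groups(connection_log, min_sessions=3):
    """Suggest participant groups from sets of contacts that connect together often.
    min_sessions is an assumed threshold for what counts as 'frequent'."""
    counts = Counter(frozenset(session) for session in connection_log)
    return [set(group) for group, n in counts.items() if n >= min_sessions]
```

Hashing keeps raw contact details off the air during discovery, which matters for a peer-to-peer (non-cloud) design.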

 

Awareness of device relative position

  1. Wireless camera synchronization

  2. Location data provides relative directional cues for each collaborator

  3. Identify each camera's viewpoint, including orientation, white balance, tone, etc.
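For the directional cues in step 2, a minimal sketch of turning two devices' GPS coordinates into an on-screen cue, using the standard initial-bearing formula (the eight-way cue granularity is an assumption):

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from device 1 to device 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return (math.degrees(math.atan2(y, x)) + 360) % 360

def directional_cue(bearing):
    """Collapse a bearing into a coarse eight-way cue for the UI."""
    arrows = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    return arrows[round(bearing / 45) % 8]
```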

Samsung Galaxy Note20 5G.png
discoveryCollaborators01.png
Samsung Galaxy Note20 5G.png
discoveryCollaborators02.png

Capture Together

Switch camera feeds among collaborators in real time

  1. Preview other camera feeds while taking video

  2. Align screen settings during collaboration

  3. Capture the spatial audio

Synchronize capture settings

  1. Guidance for screen orientation and settings for stylized tone and white balance

Samsung Galaxy Note20 5G.png
capture02.png
Samsung Galaxy Note20 5G.png
capture01.png
  • My screen

AI Composition

Seamless transfer of raw footage

  1. Raw footage is automatically shared among collaborators and accessible for preview

  2. Store all footage from collaborators on the device at its original quality

 

Effortless AI-assisted composition

  1. AI/rule-based composition that finds the best scenes, bookmarks by users' interests, and adapts to on-device resources such as personal photo/video data in the gallery

  2. Use semantic information from capture time for composition

  3. Identify and remove obstructions in the field of view and background
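A rule-based slice of step 1 can be sketched as a per-slot clip ranking; the clip fields, interest tags, and weights below are illustrative assumptions, not the product's actual rules:

```python
def score_clip(clip, interests, weights=None):
    """Heuristic score: interest-tag overlap plus simple quality signals.
    Tag names and weights are illustrative, not the shipped scoring."""
    w = weights or {"interest": 2.0, "sharpness": 1.0, "faces": 0.5}
    overlap = len(set(clip["tags"]) & set(interests))
    return (w["interest"] * overlap
            + w["sharpness"] * clip.get("sharpness", 0.0)
            + w["faces"] * clip.get("face_count", 0))

def compose(clips_by_slot, interests):
    """Pick the best-scoring clip for each time slot to form a rough cut."""
    return [max(clips, key=lambda c: score_clip(c, interests))
            for slot, clips in sorted(clips_by_slot.items())]
```

An AI model would replace the hand-tuned weights with learned preferences, but the slot-by-slot selection structure stays the same.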

Perfect outcomes and manual iteration

  1. Create a perfect storytelling version

  2. Offer manual editing revisions based on user preference

Samsung Galaxy Note20 5G.png
AI composition01.png

AI-assisted composition

  • Top track: AI-assisted video

  • Raw footage: the four tracks below are raw footage from other collaborators

Samsung Galaxy Note20 5G.png
AI composition02.png

Manual Iteration

  • Manual editing available by preference

  • Cut clips from any footage and drag them into my track

Proposed advanced UX solutions

Capture assistance

  • Auto-reframe the angle/posture by tracking an object/person and audio context 

  • Keep the capture focused on the user's interests without requiring constant manual adjustment

  • Smart-directing functions across collaborative multi-camera setups

  • Possible use cases: live shopping, online cooking classes, vlogging

Auto-reframe by voice context

reframebyvoice.png

Camera B

reframebyvoice02.png

Camera A

reframebyvoice03.png
reframevoice05.png

Camera C

Auto-reframe by tagging an object/person

object reframe01.png
object reframe03.png
object reframe02.png
Proposed extended user scenarios on top of the CAVE framework
Scenario 1.jpg

Dynamic content creation for brand advertising, showcases, and campaigns.

  • Switch effortlessly between various perspective shots, combining drone, 360° camera, and smartphone close-up or wide-angle shots.

  • Live collaboration with other influencers or audiences

Cinematography with AI-driven stitching

  • Extend filmmaking and storytelling

  • Smoothly transition between different angles by filling in gaps or overlapping sequences

  • Synthesize missing portions of the scene to create coherent visual content

  • Involves motion tracking, content generation, and style matching

Scenario 2.jpg
Scenario 3.jpg

Convert static 2D into next-generation 3D spatial video

  • Blurring the lines between photography and videography

  • 3D depth provides perspective, relative position, and object size for animation and interactivity

  • Use cases: 3D product visualization, architectural walk-throughs, interactive storytelling

Approach to Problem III
   
    Feasibility Study (Tech team)
Problems

- How to decide what will be shown from multiple camera feeds at capture time?

- How to interpret what is important from different viewpoints and focus?

- How to automatically create a video composition?

Slide25.jpg

Interviewer & Interviewee

People Dance

Reaction and body movement indicate where attention is focused

Audio cues indicate who is talking

Solutions
  • AI finds and combines the best scenes and audio sources from the footage to create the perfect storytelling version
     

  • User attention prediction based on the user's on-device metadata
    (photo gallery, frequent contacts)

     

  • Use the front camera to predict attention by monitoring the user's on-screen gaze and facial expressions
     

  • Monitor user actions during capture: switching views, pausing, device shake, zooming in/out, and location changes
     

  • On-chip semantic segmentation at capture time
     

  • Shot clustering based on shot type, object of interest, and learning user patterns 
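The shot-clustering bullet can be illustrated with a minimal grouping by shot type and object of interest; a production system would add learned user patterns, and the field names here are assumptions:

```python
from collections import defaultdict

def cluster_shots(shots):
    """Group shots that share a shot type and object of interest.
    Each shot dict uses illustrative fields: id, shot_type, object."""
    clusters = defaultdict(list)
    for shot in shots:
        clusters[(shot["shot_type"], shot["object"])].append(shot["id"])
    return dict(clusters)
```

Clusters like these give the composition step interchangeable candidates: any shot in a cluster can stand in for another when assembling the final cut.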

AI rule based.png
Output: Prototype
AI-assisted Composition

 

AI captures holistic moments with greater creative freedom through the collaborative audio/video experience

Device 1

Device 2

Device 3

AI-composition

Success Metrics & Result

A human editor needs about 20 minutes to produce 52 seconds of video output

- AI: optimization takes 2 seconds, clip stitching 40 seconds, and facial-appearance reward calculation 13 seconds

Total: less than 1 minute (2 + 40 + 13 = 55 seconds)

Conclusion

CAVE Product Values
Value to the business
  • Building a seamless multi-device ecosystem enables users to create, share, and enjoy smart-edited content

  • Improve user personalization and engagement with advanced camera hardware, connectivity, and generative AI technology

  • Catering to Gen Z's preferences for user-generated content creation on their everyday devices boosts brand loyalty and competitive advantage.

Value to the Users
  • Simplify the interactive process of creating, analyzing, and editing professional-quality content for different purposes and audiences

  • Create an ecosystem that enables real-time collaboration among multiple cameras that are physically separated

3D video
