Jennifer Story
PROJECT
CAVE
Collaborative multi-device audio/video experiences
with AI-assisted composition
My Contribution
Conducted UX research, development, and prototyping
As the lead designer, worked with the project leader, SW engineers, and a design intern
Duration
Jan ~ May 2023 (Phase 1)
Project Introduction
Offer the best flexibility and quality through the use of multiple devices for content creation
Phone cameras have greatly improved over the years, especially in how they handle light, colors, and details. Users can enjoy different features like HDR, portrait, 3D, low-light photography, and zoom. These features help users create high-quality content in different situations and environments.
Another popular capability is the use of multiple cameras in smartphones, which gives users more options and perspectives to capture the world around them. We aim to offer the best flexibility and quality for content creation based on user needs and challenges.
Impact
Building an ecosystem for synchronized multi-device video experiences and implementing AI-powered composition
Increases workflow efficiency for multi-device usage by building an ecosystem.
Enables content creation that is more flexible and collaborative with minimal effort.
Catering to more users on their everyday devices builds brand advocacy and generates revenue, creating a competitive advantage.
UX Research
My responsibilities were to define target users and analyze market competitors to set the project's UX goals and scope.
Used generative research methods
Define Users
Brainstormed the behaviors of people who use multiple devices the most and why they use them
Defined and narrowed down the specific target users
Understand Users
Conducted a field survey and focus interviews to understand their real pain points and challenges
Analyze Competitors
Analyzed multi-view streaming competitors for the market landscape
- Phone manufacturers with their native apps
- 3rd-party software applications
My findings (SWOT):
1. Strength: users use multi-mobile devices for tasks requiring productivity, creativity, collaboration, and communication across devices.
2. Weakness: data-sharing interfaces across devices are less intuitive, and editing and maintaining content consumes significant time
3. Opportunity: S***** sells the most phones with good cameras in Europe and Asia
4. Threats: aligning users' intentions with AI-generated outcomes more accurately
Set UX Goal
Build effortless experiences
for the use of multiple devices and the flexibility of content creation
- Build an ecosystem that supports content creation by simplifying the steps involved in syncing multiple devices, streamlining collaboration, and outputting the final content
- Provide creative freedom and flexibility with minimal effort
- Acknowledge the user's intent while leveraging AI-powered editing: ensure the final output aligns with the user's desired narrative or intention
Define UX Scope
UX Development Process
Set Design Principles
Build collaborative steps across multiple devices for an ecosystem of audio/video experiences
- Connectivity: synchronize co-location and data from multiple users' devices for collaboration
- Capture: enable real-time scene and audio sharing among collaborators with aligned capture settings
- Edit & Composition: minimize effort in video editing for the best storytelling
[Diagram: the CAVE ecosystem, in which Connectivity (synchronized data), Capture (real-time sharing and aligned capture settings), and Edit & Composition combine for effortless content creation]
Discover Collaborators
- Multi-user, multi-device connectivity
- Awareness of relative device position
Capture Together
- Real-time camera feed sharing
- Collaboratively capture the moment
AI Composition
- Raw footage sharing among collaborators
- Effortless AI-assisted composition
- Revision based on preferences
Created User Flow
Approach to Problems I & II
Develop Key features & Prototype
Discover Collaborators
Multi-user, multi-device connectivity
- Identify participants from existing contact lists (phone number or email)
- Find nearby presence through a peer-to-peer connection, not cloud-based, that allows collaborative video capture
- Create participant groups from frequent connections
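The nearby, cloud-free discovery step could be sketched as a presence beacon broadcast on the local network and matched against the receiver's contact list. The payload fields and protocol tag below are my assumptions for illustration, not the project's actual protocol.

```python
import json
import uuid

# Hypothetical presence beacon for nearby, peer-to-peer discovery (no cloud).
# A device would periodically broadcast this payload on the local network
# (e.g. over UDP broadcast or mDNS); peers match it against their contacts.

def make_beacon(device_name, contact_id, capabilities):
    """Serialize a discovery beacon announcing this device's presence."""
    return json.dumps({
        "protocol": "cave-discovery/1",   # assumed protocol tag
        "session": str(uuid.uuid4()),     # per-capture session id
        "device": device_name,
        "contact": contact_id,            # phone number or email in practice
        "capabilities": capabilities,     # e.g. ["video", "spatial-audio"]
    }).encode("utf-8")

def match_contact(beacon, known_contacts):
    """Return the sender's contact id if it is in our contact list, else None."""
    msg = json.loads(beacon.decode("utf-8"))
    if msg.get("protocol") != "cave-discovery/1":
        return None
    return msg["contact"] if msg["contact"] in known_contacts else None

beacon = make_beacon("Phone-A", "alice@example.com", ["video"])
print(match_contact(beacon, {"alice@example.com", "bob@example.com"}))
# prints alice@example.com
```

Matching against an existing contact list keeps discovery consent-based: unknown nearby devices are ignored rather than surfaced.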
Awareness of relative device position
- Wireless camera synchronization
- Location data provides relative directional cues for each collaborator
- Identify each camera's viewpoint, including orientation, white balance, tone, etc.
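As one plausible way to turn location data into a directional cue, the standard initial great-circle bearing between two GPS fixes gives the compass direction of a collaborator; this is a textbook formula offered as a sketch, not the project's actual method.

```python
import math

# Initial great-circle bearing: the compass direction (degrees clockwise
# from north) from device 1's GPS fix to device 2's, usable as a relative
# directional cue for a collaborator.

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0

# A collaborator due east should sit at roughly 90 degrees.
print(round(bearing_deg(37.0, -122.0, 37.0, -121.99), 1))  # → 90.0
```

In practice this coarse GPS cue would be refined with UWB ranging or the camera viewpoint data mentioned above, since consumer GPS alone is too noisy at room scale.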
Capture Together
Switch camera feeds among collaborators in real time
- Preview other camera feeds while taking video
- Align screen settings during collaboration
- Capture spatial audio
Synchronize capture settings
- Guidance for screen orientation and settings for stylized tone and white balance
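The capture-setting guidance could work by diffing each device's local settings against a reference device and prompting for each mismatch. The setting fields below are illustrative assumptions, not the product's actual schema.

```python
from dataclasses import dataclass, asdict

# Sketch of synchronized capture settings: one device acts as the reference,
# and collaborators receive human-readable guidance for every setting that
# differs from it. Field names here are assumed for illustration.

@dataclass
class CaptureSettings:
    orientation: str        # "landscape" or "portrait"
    white_balance_k: int    # color temperature in Kelvin
    tone_style: str         # e.g. "natural", "vivid"

def settings_guidance(reference, local):
    """List a prompt for each setting that differs from the reference device."""
    hints = []
    for field, ref_value in asdict(reference).items():
        local_value = asdict(local)[field]
        if local_value != ref_value:
            hints.append(f"Change {field} from {local_value} to {ref_value}")
    return hints

ref = CaptureSettings("landscape", 5600, "natural")
mine = CaptureSettings("portrait", 5600, "vivid")
for hint in settings_guidance(ref, mine):
    print(hint)
```

Surfacing guidance instead of silently overwriting settings keeps each collaborator in control of their own camera, which matches the "guidance" framing above.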
AI Composition
Seamless transfer of raw footage
- Raw footage is automatically shared and available for preview among collaborators
- All collaborators' footage is stored on the device at its original quality
Effortless AI-assisted composition
- AI/rule-based composition that finds the best scenes, bookmarks by users' interests, and adapts to the user's own device resources, such as personal photo/video data in the gallery
- Use semantic information from capture time for composition
- Identify and remove obstructions in the field of view and background
Perfect outcomes and manual iteration
- Create a perfect storytelling version
- Offer manual editing revisions for user preferences
AI-assisted composition
- Top track: the AI-assisted video
- Bottom four tracks: raw footage from the other collaborators
Manual iteration
- Manual editing available by preference
- Cut and grab from any footage into my track
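A minimal rule-based version of the composition step, assuming per-segment interest signals (face detection, motion, user bookmarks at capture time): score each raw segment and keep the strongest feed per time slot. The weights, fields, and greedy selection are placeholders for illustration, not the shipped algorithm.

```python
from dataclasses import dataclass

# Rule-based sketch of AI-assisted composition: score each clip segment by
# simple interest signals, then greedily pick non-overlapping segments in
# time order to build the top (AI) track from the collaborators' raw tracks.

@dataclass
class Segment:
    device: str
    start: float        # seconds on the shared capture timeline
    end: float
    face_score: float   # e.g. from on-device face detection
    motion_score: float
    bookmarked: bool    # user marked interest at capture time

def interest(seg):
    """Weighted interest score; the weights are arbitrary placeholders."""
    return 0.5 * seg.face_score + 0.3 * seg.motion_score + (0.2 if seg.bookmarked else 0.0)

def compose(segments):
    """For each overlapping time slot, keep only the highest-interest segment."""
    timeline = []
    for seg in sorted(segments, key=lambda s: (s.start, -interest(s))):
        if not timeline or seg.start >= timeline[-1].end:
            timeline.append(seg)
    return timeline

clips = [
    Segment("A", 0, 5, 0.9, 0.2, False),
    Segment("B", 0, 5, 0.4, 0.9, True),
    Segment("A", 5, 10, 0.1, 0.1, False),
    Segment("C", 5, 10, 0.8, 0.6, False),
]
print([s.device for s in compose(clips)])  # → ['B', 'C']
```

Because the raw tracks stay on the device at original quality, a manual revision is just a re-run of this selection with the user's cut-and-grab choices pinned.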
Proposed advanced UX solutions
Capture assistance
- Auto-reframe the angle/posture by tracking an object/person and audio context
- Enhance capture behavior and keep the user's attention on subjects of interest without requiring manual adjustment
- Smart directing functions across collaborative multi-cameras
- Possible use cases: live shopping, online cooking classes, vlogging
[Illustrations: auto-reframe by voice context and auto-reframe by tagging an object/person, across Cameras A, B, and C]
Proposed extended user scenarios on top of the CAVE framework
Dynamic content creation for brand advertising, showcases, and campaigns
- Switch effortlessly between various perspective shots, combining drone, 360 camera, and smartphone close-up or wide-angle shots
- Live collaboration with other influencers or audiences
Cinematography with AI-driven stitching
- Create extended filmmaking and storytelling
- Smooth transitions across different angles shot simultaneously, by filling in gaps or overlapping the transitions between sequences
- Synthesize missing portions of a scene to create coherent visual content
- Involves motion tracking, content generation, and style matching
Convert static 2D into spatial 3D next-generation video
- Blurring the lines between photography and videography
- 3D depth for perspective, relative position, and object size, enabling animation and interactivity
- Use cases: 3D product visualization, architectural walk-throughs, interactive storytelling
Approach to Problem III
Feasibility Study (Tech team)
Problems
- How to decide what will be shown from multiple camera feeds at capture time?
- How to interpret what is important from different viewpoints and focus?
- How to automatically create a video composition?
Example scenarios: an interviewer and interviewee, people dancing
- Reaction and body movement indicate where attention is focused
- Audio cues indicate who is talking
Solutions
- AI finds and combines the best scenes and audio sources from the footage to create the perfect storytelling version
- Predict user attention based on the user's on-device metadata (photo gallery, frequent contacts)
- Use the front camera for gaze prediction by monitoring the on-screen user's gaze and facial expressions
- Monitor the user's actions during capture time: switching views, pausing, device shake, zooming in/out, and location changes
- On-chip semantic segmentation at capture time
- Shot clustering based on shot type, object of interest, and learned user patterns
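The shot-clustering step could be sketched as grouping candidate shots by shot type and subject, so the composer alternates between clusters rather than repeating near-identical shots. The feature keys below are assumed for illustration; the real system presumably clusters on learned features.

```python
from collections import defaultdict

# Sketch of shot clustering: bucket candidate shots by (shot type, subject)
# so the composition stage can alternate across clusters instead of cutting
# between near-identical shots of the same subject.

def cluster_shots(shots):
    """shots: iterable of dicts with 'id', 'shot_type', and 'subject' keys."""
    clusters = defaultdict(list)
    for shot in shots:
        clusters[(shot["shot_type"], shot["subject"])].append(shot["id"])
    return dict(clusters)

shots = [
    {"id": "a1", "shot_type": "close-up", "subject": "speaker"},
    {"id": "b1", "shot_type": "wide", "subject": "room"},
    {"id": "c1", "shot_type": "close-up", "subject": "speaker"},
]
print(cluster_shots(shots))
# → {('close-up', 'speaker'): ['a1', 'c1'], ('wide', 'room'): ['b1']}
```

Learned user patterns could then be expressed as preferred transition orders between these cluster keys.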
Output: Prototype
AI-assisted Composition
AI captures holistic moments with greater creative freedom through the collaborative audio/video experience
[Diagram: feeds from Device 1, Device 2, and Device 3 are combined by the AI composition]
Success Metrics & Result
- Humans take about 20 minutes to edit 52 seconds of video output
- AI takes 2 seconds for optimization, 40 seconds for clip stitching, and 13 seconds for facial-appearance reward calculation: less than 1 minute in total
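The reported figures check out arithmetically: the three AI stages sum to 55 seconds, roughly a 22x speedup over the 20-minute manual edit of the same 52-second output.

```python
# Sum the reported AI pipeline stage times and compare against the
# 20-minute manual editing baseline from the success metrics.
ai_seconds = 2 + 40 + 13      # optimization + clip stitching + reward calc
manual_seconds = 20 * 60      # human editing time for the same 52 s output

print(ai_seconds)                              # → 55
print(round(manual_seconds / ai_seconds, 1))   # → 21.8
```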
Conclusion
CAVE Product Values
Value to the business
- Building a seamless multi-device ecosystem enables users to create, share, and enjoy smart-edited content
- Improves user personalization and engagement with advanced camera HW, connectivity, and generative AI technology
- Catering to Gen Z's preferences on their everyday devices for user-generated content boosts brand loyalty and gains competitive advantages
Value to the Users
- Simplify the interactive process of creating, analyzing, and editing professional-quality content for different purposes and audiences
- Create an ecosystem that enables real-time collaboration among multiple physically separated cameras