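김보곤 397b3ba711 docs: Voice input STT guide v1.1 - add Alpine.js implementation patterns

- Reflect STT improvements for the Sales Strategy Scenario / Manager Consultation Process
- Alpine.js vs React implementation comparison table
- Alpine.js startSpeechRecognition() code + preview panel Blade code
- Additional sales scenario features (audio recording, waveform visualization, GCS backup, playback)
- Data flow diagram (MediaRecorder + STT + server storage)
- onend auto-restart pattern (for long recordings)
- Expanded list of reference implementation files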

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-10 09:20:49 +09:00

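Voice Input (STT) Technical Guide

Document version: 1.1
Date: 2026-02-10
Applies to: Construction Site Photo Sheet, Sales Strategy Scenario, Manager Consultation Process
Target project: MNG (React 18 + Alpine.js)

1. Overview

1.1 Purpose

A browser-native STT (Speech-to-Text) feature: a microphone button is placed next to text input fields (input, textarea) so that users can enter text by voice.

1.2 Technology Choice

Approach	Cost	Accuracy	Latency	Decision
Web Speech API (browser built-in)	Free	High (Google STT engine)	Real-time	Adopted
Google Cloud STT API	Paid ($0.006 per 15 s)	Very high	Server round trip	Not adopted
Whisper (OpenAI)	Paid ($0.006 per minute)	Very high	Server round trip	Not adopted

Rationale: in Chrome-based browsers the built-in Web Speech API uses Google's STT engine at no cost and streams interim and final results in real time. It delivers sufficient Korean recognition accuracy with no usage fees.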

1.3 Browser Support

Browser	Supported	Notes
Chrome (Desktop/Android)	✅	Best support; uses the Google STT engine
Edge	✅	Chromium-based
Safari (iOS/macOS)	✅	Exposed as `webkitSpeechRecognition`
Firefox	❌	Not supported (the button is hidden automatically)
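
The support table above comes down to a single feature check. The following is a minimal sketch of that check, written against a window-like object so it also runs outside a browser. The helper name `getSpeechRecognitionCtor` and the two stand-in objects are illustrative assumptions, not part of the project code.

```javascript
// Hypothetical helper: resolves the SpeechRecognition constructor from a
// window-like object, or returns null when the browser has no support.
function getSpeechRecognitionCtor(win) {
    if (!win) return null;
    return win.SpeechRecognition || win.webkitSpeechRecognition || null;
}

// In a browser you would pass `window`; plain objects stand in for it here.
const chromeLike = { webkitSpeechRecognition: function FakeRecognition() {} };
const firefoxLike = {};

console.log(Boolean(getSpeechRecognitionCtor(chromeLike)));  // true
console.log(Boolean(getSpeechRecognitionCtor(firefoxLike))); // false
```

A component can return nothing when the helper yields null, which is how the button ends up hidden in unsupported browsers.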

2. Core Concept: Interim vs Final

The key to the Web Speech API is the distinction between interim (unconfirmed) text and final (confirmed) text.

2.1 Text State Flow

[Voice input starts]
    │
    ├─ interim: "안녕하"          ← recognition in progress (may still change)
    ├─ interim: "안녕하세"         ← corrected (overwrites the previous interim)
    ├─ interim: "안녕하세요"       ← corrected
    │
    ├─ ★ FINAL: "안녕하세요"      ← confirmed (must never be removed)
    │
    ├─ interim: "반갑습"          ← next utterance begins
    ├─ interim: "반갑습니다"
    │
    ├─ ★ FINAL: "반갑습니다"      ← confirmed
    │
[Voice input ends]

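2.2 Rendering Rules (mandatory)

State	Style	Behavior	Deletable
interim (unconfirmed)	`italic` + `text-gray-400`	Corrected in real time; each update overwrites the previous interim	Correction only
final (confirmed)	`font-normal` + `text-white`	Appended permanently to the `finalizedSegments[]` array	Never

2.3 Input Update Rules

Call onResult(transcript) and append text to the input only when a final result arrives.
Interim text is shown only in the preview panel and is never written to the input.
Text appended to the input is ordinary text that the user can edit directly.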
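
The three rules above can be sketched as a small reducer over one onresult event. This is an illustration only: `collectResults`, `fakeResult`, and the `inputValue` / `previewInterim` variables are hypothetical stand-ins for the component state, and the fake events mimic only the fields the rules rely on.

```javascript
// Hypothetical reducer: folds one onresult event into { finals, interim }.
function collectResults(event) {
    const finals = [];
    let interim = '';
    for (let i = event.resultIndex; i < event.results.length; i++) {
        const result = event.results[i];
        const text = result[0].transcript;
        if (result.isFinal) {
            finals.push(text);   // confirmed: goes to the input
            interim = '';
        } else {
            interim = text;      // unconfirmed: preview only
        }
    }
    return { finals, interim };
}

// Minimal stand-in for a SpeechRecognitionResult (array-like with isFinal).
function fakeResult(transcript, isFinal) {
    const result = [{ transcript }];
    result.isFinal = isFinal;
    return result;
}

let inputValue = '';      // what the <input> would hold
let previewInterim = '';  // what the preview panel would show in gray italic

function handle(event) {
    const { finals, interim } = collectResults(event);
    for (const text of finals) {
        inputValue = inputValue ? inputValue + ' ' + text : text;
    }
    previewInterim = interim;
}

handle({ resultIndex: 0, results: [fakeResult('안녕하', false)] });
console.log(JSON.stringify(inputValue), JSON.stringify(previewInterim));
handle({ resultIndex: 0, results: [fakeResult('안녕하세요', true)] });
console.log(JSON.stringify(inputValue), JSON.stringify(previewInterim));
```

After the first event the input is still empty and only the preview changes; the input receives text once the result becomes final.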

3. Component Architecture

3.1 VoiceInputButton Component

┌─────────────────────────────────┐
│  VoiceInputButton               │
│                                 │
│  Props:                         │
│    onResult: (text) => void     │  ← receives final text only
│    disabled: boolean            │  ← disables the button (read-only mode, etc.)
│                                 │
│  State:                         │
│    recording: boolean           │  ← whether recording is active
│    finalizedSegments: string[]  │  ← accumulated final text (for preview)
│    interimText: string          │  ← current interim text
│                                 │
│  Refs:                          │
│    recognitionRef               │  ← SpeechRecognition instance
│    startTimeRef                 │  ← recording start time (usage tracking)
│    dismissTimerRef              │  ← preview dismiss timer
│    previewRef                   │  ← preview DOM node (auto-scroll)
│                                 │
│  Output:                        │
│    [mic button] + [preview]     │
└─────────────────────────────────┘

3.2 Full Code

function VoiceInputButton({ onResult, disabled }) {
    const [recording, setRecording] = useState(false);
    const [finalizedSegments, setFinalizedSegments] = useState([]);
    const [interimText, setInterimText] = useState('');
    const recognitionRef = useRef(null);
    const startTimeRef = useRef(null);
    const dismissTimerRef = useRef(null);
    const previewRef = useRef(null);

    // Check browser support
    const isSupported = typeof window !== 'undefined' &&
        (window.SpeechRecognition || window.webkitSpeechRecognition);

    // Log STT usage (tracked alongside AI token usage)
    const logUsage = useCallback((startTime) => {
        const duration = Math.max(1, Math.round((Date.now() - startTime) / 1000));
        apiFetch(API.logSttUsage, {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ duration_seconds: duration }),
        }).catch(() => {});
    }, []);

    // Auto-scroll the preview panel
    useEffect(() => {
        if (previewRef.current) {
            previewRef.current.scrollTop = previewRef.current.scrollHeight;
        }
    }, [finalizedSegments, interimText]);

    // Stop recording
    const stopRecording = useCallback(() => {
        recognitionRef.current?.stop();
        recognitionRef.current = null;
        if (startTimeRef.current) {
            logUsage(startTimeRef.current);
            startTimeRef.current = null;
        }
        setRecording(false);
        setInterimText('');
        // Close the preview 2 seconds after recording stops
        // (clear any pending timer first so only one dismiss timer is live)
        if (dismissTimerRef.current) clearTimeout(dismissTimerRef.current);
        dismissTimerRef.current = setTimeout(() => {
            setFinalizedSegments([]);
        }, 2000);
    }, [logUsage]);

    // Start recording
    const startRecording = useCallback(() => {
        // Clear any previous timer
        if (dismissTimerRef.current) {
            clearTimeout(dismissTimerRef.current);
            dismissTimerRef.current = null;
        }

        const SR = window.SpeechRecognition || window.webkitSpeechRecognition;
        const recognition = new SR();
        recognition.lang = 'ko-KR';           // Korean
        recognition.continuous = true;          // continuous recognition (no auto-stop)
        recognition.interimResults = true;      // receive interim results
        recognition.maxAlternatives = 1;        // single candidate only

        recognition.onresult = (event) => {
            // Cancel the dismiss timer (recognition is still active)
            if (dismissTimerRef.current) {
                clearTimeout(dismissTimerRef.current);
                dismissTimerRef.current = null;
            }

            let currentInterim = '';
            for (let i = event.resultIndex; i < event.results.length; i++) {
                const transcript = event.results[i][0].transcript;
                if (event.results[i].isFinal) {
                    // ★ Final: write to the input + store permanently in the preview
                    onResult(transcript);
                    setFinalizedSegments(prev => [...prev, transcript]);
                    currentInterim = '';
                } else {
                    // Interim: may be corrected, but earlier finals are preserved
                    currentInterim = transcript;
                }
            }
            setInterimText(currentInterim);
        };

        recognition.onerror = () => stopRecording();

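        recognition.onend = () => {
            // Handle the case where the browser ended the session on its own
            if (startTimeRef.current) {
                logUsage(startTimeRef.current);
                startTimeRef.current = null;
            }
            setRecording(false);
            setInterimText('');
            recognitionRef.current = null;
            // stopRecording() may already have scheduled a dismiss timer; replace it
            if (dismissTimerRef.current) clearTimeout(dismissTimerRef.current);
            dismissTimerRef.current = setTimeout(() => {
                setFinalizedSegments([]);
            }, 2000);
        };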

        recognitionRef.current = recognition;
        startTimeRef.current = Date.now();
        setFinalizedSegments([]);
        setInterimText('');
        recognition.start();
        setRecording(true);
    }, [onResult, stopRecording, logUsage]);

    // Toggle (start/stop)
    const toggle = useCallback((e) => {
        e.preventDefault();
        e.stopPropagation();
        if (disabled || !isSupported) return;
        recording ? stopRecording() : startRecording();
    }, [disabled, isSupported, recording, stopRecording, startRecording]);

    // Cleanup on component unmount
    useEffect(() => {
        return () => {
            recognitionRef.current?.stop();
            if (dismissTimerRef.current) clearTimeout(dismissTimerRef.current);
        };
    }, []);

    // Render nothing in unsupported browsers
    if (!isSupported) return null;

    const hasContent = finalizedSegments.length > 0 || interimText;

    return (
        <div className="relative flex-shrink-0">
            {/* Mic button */}
            <button
                type="button"
                onClick={toggle}
                disabled={disabled}
                title={recording ? '녹음 중지 (클릭)' : '음성으로 입력'}
                className={`inline-flex items-center justify-center w-8 h-8 rounded-full transition-all
                    ${recording
                        ? 'bg-red-500 text-white shadow-lg shadow-red-200'
                        : 'bg-gray-100 text-gray-500 hover:bg-blue-100 hover:text-blue-600'}
                    ${disabled ? 'opacity-30 cursor-not-allowed' : 'cursor-pointer'}`}
            >
                {recording ? (
                    <span className="relative flex items-center justify-center w-4 h-4">
                        <span className="absolute inset-0 rounded-full bg-white/30 animate-ping" />
                        <svg className="w-3.5 h-3.5 relative" fill="currentColor" viewBox="0 0 24 24">
                            <rect x="6" y="6" width="12" height="12" rx="2" />
                        </svg>
                    </span>
                ) : (
                    <svg className="w-4 h-4" fill="currentColor" viewBox="0 0 24 24">
                        <path d="M12 14c1.66 0 3-1.34 3-3V5c0-1.66-1.34-3-3-3S9 3.34
                            9 5v6c0 1.66 1.34 3 3 3z" />
                        <path d="M17 11c0 2.76-2.24 5-5 5s-5-2.24-5-5H5c0 3.53 2.61
                            6.43 6 6.92V21h2v-3.08c3.39-.49 6-3.39 6-6.92h-2z" />
                    </svg>
                )}
            </button>

            {/* Streaming preview panel */}
            {(recording || hasContent) && (
                <div
                    ref={previewRef}
                    className="absolute bottom-full mb-2 right-0 bg-gray-900 rounded-lg
                        shadow-xl z-50 w-[300px] max-h-[120px] overflow-y-auto px-3 py-2"
                    style={{ lineHeight: '1.6' }}
                >
                    {/* Final text: normal weight + white */}
                    {finalizedSegments.map((seg, i) => (
                        <span key={i} className="text-white text-xs font-normal
                            transition-colors duration-300">
                            {seg}
                        </span>
                    ))}

                    {/* Interim text: italic + light gray */}
                    {interimText && (
                        <span className="text-gray-400 text-xs italic
                            transition-colors duration-200">
                            {interimText}
                        </span>
                    )}

                    {/* Recording with no text yet: waiting indicator */}
                    {recording && !hasContent && (
                        <span className="text-gray-500 text-xs flex items-center gap-1.5">
                            <span className="inline-block w-1.5 h-1.5 bg-red-400
                                rounded-full animate-pulse" />
                            말씀하세요...
                        </span>
                    )}

                    {/* After recording stops: done mark for the final text */}
                    {!recording && finalizedSegments.length > 0 && !interimText && (
                        <span className="text-green-400 text-xs ml-1">&#10003;</span>
                    )}
                </div>
            )}
        </div>
    );
}

4. Usage Patterns

4.1 Basic Usage (next to an input)

function MyForm() {
    const [value, setValue] = useState('');

    return (
        <div>
            <label className="block text-sm font-medium text-gray-700 mb-1">
                현장명 *
            </label>
            <div className="flex items-center gap-2">
                <input
                    type="text"
                    value={value}
                    onChange={e => setValue(e.target.value)}
                    className="flex-1 px-3 py-2 border border-gray-300 rounded-lg text-sm"
                    placeholder="입력하세요"
                />
                <VoiceInputButton
                    onResult={(text) => setValue(prev =>
                        prev ? prev + ' ' + text : text
                    )}
                />
            </div>
        </div>
    );
}

4.2 Using with a textarea

<div className="flex items-start gap-2">  {/* items-start: align to the top */}
    <textarea
        value={description}
        onChange={e => setDescription(e.target.value)}
        className="flex-1 px-3 py-2 border rounded-lg text-sm"
        rows={3}
    />
    <VoiceInputButton
        onResult={(text) => setDescription(prev =>
            prev ? prev + ' ' + text : text
        )}
    />
</div>

4.3 Conditional Activation (edit mode only)

<VoiceInputButton
    onResult={(text) => setSiteName(prev => prev ? prev + ' ' + text : text)}
    disabled={!editing}  // disabled when not in edit mode
/>

4.4 onResult Callback Patterns

// Pattern 1: append to the existing text (space-separated)
onResult={(text) => setValue(prev => prev ? prev + ' ' + text : text)}

// Pattern 2: overwrite
onResult={(text) => setValue(text)}

// Pattern 3: custom post-processing
onResult={(text) => {
    const cleaned = text.trim().replace(/\s+/g, ' ');
    // Skip the separator when the field is still empty
    setValue(prev => prev ? prev + ' ' + cleaned : cleaned);
}}
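
The append and post-processing patterns can be folded into one pure helper that is easy to unit test. This is a sketch under the assumption that whitespace-only transcripts should be ignored; `appendTranscript` is a hypothetical name and does not exist in the project.

```javascript
// Hypothetical helper: normalizes whitespace in the recognized text and
// appends it to the previous value with a single separating space.
function appendTranscript(prev, text) {
    const cleaned = text.trim().replace(/\s+/g, ' ');
    if (!cleaned) return prev;                 // nothing usable was recognized
    return prev ? prev + ' ' + cleaned : cleaned;
}

console.log(JSON.stringify(appendTranscript('', ' 안녕하세요 ')));
console.log(JSON.stringify(appendTranscript('현장 A', '2층   복도')));
```

With a helper like this, the callback could become `onResult={(text) => setValue(prev => appendTranscript(prev, text))}`.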

5. Preview Panel UI Details

5.1 Position and Style

                    ┌─────────────────────────────┐
                    │ final text interim text...  │  ← preview panel
                    │ (white,normal) (gray,italic)│     bg-gray-900
                    └─────────────────────────────┘     w-[300px]
                                               ┌──┐    max-h-[120px]
                                               │🎤│    line-height: 1.6
                                               └──┘

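Position: above the button (absolute bottom-full mb-2 right-0)
Background: dark (bg-gray-900), so it stands out against light forms
Width: fixed at 300px; height up to 120px (scrolls beyond that)
Auto-scroll: scrolls to the bottom automatically as text grows

5.2 Display by State

State	Displayed content
Right after recording starts (no text)	🔴 `말씀하세요...` ("Please speak...": red dot + gray text)
Receiving interim results	Final text (white) + interim text (gray italic)
Moment a result becomes final	Previous finals plus the new final (white); interim is cleared
Right after recording stops	All final text + ✓ mark (green)
2 seconds after stopping	Panel closes automatically (`finalizedSegments` is reset)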
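
The state table can be read as a pure function of the three pieces of component state. The sketch below mirrors the conditions used in the JSX of section 3.2; the function name `previewIndicator` and the returned labels are illustrative assumptions.

```javascript
// Hypothetical helper: maps component state to what the preview panel shows.
function previewIndicator({ recording, finalizedSegments, interimText }) {
    const hasContent = finalizedSegments.length > 0 || Boolean(interimText);
    if (!recording && !hasContent) return 'hidden';      // panel not rendered
    if (recording && !hasContent) return 'waiting';      // red dot + prompt
    if (!recording && finalizedSegments.length > 0 && !interimText) {
        return 'done';                                   // final text + check mark
    }
    return 'streaming';                                  // final + interim text
}

console.log(previewIndicator({ recording: true, finalizedSegments: [], interimText: '' }));
console.log(previewIndicator({ recording: true, finalizedSegments: ['안녕하세요'], interimText: '반갑' }));
console.log(previewIndicator({ recording: false, finalizedSegments: ['안녕하세요'], interimText: '' }));
console.log(previewIndicator({ recording: false, finalizedSegments: [], interimText: '' }));
```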

5.3 Transition Settings

Final text:   transition-colors duration-300  (0.3 s color transition)
Interim text: transition-colors duration-200  (0.2 s color transition)
line-height:  fixed at 1.6 (prevents line-height jitter)

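6. SpeechRecognition Configuration Details

6.1 Main Options

// Some browsers expose only the prefixed constructor
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.lang = 'ko-KR';           // language (Korean)
recognition.continuous = true;          // continuous recognition mode
recognition.interimResults = true;      // receive interim results
recognition.maxAlternatives = 1;        // number of recognition candidates

Option	Value	Description
`lang`	`'ko-KR'`	Korean recognition. Change for other languages
`continuous`	`true`	Does not stop automatically when the speaker pauses; the user stops it manually
`interimResults`	`true`	Receive interim results in real time (with false, only final results)
`maxAlternatives`	`1`	Return only one candidate per result (speed optimization)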

6.2 Event Handlers

Event	When it fires	Handling
`onresult`	A recognition result arrives	Separate interim from final, then update state
`onerror`	A recognition error occurs	Stop recording
`onend`	The recognition session ends	Cleanup + usage logging + preview dismiss timer

6.3 onresult Event Details

recognition.onresult = (event) => {
    // event.resultIndex: index of the first result that changed in this event
    // event.results: SpeechRecognitionResultList (cumulative)
    // event.results[i].isFinal: whether the result is final
    // event.results[i][0].transcript: the recognized text

    for (let i = event.resultIndex; i < event.results.length; i++) {
        const transcript = event.results[i][0].transcript;
        if (event.results[i].isFinal) {
            // → write to the input + append to finalizedSegments
        } else {
            // → update interimText (overwrites the previous interim)
        }
    }
};

Note: always iterate from event.resultIndex. Iterating over the whole list (from 0) reprocesses final results that were already handled.
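
The duplication problem is easy to reproduce with two simulated events whose results list is cumulative. The event objects below are hand-built stand-ins that carry only the fields used here, and `finalsFrom` is a hypothetical helper.

```javascript
// Minimal stand-in for a SpeechRecognitionResult.
function makeResult(transcript, isFinal) {
    const result = [{ transcript }];
    result.isFinal = isFinal;
    return result;
}

// Collects the final transcripts of one event, starting at startIndex.
function finalsFrom(event, startIndex) {
    const out = [];
    for (let i = startIndex; i < event.results.length; i++) {
        if (event.results[i].isFinal) out.push(event.results[i][0].transcript);
    }
    return out;
}

const first = makeResult('안녕하세요', true);
const second = makeResult('반갑습니다', true);

// The results list accumulates; resultIndex marks where the changes begin.
const events = [
    { resultIndex: 0, results: [first] },
    { resultIndex: 1, results: [first, second] },
];

const correct = events.flatMap((e) => finalsFrom(e, e.resultIndex));
const duplicated = events.flatMap((e) => finalsFrom(e, 0));

console.log(correct);     // [ '안녕하세요', '반갑습니다' ]
console.log(duplicated);  // [ '안녕하세요', '안녕하세요', '반갑습니다' ]
```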

7. Backend (STT Usage Tracking)

7.1 Route

// routes/web.php (inside the juil group)
Route::post('/construction-photos/log-stt-usage',
    [ConstructionSitePhotoController::class, 'logSttUsage']
)->name('construction-photos.log-stt-usage');

7.2 Controller

public function logSttUsage(Request $request): JsonResponse
{
    $validated = $request->validate([
        'duration_seconds' => 'required|integer|min:1',
    ]);

    AiTokenHelper::saveSttUsage(
        '공사현장사진대지-음성입력',  // menu name (identifies where STT was used)
        $validated['duration_seconds']
    );

    return response()->json(['success' => true]);
}

7.3 AiTokenHelper::saveSttUsage

// App\Helpers\AiTokenHelper

/**
 * Record STT usage
 * - Billing basis: $0.009 / 15 seconds
 * - Unit price based on Google Cloud Speech-to-Text
 *
 * @param string $menuName  Menu name identifying where STT was used
 * @param int    $durationSeconds  Recording duration (seconds)
 */
public static function saveSttUsage(string $menuName, int $durationSeconds): void
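
The body of saveSttUsage is not shown in this guide. As a rough illustration of how the stated rate could translate into a cost figure, here is a sketch that assumes usage is rounded up to whole 15-second blocks; both the rounding rule and the names below are assumptions, and amounts are kept in thousandths of a dollar to avoid floating-point drift.

```javascript
// Assumed constants, taken from the docblock above ($0.009 per 15 seconds).
const RATE_MILLI_USD_PER_BLOCK = 9;  // $0.009 in thousandths of a dollar
const BLOCK_SECONDS = 15;

// Hypothetical estimate: rounds the duration up to whole 15-second blocks.
function estimateSttCostMilliUsd(durationSeconds) {
    const blocks = Math.ceil(durationSeconds / BLOCK_SECONDS);
    return blocks * RATE_MILLI_USD_PER_BLOCK;
}

console.log(estimateSttCostMilliUsd(1));   // 9
console.log(estimateSttCostMilliUsd(16));  // 18
console.log(estimateSttCostMilliUsd(40));  // 27
```

Under these assumptions a 40-second recording spans three blocks, that is 27 thousandths of a dollar, or $0.027.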

7.4 Route Pattern When Adding STT to a New Page

// 1. Add a logSttUsage method to the controller
public function logSttUsage(Request $request): JsonResponse
{
    $validated = $request->validate([
        'duration_seconds' => 'required|integer|min:1',
    ]);

    AiTokenHelper::saveSttUsage(
        '새메뉴명-음성입력',      // ← change the menu name (placeholder: "new menu name - voice input")
        $validated['duration_seconds']
    );

    return response()->json(['success' => true]);
}

// 2. Register the route
Route::post('/new-page/log-stt-usage', [NewController::class, 'logSttUsage'])
    ->name('new-page.log-stt-usage');

// 3. Add to the frontend API object
const API = {
    logSttUsage: '/path/to/log-stt-usage',
};

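8. Checklist for Adding Voice Input to a New Page

8.1 Frontend

□ 1. Copy the VoiceInputButton component code (or extract it into a shared module and import it)
□ 2. Add the logSttUsage endpoint to the API object
□ 3. Place VoiceInputButton next to the input/textarea
□ 4. Use the append-to-existing-text pattern in the onResult callback
□ 5. Use the disabled prop to enable it only in edit mode (if needed)
□ 6. Check the flex layout:
     - input: items-center gap-2 (single line)
     - textarea: items-start gap-2 (top-aligned)

8.2 Backend

□ 1. Add a logSttUsage method to the controller
□ 2. Call AiTokenHelper::saveSttUsage() (specify the menu name)
□ 3. Register the POST route in routes/web.php

8.3 Layout Reference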

┌───────────────────────────────────────────┐
│ label                                     │
│ ┌──────────────────────────────────┐ ┌──┐ │
│ │ input text                       │ │🎤│ │
│ └──────────────────────────────────┘ └──┘ │
│                                           │
│ label                                     │
│ ┌──────────────────────────────────┐ ┌──┐ │
│ │ textarea                         │ │🎤│ │
│ │                                  │ │  │ │
│ │                                  │ │  │ │
│ └──────────────────────────────────┘ └──┘ │
└───────────────────────────────────────────┘

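9. Caveats and Troubleshooting

9.1 HTTPS Required

The Web Speech API needs microphone access, which browsers grant only in secure contexts: HTTPS, with localhost as an exception. On a plain HTTP deployment, microphone access is blocked.

9.2 Automatic Termination by the Browser

Even with continuous: true, the browser may end recognition on its own during a long silence. The onend event handler deals with this case.

9.3 Microphone Permission

On first use, the browser asks for permission to access the microphone. If the user denies it, onerror fires and the button returns to its stopped state.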
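
The component above stops on every error. If a page wants to tell a denied permission apart from a transient failure, the `error` field of the error event can be inspected. The sketch below uses error codes defined by the Web Speech API; the helper name `classifyRecognitionError` and the returned labels are assumptions.

```javascript
// Hypothetical helper: classifies SpeechRecognitionErrorEvent.error codes.
function classifyRecognitionError(code) {
    switch (code) {
        case 'not-allowed':
        case 'service-not-allowed':
            return { retry: false, reason: 'permission' };    // user or policy denied access
        case 'audio-capture':
            return { retry: false, reason: 'no-microphone' }; // no usable input device
        case 'no-speech':
            return { retry: true, reason: 'silence' };        // nothing was heard
        case 'network':
            return { retry: true, reason: 'network' };        // recognition service unreachable
        default:
            return { retry: false, reason: 'other' };
    }
}

// In the browser: recognition.onerror = (e) => classifyRecognitionError(e.error);
console.log(classifyRecognitionError('not-allowed'));
console.log(classifyRecognitionError('no-speech'));
```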

9.4 Cleanup on Component Unmount

When used inside a modal, the component unmounts when the modal closes. The useEffect cleanup must call recognition.stop() and clearTimeout.

useEffect(() => {
    return () => {
        recognitionRef.current?.stop();
        if (dismissTimerRef.current) clearTimeout(dismissTimerRef.current);
    };
}, []);

9.5 Preventing Event Propagation

If the mic button sits inside a form, clicking it can trigger a form submit. Always call e.preventDefault() and e.stopPropagation().

const toggle = useCallback((e) => {
    e.preventDefault();
    e.stopPropagation();
    // ...
}, []);

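9.6 Multiple VoiceInputButtons

You can place several VoiceInputButtons on one page. Each instance holds its own recognitionRef, so they do not conflict. However, two or more cannot record at the same time (a browser microphone limitation): pressing another button while one is recording interrupts the existing recording (browser behavior).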
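
The guide relies on the browser to interrupt the earlier recording. If a page prefers to make that hand-over explicit, so that the first button's UI state is reset deterministically, a tiny shared registry is one option. This is a sketch of that idea, not something the current component does; `claimMicrophone` and `releaseMicrophone` are hypothetical names.

```javascript
// Hypothetical module-level registry: at most one recorder is active at a time.
let activeStop = null;

function claimMicrophone(stopFn) {
    // Stop whichever button was recording before handing over the microphone.
    if (activeStop && activeStop !== stopFn) activeStop();
    activeStop = stopFn;
}

function releaseMicrophone(stopFn) {
    if (activeStop === stopFn) activeStop = null;
}

// Two fake buttons record which one was asked to stop.
const stopped = [];
const stopA = () => stopped.push('A');
const stopB = () => stopped.push('B');

claimMicrophone(stopA);    // button A starts recording
claimMicrophone(stopB);    // button B starts: A is stopped first
releaseMicrophone(stopB);  // button B stops on its own

console.log(stopped);      // [ 'A' ]
```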

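9.7 onend Auto-Restart (long recordings)

Even with continuous: true, the browser fires onend by itself when it detects silence. If recording is still supposed to be in progress, restart recognition from onend.

// Alpine.js pattern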
recognition.onend = () => {
    if (this.isRecording && this.recognition) {
        try { this.recognition.start(); } catch (e) {}
    }
};

// React pattern (VoiceInputButton)
// onend handles logUsage and the dismiss timer
recognition.onend = () => {
    if (startTimeRef.current) {
        logUsage(startTimeRef.current);
        startTimeRef.current = null;
    }
    setRecording(false);
    dismissTimerRef.current = setTimeout(() => setFinalizedSegments([]), 2000);
};

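The sales scenario restarts in onend so that long consultations are recognized without interruption. The Construction Site Photo Sheet, by contrast, handles short inputs and does not restart.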
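
The restart rule can be exercised without a browser by driving a fake recognizer. In the sketch below, `attachAutoRestart` and `FakeRecognition` are hypothetical: the fake only counts start() calls and lets the test fire onend by hand.

```javascript
// Hypothetical wiring: restart recognition on end while recording should continue.
function attachAutoRestart(recognition, shouldContinue) {
    recognition.onend = () => {
        if (!shouldContinue()) return;
        try {
            recognition.start();
        } catch (e) {
            // start() throws if the session is already running; safe to ignore here
        }
    };
}

// Test double standing in for a SpeechRecognition instance.
class FakeRecognition {
    constructor() {
        this.starts = 0;
        this.onend = null;
    }
    start() { this.starts += 1; }
    simulateEnd() { if (this.onend) this.onend(); }
}

const rec = new FakeRecognition();
let isRecording = true;
attachAutoRestart(rec, () => isRecording);

rec.start();         // user starts recording
rec.simulateEnd();   // browser ends the session after silence: restarted
isRecording = false; // user presses stop
rec.simulateEnd();   // session ends for good: no restart

console.log(rec.starts); // 2
```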

10. Alpine.js Implementation (Sales/Manager Scenarios)

The Sales Strategy Scenario and the Manager Consultation Process are built on Alpine.js + Blade. They apply the same STT rules without React.

10.1 Files

File	Path	Purpose
voice-recorder.blade.php	`resources/views/sales/modals/voice-recorder.blade.php`	Audio recording + STT component
scenario-modal.blade.php	`resources/views/sales/modals/scenario-modal.blade.php`	Scenario modal (includes voice-recorder)
consultation-log.blade.php	`resources/views/sales/modals/consultation-log.blade.php`	Consultation log display/playback

10.2 React vs Alpine.js Differences

Item	React (Construction Site Photo Sheet)	Alpine.js (Sales Scenario)
State management	`useState`, `useRef`	`x-data` attribute
Final text	`finalizedSegments` state	`finalizedSegments` array
Interim text	`interimText` state	`interimTranscript`
Auto-scroll	`useEffect` + `previewRef`	`$nextTick()` + `$refs`
List rendering	`{arr.map((seg, i) => <span>)}`	`<template x-for="(seg, i) in arr">`
Conditional display	`{condition && <Component />}`	`x-show="condition"`
Use case	Simple voice input next to an input field	Audio recording + file storage + STT

10.3 Core Code (Alpine.js)

x-data state definition

x-data="{
    // ... existing recording state ...
    transcript: '',              // joined final text (for server storage)
    interimTranscript: '',       // current interim text
    finalizedSegments: [],       // array of final text segments (for preview)
    // ...
}"

startSpeechRecognition()

startSpeechRecognition() {
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    if (!SpeechRecognition) return;

    this.recognition = new SpeechRecognition();
    this.recognition.lang = 'ko-KR';
    this.recognition.continuous = true;
    this.recognition.interimResults = true;
    this.recognition.maxAlternatives = 1;

    this.transcript = '';
    this.interimTranscript = '';
    this.finalizedSegments = [];

    this.recognition.onresult = (event) => {
        let currentInterim = '';

        // ★ Iterate from event.resultIndex (avoids duplicates)
        for (let i = event.resultIndex; i < event.results.length; i++) {
            const text = event.results[i][0].transcript;

            if (event.results[i].isFinal) {
                // ★ Final: store permanently in finalizedSegments
                this.finalizedSegments.push(text);
                currentInterim = '';
            } else {
                // Interim: corrections only
                currentInterim = text;
            }
        }

        // Join into transcript (for server storage)
        this.transcript = this.finalizedSegments.join(' ');
        this.interimTranscript = currentInterim;

        // Auto-scroll
        this.$nextTick(() => {
            if (this.$refs.transcriptContainer) {
                this.$refs.transcriptContainer.scrollTop =
                    this.$refs.transcriptContainer.scrollHeight;
            }
        });
    };

    // Auto-restart for long recordings
    this.recognition.onend = () => {
        if (this.isRecording && this.recognition) {
            try { this.recognition.start(); } catch (e) {}
        }
    };

    this.recognition.start();
}

10.4 Preview Panel UI (Alpine.js Blade)

{{-- Dark preview panel --}}
<div x-show="finalizedSegments.length > 0 || interimTranscript"
     class="bg-gray-900 rounded-lg border border-gray-700 overflow-hidden">

    {{-- Header: shows recognizing/done status --}}
    <div class="flex items-center justify-between px-3 py-2 border-b border-gray-700">
        <div class="flex items-center gap-2">
            <p class="text-xs font-medium text-gray-400">음성 인식 결과</p>
            <template x-if="isRecording">
                <span class="flex items-center gap-1 text-xs text-red-400">
                    <span class="w-1.5 h-1.5 bg-red-400 rounded-full animate-pulse"></span>
                    인식 중
                </span>
            </template>
            <template x-if="!isRecording && finalizedSegments.length > 0">
                <span class="text-green-400 text-xs">&#10003; 완료</span>
            </template>
        </div>
        <p class="text-xs text-gray-500" x-text="transcript.length + ' 자'"></p>
    </div>

    {{-- Text area --}}
    <div class="p-3 max-h-32 overflow-y-auto" x-ref="transcriptContainer"
         style="line-height: 1.6;">

        {{-- Final: white, normal weight (cannot be removed) --}}
        <template x-for="(seg, i) in finalizedSegments" :key="i">
            <span class="text-white text-sm font-normal
                transition-colors duration-300" x-text="seg"></span>
        </template>

        {{-- Interim: gray italic (may be corrected) --}}
        <span x-show="interimTranscript"
              class="text-gray-400 text-sm italic
                transition-colors duration-200"
              x-text="interimTranscript"></span>

        {{-- Waiting: recording with no text yet --}}
        <span x-show="isRecording && finalizedSegments.length === 0 && !interimTranscript"
              class="text-gray-500 text-sm flex items-center gap-1.5">
            <span class="w-1.5 h-1.5 bg-red-400 rounded-full animate-pulse"></span>
            말씀하세요...
        </span>
    </div>
</div>

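10.5 Features Specific to the Sales Scenario

Beyond plain STT, the voice-recorder used in the sales/manager scenarios includes the following features:

Feature	Description	API
Audio file recording	Captures webm with MediaRecorder	`navigator.mediaDevices.getUserMedia()`
Waveform visualization	Canvas + Web Audio API	`AudioContext.createAnalyser()`
Auto-save	Sends FormData to the server when recording stops	`ConsultationController::uploadAudio()`
GCS backup	Files of 10MB or more are also stored in GCS	`GoogleCloudStorageService`
Transcript storage	Saves the STT result to the DB together with the audio record	`sales_consultations.transcript`
Playback/download	Plays and downloads stored audio files	`ConsultationController::downloadAudio()`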

10.6 Data Flow (Sales Scenario)

User microphone
    │
    ├──→ MediaRecorder (webm recording)
    │        └──→ audioBlob
    │
    ├──→ Web Audio API (waveform visualization)
    │        └──→ draw waveform on Canvas
    │
    └──→ SpeechRecognition (STT)
             │
             ├──→ finalizedSegments[] (final segments)
             │        └──→ transcript (joined, for server storage)
             │
             └──→ interimTranscript (interim)
                      └──→ shown in the preview panel only

[Recording stops]
    └──→ FormData { audio, transcript, duration }
             └──→ POST /sales/consultations/upload-audio
                      └──→ DB + (GCS if > 10MB)
                               └──→ HTMX refresh of the consultation log
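
The last step of the flow, packaging the recording for upload, might look like the sketch below. The field names audio, transcript, and duration come from the diagram; the function name `buildUploadForm`, the file name, and sending the duration as a string of seconds are assumptions. FormData and Blob are available in browsers and in Node.js 18 or later.

```javascript
// Hypothetical helper: builds the upload payload shown in the data flow.
function buildUploadForm(audioBlob, finalizedSegments, durationSeconds) {
    const form = new FormData();
    form.append('audio', audioBlob, 'recording.webm');       // file name is an assumption
    form.append('transcript', finalizedSegments.join(' '));  // final segments only
    form.append('duration', String(durationSeconds));
    return form;
}

// Stand-in for the Blob that MediaRecorder would produce.
const blob = new Blob(['fake-audio-bytes'], { type: 'audio/webm' });
const form = buildUploadForm(blob, ['안녕하세요', '반갑습니다'], 42);

console.log(form.get('transcript'));
console.log(form.get('duration'));
```

The resulting form would then be posted to the upload-audio endpoint; interim text never enters the payload because only finalizedSegments is joined.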

11. Future Extensions

Feature	Description	Difficulty
Speaker diarization	Distinguish multiple speakers and transcribe each one separately	Requires Google Cloud STT API
Language switching	Change `recognition.lang` dynamically	Low
Voice commands	Perform an action when a keyword is recognized (e.g. "저장" (save), "다음" (next))	Medium
Recording file storage	Store audio files in GCS via the MediaRecorder API	Medium
Real-time translation	Pass STT results to a translation API	Medium

Appendix A: Reference Implementation Files

React Implementation (Construction Site Photo Sheet)

File	Description
`mng/resources/views/juil/construction-photos.blade.php`	Full VoiceInputButton code (React)
`mng/app/Http/Controllers/Juil/ConstructionSitePhotoController.php`	logSttUsage endpoint
`mng/app/Helpers/AiTokenHelper.php`	saveSttUsage / saveGcsStorageUsage helpers

Alpine.js Implementation (Sales/Manager Scenarios)

File	Description
`mng/resources/views/sales/modals/voice-recorder.blade.php`	Audio recording + STT (Alpine.js)
`mng/resources/views/sales/modals/scenario-modal.blade.php`	Scenario modal (includes voice-recorder)
`mng/resources/views/sales/modals/consultation-log.blade.php`	Consultation log playback/display
`mng/app/Http/Controllers/Sales/ConsultationController.php`	Audio upload/download/delete

Appendix B: CSS Class Summary

Element	Tailwind classes
Mic button (idle)	`bg-gray-100 text-gray-500 hover:bg-blue-100 hover:text-blue-600 w-8 h-8 rounded-full`
Mic button (recording)	`bg-red-500 text-white shadow-lg shadow-red-200`
Preview panel	`bg-gray-900 rounded-lg shadow-xl w-[300px] max-h-[120px] overflow-y-auto`
Final text	`text-white text-xs font-normal transition-colors duration-300`
Interim text	`text-gray-400 text-xs italic transition-colors duration-200`
Waiting indicator	`text-gray-500 text-xs` + red dot `animate-pulse`
Done indicator	`text-green-400 text-xs` ✓
Disabled	`opacity-30 cursor-not-allowed`