# Voice Input (STT) Technical Guide

> **Document version**: 1.1
> **Date written**: 2026-02-10
> **Applies to**: Construction Site Photo Sheet, Sales Strategy Scenario, Manager Consultation Process
> **Target project**: MNG (React 18 + Alpine.js)

---

## 1. Overview

### 1.1 Purpose

A browser-built-in STT (Speech-to-Text) feature that places a **microphone button** next to text input fields (input, textarea) so that users can enter text by voice.

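### 1.2 Technology Choice

| Approach | Cost | Accuracy | Latency | Adopted |
|------|------|--------|------|------|
| Web Speech API (built into the browser) | **Free** | High (Google STT engine) | Real time | **Adopted** |
| Google Cloud STT API | Paid ($0.006/15 s) | Very high | Server round trip | Not adopted |
| Whisper (OpenAI) | Paid ($0.006/min) | Very high | Server round trip | Not adopted |

**Reason for the choice**: In Chromium-based browsers, the built-in Web Speech API uses Google's STT engine free of charge and streams interim/final results in real time. It delivers sufficient Korean recognition accuracy at no cost.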

### 1.3 Browser Support

| Browser | Supported | Notes |
|----------|------|------|
| Chrome (Desktop/Android) | ✅ | Best support, uses the Google STT engine |
| Edge | ✅ | Chromium-based |
| Safari (iOS/macOS) | ✅ | `webkitSpeechRecognition` |
| Firefox | ❌ | Not supported (button is hidden automatically) |
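
The support check behind this table can be sketched as a small helper. This is a minimal sketch, not the project's code: `getSpeechRecognitionCtor` is a hypothetical name, and it takes a window-like object as a parameter only so that the logic can be exercised outside a browser.

```javascript
// Hypothetical helper: resolve the SpeechRecognition constructor from a
// window-like object. Returns null when the browser offers no implementation.
function getSpeechRecognitionCtor(globalObj) {
    if (!globalObj) return null;
    // Try the standard name first, then the webkit-prefixed name.
    return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// Usage: pass `window` in the browser; during server-side rendering there is no window.
const SR = getSpeechRecognitionCtor(typeof window !== 'undefined' ? window : undefined);
const isSupported = SR !== null; // when false, the button should not be rendered
```

Returning `null` instead of `undefined` keeps the "unsupported" case explicit, which matches the component's behavior of rendering nothing in Firefox.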

---

## 2. Core Concept: Interim vs Final

The key to the Web Speech API is the distinction between **interim** (unconfirmed) text and **final** (confirmed) text.

### 2.1 Text State Flow

```
[Voice input starts]
    │
    ├─ interim: "안녕하"          ← recognition in progress (may still change)
    ├─ interim: "안녕하세"         ← revision (overwrites the previous interim)
    ├─ interim: "안녕하세요"       ← revision
    │
    ├─ ★ FINAL: "안녕하세요"      ← confirmed! (must never be removed)
    │
    ├─ interim: "반갑습"          ← a new recognition begins
    ├─ interim: "반갑습니다"
    │
    ├─ ★ FINAL: "반갑습니다"      ← confirmed!
    │
[Voice input ends]
```

### 2.2 Rendering Rules (mandatory)

| State | Style | Behavior | Deletable |
|------|--------|------|-----------|
| **interim** (unconfirmed) | `italic` + `text-gray-400` | Revised in real time; each new interim overwrites the previous one | Revision only |
| **final** (confirmed) | `font-normal` + `text-white` | Appended permanently to the `finalizedSegments[]` array | **Never** |

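### 2.3 Rules for Updating the Input

- Call `onResult(transcript)` and append text to the input **only when a final event occurs**
- Show interim text **only in the preview panel**; never write it to the input
- Text that has been added to the input can be edited directly by the user (it is ordinary text)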
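
The rules in 2.2 and 2.3 can be expressed as a pure function, which makes them easy to check without a microphone. The following is a minimal sketch under stated assumptions: `reduceResultEvent` and `makeResult` are hypothetical names, and the fake event objects only imitate the `resultIndex` and `results` fields of a real `SpeechRecognitionEvent`.

```javascript
// Hypothetical pure reducer mirroring rules 2.2 and 2.3.
// `newlyFinal` holds the text that would be passed to onResult().
function reduceResultEvent(state, event) {
    const finalizedSegments = [...state.finalizedSegments];
    const newlyFinal = [];
    let interimText = '';

    for (let i = event.resultIndex; i < event.results.length; i++) {
        const transcript = event.results[i][0].transcript;
        if (event.results[i].isFinal) {
            finalizedSegments.push(transcript); // final: stored permanently
            newlyFinal.push(transcript);
            interimText = '';
        } else {
            interimText = transcript;           // interim: overwrites the previous interim
        }
    }
    return { finalizedSegments, interimText, newlyFinal };
}

// Builds a fake result entry shaped like a SpeechRecognitionResult.
const makeResult = (transcript, isFinal) => ({ 0: { transcript }, isFinal });

let state = { finalizedSegments: [], interimText: '', newlyFinal: [] };
// Interim only: the preview shows the text, nothing reaches the input.
state = reduceResultEvent(state, { resultIndex: 0, results: [makeResult('안녕하', false)] });
const afterInterim = state;
// Final: the segment is stored and the interim text is cleared.
state = reduceResultEvent(state, { resultIndex: 0, results: [makeResult('안녕하세요', true)] });
```

Keeping the decision logic free of React or Alpine.js state means the same function could back both implementations described later in this guide.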

---

## 3. Component Architecture

### 3.1 VoiceInputButton Component

```
┌─────────────────────────────────┐
│  VoiceInputButton               │
│                                 │
│  Props:                         │
│    onResult: (text) => void     │  ← receives final text only
│    disabled: boolean            │  ← disables the button (read-only mode, etc.)
│                                 │
│  State:                         │
│    recording: boolean           │  ← whether recording is active
│    finalizedSegments: string[]  │  ← accumulated final text (for the preview)
│    interimText: string          │  ← current interim text
│                                 │
│  Refs:                          │
│    recognitionRef               │  ← SpeechRecognition instance
│    startTimeRef                 │  ← recording start time (usage tracking)
│    dismissTimerRef              │  ← timer that closes the preview
│    previewRef                   │  ← preview DOM node (auto-scroll)
│                                 │
│  Output:                        │
│  [mic button] + [preview panel] │
└─────────────────────────────────┘
```

### 3.2 Full Code

```jsx
function VoiceInputButton({ onResult, disabled }) {
    const [recording, setRecording] = useState(false);
    const [finalizedSegments, setFinalizedSegments] = useState([]);
    const [interimText, setInterimText] = useState('');
    const recognitionRef = useRef(null);
    const startTimeRef = useRef(null);
    const dismissTimerRef = useRef(null);
    const previewRef = useRef(null);

    // Check browser support
    const isSupported = typeof window !== 'undefined' &&
        (window.SpeechRecognition || window.webkitSpeechRecognition);

    // Log STT usage (tracked alongside AI token usage)
    const logUsage = useCallback((startTime) => {
        const duration = Math.max(1, Math.round((Date.now() - startTime) / 1000));
        apiFetch(API.logSttUsage, {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ duration_seconds: duration }),
        }).catch(() => {});
    }, []);

    // Auto-scroll the preview panel
    useEffect(() => {
        if (previewRef.current) {
            previewRef.current.scrollTop = previewRef.current.scrollHeight;
        }
    }, [finalizedSegments, interimText]);

    // Stop recording
    const stopRecording = useCallback(() => {
        recognitionRef.current?.stop();
        recognitionRef.current = null;
        if (startTimeRef.current) {
            logUsage(startTimeRef.current);
            startTimeRef.current = null;
        }
        setRecording(false);
        setInterimText('');
        // Close the preview 2 seconds after recording ends
        dismissTimerRef.current = setTimeout(() => {
            setFinalizedSegments([]);
        }, 2000);
    }, [logUsage]);

    // Start recording
    const startRecording = useCallback(() => {
        // Clear any previous timer
        if (dismissTimerRef.current) {
            clearTimeout(dismissTimerRef.current);
            dismissTimerRef.current = null;
        }

        const SR = window.SpeechRecognition || window.webkitSpeechRecognition;
        const recognition = new SR();
        recognition.lang = 'ko-KR';           // Korean
        recognition.continuous = true;          // continuous recognition (no automatic stop)
        recognition.interimResults = true;      // receive interim results
        recognition.maxAlternatives = 1;        // a single candidate only

        recognition.onresult = (event) => {
            // Cancel the dismiss timer (recognition is still active)
            if (dismissTimerRef.current) {
                clearTimeout(dismissTimerRef.current);
                dismissTimerRef.current = null;
            }

            let currentInterim = '';
            for (let i = event.resultIndex; i < event.results.length; i++) {
                const transcript = event.results[i][0].transcript;
                if (event.results[i].isFinal) {
                    // ★ Final: append to the input and store permanently in the preview
                    onResult(transcript);
                    setFinalizedSegments(prev => [...prev, transcript]);
                    currentInterim = '';
                } else {
                    // Interim: may be revised, but earlier finals are preserved
                    currentInterim = transcript;
                }
            }
            setInterimText(currentInterim);
        };

        recognition.onerror = () => stopRecording();

        recognition.onend = () => {
            // Handle the case where the browser ended the session on its own
            if (startTimeRef.current) {
                logUsage(startTimeRef.current);
                startTimeRef.current = null;
            }
            setRecording(false);
            setInterimText('');
            recognitionRef.current = null;
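            // Clear the timer set by stopRecording() so an orphaned timer cannot wipe a later session
            if (dismissTimerRef.current) clearTimeout(dismissTimerRef.current);
            dismissTimerRef.current = setTimeout(() => {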
                setFinalizedSegments([]);
            }, 2000);
        };

        recognitionRef.current = recognition;
        startTimeRef.current = Date.now();
        setFinalizedSegments([]);
        setInterimText('');
        recognition.start();
        setRecording(true);
    }, [onResult, stopRecording, logUsage]);

    // Toggle (start/stop)
    const toggle = useCallback((e) => {
        e.preventDefault();
        e.stopPropagation();
        if (disabled || !isSupported) return;
        recording ? stopRecording() : startRecording();
    }, [disabled, isSupported, recording, stopRecording, startRecording]);

    // Clean up when the component unmounts
    useEffect(() => {
        return () => {
            recognitionRef.current?.stop();
            if (dismissTimerRef.current) clearTimeout(dismissTimerRef.current);
        };
    }, []);

    // Render nothing in unsupported browsers
    if (!isSupported) return null;

    const hasContent = finalizedSegments.length > 0 || interimText;

    return (
        <div className="relative flex-shrink-0">
            {/* Microphone button */}
            <button
                type="button"
                onClick={toggle}
                disabled={disabled}
                title={recording ? '녹음 중지 (클릭)' : '음성으로 입력'}
                className={`inline-flex items-center justify-center w-8 h-8 rounded-full transition-all
                    ${recording
                        ? 'bg-red-500 text-white shadow-lg shadow-red-200'
                        : 'bg-gray-100 text-gray-500 hover:bg-blue-100 hover:text-blue-600'}
                    ${disabled ? 'opacity-30 cursor-not-allowed' : 'cursor-pointer'}`}
            >
                {recording ? (
                    <span className="relative flex items-center justify-center w-4 h-4">
                        <span className="absolute inset-0 rounded-full bg-white/30 animate-ping" />
                        <svg className="w-3.5 h-3.5 relative" fill="currentColor" viewBox="0 0 24 24">
                            <rect x="6" y="6" width="12" height="12" rx="2" />
                        </svg>
                    </span>
                ) : (
                    <svg className="w-4 h-4" fill="currentColor" viewBox="0 0 24 24">
                        <path d="M12 14c1.66 0 3-1.34 3-3V5c0-1.66-1.34-3-3-3S9 3.34
                            9 5v6c0 1.66 1.34 3 3 3z" />
                        <path d="M17 11c0 2.76-2.24 5-5 5s-5-2.24-5-5H5c0 3.53 2.61
                            6.43 6 6.92V21h2v-3.08c3.39-.49 6-3.39 6-6.92h-2z" />
                    </svg>
                )}
            </button>

            {/* Streaming preview panel */}
            {(recording || hasContent) && (
                <div
                    ref={previewRef}
                    className="absolute bottom-full mb-2 right-0 bg-gray-900 rounded-lg
                        shadow-xl z-50 w-[300px] max-h-[120px] overflow-y-auto px-3 py-2"
                    style={{ lineHeight: '1.6' }}
                >
                    {/* Final text: normal weight + white */}
                    {finalizedSegments.map((seg, i) => (
                        <span key={i} className="text-white text-xs font-normal
                            transition-colors duration-300">
                            {seg}
                        </span>
                    ))}

                    {/* Interim text: italic + light gray */}
                    {interimText && (
                        <span className="text-gray-400 text-xs italic
                            transition-colors duration-200">
                            {interimText}
                        </span>
                    )}

                    {/* Recording with no text yet: waiting indicator */}
                    {recording && !hasContent && (
                        <span className="text-gray-500 text-xs flex items-center gap-1.5">
                            <span className="inline-block w-1.5 h-1.5 bg-red-400
                                rounded-full animate-pulse" />
                            말씀하세요...
                        </span>
                    )}

                    {/* After recording ends: completion mark for the final text */}
                    {!recording && finalizedSegments.length > 0 && !interimText && (
                        <span className="text-green-400 text-xs ml-1">&#10003;</span>
                    )}
                </div>
            )}
        </div>
    );
}
```

---

## 4. Usage Patterns

### 4.1 Basic Usage (placed next to an input)

```jsx
function MyForm() {
    const [value, setValue] = useState('');

    return (
        <div>
            <label className="block text-sm font-medium text-gray-700 mb-1">
                현장명 *
            </label>
            <div className="flex items-center gap-2">
                <input
                    type="text"
                    value={value}
                    onChange={e => setValue(e.target.value)}
                    className="flex-1 px-3 py-2 border border-gray-300 rounded-lg text-sm"
                    placeholder="입력하세요"
                />
                <VoiceInputButton
                    onResult={(text) => setValue(prev =>
                        prev ? prev + ' ' + text : text
                    )}
                />
            </div>
        </div>
    );
}
```

### 4.2 Using with a textarea

```jsx
<div className="flex items-start gap-2">  {/* items-start: align to the top */}
    <textarea
        value={description}
        onChange={e => setDescription(e.target.value)}
        className="flex-1 px-3 py-2 border rounded-lg text-sm"
        rows={3}
    />
    <VoiceInputButton
        onResult={(text) => setDescription(prev =>
            prev ? prev + ' ' + text : text
        )}
    />
</div>
```

### 4.3 Conditional Activation (edit mode only)

```jsx
<VoiceInputButton
    onResult={(text) => setSiteName(prev => prev ? prev + ' ' + text : text)}
    disabled={!editing}  // disabled when not in edit mode
/>
```

### 4.4 onResult Callback Patterns

```jsx
// Pattern 1: append to the existing text (separated by a space)
onResult={(text) => setValue(prev => prev ? prev + ' ' + text : text)}

// Pattern 2: overwrite
onResult={(text) => setValue(text)}

// Pattern 3: custom post-processing
onResult={(text) => {
    const cleaned = text.trim().replace(/\s+/g, ' ');
    // Avoid a leading space when the field is still empty
    setValue(prev => prev ? prev + ' ' + cleaned : cleaned);
}}
```
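
Patterns 1 and 3 repeat the same append logic at every call site, so it may be worth extracting. The helper below is a minimal sketch; `appendTranscript` is a hypothetical name that does not exist in the project.

```javascript
// Hypothetical helper: append a recognized segment to the current field value.
function appendTranscript(prev, text) {
    // Collapse runs of whitespace and trim the ends of the new segment.
    const cleaned = String(text).trim().replace(/\s+/g, ' ');
    if (!cleaned) return prev;                      // ignore empty results
    return prev ? `${prev} ${cleaned}` : cleaned;   // no leading space on an empty field
}

// Intended use inside a component (JSX shown as a comment):
//   <VoiceInputButton onResult={(text) => setValue(prev => appendTranscript(prev, text))} />
```

Because the function is pure, it works unchanged with the functional form of `setValue`, which matters when several final results arrive before React re-renders.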

---

## 5. Preview Panel UI Details

### 5.1 Position and Style

```
                    ┌─────────────────────────────┐
                    │ final text  interim text... │  ← preview panel
                    │ (white)     (gray, italic)  │     bg-gray-900
                    └─────────────────────────────┘     w-[300px]
                                               ┌──┐    max-h-[120px]
                                               │🎤│    line-height: 1.6
                                               └──┘
```

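- **Position**: above the button (`absolute bottom-full mb-2 right-0`)
- **Background**: dark (`bg-gray-900`), so that it stands out on a light form
- **Width**: fixed at 300px; height up to 120px (scrolls beyond that)
- **Auto-scroll**: scrolls to the bottom automatically as the text grows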

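### 5.2 Display by State

| State | What is shown |
|------|-----------|
| Right after recording starts (no text) | 🔴 `말씀하세요...` ("Please speak...": red dot + gray text) |
| Receiving interim results | Final text (white) + interim text (gray italic) |
| Moment a final result arrives | Previous finals + the new final (white) appended; interim reset |
| Right after recording ends | All final text + ✓ mark (green) |
| 2 seconds after the end | Panel closes automatically (`finalizedSegments` is reset) |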
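
The table above is effectively a small state machine over three values. As a minimal sketch (the function name `getPreviewState` and the four state labels are assumptions introduced here, not names from the component), the same conditions the JSX uses can be written as:

```javascript
// Hypothetical helper: derive which preview state applies from the component state.
function getPreviewState({ recording, finalizedSegments, interimText }) {
    const hasContent = finalizedSegments.length > 0 || Boolean(interimText);
    if (!recording && !hasContent) return 'hidden';    // panel is not rendered
    if (recording && !hasContent) return 'waiting';    // "말씀하세요..." indicator
    if (!recording && finalizedSegments.length > 0 && !interimText) return 'done'; // ✓ mark
    return 'streaming';                                // final and/or interim text visible
}
```

Deriving a single label like this can make the rendering branches easier to test than four independent boolean expressions.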

### 5.3 Transition Settings

```
Final text:    transition-colors duration-300  (0.3 s color transition)
Interim text:  transition-colors duration-200  (0.2 s color transition)
line-height:   fixed at 1.6 (prevents line-height shifts)
```

---

## 6. SpeechRecognition Configuration Details

### 6.1 Main Options

```javascript
const recognition = new SpeechRecognition();
recognition.lang = 'ko-KR';           // language (Korean)
recognition.continuous = true;          // continuous recognition mode
recognition.interimResults = true;      // receive interim results
recognition.maxAlternatives = 1;        // number of recognition candidates
```

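| Option | Value | Description |
|------|-----|------|
| `lang` | `'ko-KR'` | Korean recognition. Change it if other languages are needed |
| `continuous` | `true` | Does not stop automatically when the user pauses; the user stops it manually |
| `interimResults` | `true` | Receive interim results in real time (with `false`, only final results arrive) |
| `maxAlternatives` | `1` | Only one candidate per result (speed optimization) |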

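### 6.2 Event Handlers

| Event | When it fires | Handling |
|--------|-----------|------|
| `onresult` | A recognition result is received | Distinguish interim/final, then update state |
| `onerror` | A recognition error occurs | Stop recording |
| `onend` | The recognition session ends | Cleanup + usage logging + preview dismiss timer |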

### 6.3 onresult Event Details

```javascript
recognition.onresult = (event) => {
    // event.resultIndex: start index of the results changed by this event
    // event.results: SpeechRecognitionResultList (cumulative)
    // event.results[i].isFinal: whether the result is final
    // event.results[i][0].transcript: the recognized text

    for (let i = event.resultIndex; i < event.results.length; i++) {
        const transcript = event.results[i][0].transcript;
        if (event.results[i].isFinal) {
            // → append to the input + add to finalizedSegments
        } else {
            // → update interimText (overwrites the previous interim)
        }
    }
};
```

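**Caution**: Iterate starting from `event.resultIndex`. `event.results` is cumulative, so iterating over everything (from `0`) processes final results that were already handled a second time.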
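
The duplication described in the caution can be reproduced with two fake events. This is a minimal sketch: `collectFinals` and `fakeResult` are hypothetical names, and the event objects only imitate the cumulative `results` list of a real session.

```javascript
// Builds a fake result entry shaped like a SpeechRecognitionResult.
const fakeResult = (transcript, isFinal) => ({ 0: { transcript }, isFinal });

// Collects final transcripts across events, starting either at 0 or at resultIndex.
function collectFinals(events, fromZero) {
    const finals = [];
    for (const event of events) {
        const start = fromZero ? 0 : event.resultIndex;
        for (let i = start; i < event.results.length; i++) {
            if (event.results[i].isFinal) finals.push(event.results[i][0].transcript);
        }
    }
    return finals;
}

// The second event still carries the first (already final) result in its list.
const firstFinal = fakeResult('안녕하세요', true);
const events = [
    { resultIndex: 0, results: [firstFinal] },
    { resultIndex: 1, results: [firstFinal, fakeResult('반갑습니다', true)] },
];

const correct = collectFinals(events, false);   // ['안녕하세요', '반갑습니다']
const duplicated = collectFinals(events, true); // ['안녕하세요', '안녕하세요', '반갑습니다']
```

With the `0`-based loop, the first sentence would be appended to the input twice, which is exactly the symptom to look for when debugging repeated text.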

---

## 7. Backend (STT usage tracking)

### 7.1 Route

```php
// routes/web.php (inside the juil group)
Route::post('/construction-photos/log-stt-usage',
    [ConstructionSitePhotoController::class, 'logSttUsage']
)->name('construction-photos.log-stt-usage');
```

### 7.2 Controller

```php
public function logSttUsage(Request $request): JsonResponse
{
    $validated = $request->validate([
        'duration_seconds' => 'required|integer|min:1',
    ]);

    AiTokenHelper::saveSttUsage(
        '공사현장사진대지-음성입력',  // menu name (identifies where STT was used)
        $validated['duration_seconds']
    );

    return response()->json(['success' => true]);
}
```

### 7.3 AiTokenHelper::saveSttUsage

```php
// App\Helpers\AiTokenHelper

/**
 * Record STT usage
 * - Billing basis: $0.009 / 15 seconds
 * - Unit price based on Google Cloud Speech-to-Text
 *
 * @param string $menuName  Menu name identifying where STT was used
 * @param int    $durationSeconds  Recording duration (seconds)
 */
public static function saveSttUsage(string $menuName, int $durationSeconds): void
```
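
The body of `saveSttUsage` is not shown in this guide, so the arithmetic below is only a sketch of how the "$0.009 / 15 seconds" basis could translate into a cost figure. It assumes that usage is rounded up to whole 15-second blocks; the real helper may prorate or round differently. `estimateSttCostUsd` and both constants are hypothetical names.

```javascript
// Assumed rate taken from the docblock above: $0.009 per 15-second block.
const STT_USD_PER_BLOCK = 0.009;
const STT_BLOCK_SECONDS = 15;

// Hypothetical estimator: rounds the duration up to whole 15-second blocks.
function estimateSttCostUsd(durationSeconds) {
    // Mirrors the controller validation: required|integer|min:1
    if (!Number.isInteger(durationSeconds) || durationSeconds < 1) {
        throw new RangeError('durationSeconds must be an integer >= 1');
    }
    const blocks = Math.ceil(durationSeconds / STT_BLOCK_SECONDS);
    return blocks * STT_USD_PER_BLOCK;
}
```

Under this assumption a 16-second recording is billed as two blocks, so many short recordings cost noticeably more than one long recording of the same total length.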

### 7.4 Pattern for Adding the Route When Applying STT to a New Page

```php
// 1. Add a logSttUsage method to the controller
public function logSttUsage(Request $request): JsonResponse
{
    $validated = $request->validate([
        'duration_seconds' => 'required|integer|min:1',
    ]);

    AiTokenHelper::saveSttUsage(
        '새메뉴명-음성입력',      // ← change the menu name
        $validated['duration_seconds']
    );

    return response()->json(['success' => true]);
}

// 2. Register the route
Route::post('/new-page/log-stt-usage', [NewController::class, 'logSttUsage'])
    ->name('new-page.log-stt-usage');

// 3. Add to the frontend API object (this part is JavaScript, not PHP)
const API = {
    logSttUsage: '/path/to/log-stt-usage',
};
```

---

## 8. Checklist for Applying Voice Input to a New Page

### 8.1 Frontend

```
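□ 1. Copy the VoiceInputButton component code (or extract it into a shared module and import it)
□ 2. Add the logSttUsage endpoint to the API object
□ 3. Place VoiceInputButton next to the input/textarea
□ 4. Use the append-to-existing-text pattern in the onResult callback
□ 5. Use the disabled prop to enable it only in edit mode (if needed)
□ 6. Check the flex layout:
     - input: items-center gap-2 (single line)
     - textarea: items-start gap-2 (top-aligned)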
```

### 8.2 Backend

```
□ 1. Add a logSttUsage method to the controller
□ 2. Call AiTokenHelper::saveSttUsage() (specify the menu name)
□ 3. Register the POST route in routes/web.php
```

### 8.3 Layout Reference

```
┌───────────────────────────────────────────┐
│ label                                     │
│ ┌──────────────────────────────────┐ ┌──┐ │
│ │ input text                       │ │🎤│ │
│ └──────────────────────────────────┘ └──┘ │
│                                           │
│ label                                     │
│ ┌──────────────────────────────────┐ ┌──┐ │
│ │ textarea                         │ │🎤│ │
│ │                                  │ │  │ │
│ │                                  │ │  │ │
│ └──────────────────────────────────┘ └──┘ │
└───────────────────────────────────────────┘
```

---

## 9. Caveats and Troubleshooting

### 9.1 HTTPS Required

The Web Speech API works only in an **HTTPS** environment (localhost is an exception). On a plain-HTTP deployment, microphone access is blocked.

### 9.2 Automatic Termination by the Browser

Even with `continuous: true`, the browser may end recognition on its own after a long period of silence. This case is handled in the `onend` event.

### 9.3 Microphone Permission

On first use, the browser asks for permission to access the microphone. If the user denies it, `onerror` fires and the button returns to the stopped state.
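
The component currently treats every error the same way (it stops recording). If a page needs to tell the user why recording stopped, the `error` code on the error event can be classified first. The sketch below is an assumption about how that might look: `classifySpeechError` and its category labels are hypothetical, while the error code strings are those defined by the Web Speech API.

```javascript
// Hypothetical helper: map a SpeechRecognition error code to a coarse category.
function classifySpeechError(errorCode) {
    switch (errorCode) {
        case 'not-allowed':
        case 'service-not-allowed':
            return 'permission-denied';   // user or policy blocked microphone/STT access
        case 'audio-capture':
            return 'no-microphone';       // no usable audio input device
        case 'no-speech':
            return 'no-speech';           // nothing was heard before the timeout
        case 'network':
            return 'network';             // the recognition service could not be reached
        case 'aborted':
            return 'aborted';             // recognition was cancelled
        default:
            return 'other';
    }
}

// Intended use (shown as a comment because it needs a live recognition object):
//   recognition.onerror = (event) => {
//       const kind = classifySpeechError(event.error);
//       if (kind === 'permission-denied') { /* show a hint about browser settings */ }
//       stopRecording();
//   };
```

Only the permission case usually needs a user-facing message; the others can fall through to the existing stop behavior.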

### 9.4 Cleanup on Component Unmount

When the component is used inside a modal, it is unmounted when the modal closes. The `useEffect` cleanup must call both `recognition.stop()` and `clearTimeout`.

```javascript
useEffect(() => {
    return () => {
        recognitionRef.current?.stop();
        if (dismissTimerRef.current) clearTimeout(dismissTimerRef.current);
    };
}, []);
```

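### 9.5 Preventing Event Propagation

If the microphone button sits inside a form, clicking it can trigger a form submit. The button is declared with `type="button"`, and the handler must also call `e.preventDefault()` and `e.stopPropagation()`.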

```javascript
const toggle = useCallback((e) => {
    e.preventDefault();
    e.stopPropagation();
    // ... (same body as in section 3.2)
}, [disabled, isSupported, recording, stopRecording, startRecording]);
```

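### 9.6 Multiple VoiceInputButtons

Several VoiceInputButtons can be placed on one page. Each instance has its own `recognitionRef`, so they do not conflict. However, **two or more cannot record at the same time** (browser microphone limitation). If another button is pressed while one is recording, the existing recording is stopped (browser behavior).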

### 9.7 Automatic Restart in onend (long recordings)

Even with `continuous: true`, the browser fires `onend` on its own when it detects silence. If the recording is meant to continue, it must be restarted in `onend`.

```javascript
// Alpine.js pattern
recognition.onend = () => {
    if (this.isRecording && this.recognition) {
        try { this.recognition.start(); } catch (e) {}
    }
};

// React pattern (VoiceInputButton)
// onend handles logUsage + the dismiss timer
recognition.onend = () => {
    if (startTimeRef.current) {
        logUsage(startTimeRef.current);
        startTimeRef.current = null;
    }
    setRecording(false);
    dismissTimerRef.current = setTimeout(() => setFinalizedSegments([]), 2000);
};
```

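The sales scenario restarts in `onend`, so even long consultations are recognized without interruption. The construction site photo sheet, by contrast, takes short inputs and therefore does not restart.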
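
The restart pattern can be isolated into a helper and exercised with a fake recognition object. This is a minimal sketch, not the project's code: `attachAutoRestart` is a hypothetical name, and the `maxRestarts` cap is an added assumption meant to stop a tight restart loop when the session keeps ending immediately (for example after a permission error).

```javascript
// Hypothetical helper: restart recognition from onend while recording should continue.
// Returns a function that reports how many restarts have happened.
function attachAutoRestart(recognition, shouldContinue, maxRestarts = 50) {
    let restarts = 0;
    recognition.onend = () => {
        if (!shouldContinue() || restarts >= maxRestarts) return;
        restarts += 1;
        try {
            recognition.start();
        } catch (e) {
            // start() throws if recognition is already running; safe to ignore here
        }
    };
    return () => restarts;
}

// Walkthrough with a fake object standing in for a SpeechRecognition instance.
let startCalls = 0;
let stillRecording = true;
const fakeRecognition = { onend: null, start() { startCalls += 1; } };
const getRestartCount = attachAutoRestart(fakeRecognition, () => stillRecording, 2);

fakeRecognition.onend();   // browser ended the session: restarted (1)
fakeRecognition.onend();   // restarted again (2)
fakeRecognition.onend();   // cap reached: not restarted
stillRecording = false;
fakeRecognition.onend();   // user stopped recording: not restarted
```

Passing `shouldContinue` as a callback keeps the helper independent of whether the flag lives in Alpine.js data (`this.isRecording`) or in a React ref.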

---

## 10. Alpine.js Implementation (sales/manager scenarios)

The sales strategy scenario and the manager consultation process are built on **Alpine.js + Blade**. They apply the same STT rules without React.

### 10.1 Files Involved

| File | Path | Purpose |
|------|------|------|
| voice-recorder.blade.php | `resources/views/sales/modals/voice-recorder.blade.php` | Voice recording + STT component |
| scenario-modal.blade.php | `resources/views/sales/modals/scenario-modal.blade.php` | Scenario modal (includes voice-recorder) |
| consultation-log.blade.php | `resources/views/sales/modals/consultation-log.blade.php` | Displays and plays back consultation records |

### 10.2 Differences Between React and Alpine.js

| Item | React (construction site photo sheet) | Alpine.js (sales scenario) |
|------|--------------------------|--------------------------|
| State management | `useState`, `useRef` | `x-data` attribute |
| Final text | `finalizedSegments` state | `finalizedSegments` array |
| Interim text | `interimText` state | `interimTranscript` |
| Auto-scroll | `useEffect` + `previewRef` | `$nextTick()` + `$refs` |
| List rendering | `{arr.map((seg, i) => <span>)}` | `<template x-for="(seg, i) in arr">` |
| Conditional display | `{condition && <Component />}` | `x-show="condition"` |
| Use case | Simple voice input next to an input field | Voice recording + file storage + STT |

### 10.3 Core Code (Alpine.js)

#### x-data state definition

```javascript
x-data="{
    // ... existing recording state ...
    transcript: '',              // joined final text (for server storage)
    interimTranscript: '',       // current interim text
    finalizedSegments: [],       // array of final text segments (for the preview)
    // ...
}"
```

#### startSpeechRecognition()

```javascript
startSpeechRecognition() {
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    if (!SpeechRecognition) return;

    this.recognition = new SpeechRecognition();
    this.recognition.lang = 'ko-KR';
    this.recognition.continuous = true;
    this.recognition.interimResults = true;
    this.recognition.maxAlternatives = 1;

    this.transcript = '';
    this.interimTranscript = '';
    this.finalizedSegments = [];

    this.recognition.onresult = (event) => {
        let currentInterim = '';

        // ★ Iterate from event.resultIndex (prevents duplicates)
        for (let i = event.resultIndex; i < event.results.length; i++) {
            const text = event.results[i][0].transcript;

            if (event.results[i].isFinal) {
                // ★ Final: store permanently in finalizedSegments
                this.finalizedSegments.push(text);
                currentInterim = '';
            } else {
                // Interim: revision only
                currentInterim = text;
            }
        }

        // Join the transcript (for server storage)
        this.transcript = this.finalizedSegments.join(' ');
        this.interimTranscript = currentInterim;

        // Auto-scroll
        this.$nextTick(() => {
            if (this.$refs.transcriptContainer) {
                this.$refs.transcriptContainer.scrollTop =
                    this.$refs.transcriptContainer.scrollHeight;
            }
        });
    };

    // Restart automatically during long recordings
    this.recognition.onend = () => {
        if (this.isRecording && this.recognition) {
            try { this.recognition.start(); } catch (e) {}
        }
    };

    this.recognition.start();
}
```

### 10.4 Preview Panel UI (Alpine.js Blade)

```blade
{{-- Dark preview panel --}}
<div x-show="finalizedSegments.length > 0 || interimTranscript"
     class="bg-gray-900 rounded-lg border border-gray-700 overflow-hidden">

    {{-- Header: shows the recognizing/completed status --}}
    <div class="flex items-center justify-between px-3 py-2 border-b border-gray-700">
        <div class="flex items-center gap-2">
            <p class="text-xs font-medium text-gray-400">음성 인식 결과</p>
            <template x-if="isRecording">
                <span class="flex items-center gap-1 text-xs text-red-400">
                    <span class="w-1.5 h-1.5 bg-red-400 rounded-full animate-pulse"></span>
                    인식 중
                </span>
            </template>
            <template x-if="!isRecording && finalizedSegments.length > 0">
                <span class="text-green-400 text-xs">&#10003; 완료</span>
            </template>
        </div>
        <p class="text-xs text-gray-500" x-text="transcript.length + ' 자'"></p>
    </div>

    {{-- Text area --}}
    <div class="p-3 max-h-32 overflow-y-auto" x-ref="transcriptContainer"
         style="line-height: 1.6;">

        {{-- Final: white, normal weight (cannot be removed) --}}
        <template x-for="(seg, i) in finalizedSegments" :key="i">
            <span class="text-white text-sm font-normal
                transition-colors duration-300" x-text="seg"></span>
        </template>

        {{-- Interim: gray italic (may be revised) --}}
        <span x-show="interimTranscript"
              class="text-gray-400 text-sm italic
                transition-colors duration-200"
              x-text="interimTranscript"></span>

        {{-- Waiting: recording with no text yet --}}
        <span x-show="isRecording && finalizedSegments.length === 0 && !interimTranscript"
              class="text-gray-500 text-sm flex items-center gap-1.5">
            <span class="w-1.5 h-1.5 bg-red-400 rounded-full animate-pulse"></span>
            말씀하세요...
        </span>
    </div>
</div>
```

### 10.5 Features Specific to the Sales Scenario

Beyond plain STT, the voice-recorder used in the sales/manager scenarios includes the following features:

| Feature | Description | API |
|------|------|-----|
| **Audio file recording** | Captures webm with MediaRecorder | `navigator.mediaDevices.getUserMedia()` |
| **Waveform visualization** | Canvas + Web Audio API | `AudioContext.createAnalyser()` |
| **Auto-save** | Sends FormData to the server when recording stops | `ConsultationController::uploadAudio()` |
| **GCS backup** | Files of 10 MB or more are also stored in GCS | `GoogleCloudStorageService` |
| **Transcript storage** | Stores the STT result in the DB together with the audio record | `sales_consultations.transcript` |
| **Playback/download** | Plays back and downloads stored audio files | `ConsultationController::downloadAudio()` |

### 10.6 Data Flow (sales scenario)

```
User microphone
    │
    ├──→ MediaRecorder (webm recording)
    │        └──→ audioBlob
    │
    ├──→ Web Audio API (waveform visualization)
    │        └──→ draw the waveform on a Canvas
    │
    └──→ SpeechRecognition (STT)
             │
             ├──→ finalizedSegments[] (final segments)
             │        └──→ transcript (joined, for server storage)
             │
             └──→ interimTranscript (interim)
                      └──→ shown only in the preview panel

[Recording stopped]
    └──→ FormData { audio, transcript, duration }
             └──→ POST /sales/consultations/upload-audio
                      └──→ DB + (GCS if > 10MB)
                               └──→ HTMX refresh of the consultation log
```

---

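## 11. Possible Future Extensions

| Feature | Description | Difficulty / requirement |
|------|------|--------|
| Speaker diarization | Distinguishes multiple speakers and transcribes each separately | Requires the Google Cloud STT API |
| Language switching | Change `recognition.lang` dynamically | Low |
| Voice commands | Perform an action when a specific keyword is recognized (e.g., "저장" (save), "다음" (next)) | Medium |
| Audio file storage | Store audio files in GCS using the MediaRecorder API | Medium |
| Real-time translation | Pass the STT result to a translation API | Medium |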

---

## Appendix A: Reference Implementation Files

### React implementation (construction site photo sheet)

| File | Description |
|------|------|
| `mng/resources/views/juil/construction-photos.blade.php` | Full VoiceInputButton code (React) |
| `mng/app/Http/Controllers/Juil/ConstructionSitePhotoController.php` | logSttUsage endpoint |
| `mng/app/Helpers/AiTokenHelper.php` | saveSttUsage / saveGcsStorageUsage helpers |

### Alpine.js implementation (sales/manager scenarios)

| File | Description |
|------|------|
| `mng/resources/views/sales/modals/voice-recorder.blade.php` | Voice recording + STT (Alpine.js) |
| `mng/resources/views/sales/modals/scenario-modal.blade.php` | Scenario modal (includes voice-recorder) |
| `mng/resources/views/sales/modals/consultation-log.blade.php` | Consultation record playback/display |
| `mng/app/Http/Controllers/Sales/ConsultationController.php` | Audio upload/download/delete |

## Appendix B: CSS Class Summary

| Element | Tailwind classes |
|------|----------------|
| Microphone button (idle) | `bg-gray-100 text-gray-500 hover:bg-blue-100 hover:text-blue-600 w-8 h-8 rounded-full` |
| Microphone button (recording) | `bg-red-500 text-white shadow-lg shadow-red-200` |
| Preview panel | `bg-gray-900 rounded-lg shadow-xl w-[300px] max-h-[120px] overflow-y-auto` |
| Final text | `text-white text-xs font-normal transition-colors duration-300` |
| Interim text | `text-gray-400 text-xs italic transition-colors duration-200` |
| Waiting indicator | `text-gray-500 text-xs` + red dot `animate-pulse` |
| Completion mark | `text-green-400 text-xs` ✓ |
| Disabled | `opacity-30 cursor-not-allowed` |