Overview

The TikTok Auto Collection Sorter compares three model types during training and selects the best performer via cross-validation:
  1. k-Nearest Neighbors (k-NN): Non-parametric baseline
  2. Logistic Regression: Linear classifier with L2 regularization
  3. Multi-Layer Perceptron (MLP): Neural network with two hidden layers
This guide covers when to use each model, how to modify the MLP architecture, and how to add custom models.

Model Comparison

k-Nearest Neighbors

How it works (train.py:146-152):
k = min(5, len(X_train) - 1)
knn = KNeighborsClassifier(n_neighbors=k, metric="cosine")
knn.fit(X_train, y_train)
knn_preds = knn.predict(X_val)
Characteristics:
  • No training required (stores all training data)
  • Ranks neighbors by cosine distance between embeddings (metric="cosine")
  • Uses up to k=5 neighbors, capped at one less than the number of training samples
When to use:
  • Best for small datasets (<100 samples)
  • When classes have tight, well-separated clusters
  • When you want instant “training” (no optimization step)
Limitations:
  • Slow inference on large datasets (compares against all training data)
  • No learned decision boundaries
  • Sensitive to noisy features
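To make the neighbor vote concrete, here is a minimal pure-Python sketch of cosine-distance k-NN. It is not the scikit-learn implementation used in train.py; the helper names (`cosine_distance`, `knn_predict`) and the toy 2-D "embeddings" are made up for illustration.

```python
import math
from collections import Counter

def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def knn_predict(X_train, y_train, query, k=5):
    """Majority vote among the k training points closest to `query`."""
    k = min(k, len(X_train) - 1)  # same cap as the train.py excerpt
    ranked = sorted(range(len(X_train)),
                    key=lambda i: cosine_distance(X_train[i], query))
    votes = Counter(y_train[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy data, invented for this example
X_train = [[1.0, 0.1], [0.9, 0.2], [0.8, 0.0],
           [0.1, 1.0], [0.0, 0.9], [0.2, 0.8]]
y_train = ["cooking", "cooking", "cooking",
           "fitness", "fitness", "fitness"]

# The query is longer than any training vector, but cosine distance
# only looks at direction, so it still lands in the "cooking" cluster.
prediction = knn_predict(X_train, y_train, query=[2.0, 0.3], k=3)
print(prediction)
```

Every prediction sorts the whole training set, which is why inference cost grows with dataset size.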

Logistic Regression

How it works (train.py:155-160):
lr = LogisticRegression(max_iter=1000, C=1.0, class_weight="balanced")
lr.fit(X_train, y_train)
lr_preds = lr.predict(X_val)
Characteristics:
  • Linear decision boundaries
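  • L2 regularization (C is the inverse of the regularization strength, so a smaller C regularizes more)
  • Built-in class balancing (class_weight="balanced" weights each class inversely to its frequency)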
When to use:
  • When classes are linearly separable
  • For interpretability (can inspect feature weights)
  • When you need fast, reliable inference
Limitations:
  • Cannot learn non-linear patterns
  • May underfit complex relationships
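The "balanced" setting follows the formula given in the scikit-learn documentation: n_samples / (n_classes * count of that class). The sketch below recomputes it by hand; the function name and the label counts are invented for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """weight[c] = n_samples / (n_classes * count[c])."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}

# Hypothetical collection sizes: one folder has three times as many videos
labels = ["recipes"] * 30 + ["workouts"] * 10
weights = balanced_class_weights(labels)
for name, weight in sorted(weights.items()):
    print(f"{name}: {weight:.3f}")
```

The rarer class gets the larger weight, so its misclassifications cost more in the loss.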

Multi-Layer Perceptron (MLP)

Architecture (train.py:31-45):
class MLP(nn.Module):
    def __init__(self, input_dim, num_classes, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),      # 1024 → 256
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim // 2), # 256 → 128
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim // 2, num_classes), # 128 → N
        )

    def forward(self, x):
        return self.net(x)
Characteristics:
  • Two hidden layers (256 → 128 neurons)
  • ReLU activations
  • Dropout regularization (0.3 and 0.2)
  • Adam optimizer with weight decay
When to use:
  • When classes have non-linear decision boundaries
  • With sufficient training data (>50 samples per class)
  • When logistic regression underfits
Limitations:
  • Requires more data than linear models
  • Slower training than k-NN or logistic regression
  • Risk of overfitting on very small datasets

Modifying MLP Hyperparameters

Hidden Layer Size

Increase capacity for complex datasets:
class MLP(nn.Module):
    def __init__(self, input_dim, num_classes, hidden_dim=512):  # Was 256
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),       # 1024 → 512
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim // 2),  # 512 → 256
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim // 2, num_classes), # 256 → N
        )
Larger networks require more training data. If you have <200 labeled samples, stick with hidden_dim=256 or smaller to avoid overfitting.
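A rough way to see why width matters is to count trainable parameters. The sketch below does this arithmetically for the three-layer layout shown above, assuming input_dim=1024 (as in the code comments) and a hypothetical 10 classes; `count_mlp_params` is an illustrative helper, not part of train.py.

```python
def count_mlp_params(input_dim, num_classes, hidden_dim):
    """Weights plus biases for input -> hidden -> hidden // 2 -> classes."""
    sizes = [input_dim, hidden_dim, hidden_dim // 2, num_classes]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

# 10 classes is an assumption; substitute your own collection count
param_counts = {h: count_mlp_params(1024, 10, h) for h in (128, 256, 512)}
for hidden_dim, n_params in param_counts.items():
    print(f"hidden_dim={hidden_dim}: {n_params:,} parameters")
```

Most of the parameters sit in the first layer, so doubling hidden_dim roughly doubles the model size while the number of labeled samples stays the same.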

Dropout Rates

Reduce overfitting by increasing dropout:
nn.Dropout(0.5),  # Was 0.3 - more aggressive regularization
Or decrease it for small datasets where the model is underfitting:
nn.Dropout(0.1),  # Was 0.3 - less regularization
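For intuition about what the rate controls, the following pure-Python sketch mimics the inverted dropout scheme that nn.Dropout applies: during training each activation is zeroed with probability p and the survivors are scaled by 1 / (1 - p); in evaluation mode the input passes through unchanged. The `dropout` function here is a teaching aid, not PyTorch code.

```python
import random

def dropout(values, p, training, rng):
    """Inverted dropout: zero each value with probability p, rescale the rest."""
    if not training or p == 0.0:
        return list(values)
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * keep_scale for v in values]

rng = random.Random(0)
activations = [1.0] * 10000

train_out = dropout(activations, p=0.3, training=True, rng=rng)
eval_out = dropout(activations, p=0.3, training=False, rng=rng)

dropped_fraction = train_out.count(0.0) / len(train_out)
mean_train = sum(train_out) / len(train_out)
print(f"dropped: {dropped_fraction:.2f}, mean after rescaling: {mean_train:.2f}")
```

Because dropout behaves differently in the two modes, the model must be switched to evaluation mode with model.eval() before predicting.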

Learning Rate and Optimizer

Modify the train_mlp function (train.py:48-51):
def train_mlp(X_train, y_train, X_val, y_val, num_classes, device, 
              epochs=100, lr=5e-4):  # Was 1e-3
    model = MLP(input_dim, num_classes).to(device)
    optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=1e-3)  # Was 1e-4
Guidelines:
  • Lower learning rate (5e-4) for more stable training
  • Higher weight decay (1e-3) for stronger L2 regularization
  • More epochs (200) if the loss is still decreasing when training ends
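The weight_decay argument of optim.Adam adds weight_decay * w to each parameter's gradient before the update. The sketch below isolates that term using a plain gradient step rather than Adam's adaptive update, so the numbers illustrate the direction of the effect only; the function name is hypothetical.

```python
def sgd_step_with_weight_decay(weights, grads, lr, weight_decay):
    """One plain gradient step with the decay term folded into the gradient."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

weights = [1.0, -2.0]
zero_grads = [0.0, 0.0]  # no data gradient, so only the decay term acts

shrunk = {
    wd: sgd_step_with_weight_decay(weights, zero_grads, lr=0.1, weight_decay=wd)
    for wd in (1e-4, 1e-3)
}
for wd, new_weights in shrunk.items():
    print(f"weight_decay={wd}: {new_weights}")
```

A larger weight_decay pulls every weight toward zero harder at each step, which is the sense in which it regularizes more strongly.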

Batch Size

Change the batch size at train.py:64:
loader = DataLoader(train_ds, batch_size=64, shuffle=True)  # Was 32
  • Larger batches (64): less noisy gradient estimates and fewer optimizer steps per epoch, so epochs usually run faster
  • Smaller batches (16): noisier gradients, which can act as a mild regularizer and sometimes help generalization on small datasets
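Batch size also sets how many weight updates happen per epoch. With DataLoader's default drop_last=False the final partial batch is kept, so the count is the ceiling of samples over batch size. A quick calculation for a hypothetical 200-sample training fold:

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Optimizer steps per epoch when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)

num_samples = 200  # hypothetical training-fold size
updates = {bs: steps_per_epoch(num_samples, bs) for bs in (16, 32, 64)}
for batch_size, steps in updates.items():
    print(f"batch_size={batch_size}: {steps} updates per epoch")
```

On a small dataset, a large batch leaves very few updates per epoch, so the epoch count may need to go up to compensate.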

Adding a Third Hidden Layer

For very complex classification tasks:
class DeepMLP(nn.Module):
    def __init__(self, input_dim, num_classes, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),           # 1024 → 256
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim),          # 256 → 256
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim // 2),     # 256 → 128
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim // 2, num_classes),    # 128 → N
        )

    def forward(self, x):
        return self.net(x)
Replace the MLP class in both train.py and predict.py with DeepMLP.
Deeper networks need significantly more data. Only use 3+ hidden layers if you have >500 labeled samples.

Custom Model: Attention-Based MLP

Add an attention mechanism to weight feature importance:
import torch
import torch.nn as nn

class AttentionMLP(nn.Module):
    def __init__(self, input_dim, num_classes, hidden_dim=256):
        super().__init__()
        # Attention layer
        self.attention = nn.Sequential(
            nn.Linear(input_dim, input_dim),
            nn.Tanh(),
            nn.Linear(input_dim, input_dim),
            nn.Softmax(dim=1)
        )
        
        # Main network
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim // 2, num_classes),
        )

    def forward(self, x):
        # Compute attention weights
        attn_weights = self.attention(x)
        # Apply attention to input features
        x_attended = x * attn_weights
        # Pass through main network
        return self.net(x_attended)
This model learns a weight for each input dimension, so it can emphasize the parts of the embedding (for example, visual vs. audio features) that matter most for classification.
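One property of this design is worth checking before training. nn.Softmax(dim=1) normalizes across the feature dimension, so the attention weights of each sample sum to 1 and average 1 / input_dim. The pure-Python sketch below shows the resulting scale for input_dim=1024 with uniform scores; it is an illustration, not code from the project.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

input_dim = 1024
weights = softmax([0.0] * input_dim)   # uniform attention scores
features = [1.0] * input_dim
attended = [x * w for x, w in zip(features, weights)]

print(f"sum of weights: {sum(weights):.3f}")
print(f"largest attended feature: {max(attended):.6f}")
```

The attended input is roughly a thousand times smaller than the raw embedding. If training turns out to be slow or unstable, one option to try (an assumption, not something the source recommends) is replacing the softmax with nn.Sigmoid() so each dimension gets an independent gate between 0 and 1.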

Integrating Custom Models

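  1. Add the model class to train.py
  2. Update the training loop in the main() function:
# After line 166 in train.py, add:
# 4. Custom Attention MLP
# train_custom_mlp is not shown in this guide; it is assumed to be a variant
# of train_mlp that builds AttentionMLP. results and all_preds also need an
# "attention_mlp" entry wherever the other model names are initialized.
attn_model, attn_acc = train_custom_mlp(
    X_train, y_train, X_val, y_val, num_classes, device
)
attn_model.eval()  # disable dropout before predicting
with torch.no_grad():
    attn_preds = attn_model(torch.FloatTensor(X_val).to(device)).argmax(dim=1).cpu().numpy()
results["attention_mlp"].append((attn_preds == y_val).mean())
all_preds["attention_mlp"][val_idx] = attn_preds
  3. Update the prediction script (predict.py) to handle the new model type
  4. Update the model config to save model type metadata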
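The training-loop snippet calls train_custom_mlp, which this guide does not define. A minimal sketch of such a helper follows. Everything in it is an assumption: the extra `model_cls` argument, the defaults (copied from the "Was" values quoted earlier: lr=1e-3, weight_decay=1e-4, batch_size=32), and the `TinyMLP` stand-in used only to keep the example self-contained. In practice you would pass AttentionMLP, or hard-code it to match the six-argument call shown above.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

def train_custom_mlp(X_train, y_train, X_val, y_val, num_classes, device,
                     model_cls, epochs=100, lr=1e-3, weight_decay=1e-4,
                     batch_size=32):
    """Hypothetical helper: train model_cls and return (model, val accuracy)."""
    model = model_cls(X_train.shape[1], num_classes).to(device)
    optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    criterion = nn.CrossEntropyLoss()
    train_ds = TensorDataset(torch.as_tensor(X_train, dtype=torch.float32),
                             torch.as_tensor(y_train, dtype=torch.long))
    loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)

    for _ in range(epochs):
        model.train()  # enable dropout during optimization
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()

    model.eval()  # disable dropout for validation
    with torch.no_grad():
        X_val_t = torch.as_tensor(X_val, dtype=torch.float32).to(device)
        preds = model(X_val_t).argmax(dim=1).cpu().numpy()
    return model, float((preds == y_val).mean())

class TinyMLP(nn.Module):
    """Stand-in model so this example runs on its own."""
    def __init__(self, input_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(input_dim, 16), nn.ReLU(),
                                 nn.Linear(16, num_classes))

    def forward(self, x):
        return self.net(x)

# Synthetic data, invented for the demo
torch.manual_seed(0)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8)).astype(np.float32)
y = (X[:, 0] > 0).astype(np.int64)
model, acc = train_custom_mlp(X[:40], y[:40], X[40:], y[40:], 2,
                              torch.device("cpu"), TinyMLP, epochs=20)
print(f"validation accuracy: {acc:.1%}")
```

Returning the model in evaluation mode means the caller can predict on a validation fold without dropout noise.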

Cross-Validation Strategy

The system uses Stratified K-Fold to ensure balanced folds (train.py:136):
skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
This keeps the class proportions in each fold as close as possible to those of the full dataset. Custom models get this automatically as long as they are trained inside the same fold loop. Key parameters:
  • n_splits: Adjusted based on smallest class size (min 2, max 5)
  • shuffle=True: Randomizes data before splitting
  • random_state=42: Ensures reproducibility
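The exact rule train.py uses to pick n_splits is not shown here. One plausible reading of "adjusted based on smallest class size (min 2, max 5)" is to clamp the smallest class count into that range, as in this sketch; `choose_n_splits` and the label lists are hypothetical.

```python
from collections import Counter

def choose_n_splits(labels, min_splits=2, max_splits=5):
    """Clamp the smallest class count into [min_splits, max_splits]."""
    smallest_class = min(Counter(labels).values())
    return max(min_splits, min(max_splits, smallest_class))

# Made-up label lists with different smallest-class sizes
few = ["a"] * 10 + ["b"] * 3
many = ["a"] * 40 + ["b"] * 12
tiny = ["a"] * 10 + ["b"] * 1

for name, labels in [("few", few), ("many", many), ("tiny", tiny)]:
    print(name, choose_n_splits(labels))
```

Tying the fold count to the smallest class is what lets every fold contain at least one sample of each class whenever that class has at least n_splits members.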

Hyperparameter Tuning Example

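Systematic grid search for the best MLP configuration (assumes X, y, skf, num_classes, device, and train_mlp are already in scope):
import itertools

import numpy as np
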
# Define hyperparameter grid
hidden_dims = [128, 256, 512]
dropout_rates = [(0.2, 0.1), (0.3, 0.2), (0.4, 0.3)]
learning_rates = [1e-4, 5e-4, 1e-3]

best_acc = 0
best_config = None

for hidden_dim, (drop1, drop2), lr in itertools.product(
    hidden_dims, dropout_rates, learning_rates
):
    print(f"\nTesting: hidden={hidden_dim}, dropout=({drop1},{drop2}), lr={lr}")
    
    # NOTE: as written, only lr reaches train_mlp. hidden_dim and the dropout
    # rates have no effect until MLP.__init__ and train_mlp accept them as arguments.
    
    # Run cross-validation
    cv_results = []
    for train_idx, val_idx in skf.split(X, y):
        X_train, X_val = X[train_idx], X[val_idx]
        y_train, y_val = y[train_idx], y[val_idx]
        model, acc = train_mlp(X_train, y_train, X_val, y_val, 
                               num_classes, device, lr=lr)
        cv_results.append(acc)
    
    mean_acc = np.mean(cv_results)
    if mean_acc > best_acc:
        best_acc = mean_acc
        best_config = (hidden_dim, (drop1, drop2), lr)
    
    print(f"Mean CV accuracy: {mean_acc:.1%}")

print(f"\nBest config: {best_config} with {best_acc:.1%} accuracy")
Hyperparameter tuning requires many training runs. The grid above has 3 × 3 × 3 = 27 configurations, and each one is trained once per fold; a single configuration across K folds can take 10-20 minutes on CPU. Consider using a GPU or reducing the search space.
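For the grid search to vary width and dropout, the model has to accept them as arguments. A minimal sketch of such a class is shown below; `ConfigurableMLP` and its `dropouts` parameter are hypothetical names, and train_mlp would also need to be changed to build this class with the values from the loop.

```python
import torch
import torch.nn as nn

class ConfigurableMLP(nn.Module):
    """Hypothetical MLP variant with width and dropout rates as arguments."""
    def __init__(self, input_dim, num_classes, hidden_dim=256,
                 dropouts=(0.3, 0.2)):
        super().__init__()
        drop1, drop2 = dropouts
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(drop1),
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Dropout(drop2),
            nn.Linear(hidden_dim // 2, num_classes),
        )

    def forward(self, x):
        return self.net(x)

# One point from the grid; 10 classes is an assumption
model = ConfigurableMLP(input_dim=1024, num_classes=10,
                        hidden_dim=512, dropouts=(0.4, 0.3))
model.eval()
with torch.no_grad():
    logits = model(torch.zeros(4, 1024))  # batch of 4 dummy embeddings
print(tuple(logits.shape))
```

The defaults reproduce the original architecture, so existing calls that pass only input_dim and num_classes keep working.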

Model Selection Insights

From train.py:176-179, the system automatically picks the best model:
mean_accs = {name: np.mean(accs) for name, accs in results.items()}
best_name = max(mean_accs, key=mean_accs.get)
print(f"\nBest model: {best_name} ({mean_accs[best_name]:.1%})")
Typical outcomes:
  • k-NN wins: Very small dataset (<50 samples) or highly clustered embeddings
  • Logistic Regression wins: Linearly separable classes, medium dataset (50-200 samples)
  • MLP wins: Complex boundaries, sufficient data (>200 samples), multimodal signals
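Mean accuracy alone can hide unstable models, especially with few folds. As a complement to the selection code above, this sketch also reports the spread across folds. The per-fold numbers and model names are made up, and looking at the standard deviation is a suggestion rather than something train.py is known to do.

```python
from statistics import mean, stdev

# Made-up per-fold accuracies for illustration
results = {
    "knn": [0.80, 0.75, 0.85],
    "logreg": [0.82, 0.84, 0.83],
    "mlp": [0.90, 0.70, 0.92],
}

summary = {name: (mean(accs), stdev(accs)) for name, accs in results.items()}
best_name = max(summary, key=lambda name: summary[name][0])

for name, (avg, spread) in summary.items():
    print(f"{name}: {avg:.1%} ± {spread:.1%}")
print(f"Best by mean: {best_name}")
```

Here the model with the highest mean also varies the most between folds, so a steadier runner-up within a point of it may be the safer choice.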

Debugging Poor Performance

If all models perform poorly (<70% accuracy):
  1. Check feature quality: Visualize embeddings with t-SNE/UMAP
  2. Verify labels: Ensure folder assignments are consistent
  3. Increase data: Collect more labeled samples per class
  4. Adjust class weights: See Class Imbalance
  5. Try different architectures: Add/remove layers, change activations
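When verifying labels, per-class accuracy often points to the problem folder faster than the overall number. A small sketch, assuming you have true labels and out-of-fold predictions (such as the all_preds arrays) as plain sequences; the helper name and sample data are hypothetical.

```python
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly predicted samples for each true label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    return {label: correct[label] / total[label] for label in total}

# Invented labels and predictions
y_true = ["recipes", "recipes", "recipes", "workouts", "workouts", "travel"]
y_pred = ["recipes", "recipes", "workouts", "workouts", "workouts", "recipes"]

report = per_class_accuracy(y_true, y_pred)
for label, acc in sorted(report.items(), key=lambda item: item[1]):
    print(f"{label}: {acc:.0%}")
```

Classes at the top of this list (lowest accuracy) are the first candidates for relabeling or for collecting more samples.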

Class Imbalance

Handle skewed class distributions

Active Learning

Efficiently collect training data