Turn your photo library into a searchable knowledge graph
Photos are more than pixels: they capture people, places, and relationships. Make those connections searchable and you unlock business intelligence for audits, marketing, HR, and compliance. By combining computer vision, a graph database, and generative AI you can build an intelligent photo search that understands context—not just tags—and answers natural-language queries like “show managers photographed with company cars” or “find images of Sarah at the product booth.”
Executive summary
A reference pattern using Amazon Rekognition, Amazon Neptune, and Amazon Bedrock (Anthropic Claude 3.5 Sonnet) turns a photo collection into a relationship-aware, searchable knowledge graph. Images stored in Amazon S3 trigger a serverless pipeline (AWS Lambda) that extracts faces and labels, populates a Neptune graph of people/objects/relationships, generates contextual captions with an LLM, and indexes metadata in DynamoDB for fast retrieval. This delivers an AI image search that supports natural-language queries and multi-hop traversals while remaining practical for pilots and production with built-in security and governance controls.
Why a searchable photo library matters
- Faster audits and compliance: find images tied to policies, events, or people in seconds.
- Content reuse and monetization: marketing teams locate relevant assets faster and with contextual captions ready for repurposing.
- Operational efficiency: HR and legal find event photos, proof of attendance, or asset evidence without manual tagging.
- Better discovery: relationship-aware search surfaces multi-person and multi-object patterns that flat tags miss.
How it works—plain English
- Upload photos to S3. Each upload triggers serverless processing.
- Computer vision (Amazon Rekognition) detects faces, labels objects/scenes, and returns confidence scores.
- Lambda functions write people, objects, and co-occurrences into Amazon Neptune as nodes and edges.
- Bedrock (Anthropic Claude 3.5 Sonnet) generates contextual captions tuned for tone and compliance; captions and metadata go into DynamoDB for low-latency retrieval.
- APIs and a natural-language UI query the graph to return images by relationships, not just keywords.
The system turns photo collections into a knowledge graph of people, objects, and moments so search becomes semantic and relationship-aware.
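The ingestion step above can be sketched as a small Lambda handler reacting to an S3 put event. This is an illustrative sketch, not the repo's actual image_processor.py; function names and thresholds are assumptions to tune against your pilot:

```python
def extract_s3_object(event):
    """Pull bucket and key from the first record of an S3 put event."""
    record = event["Records"][0]["s3"]
    return record["bucket"]["name"], record["object"]["key"]

def handler(event, context):
    """Lambda entry point: run Rekognition on the newly uploaded image."""
    import boto3  # preinstalled in the AWS Lambda Python runtime
    rekognition = boto3.client("rekognition")
    bucket, key = extract_s3_object(event)
    labels = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=25,
        MinConfidence=80.0,  # illustrative threshold; tune on pilot data
    )
    faces = rekognition.detect_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        Attributes=["DEFAULT"],
    )
    return {
        "image": f"s3://{bucket}/{key}",
        "labels": [l["Name"] for l in labels["Labels"]],
        "face_count": len(faces["FaceDetails"]),
    }
```

The raw labels and face details returned here feed the downstream graph-ingestion and captioning steps.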
Architecture overview (components and roles)
- Amazon S3: stores images and triggers processing events.
- AWS Lambda: serverless orchestration for image processing, graph ingestion, captioning orchestration, and search handlers.
- Amazon Rekognition: face detection/recognition and object/scene labeling (the raw visual signals).
- Amazon Neptune: graph database that models people, objects, and relationships for multi-hop queries.
- Amazon Bedrock (Anthropic Claude 3.5 Sonnet): generates relationship-aware captions and styled text for searchability and downstream workflows.
- Amazon DynamoDB: stores searchable metadata and caption indices for fast lookup.
- Amazon API Gateway & Amazon Cognito: expose APIs and handle authentication.
- AWS KMS, CloudTrail, GuardDuty: provide encryption, auditing, and threat detection.
Reference implementation
The open-source repo (aws-samples/sample-serverless-image-captioning-neptune) contains an AWS CDK deployment, Lambda handlers (image_processor.py, search_handler.py, relationships_handler_neptune.py, label_relationships.py, face_indexer.py), and the JSON configuration that drives graph instantiation. The configuration-driven approach means people and relationships are defined in JSON and can be adapted without code changes.
Example queries and sample code
Sample natural-language queries this pattern supports:
- “Show images of Sarah with the product booth from the last 12 months.”
- “Find managers photographed with company cars.”
- “List photos that include more than three people and a contract.”
Sample Gremlin (Neptune) query for “find images where person ‘Sarah’ appears with any node labeled ‘CompanyCar’”:
g.V().has('person','name','Sarah').
  out('appears_in').as('img').
  in('appears_in').has('object','type','CompanyCar').
  select('img')
Example Bedrock prompt (for relationship-aware captions):
You are a concise caption generator. Given detected people and objects, produce a short, factual caption (20-40 words).
Input:
- People: ["Sarah", "John"]
- Objects: ["product booth", "brochure"]
- Context: "trade show, daytime"
Output: "Sarah and John at the product booth holding the new brochure during a daytime trade show."
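A prompt like this can be sent through Bedrock's Anthropic Messages API with boto3. The sketch below is an assumption-heavy illustration: the model ID varies by region and version, and the helper names are made up for this example.

```python
import json

# Assumed model ID; check the Bedrock console for the exact identifier in your region.
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

def build_caption_request(people, objects, context, max_tokens=100):
    """Build the Anthropic Messages API body that Bedrock expects."""
    prompt = (
        "You are a concise caption generator. Given detected people and objects, "
        "produce a short, factual caption (20-40 words).\n"
        f"People: {json.dumps(people)}\n"
        f"Objects: {json.dumps(objects)}\n"
        f"Context: {json.dumps(context)}"
    )
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

def generate_caption(people, objects, context):
    """Invoke the model and return the caption text."""
    import boto3  # preinstalled in the AWS Lambda Python runtime
    client = boto3.client("bedrock-runtime")
    resp = client.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps(build_caption_request(people, objects, context)),
    )
    return json.loads(resp["body"].read())["content"][0]["text"]
```

Separating request construction from invocation keeps the prompt logic unit-testable without AWS credentials.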
Example JSON snippet for configuration-driven people/relationships:
{
"people": [
{"id": "sarah", "name": "Sarah", "role": "product_manager"},
{"id": "john", "name": "John", "role": "sales"}
],
"relationships": [
{"from": "sarah", "to": "john", "type": "works_with"}
]
}
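A loader for this configuration might translate the JSON into node and edge tuples ready for Neptune upserts. The function name and tuple layout below are illustrative, not the repo's actual schema:

```python
def config_to_graph(config):
    """Turn the people/relationships config into (nodes, edges) for graph upserts."""
    nodes = [
        ("person", p["id"], {"name": p["name"], "role": p.get("role")})
        for p in config["people"]
    ]
    edges = [(r["from"], r["type"], r["to"]) for r in config["relationships"]]
    return nodes, edges
```

Because the graph is driven by this config, adding a person or relationship is a JSON edit plus a re-run of the upsert, with no code changes.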
Security, privacy, and governance checklist
- Consent capture: explicit opt-in flows for people detected in photos; store consent metadata in DynamoDB.
- Data minimization: store only derived metadata and hashed face embeddings when possible (avoid raw faceprints unless necessary).
- Anonymization: apply one-way hashing to face templates or keep face match indices in a separate encrypted store.
- Access controls: least-privilege IAM roles, Cognito-authenticated APIs, and role-based access to Neptune/DynamoDB.
- Encryption & audit: KMS-managed AES-256 encryption at rest, TLS in transit, CloudTrail for immutable logs, GuardDuty for threat monitoring.
- Retention & deletion: retention policies and automated deletion workflows to meet GDPR/CCPA requests.
- Human-in-the-loop verification: confidence thresholds that route lower-confidence matches to manual review before committing to the graph.
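For the anonymization item, a keyed one-way hash keeps raw face templates out of the graph while still supporting equality checks. A minimal sketch, assuming the secret key is fetched from KMS or Secrets Manager rather than hardcoded:

```python
import hashlib
import hmac

def anonymize_face_template(embedding_bytes: bytes, secret_key: bytes) -> str:
    """Keyed one-way hash of a face embedding; store this instead of the faceprint.

    secret_key must come from KMS/Secrets Manager, never from source code.
    The same embedding and key always produce the same digest, so the hash
    can serve as a match index without retaining the raw template.
    """
    return hmac.new(secret_key, embedding_bytes, hashlib.sha256).hexdigest()
```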
Cost & scalability (practical estimates)
Estimates below are illustrative and region-dependent; treat them as pilot-level guidance rather than hard commitments.
- Per-image processing (estimate): Rekognition face + labels ≈ $0.006–0.01; LLM captioning (short output) via Bedrock ≈ $0.01–0.02; Lambda orchestration is negligible at small scale. Rough processing cost for 1,000 images: $16–30, including API calls and compute.
- Storage: S3 object storage typically < $1 per 1,000 images depending on resolution and retention.
- Neptune baseline: a small cluster baseline ≈ $100–150/month; scales up based on throughput and replication.
- Scaling patterns: use serverless Lambdas with retry and rate-limit handling; batch captioning and caching reduce LLM spend; Neptune requires indexing and query design for sub-second multi-hop traversals at scale.
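The caching point above can be sketched as a memoizer keyed by a hash of the CV signals, so identical scenes reuse one LLM call. The `generate` callable stands in for whatever Bedrock-calling function you use; in production the cache would live in DynamoDB rather than process memory:

```python
import hashlib
import json

_caption_cache = {}  # in-memory stand-in for a DynamoDB caption table

def cached_caption(people, objects, context, generate):
    """Return a cached caption when the same CV signals were seen before."""
    key = hashlib.sha256(
        json.dumps(
            {"p": sorted(people), "o": sorted(objects), "c": context},
            sort_keys=True,
        ).encode()
    ).hexdigest()
    if key not in _caption_cache:
        _caption_cache[key] = generate(people, objects, context)
    return _caption_cache[key]
```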
Risks, trade-offs, and mitigations
- Privacy & legal: face recognition can trigger jurisdictional restrictions—implement pre-launch legal review, consent, and opt-out mechanisms.
- Bias in CV models: measure precision/recall across demographic slices and retrain or filter as needed; use manual review for sensitive cases.
- LLM hallucinations: avoid trusting raw captions for critical decisions—use confidence thresholds, human verification, and keep original CV outputs as the authoritative source.
- Vendor lock-in: the reference uses AWS-managed services; alternatives exist (see below) and a modular design reduces coupling (abstract LLM & CV layers).
- Cost drift: LLM usage can dominate spend—control via caching, batch jobs, and selective captioning policies.
Pilot plan: quick path from idea to MVP
- Dataset: start with 1k–5k representative images (include event photos, portraits, and product shots).
- Duration: two 2-week sprints. Sprint 1: ingestion and CV pipeline plus Neptune schema; Sprint 2: captioning, UI, and evaluation.
- Success metrics: face detection precision/recall, label accuracy, caption hallucination rate (<5% for MVP), search success rate (user satisfaction), Neptune query latency (target <500ms for common queries).
- Governance: implement consent capture, retention rules, and an escalation path for false positives.
- Go/No-Go criteria: acceptable accuracy on targeted searches, cost per search within budget, and legal sign-off for face recognition use.
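The precision/recall metric above can be computed from a held-out, manually labeled slice of the pilot dataset; a minimal sketch:

```python
def precision_recall(true_ids, predicted_ids):
    """Face-match precision and recall for pilot evaluation.

    true_ids: manually verified person IDs present in an image.
    predicted_ids: person IDs the pipeline committed for that image.
    """
    tp = len(set(true_ids) & set(predicted_ids))
    precision = tp / len(predicted_ids) if predicted_ids else 0.0
    recall = tp / len(true_ids) if true_ids else 0.0
    return precision, recall
```

Aggregating these per-image scores across demographic slices also supports the bias measurement called out under risks.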
Alternatives & when to use them
- Computer vision: Google Vision or Azure Face API if your organization prefers those clouds or regional availability differs.
- Graph databases: Neo4j, TigerGraph, or JanusGraph for on-premise or specialized graph features; Neptune is attractive for managed AWS integration.
- Captioning / LLMs: open-source captioners like BLIP or local LLMs if data residency is required; Bedrock offers managed models and simpler ops.
- Semantic search: combine graph queries with vector DBs (Milvus, Pinecone) to support hybrid graph+embedding search.
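The hybrid graph+embedding idea can be sketched by blending graph membership with caption-embedding similarity. Pure-Python cosine is shown here for illustration; in practice a vector DB would supply the nearest-neighbor search and the weighting would be tuned empirically:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_rank(graph_hits, query_vec, embeddings, weight=0.5):
    """Rank images by a blend of graph membership and embedding similarity.

    graph_hits: set of image IDs matched by the Neptune query.
    embeddings: image ID -> caption embedding (from any vector store).
    weight: share of the score given to the graph signal.
    """
    scores = {}
    for img_id, vec in embeddings.items():
        graph_score = 1.0 if img_id in graph_hits else 0.0
        scores[img_id] = weight * graph_score + (1 - weight) * cosine(query_vec, vec)
    return sorted(scores, key=scores.get, reverse=True)
```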
Quick reference: repo & deployment
Reference implementation and deployment artifacts live on GitHub: aws-samples/sample-serverless-image-captioning-neptune. The repo includes an AWS CDK stack, sample Lambda handlers, and JSON configuration for people and relationships.
Key questions leaders and engineers should answer
- How will you manage consent and compliance for face recognition? Implement explicit consent capture, retention policies, region-specific controls, and anonymization or opt-in-only flows. Legal sign-off is mandatory before production deployment.
- What guardrails prevent LLM hallucinations or misattribution? Use confidence thresholds, human-in-the-loop verification for low-confidence outputs, caption versioning, and audit logs to track and correct hallucinations.
- Can the system be tuned to control cost? Yes: reduce real-time LLM calls by caching captions, batching caption jobs, applying selective captioning policies, or limiting LLM use to high-value assets.
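The confidence-threshold guardrail can be as simple as a router in front of the graph-ingestion step. The thresholds below are illustrative defaults, not recommendations; calibrate them against your pilot's precision/recall numbers:

```python
def route_match(match_confidence, auto_threshold=95.0, review_threshold=80.0):
    """Route a face-match result: commit to the graph, send to review, or discard.

    match_confidence: Rekognition similarity score (0-100).
    Thresholds are illustrative; tune them on pilot data.
    """
    if match_confidence >= auto_threshold:
        return "commit"
    if match_confidence >= review_threshold:
        return "manual_review"
    return "discard"
```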
Final notes and next moves
The pattern—computer vision + graph database + LLMs—provides a pragmatic path to convert visual assets into searchable knowledge. It isn’t magic; it’s a repeatable architecture that makes relationships explicit, serves natural-language queries, and surfaces business value fast. Start with a focused pilot (1k–5k images), validate accuracy and governance, and iterate toward broader production deployment. For teams that need lower LLM cost or stricter data residency, swap the captioning layer for batch or on-prem alternatives and keep the graph-first approach.
Reference implementation: aws-samples/sample-serverless-image-captioning-neptune. To tailor a privacy checklist, a cost-optimized variant, or sample Gremlin/Cypher queries to your use case, start by preparing a pilot dataset and a list of expected search patterns; those inputs make the second sprint much faster.
Integrating graph database capabilities with AI services enables natural language photo search that understands context and relationships—not just tags.