HeyDonto Data Transition Policy

This policy describes how HeyDonto securely and efficiently transitions dental practice data—whether from on-premises or cloud-based EHR systems—into our cloud infrastructure. It covers our event-driven approach using Kafka, best practices for encryption and access control, and how we monitor the process with Kubernetes, Google Cloud Logging, and (optionally) Prometheus and Grafana for metrics and dashboards.

Note: This policy does not address any AI/knowledge-graph–related data usage. Please refer to our separate AI Data Privacy & Training Policy for details on how AI models interact with standardized data.

1. Data Flow Overview

1. Data Sources (On-Prem or Cloud-Based EHRs)

  • On-Prem: A secure HeyDonto synchronizer connects to SQL databases on the local network.

  • Cloud-Based: We connect via a secure API or data export endpoint provided by the remote EHR platform.

  • Transport Encryption (TLS) is required in both cases to ensure no plaintext data traverses untrusted networks.

2. Kafka Ingestion

  • The synchronizer or cloud connector converts extracted data into event messages and publishes them to Kafka topics in our cloud.

  • Kafka serves as a decoupled, real-time messaging layer for subsequent microservices.

3. Transformation & Routing

  • Our microservices (deployed on Kubernetes) subscribe to Kafka topics.

  • Data is validated, mapped to the FHIR standard (details in separate documentation), and prepared for storage.

4. Storage in GCP Healthcare API (FHIR)

    • Finalized data is stored in a FHIR store within the Google Cloud Healthcare API.

  • External applications or user-facing dashboards retrieve data as needed from this secure datastore.
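
  The sketch below illustrates steps 3 and 4 under stated assumptions: it is not our production service, and the topic name, FHIR store path, event fields, and resource mapping are all illustrative. It uses the KafkaJS consumer API in TypeScript and posts the mapped resource to the Cloud Healthcare API FHIR endpoint via google-auth-library.

    // Illustrative sketch only: topic, FHIR store path, and field names are assumptions.
    import { Kafka } from "kafkajs";
    import { GoogleAuth } from "google-auth-library";

    // Hypothetical FHIR store path; substitute the real project/dataset/store IDs.
    const FHIR_STORE =
      "https://healthcare.googleapis.com/v1/projects/heydonto-prod/locations/us-central1" +
      "/datasets/ehr-sync/fhirStores/default";

    const auth = new GoogleAuth({ scopes: ["https://www.googleapis.com/auth/cloud-platform"] });
    const consumer = new Kafka({
      clientId: "fhir-router",
      brokers: ["kafka:9093"],
      ssl: true, // TLS in transit, per section 2.1
    }).consumer({ groupId: "fhir-router" });

    async function run(): Promise<void> {
      const gcp = await auth.getClient(); // service-account identity in GKE
      await consumer.connect();
      await consumer.subscribe({ topic: "patient-appointments", fromBeginning: false });

      await consumer.run({
        eachMessage: async ({ message }) => {
          const event = JSON.parse(message.value?.toString() ?? "{}");

          // Minimal validation; production services apply full schema checks.
          if (!event.patientId || !event.startTime) {
            throw new Error("event failed validation");
          }

          // Map the extracted event to a (simplified) FHIR Appointment resource.
          const appointment = {
            resourceType: "Appointment",
            status: "booked",
            start: event.startTime,
            participant: [
              { actor: { reference: `Patient/${event.patientId}` }, status: "accepted" },
            ],
          };

          // Persist the resource in the FHIR store over HTTPS.
          await gcp.request({
            url: `${FHIR_STORE}/fhir/Appointment`,
            method: "POST",
            headers: { "Content-Type": "application/fhir+json" },
            data: appointment,
          });
        },
      });
    }

    run().catch((err) => {
      console.error(err);
      process.exit(1);
    });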

2. Security & Compliance

2.1 Encryption in Transit

  • TLS/SSL for Kafka:

    • All Kafka traffic (producers/consumers) is encrypted using TLS.

    • Mutual TLS (mTLS) may be used to authenticate both client and broker in higher-security deployments; a client-configuration sketch appears at the end of this subsection.

  • Intra-Service Communication:

    • Within our Kubernetes environment, microservices communicate via TLS where feasible.

    • All calls to GCP APIs (including Healthcare API) are secured over HTTPS.
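
  The following sketch, referenced above, shows how a KafkaJS producer might be configured for mutual TLS. The broker address, certificate paths, topic, and message key are assumptions; in our clusters the PEM files would typically be mounted from a Kubernetes Secret (see section 5.1).

    // Illustrative sketch: broker address, certificate paths, and topic are assumptions.
    import * as fs from "fs";
    import { Kafka } from "kafkajs";

    const kafka = new Kafka({
      clientId: "onprem-synchronizer",
      brokers: ["broker-1.heydonto.internal:9093"],
      ssl: {
        rejectUnauthorized: true, // verify the broker's certificate chain
        ca: [fs.readFileSync("/etc/kafka/certs/ca.pem", "utf-8")],
        // The client certificate and key enable mutual TLS, so the broker
        // authenticates the producer as well.
        cert: fs.readFileSync("/etc/kafka/certs/client.pem", "utf-8"),
        key: fs.readFileSync("/etc/kafka/certs/client-key.pem", "utf-8"),
      },
    });

    const producer = kafka.producer();

    // Publish one extracted record as an event message (step 2 of the data flow).
    export async function publishAppointment(event: object): Promise<void> {
      await producer.connect();
      await producer.send({
        topic: "patient-appointments",
        messages: [{ key: "office-123", value: JSON.stringify(event) }],
      });
      await producer.disconnect();
    }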

2.2 Authentication & Authorization

  • Role-Based Access Control (RBAC):

    • Kubernetes uses RBAC to ensure microservices can only access the resources they need.

    • Kafka ACLs further restrict which microservices (identified by service accounts) can produce to or consume from specific topics; an ACL sketch appears at the end of this subsection.

  • Least Privilege Principle:

    • Each microservice / synchronizer is assigned minimal privileges to reduce the risk of lateral movement if compromised.
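
  The sketch referenced above shows how a topic-level ACL might be granted with the KafkaJS admin client. The principal, topic, and broker address are illustrative, and many deployments manage ACLs with Kafka's own tooling instead.

    // Illustrative sketch: principal, topic, and broker address are assumptions.
    import {
      Kafka,
      AclResourceTypes,
      AclOperationTypes,
      AclPermissionTypes,
      ResourcePatternTypes,
    } from "kafkajs";

    const admin = new Kafka({ clientId: "acl-admin", brokers: ["kafka:9093"], ssl: true }).admin();

    // Allow only the FHIR-routing service account to read the appointments topic.
    export async function grantConsumerAcl(): Promise<void> {
      await admin.connect();
      await admin.createAcls({
        acl: [
          {
            resourceType: AclResourceTypes.TOPIC,
            resourceName: "patient-appointments",
            resourcePatternType: ResourcePatternTypes.LITERAL,
            principal: "User:fhir-router",
            host: "*",
            operation: AclOperationTypes.READ,
            permissionType: AclPermissionTypes.ALLOW,
          },
        ],
      });
      await admin.disconnect();
    }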

2.3 Network Segmentation & Isolation

  • Kubernetes Namespaces:

    • We isolate production workloads and dev/test workloads into distinct namespaces and/or clusters.

  • Dedicated VPC:

    • Our Kafka infrastructure resides in a dedicated virtual network. Traffic is restricted by firewall rules or private endpoints.

  • On-Prem Connectivity:

    • For on-prem data sources, we may use VPN or mTLS over the public internet, depending on site capabilities.

2.4 Compliance Alignment

  • HIPAA:

    • Since the data may contain PHI, HeyDonto enforces HIPAA-compliant encryption in transit and at rest, and signs BAAs as needed.

  • GDPR:

    • If EU data is processed, relevant GDPR data protection measures (e.g., data subject rights, data minimization) are observed.

3. Kafka Topic Management & Data Retention

3.1 Topic Naming & Partitioning

  • Descriptive Topic Names:

    • We use clear conventions (e.g., patient-appointments, record-updates) to identify data domains.

  • Tenant Partitioning:

    • In multi-tenant scenarios, we may create separate partitions or distinct topics per dental office to streamline data isolation.
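
  A minimal sketch of tenant keying, assuming KafkaJS and an illustrative topic and office ID: keying each message by office keeps a given tenant's records on a single partition, which also preserves per-office ordering.

    // Illustrative sketch: topic name and office ID are assumptions.
    import { Kafka } from "kafkajs";

    const producer = new Kafka({ clientId: "sync", brokers: ["kafka:9093"], ssl: true }).producer();

    export async function publishForOffice(officeId: string, record: object): Promise<void> {
      await producer.connect();
      await producer.send({
        topic: "record-updates",
        // Kafka's default partitioner hashes the key, so the same office
        // always lands on the same partition.
        messages: [{ key: officeId, value: JSON.stringify(record) }],
      });
    }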

3.2 Retention & Purging

  • Minimal Retention Windows:

    • Kafka typically retains messages for 24–72 hours—long enough for reprocessing if a consumer fails.

    • Retaining PHI in Kafka for extended periods is avoided.

  • Automated Purging:

    • Once data is confirmed in the FHIR datastore, older Kafka messages are automatically removed per retention policy.
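
  The retention window is set per topic at creation time. The sketch below, assuming KafkaJS and an illustrative topic name and sizing, applies a 72-hour retention.ms so that expired segments are purged automatically by the broker.

    // Illustrative sketch: topic name, partition count, and replication factor are assumptions.
    import { Kafka } from "kafkajs";

    const admin = new Kafka({ clientId: "topic-admin", brokers: ["kafka:9093"], ssl: true }).admin();

    export async function createAppointmentsTopic(): Promise<void> {
      await admin.connect();
      await admin.createTopics({
        topics: [
          {
            topic: "patient-appointments",
            numPartitions: 6,
            replicationFactor: 3,
            configEntries: [
              { name: "retention.ms", value: String(72 * 60 * 60 * 1000) }, // 72 hours
              { name: "cleanup.policy", value: "delete" },                  // purge, do not compact
            ],
          },
        ],
      });
      await admin.disconnect();
    }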

4. Observability & Monitoring

4.1 Logging

  • Google Cloud Logging:

    • All Kubernetes, Kafka, and microservice logs are collected and aggregated via GCP Logging.

    • Standard logging levels (INFO, WARN, ERROR) are used to separate routine events from anomalies.
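
  A minimal sketch of this logging convention, assuming the service runs on GKE, where single-line JSON written to stdout is parsed by the logging agent and its severity field maps onto Cloud Logging levels (Cloud Logging uses WARNING rather than WARN):

    // Illustrative sketch: field names beyond "severity" and "message" are assumptions.
    type Severity = "DEBUG" | "INFO" | "WARNING" | "ERROR";

    function log(severity: Severity, message: string, fields: Record<string, unknown> = {}): void {
      // One JSON object per line on stdout; the GKE logging agent forwards it
      // to Cloud Logging with the severity applied.
      console.log(JSON.stringify({ severity, message, ...fields }));
    }

    // Routine event vs. anomaly, separated by level as described above.
    log("INFO", "sync batch completed", { topic: "patient-appointments", records: 42 });
    log("ERROR", "FHIR store write failed", { resourceType: "Appointment", attempt: 3 });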

4.2 Metrics & Dashboards

  1. Prometheus (Optional)
    • If deployed, Prometheus scrapes metrics from microservices and Kafka exporters, storing them for real-time analysis (a metrics-endpoint sketch follows this list).

  2. Grafana Dashboards
    • Grafana can be installed in Kubernetes to visualize metrics (from Prometheus or GCP Monitoring).

    • User-Facing Dashboards: Our React-based admin portal can embed or link to Grafana panels, allowing clinics or API owners to see high-level integration metrics (e.g., last sync time, record counts, error rates).
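
  The sketch referenced in item 1 shows how a microservice might expose a /metrics endpoint with the prom-client library; the metric name, label, and port are assumptions. Grafana (or GCP Monitoring) can then chart these series on the dashboards described in item 2.

    // Illustrative sketch: metric name, label, and port are assumptions.
    import * as http from "http";
    import { Registry, Counter, collectDefaultMetrics } from "prom-client";

    const register = new Registry();
    collectDefaultMetrics({ register }); // CPU, memory, event-loop lag, etc.

    // Example business metric a Grafana panel could chart per clinic.
    const syncedRecords = new Counter({
      name: "heydonto_synced_records_total",
      help: "Records written to the FHIR store",
      labelNames: ["office_id"],
      registers: [register],
    });
    syncedRecords.inc({ office_id: "office-123" }, 1);

    // Expose /metrics for the Prometheus scraper.
    http
      .createServer(async (_req, res) => {
        res.setHeader("Content-Type", register.contentType);
        res.end(await register.metrics());
      })
      .listen(9464);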

4.3 Alerts & Incident Response

  • Alerts:

    • We configure alerting thresholds (e.g., Kafka consumer lag, microservice error rates) in Prometheus or GCP Monitoring. Alerts can route to PagerDuty, Slack, or email.

  • Incident Response Plan:

    • In case of major outages or security events, we follow our standardized plan (escalation paths, forensic logging, timely notifications).

5. Operational Best Practices

5.1 Kubernetes Deployment

  • Immutable Deployments:

    • We use container images built through CI/CD pipelines. Configuration is stored in version control, ensuring reproducibility.

  • Namespace & Secret Management:

    • Sensitive credentials (e.g., Kafka SASL keys, TLS certs) are kept in Kubernetes Secrets with restricted RBAC access.
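
  As one app-side illustration, assuming hypothetical environment variable names injected from a Kubernetes Secret (for example via secretKeyRef), a service can fail fast at startup if its Kafka credentials are missing:

    // Illustrative sketch: the variable names are assumptions; in our deployments they
    // would be injected from a Kubernetes Secret rather than baked into the image.
    function requireEnv(name: string): string {
      const value = process.env[name];
      if (!value) {
        // Failing fast beats running with partial credentials.
        throw new Error(`missing required environment variable: ${name}`);
      }
      return value;
    }

    export const kafkaSasl = {
      mechanism: "scram-sha-512" as const,
      username: requireEnv("KAFKA_SASL_USERNAME"),
      password: requireEnv("KAFKA_SASL_PASSWORD"),
    };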

5.2 High Availability & Disaster Recovery

  • Redundant Kafka Instances:

    • We run Kafka brokers across multiple Availability Zones for failover.

  • Backups:

    • Kafka cluster configuration and ZooKeeper state (if applicable) are backed up regularly.

  • Periodic Restoration Testing:

    • We test the restore process to ensure data can be recovered in the event of a cluster-level failure.

5.3 Versioning & Updates

  • Kafka & Microservices:

    • Updates are tested in staging before production rollout.

    • Zero-downtime deployments are achieved via Kubernetes rolling updates or blue-green strategies.

  • Continuous Improvement:

    • The Data Transition Policy is reviewed semi-annually (or upon major architecture changes) to reflect the latest security, compliance, and performance best practices.

6. Summary & Key Takeaways

  1. Event-Driven & Scalable 
    • Kafka underpins our architecture, ensuring near-real-time data flow from on-prem or cloud-based EHRs into our cloud environment.

  2. Security & Compliance First 
    • TLS, RBAC, minimal data retention, and alignment with HIPAA/GDPR are cornerstones of our strategy.

  3. Kubernetes for Orchestration 
    • All microservices run on Kubernetes, with cloud-native practices (immutable deployments, secrets management) enhancing reliability.

  4. Observability 
    • Google Cloud Logging provides centralized logs; Prometheus (optional) and Grafana deliver real-time dashboards and alerting.

    • Clinics or external API owners can view sync metrics via embedded Grafana panels in our React admin portal.

  5. Ongoing Review 
    • We regularly revisit this policy to adapt to new regulations, technologies, and user needs.

By following this Data Transition Policy, HeyDonto ensures that patient and appointment data is transferred securely, efficiently, and in a compliant manner—laying the groundwork for reliable synchronization and standardized FHIR storage, while preserving privacy and integrity at each step.