OBJECTIVE: To determine the sensitivity and positive predictive value (PPV) of computerized diagnostic data from health maintenance organizations (HMOs) in identifying incident breast cancer cases. STUDY DESIGN: An HMO without a cancer registry developed an algorithm that identifies incident breast cancer cases from computerized diagnostic codes. Two other HMO sites with Surveillance, Epidemiology, and End Results (SEER) registries duplicated this case-identification approach. Using the SEER registries as the criterion standard, we determined the sensitivity and PPV of the computerized data. METHODS: Data were collected from HMO computerized databases between January 1, 1996, and December 31, 1999. SEER data were also used. RESULTS: The overall sensitivity of the HMO databases ranged from 0.92 (95% confidence interval [CI], 0.91-0.96) to 0.99 (95% CI, 0.98-0.99). Sensitivity was high (range, 0.94-0.98) for the first 3 of the 4 years and dropped slightly (range, 0.81-0.94) in the last year. The overall PPV ranged from 0.34 (95% CI, 0.32-0.35) to 0.44 (95% CI, 0.42-0.46). Positive predictive value rose sharply from the first year (range, 0.18-0.20) to 0.83 and 0.92 in the last year because prevalent cases were excluded. Review of a random sample of 50 cases identified in the computerized databases but not by SEER data indicated that, while SEER usually identified the cases, the registry did not associate every case with the health plan. CONCLUSIONS: Health maintenance organization computerized databases were highly sensitive for identifying incident breast cancer cases, but PPV was low in the initial year because the systems did not differentiate between prevalent and incident cases. Health maintenance organizations depending solely on SEER data for cancer case identification will miss a small percentage of cases.
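
ILLUSTRATION: The two measures reported above can be sketched as a comparison of case sets, with the registry treated as the criterion standard. This is a minimal sketch and not the study's actual algorithm: the function names, the patient identifiers, and the use of a Wilson score interval for the 95% CI are all assumptions, since the abstract does not state how the intervals were computed.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion.

    Assumption: the abstract does not name a CI method; Wilson is used
    here only as a common choice for proportions.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def sensitivity_and_ppv(flagged_ids, registry_ids):
    """Compare algorithm-flagged cases with registry (criterion standard) cases.

    sensitivity = registry cases that were flagged / all registry cases
    PPV         = flagged cases found in the registry / all flagged cases
    """
    flagged = set(flagged_ids)
    registry = set(registry_ids)
    true_pos = len(flagged & registry)
    return {
        "sensitivity": true_pos / len(registry),
        "sensitivity_ci": wilson_ci(true_pos, len(registry)),
        "ppv": true_pos / len(flagged),
        "ppv_ci": wilson_ci(true_pos, len(flagged)),
    }


if __name__ == "__main__":
    # Hypothetical identifiers, not study data.
    registry_cases = range(1, 11)   # 10 incident cases in the registry
    flagged_cases = range(2, 22)    # 20 flagged: 9 in registry, 11 not
    print(sensitivity_and_ppv(flagged_cases, registry_cases))
```

In this toy example most of the flagged cases are absent from the registry, so PPV is low while sensitivity stays high, which mirrors the first-year pattern the abstract attributes to prevalent cases being flagged alongside incident ones.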