Trying to learn about ADX/DXF, quick questions...

Have spent a few hours this week looking through what’s on the web about DXF, and congratulations to you all, as lots can be inferred from what’s there.

I’m spending time to look in particular at PEPFAR’s DATIM web presence.

Please correct anything I write here, just trying to learn how the message structure works.

It looks as if each unique “row” within a message is a combination of a “data element” and a “category option combo”.

“Category option combos” are one or more pre-coordinated disaggregates. For example, “male” is a valid answer, and “males, 0-5” is also a valid answer. Each of these combos is uniquely identified with a UID.
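A minimal Python sketch of that reading, where a row is identified by the pair (data element, category option combo). The `DataValue` type, the `row_key` helper and the values 12 and 7 are made up for illustration; only the codes and UIDs are taken from the DATIM page quoted further down.

```python
from typing import NamedTuple, Tuple


class DataValue(NamedTuple):
    data_element: str           # data element code, as listed on the DATIM page
    category_option_combo: str  # UID of the pre-coordinated disaggregate combo
    value: int                  # reported count (made up here)


rows = [
    DataValue("HTC_TST_N_DSD_Age", "MTqU4a2fG8c", 12),
    DataValue("FN_THER_N_DSD_Age_Sex", "Bz0GWknldhe", 7),
]


def row_key(row: DataValue) -> Tuple[str, str]:
    """The pair that identifies a row within one message."""
    return (row.data_element, row.category_option_combo)


keys = [row_key(r) for r in rows]
# Under this reading, no two rows in a message share the same key.
assert len(keys) == len(set(keys))
```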

Who is the central authority for a given set of category option combo “options” for a given report? For example, the MER report has a number of potential disaggregate options… is PEPFAR the central authority, or is “male” for all time a unique category option combo for all reports generated in DXF/ADX format?

Perhaps the detail is missing, but when I look at this page: https://www.datim.org/api/sqlViews/GN3OuQmFfd8/data.html+css

…what I see is a collection of data element / category option combo pairs, seemingly one for each indicator (which would match a cell in a report back table).

But then I see cases where a category option combo looks to be the same (like 0 - 4), but it has more than one UID represented within the page:

HTC_TST (N, DSD, Age): HTC received results | HTC_TST_N_DSD_Age | (1-4) | MTqU4a2fG8c
FN_THER (N, DSD, Age/Sex): Undernourished PLHIV fed | FN_THER_N_DSD_Age_Sex | (1-4) | Bz0GWknldhe

Is the second one just a mistake (as the named disaggregates are age and sex, but the categoryoptioncombo_name doesn’t reflect both)?

Then I see things like:

FN_ASSESS (N, DSD, Age/Sex): PLHIV nutrionally assessed | FN_ASSESS_N_DSD_Age_Sex | (1-4) | Bz0GWknldhe
FN_ASSESS (N, DSD, Age): PLHIV nutrionally assessed | FN_ASSESS_N_DSD_Age | (1-4) | Bz0GWknldhe

…and I note different named disaggregates share the same UID.

Is this just an error in the coding of the indicators on the page, or am I fundamentally misunderstanding how category option combos are being given UIDs?

Thanks,

-Paul

Hi Paul

I am not very familiar with the particular data and the link you presented is no longer working. So I can only attempt to answer some of your questions:

  1. There is no hard coded set of categoryoptioncombos, nor a global central authority. So PEPFAR would create and govern these within their “universe”.

  2. A dataelement is associated with a categorycombo, like (Age) or (Age,Sex). So two dataelements with different categorycombos shouldn’t share the same categoryoptioncombo UIDs. I would therefore draw the same conclusion as you for the first example - there is some mistake in the second categoryoptioncombo name.

  3. The second example you give is mysterious to me. There seems to be something wrong with this metadata.

I am not greatly in favour of trying to percolate this categoryoptioncombo model (what I call McDonalds) through to ADX. Rather we will probably leave disaggregation out of scope for first iteration.

Bob

You received this message because you are subscribed to the Google Groups “Open HMIS” group.

To unsubscribe from this group and stop receiving emails from it, send an email to open-hmis+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Hi Paul,

Good questions. I think I can shed a bit more light. Your link works for me, by the way.

Knowing the “category combination” for each data element can help explain why the category option combinations are the way they are:

HTC_TST (N, DSD, Age) has category combination “Age (0-50+,10)” (shorthand for ages from 0 to 50+ split into 10 ranges). FN_THER (N, DSD, Age/Sex) should have a combination of age and sex as you point out, but it doesn’t. I’ll report this as a problem. What it does have is a category combination of “Age (0-18+, 6)”, which is a different set of age ranges and therefore generates a different set of “combinations”. The actual age value (category option) in both cases is “(1-4)”, and this is in fact represented internally by a single object. But for each distinct category combination, a different set of “combination” values is generated.

FN_ASSESS (N, DSD, Age/Sex) has a category combination of “Age (0-18+, 6)” (another apparent misconfiguration that I will report). And FN_ASSESS (N, DSD, Age) has the same category combination of “Age (0-18+, 6)”. So they share the same category option combos, and therefore the same UIDs.
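As a rough sketch of why this happens, the snippet below enumerates option combos per category combination as a cross product of options. It is only an illustration of the idea, not DHIS 2 code: the `coc-N` ids are made up (real UIDs look like `MTqU4a2fG8c`), the option lists are truncated, and every option label other than “(1-4)” is a placeholder.

```python
from itertools import count, product

_next_id = count(1)


def generate_option_combos(categories):
    """Enumerate one combo per choice of one option from each category."""
    return {
        options: f"coc-{next(_next_id)}"  # made-up id, not a real DHIS 2 UID
        for options in product(*categories.values())
    }


# Two category combinations that both contain the shared option "(1-4)".
# Lists are truncated; "(<1)" and "(5-14)" are placeholder labels.
age_0_50 = generate_option_combos({"age": ["(<1)", "(1-4)"]})
age_0_18 = generate_option_combos({"age": ["(1-4)", "(5-14)"]})

# Each data element points at one category combination.
combos_by_data_element = {
    "HTC_TST_N_DSD_Age": age_0_50,
    "FN_ASSESS_N_DSD_Age": age_0_18,
    "FN_ASSESS_N_DSD_Age_Sex": age_0_18,  # the apparent misconfiguration
}

for code, combos in combos_by_data_element.items():
    print(code, "(1-4) ->", combos[("(1-4)",)])
```

The same option “(1-4)” ends up with a different combo id under each category combination, while the two FN_ASSESS data elements, which point at the same category combination, share one.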

A DHIS 2 Web API client could navigate all this as follows: For each data element, find the category combo. From the category combo, find the enumerated category option combos. For each option combo, find all the options in that combo. For the disaggregations you have, find the right category option combo, and use that combo to transmit the data.
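The lookup steps can be sketched in Python against in-memory metadata. This is a sketch under assumptions: a real client would fetch these structures from the Web API, which is not shown here; the dictionary layout, the `find_option_combo` helper and the keys `age_0_50` / `age_0_18` are hypothetical; only the data element codes, the option “(1-4)” and the two UIDs come from this thread.

```python
# data element code -> category combo (hypothetical keys)
DATA_ELEMENTS = {
    "HTC_TST_N_DSD_Age": "age_0_50",
    "FN_ASSESS_N_DSD_Age": "age_0_18",
}

# category combo -> enumerated category option combo UIDs (truncated)
CATEGORY_COMBOS = {
    "age_0_50": ["MTqU4a2fG8c"],
    "age_0_18": ["Bz0GWknldhe"],
}

# category option combo UID -> the options it is made of
OPTION_COMBOS = {
    "MTqU4a2fG8c": {"(1-4)"},
    "Bz0GWknldhe": {"(1-4)"},
}


def find_option_combo(data_element, options):
    """Return the option combo UID to send for this data element and options."""
    category_combo = DATA_ELEMENTS[data_element]       # step 1
    for coc_uid in CATEGORY_COMBOS[category_combo]:    # step 2
        if OPTION_COMBOS[coc_uid] == set(options):     # steps 3 and 4
            return coc_uid
    raise LookupError(f"no option combo for {data_element} with {options}")


print(find_option_combo("HTC_TST_N_DSD_Age", ["(1-4)"]))
print(find_option_combo("FN_ASSESS_N_DSD_Age", ["(1-4)"]))
```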

The important semantic things are the category options like “(1-4)”. When you do analysis in DHIS 2, for example, that is what you see. The category option combos are more of an artifact of how several of these options are specified together, both for internal storage and for data transfer. It’s a part of the design that I’ve heard is confusing to many, and causes some maintenance problems. There are some discussions about whether it could be improved – Bob and I talk about this with some regularity for example. But I hope this email helps to clarify how it currently works.

Cheers,

Jim

Hi Bob and Jim, thanks a lot for the detailed feedback.

Jim, your descriptions (and corrections) validate my understanding on category option combos, so thanks a lot.

Before I go too much further in asking questions, I want to verify something really quickly, Bob. You said:

“I am not greatly in favour of trying to percolate this categoryoptioncombo model (what I call McDonalds) through to ADX. Rather we will probably leave disaggregation out of scope for first iteration.”

I could imagine you going a couple of ways with the first iteration. You could choose to “pre-coordinate” the data element and categoryoptioncombo into a single “indicator” or something of that ilk: so instead of “HIV positive and on antiretrovirals” + “0 - 5, male”, you could have “# of patients 0-5 & male who are HIV positive and on antiretrovirals” as a single descriptor.

You could also do some form of a post-coordination (ANDing a data element with some form of description of disaggregates different than category option combo).

Which direction are you choosing to go down for iteration 1? What do you imagine the ultimate specification to look like in this regard?

I’m asking these questions, as I’m trying to prepare my mind for how we’d validate incoming messages at scale. Thanks in advance for the education. :)

-Paul

Hi Paul

Good questions - which have been circulating around for a LONG time. Here’s my opinion, and it’s only my opinion at this stage. There are 3 possibilities:

  1. Retain the categoryoptioncombo attribute (but call it something more sensible like ‘disaggregation’). That’s the least painful in terms of supporting current implementations. The downside is that it is an ugly and fragile lattice to move between systems. For example, while it’s reasonable for 3rd party systems to have and maintain a shared understanding of codes like SEX={Male, Female} AGE_GROUP={under5, 5 and over} or what have you, dealing with their combinations is another matter.

  2. Collapse into atomic dataelements as you have suggested. This is not actually a bad approach and is the way it was historically with the dhisv1 model. Unfortunately there is already too much dhis2 legacy built on top of the existing dimensional dataelements for this to have much traction.

  3. Explode the disaggregation into its constituent parts for external consumption. The McDonalds combos are then hidden and become an internal detail of dhis2. So for example we would have datavalues like:

instead of

Note I have just stuck in labels, but these could be coded values. I am very much in favour of the last of the 3 approaches. There is a downside: whereas 1 or 2 are easily expressible in a standard like SDMX, (3) would require us to take some liberties, as the data will be ragged in nature. But this would be the easiest to implement and to map to 3rd party systems which output dimensional data (for example an openmrs cohort report). It would also be easy to link with 3rd party vocabulary providers - terminology services and the like.
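As a rough illustration of approach 3 (a sketch, not the markup from this message), the Python below swaps an opaque combo reference for its constituent disaggregates. The combo ids, the `COMBO_OPTIONS` table, the `explode` helper, the field names and the values are all hypothetical; the SEX and AGE_GROUP labels follow the codes mentioned under approach 1.

```python
# Hypothetical lookup from a combo id to the options it bundles together.
COMBO_OPTIONS = {
    "coc-male-under5": {"SEX": "Male", "AGE_GROUP": "under5"},
    "coc-under5": {"AGE_GROUP": "under5"},
}


def explode(row):
    """Replace the combo reference with one field per disaggregate."""
    exploded = {k: v for k, v in row.items() if k != "categoryOptionCombo"}
    exploded.update(COMBO_OPTIONS[row["categoryOptionCombo"]])
    return exploded


combo_rows = [
    {"dataElement": "HIV patients on ARVs",
     "categoryOptionCombo": "coc-male-under5", "value": 10},
    {"dataElement": "HIV patients on ARVs",
     "categoryOptionCombo": "coc-under5", "value": 25},
]

exploded_rows = [explode(r) for r in combo_rows]
for r in exploded_rows:
    print(r)
```

The second exploded row carries no SEX field at all, which is the ragged shape mentioned above: different rows can carry different sets of dimensions.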

There is also the possibility of an incremental path, starting with something based on 1 and moving towards 3.

Regards

Bob

Very, very clear Bob. Thank you.

#3 seems good to me as well, as you’re simply moving to a wholly “post-coordinated” model, instead of a partial strategy. Easier to comprehend.

So instead of:

“0-5, male” + “HIV patients on ARVs”

you do:

“0-5” + “male” + “HIV patients on ARVs”

So, in essence… the new model is: data element + 1-to-n disaggregates. Each unique row is the combination of at least 1 data element + at least one disaggregate (which can be ALL or default).

Then, validation of the codes would be done at the singleton disaggregate level as well as validation of each of the data elements.
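A minimal Python sketch of that kind of validation, assuming each incoming row carries one data element plus a mapping of dimension to disaggregate code. The code lists, the row layout and the `validate_row` helper are invented for illustration and are not part of any ADX or DHIS 2 specification.

```python
# Hypothetical code lists; a receiver would hold these per dimension.
KNOWN_DATA_ELEMENTS = {"HIV patients on ARVs"}
KNOWN_DISAGGREGATES = {
    "age": {"0-5", "ALL"},
    "sex": {"male", "female", "ALL"},
}


def validate_row(row):
    """Check the data element and each disaggregate code on its own."""
    errors = []
    if row["data_element"] not in KNOWN_DATA_ELEMENTS:
        errors.append(f"unknown data element: {row['data_element']}")
    for dimension, code in row["disaggregates"].items():
        allowed = KNOWN_DISAGGREGATES.get(dimension)
        if allowed is None:
            errors.append(f"unknown dimension: {dimension}")
        elif code not in allowed:
            errors.append(f"unknown {dimension} code: {code}")
    return errors


good = {"data_element": "HIV patients on ARVs",
        "disaggregates": {"age": "0-5", "sex": "male"}}
bad = {"data_element": "HIV patients on ARVs",
       "disaggregates": {"age": "0-5", "sex": "unknown"}}

print(validate_row(good))
print(validate_row(bad))
```

Because every code is checked singly, the receiver never needs a table of valid combinations, only one code list per dimension.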

Do I have this right?

-Paul

···

On Fri, Jan 16, 2015 at 10:49 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Paul

Good questions - which have been circulating around for a LONG time. Here’s my opinion, and its only my opinion at this stage. There are 3 possibilities:

  1. Retain the categoryoptioncombo attribute (but call it something more sensible like ‘disaggregation’). That’s the least painful in terms of supporting current implementations. The downside is that it is an ugly and fragile lattice to move between systems. For example, while its reasonable for 3rd party systems to have and maintain a shared understanding of codes like SEX={Male, Female} AGE_GROUP={under5, 5 and over} or what have you, dealing with their combinations is another matter.
  1. Collapse into atomic dataelements as you have suggested. This is not actually a bad approach and is the way it was historically with the dhisv1 model. Unfortunately there is already too much dhis2 legacy buiilt on top of the existing dimensional dataelements for this to have much traction.
  1. Explode the disaggregation into its constituent parts for external consumption. The McDonalds combos are then hidden and become an internal detail of dhis2. So for example we would have datavalues like:

instread of

Note I have just stuck in labels but these could be coded values. I am very much in favour of the last of the 3 approaches. There is a downside that whereas 1 or 2 are easily expressable in a standard like SDMX, (3) would require us to take some liberties as the data will be ragged in nature. But this would be the easiest to implement and map to 3rd party systems which output dimensional data (for example an openmrs cohort report). Also easy to link with 3rd party vocabulary providers - terminology services and the like.

There is also the possibility of an incremental path. Starting with something based on 1 and moving towards 3.

Regards

Bob

You received this message because you are subscribed to the Google Groups “Open HMIS” group.

To unsubscribe from this group and stop receiving emails from it, send an email to open-hmis+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

On 16 January 2015 at 15:26, Paul Biondich pbiondic@regenstrief.org wrote:

Hi Bob and Jim, thanks a lot for the detailed feedback.

Jim, your descriptions (and corrections) validate my understanding on category option combos, so thanks a lot.

Before I go to much further in asking questions, I want to verify something really quickly Bob. You said:

I am not greatly in favour of trying to percolate this categoryoptioncombo model (what I call McDonalds) through to ADX. Rather we will probably leave disaggregation out of scope for first iteration.

I could imagine you going a couple of ways with the first iteration. You could choose to “pre-coordinate” the data element and categoryoptioncombo into a single “indicator” or something of that ilk: so instead of “HIV positive and on antiretrovirals” + “0 - 5, male”, you could have “# of patients 0-5 & male who are HIV positive and on antiretrovirals” as a single descriptor.

You could also do some form of a post-coordination (ANDing a data element with some form of description of disaggregates different than category option combo).

Which direction are you choosing to go down for iteration 1? What do you imagine the ultimate specification to look like in this regard?

I’m asking these questions, as I’m trying to prepare my mind for how we’d validate incoming messages at scale. Thanks in advance for the education. :slight_smile:

-Paul

Knowing the “category combination” for each data element can help explain why the category option combinations are the way they are:

HTC_TST (N, DSD, Age) has category combination “Age (0-50+,10)” (which is a shorthand to say ages from 0 to 50+ split into 10 ranges.) FN_THER (N, DSD, Age/Sex) should have a combination of age and sex as you point out, but it doesn’t. I’ll report this as a problem. What it does have is a combination of “Age (0-18+, 6)” which is a different set of age ranges and therefore generates a different set of “combinations”. The actual age value (category option) in both cases is “(1-4)”, and this in fact is represented internally by a single object. But for each distinct category combination, a different set of “combination” values are generated.

FN_ASSESS (N, DSD, Age/Sex) has a category combination of “Age (0-18+, 6)” (another apparent misconfiguration that I will report). And FN_ASSESS (N, DSD, Age) has the same category combination of “Age (0-18+, 6)”. So they will share the same category option combination codes.
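A small Python sketch may help picture how each category combination produces its own set of option combinations. The option labels other than “(1-4)” are made up, and `option_combos` is a hypothetical helper, not how DHIS 2 is actually implemented.

```python
from itertools import product


def option_combos(category_combo):
    """Enumerate every combination of options, taking one option per category."""
    names = list(category_combo)
    return [dict(zip(names, choice)) for choice in product(*category_combo.values())]


# Two different category combinations that happen to share the option "(1-4)".
age_only = {"Age": ["(<1)", "(1-4)", "(5-14)"]}
age_sex = {"Age": ["(<1)", "(1-4)", "(5-14)"], "Sex": ["Male", "Female"]}

print(len(option_combos(age_only)))  # one combo per age range
print(len(option_combos(age_sex)))   # one combo per (age range, sex) pair
```

The option “(1-4)” is the same object in both cases, but the combinations built from it belong to their category combination, which is why the same label can show up under more than one UID.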

A DHIS 2 Web API client could navigate all this as follows: For each data element, find the category combo. From the category combo, find the enumerated category option combos. For each option combo, find all the options in that combo. For the disaggregations you have, find the right category option combo, and use that combo to transmit the data.
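The navigation just described could be sketched roughly like this in Python. The dictionaries are hand-written stand-ins for metadata a client would have fetched already; the identifiers `cc_age`, `coc_1`, `coc_2` and the function `find_option_combo` are hypothetical, and no real Web API calls are shown.

```python
# Hypothetical metadata, as if already downloaded by a client.
DATA_ELEMENTS = {"HTC_TST_N_DSD_Age": {"categoryCombo": "cc_age"}}
CATEGORY_COMBOS = {"cc_age": {"optionCombos": ["coc_1", "coc_2"]}}
OPTION_COMBOS = {
    "coc_1": {"options": {"(<1)"}},
    "coc_2": {"options": {"(1-4)"}},
}


def find_option_combo(data_element, wanted_options):
    """Walk data element -> category combo -> option combos -> options."""
    combo_id = DATA_ELEMENTS[data_element]["categoryCombo"]
    for coc_id in CATEGORY_COMBOS[combo_id]["optionCombos"]:
        # A combo matches when it holds exactly the options we want to report.
        if OPTION_COMBOS[coc_id]["options"] == set(wanted_options):
            return coc_id
    return None  # no combo exists for this disaggregation


print(find_option_combo("HTC_TST_N_DSD_Age", ["(1-4)"]))
```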

The important semantic things are the category options like “(1-4)”. When you do analysis in DHIS 2, for example, that is what you see. The category option combos are more of an artifact of how several of these options are specified together, both for internal storage and for data transfer. It’s a part of the design that I’ve heard is confusing to many, and causes some maintenance problems. There are some discussions about whether it could be improved – Bob and I talk about this with some regularity for example. But I hope this email helps to clarify how it currently works.

Cheers,

Jim

On Fri, Jan 16, 2015 at 4:13 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Paul

I am not very familiar with the particular data and the link you presented is no longer working. So I can only attempt to answer some of your questions:

  1. There is no hard coded set of categoryoptioncombos, nor a global central authority. So PEPFAR would create and govern these within their “universe”.
  1. A dataelement is associated with a categorycombo. Like (Age) or (Age,Sex). So two dataelements with different categorycombos shouldn’t share the same categoryoptioncombo uids. So I would draw the same conclusion as you for the first example - there is some mistake in the second categoryoptioncombo name.
  1. The second example you give is mysterious to me. There seems to be something wrong with this metadata.

I am not greatly in favour of trying to percolate this categoryoptioncombo model (what I call McDonalds) through to ADX. Rather we will probably leave disaggregation out of scope for first iteration.

Bob

One other comment: I’m not sure you want to go down the path of representing disaggregates uniquely as XML attributes:

In your example, SEX and AGE.

Given the large number of potential disaggregates, that makes for a messy specification, where you’re baking content into the spec.

Thinking through your technical conundrum however (and talking with Burke), you could represent model 3 in three ways:

  1. “tokenize” each of the disaggregates by using a fairly typical space as the delimiter between each:
  1. instead of calling out each disaggregate as an XML attribute, genericize them to disaggregate0-n:
  1. have a more generic model (though I’m not sure whether it’s allowed in SDMX):

HIV positive on ARVs

0-5

male

Bob’s Clinic

33

#1 you could do quickly for first iteration, and perhaps consider #3 if the spec will allow?

Hope this helps?

-Paul

On Fri, Jan 16, 2015 at 11:01 AM, Paul Biondich pbiondic@regenstrief.org wrote:

Very, very clear Bob. Thank you.

#3 seems good to me as well, as you’re simply moving to a wholly “post-coordinated” model, instead of a partial strategy. Easier to comprehend.

So instead of:

“0-5, male” + “HIV patients on ARVs”

you do:

“0-5” + “male” + “HIV patients on ARVs”

So, in essence… the new model is: data element + 1-to-n disaggregates. Each unique row is the combination of at least 1 data element + at least one disaggregate (which can be ALL or default).

Then, validation of the codes would be done at the singleton disaggregate level as well as validation of each of the data elements.
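As a rough illustration of validating at the singleton level, here is a minimal Python sketch. The code sets and the function `validate_row` are assumptions made up for this example; a real system would load its codes from its own metadata.

```python
# Made-up code sets, for illustration only.
DATA_ELEMENTS = {"HIV patients on ARVs"}
DISAGGREGATES = {"0-5", "male", "female", "ALL"}


def validate_row(data_element, disaggregates):
    """Return a list of problems; an empty list means the row is acceptable."""
    problems = []
    if data_element not in DATA_ELEMENTS:
        problems.append(f"unknown data element: {data_element}")
    if not disaggregates:
        problems.append("at least one disaggregate is required (use ALL or default)")
    # Each disaggregate is checked on its own, never as a combination.
    for code in disaggregates:
        if code not in DISAGGREGATES:
            problems.append(f"unknown disaggregate: {code}")
    return problems


print(validate_row("HIV patients on ARVs", ["0-5", "male"]))
print(validate_row("HIV patients on ARVs", ["0-5", "martian"]))
```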

Do I have this right?

-Paul

Thanks for this, Paul.

On 16 January 2015 at 16:25, Paul Biondich pbiondic@regenstrief.org wrote:

One other comment: I’m not sure you want to go down the path of representing disaggregates uniquely as XML attributes:

In your example, SEX and AGE.

Given the large number of potential disaggregates, that makes for a messy specification, where you’re baking content into the spec.

Not sure if I completely agree. The requirements of a typical facility-level monthly report are messy anyway. I’d rather try to be explicit about it than try to kick the validation problem down the road.

Note also that a strict W3C XSD or ISO RelaxNG schema is only one (imperfect) species of validation mechanism. It is quite possible to have a relatively permissive schema and to impose your validation constraints over and above that, e.g. using schematron constraints and/or application-level validation.

Or, to follow the sdmx route, we profile a meta schema which dictates that there are 3 mandatory concepts (dataelement, period, orgunit) and provides a mechanism for specifying additional ones, rather than baking them in. And in practice there are probably not as many of these common dimensions as you might think. The ADX spec could specify the well known ones like SEX, AGE_GROUP, ICD10_CODE etc. as a non-exhaustive list, without being prescriptive about the underlying codelists.
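To illustrate the idea of a permissive structure with constraints layered on top, here is a minimal Python sketch of an application-level check. It is only a sketch under assumptions: the function `check_datavalue`, the `value` key and the way extra dimensions are declared are hypothetical, not part of any ADX or SDMX specification.

```python
MANDATORY = ("dataelement", "period", "orgunit")
# Non-exhaustive list of well known dimensions; a profile could declare more.
WELL_KNOWN = {"SEX", "AGE_GROUP", "ICD10_CODE"}


def check_datavalue(attrs, declared_dimensions=WELL_KNOWN):
    """Check mandatory concepts, then flag any dimension that was never declared."""
    problems = [f"missing {concept}" for concept in MANDATORY if concept not in attrs]
    extras = set(attrs) - set(MANDATORY) - {"value"}
    for dimension in sorted(extras - set(declared_dimensions)):
        problems.append(f"undeclared dimension {dimension}")
    return problems


ok = {"dataelement": "HIV positive on ARVs", "period": "201403",
      "orgunit": "Bob's Clinic", "value": "33", "SEX": "male"}
print(check_datavalue(ok))
print(check_datavalue({"dataelement": "HIV positive on ARVs", "SEX": "male", "FOO": "1"}))
```

Note that the check says nothing about which codes are legal for SEX or AGE_GROUP; that stays with the codelists agreed between the parties.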

Thinking through your technical conundrum however (and talking with Burke), you could represent model 3 in three ways:

  1. “tokenize” each of the disaggregates by using a fairly typical space as the delimiter between each:

Your first option, tokenizing, is something I’ve considered before and is certainly doable. It maps well against the dhis2 data model and is also compliant with the letter (if not quite the spirit) of sdmx. A Content Creator would need to know the codes for ‘Male’ and ‘0-5’ but is not, and should not be, required to know how the Consumer codes this combination in its data store.

It is however poor XML to require parsing of the content in this way in order to process the document. For example it would make an xpath/xquery expression to return all the dataelements referring to Males in the set more difficult to write than it should be.
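The difference can be sketched in Python with the standard library's ElementTree. The element and attribute names (`group`, `dataValue`, `disaggregation`, `SEX`, `AGE`) are assumptions chosen for the example, not taken from any spec.

```python
import xml.etree.ElementTree as ET

# Disaggregates as explicit attributes.
EXPLICIT = """<group>
  <dataValue dataElement="HIV positive on ARVs" SEX="male" AGE="0-5" value="33"/>
  <dataValue dataElement="HIV positive on ARVs" SEX="female" AGE="0-5" value="41"/>
</group>"""

# Disaggregates tokenized into one space-delimited attribute.
TOKENIZED = """<group>
  <dataValue dataElement="HIV positive on ARVs" disaggregation="0-5 male" value="33"/>
  <dataValue dataElement="HIV positive on ARVs" disaggregation="0-5 female" value="41"/>
</group>"""

# With explicit attributes a plain path expression does the job.
males_explicit = ET.fromstring(EXPLICIT).findall("./dataValue[@SEX='male']")

# With tokenized content every value has to be split and inspected in code.
males_tokenized = [
    dv for dv in ET.fromstring(TOKENIZED).iter("dataValue")
    if "male" in dv.get("disaggregation").split()
]

print([dv.get("value") for dv in males_explicit])
print([dv.get("value") for dv in males_tokenized])
```

A naive substring test on the tokenized form would also match "female", which is the kind of trap that content parsing invites.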

There is also an implicit assumption here (and in your later suggestions) that you can deduce the concept from the instance. For example “0-5” being an age category. This in turn implies a constraint that all options across all codelists will be unique which is not a reasonable constraint. “0-5” could also be the correct answer to “how many beers do you have on a Friday night” :slight_smile: Within dhis2 currently this doesn’t present a problem, but it would be a problem if one were importing these from elsewhere.
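A tiny Python sketch of the problem, with two made-up codelists that both contain the code "0-5" (the codelist names and the helper `concepts_for` are hypothetical):

```python
# Made-up codelists; the same code appears in both.
CODELISTS = {
    "AGE_GROUP": {"0-5", "6-14"},
    "BEERS_ON_FRIDAY": {"0-5", "6+"},
}


def concepts_for(code):
    """Return every concept whose codelist contains the given code."""
    return sorted(name for name, codes in CODELISTS.items() if code in codes)


print(concepts_for("0-5"))  # more than one concept: the bare code is ambiguous
print(concepts_for("6+"))
```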

  1. instead of calling out each disaggregate as an XML attribute, genericize them to disaggregate0-n:
  1. have a more generic model (though I’m not sure whether it’s allowed in SDMX):

HIV positive on ARVs

0-5

male

Bob’s Clinic

33

#1 you could do quickly for first iteration, and perhaps consider #3 if the spec will allow?

Spec doesn’t allow 3, I think for the same good reason I discussed above with ‘0-5’ beers.

As you see, there is still some thinking to be done here, but it’s really very helpful to get all these suggestions on the table to pull apart and play off against each other.

Thanks

Bob

Hope this helps?

-Paul

You received this message because you are subscribed to the Google Groups “Open HMIS” group.

To unsubscribe from this group and stop receiving emails from it, send an email to open-hmis+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

On Fri, Jan 16, 2015 at 11:01 AM, Paul Biondich pbiondic@regenstrief.org wrote:

Very, very clear Bob. Thank you.

#3 seems good to me as well, as you’re simply moving to a wholly “post-coordinated” model, instead of a partial strategy. Easier to comprehend.

So instead of:

“0-5, male” + “HIV patients on ARVs”

you do:

“0-5” + “male” + “HIV patients on ARVs”

So, in essence… the new model is: data element + 1-to-n disaggregates. Each unique row is the combination of at least 1 data element + at least one desegregate (which can be ALL or default).

Then, validation of the codes would be done at the singleton disaggregate level as well as validation of each of the data elements.

Do I have this right?

-Paul

On Fri, Jan 16, 2015 at 10:49 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Paul

Good questions - which have been circulating around for a LONG time. Here’s my opinion, and its only my opinion at this stage. There are 3 possibilities:

  1. Retain the categoryoptioncombo attribute (but call it something more sensible like ‘disaggregation’). That’s the least painful in terms of supporting current implementations. The downside is that it is an ugly and fragile lattice to move between systems. For example, while its reasonable for 3rd party systems to have and maintain a shared understanding of codes like SEX={Male, Female} AGE_GROUP={under5, 5 and over} or what have you, dealing with their combinations is another matter.
  1. Collapse into atomic dataelements as you have suggested. This is not actually a bad approach and is the way it was historically with the dhisv1 model. Unfortunately there is already too much dhis2 legacy buiilt on top of the existing dimensional dataelements for this to have much traction.
  1. Explode the disaggregation into its constituent parts for external consumption. The McDonalds combos are then hidden and become an internal detail of dhis2. So for example we would have datavalues like:

instread of

Note I have just stuck in labels but these could be coded values. I am very much in favour of the last of the 3 approaches. There is a downside that whereas 1 or 2 are easily expressable in a standard like SDMX, (3) would require us to take some liberties as the data will be ragged in nature. But this would be the easiest to implement and map to 3rd party systems which output dimensional data (for example an openmrs cohort report). Also easy to link with 3rd party vocabulary providers - terminology services and the like.

There is also the possibility of an incremental path. Starting with something based on 1 and moving towards 3.

Regards

Bob

You received this message because you are subscribed to the Google Groups “Open HMIS” group.

To unsubscribe from this group and stop receiving emails from it, send an email to open-hmis+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

On 16 January 2015 at 15:26, Paul Biondich pbiondic@regenstrief.org wrote:

Hi Bob and Jim, thanks a lot for the detailed feedback.

Jim, your descriptions (and corrections) validate my understanding on category option combos, so thanks a lot.

Before I go to much further in asking questions, I want to verify something really quickly Bob. You said:

You received this message because you are subscribed to the Google Groups “Open HMIS” group.

To unsubscribe from this group and stop receiving emails from it, send an email to open-hmis+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

I am not greatly in favour of trying to percolate this categoryoptioncombo model (what I call McDonalds) through to ADX. Rather we will probably leave disaggregation out of scope for first iteration.

I could imagine you going a couple of ways with the first iteration. You could choose to “pre-coodinate” the data element and categoryoptioncombo into a single “indicator” or something of that ilk: so instead of “HIV positive and on antiretrovirals” + “0 - 5, male”, you could have “# of patients 0-5 & male who are HIV positive and on antiretrovirals” as a single descriptor.

You could also do some form of a post-coordination (ANDing a data element with some form of description of disaggregates different than category option combo).

Which direction are you choosing to go down for iteration 1? What do you imagine the ultimate specification to look like in this regard?

I’m asking these questions, as I’m trying to prepare my mind for how we’d validate incoming messages at scale. Thanks in advance for the education. :slight_smile:

-Paul

Knowing the “category combination” for each data element can help explain why the category option combinations are the way they are:

HTC_TST (N, DSD, Age) has category combination “Age (0-50+,10)” (which is a shorthand to say ages from 0 to 50+ split into 10 ranges.) FN_THER (N, DSD, Age/Sex) should have a combination of age and sex as you point out, but it doesn’t. I’ll report this as a problem. What it does have is a combination of “Age (0-18+, 6)” which is a different set of age ranges and therefore generates a different set of “combinations”. The actual age value (category option) in both cases is “(1-4)”, and this in fact is represented internally by a single object. But for each distinct category combination, a different set of “combination” values are generated.

FN_ASSESS (N, DSD, Age/Sex) has a category combination of “Age (0-18+, 6)” (another apparent misconfiguration that I will report.) And FN_ASSESS (N, DSD, Age) has the same category combination of “Age (0-18+, 6)”. So they will have the same category combination codes.

A DHIS 2 Web API client could navigate all this is as follows: For each data element, find the category combo. From the category combo, find the enumerated category option combos. For each option combo, find all the options in that combo. For the disaggregations you have, find the right category option combo, and use that combo to transmit the data.

The important semantic things are the category options like “(1-4)”. When you do analysis in DHIS 2, for example, that is what you see. The category option combos are more of an artifact of how several of these options are specified together, both for internal storage and for data transfer. It’s a part of the design that I’ve heard is confusing to many, and causes some maintenance problems. There are some discussions about whether it could be improved – Bob and I talk about this with some regularity for example. But I hope this email helps to clarify how it currently works.

Cheers,

Jim

You received this message because you are subscribed to the Google Groups “Open HMIS” group.

To unsubscribe from this group and stop receiving emails from it, send an email to open-hmis+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

On Fri, Jan 16, 2015 at 4:13 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Paul

I am not very familiar with the particular data and the link you presented is no longer working. So I can only attempt to answer some of your questions:

  1. There is no hard coded set of categoryoptioncombos, nor a global central authority. So PEPFAR would create and govern these within their “universe”.
  1. A dataelement is associated with a categorycombo. Like (Age) or (Age,Sex). So two dataelements with different categorycombos shouldn’t share the same categoryoptioncombo uids. So I would draw the same conclusion as you for the first example - there is some mistake in the second categoryoptioncombo name.
  1. The second example you give is mysterious to me. There seems something wrong with this metadata,

I am not greatly in favour of trying to percolate this categoryoptioncombo model (what I call McDonalds) through to ADX. Rather we will probably leave disaggregation out of scope for first iteration.

Bob

You received this message because you are subscribed to the Google Groups “Open HMIS” group.

To unsubscribe from this group and stop receiving emails from it, send an email to open-hmis+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

On 16 January 2015 at 03:40, Paul Biondich pbiondic@regenstrief.org wrote:

Have spent a few hours this week looking through what’s on the web about DXF, and congratulations to you all, as lots can be inferred from what’s there.

I’m spending time to look in particular at PEPFAR’s DATIM web presence.

Please correct anything I write here, just trying to learn how the message structure works.

It looks as if each unique “row” within a message is a combination of a “data element” and a “category option combo”.

“category option combos” are one or more pre-coordinated disaggregates. For example, “male” is a valid answer, and “males, 0-5” is also a valid answer. Each one of these combos are uniquely identified with a UID.

Who is the central authority for a given set of category option combo “options” for a given report. For example, the MER report has a number of potential disaggregate options… is PEPFAR the central authority, or is “male” for all times a unique category option combo for all reports generated in DXF/ADX format?

Perhaps the detail is missing, but when I look at this page: https://www.datim.org/api/sqlViews/GN3OuQmFfd8/data.html+css

…what I see are a collection of data element / category option combo pairs, one seemingly for each indicator (which would match a cell in a report back table.

But then I see cases where a category option combo looks to be the same (like 0 - 4), but it has more than one UID represented within the page:

| Data element | Short form | Category option combo | UID |
|---|---|---|---|
| HTC_TST (N, DSD, Age): HTC received results | HTC_TST_N_DSD_Age | (1-4) | MTqU4a2fG8c |
| FN_THER (N, DSD, Age/Sex): Undernourished PLHIV fed | FN_THER_N_DSD_Age_Sex | (1-4) | Bz0GWknldhe |

Is the second one just a mistake (as the named disaggregates are age and sex, but the categoryoptioncombo_name doesn’t reflect both)?

Then I see things like:

| Data element | Short form | Category option combo | UID |
|---|---|---|---|
| FN_ASSESS (N, DSD, Age/Sex): PLHIV nutrionally assessed | FN_ASSESS_N_DSD_Age_Sex | (1-4) | Bz0GWknldhe |
| FN_ASSESS (N, DSD, Age): PLHIV nutrionally assessed | FN_ASSESS_N_DSD_Age | (1-4) | Bz0GWknldhe |

…and I note different named disaggregates share the same UID.

Is this just an error in the coding of the indicators on the page, or am I fundamentally misunderstanding how category option combos are being given UIDs?

Thanks,

-Paul

Hi All,

Thank you for starting the discussion, it has been very informative to me as well.

I like Bob’s idea regarding as it allows the message to be understandable and distinctive. But, it has the drawback of having many @XMLAttributes as Paul mentioned.

On the other side Paul’s idea regarding

HIV positive on ARVs

0-5

male

Bob’s Clinic

33

is also a great option, as it avoids the need for many @XMLAttributes by including only what is required. But again, Bob’s point about it not being distinctive comes into play.

What about the following idea, which combines both and can be generic and distinctive? I am not sure how acceptable the format is, but it might be worth your consideration.

   <dataElement>HIV positive on ARVs</dataElement>
   <disaggregates>
         <disaggregate type="AGE" value="0-5"/>
         <disaggregate type="SEX" value="male"/>
   </disaggregates>
   <orgUnit>Bob's Clinic</orgUnit>
   <timePeriod>201403</timePeriod>
   <value>33</value>

In this case the disaggregates can be a list of disaggregate objects (having type and value as @XMLAttributes) in the dataValue object. Disaggregate objects can be added to the list as and when required.
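
To make the proposal a little more concrete, here is a minimal Python sketch of how a consumer might read such a fragment. The `<dataValue>` wrapper element and the helper name `read_data_value` are assumptions added for illustration (the fragment above has no root element), and none of this is taken from the ADX or DXF specifications.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: the thread only shows the child elements, so the
# <dataValue> root is an assumption made to get a well-formed document.
SAMPLE = """
<dataValue>
   <dataElement>HIV positive on ARVs</dataElement>
   <disaggregates>
         <disaggregate type="AGE" value="0-5"/>
         <disaggregate type="SEX" value="male"/>
   </disaggregates>
   <orgUnit>Bob's Clinic</orgUnit>
   <timePeriod>201403</timePeriod>
   <value>33</value>
</dataValue>
"""


def read_data_value(xml_text):
    """Return one data value as a plain dict, keyed by concept name."""
    root = ET.fromstring(xml_text)
    return {
        "dataElement": root.findtext("dataElement"),
        "orgUnit": root.findtext("orgUnit"),
        "timePeriod": root.findtext("timePeriod"),
        "value": int(root.findtext("value")),
        # Each disaggregate names its own type, so the reader never has to
        # guess which codelist a value such as "0-5" belongs to.
        "disaggregates": {
            d.get("type"): d.get("value") for d in root.iter("disaggregate")
        },
    }


record = read_data_value(SAMPLE)
print(record["disaggregates"])
```

Because the list is open-ended, a value with no disaggregation would simply yield an empty dict here, and a value with three or more would need no change to the reader.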

–Regards,

Sri Maurya Kummamuru

···

On Fri, Jan 16, 2015 at 12:46 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Thanks for this, Paul.

On 16 January 2015 at 16:25, Paul Biondich pbiondic@regenstrief.org wrote:

One other comment: I’m not sure you want to go down the path of representing disaggregates uniquely as XML attributes:

In your example, SEX and AGE.

Given the large number of potential disaggregates, that makes for a messy specification, where you’re baking content into the spec.

Not sure I completely agree. The requirements of a typical facility-level monthly report are messy anyway. I’d rather try to be explicit about it than try to kick the validation problem down the road.

Note also that strict W3C XSD or ISO RELAX NG schemas are only one (imperfect) species of validation mechanism. It is quite possible to have a relatively permissive schema and to impose your validation constraints over and above that, e.g. using Schematron constraints and/or application-level validation.

Or, to follow the SDMX route, we profile a meta schema which dictates that there are 3 mandatory concepts (dataelement, period, orgunit) and provides a mechanism for specifying additional ones, rather than baking them in. And in practice there are probably not as many of these common dimensions as you might think. The ADX spec could specify the well-known ones like SEX, AGE_GROUP, ICD10_CODE etc. as a non-exhaustive list, without being prescriptive about the underlying codelists.
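
As a rough illustration of the "permissive schema plus application-level validation" idea, the Python sketch below checks the three mandatory concepts and then validates any additional dimensions against per-deployment codelists. The dict layout, the `validate` function and the codelist contents are all hypothetical; the message above names the concepts but prescribes none of this.

```python
# The three mandatory concepts named in the message above.
MANDATORY = ("dataelement", "period", "orgunit")

# Hypothetical codelists; a real deployment would load its own.
CODELISTS = {
    "SEX": {"male", "female"},
    "AGE_GROUP": {"0-5", "5+"},
}


def validate(data_value, codelists=CODELISTS):
    """Return a list of problems; an empty list means the value passed."""
    problems = []
    for concept in MANDATORY:
        if not data_value.get(concept):
            problems.append(f"missing mandatory concept: {concept}")
    # Additional dimensions are optional, but each one present must use a
    # known dimension name and a code from that dimension's codelist.
    for dimension, code in data_value.get("dimensions", {}).items():
        allowed = codelists.get(dimension)
        if allowed is None:
            problems.append(f"unknown dimension: {dimension}")
        elif code not in allowed:
            problems.append(f"{code!r} is not in codelist {dimension}")
    return problems


good = {
    "dataelement": "HIV positive on ARVs",
    "period": "201403",
    "orgunit": "Bob's Clinic",
    "dimensions": {"SEX": "male", "AGE_GROUP": "0-5"},
}
bad = {
    "dataelement": "HIV positive on ARVs",
    "period": "201403",
    "dimensions": {"SEX": "0-5"},
}

print(validate(good))
print(validate(bad))
```

The point of the sketch is only that the message format itself stays generic while the codelists, which differ between implementations, are checked separately.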

Thinking through your technical conundrum however (and talking with Burke), you could represent model 3 in three ways:

  1. “tokenize” each of the disaggregates by using a fairly typical space as the delimiter between each:

Your first option, tokenizing, is something I’ve considered before and is certainly doable. It maps well against the dhis2 data model and is also compliant with the letter (if not quite the spirit) of SDMX. A Content Creator would need to know the codes for ‘Male’ and ‘0-5’ but is not, and should not be, required to know how the Consumer codes this combination in its data store.

It is, however, poor XML to require parsing of the content in this way in order to process the document. For example, it would make an xpath/xquery expression to return all the dataelements referring to Males in the set more difficult to write than it should be.

There is also an implicit assumption here (and in your later suggestions) that you can deduce the concept from the instance. For example “0-5” being an age category. This in turn implies a constraint that all options across all codelists will be unique which is not a reasonable constraint. “0-5” could also be the correct answer to “how many beers do you have on a Friday night” :slight_smile: Within dhis2 currently this doesn’t present a problem, but it would be a problem if one were importing these from elsewhere.
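
The querying difficulty can be sketched in Python. Both XML layouts below are hypothetical: the attribute names (`disaggregation`, `AGE`, `SEX`) and the short data element label are invented purely to contrast the two styles. With explicit attributes a single path expression selects the rows for males. With a tokenized attribute, the limited XPath subset in Python's `xml.etree.ElementTree` cannot express the match at all, so the content has to be split in code (full XPath 1.0 can do it, but only with the `contains(concat(' ', @disaggregation, ' '), ' male ')` idiom).

```python
import xml.etree.ElementTree as ET

# Hypothetical tokenized layout: one space-delimited attribute.
TOKENIZED = """
<dataValues>
  <dataValue dataElement="ARV" disaggregation="0-5 male" value="33"/>
  <dataValue dataElement="ARV" disaggregation="0-5 female" value="41"/>
</dataValues>
"""

# Hypothetical explicit layout: one attribute per concept.
EXPLICIT = """
<dataValues>
  <dataValue dataElement="ARV" AGE="0-5" SEX="male" value="33"/>
  <dataValue dataElement="ARV" AGE="0-5" SEX="female" value="41"/>
</dataValues>
"""

# Explicit attributes: a single path expression selects the males.
explicit_males = ET.fromstring(EXPLICIT).findall("./dataValue[@SEX='male']")

# Tokenized: every row is visited and its content split in code. The token
# "male" is matched without knowing which codelist it was meant to come from.
tokenized_males = [
    dv
    for dv in ET.fromstring(TOKENIZED).iter("dataValue")
    if "male" in dv.get("disaggregation").split()
]

print([dv.get("value") for dv in explicit_males])
print([dv.get("value") for dv in tokenized_males])
```

Both queries return the same row here, but only the first one says which concept "male" is a value of; the second relies on no other codelist ever containing the same token.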

  2. instead of calling out each disaggregate as an XML attribute, genericize them to disaggregate0-n:
  3. have a more generic model (which I’m not sure is allowed in SDMX):

HIV positive on ARVs

0-5

male

Bob’s Clinic

33

#1 you could do quickly for first iteration, and perhaps consider #3 if the spec will allow?

The spec doesn’t allow #3, for, I think, the same good reason I discussed above with the ‘0-5’ beers.

As you see, there is still some thinking to be done here, but it’s really very helpful to get all these suggestions on the table to pull apart and play off against each other.

Thanks

Bob

Hope this helps?

-Paul

On Fri, Jan 16, 2015 at 11:01 AM, Paul Biondich pbiondic@regenstrief.org wrote:

Very, very clear Bob. Thank you.

#3 seems good to me as well, as you’re simply moving to a wholly “post-coordinated” model, instead of a partial strategy. Easier to comprehend.

So instead of:

“0-5, male” + “HIV patients on ARVs”

you do:

“0-5” + “male” + “HIV patients on ARVs”

So, in essence… the new model is: data element + 1-to-n disaggregates. Each unique row is the combination of at least 1 data element + at least one disaggregate (which can be ALL or default).

Then, validation of the codes would be done at the singleton disaggregate level as well as validation of each of the data elements.

Do I have this right?

-Paul

On Fri, Jan 16, 2015 at 10:49 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Paul

Good questions - which have been circulating around for a LONG time. Here’s my opinion, and it’s only my opinion at this stage. There are 3 possibilities:

  1. Retain the categoryoptioncombo attribute (but call it something more sensible like ‘disaggregation’). That’s the least painful in terms of supporting current implementations. The downside is that it is an ugly and fragile lattice to move between systems. For example, while it’s reasonable for 3rd party systems to have and maintain a shared understanding of codes like SEX={Male, Female}, AGE_GROUP={under5, 5 and over} or what have you, dealing with their combinations is another matter.
  2. Collapse into atomic dataelements as you have suggested. This is not actually a bad approach and is the way it was historically with the dhisv1 model. Unfortunately there is already too much dhis2 legacy built on top of the existing dimensional dataelements for this to have much traction.
  3. Explode the disaggregation into its constituent parts for external consumption. The McDonalds combos are then hidden and become an internal detail of dhis2. So for example we would have datavalues like:

instead of

Note I have just stuck in labels, but these could be coded values. I am very much in favour of the last of the 3 approaches. There is a downside: whereas 1 or 2 are easily expressible in a standard like SDMX, (3) would require us to take some liberties, as the data will be ragged in nature. But this would be the easiest to implement and to map to 3rd party systems which output dimensional data (for example an openmrs cohort report). It is also easy to link with 3rd party vocabulary providers: terminology services and the like.
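
One way to picture the third approach is a lookup that the receiving system keeps to itself: the message carries individual disaggregates, and the consumer maps that set back to whatever pre-coordinated combo it stores internally. The Python sketch below is only an illustration under that assumption; the table, the function name and every UID in it are made up and do not come from dhis2.

```python
# Hypothetical internal table: a set of (category, option) pairs maps to the
# UID of a pre-coordinated combo. All UIDs below are invented placeholders.
COMBO_UIDS = {
    frozenset({("AGE_GROUP", "0-5"), ("SEX", "male")}): "uidMale0to5",
    frozenset({("AGE_GROUP", "0-5"), ("SEX", "female")}): "uidFemale0to5",
    frozenset(): "uidDefault",  # no disaggregation at all
}


def to_internal_combo(disaggregates):
    """Map exploded disaggregates (a dict) to the consumer's own combo UID."""
    key = frozenset(disaggregates.items())
    try:
        return COMBO_UIDS[key]
    except KeyError:
        raise ValueError(f"no combo defined for {sorted(key)}") from None


# The sender only needs the singleton codes, in any order.
print(to_internal_combo({"SEX": "male", "AGE_GROUP": "0-5"}))
print(to_internal_combo({}))
```

The sender never sees the combo UIDs, which is what keeps them an internal detail of the consumer.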

There is also the possibility of an incremental path, starting with something based on 1 and moving towards 3.

Regards

Bob

On 16 January 2015 at 15:26, Paul Biondich pbiondic@regenstrief.org wrote:

Hi Bob and Jim, thanks a lot for the detailed feedback.

Jim, your descriptions (and corrections) validate my understanding on category option combos, so thanks a lot.

Before I go too much further in asking questions, I want to verify something really quickly, Bob. You said:

I am not greatly in favour of trying to percolate this categoryoptioncombo model (what I call McDonalds) through to ADX. Rather we will probably leave disaggregation out of scope for first iteration.

I could imagine you going a couple of ways with the first iteration. You could choose to “pre-coordinate” the data element and categoryoptioncombo into a single “indicator” or something of that ilk: so instead of “HIV positive and on antiretrovirals” + “0 - 5, male”, you could have “# of patients 0-5 & male who are HIV positive and on antiretrovirals” as a single descriptor.

You could also do some form of a post-coordination (ANDing a data element with some form of description of disaggregates different than category option combo).

Which direction are you choosing to go down for iteration 1? What do you imagine the ultimate specification to look like in this regard?

I’m asking these questions, as I’m trying to prepare my mind for how we’d validate incoming messages at scale. Thanks in advance for the education. :slight_smile:

-Paul

Knowing the “category combination” for each data element can help explain why the category option combinations are the way they are:

HTC_TST (N, DSD, Age) has category combination “Age (0-50+,10)” (which is shorthand for ages from 0 to 50+ split into 10 ranges). FN_THER (N, DSD, Age/Sex) should have a combination of age and sex as you point out, but it doesn’t. I’ll report this as a problem. What it does have is a combination of “Age (0-18+, 6)”, which is a different set of age ranges and therefore generates a different set of “combinations”. The actual age value (category option) in both cases is “(1-4)”, and this in fact is represented internally by a single object. But for each distinct category combination, a different set of “combination” values is generated.

FN_ASSESS (N, DSD, Age/Sex) has a category combination of “Age (0-18+, 6)” (another apparent misconfiguration that I will report.) And FN_ASSESS (N, DSD, Age) has the same category combination of “Age (0-18+, 6)”. So they will have the same category combination codes.

A DHIS 2 Web API client could navigate all this as follows: For each data element, find the category combo. From the category combo, find the enumerated category option combos. For each option combo, find all the options in that combo. For the disaggregations you have, find the right category option combo, and use that combo to transmit the data.
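
The navigation steps above can be sketched in Python against hand-written metadata. This is not the DHIS 2 Web API: the dict structure and the function name are assumptions standing in for whatever a client would actually fetch, and each category combo lists only the one option combo quoted in this thread. The data element names, category combination names and the two UIDs are the ones mentioned above.

```python
# Hypothetical stand-in for fetched metadata: data element -> category combo.
DATA_ELEMENTS = {
    "HTC_TST_N_DSD_Age": {"categoryCombo": "Age (0-50+,10)"},
    "FN_ASSESS_N_DSD_Age": {"categoryCombo": "Age (0-18+, 6)"},
}

# Category combo -> its enumerated option combos, each with its options.
# Only the "(1-4)" entries from the thread are shown; the rest are omitted.
CATEGORY_COMBOS = {
    "Age (0-50+,10)": [{"uid": "MTqU4a2fG8c", "options": {"(1-4)"}}],
    "Age (0-18+, 6)": [{"uid": "Bz0GWknldhe", "options": {"(1-4)"}}],
}


def find_option_combo_uid(data_element, wanted_options):
    """Follow data element -> category combo -> option combos -> options."""
    combo_name = DATA_ELEMENTS[data_element]["categoryCombo"]
    for option_combo in CATEGORY_COMBOS[combo_name]:
        if option_combo["options"] == set(wanted_options):
            return option_combo["uid"]
    raise LookupError(
        f"{data_element}: no option combo for {sorted(wanted_options)}"
    )


# The same option "(1-4)" resolves to different UIDs under different combos.
print(find_option_combo_uid("HTC_TST_N_DSD_Age", ["(1-4)"]))
print(find_option_combo_uid("FN_ASSESS_N_DSD_Age", ["(1-4)"]))
```

This mirrors the point made above: the option "(1-4)" is a single thing, while the option combo UID depends on which category combination the data element uses.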

The important semantic things are the category options like “(1-4)”. When you do analysis in DHIS 2, for example, that is what you see. The category option combos are more of an artifact of how several of these options are specified together, both for internal storage and for data transfer. It’s a part of the design that I’ve heard is confusing to many, and causes some maintenance problems. There are some discussions about whether it could be improved – Bob and I talk about this with some regularity for example. But I hope this email helps to clarify how it currently works.

Cheers,

Jim

On Fri, Jan 16, 2015 at 4:13 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Paul

I am not very familiar with the particular data and the link you presented is no longer working. So I can only attempt to answer some of your questions:

  1. There is no hard coded set of categoryoptioncombos, nor a global central authority. So PEPFAR would create and govern these within their “universe”.
  1. A dataelement is associated with a categorycombo, like (Age) or (Age,Sex). So two dataelements with different categorycombos shouldn’t share the same categoryoptioncombo uids. So I would draw the same conclusion as you for the first example: there is some mistake in the second categoryoptioncombo name.
  1. The second example you give is mysterious to me. There seems to be something wrong with this metadata.

I am not greatly in favour of trying to percolate this categoryoptioncombo model (what I call McDonalds) through to ADX. Rather we will probably leave disaggregation out of scope for first iteration.

Bob

Regards,
Sri Maurya Kummamuru

If having the two types of XML is not a good practice it can also be viewed as:

   <dataElement>HIV positive on ARVs</dataElement>
   <disaggregates>
         <disaggregate>
              <categoryType>AGE</categoryType>
              <optionValue>0-5</optionValue>
         </disaggregate>
         <disaggregate>
              <categoryType>SEX</categoryType>
              <optionValue>male</optionValue>
         </disaggregate>
   </disaggregates>
   <orgUnit>Bob's Clinic</orgUnit>
   <timePeriod>201403</timePeriod>
   <value>33</value>
···

On Fri, Jan 16, 2015 at 3:45 PM, srimaurya kummamuru kmit.maurya@gmail.com wrote:

Hi All,

Thank you for starting the discussion, it has been very informative to me as well.

I like Bob’s idea regarding as it allows the message to be understandable and distinctive. But, it has the drawback of having many @XMLAttributes as Paul mentioned.

On the other side Paul’s idea regarding

HIV positive on ARVs

0-5

male

Bob’s Clinic

33

is also a great option as it would not be very restrictive of having the @XMLAttributes as it include only what is required. But, again Bob’s idea about it not being distinctive comes into play.

What about the following idea that combines both and can be generic and distinctive. I am not sure about how acceptable the format is but, it might be worth your consideration.

   <dataElement>HIV positive on ARVs</dataElement>
   <disaggregates>
         <disaggregate type="AGE" value="0-5"/>
         <disaggregate type="SEX" value="male"/>
   </disaggregates>
   <orgUnit>Bob's Clinic</orgUnit>
   <timePeriod>201403</timePeriod>
   <value>33</value>

in this case the disaggregates can be a list of disaggregate objects(having type and value as @XMLAttributes) in the dataValue object. The disaggregate objects can be added in the list as and when required

–Regards,

Sri Maurya Kummamuru

On Fri, Jan 16, 2015 at 12:46 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Thanks for this, Paul.

You received this message because you are subscribed to the Google Groups “Open HMIS” group.

To unsubscribe from this group and stop receiving emails from it, send an email to open-hmis+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Regards,
Sri Maurya Kummamuru

On 16 January 2015 at 16:25, Paul Biondich pbiondic@regenstrief.org wrote:

One other comment: I’m not sure you want to go down the path of representing disaggregates uniquely as XML attributes:

In your example, SEX and AGE.

Given the large number of potential disaggregates, that makes for a messy specification, where you’re baking content into the spec.

Not sure if i completely agree. The requirement of typical facility level monthly report is messy anyway. I’d rather try to be explicit about it than trying and kick the validation problem down the road.

Note also that strict W3C XSD or ISO RelaxNG schema are only one (imperfect) species of validation mechanism. It is quite possible to have a relatively permissive schema and to impose your validation constraints over and above that . Eg using schematron constraints and/or application level validation.

Or, to follow the sdmx route, we profile a meta schema which dictates that there are 3 mandatory concepts (dataelement, period, orgunit) and provides a mechanism for specifying additional ones - rather than baking them in. And in practice there are probably not as many of these common dimensions as you might think. The ADX spec could specify the well known ones like SEX, AGE_GROUP, ICD10_CODE etc etc as non-exhaustive list. Without being prescriptive about the underlying codelists.

Thinking through your technical conundrum however (and talking with Burke), you could represent model 3 in three ways:

  1. “tokenize” each of the disaggregates by using a fairly typical space as the delimiter between each:

Your first option, tokenizing, is something I’ve considered before and is certainly doable. It maps well against the dhis2 data model and is also compliant with the letter (if not quite the spirit) of sdmx. A Content Creator would need to know the codes for ‘Male’ and ‘0-5’ but is not, and should not, be required to know how the Consumer codes this combination as in its data store.

It is however poor XML to require parsing of the content in this way in order to process the document. For example it would make an xpath/xquery expression to return all the dataelements referring to Males in the set more difficult to write than it should be.

There is also an implicit assumption here (and in your later suggestions) that you can deduce the concept from the instance. For example “0-5” being an age category. This in turn implies a constraint that all options across all codelists will be unique which is not a reasonable constraint. “0-5” could also be the correct answer to “how many beers do you have on a Friday night” :slight_smile: Within dhis2 currently this doesn’t present a problem, but it would be a problem if one were importing these from elsewhere.

  1. instead of calling out each disaggregate as a XML attribute, genericize them to disaggregate0-n:
  1. have a more generic model (which I’m not sure whether it’s allowed in SDMX:

HIV positive on ARVs

0-5

male

Bob’s Clinic

33

#1 you could do quickly for first iteration, and perhaps consider #3 if the spec will allow?

Spec doesn’t allow 3. For I think the same good reason I discussed above with ‘0-5’ beers.

As you see, there is still some thinking to be done here, but its really very helpful to get all these suggestions on the table to pull apart and play off against each other.

Thanks

Bob

Hope this helps?

-Paul

You received this message because you are subscribed to the Google Groups “Open HMIS” group.

To unsubscribe from this group and stop receiving emails from it, send an email to open-hmis+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

On Fri, Jan 16, 2015 at 11:01 AM, Paul Biondich pbiondic@regenstrief.org wrote:

Very, very clear Bob. Thank you.

#3 seems good to me as well, as you’re simply moving to a wholly “post-coordinated” model, instead of a partial strategy. Easier to comprehend.

So instead of:

“0-5, male” + “HIV patients on ARVs”

you do:

“0-5” + “male” + “HIV patients on ARVs”

So, in essence… the new model is: data element + 1-to-n disaggregates. Each unique row is the combination of at least 1 data element + at least one desegregate (which can be ALL or default).

Then, validation of the codes would be done at the singleton disaggregate level as well as validation of each of the data elements.

Do I have this right?

-Paul

On Fri, Jan 16, 2015 at 10:49 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Paul

Good questions - which have been circulating around for a LONG time. Here’s my opinion, and its only my opinion at this stage. There are 3 possibilities:

  1. Retain the categoryoptioncombo attribute (but call it something more sensible like ‘disaggregation’). That’s the least painful in terms of supporting current implementations. The downside is that it is an ugly and fragile lattice to move between systems. For example, while its reasonable for 3rd party systems to have and maintain a shared understanding of codes like SEX={Male, Female} AGE_GROUP={under5, 5 and over} or what have you, dealing with their combinations is another matter.
  1. Collapse into atomic dataelements as you have suggested. This is not actually a bad approach and is the way it was historically with the dhisv1 model. Unfortunately there is already too much dhis2 legacy buiilt on top of the existing dimensional dataelements for this to have much traction.
  1. Explode the disaggregation into its constituent parts for external consumption. The McDonalds combos are then hidden and become an internal detail of dhis2. So for example we would have datavalues like:

instread of

Note I have just stuck in labels but these could be coded values. I am very much in favour of the last of the 3 approaches. There is a downside that whereas 1 or 2 are easily expressable in a standard like SDMX, (3) would require us to take some liberties as the data will be ragged in nature. But this would be the easiest to implement and map to 3rd party systems which output dimensional data (for example an openmrs cohort report). Also easy to link with 3rd party vocabulary providers - terminology services and the like.

There is also the possibility of an incremental path. Starting with something based on 1 and moving towards 3.

Regards

Bob

You received this message because you are subscribed to the Google Groups “Open HMIS” group.

To unsubscribe from this group and stop receiving emails from it, send an email to open-hmis+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

On 16 January 2015 at 15:26, Paul Biondich pbiondic@regenstrief.org wrote:

Hi Bob and Jim, thanks a lot for the detailed feedback.

Jim, your descriptions (and corrections) validate my understanding on category option combos, so thanks a lot.

Before I go to much further in asking questions, I want to verify something really quickly Bob. You said:

You received this message because you are subscribed to the Google Groups “Open HMIS” group.

To unsubscribe from this group and stop receiving emails from it, send an email to open-hmis+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

I am not greatly in favour of trying to percolate this categoryoptioncombo model (what I call McDonalds) through to ADX. Rather we will probably leave disaggregation out of scope for first iteration.

I could imagine you going a couple of ways with the first iteration. You could choose to “pre-coodinate” the data element and categoryoptioncombo into a single “indicator” or something of that ilk: so instead of “HIV positive and on antiretrovirals” + “0 - 5, male”, you could have “# of patients 0-5 & male who are HIV positive and on antiretrovirals” as a single descriptor.

You could also do some form of a post-coordination (ANDing a data element with some form of description of disaggregates different than category option combo).

Which direction are you choosing to go down for iteration 1? What do you imagine the ultimate specification to look like in this regard?

I’m asking these questions, as I’m trying to prepare my mind for how we’d validate incoming messages at scale. Thanks in advance for the education. :slight_smile:

-Paul

Knowing the “category combination” for each data element can help explain why the category option combinations are the way they are:

HTC_TST (N, DSD, Age) has category combination “Age (0-50+,10)” (which is a shorthand to say ages from 0 to 50+ split into 10 ranges.) FN_THER (N, DSD, Age/Sex) should have a combination of age and sex as you point out, but it doesn’t. I’ll report this as a problem. What it does have is a combination of “Age (0-18+, 6)” which is a different set of age ranges and therefore generates a different set of “combinations”. The actual age value (category option) in both cases is “(1-4)”, and this in fact is represented internally by a single object. But for each distinct category combination, a different set of “combination” values are generated.

FN_ASSESS (N, DSD, Age/Sex) has a category combination of “Age (0-18+, 6)” (another apparent misconfiguration that I will report.) And FN_ASSESS (N, DSD, Age) has the same category combination of “Age (0-18+, 6)”. So they will have the same category combination codes.

A DHIS 2 Web API client could navigate all this is as follows: For each data element, find the category combo. From the category combo, find the enumerated category option combos. For each option combo, find all the options in that combo. For the disaggregations you have, find the right category option combo, and use that combo to transmit the data.

The important semantic things are the category options like “(1-4)”. When you do analysis in DHIS 2, for example, that is what you see. The category option combos are more of an artifact of how several of these options are specified together, both for internal storage and for data transfer. It’s a part of the design that I’ve heard is confusing to many, and causes some maintenance problems. There are some discussions about whether it could be improved – Bob and I talk about this with some regularity for example. But I hope this email helps to clarify how it currently works.

Cheers,

Jim

You received this message because you are subscribed to the Google Groups “Open HMIS” group.

To unsubscribe from this group and stop receiving emails from it, send an email to open-hmis+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

On Fri, Jan 16, 2015 at 4:13 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Paul

I am not very familiar with the particular data and the link you presented is no longer working. So I can only attempt to answer some of your questions:

  1. There is no hard coded set of categoryoptioncombos, nor a global central authority. So PEPFAR would create and govern these within their “universe”.
  1. A dataelement is associated with a categorycombo. Like (Age) or (Age,Sex). So two dataelements with different categorycombos shouldn’t share the same categoryoptioncombo uids. So I would draw the same conclusion as you for the first example - there is some mistake in the second categoryoptioncombo name.
  1. The second example you give is mysterious to me. There seems something wrong with this metadata,

I am not greatly in favour of trying to percolate this categoryoptioncombo model (what I call McDonalds) through to ADX. Rather we will probably leave disaggregation out of scope for first iteration.

Bob

You received this message because you are subscribed to the Google Groups “Open HMIS” group.

To unsubscribe from this group and stop receiving emails from it, send an email to open-hmis+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

On 16 January 2015 at 03:40, Paul Biondich pbiondic@regenstrief.org wrote:

Have spent a few hours this week looking through what’s on the web about DXF, and congratulations to you all, as lots can be inferred from what’s there.

I’m spending time to look in particular at PEPFAR’s DATIM web presence.

Please correct anything I write here, just trying to learn how the message structure works.

It looks as if each unique “row” within a message is a combination of a “data element” and a “category option combo”.

“category option combos” are one or more pre-coordinated disaggregates. For example, “male” is a valid answer, and “males, 0-5” is also a valid answer. Each one of these combos are uniquely identified with a UID.

Who is the central authority for a given set of category option combo “options” for a given report. For example, the MER report has a number of potential disaggregate options… is PEPFAR the central authority, or is “male” for all times a unique category option combo for all reports generated in DXF/ADX format?

Perhaps the detail is missing, but when I look at this page: https://www.datim.org/api/sqlViews/GN3OuQmFfd8/data.html+css

…what I see is a collection of data element / category option combo pairs, one seemingly for each indicator (which would match a cell in a report-back table).

But then I see cases where a category option combo looks to be the same (like 0 - 4), but it has more than one UID represented within the page:

HTC_TST (N, DSD, Age): HTC received results
HTC_TST_N_DSD_Age
(1-4)
MTqU4a2fG8c
FN_THER (N, DSD, Age/Sex): Undernourished PLHIV fed
FN_THER_N_DSD_Age_Sex
(1-4)
Bz0GWknldhe

Is the second one just a mistake (as the named disaggregates are age and sex, but the categoryoptioncombo_name doesn’t reflect both)?

Then I see things like:

FN_ASSESS (N, DSD, Age/Sex): PLHIV nutrionally assessed
FN_ASSESS_N_DSD_Age_Sex
(1-4)
Bz0GWknldhe
FN_ASSESS (N, DSD, Age): PLHIV nutrionally assessed
FN_ASSESS_N_DSD_Age
(1-4)
Bz0GWknldhe

…and I note different named disaggregates share the same UID.

Is this just an error in the coding of the indicators on the page, or am I fundamentally misunderstanding how category option combos are being given UIDs?

Thanks,

-Paul


Regards,
Sri Maurya Kummamuru

Hi Sri


On 16 January 2015 at 20:45, srimaurya kummamuru kmit.maurya@gmail.com wrote:

Hi All,

Thank you for starting the discussion, it has been very informative to me as well.

I like Bob’s idea regarding as it allows the message to be understandable and distinctive. But, it has the drawback of having many @XMLAttributes as Paul mentioned.

On the other side Paul’s idea regarding

HIV positive on ARVs

0-5

male

Bob’s Clinic

33

is also a great option, as it is less restrictive about the @XMLAttributes, since it includes only what is required. But again, Bob’s point about it not being distinctive comes into play.

What about the following idea that combines both and can be generic and distinctive. I am not sure about how acceptable the format is but, it might be worth your consideration.

   <dataElement>HIV positive on ARVs</dataElement>
   <disaggregates>
         <disaggregate type="AGE" value="0-5"/>
         <disaggregate type="SEX" value="male"/>
   </disaggregates>
   <orgUnit>Bob's Clinic</orgUnit>
   <timePeriod>201403</timePeriod>
   <value>33</value>

In this case the disaggregates can be a list of disaggregate objects (each having type and value as @XMLAttributes) in the dataValue object. Disaggregate objects can be added to the list as and when required.
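As a rough illustration of how a consumer might read this shape, here is a minimal Python sketch. It assumes the fragment is wrapped in a `dataValue` root element (the proposal shows only the children), and the function name `parse_data_value` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: the fragment from the proposal is wrapped in a <dataValue> root.
SAMPLE = """<dataValue>
   <dataElement>HIV positive on ARVs</dataElement>
   <disaggregates>
         <disaggregate type="AGE" value="0-5"/>
         <disaggregate type="SEX" value="male"/>
   </disaggregates>
   <orgUnit>Bob's Clinic</orgUnit>
   <timePeriod>201403</timePeriod>
   <value>33</value>
</dataValue>"""


def parse_data_value(xml_text):
    """Return the data value as a dict, with disaggregates keyed by type."""
    root = ET.fromstring(xml_text)
    disaggregates = {
        d.get("type"): d.get("value")
        for d in root.findall("./disaggregates/disaggregate")
    }
    return {
        "dataElement": root.findtext("dataElement"),
        "orgUnit": root.findtext("orgUnit"),
        "timePeriod": root.findtext("timePeriod"),
        "value": int(root.findtext("value")),
        "disaggregates": disaggregates,
    }


parsed = parse_data_value(SAMPLE)
```

Because each disaggregate carries its own type, the list can grow or shrink per data value without any change to the parsing code.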

That is not a bad approach, though it becomes a somewhat verbose way of saying effectively the same thing as my earlier example. It’s certainly more friendly to those who want to use annotations.

Though before we get too carried away, it’s worth noting that IHE at least has no stomach to actually create a new standard. So we have to find a way to profile something existing. SDMX (with all its many warts) comes close to what we need.

Regards

Bob

–Regards,

Sri Maurya Kummamuru

On Fri, Jan 16, 2015 at 12:46 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Thanks for this, Paul.


Regards,
Sri Maurya Kummamuru

On 16 January 2015 at 16:25, Paul Biondich pbiondic@regenstrief.org wrote:

One other comment: I’m not sure you want to go down the path of representing disaggregates uniquely as XML attributes:

In your example, SEX and AGE.

Given the large number of potential disaggregates, that makes for a messy specification, where you’re baking content into the spec.

Not sure I completely agree. The requirements of a typical facility-level monthly report are messy anyway. I’d rather try to be explicit about it than kick the validation problem down the road.

Note also that strict W3C XSD or ISO RelaxNG schemas are only one (imperfect) species of validation mechanism. It is quite possible to have a relatively permissive schema and to impose your validation constraints over and above that, e.g. using Schematron constraints and/or application-level validation.

Or, to follow the sdmx route, we profile a meta schema which dictates that there are 3 mandatory concepts (dataelement, period, orgunit) and provides a mechanism for specifying additional ones, rather than baking them in. And in practice there are probably not as many of these common dimensions as you might think. The ADX spec could specify the well-known ones like SEX, AGE_GROUP, ICD10_CODE etc. as a non-exhaustive list, without being prescriptive about the underlying codelists.
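To make the "mandatory concepts plus declared extras" idea concrete, here is a small Python sketch of an application-level check. It is not SDMX and not any ADX draft: the row is modelled as a plain dict, and `check_profile` and the `value` key are assumptions for the example.

```python
# The three mandatory concepts named in the discussion.
MANDATORY_CONCEPTS = ("dataelement", "period", "orgunit")


def check_profile(declared_dimensions, row):
    """Check one row (a dict) against a profile.

    declared_dimensions: the additional dimensions a given profile declares,
    e.g. ["SEX", "AGE_GROUP"]. Nothing is baked in beyond the mandatory three.
    """
    errors = [
        f"missing mandatory concept: {concept}"
        for concept in MANDATORY_CONCEPTS
        if concept not in row
    ]
    allowed = set(MANDATORY_CONCEPTS) | set(declared_dimensions) | {"value"}
    errors += [
        f"undeclared dimension: {key}" for key in sorted(row) if key not in allowed
    ]
    return errors


good_row = {"dataelement": "ARV", "period": "201403", "orgunit": "clinic",
            "value": 33, "SEX": "male"}
bad_row = {"dataelement": "ARV", "orgunit": "clinic", "value": 33,
           "AGE_GROUP": "0-5"}

good_errors = check_profile(["SEX"], good_row)
bad_errors = check_profile(["SEX"], bad_row)
```

The codelists behind SEX or AGE_GROUP are deliberately left out here; the check only concerns which dimensions may appear.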

Thinking through your technical conundrum however (and talking with Burke), you could represent model 3 in three ways:

  1. “tokenize” each of the disaggregates by using a fairly typical space as the delimiter between each:

Your first option, tokenizing, is something I’ve considered before and is certainly doable. It maps well against the dhis2 data model and is also compliant with the letter (if not quite the spirit) of sdmx. A Content Creator would need to know the codes for ‘Male’ and ‘0-5’ but is not, and should not be, required to know how the Consumer codes this combination in its data store.

It is however poor XML to require parsing of the content in this way in order to process the document. For example it would make an xpath/xquery expression to return all the dataelements referring to Males in the set more difficult to write than it should be.
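A small Python sketch of the difference, using the limited XPath support in the standard library. The attribute names `sex`, `age` and `disaggregation`, and the `dataValueSet` wrapper, are placeholders for this illustration only.

```python
import xml.etree.ElementTree as ET

# Explicit attributes: one attribute per disaggregation concept.
EXPLICIT = """<dataValueSet>
  <dataValue dataElement="ARV" sex="male" age="0-5" value="33"/>
  <dataValue dataElement="ARV" sex="female" age="0-5" value="41"/>
</dataValueSet>"""

# Tokenized: space-delimited codes packed into a single attribute.
TOKENIZED = """<dataValueSet>
  <dataValue dataElement="ARV" disaggregation="0-5 male" value="33"/>
  <dataValue dataElement="ARV" disaggregation="0-5 female" value="41"/>
</dataValueSet>"""


def males_explicit(xml_text):
    """A plain attribute predicate is enough."""
    root = ET.fromstring(xml_text)
    return [dv.get("value") for dv in root.findall("./dataValue[@sex='male']")]


def males_tokenized(xml_text):
    """The attribute content has to be split by the application first."""
    root = ET.fromstring(xml_text)
    return [
        dv.get("value")
        for dv in root.findall("./dataValue")
        if "male" in dv.get("disaggregation", "").split()
    ]


explicit_result = males_explicit(EXPLICIT)
tokenized_result = males_tokenized(TOKENIZED)
```

Note that a naive substring match on the tokenized attribute would also pick up "female", which is one reason the tokens have to be split rather than searched.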

There is also an implicit assumption here (and in your later suggestions) that you can deduce the concept from the instance. For example “0-5” being an age category. This in turn implies a constraint that all options across all codelists will be unique, which is not a reasonable constraint. “0-5” could also be the correct answer to “how many beers do you have on a Friday night” :) Within dhis2 currently this doesn’t present a problem, but it would be a problem if one were importing these from elsewhere.
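The ambiguity is easy to show with a toy Python example. The codelists below are invented for illustration (including the beer one) and do not come from any real system.

```python
# Hypothetical codelists; the same code "0-5" appears in two of them.
CODELISTS = {
    "AGE_GROUP": {"0-5", "5+"},
    "SEX": {"male", "female"},
    "BEERS_ON_FRIDAY": {"0-5", "6+"},
}


def concepts_for(option):
    """Return every concept whose codelist contains the given option code."""
    return sorted(name for name, codes in CODELISTS.items() if option in codes)


ambiguous = concepts_for("0-5")
unambiguous = concepts_for("male")
```

As soon as one code maps to more than one concept, a message that carries only the code can no longer be interpreted without extra context.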

  2. instead of calling out each disaggregate as an XML attribute, genericize them to disaggregate0-n:
  3. have a more generic model (which I’m not sure is allowed in SDMX):

HIV positive on ARVs

0-5

male

Bob’s Clinic

33

#1 you could do quickly for first iteration, and perhaps consider #3 if the spec will allow?

Spec doesn’t allow #3, for, I think, the same good reason I discussed above with ‘0-5’ beers.

As you see, there is still some thinking to be done here, but it’s really very helpful to get all these suggestions on the table to pull apart and play off against each other.

Thanks

Bob

Hope this helps?

-Paul


On Fri, Jan 16, 2015 at 11:01 AM, Paul Biondich pbiondic@regenstrief.org wrote:

Very, very clear Bob. Thank you.

#3 seems good to me as well, as you’re simply moving to a wholly “post-coordinated” model, instead of a partial strategy. Easier to comprehend.

So instead of:

“0-5, male” + “HIV patients on ARVs”

you do:

“0-5” + “male” + “HIV patients on ARVs”

So, in essence… the new model is: data element + 1-to-n disaggregates. Each unique row is the combination of at least 1 data element + at least one disaggregate (which can be ALL or default).

Then, validation of the codes would be done at the singleton disaggregate level as well as validation of each of the data elements.
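A minimal Python sketch of what validating at the singleton level could look like. The data element set, the codelists and the row shape are all assumptions made up for this example.

```python
# Hypothetical reference data held by the receiving system.
DATA_ELEMENTS = {"HIV patients on ARVs"}
CODELISTS = {
    "AGE_GROUP": {"0-5", "5+"},
    "SEX": {"male", "female"},
}


def validate_row(row, data_elements, codelists):
    """Validate the data element and each disaggregate code on its own.

    No list of valid combinations ("0-5, male") is needed.
    """
    errors = []
    if row["dataElement"] not in data_elements:
        errors.append(f"unknown data element: {row['dataElement']}")
    for concept, code in row["disaggregates"].items():
        if concept not in codelists:
            errors.append(f"unknown concept: {concept}")
        elif code not in codelists[concept]:
            errors.append(f"unknown code for {concept}: {code}")
    return errors


ok_row = {"dataElement": "HIV patients on ARVs",
          "disaggregates": {"AGE_GROUP": "0-5", "SEX": "male"}}
bad_row = {"dataElement": "HIV patients on ARVs",
           "disaggregates": {"AGE_GROUP": "0-4", "SEX": "male"}}

ok_errors = validate_row(ok_row, DATA_ELEMENTS, CODELISTS)
bad_errors = validate_row(bad_row, DATA_ELEMENTS, CODELISTS)
```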

Do I have this right?

-Paul

On Fri, Jan 16, 2015 at 10:49 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Paul

Good questions, which have been circulating for a LONG time. Here’s my opinion, and it’s only my opinion at this stage. There are 3 possibilities:

  1. Retain the categoryoptioncombo attribute (but call it something more sensible like ‘disaggregation’). That’s the least painful in terms of supporting current implementations. The downside is that it is an ugly and fragile lattice to move between systems. For example, while it’s reasonable for 3rd party systems to have and maintain a shared understanding of codes like SEX={Male, Female} AGE_GROUP={under5, 5 and over} or what have you, dealing with their combinations is another matter.
  2. Collapse into atomic dataelements as you have suggested. This is not actually a bad approach and is the way it was historically with the dhisv1 model. Unfortunately there is already too much dhis2 legacy built on top of the existing dimensional dataelements for this to have much traction.
  3. Explode the disaggregation into its constituent parts for external consumption. The McDonalds combos are then hidden and become an internal detail of dhis2. So for example we would have datavalues like:

instead of

Note I have just stuck in labels but these could be coded values. I am very much in favour of the last of the 3 approaches. There is a downside: whereas 1 or 2 are easily expressible in a standard like SDMX, (3) would require us to take some liberties, as the data will be ragged in nature. But this would be the easiest to implement and map to 3rd party systems which output dimensional data (for example an openmrs cohort report). It is also easy to link with 3rd party vocabulary providers: terminology services and the like.

There is also the possibility of an incremental path. Starting with something based on 1 and moving towards 3.

Regards

Bob


On 16 January 2015 at 15:26, Paul Biondich pbiondic@regenstrief.org wrote:

Hi Bob and Jim, thanks a lot for the detailed feedback.

Jim, your descriptions (and corrections) validate my understanding on category option combos, so thanks a lot.

Before I go too much further in asking questions, I want to verify something really quickly, Bob. You said:


I am not greatly in favour of trying to percolate this categoryoptioncombo model (what I call McDonalds) through to ADX. Rather we will probably leave disaggregation out of scope for first iteration.

I could imagine you going a couple of ways with the first iteration. You could choose to “pre-coordinate” the data element and categoryoptioncombo into a single “indicator” or something of that ilk: so instead of “HIV positive and on antiretrovirals” + “0 - 5, male”, you could have “# of patients 0-5 & male who are HIV positive and on antiretrovirals” as a single descriptor.

You could also do some form of a post-coordination (ANDing a data element with some form of description of disaggregates different than category option combo).

Which direction are you choosing to go down for iteration 1? What do you imagine the ultimate specification to look like in this regard?

I’m asking these questions, as I’m trying to prepare my mind for how we’d validate incoming messages at scale. Thanks in advance for the education. :)

-Paul

Knowing the “category combination” for each data element can help explain why the category option combinations are the way they are:

HTC_TST (N, DSD, Age) has category combination “Age (0-50+,10)” (which is shorthand for ages from 0 to 50+ split into 10 ranges). FN_THER (N, DSD, Age/Sex) should have a combination of age and sex as you point out, but it doesn’t. I’ll report this as a problem. What it does have is a combination of “Age (0-18+, 6)”, which is a different set of age ranges and therefore generates a different set of “combinations”. The actual age value (category option) in both cases is “(1-4)”, and this in fact is represented internally by a single object. But for each distinct category combination, a different set of “combination” values is generated.

FN_ASSESS (N, DSD, Age/Sex) has a category combination of “Age (0-18+, 6)” (another apparent misconfiguration that I will report.) And FN_ASSESS (N, DSD, Age) has the same category combination of “Age (0-18+, 6)”. So they will have the same category combination codes.
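The behaviour described above can be mimicked with a short Python sketch. This is not DHIS 2 code: the generated identifiers are readable placeholders rather than real UIDs, and the option lists are shortened for the example.

```python
from itertools import product


def generate_option_combos(category_combo_name, categories):
    """Enumerate every combination of options for one category combination.

    categories maps a category name to its list of options. Each combination
    gets an identifier scoped to the category combination (a placeholder
    string here, where a real system would assign a UID).
    """
    combos = {}
    for options in product(*categories.values()):
        combos[options] = f"{category_combo_name}::{'|'.join(options)}"
    return combos


# Two different age category combinations that share the option "(1-4)".
age_10 = generate_option_combos("Age (0-50+,10)", {"Age": ["(<1)", "(1-4)"]})
age_6 = generate_option_combos("Age (0-18+, 6)", {"Age": ["(<1)", "(1-4)"]})

# Data elements point at a category combination, not at individual combos.
data_elements = {
    "HTC_TST_N_DSD_Age": age_10,
    "FN_ASSESS_N_DSD_Age_Sex": age_6,
    "FN_ASSESS_N_DSD_Age": age_6,
}
```

With this model, the same option under different category combinations gets different combination identifiers, while two data elements that share a category combination share them, which matches both observations in the original question.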

A DHIS 2 Web API client could navigate all this as follows: For each data element, find the category combo. From the category combo, find the enumerated category option combos. For each option combo, find all the options in that combo. For the disaggregations you have, find the right category option combo, and use that combo to transmit the data.
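The lookup steps above can be sketched in Python against an in-memory stand-in for the metadata. The dictionary layout and the category combo keys are assumptions for this example and do not reproduce actual Web API responses; only the two UIDs are taken from the page quoted in the original question.

```python
# Hypothetical, simplified metadata; a real client would fetch this.
METADATA = {
    "dataElements": {
        "HTC_TST_N_DSD_Age": {"categoryCombo": "age_0_50_10"},
        "FN_THER_N_DSD_Age_Sex": {"categoryCombo": "age_0_18_6"},
    },
    "categoryCombos": {
        "age_0_50_10": {"categoryOptionCombos": ["MTqU4a2fG8c"]},
        "age_0_18_6": {"categoryOptionCombos": ["Bz0GWknldhe"]},
    },
    "categoryOptionCombos": {
        "MTqU4a2fG8c": {"categoryOptions": ["(1-4)"]},
        "Bz0GWknldhe": {"categoryOptions": ["(1-4)"]},
    },
}


def find_option_combo(metadata, data_element, options):
    """Data element -> category combo -> option combos -> matching options."""
    combo_id = metadata["dataElements"][data_element]["categoryCombo"]
    wanted = frozenset(options)
    for coc_id in metadata["categoryCombos"][combo_id]["categoryOptionCombos"]:
        coc_options = metadata["categoryOptionCombos"][coc_id]["categoryOptions"]
        if frozenset(coc_options) == wanted:
            return coc_id
    return None  # no combo of this data element carries exactly these options


htc = find_option_combo(METADATA, "HTC_TST_N_DSD_Age", ["(1-4)"])
fn = find_option_combo(METADATA, "FN_THER_N_DSD_Age_Sex", ["(1-4)"])
missing = find_option_combo(METADATA, "HTC_TST_N_DSD_Age", ["(5-9)"])
```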

The important semantic things are the category options like “(1-4)”. When you do analysis in DHIS 2, for example, that is what you see. The category option combos are more of an artifact of how several of these options are specified together, both for internal storage and for data transfer. It’s a part of the design that I’ve heard is confusing to many, and causes some maintenance problems. There are some discussions about whether it could be improved – Bob and I talk about this with some regularity for example. But I hope this email helps to clarify how it currently works.

Cheers,

Jim


On Fri, Jan 16, 2015 at 4:13 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Paul

I am not very familiar with the particular data and the link you presented is no longer working. So I can only attempt to answer some of your questions:

  1. There is no hard-coded set of categoryoptioncombos, nor a global central authority. So PEPFAR would create and govern these within their “universe”.
  2. A dataelement is associated with a categorycombo, like (Age) or (Age,Sex). So two dataelements with different categorycombos shouldn’t share the same categoryoptioncombo uids. So I would draw the same conclusion as you for the first example: there is some mistake in the second categoryoptioncombo name.
  3. The second example you give is mysterious to me. There seems to be something wrong with this metadata.

I am not greatly in favour of trying to percolate this categoryoptioncombo model (what I call McDonalds) through to ADX. Rather, we will probably leave disaggregation out of scope for the first iteration.

Bob


No, your first example is better. But there is no need to be too afraid to use attributes. So for example:

is perhaps a neater representation of what you are after. With the added advantage that the dataValue then becomes quite close to a valid SDMX data row, with the disaggregates being optional additional content. Taking a small step further:

could also work as an alternative compact representation. Put in words (rather than schematron rules):

  1. a datavalue has mandatory dataelement, orgunit, period and value attributes

  2. a datavalue may have additional content of a element

  3. a element must have either:

3.1 a combo attribute; or

3.2 a sequence of 1 or more elements

Or another even more concise variation:

  1. a datavalue has mandatory dataelement, orgunit, period and value attributes

  2. a datavalue may have either a disaggregations attribute or additional content of a element, but not both

  3. a element must have a sequence of 1 or more elements

This caters for the current reality as well as offering a route to a saner representation.
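The second set of rules could be checked at application level with something like the following Python sketch. The element names did not survive in the text of the rules above, so `disaggregation` (the wrapper element) and `category` (its children) are purely hypothetical names chosen for this example; only the attribute names come from the rules themselves.

```python
import xml.etree.ElementTree as ET

MANDATORY = ("dataelement", "orgunit", "period", "value")


def validate_datavalue(xml_text):
    """Apply the three rules to a single <datavalue> element.

    Assumed names: a <disaggregation> wrapper element holding one or more
    child elements. These names are placeholders, not part of any spec.
    """
    dv = ET.fromstring(xml_text)
    # Rule 1: mandatory attributes.
    errors = [f"missing attribute: {a}" for a in MANDATORY if dv.get(a) is None]
    wrappers = dv.findall("disaggregation")
    # Rule 2: attribute or element content, but not both.
    if dv.get("disaggregations") is not None and wrappers:
        errors.append("both disaggregations attribute and element present")
    # Rule 3: the wrapper must contain at least one child element.
    for wrapper in wrappers:
        if len(wrapper) == 0:
            errors.append("disaggregation element has no children")
    return errors


COMPACT = ('<datavalue dataelement="ARV" orgunit="clinic" period="201403" '
           'value="33" disaggregations="abc"/>')
BOTH = ('<datavalue dataelement="ARV" orgunit="clinic" period="201403" '
        'value="33" disaggregations="abc"><disaggregation>'
        '<category type="SEX" value="male"/></disaggregation></datavalue>')
EMPTY = ('<datavalue dataelement="ARV" orgunit="clinic" period="201403">'
         '<disaggregation/></datavalue>')

compact_errors = validate_datavalue(COMPACT)
both_errors = validate_datavalue(BOTH)
empty_errors = validate_datavalue(EMPTY)
```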

Food for thought … but it’s Friday night and I am going to try real hard not to think about this any more till Monday :) I think I still prefer my original variant.

Bob


On 16 January 2015 at 20:56, srimaurya kummamuru kmit.maurya@gmail.com wrote:

If having the two types of XML is not a good practice it can also be viewed as:

   <dataElement>HIV positive on ARVs</dataElement>
   <disaggregates>
         <disaggregate>
              <categoryType>AGE</categoryType>
              <optionValue>0-5</optionValue>
         </disaggregate>
         <disaggregate>
              <categoryType>SEX</categoryType>
              <optionValue>male</optionValue>
         </disaggregate>
   </disaggregates>
   <orgUnit>Bob's Clinic</orgUnit>
   <timePeriod>201403</timePeriod>
   <value>33</value>


–Regards,

Sri Maurya Kummamuru


Regards,
Sri Maurya Kummamuru

On Fri, Jan 16, 2015 at 12:46 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Thanks for this, Paul.

You received this message because you are subscribed to the Google Groups “Open HMIS” group.

To unsubscribe from this group and stop receiving emails from it, send an email to open-hmis+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Regards,
Sri Maurya Kummamuru

On 16 January 2015 at 16:25, Paul Biondich pbiondic@regenstrief.org wrote:

One other comment: I’m not sure you want to go down the path of representing disaggregates uniquely as XML attributes:

In your example, SEX and AGE.

Given the large number of potential disaggregates, that makes for a messy specification, where you’re baking content into the spec.

Not sure if i completely agree. The requirement of typical facility level monthly report is messy anyway. I’d rather try to be explicit about it than trying and kick the validation problem down the road.

Note also that strict W3C XSD or ISO RelaxNG schema are only one (imperfect) species of validation mechanism. It is quite possible to have a relatively permissive schema and to impose your validation constraints over and above that . Eg using schematron constraints and/or application level validation.

Or, to follow the sdmx route, we profile a meta schema which dictates that there are 3 mandatory concepts (dataelement, period, orgunit) and provides a mechanism for specifying additional ones - rather than baking them in. And in practice there are probably not as many of these common dimensions as you might think. The ADX spec could specify the well known ones like SEX, AGE_GROUP, ICD10_CODE etc etc as non-exhaustive list. Without being prescriptive about the underlying codelists.

Thinking through your technical conundrum however (and talking with Burke), you could represent model 3 in three ways:

  1. “tokenize” each of the disaggregates by using a fairly typical space as the delimiter between each:

Your first option, tokenizing, is something I’ve considered before and is certainly doable. It maps well against the dhis2 data model and is also compliant with the letter (if not quite the spirit) of sdmx. A Content Creator would need to know the codes for ‘Male’ and ‘0-5’ but is not, and should not, be required to know how the Consumer codes this combination as in its data store.

It is however poor XML to require parsing of the content in this way in order to process the document. For example it would make an xpath/xquery expression to return all the dataelements referring to Males in the set more difficult to write than it should be.

There is also an implicit assumption here (and in your later suggestions) that you can deduce the concept from the instance. For example “0-5” being an age category. This in turn implies a constraint that all options across all codelists will be unique which is not a reasonable constraint. “0-5” could also be the correct answer to “how many beers do you have on a Friday night” :slight_smile: Within dhis2 currently this doesn’t present a problem, but it would be a problem if one were importing these from elsewhere.

  1. instead of calling out each disaggregate as a XML attribute, genericize them to disaggregate0-n:
  1. have a more generic model (which I’m not sure whether it’s allowed in SDMX:

HIV positive on ARVs

0-5

male

Bob’s Clinic

33

#1 you could do quickly for first iteration, and perhaps consider #3 if the spec will allow?

Spec doesn’t allow 3. For I think the same good reason I discussed above with ‘0-5’ beers.

As you see, there is still some thinking to be done here, but its really very helpful to get all these suggestions on the table to pull apart and play off against each other.

Thanks

Bob

Hope this helps?

-Paul

You received this message because you are subscribed to the Google Groups “Open HMIS” group.

To unsubscribe from this group and stop receiving emails from it, send an email to open-hmis+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

On Fri, Jan 16, 2015 at 11:01 AM, Paul Biondich pbiondic@regenstrief.org wrote:

Very, very clear Bob. Thank you.

#3 seems good to me as well, as you’re simply moving to a wholly “post-coordinated” model, instead of a partial strategy. Easier to comprehend.

So instead of:

“0-5, male” + “HIV patients on ARVs”

you do:

“0-5” + “male” + “HIV patients on ARVs”

So, in essence… the new model is: data element + 1-to-n disaggregates. Each unique row is the combination of at least 1 data element + at least one desegregate (which can be ALL or default).

Then, validation of the codes would be done at the singleton disaggregate level as well as validation of each of the data elements.

Do I have this right?

-Paul

On Fri, Jan 16, 2015 at 10:49 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Paul

Good questions - which have been circulating around for a LONG time. Here’s my opinion, and its only my opinion at this stage. There are 3 possibilities:

  1. Retain the categoryoptioncombo attribute (but call it something more sensible like ‘disaggregation’). That’s the least painful in terms of supporting current implementations. The downside is that it is an ugly and fragile lattice to move between systems. For example, while its reasonable for 3rd party systems to have and maintain a shared understanding of codes like SEX={Male, Female} AGE_GROUP={under5, 5 and over} or what have you, dealing with their combinations is another matter.
  1. Collapse into atomic dataelements as you have suggested. This is not actually a bad approach and is the way it was historically with the dhisv1 model. Unfortunately there is already too much dhis2 legacy buiilt on top of the existing dimensional dataelements for this to have much traction.
  1. Explode the disaggregation into its constituent parts for external consumption. The McDonalds combos are then hidden and become an internal detail of dhis2. So for example we would have datavalues like:

instread of

Note I have just stuck in labels but these could be coded values. I am very much in favour of the last of the 3 approaches. There is a downside that whereas 1 or 2 are easily expressable in a standard like SDMX, (3) would require us to take some liberties as the data will be ragged in nature. But this would be the easiest to implement and map to 3rd party systems which output dimensional data (for example an openmrs cohort report). Also easy to link with 3rd party vocabulary providers - terminology services and the like.

There is also the possibility of an incremental path. Starting with something based on 1 and moving towards 3.

Regards

Bob

You received this message because you are subscribed to the Google Groups “Open HMIS” group.

To unsubscribe from this group and stop receiving emails from it, send an email to open-hmis+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

On 16 January 2015 at 15:26, Paul Biondich pbiondic@regenstrief.org wrote:

Hi Bob and Jim, thanks a lot for the detailed feedback.

Jim, your descriptions (and corrections) validate my understanding on category option combos, so thanks a lot.

Before I go to much further in asking questions, I want to verify something really quickly Bob. You said:

You received this message because you are subscribed to the Google Groups “Open HMIS” group.

To unsubscribe from this group and stop receiving emails from it, send an email to open-hmis+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

I am not greatly in favour of trying to percolate this categoryoptioncombo model (what I call McDonalds) through to ADX. Rather we will probably leave disaggregation out of scope for first iteration.

I could imagine you going a couple of ways with the first iteration. You could choose to “pre-coodinate” the data element and categoryoptioncombo into a single “indicator” or something of that ilk: so instead of “HIV positive and on antiretrovirals” + “0 - 5, male”, you could have “# of patients 0-5 & male who are HIV positive and on antiretrovirals” as a single descriptor.

You could also do some form of a post-coordination (ANDing a data element with some form of description of disaggregates different than category option combo).

Which direction are you choosing to go down for iteration 1? What do you imagine the ultimate specification to look like in this regard?

I’m asking these questions, as I’m trying to prepare my mind for how we’d validate incoming messages at scale. Thanks in advance for the education. :slight_smile:

-Paul

Knowing the “category combination” for each data element can help explain why the category option combinations are the way they are:

HTC_TST (N, DSD, Age) has category combination “Age (0-50+,10)” (which is a shorthand to say ages from 0 to 50+ split into 10 ranges.) FN_THER (N, DSD, Age/Sex) should have a combination of age and sex as you point out, but it doesn’t. I’ll report this as a problem. What it does have is a combination of “Age (0-18+, 6)” which is a different set of age ranges and therefore generates a different set of “combinations”. The actual age value (category option) in both cases is “(1-4)”, and this in fact is represented internally by a single object. But for each distinct category combination, a different set of “combination” values are generated.

FN_ASSESS (N, DSD, Age/Sex) has a category combination of “Age (0-18+, 6)” (another apparent misconfiguration that I will report.) And FN_ASSESS (N, DSD, Age) has the same category combination of “Age (0-18+, 6)”. So they will have the same category combination codes.

A DHIS 2 Web API client could navigate all this is as follows: For each data element, find the category combo. From the category combo, find the enumerated category option combos. For each option combo, find all the options in that combo. For the disaggregations you have, find the right category option combo, and use that combo to transmit the data.

The important semantic things are the category options like “(1-4)”. When you do analysis in DHIS 2, for example, that is what you see. The category option combos are more of an artifact of how several of these options are specified together, both for internal storage and for data transfer. It’s a part of the design that I’ve heard is confusing to many, and causes some maintenance problems. There are some discussions about whether it could be improved – Bob and I talk about this with some regularity for example. But I hope this email helps to clarify how it currently works.

Cheers,

Jim

You received this message because you are subscribed to the Google Groups “Open HMIS” group.

To unsubscribe from this group and stop receiving emails from it, send an email to open-hmis+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

On Fri, Jan 16, 2015 at 4:13 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Paul

I am not very familiar with the particular data and the link you presented is no longer working. So I can only attempt to answer some of your questions:

  1. There is no hard coded set of categoryoptioncombos, nor a global central authority. So PEPFAR would create and govern these within their “universe”.
  1. A dataelement is associated with a categorycombo. Like (Age) or (Age,Sex). So two dataelements with different categorycombos shouldn’t share the same categoryoptioncombo uids. So I would draw the same conclusion as you for the first example - there is some mistake in the second categoryoptioncombo name.
  1. The second example you give is mysterious to me. There seems to be something wrong with this metadata.

I am not greatly in favour of trying to percolate this categoryoptioncombo model (what I call McDonalds) through to ADX. Rather we will probably leave disaggregation out of scope for first iteration.

Bob


On 16 January 2015 at 03:40, Paul Biondich pbiondic@regenstrief.org wrote:

Have spent a few hours this week looking through what’s on the web about DXF, and congratulations to you all, as lots can be inferred from what’s there.

I’m spending time to look in particular at PEPFAR’s DATIM web presence.

Please correct anything I write here, just trying to learn how the message structure works.

It looks as if each unique “row” within a message is a combination of a “data element” and a “category option combo”.

“category option combos” are one or more pre-coordinated disaggregates. For example, “male” is a valid answer, and “males, 0-5” is also a valid answer. Each one of these combos is uniquely identified with a UID.

Who is the central authority for a given set of category option combo “options” for a given report? For example, the MER report has a number of potential disaggregate options… is PEPFAR the central authority, or is “male” for all times a unique category option combo for all reports generated in DXF/ADX format?

Perhaps the detail is missing, but when I look at this page: https://www.datim.org/api/sqlViews/GN3OuQmFfd8/data.html+css

…what I see is a collection of data element / category option combo pairs, one seemingly for each indicator (which would match a cell in a report back table).

But then I see cases where a category option combo looks to be the same (like 0 - 4), but it has more than one UID represented within the page:

HTC_TST (N, DSD, Age): HTC received results
HTC_TST_N_DSD_Age
(1-4)
MTqU4a2fG8c
FN_THER (N, DSD, Age/Sex): Undernourished PLHIV fed
FN_THER_N_DSD_Age_Sex
(1-4)
Bz0GWknldhe

Is the second one just a mistake (as the named disaggregates are age and sex, but the categoryoptioncombo_name doesn’t reflect both)?

Then I see things like:

FN_ASSESS (N, DSD, Age/Sex): PLHIV nutrionally assessed
FN_ASSESS_N_DSD_Age_Sex
(1-4)
Bz0GWknldhe
FN_ASSESS (N, DSD, Age): PLHIV nutrionally assessed
FN_ASSESS_N_DSD_Age
(1-4)
Bz0GWknldhe

…and I note different named disaggregates share the same UID.

Is this just an error in the coding of the indicators on the page, or am I fundamentally misunderstanding how category option combos are being given UIDs?

Thanks,

-Paul


Both the options appear good to me.

The only point that bothers me (originally pointed out by Paul) is that, except for the disaggregation part, the whole dataValue is readable. I agree that a UID like ‘ghjg5h43g5’ is unique and cleaner for the machine, but a human would need the reference dictionary open to find out what it refers to. I don’t know if it is possible, but is there a way of maintaining the disaggregations in a CSV-type or other human-readable format, and having the machine look up the UID for that set based on the parameters in the disaggregations?

That is:

instead of

The message would look something like (I agree it looks a lot like a JSON message in XML, but this is just a prototype of the idea and subject to change)

or

both resolving, by machine lookup, to the same UID ghjg5h43g5

This might need some work on both ends and I know there might be many other “good to haves”, but I just wanted to float the idea to see if you see it as worth the effort.


On Fri, Jan 16, 2015 at 4:21 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

No, your first example is better. But there is no need to be too afraid to use attributes. So for example:

is perhaps a neater representation of what you are after. With the added advantage that the dataValue then becomes quite close to a valid SDMX data row, with the disaggregates being optional additional content. Taking a small step further:

could also work as an alternative compact representation. Put in words (rather than schematron rules):

  1. a datavalue has mandatory dataelement, orgunit, period and value attributes
  1. a datavalue may have additional content of a element
  1. a element must have either:

3.1 a combo attribute; or

3.2 a sequence of 1 or more elements

Or another even more concise variation:

  1. a datavalue has mandatory dataelement, orgunit, period and value attributes
  1. a datavalue may have either a disaggregations attribute or additional content of a element, but not both
  1. a element must have a sequence of 1 or more elements

Which caters for the current reality as well as offering a route to a saner representation.

Food for thought … but it’s Friday night and I am going to try real hard not to think about this any more till Monday :) I think I still prefer my original variant.
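The second, more concise rule set could be checked at application level with something like the sketch below. The attribute names (`dataelement`, `orgunit`, `period`, `value`, `disaggregations`) come from the rules as written; the child element names `disaggregation` and `category` are placeholders of mine, since the rules above do not name them, and `check_datavalue` is a made-up helper rather than part of any spec.

```python
import xml.etree.ElementTree as ET

MANDATORY = ("dataelement", "orgunit", "period", "value")


def check_datavalue(elem):
    """Return a list of rule violations for one <datavalue> element."""
    # Rule 1: mandatory attributes
    problems = [f"missing attribute: {name}" for name in MANDATORY
                if name not in elem.attrib]
    # Rule 2: either the compact attribute or element content, not both
    has_attr = "disaggregations" in elem.attrib
    containers = elem.findall("disaggregation")      # hypothetical element name
    if has_attr and containers:
        problems.append("both disaggregations attribute and element content")
    # Rule 3: element content must hold one or more child elements
    for container in containers:
        if len(container.findall("category")) < 1:   # hypothetical element name
            problems.append("empty disaggregation element")
    return problems


compact = ET.fromstring(
    '<datavalue dataelement="DE1" orgunit="OU1" period="201403" value="33" '
    'disaggregations="ghjg5h43g5"/>')
exploded = ET.fromstring(
    '<datavalue dataelement="DE1" orgunit="OU1" period="201403" value="33">'
    '<disaggregation><category name="SEX" option="male"/></disaggregation>'
    '</datavalue>')
invalid = ET.fromstring(
    '<datavalue dataelement="DE1" orgunit="OU1" period="201403" '
    'disaggregations="ghjg5h43g5"><disaggregation/></datavalue>')

for doc in (compact, exploded, invalid):
    print(check_datavalue(doc))
```

Both the compact (combo UID) form and the exploded form pass, which is the "current reality plus a route forward" property described above.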

Bob

On 16 January 2015 at 20:56, srimaurya kummamuru kmit.maurya@gmail.com wrote:

If having the two types of XML is not a good practice it can also be viewed as:

   <dataElement>HIV positive on ARVs</dataElement>
   <disaggregates>
         <disaggregate>
              <categoryType>AGE</categoryType>
              <optionValue>0-5</optionValue>
         </disaggregate>
         <disaggregate>
              <categoryType>SEX</categoryType>
              <optionValue>male</optionValue>
         </disaggregate>
   </disaggregates>
   <orgUnit>Bob's Clinic</orgUnit>
   <timePeriod>201403</timePeriod>
   <value>33</value>

On Fri, Jan 16, 2015 at 3:45 PM, srimaurya kummamuru kmit.maurya@gmail.com wrote:

Hi All,

Thank you for starting the discussion, it has been very informative to me as well.

I like Bob’s idea regarding as it allows the message to be understandable and distinctive. But, it has the drawback of having many @XMLAttributes as Paul mentioned.

On the other side Paul’s idea regarding

HIV positive on ARVs

0-5

male

Bob’s Clinic

33

is also a great option, as it is not tied to a fixed set of @XMLAttributes and includes only what is required. But, again, Bob’s point about it not being distinctive comes into play.

What about the following idea, which combines both and can be generic and distinctive? I am not sure how acceptable the format is, but it might be worth your consideration.

   <dataElement>HIV positive on ARVs</dataElement>
   <disaggregates>
         <disaggregate type="AGE" value="0-5"/>
         <disaggregate type="SEX" value="male"/>
   </disaggregates>
   <orgUnit>Bob's Clinic</orgUnit>
   <timePeriod>201403</timePeriod>
   <value>33</value>

In this case, disaggregates can be a list of disaggregate objects (each having type and value as @XMLAttributes) in the dataValue object. Disaggregate objects can be added to the list as and when required.
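To make the "machine does the referencing" idea concrete, here is a rough Python sketch that reads the type/value form and resolves it to a combo UID. The wrapping `<dataValue>` element, the `COMBO_UIDS` table and the `resolve_combo_uid` function are assumptions for illustration; ‘ghjg5h43g5’ is the made-up UID used earlier in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: a set of (type, value) pairs -> combo UID.
COMBO_UIDS = {
    frozenset({("AGE", "0-5"), ("SEX", "male")}): "ghjg5h43g5",
}

MESSAGE = """
<dataValue>
   <dataElement>HIV positive on ARVs</dataElement>
   <disaggregates>
         <disaggregate type="AGE" value="0-5"/>
         <disaggregate type="SEX" value="male"/>
   </disaggregates>
   <orgUnit>Bob's Clinic</orgUnit>
   <timePeriod>201403</timePeriod>
   <value>33</value>
</dataValue>
"""


def resolve_combo_uid(data_value, combo_uids):
    """Map the disaggregates of one dataValue to a combo UID, or None."""
    # A frozenset makes the lookup independent of element order.
    pairs = frozenset(
        (d.get("type"), d.get("value"))
        for d in data_value.findall("./disaggregates/disaggregate")
    )
    return combo_uids.get(pairs)


uid = resolve_combo_uid(ET.fromstring(MESSAGE), COMBO_UIDS)
print(uid)
```

With this shape, the sender only needs to know the individual codes, and the receiving system keeps the combo UID as an internal detail.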

–Regards,

Sri Maurya Kummamuru



On Fri, Jan 16, 2015 at 12:46 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Thanks for this, Paul.


On 16 January 2015 at 16:25, Paul Biondich pbiondic@regenstrief.org wrote:

One other comment: I’m not sure you want to go down the path of representing disaggregates uniquely as XML attributes:

In your example, SEX and AGE.

Given the large number of potential disaggregates, that makes for a messy specification, where you’re baking content into the spec.

Not sure if I completely agree. The requirement of a typical facility-level monthly report is messy anyway. I’d rather try to be explicit about it than try to kick the validation problem down the road.

Note also that strict W3C XSD or ISO RelaxNG schemas are only one (imperfect) species of validation mechanism. It is quite possible to have a relatively permissive schema and to impose your validation constraints over and above that, e.g. using Schematron constraints and/or application-level validation.

Or, to follow the SDMX route, we profile a meta-schema which dictates that there are 3 mandatory concepts (dataelement, period, orgunit) and provides a mechanism for specifying additional ones, rather than baking them in. And in practice there are probably not as many of these common dimensions as you might think. The ADX spec could specify the well-known ones like SEX, AGE_GROUP, ICD10_CODE etc. as a non-exhaustive list, without being prescriptive about the underlying codelists.

Thinking through your technical conundrum however (and talking with Burke), you could represent model 3 in three ways:

  1. “tokenize” each of the disaggregates by using a fairly typical space as the delimiter between each:

Your first option, tokenizing, is something I’ve considered before and is certainly doable. It maps well against the dhis2 data model and is also compliant with the letter (if not quite the spirit) of SDMX. A Content Creator would need to know the codes for ‘Male’ and ‘0-5’ but is not, and should not be, required to know how the Consumer codes this combination in its data store.

It is however poor XML to require parsing of the content in this way in order to process the document. For example it would make an xpath/xquery expression to return all the dataelements referring to Males in the set more difficult to write than it should be.

There is also an implicit assumption here (and in your later suggestions) that you can deduce the concept from the instance. For example “0-5” being an age category. This in turn implies a constraint that all options across all codelists will be unique, which is not a reasonable constraint. “0-5” could also be the correct answer to “how many beers do you have on a Friday night” :) Within dhis2 currently this doesn’t present a problem, but it would be a problem if one were importing these from elsewhere.
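A small Python comparison may help show why parsing tokenized content is awkward. Both documents below are invented for illustration: the space-delimited `disaggregation` attribute is a hypothetical rendering of the tokenized option, and the `disaggregate type/value` form follows the shape suggested elsewhere in this thread.

```python
import xml.etree.ElementTree as ET

# Tokenized form: the consumer has to parse attribute content itself.
TOKENIZED = ET.fromstring("""
<dataValues>
  <dataValue dataElement="HIV positive on ARVs" disaggregation="0-5 male" value="33"/>
  <dataValue dataElement="HIV positive on ARVs" disaggregation="0-5 female" value="41"/>
</dataValues>
""")

# Structured form: each disaggregate names its concept explicitly.
STRUCTURED = ET.fromstring("""
<dataValues>
  <dataValue dataElement="HIV positive on ARVs" value="33">
    <disaggregate type="AGE" value="0-5"/>
    <disaggregate type="SEX" value="male"/>
  </dataValue>
  <dataValue dataElement="HIV positive on ARVs" value="41">
    <disaggregate type="AGE" value="0-5"/>
    <disaggregate type="SEX" value="female"/>
  </dataValue>
</dataValues>
""")

# Structured: a path expression selects on concept and code together.
males_structured = [
    dv.get("value")
    for dv in STRUCTURED.findall("dataValue")
    if dv.find("disaggregate[@type='SEX'][@value='male']") is not None
]

# Tokenized: a naive substring test also matches "female" ...
males_naive = [
    dv.get("value")
    for dv in TOKENIZED.findall("dataValue")
    if "male" in dv.get("disaggregation")
]

# ... so the content must be split into tokens before comparing.
males_tokenized = [
    dv.get("value")
    for dv in TOKENIZED.findall("dataValue")
    if "male" in dv.get("disaggregation").split()
]

print(males_structured, males_naive, males_tokenized)
```

Even after splitting, the tokenized form still cannot say which concept a token such as “0-5” belongs to, which is the uniqueness problem described above.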

  1. instead of calling out each disaggregate as a XML attribute, genericize them to disaggregate0-n:
  1. have a more generic model (which I’m not sure is allowed in SDMX):

HIV positive on ARVs

0-5

male

Bob’s Clinic

33

#1 you could do quickly for first iteration, and perhaps consider #3 if the spec will allow?

The spec doesn’t allow 3, for, I think, the same good reason I discussed above with ‘0-5’ beers.

As you see, there is still some thinking to be done here, but it’s really very helpful to get all these suggestions on the table to pull apart and play off against each other.

Thanks

Bob

Hope this helps?

-Paul


On Fri, Jan 16, 2015 at 11:01 AM, Paul Biondich pbiondic@regenstrief.org wrote:

Very, very clear Bob. Thank you.

#3 seems good to me as well, as you’re simply moving to a wholly “post-coordinated” model, instead of a partial strategy. Easier to comprehend.

So instead of:

“0-5, male” + “HIV patients on ARVs”

you do:

“0-5” + “male” + “HIV patients on ARVs”

So, in essence… the new model is: data element + 1-to-n disaggregates. Each unique row is the combination of one data element + at least one disaggregate (which can be ALL or default).

Then, validation of the codes would be done at the singleton disaggregate level as well as validation of each of the data elements.
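As a rough sketch of what singleton-level validation might look like, assuming each disaggregate arrives as a (concept, code) pair: the code lists, their contents and the `validate_row` function below are hypothetical, and a real deployment would load its lists from its own metadata or a terminology service.

```python
# Hypothetical code lists, for illustration only.
DATA_ELEMENTS = {"HIV patients on ARVs"}
CODELISTS = {
    "AGE_GROUP": {"0-5", "5+"},
    "SEX": {"male", "female"},
}


def validate_row(data_element, disaggregates):
    """Check the data element and each (concept, code) pair on its own."""
    errors = []
    if data_element not in DATA_ELEMENTS:
        errors.append(f"unknown data element: {data_element}")
    for concept, code in disaggregates.items():
        if concept not in CODELISTS:
            errors.append(f"unknown concept: {concept}")
        elif code not in CODELISTS[concept]:
            errors.append(f"unknown {concept} code: {code}")
    return errors


print(validate_row("HIV patients on ARVs", {"AGE_GROUP": "0-5", "SEX": "male"}))
print(validate_row("HIV patients on ARVs", {"BEERS": "0-5"}))
```

No table of combinations is needed: each code is checked against the list for its own concept, so validation grows with the number of codes rather than with the number of combinations.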

Do I have this right?

-Paul

On Fri, Jan 16, 2015 at 10:49 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Paul

Good questions - which have been circulating around for a LONG time. Here’s my opinion, and it’s only my opinion at this stage. There are 3 possibilities:

  1. Retain the categoryoptioncombo attribute (but call it something more sensible like ‘disaggregation’). That’s the least painful in terms of supporting current implementations. The downside is that it is an ugly and fragile lattice to move between systems. For example, while it’s reasonable for 3rd party systems to have and maintain a shared understanding of codes like SEX={Male, Female} AGE_GROUP={under5, 5 and over} or what have you, dealing with their combinations is another matter.
  1. Collapse into atomic dataelements as you have suggested. This is not actually a bad approach and is the way it was historically with the dhisv1 model. Unfortunately there is already too much dhis2 legacy built on top of the existing dimensional dataelements for this to have much traction.
  1. Explode the disaggregation into its constituent parts for external consumption. The McDonalds combos are then hidden and become an internal detail of dhis2. So for example we would have datavalues like:

instead of

Note I have just stuck in labels but these could be coded values. I am very much in favour of the last of the 3 approaches. There is a downside that whereas 1 or 2 are easily expressible in a standard like SDMX, (3) would require us to take some liberties as the data will be ragged in nature. But this would be the easiest to implement and map to 3rd party systems which output dimensional data (for example an openmrs cohort report). It is also easy to link with 3rd party vocabulary providers - terminology services and the like.

There is also the possibility of an incremental path. Starting with something based on 1 and moving towards 3.

Regards

Bob


On 16 January 2015 at 15:26, Paul Biondich pbiondic@regenstrief.org wrote:

Hi Bob and Jim, thanks a lot for the detailed feedback.

Jim, your descriptions (and corrections) validate my understanding on category option combos, so thanks a lot.

Before I go too much further in asking questions, I want to verify something really quickly, Bob. You said:


I am not greatly in favour of trying to percolate this categoryoptioncombo model (what I call McDonalds) through to ADX. Rather we will probably leave disaggregation out of scope for first iteration.

I could imagine you going a couple of ways with the first iteration. You could choose to “pre-coordinate” the data element and categoryoptioncombo into a single “indicator” or something of that ilk: so instead of “HIV positive and on antiretrovirals” + “0 - 5, male”, you could have “# of patients 0-5 & male who are HIV positive and on antiretrovirals” as a single descriptor.

You could also do some form of a post-coordination (ANDing a data element with some form of description of disaggregates different than category option combo).

Which direction are you choosing to go down for iteration 1? What do you imagine the ultimate specification to look like in this regard?

I’m asking these questions, as I’m trying to prepare my mind for how we’d validate incoming messages at scale. Thanks in advance for the education. :)

-Paul

