Showing posts with label business. Show all posts
Showing posts with label business. Show all posts

Wednesday, March 7, 2012

Math is driving me nuts

The business rule is, the sales manager is commissioned on the avg. number
of appointments set up per salesrep per day during the month.

I have 2 tables: The UserLog table records only 1 entry per day per user
(salesrep). This will log how many salesreps worked a particular day. The
second table logs any appointments set up.

UserLog: ID, UserName, EnteredTime
Appointment: ApptID, EnteredTime, ApptDate

I figured that, for a given date ranged, I could
1. sum the number of appointments
2. sum the number of days worked
2. sum the salesreps / number of days = avg number of salesreps per day
3. number of appointments / avg number of salesreps per day = avg number
of appointments per sales rep

But this logic is flawed. If I average out every day and then take an
average of this daily average, I get a different result. Any ideas on how
to best solve this problem?

Thanks.Sounds weird; the more sales reps there are,
the less the manager makes for the same amount of work
(setting up appointments). I guess this statistic
is rational, if your prime concern is penalizing the
manager for setting up appointments for reps who
never show up for work :-)

Ignoring that ...
Your first algorithm is the one that's off.
You need to calculate appointments per day,
and reps (working) per day, to get appts per rep per day.
The average comes after, and you don't need to
count the number of days at all.

For simplicity, I'm going to write "EnteredDate"
instead "convert(varchar,EnteredTime,111)"
-- i.e. the yyyymmdd part of the time.

select avg(AD.Appts / RD.Reps)
from (select EnteredDate,count(*) Appts
from Appointment group by EnteredDate) AD
join (select EnteredDate,count(*) Reps
from UserLog group by EnteredDate) RD
on RD.EnteredDate = AD.EnteredDate
where RD.EnteredDate between ...

Hope that helps.

"J" <sorrynospam@.nobody.com> wrote in message
news:40fb37ea$0$74187$39cecf19@.news.twtelecom.net. ..
> The business rule is, the sales manager is commissioned on the avg. number
> of appointments set up per salesrep per day during the month.
> I have 2 tables: The UserLog table records only 1 entry per day per user
> (salesrep). This will log how many salesreps worked a particular day.
The
> second table logs any appointments set up.
> UserLog: ID, UserName, EnteredTime
> Appointment: ApptID, EnteredTime, ApptDate
> I figured that, for a given date ranged, I could
> 1. sum the number of appointments
> 2. sum the number of days worked
> 2. sum the salesreps / number of days = avg number of salesreps per
day
> 3. number of appointments / avg number of salesreps per day = avg
number
> of appointments per sales rep
> But this logic is flawed. If I average out every day and then take an
> average of this daily average, I get a different result. Any ideas on how
> to best solve this problem?
> Thanks.
>|||The "average number of appointments per salesrep per day" could mean two things. It
could mean the "average number of appointments per sales rep for the month", divided
by the number of days, or it could be the average of (the daily average appointments/sales rep).

Here's an example of where these are different.

Case 1: 100 reps working each day, and 100 appt each day.
Either calculation gives an average of 1 appt/rep for the month

Case 2: 1 rep worked alone during half the month, and had 10000 appointment,
and 99 reps worked together the other half, but had 0 appointments.

Now it's either 10000 appointments total / daily avg of 50 reps ,
= 200 appts/rep
or
15 days where the appt/ref value was 10000/15 and 15 days where the
appt/ref value was zero, which averages to [15*(10000/15) + 15*0]/30,
or 10000/30 = 333 on average. This method is influenced by whether
slackers worked only on dead days or only on busy days, even if they
did nothing either way.

You probably want one or the other of these, but they measure different things.

Steve Kass
Drew University

J wrote:
> The business rule is, the sales manager is commissioned on the avg. number
> of appointments set up per salesrep per day during the month.
> I have 2 tables: The UserLog table records only 1 entry per day per user
> (salesrep). This will log how many salesreps worked a particular day. The
> second table logs any appointments set up.
> UserLog: ID, UserName, EnteredTime
> Appointment: ApptID, EnteredTime, ApptDate
> I figured that, for a given date ranged, I could
> 1. sum the number of appointments
> 2. sum the number of days worked
> 2. sum the salesreps / number of days = avg number of salesreps per day
> 3. number of appointments / avg number of salesreps per day = avg number
> of appointments per sales rep
> But this logic is flawed. If I average out every day and then take an
> average of this daily average, I get a different result. Any ideas on how
> to best solve this problem?
> Thanks.
>
>|||"Steve Kass" <skass@.drew.edu> wrote in message
news:mwIKc.7057$mL5.2991@.newsread1.news.pas.earthl ink.net...
> The "average number of appointments per salesrep per day" could mean two
things. It
> could mean the "average number of appointments per sales rep for the
month", divided
> by the number of days, or it could be the average of (the daily average
appointments/sales rep).

And in simpler terms, why you're getting different results
is because A/B + C/D is not (A + C)/(B + D).

On a different level: what the heck are these two things measuring?

The number of appointments per month is a clear-cut measure of
how hard the manager is working. Perhaps dividing by the number
of available salesreps is supposed to eve things out across months
where half the reps are on vacation or sick?

Well, if you work out the appointments per rep on a daily
basis, you are getting a better picture (to the limit of accuracy of your
data) on what the manager is doing.

> Here's an example of where these are different.
> Case 1: 100 reps working each day, and 100 appt each day.
> Either calculation gives an average of 1 appt/rep for the month
> Case 2: 1 rep worked alone during half the month, and had 10000
appointment,
> and 99 reps worked together the other half, but had 0
appointments.
> Now it's either 10000 appointments total / daily avg of 50 reps
,
> = 200 appts/rep
> or
> 15 days where the appt/ref value was 10000/15 and 15 days where
the
> appt/ref value was zero, which averages to [15*(10000/15) +
15*0]/30,
> or 10000/30 = 333 on average. This method is influenced by
whether
> slackers worked only on dead days or only on busy days, even if
they
> did nothing either way.
> You probably want one or the other of these, but they measure different
things.
> Steve Kass
> Drew University
> J wrote:
> > The business rule is, the sales manager is commissioned on the avg.
number
> > of appointments set up per salesrep per day during the month.
> > I have 2 tables: The UserLog table records only 1 entry per day per
user
> > (salesrep). This will log how many salesreps worked a particular day.
The
> > second table logs any appointments set up.
> > UserLog: ID, UserName, EnteredTime
> > Appointment: ApptID, EnteredTime, ApptDate
> > I figured that, for a given date ranged, I could
> > 1. sum the number of appointments
> > 2. sum the number of days worked
> > 2. sum the salesreps / number of days = avg number of salesreps per
day
> > 3. number of appointments / avg number of salesreps per day = avg
number
> > of appointments per sales rep
> > But this logic is flawed. If I average out every day and then take an
> > average of this daily average, I get a different result. Any ideas on
how
> > to best solve this problem?
> > Thanks.|||It entirely depends on which 'average' you want. Different ways of
grouping / summing the data can produce slightly different results.

Do you want :

Sum per rep per day added together for each manager then divided by
the number of reps ?
or
Sum of all appointments for all reps divided by the numbers of reps ?
or
Something else

There are a few combinations to consider. Also, there are different
ways of looking at averages. Mean, mode, median etc... (from what I
can remember from my GCSE's some 14 years ago - apologies if this is
wrong / incomplete)

I would suggest defining exactly what is required and then choosing a
solution to suit. Comparing different requirements will give different
answers.

Ryan

"J" <sorrynospam@.nobody.com> wrote in message news:<40fb37ea$0$74187$39cecf19@.news.twtelecom.net>...
> The business rule is, the sales manager is commissioned on the avg. number
> of appointments set up per salesrep per day during the month.
> I have 2 tables: The UserLog table records only 1 entry per day per user
> (salesrep). This will log how many salesreps worked a particular day. The
> second table logs any appointments set up.
> UserLog: ID, UserName, EnteredTime
> Appointment: ApptID, EnteredTime, ApptDate
> I figured that, for a given date ranged, I could
> 1. sum the number of appointments
> 2. sum the number of days worked
> 2. sum the salesreps / number of days = avg number of salesreps per day
> 3. number of appointments / avg number of salesreps per day = avg number
> of appointments per sales rep
> But this logic is flawed. If I average out every day and then take an
> average of this daily average, I get a different result. Any ideas on how
> to best solve this problem?
> Thanks.|||Let me make more sense of this:

I should have used the term 'telemarketers' instead of 'salesreps'.
The telemarketers set up appointments for an outside sales force.
The manager is commissioned on the appointments set up, but
the telemarketers workforce may very...

"Mischa Sandberg" <mischa_sandberg@.telus.net> wrote in message
news:mFHKc.88627$eO.41568@.edtnps89...
> Sounds weird; the more sales reps there are,
> the less the manager makes for the same amount of work
> (setting up appointments). I guess this statistic
> is rational, if your prime concern is penalizing the
> manager for setting up appointments for reps who
> never show up for work :-)
> Ignoring that ...
> Your first algorithm is the one that's off.
> You need to calculate appointments per day,
> and reps (working) per day, to get appts per rep per day.
> The average comes after, and you don't need to
> count the number of days at all.
> For simplicity, I'm going to write "EnteredDate"
> instead "convert(varchar,EnteredTime,111)"
> -- i.e. the yyyymmdd part of the time.
> select avg(AD.Appts / RD.Reps)
> from (select EnteredDate,count(*) Appts
> from Appointment group by EnteredDate) AD
> join (select EnteredDate,count(*) Reps
> from UserLog group by EnteredDate) RD
> on RD.EnteredDate = AD.EnteredDate
> where RD.EnteredDate between ...
> Hope that helps.
> "J" <sorrynospam@.nobody.com> wrote in message
> news:40fb37ea$0$74187$39cecf19@.news.twtelecom.net. ..
> > The business rule is, the sales manager is commissioned on the avg.
number
> > of appointments set up per salesrep per day during the month.
> > I have 2 tables: The UserLog table records only 1 entry per day per
user
> > (salesrep). This will log how many salesreps worked a particular day.
> The
> > second table logs any appointments set up.
> > UserLog: ID, UserName, EnteredTime
> > Appointment: ApptID, EnteredTime, ApptDate
> > I figured that, for a given date ranged, I could
> > 1. sum the number of appointments
> > 2. sum the number of days worked
> > 2. sum the salesreps / number of days = avg number of salesreps per
> day
> > 3. number of appointments / avg number of salesreps per day = avg
> number
> > of appointments per sales rep
> > But this logic is flawed. If I average out every day and then take an
> > average of this daily average, I get a different result. Any ideas on
how
> > to best solve this problem?
> > Thanks.|||J,

That's great, but the point is that you have not stated the business rule unambiguously. How about inventing some sample data for which
the two definitions of this average are unequal, so that you can try to decide which specific answer you want?

SK

J wrote:
> Let me make more sense of this:
> I should have used the term 'telemarketers' instead of 'salesreps'.
> The telemarketers set up appointments for an outside sales force.
> The manager is commissioned on the appointments set up, but
> the telemarketers workforce may very...
> "Mischa Sandberg" <mischa_sandberg@.telus.net> wrote in message
> news:mFHKc.88627$eO.41568@.edtnps89...
>>Sounds weird; the more sales reps there are,
>>the less the manager makes for the same amount of work
>>(setting up appointments). I guess this statistic
>>is rational, if your prime concern is penalizing the
>>manager for setting up appointments for reps who
>>never show up for work :-)
>>
>>Ignoring that ...
>>Your first algorithm is the one that's off.
>>You need to calculate appointments per day,
>>and reps (working) per day, to get appts per rep per day.
>>The average comes after, and you don't need to
>>count the number of days at all.
>>
>>For simplicity, I'm going to write "EnteredDate"
>>instead "convert(varchar,EnteredTime,111)"
>>-- i.e. the yyyymmdd part of the time.
>>
>>select avg(AD.Appts / RD.Reps)
>>from (select EnteredDate,count(*) Appts
>> from Appointment group by EnteredDate) AD
>>join (select EnteredDate,count(*) Reps
>> from UserLog group by EnteredDate) RD
>> on RD.EnteredDate = AD.EnteredDate
>>where RD.EnteredDate between ...
>>
>>Hope that helps.
>>
>>"J" <sorrynospam@.nobody.com> wrote in message
>>news:40fb37ea$0$74187$39cecf19@.news.twtelecom.net. ..
>>
>>>The business rule is, the sales manager is commissioned on the avg.
> number
>>>of appointments set up per salesrep per day during the month.
>>>
>>>I have 2 tables: The UserLog table records only 1 entry per day per
> user
>>>(salesrep). This will log how many salesreps worked a particular day.
>>
>>The
>>
>>>second table logs any appointments set up.
>>>
>>>UserLog: ID, UserName, EnteredTime
>>>Appointment: ApptID, EnteredTime, ApptDate
>>>
>>>I figured that, for a given date ranged, I could
>>> 1. sum the number of appointments
>>> 2. sum the number of days worked
>>> 2. sum the salesreps / number of days = avg number of salesreps per
>>
>>day
>>
>>> 3. number of appointments / avg number of salesreps per day = avg
>>
>>number
>>
>>>of appointments per sales rep
>>>
>>>But this logic is flawed. If I average out every day and then take an
>>>average of this daily average, I get a different result. Any ideas on
> how
>>>to best solve this problem?
>>>
>>>Thanks.
>>>
>>>
>>>
>>>
>>>
>>
>>
>|||I understand very well the points you are making, and thanks for clarifying
these issues. They want the average of the daily averages of Appointments
per Telemarketer. In this case, Mischa's example works nicely.

"Steve Kass" <skass@.drew.edu> wrote in message
news:mD_Kc.7718$mL5.2515@.newsread1.news.pas.earthl ink.net...
> J,
> That's great, but the point is that you have not stated the business
rule unambiguously. How about inventing some sample data for which
> the two definitions of this average are unequal, so that you can try to
decide which specific answer you want?
>
> SK
> J wrote:
> > Let me make more sense of this:
> > I should have used the term 'telemarketers' instead of 'salesreps'.
> > The telemarketers set up appointments for an outside sales force.
> > The manager is commissioned on the appointments set up, but
> > the telemarketers workforce may very...
> > "Mischa Sandberg" <mischa_sandberg@.telus.net> wrote in message
> > news:mFHKc.88627$eO.41568@.edtnps89...
> >>Sounds weird; the more sales reps there are,
> >>the less the manager makes for the same amount of work
> >>(setting up appointments). I guess this statistic
> >>is rational, if your prime concern is penalizing the
> >>manager for setting up appointments for reps who
> >>never show up for work :-)
> >>
> >>Ignoring that ...
> >>Your first algorithm is the one that's off.
> >>You need to calculate appointments per day,
> >>and reps (working) per day, to get appts per rep per day.
> >>The average comes after, and you don't need to
> >>count the number of days at all.
> >>
> >>For simplicity, I'm going to write "EnteredDate"
> >>instead "convert(varchar,EnteredTime,111)"
> >>-- i.e. the yyyymmdd part of the time.
> >>
> >>select avg(AD.Appts / RD.Reps)
> >>from (select EnteredDate,count(*) Appts
> >> from Appointment group by EnteredDate) AD
> >>join (select EnteredDate,count(*) Reps
> >> from UserLog group by EnteredDate) RD
> >> on RD.EnteredDate = AD.EnteredDate
> >>where RD.EnteredDate between ...
> >>
> >>Hope that helps.
> >>
> >>"J" <sorrynospam@.nobody.com> wrote in message
> >>news:40fb37ea$0$74187$39cecf19@.news.twtelecom.net. ..
> >>
> >>>The business rule is, the sales manager is commissioned on the avg.
> > number
> >>>of appointments set up per salesrep per day during the month.
> >>>
> >>>I have 2 tables: The UserLog table records only 1 entry per day per
> > user
> >>>(salesrep). This will log how many salesreps worked a particular day.
> >>
> >>The
> >>
> >>>second table logs any appointments set up.
> >>>
> >>>UserLog: ID, UserName, EnteredTime
> >>>Appointment: ApptID, EnteredTime, ApptDate
> >>>
> >>>I figured that, for a given date ranged, I could
> >>> 1. sum the number of appointments
> >>> 2. sum the number of days worked
> >>> 2. sum the salesreps / number of days = avg number of salesreps per
> >>
> >>day
> >>
> >>> 3. number of appointments / avg number of salesreps per day = avg
> >>
> >>number
> >>
> >>>of appointments per sales rep
> >>>
> >>>But this logic is flawed. If I average out every day and then take an
> >>>average of this daily average, I get a different result. Any ideas on
> > how
> >>>to best solve this problem?
> >>>
> >>>Thanks.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >

Saturday, February 25, 2012

Matching records using Business Intelligence Studio

Hello,

I have a database with thousands of records that contain personal details of customers. Some of these records pertain to the same customer - however, they have been submitted by different people, so they differ slightly in detail.

I've been looking to see if any of the data mining tools provided by Business Intelligence Studio in SQL Server 2005 will enable me to achieve a high degree of accuracy in matching records that pertain to the same customer. From what I can see, these tools seem more suited to making general predictions based on large groupings rather than the kind of precise prediction I am looking for.

So I'd appreciate it if anyone could tell me if there is any way I could use Business Intelligence Studio to match these 'duplicate' records together, or whether I will have to create a more SQL-based solution which attempts to match the customer records using SELECT statements and making assumptions about the data.

TIA,

Kweri

One solution is to start by creating an Integration Services project.

In the project, define a Data Flow task and add the following transforms:

- a Data Source transform, which reads from your database

- a Fuzzy Matching transform

- a Data Destination transform

The Fuzzy Matching Integration Services transform is intended to resolve the kind of problem you describe (matching records based on similarity)