Final for Proseminar

I have provided you with a data set containing 1419 observations on salaries and a number of other variables.

Your task is to generate and interpret a multiple linear regression with salary as a function of age (variable age), education (variable educ), gender (variable female), minority status (variable minority), and time on the job (variable jobtime).
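For orientation, the model described above can be written out as a single equation. The symbols used here (an intercept \beta_0, slopes \beta_1 through \beta_5, an error term \varepsilon_i, and a subscript i for each employee) are notation assumed for illustration; they do not appear in the data set.

```latex
\[
\text{salary}_i = \beta_0 + \beta_1\,\text{age}_i + \beta_2\,\text{educ}_i
  + \beta_3\,\text{female}_i + \beta_4\,\text{minority}_i
  + \beta_5\,\text{jobtime}_i + \varepsilon_i
\]
```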

Before running that model, you should look at these variables individually and describe each one: possible statistics include the mean, minimum, maximum, etc. The idea is to describe these data to me, your reader.
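As a rough illustration of what such a description involves, here is a minimal sketch in Python (not Stata) that computes a few summary statistics. The numbers are made up for the example and are not taken from the course data set.

```python
import statistics

# Made-up salary values for illustration only; NOT from the course data set.
salary = [28000, 31500, 45000, 52000, 39000]

summary = {
    "n": len(salary),
    "mean": statistics.mean(salary),
    "sd": statistics.stdev(salary),  # sample standard deviation (divides by n - 1)
    "min": min(salary),
    "max": max(salary),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The same handful of numbers, reported for each variable and explained in a sentence or two, is usually enough to give a reader a feel for the data.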

Because salary is the crucial variable, you should look at the dependent variable, salary, and how it relates to each of the following variables individually:

1. how do salary and educ relate (what is the statistic used? Why? What does it mean?)
2. how do salary and jobtime relate (what is the statistic used? Why? What does it mean?)
3. how do salary and age relate (what is the statistic used? Why? What does it mean?)
4. how do salary and female relate (what is the statistic used? Why? What does it mean?)
5. how do salary and minority relate (what is the statistic used? Why? What does it mean?)
6. how do salary and jobcat (job category) relate (what is the statistic used? Why? What does it mean?)
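To make the idea of "a statistic for a relationship" concrete, the sketch below computes two common candidates in Python: a Pearson correlation for two continuous variables and a Welch t statistic for comparing the mean of a continuous variable across two groups. These are offered as possibilities, not as the required answers; the function names and the data are invented for the example, and choosing and justifying the statistic remains part of your work.

```python
import math
from statistics import mean, variance

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def welch_t(group_a, group_b):
    """Welch t statistic for a difference in means (unequal variances allowed)."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error

# Made-up values for illustration only; NOT from the course data set.
educ = [8, 12, 12, 15, 16, 19]
salary = [24000, 27000, 30000, 33000, 42000, 60000]
female = [0, 1, 0, 1, 1, 0]

r = pearson_r(educ, salary)
salary_women = [s for s, f in zip(salary, female) if f == 1]
salary_men = [s for s, f in zip(salary, female) if f == 0]
t = welch_t(salary_women, salary_men)

print(f"r(educ, salary) = {r:.3f}")
print(f"Welch t (women vs. men) = {t:.3f}")
```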

You should also look at the relationship between the independent variables:

1. how does education differ
• between men and women (what is the statistic used? Why? What does it mean?)
• between minority and non-minority workers (what is the statistic used? Why? What does it mean?)
• by job category (what is the statistic used? Why? What does it mean?)
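When a variable is compared across more than two groups, as with job category, one common option is a one-way analysis of variance. The sketch below computes its F statistic in Python under that assumption; the function name, group labels, and values are invented for the example and are not from the course data set.

```python
def one_way_f(groups):
    """F statistic for a one-way ANOVA over a list of groups of values."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up years of education by job category; NOT from the course data set.
educ_clerical = [12, 12, 15, 12]
educ_custodial = [8, 12, 8]
educ_manager = [16, 19, 17, 16]

f_stat = one_way_f([educ_clerical, educ_custodial, educ_manager])
print(f"F = {f_stat:.2f}")
```

A large F suggests that the group means differ by more than the spread within the groups would lead you to expect; your paper should say what that implies for education across job categories.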

Finally, generate a regression model using all of the variables discussed in the introduction and discuss the output.

Make sure to speak about each of the coefficients: for example,

• What is the coefficient on education?
• What does it mean/how do you interpret it?
• Is it significant?
• How do you know? And so forth.
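To show where a coefficient and its significance test come from, here is a minimal ordinary least squares sketch in plain Python. It is an illustration under stated assumptions, not a substitute for the regression output Stata gives you: the helper names `invert` and `ols` are invented, the data are made up, and only two regressors are used to keep the example short.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(X, y):
    """Return (coefficients, standard errors, t statistics).

    X holds one row of regressors per observation; a constant is added here,
    so the first returned coefficient is the intercept.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    residuals = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in residuals) / (n - k)  # residual variance
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    t_stats = [b / s for b, s in zip(beta, se)]
    return beta, se, t_stats

# Made-up values for illustration only; NOT from the course data set.
educ = [8, 12, 12, 15, 16, 16, 19, 12]
female = [0, 1, 0, 1, 0, 1, 0, 1]
salary = [24000, 26000, 30000, 33000, 42000, 38000, 60000, 27500]

beta, se, t_stats = ols(list(zip(educ, female)), salary)
for name, b, s, t in zip(["constant", "educ", "female"], beta, se, t_stats):
    print(f"{name}: coef = {b:.1f}, se = {s:.1f}, t = {t:.2f}")
```

The coefficient on educ is read as the expected change in salary for one more year of education, holding the other regressors fixed. The t statistic is the coefficient divided by its standard error; in a sample as large as this one, an absolute value above roughly 2 corresponds to significance at about the 5 percent level, and Stata also reports the p-value directly.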

Final words

All of the work that you do above should be presented in a paper written in full sentences and paragraphs. Please indicate the statistics used, but do not include any Stata output directly.

Have fun. Work hard. Do bring questions to class.

variable name   variable label
id              Employee Code
bdate           Date of Birth
educ            Educational Level (years)
jobcat          Employment Category
salary          Current Salary
salbegin        Beginning Salary
jobtime         Months since Hire
minority        Minority Classification
female          Female dummy 1=female 0=male
clerical01      clerical dummy 1=clerical 0=not clerical
custodial01     custodial dummy 1=custodial 0=not custodial
manager01       manager dummy 1=manager 0=not manager
age             age in years
agesquare       age squared