# Hypothesis Testing

### Purpose

Use this module to run hypothesis tests, compare groups, and fit regression and survival models.

### Key functions

* `ChiSquareTestOfIndependence` — Chi-square test from two categorical columns
* `ChiSquareTestOfIndependenceFromTable` — Chi-square test from a contingency table
* `OneSampleTTest` — One-sample t-test against a hypothesized mean
* `TwoSampleTTestOfIndependence` — Independent samples t-test between groups
* `TwoSampleTTestPaired` — Paired samples t-test for before/after designs
* `OneWayANOVA` — One-way ANOVA across multiple groups
* `TTestOfMeanFromStats` — One-sample t-test from summary statistics
* `TTestOfTwoMeansFromStats` — Two-sample mean comparison from summary statistics
* `TTestOfProportionFromStats` — One-sample proportion test from summary statistics
* `ConductLinearRegressionAnalysis` — Linear regression with optional diagnostics
* `ConductLogisticRegressionAnalysis` — Logistic regression for binary outcomes
* `ConductCoxProportionalHazardRegression` — Cox proportional hazards survival model

### Common use cases

* Categorical association testing (contingency tables)
* Mean comparisons (one-sample, independent, paired)
* Multi-group mean comparisons (ANOVA)
* Working from summary stats instead of raw data
* Regression modeling (linear, logistic)
* Survival analysis (time-to-event)

### Examples

See [Hypothesis Testing Examples](https://analysis-toolbox.gitbook.io/home/readme/broken-reference) for clean, copy-paste snippets.

<details>

<summary>Legacy draft (collapsed)</summary>

{% hint style="info" %}
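Unless an example builds its own data, it assumes you already have a `pandas.DataFrame` named `df` in memory that contains the columns the example references (for instance `score` and `group`).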
{% endhint %}

#### ChiSquareTestOfIndependence

Performs a chi-square test of independence between two categorical variables.

```python
import pandas as pd
from analysistoolbox.hypothesis_testing import ChiSquareTestOfIndependence

data = {
    'Education': ['High School', 'College', 'High School', 'Graduate', 'College'],
    'Employment': ['Employed', 'Unemployed', 'Employed', 'Employed', 'Unemployed']
}
df = pd.DataFrame(data)

ChiSquareTestOfIndependence(
    dataframe=df,
    first_categorical_column='Education',
    second_categorical_column='Employment',
    plot_contingency_table=True
)
```

#### ChiSquareTestOfIndependenceFromTable

Performs a chi-square test from a precomputed contingency table.

```python
import pandas as pd
from analysistoolbox.hypothesis_testing import ChiSquareTestOfIndependenceFromTable

contingency_table = pd.DataFrame(
    {'Online': [100, 150], 'In-Store': [200, 175]},
    index=['Male', 'Female']
)

ChiSquareTestOfIndependenceFromTable(
    contingency_table=contingency_table,
    plot_contingency_table=True
)
```
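
To make the output easier to read, here is a minimal sketch of the Pearson chi-square statistic for the same table, computed by hand with the standard library only. It is not the library's implementation, and it applies no continuity correction, so the value the library reports may differ if it uses one.

```python
import math

# Observed counts from the table above: rows are Male/Female,
# columns are Online/In-Store.
observed = [
    [100, 200],  # Male
    [150, 175],  # Female
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic (assumption: no continuity correction).
chi_square = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)
degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)

# With 1 degree of freedom the upper-tail p-value has a closed form.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.3f}, df = {degrees_of_freedom}, p = {p_value:.4f}")
```

The closed-form p-value only holds for a 2x2 table; larger tables need a chi-square distribution function from a statistics library.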

#### OneSampleTTest

Performs a one-sample t-test against a hypothesized mean.

```python
from analysistoolbox.hypothesis_testing import OneSampleTTest

OneSampleTTest(
    dataframe=df,
    outcome_column='score',
    hypothesized_mean=70,
    alternative_hypothesis='two-sided',
    confidence_interval=0.95
)
```

#### TwoSampleTTestOfIndependence

Performs an independent samples t-test to compare means between two groups.

```python
from analysistoolbox.hypothesis_testing import TwoSampleTTestOfIndependence

TwoSampleTTestOfIndependence(
    dataframe=df,
    outcome_column='score',
    grouping_column='group',
    alternative_hypothesis='two-sided',
    homogeneity_of_variance=True
)
```

#### TwoSampleTTestPaired

Performs a paired samples t-test for before/after comparisons.

```python
from analysistoolbox.hypothesis_testing import TwoSampleTTestPaired

TwoSampleTTestPaired(
    dataframe=df,
    first_outcome_column='pre_score',
    second_outcome_column='post_score',
    alternative_hypothesis='greater'
)
```
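
A paired t-test is a one-sample t-test on the per-row differences. The sketch below shows that computation with the standard library. The scores are hypothetical, and the direction of the difference (post minus pre) is an assumption; check which direction the library uses before interpreting a one-sided result.

```python
import math
import statistics

# Hypothetical before/after scores for six participants.
pre_scores = [60, 65, 70, 58, 62, 68]
post_scores = [66, 67, 75, 60, 61, 74]

# Assumption: difference is post minus pre.
differences = [post - pre for pre, post in zip(pre_scores, post_scores)]

mean_difference = statistics.mean(differences)
sd_difference = statistics.stdev(differences)  # n - 1 denominator
standard_error = sd_difference / math.sqrt(len(differences))

# Test statistic for H0: mean difference = 0.
t_statistic = mean_difference / standard_error
degrees_of_freedom = len(differences) - 1

print(f"t = {t_statistic:.3f}, df = {degrees_of_freedom}")
```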

#### OneWayANOVA

Performs a one-way ANOVA to compare means across multiple groups.

```python
from analysistoolbox.hypothesis_testing import OneWayANOVA

OneWayANOVA(
    dataframe=df,
    outcome_column='performance',
    grouping_column='treatment_group',
    plot_sample_distributions=True
)
```
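
For reference, the F statistic in a one-way ANOVA is the ratio of between-group to within-group mean squares. The sketch below computes it by hand on hypothetical data with the standard library; the group names and values are illustrative and not taken from the library.

```python
import statistics

# Hypothetical performance scores for three treatment groups.
groups = {
    "control": [20, 22, 19, 24],
    "treatment_a": [28, 27, 30, 25],
    "treatment_b": [23, 26, 24, 27],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = statistics.mean(all_values)

# Between-group sum of squares: group size * squared distance of the
# group mean from the grand mean.
ss_between = sum(
    len(values) * (statistics.mean(values) - grand_mean) ** 2
    for values in groups.values()
)

# Within-group sum of squares: squared distance of each value from its
# own group mean.
ss_within = sum(
    (x - statistics.mean(values)) ** 2
    for values in groups.values()
    for x in values
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_statistic = (ss_between / df_between) / (ss_within / df_within)

print(f"F({df_between}, {df_within}) = {f_statistic:.3f}")
```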

#### TTestOfMeanFromStats

Performs a one-sample t-test using summary statistics rather than raw data.

```python
from analysistoolbox.hypothesis_testing import TTestOfMeanFromStats

TTestOfMeanFromStats(
    sample_mean=75,
    sample_size=30,
    sample_standard_deviation=10,
    hypothesized_mean=70,
    alternative_hypothesis='greater'
)
```
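
The test statistic behind the call above can be checked by hand. This is a minimal sketch using the textbook one-sample formula and the same summary statistics, not the library's code.

```python
import math

# Summary statistics from the call above.
sample_mean = 75
sample_size = 30
sample_standard_deviation = 10
hypothesized_mean = 70

# Standard error of the mean, then the t statistic.
standard_error = sample_standard_deviation / math.sqrt(sample_size)
t_statistic = (sample_mean - hypothesized_mean) / standard_error
degrees_of_freedom = sample_size - 1

print(f"t = {t_statistic:.3f}, df = {degrees_of_freedom}")
```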

#### TTestOfTwoMeansFromStats

Compares two means using summary statistics.

```python
from analysistoolbox.hypothesis_testing import TTestOfTwoMeansFromStats

TTestOfTwoMeansFromStats(
    first_sample_mean=75,
    first_sample_size=30,
    first_sample_standard_deviation=10,
    second_sample_mean=70,
    second_sample_size=30,
    second_sample_standard_deviation=12
)
```
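
Whether the library pools the variances or uses Welch's unequal-variance form is not stated here, so the sketch below computes both by hand for the same summary statistics. With equal sample sizes the two t statistics coincide and only the degrees of freedom differ.

```python
import math

# Summary statistics from the call above.
mean_1, n_1, sd_1 = 75, 30, 10
mean_2, n_2, sd_2 = 70, 30, 12

# Pooled-variance (Student) form.
pooled_variance = ((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n_1 + n_2 - 2)
pooled_se = math.sqrt(pooled_variance * (1 / n_1 + 1 / n_2))
pooled_t = (mean_1 - mean_2) / pooled_se
pooled_df = n_1 + n_2 - 2

# Welch (unequal-variance) form with Welch-Satterthwaite degrees of freedom.
var_term_1 = sd_1**2 / n_1
var_term_2 = sd_2**2 / n_2
welch_se = math.sqrt(var_term_1 + var_term_2)
welch_t = (mean_1 - mean_2) / welch_se
welch_df = (var_term_1 + var_term_2) ** 2 / (
    var_term_1**2 / (n_1 - 1) + var_term_2**2 / (n_2 - 1)
)

print(f"pooled: t = {pooled_t:.3f}, df = {pooled_df}")
print(f"Welch:  t = {welch_t:.3f}, df = {welch_df:.2f}")
```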

#### TTestOfProportionFromStats

Tests a sample proportion against a hypothesized value.

```python
from analysistoolbox.hypothesis_testing import TTestOfProportionFromStats

TTestOfProportionFromStats(
    sample_proportion=0.65,
    sample_size=200,
    hypothesized_proportion=0.50,
    alternative_hypothesis='two-sided'
)
```
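
As a rough cross-check, the sketch below computes the usual normal-approximation test statistic for the same inputs with the standard library. This is an assumption about the underlying formula: the function name suggests the library may use a t reference distribution instead, in which case its p-value will differ slightly.

```python
import math
from statistics import NormalDist

# Summary statistics from the call above.
sample_proportion = 0.65
sample_size = 200
hypothesized_proportion = 0.50

# Standard error under the null hypothesis.
standard_error = math.sqrt(
    hypothesized_proportion * (1 - hypothesized_proportion) / sample_size
)
z_statistic = (sample_proportion - hypothesized_proportion) / standard_error

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z_statistic)))

print(f"z = {z_statistic:.3f}, p = {p_value:.2e}")
```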

#### ConductLinearRegressionAnalysis

Performs linear regression analysis with optional diagnostics.

```python
from analysistoolbox.hypothesis_testing import ConductLinearRegressionAnalysis

results = ConductLinearRegressionAnalysis(
    dataframe=df,
    outcome_column='sales',
    list_of_predictor_columns=['advertising', 'price'],
    plot_regression_diagnostic=True
)
```

#### ConductLogisticRegressionAnalysis

Performs logistic regression for binary outcomes.

```python
from analysistoolbox.hypothesis_testing import ConductLogisticRegressionAnalysis

results = ConductLogisticRegressionAnalysis(
    dataframe=df,
    outcome_column='purchased',
    list_of_predictor_columns=['age', 'income'],
    plot_regression_diagnostic=True
)
```

#### ConductCoxProportionalHazardRegression

Performs survival analysis using Cox proportional hazards regression.

```python
from analysistoolbox.hypothesis_testing import ConductCoxProportionalHazardRegression

model = ConductCoxProportionalHazardRegression(
    dataframe=df,
    outcome_column='event',
    duration_column='time',
    list_of_predictor_columns=['age', 'sex', 'treatment'],
    plot_survival_curve=True
)
```

</details>
