The Geography of Ridesharing A Case Study On New York City
Abstract
Despite the popularity of ridesharing, there is limited empirical evidence on how rideshar-
ing activities differ across regions with different levels of accessibility and the implication for
consumers. In this paper, we study the market for rides across New York City neighborhoods.
We construct a novel data set that contains massive API queries on route-specific estimates of
pricing, wait time, and travel time of Uber, Lyft, and the public transit. After linking this data
with actual trip records of taxis, Uber, and Lyft, we document a strong pattern that ridesharing
has a larger market share relative to taxis in neighborhoods with lower accessibility, defined
either in terms of geographic distance to Midtown Manhattan or “economic distance” to job
opportunities. Next, we estimate a discrete-choice model of demand for rides and interpret the
geography of ridesharing through the lens of the model. We find that consumer surplus from
ridesharing varies drastically across geography: passengers that are 5 to 15 miles (resp. more
than 15 miles) from Midtown experience a 60% (resp. 19%) larger consumer surplus relative
to passengers that are within 5 miles from Midtown. Additionally, over half of these gains
come from reduced wait time. We discuss the implications of the distributional results for
policy makers.
∗
The authors would like to thank Erik Brynjolfsson, Babur De los Santos, Dean Eckles, Chiara Farronato, Andrey
Fradkin, Avi Goldfarb, Daniel Miller, Catherine Tucker, Joel Waldfogel, and Patrick Warren for their invaluable comments
and suggestions. We also thank conference and seminar participants at Stanford Workshop at Marketplace Innovation,
Marketing Science Conference, Toulouse Conference on the Digital Economy, AEA, MIT, WashU, Clemson, and
Erasmus University. This study is not sponsored by Uber, Lyft, or any other organizations.
†
Assistant Professor of Economics, Clemson University. Email: [email protected]
‡
Assistant Professor of Marketing, Washington University in St. Louis. Email: [email protected]
§
Assistant Professor of Marketing, Washington University in St. Louis. Email: [email protected]
Ridesharing platforms have gained popularity around the world in recent years and, alongside,
scrutiny from policy makers. Policies and regulations about ridesharing across cities seem to be
policy makers ignore the potential distributional impacts of ridesharing across geographies, a one-
size-fits-all policy will benefit some neighborhoods while being detrimental to others. Studying
the geography of ridesharing and its implication for consumers helps us understand where and
when ridesharing could benefit consumers, therefore creating opportunities for designing nuanced
In this paper, we study how ridesharing affects consumer welfare across neighborhoods in
a metropolitan setting. To achieve this goal, we aim to answer two research questions. First,
how do ridesharing activities differ across neighborhoods with different levels of accessibility?
Second, how might this geography translate into consumer welfare in different neighborhoods?
To answer these questions, we construct a comprehensive data set that describes the market of
rides in New York City (NYC hereafter). Specifically, we gather a massive set of API queries on
route-specific estimates of pricing, wait time, and travel time of Uber, Lyft, as well as the NYC
public transit. We link this data with actual trip records of taxis, Uber, and Lyft published by NYC
Taxi and Limousine Commission (TLC hereafter). We further enrich the data by conducting a field
collection of around 70,000 historical trip records from a sample of Uber and Lyft drivers. The
resulting data set provides a comprehensive picture of the market that allows for analysis at the
We first summarize and visualize geographic patterns in ridesharing and taxi rides. We find
where accessibility is defined either in terms of geographic distance to the city center or “economic
distance”. The geographic distance is calculated as the distance between the pickup location and
Manhattan Midtown, and economic distance is measured by the neighborhood’s total number of
jobs accessible within one hour’s commute by public transit (Kaufman et al. (2014)). However,
the rate at which the number of pickups decreases with respect to distance, which we will refer to
as the trip elasticity of distance henceforth, is smaller than that of taxis. Consequently, the market
share of ridesharing relative to that of taxi is higher in low accessibility neighborhoods under both
definitions of distance. In addition, the geography of ridesharing is robust to using pickups
by green boro taxis as a benchmark. Green taxis follow the same pricing rule as yellow medallion
taxis but are allowed to pick up passengers only outside the Manhattan Core,1 i.e., in predominantly
low accessibility neighborhoods. We find that the market share of ridesharing remains larger than that
The geographical distribution of market shares is an equilibrium outcome, driven by both con-
sumer preference toward various transportation modes and drivers’ decisions on when and where
to provide the service. We provide empirical evidence that is consistent with a high degree of
matching friction between taxi drivers and passengers, which likely gives rise to the low presence
of taxi pickups in low accessibility neighborhoods. First, we show that low accessibility neighbor-
hoods more frequently exhibit excess demand (the incidence of more people needing a ride than
the number of available drivers in the area) instead of insufficient demand for taxis. Second, the
trip elasticity of distance is larger for yellow taxis than for green taxis in low accessibility regions,
1
Manhattan Core is defined as the part of Manhattan below the north edge of Central Park.
2
This finding is consistent with Mitchell L. Ross’ argument: “(green taxis) basically cluster at transit and retail
hubs because they are more likely to find passengers there than if they cruise the streets they are authorized to cruise.”
https://www.nytimes.com/2018/09/03/nyregion/green-cabs-yellow-uber.html
of the Manhattan Core. This difference suggests that the matching friction in low accessibility
neighborhoods relative to the Manhattan Core is even higher for yellow taxis than for green taxis.
The matching friction present among taxis may be alleviated by ridesharing through technology-
enabled matching, incentives created by dynamic pricing, or both. Ridesharing platforms seem to
reduce this information gap in low accessibility neighborhoods mainly through the use of advanced
matching algorithms: According to our data, surge pricing is rare in low accessibility neighbor-
The second part of the paper aims at translating the observed geography to consumer welfare,
leading us one step closer to answering relevant policy questions. We model the consumer’s decision
as a discrete choice problem among competing transportation modes for a given route
map the observed rides into consumer preference toward price, wait time, as well as other observed
In the demand estimation, endogeneity arises when the price of a region is correlated with
the unobserved demand condition in the same region, conditional on service type, route, and time
fixed effects. To obtain a consistent estimate of price elasticity, we use an instrumental variable
(IV) that affects the market share of a transportation mode on a given route only through its impact
on price, but it is otherwise uncorrelated with local demand shocks. We construct such an IV using
a design feature of the ridesharing apps that was present during our sample period: passengers
see and commit to a surge multiplier on the Uber and Lyft apps prior to entering their destination
on their phones.3 Exploiting this feature, we use the surge prices of trips into the focal zone to
3
After our sample period, this old version of Uber was updated to upfront pricing, and now riders need to input
information on where the consumer is going, they cannot adjust their surge multiplier according to
the consumer’s destination, which creates the uncorrelatedness of surge prices at the origin and the
underlying demand condition at the destination. On the other hand, surge pricing at the origin will
affect the price in the focal zone through its effect on the number of available drivers in the focal
Using this instrument, we identify heterogeneous consumer preference toward price and wait
time. The parameter estimates show that consumers prefer lower price and less wait time. Also, the
elasticity estimates across regions and different times of the day are consistent with the intuition
that people are less price-sensitive and care more about short wait time during rush hours and in
destinations with more offices. The fact that consumers prefer less wait time in certain locations
during certain times of the day suggests that ridesharing platforms may increase consumer surplus
Through the lens of the estimated demand model, we translate the geography of ridesharing
to the geography of consumer surplus. Conceptually, we adopt the idea of compensating variation
and estimate how much consumers should be compensated if Uber and Lyft were to be removed
from the market in order to maintain the same level of utility for consumers in different regions.
We find that consumer surplus from ridesharing varies drastically across geography: passengers
that are 5 to 15 miles (resp. more than 15 miles) from Midtown experience a 60% (resp. 19%)
larger consumer surplus relative to passengers that are within 5 miles from Midtown. Additionally,
over half of these gains come from reduced wait time for getting a ride.
The distributional results on the relative market share and the uneven consumer gains from
their destination to get a fixed price. Lyft also went through a similar design change.
gest that the impact of ridesharing can be highly uneven across regions with different levels of
accessibility. It is important to note that metropolitan areas in the United States exhibit significant
geographical disparity in transit access (Tomer et al., 2011; Owen and Murphy, 2018), even for
New York City, which has the nation’s best public transit system. Its neighborhoods at the 90th
percentile in accessibility have access to 18 times more jobs via public transit within an hour than
the neighborhoods at the 10th percentile in accessibility (using data provided by Kaufman et al.
(2014)). This geographical disparity is unlikely to change in the near future, because it depends
on long-run factors such as city planning and re-designing of the public transit network. Taxis are
another important component of metropolitan transit systems, yet we show that they are highly
concentrated in the dense areas, even for taxis that are specifically designed to pick up passengers
in less accessible regions, due to the lack of technology that provides real-time information about
demand.
Based on our findings, urban and transportation planners should recognize the important role
incentives for ridesharing in these neighborhoods, holding the social costs constant. A more inclusive
transit network is important because mobility and proximity to markets could affect individuals’
labor market outcomes and overall well-being, especially those of disadvantaged demographic
groups and individuals without car ownership (e.g., Ihlanfeldt and Sjoquist (1990), Ihlanfeldt and
Sjoquist (1991), Holzer et al. (1994), Hering and Poncet (2010), among other papers in the urban
economics literature).
Our paper contributes to three strands of literature. First, it contributes to the growing literature on
estimating the impact of ridesharing technology on consumers. The closest paper to ours is Cohen
et al. (2016), which uses internal data from Uber and exploits a series of regression discontinuity
designs (RDD) around the surge multiplier cutoffs to estimate a consumer surplus of $1.6 per
dollar spent on Uber. Our paper differs in two ways. First, our goal is to understand the distributional
gain in consumer surplus across regions with different levels of accessibility. Our welfare
estimate may be biased due to data limitations, but we expect the qualitative result on the
distributional effect of ridesharing to hold. Second, as the authors acknowledge in the paper, the RDD yields
short-run consumer surplus from Uber, whereas we estimate consumer welfare gain in a long-run
steady state. To do so, we allow for consumer substitution among transportation alternatives, and
study consumers’ substitution pattern towards taxi and public transit when ridesharing platforms
Researchers have also examined other important welfare topics associated with ridesharing
and the sharing economy in general, such as flexible work arrangements (Chen et al., 2017; Hall
and Krueger, 2016; Hall et al., 2017), routing efficiency and driver moral hazard (Cramer and
Krueger, 2016; Liu et al., 2018; Wang et al., 2019), reduced matching friction (Buchholz, 2015;
Frechette et al., 2016; Zhang et al., 2019), reduced drunk driving (Greenwood and Wattal, 2017),
local consumption patterns (Zhang and Li, 2017), traffic congestion (Li et al., 2016), and the use of
public transit (Hall et al., 2018; Babar and Burtch, 2017). We add to this literature by being among
Second, this paper contributes to the research on geographical impact in transportation in U.S.
areas, which finds mobility inequality across U.S. metropolitan areas as well as the correlation
between neighborhood accessibility and measures such as unemployment and economic oppor-
tunities (Tomer et al., 2011; Kaufman et al., 2014; Owen and Murphy, 2018). In addition, these
studies agree that only marginal improvements in public transit systems can be made in light of
deep budget cuts at all levels of government. Building on the measures and insights provided by
these studies, we offer empirical evidence that ridesharing can fit into the economic and social
Lastly, researchers have studied the geographical impact of information technology in other
contexts, such as international trade (e.g., Blum and Goldfarb (2006); Hortaçsu
et al. (2009)), crowdfunding (e.g., Agrawal et al. (2015)), and the accommodation market (e.g.,
2 Data
Our data set focuses on the five boroughs of NYC, namely the Bronx, Brooklyn, Manhattan, Staten
Island, and Queens, from June 1, 2016 to August 31, 2016. The city is divided by TLC into 263
taxi zones that vary in size.4 For example, an average taxi zone in Manhattan is approximately
equivalent to an area of 6 by 6 avenue blocks. Throughout this paper, we treat these taxi zones as
our basic geographical units of analysis, given the nature of the data. Our data set contains data
(1) Real-time Uber and Lyft dynamic pricing and wait time
4
Refer to the following link for a shape file of taxi zones provided by TLC: http://www.nyc.gov/html/
tlc/html/about/trip_record_data.shtml.
at the geographical centroids of all 263 taxi zones at the minute level, for all service types on
Uber and Lyft in our sample period, namely UberX, UberXL, UberBlack, UberSUV, UberPOOL,
Lyft, LyftLine, and LyftPlus.5 We also queried trip distance, trip duration, and trip cost for a
route between a given pickup zone and a randomly chosen drop-off zone, in any given minute.
Therefore, for each unique route, i.e., a pickup-drop-off pair, we have information on trip distance,
duration, and cost approximately every 4 hours.6 We use this route-level information mainly in
NYC TLC publishes taxi, Uber, and Lyft trip records. Taxi trip records (summarized in Ap-
pendix Table 4) contain detailed trip information, such as pickup and drop-off date and time, the
GPS coordinates of the pickup and drop-off locations, number of passengers, trip fares, etc. Uber
and Lyft trips are identified from other For Hire Vehicles (FHV hereafter) trip records using the
dispatching base numbers.7 Unlike the detailed taxi trip records, the Uber and Lyft trip
data contain only the pickup date, time, and location, the last in the form of a taxi zone. Given this
restriction, we map the GPS coordinates of taxi trips into their corresponding taxi zones so that taxi and
Uber and Lyft trip records provided by TLC lack information on drop-off locations, drop-off
5
UberX and Lyft are the regular and most used service types on their respective platforms. UberXL and LyftPlus
are service types with larger cars, which usually seat up to six passengers. UberBlack and UberSUV are
the luxury options on Uber. UberPOOL and LyftLine are the carpool options on Uber and Lyft, respectively.
6
More accurately, because each pickup zone is paired with one randomly chosen drop-off zone per minute, it takes
on average about 263 minutes to query the same route again.
7
Refer to http://www.nyc.gov/html/tlc/downloads/pdf/find_a_ride.pdf. See
Appendix C for how the base numbers are identified.
level. To address this problem, we conducted field data collections to acquire 70,277 historical
trip records from 443 Uber and Lyft drivers in NYC. The purpose is to obtain a representative sample
from which to estimate the empirical distribution of drop-off times and locations. We then use these
probabilities to impute the service-type-route-time trip counts (the detailed procedure is in Appendix
D). The representativeness of the field data is supported by a comparison between their pickup
We queried Google Maps API to obtain total travel time via public transit (subway and bus).
This data is collected at the level of 263 × 263 routes, by night/day and by weekday/weekend, where night is
defined as 12:00am to 6:59am. Specifically, the total travel time for a given route via public transit
includes (1) walking time from the zone centroid to the subway station or bus stop that Google
Maps assigns, (2) subway or bus wait time, (3) subway or bus travel time, and (4) walking time
out of the subway station or from the bus stop to the centroid of the destination zone. Note that
the total public transit time is the best route suggested by Google Maps, which may include solely
subway, solely bus, or a combination of both, depending on specific routes and times.
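
As an illustration of how the four components above add up to a door-to-door time, the following is a minimal sketch. The class name, field names, and the numbers are hypothetical and are not taken from the data; the actual values come from the Google Maps API responses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitTrip:
    """Door-to-door components of one public transit route, in minutes.

    Field names are illustrative assumptions, not the authors' variable names.
    """
    walk_to_stop: float    # (1) zone centroid to the assigned station or stop
    wait: float            # (2) subway or bus wait time
    in_vehicle: float      # (3) subway or bus travel time
    walk_from_stop: float  # (4) station or stop to the destination centroid

    def total_minutes(self) -> float:
        # Total public transit time is the sum of the four components.
        return self.walk_to_stop + self.wait + self.in_vehicle + self.walk_from_stop


# Hypothetical route: 6 + 4 + 20 + 5 minutes.
trip = TransitTrip(walk_to_stop=6.0, wait=4.0, in_vehicle=20.0, walk_from_stop=5.0)
total = trip.total_minutes()
```

In this made-up example the four components sum to 35 minutes.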
In this section, we summarize and visualize model-free evidence using our data set. The average
share of trips that experience a price surge is 7% to 8% for Uber non-luxury services and around
19% for Lyft services. The average wait time for each platform’s flagship service, namely UberX
and Lyft, is around 7 minutes. To focus on our research question, we visualize how price and wait
[Figure: four panels, one per borough (Manhattan, Queens, Brooklyn, Bronx). Horizontal axis: Hour (0 to 24). Legend: Surge Multiple, from 1.0 to 3.0.]
Note: This graph plots how the UberX minute-level surge multiple and wait time, averaged over all zones of a given NYC
borough, vary across time of the day on Monday, June 6th, 2016. Surge multiples are shown in varying shades as illustrated
by the legend, and wait times are measured by the bar length.
time vary across geography, and leave the reporting of other summary statistics to Table 3 in the
Appendix.
In Figure 1, we plot the average surge multipliers and wait times across different regions at
different times of the day. Note that conceptually, the same region at different times of the day can
be regarded as different regions, because they could be subject to different underlying market
conditions. Two features of the data emerge. First, prices and wait times change
rapidly across geography and time of the day. Second, surge multipliers and wait times are usually
greater in Manhattan than in the outer boroughs, and are greater in rush hours than in non-rush hours.
These variations suggest that the prevalence of ridesharing and its implication for consumers could
Next, we plot the number of taxi trips and the share of ridesharing trips out of all trips (taxi plus
ridesharing) in Figure 2. The graph shows that taxi pickups are concentrated in high
accessibility regions, i.e., the Manhattan Core, and that their number decreases as the distance to the Manhattan
Core increases. Meanwhile, the share of ridesharing pickups relative to taxi is larger in regions
To see this pattern more clearly, in Figure 3a, we plot the relative share of ridesharing to yellow
taxis against different neighborhoods that differ in the distance to Midtown Manhattan measured
in miles. Consistent with Figure 2, we see that the market share of ridesharing is greater in neigh-
borhoods that are farther away from Midtown Manhattan, suggesting that ridesharing supplements
across space, but also across time: in Figure 11 in the appendix, we find evidence that passengers
substitute toward ridesharing more during the hours when taxis become less available due to shift
change.
In Figure 3b, we perform the same analysis by comparing ridesharing with pickups by green
taxis, which were introduced by TLC in August 2013 to increase taxi coverage in neighborhoods
outside Manhattan. Note that green boro taxis use the pricing structure of yellow medallion taxis,
but they have limited pickup areas.8 We see two interesting patterns. First, although green taxis
are specifically designed to serve outer boroughs, ridesharing still has a larger market share than
green taxis in neighborhoods that are farther away from Midtown Manhattan. Next, comparing
across Figure 3a and Figure 3b, we see that the second graph has a less steep slope, which means
that green taxis have a lower trip elasticity of distance compared to yellow taxis. This difference
reflects differences in drivers’ cruising strategies, because the two types of taxis essentially face the same
demand due to their identical pricing rules outside of the Manhattan Core. The difference in cruising
strategies is consistent with drivers responding optimally to different matching frictions: Yellow
taxi drivers prefer to cruise towards the Manhattan Core likely because their matching friction of
finding passengers is higher in outer boroughs, relative to that of green taxi drivers, possibly due
Subsequently, we show that the geography of ridesharing activities is robust to the use of “eco-
nomic distance”. Specifically, we follow the literature to measure the accessibility of a neighbor-
hood by the amount of employment and other economic activities it is connected to. We adopt the
8
Specifically, they can only accept street hails anywhere in NYC except south of West 110th Street and East 96th
Street in Manhattan, and they can take pre-arranged trips from JFK and LGA airports. Additionally, green taxis may
drop passengers off anywhere in NYC. By the end of 2015, 7,676 green boro taxi licenses were active in the market.
hour of travel by public transit from a given neighborhood of NYC’s 177 neighborhoods. In con-
structing the measure, the authors drew on Census data and Google Maps API. To fit the measure
into our analyses that use taxi zones as the unit of analysis, we built a crosswalk between NYC
neighborhoods and NYC taxi zones using geographic software that returns the portion of a
given neighborhood that overlaps with a taxi zone in terms of geographical area. We then computed
the weighted sum of jobs accessible across all neighborhoods that (partially) overlap with a given
taxi zone, using the above-mentioned portions as weights. In addition, we chose to use per-capita
jobs accessible as our preferred measure of accessibility, which is computed by dividing the total
number of jobs accessible by the population of a given zone. The distribution of this accessibility
measure exhibits a clear long tail — the neighborhood at the 90th percentile has access to 18 times
more jobs than the neighborhood at the 10th percentile. In addition, this accessibility measure neg-
atively correlates with the geographical distance measure (with a correlation coefficient at -0.7454,
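
The crosswalk and the per-capita measure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the dictionary layout, and all numbers are hypothetical, and the overlap portions are used as raw weights exactly as the text describes, without any renormalization.

```python
from collections import defaultdict


def zone_accessibility(jobs_by_nbhd, overlap_share, zone_population):
    """Per-capita jobs accessible for each taxi zone (illustrative sketch).

    jobs_by_nbhd:    {neighborhood: jobs reachable within one hour by transit}
    overlap_share:   {(neighborhood, zone): portion of the neighborhood's area
                      that overlaps with the zone}
    zone_population: {zone: population}
    """
    weighted_jobs = defaultdict(float)
    for (nbhd, zone), share in overlap_share.items():
        # Weighted sum over all neighborhoods that (partially) overlap the zone.
        weighted_jobs[zone] += share * jobs_by_nbhd[nbhd]
    # Divide by zone population to obtain the per-capita measure.
    return {zone: jobs / zone_population[zone] for zone, jobs in weighted_jobs.items()}


# Hypothetical inputs: neighborhood A is split evenly across zones Z1 and Z2,
# and neighborhood B lies entirely inside Z2.
jobs = {"A": 1000.0, "B": 500.0}
overlap = {("A", "Z1"): 0.5, ("A", "Z2"): 0.5, ("B", "Z2"): 1.0}
population = {"Z1": 100.0, "Z2": 250.0}
access = zone_accessibility(jobs, overlap, population)
```

With these made-up inputs, Z1 receives 500 weighted jobs over 100 residents, and Z2 receives 1,000 weighted jobs over 250 residents.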
Using this measure of accessibility, we visualize the correlation between the relative share
of ridesharing with respect to taxis and the level of accessibility of a region. Figure 3c shows a
clear negative correlation between accessibility and the prevalence of ridesharing relative to yellow
taxis. In Figure 3d, we observe a similar pattern for the relative share of ridesharing to green taxis.
Similar to the evidence using geographical distance, we see that ridesharing has a larger market
share than taxis in less accessible neighborhoods. Additionally, comparing across Figure 3c and
Figure 3d, we see again that green taxi pickups are less elastic with respect to economic distance when it
comes to picking up passengers in low accessibility regions. This suggests that the difference in the
trip elasticity with respect to economic distance is due to differences in drivers’ cruising strategies,
One alternative hypothesis is that the smaller share of taxis in less accessible neighborhoods is
driven by the lack of demand for rides. However, this is not consistent with the evidence that there
is more unfulfilled demand for taxis in less accessible neighborhoods. We use realized pickups that
we observe in the data as a proxy for demand of rides. Specifically, we use July 2016 data and count
the total number of ridesharing and taxi pickups in a given 15-minute period in a given zone. We
use the number of taxi drop-offs in the previous 15-minute period as a proxy for the taxi supply
in the focal period (call this X), and the number of pickups by both taxi and ridesharing in the
[Figure panels: (a) Yellow and Green Taxis; (b) Yellow and Green Taxis]
focal period as a proxy for demand for rides (call this Y). We define a dummy variable “enough
demand” that equals 1 if Y ≥ X and 0 otherwise. Figure 4a plots each zone’s frequency of
“enough demand”, averaged across all 15-minute intervals of the month. We see that as we move
away from Midtown, the frequency of “enough demand” increases. Similarly, Figure 4b shows
a higher rate of enough demand in less accessible regions. The existence of excess demand,
combined with the finding that yellow and green taxis have different trip elasticities of distance in
the outer boroughs, suggests that low demand for taxis is unlikely to be the sole reason for the relatively
small share of taxi pickups in low accessibility regions. Admittedly, taxi and ridesharing
services are different products, and therefore the observed differences could in principle be driven
for these attributes and reach a similar conclusion in our counterfactual analyses.
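
The “enough demand” indicator defined above can be sketched for a single zone as follows. This is a minimal illustration, assuming two aligned lists of 15-minute counts; the function name and the counts are hypothetical, not taken from the TLC data.

```python
def enough_demand_frequency(taxi_dropoffs, all_pickups):
    """Share of 15-minute intervals in which Y >= X for one zone.

    taxi_dropoffs[t]: taxi drop-offs in interval t
    all_pickups[t]:   taxi plus ridesharing pickups in interval t
    X is the drop-off count of the previous interval (supply proxy) and
    Y is the pickup count of the focal interval (demand proxy).
    """
    if len(taxi_dropoffs) != len(all_pickups) or len(all_pickups) < 2:
        raise ValueError("need two aligned series with at least two intervals")
    flags = []
    for t in range(1, len(all_pickups)):
        x = taxi_dropoffs[t - 1]  # supply proxy from the previous interval
        y = all_pickups[t]        # demand proxy in the focal interval
        flags.append(1 if y >= x else 0)
    return sum(flags) / len(flags)


# Hypothetical counts for four consecutive intervals in one zone.
dropoffs = [5, 3, 8, 2]
pickups = [4, 6, 2, 9]
freq = enough_demand_frequency(dropoffs, pickups)
```

In this made-up series the indicator equals 1 in two of the three usable intervals. The first interval is dropped because it has no preceding drop-off count.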
Lastly, we study the role of technology-aided matching and surge pricing in explaining the
geography of ridesharing. Using zone-minute level dynamic pricing on Uber and Lyft from the
data on ridesharing APIs, we plot the relative market share of ridesharing and surge multipliers
of the taxi zones across regions with different levels of accessibility in Figure 5a. In the graph,
triangles, circles, and squares represent regions that are within a short, medium, and long distance
to Midtown, respectively (here “short”, “medium”, and “long” are defined by the terciles of the
distance to Midtown). We find that the average share of ridesharing is positively correlated with
the region’s distance to Midtown, but it is negatively correlated with the surge multiplier both within
and across distance brackets. Additionally, if we look at long-distance trips (squares), the
relative share of ridesharing is more than 90% even in regions where the average surge multiplier
is close to 1. In Figure 5b, we repeat the analysis on the standard deviation of surge multipliers
instead of the average. We similarly find that a lower variation in the surge multiplier predicts a higher
share of ridesharing within a distance bracket. These patterns suggest that, even though dynamic
pricing may play a big role in matching demand and supply in highly accessible regions (Hall et al.
(2015) and Castillo et al. (2017)), it cannot be the main factor that leads to a larger market share
of ridesharing in less accessible regions. In other words, technology-aided matching seems to play
The descriptive results in Section 3 document geographic patterns of ridesharing rides relative to taxi
rides across regions with different accessibility, measured in terms of the distance to Manhattan
and in terms of economic distance to jobs and other economic opportunities. While these results are
novel and interesting in their own right, one could argue that they are steps away from informing
policy making because it is unclear how to interpret the relative market shares.
and demand for taxis and ridesharing. The purpose of this exercise is to translate the observed
geographic difference in the market share of ridesharing to consumer surplus through a counterfactual
analysis where ridesharing services are hypothetically removed. Additionally, the exercise
sheds light on the relative strength of different mechanisms behind changes in consumer surplus, such
as changes in price or wait time. We should note that these benefits come at the cost of making
more assumptions on the choice model, the data compatibility, and the validity of the instrument
for demand estimation, many of which can only be indirectly tested. Therefore, we view this
Market conditions constantly change in the market of rides: factors such as prices, wait time, and
conditions of competing alternatives vary at a high frequency across time within a location. Con-
sumers make transit decisions by evaluating these relevant time-varying characteristics. Therefore,
The product in this market is a ride, which is a transportation service that moves a consumer i
from a starting point j to a destination k in time t. We restrict attention to taxis, various ridesharing
service types, as well as the public transit. For a given route jkt, the consumer’s choice of trans-
portation mode depends on price, travel time, wait time, other observed and unobserved service-
specific characteristics, as well as the consumer’s own idiosyncratic taste, which is specified as
$$U_{sijkt} = -\alpha_{jkt} P_{sjkt} + \beta_{jkt}\,(Total_{ojkt} - WT_{sjt} - TT_{sjkt}) + X_{sjkt}'\Theta + \xi_{sjkt} + \phi_{jkt} + \epsilon_{sijkt} \qquad (1)$$
where $s$ denotes one of the “inside options”, namely taxis and all service types on Uber and Lyft,
while $o$ stands for the outside option, the public transit. We take the public transit as the outside
option because demand for rides is a “derived” demand: once a trip need (to go from Point A to
Point B at Time T) is identified, the agent has to choose one of the transportation options to make
the trip. We let $Total_{ojkt}$ denote the total travel time of the public transit, or the door-to-door time,
which is the sum of the walking time to the subway station or bus stop, wait time, time en route,
and walking time from the station/stop to the destination. Let $WT_{sjt}$ denote the wait time of $s$ at
location $j$ and time $t$, and let $TT_{sjkt}$ denote the time en route of $s$. Therefore, $Total_{ojkt} - WT_{sjt} - TT_{sjkt}$
represents the amount of time $s$ can save compared to the outside option, and $\beta_{jkt}$ is the marginal
utility of time saved. For example, for a route that takes 35 minutes in total via the public transit,
if the UberX wait time is 5 minutes and the travel time is 20 minutes (thus the total time is 25 minutes),
then the consumer’s utility of UberX is $\beta_{jkt} \times 10$ minutes higher than that of the baseline.
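
The worked example above can be reproduced with a short sketch of the time-saved term and the observed part of Equation (1). The function names and the values of alpha, beta, and the price are hypothetical placeholders chosen only for illustration; they are not estimates from the paper.

```python
def time_saved(total_transit_min, wait_min, travel_min):
    """Minutes saved by an inside option relative to public transit."""
    return total_transit_min - wait_min - travel_min


def observed_utility(price, saved_min, alpha, beta, other=0.0):
    """Observed part of Equation (1): -alpha * P + beta * (time saved) + other.

    `other` stands in for the remaining terms (X'Theta + xi + phi); it is set
    to zero here because the example isolates the price and time terms.
    """
    return -alpha * price + beta * saved_min + other


# Worked example from the text: 35-minute transit route, 5-minute UberX wait,
# 20-minute UberX travel time.
saved = time_saved(35.0, 5.0, 20.0)

# Hypothetical preference parameters and price, for illustration only.
u = observed_utility(price=12.0, saved_min=saved, alpha=0.1, beta=0.05)
```

Here `saved` equals 10 minutes, matching the text, so the time term contributes beta times 10 to utility regardless of the placeholder price.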
We let $P_{sjkt}$ denote the price of the trip, and $-\alpha_{jkt}$ is the marginal utility of price. Let $X_{sjkt}$
ciated vector of parameters. The other variables are defined as follows: $\xi_{sjkt}$ is the unobserved
(to the researcher) route-service-specific utility component; $\phi_{jkt}$ is the utility difference between
all car options and the public transit, and it measures the comfort of not having to walk to the
subway station or bus stop and of being able to sit for the entire trip, which is not always possible
on public transit; $\epsilon_{sijkt}$ is the consumer’s idiosyncratic error term. We further
assume that the travel time $TT_{sjkt}$ is the same for all inside service types, since they travel on the
same route at the same time and are thus subject to the same traffic and road conditions. As
a result, the subscript $s$ in $TT_{sjkt}$ is dropped from here on. The purpose of imposing a common
travel duration will become clear later. Finally, we normalize the utilities of all inside options
against the utility of the outside option, i.e., the mean utility of the public transit is set to 0, so that
$U_{oijkt} = \epsilon_{oijkt}$.9
Usually, the preference parameters of discrete choice models are obtained by estimating a mapping
from the characteristics space to realized market shares of options (e.g. Berry (1994), Berry et al.
(1995), Nevo (2000, 2001), and Petrin (2002)). However, market shares of service types cannot be
computed because we do not observe route-level (or, jkt-specific) public transit ridership. With a
market jkt defined so finely, we do not have a reasonable benchmark of market size to impute the
market shares like authors who study other settings (e.g. Berry et al. (1995) use population size of
a geographical area to proxy the market for automobiles). Nonetheless, Chevalier and Goolsbee
9
Liu et al. (2017) study the welfare impact of DiDi in China. They assume that consumers make choices at the
origin level, as opposed to the origin-destination pair level in our model.
We follow this approach to avoid the problem of unavailable market shares, while at the same
time flexibly allowing for preference heterogeneity across markets. Specifically, we assume a Type
1 extreme value distribution of the error term, which amounts to a standard logit at each jkt cell.
Then the market share of s in the market jkt, though not observable, has to satisfy the following:
$$MarketShare_{sjkt} = \frac{\exp(\delta_{sjkt})}{1 + \sum_{n=1}^{S} \exp(\delta_{njkt})}, \tag{2}$$
where $\delta_{sjkt} = -\alpha_{jkt} P_{sjkt} + \beta_{jkt}(Total_{ojkt} - WT_{sjt} - TT_{jkt}) + X_{sjkt}'\Theta + \phi_{jkt} + \xi_{sjkt}$ is the mean
conditional utility of service s at jkt. To ease illustration, let taxi cabs be denoted as c. Then taking
logs of the predicted odds ratios of taxis’ share and the share of any ridesharing service type yields
$$\log\left(\frac{D_{cjkt}}{D_{sjkt}}\right) = \alpha_{jkt}(P_{sjkt} - P_{cjkt}) + \beta_{jkt}(WT_{sjt} - WT_{cjt}) \tag{3}$$
where $D_{cjkt}$ and $D_{sjkt}$ are trip counts of taxi and service type s, respectively. Note that the terms
$Total_{ojkt}$ and $TT_{jkt}$ cancel out in the algebra, because they are common across service types
within the same route. Equation 3 indicates that the relative market shares of taxi and service type
s at the route-time level should be affected by their differences in price, wait time, as well as other
We assume that consumer preference for price and wait time is the same within a route jkt. We
make this assumption because we have defined routes at a very granular level, and consumers with
the same pickup and dropoff location within a narrow time window likely have similar preferences.
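To see why Equation 3 does not require the unobserved public transit ridership, the following sketch checks numerically that the log ratio of two logit shares equals the difference of their mean utilities. The mean utility values and the helper name `logit_shares` are hypothetical.

```python
import math


def logit_shares(deltas):
    """Shares of inside options when the outside option's mean utility is 0."""
    denom = 1.0 + sum(math.exp(d) for d in deltas.values())
    return {s: math.exp(d) / denom for s, d in deltas.items()}


# Hypothetical mean utilities for taxi ("c") and one ridesharing service ("s").
deltas = {"c": 0.4, "s": -0.3}
shares = logit_shares(deltas)
log_odds = math.log(shares["c"] / shares["s"])

# The common denominator cancels, so the log odds equal delta_c - delta_s.
assert abs(log_odds - (deltas["c"] - deltas["s"])) < 1e-12

# Adding a third inside option changes the shares but not the c-versus-s log odds.
shares3 = logit_shares({**deltas, "other": 1.0})
log_odds3 = math.log(shares3["c"] / shares3["s"])
```

Because the denominator of Equation 2 drops out, the ratio of observed trip counts can stand in for the ratio of shares, which is what makes the estimation feasible without a market-size benchmark.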
$$\alpha_{jkt} = Y_{jkt}'\Theta_A + \alpha \tag{4}$$
$$\beta_{jkt} = Y_{jkt}'\Theta_B + \beta, \tag{5}$$
where Yjkt is a row vector of dummy variables that contain various combinations of pickup areas
and time blocks, and ΘA and ΘB are the vectors of the corresponding coefficients for price and
time, respectively. Specifically, the areas include Lower Manhattan (dummy), Midtown Manhattan
(dummy), Upper East and West Manhattan (dummy), and Non-Manhattan Core (dummy). The
time blocks include morning rush (weekdays 7 a.m. - 9 a.m.), evening rush (weekdays 4 p.m. - 7
p.m.), weekday day time (weekdays 10 a.m. - 3 p.m.), weekday night (weekdays 8 p.m. - 11 p.m.),
weekday late night (weekdays midnight - 6 a.m.), weekend day time (weekends 5 a.m. - 5 p.m.),
weekend night (Friday 8 p.m. - 11 p.m. and weekends 6 p.m. - 11 p.m.), and weekend late night
(weekends midnight - 4 a.m.).
Note that consumers do not know the precise taxi wait time at a given location at a given time,
unlike in the case of ridesharing. We assume that consumers act according to their expectations of
the taxi wait time $WT_{cjt}$ when choosing transportation modes, and that $-\beta_{jkt} WT_{cjt}$ can be absorbed by the
many location and time fixed effects in the regression. The parameter for consumers’ preference
towards wait time, $\beta_{jkt}$, is identified by the variation in the wait time of Uber and Lyft, namely
$WT_{sjt}$.
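The time blocks that enter $Y_{jkt}$ can be sketched as a simple classifier. This is an illustration only: the function name `time_block` is hypothetical, and the hour ranges are read as inclusive clock hours (so that "7 a.m. - 9 a.m." covers hours 7, 8, and 9), which is an assumption under which the blocks listed in the text and in the note to Table 1 cover all 24 hours.

```python
from datetime import datetime


def time_block(ts):
    """Map a pickup timestamp to one of the eight time blocks.

    Assumption: hour ranges are inclusive clock hours.
    """
    hour, weekday = ts.hour, ts.weekday()  # Monday == 0, Sunday == 6
    if weekday >= 5:  # Saturday or Sunday
        if hour <= 4:
            return "weekend late night"
        if hour <= 17:
            return "weekend day time"
        return "weekend night"
    if weekday == 4 and hour >= 20:  # Friday 8 p.m. - 11 p.m.
        return "weekend night"
    if hour <= 6:
        return "weekday late night"
    if hour <= 9:
        return "morning rush"
    if hour <= 15:
        return "weekday day time"
    if hour <= 19:
        return "evening rush"
    return "weekday night"


print(time_block(datetime(2017, 4, 3, 8, 15)))  # a Monday morning
```

Interacting the resulting block label with the pickup area dummy gives one element of $Y_{jkt}$ for each area-by-block combination.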
We choose to use the simple logit framework primarily due to its tractability in generating
analytical solutions that allow for identification in the absence of market shares. The simple logit
of two options need to stay the same regardless of the availability of an alternative. In principle,
we could estimate a model that allows for more flexible substitution pattern, such as a nested logit
model. However, this would require an instrumental variable that shifts demand in one nest but not
in the other, which is difficult to find. We will discuss the validity of our current instrument under
Additionally, we need to make an assumption about data compatibility. Specifically, Uber and
Lyft trip records published by TLC lack drop-off information and trip service types, which prevents
us from getting the trip counts of various Uber and Lyft service types at the jkt route, or Dsjkt . To
address this issue, we use the surveyed 70,277 Uber and Lyft historical trips in the same time period
to construct proxies. This sample consists of a random subsample and a convenience subsample.
In Appendix D, we show that the convenience subsample resembles the random subsample closely,
so we can rely on the randomness of the full sample to infer Dsjkt . To infer Dsjkt , we first estimate
a probit function to predict the probability of a certain trip that takes place in a certain jkt cell,
using the full sample. We then impute Dsjkt by distributing the total Uber/Lyft pickups at a jt to
various service types and destinations, based on these empirical probabilities (detailed in Appendix
E).
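The allocation step can be illustrated with a minimal sketch of proportional distribution. The actual procedure is the one detailed in Appendix E; the function name, the cell labels, and the probabilities below are hypothetical.

```python
def impute_trip_counts(total_pickups, cell_probabilities):
    """Distribute pickups at an origin-time cell (jt) across
    (service type, destination) pairs.

    cell_probabilities maps (service_type, destination) -> predicted probability.
    Probabilities are rescaled to sum to one before allocation, so imputed
    counts can be fractional.
    """
    total_prob = sum(cell_probabilities.values())
    return {
        cell: total_pickups * p / total_prob
        for cell, p in cell_probabilities.items()
    }


# Hypothetical predicted probabilities for one pickup zone and time window.
probs = {
    ("UberX", "PUMA_A"): 0.5,
    ("UberX", "PUMA_B"): 0.3,
    ("LyftPlus", "PUMA_A"): 0.2,
}
imputed = impute_trip_counts(40, probs)
```

By construction the imputed counts add up to the observed total number of pickups at the origin-time cell.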
window because it can approximate the real-time setting, while not dividing the time too finely
such that consumer information collection and decision making are split into different time periods.
Accordingly, all the explanatory variables are collapsed into the 15-minute averages, while trip
counts are sums over 15 minutes. We set the origin and destination at the taxi zone level and
PUMA (Public Use Microdata Areas) level, respectively, where taxi zones (263 in total) are a
of analysis is that prices and wait times vary across taxi zones so pickup locations need to be at
the taxi zone level to fully exploit the richness of the API data. On the other hand, having 263
destination zones means studying 263×263 markets, which is a stretch for imputation of Dsjkt .
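The 15-minute aggregation described above (averages for explanatory variables, sums for trip counts) can be sketched as follows. The record layout is a simplifying assumption made for illustration; in the paper, prices and wait times come from API queries while trip counts come from trip records.

```python
from collections import defaultdict
from datetime import datetime


def window_start(ts, minutes=15):
    """Floor a timestamp to the start of its window."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)


def collapse(records, minutes=15):
    """Average prices and wait times and sum trip counts within each window.

    records: iterable of (timestamp, price, wait_time, trips) for one route.
    """
    buckets = defaultdict(list)
    for ts, price, wait, trips in records:
        buckets[window_start(ts, minutes)].append((price, wait, trips))
    out = {}
    for start, rows in buckets.items():
        n = len(rows)
        out[start] = {
            "price": sum(r[0] for r in rows) / n,
            "wait_time": sum(r[1] for r in rows) / n,
            "trips": sum(r[2] for r in rows),
        }
    return out


# Hypothetical observations for a single route.
records = [
    (datetime(2017, 4, 3, 8, 1), 10.0, 4.0, 2),
    (datetime(2017, 4, 3, 8, 14), 14.0, 6.0, 3),
    (datetime(2017, 4, 3, 8, 15), 20.0, 5.0, 1),
]
collapsed = collapse(records)
```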
4.3 Identification
Our estimation, like other demand estimation studies, is subject to price and wait time endogene-
ity, given that Uber and Lyft pricing algorithms can potentially take into account many factors that
affect demand, both observed and unobserved to the researchers. A simple OLS estimation of
Equation 3 would then lead to biased estimates of the coefficients of interest. To deal with this potential
endogeneity problem, we first control for service type, route, and time fixed effects at a granular
level. Next, we implement an IV strategy: the IV should affect the market share of a service
type on a given route only through its impact on price and be otherwise uncorrelated with other
demand shocks.
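The logic of the IV strategy can be illustrated on simulated data. This is a deliberately simplified sketch with one endogenous regressor and one instrument, not the estimator used in the paper; all variable names, the data-generating process, and the parameter values are hypothetical.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, true_coef=-1.0, seed=0):
    """Simulated data in which x is endogenous: it shares the shock u with y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # instrument
    u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved demand shock
    x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [true_coef * xi + 2 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return z, x, y


z, x, y = simulate()
ols = cov(x, y) / cov(x, x)  # biased, because x is correlated with u
iv = cov(z, y) / cov(z, x)   # consistent, because z is uncorrelated with u
print(round(ols, 2), round(iv, 2))
```

In this simulation the OLS slope is pulled toward zero by the shared shock, while the IV ratio stays close to the true coefficient of -1.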
We construct such an IV using a design feature of the ridesharing Apps that was present during
our sample period: passengers see and commit to a surge multiplier on Uber and Lyft Apps prior to
entering their destination on their phones. Exploiting this feature, we instrument for Psjkt − Pcjkt
using the average surge prices of trips into the focal zone in the previous time period. Because the
ridesharing Apps do not possess information on where the consumer is going, they cannot adjust
their surge multiplier according to the consumer’s destination, which creates the uncorrelatedness
of surge prices at the origin and the underlying demand condition at the destination.10 The exclu-
10
This may not be the case anymore given that Uber and Lyft now practice “upfront pricing”, which requires rider
for trip requests at a given origin. We rely on the many location and time fixed effects to help
alleviate this concern. On the other hand, surge pricing at the origin will affect the price in the
focal zone through its effect on the number of available drivers in the focal zone after they drop off
their consumers.
To deal with the endogeneity of W Tsjt , we use the total number of drop-offs at all neighboring
zones of the focal zone as the instrument. This essentially measures the stock and availability of
cars close to the focal zone, which affects the wait time at the focal zone. The exclusion restriction
assumption that we make is that the availability of cars in the neighboring zones is not correlated
We perform first-stage regressions to test the strength of the instruments. Since we have dozens
of endogenous variables (because of interactions of time-location dummies with price and wait
time), the first-stage tests entail an equal number of regressions, each of which regresses an
endogenous variable on all regressors. F-statistics and partial R²’s are reported in Appendix Table 6.
Given that all F-statistics are greater than 10 (Staiger and Stock, 1997), and that the partial R²’s are
relatively high (Shea, 1997), we conclude that our instruments are not weak.
The stochastic unobserved utility component ξ is assumed to be normally distributed with mean
zero and independent across both service types and jkt’s. The error term $\xi_{cjkt} - \xi_{sjkt}$ in Equation
(3), however, creates a within-jkt correlation among the observations, as these observations share
a common part $\xi_{cjkt}$ in the error. Specifically, $Cov(\xi_{cjkt} - \xi_{sjkt},\ \xi_{cjkt} - \xi_{s'jkt}) \neq 0$ for two
on-demand service types s and s′ in the same jkt. Given that we use instrumental variables that are not
destinations before showing a fixed price for the trip. In this case, origin price can be affected by the destination de-
mand shock, if the ridesharing platforms practice dynamic programming and discriminate riders based on destinations.
For example, the demand shock at the destination may lead the platform to willingly offer a low price at the origin,
just to have the driver relocated to the destination.
in errors. One complication, however, is that the analysis sample is unbalanced:
as illustrated in Section 4.1, there is a varying number of observations within a given jkt due to the
sub-sampling filters applied ($D_{cjkt} > 0$ and $D_{sjkt} \geq 0.1$). We choose to follow the general method
Generalized Two-Stage Least Squares. Details of the application to generate our estimator are
reported in Appendix F.
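The source of the within-jkt correlation noted above can be made explicit. Under the stated independence of ξ across service types, and writing $\sigma_c^2$ for the variance of $\xi_{cjkt}$ (a symbol introduced here only for illustration), for two distinct service types s ≠ s′:

```latex
\begin{aligned}
\mathrm{Cov}(\xi_{cjkt} - \xi_{sjkt},\; \xi_{cjkt} - \xi_{s'jkt})
  &= \mathrm{Var}(\xi_{cjkt}) - \mathrm{Cov}(\xi_{cjkt}, \xi_{s'jkt})
     - \mathrm{Cov}(\xi_{sjkt}, \xi_{cjkt}) + \mathrm{Cov}(\xi_{sjkt}, \xi_{s'jkt}) \\
  &= \sigma_c^2 - 0 - 0 + 0 \;=\; \sigma_c^2 \;>\; 0 .
\end{aligned}
```

The shared taxi component is therefore the only source of correlation between two observations in the same jkt cell, which is the error-component structure that a generalized least squares correction is designed to handle.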
In the estimation, we drop airport pickups from the sample because only very few trips end in
neighboring zones of airports, and the correlation between these trips and airport wait time is too
weak to justify the use of the wait time IV. To-airport trips, however, are kept in the sample. We
include the dummy variable “to-airport” in the vector $Y_{jkt}$ to allow consumers on trips to the airport
to have additional price and time sensitivities. Similarly, we include the dummy variable “rain”
to allow for extra disutility of wait time. The vector of observed characteristics $X_{sjkt}$ includes
luxury and capacity. Luxury measures the units of luxury service provided by the trip, which is a
dummy variable for UberBlack and UberSUV, multiplied by the duration of the jkt trip. Capacity
is defined similarly, except for service types UberXL, UberSUV, and LyftPlus. Finally, the set of
fixed effects includes pickup zone, drop-off PUMA, pickup hour by weekend (dummy), pickup
PUMA by time block, pickup PUMA by drop-off borough, drop-off PUMA by time block, and
dummies for trips at various trip duration levels (e.g. 10-minute trip, 15-minute trip, etc.).
Note: This table presents the demand estimation results. Throughout the table, α indicates a row of price
sensitivity estimates and β indicates a row of time sensitivity estimates. “Lower” stands for Lower Manhattan;
“UE & UW” stands for Upper East and Upper West; “Non Core” stands for non-Manhattan core. “Airport”
is a dummy for to-airport trips. The time blocks are defined as: morning rush (weekdays 7am-9am),
evening rush (weekdays 4pm-7pm), weekday daytime (weekdays 10am-3pm), weekday night (weekdays
8pm-11pm), weekday late night (weekdays 0am-6am), weekend day time (weekends 5am-5pm), weekend
night (Friday 8pm-11pm and weekends 6pm-11pm), and weekend late night (weekends 0am-4am).
Standard errors are in parentheses. * represents statistical significance at the 10% level, ** 5%, and *** 1%.
The estimation results are shown in Table 1. First of all, our results identify consumer preference
toward lower prices and shorter wait time: across almost all location-time-block combinations,
the price effects on utility ($-\alpha_{jkt}$) are estimated to be significantly negative, and the marginal utility
of time ($\beta_{jkt}$) is estimated to be strongly positive. These estimates directly indicate that consumers
are price-sensitive and value shorter wait times. In addition, these estimates suggest that consumers
are willing to make price-time trade-offs, and these economic fundamentals rationalize the use of
Second, we find sensible heterogeneity in consumer price and time preference across locations
and times. For example, consumers in Midtown Manhattan during morning rush hours tend to be
more time-sensitive and less price-sensitive, compared to consumers in most other location-times.
This may reflect the preference of relatively high-income workers who rush into their workplaces
on weekday mornings. A very similar pattern appears in Lower Manhattan during evening rush
hours, which may be driven by Wall Street workers. Also, consumers going to the airport value
time more and are additionally less sensitive to price. Similarly, extra disutility of wait time is
found for consumers who hail a ride in the rain. These heterogeneities are consistent with
intuition and lend confidence to the identification. In addition, the coefficient estimates of
luxury and capacity are sizable, indicating that NYC consumers value these features that are made
stand how taxi supply would respond in this hypothetical situation. In this section, we propose a
model of taxi market equilibrium. The uniqueness of the taxi market, well articulated in Orr (1969),
is the difference between operating hours and passenger service units supplied: taxi drivers’ costs
depend on the hours of driving and searching for passengers, while their revenues depend on the
fare multiplied by the number of passenger service units supplied. This is due to the nature of the
taxi matching technology. Here, we modify the framework used in Orr (1969) to fit our purpose.
$$D = f(F^a, q; M, \Theta_D)$$
where D is the number of passenger service units demanded; $F^a$ is the fixed administered fare; q
is the total operating hours of taxi drivers; M is the number of potential consumers; and $\Theta_D$ is a
vector of demand parameters. Also, D is continuous in $F^a$ and $\partial D/\partial F^a < 0$; D is continuous in q, and
$\partial D/\partial q > 0$, because as more taxi hours are provided, consumers are more quickly matched to drivers
and the average consumer wait time decreases; $D(q = 0, F^a; M, \Theta_D) = 0$, $\left.\partial D/\partial q\right|_{q=0} \gg 0$, $\partial^2 D/\partial q^2 < 0$,
and $\partial^2 D/\partial q\,\partial M > 0$.
The taxi market is a market with a rather elastic supply of labor; only modest skills are required
to operate taxi cabs, and the practice of daily lease of medallions to the drivers imposes quite low
entry and exit costs.11 Cab drivers respond positively to wage increases by working longer hours
11
Farber (2015) documents “a fair amount of entry, exit, and reentry among taxi drivers”. Hall et al. (2017) demon-
strate the horizontal labor supply curve for Uber drivers, which may as well be the case for cab drivers.
market equilibrium is characterized as a steady state where marginal cost and average revenue are
equalized:
$$\frac{F^a \cdot D(F^a, q; M, \Theta_D)}{q} = MC(q) \tag{6}$$
where MC(q) is the marginal cost of operating a taxi, $MC(q) > 0$, $MC(q) \ll \infty$, $\partial MC/\partial q \geq 0$,
and $\partial^2 MC/\partial q^2 \geq 0$. The fixed medallions impose a hard constraint on the number of operating hours
available in the market, that is, the maximal amount of daily operating hours is the number of
medallions multiplied by 24 hours. Let this maximum be $\bar{q}$. It then follows that the
equilibrium operating hours supplied is concave in D up to $\bar{q}$.¹² Let wait time for taxi passengers
be defined as
$$WT = f(q, D; \Theta_{WT})$$
the vector $\Theta_{WT}$. Particularly, $\partial WT/\partial q < 0$ and $\partial^2 WT/\partial q^2 > 0$: holding other things constant, more taxi
service supplied leads to less wait time for consumers, yet this decrease in wait time diminishes
as service units increase. $\partial WT/\partial D > 0$ and $\partial^2 WT/\partial D^2 > 0$: holding other things constant, more trips
demanded lead to longer consumer wait time, and this increase in wait time is greater as more trips
are demanded.
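The equilibrium condition in Equation 6 and the capacity constraint can be illustrated with a small numerical sketch that treats the demand level D as given. The linear marginal cost function and all parameter values are assumptions made purely for illustration; the paper does not specify functional forms.

```python
import math


def equilibrium_hours(demand, fare, c0, c1, q_max):
    """Operating hours q solving fare * demand / q = MC(q), capped at q_max.

    Assumption: linear marginal cost MC(q) = c0 + c1 * q with c0, c1 > 0,
    so the condition becomes c1*q**2 + c0*q - fare*demand = 0.
    """
    q = (-c0 + math.sqrt(c0 ** 2 + 4 * c1 * fare * demand)) / (2 * c1)
    return min(q, q_max)


# Hypothetical parameter values, for illustration only.
fare, c0, c1, q_max = 10.0, 5.0, 0.01, 1000.0
hours = [equilibrium_hours(d, fare, c0, c1, q_max) for d in (100, 200, 300, 5000)]
print([round(h, 1) for h in hours])
```

Under these assumed forms, hours supplied rise with demand at a decreasing rate and then stay flat at the cap, which matches the concave-then-vertical equilibrium path described in the text.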
A graphical characterization of the taxi market equilibrium is presented in Figure 6a, where the
equilibrium path is the combination of the part of $q^*$ before $\bar{q}$ and the part of the vertical line above
$q^*$ (the curve in red). An immediate implication on the equilibrium service units and equilibrium
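¹² Differentiating both sides of Equation 6 with respect to D leads to $\frac{\partial q}{\partial D} = \frac{F^a}{MC(q) + \frac{\partial MC}{\partial q} q} > 0$. Then
$\frac{\partial^2 q}{\partial D\,\partial D} = -\frac{F^a \frac{\partial q}{\partial D}\,[2MC'(q) + MC''(q)\,q]}{\left[MC(q) + \frac{\partial MC}{\partial q} q\right]^2} < 0$.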
units increase after the capacity constraint. This is because supply cannot further adjust after the
capacity constraint. Therefore, $\frac{\partial WT}{\partial D^*} = \frac{\partial WT}{\partial q^*}\frac{\partial q^*}{\partial D^*} + \frac{\partial WT}{\partial D} = 0 + \frac{\partial WT}{\partial D} > 0$.¹³ The intuition is simple:
as demand increases and maximal taxi capacity is reached, more and more consumers compete
with each other to get matched to a fixed number of operating taxi cabs, which leads to longer
[Figure 6: (a) Taxi Market Equilibrium; (b) Model Prediction of Equilibrium Service Units and Wait Time]
One direct way to test the model prediction is to do a scatter plot of taxi pickups with taxi
wait time, and check whether the empirical pattern resembles Figure 6b. Unfortunately, in this
study we cannot observe, estimate, or simulate the actual taxi wait time. However, Frechette et al.
(2016) are able to simulate taxi wait time from observed taxi cabs, taxi search time, and exogenous
time-varying factors, combined with a simulated matching function (Figure 6 of their paper). We
¹³ Before the capacity constraint $\bar{q}$, the term $\frac{\partial WT}{\partial q^*}\frac{\partial q^*}{\partial D^*}$ is negative, so the sign of $\frac{\partial WT}{\partial D^*}$ is undetermined. We use a
flat line in Figure 6b to describe the relationship between q and D before $\bar{q}$, but it should be noted that this relationship
can be either positive, zero, or negative.
¹⁴ Note that the market equilibrium proposed here abstracts away from the spatial equilibrium models such as Lagos
(2000), Lagos (2003), and Buchholz (2015). It is possible that when there is an exogenous shock of demand, taxi cabs
relocate spatially and form a new spatial equilibrium, which results in a different average wait time than implied by
our model.
find that UberTaxi wait time follows a similar trend as their estimates across hours of day, although
UberTaxi wait time is less volatile. We believe that UberTaxi is a reasonable proxy for taxi wait
Figure 7: UberTaxi Wait Time and Taxi Wait Time from Frechette et al. (2016)
[Figure 8: (a) Model Prediction of Equilibrium Service Units and Wait Time; (b) Hourly Taxi Wait Time and Taxi Trips in Manhattan Core, Weekdays]
The data strongly support the model prediction, as shown in Figure 8b. Overall, there is a pos-
itive correlation between taxi trips and wait time. In particular, the average wait time is relatively
low below some trip quantity threshold but becomes much higher after the threshold with a greater
variation. In addition, there is a sharp contrast between rush hours and non-rush hours: first, rush
after the threshold at 10,000 trips, is greater during rush hours (correlation coefficient is 0.42) than
non-rush hours (correlation coefficient is 0.19). This is likely due to the particular spatial distribution
of commuting routes during rush hours, which exacerbates the within-location imbalance of
demand and supply. We leverage these empirical correlations in the counterfactual analysis.
To evaluate the change in consumer surplus due to ridesharing, we adopt the concept of compensating
variation: how much would consumers need to be compensated, if Uber and Lyft were
removed from the market, to maintain the same level of utility? In the
counterfactual, we assume that taxis and the public transit are the only viable options. We also
assume that the public transit continues to operate as it does now, without a capacity constraint when more
riders substitute toward it, and that the taxi system retains the current fixed number of medallions and
administered fares.
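As a point of reference, the textbook expression for compensating variation in a logit model is the change in the log-sum of mean utilities divided by the price coefficient. The sketch below implements that standard formula only; it is not the step-by-step procedure this paper follows, and the mean utilities and the value of α are hypothetical.

```python
import math


def expected_max_utility(deltas):
    """Log-sum of mean utilities, with the outside option normalized to 0."""
    return math.log(1.0 + sum(math.exp(d) for d in deltas))


def compensating_variation(deltas_with, deltas_without, alpha):
    """Dollar compensation for moving from the first choice set to the second.

    alpha is the (positive) marginal utility of money.
    """
    gain = expected_max_utility(deltas_with) - expected_max_utility(deltas_without)
    return gain / alpha


# Hypothetical mean utilities: taxi plus two ridesharing services, then taxi only.
cv = compensating_variation([0.2, 0.5, 0.1], [0.2], alpha=0.3)
print(round(cv, 2))
```

Removing options can only lower the log-sum, so the compensation is positive whenever the removed options have finite mean utility. A full counterfactual would also change the taxi's own mean utility, since taxi wait time responds to the additional demand.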
There are two major reasons why consumers could be worse off in the absence of ridesharing
services: first, existing ridesharing users would lose all the amenities from ridesharing services
which they value more than other alternatives, due to revealed preference (that is, they would not
have used ridesharing services had these services not provided the users with the highest utility);
second, as shown in Section 4.5, when more consumers willingly substitute toward taxis, the equi-
librium average wait time increases, which makes existing taxi users worse off. We then take the
1. First, we estimate φjkt , the utility difference between all car service types and the public
able to sit for the entire trip. We allow for this utility in rush hours to differ from the utility in
non-rush hours to account for the extra disutility of riding the public transit during rush hours,
which is denoted as $\phi_r$. In the search for φ and $\phi_r$, we rely on the fact that once Uber and Lyft
are removed from the market, the number of taxi trips will increase because of sheer substitution.
That is, the values of φ and φr must be such that the corresponding counterfactual taxi ridership
is greater than or equal to the current ridership. It is important to note that using this boundary
φ and φr , given the monotonic relationship between these values and consumers’ preference for
taxis, which then leads to a conservative estimate of taxi wait time change and a lower bound for
2. We use taxi wait time estimates of Frechette et al. (2016) as the counterfactual taxi wait
times, and directly compute the mean conditional utilities of the public transit and taxis in the
counterfactual. Comparing these mean conditional utilities yields the counterfactual taxi trips.
3. We then infer current taxi wait times. We use the empirical relation between UberTaxi wait
time and taxi trips in Figure 8b to approximate the true relation between taxi wait times and trip
volumes, where we estimate the following OLS regression on day-hours when taxi trips exceed the
where we estimate the equation separately for rush hours and non-rush hours; π1 is estimated at
0.155 for rush hours (N=233, t=7.09), and at 0.057 for non-rush hours (N=559, t = 4.51). The
(from Step (2)), and current taxi trips (directly from data) help us compute the current taxi wait
$$\text{current wait time} = 0.155 \times (\text{current taxi trips} - \text{counterfactual taxi trips}) + \text{counterfactual wait time} \tag{8}$$
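Equation 8 can be written as a one-line function. The slopes are the estimates of π1 reported above; applying the non-rush slope in the same way outside rush hours is an assumption consistent with the separate estimation, and the input values are hypothetical. Trip counts must be expressed in the same units as in the regression that produced π1.

```python
def current_wait_time(current_trips, counterfactual_trips, counterfactual_wait, slope):
    """Equation 8: back out the current taxi wait time from the counterfactual one."""
    return slope * (current_trips - counterfactual_trips) + counterfactual_wait


SLOPE_RUSH = 0.155      # reported estimate of pi_1 for rush hours
SLOPE_NON_RUSH = 0.057  # reported estimate of pi_1 for non-rush hours

# Hypothetical inputs for one rush-hour cell.
wait_now = current_wait_time(
    current_trips=12.0,
    counterfactual_trips=15.0,
    counterfactual_wait=6.0,
    slope=SLOPE_RUSH,
)
print(wait_now)
```

Because counterfactual taxi trips exceed current taxi trips, the inferred current wait time is below the counterfactual wait time.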
4. With the inferred current taxi wait time, we compute the utility difference of current rideshar-
ing as well as taxi users by comparing consumers’ current options with counterfactual best options.
It is important to note that taxi wait time estimates in Frechette et al. (2016) are for Manhattan core
only. As a result, our inferred key values in Steps 2, 3, and 4 so far, namely counterfactual taxi
trips and current taxi wait times, are for Manhattan core only. Therefore, we make assumptions on
Non-Manhattan core taxi wait times and compute consumer surplus accordingly. We take three
alternative values, namely 10 minutes, 15 minutes, and 20 minutes, as alternative approximations of the
We present the counterfactual results in Table 2. The consumer surplus of ridesharing is about
72 cents per dollar, or $14 for an average trip, when the taxi wait time in non-Manhattan core is a
conservative 10 minutes. As the taxi wait time increases, the value of ridesharing also increases as
the incumbent options become even less appealing. We find that the majority of the consumer surplus
comes from shortened wait time, compared to taxis and the public transit. Luxury cars are also
valued significantly. Similarly, consumers value the utility of sitting in a car (thus not having to
walk and possibly stand in the public transit), as 13% of welfare comes from “comfort”. A very
thin share of the consumer welfare increase comes from price, yet this is not surprising since in
NYC, Uber and Lyft prices overall compare to taxi prices at the base-price level. In addition, taxi
riders gain 16 cents per dollar spent, because taxi wait time becomes shorter as a result of consumer
Note: In the graph, the density of taxi and ridesharing trips across distance to midtown is plotted by lowess smoothing
against the left axis. The per-dollar consumer surplus across distance is plotted by lowess smoothing against the right
axis.
Given that the goal of this paper is to understand the geography of ridesharing and its implica-
tion on consumers, we compare the per-dollar consumer surplus across neighborhoods represented
by their distance to Midtown Manhattan (recall that this distance measure is highly correlated with
the accessibility measure). In Figure 9, we first plot the density of taxi and ridesharing trips across
distance to Midtown. Consistent with the previous results, we see that although both taxi and
ridesharing pickups are concentrated in regions that are closer to Midtown Manhattan, the level of
concentration is smaller for ridesharing trips. Next, we plot the geography of consumer surplus
from ridesharing. We find that it varies drastically across geography and exhibits a U-shape: pas-
sengers that are 5 to 15 miles from Midtown experience 60% larger consumer surplus (average
per-dollar consumer surplus is $1.15) relative to passengers that are within 5 miles from Midtown
(average per-dollar consumer surplus is $0.72). For passengers that are more than 15 miles
from Midtown, their surplus ($0.86) is 19% larger than that of passengers whose trips start within 5 miles from Midtown.
Manhattan than in outer boroughs, which makes ridesharing less valuable (in percentage terms) in
Manhattan than in outer boroughs. For consumers that are in the most remote areas (greater than
15 miles from Midtown Manhattan), the surplus becomes smaller relative to medium-range outer
boroughs (5 to 15 miles from Midtown Manhattan), likely because the alternative options (public
transit and taxis) are not as inaccessible as they are in medium-range outer boroughs.
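The percentage differences quoted above follow directly from the reported per-dollar surplus figures, as this short check shows. The function name is hypothetical; the three surplus values are those stated in the text.

```python
def percent_larger(value, baseline):
    """Percentage by which value exceeds baseline."""
    return 100.0 * (value / baseline - 1.0)


# Average per-dollar consumer surplus by distance band, as reported in the text.
surplus = {
    "within 5 miles": 0.72,
    "5 to 15 miles": 1.15,
    "more than 15 miles": 0.86,
}
base = surplus["within 5 miles"]
mid = percent_larger(surplus["5 to 15 miles"], base)
far = percent_larger(surplus["more than 15 miles"], base)
print(round(mid), round(far))
```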
Note: This graph plots the per-dollar consumer surplus against distance to midtown, separately for each of the five
utility components. All curves are plotted by lowess smoothing.
utility components. As shown in Figure 10, the U-shaped pattern observed in the total consumer
surplus in Figure 9 appears to be mainly driven by the consumer surplus from saved time for
consumer surplus is relatively uniform. Consumer surplus from luxury ridesharing options ap-
pears greater in Manhattan than in outer boroughs, which is consistent with the business-oriented
Taken together, our calculation of consumer surplus points to a highly uneven distributional
benefit of ridesharing across geographies. Neighborhoods that are within a medium distance to
Midtown Manhattan (i.e., 5 to 15 miles from Midtown) benefit the most from the presence of
ridesharing, and the major source of welfare gains is the time saved relative to other transportation
modes.
5 Discussion
Mobility inequality correlates with economic inequality. Yet achieving inclusive mobility is a
difficult task, primarily due to the lack of incentives for public transit expansion. Taxis are a
good means of point-to-point transportation, but they tend to cluster in dense areas and leave less
dense areas underserved, as an equilibrium response to the geographical distribution of
passengers and the lack of real-time demand information. Therefore, the proposition that aims to make
taxi services more evenly distributed across space by expanding the medallion capacity constraint,
restricting pickup areas, or both is unlikely to produce satisfactory outcomes, as exemplified by the
In this paper, we find that ridesharing could promote inclusive mobility through tech-aided
matching, which makes the ride-hailing service conveniently available for consumers in areas
that are underserved by other transportation modes. Specifically, we use data from actual trips
to present evidence that ridesharing coverage dominates that of taxis in less accessible areas. We
then provide evidence that the advanced driver-passenger matching of ridesharing companies could
of ridesharing through the lens of a choice model, and the estimation results suggest a disproportionately
larger gain in consumer surplus from ridesharing services in low accessibility areas.
The literature that explores the economics behind ridesharing platforms has primarily focused
on their average effects, i.e., how ridesharing platforms integrate massive real-time market infor-
mation to enhance capacity utilization, in a generic dense metropolitan market. Our findings shed
light on the importance of the underexplored distributional effects, i.e., the ability of ridesharing
to connect and serve various metropolitan neighborhoods with inferior access to other mobility
resources. Our findings suggest that technology can play a key role in mitigating geographical
disparity in transportation, and this calls for impact studies in other domains to explore other ways
References
Agrawal, A., C. Catalini, and A. Goldfarb (2015). Crowdfunding: Geography, social networks,
and the timing of investment decisions. Journal of Economics & Management Strategy 24(2),
253–274.
Babar, Y. and G. Burtch (2017). Examining the impact of ridehailing services on public transit use.
Balestra, P. and J. Varadharajan-Krishnakumar (1987). Full information estimations of a system of
simultaneous equations with error component structure. Econometric Theory 3(2), 223–246.
Berry, S., J. Levinsohn, and A. Pakes (1995). Automobile prices in market equilibrium. Econo-
metrica: Journal of the Econometric Society, 841–890.
Berry, S. T. (1994). Estimating discrete-choice models of product differentiation. The RAND
Journal of Economics, 242–262.
Blum, B. S. and A. Goldfarb (2006). Does the internet defy the law of gravity? Journal of
International Economics 70(2), 384–405.
Buchholz, N. (2015). Spatial equilibrium, search frictions and efficient regulation in the taxi in-
dustry. Technical report, Working paper.
Castillo, J. C., D. Knoepfle, and G. Weyl (2017). Surge pricing solves the wild goose chase. In
Proceedings of the 2017 ACM Conference on Economics and Computation, pp. 241–242. ACM.
Chen, M. K. and M. Sheldon (2016). Dynamic pricing in a labor market: Surge pricing and flexible
work on the Uber platform. In EC, pp. 455.
Chevalier, J. and A. Goolsbee (2009). Are durable goods consumers forward-looking? Evidence
from college textbooks. The Quarterly Journal of Economics 124(4), 1853–1884.
Cohen, P., R. Hahn, J. Hall, S. Levitt, and R. Metcalfe (2016). Using big data to estimate consumer
surplus: The case of Uber. Technical report, National Bureau of Economic Research.
Cramer, J. and A. B. Krueger (2016). Disruptive change in the taxi business: The case of Uber. The
American Economic Review 106(5), 177–182.
Farber, H. S. (2015). Why you can't find a taxi in the rain and other labor supply lessons from cab
drivers. The Quarterly Journal of Economics 130(4), 1975–2026.
Farronato, C. and A. Fradkin (2017). The welfare effects of peer entry in the accommodations
market: The case of Airbnb. Technical report, Working paper.
Frechette, G. R., A. Lizzeri, and T. Salz (2016). Frictions in a competitive, regulated market:
Evidence from taxis.
Goolsbee, A. and A. Petrin (2004). The consumer gains from direct broadcast satellites and the
competition with cable TV. Econometrica 72(2), 351–381.
Greenwood, B. N. and S. Wattal (2017). Show me the way to go home: An empirical investigation
of ride-sharing and alcohol related motor vehicle fatalities. MIS Quarterly 41(1).
Hall, J., J. Horton, and D. Knoepfle (2017). Labor market equilibration: Evidence from Uber.
Hall, J., C. Kendrick, and C. Nosko (2015). The effects of Uber's surge pricing: A case study. The
University of Chicago Booth School of Business.
Hall, J. D., C. Palsson, and J. Price (2018). Is Uber a substitute or complement for public transit?
Journal of Urban Economics 108, 36–50.
Hall, J. V. and A. B. Krueger (2016). An analysis of the labor market for Uber's driver-partners in
the United States. Technical report, National Bureau of Economic Research.
Hering, L. and S. Poncet (2010). Market access and individual wages: Evidence from China. The
Review of Economics and Statistics 92(1), 145–159.
Holzer, H. J., K. R. Ihlanfeldt, and D. L. Sjoquist (1994). Work, search, and travel among white
and black youth. Journal of Urban Economics 35(3), 320–345.
Hortaçsu, A., F. A. Martínez-Jerez, and J. Douglas (2009). The geography of trade in online trans-
actions: Evidence from eBay and MercadoLibre. American Economic Journal: Microeconomics,
53–74.
Ihlanfeldt, K. R. and D. L. Sjoquist (1991). The effect of job access on black and white youth
employment: A cross-sectional analysis. Urban Studies 28(2), 255–265.
Kaufman, S., M. L. Moss, J. Tyndall, and J. Hernandez (2014). Mobility, economic opportunity
and New York City neighborhoods.
Lagos, R. (2000). An alternative approach to search frictions. Journal of Political Economy 108(5),
851–873.
Lagos, R. (2003). An analysis of the market for taxicab rides in New York City. International
Economic Review 44(2), 423–434.
Li, Z., Y. Hong, and Z. Zhang (2016). An empirical analysis of on-demand ride sharing and traffic
congestion.
Liu, M., E. Brynjolfsson, and J. Dowlatabadi (2018). Do digital platforms reduce moral hazard?
The case of Uber and taxis. Technical report, National Bureau of Economic Research.
Liu, M., T. Tunca, Y. Xu, and W. Zhu (2017). An empirical analysis of price formation, utilization,
and value generation in ride sharing services.
Nevo, A. (2000). Mergers with differentiated products: The case of the ready-to-eat cereal industry.
The RAND Journal of Economics, 395–421.
Nevo, A. (2001). Measuring market power in the ready-to-eat cereal industry. Econometrica 69(2),
307–342.
Orr, D. (1969). The "taxicab problem": A proposed solution. Journal of Political Economy 77(1),
141–147.
Petrin, A. (2002). Quantifying the benefits of new products: The case of the minivan. Journal of
Political Economy 110(4), 705–729.
Shea, J. (1997). Instrument Relevance in Multivariate Linear Models: A Simple Measure. The
Review of Economics and Statistics 79(2), 348–352.
Staiger, D. and J. H. Stock (1997). Instrumental Variables Regression with Weak Instruments.
Econometrica 65(3), 557.
Tomer, A., E. Kneebone, R. Puentes, and A. Berube (2011). Missed opportunity: Transit and jobs
in metropolitan America.
Wang, Y., C. Wu, and T. Zhu (2019). Mobile hailing technology and taxi driving behaviors.
Marketing Science 38(5), 734–755.
Zhang, K., H. Chen, S. Yao, L. Xu, J. Ge, X. Liu, and M. Nie (2019). An efficiency paradox of
Uberization. Available at SSRN 3462912.
[Figure: (a) NYC taxi, Uber, and Lyft trips by time of an average weekday; (b) Shares of Uber and Lyft trips by time of an average weekday]
Appendix C Identifying Uber and Lyft Trips from TLC FHV Trip Records Data
The FHV trip data does not directly indicate the company behind each trip; instead, it records the
trip's dispatching base number. Using the official TLC list of FHV bases (see footnote 18), we identify
Uber and Lyft trips by the correspondence between base numbers and company names. Specif-
ically, the base numbers associated with Uber are B02512, B02395, B02617, B02682, B02764,
B02765, B02835, B02836, B02864, B02865, B02866, B02867, B02869, B02870, B02871, B02872,
B02875, B02876, B02877, B02878, B02879, B02880, B02882, B02883, B02884, B02887, B02888,
and B02889. The base numbers associated with Lyft are B02510 and B02844.
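
The lookup described above can be sketched in a few lines of Python. This is only an illustration: the base numbers are those listed in the text, while the names `UBER_BASES`, `LYFT_BASES`, and `company_for_base`, and the "Other" label for unmatched bases, are assumptions made for this example rather than part of the original data processing.

```python
# Base numbers copied from the lists in Appendix C.
# All identifiers below are hypothetical names chosen for this sketch.
UBER_BASES = frozenset({
    "B02512", "B02395", "B02617", "B02682", "B02764", "B02765", "B02835",
    "B02836", "B02864", "B02865", "B02866", "B02867", "B02869", "B02870",
    "B02871", "B02872", "B02875", "B02876", "B02877", "B02878", "B02879",
    "B02880", "B02882", "B02883", "B02884", "B02887", "B02888", "B02889",
})
LYFT_BASES = frozenset({"B02510", "B02844"})


def company_for_base(base_number: str) -> str:
    """Map a dispatching base number to 'Uber', 'Lyft', or 'Other'."""
    code = base_number.strip().upper()  # tolerate stray whitespace and case
    if code in UBER_BASES:
        return "Uber"
    if code in LYFT_BASES:
        return "Lyft"
    return "Other"  # assumed label for any base not in either list
```

Applied to each FHV trip record's dispatching base number, such a function would tag the trip with a company label before aggregation.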
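\[
\begin{aligned}
\Pr(\text{1 trip in } sjkt) = f(&\text{pickup zone},\ \text{dropoff puma},\ \text{service type},\ \text{pickup hour}, \\
&\text{pickup borough} \times \text{pickup hour},\ \text{pickup borough} \times \text{dropoff puma}, \\
&\text{pickup borough} \times \text{service type},\ \text{dropoff puma} \times \text{service type})
\end{aligned}
\]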
Step 3: Calculate $D_{sjkt}$ by distributing $D_{jt}^{Uber}$ and $D_{jt}^{Lyft}$ into service types and drop-off lo-
cations. This requires constructing weights using the estimated $p_{sjkt}$ in Step 2, and applying the
following formulas (note that $s = 1, 2, 3, 4, 5$ for the 5 service types on Uber, and $s = 6, 7, 8$ for the 3
service types on Lyft):
For Uber:
\[
D_{sjkt} = \frac{p_{sjkt}}{\sum_{k=1}^{263} \sum_{s=1}^{5} p_{sjkt}} \, D_{jt}^{Uber}
\]
For Lyft:
\[
D_{sjkt} = \frac{p_{sjkt}}{\sum_{k=1}^{263} \sum_{s=6}^{8} p_{sjkt}} \, D_{jt}^{Lyft}
\]
These weights ensure that the inferred $D_{sjkt}$'s return the values of $D_{jt}^{Uber}$ and $D_{jt}^{Lyft}$ when summed
over service types and drop-off locations.
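
As a rough numerical illustration of Step 3, the following Python sketch distributes a platform-level total for one pickup zone and time period across service types and drop-off locations in proportion to the estimated probabilities. The function name `distribute_demand`, the array layout (service types by drop-off locations), and the use of NumPy are assumptions made for this example, not a description of the original implementation.

```python
import numpy as np


def distribute_demand(p, total):
    """Split a platform-level total D_jt into cell-level demands D_sjkt.

    p     : array of shape (n_service_types, n_dropoff_locations) holding the
            estimated p_sjkt for one platform, pickup zone j, and time t
            (hypothetical layout chosen for this sketch).
    total : the platform-level demand D_jt for that pickup zone and time.
    """
    p = np.asarray(p, dtype=float)
    denom = p.sum()  # double sum over drop-off locations k and service types s
    if denom <= 0:
        raise ValueError("estimated probabilities must sum to a positive number")
    # Each cell receives its share p_sjkt / sum(p) of the total.
    return p / denom * total
```

By construction the returned array sums back to `total`, which is the adding-up property stated above.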
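and
\[
\hat{\sigma}_s^2 = \frac{1}{C_{sjkt}} \sum_{J} \sum_{K_j} \sum_{T_{jk}} \sum_{S_{jkt}} \left( \hat{\xi}_{sjkt} - \frac{1}{C_{sjkt}} \sum_{J} \sum_{K_j} \sum_{T_{jk}} \sum_{S_{jkt}} \hat{\xi}_{sjkt} \right)^2 \tag{14}
\]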
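Step 4: Construct a $C_{sjkt} \times C_{sjkt}$ block diagonal matrix with $C_{jkt}$ blocks:
\[
\hat{\Omega}^{-\frac{1}{2}} =
\begin{pmatrix}
\ddots & O & O \\
O & \hat{Q}_{jkt} & O \\
O & O & \ddots
\end{pmatrix}_{C_{sjkt} \times C_{sjkt}}
\tag{15}
\]
where each block element $\hat{Q}_{jkt}$ (of dimension $|S_{jkt}| \times |S_{jkt}|$) is a square matrix with diagonal elements equal to
\[
\frac{1}{|S_{jkt}|} \left( \frac{1}{(|S_{jkt}| \hat{\sigma}_c^2 + \hat{\sigma}_s^2)^{\frac{1}{2}}} - \frac{1 - |S_{jkt}|}{\hat{\sigma}_s} \right)
\]
and off-diagonal elements equal to
\[
\frac{1}{|S_{jkt}|} \left( \frac{1}{(|S_{jkt}| \hat{\sigma}_c^2 + \hat{\sigma}_s^2)^{\frac{1}{2}}} - \frac{1}{\hat{\sigma}_s} \right).
\]
$O$ are
matrices of zeros.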
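
To make the block structure in Step 4 concrete, the sketch below builds a single block from the diagonal and off-diagonal expressions given there. The function name `q_block` and its arguments are hypothetical. The closing check additionally assumes that the within-group error covariance has the standard one-way error-components form $\hat{\sigma}_s^2 I + \hat{\sigma}_c^2 \iota\iota'$, which this section does not state explicitly; under that assumption the block acts as an inverse square root of the covariance.

```python
import numpy as np


def q_block(n, sigma_c2, sigma_s2):
    """Build one n x n block, with n = |S_jkt|, from the Step 4 formulas.

    sigma_c2 and sigma_s2 stand for the variance estimates written as
    sigma_c^2 and sigma_s^2 in the text (argument names are illustrative).
    """
    sigma_s = np.sqrt(sigma_s2)
    root = np.sqrt(n * sigma_c2 + sigma_s2)
    diag = (1.0 / n) * (1.0 / root - (1.0 - n) / sigma_s)
    off = (1.0 / n) * (1.0 / root - 1.0 / sigma_s)
    block = np.full((n, n), off)      # fill every entry with the off-diagonal value
    np.fill_diagonal(block, diag)     # then overwrite the diagonal
    return block


# Illustrative check under the assumed error-components covariance:
n, sigma_c2, sigma_s2 = 4, 0.7, 1.3
Q = q_block(n, sigma_c2, sigma_s2)
Sigma = sigma_s2 * np.eye(n) + sigma_c2 * np.ones((n, n))
whitened = Q @ Sigma @ Q  # close to the identity if Q is an inverse square root
```

The full matrix in equation (15) would then be assembled by placing one such block per group along the diagonal.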
Step 5: The random effects estimator can be constructed explicitly as
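\[
(\hat{\alpha}\ \hat{\beta}\ \hat{\Theta})' = \left( X^{*\prime} Z^{*} (Z^{*\prime} Z^{*})^{-1} Z^{*\prime} X^{*} \right)^{-1} X^{*\prime} Z^{*} (Z^{*\prime} Z^{*})^{-1} Z^{*\prime} D^{*} \tag{16}
\]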
where Z denotes the matrix that contains all the instruments, and
\[
X^{*} = \hat{\Omega}^{-\frac{1}{2}} X
\]
\[
D^{*} = \hat{\Omega}^{-\frac{1}{2}} D
\]
\[
Z^{*} = \hat{\Omega}^{-\frac{1}{2}} Z
\]
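
The estimator in equation (16) can be computed directly from the transformed matrices. The following is a minimal NumPy sketch, assuming `X`, `Z`, and `D` are already stacked as dense arrays and that the transformation matrix from Step 4 is available as `omega_inv_sqrt`; the function name and argument names are illustrative, and linear solves are used in place of explicit inverses purely as an implementation choice.

```python
import numpy as np


def re_iv_estimator(X, Z, D, omega_inv_sqrt):
    """Compute the coefficient vector in equation (16).

    X : (n, p) regressors, Z : (n, q) instruments with q >= p,
    D : (n,) or (n, 1) dependent variable,
    omega_inv_sqrt : (n, n) transformation matrix from Step 4.
    All argument names are hypothetical and chosen for this sketch.
    """
    Xs = omega_inv_sqrt @ X   # X* in the text
    Zs = omega_inv_sqrt @ Z   # Z* in the text
    Ds = omega_inv_sqrt @ D   # D* in the text
    ZtZ = Zs.T @ Zs
    # Z*(Z*'Z*)^{-1}Z*'X* and Z*(Z*'Z*)^{-1}Z*'D*, via solves instead of inverses
    PX = Zs @ np.linalg.solve(ZtZ, Zs.T @ Xs)
    PD = Zs @ np.linalg.solve(ZtZ, Zs.T @ Ds)
    # (X*'P X*)^{-1} X*'P D*, where P is the projection onto the columns of Z*
    return np.linalg.solve(Xs.T @ PX, Xs.T @ PD)
```

With a large number of observations the transformation matrix would normally be applied block by block rather than stored densely; the dense form is used here only to mirror the notation.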